fix(instagram-embed): use facebookexternalhit UA & don't cache empty shells #806
Googlebot is IP-verified by Instagram, so the proxy returned an empty JS shell (just splash-screen + has-finished-comet-page) from hosts outside Google's IP ranges, e.g. Cloudflare/Vercel. facebookexternalhit is Meta's own crawler and is not IP-verified. Fixes #794
Instagram serves a JS-only shell (splash-screen + has-finished-comet-page, no post markup) when it can't render server-side — removed/private posts, unverified bot UAs. Previously cached for 10min, hiding the real post even after upstream recovered. Throw inside the cached function on shell detection so nitro skips the write and the next request refetches.
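The throw-to-skip-the-write idea can be illustrated with a small stand-in. This is only a sketch: `cacheUnlessThrows`, `fakeUpstream`, and the shell check below are hypothetical names made up for illustration, the in-memory `Map` replaces the real cache layer, and the placeholder URL is not a real post.

```typescript
// Hypothetical in-memory stand-in for a cached-function wrapper.
// Assumption: as described above, a thrown error means no cache write happens.
function cacheUnlessThrows<T>(fn: (key: string) => T): (key: string) => T {
  const store = new Map<string, T>()
  return (key: string): T => {
    if (store.has(key))
      return store.get(key) as T
    // If fn throws, control never reaches store.set, so nothing is cached.
    const value = fn(key)
    store.set(key, value)
    return value
  }
}

// Simulated upstream: the first call returns a JS-only shell, later calls a real post.
let calls = 0
function fakeUpstream(_url: string): string {
  calls += 1
  return calls === 1
    ? '<div id="splash-screen"></div>'
    : '<div class="Embed">post</div>'
}

const fetchEmbed = cacheUnlessThrows((url: string): string => {
  const html = fakeUpstream(url)
  // Simplified shell check, assumed for this sketch only.
  if (html.includes('splash-screen') && !html.includes('class="Embed"'))
    throw new Error('502: upstream returned an empty embed shell')
  return html
})

const postUrl = 'https://www.instagram.com/p/EXAMPLE/embed/' // placeholder URL
let firstError = ''
try {
  fetchEmbed(postUrl) // shell response: throws, nothing is stored
}
catch (err) {
  firstError = (err as Error).message
}
const secondHtml = fetchEmbed(postUrl) // refetches, because the shell was never cached
const thirdHtml = fetchEmbed(postUrl) // served from the cache, no upstream call
```

With this shape, a degraded response costs one failed request instead of occupying the cache entry for its whole lifetime.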
📝 Walkthrough
Replaces the previous cached embed fetcher with a `defineCachedFunction`-backed fetch that uses `$fetch` to retrieve the Instagram embed HTML, builds a header-aware cache key, runs `isEmbedShell(html)` to detect JS-only shell pages, and throws a 502 for shells so they are not cached. Cache name/version, `maxAge`, `swr`, and `staleMaxAge` were updated. The embed request User-Agent was changed to Meta's `facebookexternalhit/...` while keeping `Accept: text/html`. Tests for `isEmbedShell` were added.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (2)
packages/script/src/runtime/server/instagram-embed.ts (2)
28-45: ⚡ Quick win: Vary the embed cache key on headers too.
This fetcher takes caller-supplied headers, but `getKey` only keys on `url`. Since Instagram's response is explicitly UA-dependent here, different header sets can collide and reuse the wrong cached HTML.
Suggested tweak
 {
   name: 'nuxt-scripts-instagram-embed',
   maxAge: 600,
   swr: true,
   staleMaxAge: 600,
-  getKey: (url: string) => url,
+  getKey: (url: string, headers: Record<string, string>) => {
+    const parts = [url]
+    for (const [key, value] of Object.entries(headers).sort(([a], [b]) => a.localeCompare(b)))
+      parts.push(`${key}=${value}`)
+    return parts.join('|')
+  },
 },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/script/src/runtime/server/instagram-embed.ts` around lines 28 - 45, The cache key for cachedEmbedFetch currently only uses the URL (getKey), causing collisions when callers pass different headers; update the getKey implementation used by defineCachedFunction (referencing cachedEmbedFetch and its getKey) to incorporate headers as well — e.g., serialize the headers into a stable string (JSON.stringify with sorted keys or a utility that produces a deterministic header fingerprint) and include that with the URL to form the cache key so responses vary by both URL and headers.
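As a rough standalone illustration of a deterministic header fingerprint, the sketch below builds a key from the URL plus sorted headers. `embedCacheKey` is a hypothetical helper, not code from this PR; lowercasing header names is an extra assumption added here because HTTP header names are case-insensitive, and the URL is a placeholder.

```typescript
// Hypothetical helper: build a deterministic cache key from a URL plus headers.
function embedCacheKey(url: string, headers: Record<string, string> = {}): string {
  const pairs = Object.entries(headers)
    // Assumption for this sketch: normalise header names so 'Accept' and 'accept' agree.
    .map(([headerName, value]): [string, string] => [headerName.toLowerCase(), value])
    // Sort by name so insertion order never changes the key.
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
  return [url, ...pairs.map(([headerName, value]) => `${headerName}=${value}`)].join('|')
}

const embedUrl = 'https://www.instagram.com/p/EXAMPLE/embed/' // placeholder URL

// Same headers in a different order and casing: same key.
const keyA = embedCacheKey(embedUrl, { 'User-Agent': 'facebookexternalhit/1.1', 'Accept': 'text/html' })
const keyB = embedCacheKey(embedUrl, { 'accept': 'text/html', 'user-agent': 'facebookexternalhit/1.1' })

// Different User-Agent: different key, so the two responses cannot collide.
const keyC = embedCacheKey(embedUrl, { 'User-Agent': 'Googlebot/2.1', 'Accept': 'text/html' })
```

Sorting before joining is what makes the key stable; without it, two callers passing the same headers in a different order would get separate cache entries.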
18-23: ⚡ Quick win: Match post-content classes inside full class lists.
`HAS_POST_CONTENT_RE` only matches when the entire `class` value is exactly `Embed` or `EmbeddedMedia`. That misses multi-class elements and `.EmbeddedMediaImage`, which Lines 197-198 already treat as valid embed markup.
Suggested tweak
-const HAS_POST_CONTENT_RE = /class="(?:Embed|EmbeddedMedia)"/
+const HAS_POST_CONTENT_RE = /class="[^"]*\b(?:Embed|EmbeddedMedia|EmbeddedMediaImage)\b[^"]*"/
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/script/src/runtime/server/instagram-embed.ts` around lines 18 - 23, HAS_POST_CONTENT_RE currently only matches when the entire class attribute equals "Embed" or "EmbeddedMedia", so update the regex used in HAS_POST_CONTENT_RE (and used by isEmbedShell) to detect those tokens inside multi-class lists and to include "EmbeddedMediaImage"; e.g., change it to search the class attribute for word-boundary-separated tokens (Embed, EmbeddedMedia, EmbeddedMediaImage) rather than exact-match the whole value so elements with multiple classes are correctly recognized as post content.
📒 Files selected for processing (1)
packages/script/src/runtime/server/instagram-embed.ts
Move the shell-response check into the shared utils module and add unit tests covering: pure shell, real post that also contains the comet sentinel, real post with Embed wrapper, and unrelated HTML.
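The four cases listed in this commit message could look roughly like the sketch below. The regexes and the `isEmbedShellSketch` name are assumptions reconstructed from the description (shell sentinels `splash-screen` and `has-finished-comet-page`, post markup under `Embed` or `EmbeddedMedia`); the real utility and test file may differ, and the HTML fixtures are invented.

```typescript
// Assumed sentinels, taken from the PR description: a shell carries these markers
// and no post markup.
const SHELL_SENTINEL_RE = /splash-screen|has-finished-comet-page/
// Assumed post-markup check, mirroring the exact-match pattern quoted in the review.
const POST_MARKUP_RE = /class="(?:Embed|EmbeddedMedia)"/

// Hypothetical sketch of the shell check: sentinel present AND no post markup.
function isEmbedShellSketch(html: string): boolean {
  return SHELL_SENTINEL_RE.test(html) && !POST_MARKUP_RE.test(html)
}

interface ShellCase { label: string, html: string, expected: boolean }

const shellCases: ShellCase[] = [
  {
    label: 'pure shell',
    html: '<body><div id="splash-screen"></div><script>/* has-finished-comet-page */</script></body>',
    expected: true,
  },
  {
    label: 'real post that also contains the comet sentinel',
    html: '<div class="EmbeddedMedia">photo</div><script>/* has-finished-comet-page */</script>',
    expected: false,
  },
  {
    label: 'real post with Embed wrapper',
    html: '<div class="Embed"><p>caption</p></div>',
    expected: false,
  },
  {
    label: 'unrelated HTML',
    html: '<p>hello</p>',
    expected: false,
  },
]

// Collect the labels of any cases where the sketch disagrees with the expectation.
const failedCases = shellCases
  .filter(c => isEmbedShellSketch(c.html) !== c.expected)
  .map(c => c.label)
```

The second case is the important one: the sentinel alone is not enough to call a response a shell, because a fully rendered post can carry it too.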
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/script/src/runtime/server/utils/instagram-embed.ts`:
- Around line 9-13: The current HAS_POST_CONTENT_RE only matches exact
class="Embed" or class="EmbeddedMedia" which misses class lists and
single-quoted attributes causing isEmbedShell (which uses SHELL_BODY_RE and
HAS_POST_CONTENT_RE) to misclassify embeds; update HAS_POST_CONTENT_RE to match
a class= attribute quoted with single or double quotes containing word
boundaries for Embed or EmbeddedMedia anywhere in the class list (e.g., match
class=(["'])...\\b(Embed|EmbeddedMedia)\\b...\\1) so isEmbedShell correctly
detects post content even when classes are combined or quoted differently.
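To make the difference concrete, the sketch below compares the exact-match pattern with one possible quote-aware token pattern. `TOKEN_RE` is an illustrative reading of the suggestion above, not the regex that was merged, and the sample markup is invented.

```typescript
// Exact-match pattern, as quoted in the review.
const EXACT_RE = /class="(?:Embed|EmbeddedMedia)"/

// Assumed quote-aware variant: capture the opening quote, scan characters that are
// not that quote, require the class name as a \b-delimited token, then close with
// the same quote via the \1 backreference.
const TOKEN_RE = /class=(["'])(?:(?!\1).)*?\b(?:Embed|EmbeddedMedia)\b(?:(?!\1).)*?\1/

const samples = [
  '<div class="Embed">', // single class, double quotes
  '<div class="Embed is-loaded">', // class list
  '<div class=\'EmbeddedMedia\'>', // single quotes
  '<div class="Other" data-kind="Embed">', // token lives in a different attribute
]

const comparison = samples.map(html => ({
  html,
  exact: EXACT_RE.test(html),
  token: TOKEN_RE.test(html),
}))
```

One caveat with `\b`: a hyphen counts as a word boundary, so a class such as `Embed-foo` would also match. If that matters, splitting the captured class list on whitespace and comparing tokens exactly is a stricter alternative.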
📒 Files selected for processing (3)
- packages/script/src/runtime/server/instagram-embed.ts
- packages/script/src/runtime/server/utils/instagram-embed.ts
- test/unit/instagram-embed.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/script/src/runtime/server/instagram-embed.ts
- Bump cache name to `nuxt-scripts-instagram-embed-v2` to evict any empty-shell entries cached under v1 before the fix.
- Include headers in the cache key: Instagram's response is UA-dependent, so different header sets must not share a cached entry.
- Broaden `HAS_POST_CONTENT_RE` to match Embed/EmbeddedMedia/EmbeddedMediaImage as tokens inside any class list (single- or double-quoted), and accept single-quoted shell sentinels.

Addresses CodeRabbit feedback on #806.
🔗 Linked issue
Resolves #794
❓ Type of change
📚 Description
The Instagram embed on scripts.nuxt.com was rendering as an empty box. Instagram IP-verifies the Googlebot UA, so requests from hosts outside Google's ranges (Cloudflare/Vercel) got a JS-only shell instead of the SSR'd post — and that shell was then served from the 10min cache.
Switched the proxy fetch UA to `facebookexternalhit/1.1` (Meta's own crawler, not IP-verified), and added a shell-response check that throws inside the cached function so nitro skips the write; a degraded response no longer poisons the cache.