fix(instagram-embed): use facebookexternalhit UA & don't cache empty shells#806

Merged: harlan-zw merged 4 commits into main from fix/instagram-embed-ua on May 27, 2026

@harlan-zw harlan-zw commented May 27, 2026

🔗 Linked issue

Resolves #794

❓ Type of change

  • 📖 Documentation
  • 🐞 Bug fix
  • 👌 Enhancement
  • ✨ New feature
  • 🧹 Chore
  • ⚠️ Breaking change

📚 Description

The Instagram embed on scripts.nuxt.com was rendering as an empty box. Instagram IP-verifies the Googlebot UA, so requests from hosts outside Google's IP ranges (e.g. Cloudflare, Vercel) got a JS-only shell instead of the SSR'd post, and that shell was then served from the 10-minute cache.

Switched the proxy fetch UA to facebookexternalhit/1.1 (Meta's own crawler, not IP-verified), and added a shell-response check that throws inside the cached function so nitro skips the write — a degraded response no longer poisons the cache.
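As a rough illustration of the UA switch described above, the sketch below builds the request headers and fetches the embed HTML through an injectable fetch function. Only the `facebookexternalhit/1.1` value and the `Accept: text/html` header come from this PR; the helper names (`buildEmbedHeaders`, `fetchEmbedHtml`, `FetchLike`) are hypothetical, and the exact UA string used in the merged code may be longer.

```typescript
// UA value taken from the PR description; the real constant may carry a suffix.
const EMBED_USER_AGENT = 'facebookexternalhit/1.1'

// Minimal shape of a fetch-like function, so the sketch can run offline.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ text: () => Promise<string> }>

// Hypothetical helper: headers sent with the proxy request.
function buildEmbedHeaders(): Record<string, string> {
  return {
    'User-Agent': EMBED_USER_AGENT,
    'Accept': 'text/html',
  }
}

// Hypothetical helper: fetch the embed page and return its HTML body.
async function fetchEmbedHtml(url: string, fetchImpl: FetchLike): Promise<string> {
  const res = await fetchImpl(url, { headers: buildEmbedHeaders() })
  return res.text()
}
```

Passing the fetch implementation as a parameter is a choice made for this sketch so it can be exercised without network access.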

harlan-zw added 2 commits May 27, 2026 18:05
Googlebot is IP-verified by Instagram, so the proxy returned an empty
JS shell (just splash-screen + has-finished-comet-page) from hosts
outside Google's IP ranges, e.g. Cloudflare/Vercel. facebookexternalhit
is Meta's own crawler and is not IP-verified.

Fixes #794

Instagram serves a JS-only shell (splash-screen + has-finished-comet-page,
no post markup) when it can't render server-side — removed/private posts,
unverified bot UAs. Previously cached for 10min, hiding the real post even
after upstream recovered.

Throw inside the cached function on shell detection so nitro skips the
write and the next request refetches.
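The "throw so the cache skips the write" idea can be sketched with a small in-memory cache. This is a stand-in written for illustration, not Nitro's implementation: `cacheUnlessThrown` and `makeCachedEmbedFetch` are hypothetical names, and the shell predicate is passed in by the caller.

```typescript
// In-memory stand-in for a cached function: a result is stored only when the
// wrapped function resolves, so a thrown error never produces a cache entry.
function cacheUnlessThrown<T>(
  fn: (key: string) => Promise<T>,
  maxAgeMs: number,
): (key: string) => Promise<T> {
  const store = new Map<string, { value: T, expires: number }>()
  return async (key: string) => {
    const hit = store.get(key)
    if (hit && hit.expires > Date.now())
      return hit.value
    // If fn rejects, this await throws and store.set below is never reached.
    const value = await fn(key)
    store.set(key, { value, expires: Date.now() + maxAgeMs })
    return value
  }
}

// Hypothetical wiring: reject shell responses inside the cached function.
function makeCachedEmbedFetch(
  load: (url: string) => Promise<string>,
  isShell: (html: string) => boolean,
): (url: string) => Promise<string> {
  return cacheUnlessThrown(async (url: string) => {
    const html = await load(url)
    if (isShell(html))
      throw new Error('Instagram returned an empty shell')
    return html
  }, 10 * 60 * 1000) // 10 minutes, matching the cache window mentioned above
}
```

With this shape, a degraded upstream response surfaces as an error to the current caller, and the next request calls the loader again instead of reading a poisoned entry.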

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| scripts-playground | Ready | Preview, Comment | May 27, 2026 8:43am |

pkg-pr-new Bot commented May 27, 2026

npm i https://pkg.pr.new/@nuxt/scripts@806

commit: 6175d57

coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5fdaba3d-a08d-46ac-9ffa-2027e0283269

📥 Commits

Reviewing files that changed from the base of the PR and between 77eb5c6 and 6175d57.

📒 Files selected for processing (3)
  • packages/script/src/runtime/server/instagram-embed.ts
  • packages/script/src/runtime/server/utils/instagram-embed.ts
  • test/unit/instagram-embed.test.ts

📝 Walkthrough

Replaces the previous cached embed fetcher with a defineCachedFunction-backed fetch that $fetches Instagram embed HTML, builds a header-aware cache key, runs isEmbedShell(html) to detect JS-only shell pages, and throws a 502 for shells so they are not cached. Cache name/version, maxAge, swr, and staleMaxAge were updated. The embed request User-Agent was changed to Meta's facebookexternalhit/... while keeping Accept: text/html. Tests for isEmbedShell were added.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main changes: switching to a Meta crawler UA and preventing empty shell caching. |
| Description check | ✅ Passed | The description explains the bug (Googlebot IP-verification issue), the solution (facebookexternalhit UA + shell detection), and references the linked issue. |
| Linked Issues check | ✅ Passed | The PR addresses issue #794 by fixing the Instagram embed rendering through UA switching and shell-response prevention. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the Instagram embed issue: UA update, shell detection, cache configuration, and comprehensive tests. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
packages/script/src/runtime/server/instagram-embed.ts (2)

28-45: ⚡ Quick win

Vary the embed cache key on headers too.

This fetcher takes caller-supplied headers, but getKey only keys on url. Since Instagram's response is explicitly UA-dependent here, different header sets can collide and reuse the wrong cached HTML.

Suggested tweak
   {
     name: 'nuxt-scripts-instagram-embed',
     maxAge: 600,
     swr: true,
     staleMaxAge: 600,
-    getKey: (url: string) => url,
+    getKey: (url: string, headers: Record<string, string>) => {
+      const parts = [url]
+      for (const [key, value] of Object.entries(headers).sort(([a], [b]) => a.localeCompare(b)))
+        parts.push(`${key}=${value}`)
+      return parts.join('|')
+    },
   },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/script/src/runtime/server/instagram-embed.ts` around lines 28 - 45,
The cache key for cachedEmbedFetch currently only uses the URL (getKey), causing
collisions when callers pass different headers; update the getKey implementation
used by defineCachedFunction (referencing cachedEmbedFetch and its getKey) to
incorporate headers as well — e.g., serialize the headers into a stable string
(JSON.stringify with sorted keys or a utility that produces a deterministic
header fingerprint) and include that with the URL to form the cache key so
responses vary by both URL and headers.
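A standalone sketch of a header-aware cache key, along the lines of the suggestion above. The function name `embedCacheKey` is hypothetical, and lowercasing header names is an extra assumption not present in the suggested diff; it is added here so that `User-Agent` and `user-agent` produce the same key.

```typescript
// Build a deterministic cache key from the URL plus sorted header pairs.
function embedCacheKey(url: string, headers: Record<string, string> = {}): string {
  const entries = Object.entries(headers)
    // Assumption for this sketch: header names are case-insensitive, so normalise them.
    .map(([name, value]) => [name.toLowerCase(), value] as const)
    // Plain code-unit comparison keeps the order independent of the runtime locale.
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
  const parts = [url]
  for (const [name, value] of entries)
    parts.push(`${name}=${value}`)
  return parts.join('|')
}
```

Sorting before joining means two callers that pass the same headers in a different order share one cache entry, while a different `User-Agent` value yields a different key.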

18-23: ⚡ Quick win

Match post-content classes inside full class lists.

HAS_POST_CONTENT_RE only matches when the entire class value is exactly Embed or EmbeddedMedia. That misses multi-class elements and .EmbeddedMediaImage, which Lines 197-198 already treat as valid embed markup.

Suggested tweak
-const HAS_POST_CONTENT_RE = /class="(?:Embed|EmbeddedMedia)"/
+const HAS_POST_CONTENT_RE = /class="[^"]*\b(?:Embed|EmbeddedMedia|EmbeddedMediaImage)\b[^"]*"/
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/script/src/runtime/server/instagram-embed.ts` around lines 18 - 23,
HAS_POST_CONTENT_RE currently only matches when the entire class attribute
equals "Embed" or "EmbeddedMedia", so update the regex used in
HAS_POST_CONTENT_RE (and used by isEmbedShell) to detect those tokens inside
multi-class lists and to include "EmbeddedMediaImage"; e.g., change it to search
the class attribute for word-boundary-separated tokens (Embed, EmbeddedMedia,
EmbeddedMediaImage) rather than exact-match the whole value so elements with
multiple classes are correctly recognized as post content.
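To make the difference concrete, the sketch below runs the two patterns from the suggested diff against a few hand-written class attributes. The sample strings and the `matches` helper are made up for illustration and are not taken from real Instagram markup.

```typescript
// Pattern before the suggestion: the whole class value must be one exact name.
const EXACT_CLASS_RE = /class="(?:Embed|EmbeddedMedia)"/
// Suggested pattern: the name may appear as a token anywhere in the class list.
const TOKEN_CLASS_RE = /class="[^"]*\b(?:Embed|EmbeddedMedia|EmbeddedMediaImage)\b[^"]*"/

// Hypothetical helper for the comparison.
function matches(re: RegExp, html: string): boolean {
  return re.test(html)
}

const singleClass = '<div class="Embed">'
const multiClass = '<div class="Embed dark-theme">'
const imageClass = '<img class="EmbeddedMediaImage">'
const lookalike = '<div class="EmbedFooter">'

// Both patterns accept the single-class case; only the token pattern accepts
// the multi-class and EmbeddedMediaImage cases, and neither accepts the lookalike.
const results = [singleClass, multiClass, imageClass, lookalike].map(html => ({
  html,
  exact: matches(EXACT_CLASS_RE, html),
  token: matches(TOKEN_CLASS_RE, html),
}))
```

One caveat of relying on `\b`: a hyphen also counts as a word boundary, so a class such as `Embed-footer` would still match the token pattern.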

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 66b9ef8a-0eb2-493c-9b76-5df3dd2fb876

📥 Commits

Reviewing files that changed from the base of the PR and between da57b6e and 5ce7b2f.

📒 Files selected for processing (1)
  • packages/script/src/runtime/server/instagram-embed.ts

Move the shell-response check into the shared utils module and add unit
tests covering: pure shell, real post that also contains the comet
sentinel, real post with Embed wrapper, and unrelated HTML.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/script/src/runtime/server/utils/instagram-embed.ts`:
- Around line 9-13: The current HAS_POST_CONTENT_RE only matches exact
class="Embed" or class="EmbeddedMedia" which misses class lists and
single-quoted attributes causing isEmbedShell (which uses SHELL_BODY_RE and
HAS_POST_CONTENT_RE) to misclassify embeds; update HAS_POST_CONTENT_RE to match
a class= attribute quoted with single or double quotes containing word
boundaries for Embed or EmbeddedMedia anywhere in the class list (e.g., match
class=(["'])...\\b(Embed|EmbeddedMedia)\\b...\\1) so isEmbedShell correctly
detects post content even when classes are combined or quoted differently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 423020fc-827c-46b2-9050-e0221aea4a71

📥 Commits

Reviewing files that changed from the base of the PR and between 5ce7b2f and 77eb5c6.

📒 Files selected for processing (3)
  • packages/script/src/runtime/server/instagram-embed.ts
  • packages/script/src/runtime/server/utils/instagram-embed.ts
  • test/unit/instagram-embed.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/script/src/runtime/server/instagram-embed.ts

Comment thread packages/script/src/runtime/server/utils/instagram-embed.ts Outdated
- Bump cache name to `nuxt-scripts-instagram-embed-v2` to evict any
  empty-shell entries cached under v1 before the fix.
- Include headers in the cache key — Instagram's response is UA-dependent,
  so different header sets must not share a cached entry.
- Broaden `HAS_POST_CONTENT_RE` to match Embed/EmbeddedMedia/EmbeddedMediaImage
  as tokens inside any class list (single- or double-quoted), and accept
  single-quoted shell sentinels.

Addresses CodeRabbit feedback on #806.
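A minimal sketch of a quote-aware shell check, assuming the behaviour described in the commit message above. The regexes here are written for illustration and are not copied from the merged `utils/instagram-embed.ts`; in particular, treating the sentinel as any quoted `has-finished-comet-page` value is an assumption.

```typescript
// Assumed sentinel: the quoted token named in this PR, in single or double quotes.
const SHELL_SENTINEL_RE = /(["'])has-finished-comet-page\1/

// Post markup: Embed / EmbeddedMedia / EmbeddedMediaImage as a token inside a
// class attribute, where \1 refers back to whichever quote opened the attribute.
const POST_CONTENT_RE
  = /class=(["'])(?:(?!\1).)*\b(?:Embed|EmbeddedMedia|EmbeddedMediaImage)\b(?:(?!\1).)*\1/

// A response counts as a shell when the sentinel is present and no post markup is.
function isEmbedShell(html: string): boolean {
  return SHELL_SENTINEL_RE.test(html) && !POST_CONTENT_RE.test(html)
}
```

The four cases listed for the unit tests earlier in this PR (pure shell, real post that also contains the sentinel, real post with an `Embed` wrapper, unrelated HTML) map to `true`, `false`, `false`, `false` under this sketch.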
@harlan-zw harlan-zw merged commit 87b1342 into main May 27, 2026
19 checks passed
@harlan-zw harlan-zw deleted the fix/instagram-embed-ua branch May 27, 2026 09:36

Development

Successfully merging this pull request may close these issues.

docs: instagram embed looks broken
