Skip to content

Fix: scrub instead of detach for framework-rendered DOM#176

Merged
twschiller merged 3 commits into
mainfrom
fix/scrub-not-detach-framework-nodes
Jun 5, 2026
Merged

Fix: scrub instead of detach for framework-rendered DOM#176
twschiller merged 3 commits into
mainfrom
fix/scrub-not-detach-framework-nodes

Conversation

@twschiller
Copy link
Copy Markdown
Contributor

Summary

Four rules — meta-injection-strip, noscript-strip, html-comment-strip,
hidden-text-strip — called .remove() on DOM nodes the page framework had
rendered and was tracking for unmount. When React Router (and equivalents in
Vue / Svelte / Astro / htmx head-merge) tried to reconcile those nodes out
on route change, removeChild threw inside the commit phase and stranded
the route mid-render.

User-visible symptom on the demo site: clicking the RiverMart logo from a
product detail page updated the URL but never repainted the home content,
because React 19 hoists <title> and three <meta> tags from
ProductDetail.tsx into <head>, meta-injection-strip detached them on
match, and React's next commit tried to removeChild nodes whose
parentNode was already null.

Each rule now blanks its data carrier (attribute value, textContent, or
Text-node data) rather than detaching the carrier node. The element
references the framework holds stay valid; agent-readable content still
goes to empty.

  • meta-injection-strip — blanks content="" on matching <meta> (was
    element.remove()).
  • noscript-strip — blanks <noscript> children via textContent = ""
    (was element.remove()).
  • html-comment-strip — blanks comment.data only when the data matches
    INJECTION_PATTERNS (was: unconditional removal of every comment outside
    <script> / <style> / <noscript>). The injection-match gate
    naturally preserves React Suspense markers ($, /$, $?, …) without
    an explicit allowlist.
  • hidden-text-strip — walks SHOW_TEXT and blanks each Text node's data
    (was element.remove()). Element / comment nodes inside the hidden
    subtree stay where the framework put them.

Property-based tests pin the load-bearing invariants:

  • html-comment-strip.property.test.ts — across all known injection
    fixtures, comments are blanked but stay attached; across the framework
    marker set ($, /$, $?, $!, [, ], build stamps, license
    headers, dev TODOs), comments are left untouched. Catches any future
    widening of INJECTION_PATTERNS that would re-introduce the navigation
    crash via a different path.
  • hidden-text-strip.property.test.ts — across the cross-product of
    hidden-CSS triggers × child-subtree shapes, descendant text is blanked
    AND every element node inside the hidden box stays attached. Drift back
    to element.remove() or replaceChildren() would fail this as a class,
    not just one specific case.

Docs and skill catalogs updated to describe each rule's current scrub
behavior and to broaden the strip verb's taxonomy entry.

Coverage trade-offs

Switching from "detach the carrier" to "blank the carrier" trades framework
safety for some coverage reduction. Four specific false-negative paths to
be aware of:

  1. meta-injection-strip — in-place content updates. The subtree
    watcher observes id / class attribute changes only, not content.
    If a framework or page script later writes a new poisoned value to
    content= on the same <meta> node we blanked, the new value is
    visible to agents until the next route change or subtree addition
    re-triggers a scan. Pre-fix, the meta was detached, so any framework
    write landed on a node nobody was reading.

  2. noscript-strip — in-place children replacement inside a kept
    <noscript>.
    The watcher fires on the new children, but
    stripNoscript's querySelectorAll walks downward — it doesn't
    recognize that the added subtree's ancestor is a <noscript> we
    already blanked. Pre-fix, the noscript was gone, so any new content had
    to bring its own <noscript> wrapper. Real-world likelihood is very
    low: frameworks essentially never re-render noscript children at
    runtime.

  3. html-comment-strip — comments not matching INJECTION_PATTERNS now
    survive.
    Largest coverage delta in the PR. The rule went from "strip
    every comment outside <script> / <style> / <noscript>" to "scrub
    only comments whose data matches the injection pattern set" (currently
    11 patterns). Novel injection phrasings, off-pattern prose, and any
    benign-looking comments carrying instruction-shaped text that used to
    be removed now stay visible. The narrowing is what makes React Suspense
    markers safe automatically (they don't match any pattern) — but it caps
    coverage at the recognized shapes.

  4. hidden-text-strip — non-text payloads inside hidden subtrees
    survive.
    The rule used to remove the wrapper, taking image alt,
    aria-label, title, SVG <title> / <desc>, and data-*
    attributes with it. Now it only blanks Text nodes. Attribute-shaped
    payloads inside a hidden box are still caught by
    attribute-injection-sanitize and svg-text-strip if they match
    those rules' pattern sets, but the broad "wipe everything inside the
    box" sweep is gone.

Follow-ups (not in this PR)

  • Extend OBSERVED_ATTRIBUTES in subtree-watcher.ts to include content
    and add an upward closest(\"noscript\") check in stripNoscript to
    close (1) and (2).
  • Decide whether html-comment-strip should restore broad-sweep behavior
    with a framework-marker allowlist (skip $ / /$ / [ / ]-prefixed
    data) to close (3). Kept out of this PR because the allowlist
    enumeration is its own design discussion.
  • Audit INJECTION_PATTERNS coverage on attribute-targeted rules now that
    hidden-text-strip no longer back-stops them (4).

Test plan

  • bun run check in extension/ clean
  • bun run test in extension/: 1380 / 1380 passing across 77 suites
  • Manual repro on the demo site: navigate from a product detail page
    back to home via the RiverMart logo; confirm Home renders (was: URL
    updated but content frozen on Product Detail)
  • Toggle each of the four rules off in the popup on the demo site;
    navigation should still work either way
  • Sanity check on a real React 19 SSR site with metadata hoisting
    (any Next.js 14+ deploy)

🤖 Generated with Claude Code

Four rules — meta-injection-strip, noscript-strip, html-comment-strip,
hidden-text-strip — called .remove() on DOM nodes that the page framework
had rendered and was tracking for unmount. When React Router (and equivalents
in Vue / Svelte / Astro / htmx head-merge) tried to reconcile those nodes
out on route change, removeChild threw inside the commit phase and stranded
the route mid-render. User-visible symptom on the demo site: clicking the
RiverMart logo from a product detail page updated the URL but never repainted
the home content.

Switched each rule to blank its data carrier (attribute value, textContent,
or Text-node data) rather than detach the carrier node. Element references
the framework holds stay valid, agent-readable content still goes to empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agent-browser-shield-demo-site Ready Ready Preview, Comment Jun 5, 2026 9:27pm

Request Review

Copy link
Copy Markdown

@unblocked unblocked Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found.

About Unblocked

Unblocked has been set up to automatically review your team's pull requests to identify genuine bugs and issues.

📖 Documentation — Learn more in our docs.

💬 Ask questions — Mention @unblocked to request a review or summary, or ask follow-up questions.

👍 Give feedback — React to comments with 👍 or 👎 to help us improve.

⚙️ Customize — Adjust settings in your preferences.

Comment on lines +41 to +42
if (content !== null && content.length > 0 && containsInjection(content)) {
element.setAttribute("content", "");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exported description at line 99 still reads "Remove <meta> tags and clear <title> text whose content carries prompt-injection patterns." — but the behavior now blanks the content attribute rather than removing the element. The other three rules in this PR (hidden-text-strip, html-comment-strip, noscript-strip) all had their description property updated from "Remove" to "Blank". This one was missed.

At line 99 update the description to something like:

    "Blank <meta> content and <title> text whose value carries prompt-injection patterns.",

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bf0a0be — the exported description now reads "Blank content and <title> text whose value carries prompt-injection patterns.", matching what the other three converted rules adopted. Good catch — this is exactly the kind of drift the rule sync was supposed to prevent. — Claude Code, on behalf of @twschiller

Sync meta-injection-strip's exported description with its current
behavior — the rule blanks the content attribute now, not the element.
Matches the wording the other three converted rules adopted in the prior
commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reflow docs through mdformat — pre-commit owns markdown wrapping, and
my edits in the previous commits used different line widths and table
column padding. Pure formatting, no content changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@twschiller twschiller added the bug Something isn't working label Jun 5, 2026
@twschiller twschiller merged commit 66620a5 into main Jun 5, 2026
7 checks passed
@twschiller twschiller deleted the fix/scrub-not-detach-framework-nodes branch June 5, 2026 21:31
twschiller added a commit that referenced this pull request Jun 5, 2026
…uts in attribute-injection-sanitize

These ARIA attributes surface to the accessibility tree as agent-readable
name / description / value / shortcut text, so they're a quiet carrier for
injection payloads. Pre-#176 they fell to hidden-text-strip's wrapper
detachment; after #176 nothing was scrubbing them. Append the four names
to CANDIDATE_ATTRIBUTES; the selector and scrub loop pick them up
unchanged. Tests, property-test allowlist, and rule docs updated.

Closes #182

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant