Fix: detect PII / encoded payloads split across sibling text nodes (#203) by twschiller · Pull Request #210 · pixiebrix/agent-browser-shield

twschiller · 2026-06-07T12:37:02Z

Summary

Closes the last Critical bypass tracked in Audit: rule-bypass findings (red-team pass) #203: PII / encoded payloads / secrets rendered with one wrapper element per fragment (e.g. React -per-digit-group cards) used to slip past the per-text-node regex.
Same change also closes Audit: rule-bypass findings (red-team pass) #203 / item 7 (encoded-payload split-across-whitespace) because both rules share the inline-text-redact factory.

Approach

collectTextNodesWithInlineGroups (new) tags each text node with an inline-formatting-context id. Group id bumps on entering/leaving a block-level element, on  , and on shadow-root boundaries. The existing collectTextNodesShadowPiercing becomes a thin wrapper that drops the group tag — no callers of the non-group walker change.
walkTextNodeGroupsChunked (new sibling of walkTextNodesChunked) threads the group ids through the same chunked-yield driver.
defineInlineTextRedactRule switches to the new walker. Each chunk is bucketed by group; single-node buckets keep the existing single-node fast path; multi-node buckets concatenate values, run collectMatches against the concatenation, and route each match to either replaceMatchesInTextNode (within-node) or replaceMatchAcrossTextNodes (new).
replaceMatchAcrossTextNodes scrubs the carrier — first node keeps its prefix, last node keeps its suffix, interior nodes have their nodeValue blanked. Wrapping inline elements stay in the DOM (per the "scrub, don't detach framework-owned nodes" rule). Reveal restores the matched characters as a single text node at the placeholder's position.
Per-node minLength no longer filters at the walker level (it would drop legitimate cross-node candidates whose individual nodes sit below the floor); the group's concatenated length is checked instead, preserving the cheap early-out for prose nodes.

Demo site

New "3a. Alternate payment" block on Checkout.tsx exercises five span-split shapes (card, SSN, phone, JWT, base64 payload) so the before/after diff visualizes the closed gap.

Test plan

bun run check (extension)
bun run typecheck (extension)
bun run test (extension — 1716 tests, all green; 16 new cross-node tests across pii / secrets / encoded-payload / placeholder helper + 4 new property tests)
bun run check (demo-site)
pre-commit run --all-files
Spot-check demo site in browser: card / SSN / phone / JWT / encoded payload on the new section 3a should each render as […hidden] chips, and reveal-on-click should restore the original characters.

Notes / accepted trade-offs

Reveal collapses the matched-text split into one text node — the original per-span breakdown isn't reconstructed. Visual characters are the same; wrapper elements stay.
Cross-chunk-boundary cross-node matches are not detected (>100 text nodes in a single inline-formatting context). Vanishingly rare with the default chunkSize=100.
Late-inserted siblings that complete an existing cross-node match aren't picked up by the incremental watcher (its scan root is the new subtree, not the joined context). Same limitation as the pre-existing per-node case missing late additions to existing text nodes.

🤖 Generated with Claude Code

) The inline-text-redact factory (pii-redact / secrets-redact / encoded-payload-redact) used to scan one text node at a time, so a card number rendered as `4111 1111 ...` slipped past every regex — the bypass tracked as Critical-3 in the #203 audit. Group text nodes by inline-formatting context during the walk (block elements, ` `, and shadow roots start a new group) and run collectMatches against the concatenated group value. Matches that straddle a node boundary land via a new `replaceMatchAcrossTextNodes` helper that scrubs the carrier text but leaves wrapping inline elements in place — same "scrub, don't detach framework-owned nodes" pattern the rest of the codebase already follows. Closes #203 item 3 (PII / Critical) and item 7 (encoded-payload split-across-whitespace / High) with one shared change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-06-07T12:37:08Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agent-browser-shield-demo-site	Ready	Preview, Comment	Jun 7, 2026 6:33pm

unblocked

1 issue found.

About Unblocked

Unblocked has been set up to automatically review your team's pull requests to identify genuine bugs and issues.

📖 Documentation — Learn more in our docs.

💬 Ask questions — Mention @unblocked to request a review or summary, or ask follow-up questions.

👍 Give feedback — React to comments with 👍 or 👎 to help us improve.

⚙️ Customize — Adjust settings in your preferences.

unblocked · 2026-06-07T12:40:42Z

+  for (const match of matches.toReversed()) {
+    const start = locateStart(layout, match.start);
+    const end = locateEnd(layout, match.end);
+    if (start.index === end.index) {
+      const node = layout.nodes[start.index];
+      if (!node) {
+        continue;
+      }
+      replaceMatchesInTextNode(
+        node,
+        [{ start: start.offset, end: end.offset, label: match.label }],
+        ruleId,
+      );
+      continue;
+    }


When two or more non-overlapping matches both resolve to the same start.index === end.index (i.e., both sit entirely within the same text node in a multi-node bucket), replaceMatchesInTextNode is called once per match with a single-element array. The first call (rightmost, due to toReversed()) detaches the original text node from the DOM via parentNode.replaceChild(fragment, textNode). Every subsequent call for the same node finds textNode.parentNode === null, so the ?.replaceChild is a no-op and the match is silently dropped — leaking the sensitive data it was supposed to redact.

Example: SSN: 123-45-6789 Card: 4111 1111 1111 1111 end — the card match is processed first and detaches node 0; the SSN match then fails silently.

Fix: group all single-node matches by their node index and pass each group as a batch to replaceMatchesInTextNode, which already handles multiple matches correctly in a single call. Something like:

const singleNodeMatches = new Map<number, InlineMatch[]>(); const crossNodeMatches: typeof matches = []; for (const match of matches) { const start = locateStart(layout, match.start); const end = locateEnd(layout, match.end); if (start.index === end.index) { let arr = singleNodeMatches.get(start.index); if (!arr) { arr = []; singleNodeMatches.set(start.index, arr); } arr.push({ start: start.offset, end: end.offset, label: match.label }); } else { crossNodeMatches.push(match); } } // Apply cross-node matches last-to-first (existing logic) for (const match of crossNodeMatches.toReversed()) { ... } // Apply batched single-node matches (one replaceMatchesInTextNode call per node) for (const [index, nodeMatches] of singleNodeMatches) { const node = layout.nodes[index]; if (node) { replaceMatchesInTextNode(node, nodeMatches, ruleId); } }

Fixed in 201f44c — confirmed the bug reproduces (SSN + card in one span of a two-span bucket: only the rightward match landed; the SSN leaked) and that the same root cause covered a related shape (cross-node phone + single-node card sharing a node, where the phone's lastNode truncation silently no-op'd because card had already detached it, leaving 4567 visible).

The sequential-application model is fundamentally racy here — any operation that detaches a text node breaks every later operation targeting it, and any operation that truncates a boundary node shifts every later offset in that node. Rather than try to order around it, I batched: new replaceMatchesAcrossTextNodes (plural) snapshots every node's original nodeValue up front, plans each affected node's content from that snapshot, and applies one replaceChild per node. Each text node is mutated exactly once, so no offsets ever go stale.

Regression tests cover both shapes (your example + the cross-node variant). Full suite is green (1718 tests).

— Claude Code, on behalf of @twschiller

…tion unblocked[bot] flagged that sequential per-match application in processBucket detached text nodes mid-loop, so a second single-node match in the same node silently dropped. Same root cause covers a related case where a cross-node match shares its lastNode with a single-node match — sequential application either dropped the match or shifted offsets so the wrong characters were redacted. Replace the per-match loop with a plural `replaceMatchesAcrossTextNodes` that snapshots every original `nodeValue` up front, plans each affected node's new content from that snapshot, then applies one replaceChild (or value-blank for fully-consumed interior nodes) per node. Each text node is mutated exactly once, so no offsets ever go stale. Regression tests cover both shapes: - two single-node matches sharing a text node in a multi-node bucket - cross-node match + single-node match sharing a node Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

unblocked Bot reviewed Jun 7, 2026

View reviewed changes

vercel Bot deployed to Preview June 7, 2026 18:33 View deployment

twschiller merged commit 0da74ef into main Jun 7, 2026
7 checks passed

twschiller deleted the fix/cross-node-inline-redact branch June 7, 2026 18:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: detect PII / encoded payloads split across sibling text nodes (#203)#210

Fix: detect PII / encoded payloads split across sibling text nodes (#203)#210
twschiller merged 2 commits into
mainfrom
fix/cross-node-inline-redact

twschiller commented Jun 7, 2026

Uh oh!

vercel Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

unblocked Bot left a comment

Uh oh!

unblocked Bot Jun 7, 2026

Uh oh!

twschiller Jun 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

twschiller commented Jun 7, 2026

Summary

Approach

Demo site

Test plan

Notes / accepted trade-offs

Uh oh!

vercel Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

unblocked Bot left a comment

Choose a reason for hiding this comment

About Unblocked

Uh oh!

unblocked Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

twschiller Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented Jun 7, 2026 •

edited

Loading