Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 9 additions & 3 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,9 +76,15 @@ DOM. Five canonical verbs:
- `sanitize` — keeps the element and cleans attributes/text/state
(`json-ld-sanitize`, `attribute-injection-sanitize`, `confirmshame-sanitize`,
`checkout-checkbox-sanitize`)
- `strip` — removes the element/node from the DOM entirely (`noscript-strip`,
`html-comment-strip`, `hidden-text-strip`, `svg-sprite-strip`,
`svg-text-strip`, `meta-injection-strip`)
- `strip` — removes the agent-readable content from the DOM. Most strip rules
blank the data carrier (attribute value, text node, comment data) rather than
detach the element, so SPA frameworks (React 19 metadata, Vue Teleport, Astro
view transitions) keep live references to the rendered node and reconcile
cleanly on route change. `svg-sprite-strip` is the exception — its targets are
inline sprite definitions the page framework does not own, so it detaches the
element outright. (`noscript-strip`, `html-comment-strip`,
`hidden-text-strip`, `svg-sprite-strip`, `svg-text-strip`,
`meta-injection-strip`)

`-helper` is reserved for non-defensive agent affordances (`search-url-helper`).
See CONTRIBUTING.md → *Adding a new rule* → *Rule ID naming* for the longer
Expand Down
14 changes: 7 additions & 7 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -111,13 +111,13 @@ Walkthrough:
Rule IDs follow `<target>-<verb>`. Pick the verb that names what the rule
actually does to the DOM, not the threat it addresses:

| Verb | Use when the rule… | Examples |
| ---------- | -------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `annotate` | adds an agent-readable warning or affordance; page content unchanged | `cart-addon-annotate`, `roach-motel-annotate` |
| `hide` | visually conceals with `display: none`; element stays in the DOM | `ads-hide`, `cookie-banner-hide`, `chat-widget-hide`, `newsletter-modal-hide` |
| `redact` | replaces content with a click-to-reveal placeholder | `pii-redact`, `secrets-redact`, `comments-redact`, `prompt-injection-redact` |
| `sanitize` | keeps the element and cleans its attributes / text / form state | `json-ld-sanitize`, `attribute-injection-sanitize`, `confirmshame-sanitize`, `checkout-checkbox-sanitize` |
| `strip` | removes the element/node from the DOM entirely | `noscript-strip`, `html-comment-strip`, `hidden-text-strip`, `svg-sprite-strip`, `svg-text-strip`, `meta-injection-strip` |
| Verb | Use when the rule… | Examples |
| ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `annotate` | adds an agent-readable warning or affordance; page content unchanged | `cart-addon-annotate`, `roach-motel-annotate` |
| `hide` | visually conceals with `display: none`; element stays in the DOM | `ads-hide`, `cookie-banner-hide`, `chat-widget-hide`, `newsletter-modal-hide` |
| `redact` | replaces content with a click-to-reveal placeholder | `pii-redact`, `secrets-redact`, `comments-redact`, `prompt-injection-redact` |
| `sanitize` | keeps the element and cleans its attributes / text / form state | `json-ld-sanitize`, `attribute-injection-sanitize`, `confirmshame-sanitize`, `checkout-checkbox-sanitize` |
| `strip` | removes the agent-readable content from the DOM (usually by blanking the data carrier — attribute value, text node, comment data — so SPA framework references stay valid; `svg-sprite-strip` detaches the sprite element outright) | `noscript-strip`, `html-comment-strip`, `hidden-text-strip`, `svg-sprite-strip`, `svg-text-strip`, `meta-injection-strip` |

Picking between `hide` and `redact`: if the user can still meaningfully act on
the element when it's gone (e.g. a floating overlay they would never read),
Expand Down
72 changes: 43 additions & 29 deletions docs/src/content/docs/rules.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,28 +122,37 @@ content surface whose text the host page does not control.
- **ID:** `html-comment-strip`
- **Default:** on

Remove HTML comments from the page. Comments are invisible to humans but
readable by agents and can carry prompt-injection payloads. Comments inside
`<script>`/`<style>`/`<noscript>` are preserved. Removal is not reversible
within the current page load. HTML comments are explicitly enumerated as a
non-rendered carrier in Greshake et al. [[1]](#ref-greshake-2023).
Walk every HTML comment in the page and blank its data when the value matches
the prompt-injection pattern set (the same regex bundle used by
`prompt-injection-redact`). Comments are invisible to humans but readable by
agents and can carry prompt-injection payloads; the scrub neutralizes the
matching carrier while leaving the Comment node attached. Comments inside
`<script>`/`<style>`/`<noscript>` are preserved verbatim. Off-pattern comments —
license headers, build stamps, dev notes — are left alone, as are
framework-marker comments that SPA renderers use as Suspense or hydration
boundaries. The scrub is not reversible within the current page load. HTML
comments are explicitly enumerated as a non-rendered carrier in Greshake et al.
[[1]](#ref-greshake-2023).

#### Strip Noscript

- **ID:** `noscript-strip`
- **Default:** on

Remove every `<noscript>` element from the page. A browser-use agent runs in a
browser at all precisely because the site requires JavaScript — an operator who
could read the same data from the server directly would do that and skip the
browser entirely. With JS enabled, `<noscript>` content is, by definition, never
rendered to a human, but the markup still sits in the DOM and is still walked by
accessibility-tree and `innerText` consumers. That makes it a clean carrier for
prompt-injection payloads, fabricated authority claims, or fallback chrome the
agent may treat as load-bearing. `html-comment-strip` previously preserved
Comment nodes inside `<noscript>` so that SSR hydration markers and
conditional-CSS fragments survived; with this rule on, the surrounding noscript
element is removed outright, taking those comments with it.
Walk every `<noscript>` element in the page and blank its children. A
browser-use agent runs in a browser at all precisely because the site requires
JavaScript — an operator who could read the same data from the server directly
would do that and skip the browser entirely. With JS enabled, `<noscript>`
content is, by definition, never rendered to a human, but the markup still sits
in the DOM and is still walked by accessibility-tree and `innerText` consumers.
That makes it a clean carrier for prompt-injection payloads, fabricated
authority claims, or fallback chrome the agent may treat as load-bearing. The
`<noscript>` element itself stays attached so SPA frameworks (React 19 native
head metadata, Vue Teleport-to-head, etc.) that hold a live reference to the
node can still reconcile it on route change. `html-comment-strip` previously
preserved Comment nodes inside `<noscript>` so that SSR hydration markers and
conditional-CSS fragments survived; with this rule on, the surrounding
noscript's contents are blanked, taking those comments with them.

Same non-rendered-carrier class as Greshake et al. [[1]](#ref-greshake-2023);
the "renderer-and-reader disagree on what's visible" asymmetry is the one
Expand All @@ -156,14 +165,17 @@ formalized for zero-width characters in Boucher et al.
- **ID:** `hidden-text-strip`
- **Default:** on

Remove text that is invisible to humans (foreground matching background,
`visibility:hidden`, `opacity:0`, `font-size:0`, off-screen positioning,
zero-area clipping) but still readable by agents. Defends against "unseeable"
prompt injection. Screen-reader-only text is preserved (via `.sr-only`,
`.visually-hidden`, `.a-offscreen`, `.aok-offscreen`, MUI `visuallyHidden`, and
the 1×1 + `overflow:hidden` + `position:absolute` envelope) so a11y-tree
affordances like Amazon SERP prices stay intact. `display:none` is left alone so
collapsed menus and tab panels keep working.
Walk every element matching a hidden-CSS trigger (foreground matching
background, `visibility:hidden`, `opacity:0`, `font-size:0`, off-screen
positioning, zero-area clipping) and blank every text node inside it. Defends
against "unseeable" prompt injection. The element and its descendant element
nodes stay attached so SPA frameworks (React, Vue, Svelte, Astro) that hold live
references to the rendered nodes can still reconcile them on route change.
Screen-reader-only text is preserved (via `.sr-only`, `.visually-hidden`,
`.a-offscreen`, `.aok-offscreen`, MUI `visuallyHidden`, and the 1×1 +
`overflow:hidden` + `position:absolute` envelope) so a11y-tree affordances like
Amazon SERP prices stay intact. `display:none` is left alone so collapsed menus
and tab panels keep working.

Liao et al. (EIA) [[3]](#ref-liao-eia) demonstrates that web elements made
invisible via CSS — opacity, off-screen positioning, zero-area clipping — are
Expand Down Expand Up @@ -204,16 +216,18 @@ in the indirect-injection benchmarks.

Walk every `<meta>` element with a `content` attribute and every `<title>`
element. When the value matches the prompt-injection pattern set (the same regex
bundle as `prompt-injection-redact`), remove the `<meta>` element outright and
blank the `<title>` text. The rule does not gate on specific `name=` /
bundle as `prompt-injection-redact`), blank the `<meta>` element's `content`
attribute and blank the `<title>` text. Both elements stay attached so SPA
frameworks (React 19 native head metadata, react-helmet, Vue Teleport-to-head,
Astro view transitions) that hold a live reference to the hoisted node can still
reconcile it on route change. The rule does not gate on specific `name=` /
`property=` values — any meta whose content carries instruction-shaped text is
removed, covering `name="description"`, `name="keywords"`,
scrubbed, covering `name="description"`, `name="keywords"`,
`property="og:title"`, `property="og:description"`, `name="twitter:title"`,
`name="twitter:description"`, `name="twitter:image:alt"`, and the `article:*`
family. Meta tags without a content attribute are left alone. The rule scans
`document.head` in addition to the engine's `apply` root, since meta and title
normally live in `<head>` and SPA frameworks (React 19 native head metadata,
react-helmet) mutate `<head>` on route changes.
normally live in `<head>` and SPA frameworks mutate `<head>` on route changes.

Page metadata is invisible to a sighted human (it surfaces in the browser tab,
social-share unfurls, and search-result snippets, not in the rendered article
Expand Down
135 changes: 135 additions & 0 deletions extension/src/rules/__tests__/hidden-text-strip.property.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
// Copyright (c) 2026 PixieBrix, Inc.
// Licensed under PolyForm Shield 1.0.0 — see LICENSE.

// Property-based tests for `hidden-text-strip`. Pins the load-bearing
// framework-safety invariant: across the cross-product of (hidden-trigger
// style) × (child-subtree shape), the scrub must blank text but leave
// every element node attached. Drifting back to `element.remove()` would
// only fail one specific unit test; the property pins it as a class.

import fc from "fast-check";

import { hiddenTextStripRule } from "../hidden-text-strip";
import { FIXTURES } from "./injection-fixtures";

// CSS forms `detectHiddenByCss` recognizes. Each one alone qualifies an
// element for stripping — verified by the corresponding cases in the
// unit-test file. Color-match (`detectColorMatch`) is exercised
// separately below because it requires bg/fg setup on a wrapper.
const HIDDEN_STYLE = fc.constantFrom(
"visibility: hidden",
"opacity: 0",
"font-size: 0",
"position: absolute; left: -10000px",
"clip-path: inset(100%)",
);

const PAYLOAD = fc.constantFrom(
FIXTURES.HIDDEN_IGNORE_PRIOR,
FIXTURES.HIDDEN_WHITE_ON_WHITE,
FIXTURES.HIDDEN_LARGE_OFFSCREEN,
FIXTURES.HIDDEN_SMUGGLED,
);

// Number of nested element children the scrub must leave attached.
// React/Vue/Svelte hold references to every element they rendered — if
// any one of them gets detached, the framework's next unmount throws on a
// missing parent. Vary the count so the invariant holds across "empty
// hidden box", "single span", and "deeply populated subtree".
const CHILD_COUNT = fc.integer({ min: 0, max: 6 });

beforeEach(() => {
document.body.innerHTML = "";
});

afterEach(() => {
hiddenTextStripRule.teardown();
});

describe("hidden-text-strip (property)", () => {
it("blanks text and keeps every element node attached for any CSS-hidden trigger", () => {
fc.assert(
fc.property(
HIDDEN_STYLE,
PAYLOAD,
CHILD_COUNT,
(style, payload, childCount) => {
document.body.innerHTML = "";
const target = document.createElement("div");
target.id = "target";
target.setAttribute("style", style);
// Direct text on the wrapper qualifies the element for every
// CSS trigger (font-size:0 / color-match gate on own direct
// text; the rest gate only on having any non-empty text).
target.append(document.createTextNode(payload));

const markers: HTMLSpanElement[] = [];
for (let i = 0; i < childCount; i++) {
const marker = document.createElement("span");
marker.id = `marker-${i}`;
marker.textContent = `marker-${i}`;
target.append(marker);
markers.push(marker);
}
document.body.append(target);

hiddenTextStripRule.apply(document.body);

// Target stays attached: framework reconciliation can still
// unmount it on the next route change without throwing on a
// detached parent.
expect(target.isConnected).toBe(true);
expect(target.parentElement).toBe(document.body);
// Every text-bearing surface in the subtree is blanked: agents
// walking textContent / the a11y tree read nothing.
expect(target.textContent).toBe("");
// Every child element node survives in place. This is the
// invariant `replaceChildren()` / `element.remove()` violates.
for (const marker of markers) {
expect(marker.isConnected).toBe(true);
expect(marker.parentElement).toBe(target);
}
},
),
);
});

it("blanks color-matched text and keeps wrapper + descendants attached", () => {
// Color-match uses a different code path than the CSS triggers above —
// it walks ancestors to compute the effective background. Cover it
// separately so the framework-safety invariant is pinned across both
// detector families.
fc.assert(
fc.property(PAYLOAD, CHILD_COUNT, (payload, childCount) => {
document.body.innerHTML = "";
const wrapper = document.createElement("section");
wrapper.setAttribute("style", "background-color: rgb(20, 20, 20)");
const target = document.createElement("span");
target.id = "target";
target.setAttribute("style", "color: rgb(22, 22, 22)");
target.append(document.createTextNode(payload));

const markers: HTMLSpanElement[] = [];
for (let i = 0; i < childCount; i++) {
const marker = document.createElement("i");
marker.id = `marker-${i}`;
marker.textContent = `marker-${i}`;
target.append(marker);
markers.push(marker);
}
wrapper.append(target);
document.body.append(wrapper);

hiddenTextStripRule.apply(document.body);

expect(target.isConnected).toBe(true);
expect(target.textContent).toBe("");
expect(wrapper.isConnected).toBe(true);
for (const marker of markers) {
expect(marker.isConnected).toBe(true);
expect(marker.parentElement).toBe(target);
}
}),
);
});
});
Loading
Loading