Category
New defense rule (speculative / research-direction)
What problem does this solve?
Every other rule in the project assumes the page the agent reads is the same page a human reviewer would read. AI-targeted cloaking breaks that assumption: a site identifies an incoming request as agent traffic (UA, IP range, automation framework signatures) and serves a different, attacker-controlled version (with embedded injection or fabricated "facts") while humans get a benign page.
Recent work establishes this as an active threat against agentic browsers:
Search-engine cloaking has a long lineage (Wu & Davison, AIRWeb 2005; Chellapilla & Maykov, AIRWeb 2007); AI-targeted cloaking is the same primitive aimed at a new class of crawler.
Proposed solution
This is structurally hard from a content script alone — the script can't trivially make a "what would Chrome see?" comparison request. Three escalating options:
1. Fingerprint surface flag. If the page reads bot-flag globals (`navigator.webdriver`, common automation telltales, agent UA substrings), annotate that the operator is capable of distinguishing agents. Doesn't prove cloaking happened; flags capability.
2. Out-of-band comparison fetch. Fire a `fetch(location.href, { headers: { 'User-Agent': '<chrome UA>' } })` from the content script (same-origin), diff against `document.documentElement.outerHTML`, annotate when text-content divergence exceeds a threshold.
3. Background-script proxy fetch with normalized headers. Move (2) to the MV3 background using `declarativeNetRequest` / `webRequest` to detect content-type and length divergence. Heavier infra, avoids same-origin awkwardness.
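As a rough illustration of option 1, the capability flag could start as a static scan of script source text for known agent-detection probes. This is a minimal sketch under stated assumptions: the probe list, `findAgentProbes`, and `capabilityAnnotation` are hypothetical names, the patterns are illustrative rather than exhaustive, and a static scan only approximates "the page reads these globals" (it misses obfuscated or dynamically built property access).

```javascript
// Hypothetical probe list: illustrative, not exhaustive, not part of the repo.
const AGENT_PROBES = [
  { name: "navigator.webdriver", pattern: /navigator\s*\.\s*webdriver/ },
  { name: "automation global", pattern: /\b(?:__nightmare|_phantom|callPhantom)\b/ },
  { name: "agent UA substring", pattern: /\b(?:HeadlessChrome|GPTBot|ClaudeBot)\b/ },
];

// Return the names of all probes that appear in one script's source text.
function findAgentProbes(scriptSource) {
  return AGENT_PROBES
    .filter((probe) => probe.pattern.test(scriptSource))
    .map((probe) => probe.name);
}

// Build a capability annotation from a list of script sources.
// Returns null when nothing matched, so the caller annotates nothing.
function capabilityAnnotation(scriptSources) {
  const hits = new Set();
  for (const source of scriptSources) {
    for (const name of findAgentProbes(source)) {
      hits.add(name);
    }
  }
  if (hits.size === 0) return null;
  // The wording names a capability only; it makes no claim about intent.
  return `This site can distinguish agent traffic (reads: ${[...hits].join(", ")}).`;
}
```

In a content script the input would presumably come from inline `<script>` text; external scripts would need a separate fetch, which this sketch does not attempt.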
Note re repo convention "defenses against prompt injection / cross-origin trickery should strip the content": cloaking is one where content-script-side stripping is structurally infeasible — by the time the rule sees the DOM, the cloaked content is the page. Annotation is the best a content script can offer; full mitigation belongs upstream (request layer, vendor, or the agent itself).
Alternatives considered
- Defer entirely to the agent's anti-cloaking layer (model vendor, framework). Reasonable — the threat is real but the in-DOM defense surface is narrow.
- Browser-side header normalization. Strip distinguishing headers/UA so the agent looks like Chrome. Out of scope for an extension that targets defensive content rewriting — arms-races with JS-side fingerprinting.
Controlling false positives
This is the highest-FP-risk rule in the proposal set. Almost every defense-side signal has a benign explanation, so the rule must be conservative.
- Precise annotation phrasing. Distinguish between "this site can distinguish agent traffic" (option 1 — observable, low confidence) and "this site served different content under agent fingerprint vs. browser fingerprint" (option 2/3 — measurable, higher confidence). Never use the unqualified word "cloaking" in the annotation; the rule names a capability or measured divergence, not an intent.
- High divergence threshold for options 2/3. Normal A/B tests and per-request personalization commonly produce 5–15% text diff. The threshold for an annotation should be well above the noise floor — propose >40% Jaccard distance on stemmed token sets, or significant divergence in the prompt-injection pattern set hit count (i.e., the agent-flavored response has injection-shaped strings the Chrome-flavored response doesn't).
- Geofencing exclusion. Pages whose comparison fetch crosses a region boundary (different `Set-Cookie` region, different currency, different language detection) are expected to diverge. Detect via response headers and skip.
- Legitimate bot-detection allowlist. Sites that fingerprint for anti-fraud (banks, payment processors, e-commerce checkout flows) reasonably distinguish agent traffic. Exempt these origins from the option-1 capability flag; annotating "this bank's login page can detect agents" is true but unhelpful.
- CSP / network-level failure ≠ cloaking. Option 2 in particular: if the comparison fetch returns 403, 451, or a captcha challenge, that's a security control, not cloaking. Treat as "indeterminate" not "cloaked".
- Don't fire on cross-origin frames. The parent page can't determine what a frame's origin would serve to a different client; out of scope.
- Default-off, experimental. Same posture as `schema-trust-sanitize`, `cross-origin-frame-redact`, `trust-badge-annotate`: ship as experimental candidate until per-host telemetry shows the FP rate is manageable.
- Per-host allow/deny lists from the start. Curate (similar to `roach-motel-annotate`'s site list) for hosts where the rule has known signal vs. known noise.
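The divergence threshold and the "indeterminate" handling above could be combined roughly as follows. This is a sketch, not a settled design: `tokenSet`, `jaccardDistance`, `classifyComparison`, and the `{ status, isChallenge, text }` response shape are all hypothetical, stemming is omitted (tokens are only lowercased), and captcha detection is assumed to happen elsewhere and arrive as a boolean.

```javascript
// Proposed noise-floor cutoff from the list above: > 40% Jaccard distance.
const DIVERGENCE_THRESHOLD = 0.4;

// Lowercased alphanumeric token set. Real stemming is left out of this sketch.
function tokenSet(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard distance = 1 - |A ∩ B| / |A ∪ B|; two empty sets count as identical.
function jaccardDistance(a, b) {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  for (const token of a) {
    if (b.has(token)) intersection++;
  }
  const union = a.size + b.size - intersection;
  return 1 - intersection / union;
}

// Compare the text the agent saw with the comparison fetch result.
// `comparison` is a hypothetical shape: { status, isChallenge, text }.
function classifyComparison(agentText, comparison) {
  // 403 / 451 / captcha is a security control, not evidence of divergence.
  if (comparison.status === 403 || comparison.status === 451 || comparison.isChallenge) {
    return { verdict: "indeterminate" };
  }
  const distance = jaccardDistance(tokenSet(agentText), tokenSet(comparison.text));
  return {
    verdict: distance > DIVERGENCE_THRESHOLD ? "divergent" : "similar",
    distance,
  };
}
```

The verdict strings deliberately avoid the word "cloaking"; a "divergent" result would map to the "served different content" annotation, and "indeterminate" to no annotation at all.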
Open questions / risks
- Probably out of scope for a pure content-script rule. v1 is at most option (1); (2) and (3) require background-script work and may not pay rent.
- Defense vs. offense asymmetry. Detection from inside the agent's own browsing context is structurally weak — the attacker can flag the comparator request too.
Tagged Impact H / Complexity H.