
fix: security audit round 2 (v0.13.4.0) #640

Merged
garrytan merged 11 commits into main from garrytan/security-audit-round2
Mar 30, 2026


@garrytan
Owner

Summary

Security Hardening (Round 2)

  • Content trust boundary markers: all browse commands returning page content (text, html, links, forms, accessibility, console, dialog, snapshot, diff, resume, watch stop) now wrap output in --- BEGIN/END UNTRUSTED EXTERNAL CONTENT --- markers
  • Trust boundary escape prevention: URLs sanitized (newlines removed, length-capped), marker strings escaped in content so malicious pages can't forge the END marker
  • Extension sender validation: Chrome extension rejects messages from unknown senders and enforces a message type allowlist
  • CDP localhost-only binding: bin/chrome-cdp passes --remote-debugging-address=127.0.0.1 and --remote-allow-origins
  • Checksum-verified bun install: bootstrap downloads to temp file and verifies SHA-256 before executing

Cleanup

  • Removed Factory Droid support (--host factory, .factory/ skills, Factory CI checks)

Test Coverage

All new code paths are covered by tests (100%). test/audit-compliance.test.ts verifies all 4 security fixes, and browse/test/commands.test.ts tests trust boundary wrapping in chain commands.

Pre-Landing Review

No issues found.

Adversarial Review

Large-tier (3 passes: Claude structured, Claude adversarial, Codex structured). 5 FIXABLE findings addressed:

  • URL injection in trust markers (newline removal + length cap)
  • Content can forge END marker (zero-width space escaping)
  • resume command returned unwrapped snapshot
  • diff command returned unwrapped page content
  • watch stop returned unwrapped last snapshot
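
The first two findings can be illustrated with a minimal sketch. It is not the code from this PR: the function names, the `(source: …)` header format, and the exact placement of the zero-width space are assumptions; only the 200-character cap, the newline removal, and the zero-width-space idea come from the commit messages.

```typescript
// Hypothetical marker text, modeled on the markers named in the PR summary.
const END_MARKER = "--- END UNTRUSTED EXTERNAL CONTENT ---";
const MAX_URL_LENGTH = 200; // cap stated in the commit message
const ZERO_WIDTH_SPACE = "\u200B";

// Remove line breaks and cap the length so a URL (for example one set via
// history.pushState) cannot start a new line inside the marker header.
function sanitizeUrl(url: string): string {
  return url.replace(/[\r\n]/g, "").slice(0, MAX_URL_LENGTH);
}

// Break up any marker-like text inside page content. The text still reads
// the same, but it no longer matches a real marker character for character.
function escapeMarkers(content: string): string {
  return content.replace(
    /--- (BEGIN|END) UNTRUSTED EXTERNAL CONTENT/g,
    (_match: string, kind: string) =>
      `---${ZERO_WIDTH_SPACE} ${kind} UNTRUSTED EXTERNAL CONTENT`,
  );
}

// Assumed wrapper: header line with the sanitized URL, escaped body, END marker.
function wrapWithEscaping(content: string, url: string): string {
  return [
    `--- BEGIN UNTRUSTED EXTERNAL CONTENT (source: ${sanitizeUrl(url)}) ---`,
    escapeMarkers(content),
    END_MARKER,
  ].join("\n");
}
```

With this shape, a page that prints its own END marker still produces exactly one genuine END line in the wrapped output, and that line is the last one.
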

TODOS

No TODO items completed in this PR. ML Prompt Injection Classifier remains P0 for next PR.

Test plan

  • All audit compliance tests pass (232 tests, 0 failures)
  • Version matches across VERSION and package.json

🤖 Generated with Claude Code

garrytan and others added 7 commits March 28, 2026 23:14
Restrict Chrome CDP to localhost by adding --remote-debugging-address=127.0.0.1
and --remote-allow-origins to prevent network-accessible debugging sessions.

Clears 1 Socket anomaly (Chrome CDP session exposure).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
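
As a rough illustration of the flags named above, the sketch below builds a Chrome argument list. It is not the contents of bin/chrome-cdp (which may well be a shell script); the helper name, the port parameter, and the origin value passed to `--remote-allow-origins` are assumptions, since the PR does not state them.

```typescript
// Hypothetical helper: assembles the debugging flags for a Chrome launch.
function buildCdpArgs(port: number): string[] {
  return [
    `--remote-debugging-port=${port}`,
    // Bind the debugging endpoint to loopback only.
    "--remote-debugging-address=127.0.0.1",
    // Assumed value: only accept WebSocket connections from the local endpoint.
    `--remote-allow-origins=http://127.0.0.1:${port}`,
  ];
}

console.log(buildCdpArgs(9222).join(" "));
```
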
Add sender.id check and ALLOWED_TYPES allowlist to the Chrome extension's
message handler. Defense-in-depth against message spoofing from external
extensions or future externally_connectable changes.

Clears 2 Socket anomalies (extension permissions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
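
A minimal sketch of the sender check plus allowlist described above, written as a pure function so it runs outside a browser. The message type names and the `isTrustedMessage` helper are hypothetical; only the `sender.id` check and the `ALLOWED_TYPES` name come from the commit message.

```typescript
// Local stand-ins for the parts of chrome.runtime.MessageSender and the
// message payload that the check needs.
interface Sender {
  id?: string;
}
interface Message {
  type?: unknown;
}

// Hypothetical type names; the real allowlist is defined in the extension.
const ALLOWED_TYPES: ReadonlySet<string> = new Set(["getState", "refresh"]);

// Accept a message only if it comes from this extension and its type is allowlisted.
function isTrustedMessage(
  message: Message,
  sender: Sender,
  ownExtensionId: string,
): boolean {
  if (sender.id !== ownExtensionId) return false; // unknown or missing sender
  return typeof message.type === "string" && ALLOWED_TYPES.has(message.type);
}
```

In an extension, a check like this would presumably sit at the top of the `chrome.runtime.onMessage` listener, with `chrome.runtime.id` passed as `ownExtensionId`, so rejected messages never reach the handlers.
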
Replace unverified curl|bash bun installation with checksum-verified
download-then-execute pattern. The install script is downloaded, sha256
verified against a known hash, then executed. Preserves the Bun-native
install path without adding a Node/npm dependency.

Clears Snyk W012 + 3 Socket anomalies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
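
The download-then-execute ordering can be sketched as follows. This is an illustration under assumptions, not the bootstrap from this PR (which may be a shell script): `verifyThenRun` and its `run` callback are hypothetical, and the script bytes are passed in rather than downloaded so the example stays self-contained.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the given bytes.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Run the script only after its hash matches the pinned value.
// On mismatch, throw before `run` is ever called.
function verifyThenRun(
  script: Uint8Array,
  expectedSha256: string,
  run: (verified: Uint8Array) => void,
): void {
  const actual = sha256Hex(script);
  if (actual !== expectedSha256.toLowerCase()) {
    throw new Error(`checksum mismatch: expected ${expectedSha256}, got ${actual}`);
  }
  run(script);
}
```

The point of the pattern is that nothing is piped straight into a shell: the bytes that get executed are exactly the bytes that were hashed.
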
Wrap page-content commands (text, html, links, forms, accessibility,
console, dialog, snapshot) with --- BEGIN/END UNTRUSTED EXTERNAL CONTENT ---
markers. Covers direct commands (server.ts), chain sub-commands, and
snapshot output (meta-commands.ts).

Adds PAGE_CONTENT_COMMANDS set and wrapUntrustedContent() helper in
commands.ts (single source of truth, DRY). Expands the SKILL.md trust
warning with explicit processing rules for agents.

Clears Snyk W011 (third-party content exposure).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
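
A sketch of how a command set and a wrapping helper of this kind might fit together. The names `PAGE_CONTENT_COMMANDS` and `wrapUntrustedContent` and the command list are taken from the commit message; the signature, the `(source: …)` header format, the `formatCommandOutput` dispatcher, and the `goto` command used in the usage lines are assumptions. The escaping hardening added in a later commit of this PR is left out here.

```typescript
// Command names copied from the commit message.
const PAGE_CONTENT_COMMANDS: ReadonlySet<string> = new Set([
  "text", "html", "links", "forms",
  "accessibility", "console", "dialog", "snapshot",
]);

// Assumed signature and marker layout.
function wrapUntrustedContent(output: string, url: string): string {
  return [
    `--- BEGIN UNTRUSTED EXTERNAL CONTENT (source: ${url}) ---`,
    output,
    "--- END UNTRUSTED EXTERNAL CONTENT ---",
  ].join("\n");
}

// Hypothetical dispatcher: wrap only the commands that return page content.
function formatCommandOutput(command: string, output: string, url: string): string {
  return PAGE_CONTENT_COMMANDS.has(command)
    ? wrapUntrustedContent(output, url)
    : output;
}

console.log(formatCommandOutput("text", "Hello", "https://example.com"));
console.log(formatCommandOutput("goto", "ok", "https://example.com")); // passed through unwrapped
```

Keeping the set and the helper in one module means direct commands, chain sub-commands, and snapshot output can all share the same membership test.
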
- Sanitize URLs in markers (remove newlines, cap at 200 chars) to prevent
  marker injection via history.pushState
- Escape marker strings in content (zero-width space) so malicious pages
  can't forge the END marker to break out of the untrusted block
- Wrap resume command snapshot with trust boundary markers
- Wrap diff command output with trust boundary markers
- Wrap watch stop last snapshot with trust boundary markers

Found by cross-model adversarial review (Claude + Codex).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Main landed v0.13.5.0 (Factory Droid support) while this branch removes
Factory Droid. Bumped to v0.13.6.0, preserved both 0.13.5.0 and 0.13.4.0
CHANGELOG entries from main history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

github-actions bot commented Mar 29, 2026

E2E Evals: ✅ PASS

37/37 tests passed | $3.79 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 6/6 $0.3
e2e-deploy 5/5 $0.95
e2e-design 2/2 $0.33
e2e-plan 2/2 $0.18
e2e-qa-workflow 3/3 $0.82
e2e-review 1/1 $0.09
llm-judge 15/15 $0.3
e2e-qa-workflow 3/3 $0.82

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

garrytan and others added 4 commits March 29, 2026 15:42
Factory Droid support was removed in this branch. The .factory/ directory
was re-added by merging main (which had v0.13.5.0 Factory support).
Gitignore it so it stays out.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Main landed v0.13.5.1 (gitignore .factory) while this branch had v0.13.6.0.
Kept v0.13.6.0 (higher), added 0.13.5.1 CHANGELOG entry from main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main landed v0.13.6.0 (GStack Learns) using the same version number as
this branch. Bumped to v0.13.7.0, kept both CHANGELOG entries in order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main landed v0.13.7.0 (Community Wave) using the same version as this
branch. Bumped to v0.13.8.0, kept both CHANGELOG entries in order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
garrytan merged commit 3cda8de into main on Mar 30, 2026
18 checks passed