test(agents): characterize bash/read/write/edit/fetch_url + Horton composition#4354
Merged
Conversation
3 tasks
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4354 +/- ##
==========================================
- Coverage 60.43% 55.83% -4.60%
==========================================
Files 293 245 -48
Lines 28089 24847 -3242
Branches 7448 6878 -570
==========================================
- Hits 16976 13874 -3102
+ Misses 11096 10959 -137
+ Partials 17 14 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…orton composition Captures today's behavior of the runtime tool surface as a regression baseline for the security cleanup tracked in plans/sandboxing-investigation.md. These are characterization tests (Feathers' term) — they assert what the code does now, including the known-broken parts, so each upcoming fix shows up as a one-line expectation change rather than a new test. What's captured: - bash: PATH and ANTHROPIC_API_KEY pass through to the spawned child (env scrubbing missing — bash.ts:23). - read/write/edit: '..' escape rejected by the path-prefix check, but symlinks inside cwd pointing outside are followed because there is no realpath resolution. - fetch_url: the factory accepts no host/allowlist option; loopback, RFC1918, and cloud-metadata IPs are all fetchable. - Horton: `assistantHandler` unconditionally appends `...mcp.tools()` with no allowlist (horton.ts:396), in addition to the documented built-in toolset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
icehaunter
approved these changes
May 19, 2026
ca40e43 to
083d429
Compare
This was referenced May 19, 2026
msfstef
added a commit
that referenced
this pull request
May 19, 2026
…scape The pre-existing cwd guard in `read-file.ts`, `write.ts`, and `edit.ts` was a string prefix check on the resolved-but-not-realpathed path. A symlink inside the working directory pointing outside was followed transparently — CVE-2025-53109/53110 class bypass. The characterization tests added in #4354 documented this behavior; this PR flips them to asserting the bypass is now blocked, and adds positive tests for intra-workspace symlinks (pnpm `.pnpm` trees, etc.) which keep working. For non-existent write/edit targets, `realpath` is called on the deepest existing ancestor and containment is checked from there — the file's final name has no bearing on the parent's real path. Known gap, documented in `path-guard.ts`: hardlinks across the cwd boundary share the inode but report a path inside cwd, so the realpath check passes them. A proper fix requires a sandbox; see plans/sandboxing-investigation.md §3.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef
added a commit
that referenced
this pull request
May 19, 2026
…IPs) The `fetch_url` tool previously had no host policy: literal RFC1918 addresses, loopback, and the cloud-metadata IP (169.254.169.254) were all fetchable, and the PR 0 characterization tests in #4354 documented that surface. This PR adds a default-deny check before the `fetch()` call. For literal-IP URLs the check is direct. For hostnames it runs `dns.lookup({ all: true })` and rejects if any resolved address is private — the "rebinding-style" mitigation in the design doc; not a full fix, see §3.6. New option: `createFetchUrlTool({ allowedHosts })`. Hostnames in the list bypass the check (case-insensitive literal match on the URL's hostname — no DNS, no IP comparison). This is the explicit opt-in for desktop dev flows like `localhost`. Pass `['localhost', '127.0.0.1']` in the desktop wiring after this lands; that change is not in this PR. Known gap, documented in source: DNS rebinding can return a different IP between the guard's lookup and the socket connect. A proper fix requires a custom undici dispatcher that pins the resolved address; out of scope here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef
added a commit
that referenced
this pull request
May 19, 2026
Previously `bash.ts` passed `env: { ...process.env }` wholesale to
`child_process.exec`, so any LLM-driven `echo \$ANTHROPIC_API_KEY` (or
other credential-shaped variable) would leak straight back into the
agent loop. The PR 0 characterization tests in #4354 documented this
leak; this PR replaces the wholesale passthrough with a filtered subset.
Default allowlist keeps git, npm, pnpm, and most CLI tools functional:
PATH, HOME, USER, SHELL, LANG, LC_ALL, LC_CTYPE, TERM, TMPDIR, TMP,
TEMP, XDG_CONFIG_HOME, XDG_CACHE_HOME, XDG_DATA_HOME.
New option `createBashTool(cwd, { allowedEnvKeys })` extends the default
list (it does not replace it). Customers can add `GITHUB_TOKEN` etc. on
purpose, but cannot accidentally shrink the safe defaults to a narrower
set. The extend-vs-replace decision is intentional: silently dropping
something safe like PATH would break tooling.
BREAKING: any code that relied on env-supplied credentials reaching
bash via process.env (Anthropic, OpenAI, GitHub, Brave, custom secrets)
silently stops working. This is the intended behavior — those keys
should not be reachable from LLM-controlled commands — but it requires
action. One-line mitigation:
createBashTool(cwd, { allowedEnvKeys: ['GITHUB_TOKEN'] })
Description string also stops claiming a sandbox (same change as #4355;
ensures this PR is self-contained if #4355 hasn't landed yet).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef
added a commit
that referenced
this pull request
May 19, 2026
…scape The pre-existing cwd guard in `read-file.ts`, `write.ts`, and `edit.ts` was a string prefix check on the resolved-but-not-realpathed path. A symlink inside the working directory pointing outside was followed transparently — CVE-2025-53109/53110 class bypass. The characterization tests added in #4354 documented this behavior; this PR flips them to asserting the bypass is now blocked, and adds positive tests for intra-workspace symlinks (pnpm `.pnpm` trees, etc.) which keep working. For non-existent write/edit targets, `realpath` is called on the deepest existing ancestor and containment is checked from there — the file's final name has no bearing on the parent's real path. Known gap, documented in `path-guard.ts`: hardlinks across the cwd boundary share the inode but report a path inside cwd, so the realpath check passes them. A proper fix requires a sandbox; see plans/sandboxing-investigation.md §3.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef
added a commit
that referenced
this pull request
May 19, 2026
Previously `bash.ts` passed `env: { ...process.env }` wholesale to
`child_process.exec`, so any LLM-driven `echo \$ANTHROPIC_API_KEY` (or
other credential-shaped variable) would leak straight back into the
agent loop. The PR 0 characterization tests in #4354 documented this
leak; this PR replaces the wholesale passthrough with a filtered subset.
Default allowlist keeps git, npm, pnpm, and most CLI tools functional:
PATH, HOME, USER, SHELL, LANG, LC_ALL, LC_CTYPE, TERM, TMPDIR, TMP,
TEMP, XDG_CONFIG_HOME, XDG_CACHE_HOME, XDG_DATA_HOME.
New option `createBashTool(cwd, { allowedEnvKeys })` extends the default
list (it does not replace it). Customers can add `GITHUB_TOKEN` etc. on
purpose, but cannot accidentally shrink the safe defaults to a narrower
set. The extend-vs-replace decision is intentional: silently dropping
something safe like PATH would break tooling.
BREAKING: any code that relied on env-supplied credentials reaching
bash via process.env (Anthropic, OpenAI, GitHub, Brave, custom secrets)
silently stops working. This is the intended behavior — those keys
should not be reachable from LLM-controlled commands — but it requires
action. One-line mitigation:
createBashTool(cwd, { allowedEnvKeys: ['GITHUB_TOKEN'] })
Description string also stops claiming a sandbox (same change as #4355;
ensures this PR is self-contained if #4355 hasn't landed yet).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef
added a commit
that referenced
this pull request
May 19, 2026
…scape The pre-existing cwd guard in `read-file.ts`, `write.ts`, and `edit.ts` was a string prefix check on the resolved-but-not-realpathed path. A symlink inside the working directory pointing outside was followed transparently — CVE-2025-53109/53110 class bypass. The characterization tests added in #4354 documented this behavior; this PR flips them to asserting the bypass is now blocked, and adds positive tests for intra-workspace symlinks (pnpm `.pnpm` trees, etc.) which keep working. For non-existent write/edit targets, `realpath` is called on the deepest existing ancestor and containment is checked from there — the file's final name has no bearing on the parent's real path. Known gap, documented in `path-guard.ts`: hardlinks across the cwd boundary share the inode but report a path inside cwd, so the realpath check passes them. A proper fix requires a sandbox; see plans/sandboxing-investigation.md §3.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef
added a commit
that referenced
this pull request
May 19, 2026
Previously `bash.ts` passed `env: { ...process.env }` wholesale to
`child_process.exec`, so any LLM-driven `echo \$ANTHROPIC_API_KEY` (or
other credential-shaped variable) would leak straight back into the
agent loop. The PR 0 characterization tests in #4354 documented this
leak; this PR replaces the wholesale passthrough with a filtered subset.
Default allowlist keeps git, npm, pnpm, and most CLI tools functional:
PATH, HOME, USER, SHELL, LANG, LC_ALL, LC_CTYPE, TERM, TMPDIR, TMP,
TEMP, XDG_CONFIG_HOME, XDG_CACHE_HOME, XDG_DATA_HOME.
New option `createBashTool(cwd, { allowedEnvKeys })` extends the default
list (it does not replace it). Customers can add `GITHUB_TOKEN` etc. on
purpose, but cannot accidentally shrink the safe defaults to a narrower
set. The extend-vs-replace decision is intentional: silently dropping
something safe like PATH would break tooling.
BREAKING: any code that relied on env-supplied credentials reaching
bash via process.env (Anthropic, OpenAI, GitHub, Brave, custom secrets)
silently stops working. This is the intended behavior — those keys
should not be reachable from LLM-controlled commands — but it requires
action. One-line mitigation:
createBashTool(cwd, { allowedEnvKeys: ['GITHUB_TOKEN'] })
Description string also stops claiming a sandbox (same change as #4355;
ensures this PR is self-contained if #4355 hasn't landed yet).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef
added a commit
that referenced
this pull request
May 20, 2026
Three integration adjustments after rebasing on origin/main: - Delete packages/agents-runtime/test/tool-path-symlink.test.ts. This was a characterization test from #4354 that documents pre-fix symlink-escape behavior with an explicit "update when realpath resolution lands" note. PR 6a's resolveSafePath helper is that fix; the file's expectations are now contradicted by sandbox-tool-symlink-safety.test.ts. - Trim packages/agents-runtime/test/bash-tool.test.ts: the two characterization tests from #4354 that documented the bash env-leak bug are removed. PR 6a fixed that bug; sandbox-tool-refactor.test.ts has the corresponding assertion ('does not forward arbitrary process.env to children'). The first test in the file (cwd + HOME exposure) stays. - Migrate packages/agents-runtime/test/fetch-url-ssrf.test.ts to the new createFetchUrlTool(sandbox, opts) signature. The assertions still hold for unrestrictedSandbox (NetPolicy SSRF protection is deferred); the test is now explicit about that scope. - Remove the @ts-expect-error directive on the dynamic e2b import in src/sandbox/remote/e2b.ts. With e2b now in the lockfile, TS resolves the package and the directive is unused. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Characterization tests (Feathers' term) for the runtime tool surface targeted by the security cleanup tracked in
plans/sandboxing-investigation.md. These assert what the code does today, including the broken parts, so each follow-up fix lands as a one-line expectation change rather than as new test infrastructure.This is PR 0 of a five-PR series. It is intentionally inert: no production code changes, no new test scaffolding, just additional vitest cases reusing the existing per-tool test idioms.
What's captured
bash-tool.test.ts) — parentPATHandANTHROPIC_API_KEYpass through to the spawned child (env scrubbing missing atbash.ts:23).tool-path-symlink.test.ts) —..escape rejected by the prefix check, but a symlink inside cwd pointing outside is followed transparently because norealpathis called.fetch-url-ssrf.test.ts) — loopback, RFC1918, and169.254.169.254are all fetchable;redirect: 'follow'is unconditional. Uses aglobalThis.fetchspy + injectedextractWithLLMidentity — no network, no flakes.horton-tool-composition.test.ts) — invokes the handler with a mockeduseAgent(same pattern ashorton-model-selection.test.ts) and asserts the built-in toolset is present and...mcp.tools()is unconditionally appended with no allowlist (horton.ts:396).Scoping notes
test/directories — not the conformance-test harness, which models server protocol rather than per-useAgenttool composition.Test plan
pnpm testpasses inpackages/agents-runtime(442/442)pnpm testpasses inpackages/agents(67/67)pnpm typecheckclean in both packagespnpm stylecheckclean in both packages🤖 Generated with Claude Code