Skip to content

test(agents): characterize bash/read/write/edit/fetch_url + Horton composition#4354

Merged
msfstef merged 1 commit into
mainfrom
msfstef/desktop-golden-test-agents
May 19, 2026
Merged

test(agents): characterize bash/read/write/edit/fetch_url + Horton composition#4354
msfstef merged 1 commit into
mainfrom
msfstef/desktop-golden-test-agents

Conversation

@msfstef
Copy link
Copy Markdown
Contributor

@msfstef msfstef commented May 19, 2026

Summary

Characterization tests (Feathers' term) for the runtime tool surface targeted by the security cleanup tracked in plans/sandboxing-investigation.md. These assert what the code does today, including the broken parts, so each follow-up fix lands as a one-line expectation change rather than as new test infrastructure.

This is PR 0 of a five-PR series. It is intentionally inert: no production code changes, no new test scaffolding, just additional vitest cases reusing the existing per-tool test idioms.

What's captured

  • bash (bash-tool.test.ts) — parent PATH and ANTHROPIC_API_KEY pass through to the spawned child (env scrubbing missing at bash.ts:23).
  • read / write / edit (tool-path-symlink.test.ts) — .. escape rejected by the prefix check, but a symlink inside cwd pointing outside is followed transparently because no realpath is called.
  • fetch_url (fetch-url-ssrf.test.ts) — loopback, RFC1918, and 169.254.169.254 are all fetchable; redirect: 'follow' is unconditional. Uses a globalThis.fetch spy + injected extractWithLLM identity — no network, no flakes.
  • Horton composition (horton-tool-composition.test.ts) — invokes the handler with a mocked useAgent (same pattern as horton-model-selection.test.ts) and asserts the built-in toolset is present and ...mcp.tools() is unconditionally appended with no allowlist (horton.ts:396).

Scoping notes

  • Plain vitest tests inside the existing per-package test/ directories — not the conformance-test harness, which models server protocol rather than per-useAgent tool composition.
  • No production code changes. No new test infrastructure.
  • All currently-broken behaviors are asserted as-is; comments at the top of each file flag what the upcoming PR will change.

Test plan

  • pnpm test passes in packages/agents-runtime (442/442)
  • pnpm test passes in packages/agents (67/67)
  • pnpm typecheck clean in both packages
  • pnpm stylecheck clean in both packages

🤖 Generated with Claude Code

@msfstef msfstef marked this pull request as ready for review May 19, 2026 12:41
@msfstef msfstef marked this pull request as draft May 19, 2026 12:45
@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.83%. Comparing base (ee3ef0f) to head (083d429).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4354      +/-   ##
==========================================
- Coverage   60.43%   55.83%   -4.60%     
==========================================
  Files         293      245      -48     
  Lines       28089    24847    -3242     
  Branches     7448     6878     -570     
==========================================
- Hits        16976    13874    -3102     
+ Misses      11096    10959     -137     
+ Partials       17       14       -3     
Flag Coverage Δ
electric-telemetry ?
elixir ?
packages/agents 70.92% <ø> (ø)
packages/agents-mcp ?
packages/agents-runtime 81.27% <ø> (+0.53%) ⬆️
packages/agents-server 73.89% <ø> (-0.04%) ⬇️
packages/agents-server-ui 6.66% <ø> (ø)
packages/electric-ax 42.61% <ø> (?)
packages/experimental ?
packages/react-hooks ?
packages/start ?
packages/typescript-client ?
packages/y-electric ?
typescript 55.83% <ø> (-4.39%) ⬇️
unit-tests 55.83% <ø> (-4.60%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@msfstef msfstef requested review from icehaunter and kevin-dp May 19, 2026 12:49
…orton composition

Captures today's behavior of the runtime tool surface as a regression
baseline for the security cleanup tracked in
plans/sandboxing-investigation.md. These are characterization tests
(Feathers' term) — they assert what the code does now, including the
known-broken parts, so each upcoming fix shows up as a one-line
expectation change rather than a new test.

What's captured:
- bash: PATH and ANTHROPIC_API_KEY pass through to the spawned child
  (env scrubbing missing — bash.ts:23).
- read/write/edit: '..' escape rejected by the path-prefix check, but
  symlinks inside cwd pointing outside are followed because there is no
  realpath resolution.
- fetch_url: the factory accepts no host/allowlist option; loopback,
  RFC1918, and cloud-metadata IPs are all fetchable.
- Horton: `assistantHandler` unconditionally appends `...mcp.tools()`
  with no allowlist (horton.ts:396), in addition to the documented
  built-in toolset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@msfstef msfstef marked this pull request as ready for review May 19, 2026 12:59
@msfstef msfstef force-pushed the msfstef/desktop-golden-test-agents branch from ca40e43 to 083d429 Compare May 19, 2026 13:00
@msfstef msfstef merged commit 20fc6c1 into main May 19, 2026
26 of 27 checks passed
@msfstef msfstef deleted the msfstef/desktop-golden-test-agents branch May 19, 2026 13:34
msfstef added a commit that referenced this pull request May 19, 2026
…scape

The pre-existing cwd guard in `read-file.ts`, `write.ts`, and `edit.ts`
was a string prefix check on the resolved-but-not-realpathed path. A
symlink inside the working directory pointing outside was followed
transparently — CVE-2025-53109/53110 class bypass. The characterization
tests added in #4354 documented this behavior; this PR flips them to
asserting the bypass is now blocked, and adds positive tests for
intra-workspace symlinks (pnpm `.pnpm` trees, etc.) which keep working.

For non-existent write/edit targets, `realpath` is called on the deepest
existing ancestor and containment is checked from there — the file's
final name has no bearing on the parent's real path.

Known gap, documented in `path-guard.ts`: hardlinks across the cwd
boundary share the inode but report a path inside cwd, so the realpath
check passes them. A proper fix requires a sandbox; see
plans/sandboxing-investigation.md §3.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef added a commit that referenced this pull request May 19, 2026
…IPs)

The `fetch_url` tool previously had no host policy: literal RFC1918
addresses, loopback, and the cloud-metadata IP (169.254.169.254) were
all fetchable, and the PR 0 characterization tests in #4354 documented
that surface. This PR adds a default-deny check before the `fetch()`
call.

For literal-IP URLs the check is direct. For hostnames it runs
`dns.lookup({ all: true })` and rejects if any resolved address is
private — the "rebinding-style" mitigation in the design doc; not a
full fix, see §3.6.

New option: `createFetchUrlTool({ allowedHosts })`. Hostnames in the
list bypass the check (case-insensitive literal match on the URL's
hostname — no DNS, no IP comparison). This is the explicit opt-in for
desktop dev flows like `localhost`. Pass `['localhost', '127.0.0.1']`
in the desktop wiring after this lands; that change is not in this PR.

Known gap, documented in source: DNS rebinding can return a different
IP between the guard's lookup and the socket connect. A proper fix
requires a custom undici dispatcher that pins the resolved address;
out of scope here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef added a commit that referenced this pull request May 19, 2026
Previously `bash.ts` passed `env: { ...process.env }` wholesale to
`child_process.exec`, so any LLM-driven `echo \$ANTHROPIC_API_KEY` (or
other credential-shaped variable) would leak straight back into the
agent loop. The PR 0 characterization tests in #4354 documented this
leak; this PR replaces the wholesale passthrough with a filtered subset.

Default allowlist keeps git, npm, pnpm, and most CLI tools functional:
PATH, HOME, USER, SHELL, LANG, LC_ALL, LC_CTYPE, TERM, TMPDIR, TMP,
TEMP, XDG_CONFIG_HOME, XDG_CACHE_HOME, XDG_DATA_HOME.

New option `createBashTool(cwd, { allowedEnvKeys })` extends the default
list (it does not replace it). Customers can add `GITHUB_TOKEN` etc. on
purpose, but cannot accidentally shrink the safe defaults to a narrower
set. The extend-vs-replace decision is intentional: silently dropping
something safe like PATH would break tooling.

BREAKING: any code that relied on env-supplied credentials reaching
bash via process.env (Anthropic, OpenAI, GitHub, Brave, custom secrets)
silently stops working. This is the intended behavior — those keys
should not be reachable from LLM-controlled commands — but it requires
action. One-line mitigation:
  createBashTool(cwd, { allowedEnvKeys: ['GITHUB_TOKEN'] })

Description string also stops claiming a sandbox (same change as #4355;
ensures this PR is self-contained if #4355 hasn't landed yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef added a commit that referenced this pull request May 19, 2026
…scape

The pre-existing cwd guard in `read-file.ts`, `write.ts`, and `edit.ts`
was a string prefix check on the resolved-but-not-realpathed path. A
symlink inside the working directory pointing outside was followed
transparently — CVE-2025-53109/53110 class bypass. The characterization
tests added in #4354 documented this behavior; this PR flips them to
asserting the bypass is now blocked, and adds positive tests for
intra-workspace symlinks (pnpm `.pnpm` trees, etc.) which keep working.

For non-existent write/edit targets, `realpath` is called on the deepest
existing ancestor and containment is checked from there — the file's
final name has no bearing on the parent's real path.

Known gap, documented in `path-guard.ts`: hardlinks across the cwd
boundary share the inode but report a path inside cwd, so the realpath
check passes them. A proper fix requires a sandbox; see
plans/sandboxing-investigation.md §3.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef added a commit that referenced this pull request May 19, 2026
Previously `bash.ts` passed `env: { ...process.env }` wholesale to
`child_process.exec`, so any LLM-driven `echo \$ANTHROPIC_API_KEY` (or
other credential-shaped variable) would leak straight back into the
agent loop. The PR 0 characterization tests in #4354 documented this
leak; this PR replaces the wholesale passthrough with a filtered subset.

Default allowlist keeps git, npm, pnpm, and most CLI tools functional:
PATH, HOME, USER, SHELL, LANG, LC_ALL, LC_CTYPE, TERM, TMPDIR, TMP,
TEMP, XDG_CONFIG_HOME, XDG_CACHE_HOME, XDG_DATA_HOME.

New option `createBashTool(cwd, { allowedEnvKeys })` extends the default
list (it does not replace it). Customers can add `GITHUB_TOKEN` etc. on
purpose, but cannot accidentally shrink the safe defaults to a narrower
set. The extend-vs-replace decision is intentional: silently dropping
something safe like PATH would break tooling.

BREAKING: any code that relied on env-supplied credentials reaching
bash via process.env (Anthropic, OpenAI, GitHub, Brave, custom secrets)
silently stops working. This is the intended behavior — those keys
should not be reachable from LLM-controlled commands — but it requires
action. One-line mitigation:
  createBashTool(cwd, { allowedEnvKeys: ['GITHUB_TOKEN'] })

Description string also stops claiming a sandbox (same change as #4355;
ensures this PR is self-contained if #4355 hasn't landed yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef added a commit that referenced this pull request May 19, 2026
…scape

The pre-existing cwd guard in `read-file.ts`, `write.ts`, and `edit.ts`
was a string prefix check on the resolved-but-not-realpathed path. A
symlink inside the working directory pointing outside was followed
transparently — CVE-2025-53109/53110 class bypass. The characterization
tests added in #4354 documented this behavior; this PR flips them to
asserting the bypass is now blocked, and adds positive tests for
intra-workspace symlinks (pnpm `.pnpm` trees, etc.) which keep working.

For non-existent write/edit targets, `realpath` is called on the deepest
existing ancestor and containment is checked from there — the file's
final name has no bearing on the parent's real path.

Known gap, documented in `path-guard.ts`: hardlinks across the cwd
boundary share the inode but report a path inside cwd, so the realpath
check passes them. A proper fix requires a sandbox; see
plans/sandboxing-investigation.md §3.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef added a commit that referenced this pull request May 19, 2026
Previously `bash.ts` passed `env: { ...process.env }` wholesale to
`child_process.exec`, so any LLM-driven `echo \$ANTHROPIC_API_KEY` (or
other credential-shaped variable) would leak straight back into the
agent loop. The PR 0 characterization tests in #4354 documented this
leak; this PR replaces the wholesale passthrough with a filtered subset.

Default allowlist keeps git, npm, pnpm, and most CLI tools functional:
PATH, HOME, USER, SHELL, LANG, LC_ALL, LC_CTYPE, TERM, TMPDIR, TMP,
TEMP, XDG_CONFIG_HOME, XDG_CACHE_HOME, XDG_DATA_HOME.

New option `createBashTool(cwd, { allowedEnvKeys })` extends the default
list (it does not replace it). Customers can add `GITHUB_TOKEN` etc. on
purpose, but cannot accidentally shrink the safe defaults to a narrower
set. The extend-vs-replace decision is intentional: silently dropping
something safe like PATH would break tooling.

BREAKING: any code that relied on env-supplied credentials reaching
bash via process.env (Anthropic, OpenAI, GitHub, Brave, custom secrets)
silently stops working. This is the intended behavior — those keys
should not be reachable from LLM-controlled commands — but it requires
action. One-line mitigation:
  createBashTool(cwd, { allowedEnvKeys: ['GITHUB_TOKEN'] })

Description string also stops claiming a sandbox (same change as #4355;
ensures this PR is self-contained if #4355 hasn't landed yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
msfstef added a commit that referenced this pull request May 20, 2026
Three integration adjustments after rebasing on origin/main:

- Delete packages/agents-runtime/test/tool-path-symlink.test.ts. This was
  a characterization test from #4354 that documents pre-fix symlink-escape
  behavior with an explicit "update when realpath resolution lands" note.
  PR 6a's resolveSafePath helper is that fix; the file's expectations are
  now contradicted by sandbox-tool-symlink-safety.test.ts.

- Trim packages/agents-runtime/test/bash-tool.test.ts: the two
  characterization tests from #4354 that documented the bash env-leak bug
  are removed. PR 6a fixed that bug; sandbox-tool-refactor.test.ts has
  the corresponding assertion ('does not forward arbitrary process.env to
  children'). The first test in the file (cwd + HOME exposure) stays.

- Migrate packages/agents-runtime/test/fetch-url-ssrf.test.ts to the new
  createFetchUrlTool(sandbox, opts) signature. The assertions still hold
  for unrestrictedSandbox (NetPolicy SSRF protection is deferred); the
  test is now explicit about that scope.

- Remove the @ts-expect-error directive on the dynamic e2b import in
  src/sandbox/remote/e2b.ts. With e2b now in the lockfile, TS resolves
  the package and the directive is unused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants