test(agents): characterize bash/read/write/edit/fetch_url + Horton composition by msfstef · Pull Request #4354 · electric-sql/electric

msfstef · 2026-05-19T12:37:40Z

Summary

Characterization tests (Feathers' term) for the runtime tool surface targeted by the security cleanup tracked in plans/sandboxing-investigation.md. These assert what the code does today, including the broken parts, so each follow-up fix lands as a one-line expectation change rather than as new test infrastructure.

This is PR 0 of a five-PR series. It is intentionally inert: no production code changes, no new test scaffolding, just additional vitest cases reusing the existing per-tool test idioms.

What's captured

bash (bash-tool.test.ts) — parent PATH and ANTHROPIC_API_KEY pass through to the spawned child (env scrubbing missing at bash.ts:23).
read / write / edit (tool-path-symlink.test.ts) — .. escape rejected by the prefix check, but a symlink inside cwd pointing outside is followed transparently because no realpath is called.
fetch_url (fetch-url-ssrf.test.ts) — loopback, RFC1918, and 169.254.169.254 are all fetchable; redirect: 'follow' is unconditional. Uses a globalThis.fetch spy + injected extractWithLLM identity — no network, no flakes.
Horton composition (horton-tool-composition.test.ts) — invokes the handler with a mocked useAgent (same pattern as horton-model-selection.test.ts) and asserts the built-in toolset is present and ...mcp.tools() is unconditionally appended with no allowlist (horton.ts:396).

Scoping notes

Plain vitest tests inside the existing per-package test/ directories — not the conformance-test harness, which models server protocol rather than per-useAgent tool composition.
No production code changes. No new test infrastructure.
All currently-broken behaviors are asserted as-is; comments at the top of each file flag what the upcoming PR will change.

Test plan

pnpm test passes in packages/agents-runtime (442/442)
pnpm test passes in packages/agents (67/67)
pnpm typecheck clean in both packages
pnpm stylecheck clean in both packages

🤖 Generated with Claude Code

codecov · 2026-05-19T12:49:10Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.83%. Comparing base (ee3ef0f) to head (083d429).
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4354      +/-   ##
==========================================
- Coverage   60.43%   55.83%   -4.60%     
==========================================
  Files         293      245      -48     
  Lines       28089    24847    -3242     
  Branches     7448     6878     -570     
==========================================
- Hits        16976    13874    -3102     
+ Misses      11096    10959     -137     
+ Partials       17       14       -3

Flag	Coverage Δ
electric-telemetry	`?`
elixir	`?`
packages/agents	`70.92% <ø> (ø)`
packages/agents-mcp	`?`
packages/agents-runtime	`81.27% <ø> (+0.53%)`	⬆️
packages/agents-server	`73.89% <ø> (-0.04%)`	⬇️
packages/agents-server-ui	`6.66% <ø> (ø)`
packages/electric-ax	`42.61% <ø> (?)`
packages/experimental	`?`
packages/react-hooks	`?`
packages/start	`?`
packages/typescript-client	`?`
packages/y-electric	`?`
typescript	`55.83% <ø> (-4.39%)`	⬇️
unit-tests	`55.83% <ø> (-4.60%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…orton composition Captures today's behavior of the runtime tool surface as a regression baseline for the security cleanup tracked in plans/sandboxing-investigation.md. These are characterization tests (Feathers' term) — they assert what the code does now, including the known-broken parts, so each upcoming fix shows up as a one-line expectation change rather than a new test. What's captured: - bash: PATH and ANTHROPIC_API_KEY pass through to the spawned child (env scrubbing missing — bash.ts:23). - read/write/edit: '..' escape rejected by the path-prefix check, but symlinks inside cwd pointing outside are followed because there is no realpath resolution. - fetch_url: the factory accepts no host/allowlist option; loopback, RFC1918, and cloud-metadata IPs are all fetchable. - Horton: `assistantHandler` unconditionally appends `...mcp.tools()` with no allowlist (horton.ts:396), in addition to the documented built-in toolset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…scape The pre-existing cwd guard in `read-file.ts`, `write.ts`, and `edit.ts` was a string prefix check on the resolved-but-not-realpathed path. A symlink inside the working directory pointing outside was followed transparently — CVE-2025-53109/53110 class bypass. The characterization tests added in #4354 documented this behavior; this PR flips them to asserting the bypass is now blocked, and adds positive tests for intra-workspace symlinks (pnpm `.pnpm` trees, etc.) which keep working. For non-existent write/edit targets, `realpath` is called on the deepest existing ancestor and containment is checked from there — the file's final name has no bearing on the parent's real path. Known gap, documented in `path-guard.ts`: hardlinks across the cwd boundary share the inode but report a path inside cwd, so the realpath check passes them. A proper fix requires a sandbox; see plans/sandboxing-investigation.md §3.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…IPs) The `fetch_url` tool previously had no host policy: literal RFC1918 addresses, loopback, and the cloud-metadata IP (169.254.169.254) were all fetchable, and the PR 0 characterization tests in #4354 documented that surface. This PR adds a default-deny check before the `fetch()` call. For literal-IP URLs the check is direct. For hostnames it runs `dns.lookup({ all: true })` and rejects if any resolved address is private — the "rebinding-style" mitigation in the design doc; not a full fix, see §3.6. New option: `createFetchUrlTool({ allowedHosts })`. Hostnames in the list bypass the check (case-insensitive literal match on the URL's hostname — no DNS, no IP comparison). This is the explicit opt-in for desktop dev flows like `localhost`. Pass `['localhost', '127.0.0.1']` in the desktop wiring after this lands; that change is not in this PR. Known gap, documented in source: DNS rebinding can return a different IP between the guard's lookup and the socket connect. A proper fix requires a custom undici dispatcher that pins the resolved address; out of scope here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously `bash.ts` passed `env: { ...process.env }` wholesale to `child_process.exec`, so any LLM-driven `echo \$ANTHROPIC_API_KEY` (or other credential-shaped variable) would leak straight back into the agent loop. The PR 0 characterization tests in #4354 documented this leak; this PR replaces the wholesale passthrough with a filtered subset. Default allowlist keeps git, npm, pnpm, and most CLI tools functional: PATH, HOME, USER, SHELL, LANG, LC_ALL, LC_CTYPE, TERM, TMPDIR, TMP, TEMP, XDG_CONFIG_HOME, XDG_CACHE_HOME, XDG_DATA_HOME. New option `createBashTool(cwd, { allowedEnvKeys })` extends the default list (it does not replace it). Customers can add `GITHUB_TOKEN` etc. on purpose, but cannot accidentally shrink the safe defaults to a narrower set. The extend-vs-replace decision is intentional: silently dropping something safe like PATH would break tooling. BREAKING: any code that relied on env-supplied credentials reaching bash via process.env (Anthropic, OpenAI, GitHub, Brave, custom secrets) silently stops working. This is the intended behavior — those keys should not be reachable from LLM-controlled commands — but it requires action. One-line mitigation: createBashTool(cwd, { allowedEnvKeys: ['GITHUB_TOKEN'] }) Description string also stops claiming a sandbox (same change as #4355; ensures this PR is self-contained if #4355 hasn't landed yet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…scape The pre-existing cwd guard in `read-file.ts`, `write.ts`, and `edit.ts` was a string prefix check on the resolved-but-not-realpathed path. A symlink inside the working directory pointing outside was followed transparently — CVE-2025-53109/53110 class bypass. The characterization tests added in #4354 documented this behavior; this PR flips them to asserting the bypass is now blocked, and adds positive tests for intra-workspace symlinks (pnpm `.pnpm` trees, etc.) which keep working. For non-existent write/edit targets, `realpath` is called on the deepest existing ancestor and containment is checked from there — the file's final name has no bearing on the parent's real path. Known gap, documented in `path-guard.ts`: hardlinks across the cwd boundary share the inode but report a path inside cwd, so the realpath check passes them. A proper fix requires a sandbox; see plans/sandboxing-investigation.md §3.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously `bash.ts` passed `env: { ...process.env }` wholesale to `child_process.exec`, so any LLM-driven `echo \$ANTHROPIC_API_KEY` (or other credential-shaped variable) would leak straight back into the agent loop. The PR 0 characterization tests in #4354 documented this leak; this PR replaces the wholesale passthrough with a filtered subset. Default allowlist keeps git, npm, pnpm, and most CLI tools functional: PATH, HOME, USER, SHELL, LANG, LC_ALL, LC_CTYPE, TERM, TMPDIR, TMP, TEMP, XDG_CONFIG_HOME, XDG_CACHE_HOME, XDG_DATA_HOME. New option `createBashTool(cwd, { allowedEnvKeys })` extends the default list (it does not replace it). Customers can add `GITHUB_TOKEN` etc. on purpose, but cannot accidentally shrink the safe defaults to a narrower set. The extend-vs-replace decision is intentional: silently dropping something safe like PATH would break tooling. BREAKING: any code that relied on env-supplied credentials reaching bash via process.env (Anthropic, OpenAI, GitHub, Brave, custom secrets) silently stops working. This is the intended behavior — those keys should not be reachable from LLM-controlled commands — but it requires action. One-line mitigation: createBashTool(cwd, { allowedEnvKeys: ['GITHUB_TOKEN'] }) Description string also stops claiming a sandbox (same change as #4355; ensures this PR is self-contained if #4355 hasn't landed yet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…scape The pre-existing cwd guard in `read-file.ts`, `write.ts`, and `edit.ts` was a string prefix check on the resolved-but-not-realpathed path. A symlink inside the working directory pointing outside was followed transparently — CVE-2025-53109/53110 class bypass. The characterization tests added in #4354 documented this behavior; this PR flips them to asserting the bypass is now blocked, and adds positive tests for intra-workspace symlinks (pnpm `.pnpm` trees, etc.) which keep working. For non-existent write/edit targets, `realpath` is called on the deepest existing ancestor and containment is checked from there — the file's final name has no bearing on the parent's real path. Known gap, documented in `path-guard.ts`: hardlinks across the cwd boundary share the inode but report a path inside cwd, so the realpath check passes them. A proper fix requires a sandbox; see plans/sandboxing-investigation.md §3.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously `bash.ts` passed `env: { ...process.env }` wholesale to `child_process.exec`, so any LLM-driven `echo \$ANTHROPIC_API_KEY` (or other credential-shaped variable) would leak straight back into the agent loop. The PR 0 characterization tests in #4354 documented this leak; this PR replaces the wholesale passthrough with a filtered subset. Default allowlist keeps git, npm, pnpm, and most CLI tools functional: PATH, HOME, USER, SHELL, LANG, LC_ALL, LC_CTYPE, TERM, TMPDIR, TMP, TEMP, XDG_CONFIG_HOME, XDG_CACHE_HOME, XDG_DATA_HOME. New option `createBashTool(cwd, { allowedEnvKeys })` extends the default list (it does not replace it). Customers can add `GITHUB_TOKEN` etc. on purpose, but cannot accidentally shrink the safe defaults to a narrower set. The extend-vs-replace decision is intentional: silently dropping something safe like PATH would break tooling. BREAKING: any code that relied on env-supplied credentials reaching bash via process.env (Anthropic, OpenAI, GitHub, Brave, custom secrets) silently stops working. This is the intended behavior — those keys should not be reachable from LLM-controlled commands — but it requires action. One-line mitigation: createBashTool(cwd, { allowedEnvKeys: ['GITHUB_TOKEN'] }) Description string also stops claiming a sandbox (same change as #4355; ensures this PR is self-contained if #4355 hasn't landed yet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@ts-expect-error

Three integration adjustments after rebasing on origin/main: - Delete packages/agents-runtime/test/tool-path-symlink.test.ts. This was a characterization test from #4354 that documents pre-fix symlink-escape behavior with an explicit "update when realpath resolution lands" note. PR 6a's resolveSafePath helper is that fix; the file's expectations are now contradicted by sandbox-tool-symlink-safety.test.ts. - Trim packages/agents-runtime/test/bash-tool.test.ts: the two characterization tests from #4354 that documented the bash env-leak bug are removed. PR 6a fixed that bug; sandbox-tool-refactor.test.ts has the corresponding assertion ('does not forward arbitrary process.env to children'). The first test in the file (cwd + HOME exposure) stays. - Migrate packages/agents-runtime/test/fetch-url-ssrf.test.ts to the new createFetchUrlTool(sandbox, opts) signature. The assertions still hold for unrestrictedSandbox (NetPolicy SSRF protection is deferred); the test is now explicit about that scope. - Remove the @ts-expect-error directive on the dynamic e2b import in src/sandbox/remote/e2b.ts. With e2b now in the lockfile, TS resolves the package and the directive is unused. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

msfstef mentioned this pull request May 19, 2026

fix(agents-runtime): bash tool description no longer claims a sandbox #4355

Merged

3 tasks

msfstef marked this pull request as ready for review May 19, 2026 12:41

msfstef marked this pull request as draft May 19, 2026 12:45

msfstef requested review from icehaunter and kevin-dp May 19, 2026 12:49

icehaunter approved these changes May 19, 2026

View reviewed changes

msfstef marked this pull request as ready for review May 19, 2026 12:59

msfstef force-pushed the msfstef/desktop-golden-test-agents branch from ca40e43 to 083d429 Compare May 19, 2026 13:00

msfstef merged commit 20fc6c1 into main May 19, 2026
26 of 27 checks passed

msfstef deleted the msfstef/desktop-golden-test-agents branch May 19, 2026 13:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(agents): characterize bash/read/write/edit/fetch_url + Horton composition#4354

test(agents): characterize bash/read/write/edit/fetch_url + Horton composition#4354
msfstef merged 1 commit into
mainfrom
msfstef/desktop-golden-test-agents

msfstef commented May 19, 2026

Uh oh!

codecov Bot commented May 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

msfstef commented May 19, 2026

Summary

What's captured

Scoping notes

Test plan

Uh oh!

codecov Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented May 19, 2026 •

edited

Loading