
fix: harden dynamic tool handlers against deadlock, hangs, and runaway output #4

Merged
electronicBlacksmith merged 1 commit into `main` from `fix/dynamic-handler-hardening`
Apr 5, 2026

Conversation

@electronicBlacksmith
Owner

Summary

Fixes four latent liveness and stability bugs on the MCP dynamic-tool execution path. Each one silently hung or crashed phantom agent turns with no surfaced error. Commit: `17f7ed0`.

  1. Pipe-buffer deadlock - `executeShellHandler` and `executeScriptHandler` drained stdout then stderr sequentially. Any handler writing >64KB to stderr before closing stdout (`curl -v`, `git clone`, `npm install`, verbose loggers) blocked on its next stderr write while phantom waited for stdout EOF forever. Fix: `Promise.all` over both streams via a new `readStreamWithCap` helper.
  2. No subprocess timeout - `Bun.spawn` ran with no kill path. A hung handler froze the agent turn indefinitely. Fix: `drainProcessWithLimits` schedules SIGTERM at `HANDLER_TIMEOUT_MS` (60s default, overridable via `PHANTOM_DYNAMIC_HANDLER_TIMEOUT_MS`) and escalates to SIGKILL after a 2s grace period. Timeouts report partial stderr.
  3. No stdout/stderr size cap - `new Response(stream).text()` slurped unbounded output, risking OOM of the 2GB container. Fix: `readStreamWithCap` enforces a 1MB cap by default (`PHANTOM_DYNAMIC_HANDLER_MAX_OUTPUT_BYTES`), appends a clear truncation notice, and keeps draining to void so the child never blocks on a full pipe buffer.
  4. Registry fail-open on bad tool - `DynamicToolRegistry.registerAllOnServer` had no per-tool guard. One bad inputSchema threw during the loop and silently skipped every subsequent tool. Fix: per-tool try/catch with a warn log.
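
For readers who want the shape of fixes 1 and 3 without opening the diff, here is a minimal sketch of a capped reader drained concurrently over both pipes. It is not the code from `17f7ed0`: the return shape, the truncation notice wording, and the `drainBoth` wrapper are assumptions made for illustration.

```typescript
// Sketch only; the real readStreamWithCap in src/mcp/dynamic-handlers.ts may differ.
async function readStreamWithCap(
  stream: ReadableStream<Uint8Array>,
  maxBytes: number,
): Promise<{ text: string; truncated: boolean }> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let kept = 0;
  let truncated = false;

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    if (!value) continue;
    const room = maxBytes - kept;
    if (room > 0) {
      // Keep only as many bytes as still fit under the cap.
      const slice = value.byteLength > room ? value.subarray(0, room) : value;
      chunks.push(slice);
      kept += slice.byteLength;
    }
    // Past the cap we keep reading and discard, so the writer never
    // blocks on a full pipe buffer.
    if (value.byteLength > room) truncated = true;
  }

  const merged = new Uint8Array(kept);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset);
    offset += chunk.byteLength;
  }
  // A cut in the middle of a multi-byte character decodes to U+FFFD.
  const text = new TextDecoder().decode(merged);
  return {
    text: truncated ? `${text}\n[output truncated at ${maxBytes} bytes]` : text,
    truncated,
  };
}

// Hypothetical wrapper: drain both pipes at once instead of stdout-then-stderr.
async function drainBoth(
  stdout: ReadableStream<Uint8Array>,
  stderr: ReadableStream<Uint8Array>,
  maxBytes: number,
): Promise<{ stdout: string; stderr: string }> {
  const [out, err] = await Promise.all([
    readStreamWithCap(stdout, maxBytes),
    readStreamWithCap(stderr, maxBytes),
  ]);
  return { stdout: out.text, stderr: err.text };
}
```

Because both reads are in flight at the same time, a child that fills stderr before closing stdout still makes progress, which is the deadlock described in item 1.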

`buildSafeEnv` and the `--env-file=` pattern are unchanged - the subprocess env isolation boundary from SECURITY.md is preserved.

Files touched

  • `src/mcp/dynamic-handlers.ts` (+156 lines, mostly the new `readStreamWithCap` + `drainProcessWithLimits`)
  • `src/mcp/dynamic-tools.ts` (+7 lines, per-tool registration guard)
  • `src/mcp/tests/dynamic-handlers.test.ts` (+109 lines, includes a 200KB-stderr regression test that would hang under the old sequential-drain code)
  • `src/mcp/tests/dynamic-tools.test.ts` (+51 lines, bad-tool-does-not-break-registry test)
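
The per-tool registration guard in `dynamic-tools.ts` is small enough to sketch. The version below is illustrative only: `ToolDef`, `ToolServer`, and `registerAllGuarded` are hypothetical stand-ins, not the real `DynamicToolRegistry` or MCP server types.

```typescript
// Hypothetical shapes for illustration; the real registry and server types differ.
interface ToolDef {
  name: string;
  inputSchema: unknown;
}

interface ToolServer {
  registerTool(tool: ToolDef): void;
}

// Register every tool, logging and skipping any that throw.
// Returns the names that failed so a caller can surface them.
function registerAllGuarded(
  tools: ToolDef[],
  server: ToolServer,
  warn: (msg: string) => void = console.warn,
): string[] {
  const failed: string[] = [];
  for (const tool of tools) {
    try {
      server.registerTool(tool);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      warn(`[dynamic-tools] failed to register "${tool.name}": ${reason}`);
      failed.push(tool.name);
      // Deliberately no auto-unregister: one bad tool must not hide the rest.
    }
  }
  return failed;
}
```

A test in the spirit of the bad-tool-does-not-break-registry case would register three tools with the middle one throwing, then assert that the first and third were still registered.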

Test plan

  • `bun test src/mcp` passes on branch
  • Dry-run merge against current main is clean
  • Post-merge: run `bun test src/mcp` on merged main to confirm still green

…y output

Four latent liveness/stability bugs on the MCP dynamic-tool execution path
would silently hang agent turns or crash the container. None surfaced visible
errors, which made them the worst kind of bug: the agent just stopped.

1. Pipe-buffer deadlock: executeShellHandler and executeScriptHandler drained
   stdout then stderr sequentially. Any handler writing >64KB to stderr before
   closing stdout (curl -v, git clone, npm install, verbose loggers) blocked
   on its next stderr write while phantom waited for stdout EOF forever.
   Fix: Promise.all over both streams via a new readStreamWithCap helper.

2. No subprocess timeout: Bun.spawn ran with no kill path. A hung handler
   froze the agent turn indefinitely with no recovery. Fix: drainProcessWithLimits
   schedules SIGTERM at HANDLER_TIMEOUT_MS (default 60s, env-overridable via
   PHANTOM_DYNAMIC_HANDLER_TIMEOUT_MS) and escalates to SIGKILL after a 2s
   grace. Timeouts report partial stderr so the agent has actionable signal.

3. No stdout/stderr size cap: new Response(stream).text() slurped unbounded
   output, risking OOM of the 2GB container. Fix: readStreamWithCap enforces
   a 1MB cap by default (PHANTOM_DYNAMIC_HANDLER_MAX_OUTPUT_BYTES), appends a
   clear truncation notice, and continues draining-to-void so the child never
   blocks on a full pipe buffer.

4. DynamicToolRegistry.registerAllOnServer had no per-tool guard. One tool
   with a bad inputSchema would throw during the loop and silently skip every
   subsequent tool on every agent query (MCP factory pattern recreates servers
   per query). Fix: per-tool try/catch, warn with tool name, continue. Broken
   tools are not auto-unregistered; the operator decides.
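
The SIGTERM-then-SIGKILL escalation from item 2 can be sketched as below. This is not the body of drainProcessWithLimits; `Killable` and `waitWithTimeout` are hypothetical names, and the interface only assumes a `kill()` method and an `exited` promise similar to what a Bun subprocess exposes.

```typescript
// Minimal process surface assumed for this sketch.
interface Killable {
  kill(signal: "SIGTERM" | "SIGKILL"): void;
  exited: Promise<number>;
}

// Wait for exit; send SIGTERM at timeoutMs, then SIGKILL after graceMs more.
async function waitWithTimeout(
  proc: Killable,
  timeoutMs: number,
  graceMs = 2_000,
): Promise<{ exitCode: number; timedOut: boolean }> {
  let timedOut = false;
  let killTimer: ReturnType<typeof setTimeout> | undefined;

  const termTimer = setTimeout(() => {
    timedOut = true;
    proc.kill("SIGTERM");
    // Escalate only if the process ignores SIGTERM for the whole grace period.
    killTimer = setTimeout(() => proc.kill("SIGKILL"), graceMs);
  }, timeoutMs);

  try {
    const exitCode = await proc.exited;
    return { exitCode, timedOut };
  } finally {
    // Always clear timers so a finished process is never signalled later.
    clearTimeout(termTimer);
    if (killTimer !== undefined) clearTimeout(killTimer);
  }
}
```

In a full handler the stream drains would run alongside this wait, so that whatever stderr was collected before the kill can be reported with the timeout error.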

buildSafeEnv and the --env-file= pattern in executeScriptHandler are
unchanged, preserving the subprocess environment isolation boundary from
SECURITY.md. Tests spawn real subprocesses and include a 200KB-stderr
regression test that would hang under the old sequential-drain code.

Env-var cleanup in the new tests uses Reflect.deleteProperty(process.env, ...)
rather than `delete` (Biome noDelete) or `= undefined` (coerces to the string
"undefined" on process.env and does not actually unset the key). This matches
the pattern acknowledged as correct by the maintainer in #5.
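
As an illustration of that cleanup pattern, a test helper along these lines sets a variable for the duration of a callback and then restores the previous state. `withEnv` is a hypothetical name and is not claimed to exist in the test files.

```typescript
// Hypothetical test helper: run `fn` with an env var set, then restore prior state.
async function withEnv<T>(
  key: string,
  value: string,
  fn: () => T | Promise<T>,
): Promise<T> {
  const previous = process.env[key];
  process.env[key] = value;
  try {
    return await fn();
  } finally {
    if (previous !== undefined) {
      process.env[key] = previous;
    } else {
      // Removes the key outright; assigning undefined would not unset it.
      Reflect.deleteProperty(process.env, key);
    }
  }
}
```

A timeout test could wrap its body in `withEnv("PHANTOM_DYNAMIC_HANDLER_TIMEOUT_MS", "100", ...)` so the override never leaks into later tests.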