Skip to content

Fix login shell probe process leaks#835

Closed
ratulsarna wants to merge 7 commits intomainfrom
codex/pr-822-audit
Closed

Fix login shell probe process leaks#835
ratulsarna wants to merge 7 commits intomainfrom
codex/pr-822-audit

Conversation

@ratulsarna
Copy link
Copy Markdown
Collaborator

Supersedes #822.

Thanks to @LPFchan for the original report and implementation direction. This keeps the same goal: prevent interactive login-shell PATH/CLI probes from leaking zsh/fzf-style helper processes.

Summary:

  • Run shell probes in their own process group before exec.
  • Drain stdout/stderr concurrently and wait for pipe EOF.
  • Escalate cleanup for timeout and post-exit helper leaks.
  • Add regression coverage for TERM-resistant shell-init helpers.

Verification:

  • swift test --filter ShellCommandLocatorCleanupTests
  • swift test --filter PathBuilderTests
  • pnpm check
  • ./Scripts/compile_and_run.sh

LPFchan and others added 7 commits May 3, 2026 15:59
Replace the fixed 80ms post-exit sleep with a DispatchGroup that the
stdout/stderr readability handlers leave on EOF (empty data). This
prevents truncated captures when the child exits before all buffered
pipe bytes have been delivered to the async handlers, which could cause
intermittent CLI-detection failures with verbose shell init output.

Wait is bounded (1s safety net); on fall-through or the timeout-kill
path we proactively cancel the handlers and force-leave the group via
an idempotent OnceFlag so leave() is never double-called.

Addresses Codex review P1 on #822.
The previous code called setpgid(pid, pid) from the parent after
process.run().  That call races with the child's exec — once exec
has happened, setpgid typically fails with EACCES and processGroup
silently becomes nil, defeating both the timeout-kill and the
post-exit kill(-pgid, …) cleanup, so background helpers spawned by
shell init kept running.

Replace Foundation.Process with posix_spawn and set the process
group via posix_spawnattr_setpgroup(&attr, 0) under
POSIX_SPAWN_SETPGROUP.  This makes the child its own pgid leader
*before* exec, so kill(-pgid, …) reliably reaches the entire group.

Verified via a standalone probe (compiled separately, not committed)
exercising:
- normal exit + high-volume init noise still captures full stdout
  (confirms P1 EOF-drain still works after the rewrite)
- backgrounded helper spawned by shell init is killed via pgid
  cleanup after the shell exits normally
- 1.0s timeout with a hung shell init returns nil within ~1.4s and
  kills both the shell and its backgrounded helper
- a child of `posix_spawn` reports `pid == pgid`, confirming
  POSIX_SPAWN_SETPGROUP took effect before exec

Addresses Codex review P2 on #822.
`posix_spawn_file_actions_t` and `posix_spawn_attr_t` are an opaque
pointer typedef on Darwin (Swift imports them as OpaquePointer?) and a
struct on Glibc.  The previous `posix_spawn_file_actions_t(nil as
OpaquePointer?)` form only compiles on Darwin and breaks the
CodexBarLinuxTests build on Linux.

Use `#if canImport(Darwin)` to pick the optional-nil form on Darwin
and the zero-struct form on Glibc.  Verified Darwin still builds and
the standalone probe (P1 EOF drain, P2 pgid cleanup, timeout
escalation, pid==pgid check) still passes.

Addresses Codex review P1 on commit 926181d (#822).
@ratulsarna ratulsarna marked this pull request as ready for review May 4, 2026 08:18
@ratulsarna ratulsarna closed this May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants