Fix login shell probe process leaks #835
Closed
ratulsarna wants to merge 7 commits into main
Replace the fixed 80ms post-exit sleep with a DispatchGroup that the stdout/stderr readability handlers leave on EOF (empty data). This prevents truncated captures when the child exits before all buffered pipe bytes have been delivered to the async handlers, which could cause intermittent CLI-detection failures with verbose shell init output. Wait is bounded (1s safety net); on fall-through or the timeout-kill path we proactively cancel the handlers and force-leave the group via an idempotent OnceFlag so leave() is never double-called. Addresses Codex review P1 on #822.
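The EOF-drain pattern described above can be sketched roughly as follows. This is a minimal illustration for a single pipe, not the code from the PR: `OnceFlag` here is a hypothetical stand-in for the flag the commit message names, and `Collector` and `drain(_:timeout:)` are invented for the example.

```swift
import Foundation

/// Hypothetical stand-in for the OnceFlag named above: runs `body` at most once.
final class OnceFlag: @unchecked Sendable {
    private let lock = NSLock()
    private var fired = false

    /// Returns true only for the first caller; later calls are no-ops.
    @discardableResult
    func run(_ body: () -> Void) -> Bool {
        lock.lock()
        let first = !fired
        fired = true
        lock.unlock()
        if first { body() }
        return first
    }
}

/// Thread-safe byte accumulator shared with the readability handler (assumed helper).
final class Collector: @unchecked Sendable {
    private let lock = NSLock()
    private var data = Data()
    func append(_ chunk: Data) { lock.lock(); data.append(chunk); lock.unlock() }
    func snapshot() -> Data { lock.lock(); defer { lock.unlock() }; return data }
}

/// Reads from `pipe` until EOF, or until `timeout` seconds elapse, whichever comes first.
func drain(_ pipe: Pipe, timeout: TimeInterval) -> Data {
    let group = DispatchGroup()
    let once = OnceFlag()
    let collected = Collector()
    let handle = pipe.fileHandleForReading

    group.enter()
    handle.readabilityHandler = { h in
        let chunk = h.availableData
        if chunk.isEmpty {
            // Empty data signals EOF: stop observing and leave the group once.
            h.readabilityHandler = nil
            once.run { group.leave() }
        } else {
            collected.append(chunk)
        }
    }

    if group.wait(timeout: .now() + timeout) == .timedOut {
        // Safety net: cancel the handler and force-leave without double-calling leave().
        handle.readabilityHandler = nil
        once.run { group.leave() }
    }
    return collected.snapshot()
}
```

A real probe would enter the group once per pipe (stdout and stderr) and share one wait; the single-pipe form just keeps the sketch short.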
The previous code called setpgid(pid, pid) from the parent after process.run(). That call races with the child's exec: once exec has happened, setpgid typically fails with EACCES and processGroup silently becomes nil, defeating both the timeout-kill and the post-exit kill(-pgid, …) cleanup, so background helpers spawned by shell init kept running.

Replace Foundation.Process with posix_spawn and set the process group via posix_spawnattr_setpgroup(&attr, 0) under POSIX_SPAWN_SETPGROUP. This makes the child its own pgid leader *before* exec, so kill(-pgid, …) reliably reaches the entire group.

Verified via a standalone probe (compiled separately, not committed) exercising:
- normal exit + high-volume init noise still captures full stdout (confirms P1 EOF-drain still works after the rewrite)
- backgrounded helper spawned by shell init is killed via pgid cleanup after the shell exits normally
- 1.0s timeout with a hung shell init returns nil within ~1.4s and kills both the shell and its backgrounded helper
- a child of `posix_spawn` reports `pid == pgid`, confirming POSIX_SPAWN_SETPGROUP took effect before exec

Addresses Codex review P2 on #822.
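As a rough illustration of the process-group setup described in this commit, the sketch below spawns a child as its own group leader. The function name `spawnInOwnGroup` and its shape are assumptions made for the example, not the PR's implementation; it omits file actions and output capture entirely.

```swift
import Foundation

/// Hypothetical helper: spawns `path` with `args` as the leader of a new process group.
/// Returns the child's pid, or nil if any posix_spawn step failed.
func spawnInOwnGroup(path: String, args: [String]) -> pid_t? {
    #if canImport(Darwin)
    var attr: posix_spawnattr_t? = nil      // pointer typedef on Darwin
    #else
    var attr = posix_spawnattr_t()          // plain struct on Glibc
    #endif
    guard posix_spawnattr_init(&attr) == 0 else { return nil }
    defer { posix_spawnattr_destroy(&attr) }

    // pgroup 0 means "use the child's own pid as its pgid"; the flag enables it.
    guard posix_spawnattr_setpgroup(&attr, 0) == 0,
          posix_spawnattr_setflags(&attr, Int16(POSIX_SPAWN_SETPGROUP)) == 0
    else { return nil }

    // argv and envp must be NULL-terminated arrays of C strings.
    var argv: [UnsafeMutablePointer<CChar>?] = ([path] + args).map { strdup($0) }
    argv.append(nil)
    var envp: [UnsafeMutablePointer<CChar>?] =
        ProcessInfo.processInfo.environment.map { strdup("\($0.key)=\($0.value)") }
    envp.append(nil)
    defer {
        argv.forEach { free($0) }
        envp.forEach { free($0) }
    }

    var pid: pid_t = 0
    let rc = posix_spawn(&pid, path, nil, &attr, argv, envp)
    return rc == 0 ? pid : nil
}
```

Because the group is set up as part of the spawn, the parent can signal the whole group with `kill(-pid, SIGKILL)` right away, without a follow-up `setpgid` call that might lose the race with exec.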
`posix_spawn_file_actions_t` and `posix_spawnattr_t` are opaque pointer typedefs on Darwin (Swift imports them as OpaquePointer?) and structs on Glibc. The previous `posix_spawn_file_actions_t(nil as OpaquePointer?)` form only compiles on Darwin and breaks the CodexBarLinuxTests build on Linux. Use `#if canImport(Darwin)` to pick the optional-nil form on Darwin and the zero-struct form on Glibc. Verified Darwin still builds and the standalone probe (P1 EOF drain, P2 pgid cleanup, timeout escalation, pid==pgid check) still passes. Addresses Codex review P1 on commit 926181d (#822).
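One way to keep the platform split in a single place is a small typealias plus a scoped helper, sketched below. `FileActions`, `emptyFileActions()`, and `withFileActions(_:)` are hypothetical names introduced for this example; the PR may structure the conditional differently.

```swift
import Foundation

#if canImport(Darwin)
// Darwin: a pointer typedef, so the "empty" value is an optional nil.
typealias FileActions = posix_spawn_file_actions_t?
func emptyFileActions() -> FileActions { nil }
#else
// Glibc: a plain struct, so the "empty" value is a zero-initialized struct.
typealias FileActions = posix_spawn_file_actions_t
func emptyFileActions() -> FileActions { posix_spawn_file_actions_t() }
#endif

/// Initializes file actions, hands them to `body`, and always destroys them afterwards.
/// Returns nil if initialization fails.
func withFileActions<T>(_ body: (inout FileActions) -> T) -> T? {
    var actions = emptyFileActions()
    guard posix_spawn_file_actions_init(&actions) == 0 else { return nil }
    defer { posix_spawn_file_actions_destroy(&actions) }
    return body(&actions)
}
```

Callers then write platform-neutral code, for example adding a close action for stdin inside the closure and passing `&actions` on to `posix_spawn`.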
Supersedes #822.
Thanks to @LPFchan for the original report and implementation direction. This keeps the same goal: prevent interactive login-shell PATH/CLI probes from leaking zsh/fzf-style helper processes.
Summary:
Verification: