Fix leaked login-shell probes during CLI path detection#822
Fix leaked login-shell probes during CLI path detection#822LPFchan wants to merge 4 commits intosteipete:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 11d1b9d94d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| usleep(80000) | ||
| stdoutHandle.readabilityHandler = nil | ||
| stderrHandle.readabilityHandler = nil | ||
| return stdoutCollector.drain() |
There was a problem hiding this comment.
Wait for EOF before returning captured stdout
This returns after a fixed usleep(80000) and then disables readabilityHandler, which can truncate stdout when the child exits before all buffered pipe data has been delivered to the async handlers. In that case path probes (command -v, alias/type output, PATH marker extraction) can intermittently parse incomplete output and fail to detect CLIs, especially with verbose shell init output. The capture path should synchronize on EOF/drain completion instead of a fixed sleep.
Useful? React with 👍 / 👎.
| let pid = process.processIdentifier | ||
| let processGroup: pid_t? = setpgid(pid, pid) == 0 ? pid : nil | ||
|
|
There was a problem hiding this comment.
Create process group before exec to make cleanup effective
Calling setpgid(pid, pid) after process.run() races with the child exec, and once exec has happened this call typically fails (EACCES), leaving processGroup nil. That means both timeout cleanup and normal post-exit group termination silently skip the process-group kill path, so background helpers spawned by shell init are still left running. If process-group teardown is required, the group must be established at launch time (or via a launch API that sets pgid in the child before exec).
Useful? React with 👍 / 👎.
Replace the fixed 80ms post-exit sleep with a DispatchGroup that the stdout/stderr readability handlers leave on EOF (empty data). This prevents truncated captures when the child exits before all buffered pipe bytes have been delivered to the async handlers, which could cause intermittent CLI-detection failures with verbose shell init output. Wait is bounded (1s safety net); on fall-through or the timeout-kill path we proactively cancel the handlers and force-leave the group via an idempotent OnceFlag so leave() is never double-called. Addresses Codex review P1 on steipete#822.
The previous code called setpgid(pid, pid) from the parent after process.run(). That call races with the child's exec — once exec has happened, setpgid typically fails with EACCES and processGroup silently becomes nil, defeating both the timeout-kill and the post-exit kill(-pgid, …) cleanup, so background helpers spawned by shell init kept running. Replace Foundation.Process with posix_spawn and set the process group via posix_spawnattr_setpgroup(&attr, 0) under POSIX_SPAWN_SETPGROUP. This makes the child its own pgid leader *before* exec, so kill(-pgid, …) reliably reaches the entire group. Verified via a standalone probe (compiled separately, not committed) exercising: - normal exit + high-volume init noise still captures full stdout (confirms P1 EOF-drain still works after the rewrite) - backgrounded helper spawned by shell init is killed via pgid cleanup after the shell exits normally - 1.0s timeout with a hung shell init returns nil within ~1.4s and kills both the shell and its backgrounded helper - a child of `posix_spawn` reports `pid == pgid`, confirming POSIX_SPAWN_SETPGROUP took effect before exec Addresses Codex review P2 on steipete#822.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 926181d94d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Thread.sleep(forTimeInterval: 0.05) | ||
| // Build file actions: redirect stdin from /dev/null, dup pipe write ends to | ||
| // fds 1 and 2, and close every pipe fd in the child. | ||
| var fileActions = posix_spawn_file_actions_t(nil as OpaquePointer?) |
There was a problem hiding this comment.
Initialize posix_spawn structs portably
posix_spawn_file_actions_t(nil as OpaquePointer?) is not portable to Linux Swift toolchains (it errors with “argument passed to call that takes no arguments”), and the same pattern is repeated for posix_spawnattr_t a few lines later. Since CodexBarCore is compiled for the CodexBarLinuxTests target, this change can break Linux builds/CI immediately. Use the platform-compatible zero initializer form (and similarly for posix_spawnattr_t) so the module compiles across supported targets.
Useful? React with 👍 / 👎.
`posix_spawn_file_actions_t` and `posix_spawn_attr_t` are an opaque pointer typedef on Darwin (Swift imports them as OpaquePointer?) and a struct on Glibc. The previous `posix_spawn_file_actions_t(nil as OpaquePointer?)` form only compiles on Darwin and breaks the CodexBarLinuxTests build on Linux. Use `#if canImport(Darwin)` to pick the optional-nil form on Darwin and the zero-struct form on Glibc. Verified Darwin still builds and the standalone probe (P1 EOF drain, P2 pgid cleanup, timeout escalation, pid==pgid check) still passes. Addresses Codex review P1 on commit 926181d (steipete#822).
Replace the fixed 80ms post-exit sleep with a DispatchGroup that the stdout/stderr readability handlers leave on EOF (empty data). This prevents truncated captures when the child exits before all buffered pipe bytes have been delivered to the async handlers, which could cause intermittent CLI-detection failures with verbose shell init output. Wait is bounded (1s safety net); on fall-through or the timeout-kill path we proactively cancel the handlers and force-leave the group via an idempotent OnceFlag so leave() is never double-called. Addresses Codex review P1 on #822.
The previous code called setpgid(pid, pid) from the parent after process.run(). That call races with the child's exec — once exec has happened, setpgid typically fails with EACCES and processGroup silently becomes nil, defeating both the timeout-kill and the post-exit kill(-pgid, …) cleanup, so background helpers spawned by shell init kept running. Replace Foundation.Process with posix_spawn and set the process group via posix_spawnattr_setpgroup(&attr, 0) under POSIX_SPAWN_SETPGROUP. This makes the child its own pgid leader *before* exec, so kill(-pgid, …) reliably reaches the entire group. Verified via a standalone probe (compiled separately, not committed) exercising: - normal exit + high-volume init noise still captures full stdout (confirms P1 EOF-drain still works after the rewrite) - backgrounded helper spawned by shell init is killed via pgid cleanup after the shell exits normally - 1.0s timeout with a hung shell init returns nil within ~1.4s and kills both the shell and its backgrounded helper - a child of `posix_spawn` reports `pid == pgid`, confirming POSIX_SPAWN_SETPGROUP took effect before exec Addresses Codex review P2 on #822.
`posix_spawn_file_actions_t` and `posix_spawn_attr_t` are an opaque pointer typedef on Darwin (Swift imports them as OpaquePointer?) and a struct on Glibc. The previous `posix_spawn_file_actions_t(nil as OpaquePointer?)` form only compiles on Darwin and breaks the CodexBarLinuxTests build on Linux. Use `#if canImport(Darwin)` to pick the optional-nil form on Darwin and the zero-struct form on Glibc. Verified Darwin still builds and the standalone probe (P1 EOF drain, P2 pgid cleanup, timeout escalation, pid==pgid check) still passes. Addresses Codex review P1 on commit 926181d (#822).
|
Thanks @LPFchan, landed this on main via cherry-picks b3b79e0, dcea9ce, 138da85, and 0eef928, with follow-up hardening/tests in 3b13ee5. I kept the posix_spawn process-group approach, added regression coverage for noisy shell output and background-child cleanup, and tightened normal completion to escalate to SIGKILL if descendants keep pipes open.\n\nVerified: pnpm check, swiftlint --strict, swift test --filter PathBuilderTests, full swift test (1932 tests), and ./Scripts/compile_and_run.sh rebuilt/relaunched CodexBar.app successfully. |
Summary
Fixes process leaks from interactive login-shell probes used by CLI path detection.
PathEnvironmentruns/bin/zsh -l -i -c ...to discover PATH mutations and CLI locations. If shell init scripts emit enough output or spawn helper/background processes, the previous implementation could leavezsh/child processes alive across repeated provider refreshes.Fixes #821.
Changes
setpgid.SIGTERM, then escalate toSIGKILL.Verification
git diff --checkswiftc -typecheck -swift-version 6 -enable-upcoming-feature StrictConcurrency Sources/CodexBarCore/PathEnvironment.swiftNote: full
swift testwas not run locally because my machine has Swift 6.1, while the package requires Swift tools 6.2.