Fix leaked login-shell probes during CLI path detection #822

Closed
LPFchan wants to merge 4 commits into steipete:main from LPFchan:codex/fix-login-shell-probe-leaks

Conversation

@LPFchan
Contributor

@LPFchan LPFchan commented Apr 30, 2026

Summary

Fixes process leaks from interactive login-shell probes used by CLI path detection.

PathEnvironment runs /bin/zsh -l -i -c ... to discover PATH mutations and CLI locations. If shell init scripts emit enough output or spawn helper/background processes, the previous implementation could leave zsh/child processes alive across repeated provider refreshes.

Fixes #821.

Changes

  • Drain both stdout and stderr concurrently while the shell probe is running.
  • Isolate each shell probe in its own process group with setpgid.
  • Use a termination-handler/semaphore wait instead of polling.
  • On timeout, terminate the process group with SIGTERM, then escalate to SIGKILL.
  • Clean up the process group after normal shell completion to stop background helpers from shell init.
  • Share the shell-runner implementation between command lookup and login-shell PATH capture.
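
The SIGTERM-then-SIGKILL escalation listed above could look roughly like the following. This is a minimal sketch, not the PR's code: `terminateProcessGroup`, the 0.4 s grace period, and the 20 ms re-check interval are all hypothetical, and it assumes the shell already leads its own process group so that a negative pid reaches its helpers too.

```swift
import Foundation

/// Hypothetical helper: ask every process in group `pgid` to exit, then force it.
/// Returns true if the group is gone (or never existed) or SIGKILL was delivered.
@discardableResult
func terminateProcessGroup(_ pgid: pid_t, grace: TimeInterval = 0.4) -> Bool {
    // Refuse pgid 0 (our own group) and 1 so a bad value cannot signal the caller.
    guard pgid > 1 else { return false }

    // kill() with a negative pid targets every process in group |pid|.
    if kill(-pgid, SIGTERM) != 0 { return errno == ESRCH }

    let deadline = Date().addingTimeInterval(grace)
    while Date() < deadline {
        // Signal 0 only performs the existence/permission check.
        if kill(-pgid, 0) != 0 && errno == ESRCH { return true }
        usleep(20_000)
    }

    // Grace period expired: SIGKILL cannot be caught or ignored.
    return kill(-pgid, SIGKILL) == 0 || errno == ESRCH
}
```

The short polling loop here only covers the grace period between the two signals; waiting for the shell itself is a separate concern. Reaping the direct child first is advisable, since on some systems an unreaped child may still be reported as present.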

Verification

  • git diff --check
  • swiftc -typecheck -swift-version 6 -enable-upcoming-feature StrictConcurrency Sources/CodexBarCore/PathEnvironment.swift
  • Targeted probes for high-volume stdout/stderr, timeout escalation, and background children inheriting stdout.

Note: full swift test was not run locally because my machine has Swift 6.1, while the package requires Swift tools 6.2.

@LPFchan LPFchan marked this pull request as ready for review May 1, 2026 01:43

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 11d1b9d94d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +362 to +365
usleep(80000)
stdoutHandle.readabilityHandler = nil
stderrHandle.readabilityHandler = nil
return stdoutCollector.drain()

[P1] Wait for EOF before returning captured stdout

This returns after a fixed usleep(80000) and then disables readabilityHandler, which can truncate stdout when the child exits before all buffered pipe data has been delivered to the async handlers. In that case path probes (command -v, alias/type output, PATH marker extraction) can intermittently parse incomplete output and fail to detect CLIs, especially with verbose shell init output. The capture path should synchronize on EOF/drain completion instead of a fixed sleep.

Comment on lines +310 to +312
let pid = process.processIdentifier
let processGroup: pid_t? = setpgid(pid, pid) == 0 ? pid : nil

[P2] Create process group before exec to make cleanup effective

Calling setpgid(pid, pid) after process.run() races with the child exec, and once exec has happened this call typically fails (EACCES), leaving processGroup nil. That means both timeout cleanup and normal post-exit group termination silently skip the process-group kill path, so background helpers spawned by shell init are still left running. If process-group teardown is required, the group must be established at launch time (or via a launch API that sets pgid in the child before exec).

LPFchan added 2 commits May 2, 2026 16:40
Replace the fixed 80ms post-exit sleep with a DispatchGroup that the
stdout/stderr readability handlers leave on EOF (empty data). This
prevents truncated captures when the child exits before all buffered
pipe bytes have been delivered to the async handlers, which could cause
intermittent CLI-detection failures with verbose shell init output.

Wait is bounded (1s safety net); on fall-through or the timeout-kill
path we proactively cancel the handlers and force-leave the group via
an idempotent OnceFlag so leave() is never double-called.

Addresses Codex review P1 on steipete#822.
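
The EOF-synchronized drain described in this commit message can be sketched as follows. It is an illustration under stated assumptions rather than the committed code: `OnceFlag` here is a hypothetical stand-in for the type the message mentions, and `Collector` and `readToEOF` are invented names. Only one pipe is shown.

```swift
import Foundation

/// Hypothetical stand-in for the OnceFlag above: `claim()` is true for one caller only.
final class OnceFlag: @unchecked Sendable {
    private let lock = NSLock()
    private var fired = false
    func claim() -> Bool {
        lock.lock(); defer { lock.unlock() }
        if fired { return false }
        fired = true
        return true
    }
}

/// Thread-safe byte accumulator fed by the readability handler.
final class Collector: @unchecked Sendable {
    private let lock = NSLock()
    private var buffer = Data()
    func append(_ chunk: Data) { lock.lock(); buffer.append(chunk); lock.unlock() }
    func drain() -> Data { lock.lock(); defer { lock.unlock() }; return buffer }
}

/// Reads `handle` until EOF, or until `timeout` elapses as a safety net.
func readToEOF(_ handle: FileHandle, timeout: TimeInterval = 1.0) -> Data {
    let collector = Collector()
    let group = DispatchGroup()
    let left = OnceFlag()

    group.enter()
    handle.readabilityHandler = { h in
        let chunk = h.availableData
        if chunk.isEmpty {
            // Empty data signals EOF: stop observing and release the waiter.
            h.readabilityHandler = nil
            if left.claim() { group.leave() }
        } else {
            collector.append(chunk)
        }
    }

    if group.wait(timeout: .now() + timeout) == .timedOut {
        // Fall-through path: cancel the handler and balance enter() exactly once.
        handle.readabilityHandler = nil
        if left.claim() { group.leave() }
    }
    return collector.drain()
}
```

To wait on stdout and stderr together, the same group would be entered once per pipe, with one flag per handler, so that a single `wait` covers both streams.
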
The previous code called setpgid(pid, pid) from the parent after
process.run().  That call races with the child's exec — once exec
has happened, setpgid typically fails with EACCES and processGroup
silently becomes nil, defeating both the timeout-kill and the
post-exit kill(-pgid, …) cleanup, so background helpers spawned by
shell init kept running.

Replace Foundation.Process with posix_spawn and set the process
group via posix_spawnattr_setpgroup(&attr, 0) under
POSIX_SPAWN_SETPGROUP.  This makes the child its own pgid leader
*before* exec, so kill(-pgid, …) reliably reaches the entire group.

Verified via a standalone probe (compiled separately, not committed)
exercising:
- normal exit + high-volume init noise still captures full stdout
  (confirms P1 EOF-drain still works after the rewrite)
- backgrounded helper spawned by shell init is killed via pgid
  cleanup after the shell exits normally
- 1.0s timeout with a hung shell init returns nil within ~1.4s and
  kills both the shell and its backgrounded helper
- a child of `posix_spawn` reports `pid == pgid`, confirming
  POSIX_SPAWN_SETPGROUP took effect before exec

Addresses Codex review P2 on steipete#822.
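
A minimal sketch of the launch-time process-group setup this commit message describes, assuming nothing about the real `PathEnvironment` code: `spawnInOwnProcessGroup` is a hypothetical helper, and pipe redirection and output capture are left out to keep the focus on `POSIX_SPAWN_SETPGROUP`.

```swift
import Foundation

/// Hypothetical helper: spawns `path` as the leader of a brand-new process
/// group and returns its pid (equal to its pgid), or nil if any step fails.
func spawnInOwnProcessGroup(_ path: String, _ arguments: [String]) -> pid_t? {
    // Darwin imports the attribute type as an optional pointer, Glibc as a struct.
    #if canImport(Darwin)
    var attr: posix_spawnattr_t? = nil
    #else
    var attr = posix_spawnattr_t()
    #endif
    guard posix_spawnattr_init(&attr) == 0 else { return nil }
    defer { posix_spawnattr_destroy(&attr) }

    // A pgroup of 0 means "use the child's own pid as its process-group id".
    guard posix_spawnattr_setpgroup(&attr, 0) == 0,
          posix_spawnattr_setflags(&attr, Int16(POSIX_SPAWN_SETPGROUP)) == 0
    else { return nil }

    // argv and envp must be NULL-terminated arrays of C strings.
    let environment = ProcessInfo.processInfo.environment.map { "\($0.key)=\($0.value)" }
    var argv: [UnsafeMutablePointer<CChar>?] = ([path] + arguments).map { strdup($0) }
    var envp: [UnsafeMutablePointer<CChar>?] = environment.map { strdup($0) }
    argv.append(nil)
    envp.append(nil)
    defer { (argv + envp).forEach { free($0) } }

    var pid: pid_t = 0
    return posix_spawn(&pid, path, nil, &attr, argv, envp) == 0 ? pid : nil
}
```

Because the group is established before exec, the caller can later signal `-pid` to reach the shell and anything it backgrounded, without the race that a parent-side `setpgid` has.
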
@ratulsarna
Collaborator

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 926181d94d

Thread.sleep(forTimeInterval: 0.05)
// Build file actions: redirect stdin from /dev/null, dup pipe write ends to
// fds 1 and 2, and close every pipe fd in the child.
var fileActions = posix_spawn_file_actions_t(nil as OpaquePointer?)

[P1] Initialize posix_spawn structs portably

posix_spawn_file_actions_t(nil as OpaquePointer?) is not portable to Linux Swift toolchains (it errors with “argument passed to call that takes no arguments”), and the same pattern is repeated for posix_spawnattr_t a few lines later. Since CodexBarCore is compiled for the CodexBarLinuxTests target, this change can break Linux builds/CI immediately. Use the platform-compatible zero initializer form (and similarly for posix_spawnattr_t) so the module compiles across supported targets.

`posix_spawn_file_actions_t` and `posix_spawnattr_t` are opaque
pointer typedefs on Darwin (Swift imports them as OpaquePointer?) and
structs on Glibc.  The previous `posix_spawn_file_actions_t(nil as
OpaquePointer?)` form only compiles on Darwin and breaks the
CodexBarLinuxTests build on Linux.

Use `#if canImport(Darwin)` to pick the optional-nil form on Darwin
and the zero-struct form on Glibc.  Verified Darwin still builds and
the standalone probe (P1 EOF drain, P2 pgid cleanup, timeout
escalation, pid==pgid check) still passes.

Addresses Codex review P1 on commit 926181d (steipete#822).
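
The platform split described in this commit message might be written along these lines. This is a sketch under assumptions: `SpawnFileActions`, `emptyFileActions`, and `captureStdout` are hypothetical names, only stdout is redirected, and the child gets an empty environment for brevity.

```swift
import Foundation

// Darwin imports the file-actions type as an optional pointer; Glibc imports a struct.
#if canImport(Darwin)
typealias SpawnFileActions = posix_spawn_file_actions_t?
func emptyFileActions() -> SpawnFileActions { nil }
#else
typealias SpawnFileActions = posix_spawn_file_actions_t
func emptyFileActions() -> SpawnFileActions { posix_spawn_file_actions_t() }
#endif

/// Runs `/bin/sh -c script` with stdin from /dev/null and stdout sent to a
/// pipe. Returns what the script printed, or nil if setup or spawn failed.
func captureStdout(ofShellScript script: String) -> String? {
    var fds: [Int32] = [-1, -1]
    guard pipe(&fds) == 0 else { return nil }

    var actions = emptyFileActions()
    guard posix_spawn_file_actions_init(&actions) == 0 else {
        close(fds[0]); close(fds[1])
        return nil
    }
    defer { posix_spawn_file_actions_destroy(&actions) }
    // In the child: stdin <- /dev/null, stdout <- pipe write end, then close both pipe fds.
    posix_spawn_file_actions_addopen(&actions, 0, "/dev/null", O_RDONLY, 0)
    posix_spawn_file_actions_adddup2(&actions, fds[1], 1)
    posix_spawn_file_actions_addclose(&actions, fds[0])
    posix_spawn_file_actions_addclose(&actions, fds[1])

    var argv: [UnsafeMutablePointer<CChar>?] = ["/bin/sh", "-c", script].map { strdup($0) }
    argv.append(nil)
    defer { argv.forEach { free($0) } }
    let envp: [UnsafeMutablePointer<CChar>?] = [nil]  // empty environment, for brevity

    var pid: pid_t = 0
    let rc = posix_spawn(&pid, "/bin/sh", &actions, nil, argv, envp)
    close(fds[1])  // the parent must drop its write end, or it never sees EOF
    guard rc == 0 else { close(fds[0]); return nil }

    let reader = FileHandle(fileDescriptor: fds[0], closeOnDealloc: true)
    let data = reader.readDataToEndOfFile()
    var status: Int32 = 0
    waitpid(pid, &status, 0)
    return String(decoding: data, as: UTF8.self)
}
```

Keeping the conditional in one typealias and one factory function means the rest of the spawn code stays identical on both platforms, since `&actions` has the right pointer type in each case.
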
@ratulsarna
Collaborator

Opened a replacement PR with the cleanup hardening here: #835

Thanks again @LPFchan for the original report and implementation work — I used that as the base and kept the contributor branch untouched.

steipete pushed a commit that referenced this pull request May 4, 2026
steipete pushed a commit that referenced this pull request May 4, 2026
steipete pushed a commit that referenced this pull request May 4, 2026
@steipete
Owner

steipete commented May 4, 2026

Thanks @LPFchan, landed this on main via cherry-picks b3b79e0, dcea9ce, 138da85, and 0eef928, with follow-up hardening/tests in 3b13ee5. I kept the posix_spawn process-group approach, added regression coverage for noisy shell output and background-child cleanup, and tightened normal completion to escalate to SIGKILL if descendants keep pipes open.

Verified: pnpm check, swiftlint --strict, swift test --filter PathBuilderTests, full swift test (1932 tests), and ./Scripts/compile_and_run.sh rebuilt/relaunched CodexBar.app successfully.

Development

Successfully merging this pull request may close these issues.

CodexBar leaks zsh/fzf child processes until macOS hits maxproc and new commands fail with "Resource temporarily unavailable"

3 participants