codex exec: parent-kill should terminate all spawned child processes (Windows + POSIX) #21994

@nagatani-design

Summary

When codex exec --dangerously-bypass-approvals-and-sandbox <prompt> is killed externally (e.g. via Stop-Process on Windows, kill on POSIX), the child processes it spawned during the run keep running indefinitely and consume CPU. There is no Job Object / process group cleanup hook.

Repro (Windows)

# Terminal A
codex exec --dangerously-bypass-approvals-and-sandbox "$(Get-Content prompt.md)" > out.log

# Terminal B
# $PID is a read-only automatic variable in PowerShell, so use a different name
$codexPid = (Get-Process codex | Select-Object -First 1).Id
Stop-Process -Id $codexPid -Force
Start-Sleep -Seconds 5

# Many child processes (often `python -` from heredoc invocations) are still alive
Get-CimInstance Win32_Process -Filter "Name='python.exe'" |
  Where-Object { $_.CommandLine -match 'python\.exe" -$' }
# -> shows orphaned subprocesses, each ~10% CPU

Observed impact (2026-05-10)

  • codex exec parent killed
  • 7 orphaned python - (heredoc) children survived
  • Each consumed ~10% CPU for 43-76 minutes before manual cleanup
  • Total wasted CPU time: 7.5 hours across 7 orphans
  • No way to identify orphans without manually walking the ParentProcessId tree

Root cause (hypothesis)

Codex CLI on Windows does not appear to attach spawned subprocesses to a Job Object with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE. On POSIX, it does not appear to place them in a separate process group via setpgid() or to signal that group on shutdown. As a result, the OS does not propagate parent termination to the children.

Expected behavior

When the codex exec parent receives SIGTERM/SIGKILL/Stop-Process, all descendants spawned during the run terminate within 30 seconds.

Proposed fix

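Windows: wrap subprocess spawning in a Win32 Job Object:

// pseudo-code, depending on Codex CLI implementation language
let job = create_job_object();                            // Win32: CreateJobObjectW
set_job_limits(job, JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE);  // Win32: SetInformationJobObject
let child = spawn_subprocess(...);
assign_process_to_job(child, job);                        // Win32: AssignProcessToJobObject
// When the last handle to the job is closed, which the OS does when the parent exits
// or is force-killed, every process still in the job is terminated.
// Children of an assigned process join the same job by default.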

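POSIX: put each child in its own process group (setpgid) and signal the group:

use std::os::unix::process::CommandExt; // provides process_group (Rust 1.64+)
use std::process::Command;

let child = Command::new(...)
    .process_group(0) // same effect as setpgid(0, 0) in the child
    .spawn()?;
// on parent shutdown (SIGTERM handler or normal exit; SIGKILL cannot be caught):
// killpg and SIGTERM come from the libc crate
unsafe { libc::killpg(child.id() as libc::pid_t, libc::SIGTERM) };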
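
The POSIX idea above can be exercised end to end with the standard library alone. The following is a minimal sketch, not Codex CLI's actual code: the helper names `spawn_in_own_group` and `terminate_group` are hypothetical, `killpg` is declared by hand instead of pulling in the `libc` crate, and the `unsafe extern` syntax assumes Rust 1.82 or newer.

```rust
use std::io;
use std::os::unix::process::{CommandExt, ExitStatusExt};
use std::process::{Child, Command};

// Declared by hand so the sketch needs no external crate; std already links
// the platform C library on Unix. Assumes pid_t and int are both 32-bit.
unsafe extern "C" {
    fn killpg(pgrp: i32, sig: i32) -> i32;
}

const SIGTERM: i32 = 15; // value on Linux, macOS, and the BSDs

/// Spawns `sh -c <script>` as the leader of a new process group.
pub fn spawn_in_own_group(script: &str) -> io::Result<Child> {
    Command::new("sh")
        .arg("-c")
        .arg(script)
        .process_group(0) // the child gets a new group whose ID is its own PID
        .spawn()
}

/// Sends SIGTERM to the child's whole group, reaps the child, and returns
/// the signal that ended it (None if it exited normally instead).
pub fn terminate_group(mut child: Child) -> io::Result<Option<i32>> {
    // The group ID equals the leader's PID because of process_group(0).
    let rc = unsafe { killpg(child.id() as i32, SIGTERM) };
    if rc != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(child.wait()?.signal())
}
```

Calling `terminate_group(spawn_in_own_group("sleep 30 & sleep 30")?)` should end both `sleep` processes, because the background job inherits the leader's group. This only covers shutdown paths in which the parent still gets to run code. A parent killed with SIGKILL cannot call `killpg`, so that case would need an OS-level mechanism such as Linux's `prctl(PR_SET_PDEATHSIG)`.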

Workaround (currently in use)

We maintain a 30-line PowerShell helper that walks Win32_Process.ParentProcessId recursively and kills each PID. Acceptable but not ideal:

  • Has to be invoked manually (extra step in monitor's runbook)
  • Can race against new spawns
  • Doesn't help users on POSIX
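
For reference, the core of such a tree walk can be sketched as a pure function over a process snapshot. This is an illustration, not the helper itself: `descendants` is a hypothetical name, and the input is assumed to be a list of (pid, parent_pid) pairs such as those read from Win32_Process.

```rust
use std::collections::{HashMap, HashSet};

/// Returns every descendant PID of `root` found in a snapshot of
/// (pid, parent_pid) pairs, sorted ascending. `root` itself is excluded.
pub fn descendants(snapshot: &[(u32, u32)], root: u32) -> Vec<u32> {
    // Index children by parent PID.
    let mut children: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(pid, ppid) in snapshot {
        children.entry(ppid).or_default().push(pid);
    }

    let mut seen: HashSet<u32> = HashSet::new();
    seen.insert(root);
    let mut stack = vec![root];
    let mut found = Vec::new();
    while let Some(pid) = stack.pop() {
        if let Some(kids) = children.get(&pid) {
            for &kid in kids {
                // `seen` guards against cycles, which PID reuse can create.
                if seen.insert(kid) {
                    found.push(kid);
                    stack.push(kid);
                }
            }
        }
    }
    found.sort_unstable();
    found
}
```

Because the walk runs on a snapshot, any process spawned after the snapshot is taken is missed, which is the race noted in the list above. Windows also does not update ParentProcessId when a parent exits, so a reused PID can make an unrelated process look like a descendant.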

Severity / priority

Medium-low impact for individual users (manual cleanup possible), but for automated pipelines and CI, orphan accumulation can saturate the host machine. Especially relevant for --dangerously-bypass-approvals-and-sandbox use cases where automated kill is the standard recovery path.

Environment

  • OS: Windows 11
  • Codex CLI: 0.128.0
  • Login: ChatGPT
