Skip to content

CLI worker silent death after parallel-command burst (announce → mkdir → probe → exit) on Linux/WSL2 — Turn completed event never fires #21937

@vstandos

Description

@vstandos

Summary

The Codex CLI app-server worker process exits with code 0 mid-turn on Linux/WSL2 immediately after a specific tool-call sequence pattern, leaving the broker stuck and the wrapper layer hanging on state.completion. No error event is propagated. Reproducible 4 times across 24 hours on codex-cli 0.130.0 and 0.129.0.

Environment

  • Codex CLI: 0.129.0 → 0.130.0 (both reproduce)
  • Companion / wrapper: @openai/codex plugin runtime via codex-companion.mjs
  • Host: Windows 11 + WSL2 (Ubuntu 22.04, kernel 6.6.114.1-microsoft-standard-WSL2)
  • Model: gpt-5.5 with xhigh reasoning effort (default for codex)
  • Sandbox: danger-full-access
  • Trigger: dispatched as background job from a Claude Code orchestrator

Reproducer (high reliability — 4/4 in 24h)

The worker dies after this exact sequence in a single turn:

  1. Multiple psql / sed exploration commands (parallel batch, all complete with exit 0)
  2. Model emits an "announcement" assistant message ("I'm going to write <file>...")
  3. mkdir -p <output_dir> — succeeds
  4. ONE more psql or shell probe (e.g., psql -Atc "SELECT exchange, contract_type FROM symbols") — succeeds
  5. Worker silently exits. No Turn completed event, no error, no SIGTERM in logs. Companion state.completion Promise hangs indefinitely.

Observed across 4 distinct Codex jobs today:

  • task-moxeya7t-248j70 (v4 analysis executor, 21:20 UTC)
  • task-moxpj6m0-zr9uvr (PR creation task, 02:13 UTC next day)
  • task-moxq4ixm-aln8jh (REST probe investigation, 02:29 UTC)
  • bt0dkgkg6 / b77sx7ne9 (analytical re-run, 16:00 UTC)

What I see in logs

/home/user/.claude/plugins/data/codex-openai-codex/state/<workspace>/jobs/<task-id>.log ends with:

[ts] Command completed: /bin/bash -lc 'psql ...' (exit 0)

and zero lines after that. No turn/completed notification, no client.exit, no error event.

broker.log is empty (0 bytes) on the workspace's /tmp/cxc-*/ directory.

Expected behavior

Either:

  • Turn completes normally with subsequent file-write tool calls
  • OR worker emits an error event before exiting

Neither happens — exit is silent.

Workaround applied locally (wrapper-layer)

Patched lib/codex.mjs captureTurn() to use Promise.race([state.completion, exitRace]) where exitRace watches client.exitPromise and throws if turn isn't yet complete. This converts indefinite hang into clean exception, but doesn't fix the underlying app-server crash.

Related (non-duplicate)

Reproduction setup (if upstream wants to bisect)

I can provide:

  • All 4 task log files (~10-90 lines each, JSON-ish single-line structure)
  • Companion broker.log captures
  • Process tree at moment of failure (broker alive, app-server dead)
  • The wrapper-layer Promise.race patch as before/after diff

Request: enable structured logging in app-server's exit path, OR add a "worker-died-mid-turn" event to surface this state instead of silent exit.

Metadata

Metadata

Assignees

No one assigned

    Labels

    CLIIssues related to the Codex CLIapp-serverIssues involving app server protocol or interfacesbugSomething isn't workingtool-callsIssues related to tool callingwindows-osIssues related to Codex on Windows systems

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions