Skip to content

codex:codex-rescue subagent returns stub instead of actual task output #324

@taibaran

Description

@taibaran

Symptom

When a parent agent calls subagent_type: codex:codex-rescue expecting a substantive answer, the wrapper returns within seconds with text like:

"Codex is running the review in the background. You will be notified when the output is ready."

The actual task spawned by the wrapper continues in a separate background job; the parent agent never gets the real result because it was subscribed to the wrapper agent (which exited immediately), not to the orphaned background job that holds the real Codex output.

Discovered while running a multi-reviewer cycle against a third-party plugin: the Codex rescue stub returned within seconds while the real task-worker continued for ~18 minutes and died without producing a verdict.

Reproduction

From a parent agent (Claude Code session):

Agent({
  subagent_type: "codex:codex-rescue",
  prompt: "Review <some code>... return a structured verdict."
})

Expected: the wrapper waits for Codex, returns the verdict.
Observed: the wrapper returns within seconds with "Codex is running the review in the background."

Root cause (3 stacked issues)

Issue 1: task subcommand is async-by-design

In scripts/codex-companion.mjs:643:

const child = spawn(process.execPath, [scriptPath, "task-worker", "--cwd", cwd, "--job-id", jobId], {
    ...
    detached: true,

task spawns a detached task-worker and returns a job ID. Without --wait, the call returns immediately and the actual reasoning happens in the spawned worker.

Issue 2: rescue skill instructions STRIP --wait

skills/codex-cli-runtime/SKILL.md says:

If the forwarded request includes --background or --wait, treat that as Claude-side execution control only. Strip it before calling task, and do not treat it as part of the natural-language task text.

So even if the parent passes --wait, the rescue wrapper removes it. The wrapper never adds --wait either — the task call always runs in detached/async mode.

Issue 3: Claude Code's Bash tool further auto-backgrounds long commands

When the wrapper agent's Claude calls Bash(node codex-companion.mjs task "..."), the Bash tool itself auto-decides to background the command. The wrapper sees only the Bash auto-background acknowledgement:

"Command running in background with ID: bodpz7pmd. Output is being written to: /private/tmp/claude-501/.../tasks/bodpz7pmd.output"

…and forwards that to its parent verbatim, per the rescue skill's "return stdout exactly as-is" rule.

Real Codex output goes to bodpz7pmd.output, which the parent never knows about.

Evidence from a real session

Auto-background output landed at:

/private/tmp/claude-501/.../tasks/bodpz7pmd.output

23 lines of genuine Codex progress (excerpts):

[codex] Starting Codex task thread.
[codex] Thread ready (019e2b02-ded1-7e20-a0f3-b584b6b6b3e7).
[codex] Turn started.
[codex] Assistant message captured: I'll try the requested checkout first.
[codex] Searching:
[codex] Assistant message captured: The direct clone is blocked by the read-only sandbox: `/tmp/grok-v060-codex` cannot be created
[codex] Calling codex_apps/github_fetch_file.
[codex] Tool codex_apps/github_fetch_file completed.
...

Codex was actually working — hit a read-only sandbox blocking git clone, pivoted to GitHub API fetches, ran for ~18 minutes, then died without producing a final verdict. The parent agent had no visibility into any of this.

Suggested fixes (any one is sufficient)

Fix A (cleanest) — drop the strip---wait rule from the skill

In skills/codex-cli-runtime/SKILL.md, change the execution rule from:

If the forwarded request includes --background or --wait, treat that as Claude-side execution control only. Strip it before calling task.

to:

Always pass --wait to task from inside the rescue subagent — the rescue contract is "block until done and return the final result." If the user also passed --background, strip it (rescue is never async).

This makes the wrapper synchronous from the parent's point of view, which is what the rescue subagent contract implies.

Fix B — task defaults to synchronous when invoked from a rescue context

Detect inside codex-companion.mjs::handleTask that the caller is non-interactive (no TTY, or process.env.CODEX_RESCUE_MODE=1) and skip the detached spawn — just await the worker inline.

Fix C — wrapper polls until done

The skill instructs the wrapper to call task → capture job ID → loop on status --wait <id> until terminal status → return final result. More changes to the skill, but does not require changes to task.

Fix A is the smallest delta and matches what users actually expect from a rescue subagent.

Related

I found the same pattern in taibaran/gemini-plugin-cc (same async-task + strip---wait design). Opening a parallel issue there.

Cross-link: taibaran/gemini-plugin-cc#3


Reported while building taibaran/grok-plugin-cc, a third Claude-Code plugin modeled on this one — context for why the multi-reviewer flow surfaced the bug.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions