Codex returns 'verdict: blocked' on completed work — three distinct causes mask successful task output #273

@2015kmadanap

Summary

I'm using sapoto-codex:implement (the --background task pipeline) to delegate plan-driven work from Claude Code while Claude orchestrates. In ~30 min of usage with gpt-5.4 --effort high, I hit three distinct failure modes that all produce the same verdict: blocked summary even though the underlying work was completed and verified. Surfacing all three because they compound: a downstream consumer (Claude) gets a single "blocked" signal with no actionable distinction between "I couldn't do the work" and "I did the work but the contract refuses to let me commit it."

Plugin: @openai/codex-plugin-cc v1.0.4
Codex CLI: codex-cli 0.114.0
Auth: ChatGPT (madanapallikalyan@gmail.com)
Host: macOS 24.6.0 (darwin), Apple Silicon
Plugin install: /Users/kalyanmadanapalli/Desktop/sapoto-codex-plugin/
Companion: node .../codex/scripts/codex-companion.mjs task ...

Issue 1 — Sandbox blocks git commit inside a linked git worktree

Reproduction

The branch is checked out in a linked git worktree (git worktree add ...). The worktree's .git is a file pointing to <main>/.git/worktrees/<name>/. Codex makes file changes inside the worktree's working tree successfully, then tries git commit and fails with:

fatal: Unable to create '/Users/<me>/Desktop/automatic-document-fetcher/.worktrees/resilient-retry-scheduler/.git/worktrees/resilient-retry-scheduler/index.lock': Operation not permitted

The sandbox grants write access to the worktree's working tree but blocks writes to the linked git directory at <main>/.git/worktrees/<name>/. So git commit can't acquire index.lock.
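A minimal sketch of how an orchestrator could detect this situation before delegating a commit. It relies only on the documented format of a linked worktree's .git file (a single `gitdir: <path>` line). The function names are hypothetical, and it assumes the pointer holds an absolute POSIX path.

```python
from pathlib import PurePosixPath


def parse_gitdir_pointer(dot_git_text: str) -> PurePosixPath:
    """Return the path named by a linked worktree's `.git` file ("gitdir: <path>")."""
    lines = dot_git_text.strip().splitlines()
    prefix = "gitdir:"
    if not lines or not lines[0].startswith(prefix):
        raise ValueError("not a gitdir pointer file")
    return PurePosixPath(lines[0][len(prefix):].strip())


def gitdir_outside_worktree(worktree_root: str, dot_git_text: str) -> bool:
    """True when the git directory lives outside the worktree's working tree.

    In that case a sandbox that only allows writes under `worktree_root`
    would also block the index.lock that `git commit` needs.
    """
    gitdir = parse_gitdir_pointer(dot_git_text)
    root = PurePosixPath(worktree_root)
    return root != gitdir and root not in gitdir.parents
```

If this returns True, the orchestrator could plan to commit on its own instead of asking Codex to do so.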

This appears to be in the same general class as #240 (sandbox config issues), but it is specific to git worktrees on macOS. The workaround per our project's CLAUDE.md is to have Codex write files only and commit yourself afterward. That still surfaces as verdict: blocked in the companion output, which forces the orchestrator to read the raw .log file to recover what actually happened.

Concrete log line

[2026-04-26T20:25:17.005Z] Assistant message
- verdict: blocked
- ...
- notes:
  - `git commit` was blocked by sandbox permissions: `fatal: Unable to create '.../index.lock': Operation not permitted`.
  - Acceptance items 1-4 are implemented and verified; the only incomplete part of the completeness contract is the required commit.

Suggested fix

  • Detect the sandbox-permission flavor of git commit failure separately and emit a distinct verdict like verdict: ready_to_commit or commit_blocked_by_sandbox so orchestrators can auto-commit. OR
  • Allow the sandbox to write to .git/worktrees/<name>/ paths under the user's repo root, the same way it allows writing to the working tree. OR
  • Explicitly document in CLAUDE.md-style guidance that linked-worktree commits will fail and the orchestrator should commit after a "blocked" report — and surface a structured field in the JSON result that flags this case.
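As a rough illustration of the first option, the sketch below classifies the stderr of a failed git commit. It matches only the exact message shape seen in the log above; other platforms or sandboxes may report a different error string. The function name is hypothetical, and commit_blocked_by_sandbox is the verdict value proposed in this issue, not an existing one.

```python
import re

# Matches the index.lock failure quoted above, with any path before index.lock.
_LOCK_DENIED = re.compile(r"Unable to create '.*index\.lock': Operation not permitted")


def classify_commit_failure(stderr: str) -> str:
    """Map a failed `git commit` stderr to a verdict string (proposed values)."""
    if _LOCK_DENIED.search(stderr):
        return "commit_blocked_by_sandbox"
    # Anything else stays a plain "blocked" so genuine failures are not masked.
    return "blocked"
```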

Issue 2 — result.rawOutput returns null despite a final summary existing

Reproduction

After a job completes, codex-companion.mjs result <task-id> --json returns:

{
  "job": {
    "status": "completed",
    "summary": "- verdict: blocked",
    "result": null
  }
}

But the corresponding job log file (~/.claude/plugins/data/codex-openai-codex/state/<workspace>/jobs/<task-id>.log) clearly contains a [Final output] block with the structured report. The companion simply doesn't surface it, so the orchestrator cannot programmatically extract files_changed, test results, and notes. All of these are in the log but not in result.rawOutput.

This is adjacent to #264 ("per-job state JSON stuck at status=running ...") but specific to the case where status=completed but result is null.

Suggested fix

  • Plumb the final assistant message through from the in-memory turn state into the persisted result.rawOutput, even when the only emission is a structured summary. Or fall back to the last Assistant message in the log if rawOutput is not set.
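A sketch of the fallback idea, assuming the log consists of sections that each start with a header line like `[2026-04-26T20:25:17.005Z] Assistant message` (the only header shape shown in this issue). The real log format may differ, and both function names are hypothetical.

```python
import re
from typing import Optional

# Header shape taken from the log excerpt in this issue: "[<ISO timestamp>Z] <kind>".
HEADER = re.compile(r"^\[\d{4}-\d{2}-\d{2}T[\d:.]+Z\] (?P<kind>.+)$")


def last_assistant_message(log_text: str) -> Optional[str]:
    """Return the body of the last "Assistant message" section, or None."""
    current = None  # lines of the section being read, if it is an assistant message
    last = None     # lines of the most recent assistant message seen so far
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            if match.group("kind") == "Assistant message":
                current = []
                last = current
            else:
                current = None
        elif current is not None:
            current.append(line)
    return None if last is None else "\n".join(last).strip()


def raw_output_with_fallback(job: dict, log_text: str) -> Optional[str]:
    """Prefer job.result.rawOutput; fall back to the log when it is missing."""
    result = job.get("result") or {}
    return result.get("rawOutput") or last_assistant_message(log_text)
```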

Issue 3 — Codex runs the full test suite despite explicit "only run targeted tests" guidance

Reproduction

My brief said:

Pre-existing test failures (IMPORTANT — don't be blocked by these)

Pre-existing failures unrelated to this work in: browserWindowRegistry.test.ts, chromeSetupStatusResolver.test.ts, notificationService.test.ts, autoExportRepo.test.ts. Do NOT run the full pnpm test suite — only run the targeted tests above.

Plus a <verification> block listing the exact commands.

Codex ran the targeted tests (correctly) AND THEN ran pnpm test (full suite). The full suite hit the four pre-existing failures listed above. Codex treated these as part of the completeness contract → refused to commit → verdict: blocked. Files were correctly modified, targeted tests passed, typecheck clean — but the orchestrator gets a "blocked" verdict and has to manually verify and commit.

This is essentially the completeness contract failing to differentiate "failures I could have caused" from "tests already broken on the parent branch." On a large codebase with any flaky or red tests on main, this means every Codex implement task will go "blocked" unless the user is on a perfectly green branch.

Suggested fix

  • Have the verification step honor the brief's explicit list of commands and NOT run additional commands (especially pnpm test) without a prompt-side opt-in. OR
  • Compare which tests were red BEFORE the diff vs. AFTER, and only block on tests the diff caused to fail. Existing red tests = warn-with-context, not block. OR
  • Allow the brief to set a --no-full-suite flag or honor the project's CLAUDE.md "pre-existing failures" markers if present.
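The before/after comparison in the second option reduces to a set difference. A minimal sketch follows, assuming the set of failing test files can be collected once on the parent branch and once after the diff. The function name is hypothetical, and red_tests_outside_diff is the verdict value proposed later in this issue.

```python
from typing import List, Set, Tuple


def failures_caused_by_diff(
    red_before: Set[str], red_after: Set[str]
) -> Tuple[str, List[str], List[str]]:
    """Split failing tests into those the diff introduced and those already red."""
    caused = sorted(red_after - red_before)
    pre_existing = sorted(red_after & red_before)
    if caused:
        verdict = "blocked"                 # the diff broke something new
    elif pre_existing:
        verdict = "red_tests_outside_diff"  # warn with context, do not block
    else:
        verdict = "done"
    return verdict, caused, pre_existing
```

With the four files named in the brief red both before and after, this yields red_tests_outside_diff rather than blocked.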

Why these compound into a usability problem

A consumer like Claude Code orchestrating a 20-task implementation plan via sapoto-codex:implement gets the same verdict: blocked signal for three different underlying states:

| Cause                          | Real state                    | What surfaces    |
|--------------------------------|-------------------------------|------------------|
| Sandbox can't commit           | Done, just needs commit       | verdict: blocked |
| Pre-existing test red          | Done, tests pass for the diff | verdict: blocked |
| Genuine implementation failure | Not done                      | verdict: blocked |

The orchestrator can't distinguish these without reading the log file every time. Today I'm running these three commands on every "blocked" return:

node codex-companion.mjs result <task-id> --json | jq '.job.summary'
git status --short                    # was anything actually changed?
tail -120 .../jobs/<task-id>.log     # what's the real verdict?

A structured verdict enum (done / commit_blocked_by_sandbox / red_tests_outside_diff / blocked) would make the result programmatically actionable without log spelunking.
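To make the proposal concrete, here is a sketch of how an orchestrator might consume such an enum. The four values come from the sentence above; the summary line format follows the `- verdict: blocked` shape from the JSON output, and the action names are purely illustrative.

```python
from enum import Enum


class Verdict(Enum):
    DONE = "done"
    COMMIT_BLOCKED_BY_SANDBOX = "commit_blocked_by_sandbox"
    RED_TESTS_OUTSIDE_DIFF = "red_tests_outside_diff"
    BLOCKED = "blocked"


# Hypothetical orchestrator actions, one per verdict.
NEXT_ACTION = {
    Verdict.DONE: "continue",
    Verdict.COMMIT_BLOCKED_BY_SANDBOX: "verify_and_commit",
    Verdict.RED_TESTS_OUTSIDE_DIFF: "verify_and_commit",
    Verdict.BLOCKED: "read_log",
}


def parse_verdict(summary: str) -> Verdict:
    """Read the "verdict: <value>" line of a summary; unknown values mean BLOCKED."""
    for line in summary.splitlines():
        text = line.strip().lstrip("- ").strip()
        if text.startswith("verdict:"):
            value = text.split(":", 1)[1].strip()
            try:
                return Verdict(value)
            except ValueError:
                return Verdict.BLOCKED
    return Verdict.BLOCKED
```

Treating unknown or missing verdicts as BLOCKED keeps the conservative behavior for anything the orchestrator does not recognize.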

Reproducer

If useful I can share a sanitized log file + the exact brief that triggered all three failure modes in two consecutive Codex runs (T7 fix-up: triggered #3 → blocked despite green diff; T8 implement: triggered #1 → blocked on commit despite green tests + green typecheck + green diff).

Workarounds in use

  • After every Codex completion, my orchestrator runs git status --short && pnpm typecheck:all && pnpm test --run <targeted-files> and commits manually if everything is green. This is the documented pattern in our project's CLAUDE.md ("If git commit is blocked in the codex sandbox (common), have codex write files only and commit yourself afterward") but it requires the orchestrator to second-guess every "blocked" verdict.
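The workaround above can be sketched as a small function. The three check commands are the ones quoted in the bullet. The `git add -A` step, the commit message parameter, the return strings, and the injectable `run` callable are assumptions added for illustration.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_command(argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Default runner: execute the command and capture its output as text."""
    return subprocess.run(list(argv), capture_output=True, text=True)


def verify_and_commit(
    targeted_files: Sequence[str], message: str, run: Runner = run_command
) -> str:
    """Re-verify a "blocked" Codex result and commit manually if all checks pass."""
    status = run(["git", "status", "--short"])
    if status.returncode != 0:
        return "checks_failed"
    if not status.stdout.strip():
        return "nothing_to_commit"  # Codex changed no files
    checks = [
        ["pnpm", "typecheck:all"],
        ["pnpm", "test", "--run", *targeted_files],
    ]
    for argv in checks:
        if run(argv).returncode != 0:
            return "checks_failed"
    run(["git", "add", "-A"])  # assumption: stage everything Codex wrote
    commit = run(["git", "commit", "-m", message])
    return "committed" if commit.returncode == 0 else "commit_failed"
```

Passing the runner in as a parameter lets the flow be exercised without touching a real repository.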

🤖 Generated with Claude Code
