Sandbox/harness makes external agent CLI supervision unreliable vs Terminal #23857

@hardlined

What version of Codex CLI is running?

codex-cli 0.132.0

What subscription do you have?

ChatGPT Pro

Which model were you using?

gpt-5.5 as supervising Codex model; external worker commands used opencode with deepseek/deepseek-v4-pro and deepseek-chat

What platform is your computer?

Darwin 25.5.0 arm64 arm

What terminal emulator and version are you using (if applicable)?

Ghostty with tmux 3.6b, zsh. Codex env reports TERM_PROGRAM=tmux, TERM=tmux-256color. Manual opencode commands were run from a normal terminal session.

Codex doctor report

codex doctor --json was available. Relevant excerpts from this Codex session:

- codexVersion: 0.132.0
- config.load: ok; model: gpt-5.5; feature flags include shell_tool, unified_exec, multi_agent, plugins, in_app_browser, browser_use, etc.
- sandbox.helpers: ok; approval policy OnRequest; filesystem sandbox restricted; network sandbox restricted
- state.paths: ok; state/log DBs inspectable under ~/.codex
- terminal.env: ok; tmux 3.6b; TERM=tmux-256color; shell=/bin/zsh
- network.provider_reachability: fail inside the Codex environment due to sandboxed network reachability, while the user terminal could run the external CLI successfully
- updates.status: warning due to api.github.com DNS resolution from the Codex environment

I can provide the full JSON if useful, but omitted it here to keep the issue focused and avoid unnecessary local path/account detail.

What issue are you seeing?

When using Codex as a supervising agent and delegating implementation/testing to an external CLI worker (OpenCode with DeepSeek), commands launched through the Codex exec/harness were unreliable in ways that did not reproduce in a normal terminal.

The concrete pattern:

  • The user could run this manually without issue:

    opencode run -m deepseek/deepseek-v4-pro 'Read tests/test_telegram_notify.py, then reply with the number of test functions only. Do not edit files.'

    It completed and returned the expected answer.

  • Similar OpenCode/DeepSeek runs launched through Codex/tmux/exec were unreliable: broad prompts stalled, behavior was hard to diagnose from the harness, and sandboxed/executed variants appeared to have trouble with OpenCode's normal home-directory state/lock/database files.

  • The likely failure surface was not DeepSeek or OpenCode itself. The same model/tool worked outside the Codex harness. The problem was the mismatch between Codex's sandbox/exec/harness environment and a normal host terminal when supervising an external agent CLI that maintains its own state.

  • The result was misleading for a supervisor workflow: Codex could not confidently tell whether the worker was actually blocked, sandboxed, waiting, or simply behaving differently under the harness.

Expected supervisor workflow:

  1. Codex reads and plans.
  2. Codex starts a tmux pane running an external worker CLI such as OpenCode/DeepSeek.
  3. Worker edits/tests.
  4. Codex reviews diffs and runs final verification.

That workflow works manually on the same machine, but was unreliable when Codex launched/managed the external CLI.

What steps can reproduce the bug?

Uploaded thread: 019e49fc-70cc-7632-b042-68686cbb5289

  1. On macOS, from a repo workspace, ask Codex to supervise implementation while delegating coding/testing to an external CLI worker in tmux, e.g. OpenCode with DeepSeek.

  2. Have Codex launch a worker command such as:

    opencode run -m deepseek/deepseek-v4-pro ''

    or run it inside a tmux pane from Codex.

  3. Observe that the worker can stall or behave unreliably from the Codex-managed execution path. The same tool/model can work normally from Terminal.

  4. In a normal Terminal, run a minimal smoke test:

    opencode run -m deepseek/deepseek-v4-pro 'Read tests/test_telegram_notify.py, then reply with the number of test functions only. Do not edit files.'

    In my case this completed successfully and returned the expected result.

  5. Compare that with the Codex-supervised/harness-launched run. Codex had difficulty determining whether the worker was blocked due to sandbox permissions, external CLI state files, model behavior, or harness management.
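One way to narrow down step 5 is to capture the same environment snapshot from a normal Terminal and from a Codex-launched shell, then compare the two. The following is a minimal sketch, assuming Python 3 is available in both environments. The helper names (`env_snapshot`, `diff_snapshots`) and the list of variables are illustrative assumptions, not part of Codex or OpenCode.

```python
import json
import os
import sys

# Variables that may differ between a host terminal and a managed shell.
# This list is an assumption for illustration, not an official one.
KEYS = ["HOME", "PATH", "SHELL", "TERM", "TERM_PROGRAM", "TMUX", "TMPDIR",
        "XDG_CONFIG_HOME", "XDG_CACHE_HOME", "XDG_DATA_HOME"]


def _isatty(stream):
    """Return True only if the stream exists, is open, and is a TTY."""
    try:
        return stream is not None and stream.isatty()
    except ValueError:  # raised when the stream is already closed
        return False


def env_snapshot(environ=None):
    """Collect selected environment variables plus cwd and TTY state."""
    environ = os.environ if environ is None else environ
    return {
        "env": {k: environ.get(k) for k in KEYS},
        "cwd": os.getcwd(),
        "stdin_is_tty": _isatty(sys.stdin),
        "stdout_is_tty": _isatty(sys.stdout),
    }


def diff_snapshots(a, b):
    """Return {name: (a_value, b_value)} for variables whose values differ."""
    return {k: (a["env"][k], b["env"][k])
            for k in KEYS if a["env"][k] != b["env"][k]}


if __name__ == "__main__":
    print(json.dumps(env_snapshot(), indent=2, sort_keys=True))
```

Running the script once in Terminal and once through Codex, saving each output to a file, gives two JSON documents that can be diffed by hand or loaded and passed to `diff_snapshots`. A difference in `HOME`, `TMPDIR`, or TTY state would not prove the cause of a stall, but it would show where the two environments diverge.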

Searches I checked before filing:

  • opencode sandbox
  • deepseek opencode
  • SQLITE_READONLY OR EPERM opencode
  • external agent sandbox CLI

I did not find an issue specifically covering OpenCode/DeepSeek or external agent CLI supervision failing under Codex while the same command works in Terminal.

What is the expected behavior?

Codex should either support this external-supervisor workflow reliably or make the limitation explicit.

Expected behavior:

  • If a command is approved/escalated, Codex should make clear whether it is host-equivalent or still subject to sandbox/harness differences.
  • External CLI tools with normal home-directory state, lock files, and local databases should either work when approved, or Codex should surface the exact sandbox denial/state-path problem rather than leaving the supervising agent to infer it from a stalled worker.
  • A tmux-launched external CLI worker should be observable enough for Codex to distinguish: running, waiting for input, blocked by sandbox, failed due to permissions, failed due to model/provider, or completed.
  • The same command should not behave materially differently under Codex without a clear warning or diagnostic.
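As a rough illustration of the status distinction requested above, here is a minimal sketch of a wrapper that a supervisor could use today to classify a worker run from the outside. It assumes Python 3 and a non-interactive worker command. The function name `run_worker` and the status strings are hypothetical, and the wrapper can only observe exit codes, launch errors, and timeouts. It cannot tell a sandbox denial inside the worker apart from a slow model response, which is the gap this issue describes.

```python
import subprocess
import sys


def run_worker(cmd, timeout_s):
    """Run cmd to completion and return a (status, detail) tuple.

    stdin is connected to /dev/null so the worker cannot wait on
    terminal input; it sees end-of-file instead.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except FileNotFoundError:
        return "unreachable", "executable not found"
    except PermissionError as exc:
        # The OS refused to launch the executable at all.
        return "blocked_on_permission", str(exc)
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return "stalled", f"no exit within {timeout_s}s"
    if proc.returncode == 0:
        return "completed", proc.stdout.strip()
    return "failed", proc.stderr.strip()


if __name__ == "__main__":
    # Self-check with the current interpreter standing in for a worker.
    print(run_worker([sys.executable, "-c", "print('ok')"], 30))
```

With the smoke test from this report, the call would look like `run_worker(["opencode", "run", "-m", "deepseek/deepseek-v4-pro", prompt], 300)`, where `prompt` and the 300-second timeout are placeholders. A `stalled` result from the harness combined with a `completed` result from Terminal would reproduce the mismatch in a comparable form.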

This matters because external worker CLIs are a practical way to implement supervised multi-agent workflows where Codex remains responsible for planning, diff review, final verification, and merge decisions.

Additional information

This is not about Codex's built-in subagents. It is about Codex supervising an external CLI agent process, specifically OpenCode with DeepSeek, from a tmux/executed shell workflow.

Workarounds used:

  • The user manually ran OpenCode/DeepSeek in Terminal to confirm the tool/model worked outside Codex.
  • Codex narrowed prompts and switched from deepseek/deepseek-v4-pro to deepseek-chat for the delegated implementation after troubleshooting.
  • Codex treated worker output as untrusted, reviewed every diff manually, and ran final verification itself.

What would make this better:

  • Clear diagnostics when sandboxed commands cannot access external CLI state under ~/.local, ~/.cache, ~/.config, or similar paths.
  • A documented/reliable way for Codex to launch and monitor external supervisor-worker CLIs in tmux.
  • Explicit UI/tool output stating whether an escalated command is truly host-equivalent or still differs from a normal terminal.
  • Better status reporting for worker processes launched through the Codex harness: running, blocked on permission, waiting for input, exited, or unreachable.
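Until such diagnostics exist, a small probe can at least show whether the state directories named above are writable from a given shell. This is a minimal sketch, assuming Python 3. The function name `probe_writable` and the directory list are illustrative, and the exact paths OpenCode uses for its state, lock, and database files are not confirmed here.

```python
import errno
import os
import tempfile

# Directories named in this report as likely homes for external CLI state.
# Which of them OpenCode actually uses is an assumption.
CANDIDATE_DIRS = ["~/.local", "~/.cache", "~/.config"]


def probe_writable(path):
    """Return 'writable', 'missing', or 'denied (<ERRNO>)' for a directory."""
    path = os.path.expanduser(path)
    if not os.path.isdir(path):
        return "missing"
    try:
        # Create a temporary file in the directory; it is removed on close.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".probe-"):
            pass
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return f"denied ({name})"
    return "writable"


if __name__ == "__main__":
    for d in CANDIDATE_DIRS:
        print(f"{d}: {probe_writable(d)}")
```

Running the probe in Terminal and again through the Codex harness gives a direct comparison. A `denied (EPERM)` or `denied (EACCES)` result that appears only under Codex would point at the sandbox rather than at OpenCode or the model.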

Metadata

Labels

  • CLI: Issues related to the Codex CLI
  • bug: Something isn't working
  • exec: Issues related to the `codex exec` subcommand
  • sandbox: Issues related to permissions or sandboxing
  • tool-calls: Issues related to tool calling
