Sandbox/harness makes external agent CLI supervision unreliable vs Terminal #23857

### What version of Codex CLI is running?

codex-cli 0.132.0

### What subscription do you have?

ChatGPT Pro

### Which model were you using?

gpt-5.5 as supervising Codex model; external worker commands used opencode with deepseek/deepseek-v4-pro and deepseek-chat

### What platform is your computer?

Darwin 25.5.0 arm64 arm

### What terminal emulator and version are you using (if applicable)?

Ghostty with tmux 3.6b, zsh. Codex env reports TERM_PROGRAM=tmux, TERM=tmux-256color. Manual opencode commands were run from a normal terminal session.

### Codex doctor report

```text
codex doctor --json was available. Relevant excerpts from this Codex session:

- codexVersion: 0.132.0
- config.load: ok; model: gpt-5.5; feature flags include shell_tool, unified_exec, multi_agent, plugins, in_app_browser, browser_use, etc.
- sandbox.helpers: ok; approval policy OnRequest; filesystem sandbox restricted; network sandbox restricted
- state.paths: ok; state/log DBs inspectable under ~/.codex
- terminal.env: ok; tmux 3.6b; TERM=tmux-256color; shell=/bin/zsh
- network.provider_reachability: fail inside the Codex environment due to sandboxed network reachability, while the user terminal could run the external CLI successfully
- updates.status: warning due to api.github.com DNS resolution from the Codex environment

I can provide the full JSON if useful, but omitted it here to keep the issue focused and avoid unnecessary local path/account detail.
```

### What issue are you seeing?

When using Codex as a supervising agent and delegating implementation/testing to an external CLI worker (OpenCode with DeepSeek), commands launched through the Codex exec/harness were unreliable in ways that did not reproduce in a normal terminal.

The concrete pattern:

- The user could run this manually without issue:

  `opencode run -m deepseek/deepseek-v4-pro 'Read tests/test_telegram_notify.py, then reply with the number of test functions only. Do not edit files.'`

  It completed and returned the expected answer.

- Similar OpenCode/DeepSeek runs launched through Codex (via exec or a tmux pane) were unreliable: broad prompts stalled, the harness gave little insight into why, and sandboxed runs appeared to have trouble with OpenCode's normal home-directory state, lock, and database files.

- The likely failure surface was not DeepSeek or OpenCode itself. The same model/tool worked outside the Codex harness. The problem was the mismatch between Codex's sandbox/exec/harness environment and a normal host terminal when supervising an external agent CLI that maintains its own state.

- The result was misleading for a supervisor workflow: Codex could not confidently tell whether the worker was actually blocked, sandboxed, waiting, or simply behaving differently under the harness.

Expected supervisor workflow:

1. Codex reads and plans.
2. Codex starts a tmux pane running an external worker CLI such as OpenCode/DeepSeek.
3. Worker edits/tests.
4. Codex reviews diffs and runs final verification.

That workflow works manually on the same machine, but was unreliable when Codex launched/managed the external CLI.
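Step 2 is where things went wrong, so for reference here is a minimal sketch of the kind of bounded launch I would expect a supervisor to be able to do. It is not how Codex launches commands internally; `run_worker` and `WorkerResult` are hypothetical names, and the only thing it assumes about the worker is that it is an ordinary command-line process.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkerResult:
    state: str                 # "completed", "failed", "timed_out", or "not_found"
    returncode: Optional[int]  # None when the process never exited on its own
    stdout: str
    stderr: str


def _text(value) -> str:
    """Normalize captured output, which may be None, bytes, or str."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode(errors="replace")
    return value


def run_worker(argv: List[str], timeout_s: float) -> WorkerResult:
    """Run a worker command with captured output and a hard timeout."""
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            stdin=subprocess.DEVNULL,  # a worker that reads stdin sees EOF instead of hanging
        )
    except FileNotFoundError as exc:
        # The binary is not on PATH in this environment (PATH may differ from Terminal).
        return WorkerResult("not_found", None, "", str(exc))
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child on timeout; keep whatever output was captured.
        return WorkerResult("timed_out", None, _text(exc.stdout), _text(exc.stderr))
    state = "completed" if proc.returncode == 0 else "failed"
    return WorkerResult(state, proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    # Stand-in command; substitute the real worker argv when trying this for real.
    result = run_worker([sys.executable, "-c", "print('ok')"], timeout_s=30)
    print(result.state, result.stdout.strip())
```

For the real case the argv would be the command from this report, e.g. `["opencode", "run", "-m", "deepseek/deepseek-v4-pro", "<bounded repo task prompt>"]`. A `timed_out` result at least separates "stalled" from "exited", which is more than I could tell from the harness.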

### What steps can reproduce the bug?

Uploaded thread: 019e49fc-70cc-7632-b042-68686cbb5289

1. On macOS, from a repo workspace, ask Codex to supervise implementation while delegating coding/testing to an external CLI worker in tmux, e.g. OpenCode with DeepSeek.

2. Have Codex launch a worker command such as:

   `opencode run -m deepseek/deepseek-v4-pro '<bounded repo task prompt>'`

   or run it inside a tmux pane from Codex.

3. Observe that the worker can stall or behave unreliably from the Codex-managed execution path. The same tool/model can work normally from Terminal.

4. In a normal Terminal, run a minimal smoke test:

   `opencode run -m deepseek/deepseek-v4-pro 'Read tests/test_telegram_notify.py, then reply with the number of test functions only. Do not edit files.'`

   In my case this completed successfully and returned the expected result.

5. Compare that with the Codex-supervised/harness-launched run. Codex had difficulty determining whether the worker was blocked due to sandbox permissions, external CLI state files, model behavior, or harness management.
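To make the comparison in step 5 less of a guessing game, something like the following could be run once from Terminal and once from the Codex-managed path, then diffed. This is a rough sketch, not a tool I used while filing; the list of variables is my assumption about what might matter, and `snapshot` and `diff_snapshots` are hypothetical helper names.

```python
import json
import os
import sys

# Assumption: these are the variables most likely to differ between the two paths.
KEYS = ("HOME", "PATH", "SHELL", "TERM", "TERM_PROGRAM", "TMUX",
        "XDG_CONFIG_HOME", "XDG_DATA_HOME", "XDG_CACHE_HOME")


def _is_tty(stream) -> bool:
    """True if the stream is attached to a terminal; False if closed or missing."""
    try:
        return bool(stream is not None and stream.isatty())
    except ValueError:
        return False


def snapshot(env=os.environ) -> dict:
    """Collect the execution context a CLI worker would see."""
    return {
        "env": {key: env.get(key) for key in KEYS},
        "stdin_is_tty": _is_tty(sys.stdin),
        "stdout_is_tty": _is_tty(sys.stdout),
        "cwd": os.getcwd(),
    }


def _flatten(data: dict, prefix: str = "") -> dict:
    """Turn nested dicts into dotted keys, e.g. {"env": {"TERM": x}} -> {"env.TERM": x}."""
    flat = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(_flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def diff_snapshots(a: dict, b: dict) -> dict:
    """Return {dotted_key: (value_in_a, value_in_b)} for every field that differs."""
    fa, fb = _flatten(a), _flatten(b)
    return {key: (fa.get(key), fb.get(key))
            for key in sorted(set(fa) | set(fb))
            if fa.get(key) != fb.get(key)}


if __name__ == "__main__":
    # Save this output from each environment, then compare the two JSON files.
    print(json.dumps(snapshot(), indent=2))
```

A differing `stdin_is_tty` or `PATH` would not prove the cause, but it would narrow down whether the harness or the sandbox is the more likely suspect.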

Searches I ran before filing:

- opencode sandbox
- deepseek opencode
- SQLITE_READONLY OR EPERM opencode
- external agent sandbox CLI

I did not find an issue specifically covering OpenCode/DeepSeek or external agent CLI supervision failing under Codex while the same command works in Terminal.

### What is the expected behavior?

Codex should either support this external-supervisor workflow reliably or make the limitation explicit.

Expected behavior:

- If a command is approved/escalated, Codex should make clear whether it is host-equivalent or still subject to sandbox/harness differences.
- External CLI tools with normal home-directory state, lock files, and local databases should either work when approved, or Codex should surface the exact sandbox denial/state-path problem rather than leaving the supervising agent to infer it from a stalled worker.
- A tmux-launched external CLI worker should be observable enough for Codex to distinguish: running, waiting for input, blocked by sandbox, failed due to permissions, failed due to model/provider, or completed.
- The same command should not behave materially differently under Codex without a clear warning or diagnostic.
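As an illustration of the status distinction asked for above, here is a small sketch of a classifier over a worker's exit code, captured output, and idle time. Everything in it is an assumption on my part: the marker strings are common OS/SQLite/network error texts that I have not confirmed OpenCode prints, `FAILED_OTHER` is an extra bucket I added, and treating a long-idle process as "waiting for input" is only a heuristic.

```python
from enum import Enum
from typing import Optional


class WorkerStatus(Enum):
    RUNNING = "running"
    WAITING_FOR_INPUT = "waiting for input"
    BLOCKED_BY_SANDBOX = "blocked by sandbox"
    FAILED_PERMISSION = "failed due to permissions"
    FAILED_PROVIDER = "failed due to model/provider"
    FAILED_OTHER = "failed (unclassified)"
    COMPLETED = "completed"


# Assumed markers; matched case-insensitively against the worker's output.
PERMISSION_MARKERS = ("EPERM", "EACCES", "SQLITE_READONLY",
                      "Operation not permitted", "Permission denied")
PROVIDER_MARKERS = ("ENOTFOUND", "ECONNREFUSED", "Could not resolve host")


def _contains_any(text: str, markers) -> bool:
    lowered = text.lower()
    return any(marker.lower() in lowered for marker in markers)


def classify(returncode: Optional[int], output: str,
             idle_seconds: float, idle_limit: float = 120.0) -> WorkerStatus:
    """Best-effort status guess; returncode is None while the process is alive."""
    permission_hit = _contains_any(output, PERMISSION_MARKERS)
    if returncode is None:
        if permission_hit:
            return WorkerStatus.BLOCKED_BY_SANDBOX
        if idle_seconds >= idle_limit:
            # Heuristic only: no output for a long time may mean a prompt or a hang.
            return WorkerStatus.WAITING_FOR_INPUT
        return WorkerStatus.RUNNING
    if returncode == 0:
        return WorkerStatus.COMPLETED
    if permission_hit:
        return WorkerStatus.FAILED_PERMISSION
    if _contains_any(output, PROVIDER_MARKERS):
        return WorkerStatus.FAILED_PROVIDER
    return WorkerStatus.FAILED_OTHER
```

Output scraping like this is fragile, which is the point of the request: the harness knows when the sandbox denies something and could report it directly.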

This matters because external worker CLIs are a practical way to implement supervised multi-agent workflows where Codex remains responsible for planning, diff review, final verification, and merge decisions.

### Additional information

This is not about Codex's built-in subagents. It is about Codex supervising an external CLI agent process, specifically OpenCode with DeepSeek, launched from Codex via exec or in a tmux pane.

Workarounds used:

- The user manually ran OpenCode/DeepSeek in Terminal to confirm the tool/model worked outside Codex.
- Codex narrowed prompts and switched from deepseek/deepseek-v4-pro to deepseek-chat for the delegated implementation after troubleshooting.
- Codex treated worker output as untrusted, reviewed every diff manually, and ran final verification itself.

What would make this better:

- Clear diagnostics when sandboxed commands cannot access external CLI state under ~/.local, ~/.cache, ~/.config, or similar paths.
- A documented, reliable way for Codex to launch and monitor external worker CLIs in tmux.
- Explicit UI/tool output stating whether an escalated command is truly host-equivalent or still differs from a normal terminal.
- Better status reporting for worker processes launched through the Codex harness: running, blocked on permission, waiting for input, exited, or unreachable.
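For the first item, the diagnostic could be as simple as a write probe on the directories an external CLI is likely to use. The sketch below is hedged accordingly: the candidate paths are the ones named above, I have not verified which of them OpenCode actually writes to, and `probe_writable` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Candidate state locations from this report; not a verified list of OpenCode paths.
CANDIDATES = ("~/.local", "~/.cache", "~/.config")


def probe_writable(path: Path) -> str:
    """Return "writable", "missing", "not_a_directory", or "denied: <reason>"."""
    try:
        if not path.exists():
            return "missing"
        if not path.is_dir():
            return "not_a_directory"
        # Creating (and auto-deleting) a temp file is a more honest test than
        # permission bits, since a sandbox can deny writes the mode bits allow.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".write-probe-"):
            pass
    except OSError as exc:
        return f"denied: {exc.strerror or exc}"
    return "writable"


if __name__ == "__main__":
    for candidate in CANDIDATES:
        resolved = Path(candidate).expanduser()
        print(f"{resolved}: {probe_writable(resolved)}")
```

Running this in Terminal and then through Codex would show directly whether the sandboxed path can write where the host path can.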
