feat(state): complete activity-state coverage for Claude / Codex / OpenCode by madarco · Pull Request #11 · madarco/agentbox

madarco · 2026-05-27T23:51:56Z

Summary

Stacks on top of PR #10 (the drive / agent / queue wait-for commands). PR #10 wired the command surface but the underlying state machine was lopsided: Claude reported 6 states, Codex hooks were staged but didn't fire (codex-cli 0.133.0 bug; confirmed still unreliable in 0.134.0), and OpenCode had no state reporting at all. This change makes all three agents report state symmetrically, adds first-class compacting and error states, and ships an OpenCode plugin to bridge its plugin-only extension surface.

Claude — fill in the missing hooks

- PreCompact → compacting; PostCompact → working --clear-pending (defensive against /compact mid-plan).
- StopFailure → error (cleared naturally by next UserPromptSubmit).
- SubagentStart/SubagentStop → working (subagents keep state at working from the parent's view; explicit re-assertions tighten the catchall PreToolUse race).
- Adds 'compacting' | 'error' to ClaudeActivityState (shared by Codex/OpenCode via AgentActivityState).
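As a rough illustration of the hook-to-state mapping listed above, here is a minimal sketch. The state names and hook event names come from this PR; the `HookTransition` shape, the `HOOK_TRANSITIONS` table and `transitionFor` are hypothetical names used only for this example, and the real wiring lives in the managed-settings JSON rather than in TypeScript.

```typescript
// Sketch only: state and hook names are from the PR text; the types and
// helper names below are hypothetical.
type AgentActivityState =
  | "working" | "idle" | "waiting" | "end-plan" | "question" | "unknown"
  | "compacting" | "error"; // the two states this PR adds

interface HookTransition {
  state: AgentActivityState;
  clearPending?: boolean; // models the --clear-pending flag
}

const HOOK_TRANSITIONS = new Map<string, HookTransition>([
  ["PreCompact", { state: "compacting" }],
  ["PostCompact", { state: "working", clearPending: true }],
  ["StopFailure", { state: "error" }],
  ["SubagentStart", { state: "working" }],
  ["SubagentStop", { state: "working" }],
]);

// Returns the transition for a hook event, or undefined for events this
// sketch does not cover.
function transitionFor(hookEvent: string): HookTransition | undefined {
  return HOOK_TRANSITIONS.get(hookEvent);
}
```

In this sketch `error` needs no special clearing logic: any later event that maps to another state simply overwrites it, which matches the "cleared naturally by next UserPromptSubmit" behaviour described above.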

Codex — tmux-pane scraper as the actual mechanism

Codex's ~/.codex/hooks.json firing is still unreliable in 0.134.0 even with --enable hooks --dangerously-bypass-hook-trust. The fallback (per plan approval) is a small packages/ctl/src/codex-scraper.ts that polls tmux capture-pane -p -t codex every 1s, matches the output against an ordered pattern table, and pushes through the existing codex-state socket op only on transitions (no 1Hz heartbeat). Patterns cover trust dialogs / permission prompts (waiting), /compact (compacting), error frames (error), codex-specific active-work TUI fragments (working), and the model·cwd idle footer (idle). The hooks.json shape is also fixed (top-level events → { hooks: { Event: [...] } } to match codex 0.134.0's HooksFile struct) and startCodexSession now passes --enable hooks --dangerously-bypass-hook-trust, as defense-in-depth for when codex's JSON-hook firing eventually becomes reliable.
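The two ideas in the scraper (first-match-wins over an ordered table, and reporting only on transitions) can be sketched as follows. This is a hedged illustration, not the contents of codex-scraper.ts: every regex is a made-up placeholder, and `classifyPane` and `makeTransitionDetector` are hypothetical names. The sketch takes pane text as a string and leaves out the actual `tmux capture-pane` call and the 1s timer.

```typescript
type ScrapedState = "waiting" | "compacting" | "error" | "working" | "idle";

// Ordered table: the first matching entry wins. The regexes are illustrative
// guesses, not the patterns the real scraper ships with.
const PATTERNS: ReadonlyArray<{ state: ScrapedState; re: RegExp }> = [
  { state: "waiting", re: /untrusted contents/i }, // trust dialog
  { state: "compacting", re: /compacting/i },
  { state: "error", re: /^\s*error:/im },
  { state: "working", re: /\bWorking\b/ }, // deliberately loose
  { state: "idle", re: /\S+\s*·\s*\S+/ }, // model·cwd footer
];

// Classify one captured pane; undefined means no pattern matched.
function classifyPane(text: string): ScrapedState | undefined {
  for (const { state, re } of PATTERNS) {
    if (re.test(text)) return state;
  }
  return undefined;
}

// Wraps a push callback so it is invoked only when the state changes.
function makeTransitionDetector(push: (s: ScrapedState) => void) {
  let last: ScrapedState | undefined;
  return (paneText: string): void => {
    const next = classifyPane(paneText);
    if (next !== undefined && next !== last) {
      last = next;
      push(next);
    }
  };
}
```

Because the trust-dialog entry sits above the loose `working` entry, a pane containing "Working with untrusted contents" classifies as waiting. A caller would feed the detector the output of `tmux capture-pane -p -t codex` once per second and pass a `push` that sends the codex-state socket op.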

OpenCode — first-class plugin

OpenCode has no hooks system but does have a plugin event bus. New packages/sandbox-docker/scripts/opencode-agentbox-plugin.js subscribes to session.idle, permission.asked, tool.execute.before, session.compacted, session.error, etc. and shells out to agentbox-ctl opencode-state for each transition. Baked into the image; seeded into <vol>/config/plugins/agentbox-state.js by a new seedOpencodePlugin helper (mirrors seedCodexHooks). New opencode-state socket op + setOpencodeState reporter method + state field on BoxStatusOpencode so the host can run agent state <opencode-box> / agent wait-for waiting <opencode-box>.

Surface in `agentbox agent` + skill doc

- agent wait-for compacting | error now work.
- ~/.claude/skills/agentbox/SKILL.md updated: the agent section is no longer Claude-only and documents the three sources (Claude hooks / Codex scraper / OpenCode plugin).

Test plan

- pnpm -r typecheck clean
- pnpm lint clean
- pnpm -r test: 327 + 1 skipped (cloud-e2e); 22 new tests:
  - status-reporter.test.ts (+3): compacting round-trip, error non-sticky, subagent re-assertion.
  - codex-scraper.test.ts (+12): pattern table priority + false-positive guards (the directory-trust prompt's "Working with untrusted contents" must NOT match working), change-detection across session up/down, baseline idle on session reappearance.
  - agent-state.test.ts (+4): new compacting / error matchers; stale-payload guard.
- Live-verified on a fresh docker box:
  - Created with agentbox codex --no-attach, the scraper picked up the directory-trust dialog as waiting, then the model·cwd idle footer as idle after the trust prompts cleared. Confirms the daemon-scraper wiring + the new pattern table on a real codex 0.134.0 TUI.
- OpenCode plugin live-verify deferred: it requires an OpenCode provider login to be set up; the seed step + opencode-state wire op are unit-validated.
- Cloud provider parity (Daytona / Hetzner): these snapshots pick up the new managed-settings / hooks file / plugin automatically on the next agentbox prepare. No code changes needed.

Note

Medium Risk
Changes how host automation observes agent readiness (new states, 1s Codex tmux scraping, plugin-spawned ctl calls); misclassified pane patterns could skew wait-for, but no auth or data-path changes.

Overview
Symmetric activity reporting for Claude, Codex, and OpenCode: shared states now include compacting and error, and the host can agent wait-for on them.

Claude adds managed-settings hooks for compaction (PreCompact / PostCompact with --clear-pending), turn failures (StopFailure → error), and subagent lifecycle (re-assert working).

Codex treats unreliable JSON hooks as defense-in-depth: the ctl daemon runs a tmux pane scraper (~1s capture-pane, ordered regex table, updates only on transitions) as the production path. Hooks JSON is reshaped to Codex 0.134’s { hooks: { … } } layout, and startCodexSession passes --enable hooks and --dangerously-bypass-hook-trust.

OpenCode gets a baked plugin that maps bus events to agentbox-ctl opencode-state, plus seedOpencodePlugin, socket/client/reporter wiring, and state / updatedAt on box status (parity with Codex). Runtime staging and the box image copy the plugin alongside existing hook assets.

Tests cover scraper patterns, reporter compaction/error behavior, wait-state matching, and a less flaky supervisor parallelism timing threshold.

Reviewed by Cursor Bugbot for commit c9f445f.

…enCode

Previously the in-box activity-state pipeline was lopsided:

- Claude reported 6 states via lifecycle hooks (working/idle/waiting/end-plan/question/unknown).
- Codex hooks were staged but didn't fire in codex-cli 0.133.0; state stayed `unknown` in production. Bumping to 0.134.0 doesn't fix it (the JSON-hook firing path is still unreliable in TUI mode even with `--enable hooks --dangerously-bypass-hook-trust`).
- OpenCode had zero state reporting, only the tmux session probe.

This change makes all three agents report state symmetrically with first-class `compacting` and `error` values added to the union.

**Claude: fill in the missing hooks** (packages/sandbox-docker/scripts/claude-managed-settings.json):

- PreCompact → `compacting`, PostCompact → `working --clear-pending` (clears any pending plan/question payload too; defensive against `/compact` mid-plan).
- StopFailure → `error` (cleared naturally by the next UserPromptSubmit).
- SubagentStart/Stop → `working` (subagents keep state at working from the parent's view).

Extends `ClaudeActivityState` (shared by Codex/OpenCode via `AgentActivityState`) with `'compacting' | 'error'`. No new sticky logic needed; only `working` is special-cased.

**Codex: tmux-pane scraper** (packages/ctl/src/codex-scraper.ts):

- Polls `tmux capture-pane -p -t codex` every 1s, matches against an ordered pattern table, pushes through the existing `codex-state` socket op only on transitions.
- Patterns cover: trust dialogs (`waiting`), permission prompts (`waiting`), compaction (`compacting`), error frames (`error`), codex-specific active-work TUI fragments (`working`), and the model·cwd idle footer (`idle`).
- The hooks.json shape is also corrected (was top-level events; codex 0.134.0 expects `{ hooks: { Event: [...] } }`) and `startCodexSession` now passes `--enable hooks --dangerously-bypass-hook-trust`, as defense-in-depth for when codex's JSON-hook firing becomes reliable.

**OpenCode: first-class plugin** (packages/sandbox-docker/scripts/opencode-agentbox-plugin.js):

- New plugin that subscribes to OpenCode's plugin event bus (`session.idle`, `permission.asked`, `tool.execute.before`, `session.compacted`, `session.error`, ...) and shells `agentbox-ctl opencode-state` for each lifecycle transition.
- Baked into the image at `/usr/local/share/agentbox/opencode-agentbox-plugin.js`; seeded into `<vol>/config/plugins/agentbox-state.js` by a new `seedOpencodePlugin` helper (mirrors `seedCodexHooks`).
- New `opencode-state` socket op + `setOpencodeState` reporter method + `state` field on `BoxStatusOpencode` so the host can `agent state` / `agent wait-for` against OpenCode boxes too.

**Surface in `agentbox agent`** (apps/cli/src/lib/wait/agent-state.ts + skill doc):

- `agent wait-for compacting | error` now work.
- Skill doc updated to call out that `agent` is no longer Claude-only.

26 new unit tests: 6 cover the new states + sticky / clearPending interactions (packages/ctl/test/status-reporter.test.ts), 12 cover the scraper's pattern table and change-detection logic (packages/ctl/test/codex-scraper.test.ts), 4 cover the new agent-wait-state matchers (apps/cli/test/agent-state.test.ts).

Live-verified on a fresh docker box: codex state transitions waiting → idle as the trust dialog clears; all 327 + 1 skipped tests pass; typecheck + lint clean.

… to this PR)

The supervisor `concurrent independent tasks run in parallel` test flaked on PR #11's CI run with 733ms observed against a 700ms cap, a 33ms slip on a slow GitHub Actions runner. The test isn't touched by this PR, just caught in the cross-fire.

Double the per-task delay (400ms → 800ms) so the parallel/sequential gap is ~600ms of headroom instead of ~50ms, and bump the threshold proportionally (700ms → 1400ms). Sequential execution would now take ≥1600ms + 2× spawn (~1700-1900ms) so the test still catches the regression it was written for; local run completes in 829ms.

OpenCode persists the selected model in ~/.local/state/opencode/model.json, a third XDG dir that AgentBox synced in neither direction. As a result, a box booted with OpenCode's default (Gemini) instead of the host's choice, and a model picked inside a box was lost on destroy/recreate.

Docker: relocate the state dir into the existing opencode volume via XDG_STATE_HOME, and seed host model.json into it newest-wins (--update) so a stale host file can't clobber an in-box selection; the shared volume then carries it across recreate.

Cloud (Daytona + Hetzner): seed host model.json into the box's default state path per-create from the shared cloud create flow. Cloud has no persistent per-box store, so each box reflects the current host model.
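The newest-wins seeding rule can be illustrated with a small sketch. This assumes the rule is "copy only when the destination is missing or older than the source"; the commit only names the `--update` flag, so the function below (`seedNewestWins`, a hypothetical name) is an approximation of that behaviour in Node, not the code AgentBox runs.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Copy src over dst only when dst is missing or strictly older than src.
// Returns true when a copy happened. Hypothetical helper for illustration.
function seedNewestWins(src: string, dst: string): boolean {
  if (!fs.existsSync(src)) return false; // nothing to seed from
  const srcMtime = fs.statSync(src).mtimeMs;
  if (fs.existsSync(dst) && fs.statSync(dst).mtimeMs >= srcMtime) {
    return false; // in-box file is at least as new: keep it
  }
  fs.mkdirSync(path.dirname(dst), { recursive: true });
  fs.copyFileSync(src, dst);
  return true;
}
```

With this rule a model picked inside the box after the host file was last written survives the next seed, while a fresh host selection still propagates into an older volume.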

Live smoke against opencode 1.15.11 showed the original event map never produced a `working` state:

- `tool.execute.before` does NOT reach the plugin event bus in 1.15.
- `message.updated` fires once AFTER `session.idle`, so mapping it to `working` would leave the box stuck `working` at end of turn.

Switch the working signal to `message.part.delta` (streamed tokens), which fires only during active generation and always before `session.idle`. Add a last-state dedupe so a streamed turn costs ~2 `agentbox-ctl` spawns (working on first delta, idle on session.idle) instead of one per delta (~50).

Verified end-to-end: state transitions idle → working (t+3–8s of a long generation) → idle (session.idle), sampled live on a docker box.
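A minimal sketch of the last-state dedupe described in this commit follows. The two event names and their target states are taken from the commit message; `stateForEvent`, `makeDedupedReporter` and the `report` callback are hypothetical, and the sketch does not reproduce the OpenCode plugin API or the other events the real plugin handles.

```typescript
type OpencodeState = "working" | "idle";

// Maps a bus event type to a state. Only the two events named in the commit
// are covered; everything else is ignored in this sketch.
function stateForEvent(eventType: string): OpencodeState | undefined {
  if (eventType === "message.part.delta") return "working";
  if (eventType === "session.idle") return "idle";
  return undefined;
}

// Wraps a reporter so it fires only when the mapped state changes.
// In the real plugin, `report` would spawn `agentbox-ctl opencode-state`.
function makeDedupedReporter(report: (s: OpencodeState) => void) {
  let last: OpencodeState | undefined;
  return (eventType: string): void => {
    const next = stateForEvent(eventType);
    if (next === undefined || next === last) return;
    last = next;
    report(next);
  };
}
```

Feeding this reporter a turn of 50 `message.part.delta` events followed by one `session.idle` results in two `report` calls, which is the spawn count the commit is aiming for.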

…expiry)

OIDC dev tokens expire on a ~12h cycle and resolveCredentials has no headless refresh, so a long `agentbox prepare --provider vercel` (or CI) can outlive the token. Document the access-token trio as the recommended path for long/headless jobs in cloud-providers.md, and surface the same guidance in the `vercel login` prompt + option labels. Closes backlog #11 (documentation half; auto-refresh stays unbuilt).

madarco added 4 commits May 28, 2026 00:51

madarco merged commit a557665 into main May 28, 2026
1 check passed

madarco mentioned this pull request May 29, 2026

Vercel provider backlog: 9 items done + plans for the rest #22


madarco merged 4 commits into main from feat/agent-state-coverage
