feat(state): complete activity-state coverage for Claude / Codex / OpenCode #11
Merged
Previously the in-box activity-state pipeline was lopsided:
- Claude reported 6 states via lifecycle hooks (working/idle/waiting/end-plan/question/unknown).
- Codex hooks were staged but didn't fire in codex-cli 0.133.0; state stayed `unknown` in
production. Bumping to 0.134.0 doesn't fix it (the JSON-hook firing path is still
unreliable in TUI mode even with `--enable hooks --dangerously-bypass-hook-trust`).
- OpenCode had zero state reporting — only the tmux session probe.
This change makes all three agents report state symmetrically with first-class
`compacting` and `error` values added to the union.
**Claude — fill in the missing hooks** (packages/sandbox-docker/scripts/claude-managed-settings.json):
- PreCompact → `compacting`, PostCompact → `working --clear-pending` (clears any
pending plan/question payload too — defensive against `/compact` mid-plan).
- StopFailure → `error` (cleared naturally by the next UserPromptSubmit).
- SubagentStart/Stop → `working` (subagents keep state at working from the parent's view).
Extends `ClaudeActivityState` (shared by Codex/OpenCode via `AgentActivityState`) with
`'compacting' | 'error'`. No new sticky logic needed — only `working` is special-cased.
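The hook-to-state mapping described above can be sketched as a small lookup table. This is a minimal illustration, not the contents of `claude-managed-settings.json`: the `HookReport` shape and the `reportForHook` helper are hypothetical names introduced here, and only the event names and target states come from the list above.

```typescript
// Shared activity-state union: the six original states plus the two new ones.
type AgentActivityState =
  | "working" | "idle" | "waiting" | "end-plan" | "question" | "unknown"
  | "compacting" | "error";

// Hypothetical shape for illustration; the real reporter API may differ.
interface HookReport {
  state: AgentActivityState;
  clearPending: boolean; // true mirrors the `--clear-pending` flag
}

// Event names and target states are taken from the description above.
const CLAUDE_HOOK_MAP = new Map<string, HookReport>([
  ["PreCompact", { state: "compacting", clearPending: false }],
  ["PostCompact", { state: "working", clearPending: true }],
  ["StopFailure", { state: "error", clearPending: false }],
  ["SubagentStart", { state: "working", clearPending: false }],
  ["SubagentStop", { state: "working", clearPending: false }],
]);

// Returns the report for a hook event, or undefined for events not listed here.
function reportForHook(event: string): HookReport | undefined {
  return CLAUDE_HOOK_MAP.get(event);
}
```

A `Map` is used rather than a plain object so that unknown event names never resolve to inherited object properties.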
**Codex — tmux-pane scraper** (packages/ctl/src/codex-scraper.ts):
- Polls `tmux capture-pane -p -t codex` every 1s, matches against an ordered pattern
table, pushes through the existing `codex-state` socket op only on transitions.
- Patterns cover: trust dialogs (`waiting`), permission prompts (`waiting`), compaction
(`compacting`), error frames (`error`), codex-specific active-work TUI fragments
(`working`), and the model·cwd idle footer (`idle`).
- The hooks.json shape is also corrected (was top-level events; codex 0.134.0 expects
`{ hooks: { Event: [...] } }`) and `startCodexSession` now passes `--enable hooks
--dangerously-bypass-hook-trust` — defense-in-depth for when codex's JSON-hook firing
becomes reliable.
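As a rough sketch of the ordered-table plus transition-only idea, assuming nothing about the real `codex-scraper.ts` internals: the regexes below are placeholders rather than codex's actual TUI strings, and `classifyPane` / `makeTransitionDetector` are hypothetical names. The pane text would come from the `tmux capture-pane` poll; here it is passed in as a plain string.

```typescript
type ScrapedState = "waiting" | "compacting" | "error" | "working" | "idle";

// Ordered table: earlier rows win. All patterns are illustrative placeholders.
const PATTERNS: ReadonlyArray<[RegExp, ScrapedState]> = [
  [/do you trust/i, "waiting"],      // trust dialog (placeholder)
  [/allow .*\?/i, "waiting"],        // permission prompt (placeholder)
  [/compacting/i, "compacting"],
  [/^error:/im, "error"],
  [/esc to interrupt/i, "working"],  // active-work fragment (placeholder)
  [/\S+ · \S+\s*$/m, "idle"],        // "model · cwd" footer (placeholder)
];

// First matching row decides the state; undefined means "no opinion".
function classifyPane(pane: string): ScrapedState | undefined {
  for (const [re, state] of PATTERNS) {
    if (re.test(pane)) return state;
  }
  return undefined;
}

// Returns a step function that calls `push` only when the state changes,
// so repeated polls of an unchanged pane produce no extra reports.
function makeTransitionDetector(push: (s: ScrapedState) => void) {
  let last: ScrapedState | undefined;
  return (pane: string): void => {
    const next = classifyPane(pane);
    if (next !== undefined && next !== last) {
      last = next;
      push(next);
    }
  };
}
```

Ordering matters because a single frame can contain several fragments at once: a trust prompt that mentions "Working with untrusted contents" should resolve to `waiting` because the dialog rows sit above the `working` row.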
**OpenCode — first-class plugin** (packages/sandbox-docker/scripts/opencode-agentbox-plugin.js):
- New plugin that subscribes to OpenCode's plugin event bus (`session.idle`,
`permission.asked`, `tool.execute.before`, `session.compacted`, `session.error`, ...)
and shells `agentbox-ctl opencode-state` for each lifecycle transition.
- Baked into the image at `/usr/local/share/agentbox/opencode-agentbox-plugin.js`;
seeded into `<vol>/config/plugins/agentbox-state.js` by a new `seedOpencodePlugin`
helper (mirrors `seedCodexHooks`).
- New `opencode-state` socket op + `setOpencodeState` reporter method + `state` field
on `BoxStatusOpencode` so the host can `agent state` / `agent wait-for` against
OpenCode boxes too.
**Surface in `agentbox agent`** (apps/cli/src/lib/wait/agent-state.ts + skill doc):
- `agent wait-for compacting | error` now work.
- Skill doc updated to call out that `agent` is no longer Claude-only.
26 new unit tests: 6 cover the new states + sticky / clearPending interactions
(packages/ctl/test/status-reporter.test.ts), 12 cover the scraper's pattern table and
change-detection logic (packages/ctl/test/codex-scraper.test.ts), 4 cover the new
agent-wait-state matchers (apps/cli/test/agent-state.test.ts). Live-verified on a
fresh docker box: codex state transitions waiting → idle as the trust dialog clears;
all 327 + 1 skipped tests pass; typecheck + lint clean.
… to this PR) The supervisor `concurrent independent tasks run in parallel` test flaked on PR #11's CI run with 733ms observed against a 700ms cap — a 33ms slip on a slow GitHub Actions runner. The test isn't touched by this PR, just caught in the cross-fire. Double the per-task delay (400ms → 800ms) so the parallel/sequential gap is ~600ms of headroom instead of ~50ms, and bump the threshold proportionally (700ms → 1400ms). Sequential execution would now take ≥1600ms + 2× spawn (~1700-1900ms) so the test still catches the regression it was written for; local run completes in 829ms.
OpenCode persists the selected model in ~/.local/state/opencode/model.json, a third XDG dir AgentBox synced in neither direction — so a box booted with OpenCode's default (Gemini) instead of the host's choice, and a model picked inside a box was lost on destroy/recreate.
- Docker: relocate the state dir into the existing opencode volume via XDG_STATE_HOME, and seed host model.json into it newest-wins (--update) so a stale host file can't clobber an in-box selection; the shared volume then carries it across recreate.
- Cloud (Daytona + Hetzner): seed host model.json into the box's default state path per-create from the shared cloud create flow. Cloud has no persistent per-box store, so each box reflects the current host model.
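The newest-wins rule for the Docker path can be illustrated with a small decision function. This is only a sketch of the comparison, assuming modification times are what decide the winner; `shouldSeedFromHost` is a hypothetical name, and the actual change relies on a copy with `--update` rather than code like this.

```typescript
// Modification times are in milliseconds since the epoch;
// `undefined` means the file does not exist on that side.
function shouldSeedFromHost(
  hostMtimeMs: number | undefined,
  boxMtimeMs: number | undefined,
): boolean {
  if (hostMtimeMs === undefined) return false; // nothing on the host to seed
  if (boxMtimeMs === undefined) return true;   // box has no selection yet
  return hostMtimeMs > boxMtimeMs;             // copy only when the host file is strictly newer
}
```

With this rule, a model picked inside the box after the host file was last written is left alone, which is the "stale host file can't clobber an in-box selection" property described above.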
Live smoke against opencode 1.15.11 showed the original event map never produced a `working` state:
- `tool.execute.before` does NOT reach the plugin event bus in 1.15.
- `message.updated` fires once AFTER `session.idle`, so mapping it to `working` would leave the box stuck `working` at end of turn.
Switch the working signal to `message.part.delta` (streamed tokens), which fires only during active generation and always before `session.idle`. Add a last-state dedupe so a streamed turn costs ~2 `agentbox-ctl` spawns (working on first delta, idle on session.idle) instead of one per delta (~50). Verified end-to-end: state transitions idle → working (t+3–8s of a long generation) → idle (session.idle), sampled live on a docker box.
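A minimal sketch of the last-state dedupe, assuming the plugin receives event type strings one at a time. `makeDedupedReporter` and the `report` callback are hypothetical names; in the real plugin the callback would shell `agentbox-ctl opencode-state`. Only the `message.part.delta` and `session.idle` mappings are confirmed by the text above; the other two rows are assumptions.

```typescript
type OpencodeState = "working" | "idle" | "waiting" | "error";

const EVENT_TO_STATE = new Map<string, OpencodeState>([
  ["message.part.delta", "working"], // streamed tokens (from the description)
  ["session.idle", "idle"],          // end of turn (from the description)
  ["permission.asked", "waiting"],   // assumption
  ["session.error", "error"],        // assumption
]);

// Wraps `report` so it fires only when the mapped state actually changes.
function makeDedupedReporter(report: (s: OpencodeState) => void) {
  let last: OpencodeState | undefined;
  return (eventType: string): void => {
    const next = EVENT_TO_STATE.get(eventType);
    if (next === undefined || next === last) return; // unmapped or unchanged
    last = next;
    report(next);
  };
}
```

Feeding this fifty `message.part.delta` events followed by one `session.idle` results in two `report` calls, matching the "~2 spawns per streamed turn" figure.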
madarco added a commit that referenced this pull request May 28, 2026
…expiry) OIDC dev tokens expire on a ~12h cycle and resolveCredentials has no headless refresh, so a long `agentbox prepare --provider vercel` (or CI) can outlive the token. Document the access-token trio as the recommended path for long/headless jobs in cloud-providers.md, and surface the same guidance in the `vercel login` prompt + option labels. Closes backlog #11 (documentation half; auto-refresh stays unbuilt).
Summary
Stacks on top of PR #10 (the `drive/agent/queue wait-for` commands). PR #10 wired the command surface but the underlying state machine was lopsided: Claude reported 6 states, Codex hooks were staged but didn't fire (codex-cli 0.133.0 bug; confirmed still unreliable in 0.134.0), and OpenCode had no state reporting at all. This change makes all three agents report state symmetrically, adds first-class `compacting` and `error` states, and ships an OpenCode plugin to bridge its plugin-only extension surface.

Claude — fill in the missing hooks

- `PreCompact` → `compacting`; `PostCompact` → `working --clear-pending` (defensive against `/compact` mid-plan).
- `StopFailure` → `error` (cleared naturally by next `UserPromptSubmit`).
- `SubagentStart` / `SubagentStop` → `working` (subagents keep state at working from the parent's view; explicit re-assertions tighten the catchall PreToolUse race).
- Adds `'compacting' | 'error'` to `ClaudeActivityState` (shared by Codex/OpenCode via `AgentActivityState`).

Codex — tmux-pane scraper as the actual mechanism

Codex's `~/.codex/hooks.json` firing is still unreliable in 0.134.0 even with `--enable hooks --dangerously-bypass-hook-trust`. The fallback (per plan approval): a small `packages/ctl/src/codex-scraper.ts` polls `tmux capture-pane -p -t codex` every 1s, matches against an ordered pattern table, pushes through the existing `codex-state` socket op only on transitions (no 1Hz heartbeat). Patterns cover trust dialogs / permission prompts (`waiting`), `/compact` (`compacting`), error frames (`error`), codex-specific active-work TUI fragments (`working`), and the model·cwd idle footer (`idle`). The hooks.json shape is also fixed (top-level → `{ hooks: { Event: [...] } }` to match codex 0.134.0's `HooksFile` struct) and `startCodexSession` now passes `--enable hooks --dangerously-bypass-hook-trust` — defense-in-depth for when codex's JSON-hook firing eventually becomes reliable.

OpenCode — first-class plugin

OpenCode has no hooks system but does have a plugin event bus. New `packages/sandbox-docker/scripts/opencode-agentbox-plugin.js` subscribes to `session.idle`, `permission.asked`, `tool.execute.before`, `session.compacted`, `session.error`, etc. and shells `agentbox-ctl opencode-state` for each transition. Baked into the image; seeded into `<vol>/config/plugins/agentbox-state.js` by a new `seedOpencodePlugin` helper (mirrors `seedCodexHooks`). New `opencode-state` socket op + `setOpencodeState` reporter method + `state` field on `BoxStatusOpencode` so the host can `agent state <opencode-box>` / `agent wait-for waiting <opencode-box>`.

Surface in `agentbox agent` + skill doc

- `agent wait-for compacting | error` now work.
- `~/.claude/skills/agentbox/SKILL.md` updated: the agent section is no longer Claude-only and documents the three sources (Claude hooks / Codex scraper / OpenCode plugin).

Test plan

- `pnpm -r typecheck` clean
- `pnpm lint` clean
- `pnpm -r test` — 327 + 1 skipped (cloud-e2e); 22 new tests:
  - `status-reporter.test.ts` (+3): `compacting` round-trip, `error` non-sticky, subagent re-assertion.
  - `codex-scraper.test.ts` (+12): pattern table priority + false-positive guards (the directory-trust prompt's "Working with untrusted contents" must NOT match `working`), change-detection across session up/down, baseline idle on session reappearance.
  - `agent-state.test.ts` (+4): new `compacting` / `error` matchers; stale-payload guard.
- `agentbox codex --no-attach`, the scraper picked up the directory-trust dialog as `waiting`, then the model·cwd idle footer as `idle` after the trust prompts cleared. Confirms the daemon-scraper wiring + the new pattern table on a real codex 0.134.0 TUI.
- `agentbox prepare`. No code changes needed.

Note
Medium Risk
Changes how host automation observes agent readiness (new states, 1s Codex tmux scraping, plugin-spawned ctl calls); misclassified pane patterns could skew wait-for, but no auth or data-path changes.
Overview
Symmetric activity reporting for Claude, Codex, and OpenCode: shared states now include `compacting` and `error`, and the host can `agent wait-for` on them.

Claude adds managed-settings hooks for compaction (`PreCompact` / `PostCompact` with `--clear-pending`), turn failures (`StopFailure` → `error`), and subagent lifecycle (re-assert `working`).

Codex treats unreliable JSON hooks as defense-in-depth: the ctl daemon runs a tmux pane scraper (~1s `capture-pane`, ordered regex table, updates only on transitions) as the production path. Hooks JSON is reshaped to Codex 0.134’s `{ hooks: { … } }` layout, and `startCodexSession` passes `--enable hooks` and `--dangerously-bypass-hook-trust`.

OpenCode gets a baked plugin that maps bus events to `agentbox-ctl opencode-state`, plus `seedOpencodePlugin`, socket/client/reporter wiring, and `state` / `updatedAt` on box status (parity with Codex). Runtime staging and the box image copy the plugin alongside existing hook assets.

Tests cover scraper patterns, reporter compaction/error behavior, wait-state matching, and a less flaky supervisor parallelism timing threshold.
Reviewed by Cursor Bugbot for commit c9f445f.