feat(state): complete activity-state coverage for Claude / Codex / OpenCode #11

Merged
madarco merged 4 commits into main from feat/agent-state-coverage on May 28, 2026

madarco (Owner) commented May 27, 2026

Summary

Stacks on top of PR #10 (the drive / agent / queue wait-for commands). PR #10 wired the command surface but the underlying state machine was lopsided: Claude reported 6 states, Codex hooks were staged but didn't fire (codex-cli 0.133.0 bug; confirmed still unreliable in 0.134.0), and OpenCode had no state reporting at all. This change makes all three agents report state symmetrically, adds first-class compacting and error states, and ships an OpenCode plugin to bridge its plugin-only extension surface.

Claude — fill in the missing hooks

  • PreCompact → compacting; PostCompact → working --clear-pending (defensive against /compact mid-plan).
  • StopFailure → error (cleared naturally by the next UserPromptSubmit).
  • SubagentStart/SubagentStop → working (subagents keep state at working from the parent's view; explicit re-assertions tighten the catchall PreToolUse race).
  • Adds 'compacting' | 'error' to ClaudeActivityState (shared by Codex/OpenCode via AgentActivityState).
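
The hook mapping above can be read as a lookup table plus one flag. The sketch below is illustrative only: `Status`, `HookEffect`, `HOOK_TO_STATE`, and `applyHook` are hypothetical names rather than the actual status-reporter code, and the `UserPromptSubmit` → `working` entry is an assumption (the PR only says that the next UserPromptSubmit clears `error`).

```typescript
// Six pre-existing states plus the two added in this PR.
type AgentActivityState =
  | "working" | "idle" | "waiting" | "end-plan" | "question" | "unknown"
  | "compacting" | "error";

// Hypothetical status shape: current state plus an optional pending plan/question payload.
interface Status {
  state: AgentActivityState;
  pending?: string;
}

interface HookEffect {
  state: AgentActivityState;
  clearPending?: boolean; // models the --clear-pending flag
}

const HOOK_TO_STATE: Record<string, HookEffect | undefined> = {
  PreCompact: { state: "compacting" },
  PostCompact: { state: "working", clearPending: true },
  StopFailure: { state: "error" },
  SubagentStart: { state: "working" },
  SubagentStop: { state: "working" },
  UserPromptSubmit: { state: "working" }, // assumption, see lead-in
};

// Applies one hook event; hooks outside the table leave the status untouched.
function applyHook(status: Status, hook: string): Status {
  const effect = HOOK_TO_STATE[hook];
  if (effect === undefined) return status;
  const next: Status = { state: effect.state };
  if (!effect.clearPending && status.pending !== undefined) {
    next.pending = status.pending;
  }
  return next;
}
```

Under these assumptions, a /compact issued mid-plan moves the status through `compacting` and back to `working` with the pending payload dropped, and `error` needs no sticky handling because any later hook simply overwrites it.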

Codex — tmux-pane scraper as the actual mechanism

Codex's ~/.codex/hooks.json hooks still fire unreliably in 0.134.0, even with --enable hooks --dangerously-bypass-hook-trust. The fallback (per plan approval) is a small scraper, packages/ctl/src/codex-scraper.ts, that polls tmux capture-pane -p -t codex every 1s, matches the output against an ordered pattern table, and pushes through the existing codex-state socket op only on transitions (no 1Hz heartbeat). Patterns cover trust dialogs / permission prompts (waiting), /compact (compacting), error frames (error), codex-specific active-work TUI fragments (working), and the model·cwd idle footer (idle). The hooks.json shape is also fixed (top-level events → { hooks: { Event: [...] } } to match codex 0.134.0's HooksFile struct), and startCodexSession now passes --enable hooks --dangerously-bypass-hook-trust as defense-in-depth for when codex's JSON-hook firing eventually becomes reliable.
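
A minimal sketch of the "ordered table, report only on transitions" idea follows. The regexes are placeholders rather than the real pattern table, and `classify` and `makeTransitionReporter` are hypothetical names. It takes pane text as a plain string so it runs without tmux.

```typescript
type ScrapedState = "waiting" | "compacting" | "error" | "working" | "idle";

// Ordered table: the first matching row wins. Placeholder regexes only.
const PATTERNS: ReadonlyArray<readonly [RegExp, ScrapedState]> = [
  [/untrusted contents|Allow command\?/i, "waiting"],
  [/compacting/i, "compacting"],
  [/^\s*error\b/im, "error"],
  [/\bWorking\b|esc to interrupt/i, "working"],
  [/\S+\s·\s\S+/, "idle"], // stands in for the model·cwd footer
];

// Returns the state of the first matching row, or undefined if nothing matches.
function classify(pane: string): ScrapedState | undefined {
  for (const [pattern, state] of PATTERNS) {
    if (pattern.test(pane)) return state;
  }
  return undefined;
}

// Wraps a push callback so it is invoked only when the classified state changes.
function makeTransitionReporter(push: (state: ScrapedState) => void) {
  let last: ScrapedState | undefined;
  return (pane: string): void => {
    const next = classify(pane);
    if (next === undefined || next === last) return;
    last = next;
    push(next);
  };
}
```

Row order is what keeps the directory-trust prompt's "Working with untrusted contents" from being read as working: the waiting row is checked first. In the real scraper the pane text would come from the 1s tmux capture-pane poll and `push` would be the codex-state socket op.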

OpenCode — first-class plugin

OpenCode has no hooks system but does have a plugin event bus. The new packages/sandbox-docker/scripts/opencode-agentbox-plugin.js subscribes to session.idle, permission.asked, tool.execute.before, session.compacted, session.error, etc. and shells out to agentbox-ctl opencode-state for each transition. It is baked into the image and seeded into <vol>/config/plugins/agentbox-state.js by a new seedOpencodePlugin helper (mirrors seedCodexHooks). A new opencode-state socket op, a setOpencodeState reporter method, and a state field on BoxStatusOpencode let the host run agent state <opencode-box> / agent wait-for waiting <opencode-box>.

Surface in agentbox agent + skill doc

  • agent wait-for compacting | error now work.
  • ~/.claude/skills/agentbox/SKILL.md updated: the agent section is no longer Claude-only and documents the three sources (Claude hooks / Codex scraper / OpenCode plugin).

Test plan

  • pnpm -r typecheck clean
  • pnpm lint clean
  • pnpm -r test — 327 + 1 skipped (cloud-e2e); 22 new tests:
    • status-reporter.test.ts (+3): compacting round-trip, error non-sticky, subagent re-assertion.
    • codex-scraper.test.ts (+12): pattern table priority + false-positive guards (the directory-trust prompt's "Working with untrusted contents" must NOT match working), change-detection across session up/down, baseline idle on session reappearance.
    • agent-state.test.ts (+4): new compacting / error matchers; stale-payload guard.
  • Live-verified on a fresh docker box:
    • On a box created with agentbox codex --no-attach, the scraper picked up the directory-trust dialog as waiting, then the model·cwd idle footer as idle after the trust prompts cleared. This confirms the daemon-scraper wiring and the new pattern table on a real codex 0.134.0 TUI.
  • OpenCode plugin live-verify deferred: it requires an OpenCode provider login to be set up; the seed step + opencode-state wire op are unit-validated.
  • Cloud provider parity (Daytona / Hetzner) — these snapshots pick up the new managed-settings / hooks file / plugin automatically on the next agentbox prepare. No code changes needed.

Note

Medium Risk
Changes how host automation observes agent readiness (new states, 1s Codex tmux scraping, plugin-spawned ctl calls); misclassified pane patterns could skew wait-for, but no auth or data-path changes.

Overview
Symmetric activity reporting for Claude, Codex, and OpenCode: shared states now include compacting and error, and the host can agent wait-for on them.

Claude adds managed-settings hooks for compaction (PreCompact / PostCompact with --clear-pending), turn failures (StopFailure → error), and subagent lifecycle (re-assert working).

Codex treats unreliable JSON hooks as defense-in-depth: the ctl daemon runs a tmux pane scraper (~1s capture-pane, ordered regex table, updates only on transitions) as the production path. Hooks JSON is reshaped to Codex 0.134’s { hooks: { … } } layout, and startCodexSession passes --enable hooks and --dangerously-bypass-hook-trust.

OpenCode gets a baked plugin that maps bus events to agentbox-ctl opencode-state, plus seedOpencodePlugin, socket/client/reporter wiring, and state / updatedAt on box status (parity with Codex). Runtime staging and the box image copy the plugin alongside existing hook assets.

Tests cover scraper patterns, reporter compaction/error behavior, wait-state matching, and a less flaky supervisor parallelism timing threshold.

Reviewed by Cursor Bugbot for commit c9f445f.

madarco added 4 commits May 28, 2026 00:51
…enCode

Previously the in-box activity-state pipeline was lopsided:
- Claude reported 6 states via lifecycle hooks (working/idle/waiting/end-plan/question/unknown).
- Codex hooks were staged but didn't fire in codex-cli 0.133.0; state stayed `unknown` in
  production. Bumping to 0.134.0 doesn't fix it (the JSON-hook firing path is still
  unreliable in TUI mode even with `--enable hooks --dangerously-bypass-hook-trust`).
- OpenCode had zero state reporting — only the tmux session probe.

This change makes all three agents report state symmetrically with first-class
`compacting` and `error` values added to the union.

**Claude — fill in the missing hooks** (packages/sandbox-docker/scripts/claude-managed-settings.json):
- PreCompact → `compacting`, PostCompact → `working --clear-pending` (clears any
  pending plan/question payload too — defensive against `/compact` mid-plan).
- StopFailure → `error` (cleared naturally by the next UserPromptSubmit).
- SubagentStart/Stop → `working` (subagents keep state at working from the parent's view).

Extends `ClaudeActivityState` (shared by Codex/OpenCode via `AgentActivityState`) with
`'compacting' | 'error'`. No new sticky logic needed — only `working` is special-cased.

**Codex — tmux-pane scraper** (packages/ctl/src/codex-scraper.ts):
- Polls `tmux capture-pane -p -t codex` every 1s, matches against an ordered pattern
  table, pushes through the existing `codex-state` socket op only on transitions.
- Patterns cover: trust dialogs (`waiting`), permission prompts (`waiting`), compaction
  (`compacting`), error frames (`error`), codex-specific active-work TUI fragments
  (`working`), and the model·cwd idle footer (`idle`).
- The hooks.json shape is also corrected (was top-level events; codex 0.134.0 expects
  `{ hooks: { Event: [...] } }`) and `startCodexSession` now passes `--enable hooks
  --dangerously-bypass-hook-trust` — defense-in-depth for when codex's JSON-hook firing
  becomes reliable.
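
The shape change can be pictured as wrapping the old top-level event map in a `hooks` object. This is a hedged sketch: `toNestedHooksFile`, `isNested`, and the type names are hypothetical, and the entry contents are left opaque because this PR does not show them.

```typescript
// Entry contents are not shown in this PR, so they are kept opaque here.
type HookEntry = Record<string, unknown>;

// Old layout: event names at the top level.
type LegacyHooksFile = Record<string, HookEntry[]>;

// New layout: { hooks: { Event: [...] } }.
interface HooksFile {
  hooks: Record<string, HookEntry[]>;
}

// True when the file already has a non-array `hooks` object at the top level.
function isNested(file: LegacyHooksFile | HooksFile): file is HooksFile {
  const hooks = (file as { hooks?: unknown }).hooks;
  return typeof hooks === "object" && hooks !== null && !Array.isArray(hooks);
}

// Wraps a legacy file in the nested layout; already-nested files pass through.
function toNestedHooksFile(file: LegacyHooksFile | HooksFile): HooksFile {
  if (isNested(file)) return file;
  return { hooks: { ...file } };
}
```

Making the conversion idempotent means a seeding step could run it on every boot without double-wrapping an already-migrated file.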

**OpenCode — first-class plugin** (packages/sandbox-docker/scripts/opencode-agentbox-plugin.js):
- New plugin that subscribes to OpenCode's plugin event bus (`session.idle`,
  `permission.asked`, `tool.execute.before`, `session.compacted`, `session.error`, ...)
  and shells `agentbox-ctl opencode-state` for each lifecycle transition.
- Baked into the image at `/usr/local/share/agentbox/opencode-agentbox-plugin.js`;
  seeded into `<vol>/config/plugins/agentbox-state.js` by a new `seedOpencodePlugin`
  helper (mirrors `seedCodexHooks`).
- New `opencode-state` socket op + `setOpencodeState` reporter method + `state` field
  on `BoxStatusOpencode` so the host can `agent state` / `agent wait-for` against
  OpenCode boxes too.

**Surface in `agentbox agent`** (apps/cli/src/lib/wait/agent-state.ts + skill doc):
- `agent wait-for compacting | error` now work.
- Skill doc updated to call out that `agent` is no longer Claude-only.

22 new unit tests: 6 cover the new states + sticky / clearPending interactions
(packages/ctl/test/status-reporter.test.ts), 12 cover the scraper's pattern table and
change-detection logic (packages/ctl/test/codex-scraper.test.ts), 4 cover the new
agent-wait-state matchers (apps/cli/test/agent-state.test.ts). Live-verified on a
fresh docker box: codex state transitions waiting → idle as the trust dialog clears;
all 327 + 1 skipped tests pass; typecheck + lint clean.

… to this PR)

The supervisor `concurrent independent tasks run in parallel` test flaked
on PR #11's CI run with 733ms observed against a 700ms cap — a 33ms slip on a
slow GitHub Actions runner. The test isn't touched by this PR, just caught
in the cross-fire.

Double the per-task delay (400ms → 800ms) so the parallel/sequential gap is
~600ms of headroom instead of ~50ms, and bump the threshold proportionally
(700ms → 1400ms). Sequential execution would now take ≥1600ms + 2× spawn
(~1700-1900ms) so the test still catches the regression it was written for;
local run completes in 829ms.

OpenCode persists the selected model in ~/.local/state/opencode/model.json,
a third XDG dir AgentBox synced in neither direction — so a box booted with
OpenCode's default (Gemini) instead of the host's choice, and a model picked
inside a box was lost on destroy/recreate.

Docker: relocate the state dir into the existing opencode volume via
XDG_STATE_HOME, and seed host model.json into it newest-wins (--update) so a
stale host file can't clobber an in-box selection; the shared volume then
carries it across recreate.
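
The newest-wins rule can be stated as a small predicate on modification times. This is a sketch of the decision only, with `shouldSeed` as a hypothetical name; the commit does not say which tool supplies `--update`, and the equal-timestamp behavior below is an assumption.

```typescript
// Decides whether the host's model.json should overwrite the in-box copy.
// Times are modification times in milliseconds; undefined means the file is missing.
function shouldSeed(
  hostMtimeMs: number | undefined,
  boxMtimeMs: number | undefined,
): boolean {
  if (hostMtimeMs === undefined) return false; // nothing on the host to seed
  if (boxMtimeMs === undefined) return true; // box has no selection yet
  return hostMtimeMs > boxMtimeMs; // only a strictly newer host file wins (assumed)
}
```

With this rule a model picked inside the box after the host file was last written survives the next seed, which is the clobbering case the commit calls out.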

Cloud (Daytona + Hetzner): seed host model.json into the box's default state
path per-create from the shared cloud create flow. Cloud has no persistent
per-box store, so each box reflects the current host model.

Live smoke against opencode 1.15.11 showed the original event map never
produced a `working` state:
- `tool.execute.before` does NOT reach the plugin event bus in 1.15.
- `message.updated` fires once AFTER `session.idle`, so mapping it to
  `working` would leave the box stuck `working` at end of turn.

Switch the working signal to `message.part.delta` (streamed tokens), which
fires only during active generation and always before `session.idle`. Add a
last-state dedupe so a streamed turn costs ~2 `agentbox-ctl` spawns (working
on first delta, idle on session.idle) instead of one per delta (~50).
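
A rough sketch of the last-state dedupe, reduced to the two mappings this commit spells out (other events are omitted because their target states are not given here). `EVENT_TO_STATE`, `makeDedupingReporter`, and the `spawn` callback are hypothetical names, not the plugin's real API.

```typescript
type OpencodeState = "working" | "idle";

// Only the two mappings described in this commit.
const EVENT_TO_STATE: ReadonlyMap<string, OpencodeState> = new Map([
  ["message.part.delta", "working"],
  ["session.idle", "idle"],
]);

// Calls `spawn` only when the mapped state differs from the last one reported.
function makeDedupingReporter(spawn: (state: OpencodeState) => void) {
  let last: OpencodeState | undefined;
  return (eventType: string): void => {
    const next = EVENT_TO_STATE.get(eventType);
    if (next === undefined || next === last) return; // unmapped or unchanged
    last = next;
    spawn(next);
  };
}
```

Fed fifty `message.part.delta` events followed by one `session.idle`, the reporter calls `spawn` twice. In the plugin, `spawn` would be the place that shells out to `agentbox-ctl opencode-state`; its exact arguments are not shown in this PR.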

Verified end-to-end: state transitions idle → working (t+3–8s of a long
generation) → idle (session.idle), sampled live on a docker box.

madarco merged commit a557665 into main on May 28, 2026
1 check passed
madarco added a commit that referenced this pull request May 28, 2026
…expiry)

OIDC dev tokens expire on a ~12h cycle and resolveCredentials has no headless
refresh, so a long `agentbox prepare --provider vercel` (or CI) can outlive the
token. Document the access-token trio as the recommended path for long/headless
jobs in cloud-providers.md, and surface the same guidance in the `vercel login`
prompt + option labels.

Closes backlog #11 (documentation half; auto-refresh stays unbuilt).