Skip to content

feat(codex-fleet): plan-watcher supervisor + claude-runtime fallback#46

Merged
NagyVikt merged 1 commit into
mainfrom
agent/claude/plan-watcher-supervisor-and-fleet-spawn-2026-05-14-23-27
May 14, 2026
Merged

feat(codex-fleet): plan-watcher supervisor + claude-runtime fallback#46
NagyVikt merged 1 commit into
mainfrom
agent/claude/plan-watcher-supervisor-and-fleet-spawn-2026-05-14-23-27

Conversation

@NagyVikt
Copy link
Copy Markdown
Contributor

Automated by gx branch finish (PR flow).

Closes the longest-running pain in the fleet: workers stay idle after a
new Colony plan publishes because their OVERRIDE pinning still points at
an exhausted older plan. Previously every plan switch required manually
tmux-send-keys'ing each pane — the screenshot from tonight shows 6/7
workers stuck on phase5/sub-1's stale blocker while typed-layout-dsl/
sub-0 sat untouched.

Three pieces:

1. scripts/codex-fleet/plan-watcher.sh (NEW)
   Every 30s: parses `colony plan status` for plans with available > 0,
   detects idle worker panes (matching "no claimable work" / "polls
   returned" / "stale blocker" in recent stdout), and round-robins an
   OVERRIDE prompt to each idle pane via tmux load-buffer + paste-buffer.
   Per-(pane, plan) cooldown (10 min default) prevents stomping on
   workers mid-task. --dry-run + --once for verification; --loop for
   daemon mode.

2. scripts/codex-fleet/full-bringup.sh wiring
   plan-watcher added as a ticker_window daemon, sibling to
   force-claim / claim-release / stall-watcher. Plus the worker spawn
   line now branches on $CODEX_FLEET_RUNTIME: `codex` (default) or
   `claude` — the latter spawns `claude --print --permission-mode
   bypassPermissions` instead of the codex CLI for the fallback case
   when the codex account pool is exhausted.

3. scripts/codex-fleet/spawn-fleet.sh: add --runtime claude|codex flag
   that forwards CODEX_FLEET_RUNTIME to the bringup. Lets operators
   start a parallel fleet on Claude when codex is capped.

Verification: live test on the running codex-fleet session — `plan-
watcher --once` correctly dispatched 4 idle workers across 3 available
plans (phase5, overlay-modulesplit, typed-layout-dsl) in one tick.
Workers re-entered "Working" state immediately on prompt receipt.

Trade-offs / follow-ups:
- Claude runtime path is the scaffold; the worker-prompt template lives
  in scripts/codex-fleet/worker-prompt.md and was written for codex.
  It works against claude without modification but should grow a
  runtime-aware variant for full parity.
- plan-watcher considers any plan with available > 0, including
  on-disk-only (unpublished) plans. Worker will fail to claim from
  unpublished plans and fall back to task_ready_for_agent. Filtering
  to published-only is a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@NagyVikt NagyVikt merged commit 7aba240 into main May 14, 2026
@NagyVikt NagyVikt deleted the agent/claude/plan-watcher-supervisor-and-fleet-spawn-2026-05-14-23-27 branch May 14, 2026 21:36
NagyVikt added a commit that referenced this pull request May 20, 2026
…t sweep) (#199)

Audited docs/future/PROTOCOL.md against the actual repo + git history and
flipped 20 stale `state: PROPOSED` lines to `state: SHIPPED` for items
whose implementation has clearly landed on main. Each item's acceptance
verification command runs green and the touched script/file already
implements the proposal.

Updates (section → slug → representative PR):

- 1.4.1 Formal lifecycle states for every improvement entry → PR #171
  (scripts/protocol/{protocol-state,check-states}.sh + state lines)
- 7.4.1 --help/--version on stall-watcher.sh → PR #1
- 9.4.1 --help/--version on plan-watcher.sh → PR #46
- 10.4.1 --help/--version on review-queue.sh → PR #37
- 11.4.1 --help/--version on review-pane-scanner.sh → PR #41
- 12.4.1 --help/--version on auto-reviewer.sh → PR #99
- 17.4.1 --help/--version on show-fleet.sh → PR #139
- 18.4.1 --help/--version on token-meter.sh → PR #17
- 19.4.1 --help/--version on warm-pool.sh → PR #1
- 20.4.1 --help/--version on spawn-fleet.sh → PR #1
- 21.4.1 --help/--version on dispatch-plan.sh → PR #133
- 23.4.1 --help/--version on proactive-probe.sh → PR #1
- 24.4.1 --help/--version on claim-trigger.sh → PR #1
- 27.4.1 --help/--version on claude-supervisor.sh → PR #116
- 33.4.1 --help/--version on plan-tree-pin.sh → PR #1
- 36.4.1 --help/--version on supervisor.sh → PR #1
- 37.4.1 --help/--version on patch-codex-prompts.sh → PR #1
- 39.4.1 --help/--version on down.sh → PR #1
- 41.4.1 --help/--version on add-workers.sh → PR #1
- 42.4.1 --help/--version on codex-fleet-2.sh → PR #1

Diff is exactly 20 `- state: PROPOSED` → `- state: SHIPPED` line edits
in docs/future/PROTOCOL.md; nothing else touched. Both governance
checks still pass:

  bash scripts/protocol/check-states.sh                # exit 0
  bash scripts/protocol/protocol-state.sh --summary    # SHIPPED: 0 → 20

PR refs live in this commit body / PR description rather than inline in
the state line, because both check-states.sh and protocol-state.sh
require the state token to be the bare enum value (e.g. `SHIPPED`); any
suffix breaks both checks.

Co-authored-by: NagyVikt <nagy.viktordp@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant