feat(codex-fleet): plan-watcher supervisor + claude-runtime fallback#46
Merged
NagyVikt merged 1 commit intoMay 14, 2026
Conversation
Closes the longest-running pain in the fleet: workers stay idle after a new Colony plan publishes because their OVERRIDE pinning still points at an exhausted older plan. Previously every plan switch required manually tmux-send-keys'ing each pane — the screenshot from tonight shows 6/7 workers stuck on phase5/sub-1's stale blocker while typed-layout-dsl/ sub-0 sat untouched. Three pieces: 1. scripts/codex-fleet/plan-watcher.sh (NEW) Every 30s: parses `colony plan status` for plans with available > 0, detects idle worker panes (matching "no claimable work" / "polls returned" / "stale blocker" in recent stdout), and round-robins an OVERRIDE prompt to each idle pane via tmux load-buffer + paste-buffer. Per-(pane, plan) cooldown (10 min default) prevents stomping on workers mid-task. --dry-run + --once for verification; --loop for daemon mode. 2. scripts/codex-fleet/full-bringup.sh wiring plan-watcher added as a ticker_window daemon, sibling to force-claim / claim-release / stall-watcher. Plus the worker spawn line now branches on $CODEX_FLEET_RUNTIME: `codex` (default) or `claude` — the latter spawns `claude --print --permission-mode bypassPermissions` instead of the codex CLI for the fallback case when the codex account pool is exhausted. 3. scripts/codex-fleet/spawn-fleet.sh: add --runtime claude|codex flag that forwards CODEX_FLEET_RUNTIME to the bringup. Lets operators start a parallel fleet on Claude when codex is capped. Verification: live test on the running codex-fleet session — `plan- watcher --once` correctly dispatched 4 idle workers across 3 available plans (phase5, overlay-modulesplit, typed-layout-dsl) in one tick. Workers re-entered "Working" state immediately on prompt receipt. Trade-offs / follow-ups: - Claude runtime path is the scaffold; the worker-prompt template lives in scripts/codex-fleet/worker-prompt.md and was written for codex. It works against claude without modification but should grow a runtime-aware variant for full parity. - plan-watcher considers any plan with available > 0, including on-disk-only (unpublished) plans. Worker will fail to claim from unpublished plans and fall back to task_ready_for_agent. Filtering to published-only is a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
NagyVikt
added a commit
that referenced
this pull request
May 20, 2026
…t sweep) (#199) Audited docs/future/PROTOCOL.md against the actual repo + git history and flipped 20 stale `state: PROPOSED` lines to `state: SHIPPED` for items whose implementation has clearly landed on main. Each item's acceptance verification command runs green and the touched script/file already implements the proposal. Updates (section → slug → representative PR): - 1.4.1 Formal lifecycle states for every improvement entry → PR #171 (scripts/protocol/{protocol-state,check-states}.sh + state lines) - 7.4.1 --help/--version on stall-watcher.sh → PR #1 - 9.4.1 --help/--version on plan-watcher.sh → PR #46 - 10.4.1 --help/--version on review-queue.sh → PR #37 - 11.4.1 --help/--version on review-pane-scanner.sh → PR #41 - 12.4.1 --help/--version on auto-reviewer.sh → PR #99 - 17.4.1 --help/--version on show-fleet.sh → PR #139 - 18.4.1 --help/--version on token-meter.sh → PR #17 - 19.4.1 --help/--version on warm-pool.sh → PR #1 - 20.4.1 --help/--version on spawn-fleet.sh → PR #1 - 21.4.1 --help/--version on dispatch-plan.sh → PR #133 - 23.4.1 --help/--version on proactive-probe.sh → PR #1 - 24.4.1 --help/--version on claim-trigger.sh → PR #1 - 27.4.1 --help/--version on claude-supervisor.sh → PR #116 - 33.4.1 --help/--version on plan-tree-pin.sh → PR #1 - 36.4.1 --help/--version on supervisor.sh → PR #1 - 37.4.1 --help/--version on patch-codex-prompts.sh → PR #1 - 39.4.1 --help/--version on down.sh → PR #1 - 41.4.1 --help/--version on add-workers.sh → PR #1 - 42.4.1 --help/--version on codex-fleet-2.sh → PR #1 Diff is exactly 20 `- state: PROPOSED` → `- state: SHIPPED` line edits in docs/future/PROTOCOL.md; nothing else touched. Both governance checks still pass: bash scripts/protocol/check-states.sh # exit 0 bash scripts/protocol/protocol-state.sh --summary # SHIPPED: 0 → 20 PR refs live in this commit body / PR description rather than inline in the state line, because both check-states.sh and protocol-state.sh require the state token to be the bare enum value (e.g. `SHIPPED`); any suffix breaks both checks. Co-authored-by: NagyVikt <nagy.viktordp@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Automated by gx branch finish (PR flow).