fix(add-workers): pass discovered emails to cap-probe + retry discover without --exclude-tmux #137
Merged: NagyVikt merged 1 commit on May 15, 2026
Two cascading bugs in `scripts/codex-fleet/add-workers.sh` made the discovery wrapper report `only 0 healthy unallocated accounts available` and then fail with `no healthy accounts available`, even when at least one account in the canonical `agent-auth list` pool passed the 5h<100% / weekly<90% / not-already-active filter.

Root cause #1: cap-probe invoked with no email arguments

`pick_accounts()` called `cap-probe.sh "$need"` (only the count). cap-probe's usage is `<need_n> email1 email2 ...`: after `shift`, it iterates `for email in "$@"` over an empty list, probes nothing, and exits with 0 healthy rows. The wrapper then took the empty result as "0 healthy" and moved on.

Root cause #2: discover-accounts fails closed when the target tmux session is absent

`bash discover-accounts.sh --exclude-tmux <session>` runs `tmux list-panes -s -t <session> | sed | sort | tr | sed` under `set -eo pipefail`. On a host where `<session>` doesn't exist on the default tmux server (e.g. when running `add-workers.sh` outside the fleet session, or with `CODEX_FLEET_TMUX_SOCKET` unset so the wrapper falls back to the operator's default tmux server), tmux exits 1, pipefail propagates that status to the whole pipeline, the helper exits before reaching its python emitter, and the wrapper sees an empty tempfile. The wrapper then treated "empty" as "all candidates allocated" instead of "tmux filter unusable, retry without it".

Fix (surgical, in-file only)

1. After the first discover-accounts call, if the tempfile is empty, retry without `--exclude-tmux`. We still pass `--exclude-active`, so accounts already in `fleet-active-accounts.txt` are skipped.
2. Before invoking cap-probe, extract the email column from the discovered TSV and pass each email as a positional arg so cap-probe has something to probe. Empty discovery skips cap-probe entirely.
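The two wrapper-side fixes can be sketched as follows. This is a minimal, self-contained illustration, not the actual diff: `discover_accounts` and `cap_probe` are hypothetical stubs standing in for `discover-accounts.sh` and `cap-probe.sh`, the TSV layout (email in column 1), the stub data and the session name are assumptions, and the `|| true` on each discovery call assumes the wrapper tolerates the helper's non-zero exit, as the empty-tempfile behaviour described above suggests.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical stub for discover-accounts.sh. Emits TSV rows with the email in
# column 1 (layout assumed). When asked to --exclude-tmux it fails with no
# output, mimicking root cause #2 (tmux session missing under pipefail).
discover_accounts() {
  local arg
  for arg in "$@"; do
    if [[ $arg == --exclude-tmux ]]; then
      return 1
    fi
  done
  printf 'a@example.com\t42\t10\n'
  printf 'b@example.com\t99\t80\n'
}

# Hypothetical stub for cap-probe.sh (usage: <need_n> email1 email2 ...).
# Treats every email as healthy and prints at most need_n of them. Called with
# only a count, the loop has nothing to iterate over (root cause #1).
cap_probe() {
  local need=$1 count=0 email
  shift
  for email in "$@"; do
    if (( count >= need )); then
      break
    fi
    printf '%s\n' "$email"
    count=$((count + 1))
  done
}

# Sketch of the fixed pick_accounts flow.
pick_accounts() {
  local need=$1 session=$2 tsv email _rest
  local -a emails=()
  tsv=$(mktemp)

  # First attempt: skip fleet-active accounts and accounts bound to live panes.
  discover_accounts --exclude-active --exclude-tmux "$session" >"$tsv" || true

  # Fix 1: empty output may mean the tmux filter failed, so retry without it.
  # --exclude-active is kept so fleet-active accounts are still skipped.
  if [[ ! -s $tsv ]]; then
    discover_accounts --exclude-active >"$tsv" || true
  fi

  # Fix 2: collect the email column and pass it to cap-probe positionally.
  while IFS=$'\t' read -r email _rest; do
    if [[ -n $email ]]; then
      emails+=("$email")
    fi
  done <"$tsv"
  rm -f "$tsv"

  # Empty discovery skips cap-probe entirely.
  if (( ${#emails[@]} == 0 )); then
    return 0
  fi
  cap_probe "$need" "${emails[@]}"
}

# 'codex-fleet' is a hypothetical session name; with the stubs this falls
# through to the retry and prints both stub emails.
pick_accounts 2 codex-fleet
```

With these stubs, `pick_accounts 2 codex-fleet` prints `a@example.com` and `b@example.com`, while `cap_probe 2` with no email arguments prints nothing, which is root cause #1 in miniature. A `while read` loop is used instead of `mapfile` so the sketch also runs on bash 3.2.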
The helper-side bug (discover-accounts.sh exiting 1 when the tmux session is missing instead of treating an empty tmux query as "no live panes to exclude") is left untouched per the file-scope contract; the wrapper now compensates for it.

Verified on host 2026-05-16:

    bash -n scripts/codex-fleet/add-workers.sh                # exit 0
    docker run koalaman/shellcheck:stable …add-workers.sh     # only pre-existing findings
    bash scripts/codex-fleet/add-workers.sh 1 --dry-run       # picks admin-mite (1 healthy)
    bash scripts/codex-fleet/add-workers.sh 2 --dry-run       # picks 2 healthy
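Root cause #2 and the helper-side bug noted above both hinge on how `set -eo pipefail` treats a pipeline whose first stage fails. The sketch below reproduces that without tmux. `fake_list_panes` is a hypothetical stand-in for `tmux list-panes -s -t <session>` against a missing session, the helper's pipeline is simplified to `sort | tr`, and the "tolerant" variant only illustrates the helper-side behaviour described above (a failed query treated as "no live panes to exclude"); it is not a change made in this PR.

```shell
#!/usr/bin/env bash
# Each variant runs in a fresh `bash -c` so its errexit behaviour does not
# depend on how the caller invokes it (set -e is ignored inside if/||/&& contexts).

# Shared prelude. fake_list_panes is a hypothetical stand-in for
# `tmux list-panes` on a missing session: no stdout, exit status 1.
prelude='
set -eo pipefail
fake_list_panes() { echo "simulated: tmux session not found" >&2; return 1; }
'

# Strict: mirrors the helper. pipefail makes the pipeline exit 1, set -e aborts,
# and the "emitter" line is never reached.
strict_variant="${prelude}
fake_list_panes | sort | tr '\n' ' '
echo 'emitter reached'
"

# Tolerant (hypothetical helper-side fix): a failed tmux query yields an empty
# pane list and the script continues to its emitter.
tolerant_variant="${prelude}
{ fake_list_panes 2>/dev/null || true; } | sort | tr '\n' ' '
echo 'emitter reached'
"

# Runs one variant and reports its exit status and captured stdout.
run_variant() {
  local name=$1 script=$2 out rc=0
  out=$(bash -c "$script" 2>/dev/null) || rc=$?
  printf '%s: exit=%d output="%s"\n' "$name" "$rc" "$out"
}

run_variant strict "$strict_variant"
run_variant tolerant "$tolerant_variant"
```

Running it prints `strict: exit=1 output=""` followed by `tolerant: exit=0 output="emitter reached"`: the strict variant produces the empty result the wrapper used to misread as "all candidates allocated".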
Automated by gx branch finish (PR flow).