
pkg: initial repo packaging — README, LICENSE, install.sh, skill, docs#1

Merged
NagyVikt merged 1 commit into main from pkg/initial-package
May 14, 2026


@NagyVikt

First packaging commit for the extracted repo.

What changed

  • Moves the 296-commit subtree-split history into scripts/codex-fleet/
    so lib/_env.sh's CODEX_FLEET_REPO_ROOT autodetect ($DIR/../../..)
    still resolves to the repo root.
  • Adds public-facing packaging:
    • README.md — install + usage + deps
    • LICENSE — MIT
    • .gitignore — accounts.yml + caches
    • install.sh — skill symlink + accounts.yml seed + env hints
    • skills/codex-fleet/SKILL.md — orchestrator skill (lifted from
      ~/.claude/skills/codex-fleet/SKILL.md; $HOME/Documents/recodee
      refs replaced with $CODEX_FLEET_REPO_ROOT)
    • docs/recodee-extras.md — codex-gpu-embedder + workspace colony
      rebuild recipes pulled out of SKILL.md (recodee-internal)
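
The directory arithmetic behind the autodetect can be checked with a small sketch. It assumes `$DIR` is the directory holding `lib/_env.sh` (that is, `scripts/codex-fleet/lib`); the script itself is not shown here, and the checkout path below is hypothetical.

```python
from pathlib import PurePosixPath


def repo_root_from_lib_dir(lib_dir: str) -> str:
    """Mimic "$DIR/../../.." by walking three levels up from lib_dir."""
    # parents[0] is the direct parent, so parents[2] is three levels up.
    return str(PurePosixPath(lib_dir).parents[2])


# Hypothetical checkout location, used only for illustration.
lib_dir = "/srv/codex-fleet/scripts/codex-fleet/lib"
print(repo_root_from_lib_dir(lib_dir))
```

With the subtree placed under `scripts/codex-fleet/`, three levels up from `lib` lands on the repository root, which is why the move preserves the autodetect.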

Test plan

  • Tree builds cleanly (`find . -type f | wc -l`)
  • No `/home/deadpool/Documents/recodee` leaks in SKILL.md
  • Fresh clone → `bash install.sh` symlinks the skill (Phase E)
  • `CODEX_FLEET_REPO_ROOT=$HOME/Documents/recodee bash scripts/codex-fleet/full-bringup.sh --n 2 --no-attach` works (Phase E)

🤖 Generated with Claude Code

Moves the 296-commit subtree-split history into scripts/codex-fleet/ so
the canonical lib/_env.sh CODEX_FLEET_REPO_ROOT autodetect ("$DIR/../../..")
still resolves to the repo root after extraction. Adds the public-facing
packaging files that did not exist when this directory lived inside
recodee/scripts/codex-fleet/:

  README.md              install + usage + dependencies
  LICENSE                MIT
  .gitignore             accounts.yml + log/cache patterns
  install.sh             skill symlink + accounts.yml seed + env hints
  skills/codex-fleet/    Claude Code orchestrator skill (lifted from
                         ~/.claude/skills/codex-fleet/SKILL.md with
                         recodee absolute paths replaced by
                         $CODEX_FLEET_REPO_ROOT)
  docs/recodee-extras.md codex-gpu-embedder + workspace colony rebuild
                         recipes pulled out of SKILL.md — recodee-internal,
                         not relevant to public consumers

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@NagyVikt NagyVikt merged commit 8b8bf27 into main May 14, 2026
@NagyVikt NagyVikt deleted the pkg/initial-package branch May 14, 2026 13:24
NagyVikt added a commit that referenced this pull request May 14, 2026
)

Adds the post-merge quality rail (improvement #1) and the checkpoint
runner (improvement #3's data half), built around a single shared
scorer primitive. The two surfaces invoke the same Python script with
different inputs and different sinks — same prompt, same I/O shape,
just different trigger and reader.

## The scorer (scripts/codex-fleet/lib/score-diff.py)

Reads {diff, criteria, mode, pr_title} JSON on stdin, asks Claude
whether the diff demonstrably satisfies each acceptance criterion in
plan.md, writes {score, criteria_met, criteria_missed, reasoning}
JSON on stdout. Stdlib only (urllib + json) — no pip dependency.

The prompt is deliberately narrow: "did the diff satisfy these listed
criteria" rather than "is this code good." That makes the call
reading-comprehension rather than aesthetic judgment, which is
tractable for an LLM and reasonably deterministic across runs. The
exact same call shape works for "merged diff" (improvement #1) and
"diff so far" (improvement #3), which is what lets one primitive
serve both.
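
A minimal sketch of that I/O shape, not the shipped scorer. `judge` is a hypothetical stand-in for the Claude call, the percentage-style score and the `"merged"` mode name are assumptions (only `"checkpoint"` is named in this description), and the substring check exists only so the example runs offline.

```python
import io
import json


def score_request(request: dict, judge) -> dict:
    """Turn a {diff, criteria, mode, pr_title} request into a verdict dict.

    `judge(criterion, diff)` is a hypothetical stand-in for the model call;
    it returns True when the diff demonstrably satisfies the criterion.
    """
    met, missed = [], []
    for criterion in request["criteria"]:
        (met if judge(criterion, request["diff"]) else missed).append(criterion)
    total = len(met) + len(missed)
    # Assumed scale: percentage of criteria met, 0 when none are listed.
    score = round(100 * len(met) / total) if total else 0
    return {
        "score": score,
        "criteria_met": met,
        "criteria_missed": missed,
        "reasoning": f"{len(met)} of {total} criteria matched ({request['mode']})",
    }


def run(stdin, stdout, judge) -> None:
    """Read one JSON request from stdin and write one JSON verdict to stdout."""
    json.dump(score_request(json.load(stdin), judge), stdout)


def naive_judge(criterion: str, diff: str) -> bool:
    # Offline placeholder: a real judge would ask the model, not match substrings.
    return criterion.lower() in diff.lower()


request = {
    "diff": "+ adds README.md\n+ adds LICENSE",
    "criteria": ["README.md", "install.sh"],
    "mode": "merged",  # assumed name for the post-merge mode
    "pr_title": "pkg: initial repo packaging",
}
out = io.StringIO()
run(io.StringIO(json.dumps(request)), out, naive_judge)
verdict = json.loads(out.getvalue())
```

Because both surfaces send the same request keys, only `diff` and `mode` differ between a post-merge call and a checkpoint call.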

## Improvement #1: post-merge quality rail

`scripts/codex-fleet/score-merged-pr.sh <PR>`:
1. gh pr view → branch, title, agent-id (parsed from agent/<owner>/...)
2. gh pr diff → full unified diff
3. find openspec/plans/<slug>/plan.md whose slug appears in the branch
   or title; extract its `## Acceptance Criteria` section
4. invoke lib/score-diff.py
5. merge the verdict into /tmp/claude-viz/fleet-quality-scores.json
   keyed by agent-id
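
Steps 1, 3, and 5 can be sketched as below. This is an illustration under stated assumptions, not the wrapper's code: the agent-id is taken to be the `<owner>` segment of the branch, the scores file is taken to be a flat map from agent-id to verdict, and all three function names are hypothetical.

```python
import json
import re
import tempfile
from pathlib import Path


def extract_section(markdown: str, heading: str = "## Acceptance Criteria") -> str:
    """Return the lines under `heading`, stopping at the next `## ` heading."""
    body, inside = [], False
    for line in markdown.splitlines():
        if line.strip() == heading:
            inside = True
        elif inside and line.startswith("## "):
            break
        elif inside:
            body.append(line)
    return "\n".join(body).strip()


def agent_id_from_branch(branch: str):
    """Assumed rule: the agent-id is the <owner> segment of agent/<owner>/..."""
    match = re.match(r"agent/([^/]+)/", branch)
    return match.group(1) if match else None


def merge_verdict(path: Path, agent_id: str, verdict: dict) -> dict:
    """Merge one verdict into the JSON file at `path`, keyed by agent-id."""
    scores = json.loads(path.read_text()) if path.exists() else {}
    scores[agent_id] = verdict
    path.write_text(json.dumps(scores, indent=2))
    return scores


plan = (
    "# Plan\n\n## Acceptance Criteria\n- README exists\n"
    "- install.sh symlinks the skill\n\n## Notes\nn/a\n"
)
criteria = extract_section(plan)

with tempfile.TemporaryDirectory() as tmp:
    scores_path = Path(tmp) / "fleet-quality-scores.json"
    merge_verdict(scores_path, "alpha", {"score": 80})
    merged = merge_verdict(
        scores_path, agent_id_from_branch("agent/beta/fix-x"), {"score": 40}
    )
```

Merging by key rather than overwriting the file keeps earlier agents' verdicts intact when a new PR is scored.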

`rust/fleet-data/src/scores.rs` reads that JSON with a 5s TTL cache.
`fleet::join` now takes `&ScoresFile` as a third argument and folds
each agent's score into `WorkerRow.quality: Option<u8>`. `fleet-state`
renders it as a third rail (Done axis: high → green, low → red)
between the WEEKLY/5H rails and the STATUS chip. Un-scored agents get
a same-width blank so column alignment is preserved.
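
The reader side is Rust, but the caching idea can be illustrated in Python for consistency with the scorer. This is a sketch only: `TtlJsonCache` and `quality_for` are hypothetical names, the file schema (agent-id to `{"score": ...}`) is an assumption, and a fake clock replaces real time so the behavior is visible.

```python
import json
import tempfile
import time
from pathlib import Path


class TtlJsonCache:
    """Re-read a JSON file at most once per `ttl` seconds."""

    def __init__(self, path, ttl=5.0, clock=time.monotonic):
        self.path, self.ttl, self.clock = Path(path), ttl, clock
        self._loaded_at = None
        self._value = {}

    def get(self) -> dict:
        now = self.clock()
        if self._loaded_at is None or now - self._loaded_at >= self.ttl:
            try:
                self._value = json.loads(self.path.read_text())
            except (OSError, ValueError):
                # Missing or malformed file: treat every agent as un-scored.
                self._value = {}
            self._loaded_at = now
        return self._value


def quality_for(scores: dict, agent_id: str):
    """Return the agent's score if it fits a u8, else None (like Option<u8>)."""
    entry = scores.get(agent_id)
    if not isinstance(entry, dict):
        return None
    score = entry.get("score")
    if isinstance(score, bool) or not isinstance(score, int):
        return None
    return score if 0 <= score <= 255 else None


fake_now = [0.0]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "scores.json"
    path.write_text(json.dumps({"alpha": {"score": 90}}))
    cache = TtlJsonCache(path, ttl=5.0, clock=lambda: fake_now[0])
    first = quality_for(cache.get(), "alpha")
    path.write_text(json.dumps({"alpha": {"score": 10}}))
    fake_now[0] = 4.0
    cached = quality_for(cache.get(), "alpha")     # inside the TTL window
    fake_now[0] = 5.0
    refreshed = quality_for(cache.get(), "alpha")  # TTL elapsed, file re-read
    unscored = quality_for(cache.get(), "beta")
```

Returning `None` for an unknown agent mirrors the un-scored case, where the dashboard draws a blank of the same width instead of a rail.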

The score is **advisory**. It does not feed routing, claim, or
autopilot decisions; the dashboard displays it for the operator to
eyeball. The Python prompt scores criteria, not vibe — but it is still
an LLM call, non-deterministic, and gameable, so treat low numbers as
"look at this," not a fail.

## Improvement #3: checkpoint scorer (data half)

`scripts/codex-fleet/score-checkpoint.sh`:
- Scans `.omc/agent-worktrees/*` for active agent worktrees.
- For each, computes the diff `origin/main..HEAD`, finds the matching
  plan.md, calls lib/score-diff.py in `mode: "checkpoint"`.
- Writes /tmp/claude-viz/fleet-checkpoint-warnings.json keyed by
  agent-id.
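
The scan loop might look roughly like the sketch below. It assumes the agent-id is the worktree directory name, that worktrees with an empty diff are skipped, and that `pr_title` is left empty in checkpoint mode; none of these is confirmed here. `diff_for` and `criteria_for` are hypothetical hooks so the loop can run without a git checkout.

```python
import subprocess
import tempfile
from pathlib import Path


def git_diff(worktree: Path) -> str:
    """Diff of origin/main..HEAD inside one worktree (requires git)."""
    result = subprocess.run(
        ["git", "-C", str(worktree), "diff", "origin/main..HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def checkpoint_requests(worktrees_root, diff_for, criteria_for) -> dict:
    """Build one checkpoint-mode scorer request per agent worktree."""
    requests = {}
    for worktree in sorted(Path(worktrees_root).iterdir()):
        if not worktree.is_dir():
            continue
        diff = diff_for(worktree)
        if not diff:
            continue  # assumed: nothing beyond origin/main means nothing to score
        requests[worktree.name] = {
            "diff": diff,
            "criteria": criteria_for(worktree),
            "mode": "checkpoint",
            "pr_title": "",  # assumed: no PR exists yet at checkpoint time
        }
    return requests


with tempfile.TemporaryDirectory() as tmp:
    for name in ("agent-a", "agent-b"):
        (Path(tmp) / name).mkdir()
    fake_diffs = {"agent-a": "+ new line", "agent-b": ""}
    reqs = checkpoint_requests(
        tmp,
        diff_for=lambda w: fake_diffs[w.name],   # swap in git_diff for real use
        criteria_for=lambda w: ["criterion 1"],
    )
```

Each request has the same keys the post-merge path sends, so the scorer needs no checkpoint-specific parsing.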

Intended cron cadence is ~15min. The colony task_post integration that
turns a low checkpoint score into an actual blocker lives in a
follow-up PR (separate repo: recodee/colony) — this lands the data
half so the trigger half has something to read.

## Why one PR for #1 and #3

The user's sharpening of the plan was explicit: "Build the
diff-scorer once, use it post-merge for #1, then reuse it at
checkpoints for #3." Shipping them in one lane keeps the prompt + JSON
schema + reader path single-sourced. A prompt tweak for one improves
both; a schema break breaks both visibly at once.

## Verification

cargo build --workspace: clean.
cargo test -p fleet-data: 42/42 pass (was 34, +6 scores, +2 quality
                          join wiring tests).
bash -n on both wrapper scripts: clean.
python3 -c "ast.parse(...)" on the scorer: clean.

The actual API call cannot be exercised inside this session without
an outbound network + ANTHROPIC_API_KEY — the operator runs
`./scripts/codex-fleet/score-merged-pr.sh <recent-PR>` once to
populate `/tmp/claude-viz/fleet-quality-scores.json` and confirm the
rail lights up in fleet-state. Mock data shape verified by parses_fixture.

Co-authored-by: NagyVikt <nagy.viktordp@gmail.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
NagyVikt added a commit that referenced this pull request May 15, 2026
…r without --exclude-tmux (#137)

Two cascading bugs in `scripts/codex-fleet/add-workers.sh` made the discovery
wrapper report `only 0 healthy unallocated accounts available` and then fail
with `no healthy accounts available`, even when ≥1 account in the canonical
`agent-auth list` pool passed the 5h<100% / weekly<90% / not-already-active
filter.

Root cause #1 — cap-probe invoked with no email arguments

`pick_accounts()` called `cap-probe.sh "$need"` (only the count). cap-probe's
usage is `<need_n> email1 email2 ...`: after `shift`, it iterates
`for email in "$@"` over an empty list, probes nothing, and exits with 0
healthy rows. The wrapper then took the empty result as "0 healthy" and
moved on.

Root cause #2 — discover-accounts fail-closes when target tmux session is absent

`bash discover-accounts.sh --exclude-tmux <session>` runs
`tmux list-panes -s -t <session> | sed | sort | tr | sed` under
`set -eo pipefail`. On a host where `<session>` doesn't exist on the
default tmux server (e.g. running `add-workers.sh` outside the fleet
session, or with `CODEX_FLEET_TMUX_SOCKET` unset so the wrapper degrades
to the operator's default tmux), tmux exits 1, pipefail kicks in, the
helper exits before reaching its python emitter, and the wrapper sees an
empty tempfile. The wrapper then treated empty as "all candidates
allocated" instead of "tmux filter unusable, retry without it".

Fix (surgical, in-file only)

1. After the first discover-accounts call, if the tempfile is empty,
   retry without `--exclude-tmux`. We still keep `--exclude-active` so
   accounts already in `fleet-active-accounts.txt` are skipped.
2. Before invoking cap-probe, extract the email column from the
   discovered TSV and pass each email as a positional arg so cap-probe
   has something to probe. Empty discovery skips cap-probe entirely.
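
The control flow of the two fixes, sketched in Python rather than the wrapper's bash. `discover` and `probe` are hypothetical stand-ins for `discover-accounts.sh` and `cap-probe.sh`, and the email is assumed to sit in the first TSV column; the example addresses are made up.

```python
def pick_accounts(need, discover, probe):
    """Return up to `need` healthy emails.

    `discover(exclude_tmux)` returns TSV text and `probe(need, emails)`
    returns the healthy subset; both are injected so the flow is testable.
    """
    tsv = discover(exclude_tmux=True)
    if not tsv.strip():
        # Fix 1: empty output may mean the tmux filter failed, so retry without it.
        tsv = discover(exclude_tmux=False)
    # Fix 2: pass the discovered emails on, so the probe has something to check.
    emails = [line.split("\t")[0] for line in tsv.splitlines() if line.strip()]
    if not emails:
        return []  # empty discovery skips the probe entirely
    return probe(need, emails)[:need]


calls = []


def fake_discover(exclude_tmux):
    calls.append(exclude_tmux)
    # Simulate the missing-session case: the tmux-filtered call yields nothing.
    return "" if exclude_tmux else "a@example.com\t12\nb@example.com\t95\n"


def fake_probe(need, emails):
    return [email for email in emails if email != "b@example.com"]


picked = pick_accounts(1, fake_discover, fake_probe)
```

The retry only drops the tmux filter; in the real wrapper `--exclude-active` stays on for both calls.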

The helper-side bug (discover-accounts.sh exiting 1 when the tmux
session is missing instead of treating an empty tmux query as
"no live panes to exclude") is left untouched per file-scope contract;
the wrapper now compensates for it.

Verified on host 2026-05-16:
  bash -n scripts/codex-fleet/add-workers.sh             # exit 0
  docker run koalaman/shellcheck:stable …add-workers.sh  # only pre-existing findings
  bash scripts/codex-fleet/add-workers.sh 1 --dry-run    # picks admin-mite (1 healthy)
  bash scripts/codex-fleet/add-workers.sh 2 --dry-run    # picks 2 healthy

Co-authored-by: NagyVikt <nagy.viktordp@gmail.com>