chore: upgrade /pr-review and /pr-fix slash commands by 0xbulma · Pull Request #599 · morpho-org/sdks

0xbulma · 2026-05-04T14:31:36Z

Summary

Brings the project's /pr-review and /pr-fix slash commands up to the patterns proven in the personal /ben-pr-{review,fix,local-review} skills, while preserving SDK-specific specialization (7-agent review set, hardcoded pnpm/biome quality gates, packages/<pkg> conventions, JSDoc style guide reference).

The unified /pr-review skill has been split into three focused entry points sharing a common review library:

.agents/
  commands/
    pr-review-ci.md       — CI verdict mode (APPROVE / REQUEST_CHANGES, runs in GitHub Actions)
    pr-review-gh.md       — Local PR review (posts as COMMENT) + --watch
    pr-review-local.md    — Pre-PR local review (terminal-only) + --fix
    pr-fix.md             — Apply fixes for unresolved PR review comments
  lib/
    pr-review-base.md     — Shared Steps 3–6 (diff, project context, 7 review agents, aggregation)

Each entry point is single-mode (no internal CI-vs-local-vs-watch dispatch); shared logic lives in the lib.

What's new vs the previous /pr-review

Three modes, three commands. No more if CI then ... else if --local then ... branching — each command is one shape.
/pr-review-local is a new pre-PR review flow: terminal output, no GitHub interaction, optional --fix to apply fixes locally (refuses on dirty tree — no stash-and-pop machinery).
Adaptive project context discovery. Step 4 reads root AGENTS.md (canonical; CLAUDE.md is a symlink), MISSION.md, CONTRIBUTING.md, biome.json always; conditionally adds docs/jsdoc-style.md (when an exported symbol or JSDoc is touched), SECURITY.md, docs/tibs/TEMPLATE.md. Per-package AGENTS.md/README.md/ARCHITECTURE.md only for packages touched by the diff.
Sentinel discipline. Every terminal step ends with a single grep-able Sentinel: NAME — <prose> line (REVIEW_DONE_PR, REVIEW_CLEAN, REVIEW_INCOMPLETE, WATCH_REJECTED, FIX_DONE_LOCAL, etc.). Each skill's grammar table is inline — no master/god registry.
Whitespace-only validation. gh pr view field validation uses [ -z \"\${X//[[:space:]]/}\" ] (not bare [ -z \"\$X\" ]) — whitespace-only branch names no longer silently corrupt downstream commands.
Watcher robustness in /pr-fix --watch: clean-tree gate at cycle start, orphan-stash detection from prior crashed cycles, `git stash push -u && git stash drop` for lint-failure rollback (recoverable from reflog by stable commit SHA), explicit `WATCH_PR_CLOSED` / `WATCH_REVIEW_CLEAN` / `WATCH_TRANSIENT_ERROR` sentinels for every cycle terminus.
/pr-fix Step 5f routing table. Seven categories × verbatim reply text × resolve-yes/no — auditable contract used by both Step 9 and Step 12 watcher Step 7. Includes a Mixed-purpose variant (Question + Actionable).

What's intentionally not changed

7 SDK-specialized review agents (`Code Quality`, `Module & API Architecture`, `Web3 Security`, `Silent Failure Hunter`, `Style & Conventions Compliance`, `Documentation Analyzer`, `Test Coverage Analyzer`) kept verbatim — this is the project's domain knowledge.
`pnpm exec biome check` / `pnpm lint` / `pnpm -r --if-present build` hardcoded — this repo is always pnpm; dynamic discovery would add noise.
`packages/` conventions in agent prompts (NodeNext `.js` suffix, SDK type reuse, `bigint` for onchain quantities, etc.) preserved.
CI markers (``, ``) preserved in CI mode body for the project's CI gates.
`Co-Authored-By: Claude Opus 4.7 (1M context)` attribution in commits.

Migration

Users invoking the old `/pr-review` should switch to one of:

`/pr-review-ci` in CI environments
`/pr-review-gh` for local PR review (with optional `--watch`)
`/pr-review-local` for pre-PR feedback (with optional `--fix`)

`/pr-fix` is unchanged at the user-facing level (same arguments, same behavior).

Test plan

Run `/pr-review-local` against a small open PR; verify 7 agents launch in parallel, output uses Critical/High/Medium/Low labels, terminates with `Sentinel: REVIEW_DONE_LOCAL` or `REVIEW_CLEAN`.
Run `/pr-review-gh ` against a low-risk open PR; verify atomic review posts to GitHub with `COMMENT` event, body includes severity table, terminates with `Sentinel: REVIEW_DONE_PR`.
Run `/pr-fix ` against the resulting review; verify Step 5d prints triage assessment, Step 9 routes per Step 5f table, Step 11 emits `Sentinel: FIX_DONE_PR`.
Run `/pr-review-local --fix` on a clean tree; verify edits applied unstaged, lint failures revert via in-memory snapshot, terminates with `Sentinel: FIX_DONE_LOCAL`.
Run `/pr-review-local --fix` on a dirty tree; verify it aborts with `Sentinel: FIX_ABORTED — working tree is not clean`.
(Watcher smoke, optional) Run `/pr-review-gh --watch`; verify CronCreate succeeds, push a trivial commit, wait one cron tick, confirm re-review posts. Tear down with `CronDelete`.

🤖 Generated with Claude Code

Brings the project's /pr-review and /pr-fix commands up to the patterns proven in the personal /ben-pr-{review,fix,local-review} skills, while preserving the SDK-specific specialization (7-agent review set, hardcoded pnpm/biome quality gates, packages/<pkg> conventions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tightens the new --local flow and the watcher prompts in response to a parallel local review (5 agents). Themes: - Mode dispatch: explicit argument-validation step at the top of pr-review Step 1 (rules in order) plus a "valid combinations" table. --local always wins over CI env detection; --local + --watch warns and drops --watch; --local + <PR_NUMBER> warns and drops the PR number. - Branch derivation safety: capture default-branch detection into a variable with a verified fallback chain (origin/main → origin/master → abort); detached-HEAD treated as a display-only short SHA before any equality check; gh pr view exit-status checked and all four required fields validated non-empty before downstream commands run. - --fix safety: stash any uncommitted user work BEFORE the fix loop so a failed-lint revert can re-Edit from an in-memory original snippet without destroying pre-existing uncommitted changes; restore via git stash pop after the loop. - Aggregator + sentinels: every parallel agent must return either a JSON array OR an explicit {"agent_error": "..."} sentinel. Step 6 counts failed agents; Step 7 distinguishes "REVIEW_CLEAN" (no findings, zero failures) from "REVIEW_INCOMPLETE" (no findings, some agents crashed). All terminal step variants end with a single grep-able Sentinel: line. - Watcher discipline (both files): explicit CronCreate-time vs cycle-derived placeholder convention with a pre-flight check that aborts CronCreate if any unsubstituted <[A-Z_]+> remains. Cycle-derived ${CYCLE_HEAD_SHA} replaces the SHA-staleness trap where the watcher would otherwise reply with the CronCreate-time SHA forever. Every Run line has an exit-status check; gh API failure (auth/rate-limit/network) no longer falls through to "review everything" / "review none". jq filter binds <BOT_LOGIN> via --arg to dodge shell-quoting injection. Drift reminder added so future Step 5 / Step 5f changes stay propagated to the watcher prompt. - Triage gap (pr-fix Step 5/9 + watcher Step 7): added a per-category routing table (5f) covering all six categories from 5a — actionable+fixed, actionable+skipped, question, discussion, praise, already-addressed, stale — with distinct reply text and resolve/leave-open behavior per category. Watcher Step 7f mirrors the table. - Project-context discovery: tighten find excludes (.git, dist, build, .next, coverage, .turbo, .context); drop the maxdepth-2 contradiction; prefer Glob tool. Add explicit "files outside packages/" handling (root baseline only) and a worked example. Echo the list of context files actually read so omissions are visible. - Pagination preflight (pr-fix Step 4): if reviewThreads.pageInfo .hasNextPage is true, abort loud rather than silently truncating threads beyond the first 100. - CI poll timeout (pr-fix Step 10): track CI_POLL_TIMED_OUT and report "CI: PENDING (timed out after 2.5min — re-check ...)" instead of a bare PENDING that reads as terminal. - Misc bugs: lowercase <pr-number> swept to <PR_NUMBER> in both files; ✅/❌ emojis dropped from CI verdict template; <MERGE_BASE_SHORT> added to the placeholder convention; "Local Code Review" header renamed to "Local-only Code Review" for terminology consistency; pr-fix Step 6a test/ path corrected (this monorepo has no top-level test/ directory); Codex pass validates branch names against ^[A-Za-z0-9._/-]+$ before shell substitution; "skip to Step 8" reworded to "proceed to Step 8"; TWO-PHASE banner mentions the three Step 7 variants and Step 9.5; pairing note in pr-fix Notes mentions /pr-review --local as the pre-PR half of the loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@example

…cs/jsdoc-style.md Iteration 3 of /pr-review and /pr-fix in response to running /pr-review --local on the previous commit. Five HIGH-severity bugs from that pass + the medium and low cleanups, plus a new JSDoc-doc reference per request. JSDoc reference (new requirement): - pr-review Step 4 always-read baseline now includes docs/jsdoc-style.md (canonical guide; operationalizes AGENTS.md §6 + MISSION.md goal #3), read whenever the diff touches an exported symbol or any JSDoc/@example. Backed by docs/tibs/TIB-2026-05-04-jsdoc-coverage-on-exported-symbols.md. - SECURITY.md moved out of "Always read" into "Conditional baseline" (it was always-read but described as conditional). - pr-review Agent 6 (Documentation Analyzer) prompt cites docs/jsdoc-style.md as the canonical shape; its @example checklist now references the runnable-recipe rule from the style guide. - pr-review watcher's Step 7 documentation agent mirrors the same. - pr-fix Step 6a context-gathering reads docs/jsdoc-style.md when the fix touches an exported symbol / @example / JSDoc. HIGH-severity fixes: - Watcher pre-flight `<[A-Z_]+>` regex was a false-positive on legitimate output template tokens like <N>, <X>, <Y>, etc. Replaced with an ALLOWLIST_REGEX that matches ONLY the seven static placeholders by name. Three classes of placeholders are now explicitly documented (CronCreate-time / cycle-derived / report-template) in both watcher prompts. - Invalid bash syntax `${VAR}=$(...)` was used throughout both watcher prompts. Bash assignment is bare-LHS (`VAR=$(...)`); the brace form parses as a command. Rewritten to natural-language `set CYCLE_VAR = ...` form with an explicit note clarifying that the watcher reads steps as instructions, not as a verbatim bash script. - Invalid `gh api --jq --arg login ...` invocation: gh's --jq accepts only the filter as a single arg; --arg is a jq flag. Rewritten to capture gh output then pipe into a separate `jq --arg login` call. - pr-fix Step 9.5 reconciliation only acknowledged 2 outcomes (fixed / skipped) but Step 5f had just defined 7 categories. Rewrote Step 9.5 to enumerate all 7 terminal states (with the exact reply text per category) + a Sentinel: RECONCILE_OK / RECONCILE_FAILED line that confirms the pass actually ran. - pr-fix watcher lint-failure path now reverts the unstaged Edit-tool changes (`git checkout -- <files>`) before ending the cycle, otherwise the next cycle's `gh pr checkout` runs on a dirty tree and `git merge` refuses with a misleading transient-error sentinel. Emits Sentinel: WATCH_LINT_FAILED so the user can grep for it. - pr-review Step 7 (alt 2b) stash-pop conflict now emits a dedicated Sentinel: FIX_DONE_WITH_STASH_CONFLICTS line with the conflicting file list — the previous "Changes are unstaged. Review with: git diff" sign-off was actively misleading when conflict markers were present. Sentinel namespacing: - REVIEW_DONE → REVIEW_DONE_PR (CI / Local PR modes) and REVIEW_DONE_LOCAL (Local-only mode). Both terminal sentinels are now distinct strings so downstream parsers can grep on the trailer prefix alone. - New Sentinel: FIX_DONE_PR added to pr-fix Step 11 so /pr-fix has a terminal grep target matching /pr-review's structure (the "match the format" claim that was previously misleading is now honestly aligned). - pr-fix watcher's WATCH_FIX_DONE now reports both ${CYCLE_MERGE_SHA} (previously captured-but-unused) and ${CYCLE_HEAD_SHA}. Aggregator and dedup tightening (pr-review Step 6): - Per-element schema validation: agents that return arrays containing malformed finding objects (missing severity / non-numeric line / etc.) count as PARTIALLY failed — keep the valid findings but include the agent in <FAILED_AGENTS>. - Dedup rule tightened: ±3 line tolerance now requires BOTH severity AND description overlap. Distinct findings on adjacent lines (e.g. one agent flags missing JSDoc on line 42 export, another flags swallowed catch on line 42) are no longer silently merged. Other content fixes: - pr-review valid-argument-combinations table at the top (replaces three scattered statements of the rule). - pr-review Step 1 sub-section heading prefix consistency: 1c, 1d added. - pr-review placeholder convention split into Static vs Computed-at-runtime tables; <MERGE_BASE_SHORT> formally added; <MERGE_BASE> table entry notes the watcher's cycle-derived form. - pr-review Validating --local end-to-end smoke-test recipe added, documenting the determinism boundary (LLM finding-text drift across runs). - pr-review Step 7 (alt 2) sentinel suppressed when --fix is set so users see one combined run with one terminal sentinel. - pr-review Codex pass validates branch names against ^[A-Za-z0-9._/-]+$ before substitution, refusing to splice unsafe characters into the shell. - pr-fix Step 4 GraphQL query now requests `createdAt` (the <comment_created_at> placeholder used by Step 5c was previously undefined). - pr-fix Step 5a documents Mixed-purpose comments (Question + Actionable) with an extended reply text that acknowledges both halves. - pr-fix Step 5f routing table includes the Mixed-purpose row, and adds a smoke-test recipe — eight throwaway threads, one per category — that catches future regressions where two categories silently merge. - pr-fix Step 6b confidence assessment now uses `Confidence: HIGH/MEDIUM/LOW` prefix so the tokens never collide with severity (Critical/High/Medium/Low). - pr-fix Step 10 disambiguates `gh pr checks --watch` (a gh CLI flag) from /pr-fix --watch (the user-facing skill flag). - Whitespace-only field validation: both files now reject whitespace-only values from `gh pr view` (previously `[ -z "$X" ]` passed for " "). - pr-fix watcher merge-conflict commit message uses heredoc form (was vulnerable to <BASE_BRANCH> containing $ or backticks). - pr-fix watcher heredoc reply tag standardized on REPLY_EOF (was REPLYEOF). - pr-fix watcher code-fence tagged ```text for cleaner rendering. - pr-fix watcher cycle-local variables renamed to ${CYCLE_*} convention (CYCLE_PR_STATE, CYCLE_FAILED_CHECKS, CYCLE_RUN_ID, CYCLE_EXPECTED_HEAD_SHA). - pr-fix watcher Step 7f reply-text contract now spelled out per category with explicit "NEVER reference ${CYCLE_HEAD_SHA} here" notes for non- actionable categories where step e never runs. - pr-review --local --fix Fix Summary format renamed and acknowledged as intentionally distinct from /pr-fix Step 11 (the previous "match the format" claim was misleading; both ends now end with grep-able sentinels). - pr-review Notes ⚠ emoji warnings replaced with WARNING: prose. - Drift reminders: pr-review Step 7 now flags drift between --local --fix and /pr-fix Step 6; pr-review watcher Step 7 already flagged drift vs Step 5; pr-fix watcher already flagged drift vs Step 5f. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…try, lint cleanup safety Removes .agents/guidelines.md and .agents/skills/ references — neither exists in this repo (per user request). Cross-skill cycle-variable naming parity: - pr-review watcher renamed bare ${PR_STATE} / ${LAST_REVIEWED_SHA} / ${MERGE_BASE} / ${FAILED_AGENTS} / ${LAST_REVIEWED_RAW} to ${CYCLE_PR_STATE} / ${CYCLE_LAST_REVIEWED_SHA} / ${CYCLE_MERGE_BASE} / ${CYCLE_FAILED_AGENTS} / ${CYCLE_LAST_REVIEWED_RAW}, matching pr-fix's "every cycle-local variable uses CYCLE_*" convention. Static placeholder table footnote updated accordingly. Watcher cycle-terminus sentinel coverage: - Added Sentinel: WATCH_REVIEW_CLEAN for the "no new commits" path in pr-review (previously silent). - Added Sentinel: WATCH_PR_CLOSED in both files for the "PR not OPEN" termination path (previously silent in both watchers). - Both make every cycle terminus grep-able. Empty-prompt guard before ALLOWLIST_REGEX: - Both watchers now guard `[ -z "$ASSEMBLED_PROMPT" ]` BEFORE the allowlist grep. An unset/empty prompt would otherwise pass the check by virtue of having nothing to match, and the watcher would be scheduled with an empty prompt. pr-fix watcher Step 2 clean-tree gate: - Refuses to run when `git status --porcelain` is non-empty at cycle start. A prior cycle that crashed mid-Edit could have left orphaned modifications that gh pr checkout doesn't clean — the next steps would silently carry them forward. pr-fix watcher Step 7b lint-failure cleanup safety: - Switched from raw `git checkout -- <files>` to `git stash push -u + git stash drop`. `git checkout --` is destructive (it discards ALL unstaged changes to the file, not just this cycle's Edit-tool changes); the stash-and-drop form preserves prior-cycle work in the reflog so a human can recover if needed. pr-fix watcher Step 3 — restored <BASE_BRANCH> in heredoc commit subject: - The previous "merge: resolve conflicts with base branch" literal dropped traceability info from git history. <BASE_BRANCH> is in the CronCreate-time allowlist so it's substituted into the prompt before the watcher receives it; the single-quoted MERGEEOF preserves the substituted literal verbatim with no shell injection. WATCH_FIX_DONE / RECONCILE_OK label harmonization: - Renamed token <S> for "stale" → <ST> (collided with <S>=skipped-with- reply in RECONCILE_OK). Both sentinels now use identical bucket labels <F> / <SK> / <Q> / <D> / <P> / <A> / <ST> so a downstream parser can grep either with the same regex. - RECONCILE_OK on the empty-list early-exit path (Step 4 → 9.5 → 11) documented to emit zero counts. pr-review --fix pre-flight stale-stash check: - Before stashing this run's uncommitted work, refuses to start if a stale `pr-review --local --fix safety stash` already exists in `git stash list`. Catches the regression where a prior crashed run left a stash; the new run would otherwise stack another with the same message and risk popping the wrong one. pr-fix outside-packages baseline scope = 5 items: - Step 6a's "outside packages/" clause now includes CONTRIBUTING.md and biome.json (matching pr-review Step 4's always-read baseline). Files under .agents/commands/ etc. now get full root-level context in /pr-fix, not the narrow 3-file subset. FIX_DONE_PR sentinel — CI timeout signal + early-exit fields: - Added `ci=PENDING_TIMEOUT` (CI poll loop timed out — distinct from plain PENDING) and `ci=NA` (Step 10 was skipped on the empty-list early-exit path). - Documented exact field values for the empty-list early-exit case so the sentinel is always grep-able and clearly distinguishable from a "real" run. Step 5f / Step 9.5 / smoke-test recipe — category count consistency: - Resolved the 6-vs-7-vs-8 confusion: Step 5a defines SEVEN terminal categories; Step 5f routing has eight ROWS (seven categories + a Mixed-purpose VARIANT of actionable+fixed); Step 5d's printed bucket counts use the seven Step-5a labels. - Smoke-test recipe step 3 corrected: "actionable: 3" (fixed + skipped + mixed-purpose, all classified as actionable in Step 5a) plus 1 each for the other six. The previous "every column should show 1" was unsatisfiable. - Step 5f routing rule for distinguishing rows by reply-text grep is now a deterministic, ordered ladder (first match wins), with Mixed-purpose distinguished BY its `(Re your question:` postscript. Reciprocal drift reminder + cross-reference narrowing: - Added DRIFT-CHECK reminder at the top of /pr-fix Step 6 mirroring the one already in /pr-review Step 7 (alt 2b). Both narrow the cross-reference to "Step 6c-6d" (the actual apply-and-validate mechanics — the wider Step 6 includes the context-gathering surface which is /pr-fix-specific). Sentinel grammar registry (NEW — single source of truth in each file): - Added a "Sentinel grammar registry" section before "Error Handling" in both files. Lists every sentinel with its owning step, fire condition, and exact trailer grammar. Adding a new sentinel now requires adding a row to the registry in the same PR — this is the documented contract. pr-review watcher code-fence language tag: - Tagged ```text on the watcher CronCreate prompt block, matching pr-fix's existing ```text tag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…stash robustness HIGH: - Whitespace-only ASSEMBLED_PROMPT silent-pass (both files): both empty-prompt guards switched from `[ -z "$X" ]` to `[ -z "${X//[[:space:]]/}" ]` so a whitespace-only prompt aborts loud (matching the discipline already used for <BASE_BRANCH>/<HEAD_BRANCH> validation). - pr-review REVIEW_INCOMPLETE registry grammar fix: the row's trailer used `<CYCLE_FAILED_AGENTS>` but the sentinel is owned by Step 7 (alt 2) — Local-only mode, NOT the watcher — and emits using `<FAILED_AGENTS>`. Registry now matches the actual emission. MEDIUM: - pr-fix Step 11 conditional leading line: on the empty-list early-exit path the prose previously said "fixes applied and pushed" even though no commit was made. Now picks between two leading lines based on N>0. - pr-fix RECONCILE_FAILED grammar disambiguated to a single summary form: `threads [<id>, <id>, ...] in unknown state.` — one sentinel per run, matching the pattern of every other sentinel in the skill. - pr-fix watcher Step 2 now explicitly checks `git status` exit code separately from the captured stdout (corrupted index / lock contention no longer silently passes the clean-tree gate). Dirty-tree sentinel output truncated to a count + first-path summary so it stays a single grep-able line. - pr-fix watcher Step 2 also detects orphan watcher stashes from a prior crashed cycle (`git stash list --format='%gs' | grep -E 'pr-fix watcher : lint-aborted cycle'`) and emits WATCH_TRANSIENT_ERROR with manual- resolution guidance. - pr-fix watcher Step 7b now checks `git stash drop` exit code and emits a degraded sentinel when drop fails (stash retained at stash@{0} — drop failed: <stderr>) instead of falsely claiming "stashed-and- dropped". - pr-fix watcher Step 7b documents the recovery procedure for inspecting a discarded fix from the reflog (`git fsck --unreachable | grep ...` + `git stash apply <sha>`). - pr-review --fix stale-stash check anchored with `git stash list --format ='%gs' | grep -Fxq 'pr-review --local --fix safety stash'` (whole-line fixed-string match) so a user-named stash that merely *contains* the substring no longer false-positives. Plus a documented escape hatch via PR_REVIEW_FIX_BYPASS_STALE_STASH=1 env var for legitimate-collision cases. - pr-review Step 4 baseline classification: moved docs/jsdoc-style.md from "Always read" to "Conditional baseline" (the trigger language was already conditional). Merged the two split Conditional subsections into one (SECURITY.md + jsdoc-style.md + docs/tibs/TEMPLATE.md). Worked example and outside-packages note updated to reflect the new structure. - Sentinel grammar registry stability/versioning policy added to BOTH files: declares the registries are stable contracts that downstream parsers MAY rely on; describes which changes are patch/minor (additive only) vs major (renames/removals/value-space narrowing — require the 4-step deprecation flow from AGENTS.md §7). LOW: - pr-review L1032 cross-registry symmetry: WATCH_PR_CLOSED moved from /pr-fix's "own set" to the "shared" parenthetical (it's defined in BOTH registries with identical semantics, matching pr-fix's reciprocal classification). - pr-fix placeholder convention (L46) and report-template exemption list (L776): dropped stale `<S>`, added `<SK>` and `<ST>` to match the iter-4 bucket-label rename. - pr-fix L389 drift-check comment: trailing period dropped for parity with pr-review's drift comment style. - pr-review L963: bare $CYCLE_LAST_REVIEWED_RAW → ${CYCLE_LAST_REVIEWED_ RAW} for parity with the documented brace convention. - pr-fix Step 5f gained an empty-list smoke-test recipe (PR with no unresolved threads → expect Sentinel: RECONCILE_OK with 0 counts + Sentinel: FIX_DONE_PR with ci=NA + the conditional Step 11 leading line). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@deprecated

…ty, smoke tests HIGH: - pr-fix watcher Step 2 Pre-condition B (orphan-stash detection) was silently swallowing `git stash list` failures via `|| true`. Now splits into two steps: capture `CYCLE_STASH_LIST` with explicit exit-code check, then grep the captured value. Also reports the count of orphan stashes instead of a vague "an orphan exists" message and instructs the user to drop EACH ref individually. - pr-fix watcher Step 7b (lint-failure stash-and-drop) was pointing the user at the unstable positional ref `stash@{0}` for recovery — that ref shifts whenever any other stash is created, so the suggested `git stash drop stash@{0}` could destroy the wrong stash. Now captures the stash COMMIT SHA via `git rev-parse stash@{0}` BEFORE the drop attempt and emits that stable SHA in both the success and degraded WATCH_LINT_FAILED sentinels. Recovery instruction is now `git stash drop <SHA>` which works regardless of positional shift. MEDIUM: - Sentinel grammar registry stability policy: 4-step deprecation flow had steps 2/3 swapped vs AGENTS.md §7. Restored canonical order: introduce successor → mark @deprecated → maintain BOTH for one minor → remove in next major. - Sentinel grammar registry now has an "Effective from this commit forward" date anchor and a "Shell requirement" clause stating the watcher prompt uses bash-only constructs (rules out POSIX-portable rewrites that would break the whitespace-stripped guard introduced in iter-5). - Stability policy parity: the PENDING_TIMEOUT example now appears in both pr-review and pr-fix Major bullets (was only in pr-review). - pr-review Step 9 watcher Step 6 (READ PROJECT CONTEXT) restructured to mirror the main flow's Step 4 merged "Conditional baseline" (items 5-7 for jsdoc-style.md / SECURITY.md / TEMPLATE.md), instead of having jsdoc-style.md inlined into the Always-read line and TEMPLATE.md alone in a separate Conditional/adaptive line. - pr-fix Step 11 conditional leading line now covers a third terminal state: N>0 actionable comments but ALL confidence-gated to LOW (no fix landed). Step 6 must track FIXES_APPLIED + ACTIONABLE_COUNT counters to disambiguate. Three branches: Normal / Empty-list / Triage-skipped-all. - pr-fix empty-list smoke-test recipe now caveats: Step 3 may push a merge commit before Step 4 short-circuits, in which case HEAD has moved and the recipe's exact-output assertions don't apply. Pre-condition: PR must be cleanly mergeable. - pr-review --fix stale-stash check: bypass UX gates the warning prose on the bypass env var (an opted-in user no longer sees the scary "stale stash detected" warning before the bypass confirmation), and surfaces the count + list of matching stash refs (instead of generic "an entry exists"). - WATCH_FIX_DONE now carries a ci= field for grammar parity with FIX_DONE_PR. Same value space (PASS / FAIL / PENDING / PENDING_TIMEOUT / NA) so a single regex parses either. LOW: - RECONCILE_FAILED grammar switched from `[<id>, <id>, ...]` (bracket+comma, shell-hostile) to `<N> threads in unknown state: <id1> <id2> <id3>.` (space-separated). Aligns with the registry's prevailing field-with- count grammar and is friendly to ad-hoc `for id in ...` recovery loops. - Sentinel registry deprecation marking convention added to both files: strikethrough the sentinel name + append `(deprecated, removal: ...; superseded by ...)` to the trailer cell. Includes a hypothetical row example so authors have a concrete template. - New "Smoke tests" section added to BOTH files covering the iter-5/iter-6 behaviors that previously had no recipe: whitespace-only ASSEMBLED_PROMPT guard, orphan-stash detection, degraded WATCH_LINT_FAILED, RECONCILE_FAILED summary grammar, WATCH_FIX_DONE ci= field, stale-stash check + bypass env var, stash-pop conflict path. - pr-review Usage region gained an "Environment variables" subsection documenting `PR_REVIEW_FIX_BYPASS_STALE_STASH=1` (previously only discoverable from inside the Step 7 alt 2b stash check). - pr-fix watcher's report-template list (~L792) re-aligned with the top-of-file Computed-at-runtime placeholder table — now includes <M> and <H> (previously omitted) so the two lists are in sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… executable smoke tests HIGH: - pr-review stale-stash check was silently broken: the awk parser at Step 7 alt 2b Check 1 split git stash list output on space and stripped the leading %gd token but did NOT strip the "On <branch>: " prefix that git stash auto-prepends to %gs. The exact-equality check therefore compared "On main: pr-review --local --fix safety stash" against the bare canonical subject and never matched. The whole safety check has been a no-op. Rewrote to use --format='%gd%x09%gs' (tab-separated, no OFS-collapse hazard) and explicitly strip the "On <branch>: " prefix in awk via sub(/^On [^:]+: /, "", subj). Also: defensive ${MATCHING_STASH_COUNT:-0} comparison, while-read loop instead of unquoted $MATCHING_STASH_REFS expansion, recovery instructions now key on commit SHA (positional refs shift between drops). - WATCH_FIX_DONE ci= field referenced CI_POLL_TIMED_OUT, but that flag is set only inside the main-flow Step 10 poll loop; the watcher's Step 4 is a one-shot status snapshot with no poll. PENDING_TIMEOUT could never legitimately appear in WATCH_FIX_DONE. Restricted the watcher's ci= value-space to PASS / FAIL / PENDING / NA (a strict subset of FIX_DONE_PR's). Documented the difference inline at Step 7h and in the registry row. - Step 11's "Triage-skipped-all" leading-line case (added in iter-6) required Step 5/6 to track FIXES_APPLIED and ACTIONABLE_COUNT counters, but Step 5/6 never set them. Added explicit "set ACTIONABLE_COUNT = ..." to Step 5d (just before the bucket summary) and "set FIXES_APPLIED = 0" + "increment FIXES_APPLIED by 1 on each successful Edit + biome-clean validate + push" to Step 6 entry. The third leading-line case is now implementable. - Smoke-test recipes were non-executable as written. Fixed: * Orphan-stash recipe required uncommitted changes (undocumented pre-req) — now creates /tmp/orphan-trigger first. * Degraded-WATCH_LINT_FAILED recipe ran chmod -w .git/refs/stash BEFORE the first stash push (file doesn't exist yet → chmod fails silently → drop succeeds → recipe produces a false PASS). Now bootstraps the refs with a throwaway stash push+drop, THEN chmod. Also documents macOS APFS / packed-refs caveat with chflags uchg fallback. * RECONCILE_FAILED recipe required forking the skill prompt — heavy. Replaced with a lighter alternative: post a comment from a fresh bot account whose login isn't in the recognized-author allowlist. * WATCH_FIX_DONE ci= recipe was scoped to "exactly ONE match per cycle" but greppped a multi-cycle log. Now requires single-cycle stdout capture and adds 5 sub-cases for the value-space mapping (PASS / FAIL / PENDING / NA — PENDING_TIMEOUT explicitly noted as not in the watcher's value space). - New "Triage-skipped-all" smoke-test recipe added covering the third Step 11 leading-line branch. MEDIUM: - Stability policy 4-step flow's "Status column" reference was a typo (registry tables have no Status column — convention is cell-based). Both files now reference "the cell-based marking convention below" with a pointer to the strikethrough + trailer-cell append rule. LOW: - pr-fix Usage gained an "Environment variables" subsection documenting that no env vars are read today, for symmetry with /pr-review's equivalent subsection. - "Effective from this commit forward" anchor expanded to include the date (2026-05-04) and a `git log --diff-filter=A` recipe so a future reader can pin the registry's introduction commit. - Deprecation example row wrapped in  /  HTML comments to make accidental copy-paste-into-prod loud-fail visible in raw view. - Orphan-stash sentinel (pr-fix Step 2 Pre-condition B) now emits a single-line summary (count + first match) for the sentinel itself, with the full multi-line list going to stderr — preserves the "single grep-able line" invariant. Recovery instruction switched to by-SHA (git rev-parse <ref> first, then git stash drop <sha>) for positional-ref-shift safety. - RECONCILE_FAILED grammar gained a 20-ID line-length cap with "(+<N-20> more)" overflow indicator + full list to stderr. - Stash-pop conflict recipe in pr-review converted from prose pseudocode to executable bash (with $F and $V_FIX as named pre-conditions and an explicit cleanup block). - Whitespace-only ASSEMBLED_PROMPT recipe duplicated verbatim across both files now has a "DRIFT-CHECK: keep in sync with /pr-fix smoke-tests" / mirror reminder; recipe is self-contained (inlines GUARD as an env-passed snippet instead of "<paste the pre-flight guard>" indirection). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ater The previous 7 iterations of additive fixes pushed both files past 1180 lines each. User explicitly asked to "not overcomplicate" — this iteration removes content that wasn't earning its keep. One real bug fixed; the rest is deletion. CONFUSION fix: - pr-review CI mode: the verdict table allowed only APPROVE/REQUEST_CHANGES but the agent-failure path defaulted to COMMENT (not in the table). Simplified to: on agent failure, never APPROVE — REQUEST_CHANGES with the WARNING line so a human resolves it. No third event variant needed. DROP — over-engineered sections removed: - "Smoke tests" sections from both files (~196 lines): theater recipes requiring exotic preconditions (`chmod -w .git/refs/stash` + chflags uchg fallback, fresh GitHub bot accounts, draft PRs with 5+ minute CI fixtures, custom GraphQL author injection). These were written but never run. Removing them does not reduce the safety of any guarded path — the guards are still in the watcher prompt where they execute on real invocations. - Stability policy + Shell requirement + Deprecation marking convention + hypothetical example row (~40 lines per file): speculative governance for downstream parsers that don't exist. Replaced with a single sentence: "Sentinel names and trailer field order are part of the contract — rename or reorder fields only with a deprecation note in the table. Cross-skill changes must land in both registries in the same PR." - pr-fix Step 11 third leading-line case (Triage-skipped-all) + the FIXES_APPLIED / ACTIONABLE_COUNT runtime counters threaded through Steps 5d / 6 / 11: collapsed into two cases (fix pushed / no fix pushed). The body's existing "Skipped: <M> findings (see thread replies for reasons)" line spells out the cause — no separate counter needed. - PR_REVIEW_FIX_BYPASS_STALE_STASH env var + "Environment variables" Usage subsection in pr-review (~15 lines): a knob nobody sets. The bypass logic existed for a hypothetical user who deliberately named their stash with the skill's exact internal subject string. If the stale-stash check ever false-positives, the recovery is one shell command (`git stash drop <ref>`). - pr-fix "Environment variables" symmetry-only subsection: a header announcing it had no content. - pr-fix Step 9.5 RECONCILE_FAILED line-length cap with `(+<N-20> more)` overflow: triggered only on PRs with 20+ simultaneously-defective threads — a state that doesn't happen in normal use. - Inline `# DRIFT-CHECK: keep in sync ...` marker comments in markdown prose: nobody greps markdown files for HTML-style code-comment markers. The cross-skill drift information is already in the surrounding prose. - pr-fix Step 12's standalone "Failure handling discipline" subsection + pr-review Step 9's equivalent: the requirement was restated verbatim inside the watcher prompt body just below. Dropped the duplicate subsections. - pr-review duplicate "Caveat (idempotency boundary)" paragraph at the top of file (line 62): duplicate of the operative caveat inside Step 7 (alt 2) at line 747. Net: pr-review.md 1189 → 1065 lines (-124); pr-fix.md 1188 → 1028 lines (-160). Total -284 lines. Functional behavior unchanged for normal invocations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…+ watcher silent failures After iter-8's 284-line simplification, two orphan references and several real silent-failure paths surfaced. Fixed: HIGH: - pr-review L761 had a trailing clause "we gate the warning prose on the bypass env var so an opted-in user doesn't see the 'stale stash' warning before the bypass confirmation" — orphan from iter-8's deletion of PR_REVIEW_FIX_BYPASS_STALE_STASH. The surviving code unconditionally aborts with FIX_ABORTED. Dropped the trailing clause. - pr-fix Step 5f empty-list smoke-test recipe asserted the pre-iter-8 leading line "PR #<PR> had no actionable comments — no commit, no push." but iter-8 collapsed Step 11 to "PR #<PR_NUMBER> — no commit, no push." (no "had no actionable comments" phrase). Updated the recipe's expected text. MEDIUM (real silent-failure paths): - WATCH_FIX_DONE ci= value space narrowed from PASS/FAIL/PENDING/NA to FAIL/NA/UNKNOWN. Watcher Step 4 captures only ${CYCLE_FAILED_CHECKS} (failed checks); pending checks are NEVER queried. The PASS and PENDING branches were therefore unreachable as written. Replaced with UNKNOWN (fix pushed AND no failed checks, but the watcher does not distinguish PASS vs PENDING — the next 2-minute cycle will pick up any state change). FIX_DONE_PR retains its full PASS/FAIL/PENDING/PENDING_TIMEOUT/NA value space because the main flow's Step 10c does poll with timeout. - Watcher Step 3 ambiguous-conflict path was a silent failure: after `git merge --abort` the cycle continued to Step 6, where it could reach WATCH_FIX_GREEN despite the unresolved base-branch conflict (false-green signal). Added CYCLE_MERGE_AMBIGUOUS flag in Step 3; Step 6 now blocks WATCH_FIX_GREEN when the flag is set and emits WATCH_TRANSIENT_ERROR identifying the ambiguous-conflict files instead. - Watcher Step 4 retry-exhaustion: the "max 2 retries" rule had no documented fallthrough. Added explicit "if both retries fail, stop, fall through to step 5 with ci=FAIL; next cycle re-evaluates" so the user understands retries don't loop indefinitely. LOW: - pr-review registry rows for REVIEW_CLEAN / REVIEW_INCOMPLETE attributed owning-step only to "Step 7 (alt 2)" but the same sentinel strings are also emitted in Step 7 (alt) Local PR mode body (line 674). Updated rows to mention both. - pr-fix Step 11 empty-list path now has a defensive `commit=unknown` fallback if Step 2b's SHA capture is unset (instead of leaking an unrendered placeholder). - pr-fix WATCH_TRANSIENT_ERROR registry row now notes that permanent failures (branch-protection rejection on `git push`, expired auth) flow through this same sentinel and will repeat every cycle until the watcher expires or the user runs CronDelete — so users know to watch for repeated identical sentinels. Net: +14/-14 lines (in-place fixes, no new content of substance). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Splits the unified /pr-review skill into three focused entry points and lifts maintainer-facing material out of every-invocation runtime prompts. Net effect: ~60% smaller per-invocation prompt for pr-review, single sentinel registry, no more stash dance, no more vestigial Codex pass. New structure: .agents/ commands/ AGENTS.md ← maintainer notes (sentinel registry, drift conventions, deprecation flow, bash requirement); loaded ZERO times per user invocation. pr-review-ci.md ← CI verdict mode (APPROVE / REQUEST_CHANGES, CLAUDE_REVIEW_COMPLETE markers, Guidelines Compliance checklist). 168 lines. pr-review-gh.md ← Local PR mode + --watch (COMMENT event, watcher cycle that references base Step 5 instead of duplicating agent definitions). 196 lines. pr-review-local.md ← Pre-PR local review + --fix (terminal-only output; --fix REFUSES on dirty tree — no stash dance). 190 lines. pr-review.md ← DELETED (was 1065 lines). pr-fix.md ← cross-refs updated; sentinel registry section now points at .agents/commands/AGENTS.md. lib/ pr-review-base.md ← shared Steps 3-6 (diff retrieval, project context discovery, 7 SDK-specialized review agents, aggregation). The three pr-review-* skills delegate to this file. 250 lines. Per-invocation runtime prompt cost (entry-point + base): pr-review-ci: 418 lines (was 1065 — 60% reduction) pr-review-gh: 446 lines (was 1065 — 58% reduction) pr-review-local: 440 lines (was 1065 — 59% reduction) Simplifications applied alongside the split (per the user's "where they've outgrown the medium" critique): - Codex pass removed from all three pr-review-* skills. The previous fire-and-forget `codex exec` background invocation was vestigial — no output integration with Step 6's aggregator, optional, no dedup with Claude's findings. If a real Codex co-reviewer is wanted later, build it as Agent 8 in pr-review-base.md Step 5. - /pr-review-local --fix REFUSES on dirty tree (mirrors /pr-fix Step 2a). The previous "stash discipline" — stale-stash awk parser, in-memory revert source, stash-pop conflict sentinel, FIX_DONE_WITH_STASH_CONFLICTS branch — was ~80 lines of plumbing handling the 3-condition edge case (uncommitted work + crashed prior run + lint rejection). Refusing on dirty tree collapses the entire dance: user runs `git stash push -u`, invokes `--fix`, then `git stash pop`. Skill stays out of stash mgmt. FIX_DONE_WITH_STASH_CONFLICTS sentinel removed from registry. - Watcher cycle in pr-review-gh delegates Step 5 (agent fan-out) to pr-review-base.md instead of duplicating the 7 agent prompt blocks inline. Drift hazard between the main-flow Step 5 and the watcher's inline Step 7.a-g is gone — both now point at the same source. - Sentinel grammar registry consolidated into .agents/commands/AGENTS.md. Each entry-point skill no longer carries its own per-skill registry table; they reference the central one. Easier to keep in sync, smaller runtime prompts. - Three skills means each one's argument validation is trivial — no more three-mode dispatch table at the top of a single file. /pr-review-ci refuses if not in CI; /pr-review-gh refuses if in CI; /pr-review-local has no PR_NUMBER. Each is unambiguous about its own preconditions. Cross-skill drift reminders updated: - /pr-fix Step 6 drift reminder now points at /pr-review-local Step 7b (the renamed --fix sub-step). - /pr-fix Notes pairing block now lists all three pr-review-* skills. Files outside packages/ baseline reference now points at .agents/lib/ pr-review-base.md (the canonical shared Step 4) instead of the deleted unified /pr-review Step 4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…try → AGENTS.md Mirrors the iter-10 split for pr-fix: extract the per-category routing (the actual contract) into .agents/lib/pr-fix-routing.md, and lift the duplicated sentinel registry + smoke-test recipes into the maintainer-only .agents/commands/AGENTS.md. The pr-fix watcher's Step 7 now references the lib instead of inlining a parallel copy of the routing. New files: - .agents/lib/pr-fix-routing.md (76 lines) — routing table (8 rows × exact reply text × resolve-yes/no), one reply-mechanism heredoc template, one resolve-mechanism graphql mutation, output contract (bucket counts). Single source of truth for both main flow Step 9 and watcher Step 7. Changes: - pr-fix.md Step 5f collapsed to a 5-line pointer ("read .agents/lib/ pr-fix-routing.md and treat its routing table as the contract for Step 9 below"). The 8-row routing table + ~30 lines of smoke-test recipe + ~15 lines of empty-list recipe — all gone from runtime. - pr-fix.md Step 9 collapsed from ~110 lines (six per-category bash blocks: 9a, 9b, 9c-skipped, 9c-question, 9c-discussion, 9c-praise, 9c-already, 9c-stale) to ~10 lines that defer to the lib. - pr-fix.md Step 12 watcher Step 7f collapsed from ~25 lines (per-category bullets + heredoc template + per-category "NEVER reference ${CYCLE_HEAD_SHA}" notes) to ~5 lines that defer to the lib with the cycle-substitution rule. Step 7g (resolve mutation) is now part of the lib; the standalone "g." substep is gone — what was step h (CI verdict + sentinel) becomes the new step g. - pr-fix.md "Drift reminder" subsection of Step 12 deleted. The drift hazard the sticker existed to mitigate (parallel routing implementations in main flow vs watcher) is gone now that both reference the lib. - pr-fix.md local sentinel registry table (12 rows × 5 columns) deleted in favor of a one-line pointer to .agents/commands/AGENTS.md. The local table was a duplicate maintained by hand against the canonical one. - AGENTS.md gained a "Smoke-test recipes (maintainer-only, not loaded at runtime)" section with the per-category routing recipe (~30 lines) and the empty-list early-exit recipe (~15 lines) lifted from pr-fix.md. Net: pr-fix.md 1028 → 846 lines (−182). Lib (76 lines) + AGENTS.md content (+33 lines) compensates for some of that, but the runtime-prompt cost dropped substantially: /pr-fix invocation runtime: 846 + 76 = 922 lines (was 1028) /pr-fix --watch invocation runtime: same (lib loaded once) AGENTS.md (maintainer-only): loaded ZERO times per user invocation. The drift hazard between main-flow Step 9 and watcher Step 7 — the smell the iter-9 review flagged — is gone. Both paths now resolve through the same lib, the same way the three /pr-review-* skills resolve through .agents/lib/pr-review-base.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Removes the centralized governance files introduced in the prior two iterations (AGENTS.md sentinel registry + conventions, pr-fix-routing.md). Each skill now owns its own sentinel grammar inline, and pr-fix's routing lives in the file that uses it (with the watcher referencing main-flow Step 9 directly instead of an external lib). Deleted: - .agents/commands/AGENTS.md — was a master file (sentinel registry across all skills, conventions, smoke recipes). Replaced by per-skill inline registries covering only what each skill emits. - .agents/lib/pr-fix-routing.md — was a one-skill helper masquerading as a library. The watcher and the main flow are halves of the SAME skill, not separate consumers — a lib doesn't earn modularity here, it just adds an extra Read per cycle. Routing table + reply mechanism + resolve mechanism + per-category sub-steps (9a / 9c-skipped / 9c-question / 9c-discussion / 9c-praise / 9c-already / 9c-stale) restored inline in pr-fix.md Step 5f and Step 9. Watcher Step 7f now reads "follow main-flow Step 9, substituting ${CYCLE_HEAD_SHA} for <HEAD_SHA>" — single source of truth via cross-section reference within the same file, no external lib. Smoke-test recipes (per-category routing + empty-list early-exit) deleted entirely. They were theater per iter-8 — never actually run, required exotic preconditions (fresh GitHub bot accounts, draft PRs with 5-minute CI, chmod on .git internals). The maintainer-only-not-runtime classification was the right idea but the recipes themselves don't earn the bytes anywhere. Per-skill sentinel grammar restored: - pr-fix.md sentinel registry covers only its own sentinels (RECONCILE_OK, RECONCILE_FAILED, FIX_DONE_PR, WATCH_REJECTED, WATCH_TRANSIENT_ERROR, WATCH_PR_CLOSED, WATCH_FIX_GREEN, WATCH_LINT_FAILED, WATCH_FIX_DONE). - pr-review-ci.md adds an inline grammar table (one row: REVIEW_DONE_PR). - pr-review-gh.md adds an inline grammar table (8 rows: REVIEW_CLEAN, REVIEW_INCOMPLETE, REVIEW_DONE_PR, WATCH_REJECTED, WATCH_TRANSIENT_ERROR, WATCH_PR_CLOSED, WATCH_REVIEW_CLEAN, WATCH_REVIEW_DONE). - pr-review-local.md adds an inline grammar table (5 rows: REVIEW_CLEAN, REVIEW_INCOMPLETE, REVIEW_DONE_LOCAL, FIX_DONE_LOCAL, FIX_ABORTED). Each skill's grammar is now next to its emit sites — no master registry to drift against. What's kept: - .agents/lib/pr-review-base.md — three distinct skills (pr-review-ci, -gh, -local) depend on it for Steps 3-6 (diff, project context discovery, the 7 SDK-specialized review agents, aggregation). That's a real cross-skill library boundary, not a master file. Say if you want this inlined too. Net runtime cost per invocation: - /pr-review-ci: 172 + 250 = 422 lines (+4 vs prior) - /pr-review-gh: 207 + 250 = 457 lines (+11 — gained inline registry) - /pr-review-local: 198 + 250 = 448 lines (+8 — gained inline registry) - /pr-fix: 905 lines (was 846 — gained inline routing + registry) - /pr-fix --watch: same single file load (was 846 + 76 lib = 922; now 905, watcher refers to in-file Step 9 instead of external file) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…s + redundancy Five findings from /ben-local-review on the iter-12 master-file removal commit. Three orphan-bullet leftovers + one redundant inline block + one cross-skill sentinel-trailer drift. HIGH: - pr-fix.md L777 stale `Step 7h` cross-reference. The watcher's terminal sentinel was renumbered to Step 7g when the lint-failed sub-step was inlined, but a "(so Step 7h emits ci=FAIL)" pointer in Step 4c still referenced the old name. Updated to Step 7g. - pr-review-gh.md Notes — two orphan bullets that survived iter-12's "no master/god files" sweep: - "No Codex pass: previous iterations included an optional fire-and-forget codex exec invocation. Removed in the iter-10 split — it was vestigial." - "Cross-skill watcher considerations: if a user runs /pr-review-gh <PR> --watch AND /pr-fix <PR> --watch together..." Both are runtime archaeology / governance content, exactly the shape of the lines that DID get removed. Deleted. - pr-review-local.md Notes — surviving "Drift reminder for --fix mechanics" bullet (cross-skill governance reminder). Same pattern, also deleted. MEDIUM: - pr-fix.md Step 9 had a "Per-category sub-steps" bullet list (9a / 9c-skipped / 9c-question / 9c-discussion / 9c-praise / 9c-already / 9c-stale) that restated in prose what the Step 5f routing table just specified in tabular form. Pure repetition. Deleted; the Step 5f table + the reply/resolve mechanism + the "Resolves fire for: ... Leave unresolved for: ..." summary line above it are sufficient. - pr-review-gh.md REVIEW_INCOMPLETE trailer was missing the (<names>) field that pr-review-local.md has. Same sentinel name = same trailer is the contract. Added (<names>) to gh's main-flow body, watcher prompt body, and registry-table row so a downstream parser greping `Sentinel: REVIEW_INCOMPLETE — \d+ of 7 agents failed $<names>$` matches both skills. Net: −17 / +4 lines. pr-fix.md 905 → 895; pr-review-gh.md slightly slimmer on the bullet deletion and slightly fatter on the (<names>) addition. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

0xbulma · 2026-05-05T09:14:42Z

Self-review (iteration history + structural notes)

This PR landed via 13 review-and-fix iterations. The first commit was a faithful port from the personal /ben-pr-{review,fix} skills; everything after was refining for this monorepo. Posting the journey here for reviewers who want the "why" behind the structure, and for future maintainers.

Phases

Phase 1 — Initial port (iter-1). Took the personal /ben-pr-review and /ben-pr-fix skills and merged them with the SDK-specific /pr-review and /pr-fix that were already in the repo. Hybrid: kept the 7 SDK-specialized review agents (Module & API Architecture, Web3 Security, Documentation Analyzer, etc.) as the project's baked-in agent set; adopted ben-*'s structural improvements (frontmatter-less, severity scheme normalized to Critical/High/Medium/Low, adaptive project context detection, watcher SHA fallback). Also added references to the project's authoritative AGENTS.md (root, canonical) + MISSION.md + per-package AGENTS.md/README.md/ARCHITECTURE.md. Added --local to /pr-review matching /ben-pr-local-review.

Phase 2 — Dogfooding rounds (iter-2 through iter-7). Ran /pr-review --local on each commit's diff and applied the findings. Caught real bugs: stale-stash awk parser silently failing because it didn't strip the On <branch>: prefix (the entire safety check was a no-op for several iterations); ${VAR}=$(...) invalid bash syntax peppered through both watcher prompts (would die on cycle 1 if interpreted literally); <[A-Z_]+> pre-flight regex false-positive on legitimate output template tokens; WATCH_FIX_DONE ci=PENDING branch unreachable because Step 4 never queries pending checks; sentinel format drift between modes; cycle-derived placeholders missing the CYCLE_ prefix in pr-review while pr-fix had them. Each iteration tightened a specific class of silent failure or contract drift.

Phase 3 — Simplification (iter-8). Cut 284 lines of accreted theater after the user pointed out the files had grown to 1180+ lines each. Deleted: smoke-test sections (theater recipes requiring chmod -w .git/refs/stash and fresh GitHub bot accounts — written but never run); a Stability policy + Shell requirement + deprecation example block (speculative governance for parsers that don't exist); a third Step 11 leading-line case + FIXES_APPLIED/ACTIONABLE_COUNT runtime counters (collapsed to two cases); PR_REVIEW_FIX_BYPASS_STALE_STASH env var (knob nobody sets); RECONCILE_FAILED line-length cap (triggers only on 20+ defective threads — never happens); inline # DRIFT-CHECK: markers (nobody greps markdown for them); a duplicated Caveat (idempotency boundary) paragraph.

Phase 4 — Post-simplification cleanup (iter-9). Two HIGH orphan references from iter-8's deletions (a bypass env var clause referencing a deleted env var, an empty-list smoke recipe asserting pre-iter-8 leading-line text). Three real silent-failure paths in the watcher: ambiguous-conflict path could reach WATCH_FIX_GREEN despite unresolved conflicts; Step 4 retry exhaustion had no documented fallthrough; WATCH_FIX_DONE ci= value space narrowed to FAIL/NA/UNKNOWN after discovering Step 4 never captures pending checks (so PASS and PENDING branches were unreachable as written).

Phase 5 — The split (iter-10). Split the unified /pr-review into three single-mode entry points (pr-review-ci, pr-review-gh, pr-review-local) with a shared library (.agents/lib/pr-review-base.md) for Steps 3–6 (diff, project context, 7 agents, aggregation). Each entry point handles only its mode-specific Steps 1–2 and Steps 7+. Plus simplifications: dropped the vestigial Codex pass; replaced --local --fix's ~80-line stash-and-pop discipline with a one-line refuse-on-dirty-tree gate; watcher cycle in pr-review-gh delegates Step 5 to the base instead of inlining a parallel copy of the 7-agent prompt block.

Phase 6 — Symmetric extraction for pr-fix (iter-11). Same trick on /pr-fix: extracted Step 5f routing table + Step 9 reply/resolve mechanisms into .agents/lib/pr-fix-routing.md; watcher Step 7 referenced the lib; sentinel registry + smoke-test recipes lifted into .agents/commands/AGENTS.md (maintainer-only, not loaded at runtime).

Phase 7 — Drop the master files (iter-12). Maintainer feedback: the AGENTS.md registry was a god file, and pr-fix-routing.md was a one-skill helper masquerading as a library (the watcher and main flow are halves of the same skill, not separate consumers). Deleted both. Re-inlined the routing in pr-fix.md Step 5f + Step 9. Watcher now references main-flow Step 9 directly: "follow Step 9, substituting ${CYCLE_HEAD_SHA} for <HEAD_SHA>" — single source within the same file. Each /pr-review-* skill got its own inline sentinel-grammar table covering only what it emits. Smoke-test recipes deleted entirely (theater). Kept .agents/lib/pr-review-base.md because it's a real cross-skill library used by 3 distinct consumers, not a master file.

Phase 8 — Final orphan cleanup (iter-13). Three orphan bullets/cross-references survived iter-12's master-file sweep (a stale Step 7h pointer; "No Codex pass" archaeology; "Cross-skill watcher considerations" governance note; a "Drift reminder for --fix mechanics" sticker). Plus one redundant inline block in pr-fix Step 9 (a per-category sub-step list that restated the Step 5f routing table in prose) and a <names> field missing from pr-review-gh's REVIEW_INCOMPLETE trailer (drifted from pr-review-local).

Final structure

.agents/
  commands/
    pr-review-ci.md       172 lines  ← inline sentinel grammar (1 row)
    pr-review-gh.md       205 lines  ← inline sentinel grammar (8 rows)
    pr-review-local.md    197 lines  ← inline sentinel grammar (5 rows)
    pr-fix.md             895 lines  ← inline routing + reply/resolve + sentinel registry
  lib/
    pr-review-base.md     250 lines  ← 3 distinct skills depend on this; real library

Per-invocation runtime prompt cost vs the previous 1065-line monolith:

/pr-review-ci: 422 lines (60% reduction)
/pr-review-gh: 455 lines (57% reduction)
/pr-review-local: 447 lines (58% reduction)
/pr-fix: 895 lines (was 1188 — 25% reduction)

What I'd flag for human review

Test on a fresh session. All 13 self-reviews ran in the same conversation, which means cached context made some checks easier than they'd be cold. The /pr-review-{ci,gh,local} commands say "Read .agents/lib/pr-review-base.md and follow Steps 3–6 there" — verify on a fresh invocation that the model actually does the Read tool call instead of hallucinating Steps 3–6 from training-data analogs.
Watcher pairing semantics. Running /pr-review-gh <PR> --watch AND /pr-fix <PR> --watch together: both crons fire every 2 min, no cross-cron coordination. The pairing works (review posts → fix replies → next cycle picks up new state), but I removed the doc note about this in iter-13's "stop describing absences" round. If the loop semantics bite a user, the right response is to either fix the structure (e.g. a single /pr command with --watch doing both) or add the note back — not to leave a sticker.
pr-review-base.md as a kept dependency. It's 250 lines, depended on by 3 skills. I argued it's a real library (3 consumers, byte-identical Steps 3–6) rather than a master file. If you disagree and want strict per-skill isolation, inlining tripples to ~750 lines added across the three entry points and the contract is still byte-identical — so the maintenance cost is concrete.
/pr-fix watcher and main flow share a routing table inline (no lib). The watcher's Step 7f reads "follow main-flow Step 9, substituting ${CYCLE_HEAD_SHA} for <HEAD_SHA>" — cross-section reference within one file. If a future change to Step 9 forgets to keep the watcher's substitution rule valid, the drift is silent. Both files are short enough that a careful reviewer should catch it; if it bites, extract a real shared piece.
Sentinel grammars are the documented contract. Downstream parsers MAY rely on them. The per-skill inline tables are the source of truth. No central registry to drift against (intentionally).

Stats

13 commits, all signed (Claude Opus 4.7 (1M context) co-authorship).
Net diff vs main: ~+870 / −1110 lines depending on which baseline you compare to (the unified /pr-review.md was 1065 lines and is now deleted).
6 files added (3 entry points + 1 library + the 13 commit messages serve as the change log).

🤖 Generated with Claude Code

Six unique defects fixed across the four skill files (one verdict-table gap was flagged by both reviewers). All concrete bugs an AI agent following the prompt literally would hit; all fixes confirm and apply the suggested correction. - pr-review-ci.md verdict table (Codex P1 + Devin): the "Request Changes" row said "Any critical, OR multiple high, OR <FAILED_AGENTS> non-zero" — a PR with exactly 1 high and 0 critical and 0 failed agents matched neither APPROVE (which requires zero high) nor REQUEST_CHANGES. Changed "multiple high" → "any high" so the two rows are exhaustive complements. - pr-review-ci.md fallback API call (Devin): the fallback `gh api ...issues/ comments` had no `--method POST` and no body field — `gh api` defaults to GET, so the literal command would list comments instead of posting one. Added `--method POST -f body="<REVIEW_BODY>"`. - pr-review-local.md validation table (Devin): row 3 ("Findings + agent crash") asserted REVIEW_INCOMPLETE, but Step 7's sentinel logic emits REVIEW_DONE_LOCAL for any non-zero findings (with WARNING prepended). Updated the table row to match Step 7's actual output and added a separate row for "Zero findings + agent crash". Step 7 prose updated to say the WARNING line is prepended in the findings + crash case. - pr-review-local.md same-branch shortcut (Codex P2): the early-exit on HEAD_BRANCH == BASE_BRANCH AND clean tree skipped review even when the branch had unpushed commits ahead of origin/<BASE_BRANCH> — common case for `main` with local commits. Replaced with a real commit-range diff check (git diff --name-only "$MERGE_BASE..HEAD") AND a clean-tree check. - pr-fix.md orphan-stash recovery (Codex P1): the recovery instruction said "for each orphan, run `git rev-parse <ref>` to get its SHA, then `git stash drop <sha>`". `git stash drop` only accepts stash refs (stash@{N}), not commit SHAs — the documented cleanup path failed exactly when orphan stashes were detected. Rewrote to instruct dropping by ref highest-index- first so positional refs of remaining stashes don't shift. - Markdown blank-line fix (Devin): `## Sentinel grammar` headings in pr-review-{ci,gh,local}.md immediately followed list items with no blank line. CommonMark-strict parsers treat that as list continuation, not a heading. Added blank lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tion The degraded sentinel emitted when `git stash drop stash@{0}` fails told the user to run `git stash drop ${CYCLE_LINT_STASH_SHA}`, but that variable is a commit SHA (from `git rev-parse stash@{0}`). `git stash drop` accepts only stash@{N} refs, not commit SHAs — line 741's orphan-stash recovery was already corrected to state this constraint, but the line 804 path was missed. Replace the broken `git stash drop <SHA>` suggestion with the same idiom used at line 741: `git stash list --format='%H %gd' | grep ^<SHA>` to find the current stash@{N} ref, then drop by ref. Restate the constraint so the two recovery paths agree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n on lint failure Devin (PR #599) flagged a real bug in /pr-review-local --fix's apply-fixes loop: when fix N+1 on the same file fails lint, `git checkout -- <file>` reverts the ENTIRE file to HEAD — including fix N which already passed lint and was tracked as "applied". Counter says `Fixed: X` but `git diff` shows nothing for that file. Silent destruction. Replaced the per-finding apply-then-validate loop with a per-file batch: process all findings on a given file as one unit, run lint once, revert all-or-nothing. If lint passes the batch, every finding for the file is applied. If lint fails the batch, the entire file is reverted and every finding for the file is marked skipped: lint rejected the batch. The user sees a consistent picture in `git diff` — what's reported as applied is what's actually on disk. Same pattern as /pr-fix's watcher Step 7b stash-based revert (revert the batch as a unit), now consistent across both --fix flows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cher Devin (PR #599) caught a critical bug in the pr-review-gh watcher's Step 2 CYCLE_LAST_REVIEWED_SHA jq filter. `null | test("...")` is a fatal jq error, and any review with `body: null` triggers it (extremely common — a human clicking "Approve" in the GitHub UI without typing text produces a review with null body). jq's `or` short-circuits left-to-right: when `.user.login == $login` is false (any non-bot user), evaluation moves to the right side, which crashes if `.body` is null. The crash exits jq non-zero, the watcher sees a non-zero exit, emits WATCH_TRANSIENT_ERROR, and ends the cycle. Because the null-body review persists in the API response, EVERY subsequent cycle hits the same crash — the watcher is permanently dead until the 3-day expiry or manual CronDelete. Whole watcher loop is busted by a single human approval-without-text on the PR. Devin's fix is correct: null-coalesce body to empty string before test(). (.body | test("...")) → ((.body // "") | test("...")) The old pr-review.md watcher (pre-iter-10 split) only filtered on .user.login and didn't have this issue — the body-pattern fallback was introduced in this PR to catch reviews after a bot rename. Keeping the fallback (it's load-bearing for the bot-rename case) but null-guarding it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

0xbulma added the enhancement New feature or request label May 4, 2026

0xbulma self-assigned this May 4, 2026

0xbulma and others added 2 commits May 4, 2026 17:01

0xbulma force-pushed the 0xbulma/upgrade-pr-skills branch from 51be1e8 to a477706 Compare May 4, 2026 15:02

0xbulma and others added 11 commits May 4, 2026 17:29

0xbulma requested a review from Foulks-Plb May 5, 2026 09:13

0xbulma marked this pull request as ready for review May 5, 2026 09:13

This comment was marked as resolved.

Sign in to view

Foulks-Plb approved these changes May 5, 2026

View reviewed changes

0xbulma merged commit 85fc5f8 into main May 5, 2026
19 checks passed

0xbulma deleted the 0xbulma/upgrade-pr-skills branch May 5, 2026 16:27

0xbulma mentioned this pull request May 7, 2026

chore: sync .claude and .codex symlinks with split pr-review commands #636

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: upgrade /pr-review and /pr-fix slash commands#599

chore: upgrade /pr-review and /pr-fix slash commands#599
0xbulma merged 17 commits into
mainfrom
0xbulma/upgrade-pr-skills

0xbulma commented May 4, 2026 •

edited

Loading

Uh oh!

0xbulma commented May 5, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

0xbulma commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's new vs the previous /pr-review

What's intentionally not changed

Migration

Test plan

Uh oh!

0xbulma commented May 5, 2026

Self-review (iteration history + structural notes)

Phases

Final structure

What I'd flag for human review

Stats

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

0xbulma commented May 4, 2026 •

edited

Loading