Skip to content

chore: upgrade /pr-review and /pr-fix slash commands#599

Merged
0xbulma merged 17 commits into
mainfrom
0xbulma/upgrade-pr-skills
May 5, 2026
Merged

chore: upgrade /pr-review and /pr-fix slash commands#599
0xbulma merged 17 commits into
mainfrom
0xbulma/upgrade-pr-skills

Conversation

@0xbulma
Copy link
Copy Markdown
Collaborator

@0xbulma 0xbulma commented May 4, 2026

Summary

Brings the project's /pr-review and /pr-fix slash commands up to the patterns proven in the personal /ben-pr-{review,fix,local-review} skills, while preserving SDK-specific specialization (7-agent review set, hardcoded pnpm/biome quality gates, packages/<pkg> conventions, JSDoc style guide reference).

The unified /pr-review skill has been split into three focused entry points sharing a common review library:

.agents/
  commands/
    pr-review-ci.md       — CI verdict mode (APPROVE / REQUEST_CHANGES, runs in GitHub Actions)
    pr-review-gh.md       — Local PR review (posts as COMMENT) + --watch
    pr-review-local.md    — Pre-PR local review (terminal-only) + --fix
    pr-fix.md             — Apply fixes for unresolved PR review comments
  lib/
    pr-review-base.md     — Shared Steps 3–6 (diff, project context, 7 review agents, aggregation)

Each entry point is single-mode (no internal CI-vs-local-vs-watch dispatch); shared logic lives in the lib.

What's new vs the previous /pr-review

  • Three modes, three commands. No more if CI then ... else if --local then ... branching — each command is one shape.
  • /pr-review-local is a new pre-PR review flow: terminal output, no GitHub interaction, optional --fix to apply fixes locally (refuses on dirty tree — no stash-and-pop machinery).
  • Adaptive project context discovery. Step 4 reads root AGENTS.md (canonical; CLAUDE.md is a symlink), MISSION.md, CONTRIBUTING.md, biome.json always; conditionally adds docs/jsdoc-style.md (when an exported symbol or JSDoc is touched), SECURITY.md, docs/tibs/TEMPLATE.md. Per-package AGENTS.md/README.md/ARCHITECTURE.md only for packages touched by the diff.
  • Sentinel discipline. Every terminal step ends with a single grep-able Sentinel: NAME — <prose> line (REVIEW_DONE_PR, REVIEW_CLEAN, REVIEW_INCOMPLETE, WATCH_REJECTED, FIX_DONE_LOCAL, etc.). Each skill's grammar table is inline — no master/god registry.
  • Whitespace-only validation. gh pr view field validation uses [ -z \"\${X//[[:space:]]/}\" ] (not bare [ -z \"\$X\" ]) — whitespace-only branch names no longer silently corrupt downstream commands.
  • Watcher robustness in /pr-fix --watch: clean-tree gate at cycle start, orphan-stash detection from prior crashed cycles, `git stash push -u && git stash drop` for lint-failure rollback (recoverable from reflog by stable commit SHA), explicit `WATCH_PR_CLOSED` / `WATCH_REVIEW_CLEAN` / `WATCH_TRANSIENT_ERROR` sentinels for every cycle terminus.
  • /pr-fix Step 5f routing table. Seven categories × verbatim reply text × resolve-yes/no — auditable contract used by both Step 9 and Step 12 watcher Step 7. Includes a Mixed-purpose variant (Question + Actionable).

What's intentionally not changed

  • 7 SDK-specialized review agents (`Code Quality`, `Module & API Architecture`, `Web3 Security`, `Silent Failure Hunter`, `Style & Conventions Compliance`, `Documentation Analyzer`, `Test Coverage Analyzer`) kept verbatim — this is the project's domain knowledge.
  • `pnpm exec biome check` / `pnpm lint` / `pnpm -r --if-present build` hardcoded — this repo is always pnpm; dynamic discovery would add noise.
  • `packages/` conventions in agent prompts (NodeNext `.js` suffix, SDK type reuse, `bigint` for onchain quantities, etc.) preserved.
  • CI markers (``, ``) preserved in CI mode body for the project's CI gates.
  • `Co-Authored-By: Claude Opus 4.7 (1M context)` attribution in commits.

Migration

Users invoking the old `/pr-review` should switch to one of:

  • `/pr-review-ci` in CI environments
  • `/pr-review-gh` for local PR review (with optional `--watch`)
  • `/pr-review-local` for pre-PR feedback (with optional `--fix`)

`/pr-fix` is unchanged at the user-facing level (same arguments, same behavior).

Test plan

  • Run `/pr-review-local` against a small open PR; verify 7 agents launch in parallel, output uses Critical/High/Medium/Low labels, terminates with `Sentinel: REVIEW_DONE_LOCAL` or `REVIEW_CLEAN`.
  • Run `/pr-review-gh ` against a low-risk open PR; verify atomic review posts to GitHub with `COMMENT` event, body includes severity table, terminates with `Sentinel: REVIEW_DONE_PR`.
  • Run `/pr-fix ` against the resulting review; verify Step 5d prints triage assessment, Step 9 routes per Step 5f table, Step 11 emits `Sentinel: FIX_DONE_PR`.
  • Run `/pr-review-local --fix` on a clean tree; verify edits applied unstaged, lint failures revert via in-memory snapshot, terminates with `Sentinel: FIX_DONE_LOCAL`.
  • Run `/pr-review-local --fix` on a dirty tree; verify it aborts with `Sentinel: FIX_ABORTED — working tree is not clean`.
  • (Watcher smoke, optional) Run `/pr-review-gh --watch`; verify CronCreate succeeds, push a trivial commit, wait one cron tick, confirm re-review posts. Tear down with `CronDelete`.

🤖 Generated with Claude Code

@0xbulma 0xbulma added the enhancement New feature or request label May 4, 2026
@0xbulma 0xbulma self-assigned this May 4, 2026
0xbulma and others added 2 commits May 4, 2026 17:01
Brings the project's /pr-review and /pr-fix commands up to the patterns
proven in the personal /ben-pr-{review,fix,local-review} skills, while
preserving the SDK-specific specialization (7-agent review set, hardcoded
pnpm/biome quality gates, packages/<pkg> conventions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tightens the new --local flow and the watcher prompts in response to a
parallel local review (5 agents). Themes:

- Mode dispatch: explicit argument-validation step at the top of pr-review
  Step 1 (rules in order) plus a "valid combinations" table. --local always
  wins over CI env detection; --local + --watch warns and drops --watch;
  --local + <PR_NUMBER> warns and drops the PR number.

- Branch derivation safety: capture default-branch detection into a variable
  with a verified fallback chain (origin/main → origin/master → abort);
  detached-HEAD treated as a display-only short SHA before any equality
  check; gh pr view exit-status checked and all four required fields
  validated non-empty before downstream commands run.

- --fix safety: stash any uncommitted user work BEFORE the fix loop so a
  failed-lint revert can re-Edit from an in-memory original snippet without
  destroying pre-existing uncommitted changes; restore via git stash pop
  after the loop.

- Aggregator + sentinels: every parallel agent must return either a JSON
  array OR an explicit {"agent_error": "..."} sentinel. Step 6 counts
  failed agents; Step 7 distinguishes "REVIEW_CLEAN" (no findings,
  zero failures) from "REVIEW_INCOMPLETE" (no findings, some agents
  crashed). All terminal step variants end with a single grep-able
  Sentinel: line.

- Watcher discipline (both files): explicit CronCreate-time vs cycle-derived
  placeholder convention with a pre-flight check that aborts CronCreate if
  any unsubstituted <[A-Z_]+> remains. Cycle-derived ${CYCLE_HEAD_SHA}
  replaces the SHA-staleness trap where the watcher would otherwise reply
  with the CronCreate-time SHA forever. Every Run line has an exit-status
  check; gh API failure (auth/rate-limit/network) no longer falls through
  to "review everything" / "review none". jq filter binds <BOT_LOGIN> via
  --arg to dodge shell-quoting injection. Drift reminder added so future
  Step 5 / Step 5f changes stay propagated to the watcher prompt.

- Triage gap (pr-fix Step 5/9 + watcher Step 7): added a per-category
  routing table (5f) covering all six categories from 5a — actionable+fixed,
  actionable+skipped, question, discussion, praise, already-addressed,
  stale — with distinct reply text and resolve/leave-open behavior per
  category. Watcher Step 7f mirrors the table.

- Project-context discovery: tighten find excludes (.git, dist, build,
  .next, coverage, .turbo, .context); drop the maxdepth-2 contradiction;
  prefer Glob tool. Add explicit "files outside packages/" handling
  (root baseline only) and a worked example. Echo the list of context
  files actually read so omissions are visible.

- Pagination preflight (pr-fix Step 4): if reviewThreads.pageInfo
  .hasNextPage is true, abort loud rather than silently truncating
  threads beyond the first 100.

- CI poll timeout (pr-fix Step 10): track CI_POLL_TIMED_OUT and report
  "CI: PENDING (timed out after 2.5min — re-check ...)" instead of a bare
  PENDING that reads as terminal.

- Misc bugs: lowercase <pr-number> swept to <PR_NUMBER> in both files;
  ✅/❌ emojis dropped from CI verdict template; <MERGE_BASE_SHORT>
  added to the placeholder convention; "Local Code Review" header renamed
  to "Local-only Code Review" for terminology consistency; pr-fix Step 6a
  test/ path corrected (this monorepo has no top-level test/ directory);
  Codex pass validates branch names against ^[A-Za-z0-9._/-]+$ before
  shell substitution; "skip to Step 8" reworded to "proceed to Step 8";
  TWO-PHASE banner mentions the three Step 7 variants and Step 9.5;
  pairing note in pr-fix Notes mentions /pr-review --local as the pre-PR
  half of the loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0xbulma 0xbulma force-pushed the 0xbulma/upgrade-pr-skills branch from 51be1e8 to a477706 Compare May 4, 2026 15:02
0xbulma and others added 11 commits May 4, 2026 17:29
…cs/jsdoc-style.md

Iteration 3 of /pr-review and /pr-fix in response to running /pr-review --local
on the previous commit. Five HIGH-severity bugs from that pass + the medium and
low cleanups, plus a new JSDoc-doc reference per request.

JSDoc reference (new requirement):
- pr-review Step 4 always-read baseline now includes docs/jsdoc-style.md
  (canonical guide; operationalizes AGENTS.md §6 + MISSION.md goal #3),
  read whenever the diff touches an exported symbol or any JSDoc/@example.
  Backed by docs/tibs/TIB-2026-05-04-jsdoc-coverage-on-exported-symbols.md.
- SECURITY.md moved out of "Always read" into "Conditional baseline"
  (it was always-read but described as conditional).
- pr-review Agent 6 (Documentation Analyzer) prompt cites
  docs/jsdoc-style.md as the canonical shape; its @example checklist now
  references the runnable-recipe rule from the style guide.
- pr-review watcher's Step 7 documentation agent mirrors the same.
- pr-fix Step 6a context-gathering reads docs/jsdoc-style.md when the fix
  touches an exported symbol / @example / JSDoc.

HIGH-severity fixes:
- Watcher pre-flight `<[A-Z_]+>` regex was a false-positive on legitimate
  output template tokens like <N>, <X>, <Y>, etc. Replaced with an
  ALLOWLIST_REGEX that matches ONLY the seven static placeholders by name.
  Three classes of placeholders are now explicitly documented (CronCreate-time
  / cycle-derived / report-template) in both watcher prompts.
- Invalid bash syntax `${VAR}=$(...)` was used throughout both watcher
  prompts. Bash assignment is bare-LHS (`VAR=$(...)`); the brace form parses
  as a command. Rewritten to natural-language `set CYCLE_VAR = ...` form
  with an explicit note clarifying that the watcher reads steps as
  instructions, not as a verbatim bash script.
- Invalid `gh api --jq --arg login ...` invocation: gh's --jq accepts only
  the filter as a single arg; --arg is a jq flag. Rewritten to capture
  gh output then pipe into a separate `jq --arg login` call.
- pr-fix Step 9.5 reconciliation only acknowledged 2 outcomes (fixed /
  skipped) but Step 5f had just defined 7 categories. Rewrote Step 9.5 to
  enumerate all 7 terminal states (with the exact reply text per category)
  + a Sentinel: RECONCILE_OK / RECONCILE_FAILED line that confirms the
  pass actually ran.
- pr-fix watcher lint-failure path now reverts the unstaged Edit-tool
  changes (`git checkout -- <files>`) before ending the cycle, otherwise
  the next cycle's `gh pr checkout` runs on a dirty tree and `git merge`
  refuses with a misleading transient-error sentinel. Emits
  Sentinel: WATCH_LINT_FAILED so the user can grep for it.
- pr-review Step 7 (alt 2b) stash-pop conflict now emits a dedicated
  Sentinel: FIX_DONE_WITH_STASH_CONFLICTS line with the conflicting file
  list — the previous "Changes are unstaged. Review with: git diff" sign-off
  was actively misleading when conflict markers were present.

Sentinel namespacing:
- REVIEW_DONE → REVIEW_DONE_PR (CI / Local PR modes) and REVIEW_DONE_LOCAL
  (Local-only mode). Both terminal sentinels are now distinct strings so
  downstream parsers can grep on the trailer prefix alone.
- New Sentinel: FIX_DONE_PR added to pr-fix Step 11 so /pr-fix has a
  terminal grep target matching /pr-review's structure (the "match the
  format" claim that was previously misleading is now honestly aligned).
- pr-fix watcher's WATCH_FIX_DONE now reports both ${CYCLE_MERGE_SHA}
  (previously captured-but-unused) and ${CYCLE_HEAD_SHA}.

Aggregator and dedup tightening (pr-review Step 6):
- Per-element schema validation: agents that return arrays containing
  malformed finding objects (missing severity / non-numeric line / etc.)
  count as PARTIALLY failed — keep the valid findings but include the
  agent in <FAILED_AGENTS>.
- Dedup rule tightened: ±3 line tolerance now requires BOTH severity AND
  description overlap. Distinct findings on adjacent lines (e.g. one agent
  flags missing JSDoc on line 42 export, another flags swallowed catch on
  line 42) are no longer silently merged.

Other content fixes:
- pr-review valid-argument-combinations table at the top (replaces three
  scattered statements of the rule).
- pr-review Step 1 sub-section heading prefix consistency: 1c, 1d added.
- pr-review placeholder convention split into Static vs Computed-at-runtime
  tables; <MERGE_BASE_SHORT> formally added; <MERGE_BASE> table entry notes
  the watcher's cycle-derived form.
- pr-review Validating --local end-to-end smoke-test recipe added,
  documenting the determinism boundary (LLM finding-text drift across runs).
- pr-review Step 7 (alt 2) sentinel suppressed when --fix is set so users
  see one combined run with one terminal sentinel.
- pr-review Codex pass validates branch names against ^[A-Za-z0-9._/-]+$
  before substitution, refusing to splice unsafe characters into the shell.
- pr-fix Step 4 GraphQL query now requests `createdAt` (the
  <comment_created_at> placeholder used by Step 5c was previously undefined).
- pr-fix Step 5a documents Mixed-purpose comments (Question + Actionable)
  with an extended reply text that acknowledges both halves.
- pr-fix Step 5f routing table includes the Mixed-purpose row, and adds a
  smoke-test recipe — eight throwaway threads, one per category — that
  catches future regressions where two categories silently merge.
- pr-fix Step 6b confidence assessment now uses `Confidence: HIGH/MEDIUM/LOW`
  prefix so the tokens never collide with severity (Critical/High/Medium/Low).
- pr-fix Step 10 disambiguates `gh pr checks --watch` (a gh CLI flag) from
  /pr-fix --watch (the user-facing skill flag).
- Whitespace-only field validation: both files now reject whitespace-only
  values from `gh pr view` (previously `[ -z "$X" ]` passed for "  ").
- pr-fix watcher merge-conflict commit message uses heredoc form (was
  vulnerable to <BASE_BRANCH> containing $ or backticks).
- pr-fix watcher heredoc reply tag standardized on REPLY_EOF (was REPLYEOF).
- pr-fix watcher code-fence tagged ```text for cleaner rendering.
- pr-fix watcher cycle-local variables renamed to ${CYCLE_*} convention
  (CYCLE_PR_STATE, CYCLE_FAILED_CHECKS, CYCLE_RUN_ID, CYCLE_EXPECTED_HEAD_SHA).
- pr-fix watcher Step 7f reply-text contract now spelled out per category
  with explicit "NEVER reference ${CYCLE_HEAD_SHA} here" notes for non-
  actionable categories where step e never runs.
- pr-review --local --fix Fix Summary format renamed and acknowledged as
  intentionally distinct from /pr-fix Step 11 (the previous "match the
  format" claim was misleading; both ends now end with grep-able sentinels).
- pr-review Notes ⚠ emoji warnings replaced with WARNING: prose.
- Drift reminders: pr-review Step 7 now flags drift between --local --fix
  and /pr-fix Step 6; pr-review watcher Step 7 already flagged drift vs
  Step 5; pr-fix watcher already flagged drift vs Step 5f.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…try, lint cleanup safety

Removes .agents/guidelines.md and .agents/skills/ references — neither
exists in this repo (per user request).

Cross-skill cycle-variable naming parity:
- pr-review watcher renamed bare ${PR_STATE} / ${LAST_REVIEWED_SHA} /
  ${MERGE_BASE} / ${FAILED_AGENTS} / ${LAST_REVIEWED_RAW} to
  ${CYCLE_PR_STATE} / ${CYCLE_LAST_REVIEWED_SHA} / ${CYCLE_MERGE_BASE} /
  ${CYCLE_FAILED_AGENTS} / ${CYCLE_LAST_REVIEWED_RAW}, matching pr-fix's
  "every cycle-local variable uses CYCLE_*" convention. Static placeholder
  table footnote updated accordingly.

Watcher cycle-terminus sentinel coverage:
- Added Sentinel: WATCH_REVIEW_CLEAN for the "no new commits" path in
  pr-review (previously silent).
- Added Sentinel: WATCH_PR_CLOSED in both files for the "PR not OPEN"
  termination path (previously silent in both watchers).
- Both make every cycle terminus grep-able.

Empty-prompt guard before ALLOWLIST_REGEX:
- Both watchers now guard `[ -z "$ASSEMBLED_PROMPT" ]` BEFORE the
  allowlist grep. An unset/empty prompt would otherwise pass the check
  by virtue of having nothing to match, and the watcher would be
  scheduled with an empty prompt.

pr-fix watcher Step 2 clean-tree gate:
- Refuses to run when `git status --porcelain` is non-empty at cycle
  start. A prior cycle that crashed mid-Edit could have left orphaned
  modifications that gh pr checkout doesn't clean — the next steps
  would silently carry them forward.

pr-fix watcher Step 7b lint-failure cleanup safety:
- Switched from raw `git checkout -- <files>` to `git stash push -u +
  git stash drop`. `git checkout --` is destructive (it discards ALL
  unstaged changes to the file, not just this cycle's Edit-tool
  changes); the stash-and-drop form preserves prior-cycle work in the
  reflog so a human can recover if needed.

pr-fix watcher Step 3 — restored <BASE_BRANCH> in heredoc commit subject:
- The previous "merge: resolve conflicts with base branch" literal
  dropped traceability info from git history. <BASE_BRANCH> is in the
  CronCreate-time allowlist so it's substituted into the prompt before
  the watcher receives it; the single-quoted MERGEEOF preserves the
  substituted literal verbatim with no shell injection.

WATCH_FIX_DONE / RECONCILE_OK label harmonization:
- Renamed token <S> for "stale" → <ST> (collided with <S>=skipped-with-
  reply in RECONCILE_OK). Both sentinels now use identical bucket labels
  <F> / <SK> / <Q> / <D> / <P> / <A> / <ST> so a downstream parser can
  grep either with the same regex.
- RECONCILE_OK on the empty-list early-exit path (Step 4 → 9.5 → 11)
  documented to emit zero counts.

pr-review --fix pre-flight stale-stash check:
- Before stashing this run's uncommitted work, refuses to start if a
  stale `pr-review --local --fix safety stash` already exists in
  `git stash list`. Catches the regression where a prior crashed run
  left a stash; the new run would otherwise stack another with the same
  message and risk popping the wrong one.

pr-fix outside-packages baseline scope = 5 items:
- Step 6a's "outside packages/" clause now includes CONTRIBUTING.md
  and biome.json (matching pr-review Step 4's always-read baseline).
  Files under .agents/commands/ etc. now get full root-level context
  in /pr-fix, not the narrow 3-file subset.

FIX_DONE_PR sentinel — CI timeout signal + early-exit fields:
- Added `ci=PENDING_TIMEOUT` (CI poll loop timed out — distinct from
  plain PENDING) and `ci=NA` (Step 10 was skipped on the empty-list
  early-exit path).
- Documented exact field values for the empty-list early-exit case so
  the sentinel is always grep-able and clearly distinguishable from a
  "real" run.

Step 5f / Step 9.5 / smoke-test recipe — category count consistency:
- Resolved the 6-vs-7-vs-8 confusion: Step 5a defines SEVEN terminal
  categories; Step 5f routing has eight ROWS (seven categories + a
  Mixed-purpose VARIANT of actionable+fixed); Step 5d's printed bucket
  counts use the seven Step-5a labels.
- Smoke-test recipe step 3 corrected: "actionable: 3" (fixed + skipped
  + mixed-purpose, all classified as actionable in Step 5a) plus 1 each
  for the other six. The previous "every column should show 1" was
  unsatisfiable.
- Step 5f routing rule for distinguishing rows by reply-text grep is
  now a deterministic, ordered ladder (first match wins), with
  Mixed-purpose distinguished BY its `(Re your question:` postscript.

Reciprocal drift reminder + cross-reference narrowing:
- Added DRIFT-CHECK reminder at the top of /pr-fix Step 6 mirroring
  the one already in /pr-review Step 7 (alt 2b). Both narrow the
  cross-reference to "Step 6c-6d" (the actual apply-and-validate
  mechanics — the wider Step 6 includes the context-gathering surface
  which is /pr-fix-specific).

Sentinel grammar registry (NEW — single source of truth in each file):
- Added a "Sentinel grammar registry" section before "Error Handling"
  in both files. Lists every sentinel with its owning step, fire
  condition, and exact trailer grammar. Adding a new sentinel now
  requires adding a row to the registry in the same PR — this is the
  documented contract.

pr-review watcher code-fence language tag:
- Tagged ```text on the watcher CronCreate prompt block, matching
  pr-fix's existing ```text tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stash robustness

HIGH:
- Whitespace-only ASSEMBLED_PROMPT silent-pass (both files): both empty-prompt
  guards switched from `[ -z "$X" ]` to `[ -z "${X//[[:space:]]/}" ]` so a
  whitespace-only prompt aborts loud (matching the discipline already used
  for <BASE_BRANCH>/<HEAD_BRANCH> validation).
- pr-review REVIEW_INCOMPLETE registry grammar fix: the row's trailer
  used `<CYCLE_FAILED_AGENTS>` but the sentinel is owned by Step 7 (alt 2)
  — Local-only mode, NOT the watcher — and emits using `<FAILED_AGENTS>`.
  Registry now matches the actual emission.

MEDIUM:
- pr-fix Step 11 conditional leading line: on the empty-list early-exit
  path the prose previously said "fixes applied and pushed" even though
  no commit was made. Now picks between two leading lines based on N>0.
- pr-fix RECONCILE_FAILED grammar disambiguated to a single summary form:
  `threads [<id>, <id>, ...] in unknown state.` — one sentinel per run,
  matching the pattern of every other sentinel in the skill.
- pr-fix watcher Step 2 now explicitly checks `git status` exit code
  separately from the captured stdout (corrupted index / lock contention
  no longer silently passes the clean-tree gate). Dirty-tree sentinel
  output truncated to a count + first-path summary so it stays a single
  grep-able line.
- pr-fix watcher Step 2 also detects orphan watcher stashes from a prior
  crashed cycle (`git stash list --format='%gs' | grep -E 'pr-fix watcher
  : lint-aborted cycle'`) and emits WATCH_TRANSIENT_ERROR with manual-
  resolution guidance.
- pr-fix watcher Step 7b now checks `git stash drop` exit code and emits
  a degraded sentinel when drop fails (stash retained at stash@{0} —
  drop failed: <stderr>) instead of falsely claiming "stashed-and-
  dropped".
- pr-fix watcher Step 7b documents the recovery procedure for inspecting
  a discarded fix from the reflog (`git fsck --unreachable | grep ...`
  + `git stash apply <sha>`).
- pr-review --fix stale-stash check anchored with `git stash list --format
  ='%gs' | grep -Fxq 'pr-review --local --fix safety stash'` (whole-line
  fixed-string match) so a user-named stash that merely *contains* the
  substring no longer false-positives. Plus a documented escape hatch
  via PR_REVIEW_FIX_BYPASS_STALE_STASH=1 env var for legitimate-collision
  cases.
- pr-review Step 4 baseline classification: moved docs/jsdoc-style.md
  from "Always read" to "Conditional baseline" (the trigger language was
  already conditional). Merged the two split Conditional subsections
  into one (SECURITY.md + jsdoc-style.md + docs/tibs/TEMPLATE.md).
  Worked example and outside-packages note updated to reflect the new
  structure.
- Sentinel grammar registry stability/versioning policy added to BOTH
  files: declares the registries are stable contracts that downstream
  parsers MAY rely on; describes which changes are patch/minor (additive
  only) vs major (renames/removals/value-space narrowing — require the
  4-step deprecation flow from AGENTS.md §7).

LOW:
- pr-review L1032 cross-registry symmetry: WATCH_PR_CLOSED moved from
  /pr-fix's "own set" to the "shared" parenthetical (it's defined in
  BOTH registries with identical semantics, matching pr-fix's reciprocal
  classification).
- pr-fix placeholder convention (L46) and report-template exemption list
  (L776): dropped stale `<S>`, added `<SK>` and `<ST>` to match the
  iter-4 bucket-label rename.
- pr-fix L389 drift-check comment: trailing period dropped for parity
  with pr-review's drift comment style.
- pr-review L963: bare $CYCLE_LAST_REVIEWED_RAW → ${CYCLE_LAST_REVIEWED_
  RAW} for parity with the documented brace convention.
- pr-fix Step 5f gained an empty-list smoke-test recipe (PR with no
  unresolved threads → expect Sentinel: RECONCILE_OK with 0 counts +
  Sentinel: FIX_DONE_PR with ci=NA + the conditional Step 11 leading
  line).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ty, smoke tests

HIGH:
- pr-fix watcher Step 2 Pre-condition B (orphan-stash detection) was silently
  swallowing `git stash list` failures via `|| true`. Now splits into two
  steps: capture `CYCLE_STASH_LIST` with explicit exit-code check, then grep
  the captured value. Also reports the count of orphan stashes instead of a
  vague "an orphan exists" message and instructs the user to drop EACH ref
  individually.
- pr-fix watcher Step 7b (lint-failure stash-and-drop) was pointing the
  user at the unstable positional ref `stash@{0}` for recovery — that ref
  shifts whenever any other stash is created, so the suggested
  `git stash drop stash@{0}` could destroy the wrong stash. Now captures
  the stash COMMIT SHA via `git rev-parse stash@{0}` BEFORE the drop
  attempt and emits that stable SHA in both the success and degraded
  WATCH_LINT_FAILED sentinels. Recovery instruction is now
  `git stash drop <SHA>` which works regardless of positional shift.

MEDIUM:
- Sentinel grammar registry stability policy: 4-step deprecation flow had
  steps 2/3 swapped vs AGENTS.md §7. Restored canonical order: introduce
  successor → mark @deprecated → maintain BOTH for one minor → remove in
  next major.
- Sentinel grammar registry now has an "Effective from this commit forward"
  date anchor and a "Shell requirement" clause stating the watcher prompt
  uses bash-only constructs (rules out POSIX-portable rewrites that would
  break the whitespace-stripped guard introduced in iter-5).
- Stability policy parity: the PENDING_TIMEOUT example now appears in
  both pr-review and pr-fix Major bullets (was only in pr-review).
- pr-review Step 9 watcher Step 6 (READ PROJECT CONTEXT) restructured to
  mirror the main flow's Step 4 merged "Conditional baseline" (items 5-7
  for jsdoc-style.md / SECURITY.md / TEMPLATE.md), instead of having
  jsdoc-style.md inlined into the Always-read line and TEMPLATE.md alone
  in a separate Conditional/adaptive line.
- pr-fix Step 11 conditional leading line now covers a third terminal
  state: N>0 actionable comments but ALL confidence-gated to LOW (no
  fix landed). Step 6 must track FIXES_APPLIED + ACTIONABLE_COUNT counters
  to disambiguate. Three branches: Normal / Empty-list / Triage-skipped-all.
- pr-fix empty-list smoke-test recipe now caveats: Step 3 may push a merge
  commit before Step 4 short-circuits, in which case HEAD has moved and
  the recipe's exact-output assertions don't apply. Pre-condition: PR must
  be cleanly mergeable.
- pr-review --fix stale-stash check: bypass UX gates the warning prose on
  the bypass env var (an opted-in user no longer sees the scary
  "stale stash detected" warning before the bypass confirmation), and
  surfaces the count + list of matching stash refs (instead of generic
  "an entry exists").
- WATCH_FIX_DONE now carries a ci= field for grammar parity with
  FIX_DONE_PR. Same value space (PASS / FAIL / PENDING / PENDING_TIMEOUT
  / NA) so a single regex parses either.

LOW:
- RECONCILE_FAILED grammar switched from `[<id>, <id>, ...]` (bracket+comma,
  shell-hostile) to `<N> threads in unknown state: <id1> <id2> <id3>.`
  (space-separated). Aligns with the registry's prevailing field-with-
  count grammar and is friendly to ad-hoc `for id in ...` recovery loops.
- Sentinel registry deprecation marking convention added to both files:
  strikethrough the sentinel name + append `(deprecated, removal: ...;
  superseded by ...)` to the trailer cell. Includes a hypothetical row
  example so authors have a concrete template.
- New "Smoke tests" section added to BOTH files covering the iter-5/iter-6
  behaviors that previously had no recipe: whitespace-only ASSEMBLED_PROMPT
  guard, orphan-stash detection, degraded WATCH_LINT_FAILED, RECONCILE_FAILED
  summary grammar, WATCH_FIX_DONE ci= field, stale-stash check + bypass env
  var, stash-pop conflict path.
- pr-review Usage region gained an "Environment variables" subsection
  documenting `PR_REVIEW_FIX_BYPASS_STALE_STASH=1` (previously only
  discoverable from inside the Step 7 alt 2b stash check).
- pr-fix watcher's report-template list (~L792) re-aligned with the
  top-of-file Computed-at-runtime placeholder table — now includes <M>
  and <H> (previously omitted) so the two lists are in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… executable smoke tests

HIGH:
- pr-review stale-stash check was silently broken: the awk parser at
  Step 7 alt 2b Check 1 split git stash list output on space and stripped
  the leading %gd token but did NOT strip the "On <branch>: " prefix that
  git stash auto-prepends to %gs. The exact-equality check therefore
  compared "On main: pr-review --local --fix safety stash" against the
  bare canonical subject and never matched. The whole safety check has
  been a no-op. Rewrote to use --format='%gd%x09%gs' (tab-separated, no
  OFS-collapse hazard) and explicitly strip the "On <branch>: " prefix
  in awk via sub(/^On [^:]+: /, "", subj). Also: defensive
  ${MATCHING_STASH_COUNT:-0} comparison, while-read loop instead of
  unquoted $MATCHING_STASH_REFS expansion, recovery instructions now key
  on commit SHA (positional refs shift between drops).
- WATCH_FIX_DONE ci= field referenced CI_POLL_TIMED_OUT, but that flag
  is set only inside the main-flow Step 10 poll loop; the watcher's
  Step 4 is a one-shot status snapshot with no poll. PENDING_TIMEOUT
  could never legitimately appear in WATCH_FIX_DONE. Restricted the
  watcher's ci= value-space to PASS / FAIL / PENDING / NA (a strict
  subset of FIX_DONE_PR's). Documented the difference inline at Step 7h
  and in the registry row.
- Step 11's "Triage-skipped-all" leading-line case (added in iter-6)
  required Step 5/6 to track FIXES_APPLIED and ACTIONABLE_COUNT counters,
  but Step 5/6 never set them. Added explicit "set ACTIONABLE_COUNT = ..."
  to Step 5d (just before the bucket summary) and "set FIXES_APPLIED = 0"
  + "increment FIXES_APPLIED by 1 on each successful Edit + biome-clean
  validate + push" to Step 6 entry. The third leading-line case is now
  implementable.
- Smoke-test recipes were non-executable as written. Fixed:
    * Orphan-stash recipe required uncommitted changes (undocumented
      pre-req) — now creates /tmp/orphan-trigger first.
    * Degraded-WATCH_LINT_FAILED recipe ran chmod -w .git/refs/stash
      BEFORE the first stash push (file doesn't exist yet → chmod fails
      silently → drop succeeds → recipe produces a false PASS). Now
      bootstraps the refs with a throwaway stash push+drop, THEN chmod.
      Also documents macOS APFS / packed-refs caveat with chflags
      uchg fallback.
    * RECONCILE_FAILED recipe required forking the skill prompt — heavy.
      Replaced with a lighter alternative: post a comment from a fresh
      bot account whose login isn't in the recognized-author allowlist.
    * WATCH_FIX_DONE ci= recipe was scoped to "exactly ONE match per
      cycle" but greppped a multi-cycle log. Now requires single-cycle
      stdout capture and adds 5 sub-cases for the value-space mapping
      (PASS / FAIL / PENDING / NA — PENDING_TIMEOUT explicitly noted as
      not in the watcher's value space).
- New "Triage-skipped-all" smoke-test recipe added covering the third
  Step 11 leading-line branch.

MEDIUM:
- Stability policy 4-step flow's "Status column" reference was a typo
  (registry tables have no Status column — convention is cell-based).
  Both files now reference "the cell-based marking convention below"
  with a pointer to the strikethrough + trailer-cell append rule.

LOW:
- pr-fix Usage gained an "Environment variables" subsection documenting
  that no env vars are read today, for symmetry with /pr-review's
  equivalent subsection.
- "Effective from this commit forward" anchor expanded to include the
  date (2026-05-04) and a `git log --diff-filter=A` recipe so a future
  reader can pin the registry's introduction commit.
- Deprecation example row wrapped in <!-- BEGIN-EXAMPLE-DO-NOT-SHIP -->
  / <!-- END-EXAMPLE-DO-NOT-SHIP --> HTML comments to make accidental
  copy-paste-into-prod loud-fail visible in raw view.
- Orphan-stash sentinel (pr-fix Step 2 Pre-condition B) now emits a
  single-line summary (count + first match) for the sentinel itself,
  with the full multi-line list going to stderr — preserves the
  "single grep-able line" invariant. Recovery instruction switched to
  by-SHA (git rev-parse <ref> first, then git stash drop <sha>) for
  positional-ref-shift safety.
- RECONCILE_FAILED grammar gained a 20-ID line-length cap with
  "(+<N-20> more)" overflow indicator + full list to stderr.
- Stash-pop conflict recipe in pr-review converted from prose pseudocode
  to executable bash (with $F and $V_FIX as named pre-conditions and an
  explicit cleanup block).
- Whitespace-only ASSEMBLED_PROMPT recipe duplicated verbatim across
  both files now has a "DRIFT-CHECK: keep in sync with /pr-fix
  smoke-tests" / mirror reminder; recipe is self-contained (inlines
  GUARD as an env-passed snippet instead of "<paste the pre-flight
  guard>" indirection).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ater

The previous 7 iterations of additive fixes pushed both files past 1180 lines
each. User explicitly asked to "not overcomplicate" — this iteration removes
content that wasn't earning its keep. One real bug fixed; the rest is deletion.

CONFUSION fix:
- pr-review CI mode: the verdict table allowed only APPROVE/REQUEST_CHANGES
  but the agent-failure path defaulted to COMMENT (not in the table).
  Simplified to: on agent failure, never APPROVE — REQUEST_CHANGES with the
  WARNING line so a human resolves it. No third event variant needed.

DROP — over-engineered sections removed:
- "Smoke tests" sections from both files (~196 lines): theater recipes
  requiring exotic preconditions (`chmod -w .git/refs/stash` + chflags
  uchg fallback, fresh GitHub bot accounts, draft PRs with 5+ minute CI
  fixtures, custom GraphQL author injection). These were written but never
  run. Removing them does not reduce the safety of any guarded path —
  the guards are still in the watcher prompt where they execute on real
  invocations.
- Stability policy + Shell requirement + Deprecation marking convention +
  hypothetical example row (~40 lines per file): speculative governance
  for downstream parsers that don't exist. Replaced with a single sentence:
  "Sentinel names and trailer field order are part of the contract —
  rename or reorder fields only with a deprecation note in the table.
  Cross-skill changes must land in both registries in the same PR."
- pr-fix Step 11 third leading-line case (Triage-skipped-all) + the
  FIXES_APPLIED / ACTIONABLE_COUNT runtime counters threaded through
  Steps 5d / 6 / 11: collapsed into two cases (fix pushed / no fix pushed).
  The body's existing "Skipped: <M> findings (see thread replies for
  reasons)" line spells out the cause — no separate counter needed.
- PR_REVIEW_FIX_BYPASS_STALE_STASH env var + "Environment variables"
  Usage subsection in pr-review (~15 lines): a knob nobody sets. The
  bypass logic existed for a hypothetical user who deliberately named
  their stash with the skill's exact internal subject string. If the
  stale-stash check ever false-positives, the recovery is one shell
  command (`git stash drop <ref>`).
- pr-fix "Environment variables" symmetry-only subsection: a header
  announcing it had no content.
- pr-fix Step 9.5 RECONCILE_FAILED line-length cap with `(+<N-20> more)`
  overflow: triggered only on PRs with 20+ simultaneously-defective
  threads — a state that doesn't happen in normal use.
- Inline `# DRIFT-CHECK: keep in sync ...` marker comments in markdown
  prose: nobody greps markdown files for HTML-style code-comment markers.
  The cross-skill drift information is already in the surrounding prose.
- pr-fix Step 12's standalone "Failure handling discipline" subsection
  + pr-review Step 9's equivalent: the requirement was restated verbatim
  inside the watcher prompt body just below. Dropped the duplicate
  subsections.
- pr-review duplicate "Caveat (idempotency boundary)" paragraph at the
  top of file (line 62): duplicate of the operative caveat inside Step 7
  (alt 2) at line 747.

Net: pr-review.md 1189 → 1065 lines (-124); pr-fix.md 1188 → 1028 lines (-160).
Total -284 lines. Functional behavior unchanged for normal invocations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ watcher silent failures

After iter-8's 284-line simplification, two orphan references and several real
silent-failure paths surfaced. Fixed:

HIGH:
- pr-review L761 had a trailing clause "we gate the warning prose on the bypass
  env var so an opted-in user doesn't see the 'stale stash' warning before the
  bypass confirmation" — orphan from iter-8's deletion of
  PR_REVIEW_FIX_BYPASS_STALE_STASH. The surviving code unconditionally aborts
  with FIX_ABORTED. Dropped the trailing clause.
- pr-fix Step 5f empty-list smoke-test recipe asserted the pre-iter-8 leading
  line "PR #<PR> had no actionable comments — no commit, no push." but iter-8
  collapsed Step 11 to "PR #<PR_NUMBER> — no commit, no push." (no "had no
  actionable comments" phrase). Updated the recipe's expected text.

MEDIUM (real silent-failure paths):
- WATCH_FIX_DONE ci= value space narrowed from PASS/FAIL/PENDING/NA to
  FAIL/NA/UNKNOWN. Watcher Step 4 captures only ${CYCLE_FAILED_CHECKS} (failed
  checks); pending checks are NEVER queried. The PASS and PENDING branches
  were therefore unreachable as written. Replaced with UNKNOWN (fix pushed
  AND no failed checks, but the watcher does not distinguish PASS vs PENDING
  — the next 2-minute cycle will pick up any state change). FIX_DONE_PR
  retains its full PASS/FAIL/PENDING/PENDING_TIMEOUT/NA value space because
  the main flow's Step 10c does poll with timeout.
- Watcher Step 3 ambiguous-conflict path was a silent failure: after
  `git merge --abort` the cycle continued to Step 6, where it could reach
  WATCH_FIX_GREEN despite the unresolved base-branch conflict (false-green
  signal). Added CYCLE_MERGE_AMBIGUOUS flag in Step 3; Step 6 now blocks
  WATCH_FIX_GREEN when the flag is set and emits WATCH_TRANSIENT_ERROR
  identifying the ambiguous-conflict files instead.
- Watcher Step 4 retry-exhaustion: the "max 2 retries" rule had no documented
  fallthrough. Added explicit "if both retries fail, stop, fall through to
  step 5 with ci=FAIL; next cycle re-evaluates" so the user understands
  retries don't loop indefinitely.

LOW:
- pr-review registry rows for REVIEW_CLEAN / REVIEW_INCOMPLETE attributed
  owning-step only to "Step 7 (alt 2)" but the same sentinel strings are
  also emitted in Step 7 (alt) Local PR mode body (line 674). Updated rows
  to mention both.
- pr-fix Step 11 empty-list path now has a defensive `commit=unknown`
  fallback if Step 2b's SHA capture is unset (instead of leaking an
  unrendered placeholder).
- pr-fix WATCH_TRANSIENT_ERROR registry row now notes that permanent
  failures (branch-protection rejection on `git push`, expired auth)
  flow through this same sentinel and will repeat every cycle until the
  watcher expires or the user runs CronDelete — so users know to watch
  for repeated identical sentinels.

Net: +14/-14 lines (in-place fixes, no new content of substance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits the unified /pr-review skill into three focused entry points and lifts
maintainer-facing material out of every-invocation runtime prompts. Net effect:
~60% smaller per-invocation prompt for pr-review, single sentinel registry,
no more stash dance, no more vestigial Codex pass.

New structure:

  .agents/
    commands/
      AGENTS.md             ← maintainer notes (sentinel registry, drift
                              conventions, deprecation flow, bash requirement);
                              loaded ZERO times per user invocation.
      pr-review-ci.md       ← CI verdict mode (APPROVE / REQUEST_CHANGES,
                              CLAUDE_REVIEW_COMPLETE markers, Guidelines
                              Compliance checklist).        168 lines.
      pr-review-gh.md       ← Local PR mode + --watch (COMMENT event, watcher
                              cycle that references base Step 5 instead of
                              duplicating agent definitions).  196 lines.
      pr-review-local.md    ← Pre-PR local review + --fix (terminal-only
                              output; --fix REFUSES on dirty tree — no stash
                              dance).                          190 lines.
      pr-review.md          ← DELETED (was 1065 lines).
      pr-fix.md             ← cross-refs updated; sentinel registry section
                              now points at .agents/commands/AGENTS.md.
    lib/
      pr-review-base.md     ← shared Steps 3-6 (diff retrieval, project
                              context discovery, 7 SDK-specialized review
                              agents, aggregation). The three pr-review-*
                              skills delegate to this file.    250 lines.

Per-invocation runtime prompt cost (entry-point + base):
  pr-review-ci:    418 lines (was 1065 — 60% reduction)
  pr-review-gh:    446 lines (was 1065 — 58% reduction)
  pr-review-local: 440 lines (was 1065 — 59% reduction)

Simplifications applied alongside the split (per the user's "where they've
outgrown the medium" critique):

- Codex pass removed from all three pr-review-* skills. The previous
  fire-and-forget `codex exec` background invocation was vestigial — no
  output integration with Step 6's aggregator, optional, no dedup with
  Claude's findings. If a real Codex co-reviewer is wanted later, build it
  as Agent 8 in pr-review-base.md Step 5.
- /pr-review-local --fix REFUSES on dirty tree (mirrors /pr-fix Step 2a).
  The previous "stash discipline" — stale-stash awk parser, in-memory
  revert source, stash-pop conflict sentinel, FIX_DONE_WITH_STASH_CONFLICTS
  branch — was ~80 lines of plumbing handling the 3-condition edge case
  (uncommitted work + crashed prior run + lint rejection). Refusing on
  dirty tree collapses the entire dance: user runs `git stash push -u`,
  invokes `--fix`, then `git stash pop`. Skill stays out of stash mgmt.
  FIX_DONE_WITH_STASH_CONFLICTS sentinel removed from registry.
- Watcher cycle in pr-review-gh delegates Step 5 (agent fan-out) to
  pr-review-base.md instead of duplicating the 7 agent prompt blocks
  inline. Drift hazard between the main-flow Step 5 and the watcher's
  inline Step 7.a-g is gone — both now point at the same source.
- Sentinel grammar registry consolidated into .agents/commands/AGENTS.md.
  Each entry-point skill no longer carries its own per-skill registry
  table; they reference the central one. Easier to keep in sync, smaller
  runtime prompts.
- Three skills means each one's argument validation is trivial — no more
  three-mode dispatch table at the top of a single file. /pr-review-ci
  refuses if not in CI; /pr-review-gh refuses if in CI; /pr-review-local
  has no PR_NUMBER. Each is unambiguous about its own preconditions.

Cross-skill drift reminders updated:
- /pr-fix Step 6 drift reminder now points at /pr-review-local Step 7b
  (the renamed --fix sub-step).
- /pr-fix Notes pairing block now lists all three pr-review-* skills.

Files outside packages/ baseline reference now points at .agents/lib/
pr-review-base.md (the canonical shared Step 4) instead of the deleted
unified /pr-review Step 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…try → AGENTS.md

Mirrors the iter-10 split for pr-fix: extract the per-category routing
(the actual contract) into .agents/lib/pr-fix-routing.md, and lift the
duplicated sentinel registry + smoke-test recipes into the maintainer-only
.agents/commands/AGENTS.md. The pr-fix watcher's Step 7 now references the
lib instead of inlining a parallel copy of the routing.

New files:
- .agents/lib/pr-fix-routing.md (76 lines) — routing table (8 rows × exact
  reply text × resolve-yes/no), one reply-mechanism heredoc template, one
  resolve-mechanism graphql mutation, output contract (bucket counts).
  Single source of truth for both main flow Step 9 and watcher Step 7.

Changes:
- pr-fix.md Step 5f collapsed to a 5-line pointer ("read .agents/lib/
  pr-fix-routing.md and treat its routing table as the contract for
  Step 9 below"). The 8-row routing table + ~30 lines of smoke-test recipe
  + ~15 lines of empty-list recipe — all gone from runtime.
- pr-fix.md Step 9 collapsed from ~110 lines (six per-category bash
  blocks: 9a, 9b, 9c-skipped, 9c-question, 9c-discussion, 9c-praise,
  9c-already, 9c-stale) to ~10 lines that defer to the lib.
- pr-fix.md Step 12 watcher Step 7f collapsed from ~25 lines (per-category
  bullets + heredoc template + per-category "NEVER reference
  ${CYCLE_HEAD_SHA}" notes) to ~5 lines that defer to the lib with the
  cycle-substitution rule. Step 7g (resolve mutation) is now part of the
  lib; the standalone "g." substep is gone — what was step h (CI verdict
  + sentinel) becomes the new step g.
- pr-fix.md "Drift reminder" subsection of Step 12 deleted. The drift
  hazard the sticker existed to mitigate (parallel routing implementations
  in main flow vs watcher) is gone now that both reference the lib.
- pr-fix.md local sentinel registry table (12 rows × 5 columns) deleted
  in favor of a one-line pointer to .agents/commands/AGENTS.md. The local
  table was a duplicate maintained by hand against the canonical one.
- AGENTS.md gained a "Smoke-test recipes (maintainer-only, not loaded
  at runtime)" section with the per-category routing recipe (~30 lines)
  and the empty-list early-exit recipe (~15 lines) lifted from pr-fix.md.

Net: pr-fix.md 1028 → 846 lines (−182). Lib (76 lines) + AGENTS.md
content (+33 lines) compensates for some of that, but the runtime-prompt
cost dropped substantially:

  /pr-fix invocation runtime:        846 + 76 = 922 lines (was 1028)
  /pr-fix --watch invocation runtime: same (lib loaded once)
  AGENTS.md (maintainer-only):       loaded ZERO times per user invocation.

The drift hazard between main-flow Step 9 and watcher Step 7 — the smell
the iter-9 review flagged — is gone. Both paths now resolve through the
same lib, the same way the three /pr-review-* skills resolve through
.agents/lib/pr-review-base.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the centralized governance files introduced in the prior two iterations
(AGENTS.md sentinel registry + conventions, pr-fix-routing.md). Each skill now
owns its own sentinel grammar inline, and pr-fix's routing lives in the file
that uses it (with the watcher referencing main-flow Step 9 directly instead
of an external lib).

Deleted:
- .agents/commands/AGENTS.md — was a master file (sentinel registry across all
  skills, conventions, smoke recipes). Replaced by per-skill inline registries
  covering only what each skill emits.
- .agents/lib/pr-fix-routing.md — was a one-skill helper masquerading as a
  library. The watcher and the main flow are halves of the SAME skill, not
  separate consumers — a lib doesn't earn modularity here, it just adds an
  extra Read per cycle. Routing table + reply mechanism + resolve mechanism +
  per-category sub-steps (9a / 9c-skipped / 9c-question / 9c-discussion /
  9c-praise / 9c-already / 9c-stale) restored inline in pr-fix.md Step 5f
  and Step 9. Watcher Step 7f now reads "follow main-flow Step 9, substituting
  ${CYCLE_HEAD_SHA} for <HEAD_SHA>" — single source of truth via cross-section
  reference within the same file, no external lib.

Smoke-test recipes (per-category routing + empty-list early-exit) deleted
entirely. They were theater per iter-8 — never actually run, required exotic
preconditions (fresh GitHub bot accounts, draft PRs with 5-minute CI, chmod
on .git internals). The maintainer-only-not-runtime classification was the
right idea but the recipes themselves don't earn the bytes anywhere.

Per-skill sentinel grammar restored:
- pr-fix.md sentinel registry covers only its own sentinels (RECONCILE_OK,
  RECONCILE_FAILED, FIX_DONE_PR, WATCH_REJECTED, WATCH_TRANSIENT_ERROR,
  WATCH_PR_CLOSED, WATCH_FIX_GREEN, WATCH_LINT_FAILED, WATCH_FIX_DONE).
- pr-review-ci.md adds an inline grammar table (one row: REVIEW_DONE_PR).
- pr-review-gh.md adds an inline grammar table (8 rows: REVIEW_CLEAN,
  REVIEW_INCOMPLETE, REVIEW_DONE_PR, WATCH_REJECTED, WATCH_TRANSIENT_ERROR,
  WATCH_PR_CLOSED, WATCH_REVIEW_CLEAN, WATCH_REVIEW_DONE).
- pr-review-local.md adds an inline grammar table (5 rows: REVIEW_CLEAN,
  REVIEW_INCOMPLETE, REVIEW_DONE_LOCAL, FIX_DONE_LOCAL, FIX_ABORTED).

Each skill's grammar is now next to its emit sites — no master registry to
drift against.

What's kept:
- .agents/lib/pr-review-base.md — three distinct skills (pr-review-ci, -gh,
  -local) depend on it for Steps 3-6 (diff, project context discovery, the
  7 SDK-specialized review agents, aggregation). That's a real cross-skill
  library boundary, not a master file. Say if you want this inlined too.

Net runtime cost per invocation:
- /pr-review-ci:    172 + 250 = 422 lines (+4 vs prior)
- /pr-review-gh:    207 + 250 = 457 lines (+11 — gained inline registry)
- /pr-review-local: 198 + 250 = 448 lines (+8 — gained inline registry)
- /pr-fix:          905 lines (was 846 — gained inline routing + registry)
- /pr-fix --watch:  same single file load (was 846 + 76 lib = 922; now 905,
  watcher refers to in-file Step 9 instead of external file)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s + redundancy

Five findings from /ben-local-review on the iter-12 master-file removal commit.
Three orphan-bullet leftovers + one redundant inline block + one cross-skill
sentinel-trailer drift.

HIGH:
- pr-fix.md L777 stale `Step 7h` cross-reference. The watcher's terminal
  sentinel was renumbered to Step 7g when the lint-failed sub-step was inlined,
  but a "(so Step 7h emits ci=FAIL)" pointer in Step 4c still referenced the
  old name. Updated to Step 7g.
- pr-review-gh.md Notes — two orphan bullets that survived iter-12's
  "no master/god files" sweep:
  - "No Codex pass: previous iterations included an optional fire-and-forget
    codex exec invocation. Removed in the iter-10 split — it was vestigial."
  - "Cross-skill watcher considerations: if a user runs /pr-review-gh <PR>
    --watch AND /pr-fix <PR> --watch together..."
  Both are runtime archaeology / governance content, exactly the shape of the
  lines that DID get removed. Deleted.
- pr-review-local.md Notes — surviving "Drift reminder for --fix mechanics"
  bullet (cross-skill governance reminder). Same pattern, also deleted.

MEDIUM:
- pr-fix.md Step 9 had a "Per-category sub-steps" bullet list (9a / 9c-skipped
  / 9c-question / 9c-discussion / 9c-praise / 9c-already / 9c-stale) that
  restated in prose what the Step 5f routing table just specified in tabular
  form. Pure repetition. Deleted; the Step 5f table + the reply/resolve
  mechanism + the "Resolves fire for: ... Leave unresolved for: ..." summary
  line above it are sufficient.
- pr-review-gh.md REVIEW_INCOMPLETE trailer was missing the (<names>) field
  that pr-review-local.md has. Same sentinel name = same trailer is the
  contract. Added (<names>) to gh's main-flow body, watcher prompt body,
  and registry-table row so a downstream parser greping
  `Sentinel: REVIEW_INCOMPLETE — \d+ of 7 agents failed \(<names>\)`
  matches both skills.

Net: −17 / +4 lines. pr-fix.md 905 → 895; pr-review-gh.md slightly slimmer
on the bullet deletion and slightly fatter on the (<names>) addition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0xbulma 0xbulma requested a review from Foulks-Plb May 5, 2026 09:13
@0xbulma 0xbulma marked this pull request as ready for review May 5, 2026 09:13
@0xbulma
Copy link
Copy Markdown
Collaborator Author

0xbulma commented May 5, 2026

Self-review (iteration history + structural notes)

This PR landed via 13 review-and-fix iterations. The first commit was a faithful port from the personal /ben-pr-{review,fix} skills; everything after was refining for this monorepo. Posting the journey here for reviewers who want the "why" behind the structure, and for future maintainers.

Phases

Phase 1 — Initial port (iter-1). Took the personal /ben-pr-review and /ben-pr-fix skills and merged them with the SDK-specific /pr-review and /pr-fix that were already in the repo. Hybrid: kept the 7 SDK-specialized review agents (Module & API Architecture, Web3 Security, Documentation Analyzer, etc.) as the project's baked-in agent set; adopted ben-*'s structural improvements (frontmatter-less, severity scheme normalized to Critical/High/Medium/Low, adaptive project context detection, watcher SHA fallback). Also added references to the project's authoritative AGENTS.md (root, canonical) + MISSION.md + per-package AGENTS.md/README.md/ARCHITECTURE.md. Added --local to /pr-review matching /ben-pr-local-review.

Phase 2 — Dogfooding rounds (iter-2 through iter-7). Ran /pr-review --local on each commit's diff and applied the findings. Caught real bugs: stale-stash awk parser silently failing because it didn't strip the On <branch>: prefix (the entire safety check was a no-op for several iterations); ${VAR}=$(...) invalid bash syntax peppered through both watcher prompts (would die on cycle 1 if interpreted literally); <[A-Z_]+> pre-flight regex false-positive on legitimate output template tokens; WATCH_FIX_DONE ci=PENDING branch unreachable because Step 4 never queries pending checks; sentinel format drift between modes; cycle-derived placeholders missing the CYCLE_ prefix in pr-review while pr-fix had them. Each iteration tightened a specific class of silent failure or contract drift.

Phase 3 — Simplification (iter-8). Cut 284 lines of accreted theater after the user pointed out the files had grown to 1180+ lines each. Deleted: smoke-test sections (theater recipes requiring chmod -w .git/refs/stash and fresh GitHub bot accounts — written but never run); a Stability policy + Shell requirement + deprecation example block (speculative governance for parsers that don't exist); a third Step 11 leading-line case + FIXES_APPLIED/ACTIONABLE_COUNT runtime counters (collapsed to two cases); PR_REVIEW_FIX_BYPASS_STALE_STASH env var (knob nobody sets); RECONCILE_FAILED line-length cap (triggers only on 20+ defective threads — never happens); inline # DRIFT-CHECK: markers (nobody greps markdown for them); a duplicated Caveat (idempotency boundary) paragraph.

Phase 4 — Post-simplification cleanup (iter-9). Two HIGH orphan references from iter-8's deletions (a bypass env var clause referencing a deleted env var, an empty-list smoke recipe asserting pre-iter-8 leading-line text). Three real silent-failure paths in the watcher: ambiguous-conflict path could reach WATCH_FIX_GREEN despite unresolved conflicts; Step 4 retry exhaustion had no documented fallthrough; WATCH_FIX_DONE ci= value space narrowed to FAIL/NA/UNKNOWN after discovering Step 4 never captures pending checks (so PASS and PENDING branches were unreachable as written).

Phase 5 — The split (iter-10). Split the unified /pr-review into three single-mode entry points (pr-review-ci, pr-review-gh, pr-review-local) with a shared library (.agents/lib/pr-review-base.md) for Steps 3–6 (diff, project context, 7 agents, aggregation). Each entry point handles only its mode-specific Steps 1–2 and Steps 7+. Plus simplifications: dropped the vestigial Codex pass; replaced --local --fix's ~80-line stash-and-pop discipline with a one-line refuse-on-dirty-tree gate; watcher cycle in pr-review-gh delegates Step 5 to the base instead of inlining a parallel copy of the 7-agent prompt block.

Phase 6 — Symmetric extraction for pr-fix (iter-11). Same trick on /pr-fix: extracted Step 5f routing table + Step 9 reply/resolve mechanisms into .agents/lib/pr-fix-routing.md; watcher Step 7 referenced the lib; sentinel registry + smoke-test recipes lifted into .agents/commands/AGENTS.md (maintainer-only, not loaded at runtime).

Phase 7 — Drop the master files (iter-12). Maintainer feedback: the AGENTS.md registry was a god file, and pr-fix-routing.md was a one-skill helper masquerading as a library (the watcher and main flow are halves of the same skill, not separate consumers). Deleted both. Re-inlined the routing in pr-fix.md Step 5f + Step 9. Watcher now references main-flow Step 9 directly: "follow Step 9, substituting ${CYCLE_HEAD_SHA} for <HEAD_SHA>" — single source within the same file. Each /pr-review-* skill got its own inline sentinel-grammar table covering only what it emits. Smoke-test recipes deleted entirely (theater). Kept .agents/lib/pr-review-base.md because it's a real cross-skill library used by 3 distinct consumers, not a master file.

Phase 8 — Final orphan cleanup (iter-13). Three orphan bullets/cross-references survived iter-12's master-file sweep (a stale Step 7h pointer; "No Codex pass" archaeology; "Cross-skill watcher considerations" governance note; a "Drift reminder for --fix mechanics" sticker). Plus one redundant inline block in pr-fix Step 9 (a per-category sub-step list that restated the Step 5f routing table in prose) and a <names> field missing from pr-review-gh's REVIEW_INCOMPLETE trailer (drifted from pr-review-local).

Final structure

.agents/
  commands/
    pr-review-ci.md       172 lines  ← inline sentinel grammar (1 row)
    pr-review-gh.md       205 lines  ← inline sentinel grammar (8 rows)
    pr-review-local.md    197 lines  ← inline sentinel grammar (5 rows)
    pr-fix.md             895 lines  ← inline routing + reply/resolve + sentinel registry
  lib/
    pr-review-base.md     250 lines  ← 3 distinct skills depend on this; real library

Per-invocation runtime prompt cost vs the previous 1065-line monolith:

  • /pr-review-ci: 422 lines (60% reduction)
  • /pr-review-gh: 455 lines (57% reduction)
  • /pr-review-local: 447 lines (58% reduction)
  • /pr-fix: 895 lines (was 1188 — 25% reduction)

What I'd flag for human review

  1. Test on a fresh session. All 13 self-reviews ran in the same conversation, which means cached context made some checks easier than they'd be cold. The /pr-review-{ci,gh,local} commands say "Read .agents/lib/pr-review-base.md and follow Steps 3–6 there" — verify on a fresh invocation that the model actually does the Read tool call instead of hallucinating Steps 3–6 from training-data analogs.

  2. Watcher pairing semantics. Running /pr-review-gh <PR> --watch AND /pr-fix <PR> --watch together: both crons fire every 2 min, no cross-cron coordination. The pairing works (review posts → fix replies → next cycle picks up new state), but I removed the doc note about this in iter-13's "stop describing absences" round. If the loop semantics bite a user, the right response is to either fix the structure (e.g. a single /pr command with --watch doing both) or add the note back — not to leave a sticker.

  3. pr-review-base.md as a kept dependency. It's 250 lines, depended on by 3 skills. I argued it's a real library (3 consumers, byte-identical Steps 3–6) rather than a master file. If you disagree and want strict per-skill isolation, inlining tripples to ~750 lines added across the three entry points and the contract is still byte-identical — so the maintenance cost is concrete.

  4. /pr-fix watcher and main flow share a routing table inline (no lib). The watcher's Step 7f reads "follow main-flow Step 9, substituting ${CYCLE_HEAD_SHA} for <HEAD_SHA>" — cross-section reference within one file. If a future change to Step 9 forgets to keep the watcher's substitution rule valid, the drift is silent. Both files are short enough that a careful reviewer should catch it; if it bites, extract a real shared piece.

  5. Sentinel grammars are the documented contract. Downstream parsers MAY rely on them. The per-skill inline tables are the source of truth. No central registry to drift against (intentionally).

Stats

  • 13 commits, all signed (Claude Opus 4.7 (1M context) co-authorship).
  • Net diff vs main: ~+870 / −1110 lines depending on which baseline you compare to (the unified /pr-review.md was 1065 lines and is now deleted).
  • 6 files added (3 entry points + 1 library + the 13 commit messages serve as the change log).

🤖 Generated with Claude Code

chatgpt-codex-connector[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

Six unique defects fixed across the four skill files (one verdict-table gap was
flagged by both reviewers). All concrete bugs an AI agent following the prompt
literally would hit; all fixes confirm and apply the suggested correction.

- pr-review-ci.md verdict table (Codex P1 + Devin): the "Request Changes" row
  said "Any critical, OR multiple high, OR <FAILED_AGENTS> non-zero" — a PR
  with exactly 1 high and 0 critical and 0 failed agents matched neither
  APPROVE (which requires zero high) nor REQUEST_CHANGES. Changed
  "multiple high" → "any high" so the two rows are exhaustive complements.
- pr-review-ci.md fallback API call (Devin): the fallback `gh api ...issues/
  comments` had no `--method POST` and no body field — `gh api` defaults to
  GET, so the literal command would list comments instead of posting one.
  Added `--method POST -f body="<REVIEW_BODY>"`.
- pr-review-local.md validation table (Devin): row 3 ("Findings + agent crash")
  asserted REVIEW_INCOMPLETE, but Step 7's sentinel logic emits REVIEW_DONE_LOCAL
  for any non-zero findings (with WARNING prepended). Updated the table row to
  match Step 7's actual output and added a separate row for "Zero findings +
  agent crash". Step 7 prose updated to say the WARNING line is prepended in
  the findings + crash case.
- pr-review-local.md same-branch shortcut (Codex P2): the early-exit on
  HEAD_BRANCH == BASE_BRANCH AND clean tree skipped review even when the
  branch had unpushed commits ahead of origin/<BASE_BRANCH> — common case for
  `main` with local commits. Replaced with a real commit-range diff check
  (git diff --name-only "$MERGE_BASE..HEAD") AND a clean-tree check.
- pr-fix.md orphan-stash recovery (Codex P1): the recovery instruction said
  "for each orphan, run `git rev-parse <ref>` to get its SHA, then `git stash
  drop <sha>`". `git stash drop` only accepts stash refs (stash@{N}), not
  commit SHAs — the documented cleanup path failed exactly when orphan
  stashes were detected. Rewrote to instruct dropping by ref highest-index-
  first so positional refs of remaining stashes don't shift.
- Markdown blank-line fix (Devin): `## Sentinel grammar` headings in
  pr-review-{ci,gh,local}.md immediately followed list items with no blank
  line. CommonMark-strict parsers treat that as list continuation, not a
  heading. Added blank lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

…tion

The degraded sentinel emitted when `git stash drop stash@{0}` fails told the
user to run `git stash drop ${CYCLE_LINT_STASH_SHA}`, but that variable is a
commit SHA (from `git rev-parse stash@{0}`). `git stash drop` accepts only
stash@{N} refs, not commit SHAs — line 741's orphan-stash recovery was
already corrected to state this constraint, but the line 804 path was
missed.

Replace the broken `git stash drop <SHA>` suggestion with the same idiom
used at line 741: `git stash list --format='%H %gd' | grep ^<SHA>` to find
the current stash@{N} ref, then drop by ref. Restate the constraint so the
two recovery paths agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

…n on lint failure

Devin (PR #599) flagged a real bug in /pr-review-local --fix's apply-fixes
loop: when fix N+1 on the same file fails lint, `git checkout -- <file>`
reverts the ENTIRE file to HEAD — including fix N which already passed lint
and was tracked as "applied". Counter says `Fixed: X` but `git diff` shows
nothing for that file. Silent destruction.

Replaced the per-finding apply-then-validate loop with a per-file batch:
process all findings on a given file as one unit, run lint once, revert
all-or-nothing. If lint passes the batch, every finding for the file is
applied. If lint fails the batch, the entire file is reverted and every
finding for the file is marked skipped: lint rejected the batch. The user
sees a consistent picture in `git diff` — what's reported as applied is
what's actually on disk.

Same pattern as /pr-fix's watcher Step 7b stash-based revert (revert the
batch as a unit), now consistent across both --fix flows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

…cher

Devin (PR #599) caught a critical bug in the pr-review-gh watcher's Step 2
CYCLE_LAST_REVIEWED_SHA jq filter. `null | test("...")` is a fatal jq error,
and any review with `body: null` triggers it (extremely common — a human
clicking "Approve" in the GitHub UI without typing text produces a review
with null body). jq's `or` short-circuits left-to-right: when `.user.login
== $login` is false (any non-bot user), evaluation moves to the right side,
which crashes if `.body` is null. The crash exits jq non-zero, the watcher
sees a non-zero exit, emits WATCH_TRANSIENT_ERROR, and ends the cycle.

Because the null-body review persists in the API response, EVERY subsequent
cycle hits the same crash — the watcher is permanently dead until the
3-day expiry or manual CronDelete. Whole watcher loop is busted by a
single human approval-without-text on the PR.

Devin's fix is correct: null-coalesce body to empty string before test().

  (.body | test("..."))
  →
  ((.body // "") | test("..."))

The old pr-review.md watcher (pre-iter-10 split) only filtered on
.user.login and didn't have this issue — the body-pattern fallback was
introduced in this PR to catch reviews after a bot rename. Keeping the
fallback (it's load-bearing for the bot-rename case) but null-guarding it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0xbulma 0xbulma merged commit 85fc5f8 into main May 5, 2026
19 checks passed
@0xbulma 0xbulma deleted the 0xbulma/upgrade-pr-skills branch May 5, 2026 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants