Skip to content

feat(skills): add review-loop skill with opt-in gates#14

Merged
sourcehawk merged 8 commits into
mainfrom
feature/review-loop-skill
May 30, 2026
Merged

feat(skills): add review-loop skill with opt-in gates#14
sourcehawk merged 8 commits into
mainfrom
feature/review-loop-skill

Conversation

@sourcehawk
Copy link
Copy Markdown
Owner

Description

Adds an opt-in review-loop skill that drives a pull request's automated (Copilot) review toward clean, and wires it into the feature workflow at two default-off gates. Nothing in the workflow currently loops a PR through automated review; this lets a session request Copilot, wait without blocking the session, address the feedback with real rigor, and re-request — only when the user asks for it.

Changes

  • New review-loop skill: request Copilot via gh pr edit --add-reviewer "@copilot" (the REST requested_reviewers endpoint rejects Copilot), detect unavailability, wait in the background, triage every open comment through superpowers:receiving-code-review, address, re-request; terminate when a fresh review is silent and threads are resolved, with a 3-round cap.
  • Background-wait script await-copilot-review.sh (gh + shell builtins only — no new dependency).
  • Two opt-in, default-off gates in developing-a-feature: a per-sub-PR gate (sub_pr_review_loop in the state file, run by fanning-out-with-worktrees at ripening) and a PR-to-main gate before external review. Pushback pauses for the user interactively, or bubbles up to the wave checkpoint in autonomous fan-out.
  • README and project-CLAUDE skill-table rows.

Challenges

The mechanism is the load-bearing part and was easy to get wrong in two places, both verified against live GitHub rather than assumed:

  • Requesting Copilot looks like a requested_reviewers API call, but that endpoint 422s on Copilot — the only working path is gh pr edit --add-reviewer "@copilot" (gh ≥ 2.88.0).
  • Re-requesting (round 2+) is unreliable: once Copilot has reviewed, re-adding it sometimes fires a fresh review (observed: two reviews landing on one commit) and sometimes silently does nothing, and Copilot does not dependably auto-review on push. The skill re-triggers with remove-then-re-add, confirms a genuinely new review by watermark rather than the command's exit code, and stops-and-reports if none arrives — pointing the user at the durable path (a repo ruleset running Copilot review on every push) instead of spinning.

Related

Touches developing-a-feature alongside #14 (single-PR SDD), in a different section; both land review wiring in fanning-out-with-worktrees/developing-a-feature and will need a trivial rebase depending on merge order.

Testing

Built via superpowers:writing-skills (RED → GREEN → REFACTOR) with isolated baseline subagents. RED baselines reached for the wrong (422) request command, polled synchronously holding the session, and looped without an unavailability check. With the skill, isolated agents used the correct command, waited in the background, triaged before applying, detected the unavailable case and exited, and respected the 3-round cap. The re-request fix was driven by a live test on a real PR (confirming re-request is unreliable) and re-verified in isolation: the agent captures a watermark, re-triggers via remove-then-re-add, confirms by watermark, and stops-and-reports on timeout rather than spinning. The wait script's watermark filter was checked against real Copilot-reviewed PRs (finds a newer review, excludes a stale one).

sourcehawk and others added 3 commits May 30, 2026 23:33
Opt-in skill that drives a PR to review-clean by looping: request the
automated (Copilot) review, wait for it in the background so the session
is never held hostage, triage every open comment with the rigor of
receiving-code-review, address the correct ones, and re-request until a
fresh review is silent and threads are resolved. A 3-round cap stops
non-converging loops.

The verified mechanism (gh pr edit --add-reviewer "@copilot"; the REST
requested_reviewers endpoint rejects Copilot) and a gh-and-builtins-only
background-wait script keep the skill within the plugin's superpowers +
gh dependency rule.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wires feature-dev-workflow:review-loop into developing-a-feature at two
opt-in, default-off points: a per-sub-PR gate (recorded as
sub_pr_review_loop in the state file, run by fanning-out-with-worktrees
at each sub-PR ripening before the orchestrator's own review) and a
PR-to-main gate before external review. Pushback is interactive at the
PR-to-main gate and bubbles up to the wave checkpoint in autonomous
fan-out. Adds the README and project-CLAUDE skill-table rows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The loop's round-2+ step assumed re-requesting Copilot works like the
first request. It doesn't: once Copilot has reviewed, re-adding it as a
reviewer is unreliable — sometimes a fresh review fires (even on the same
commit), sometimes nothing happens, and Copilot does not dependably
auto-review on push. Verified against a live PR (two reviews landed on
one commit) and GitHub's own docs/community reports.

Re-trigger with remove-then-re-add, confirm a genuinely new review by
watermark rather than the gh exit code, and STOP-and-report if none
arrives before the wait times out instead of spinning — pointing the
user at the durable path (a repo ruleset running Copilot review on every
push). Also note that a fired re-review may repeat already-addressed
comments, so triage re-verifies against the current diff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 30, 2026 22:55
…uesting

When a repo auto-reviews Copilot on every push, the push that lands the
round's fixes already triggers a review. The previous step 6 then also
did an explicit remove-then-re-add, producing a second redundant review.

Capture the watermark before pushing, wait for a push-triggered review
first, and fall back to the explicit remove-then-re-add only if that wait
times out. Same command sequence either way; the repo's behavior decides
which branch resolves it, with no double review when auto-review is on.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in review-loop skill to drive PRs through automated Copilot review, and wires that skill into the feature-development workflow for sub-PR and final PR gates.

Changes:

  • Adds the review-loop skill and background polling script for detecting fresh Copilot reviews.
  • Adds default-off review-loop gates to multi-PR fan-out and PR-to-main workflow steps.
  • Updates the project CLAUDE skill table to include the new skill.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
skills/review-loop/SKILL.md Defines the review-loop workflow, Copilot request/re-request mechanics, termination rules, and pushback handling.
skills/review-loop/templates/await-copilot-review.sh Adds a polling helper that waits for a fresh Copilot review newer than a watermark.
skills/developing-a-feature/SKILL.md Adds opt-in review-loop gates for sub-PRs and final PRs.
skills/fanning-out-with-worktrees/SKILL.md Runs the review-loop during sub-PR ripening when enabled.
templates/project-CLAUDE.md Lists the new review-loop skill in the workflow skill table.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +42 to +49
filter="[.[]
| select((.user.login==\"$BOT\") or (.user.login|test(\"[Cc]opilot\")))
| select(.submitted_at > \"$SINCE\")
| {id, state, submitted_at, body}]
| sort_by(.submitted_at) | reverse"

while :; do
found=$(gh api "repos/$REPO/pulls/$PR/reviews" --paginate --jq "$filter")
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 79a30e6. Verified the bug is real (a multi-page no-match did print repeated []). --slurp isn't usable here — gh rejects it together with --jq — so instead the filter now emits each matching review as a bare object rather than wrapping in an outer array. A no-match then prints nothing on every page, so the emptiness check is correct. Confirmed against a real paginated PR: past-watermark finds the reviews, future-watermark across forced single-item pages yields zero bytes.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 79a30e6. Confirmed the bug is real (a multi-page no-match did print repeated []). --slurp isn't usable here — gh rejects it together with --jq — so instead the filter now emits each matching review as a bare object rather than wrapping in an outer array; a no-match then prints nothing on every page and the emptiness check is correct. Verified against a real paginated PR: past-watermark finds the reviews, future-watermark across forced single-item pages yields zero bytes.


Sub-PRs into the feature branch are owned by `feature-dev-workflow:fanning-out-with-worktrees`, not this step.

Once the PR to main is open — the single-PR PR, or the multi-PR integration PR — ask the user (via `AskUserQuestion`) whether to run an automated review-loop on it before handing off for external human review. If yes, **OPTIONAL SUB-SKILL:** `feature-dev-workflow:review-loop` against this PR; it drives the PR's automated (Copilot) review to clean. This is the interactive context, so a comment the loop wants to push back on pauses for the user. If no, hand off as-is.
sourcehawk and others added 2 commits May 31, 2026 01:08
…oller

Address Copilot review on #14. Under set -euo pipefail a transient gh/network
failure in the poll loop killed the script with gh's exit code, which the caller
could not distinguish from the 124 timeout. Guard the poll with || true so a blip
or rate-limit just retries next interval; only a real timeout exits 124.

Also document why the submitted_at watermark comparison is a plain string compare
(GitHub always returns UTC ISO-8601 Z timestamps, whose lexical order is
chronological) rather than converting to epoch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two issues Copilot flagged, both verified against real PRs before fixing:

- Wait poller false positive across pages. `gh api --paginate --jq` runs
  the filter once per page and concatenates, so an array-wrapped filter
  emitted `[]` per page — a multi-page no-match printed `[]\n[]…`, which
  is non-empty and != `[]`, read as "review found". (`--slurp` is not an
  option: gh rejects it together with `--jq`.) The filter now emits each
  matching review as a bare object, so a no-match prints nothing on every
  page and the emptiness test is correct. Documented why the watermark
  stays a string comparison (GitHub returns fixed-width UTC ISO-8601 Z).

- Stale review commit at Gate B. The loop ran at PR-open, but Step 7's
  teardown is the last commit on the branch, so a clean review before
  teardown is stale against what external reviewers see. Gate B now runs
  the loop on the final PR diff — deferred until after the teardown commit
  (or re-run), and when CI gates teardown, once that commit is green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 30, 2026 23:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment on lines +48 to +52
filter="[.[]
| select((.user.login==\"$BOT\") or (.user.login|test(\"[Cc]opilot\")))
| select(.submitted_at > \"$SINCE\")
| {id, state, submitted_at, body}]
| sort_by(.submitted_at) | reverse"
@@ -0,0 +1,129 @@
---
name: review-loop
sourcehawk and others added 2 commits May 31, 2026 01:37
…fixes

My earlier commit (8b98754) left the wait poller broken — it kept the
array-wrapped jq filter but dropped the != "[]" guard, so the first poll
always false-positived. This rebuilds the poller and properly lands the
two issues Copilot flagged on #14, both verified against a real PR:

- Paginated false positive. gh runs the --jq filter once per page under
  --paginate and concatenates (and --slurp is rejected with --jq). The
  filter now emits each matching review as a bare object instead of an
  array, so a multi-page no-match prints nothing rather than repeated
  empty arrays.

- Error bodies no longer mistaken for reviews. On an HTTP error gh prints
  the error JSON to stdout and exits non-zero; output is now captured only
  when gh succeeds, so a 404/rate-limit/blip leaves the result empty and
  the loop retries instead of false-positiving. (Found while testing.)

- Stale review commit at Gate B (developing-a-feature). The loop ran at
  PR-open, but Step 7's teardown is the last commit on the branch, so a
  clean review before teardown is stale against what external reviewers
  see. Gate B now runs the loop on the final PR diff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…kill

# Conflicts:
#	skills/fanning-out-with-worktrees/SKILL.md
Copilot AI review requested due to automatic review settings May 30, 2026 23:48
@sourcehawk sourcehawk merged commit 4c1dc56 into main May 30, 2026
1 check failed
@sourcehawk sourcehawk deleted the feature/review-loop-skill branch May 30, 2026 23:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants