
feat(ci): PR review fleet auto-routing and confirmation flow #27107

Merged
tylerbutler merged 36 commits into microsoft:main from tylerbutler:feat/pr-review-auto-router
Apr 23, 2026

Conversation

@tylerbutler
Member

@tylerbutler tylerbutler commented Apr 20, 2026

Description

Adds two layers on top of the existing PR review fleet: automatic fleet sizing on PR open, and an opt-in proposal comment so the author can adjust the reviewer set before anything runs. Removes the friction of picking the right label for every PR while keeping authors in control of whether (and with which reviewers) the fleet runs, and lets them ask questions about the review before opting in.

Changes

  • pr-review-auto-route.yml (new) — On PR open/reopen, computes diff metrics (lines, files, distinct packages under packages/, experimental/, and examples/) and picks a tier: small (1 reviewer; ≤ 100 lines AND ≤ 5 files AND ≤ 1 package), large (5; > 500 lines OR > 30 files OR > 5 packages), or medium (3). Posts a sticky proposal comment with a pre-checked reviewer checklist and uploads PR metadata. Skips if the fleet-review label is already present. Runs under pull_request_target with a base-branch checkout and a fetch-only PR head, so no PR-authored code executes despite having write permissions.

  • pr-review-confirm.yml (new) — Listens on issue_comment. Gated to OWNER/MEMBER/COLLABORATOR so external accounts cannot trigger CI with secrets; members can still "yes" on behalf of external PR authors. Classifies the reply: affirmative triggers the fleet with the checkbox-selected reviewers (rejects empty selections with a helpful reply), a question is answered via Copilot CLI, anything else is ignored. Per-PR concurrency prevents double-dispatch. Uses dawidd6/action-download-artifact to locate the auto-route artifact by PR number in one step.

  • pr-review-dispatch.yml (new) — Label-based dispatcher: the fleet-review label runs the default 3-reviewer fleet via workflow_run.

  • pr-review-fleet.yml — Trigger rewired to workflow_run + workflow_dispatch. Accepts an optional reviewers JSON array that overrides reviewer_count. Scrubs GitHub token patterns from the generated report. The sticky comment is always updated — clean runs overwrite any prior findings comment with a clean-verdict body instead of leaving it stale.

  • pr_review_propose.py (new) — Helpers for building the proposal comment, parsing checkboxes, formatting display names, and building Q&A context. Reviewer selection is factored into get_selected() as a future hook for content-aware logic. consolidate_reviews.py imports the reviewer registry from here so labels stay in sync.
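
As a rough illustration of the tier thresholds listed for pr-review-auto-route.yml, the selection could look like the sketch below. The names `DiffMetrics` and `pick_tier` are hypothetical and are not taken from the PR; only the numeric thresholds come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffMetrics:
    """Hypothetical container for the three metrics the auto-router computes."""
    lines: int
    files: int
    packages: int


def pick_tier(m: DiffMetrics) -> tuple[str, int]:
    """Return (tier name, reviewer count) using the thresholds from the description."""
    # Any single "large" signal is enough to select the 5-reviewer fleet.
    if m.lines > 500 or m.files > 30 or m.packages > 5:
        return ("large", 5)
    # "small" requires all three metrics to be at or under their limits.
    if m.lines <= 100 and m.files <= 5 and m.packages <= 1:
        return ("small", 1)
    # Everything in between falls through to the default fleet.
    return ("medium", 3)
```

The two explicit conditions cannot both hold, so the order of the checks does not affect the result; medium is simply the fallback.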

Reviewer Guidance

The review process is outlined on this wiki page.

tylerbutler and others added 21 commits April 16, 2026 15:13
Adds a GitHub Actions workflow that triggers on PRs and fans out five
parallel Copilot CLI reviewer agents, each focused on a different axis:
correctness, security, API compatibility, performance, and testing.

Each agent writes its findings to a markdown file, which is then posted
as a collapsible PR comment. A summary job posts a table after all
reviewers complete.
The repo already has COPILOT_GITHUB_TOKEN configured as a repository
secret (used by code-simplifier and duplicate-code-detector workflows).
No PAT creation is needed.
- Add NO_ISSUES_FOUND marker to each reviewer's "clean" output template
- Workflow detects the marker and skips posting a comment
- Remove the summary job (was failing due to missing checkout, and adds
  noise when individual reviewers already post their own comments)
- Add --repo flag to gh pr comment for reliability
Removes synchronize/reopened triggers to avoid re-running 5 agents on
every push. Reviews now run on initial PR creation or when the
'fleet-review' label is applied for on-demand re-review.
…report

- Rewrite all reviewer prompts with adversarial personas:
  Breaker (correctness), Exploiter (security), Sentinel (API compat),
  Profiler (performance), Skeptic (testing)
- Add high-confidence gate to every reviewer: findings must have
  concrete code paths, failure mechanisms, and specific fixes
- Add severity system with per-area caps (security promotes +1,
  performance/testing capped at HIGH, etc.)
- Standardize output format: [SEVERITY] file:line — description — fix
- Replace per-reviewer PR comments with a single consolidated report
- Fan-in consolidation job merges findings, de-duplicates by file:line,
  determines verdict, and posts one structured table
- No comment posted when all reviewers find zero issues
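
A minimal sketch of parsing the standardized finding format and de-duplicating by file:line, assuming one finding per line and that the first finding for a location is the one kept (the commit message does not say which duplicate wins). `parse_finding` and `dedupe` are hypothetical names.

```python
import re

# The separator in the output format is a space, an em dash (U+2014), and a space.
DELIM = " \u2014 "
# Head of a finding: "[SEVERITY] path:line"
HEAD_RE = re.compile(r"^\[([A-Z]+)\]\s+(.+):(\d+)$")


def parse_finding(line: str):
    """Split one finding line into its parts, or return None if it does not match."""
    parts = line.strip().split(DELIM, 2)
    if len(parts) != 3:
        return None
    m = HEAD_RE.match(parts[0])
    if m is None:
        return None
    severity, path, lineno = m.groups()
    return {
        "severity": severity,
        "key": f"{path}:{lineno}",
        "description": parts[1],
        "fix": parts[2],
    }


def dedupe(findings):
    """Keep one finding per file:line key; first seen wins (an assumption)."""
    seen = {}
    for f in findings:
        seen.setdefault(f["key"], f)
    return list(seen.values())
```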
Remove the opened trigger so reviews are only kicked off when the
fleet-review label is manually applied to a PR.
Addresses review fleet self-review findings:

- Fix regex: `^\[CRITICAL\]|\[HIGH\]|\[MEDIUM\]` only anchored the
  first alternative. Changed to `^\[(CRITICAL|HIGH|MEDIUM)\]` so all
  severities require start-of-line match, preventing false matches in
  prose text.
- Move severity counting into the de-duplication loop so the summary
  header matches the table rows.
- Use awk to split on ' — ' delimiter instead of fragile sed chains,
  fixing garbled output when descriptions contain special characters.
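
The anchoring problem in the first bullet can be reproduced with Python's `re` module, which treats `^` and `|` the same way for this pattern. This is only an illustration; the workflow itself used shell tools, and the sample strings are made up.

```python
import re

# Only the first alternative is anchored: this reads as
# (^\[CRITICAL\]) or (\[HIGH\]) or (\[MEDIUM\]).
BROKEN = re.compile(r"^\[CRITICAL\]|\[HIGH\]|\[MEDIUM\]")
# Grouping the alternatives makes the anchor apply to all three.
FIXED = re.compile(r"^\[(CRITICAL|HIGH|MEDIUM)\]")

prose = "This sentence mentions [HIGH] in passing and is not a finding."
finding = "[HIGH] src/a.ts:10 ..."

print(bool(BROKEN.search(prose)))    # the unanchored alternative matches mid-line
print(bool(FIXED.search(prose)))     # no match: the tag is not at start of line
print(bool(FIXED.search(finding)))   # real findings still match
```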
Use a hidden HTML marker (<!-- pr-review-fleet -->) to find and update
the existing report comment on subsequent runs. Falls back to creating
a new comment if none exists yet.
- C1: Read prompts from the base branch (git show origin/$BASE_REF:...)
  instead of the PR checkout to prevent prompt injection via malicious
  changes to prompt files
- M1: Use unique keys (noloc-N) for findings without file:line instead
  of collapsing them all to "unknown" and silently dropping duplicates
- M2: Use cut -c (character-based) instead of head -c (byte-based) to
  avoid truncating multi-byte UTF-8 sequences in the fix column
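
The M2 item is about byte-based versus character-based truncation. A small Python analogue (not the shell commands themselves) shows why cutting at a byte offset can split a multi-byte UTF-8 sequence; the sample text and function names are hypothetical.

```python
def truncate_bytes(text: str, limit: int) -> bytes:
    """Byte-based cut, analogous to `head -c`: may end inside a character."""
    return text.encode("utf-8")[:limit]


def truncate_chars(text: str, limit: int) -> str:
    """Character-based cut, analogous to `cut -c` in a UTF-8 locale."""
    return text[:limit]


sample = "café fix"  # "é" occupies two bytes in UTF-8

raw = truncate_bytes(sample, 4)
try:
    raw.decode("utf-8")
    byte_cut_is_valid = True
except UnicodeDecodeError:
    # The cut landed between the two bytes of "é".
    byte_cut_is_valid = False

char_cut = truncate_chars(sample, 4)
```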
Replace the inline bash consolidation with an external Python script
(.github/scripts/consolidate_reviews.py) that can be tested
independently. Includes 18 tests covering parsing, de-duplication,
verdict logic, and end-to-end report generation.

The workflow now checks out the script via sparse-checkout, runs it,
and posts the report only if findings were found.
- Breaker: add Fluid-specific attack vectors (distributed ops, DDS
  lifecycle, SharedTree patterns, summarization during mutations)
- API Analyst: inject api-conventions.md from the review skill at
  runtime via __API_CONVENTIONS__ placeholder; covers naming, type
  design, error handling, events, and documentation conventions
- Security: cap at MEDIUM (client library, not a service) to match
  local skill; note library context in prompt
- All prompts: add file exclusion lists (.d.ts, lockfiles, .map,
  *.api.md, binaries)
- Verdict logic: HIGH in Correctness/API Compat triggers Request
  Changes; HIGH in other areas only triggers it at 3+; aligns with
  local skill's nuanced verdict rules
- Performance: add telemetry correctness to attack list
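
The HIGH-severity part of the verdict rule above could be sketched as follows. This models only what the bullet states; how CRITICAL or MEDIUM findings affect the verdict is not described here and is left out. The function name and the (severity, area) tuple shape are assumptions.

```python
# Areas where a single HIGH finding is enough, per the verdict-logic bullet.
PROMOTED_AREAS = {"Correctness", "API Compatibility"}


def high_findings_request_changes(findings) -> bool:
    """findings: iterable of (severity, area) tuples. Covers HIGH findings only."""
    high_areas = [area for severity, area in findings if severity == "HIGH"]
    # One HIGH in a promoted area triggers Request Changes on its own.
    if any(area in PROMOTED_AREAS for area in high_areas):
        return True
    # Otherwise it takes three or more HIGH findings in the other areas.
    return len(high_areas) >= 3
```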
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tyler Butler <tyler@tylerbutler.com>
… report

- Drop shell(cat|head|tail|grep|ls|wc) grants; keep only shell(git:*).
  File inspection goes through the workspace-scoped `read` tool, which
  prevents a prompt-injected agent from reading /proc/self/environ,
  ~/.netrc, or other paths outside the repo.
- Add a scrub step that redacts gh[pousr]_ and github_pat_ token
  patterns from report.md before it is posted as a PR comment.
- Rewrite the one `| grep` usage in api-compatibility.md as a git
  pathspec so no grep allowance is needed.
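
A scrub step along the lines described above might look like this in Python. The prefixes (`gh[pousr]_`, `github_pat_`) come from the commit message; the character class for the token body and the `[REDACTED]` replacement text are assumptions, and the actual workflow step may be implemented differently.

```python
import re

# Prefixes from the commit message; the token body charset is an assumption.
TOKEN_RE = re.compile(r"\b(?:gh[pousr]_|github_pat_)[A-Za-z0-9_]+")


def scrub(report: str) -> str:
    """Replace anything that looks like a GitHub token with a placeholder."""
    return TOKEN_RE.sub("[REDACTED]", report)


print(scrub("leaked: ghp_abc123DEF in logs"))
```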
- Add pr-review-dispatch.yml: maps fleet-review* labels to reviewer
  counts (small=1, default=3, large=5) and uploads dispatch params
  as an artifact for the fleet workflow to consume
- Refactor pr-review-fleet.yml: replace pull_request trigger with
  workflow_run (from dispatcher) and workflow_dispatch (manual);
  add setup job that resolves PR context and builds a dynamic reviewer
  matrix; pass head_sha explicitly to checkout so workflow_run path
  checks out the correct PR head
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <contact@tilework.tech>
…ation

- Take main's consolidate_reviews.py (deterministic themed emoji sets via
  --pr-number hash, TypedDict severity labels, hashlib)
- Take main's test_consolidate_reviews.py (tests for new emoji logic)
- Pass needs.setup.outputs.pr_number as --pr-number to the consolidation
  step so the themed emoji set is selected deterministically per PR
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <contact@tilework.tech>
Upstream switched to pull_request_target for fork-PR secret access.
This branch solves the same problem via the dispatcher architecture:
pr-review-dispatch.yml handles label events (read-only), then the fleet
triggers via workflow_run with base-repo permissions. Keep our version.
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <contact@tilework.tech>
New pr-review-auto-route.yml workflow triggers on PR open/reopen,
computes diff metrics (lines changed, files changed, packages touched),
and selects fleet size automatically:
  small  (1) — ≤100 lines AND ≤5 files AND ≤1 package
  medium (3) — everything else
  large  (5) — >500 lines OR >30 files OR >5 packages

Saves dispatch-params artifact using the same format as the label
dispatcher so the fleet workflow triggers via workflow_run unchanged.
Skips if a fleet-review* label is already present on the PR.

Also adds "PR Review Auto Router" to the fleet workflow's workflow_run
trigger list alongside the existing label dispatcher.
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <contact@tilework.tech>
@tylerbutler tylerbutler self-assigned this Apr 20, 2026
tylerbutler and others added 2 commits April 21, 2026 14:54
Auto-router now posts a sticky confirmation comment with reviewer checkboxes
when a PR opens. A new pr-review-confirm workflow listens for replies:
affirmative triggers the fleet with the selected reviewers; questions are
answered via Copilot CLI.

- Shared reviewer registry lives in pr_review_confirm.py; consolidate_reviews.py
  imports from it so reviewer labels stay consistent across the flow.
- Fleet dispatch accepts an optional reviewers JSON array; reviewer_count
  becomes optional and is ignored when reviewers is provided.
- Per-PR concurrency group on the confirm workflow prevents rapid-fire
  comments from racing to trigger the fleet twice.
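
For illustration, reading the selected reviewers back out of a checkbox comment could be done roughly like this, assuming the proposal comment uses GitHub task-list syntax (`- [x] Name`). The real comment layout and the real helper's signature are not shown in this PR text, so `parse_checkboxes` here is a hypothetical stand-in.

```python
import re

# One task-list item: "- [x] Label" or "- [ ] Label" (also accepts "*" bullets).
CHECKBOX_RE = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.+?)\s*$")


def parse_checkboxes(body: str) -> list[str]:
    """Return the labels of all checked items, in document order."""
    selected = []
    for line in body.splitlines():
        m = CHECKBOX_RE.match(line)
        if m and m.group(1).lower() == "x":
            selected.append(m.group(2))
    return selected


comment = "\n".join([
    "<!-- pr-review-confirm -->",
    "- [x] Correctness",
    "- [ ] Security",
    "- [X] Testing",
])
print(parse_checkboxes(comment))
```

An empty result is the case a caller would need to reject before dispatching anything.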
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <contact@tilework.tech>
@tylerbutler tylerbutler changed the title feat(ci): auto-route PRs to review fleet by size and complexity feat(ci): PR review fleet auto-routing and confirmation flow Apr 21, 2026
The auto-router now handles size-based selection automatically, so the
three-way label dispatcher is redundant. Only the fleet-review label
remains for manual trigger, always running the default 3-reviewer fleet.

- pr-review-dispatch.yml: single-label if-guard, no reviewer-count mapping,
  no reviewer_count in dispatch params.
- pr-review-fleet.yml: workflow_run path defaults reviewer_count to 3
  instead of reading it from params.json.
- pr-review-auto-route.yml: skip guard now does an exact-match array
  contains() instead of a toJson substring match.
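
The difference between the two skip-guard styles can be shown with a Python analogue: searching the serialized label list as a string matches any label that merely contains the text, while an array membership test requires an exact element. The label names below are made up for the example.

```python
import json


def substring_guard(labels, wanted="fleet-review") -> bool:
    """Old style: look for the text anywhere in the serialized label list."""
    return wanted in json.dumps(labels)


def exact_guard(labels, wanted="fleet-review") -> bool:
    """New style: true only if one label is exactly the wanted name."""
    return wanted in labels


labels = ["bug", "fleet-review-skip"]  # hypothetical label that only contains the text
print(substring_guard(labels), exact_guard(labels))
```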
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <contact@tilework.tech>
@tylerbutler tylerbutler marked this pull request as ready for review April 21, 2026 22:55
Copilot AI review requested due to automatic review settings April 21, 2026 22:55
Contributor

Copilot AI left a comment


Pull request overview

This PR adds an auto-routing + confirmation layer on top of the existing PR review fleet so reviewer selection and fleet sizing can be computed automatically on PR open, then explicitly confirmed (and optionally adjusted) by the author via a comment-driven flow.

Changes:

  • Introduces auto-routing on PR open/reopen to compute diff metrics, propose a reviewer set, and post a sticky confirmation comment.
  • Adds a confirmation workflow that parses reviewer checkboxes and either dispatches the fleet or answers questions via Copilot CLI.
  • Refactors the fleet workflow to run via workflow_run/workflow_dispatch, accept an explicit reviewers list, and scrub secrets from the final report.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| .github/workflows/pr-review-fleet.yml | Rewires triggering, adds setup step to resolve PR params, supports explicit reviewer lists, and scrubs secrets before posting report. |
| .github/workflows/pr-review-dispatch.yml | New label-based dispatcher that uploads PR context as an artifact for the fleet workflow. |
| .github/workflows/pr-review-confirm.yml | New comment listener that finds pending auto-route state, parses checkboxes, and dispatches the fleet / answers questions. |
| .github/workflows/pr-review-auto-route.yml | New auto-router that computes PR diff metrics, posts a confirmation comment, and uploads pending metadata. |
| .github/scripts/pr_review_confirm.py | New helper script for building/parsing the confirmation comment and formatting reviewer context. |
| .github/scripts/consolidate_reviews.py | Updates reviewer registry source and promoted-area naming to match the confirmation flow. |
| .github/scripts/test_consolidate_reviews.py | Updates expected area label string (“API Compatibility”). |

tylerbutler and others added 4 commits April 21, 2026 17:20
- Harden auto-router: switch to pull_request_target with base-branch
  checkout so no PR-authored code runs despite the write token. Fetches
  the PR head SHA for diff metrics without ever checking it out.
- Gate confirm workflow to OWNER/MEMBER/COLLABORATOR so external accounts
  cannot trigger CI with secrets; members can still reply on behalf of
  external PR authors. Reject empty reviewer selections with a helpful
  reply instead of silently no-opping.
- Replace the find-run-and-download bash with
  dawidd6/action-download-artifact filtered by PR number, dropping a
  ~20-line gh api + jq step.
- Always update the fleet sticky comment — clean runs overwrite a prior
  findings comment with a clean-verdict body instead of leaving it stale.
- Rename pr_review_confirm to pr_review_propose (workflow filename and
  <!-- pr-review-confirm --> marker unchanged by design).
- Extract get_selected() as a future hook for content-aware selection.
- Light copy update on the proposal comment.
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <contact@tilework.tech>
workflow_dispatch-triggered runs don't appear on the PR's Checks tab
by default, making in-flight fleet reviews hard to discover.

- Fleet setup now creates a "Fleet Review" check run against the PR
  head SHA so the run is surfaced in the PR UI with a link back to
  the workflow logs. Consolidate finalizes it: success on clean,
  neutral when findings are posted (advisory, not gating), failure if
  consolidation itself errored. Runs under `if: always()` so a failure
  mid-way doesn't leave the check stuck in-progress.
- Fleet setup also posts an in-progress sticky comment using the same
  <!-- pr-review-fleet --> marker as the final report, so the existing
  post/update step in consolidate overwrites it in place with the
  findings or clean-verdict body.
- Added `checks: write` permission to pr-review-fleet.yml, and a
  comment to pr-review-confirm.yml explaining why it requires
  `actions: write` (to dispatch the fleet via gh workflow run).
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <contact@tilework.tech>
Replaces the bash-based find-or-create logic for the in-progress and
final PR comments with marocchino/sticky-pull-request-comment@v2 (same
action already used by auto-route), and the gh-api + jq calls for
check-run create/finalize with LouisBrunner/checks-action@v3.1.0.

Net -22 lines and removes four bash blocks that were duplicating
primitives already exposed as actions. No behaviour change.
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <contact@tilework.tech>
…wnload

actions/upload-artifact preserves the dispatch/ prefix from the dispatcher's
upload path. The fleet workflow was reading params.json from the workspace
root, which never existed — jq would have errored on a missing file when
triggered via workflow_run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI and others added 3 commits April 22, 2026 14:21
H1 (Pungent, Correctness): jq invocations now use -c so the reviewers
JSON written to GITHUB_OUTPUT stays single-line. Without -c, the
multi-line array was truncated at the first newline, leaving
needs.setup.outputs.reviewers as just '[' and breaking
fromJson() in the matrix on the label-dispatch (workflow_run) path.
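
The H1 failure mode can be modelled in a few lines of Python. This is a simplified stand-in for a single-line `name=value` output entry, not GitHub's actual parser; `json.dumps` with `indent` plays the role of pretty-printed jq output, and compact separators play the role of `jq -c`. The reviewer names are placeholders.

```python
import json

reviewers = ["correctness", "security", "testing"]

pretty = json.dumps(reviewers, indent=2)                  # multi-line, like plain jq output
compact = json.dumps(reviewers, separators=(",", ":"))    # single line, like jq -c


def read_output_value(written: str) -> str:
    """Simplified model: only the first line of a `name=value` entry survives."""
    first_line = ("reviewers=" + written).splitlines()[0]
    return first_line.split("=", 1)[1]


print(read_output_value(pretty))    # just the opening bracket
print(read_output_value(compact))   # the full array, still valid JSON
```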

M1 (Smelly, Correctness): Split the check-run finalize step out of
the consolidate job into a new always-runs teardown job that depends
on setup directly. This guarantees the 'Fleet Review' check is
finalized even when setup fails after creating it (which would
otherwise skip review and consolidate, leaving the check stuck
in_progress on the PR forever). consolidate now exposes has_findings
and consolidate_outcome as job outputs for the teardown job.

M2 (Smelly, Testing): Added TestCLI with one round-trip test per
subcommand (build-comment, parse-checkboxes, format-names,
build-qa-context). These exercise main()/_build_parser() so a bad
set_defaults(func=...) wiring would surface in CI.

M3 (Smelly, Testing): Added test_high_in_api_compat_file_promotes_to_request_changes
to TestMain. Pins the full file-stem -> REVIEWERS lookup ->
PROMOTED_AREAS -> 'Request Changes' verdict chain. A future label
divergence between pr_review_propose.REVIEWERS and PROMOTED_AREAS
will now break this test instead of silently downgrading verdicts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- H1: teardown failure-detection missed plain CONSOLIDATE_RESULT='failure',
  reporting a green check on a failed consolidate step. Simplify to
  `[ "$CONSOLIDATE_RESULT" != "success" ]` which covers failure, skipped,
  and cancelled; drop the now-redundant consolidate_outcome output.
- H2: SHA-pin marocchino/sticky-pull-request-comment@v2 to v2.9.4 SHA
  in pr-review-fleet.yml (2x) and pr-review-auto-route.yml (ratchet
  comments already present; pin was never applied).
- H3: SHA-pin peter-evans/find-comment@v3 to v3.1.0 SHA in
  pr-review-confirm.yml.
- M1: consolidate_reviews.py validates json.loads(--reviewers) is a list
  before iteration, preventing a bare JSON string (e.g. forgotten
  brackets) from being iterated char-by-char and flagged as invalid
  output from 11 phantom reviewers. Test added.
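
The M1 item can be illustrated with a short sketch. A bare JSON string parses successfully but iterates character by character; `"correctness"`, for instance, has 11 characters, which would be consistent with the count in the message, though the message does not say which value was involved. `load_reviewers` is a hypothetical name and the error text is made up.

```python
import json


def load_reviewers(raw: str) -> list:
    """Parse the --reviewers argument and insist on a JSON array."""
    value = json.loads(raw)
    if not isinstance(value, list):
        raise ValueError(
            f"--reviewers must be a JSON array, got {type(value).__name__}"
        )
    return value


# Without the check, a bare string is silently treated as a sequence of characters.
phantom = list(json.loads('"correctness"'))
print(len(phantom))
```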
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <contact@tilework.tech>
@microsoft microsoft deleted a comment from github-actions Bot Apr 23, 2026
"<!-- pr-review-confirm -->",
"",
"Hey! Want me to review this PR?",
"Hey! You look nice today! Want me to review this PR?",
Contributor


😸

@tylerbutler tylerbutler merged commit 0c429c2 into microsoft:main Apr 23, 2026
21 checks passed
@tylerbutler tylerbutler deleted the feat/pr-review-auto-router branch April 23, 2026 20:50
tylerbutler added a commit that referenced this pull request Apr 23, 2026
## Description

The `PR Review Confirm` workflow (introduced in #27107) fails to parse
and produces zero jobs on every PR — GitHub reports *"This run likely
failed because of a workflow file issue"* (e.g. run
[24861202057](https://github.com/microsoft/FluidFramework/actions/runs/24861202057)).
The Q&A step used three bash heredocs whose bodies were flush-left
inside a `run: |` YAML block scalar; YAML terminated the scalar at the
first unindented line, leaving the rest of the file unparseable.

This change:

- Extracts the Q&A prompt into `.github/prompts/pr-review-qa.md`,
sibling to the existing reviewer prompts, using the repo's `__VAR__`
placeholder convention.
- Adds a `render-qa-prompt` subcommand on
`.github/scripts/pr_review_propose.py` that substitutes
`__REVIEWER_CONTEXT__` and `__REPLY__` from environment variables
(safe for multi-line values and arbitrary special characters) and opens
the template as UTF-8.
- Replaces the heredoc block in the workflow with a single call to the
new subcommand, and extends the step's sparse-checkout to include
`.github/prompts`.
- Adds unit and CLI tests for the new subcommand alongside the existing
`pr_review_propose.py` tests.
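
A minimal sketch of the placeholder substitution described in the list above, assuming the environment variables are named `REVIEWER_CONTEXT` and `REPLY` (the follow-up description names only the placeholders, not the variables). The function names are hypothetical. A single-pass `re.sub` with a function replacement is used here so that backslashes or placeholder-like text inside the values are inserted literally.

```python
import os
import re
from pathlib import Path

# Matches the two placeholders named in the description.
PLACEHOLDER_RE = re.compile(r"__(REVIEWER_CONTEXT|REPLY)__")


def render(template: str, env) -> str:
    """Substitute placeholders in one pass; values are never re-scanned."""
    return PLACEHOLDER_RE.sub(lambda m: env.get(m.group(1), ""), template)


def render_file(template_path: str, env=os.environ) -> str:
    """Read the template explicitly as UTF-8, then substitute."""
    return render(Path(template_path).read_text(encoding="utf-8"), env)


example = render(
    "Context:\n__REVIEWER_CONTEXT__\nReply: __REPLY__",
    {"REVIEWER_CONTEXT": "line one\nline two", "REPLY": "costs $5 \\1 __REVIEWER_CONTEXT__"},
)
print(example)
```

Passing values through the environment rather than shell interpolation is what keeps multi-line text and special characters intact, as the description notes.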


4 participants