feat(ci): PR review fleet auto-routing and confirmation flow by tylerbutler · Pull Request #27107 · microsoft/FluidFramework

tylerbutler · 2026-04-20T22:36:03Z

Description

Adds two layers on top of the existing PR review fleet: automatic fleet sizing on PR open, and an opt-in proposal comment so the author can adjust the reviewer set before anything runs. Removes the friction of picking the right label for every PR while keeping authors in control of whether (and with which reviewers) the fleet runs, and lets them ask questions about the review before opting in.

Changes

pr-review-auto-route.yml (new) — On PR open/reopen, computes diff metrics (lines, files, distinct packages under packages/, experimental/, and examples/) and picks a tier: small (1 reviewer; ≤ 100 lines AND ≤ 5 files AND ≤ 1 package), large (5; > 500 lines OR > 30 files OR > 5 packages), or medium (3). Posts a sticky proposal comment with a pre-checked reviewer checklist and uploads PR metadata. Skips if the fleet-review label is already present. Runs under pull_request_target with a base-branch checkout and a fetch-only PR head, so no PR-authored code executes despite having write permissions.
pr-review-confirm.yml (new) — Listens on issue_comment. Gated to OWNER/MEMBER/COLLABORATOR so external accounts cannot trigger CI with secrets; members can still "yes" on behalf of external PR authors. Classifies the reply: affirmative triggers the fleet with the checkbox-selected reviewers (rejects empty selections with a helpful reply), a question is answered via Copilot CLI, anything else is ignored. Per-PR concurrency prevents double-dispatch. Uses dawidd6/action-download-artifact to locate the auto-route artifact by PR number in one step.
pr-review-dispatch.yml (new) — Label-based dispatcher: the fleet-review label runs the default 3-reviewer fleet via workflow_run.
pr-review-fleet.yml — Trigger rewired to workflow_run + workflow_dispatch. Accepts an optional reviewers JSON array that overrides reviewer_count. Scrubs GitHub token patterns from the generated report. The sticky comment is always updated — clean runs overwrite any prior findings comment with a clean-verdict body instead of leaving it stale.
pr_review_propose.py (new) — Helpers for building the proposal comment, parsing checkboxes, formatting display names, and building Q&A context. Reviewer selection is factored into get_selected() as a future hook for content-aware logic. consolidate_reviews.py imports the reviewer registry from here so labels stay in sync.
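As a rough illustration of the checkbox parsing mentioned above, the sketch below extracts the checked reviewer labels from a proposal comment body. The checklist line format (`- [x] Label`) and the function name `parse_checkboxes` are assumptions made for this example; the actual helper in pr_review_propose.py may differ.

```python
import re

# Assumed checklist format: "- [x] Label" (checked) or "- [ ] Label" (unchecked).
CHECKBOX_RE = re.compile(r"^\s*[-*]\s+\[(?P<mark>[ xX])\]\s+(?P<label>.+?)\s*$")


def parse_checkboxes(comment_body: str) -> list[str]:
    """Return the labels of all checked boxes, in document order."""
    selected = []
    for line in comment_body.splitlines():
        match = CHECKBOX_RE.match(line)
        if match and match.group("mark").lower() == "x":
            selected.append(match.group("label"))
    return selected
```

An empty result corresponds to the "empty selection" case that the confirm workflow rejects with a reply rather than dispatching the fleet.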

Reviewer Guidance

The review process is outlined on this wiki page.

Adds a GitHub Actions workflow that triggers on PRs and fans out five parallel Copilot CLI reviewer agents, each focused on a different axis: correctness, security, API compatibility, performance, and testing. Each agent writes its findings to a markdown file, which is then posted as a collapsible PR comment. A summary job posts a table after all reviewers complete.

The repo already has COPILOT_GITHUB_TOKEN configured as a repository secret (used by code-simplifier and duplicate-code-detector workflows). No PAT creation is needed.

- Add NO_ISSUES_FOUND marker to each reviewer's "clean" output template
- Workflow detects the marker and skips posting a comment
- Remove the summary job (was failing due to missing checkout, and adds noise when individual reviewers already post their own comments)
- Add --repo flag to gh pr comment for reliability

Removes synchronize/reopened triggers to avoid re-running 5 agents on every push. Reviews now run on initial PR creation or when the 'fleet-review' label is applied for on-demand re-review.

…report

- Rewrite all reviewer prompts with adversarial personas: Breaker (correctness), Exploiter (security), Sentinel (API compat), Profiler (performance), Skeptic (testing)
- Add high-confidence gate to every reviewer: findings must have concrete code paths, failure mechanisms, and specific fixes
- Add severity system with per-area caps (security promotes +1, performance/testing capped at HIGH, etc.)
- Standardize output format: [SEVERITY] file:line — description — fix
- Replace per-reviewer PR comments with a single consolidated report
- Fan-in consolidation job merges findings, de-duplicates by file:line, determines verdict, and posts one structured table
- No comment posted when all reviewers find zero issues

Remove the opened trigger so reviews are only kicked off when the fleet-review label is manually applied to a PR.

Addresses review fleet self-review findings:

- Fix regex: `^\[CRITICAL\]|\[HIGH\]|\[MEDIUM\]` only anchored the first alternative. Changed to `^\[(CRITICAL|HIGH|MEDIUM)\]` so all severities require start-of-line match, preventing false matches in prose text.
- Move severity counting into the de-duplication loop so the summary header matches the table rows.
- Use awk to split on ' — ' delimiter instead of fragile sed chains, fixing garbled output when descriptions contain special characters.
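The anchoring bug described above can be reproduced in a few lines. This sketch uses Python's `re` module purely for illustration (the workflow at that point used shell tooling), and the sample strings are made up for the example.

```python
import re

# Buggy pattern: "^" binds only to the first alternative, so [HIGH] and
# [MEDIUM] match anywhere in a line.
BUGGY = re.compile(r"^\[CRITICAL\]|\[HIGH\]|\[MEDIUM\]", re.MULTILINE)

# Fixed pattern: the group puts every alternative under the "^" anchor.
FIXED = re.compile(r"^\[(CRITICAL|HIGH|MEDIUM)\]", re.MULTILINE)

prose = "This looks fine, though a [HIGH] rating was considered."
finding = "[HIGH] example finding line"

print(bool(BUGGY.search(prose)))    # the buggy pattern matches mid-sentence
print(bool(FIXED.search(prose)))    # the fixed pattern does not
print(bool(FIXED.search(finding)))  # a real finding line still matches
```

Alternation has the lowest precedence in most regex dialects, which is why the grouping parentheses are needed for the anchor to apply to all three severities.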

Use a hidden HTML marker (<!-- pr-review-fleet -->) to find and update the existing report comment on subsequent runs. Falls back to creating a new comment if none exists yet.

- C1: Read prompts from the base branch (git show origin/$BASE_REF:...) instead of the PR checkout to prevent prompt injection via malicious changes to prompt files
- M1: Use unique keys (noloc-N) for findings without file:line instead of collapsing them all to "unknown" and silently dropping duplicates
- M2: Use cut -c (character-based) instead of head -c (byte-based) to avoid truncating multi-byte UTF-8 sequences in the fix column
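The M1 fix can be pictured as follows. This is a minimal sketch, assuming findings are dictionaries with an optional `location` field holding `file:line` and that the first finding per key wins; both the data shape and the function name are hypothetical, not taken from the actual script.

```python
def dedupe_findings(findings: list[dict]) -> list[dict]:
    """De-duplicate by file:line, giving location-less findings unique keys."""
    seen: dict[str, dict] = {}
    noloc_counter = 0
    for finding in findings:
        location = finding.get("location")
        if location:
            key = location
        else:
            # A shared "unknown" key would collapse all location-less
            # findings into one; a per-finding key keeps each of them.
            noloc_counter += 1
            key = f"noloc-{noloc_counter}"
        seen.setdefault(key, finding)  # assumption: keep the first per key
    return list(seen.values())
```

With a single shared key, the two location-less findings in a report would be treated as duplicates of each other and one would be silently dropped.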

Replace the inline bash consolidation with an external Python script (.github/scripts/consolidate_reviews.py) that can be tested independently. Includes 18 tests covering parsing, de-duplication, verdict logic, and end-to-end report generation. The workflow now checks out the script via sparse-checkout, runs it, and posts the report only if findings were found.

- Breaker: add Fluid-specific attack vectors (distributed ops, DDS lifecycle, SharedTree patterns, summarization during mutations)
- API Analyst: inject api-conventions.md from the review skill at runtime via __API_CONVENTIONS__ placeholder; covers naming, type design, error handling, events, and documentation conventions
- Security: cap at MEDIUM (client library, not a service) to match local skill; note library context in prompt
- All prompts: add file exclusion lists (.d.ts, lockfiles, .map, *.api.md, binaries)
- Verdict logic: HIGH in Correctness/API Compat triggers Request Changes; HIGH in other areas only triggers it at 3+; aligns with local skill's nuanced verdict rules
- Performance: add telemetry correctness to attack list
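The HIGH-severity verdict rule above could be expressed roughly as below. This sketch covers only that one rule and reads "3+" as three or more HIGH findings outside the promoted areas; the function name, the input shape, and the exact area label strings are assumptions, and other severities are deliberately left out.

```python
# Areas where a single HIGH finding is enough (label strings are assumed).
PROMOTED_AREAS = {"Correctness", "API Compatibility"}


def high_findings_request_changes(high_finding_areas: list[str]) -> bool:
    """Decide whether HIGH findings alone warrant a Request Changes verdict.

    `high_finding_areas` holds one area label per HIGH finding.
    """
    if any(area in PROMOTED_AREAS for area in high_finding_areas):
        return True
    # Outside the promoted areas, HIGH findings only count in bulk.
    return len(high_finding_areas) >= 3
```

For example, one HIGH finding in Correctness returns True, while two HIGH findings split across Performance and Testing return False.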

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Tyler Butler <tyler@tylerbutler.com>

… report

- Drop shell(cat|head|tail|grep|ls|wc) grants; keep only shell(git:*). File inspection goes through the workspace-scoped `read` tool, which prevents a prompt-injected agent from reading /proc/self/environ, ~/.netrc, or other paths outside the repo.
- Add a scrub step that redacts gh[pousr]_ and github_pat_ token patterns from report.md before it is posted as a PR comment.
- Rewrite the one `| grep` usage in api-compatibility.md as a git pathspec so no grep allowance is needed.
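A scrub step like the one described could look roughly like this. Only the two prefixes (`gh[pousr]_` and `github_pat_`) come from the commit message; the character classes for the token body, the `[REDACTED]` replacement text, and the function name are assumptions for illustration.

```python
import re

# Prefixes come from the commit message; the token-body character classes
# are an assumption chosen to be permissive.
TOKEN_RE = re.compile(r"\b(?:gh[pousr]_[A-Za-z0-9]+|github_pat_[A-Za-z0-9_]+)")


def scrub_tokens(report: str) -> str:
    """Replace anything that looks like a GitHub token with a placeholder."""
    return TOKEN_RE.sub("[REDACTED]", report)
```

Running the scrub on the final report just before posting means a token that leaks into any reviewer's output is redacted regardless of which agent produced it.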

- Add pr-review-dispatch.yml: maps fleet-review* labels to reviewer counts (small=1, default=3, large=5) and uploads dispatch params as an artifact for the fleet workflow to consume
- Refactor pr-review-fleet.yml: replace pull_request trigger with workflow_run (from dispatcher) and workflow_dispatch (manual); add setup job that resolves PR context and builds a dynamic reviewer matrix; pass head_sha explicitly to checkout so workflow_run path checks out the correct PR head

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>

…ation

- Take main's consolidate_reviews.py (deterministic themed emoji sets via --pr-number hash, TypedDict severity labels, hashlib)
- Take main's test_consolidate_reviews.py (tests for new emoji logic)
- Pass needs.setup.outputs.pr_number as --pr-number to the consolidation step so the themed emoji set is selected deterministically per PR

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>

Upstream switched to pull_request_target for fork-PR secret access. This branch solves the same problem via the dispatcher architecture: pr-review-dispatch.yml handles label events (read-only), then the fleet triggers via workflow_run with base-repo permissions. Keep our version.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>

New pr-review-auto-route.yml workflow triggers on PR open/reopen, computes diff metrics (lines changed, files changed, packages touched), and selects fleet size automatically:

- small (1): ≤100 lines AND ≤5 files AND ≤1 package
- medium (3): everything else
- large (5): >500 lines OR >30 files OR >5 packages

Saves dispatch-params artifact using the same format as the label dispatcher so the fleet workflow triggers via workflow_run unchanged. Skips if a fleet-review* label is already present on the PR.

Also adds "PR Review Auto Router" to the fleet workflow's workflow_run trigger list alongside the existing label dispatcher.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
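The tier thresholds above translate directly into a small decision function. This is a sketch only: the function name and the tuple return shape are made up for the example, and the real workflow computes its metrics from the PR diff rather than taking them as arguments.

```python
def pick_tier(lines: int, files: int, packages: int) -> tuple[str, int]:
    """Map diff metrics to (tier name, reviewer count)."""
    # Large: any single metric over its upper threshold is enough.
    if lines > 500 or files > 30 or packages > 5:
        return ("large", 5)
    # Small: every metric must be at or under its lower threshold.
    if lines <= 100 and files <= 5 and packages <= 1:
        return ("small", 1)
    # Everything in between.
    return ("medium", 3)
```

The small and large conditions cannot both hold for the same PR, so the order of the two checks does not affect the result.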

Auto-router now posts a sticky confirmation comment with reviewer checkboxes when a PR opens. A new pr-review-confirm workflow listens for replies: affirmative triggers the fleet with the selected reviewers; questions are answered via Copilot CLI.

- Shared reviewer registry lives in pr_review_confirm.py; consolidate_reviews.py imports from it so reviewer labels stay consistent across the flow.
- Fleet dispatch accepts an optional reviewers JSON array; reviewer_count becomes optional and is ignored when reviewers is provided.
- Per-PR concurrency group on the confirm workflow prevents rapid-fire comments from racing to trigger the fleet twice.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>

The auto-router now handles size-based selection automatically, so the three-way label dispatcher is redundant. Only the fleet-review label remains for manual trigger, always running the default 3-reviewer fleet.

- pr-review-dispatch.yml: single-label if-guard, no reviewer-count mapping, no reviewer_count in dispatch params.
- pr-review-fleet.yml: workflow_run path defaults reviewer_count to 3 instead of reading it from params.json.
- pr-review-auto-route.yml: skip guard now does an exact-match array contains() instead of a toJson substring match.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>

Copilot

Pull request overview

This PR adds an auto-routing + confirmation layer on top of the existing PR review fleet so reviewer selection and fleet sizing can be computed automatically on PR open, then explicitly confirmed (and optionally adjusted) by the author via a comment-driven flow.

Changes:

Introduces auto-routing on PR open/reopen to compute diff metrics, propose a reviewer set, and post a sticky confirmation comment.
Adds a confirmation workflow that parses reviewer checkboxes and either dispatches the fleet or answers questions via Copilot CLI.
Refactors the fleet workflow to run via workflow_run/workflow_dispatch, accept an explicit reviewers list, and scrub secrets from the final report.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| .github/workflows/pr-review-fleet.yml | Rewires triggering, adds setup step to resolve PR params, supports explicit reviewer lists, and scrubs secrets before posting report. |
| .github/workflows/pr-review-dispatch.yml | New label-based dispatcher that uploads PR context as an artifact for the fleet workflow. |
| .github/workflows/pr-review-confirm.yml | New comment listener that finds pending auto-route state, parses checkboxes, and dispatches the fleet / answers questions. |
| .github/workflows/pr-review-auto-route.yml | New auto-router that computes PR diff metrics, posts a confirmation comment, and uploads pending metadata. |
| .github/scripts/pr_review_confirm.py | New helper script for building/parsing the confirmation comment and formatting reviewer context. |
| .github/scripts/consolidate_reviews.py | Updates reviewer registry source and promoted-area naming to match the confirmation flow. |
| .github/scripts/test_consolidate_reviews.py | Updates expected area label string (“API Compatibility”). |

- Harden auto-router: switch to pull_request_target with base-branch checkout so no PR-authored code runs despite the write token. Fetches the PR head SHA for diff metrics without ever checking it out.
- Gate confirm workflow to OWNER/MEMBER/COLLABORATOR so external accounts cannot trigger CI with secrets; members can still reply on behalf of external PR authors. Reject empty reviewer selections with a helpful reply instead of silently no-opping.
- Replace the find-run-and-download bash with dawidd6/action-download-artifact filtered by PR number, dropping a ~20-line gh api + jq step.
- Always update the fleet sticky comment: clean runs overwrite a prior findings comment with a clean-verdict body instead of leaving it stale.
- Rename pr_review_confirm to pr_review_propose (workflow filename and <!-- pr-review-confirm --> marker unchanged by design).
- Extract get_selected() as a future hook for content-aware selection.
- Light copy update on the proposal comment.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>

workflow_dispatch-triggered runs don't appear on the PR's Checks tab by default, making in-flight fleet reviews hard to discover.

- Fleet setup now creates a "Fleet Review" check run against the PR head SHA so the run is surfaced in the PR UI with a link back to the workflow logs. Consolidate finalizes it: success on clean, neutral when findings are posted (advisory, not gating), failure if consolidation itself errored. Runs under `if: always()` so a failure mid-way doesn't leave the check stuck in-progress.
- Fleet setup also posts an in-progress sticky comment using the same <!-- pr-review-fleet --> marker as the final report, so the existing post/update step in consolidate overwrites it in place with the findings or clean-verdict body.
- Added `checks: write` permission to pr-review-fleet.yml, and a comment to pr-review-confirm.yml explaining why it requires `actions: write` (to dispatch the fleet via gh workflow run).

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
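The check-run finalization rule above (success on clean, neutral with findings, failure if consolidation errored) can be summarized in a small function. This is a sketch with hypothetical names; it treats any consolidate result other than "success" as a failure, which also covers skipped and cancelled runs.

```python
def check_conclusion(consolidate_result: str, has_findings: bool) -> str:
    """Pick the conclusion for the "Fleet Review" check run."""
    # Anything other than a successful consolidate step is a failure,
    # including "failure", "skipped", and "cancelled".
    if consolidate_result != "success":
        return "failure"
    # Findings are advisory, so they produce a neutral (non-gating) check.
    return "neutral" if has_findings else "success"
```

Comparing against "success" rather than enumerating failure states avoids reporting a green check when a new or unexpected result value shows up.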

Replaces the bash-based find-or-create logic for the in-progress and final PR comments with marocchino/sticky-pull-request-comment@v2 (same action already used by auto-route), and the gh-api + jq calls for check-run create/finalize with LouisBrunner/checks-action@v3.1.0. Net -22 lines and removes four bash blocks that were duplicating primitives already exposed as actions. No behaviour change.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>

…wnload

actions/upload-artifact preserves the dispatch/ prefix from the dispatcher's upload path. The fleet workflow was reading params.json from the workspace root, which never existed; jq would have errored on a missing file when triggered via workflow_run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

H1 (Pungent, Correctness): jq invocations now use -c so the reviewers JSON written to GITHUB_OUTPUT stays single-line. Without -c, the multi-line array was truncated at the first newline, leaving needs.setup.outputs.reviewers as just '[' and breaking fromJson() in the matrix on the label-dispatch (workflow_run) path.

M1 (Smelly, Correctness): Split the check-run finalize step out of the consolidate job into a new always-runs teardown job that depends on setup directly. This guarantees the 'Fleet Review' check is finalized even when setup fails after creating it (which would otherwise skip review and consolidate, leaving the check stuck in_progress on the PR forever). consolidate now exposes has_findings and consolidate_outcome as job outputs for the teardown job.

M2 (Smelly, Testing): Added TestCLI with one round-trip test per subcommand (build-comment, parse-checkboxes, format-names, build-qa-context). These exercise main()/_build_parser() so a bad set_defaults(func=...) wiring would surface in CI.

M3 (Smelly, Testing): Added test_high_in_api_compat_file_promotes_to_request_changes to TestMain. Pins the full file-stem -> REVIEWERS lookup -> PROMOTED_AREAS -> 'Request Changes' verdict chain. A future label divergence between pr_review_propose.REVIEWERS and PROMOTED_AREAS will now break this test instead of silently downgrading verdicts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
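The H1 failure mode can be simulated without jq. The sketch below uses Python's `json` module to contrast pretty-printed and compact serialization, and models the truncation described above by keeping only the first line of the value; the reviewer identifiers are hypothetical.

```python
import json

reviewers = ["correctness", "security", "testing"]  # hypothetical IDs

# Pretty-printed output spans several lines, like jq without -c.
pretty = json.dumps(reviewers, indent=2)

# Compact output stays on one line, like jq -c.
compact = json.dumps(reviewers, separators=(",", ":"))


def first_line(value: str) -> str:
    """Model a single-line output slot: everything after the first newline is lost."""
    return value.split("\n", 1)[0]


print(first_line(pretty))   # only the opening bracket survives
print(first_line(compact))  # the whole array survives
```

The compact form round-trips through `json.loads`, whereas the truncated pretty form is not valid JSON at all, which matches the reported breakage in the downstream matrix.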

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- H1: teardown failure-detection missed plain CONSOLIDATE_RESULT='failure', reporting a green check on a failed consolidate step. Simplify to `[ "$CONSOLIDATE_RESULT" != "success" ]` which covers failure, skipped, and cancelled; drop the now-redundant consolidate_outcome output.
- H2: SHA-pin marocchino/sticky-pull-request-comment@v2 to v2.9.4 SHA in pr-review-fleet.yml (2x) and pr-review-auto-route.yml (ratchet comments already present; pin was never applied).
- H3: SHA-pin peter-evans/find-comment@v3 to v3.1.0 SHA in pr-review-confirm.yml.
- M1: consolidate_reviews.py validates json.loads(--reviewers) is a list before iteration, preventing a bare JSON string (e.g. forgotten brackets) from being iterated char-by-char and flagged as invalid output from 11 phantom reviewers. Test added.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
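The M1 validation can be sketched as below. The function name and error message are hypothetical; the point is only that a JSON string is iterable in Python, so a missing pair of brackets silently yields one "reviewer" per character unless the type is checked first.

```python
import json


def parse_reviewers(raw: str) -> list:
    """Parse a --reviewers style argument, requiring a JSON array."""
    value = json.loads(raw)
    if not isinstance(value, list):
        raise ValueError(
            f"reviewers must be a JSON array, got {type(value).__name__}"
        )
    return value


# Without the check, a bare string is iterated character by character:
# an 11-letter name such as "correctness" yields 11 one-letter items.
print(len(list(json.loads('"correctness"'))))
```

Failing fast with a clear message is easier to diagnose than a report listing a dozen single-letter reviewers with invalid output.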

noencke · 2026-04-23T20:42:39Z

        "<!-- pr-review-confirm -->",
        "",
-        "Hey! Want me to review this PR?",
+        "Hey! You look nice today! Want me to review this PR?",


## Description

The `PR Review Confirm` workflow (introduced in #27107) fails to parse and produces zero jobs on every PR: GitHub reports *"This run likely failed because of a workflow file issue"* (e.g. run [24861202057](https://github.com/microsoft/FluidFramework/actions/runs/24861202057)). The Q&A step used three bash heredocs whose bodies were flush-left inside a `run: |` YAML block scalar; YAML terminated the scalar at the first unindented line, leaving the rest of the file unparseable.

This change:

- Extracts the Q&A prompt into `.github/prompts/pr-review-qa.md`, sibling to the existing reviewer prompts, using the repo's `__VAR__` placeholder convention.
- Adds a `render-qa-prompt` subcommand on `.github/scripts/pr_review_propose.py` that substitutes `__REVIEWER_CONTEXT__` and `__REPLY__` from environment variables (safe for multi-line values and arbitrary special characters) and opens the template as UTF-8.
- Replaces the heredoc block in the workflow with a single call to the new subcommand, and extends the step's sparse-checkout to include `.github/prompts`.
- Adds unit and CLI tests for the new subcommand alongside the existing `pr_review_propose.py` tests.
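A placeholder renderer along the lines described could be sketched as follows. The placeholder names come from the description above; the environment variable names (`REVIEWER_CONTEXT`, `REPLY`), the function name, and the single-pass regex approach are assumptions for illustration, not the actual subcommand's implementation.

```python
import os
import re
from collections.abc import Mapping

# Only the two placeholders named in the description are recognized.
PLACEHOLDER_RE = re.compile(r"__(REVIEWER_CONTEXT|REPLY)__")


def render_qa_prompt(template: str, env: Mapping[str, str] = os.environ) -> str:
    """Substitute placeholders from environment-style variables.

    Assumes each placeholder __NAME__ is filled from a variable called NAME.
    """
    # A function replacement inserts values literally (no backslash or
    # group-reference processing), and a single pass means substituted
    # text is never re-scanned for placeholders.
    return PLACEHOLDER_RE.sub(lambda m: env.get(m.group(1), ""), template)
```

Passing values through the environment instead of interpolating them into a shell heredoc sidesteps both the YAML indentation problem and shell quoting of user-supplied reply text.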

tylerbutler and others added 21 commits April 16, 2026 15:13

fix: remove PAT setup instructions, use existing COPILOT_GITHUB_TOKEN

c89ba55

fix: trigger review fleet only on PR open or fleet-review label

5257450

fix: use claude-sonnet-4-6 as the model for review agents

e6c53df

fix: use telescope emoji for review fleet report header

e4a3783

fix: trigger review fleet only on fleet-review label (manual only)

c31eea5

upgrade action versions

391fd4a

feat: update existing PR comment instead of posting duplicates

d678b76

Merge branch 'main' into feat/pr-review-fleet

a835799

Apply suggestions from code review

d8766a6

tylerbutler self-assigned this Apr 20, 2026

tylerbutler and others added 2 commits April 21, 2026 14:54

Merge branch 'main' into feat/pr-review-auto-router

4dff371

tylerbutler changed the title ~~feat(ci): auto-route PRs to review fleet by size and complexity~~ feat(ci): PR review fleet auto-routing and confirmation flow Apr 21, 2026

tylerbutler marked this pull request as ready for review April 21, 2026 22:55

Copilot AI review requested due to automatic review settings April 21, 2026 22:55

Copilot started reviewing on behalf of tylerbutler April 21, 2026 22:56

Copilot AI reviewed Apr 21, 2026

noencke reviewed Apr 21, 2026

Comment thread .github/scripts/pr_review_propose.py

noencke reviewed Apr 21, 2026

Comment thread .github/scripts/pr_review_confirm.py Outdated

noencke approved these changes Apr 21, 2026

tylerbutler commented Apr 21, 2026

Comment thread .github/scripts/pr_review_confirm.py Outdated

tylerbutler commented Apr 21, 2026

Comment thread .github/workflows/pr-review-confirm.yml Outdated

tylerbutler and others added 4 commits April 21, 2026 17:20

style(ci): ruff format and lint .github/scripts python

24f9aec

tylerbutler added the fleet-review label Apr 22, 2026

tylerbutler added fleet-review and removed fleet-review labels Apr 22, 2026

Copilot AI and others added 3 commits April 22, 2026 14:21

chore: gitignore .github/scripts/__pycache__

6dbacc5

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge branch 'main' into feat/pr-review-auto-router

065206e

tylerbutler added fleet-review and removed fleet-review labels Apr 22, 2026

tylerbutler added 2 commits April 23, 2026 11:27

updates

195e996

Merge branch 'main' into feat/pr-review-auto-router

4f25e87

tylerbutler added fleet-review and removed fleet-review labels Apr 23, 2026

microsoft deleted a comment from github-actions Bot Apr 23, 2026

noencke approved these changes Apr 23, 2026

tylerbutler merged commit 0c429c2 into microsoft:main Apr 23, 2026
21 checks passed

tylerbutler deleted the feat/pr-review-auto-router branch April 23, 2026 20:50

tylerbutler mentioned this pull request Apr 23, 2026

fix(ci): unbreak pr-review-confirm workflow YAML parsing #27149

Merged

tylerbutler merged 36 commits into microsoft:main from tylerbutler:feat/pr-review-auto-router

4 participants
