feat(ci): PR review fleet auto-routing and confirmation flow #27107
Merged
tylerbutler merged 36 commits into microsoft:main on Apr 23, 2026
Adds a GitHub Actions workflow that triggers on PRs and fans out five parallel Copilot CLI reviewer agents, each focused on a different axis: correctness, security, API compatibility, performance, and testing. Each agent writes its findings to a markdown file, which is then posted as a collapsible PR comment. A summary job posts a table after all reviewers complete.
The repo already has COPILOT_GITHUB_TOKEN configured as a repository secret (used by code-simplifier and duplicate-code-detector workflows). No PAT creation is needed.
- Add NO_ISSUES_FOUND marker to each reviewer's "clean" output template
- Workflow detects the marker and skips posting a comment
- Remove the summary job (was failing due to missing checkout, and adds noise when individual reviewers already post their own comments)
- Add --repo flag to gh pr comment for reliability
Removes synchronize/reopened triggers to avoid re-running 5 agents on every push. Reviews now run on initial PR creation or when the 'fleet-review' label is applied for on-demand re-review.
…report
- Rewrite all reviewer prompts with adversarial personas: Breaker (correctness), Exploiter (security), Sentinel (API compat), Profiler (performance), Skeptic (testing)
- Add high-confidence gate to every reviewer: findings must have concrete code paths, failure mechanisms, and specific fixes
- Add severity system with per-area caps (security promotes +1, performance/testing capped at HIGH, etc.)
- Standardize output format: [SEVERITY] file:line — description — fix
- Replace per-reviewer PR comments with a single consolidated report
- Fan-in consolidation job merges findings, de-duplicates by file:line, determines verdict, and posts one structured table
- No comment posted when all reviewers find zero issues
Remove the opened trigger so reviews are only kicked off when the fleet-review label is manually applied to a PR.
Addresses review fleet self-review findings:
- Fix regex: `^\[CRITICAL\]|\[HIGH\]|\[MEDIUM\]` only anchored the first alternative. Changed to `^\[(CRITICAL|HIGH|MEDIUM)\]` so all severities require start-of-line match, preventing false matches in prose text.
- Move severity counting into the de-duplication loop so the summary header matches the table rows.
- Use awk to split on ' — ' delimiter instead of fragile sed chains, fixing garbled output when descriptions contain special characters.
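The anchoring problem in the first bullet can be reproduced in a few lines. This is a minimal sketch in Python rather than the workflow's own shell tooling; alternation binds more loosely than the `^` anchor in both, so the behaviour should carry over. The two sample lines are made up for illustration.

```python
import re

# Before the fix: "^" applies only to the first alternative, so [HIGH] and
# [MEDIUM] can match anywhere in a line.
LOOSE = re.compile(r"^\[CRITICAL\]|\[HIGH\]|\[MEDIUM\]")

# After the fix: the group places every severity behind the anchor.
STRICT = re.compile(r"^\[(CRITICAL|HIGH|MEDIUM)\]")

# Hypothetical sample lines, not taken from a real report.
FINDING = "[HIGH] src/op.ts:42 reconnect race"
PROSE = "This was rated [HIGH] in an earlier draft."


def is_finding(pattern: re.Pattern, line: str) -> bool:
    """Return True when the pattern treats the line as a finding."""
    return pattern.search(line) is not None
```

With the loose pattern the prose line is misread as a finding; with the strict pattern only the line that starts with a severity tag matches.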
Use a hidden HTML marker (`<!-- pr-review-fleet -->`) to find and update the existing report comment on subsequent runs. Falls back to creating a new comment if none exists yet.
- C1: Read prompts from the base branch (git show origin/$BASE_REF:...) instead of the PR checkout to prevent prompt injection via malicious changes to prompt files
- M1: Use unique keys (noloc-N) for findings without file:line instead of collapsing them all to "unknown" and silently dropping duplicates
- M2: Use cut -c (character-based) instead of head -c (byte-based) to avoid truncating multi-byte UTF-8 sequences in the fix column
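M1 and M2 are easy to picture in isolation. The sketch below is illustrative Python, not the bash that this commit changed: the function names are hypothetical, and `truncate_chars` only mirrors the character-based behaviour the commit attributes to `cut -c`.

```python
from __future__ import annotations


def truncate_bytes(text: str, limit: int) -> bytes:
    """Byte-based cut: can split a multi-byte UTF-8 character in half."""
    return text.encode("utf-8")[:limit]


def truncate_chars(text: str, limit: int) -> str:
    """Character-based cut: never splits a character."""
    return text[:limit]


def dedup_keys(locations: list[str | None]) -> list[str]:
    """Give each finding without a file:line its own key (noloc-1, noloc-2, ...).

    Collapsing them all to one shared key would make de-duplication treat
    them as the same finding and drop all but one.
    """
    keys: list[str] = []
    counter = 0
    for location in locations:
        if location:
            keys.append(location)
        else:
            counter += 1
            keys.append(f"noloc-{counter}")
    return keys
```

For example, cutting "naïve" at three bytes leaves a dangling lead byte that no longer decodes as UTF-8, while cutting at three characters yields "naï".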
Replace the inline bash consolidation with an external Python script (.github/scripts/consolidate_reviews.py) that can be tested independently. Includes 18 tests covering parsing, de-duplication, verdict logic, and end-to-end report generation. The workflow now checks out the script via sparse-checkout, runs it, and posts the report only if findings were found.
- Breaker: add Fluid-specific attack vectors (distributed ops, DDS lifecycle, SharedTree patterns, summarization during mutations)
- API Analyst: inject api-conventions.md from the review skill at runtime via __API_CONVENTIONS__ placeholder; covers naming, type design, error handling, events, and documentation conventions
- Security: cap at MEDIUM (client library, not a service) to match local skill; note library context in prompt
- All prompts: add file exclusion lists (.d.ts, lockfiles, .map, *.api.md, binaries)
- Verdict logic: HIGH in Correctness/API Compat triggers Request Changes; HIGH in other areas only triggers it at 3+; aligns with local skill's nuanced verdict rules
- Performance: add telemetry correctness to attack list
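The verdict rule for HIGH findings can be sketched as a small predicate. This is one reading of the bullet above, not the consolidation code itself: it assumes "3+" means three or more HIGH findings in total across the non-promoted areas, and it says nothing about CRITICAL or MEDIUM, which the commit message does not spell out. The function name is hypothetical.

```python
from __future__ import annotations

# Areas where a single HIGH finding is enough to request changes.
PROMOTED_AREAS = {"Correctness", "API Compatibility"}

# Assumed threshold for HIGH findings outside the promoted areas.
OTHER_AREA_THRESHOLD = 3


def high_findings_request_changes(high_by_area: dict[str, int]) -> bool:
    """Decide whether HIGH findings alone warrant a Request Changes verdict."""
    promoted = sum(
        count for area, count in high_by_area.items() if area in PROMOTED_AREAS
    )
    others = sum(
        count for area, count in high_by_area.items() if area not in PROMOTED_AREAS
    )
    return promoted > 0 or others >= OTHER_AREA_THRESHOLD
```

Under these assumptions, two HIGH security findings do not request changes, but one HIGH correctness finding does.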
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tyler Butler <tyler@tylerbutler.com>
… report
- Drop shell(cat|head|tail|grep|ls|wc) grants; keep only shell(git:*). File inspection goes through the workspace-scoped `read` tool, which prevents a prompt-injected agent from reading /proc/self/environ, ~/.netrc, or other paths outside the repo.
- Add a scrub step that redacts gh[pousr]_ and github_pat_ token patterns from report.md before it is posted as a PR comment.
- Rewrite the one `| grep` usage in api-compatibility.md as a git pathspec so no grep allowance is needed.
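A scrub pass of the kind described in the second bullet might look like the sketch below. Only the two prefixes (`gh[pousr]_` and `github_pat_`) come from the commit message; the character class for the token body, the leading word boundary, and the `[REDACTED]` placeholder are assumptions made for illustration.

```python
import re

# Prefixes from the commit message; the body pattern is an assumption.
# The leading \b avoids matching inside longer words such as "laughs_".
TOKEN_RE = re.compile(r"\b(?:gh[pousr]_|github_pat_)[A-Za-z0-9_]+")


def scrub(report: str) -> str:
    """Replace anything that looks like a GitHub token with a placeholder."""
    return TOKEN_RE.sub("[REDACTED]", report)
```

Running the scrub as the last step before posting means a token echoed by a prompt-injected agent is redacted from the comment body even if it reached report.md.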
- Add pr-review-dispatch.yml: maps fleet-review* labels to reviewer counts (small=1, default=3, large=5) and uploads dispatch params as an artifact for the fleet workflow to consume
- Refactor pr-review-fleet.yml: replace pull_request trigger with workflow_run (from dispatcher) and workflow_dispatch (manual); add setup job that resolves PR context and builds a dynamic reviewer matrix; pass head_sha explicitly to checkout so workflow_run path checks out the correct PR head

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
…ation
- Take main's consolidate_reviews.py (deterministic themed emoji sets via --pr-number hash, TypedDict severity labels, hashlib)
- Take main's test_consolidate_reviews.py (tests for new emoji logic)
- Pass needs.setup.outputs.pr_number as --pr-number to the consolidation step so the themed emoji set is selected deterministically per PR

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
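Deterministic per-PR selection with hashlib could work roughly as follows. The theme names, the choice of SHA-256, and the function name are all placeholders; the commit message only says that the set is chosen from a hash of `--pr-number`.

```python
from __future__ import annotations

import hashlib

# Hypothetical theme names; the real emoji sets are not shown in this PR page.
THEMES = ["weather", "animals", "food", "space"]


def pick_theme(pr_number: int, themes: list[str]) -> str:
    """Pick the same theme for the same PR number on every run."""
    digest = hashlib.sha256(str(pr_number).encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(themes)
    return themes[index]
```

A cryptographic digest is used here instead of Python's built-in `hash()` because string hashing is randomized per process by default, which would make the choice differ between workflow runs.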
Upstream switched to pull_request_target for fork-PR secret access. This branch solves the same problem via the dispatcher architecture: pr-review-dispatch.yml handles label events (read-only), then the fleet triggers via workflow_run with base-repo permissions. Keep our version.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
New pr-review-auto-route.yml workflow triggers on PR open/reopen, computes diff metrics (lines changed, files changed, packages touched), and selects fleet size automatically:
- small (1): ≤100 lines AND ≤5 files AND ≤1 package
- medium (3): everything else
- large (5): >500 lines OR >30 files OR >5 packages

Saves dispatch-params artifact using the same format as the label dispatcher so the fleet workflow triggers via workflow_run unchanged. Skips if a fleet-review* label is already present on the PR.

Also adds "PR Review Auto Router" to the fleet workflow's workflow_run trigger list alongside the existing label dispatcher.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
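The tier thresholds above translate directly into a small function. This is a sketch of the selection rule as stated in the commit message; the function name and signature are hypothetical, and the workflow itself may compute this in shell.

```python
def select_fleet_size(lines: int, files: int, packages: int) -> int:
    """Map diff metrics to a reviewer count using the stated thresholds."""
    # large: any single metric over its upper threshold
    if lines > 500 or files > 30 or packages > 5:
        return 5
    # small: every metric at or under its lower threshold
    if lines <= 100 and files <= 5 and packages <= 1:
        return 1
    # medium: everything else
    return 3
```

The small and large conditions cannot both hold, so the order of the two checks does not affect the result. Note the asymmetry: one large metric is enough to reach the large tier, while all three must be small to reach the small tier.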
Auto-router now posts a sticky confirmation comment with reviewer checkboxes when a PR opens. A new pr-review-confirm workflow listens for replies: affirmative triggers the fleet with the selected reviewers; questions are answered via Copilot CLI.
- Shared reviewer registry lives in pr_review_confirm.py; consolidate_reviews.py imports from it so reviewer labels stay consistent across the flow.
- Fleet dispatch accepts an optional reviewers JSON array; reviewer_count becomes optional and is ignored when reviewers is provided.
- Per-PR concurrency group on the confirm workflow prevents rapid-fire comments from racing to trigger the fleet twice.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
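Reading the reviewer selection back out of the confirmation comment could be done along these lines. The checklist layout (`- [x] Label`) is an assumption based on standard GitHub task-list syntax, and this `parse_checkboxes` is a stand-in, not the helper that ships in the script.

```python
from __future__ import annotations

import re

# Matches a Markdown task-list item such as "- [x] Correctness".
CHECKBOX_RE = re.compile(r"^\s*[-*]\s+\[(?P<mark>[ xX])\]\s+(?P<label>.+?)\s*$")


def parse_checkboxes(body: str) -> list[str]:
    """Return the labels of all checked items, in document order."""
    selected: list[str] = []
    for line in body.splitlines():
        match = CHECKBOX_RE.match(line)
        if match and match.group("mark").lower() == "x":
            selected.append(match.group("label"))
    return selected
```

An empty result is a legitimate outcome (the author unchecked everything), so a caller would need to handle it explicitly rather than dispatch a fleet with no reviewers.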
The auto-router now handles size-based selection automatically, so the three-way label dispatcher is redundant. Only the fleet-review label remains for manual trigger, always running the default 3-reviewer fleet.
- pr-review-dispatch.yml: single-label if-guard, no reviewer-count mapping, no reviewer_count in dispatch params.
- pr-review-fleet.yml: workflow_run path defaults reviewer_count to 3 instead of reading it from params.json.
- pr-review-auto-route.yml: skip guard now does an exact-match array contains() instead of a toJson substring match.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
Pull request overview
This PR adds an auto-routing + confirmation layer on top of the existing PR review fleet so reviewer selection and fleet sizing can be computed automatically on PR open, then explicitly confirmed (and optionally adjusted) by the author via a comment-driven flow.
Changes:
- Introduces auto-routing on PR open/reopen to compute diff metrics, propose a reviewer set, and post a sticky confirmation comment.
- Adds a confirmation workflow that parses reviewer checkboxes and either dispatches the fleet or answers questions via Copilot CLI.
- Refactors the fleet workflow to run via `workflow_run`/`workflow_dispatch`, accept an explicit `reviewers` list, and scrub secrets from the final report.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| .github/workflows/pr-review-fleet.yml | Rewires triggering, adds setup step to resolve PR params, supports explicit reviewer lists, and scrubs secrets before posting report. |
| .github/workflows/pr-review-dispatch.yml | New label-based dispatcher that uploads PR context as an artifact for the fleet workflow. |
| .github/workflows/pr-review-confirm.yml | New comment listener that finds pending auto-route state, parses checkboxes, and dispatches the fleet / answers questions. |
| .github/workflows/pr-review-auto-route.yml | New auto-router that computes PR diff metrics, posts a confirmation comment, and uploads pending metadata. |
| .github/scripts/pr_review_confirm.py | New helper script for building/parsing the confirmation comment and formatting reviewer context. |
| .github/scripts/consolidate_reviews.py | Updates reviewer registry source and promoted-area naming to match the confirmation flow. |
| .github/scripts/test_consolidate_reviews.py | Updates expected area label string (“API Compatibility”). |
noencke reviewed on Apr 21, 2026
noencke reviewed on Apr 21, 2026
noencke approved these changes on Apr 21, 2026
tylerbutler commented on Apr 21, 2026
tylerbutler commented on Apr 21, 2026
- Harden auto-router: switch to pull_request_target with base-branch checkout so no PR-authored code runs despite the write token. Fetches the PR head SHA for diff metrics without ever checking it out.
- Gate confirm workflow to OWNER/MEMBER/COLLABORATOR so external accounts cannot trigger CI with secrets; members can still reply on behalf of external PR authors. Reject empty reviewer selections with a helpful reply instead of silently no-opping.
- Replace the find-run-and-download bash with dawidd6/action-download-artifact filtered by PR number, dropping a ~20-line gh api + jq step.
- Always update the fleet sticky comment: clean runs overwrite a prior findings comment with a clean-verdict body instead of leaving it stale.
- Rename pr_review_confirm to pr_review_propose (workflow filename and `<!-- pr-review-confirm -->` marker unchanged by design).
- Extract get_selected() as a future hook for content-aware selection.
- Light copy update on the proposal comment.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
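The gating and empty-selection rules in the second bullet combine into a simple decision. The sketch below assumes the commenter's role is read from the `author_association` field of the comment payload; the function name and the returned action strings are made up for illustration.

```python
from __future__ import annotations

# Roles allowed to trigger the fleet, as listed in the commit message.
ALLOWED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def decide(author_association: str, selected: list[str]) -> str:
    """Return a hypothetical action name for an affirmative reply."""
    if author_association not in ALLOWED_ASSOCIATIONS:
        # External accounts must not be able to start CI that holds secrets.
        return "ignore"
    if not selected:
        # Reply with guidance instead of silently doing nothing.
        return "reply-empty-selection"
    return "dispatch"
```

Checking the role before inspecting the selection keeps untrusted comments from producing any bot response at all.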
workflow_dispatch-triggered runs don't appear on the PR's Checks tab by default, making in-flight fleet reviews hard to discover.
- Fleet setup now creates a "Fleet Review" check run against the PR head SHA so the run is surfaced in the PR UI with a link back to the workflow logs. Consolidate finalizes it: success on clean, neutral when findings are posted (advisory, not gating), failure if consolidation itself errored. Runs under `if: always()` so a failure mid-way doesn't leave the check stuck in-progress.
- Fleet setup also posts an in-progress sticky comment using the same `<!-- pr-review-fleet -->` marker as the final report, so the existing post/update step in consolidate overwrites it in place with the findings or clean-verdict body.
- Added `checks: write` permission to pr-review-fleet.yml, and a comment to pr-review-confirm.yml explaining why it requires `actions: write` (to dispatch the fleet via gh workflow run).

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
Replaces the bash-based find-or-create logic for the in-progress and final PR comments with marocchino/sticky-pull-request-comment@v2 (same action already used by auto-route), and the gh-api + jq calls for check-run create/finalize with LouisBrunner/checks-action@v3.1.0. Net -22 lines and removes four bash blocks that were duplicating primitives already exposed as actions. No behaviour change.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
…wnload
actions/upload-artifact preserves the dispatch/ prefix from the dispatcher's upload path. The fleet workflow was reading params.json from the workspace root, which never existed, so jq would have errored on a missing file when triggered via workflow_run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
H1 (Pungent, Correctness): jq invocations now use -c so the reviewers JSON written to GITHUB_OUTPUT stays single-line. Without -c, the multi-line array was truncated at the first newline, leaving needs.setup.outputs.reviewers as just '[' and breaking fromJson() in the matrix on the label-dispatch (workflow_run) path.

M1 (Smelly, Correctness): Split the check-run finalize step out of the consolidate job into a new always-runs teardown job that depends on setup directly. This guarantees the 'Fleet Review' check is finalized even when setup fails after creating it (which would otherwise skip review and consolidate, leaving the check stuck in_progress on the PR forever). consolidate now exposes has_findings and consolidate_outcome as job outputs for the teardown job.

M2 (Smelly, Testing): Added TestCLI with one round-trip test per subcommand (build-comment, parse-checkboxes, format-names, build-qa-context). These exercise main()/_build_parser() so a bad set_defaults(func=...) wiring would surface in CI.

M3 (Smelly, Testing): Added test_high_in_api_compat_file_promotes_to_request_changes to TestMain. Pins the full file-stem -> REVIEWERS lookup -> PROMOTED_AREAS -> 'Request Changes' verdict chain. A future label divergence between pr_review_propose.REVIEWERS and PROMOTED_AREAS will now break this test instead of silently downgrading verdicts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
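The H1 failure mode is about pretty-printed JSON meeting a line-oriented `name=value` output. The sketch below imitates it in Python: `json.dumps(..., indent=2)` stands in for jq's default multi-line output and compact separators stand in for `jq -c`. The reviewer ids are placeholders, and `first_line` models the truncation described in the commit message rather than GitHub's actual parser.

```python
import json

# Placeholder reviewer ids for illustration.
REVIEWERS = ["correctness", "security", "testing"]

# Multi-line form, comparable to jq's default pretty-printing.
PRETTY = json.dumps(REVIEWERS, indent=2)

# Single-line form, comparable to `jq -c`.
COMPACT = json.dumps(REVIEWERS, separators=(",", ":"))


def first_line(value: str) -> str:
    """Model a single-line output slot that keeps only text up to a newline."""
    return value.split("\n", 1)[0]
```

Only the compact form survives the round trip: the first line of the pretty form is a lone opening bracket, which is not valid JSON for a downstream parser.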
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- H1: teardown failure-detection missed plain CONSOLIDATE_RESULT='failure', reporting a green check on a failed consolidate step. Simplify to `[ "$CONSOLIDATE_RESULT" != "success" ]` which covers failure, skipped, and cancelled; drop the now-redundant consolidate_outcome output.
- H2: SHA-pin marocchino/sticky-pull-request-comment@v2 to v2.9.4 SHA in pr-review-fleet.yml (2x) and pr-review-auto-route.yml (ratchet comments already present; pin was never applied).
- H3: SHA-pin peter-evans/find-comment@v3 to v3.1.0 SHA in pr-review-confirm.yml.
- M1: consolidate_reviews.py validates json.loads(--reviewers) is a list before iteration, preventing a bare JSON string (e.g. forgotten brackets) from being iterated char-by-char and flagged as invalid output from 11 phantom reviewers. Test added.

🤖 Generated with [Nori](https://noriagentic.com)
Co-Authored-By: Nori <contact@tilework.tech>
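The M1 guard is a type check between parsing and iterating. A minimal sketch, assuming a hypothetical `parse_reviewers` helper and a placeholder reviewer id; the real script's error handling may differ.

```python
from __future__ import annotations

import json


def parse_reviewers(raw: str) -> list[str]:
    """Parse a --reviewers argument and insist that it is a JSON array."""
    value = json.loads(raw)
    if not isinstance(value, list):
        # A bare JSON string is valid JSON but would be iterated per character.
        raise ValueError(
            f"--reviewers must be a JSON array, got {type(value).__name__}"
        )
    return value
```

To see why the check matters: a bare string such as `"correctness"` parses successfully, and iterating it yields 11 single-character "reviewers", one per letter.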
noencke approved these changes on Apr 23, 2026
      "<!-- pr-review-confirm -->",
      "",
    - "Hey! Want me to review this PR?",
    + "Hey! You look nice today! Want me to review this PR?",
tylerbutler added a commit that referenced this pull request on Apr 23, 2026
## Description

The `PR Review Confirm` workflow (introduced in #27107) fails to parse and produces zero jobs on every PR. GitHub reports *"This run likely failed because of a workflow file issue"* (e.g. run [24861202057](https://github.com/microsoft/FluidFramework/actions/runs/24861202057)). The Q&A step used three bash heredocs whose bodies were flush-left inside a `run: |` YAML block scalar; YAML terminated the scalar at the first unindented line, leaving the rest of the file unparseable.

This change:
- Extracts the Q&A prompt into `.github/prompts/pr-review-qa.md`, sibling to the existing reviewer prompts, using the repo's `__VAR__` placeholder convention.
- Adds a `render-qa-prompt` subcommand on `.github/scripts/pr_review_propose.py` that substitutes `__REVIEWER_CONTEXT__` and `__REPLY__` from environment variables (safe for multi-line values and arbitrary special characters) and opens the template as UTF-8.
- Replaces the heredoc block in the workflow with a single call to the new subcommand, and extends the step's sparse-checkout to include `.github/prompts`.
- Adds unit and CLI tests for the new subcommand alongside the existing `pr_review_propose.py` tests.
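Placeholder substitution of the kind `render-qa-prompt` performs could be sketched as below. Only the two placeholder names come from the description; the function name, the mapping argument (standing in for values read from environment variables), and the single-pass regex approach are assumptions.

```python
from __future__ import annotations

import re

# The two placeholders named in the description.
PLACEHOLDER_RE = re.compile(r"__(REVIEWER_CONTEXT|REPLY)__")


def render_qa_prompt(template: str, values: dict[str, str]) -> str:
    """Substitute placeholders in a single pass.

    Using a replacement function means inserted text is taken literally
    (no backslash or group processing) and is never re-scanned, so a reply
    that itself contains "__REPLY__" is not expanded again.
    """
    return PLACEHOLDER_RE.sub(lambda m: values.get(m.group(1), ""), template)
```

Because the values arrive as data rather than being spliced into shell or YAML text, multi-line replies and special characters need no escaping.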
Description
Adds two layers on top of the existing PR review fleet: automatic fleet sizing on PR open, and an opt-in proposal comment so the author can adjust the reviewer set before anything runs. Removes the friction of picking the right label for every PR while keeping authors in control of whether (and with which reviewers) the fleet runs, and lets them ask questions about the review before opting in.
Changes
- `pr-review-auto-route.yml` (new): On PR open/reopen, computes diff metrics (lines, files, distinct packages under `packages/`/`experimental/`/`examples/`) and picks a tier: small (1 reviewer; ≤ 100 lines AND ≤ 5 files AND ≤ 1 package), large (5; > 500 lines OR > 30 files OR > 5 packages), or medium (3). Posts a sticky proposal comment with a pre-checked reviewer checklist and uploads PR metadata. Skips if the `fleet-review` label is already present. Runs under `pull_request_target` with a base-branch checkout and a fetch-only PR head, so no PR-authored code executes despite having write permissions.
- `pr-review-confirm.yml` (new): Listens on `issue_comment`. Gated to OWNER/MEMBER/COLLABORATOR so external accounts cannot trigger CI with secrets; members can still "yes" on behalf of external PR authors. Classifies the reply: affirmative triggers the fleet with the checkbox-selected reviewers (rejects empty selections with a helpful reply), a question is answered via Copilot CLI, anything else is ignored. Per-PR concurrency prevents double-dispatch. Uses `dawidd6/action-download-artifact` to locate the auto-route artifact by PR number in one step.
- `pr-review-dispatch.yml` (new): Label-based dispatcher: the `fleet-review` label runs the default 3-reviewer fleet via `workflow_run`.
- `pr-review-fleet.yml`: Trigger rewired to `workflow_run` + `workflow_dispatch`. Accepts an optional `reviewers` JSON array that overrides `reviewer_count`. Scrubs GitHub token patterns from the generated report. The sticky comment is always updated: clean runs overwrite any prior findings comment with a clean-verdict body instead of leaving it stale.
- `pr_review_propose.py` (new): Helpers for building the proposal comment, parsing checkboxes, formatting display names, and building Q&A context. Reviewer selection is factored into `get_selected()` as a future hook for content-aware logic. `consolidate_reviews.py` imports the reviewer registry from here so labels stay in sync.

Reviewer Guidance
The review process is outlined on this wiki page.