💡 Generalize pr-review into a Context-Adaptive "Review" Agent (PRs, Plans, Skills) #607

don-petry · 2026-06-12T00:35:15Z

don-petry
Jun 12, 2026
Maintainer

Summary

Generalize the existing pr-review agent into a single context-adaptive "review" agent that reviews any reviewable artifact — a PR diff today, an initiative plan.json (Epic #597's critic), a candidate skill edit (Epic #581's gate) tomorrow — by selecting its rubric and output channel from context rather than being hard-wired to pull requests. Instead of forking a second "plan critic" alongside pr-review (the naive reading of Epic #597's #603), we keep one maintained review brain and teach it new artifact types. The review logic, model routing, and approval discipline already built for PRs become reusable across the whole agentic pipeline.

Market Signal

"Review" is converging into a reusable, cross-cutting agent capability rather than a per-surface bolt-on:

LLM-as-judge / critic generalizes across tasks: the same judge/critic harness scores code, plans, and free-form output when given a task-appropriate rubric — which is why production eval stacks expose a generic judge configured per use case, not one judge per artifact (AI-native CI/CD eval gates, MLflow: evaluating skills).
Reflect-and-critique is the portable quality primitive: GEPA-style reflective critique improves any generated artifact from the same loop (GEPA 2507.19457, DSPy); the actor-critic split is artifact-agnostic.
Agent-authored artifacts of every kind now need review: GitHub frames the surge of agent-authored PRs as needing a structured review gate (reviewing agent PRs) — and our pipeline now produces agent-authored plans and will produce agent-authored skill edits. The same discipline should cover all three.

Hype filter: this is consolidation, not new capability — we already run a competent reviewer. The shippable move is a rubric/output-channel abstraction over the current pr-review, not a new model or framework.

User Signal

This fell directly out of answering Epic #597's open questions. The critic-wiring question (B1 / story #603) asked "second claude-code-action step vs. second prompt turn?" — and the better answer was neither: reuse pr-review. That implies a generalization bigger than #603:

We already maintain a capable reviewer: agents/pr-reviewer.md + the prompts/triage.md → deep-review.md → synthesize.md cascade + scripts/engine.sh model routing + the approval discipline that supplies the org-leads CODEOWNER review.
Epic #597 (Initiative: Adversarial plan-critic + structural gates for the initiative-planner (Bob)) needs that same judgment applied to an initiative plan: the adversarial critic of story #603 ([Phase 2] Adversarial plan-critic pass against the fixed rubric).
Epic #581 (Initiative: Eval-gated, human-reviewed self-improving skills (SkillOpt-style)) needs it applied to a candidate skill edit (the strict-improvement gate).
Without generalization we'd fork the review logic three ways and maintain three drifting rubrics.

Consolidating means a fix or improvement to "how we review" lands once and benefits PRs, plans, and skills together — the highest-leverage shape for a small org.

Technical Opportunity

Refactor pr-review around an explicit artifact contract rather than assuming a PR:

Artifact abstraction: a review invocation takes {artifact_type, content_ref, rubric, output_channel}. artifact_type=pr_diff reproduces today's behavior exactly (no regression); plan_json and skill_candidate are new types.
Rubric registry: per-type rubrics as versioned files (the PR rubric is the current prompts/ cascade; the plan rubric is Epic #597's fixed checklist; the skill rubric is Epic #581's eval criteria), CODEOWNER-gated like any review logic.
Output channels: PR → inline review comments; plan → structured findings consumed by the planner before materialize; skill → pass/score for the gate.
Reuse the engine + approval plumbing: keep scripts/engine.sh model routing and the existing idempotency markers / advisory-gate behavior; only the input adapter and rubric vary.
Migration is additive: ship pr_diff as the first (and initially only) registered type so pr-review consumers are untouched, then register plan_json to satisfy #603 ([Phase 2] Adversarial plan-critic pass against the fixed rubric), then skill_candidate for Epic #581.
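To make the contract and the additive migration concrete, here is a minimal sketch. It is written in Python for illustration only; the thread leaves registry placement open (a shell helper is one proposal). The four field names come from the contract above. Everything else (`ReviewRequest`, `REGISTRY`, `register`, `review`, `review_pr_diff`, and the example paths and channel names) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ReviewRequest:
    """One review invocation; field names follow the proposed contract."""
    artifact_type: str   # "pr_diff" today; "plan_json" / "skill_candidate" later
    content_ref: str     # pointer to the artifact (format is an assumption)
    rubric: str          # path to a versioned rubric file (hypothetical path)
    output_channel: str  # where results go (hypothetical channel name)


# A handler turns a request into whatever its output channel expects.
Handler = Callable[[ReviewRequest], dict]

# Registry of artifact types. Only pr_diff is registered at first, so an
# unknown type fails loudly instead of silently falling back to PR review.
REGISTRY: Dict[str, Handler] = {}


def register(artifact_type: str, handler: Handler) -> None:
    if artifact_type in REGISTRY:
        raise ValueError(f"artifact type already registered: {artifact_type}")
    REGISTRY[artifact_type] = handler


def review(request: ReviewRequest) -> dict:
    try:
        handler = REGISTRY[request.artifact_type]
    except KeyError:
        raise ValueError(
            f"unregistered artifact type: {request.artifact_type}"
        ) from None
    return handler(request)


def review_pr_diff(request: ReviewRequest) -> dict:
    # Placeholder: a real adapter would call the existing PR cascade here.
    return {
        "channel": request.output_channel,
        "artifact": request.content_ref,
        "comments": [],
    }


register("pr_diff", review_pr_diff)
```

Under this shape, registering plan_json or skill_candidate later is one more `register(...)` call plus a rubric file, and the pr_diff path is not edited at all.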

Assessment

Dimension	Score	Rationale
Feasibility	med	The reviewer exists; this is an input-adapter + rubric-registry refactor, not new capability. Risk is regressing the production PR path, so `pr_diff` must be behavior-preserving and well-tested before new types land.
Impact	high	One maintained review brain instead of three; every review improvement compounds across PRs, plans (#597/#603), and skills (#581). Removes a forked-critic from #597 before it's built.
Urgency	med	Sequence-sensitive: cheapest if done before #603 implements a bespoke critic, so the generalization isn't retrofitted. Not a fire, but the window is now.

Adversarial Review

Strongest objection: pr-review is load-bearing production infrastructure — it supplies the required CODEOWNER approval on real PRs across the org. Refactoring it to chase reuse risks regressing the one agent we most depend on, to serve two initiatives that are still inert. Premature abstraction over two hypothetical artifact types is exactly how a clean reviewer turns into a leaky god-object.

Rebuttal: The migration is explicitly behavior-preserving: artifact_type=pr_diff is the only registered type at first and must pass the existing PR-review tests unchanged before anything else is added, so production review is untouched until the abstraction is proven. The abstraction isn't speculative: we have two concrete, already-planned additional consumers (#603 and #581's gate), which is the bar for "extract, don't guess." And the alternative is worse for exactly the reliability reason raised: three separately-evolving review prompts drift apart, so a hardening fix (e.g. a prompt-injection guard) has to be applied three times and will be missed once. Consolidation reduces the attack/regression surface; it doesn't grow it. If the second consumer never materializes, we simply never register the second type, and the refactor still leaves pr-review cleaner.

Suggested Next Step

Do the behavior-preserving extraction first: introduce the {artifact_type, content_ref, rubric, output_channel} contract with pr_diff as the sole registered type, and prove the existing PR-review suite passes unchanged. Then register plan_json to satisfy Epic #597's #603 (re-scoped to "invoke the review agent with the plan rubric"), and skill_candidate for Epic #581's gate. Coordinate with the Safe Release Strategy so any change to pr-review rides its existing ring/channel promotion rather than shipping straight to stable.
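The "passes unchanged" step could be checked with a small parity harness like the sketch below. It is an illustration, not the existing suite: `find_regressions`, `legacy_pr_review`, and `contract_pr_review` are hypothetical stand-ins. It also assumes the compared outputs are deterministic, for example recorded fixtures or routing decisions rather than live model text.

```python
from typing import Callable, Iterable, List, Tuple


def find_regressions(
    legacy: Callable[[str], dict],
    via_contract: Callable[[str], dict],
    fixtures: Iterable[str],
) -> List[Tuple[str, dict, dict]]:
    """Return (fixture, legacy_output, new_output) for every mismatch."""
    mismatches = []
    for diff in fixtures:
        old, new = legacy(diff), via_contract(diff)
        if old != new:
            mismatches.append((diff, old, new))
    return mismatches


# Hypothetical stand-in for the current PR review path.
def legacy_pr_review(diff: str) -> dict:
    return {"verdict": "comment" if "TODO" in diff else "approve"}


# Hypothetical stand-in for the same review routed through the contract.
# The extraction should only add routing, so it delegates unchanged.
def contract_pr_review(diff: str) -> dict:
    return legacy_pr_review(diff)
```

An empty result from `find_regressions` over the fixture set is the condition for registering a second artifact type; any mismatch names the fixture that diverged.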

2026-06-12T01:19:36Z

github-actions[bot]
Bot Jun 12, 2026

📋 Initiative planned by the BMAD Scrum Master (Bob).

Epic #610 — Generalize pr-review into a context-adaptive review agent (artifact contract + rubric registry)

6 stories created (inert — labelled initiative, NOT initiative:auto):

#611 (M): [Phase 1] Define the review artifact contract + rubric registry (pr_diff sole type)
#612 (L): [Phase 1] Route pr_diff review through the artifact contract (behavior-preserving extraction)
#613 (S): [Phase 1] Regression guard locking pr_diff behavior through the contract
#614 (M): [Phase 2] Register plan_json artifact type (plan rubric + structured-findings channel)
#615 (M): [Phase 2] Register skill_candidate artifact type (skill rubric + pass/score channel)
#616 (S): [Phase 2] Promote the generalized review agent through the Safe Release rings

Open questions for review:

Should the plan_json rubric (Story 4) and the 'fixed critic rubric baked into Bob's planning prompt' from #599 ([Phase 1] Bake the fixed critic rubric into Bob's planning prompt) share one source-of-truth file, or may the review-agent plan rubric diverge from the planner's self-check? Coordinate with Epic #597 (Initiative: Adversarial plan-critic + structural gates for the initiative-planner (Bob)).
The skill_candidate rubric (Story 5) depends on two parts of Epic #581 (Initiative: Eval-gated, human-reviewed self-improving skills (SkillOpt-style)) being finalized: the eval-case format (#582: [Phase 1] Define held-out eval-case format + seed prompts/triage.md case set) and the scorer semantics (#583: [Phase 1] Build the deterministic eval scorer + offline tests (triage); #585: [Phase 2] Extend the harness to prompts/deep-review.md via an LLM-judge scorer; #586: [Phase 2] Strict-improvement gate + manual propose->validate->PR runbook with regression-rejection proof). If they are not, skill_candidate registration may need to wait or ship a stub rubric behind the gate.
Registry/helper placement: Story 1 proposes scripts/lib/review-registry.sh + a sibling README; confirm maintainers prefer that over a prompts/ manifest.
Output channels for plan_json/skill_candidate are assumed to be net-new non-GitHub channels (findings JSON / pass-score) rather than a switch inside post-pr-review.sh; confirm that boundary.
Story 6 assumes the ring-staged promotion from Epic #495 (Initiative: Safe Release Strategy for Agentic Workflows (versioning · rings · canary)) is live (#501: [Phase 2] Replace PUT-contents clobber deploy with versioned, ring-staged, health-gated promotion); if not, the behavior-preserving refactor should hold at ring 0 / self-host until the pipeline exists.
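For the output-channel question, one possible shape for the two non-GitHub payloads is sketched below. Nothing in it is settled by this thread. The `Finding` fields, the severity levels, and the rule that "strict improvement" means the candidate score must exceed the baseline are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Finding:
    rubric_item: str  # which rubric check raised this (hypothetical id)
    severity: str     # "blocker" or "warning" (hypothetical levels)
    message: str


def plan_findings_payload(findings: List[Finding]) -> str:
    """Structured findings for the planner to read before materialize."""
    blocked = any(f.severity == "blocker" for f in findings)
    body = {"blocked": blocked, "findings": [asdict(f) for f in findings]}
    return json.dumps(body, sort_keys=True)


def skill_gate_payload(baseline: float, candidate: float) -> str:
    """Pass/score for the gate; a tie with the baseline does not pass."""
    body = {"pass": candidate > baseline, "score": candidate}
    return json.dumps(body, sort_keys=True)
```

Keeping both payloads as plain JSON strings would let them live outside post-pr-review.sh, which matches the boundary the question asks maintainers to confirm.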

Review the epic and its sub-issue DAG, adjust as needed, then add initiative:auto to epic #610 to hand it to initiative-driver for auto-implementation.
