ci: add REVIEW.md for tunable Claude reviews#104
Merged
akseljoonas merged 7 commits intomainfrom Apr 24, 2026
Merged
Conversation
REVIEW.md is a repo-root freeform instructions file that gets prepended to the review prompt as highest-priority guidance. Lets maintainers tune severity calibration, nit caps, skip lists, and repo-specific must-checks by editing one file instead of the workflow YAML. Mirrors the pattern used by the managed Anthropic Code Review product so we keep the same levers on our self-hosted Actions setup.
Insights from the Latent Space 'harness engineering' interview: review agents should default to merge, not block; 🟡/🟣 are informational not required; author pushback without a fix is legitimate for non-Important findings; repeated disagreement is a signal REVIEW.md is missing a rule. Also adds a 'What I checked' bullet list to the summary shape so even clean LGTM reviews surface the coverage the reviewer actually applied.
Replace 🔴 Important / 🟡 Nit / 🟣 Pre-existing with plain P0/P1/P2 labels throughout REVIEW.md and the workflow prompt. Matches the priority scheme from the Latent Space harness-engineering interview and reads cleaner in terminal-rendered GitHub diffs.
… verdict Maintainer feedback: default-bias-merge was borrowed from a closed AI-loop context (Ryan's harness) where the PR author is also an agent and merge-and- iterate is cheap. For an open-source repo taking one-shot external PRs with a small maintainer team, the risk flips: false negatives ship bugs, false positives cost one contributor round trip. Rigor is the correct default. Three concrete changes: - 'Default bias: rigor' replaces 'default bias: merge'. Hold the line on P0 even under contributor pushback. P1/P2 still accept deferral silently. - New 'Investigate before posting' section requires reading callers and callees (not just the diff), tracing routing/auth chains end-to-end, and checking established patterns before flagging divergence. - Summary now carries an explicit 'Verdict: ready to merge / changes requested / needs discussion' so the maintainer sees the call at a glance.
Empirical test against the current open-PR queue surfaced a false-negative: a bot PR (orbisai0security, #96) titled 'upgrade authlib to 1.6.9 for CVE-2026-27962' actually bumps 1.6.5 → 1.7.0 in the lockfile, the CVE isn't in NVD, and the bump silently introduces a new transitive dep (joserfc). Existing REVIEW.md rules are routing/auth/agent-loop centric and would LGTM it. New 'Dependency PRs' section requires: CVE verification against NVD or GH Advisory DB, title-version ↔ lockfile-diff match, justification for any new transitive dep, and P0 framing-flag when a dep-only PR claims a code-behavior fix.
- Remove 'What counts as P0 in this repo' enumeration: P0 is implicitly for Claude to figure out from context, not a static checklist. - Remove 'Always check' repo-specific enumeration: same rationale. The rigor + investigate-before-posting framing carries the weight. - Remove 'Anything CI already enforces' block under 'Do not report': rigor framing plus the skip-paths list already covers it. - Drop 'If you cannot invest the depth to verify, do not post the finding' tail from Investigate-before-posting (implicit in rigor). - Drop routing/effort/caching citation expansion from Verification bar (implicit in generic citation rule). - Drop the concrete What-I-checked example from Summary shape. - Drop 'one paragraph of context at most' from Summary shape. - Tighten P1 cap from 5 to 3.
Dep-PR rubric was carrying four bulleted cases that amounted to one idea: claims in the PR body must match the diff, new deps need justification, lying framing is P0. Collapsed to a single paragraph. Also drops 'Consider adding a test' from the speculative examples — that heuristic tends to manufacture P1s rather than filter noise.
|
I'll analyze this and get back to you. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Adds a repo-root `REVIEW.md` that the Claude review workflow prepends to the prompt as highest-priority guidance. Mirrors the pattern Anthropic's managed Code Review product uses, so we keep the same tuning levers on our self-hosted setup.
What's in REVIEW.md
Workflow change
`.github/workflows/claude-review.yml` gains a `Compose review prompt` step that cats `REVIEW.md` (if present) followed by the baseline prompt, then passes the whole thing via `steps.compose.outputs.prompt`. Falls back cleanly if `REVIEW.md` is missing.
Not changed
Test plan