ci: add REVIEW.md for tunable Claude reviews by akseljoonas · Pull Request #104 · huggingface/ml-intern

akseljoonas · 2026-04-24T14:22:44Z

Summary

Adds a repo-root `REVIEW.md` that the Claude review workflow prepends to the prompt as highest-priority guidance. Mirrors the pattern Anthropic's managed Code Review product uses, so we keep the same tuning levers on our self-hosted setup.

What's in REVIEW.md

- Severity calibration for this repo: 🔴 Important is reserved for LLM routing breakage, effort-probe regressions, auth/quota fail-open, shell/URL injection, agent-loop correctness, and backend↔frontend contract drift. Everything else is 🟡 Nit at most.
- Nit cap: at most 5 per review; anything beyond that is summarized as "plus N similar items".
- Re-review convergence: on subsequent runs against the same PR, suppress new Nits and post Important findings only.
- Skip list: generated files, lockfiles, `session_logs/`, `reports/`, anything ruff already enforces, and speculative findings without a concrete failure mode.
- Always-check list: new routing prefixes must update `_print_hf_routing_info`; new LLM calls must go through `llm_params.py`; system-prompt edits must call out dropped rules; prompt-cache markers must survive.
- Verification bar: every behavior claim requires a `file:line` citation.
- Summary shape: reviews open with a one-line tally (`3 important, 2 nits` / `No blocking issues — 2 nits` / `LGTM`).
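The nit cap and re-review convergence rules are enforced through prompt text, not code, but their intended effect can be sketched as a filter over a reviewer's draft findings. This is a minimal illustration only: the `Finding` record, its `"important"`/`"nit"` severity strings, and `select_findings` are hypothetical names, not part of the workflow.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Finding:
    # Hypothetical record: severity is either "important" or "nit".
    severity: str
    message: str


# Nit cap from this version of REVIEW.md (a later commit tightens the P1 cap to 3).
NIT_CAP = 5


def select_findings(
    findings: List[Finding], is_rereview: bool, nit_cap: int = NIT_CAP
) -> Tuple[List[Finding], Optional[str]]:
    """Return (findings to post, overflow note) under the nit-cap and re-review rules."""
    important = [f for f in findings if f.severity == "important"]
    nits = [f for f in findings if f.severity == "nit"]
    if is_rereview:
        # Re-review convergence: suppress new nits, post Important findings only.
        return important, None
    posted_nits = nits[:nit_cap]
    overflow = len(nits) - len(posted_nits)
    # Nits beyond the cap are summarized instead of posted individually.
    note = f"plus {overflow} similar items" if overflow > 0 else None
    return important + posted_nits, note
```

For one Important finding and seven nits on a first review, this posts six findings with the note "plus 2 similar items"; on a re-review of the same PR it posts only the Important finding.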

Workflow change

`.github/workflows/claude-review.yml` gains a `Compose review prompt` step that cats `REVIEW.md` (if present) followed by the baseline prompt, then passes the combined text on via `steps.compose.outputs.prompt`. If `REVIEW.md` is missing, the step falls back cleanly to the baseline prompt alone.
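The step itself runs as shell inside the workflow YAML; the Python below is only a sketch of the ordering and fallback it describes. `compose_review_prompt`, its `repo_root` parameter, and the blank-line separator between the two parts are assumptions, not taken from the workflow.

```python
from pathlib import Path


def compose_review_prompt(baseline: str, repo_root: str = ".") -> str:
    """Prepend REVIEW.md (if present) to the baseline review prompt."""
    review_md = Path(repo_root) / "REVIEW.md"
    if not review_md.is_file():
        # Fallback: no REVIEW.md, so the baseline prompt is used unchanged.
        return baseline
    guidance = review_md.read_text(encoding="utf-8")
    # REVIEW.md goes first so the model reads it as the highest-priority guidance.
    return guidance.rstrip("\n") + "\n\n" + baseline
```

Placing `REVIEW.md` ahead of the baseline instructions is what gives it the highest-priority position described in the summary.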

Not changed

`.github/workflows/claude.yml` (the `@claude` mention workflow) — mention requests use whatever prompt the user wrote in the comment; we don't override it.

Test plan

- Merge, then open a throwaway PR and confirm the review body opens with a severity tally.
- Confirm findings carry severity markers.
- Tweak REVIEW.md (e.g. drop the nit cap to 1), push to any open PR, and confirm the next review honors it.

REVIEW.md is a repo-root freeform instructions file that gets prepended to the review prompt as highest-priority guidance. Lets maintainers tune severity calibration, nit caps, skip lists, and repo-specific must-checks by editing one file instead of the workflow YAML. Mirrors the pattern used by the managed Anthropic Code Review product so we keep the same levers on our self-hosted Actions setup.

Insights from the Latent Space 'harness engineering' interview: review agents should default to merge, not block; 🟡/🟣 findings are informational, not required; author pushback without a fix is legitimate for non-Important findings; repeated disagreement is a signal that REVIEW.md is missing a rule. Also adds a 'What I checked' bullet list to the summary shape so even clean LGTM reviews surface the coverage the reviewer actually applied.

Replace 🔴 Important / 🟡 Nit / 🟣 Pre-existing with plain P0/P1/P2 labels throughout REVIEW.md and the workflow prompt. Matches the priority scheme from the Latent Space harness-engineering interview and reads cleaner in terminal-rendered GitHub diffs.
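A minimal sketch of the relabeling, assuming the labels map in the order the commit lists them (Important → P0, Nit → P1, Pre-existing → P2). `LABEL_MAP` and `relabel` are hypothetical helpers: the commit edits REVIEW.md and the prompt text directly rather than through code.

```python
# Assumed mapping, read from the order of labels in the commit message.
LABEL_MAP = {
    "🔴 Important": "P0",
    "🟡 Nit": "P1",
    "🟣 Pre-existing": "P2",
}


def relabel(text: str) -> str:
    """Rewrite legacy emoji severity markers as plain P0/P1/P2 labels."""
    for old, new in LABEL_MAP.items():
        text = text.replace(old, new)
    return text
```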

… verdict

Maintainer feedback: default-bias-merge was borrowed from a closed AI-loop context (Ryan's harness) where the PR author is also an agent and merge-and-iterate is cheap. For an open-source repo taking one-shot external PRs with a small maintainer team, the risk flips: false negatives ship bugs, while false positives cost one contributor round trip. Rigor is the correct default. Three concrete changes:

- 'Default bias: rigor' replaces 'default bias: merge'. Hold the line on P0 even under contributor pushback; P1/P2 still accept deferral silently.
- A new 'Investigate before posting' section requires reading callers and callees (not just the diff), tracing routing/auth chains end to end, and checking established patterns before flagging divergence.
- The summary now carries an explicit 'Verdict: ready to merge / changes requested / needs discussion' line so the maintainer sees the call at a glance.

Empirical test against the current open-PR queue surfaced a false negative: a bot PR (orbisai0security, #96) titled 'upgrade authlib to 1.6.9 for CVE-2026-27962' actually bumps authlib 1.6.5 → 1.7.0 in the lockfile, the CVE isn't in NVD, and the bump silently introduces a new transitive dependency (joserfc). The existing REVIEW.md rules are routing/auth/agent-loop centric and would LGTM it. A new 'Dependency PRs' section requires: CVE verification against NVD or the GitHub Advisory Database, a match between the version in the title and the lockfile diff, justification for any new transitive dependency, and a P0 flag on the framing when a dependency-only PR claims to fix code behavior.
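The title-versus-lockfile and new-dependency checks can be sketched mechanically. This assumes the lockfile has already been parsed into hypothetical `{package: version}` dicts from before and after the PR; `check_dependency_pr` and its title regex are illustrative, and the CVE lookup against NVD or the GitHub Advisory Database is left out.

```python
import re
from typing import Dict, List


def check_dependency_pr(
    title: str, package: str, old_lock: Dict[str, str], new_lock: Dict[str, str]
) -> List[str]:
    """Flag mismatches between a dependency PR's stated intent and its lockfile diff."""
    problems = []
    # Look for "<package> to <version>" in the title, e.g. "upgrade authlib to 1.6.9".
    match = re.search(rf"{re.escape(package)}\s+to\s+v?(\d+(?:\.\d+)*)", title, re.IGNORECASE)
    actual = new_lock.get(package)
    if match and match.group(1) != actual:
        problems.append(f"title claims {package} {match.group(1)}, lockfile has {actual}")
    # Packages that appear only after the PR (e.g. new transitive deps) need justification.
    for name in sorted(set(new_lock) - set(old_lock)):
        problems.append(f"new package {name} needs justification")
    return problems
```

Fed #96's title and authlib versions (with a placeholder version for joserfc, which is not stated here), the sketch reports both the 1.6.9 vs 1.7.0 mismatch and joserfc as a new package needing justification.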

- Remove the 'What counts as P0 in this repo' enumeration: P0 is for Claude to infer from context, not a static checklist.
- Remove the repo-specific 'Always check' enumeration, for the same reason. The rigor and investigate-before-posting framing carries the weight.
- Remove the 'Anything CI already enforces' block under 'Do not report': the rigor framing plus the skip-paths list already covers it.
- Drop the 'If you cannot invest the depth to verify, do not post the finding' tail from Investigate-before-posting (implicit in rigor).
- Drop the routing/effort/caching citation expansion from Verification bar (implicit in the generic citation rule).
- Drop the concrete What-I-checked example from Summary shape.
- Drop 'one paragraph of context at most' from Summary shape.
- Tighten the P1 cap from 5 to 3.

Dep-PR rubric was carrying four bulleted cases that amounted to one idea: claims in the PR body must match the diff, new deps need justification, lying framing is P0. Collapsed to a single paragraph. Also drops 'Consider adding a test' from the speculative examples — that heuristic tends to manufacture P1s rather than filter noise.

akseljoonas

lgtm

claude · 2026-04-24T15:11:50Z

Claude Code is working…

I'll analyze this and get back to you.


akseljoonas added 7 commits April 24, 2026 17:22

akseljoonas commented Apr 24, 2026


akseljoonas merged commit b292d83 into main Apr 24, 2026
1 of 3 checks passed

akseljoonas merged 7 commits into `main` from `ci/review-md`
