feat(court): enhance reviewer and judge prompts to catch silent data corruption by MrFlounder · Pull Request #31 · promptfoo/crabcode

MrFlounder · 2026-02-20T04:37:05Z

Summary

Expands both Reviewer A (Claude) and Reviewer B (Codex) prompts with 4 high-risk pattern checks derived from the PR #3232 incident
Adds a "Judge's Own Investigation" step to Phase 3 so the judge proactively checks for these patterns even if neither reviewer flags them
The 4 patterns: called-but-not-changed code contracts, mutations on read paths, blast radius from null/default initial state, data contract completeness at merge boundaries
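
To make the third pattern (blast radius from null/default initial state) concrete, here is a minimal TypeScript sketch. The `RecordRow` shape, the `refreshedAt` field, and the one-hour freshness window are hypothetical and are not taken from the crabcode prompts or from the code in PR #3232. It only shows why a reviewer should ask how many existing rows match a new condition on first access.

```typescript
// Hypothetical record shape; field names are illustrative only.
interface RecordRow {
  id: number;
  refreshedAt: Date | null; // null for rows created before the field existed
}

const MAX_AGE_MS = 60 * 60 * 1000; // assumed one-hour freshness window

// Treats a null timestamp as stale, so every legacy row qualifies at once.
function isStale(row: RecordRow, now: Date): boolean {
  return row.refreshedAt === null || now.getTime() - row.refreshedAt.getTime() > MAX_AGE_MS;
}

// Counts how many rows a new staleness rule would rewrite on first access.
function blastRadius(rows: RecordRow[], now: Date): number {
  return rows.filter((row) => isStale(row, now)).length;
}

const now = new Date("2026-02-20T00:00:00Z");
const rows: RecordRow[] = [
  { id: 1, refreshedAt: null },
  { id: 2, refreshedAt: null },
  { id: 3, refreshedAt: new Date("2026-02-19T23:30:00Z") },
];

const affected = blastRadius(rows, now);
console.log(`${affected} of ${rows.length} rows would be rewritten`);
```

If the count is close to the size of the whole table, the change is effectively a bulk migration triggered by ordinary traffic, which is the situation this check is meant to surface.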

Context

PR #3232 (refreshMetricsIfStale()) caused mass data corruption because:

The bug lived in MetricsCalculationService.calculate() which was called but not changed in the diff
A GET endpoint silently mutated data via fire-and-forget
All existing evals had null metricsRefreshedAt, so every record got hit on first view
Object spread merged incomplete data over existing metrics, zeroing out fields

None of these were visible in the diff alone. The current reviewer prompts only say "review for bugs, security issues, and code quality", which is too generic to catch this class of issue.
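
The last failure mode in the list, an object spread merging incomplete data over existing metrics, can be sketched as follows. This is an illustration under assumptions, not the code from PR #3232: the `Metrics` fields, the `Recalculation` wrapper, and both merge helpers are hypothetical, and the real mechanism in `MetricsCalculationService.calculate()` is not shown on this page.

```typescript
// Hypothetical metrics shape; the real fields are not shown in the PR page.
interface Metrics {
  passCount: number;
  failCount: number;
  latencyMs: number;
}

// Hypothetical result of a recalculation that may be incomplete.
interface Recalculation {
  values: Metrics;             // fields that were not computed fall back to 0
  computed: (keyof Metrics)[]; // fields that were actually computed
}

// Naive merge: every field in `values` wins, including the 0 placeholders.
function naiveMerge(existing: Metrics, recalculated: Recalculation): Metrics {
  return { ...existing, ...recalculated.values };
}

// Guarded merge: only overwrite fields the recalculation really produced.
function guardedMerge(existing: Metrics, recalculated: Recalculation): Metrics {
  const merged: Metrics = { ...existing };
  for (const key of recalculated.computed) {
    merged[key] = recalculated.values[key];
  }
  return merged;
}

const existing: Metrics = { passCount: 40, failCount: 2, latencyMs: 310 };
const partial: Recalculation = {
  values: { passCount: 41, failCount: 0, latencyMs: 0 },
  computed: ["passCount"],
};

const naive = naiveMerge(existing, partial);
const guarded = guardedMerge(existing, partial);
console.log(naive);   // failCount and latencyMs are zeroed out
console.log(guarded); // failCount and latencyMs keep their previous values
```

The spread line looks harmless in a diff, which is why the fourth check asks reviewers to confirm that the object on the right-hand side of a merge carries every field the stored record depends on.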

Test plan

Run crab court on a test PR and verify both reviewers mention the new checks in their findings
Verify the judge's Phase 3 output includes the "Judge's Own Investigation" section
Confirm no regression in general review quality (reviewers still catch standard bugs)

🤖 Generated with Claude Code

…corruption patterns

Adds 4 high-risk pattern checks to both reviewer agent prompts and the judge's own investigation phase, derived from the PR #3232 incident where a clean-looking diff caused mass data corruption through called-but-not-changed code.

Reviewers now actively check for:

1. Called-but-not-changed code contracts (trace into called functions)
2. Mutations on read paths (writes triggered by GET endpoints)
3. Blast radius from null/default initial state (mass rewrites on first access)
4. Data contract completeness at merge/spread boundaries (missing fields)

The judge also proactively investigates these patterns even if neither reviewer flagged them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

MrFlounder merged commit f901632 into main Feb 20, 2026
3 checks passed

MrFlounder deleted the feat/court-review-enhanced-prompts branch February 20, 2026 04:38

This was referenced Feb 20, 2026

chore(main): release 0.11.1 #30

Closed

chore(main): release 0.13.1 #35

Merged

This was referenced Mar 18, 2026

chore(main): release 0.14.0 #70

Merged

chore(main): release 0.14.1 #71

Closed

MrFlounder merged 1 commit into main from feat/court-review-enhanced-prompts
