
feat(court): enhance reviewer and judge prompts to catch silent data corruption #31

Merged
MrFlounder merged 1 commit into main from feat/court-review-enhanced-prompts
Feb 20, 2026


@MrFlounder
Contributor

Summary

  • Expands both Reviewer A (Claude) and Reviewer B (Codex) prompts with 4 high-risk pattern checks derived from the PR #3232 incident
  • Adds a "Judge's Own Investigation" step to Phase 3 so the judge proactively checks for these patterns even if neither reviewer flags them
  • The 4 patterns: called-but-not-changed code contracts, mutations on read paths, blast radius from null/default initial state, data contract completeness at merge boundaries

Context

PR #3232 (refreshMetricsIfStale()) caused mass data corruption because:

  1. The bug lived in MetricsCalculationService.calculate(), which the diff called but did not change
  2. A GET endpoint silently mutated data via a fire-and-forget call
  3. All existing evals had a null metricsRefreshedAt, so every record was treated as stale and rewritten on first view
  4. An object spread merged incomplete data over existing metrics, zeroing out fields
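
To make item 3 concrete, here is a minimal sketch of how a null initial state turns a staleness check into a mass rewrite. Only metricsRefreshedAt comes from the incident; the EvalRecord shape, the 24-hour threshold, and the isStale and blastRadius helpers are assumptions for illustration, not the real implementation.

```typescript
// Hypothetical record shape; only metricsRefreshedAt is named in the PR.
interface EvalRecord {
  id: string;
  metricsRefreshedAt: Date | null;
}

// Assumed staleness threshold (24 hours), chosen for illustration only.
const STALE_AFTER_MS = 24 * 60 * 60 * 1000;

// A staleness check that treats "never refreshed" (null) as stale.
function isStale(record: EvalRecord, now: Date): boolean {
  if (record.metricsRefreshedAt === null) return true;
  return now.getTime() - record.metricsRefreshedAt.getTime() > STALE_AFTER_MS;
}

// Count how many records would be rewritten the first time each is accessed.
function blastRadius(records: EvalRecord[], now: Date): number {
  return records.filter((r) => isStale(r, now)).length;
}

const now = new Date("2026-02-20T00:00:00Z");
const records: EvalRecord[] = [
  { id: "a", metricsRefreshedAt: null },
  { id: "b", metricsRefreshedAt: null },
  { id: "c", metricsRefreshedAt: new Date("2026-02-19T23:00:00Z") },
];

const affected = blastRadius(records, now);
console.log(`${affected} of ${records.length} records would be rewritten`);
```

If every existing row starts with a null timestamp, the count equals the table size, which is the question a reviewer would want answered before a change like this ships.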

None of these were visible in the diff alone. The current reviewer prompts only say "review for bugs, security issues, and code quality", which is too generic to catch this class of issue.
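
The spread-merge failure in item 4 can be sketched as follows. The Metrics fields and both merge helpers are hypothetical, since the PR does not list the real metric fields or show the original merge code; the guarded variant is one possible mitigation, not the fix that was actually applied.

```typescript
// Hypothetical metric fields; the real ones are not shown in the PR.
interface Metrics {
  accuracy: number;
  latencyMs: number;
  sampleCount: number;
}

// Naive merge: every key present in `update` wins, including placeholder
// zeros produced by a calculation that only filled in some fields.
function naiveMerge(existing: Metrics, update: Partial<Metrics>): Metrics {
  return { ...existing, ...update };
}

// Guarded merge: copy only the fields the caller declares as actually computed.
function guardedMerge(
  existing: Metrics,
  update: Partial<Metrics>,
  computed: (keyof Metrics)[],
): Metrics {
  const merged: Metrics = { ...existing };
  for (const key of computed) {
    const value = update[key];
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}

const existing: Metrics = { accuracy: 0.92, latencyMs: 140, sampleCount: 500 };
// An incomplete result: only accuracy was recalculated, the rest defaulted to 0.
const incomplete: Metrics = { accuracy: 0.95, latencyMs: 0, sampleCount: 0 };

const clobbered = naiveMerge(existing, incomplete);
const preserved = guardedMerge(existing, incomplete, ["accuracy"]);

console.log(clobbered); // latencyMs and sampleCount are overwritten with 0
console.log(preserved); // latencyMs and sampleCount keep their stored values
```

Both versions type-check, which is why the data contract at the merge boundary has to be checked by reading what the producer actually returns rather than by relying on the compiler.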

Test plan

  • Run crab court on a test PR and verify both reviewers mention the new checks in their findings
  • Verify the judge's Phase 3 output includes the "Judge's Own Investigation" section
  • Confirm no regression in general review quality (reviewers still catch standard bugs)

🤖 Generated with Claude Code

…corruption patterns

Adds 4 high-risk pattern checks to both reviewer agent prompts and the
judge's own investigation phase, derived from the PR #3232 incident where
a clean-looking diff caused mass data corruption through called-but-not-changed code.

Reviewers now actively check for:
1. Called-but-not-changed code contracts (trace into called functions)
2. Mutations on read paths (writes triggered by GET endpoints)
3. Blast radius from null/default initial state (mass rewrites on first access)
4. Data contract completeness at merge/spread boundaries (missing fields)

The judge also proactively investigates these patterns even if neither
reviewer flagged them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@MrFlounder MrFlounder merged commit f901632 into main Feb 20, 2026
3 checks passed
@MrFlounder MrFlounder deleted the feat/court-review-enhanced-prompts branch February 20, 2026 04:38
