
Add case_coverage category to quality checklist #2557

Merged
hiroshinishio merged 1 commit into main from wes on Apr 20, 2026
Conversation

Collaborator

@hiroshinishio hiroshinishio commented Apr 19, 2026

Summary

Added a new case_coverage category to the quality checklist with three checks:

  • dimension_enumeration — tests identify the function's independent input dimensions
  • combinatorial_matrix — all meaningful combinations are covered, not just happy path
  • explicit_expected_per_cell — each case asserts an exact expected result
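
To make the three checks concrete, here is a minimal sketch of a weak test and a strong test for the same source. The function `shipping_fee`, its three boolean inputs, and the fee values are hypothetical and invented for illustration; they are not taken from this repository. The sketch uses only the standard library, though the same table could be fed to `pytest.mark.parametrize`.

```python
from itertools import product


def shipping_fee(heavy: bool, express: bool, international: bool) -> int:
    """Hypothetical function under test with three independent input dimensions."""
    fee = 5
    if heavy:
        fee += 10
    if express:
        fee *= 2
    if international:
        fee += 20
    return fee


def test_weak():
    # Happy path only: one cell out of eight, so combinatorial_matrix would fail.
    assert shipping_fee(False, False, False) == 5


# explicit_expected_per_cell: every combination maps to one exact expected value.
EXPECTED = {
    (False, False, False): 5,
    (False, False, True): 25,
    (False, True, False): 10,
    (False, True, True): 30,
    (True, False, False): 15,
    (True, False, True): 35,
    (True, True, False): 30,
    (True, True, True): 50,
}


def test_matrix():
    # dimension_enumeration: the table must span all three dimensions.
    assert set(EXPECTED) == set(product([False, True], repeat=3))
    # combinatorial_matrix: every cell is exercised, not just the happy path.
    for args, expected in EXPECTED.items():
        assert shipping_fee(*args) == expected, args
```

Writing the expected values as literals, rather than recomputing them with the same logic as the function, is what keeps the strong test from passing trivially.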

The checklist is JSON-serialized into the grader's system prompt, so the new category is picked up automatically across all callers. The checklist hash changes, invalidating cached grading results for previously evaluated files.
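
The cache invalidation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the checklist layout, the prompt wording, the use of SHA-256, the cache key shape, and the names `checklist_hash` and `build_system_prompt` are all hypothetical.

```python
import hashlib
import json


def checklist_hash(checklist: dict) -> str:
    """Hash a canonical JSON form of the checklist (SHA-256 is an assumption)."""
    serialized = json.dumps(checklist, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def build_system_prompt(checklist: dict) -> str:
    """Embed the serialized checklist in the grader prompt (wording is hypothetical)."""
    return "Grade the test file against this checklist:\n" + json.dumps(checklist, indent=2)


# Hypothetical checklist before and after adding the new category.
old_checklist = {"hypothetical_existing_category": ["hypothetical_check"]}
new_checklist = {
    **old_checklist,
    "case_coverage": [
        "dimension_enumeration",
        "combinatorial_matrix",
        "explicit_expected_per_cell",
    ],
}

# Cached results keyed by (file path, checklist hash); the path is a placeholder.
cache = {("tests/test_example.py", checklist_hash(old_checklist)): {"graded": True}}

# After the checklist changes, the same file misses the cache and is graded again.
new_key = ("tests/test_example.py", checklist_hash(new_checklist))
needs_regrade = new_key not in cache
```

Because every caller builds its prompt from the same serialized checklist, adding a category in one place is enough for it to reach the model, and the changed hash forces previously graded files through the new checks.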

Verified end-to-end with two Gemma integration tests. The first confirms that the category reaches the model and is graded with valid statuses. The second contrasts a 1-case test against a parametrized matrix on the same source and asserts that Gemma marks combinatorial_matrix as fail for the weak test, while the strong test fails fewer checks overall. This shows that grading is discriminative, not rubber-stamped.
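
The assertion logic of the second integration test might look something like the sketch below. The result shape (category mapped to check mapped to status), the status vocabulary, and the helper names are assumptions, and the two result dictionaries are hand-written placeholders standing in for grader output, not real Gemma responses.

```python
VALID_STATUSES = {"pass", "fail"}  # assumed status vocabulary, not confirmed by this PR


def failed_checks(result: dict) -> set:
    """Collect 'category.check' names graded as fail, validating each status."""
    failed = set()
    for category, checks in result.items():
        for check, status in checks.items():
            assert status in VALID_STATUSES, f"unexpected status {status!r}"
            if status == "fail":
                failed.add(f"{category}.{check}")
    return failed


def assert_discriminative(weak: dict, strong: dict) -> None:
    """The weak test must fail combinatorial_matrix and fail more checks overall."""
    weak_failed = failed_checks(weak)
    strong_failed = failed_checks(strong)
    assert "case_coverage.combinatorial_matrix" in weak_failed
    assert len(strong_failed) < len(weak_failed)


# Placeholder results for illustration only.
weak_result = {
    "case_coverage": {
        "dimension_enumeration": "fail",
        "combinatorial_matrix": "fail",
        "explicit_expected_per_cell": "pass",
    }
}
strong_result = {
    "case_coverage": {
        "dimension_enumeration": "pass",
        "combinatorial_matrix": "pass",
        "explicit_expected_per_cell": "pass",
    }
}

assert_discriminative(weak_result, strong_result)
```

Comparing fail counts between the two files, rather than requiring the strong test to pass everything, leaves room for model variance while still catching a grader that returns the same verdict for both.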

Social Media Post (GitAuto)

Quality gate now grades tests on case-matrix completeness

  • New case_coverage category checks whether tests enumerate independent input dimensions and assert exact expected values per cell
  • Verified the grader distinguishes a 1-case test from a full parametrized matrix on the same source
  • Thin test files now surface as failures instead of slipping through on happy-path coverage

Social Media Post (Wes)

Was reviewing a new function this morning and noticed the tests covered one happy path for a function with three independent input dimensions. The existing quality gate passed it. Added a category that grades whether tests enumerate the full matrix. Ran it on a contrived weak-vs-strong pair to prove the grader actually discriminates. Now thin tests fail the gate instead of sliding through.

New checks: dimension_enumeration, combinatorial_matrix, explicit_expected_per_cell.
Verified end-to-end with Gemma integration tests — grading discriminates a
1-case test from a parametrized matrix on the same source.
@hiroshinishio hiroshinishio self-assigned this Apr 19, 2026
@hiroshinishio hiroshinishio merged commit 45be343 into main Apr 20, 2026
1 check passed
@hiroshinishio hiroshinishio deleted the wes branch April 20, 2026 00:46