Fail mouse-latency cells below valid-rep target #1361
Pull request overview
Tightens the mouse-latency cell quality gate so cells that fall below the intended 10 valid reps (e.g., the observed 9/15 cell) are explicitly labeled INSUFFICIENT-VALID-REPS instead of OK, ensuring the final verdict is fail-closed as INSUFFICIENT-DATA. Plan doc references to the prior < 7 threshold are realigned with the runner's 10-valid-rep target.
Changes:
- Bump the per-cell valid-rep threshold from 7 to 10 and rename the sub-status to `INSUFFICIENT-VALID-REPS`.
- Extend the aggregate test suite with new 9/15-style cell and gate-cell cases; update the existing `excludes_invalid_from_median` test to the 10-valid baseline.
- Update §4.5/§4.7/§7.2 of the #905 plan to reflect the 10-valid-rep gate-grade requirement.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test/incus/mouse_latency_aggregate.py | New REQUIRED_VALID_REPS=10 and INSUFFICIENT-VALID-REPS status; gate status check propagates this to INSUFFICIENT-DATA. |
| test/incus/mouse_latency_aggregate_test.py | Adds tests for the 9/15 cell and gate-cell propagation; adjusts the median-exclusion test to 10 valid reps. |
| docs/pr/905-mouse-latency/plan.md | Plan text updated from < 7 to < 10 valid-rep wording in §4.5, §4.7, §7.2. |
Claude round-1 review on
Round-1 quad-review consolidated synthesis on
| Reviewer | Verdict |
|---|---|
| Claude | MERGE-READY |
| Codex | MERGE-NEEDS-MINOR (2 doc/rationale findings) |
| Gemini Pro 3 | MERGE-NEEDS-MINOR (2 findings: backward-compat break + doc mismatch) |
| Copilot | 0 inline findings |
Convergence on substance
All four reviewers verified the core mechanism:
- `REQUIRED_VALID_REPS = 10` named constant (not magic number)
- Threshold raised cleanly from `< 7` → `< 10`
- New `INSUFFICIENT-VALID-REPS` cell status distinct from `INSUFFICIENT-DATA` final verdict
- Fail-closed: any non-OK gate cell → INSUFFICIENT-DATA final + exit 2
- Replay artifact `xpf-100e100m-surplus-persistent-i20-20260515-235752` reproduces `9/15 → INSUFFICIENT-VALID-REPS → exit 2`
- Boundary correct (`< 10` means exactly 10 OK, 11+ also OK)
- Tests cover insufficient + sufficient paths
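The gate logic the reviewers verified can be sketched roughly as follows. This is a minimal illustration, not the code in `test/incus/mouse_latency_aggregate.py`: only `REQUIRED_VALID_REPS`, the two status strings, and exit code 2 come from the PR; the function names `cell_status` and `final_verdict` and the `None` return for the sufficient-data case are assumptions made for the example.

```python
REQUIRED_VALID_REPS = 10  # named constant from the PR


def cell_status(valid_latencies):
    """Hypothetical helper: per-cell status from the list of valid-rep samples."""
    # Strict "<" means exactly 10 valid reps is OK; 9 is not.
    if len(valid_latencies) < REQUIRED_VALID_REPS:
        return "INSUFFICIENT-VALID-REPS"
    return "OK"


def final_verdict(gate_cell_statuses):
    """Hypothetical helper: fail closed if any gate cell is not OK.

    Returns ("INSUFFICIENT-DATA", 2) when the data is insufficient, or None
    when every gate cell is OK and the real latency gate can be evaluated
    (that evaluation is outside this sketch).
    """
    if any(status != "OK" for status in gate_cell_statuses):
        return ("INSUFFICIENT-DATA", 2)
    return None


# A 9/15 cell: 15 reps attempted, only 9 valid.
nine_valid = [1.0] * 9
statuses = [cell_status(nine_valid), cell_status([1.0] * 10)]
print(statuses, final_verdict(statuses))
```

Keeping the cell status and the final verdict as separate strings is what lets a report show which cell was short on reps while the overall run still fails closed.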
Codex MINOR — doc/rationale
- `docs/pr/905-mouse-latency/plan.md:526-528` and `:570-571` still say "cells with fewer than 10 valid reps are reported INSUFFICIENT-DATA", but the implementation distinguishes cell status `INSUFFICIENT-VALID-REPS` from final verdict `INSUFFICIENT-DATA`. Real LOW; doc wording.
- Statistical rationale for 10-vs-7 is thin: the plan documents 10 reps generally via median-of-10 / IQR notes but doesn't strongly justify why 9 must be discarded. Policy decision, not a code blocker.
Gemini MINOR — same findings + parameterization suggestion
- Backward-compat break for old artifacts: runs that previously passed with 7-9 reps will now fail. Gemini suggests CLI parameterization (`--required-valid-reps`) to allow historical replay under the prior threshold.
  - My read: the prior `< 7` was a latent under-gate; the #905 plan §7.2 always specified 10. The break is intentional. Adding a CLI override would be useful for archeological replay but isn't a blocker for the gate itself.
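For reference, the suggested override could look something like the sketch below. The flag name `--required-valid-reps` is Gemini's suggestion; it is not part of this PR, and the `parse_args` wrapper, the default constant name, and the lower-bound check are assumptions for illustration only.

```python
import argparse

DEFAULT_REQUIRED_VALID_REPS = 10  # matches the PR's REQUIRED_VALID_REPS


def parse_args(argv=None):
    """Hypothetical CLI wrapper exposing the valid-rep threshold as a flag."""
    parser = argparse.ArgumentParser(
        description="Aggregate mouse-latency cells (illustrative sketch)."
    )
    parser.add_argument(
        "--required-valid-reps",
        type=int,
        default=DEFAULT_REQUIRED_VALID_REPS,
        help="minimum valid reps per cell; lower only for historical replay",
    )
    args = parser.parse_args(argv)
    if args.required_valid_reps < 1:
        # parser.error prints usage and exits with status 2.
        parser.error("--required-valid-reps must be at least 1")
    return args


# Default run keeps the gate at 10; a historical replay could pass 7 explicitly.
print(parse_args([]).required_valid_reps)
print(parse_args(["--required-valid-reps", "7"]).required_valid_reps)
```

Defaulting to 10 would keep the gate fail-closed unless someone opts out explicitly, so an old artifact replayed without the flag would still fail.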
Recommendation
MERGE. Code is correct, threshold is fail-closed properly, tests cover both paths. The 2 doc-mismatch findings (status naming) are real LOW doc follow-ups; the CLI-parameterization suggestion is a future enhancement, not a blocker.
Merging per user authorization.
Codex task: `task-mp8ehjal-ovqthr`. Gemini Pro 3 task: `task-mp8ehuba-92g0kt`.
Summary
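- Mouse-latency cells below the valid-rep target are no longer labeled `OK`; they are labeled `INSUFFICIENT-VALID-REPS`.
- The final verdict is `INSUFFICIENT-DATA` when any gate cell lacks enough valid reps.

Closes #1360.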
Validation
- `python3 test/incus/mouse_latency_aggregate_test.py`
- `python3 -m py_compile test/incus/mouse_latency_aggregate.py`
- `git diff --check`
- `/tmp/xpf-100e100m-surplus-persistent-i20-20260515-235752`: loaded `9/15` now reports `INSUFFICIENT-VALID-REPS`; final verdict is `INSUFFICIENT-DATA`; reducer exits `2`.