-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Description
Problem
Four modules have zero Python test coverage:
| Module | Functions | Dafny Companion | Risk |
|---|---|---|---|
state.py |
9 (make_state, start_phase, complete_phase, fail_phase, transition_to_next, feedback_loop, retry_phase, diagnostic_transition, is_review_complete) | spec/state.dfy, lib/verified/state.dfy |
High — incorrect transitions corrupt review state |
saturation.py |
4 (discovery_saturation, conceptual_saturation, should_terminate_discovery, should_feedback_loop) | spec/saturation.dfy, lib/verified/saturation.dfy |
High — incorrect thresholds cause premature termination or infinite loops |
metrics.py |
8 (count_excluded, count_flagged, count_included, recompute_all, etc.) | lib/verified/metrics.dfy |
Medium — incorrect counts reported in state.json |
cli.py |
21 (all CLI commands, JSONL readers, workspace loader) | None | Medium — I/O boundary, parse failures, missing file handling |
The Dafny companions prove properties about the specification, not the Python implementation. A conformance gap between the two is possible and would be invisible without Python-level tests.
What to Test
state.py — Mirror the Dafny lemmas:
- Forward progress: complete phase N, transition to N+1
- No skipping: cannot complete phase N without N-1
- Retry bound: at most 1 retry
- Feedback bound: feedback_iterations never exceeds max
- Invalid transitions raise ValueError
saturation.py — Mirror the Dafny lemmas:
- Zero denominator returns Fraction(0, 1)
- Saturation range [0, 1]
- Threshold boundary: saturation == threshold exactly
- should_feedback_loop dual condition
metrics.py — Mirror the Dafny lemmas:
- Accounting identity: excluded + flagged + included == total screened
- Non-negative counts
- Idempotency: same inputs produce same output
- Empty inputs return zero counts
cli.py — I/O boundary tests:
- JSONL parsing (valid, empty, malformed)
- Missing workspace directory
- Missing individual data files (graceful degradation)
Context
Identified during panel review (2026-03-23). Dijkstra (Formal Methods) panelist flagged as non-blocking concern. The 85 existing tests cover validators, postconditions, oracle contracts, preprocessing, PRISMA, export, and section parsing well — the gap is specifically in state machine, saturation, metrics, and CLI.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels