Skip to content

Critical modules lack Python tests despite Dafny companions #2

@melek

Description

@melek

Problem

Four modules have zero Python test coverage:

Module Functions Dafny Companion Risk
state.py 9 (make_state, start_phase, complete_phase, fail_phase, transition_to_next, feedback_loop, retry_phase, diagnostic_transition, is_review_complete) spec/state.dfy, lib/verified/state.dfy High — incorrect transitions corrupt review state
saturation.py 4 (discovery_saturation, conceptual_saturation, should_terminate_discovery, should_feedback_loop) spec/saturation.dfy, lib/verified/saturation.dfy High — incorrect thresholds cause premature termination or infinite loops
metrics.py 8 (count_excluded, count_flagged, count_included, recompute_all, etc.) lib/verified/metrics.dfy Medium — incorrect counts reported in state.json
cli.py 21 (all CLI commands, JSONL readers, workspace loader) None Medium — I/O boundary, parse failures, missing file handling

The Dafny companions prove properties about the specification, not the Python implementation. A conformance gap between the two is possible and would be invisible without Python-level tests.

What to Test

state.py — Mirror the Dafny lemmas:

  • Forward progress: complete phase N, transition to N+1
  • No skipping: cannot complete phase N without N-1
  • Retry bound: at most 1 retry
  • Feedback bound: feedback_iterations never exceeds max
  • Invalid transitions raise ValueError

saturation.py — Mirror the Dafny lemmas:

  • Zero denominator returns Fraction(0, 1)
  • Saturation range [0, 1]
  • Threshold boundary: saturation == threshold exactly
  • should_feedback_loop dual condition

metrics.py — Mirror the Dafny lemmas:

  • Accounting identity: excluded + flagged + included == total screened
  • Non-negative counts
  • Idempotency: same inputs produce same output
  • Empty inputs return zero counts

cli.py — I/O boundary tests:

  • JSONL parsing (valid, empty, malformed)
  • Missing workspace directory
  • Missing individual data files (graceful degradation)

Context

Identified during panel review (2026-03-23). Dijkstra (Formal Methods) panelist flagged as non-blocking concern. The 85 existing tests cover validators, postconditions, oracle contracts, preprocessing, PRISMA, export, and section parsing well — the gap is specifically in state machine, saturation, metrics, and CLI.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions