|
| 1 | +# ADR-40126: Formal Compliance Test Suite for the Monte Carlo Forecast Engine |
| 2 | + |
| 3 | +**Date**: 2026-06-19 |
| 4 | +**Status**: Draft |
| 5 | + |
| 6 | +## Context |
| 7 | + |
| 8 | +The Monte Carlo forecast engine (`pkg/cli/forecast.go`, `forecast_montecarlo.go`) is governed by a formal specification under `specs/forecast-compliance-fixtures/README.md`, with numbered requirements (R-MC-*, R-FC-*, R-CLI-*) and acceptance tests T-FC-031–T-FC-040. The canonical fixture `run_summary_minimal.json` set `total_effective_tokens` but omitted `total_aic`; because `forecast.go:593` gates on `runAIC ≤ 0 → continue`, every fixture-backed run was silently skipped and produced no Monte Carlo input. This field-name gap between what the fixture published and what the engine reads (`TokenUsage.TotalAIC`) went undetected because no test bound the fixture schema to the engine's read path. The engine's numeric invariants (Bernoulli success collapse, observed-rate and yield formulas, the λ=15 Poisson crossover, the n≥10 reliability threshold, nil-on-degenerate-λ behavior) were likewise unguarded by executable tests mapped to the spec. |
| 9 | + |
| 10 | +## Decision |
| 11 | + |
| 12 | +We will add a formal compliance test suite, `pkg/cli/forecast_compliance_fixtures_formal_test.go`, that maps thirteen predicates (P1–P13) one-to-one onto the formal model in `specs/forecast-compliance-fixtures/README.md`. Each predicate is expressed as a table-driven Go test that exercises the real engine functions (`runMonteCarlo`, `useNormalApproximationForPoisson`, `RunForecast`) and the real fixture files, with the formal cross-reference (TLA+ invariant, F* pre/post condition, or Z3-SMT schema gap) recorded in the test doc comment. We will also fix the fixture by adding `"total_aic": 0.0054`, and P12/P13 explicitly guard against regression of that schema gap. This follows the existing repository convention for formal conformance suites (see [ADR-38166](38166-formal-conformance-test-suite-for-compiler-threat-detection.md)). |
| 13 | + |
| 14 | +## Alternatives Considered |
| 15 | + |
| 16 | +### Alternative 1: Fix the fixture only, no test suite |
| 17 | +Add `total_aic` to the fixture and stop. This restores correct behavior but leaves the schema-mapping gap unguarded — the same field could be dropped or renamed again with no failing test. Rejected because the original defect was caused precisely by the absence of a test binding fixture keys to the engine's read path. |
| 18 | + |
| 19 | +### Alternative 2: Generic unit tests without spec mapping |
| 20 | +Write conventional unit tests covering the engine's numeric behavior without the explicit P1–P13 predicate-to-specification mapping. This would catch regressions but would not make conformance to the formal model auditable, and would diverge from the established formal-conformance pattern already used elsewhere in the codebase. Rejected for weaker traceability between requirement and test. |
| 21 | + |
| 22 | +## Consequences |
| 23 | + |
| 24 | +### Positive |
| 25 | +- The fixture-to-engine field mapping (`total_aic`) is now enforced by P1/P12/P13; the silent-skip defect cannot regress unnoticed. |
| 26 | +- Each formal requirement has a named, executable test, giving auditable traceability between `specs/forecast-compliance-fixtures/README.md` and `pkg/cli`. |
| 27 | +- Numeric invariants (Poisson crossover at λ=15, n≥10 reliability, degenerate-λ nil handling) are now pinned to exact boundary values. |
| 28 | + |
| 29 | +### Negative |
| 30 | +- The test file replicates a small amount of engine logic (e.g. the duration derivation in P10 mirrors `forecast.go:573-574`); the replica must be kept in sync if the engine changes. |
| 31 | +- Thirteen predicates add maintenance surface that must track future edits to the formal specification. |
| 32 | + |
| 33 | +### Neutral |
| 34 | +- Tests are gated behind `//go:build !integration` and run against in-repo fixtures, so they require no GitHub authentication. |
| 35 | +- P11 tolerates non-validation errors from `RunForecast` (e.g. missing auth) to remain runnable in CI sandboxes, asserting only on the days-validation message. |
| 36 | + |
| 37 | +--- |
| 38 | + |
| 39 | +*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/27797333197) workflow. The PR author must review, complete, and finalize this document before the PR can merge.* |
0 commit comments