Skip to content

Commit 3235a3c

Browse files
authored
feat: add Monte Carlo forecast compliance test suite (P1–P13) and fix fixture AIC gap (#40126)
1 parent 95989bd commit 3235a3c

4 files changed

Lines changed: 457 additions & 1 deletion

File tree

.github/workflows/daily-security-observability.lock.yml

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
# ADR-40126: Formal Compliance Test Suite for the Monte Carlo Forecast Engine
2+
3+
**Date**: 2026-06-19
4+
**Status**: Draft
5+
6+
## Context
7+
8+
The Monte Carlo forecast engine (`pkg/cli/forecast.go`, `forecast_montecarlo.go`) is governed by a formal specification under `specs/forecast-compliance-fixtures/README.md`, with numbered requirements (R-MC-*, R-FC-*, R-CLI-*) and acceptance tests T-FC-031–T-FC-040. The canonical fixture `run_summary_minimal.json` set `total_effective_tokens` but omitted `total_aic`; because `forecast.go:593` gates on `runAIC ≤ 0 → continue`, every fixture-backed run was silently skipped and produced no Monte Carlo input. This field-name gap between what the fixture published and what the engine reads (`TokenUsage.TotalAIC`) went undetected because no test bound the fixture schema to the engine's read path. The engine's numeric invariants (Bernoulli success collapse, observed-rate and yield formulas, the λ=15 Poisson crossover, the n≥10 reliability threshold, nil-on-degenerate-λ behavior) were likewise unguarded by executable tests mapped to the spec.
9+
10+
## Decision
11+
12+
We will add a formal compliance test suite, `pkg/cli/forecast_compliance_fixtures_formal_test.go`, that maps thirteen predicates (P1–P13) one-to-one onto the formal model in `specs/forecast-compliance-fixtures/README.md`. Each predicate is expressed as a table-driven Go test that exercises the real engine functions (`runMonteCarlo`, `useNormalApproximationForPoisson`, `RunForecast`) and the real fixture files, with the formal cross-reference (TLA+ invariant, F* pre/post condition, or Z3-SMT schema gap) recorded in the test doc comment. We will also fix the fixture by adding `"total_aic": 0.0054`, and P12/P13 explicitly guard against regression of that schema gap. This follows the existing repository convention for formal conformance suites (see [ADR-38166](38166-formal-conformance-test-suite-for-compiler-threat-detection.md)).
13+
14+
## Alternatives Considered
15+
16+
### Alternative 1: Fix the fixture only, no test suite
17+
Add `total_aic` to the fixture and stop. This restores correct behavior but leaves the schema-mapping gap unguarded — the same field could be dropped or renamed again with no failing test. Rejected because the original defect was caused precisely by the absence of a test binding fixture keys to the engine's read path.
18+
19+
### Alternative 2: Generic unit tests without spec mapping
20+
Write conventional unit tests covering the engine's numeric behavior without the explicit P1–P13 predicate-to-specification mapping. This would catch regressions but would not make conformance to the formal model auditable, and would diverge from the established formal-conformance pattern already used elsewhere in the codebase. Rejected for weaker traceability between requirement and test.
21+
22+
## Consequences
23+
24+
### Positive
25+
- The fixture-to-engine field mapping (`total_aic`) is now enforced by P1/P12/P13; the silent-skip defect cannot regress unnoticed.
26+
- Each formal requirement has a named, executable test, giving auditable traceability between `specs/forecast-compliance-fixtures/README.md` and `pkg/cli`.
27+
- Numeric invariants (Poisson crossover at λ=15, n≥10 reliability, degenerate-λ nil handling) are now pinned to exact boundary values.
28+
29+
### Negative
30+
- The test file replicates a small amount of engine logic (e.g. the duration derivation in P10 mirrors `forecast.go:573-574`); the replica must be kept in sync if the engine changes.
31+
- Thirteen predicates add maintenance surface that must track future edits to the formal specification.
32+
33+
### Neutral
34+
- Tests are gated behind `//go:build !integration` and run against in-repo fixtures, so they require no GitHub authentication.
35+
- P11 tolerates non-validation errors from `RunForecast` (e.g. missing auth) to remain runnable in CI sandboxes, asserting only on the days-validation message.
36+
37+
---
38+
39+
*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/27797333197) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*

0 commit comments

Comments
 (0)