
v1.11.0

Tagged 14 Mar 21:01
AV-31 sprint: 373 manually reviewed traces across 12 model providers.
The loop detector is promoted to the first hard CI gate (precision
P=0.986 [0.960], recall R=1.000 [0.982] on the 370-trace expanded corpus).
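These notes do not say how the bracketed figures are derived. One common reading is that they are lower bounds of a 95% confidence interval. The sketch below assumes a Wilson score interval (an assumption, not the project's confirmed method) and shows how precision, recall and their lower bounds could be computed from true-positive, false-positive and false-negative counts, plus a hypothetical gate rule. The function names and thresholds are illustrative only and are not part of ci_backtest.py.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = p_hat + z * z / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def precision_recall(tp: int, fp: int, fn: int) -> dict:
    """Point estimates plus assumed Wilson lower bounds for precision and recall."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return {
        "precision": tp / (tp + fp),
        "precision_lb": wilson_lower_bound(tp, tp + fp),
        "recall": tp / (tp + fn),
        "recall_lb": wilson_lower_bound(tp, tp + fn),
    }


def passes_gate(metrics: dict, min_precision_lb: float, min_recall_lb: float) -> bool:
    """Hypothetical hard-gate rule: fail CI if either lower bound drops below its floor."""
    return (
        metrics["precision_lb"] >= min_precision_lb
        and metrics["recall_lb"] >= min_recall_lb
    )


if __name__ == "__main__":
    # Hypothetical counts, not the AV-31 results.
    m = precision_recall(tp=200, fp=3, fn=0)
    print(m)
    print("gate passes:", passes_gate(m, min_precision_lb=0.9, min_recall_lb=0.9))
```

Gating on the lower bound rather than the point estimate is the stricter choice: with perfect recall on a small corpus the point estimate is 1.0, but the Wilson lower bound stays below 1.0 until the corpus grows, so a larger reviewed corpus directly tightens what the gate can certify.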

- Bundle av31_reviewed corpus (289 traces) for CI validation
- Update ci_backtest.py to load av31 alongside synth + real corpora
- Version bump 1.9.0 → 1.10.0
- CHANGELOG with full validation metrics and gap analysis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>