v1.8.0
Agent Vitals v1.8.0
This release finalizes av-29 detection credibility and release integrity work.
Highlights
- Confabulation is now a first-class backtest detector metric (confabulation_at labels + P/R/F1 reporting).
- Real confabulation corpus expanded with manually labeled traces (AV29.C01, AV29.C05) and relabeled AV26.F02.
- Backtest CI gate added (scripts/ci_backtest.py) with blocking composite gate (vitals.any) and JSON artifact output.
- CI workflow now runs backtest on Python 3.11 and uploads backtest-results.json.
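
As a rough illustration of what a blocking composite gate can look like, the sketch below checks one metric in a results dictionary and maps the outcome to a process exit code. It is not the contents of scripts/ci_backtest.py: the function names, the `min_f1` threshold, and the assumed JSON layout (`{"vitals.any": {"f1": ...}}`) are all hypothetical.

```python
import json


def gate_passes(results: dict, metric: str = "vitals.any", min_f1: float = 1.0) -> bool:
    """Return True when the named metric meets the F1 threshold.

    The results layout is an assumption made for this sketch, not the
    actual schema of backtest-results.json.
    """
    entry = results.get(metric)
    if entry is None:
        # A missing metric is treated as a failure so the gate stays blocking.
        return False
    return entry.get("f1", 0.0) >= min_f1


def exit_code_for(path: str) -> int:
    """Load a JSON results file and return 0 (pass) or 1 (fail)."""
    with open(path, encoding="utf-8") as fh:
        results = json.load(fh)
    return 0 if gate_passes(results) else 1
```

A CI step could pass the return value of `exit_code_for("backtest-results.json")` to `sys.exit`, so that a non-zero status fails the job.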
Validation Snapshot
- Combined three-way corpus size: 49 traces (33 synthetic + 16 real).
- Composite vitals.any: P/R/F1 = 1.000 / 1.000 / 1.000.
- Confabulation (first-class): P/R/F1 = 1.000 / 1.000 / 1.000.
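
For readers unfamiliar with the P/R/F1 notation, the following minimal sketch shows how precision, recall, and F1 are conventionally derived from true-positive, false-positive, and false-negative counts. The function name is hypothetical and the counts in the usage note are illustrative only, not figures from this release's backtest.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw detection counts.

    Undefined ratios (zero denominators) are reported as 0.0, which is
    one common convention; the project's own handling may differ.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

A score of 1.000 / 1.000 / 1.000 corresponds to zero false positives and zero false negatives: with illustrative counts, `precision_recall_f1(tp=5, fp=0, fn=0)` yields perfect scores, while a single miss or false alarm lowers at least one of the three.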
Notes
- Retro tags for v1.6.0 and v1.7.0 were intentionally skipped by operator decision for this release cycle.
- Detailed backtest report: docs/vitals/av29-backtest-report.md.