Skip to content

v1.8.0

Choose a tag to compare

@kneelinghorse kneelinghorse released this 13 Feb 21:19
· 33 commits to main since this release

Agent Vitals v1.8.0

This release finalizes av-29 detection credibility and release integrity work.

Highlights

  • Confabulation is now a first-class backtest detector metric (confabulation_at labels + P/R/F1 reporting).
  • Real confabulation corpus expanded with manually labeled traces (AV29.C01, AV29.C05) and relabeled AV26.F02.
  • Backtest CI gate added (scripts/ci_backtest.py) with blocking composite gate (vitals.any) and JSON artifact output.
  • CI workflow now runs backtest on Python 3.11 and uploads backtest-results.json.

Validation Snapshot

  • Combined three-way corpus size: 49 traces (33 synthetic + 16 real).
  • Composite vitals.any: P/R/F1 = 1.000 / 1.000 / 1.000.
  • Confabulation (first-class): P/R/F1 = 1.000 / 1.000 / 1.000.

Notes

  • Retro tags for v1.6.0 and v1.7.0 were intentionally skipped by operator decision for this release cycle.
  • Detailed backtest report: docs/vitals/av29-backtest-report.md.