Release v1.10.0: Multi-Model Corpus Campaign & Loop Gate Promotion · kneelinghorse/agent-vitals

Precision: 0.986 [CI lower: 0.960] (threshold: 0.80)
Recall: 1.000 [CI lower: 0.982] (threshold: 0.75)
Positives: 213 (threshold: 8)
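
The release does not say how the CI lower bounds are computed. As a minimal sketch, a 95% Wilson score interval reproduces the reported values if one assumes 213 true positives, 3 false positives, and 0 false negatives. Those counts are inferred from the reported precision and recall, not stated in the release. The function names below are hypothetical and are not part of the agent-vitals API.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a 95% two-sided interval (an assumption;
    the release does not state the confidence level).
    """
    if trials == 0:
        return 0.0
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = p_hat + z2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return (center - margin) / denom


def meets_promotion_gate(
    tp: int,
    fp: int,
    fn: int,
    min_precision_lower: float = 0.80,  # precision threshold from the release
    min_recall_lower: float = 0.75,     # recall threshold from the release
    min_positives: int = 8,             # positives threshold from the release
) -> bool:
    """Hypothetical gate check: both CI lower bounds and the sample count must pass."""
    positives = tp + fn
    precision_lower = wilson_lower_bound(tp, tp + fp)
    recall_lower = wilson_lower_bound(tp, tp + fn)
    return (
        positives >= min_positives
        and precision_lower >= min_precision_lower
        and recall_lower >= min_recall_lower
    )


# Inferred counts (assumption): 213 TP, 3 FP, 0 FN
precision_lower = round(wilson_lower_bound(213, 216), 3)  # release reports 0.960
recall_lower = round(wilson_lower_bound(213, 213), 3)     # release reports 0.982
promoted = meets_promotion_gate(tp=213, fp=3, fn=0)
```

Gating on the lower bound rather than the point estimate means a detector with few positives cannot pass on a lucky streak: the interval widens as the sample shrinks, which is also why a separate minimum on positives is applied.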

What's New

The loop detector is the first to meet all promotion thresholds on the expanded corpus:

373 traces manually reviewed across 12 model providers: DeepSeek, Claude Sonnet, Claude Haiku, GPT-4o, GPT-4o-mini, Gemini Flash, Llama 70B, Llama 8B, Mistral Large, Mixtral, Qwen 72B, Command-R+
Both research and build workflows covered
20.9% reclassification rate from manual review (59 thrash FPs from segmentation artifacts)

Remaining detectors, not yet promoted:

Confabulation: precision CI lower bound is 1 percentage point short (P_lower=0.790 vs. 0.800 threshold); likely qualifies with 2 more TP traces
Stuck: Low recall due to loop co-occurrence suppression in engine design
Thrash: Needs segmentation artifact filtering before promotion
Runaway cost: Insufficient positive samples

pip install agent-vitals==1.10.0

Full changelog: CHANGELOG.md