llmci 0.2.0
Major release: CI gate trust, deeper eval quality, safety/red-team, plugin API, and seventeen runnable examples.
Highlights
CI gate hardening
- Flake resistance (samples_per_example, significance gating)
- Response caching for direct API targets
- Cost/token metrics (cost_mean, tokens_*)
- Portable reports: JUnit, SARIF, JSON, HTML
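As a rough illustration of what significance gating can mean for a CI gate, the sketch below fails a run only when the candidate's scores are lower than the baseline's by more than chance would explain. It is not llmci's implementation: the function name, the `alpha` threshold, and the choice of a one-sided permutation test are all assumptions made for this example.

```python
import random
from statistics import mean


def regression_is_significant(baseline, candidate, alpha=0.05,
                              n_resamples=2000, seed=0):
    """Hypothetical helper: one-sided permutation test on mean score.

    Returns (significant, p_value), where `significant` is True when the
    candidate's mean is lower than the baseline's beyond the alpha level.
    """
    rng = random.Random(seed)  # fixed seed keeps the gate reproducible
    observed = mean(baseline) - mean(candidate)
    pooled = list(baseline) + list(candidate)
    n = len(baseline)

    hits = 0
    for _ in range(n_resamples):
        # Shuffle labels: under the null hypothesis, group membership
        # carries no information about the score.
        rng.shuffle(pooled)
        diff = mean(pooled[:n]) - mean(pooled[n:])
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (hits + 1) / (n_resamples + 1)
    return p_value < alpha, p_value


baseline_scores = [0.90, 0.80, 0.85, 0.90, 0.95, 0.88, 0.92, 0.87]
candidate_scores = [0.50, 0.40, 0.45, 0.50, 0.55, 0.48, 0.52, 0.47]
failed, p = regression_is_significant(baseline_scores, candidate_scores)
print("gate fails" if failed else "gate passes", f"(p={p:.4f})")
```

Drawing several samples per example gives a test like this more data to work with, which is why sampling and significance gating pair naturally.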
Eval quality
- RAG judge (faithfulness, relevance, retrieval metrics)
- Pairwise judge with position-swap bias control
- Judge calibration & drift detection (per-criterion support)
- Output diffs vs baseline in reports
- Structured-output (JSON Schema) judge
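Position-swap bias control is a general technique, and the sketch below shows one common form of it: each pair is judged twice with the slots exchanged, and a winner is declared only when both passes agree. The `judge` callable, its `"first"`/`"second"` return values, and the `"tie"` fallback are assumptions for illustration, not llmci's actual judge interface.

```python
from typing import Callable

# Hypothetical judge signature: (prompt, first_answer, second_answer)
# returns "first" or "second" to name the preferred slot.
Judge = Callable[[str, str, str], str]


def pairwise_with_swap(judge: Judge, prompt: str, a: str, b: str) -> str:
    """Return "a", "b", or "tie" after judging both slot orders."""
    pass_one = judge(prompt, a, b)  # answer a shown in the first slot
    pass_two = judge(prompt, b, a)  # answer a shown in the second slot

    a_wins_one = pass_one == "first"
    a_wins_two = pass_two == "second"

    if a_wins_one and a_wins_two:
        return "a"
    if not a_wins_one and not a_wins_two:
        return "b"
    # The verdict flipped with the ordering, so it reflects slot
    # position rather than answer quality.
    return "tie"


def always_first(prompt: str, first: str, second: str) -> str:
    """Toy judge with pure position bias."""
    return "first"


def prefers_longer(prompt: str, first: str, second: str) -> str:
    """Toy judge that is position-independent."""
    return "first" if len(first) >= len(second) else "second"


print(pairwise_with_swap(always_first, "q", "short", "a longer answer"))
print(pairwise_with_swap(prefers_longer, "q", "short", "a longer answer"))
```

A purely position-biased judge collapses to a tie under this scheme, while a judge with a consistent preference yields the same winner in both orderings.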
Safety & plugins
- Safety judge (PII, toxicity, jailbreak)
- Red-team attack generator (llmci redteam generate)
- Plugin API: custom judges, metrics, and report sinks
Examples
examples/11–17, including an integrated pre-merge gate with committed baselines
Install: pip install llmci==0.2.0
Full changelog: https://github.com/llmci-cli/llmci/blob/main/CHANGELOG.md#020---2026-06-06