llmci 0.2.0

@alexminnaar alexminnaar released this 06 Jun 20:31
b8354aa

Major release: CI gate trust, deeper eval quality, safety/red-team, plugin API, and seventeen runnable examples.

Highlights

CI gate hardening

  • Flake resistance (samples_per_example, significance gating)
  • Response caching for direct API targets
  • Cost/token metrics (cost_mean, tokens_*)
  • Portable reports: JUnit, SARIF, JSON, HTML
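To illustrate the idea behind significance gating, here is a minimal sketch that fails a gate only when the candidate's mean score is significantly below the baseline's, given several samples per example. It is not llmci's implementation: the function name `significant_regression`, the `z_threshold` parameter, and the choice of a one-sided Welch-style z approximation are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def significant_regression(baseline, candidate, z_threshold=1.645):
    """Return True if candidate scores are significantly lower than baseline.

    Hypothetical helper, not part of llmci. Uses a one-sided Welch-style
    z approximation; 1.645 corresponds to roughly a 5% one-sided level.
    """
    if len(baseline) < 2 or len(candidate) < 2:
        raise ValueError("need at least two samples per side")

    # Positive diff means the candidate scored lower than the baseline.
    diff = mean(baseline) - mean(candidate)

    # Standard error of the difference between the two sample means.
    se = math.sqrt(
        stdev(baseline) ** 2 / len(baseline)
        + stdev(candidate) ** 2 / len(candidate)
    )
    if se == 0:
        # No variance on either side: any drop is a real drop.
        return diff > 0
    return diff / se > z_threshold


baseline_scores = [0.90, 0.92, 0.88, 0.91, 0.90]
noisy_scores = [0.89, 0.93, 0.87, 0.90, 0.91]      # same quality, sampling noise
regressed_scores = [0.70, 0.72, 0.68, 0.71, 0.69]  # clear drop

print(significant_regression(baseline_scores, noisy_scores))
print(significant_regression(baseline_scores, regressed_scores))
```

Comparing a single sample against a fixed threshold would flag the noisy run about as often as the regressed one; drawing several samples and testing the difference is what makes the gate resistant to flakes.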

Eval quality

  • RAG judge (faithfulness, relevance, retrieval metrics)
  • Pairwise judge with position-swap bias control
  • Judge calibration & drift detection (per-criterion support)
  • Output diffs vs baseline in reports
  • Structured-output (JSON Schema) judge
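Position-swap bias control can be sketched as follows: the judge is asked twice, with the two candidate outputs in opposite order, and a winner is declared only if both orderings agree. This is a generic illustration of the technique, not llmci's API; `pairwise_with_swap` and the judge callable's signature are hypothetical.

```python
from typing import Callable, Literal

Verdict = Literal["first", "second", "tie"]
# Hypothetical judge signature: (prompt, first_output, second_output) -> Verdict
Judge = Callable[[str, str, str], Verdict]


def pairwise_with_swap(
    judge: Judge, prompt: str, a: str, b: str
) -> Literal["a", "b", "tie"]:
    """Compare outputs a and b in both orders; disagreement counts as a tie."""
    forward = judge(prompt, a, b)   # a shown first
    backward = judge(prompt, b, a)  # b shown first

    # Map positional verdicts back to the underlying outputs.
    forward_winner = {"first": "a", "second": "b", "tie": "tie"}[forward]
    backward_winner = {"first": "b", "second": "a", "tie": "tie"}[backward]

    return forward_winner if forward_winner == backward_winner else "tie"


def position_biased_judge(prompt: str, first: str, second: str) -> Verdict:
    # Toy judge that always prefers whatever is shown first.
    return "first"


def length_judge(prompt: str, first: str, second: str) -> Verdict:
    # Toy judge that prefers the longer output regardless of position.
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


print(pairwise_with_swap(position_biased_judge, "q", "short", "a longer answer"))
print(pairwise_with_swap(length_judge, "q", "short", "a longer answer"))
```

A judge that simply favors the first slot produces contradictory verdicts across the two orderings and is reduced to a tie, while a judge with a position-independent preference keeps its verdict.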

Safety & plugins

  • Safety judge (PII, toxicity, jailbreak)
  • Red-team attack generator (llmci redteam generate)
  • Plugin API: custom judges, metrics, and report sinks

Examples

  • examples/1117, including integrated pre-merge gate with committed baselines

Install: pip install llmci==0.2.0

Full changelog: https://github.com/llmci-cli/llmci/blob/main/CHANGELOG.md#020---2026-06-06