Releases: llmci-cli/llmci

llmci 0.4.1

@alexminnaar released this 07 Jun 18:49

Patch release: proxy cost pricing for direct targets.

Added

  • settings.price_overrides — per-model input_per_token / output_per_token USD rates when litellm cannot compute cost (internal LLM proxies).
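
As a rough illustration of what per-token rates imply, the cost of one call is the input token count times the input rate plus the output token count times the output rate. The sketch below is not llmci's implementation: the `PriceOverride` class, the `call_cost_usd` helper, the model name, and the rates are all hypothetical, and only the field names `input_per_token` and `output_per_token` come from the release note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceOverride:
    """Hypothetical container for one model's per-token USD rates."""
    input_per_token: float
    output_per_token: float


def call_cost_usd(override: PriceOverride, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens on each side times that side's rate."""
    return (input_tokens * override.input_per_token
            + output_tokens * override.output_per_token)


# Illustrative model name and rates only; these are not real prices.
overrides = {"internal-proxy/model-a": PriceOverride(2e-06, 6e-06)}
cost = call_cost_usd(overrides["internal-proxy/model-a"],
                     input_tokens=1000, output_tokens=500)
print(f"{cost:.6f} USD")
```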

Install: pip install llmci==0.4.1

See CHANGELOG.

llmci 0.4.0

@alexminnaar released this 06 Jun 21:08

Cross-provider migration, few-shot strategy, and PII allow-list for safety gates.

Added

  • Cross-provider migration: llmci migrate accepts provider/model refs with per-side base URLs
  • Few-shot migration strategy: --strategy few_shot inlines train examples as demos
  • PII allow-list: pii_leakage criteria accept allow_list (literal or regex: entries)
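
To make the "literal or regex: entries" distinction concrete, here is a minimal sketch of how such an allow-list could be evaluated. It is an assumption-laden illustration, not llmci's code: the `is_allowed` helper is hypothetical, and treating a `regex:` entry as a full-string match (rather than a substring search) is a guess about the intended semantics.

```python
import re

REGEX_PREFIX = "regex:"


def is_allowed(value: str, allow_list: list[str]) -> bool:
    """Return True if a detected value matches any allow-list entry."""
    for entry in allow_list:
        if entry.startswith(REGEX_PREFIX):
            # Assumption: a regex entry must match the whole value.
            pattern = entry[len(REGEX_PREFIX):]
            if re.fullmatch(pattern, value):
                return True
        elif value == entry:
            # Literal entries are compared exactly.
            return True
    return False


# Hypothetical allow-list: one literal address and one regex entry.
allow_list = ["support@example.com", r"regex:.*@example\.org"]
print(is_allowed("bob@example.org", allow_list))
```

Under this reading, a PII detection would only fail the gate when the detected value is not covered by any entry.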

See CHANGELOG for full details.

llmci 0.3.0

@alexminnaar released this 06 Jun 20:45

Post-0.2.0 follow-ups: deeper gate trust, RAG faithfulness, red-team mutation, and multimodal evals.

Highlights

  • Composite judge caching: agent outcome/trajectory LLM calls share .llmci/cache/judges/
  • Calibration trend history: --save-snapshot appends to a history log with a trend table
  • Gate warnings: llmci run warns on missing baselines or significance misconfiguration
  • Per-claim faithfulness: RAG decompose_claims: true for atomic grounding checks
  • LLM attack mutation: llmci redteam generate --mutate for broader adversarial coverage
  • Multimodal targets: images / audio fields on dataset rows for direct API evals
  • Example 18: multimodal vision eval (examples/18-multimodal-vision)

Install: pip install llmci==0.3.0

Full changelog: https://github.com/llmci-cli/llmci/blob/main/CHANGELOG.md#030---2026-06-06

llmci 0.2.0

@alexminnaar released this 06 Jun 20:31
b8354aa

Major release: CI gate trust, deeper eval quality, safety/red-team, plugin API, and seventeen runnable examples.

Highlights

CI gate hardening

  • Flake resistance (samples_per_example, significance gating)
  • Response caching for direct API targets
  • Cost/token metrics (cost_mean, tokens_*)
  • Portable reports: JUnit, SARIF, JSON, HTML
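
The idea behind flake resistance can be sketched as follows: score each example several times, average the samples, and flag a regression only when the drop against the baseline is larger than noise would explain. The code below is a simplified stand-in, not llmci's method: it uses a plain standard-error threshold on paired differences, and the function names and the default `z` value are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def example_means(samples: list[list[float]]) -> list[float]:
    """Average the repeated samples collected for each example."""
    return [mean(s) for s in samples]


def regression_is_significant(baseline: list[float],
                              candidate: list[float],
                              z: float = 1.645) -> bool:
    """Flag a regression only if the mean paired drop exceeds z standard errors."""
    diffs = [c - b for b, c in zip(baseline, candidate)]
    if len(diffs) < 2:
        return False  # too little data to estimate variance
    se = stdev(diffs) / sqrt(len(diffs))
    if se == 0:
        # All differences are identical; any negative shift is a real drop.
        return mean(diffs) < 0
    return mean(diffs) < -z * se


baseline = [0.9, 0.8, 0.9, 0.85]
noisy = [0.95, 0.75, 0.9, 0.8]    # small fluctuations around the baseline
degraded = [0.5, 0.4, 0.6, 0.45]  # consistent, large drop
print(regression_is_significant(baseline, noisy),
      regression_is_significant(baseline, degraded))
```

The point of the design is that a single unlucky sample should not fail a pre-merge gate, while a consistent drop across examples should.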

Eval quality

  • RAG judge (faithfulness, relevance, retrieval metrics)
  • Pairwise judge with position-swap bias control
  • Judge calibration & drift detection (per-criterion support)
  • Output diffs vs baseline in reports
  • Structured-output (JSON Schema) judge
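
Position-swap bias control for a pairwise judge can be illustrated like this: ask the judge twice with the two answers in opposite order, and count a win only when both orderings pick the same underlying answer. This is a generic sketch of the technique under stated assumptions, not llmci's judge: the `Judge` signature, the verdict strings, and the two toy judges are hypothetical.

```python
from typing import Callable

# Hypothetical judge interface: (prompt, first, second) -> "first", "second", or "tie".
Judge = Callable[[str, str, str], str]


def pairwise_with_swap(judge: Judge, prompt: str, a: str, b: str) -> str:
    """Run the judge in both orders and return "a", "b", or "tie"."""
    forward = judge(prompt, a, b)
    backward = judge(prompt, b, a)
    # Map positional verdicts back to the underlying answers.
    forward_pick = {"first": "a", "second": "b"}.get(forward, "tie")
    backward_pick = {"first": "b", "second": "a"}.get(backward, "tie")
    # A win counts only when both orderings agree; disagreement becomes a tie.
    return forward_pick if forward_pick == backward_pick else "tie"


def always_first(prompt: str, first: str, second: str) -> str:
    """Toy judge with pure position bias."""
    return "first"


def prefers_longer(prompt: str, first: str, second: str) -> str:
    """Toy judge that looks at content (length), not position."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


print(pairwise_with_swap(always_first, "q", "short", "a much longer answer"))
print(pairwise_with_swap(prefers_longer, "q", "short", "a much longer answer"))
```

A judge that only ever favors the first slot ends up with a tie, so its position bias cannot decide the comparison.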

Safety & plugins

  • Safety judge (PII, toxicity, jailbreak)
  • Red-team attack generator (llmci redteam generate)
  • Plugin API: custom judges, metrics, and report sinks

Examples

  • examples/11-17, including an integrated pre-merge gate with committed baselines

Install: pip install llmci==0.2.0

Full changelog: https://github.com/llmci-cli/llmci/blob/main/CHANGELOG.md#020---2026-06-06

llmci 0.1.9

@alexminnaar released this 01 Jun 00:37

Added

  • Release metadata consistency check for package version, action install version, and changelog links.
  • Manual real-LLM example workflow for API-key-dependent examples.
  • GitHub Action inputs for explicit config paths, discovered config runs, and baseline updates.

Fixed

  • Duplicate llmci PR comments from parallel matrix jobs are merged into one canonical comment and stale duplicates are cleaned up.

Install: pip install llmci==0.1.9

llmci 0.1.8

@alexminnaar released this 31 May 21:29
8a619ac

Added

  • --include and --exclude filters for llmci discover and llmci run --all.
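
One plausible way include/exclude filters combine is shown below: a path is kept if it matches at least one include pattern (or no include patterns are given) and matches no exclude pattern. This sketch assumes shell-style glob patterns and exclude-wins precedence, neither of which the release note confirms. The `filter_configs` helper and the file paths are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_configs(paths, include=(), exclude=()):
    """Keep paths matching any include glob and no exclude glob."""
    selected = []
    for path in paths:
        # With no include patterns, every path is a candidate.
        if include and not any(fnmatchcase(path, pat) for pat in include):
            continue
        # Assumption: an exclude match always wins over an include match.
        if any(fnmatchcase(path, pat) for pat in exclude):
            continue
        selected.append(path)
    return selected


# Hypothetical discovered config paths.
paths = [
    "evals/rag/llmci.yaml",
    "evals/safety/llmci.yaml",
    "examples/01/llmci.yaml",
]
print(filter_configs(paths, include=["evals/*"], exclude=["*safety*"]))
```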

Changed

  • Dogfood matrix evals now use LLMCI_REPORT_SLICE so PR comments merge into one combined report.

llmci 0.1.7

@alexminnaar released this 31 May 21:11
3340b10

Added

  • llmci discover to list config files in a repository.
  • llmci run --all to run every discovered config.

llmci 0.1.6

@alexminnaar released this 31 May 20:45
0190049

Added

  • llmci run --config <path> to run evals from an alternate config file.

llmci 0.1.5

@alexminnaar released this 25 May 23:28

Fixed

  • S3 dataset URI validation runs before the optional boto3 import.
  • PyPI publish workflow grants contents: read so checkout works alongside trusted publishing.

Install: pip install llmci==0.1.5

llmci 0.1.4

@alexminnaar released this 25 May 23:24

Added

  • Remote eval datasets: dataset accepts s3:// and https:// URIs (string or {source, cache}). S3 requires pip install 'llmci[s3]'. Datasets are cached under .llmci/cache/datasets/ by default.

Changed

  • Repository and package metadata URLs updated for the llmci-cli GitHub organization.

Install: pip install llmci==0.1.4