v2.1.0 — MLOps evaluation layer (RAGAS + CI regression gate + GoldenRunner)
What's new
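ragfallback now ships an MLOps evaluation layer: golden-dataset evaluation, RAGAS scoring, baseline regression gating, adversarial query simulation, MLflow logging, and Locust load-test generation. Many RAG libraries leave this tooling out entirely.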
ragfallback/mlops/ — new package
GoldenRunner
Runs your retrieval pipeline against a labeled golden dataset (a JSON file or a list[dict]), records per-sample latency, and computes recall@3, recall@5, and P95 latency across all samples. Fully async: samples run concurrently via asyncio.gather.
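The evaluation loop described above can be sketched roughly as follows. This is a minimal illustration, not GoldenRunner's actual API: the run_golden function, the sample schema ({"query": ..., "relevant_ids": [...]}), the async retriever signature, and the nearest-rank definition of P95 are all assumptions.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Sequence

# Hypothetical retriever signature: query text in, ranked doc ids out.
Retriever = Callable[[str], Awaitable[list]]


@dataclass
class SampleResult:
    retrieved: list
    latency_ms: float


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (one common definition, assumed here)."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = max(1, (95 * len(ordered) + 99) // 100)  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def recall_at_k(retrieved: list, relevant: list, k: int) -> float:
    """Share of distinct relevant ids that appear in the top-k results."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


async def _run_one(retrieve: Retriever, query: str) -> SampleResult:
    # Time each retrieval call individually.
    start = time.perf_counter()
    docs = await retrieve(query)
    return SampleResult(docs, (time.perf_counter() - start) * 1000.0)


async def run_golden(retrieve: Retriever, samples: list) -> dict:
    """Hypothetical stand-in for one GoldenRunner pass over a golden dataset."""
    if not samples:
        raise ValueError("golden dataset is empty")
    # All samples are awaited concurrently; gather keeps results in input order.
    results = await asyncio.gather(*(_run_one(retrieve, s["query"]) for s in samples))
    report = {}
    for k in (3, 5):
        scores = [recall_at_k(r.retrieved, s["relevant_ids"], k) for r, s in zip(results, samples)]
        report[f"recall@{k}"] = sum(scores) / len(scores)
    report["p95_latency_ms"] = p95([r.latency_ms for r in results])
    return report
```

Because asyncio.gather starts every sample at once, each measured latency also includes time spent waiting on the event loop, so P95 under concurrency can be higher than it would be for sequential calls.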
RagasHook
Wraps RAGAS evaluation (faithfulness, answer relevance, context precision, context recall). If ragas is not installed, it falls back to heuristic scoring and logs a warning instead of crashing.
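The degrade-gracefully behavior can be illustrated with the sketch below. It is not RagasHook's code: the release notes do not say which heuristic RagasHook uses, so optional_import and the token-overlap heuristic_faithfulness score are hypothetical, and the real RAGAS call path is omitted.

```python
import importlib
import logging
import re

logger = logging.getLogger("ragas_fallback_sketch")


def optional_import(name: str):
    """Return the module if it can be imported, else log a warning and return None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        logger.warning("%s is not installed; using heuristic scoring", name)
        return None


def _tokens(text: str) -> set:
    # Lowercased alphanumeric tokens; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def heuristic_faithfulness(answer: str, contexts: list) -> float:
    """Hypothetical heuristic: fraction of answer tokens found in any retrieved context."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set().union(*(_tokens(c) for c in contexts))
    return len(answer_tokens & context_tokens) / len(answer_tokens)
```

A scorer built this way would call optional_import("ragas") once and route to RAGAS when a module comes back, or to heuristics like this one when it gets None, so a missing dependency produces a warning rather than an exception.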
BaselineRegistry
Stores metric snapshots per dataset in a committed JSON file. compare_or_fail() raises RegressionError if any quality metric drops more than 5%, or P95 latency spikes more than 12% vs the stored baseline.
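The gating rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not BaselineRegistry's real interface: it treats the 5% and 12% limits as relative changes, assumes a JSON layout of {dataset: {metric: value}}, and uses hypothetical names (load_baseline, save_baseline, LATENCY_KEY, a dict-based compare_or_fail signature). Only the latency_threshold default of 0.12 comes from the release notes.

```python
import json
from pathlib import Path


class RegressionError(Exception):
    """Local stand-in for ragfallback's RegressionError (hypothetical definition)."""


LATENCY_KEY = "p95_latency_ms"  # hypothetical metric name for P95 latency


def load_baseline(path: Path, dataset: str) -> dict:
    # Assumed layout: {"dataset_name": {"metric": value, ...}, ...}
    data = json.loads(path.read_text()) if path.exists() else {}
    return data.get(dataset, {})


def save_baseline(path: Path, dataset: str, metrics: dict) -> None:
    data = json.loads(path.read_text()) if path.exists() else {}
    data[dataset] = metrics
    path.write_text(json.dumps(data, indent=2, sort_keys=True))  # stable diffs when committed


def compare_or_fail(current: dict, baseline: dict,
                    threshold: float = 0.05, latency_threshold: float = 0.12) -> None:
    """Raise RegressionError if any metric is worse than its relative limit."""
    failures = []
    for name, base in baseline.items():
        if name not in current or base <= 0:
            continue  # nothing to compare, or relative change is undefined
        cur = current[name]
        if name == LATENCY_KEY:
            change = (cur - base) / base  # positive means slower
            if change > latency_threshold:
                failures.append(f"{name}: +{change:.1%} (limit {latency_threshold:.0%})")
        else:
            drop = (base - cur) / base  # positive means lower quality
            if drop > threshold:
                failures.append(f"{name}: -{drop:.1%} (limit {threshold:.0%})")
    if failures:
        raise RegressionError("; ".join(failures))
```

Keeping a separate latency_threshold lets noisy timing metrics use a looser limit without weakening the quality-metric check.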
QuerySimulator
Generates adversarial query mixes from any base query set:
- short_keyword: first 2 content words only
- long_nl: expanded with a verbose instruction prefix
- ambiguous: proper nouns stripped
- out_of_domain: a completely unrelated topic injected
simulate_unhappy_paths() produces all 4 types for every input query (4× expansion).
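The four transformations can be approximated with the sketch below. It is illustrative only and not QuerySimulator's implementation: the stop-word list, the verbose prefix, the capitalized-word heuristic for proper nouns, the out-of-domain topics, and the SimulatedQuery record are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stop-word list used to pick out "content words".
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "or", "is", "are",
             "was", "were", "what", "which", "who", "when", "where", "why", "how", "do", "does"}
VERBOSE_PREFIX = ("I would like a thorough, well-sourced answer to the following question, "
                  "using only the provided documents: ")
OUT_OF_DOMAIN = ["How do I bake sourdough bread at home?",
                 "What is the best way to train for a marathon?"]


@dataclass(frozen=True)
class SimulatedQuery:
    kind: str
    text: str
    source: str


def short_keyword(query: str) -> str:
    words = re.findall(r"[A-Za-z0-9']+", query)
    content = [w for w in words if w.lower() not in STOPWORDS]
    return " ".join(content[:2])  # first two content words only


def long_nl(query: str) -> str:
    return VERBOSE_PREFIX + query


def ambiguous(query: str) -> str:
    # Crude proper-noun heuristic: drop capitalized words after the first token.
    words = query.split()
    return " ".join(w for i, w in enumerate(words) if i == 0 or not w[:1].isupper())


def simulate_unhappy_paths(queries: list) -> list:
    out = []
    for i, q in enumerate(queries):
        out.append(SimulatedQuery("short_keyword", short_keyword(q), q))
        out.append(SimulatedQuery("long_nl", long_nl(q), q))
        out.append(SimulatedQuery("ambiguous", ambiguous(q), q))
        # Out-of-domain queries ignore the input text entirely.
        out.append(SimulatedQuery("out_of_domain", OUT_OF_DOMAIN[i % len(OUT_OF_DOMAIN)], q))
    return out
```

For example, short_keyword("What is the capital of France?") yields "capital France", and two input queries expand to eight simulated ones.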
MLflowLogger
Logs all GoldenReport fields as MLflow metrics and params. No-op if mlflow is not installed.
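One way to realize "metrics and params, no-op without mlflow" is sketched below. It is an assumption, not MLflowLogger's code: the rule that numeric fields become metrics and everything else becomes params is hypothetical, as are split_report and log_report. mlflow.log_metrics and mlflow.log_params are real MLflow tracking calls.

```python
from dataclasses import asdict, is_dataclass
from numbers import Real


def split_report(report) -> tuple:
    """Hypothetical rule: numeric fields become metrics, all other fields become params."""
    if is_dataclass(report) and not isinstance(report, type):
        report = asdict(report)  # accept a dataclass instance such as a report object
    metrics, params = {}, {}
    for key, value in report.items():
        if isinstance(value, Real) and not isinstance(value, bool):
            metrics[key] = float(value)
        else:
            params[key] = str(value)
    return metrics, params


def log_report(report) -> bool:
    """Log to MLflow if it is installed; return False (a no-op) otherwise."""
    try:
        import mlflow
    except ImportError:
        return False
    metrics, params = split_report(report)
    mlflow.log_params(params)
    mlflow.log_metrics(metrics)
    return True
```

The sketch uses keys like recall_at_3 rather than recall@3, since MLflow restricts the characters allowed in metric names.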
generate_locustfile(output_path, endpoint)
Writes a ready-to-run Locust load-test file that simulates realistic RAG traffic: short keyword (40%), long natural-language (20%), and out-of-domain (10%) queries.
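A generated locustfile might look roughly like the one rendered below. This is a hypothetical sketch, not the actual output of generate_locustfile: the request body {"query": ...}, the sample queries, the wait time, and the render_locustfile helper are assumptions. Only the three traffic classes named above are included; because the rest of the mix is not described, the task weights 4:2:1 preserve only the documented ratios between them.

```python
from pathlib import Path
from string import Template

# Hypothetical locustfile template; $endpoint is filled in by render_locustfile.
_LOCUST_TEMPLATE = Template('''\
from locust import HttpUser, between, task


class RagUser(HttpUser):
    wait_time = between(0.5, 2.0)

    def _ask(self, query):
        # Hypothetical request body; adapt it to your API's schema.
        self.client.post($endpoint, json={"query": query})

    @task(4)  # short keyword: 40% of the documented mix
    def short_keyword(self):
        self._ask("capital France")

    @task(2)  # long natural language: 20%
    def long_nl(self):
        self._ask("Could you explain in detail what the capital city of France is and why?")

    @task(1)  # out-of-domain: 10%
    def out_of_domain(self):
        self._ask("How do I bake sourdough bread at home?")
''')


def render_locustfile(endpoint: str) -> str:
    # repr() yields a correctly quoted Python string literal for the endpoint.
    return _LOCUST_TEMPLATE.substitute(endpoint=repr(endpoint))


def generate_locustfile(output_path, endpoint: str) -> Path:
    path = Path(output_path)
    path.write_text(render_locustfile(endpoint))
    return path
```

The generated file can then be run with `locust -f locustfile.py --host http://localhost:8000`, replacing the host with your service's URL.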
CI regression gate
A new mlops-regression-gate job runs on every push to main:
- Builds a golden dataset from SQuAD (CC BY-SA 4.0, no API key needed)
- Indexes the passages in ChromaDB using all-MiniLM-L6-v2 embeddings (local, no API key)
- Runs GoldenRunner asynchronously across 20 samples
- Calls compare_or_fail() against the committed examples/baselines.json
- Exits 0 (pass) or 1 (regression detected)
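The gate's pass/fail contract (exit code 0 or 1) can be sketched as a small wrapper. This is an illustration only: run_gate, the stand-in RegressionError class, and passing the comparison in as a zero-argument callable are assumptions, not the contents of examples/ci_regression_gate.py.

```python
import sys
from typing import Callable


class RegressionError(Exception):
    """Local stand-in for ragfallback's RegressionError (hypothetical definition)."""


def run_gate(check: Callable[[], None]) -> int:
    """Run a baseline comparison and map its outcome to a CI exit code."""
    try:
        check()
    except RegressionError as exc:
        print(f"Regression detected: {exc}", file=sys.stderr)
        return 1  # non-zero exit status fails the CI job
    print("No regression against the stored baseline.")
    return 0
```

A gate script would end with sys.exit(run_gate(...)). Any other unexpected exception is not caught, so Python still exits with status 1 and the job fails rather than silently passing.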
Bug fixes
- recall_at_k now counts distinct relevant docs in the top k, so duplicates can no longer push recall above 1.0
- BaselineRegistry.compare_or_fail accepts a separate latency_threshold parameter (default 0.12) for looser P95 gating in noisy CI environments
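The recall fix can be illustrated by contrasting two counting strategies. Both functions are hypothetical sketches, not ragfallback's source; recall_at_k_buggy reproduces the failure mode the fix describes.

```python
def recall_at_k_buggy(retrieved: list, relevant: list, k: int) -> float:
    """Counts every relevant hit in the top k, so repeated ids inflate the score."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant_set)
    return hits / len(relevant_set)


def recall_at_k(retrieved: list, relevant: list, k: int) -> float:
    """Counts distinct relevant ids in the top k, so the result stays within [0, 1]."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)
```

With retrieved ["doc1", "doc1", "doc1"] and relevant ["doc1"], the buggy version returns 3.0 while the fixed version returns 1.0.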
Install
pip install "ragfallback[mlops]"
python examples/build_golden_dataset.py
python examples/ci_regression_gate.py
Full changelog
See CHANGELOG.md