Releases: ForeverAngry/rig-retrieval-evals
v0.4.0
Added
- `retriever::Retriever` trait plus `VectorStoreRetriever` adapter and the
  `score_retriever` / `retrieve_all` drivers, so backends that are not vector
  stores (lexical / BM25 / hybrid rerankers, remote search APIs) can be scored
  with the same IR metrics. `RetrievalHarness` now delegates to this driver.
- `Qrels::from_beir` loads a downloaded BEIR / BRIGHT dataset
  (`queries.jsonl` + `qrels/<split>.tsv`, both 3- and 4-column layouts).
- `synthetic` module: deterministic, seeded "needle in a haystack" corpus +
  qrels generator for reproducible benchmarks and fixture-free tests.
- `RetrievalHarness::with_bootstrap` and `MultiReport::with_bootstrap`
  attach percentile-bootstrap confidence intervals to every metric;
  `MultiReport::delta_markdown` renders a head-to-head delta table.
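Scoring non-vector backends with the same IR metrics comes down to writing the metric against a trait rather than a concrete store. The sketch below assumes a deliberately simplified trait (`retrieve(query, k) -> Vec<String>`), a toy keyword-overlap backend, and recall@k helpers keyed by query text; all of these names and signatures are hypothetical and do not reproduce the crate's `retriever::Retriever` API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical retriever abstraction: anything that returns ranked doc ids.
pub trait Retriever {
    fn retrieve(&self, query: &str, k: usize) -> Vec<String>;
}

/// Toy lexical backend (not a vector store): ranks documents by how many
/// query terms they contain. Documents with zero hits are dropped.
pub struct KeywordRetriever {
    pub docs: Vec<(String, String)>, // (doc_id, text)
}

impl Retriever for KeywordRetriever {
    fn retrieve(&self, query: &str, k: usize) -> Vec<String> {
        let terms: HashSet<String> = query.split_whitespace().map(|t| t.to_lowercase()).collect();
        let mut scored: Vec<(usize, &String)> = self
            .docs
            .iter()
            .map(|(id, text)| {
                let hits = text
                    .split_whitespace()
                    .filter(|w| terms.contains(&w.to_lowercase()))
                    .count();
                (hits, id)
            })
            .filter(|(hits, _)| *hits > 0)
            .collect();
        // sort_by is stable, so equal scores keep corpus order.
        scored.sort_by(|a, b| b.0.cmp(&a.0));
        scored.into_iter().take(k).map(|(_, id)| id.clone()).collect()
    }
}

/// Recall@k for one query: fraction of relevant ids found in the top k.
/// Returns 0.0 when there are no relevant ids (a convention chosen here).
pub fn recall_at_k(ranked: &[String], relevant: &HashSet<String>, k: usize) -> f64 {
    if relevant.is_empty() {
        return 0.0;
    }
    let found = ranked.iter().take(k).filter(|id| relevant.contains(*id)).count();
    found as f64 / relevant.len() as f64
}

/// Mean recall@k of any `Retriever` over qrels (query text -> relevant ids).
/// The metric code never sees which backend produced the ranking.
pub fn mean_recall_at_k<R: Retriever>(
    retriever: &R,
    qrels: &HashMap<String, HashSet<String>>,
    k: usize,
) -> f64 {
    if qrels.is_empty() {
        return 0.0;
    }
    let total: f64 = qrels
        .iter()
        .map(|(query, relevant)| recall_at_k(&retriever.retrieve(query, k), relevant, k))
        .sum();
    total / qrels.len() as f64
}
```

Swapping `KeywordRetriever` for a BM25 index or a remote search client only requires another `impl Retriever`; the scoring functions stay unchanged.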
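The two qrels layouts that `Qrels::from_beir` accepts are commonly the BEIR TSV (`query-id`, `corpus-id`, `score`, with a header row) and the TREC style, which adds an iteration column in second position. A minimal std-only parser for both might look like the sketch below; the `parse_qrels` name, the `QrelsMap` type, and the rule that a non-numeric score on the first line marks a header are assumptions, not the crate's loader.

```rust
use std::collections::HashMap;

/// Hypothetical qrels shape: query id -> (doc id -> graded relevance).
pub type QrelsMap = HashMap<String, HashMap<String, i32>>;

/// Parses either qrels layout:
///   3 columns: query-id  corpus-id  score            (BEIR)
///   4 columns: query-id  iteration  corpus-id  score (TREC style)
/// Columns may be separated by tabs or spaces. A non-numeric score on the
/// first line is treated as a header row and skipped.
pub fn parse_qrels(text: &str) -> Result<QrelsMap, String> {
    let mut qrels = QrelsMap::new();
    for (lineno, line) in text.lines().enumerate() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        let (qid, doc, score) = match cols.as_slice() {
            [] => continue, // blank line
            [q, d, s] => (*q, *d, *s),
            [q, _iteration, d, s] => (*q, *d, *s),
            _ => {
                return Err(format!(
                    "line {}: expected 3 or 4 columns, got {}",
                    lineno + 1,
                    cols.len()
                ))
            }
        };
        let score: i32 = match score.parse() {
            Ok(v) => v,
            Err(_) if lineno == 0 => continue, // header row
            Err(e) => return Err(format!("line {}: bad score {:?}: {}", lineno + 1, score, e)),
        };
        qrels.entry(qid.to_string()).or_default().insert(doc.to_string(), score);
    }
    Ok(qrels)
}
```

Whitespace splitting keeps the sketch short but would break on ids that contain spaces; splitting the BEIR layout strictly on tabs avoids that.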
v0.3.2
v0.3.1
v0.3.0
Build
- Drop the path override on the `rig-memvid` dev-dependency so release-plz can resolve the crates.io fallback
CI
- Add release-plz workflow and config
Added
- Bootstrap confidence intervals on `MetricReport::mean`: new `MetricCi`
  type plus `MetricReport::bootstrap_ci(iterations, level, seed)` and
  builder `MetricReport::with_bootstrap_ci(...)`. Uses a deterministic
  SplitMix64 stream (no `rand` dependency) so the same seed reproduces
  the same interval. `MetricCi` is `#[serde(default, skip_serializing_if = "Option::is_none")]`
  so existing JSON reports stay schema-compatible. `MetricDelta` carries
  `current_ci` / `baseline_ci` alongside the existing mean delta.
- Non-zero CI exit support on `ReportDiff`: `is_clean(&gate) -> bool`
  and `exit_code(&gate) -> i32` (0 = pass, 1 = regression). Lets eval
  binaries gate CI with `std::process::exit(diff.exit_code(&gate))`
  without rebuilding the regression-walking logic at every call site.
- `observe` feature: emit `MultiReport` and `ReportDiff` as
  rig-tap-compatible JSON envelopes on the `rig_tap` tracing target
  without depending on `rig-tap`. Introduces `EvalEnvelope` + `EvalKind`
  (`eval.retrieval_report`, `eval.regression_diff`),
  `report_envelopes`, `diff_envelopes`, and tracing helpers
  `emit_report` / `emit_diff`. Each event carries the full JSON envelope
  plus stable scalar `rig_tap.kind` / `rig_tap.metric` /
  `rig_tap.regressed` / `rig_tap.conversation_id` attributes so existing
  OpenTelemetry collectors route eval reports the same way they route
  prompt and tool events.
- `FreshnessReport` and `FreshnessQueryRollup` for rolling
  `StalenessReport` / `ConflictReport` outputs into `MultiReport`.
  `MultiReport::with_freshness` attaches the dataset/per-query rollup, while
  `with_freshness_metrics` also appends score-like
  `freshness.stale_free_rate@k` and `freshness.conflict_free_rate@k` metric
  rows so freshness regressions trip the existing `RegressionGate` / diff
  path.
- Opt-in MinHash-style chunk near-duplicate linting via
  `NearDuplicateLintConfig`, `NearDuplicatePair`,
  `ChunkStats::near_duplicate_pairs`, and
  `ChunkLintWarning::NearDuplicateChunks`. The default remains disabled so
  existing ingestion gates keep their current warning set until hosts opt in.
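A percentile bootstrap resamples the per-query scores with replacement, recomputes the mean each time, and reads the interval off the sorted resampled means; seeding a SplitMix64 stream makes that reproducible without `rand`. The sketch below illustrates the technique only: `bootstrap_mean_ci`, its `(lo, hi)` return shape, and the quantile indexing are assumptions and are not the crate's `MetricReport::bootstrap_ci` implementation.

```rust
/// SplitMix64: a tiny deterministic PRNG (same seed -> same stream).
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical percentile-bootstrap CI for the mean of `samples`.
/// Returns None for empty input, zero iterations, or a level outside (0, 1).
pub fn bootstrap_mean_ci(samples: &[f64], iterations: usize, level: f64, seed: u64) -> Option<(f64, f64)> {
    if samples.is_empty() || iterations == 0 || !(level > 0.0 && level < 1.0) {
        return None;
    }
    let n = samples.len();
    let mut rng = SplitMix64::new(seed);
    let mut means = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        // Resample n scores with replacement and record their mean.
        let mut sum = 0.0;
        for _ in 0..n {
            // Modulo bias is negligible when n is far below 2^64.
            let idx = (rng.next_u64() % n as u64) as usize;
            sum += samples[idx];
        }
        means.push(sum / n as f64);
    }
    means.sort_by(|a, b| a.total_cmp(b));
    // Empirical (1 - level)/2 and 1 - (1 - level)/2 quantiles of the means.
    let tail = (1.0 - level) / 2.0;
    let last = iterations - 1;
    let lo = ((tail * iterations as f64).floor() as usize).min(last);
    let hi = (((1.0 - tail) * iterations as f64).ceil() as usize)
        .saturating_sub(1)
        .min(last);
    Some((means[lo], means[hi]))
}
```

Because the only randomness comes from the seeded stream, two calls with the same scores, iteration count, level, and seed return the same interval.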
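The 0 = pass / 1 = regression contract of `ReportDiff::exit_code` is what lets a plain eval binary fail a CI job. The std-only sketch below illustrates that contract under an assumed gate rule (higher-is-better metrics, a fixed absolute drop tolerance, missing metrics count as regressions); `Gate`, `regressions`, and `exit_code` here are hypothetical stand-ins, not the crate's `RegressionGate` or `ReportDiff`.

```rust
use std::collections::BTreeMap;

/// Hypothetical gate rule: a higher-is-better metric may drop at most
/// `max_drop` (absolute) below its baseline before it counts as regressed.
pub struct Gate {
    pub max_drop: f64,
}

/// Names of regressed metrics, in sorted order (BTreeMap iteration order).
pub fn regressions(
    current: &BTreeMap<String, f64>,
    baseline: &BTreeMap<String, f64>,
    gate: &Gate,
) -> Vec<String> {
    baseline
        .iter()
        .filter_map(|(name, &base)| {
            // A metric missing from the current run is treated as a regression.
            let cur = current.get(name).copied().unwrap_or(f64::NEG_INFINITY);
            (base - cur > gate.max_drop).then(|| name.clone())
        })
        .collect()
}

/// 0 = pass, 1 = at least one regression.
pub fn exit_code(current: &BTreeMap<String, f64>, baseline: &BTreeMap<String, f64>, gate: &Gate) -> i32 {
    if regressions(current, baseline, gate).is_empty() {
        0
    } else {
        1
    }
}
```

An eval binary would end with `std::process::exit(exit_code(&current, &baseline, &gate))`, so the CI step fails exactly when the gate trips.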
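MinHash estimates the Jaccard similarity of two chunks' shingle sets as the fraction of hash functions whose minimum value agrees. The std-only sketch below shows the idea; the word 3-shingles, 128 hash functions, FNV-1a shingle hashing, salted SplitMix64 permutations, and every function name are assumptions, not how `NearDuplicateLintConfig` or `ChunkStats::near_duplicate_pairs` are actually configured.

```rust
use std::collections::HashSet;

/// FNV-1a 64-bit hash: stable across platforms and Rust versions.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// SplitMix64 finalizer: a bijection on u64, so `mix(x ^ salt)` acts as one
/// pseudo-random permutation of the shingle-hash space per salt.
fn mix(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Lowercased word shingles of width `w` (at least 1), hashed with FNV-1a.
/// Chunks shorter than `w` words become a single shingle.
fn shingles(text: &str, w: usize) -> HashSet<u64> {
    let w = w.max(1);
    let words: Vec<String> = text.split_whitespace().map(|s| s.to_lowercase()).collect();
    if words.len() < w {
        return std::iter::once(fnv1a(words.join(" ").as_bytes())).collect();
    }
    words.windows(w).map(|win| fnv1a(win.join(" ").as_bytes())).collect()
}

/// Hypothetical MinHash signature: the minimum permuted shingle hash per salt.
pub fn signature(text: &str, w: usize, num_hashes: usize) -> Vec<u64> {
    let sh = shingles(text, w);
    (0..num_hashes as u64)
        .map(|i| {
            let salt = mix(i.wrapping_add(0x9E37_79B9_7F4A_7C15));
            sh.iter().map(|&s| mix(s ^ salt)).min().unwrap_or(u64::MAX)
        })
        .collect()
}

/// Fraction of signature slots that agree: an estimate of Jaccard similarity.
pub fn estimated_jaccard(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "signatures must have the same length");
    if a.is_empty() {
        return 0.0;
    }
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len() as f64
}

/// All chunk pairs (i, j, similarity) with estimated Jaccard >= threshold.
pub fn find_near_duplicates(chunks: &[&str], threshold: f64) -> Vec<(usize, usize, f64)> {
    let sigs: Vec<Vec<u64>> = chunks.iter().map(|c| signature(c, 3, 128)).collect();
    let mut out = Vec::new();
    for i in 0..sigs.len() {
        for j in (i + 1)..sigs.len() {
            let sim = estimated_jaccard(&sigs[i], &sigs[j]);
            if sim >= threshold {
                out.push((i, j, sim));
            }
        }
    }
    out
}
```

The all-pairs loop is quadratic in the number of chunks; MinHash deployments over large corpora usually add locality-sensitive hashing (banding) so only candidate pairs are compared.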