Releases · ForeverAngry/rig-retrieval-evals · GitHub

07 Jun 14:48

v0.4.0 Latest

Added

- retriever::Retriever trait plus VectorStoreRetriever adapter and the
  score_retriever / retrieve_all drivers, so backends that are not vector
  stores (lexical / BM25 / hybrid rerankers, remote search APIs) can be scored
  with the same IR metrics. RetrievalHarness now delegates to this driver.
- Qrels::from_beir loads a downloaded BEIR / BRIGHT dataset
  (queries.jsonl + qrels/<split>.tsv, both 3- and 4-column layouts).
- synthetic module: deterministic, seeded "needle in a haystack" corpus +
  qrels generator for reproducible benchmarks and fixture-free tests.
- RetrievalHarness::with_bootstrap and MultiReport::with_bootstrap
  attach percentile-bootstrap confidence intervals to every metric;
  MultiReport::delta_markdown renders a head-to-head delta table.
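
To illustrate what accepting "both 3- and 4-column layouts" can look like for the Qrels::from_beir entry above, here is a minimal standalone sketch of a qrels TSV parser. It is not the crate's implementation: parse_qrels_tsv and QrelMap are hypothetical names, and the column meanings are assumptions (3 columns as query id, document id, score, which is how BEIR ships its qrels; 4 columns as the TREC-style query id, iteration, document id, relevance).

```rust
use std::collections::HashMap;

/// Hypothetical container: query id -> (document id -> graded relevance).
pub type QrelMap = HashMap<String, HashMap<String, i64>>;

/// Parse tab-separated qrels text in either assumed layout:
///   3 columns: query-id, doc-id, relevance
///   4 columns: query-id, iteration (ignored), doc-id, relevance
pub fn parse_qrels_tsv(text: &str) -> Result<QrelMap, String> {
    let mut qrels = QrelMap::new();
    for (idx, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        let cols: Vec<&str> = line.split('\t').collect();
        let (qid, doc, rel) = match cols.as_slice() {
            [q, d, r] => (*q, *d, *r),
            [q, _iteration, d, r] => (*q, *d, *r),
            _ => {
                return Err(format!(
                    "line {}: expected 3 or 4 columns, found {}",
                    idx + 1,
                    cols.len()
                ))
            }
        };
        let rel: i64 = match rel.trim().parse() {
            Ok(value) => value,
            // A non-numeric relevance on the first line is treated as a
            // header row (e.g. "query-id\tcorpus-id\tscore") and skipped.
            Err(_) if idx == 0 => continue,
            Err(e) => return Err(format!("line {}: bad relevance {:?}: {}", idx + 1, rel, e)),
        };
        qrels
            .entry(qid.to_string())
            .or_default()
            .insert(doc.to_string(), rel);
    }
    Ok(qrels)
}
```

Dispatching on the column count per line keeps a single code path for both layouts; a stricter loader might instead fix the layout from the first data row and reject files that mix the two.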

01 Jun 17:14

v0.3.2

Tests

- Cover ingestion graph knowledge gain interaction (#7)

28 May 22:32

v0.3.1

Documentation

- Align docs/decisions.md heading + AGENTS feature list with v0.2.0+ (#2)

28 May 17:26

v0.3.0

Build

- Drop path override on rig-memvid dev-dep so release-plz can resolve crates.io fallback

CI

- Add release-plz workflow and config

Added

- Bootstrap confidence intervals on MetricReport::mean: new MetricCi
  type plus MetricReport::bootstrap_ci(iterations, level, seed) and
  builder MetricReport::with_bootstrap_ci(...). Uses a deterministic
  SplitMix64 stream (no rand dependency) so the same seed reproduces
  the same interval. MetricCi is
  #[serde(default, skip_serializing_if = "Option::is_none")] so existing
  JSON reports stay schema-compatible. MetricDelta carries current_ci /
  baseline_ci alongside the existing mean delta.
- Non-zero CI exit support on ReportDiff: is_clean(&gate) -> bool
  and exit_code(&gate) -> i32 (0 = pass, 1 = regression). Lets eval
  binaries gate CI with std::process::exit(diff.exit_code(&gate))
  without rebuilding the regression-walking logic at every call site.
- observe feature: emit MultiReport and ReportDiff as
  rig-tap-compatible JSON envelopes on the rig_tap tracing target
  without depending on rig-tap. Introduces EvalEnvelope + EvalKind
  (eval.retrieval_report, eval.regression_diff),
  report_envelopes, diff_envelopes, and tracing helpers
  emit_report / emit_diff. Each event carries the full JSON envelope
  plus stable scalar rig_tap.kind / rig_tap.metric /
  rig_tap.regressed / rig_tap.conversation_id attributes so existing
  OpenTelemetry collectors route eval reports the same way they route
  prompt and tool events.
- FreshnessReport and FreshnessQueryRollup for rolling
  StalenessReport / ConflictReport outputs into MultiReport.
  MultiReport::with_freshness attaches the dataset/per-query rollup, while
  with_freshness_metrics also appends score-like
  freshness.stale_free_rate@k and freshness.conflict_free_rate@k metric
  rows so freshness regressions trip the existing RegressionGate / diff
  path.
- Opt-in MinHash-style chunk near-duplicate linting via
  NearDuplicateLintConfig, NearDuplicatePair,
  ChunkStats::near_duplicate_pairs, and
  ChunkLintWarning::NearDuplicateChunks. The default remains disabled so
  existing ingestion gates keep their current warning set until hosts opt in.
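
The bootstrap entry in this list combines a seeded SplitMix64 stream with a percentile interval over resampled means. The sketch below shows one way that combination can work without the rand crate. It is an illustration under stated assumptions, not the crate's code: the free function bootstrap_ci over a slice of per-query scores only mirrors the (iterations, level, seed) parameter list named above, and both the modulo index draw and the floor/ceil percentile indexing are choices made for this example.

```rust
/// SplitMix64 pseudo-random generator (standard published constants).
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Index in 0..n via modulo (assumes n > 0). The slight modulo bias
    /// is accepted here to keep the sketch short.
    fn next_index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Percentile-bootstrap interval for the mean of `scores`.
/// Hypothetical helper: returns None for empty input, zero iterations,
/// or a level outside the open interval (0, 1).
pub fn bootstrap_ci(
    scores: &[f64],
    iterations: usize,
    level: f64,
    seed: u64,
) -> Option<(f64, f64)> {
    if scores.is_empty() || iterations == 0 || !(level > 0.0 && level < 1.0) {
        return None;
    }
    let mut rng = SplitMix64::new(seed);
    let n = scores.len();
    let mut means = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        // Resample n scores with replacement and record their mean.
        let sum: f64 = (0..n).map(|_| scores[rng.next_index(n)]).sum();
        means.push(sum / n as f64);
    }
    means.sort_by(|a, b| a.total_cmp(b));
    // Read the interval bounds off the sorted resampled means.
    let alpha = (1.0 - level) / 2.0;
    let last = (iterations - 1) as f64;
    let lo = means[(alpha * last).floor() as usize];
    let hi = means[((1.0 - alpha) * last).ceil() as usize];
    Some((lo, hi))
}
```

Because the generator state is derived only from the seed, calling bootstrap_ci twice with the same scores, iterations, level, and seed yields the same interval, which is the reproducibility property the entry describes.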
