Releases: ForeverAngry/rig-retrieval-evals

v0.4.0

07 Jun 14:48

Added

  • retriever::Retriever trait plus VectorStoreRetriever adapter and the
    score_retriever / retrieve_all drivers, so backends that are not vector
    stores (lexical / BM25 / hybrid rerankers, remote search APIs) can be scored
    with the same IR metrics. RetrievalHarness now delegates to this driver.
  • Qrels::from_beir loads a downloaded BEIR / BRIGHT dataset
    (queries.jsonl + qrels/<split>.tsv, both 3- and 4-column layouts).
  • synthetic module: deterministic, seeded "needle in a haystack" corpus +
    qrels generator for reproducible benchmarks and fixture-free tests.
  • RetrievalHarness::with_bootstrap and MultiReport::with_bootstrap
    attach percentile-bootstrap confidence intervals to every metric;
    MultiReport::delta_markdown renders a head-to-head delta table.
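The Retriever entry above decouples scoring from vector stores: anything that returns ranked document ids for a query can be scored with the same IR metrics. A minimal sketch of that idea follows. The trait shape, the toy TermOverlapRetriever and the mean_recall_at_k driver are hypothetical stand-ins, not the crate's Retriever, score_retriever or retrieve_all signatures, which these notes do not show.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a retriever trait: return up to `k` document
/// ids, best first. Not the crate's actual signature.
pub trait Retriever {
    fn retrieve(&self, query: &str, k: usize) -> Vec<String>;
}

/// Toy lexical backend: ranks documents by how many query terms they contain.
pub struct TermOverlapRetriever {
    pub docs: Vec<(String, String)>, // (doc id, text)
}

impl Retriever for TermOverlapRetriever {
    fn retrieve(&self, query: &str, k: usize) -> Vec<String> {
        let terms: HashSet<String> = query.split_whitespace().map(|t| t.to_lowercase()).collect();
        let mut scored: Vec<(usize, &str)> = self
            .docs
            .iter()
            .map(|(id, text)| {
                let hits = text
                    .split_whitespace()
                    .map(|t| t.to_lowercase())
                    .filter(|t| terms.contains(t))
                    .count();
                (hits, id.as_str())
            })
            .filter(|(hits, _)| *hits > 0)
            .collect();
        // Highest overlap first; tie-break on id so the ranking is deterministic.
        scored.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(b.1)));
        scored.into_iter().take(k).map(|(_, id)| id.to_string()).collect()
    }
}

/// Hypothetical driver: mean recall@k over all judged queries, for any Retriever.
pub fn mean_recall_at_k<R: Retriever>(
    retriever: &R,
    queries: &HashMap<String, String>,        // query id -> query text
    qrels: &HashMap<String, HashSet<String>>, // query id -> relevant doc ids
    k: usize,
) -> f64 {
    let mut total = 0.0;
    let mut n = 0usize;
    for (qid, text) in queries {
        let relevant = match qrels.get(qid) {
            Some(r) if !r.is_empty() => r,
            _ => continue, // skip queries without relevance judgments
        };
        let hits = retriever
            .retrieve(text, k)
            .iter()
            .filter(|id| relevant.contains(*id))
            .count();
        total += hits as f64 / relevant.len() as f64;
        n += 1;
    }
    if n == 0 { 0.0 } else { total / n as f64 }
}
```

Because the driver is generic over the trait, a BM25 index, a hybrid reranker or a remote search client only has to implement `retrieve` to reuse the metric code unchanged.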
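Qrels::from_beir reads qrels/<split>.tsv in both 3- and 4-column layouts. A minimal sketch of that parsing step, assuming the common conventions: 3 columns are query-id, corpus-id, score (BEIR style, usually with a header row) and 4 columns are TREC style query-id, iteration, corpus-id, score. The parse_qrels_tsv function and QrelMap type are illustrative only, not the crate's API.

```rust
use std::collections::HashMap;

/// Graded judgments: query id -> (doc id -> relevance grade).
pub type QrelMap = HashMap<String, HashMap<String, i32>>;

/// Illustrative parser for qrels TSV text (not the crate's Qrels::from_beir).
/// Accepts 3-column `query-id corpus-id score` rows and 4-column TREC-style
/// `query-id iteration corpus-id score` rows. A non-numeric score on the
/// first line is treated as a header row and skipped.
pub fn parse_qrels_tsv(text: &str) -> Result<QrelMap, String> {
    let mut qrels: QrelMap = HashMap::new();
    for (lineno, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        let cols: Vec<&str> = line.split('\t').collect();
        let (qid, doc, score) = match cols.as_slice() {
            [q, d, s] => (*q, *d, *s),
            [q, _iteration, d, s] => (*q, *d, *s),
            _ => return Err(format!("line {}: expected 3 or 4 columns", lineno + 1)),
        };
        let grade: i32 = match score.parse() {
            Ok(g) => g,
            // e.g. "query-id\tcorpus-id\tscore"; only allowed on the first line.
            Err(_) if lineno == 0 => continue,
            Err(_) => return Err(format!("line {}: bad score {:?}", lineno + 1, score)),
        };
        qrels.entry(qid.to_string()).or_default().insert(doc.to_string(), grade);
    }
    Ok(qrels)
}
```

Keeping the grade as an integer rather than a boolean preserves graded relevance, which metrics such as nDCG need.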
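The synthetic module is described as a deterministic, seeded "needle in a haystack" corpus plus qrels generator. A minimal sketch of how such a generator can stay reproducible without a rand dependency, assuming a SplitMix64 stream (the generator the v0.3.0 notes name for bootstrapping; whether synthetic uses it is an assumption). The make_haystack function and Synthetic struct are hypothetical. Each query gets one needle document hidden among filler documents, and the qrels mark only that needle as relevant.

```rust
use std::collections::HashMap;

/// SplitMix64: tiny deterministic PRNG, so the same seed gives the same corpus.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical output shape: corpus, queries, and qrels keyed by id.
pub struct Synthetic {
    pub corpus: Vec<(String, String)>,  // (doc id, text)
    pub queries: Vec<(String, String)>, // (query id, query text = needle token)
    pub qrels: HashMap<String, String>, // query id -> its single needle doc id
}

const FILLER: [&str; 6] = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"];

pub fn make_haystack(seed: u64, n_queries: usize, n_filler: usize) -> Synthetic {
    let mut rng = SplitMix64(seed);
    let mut corpus = Vec::new();
    // Filler documents: seeded random words, never containing a needle token.
    for i in 0..n_filler {
        let words: Vec<&str> = (0..8)
            .map(|_| FILLER[(rng.next_u64() % FILLER.len() as u64) as usize])
            .collect();
        corpus.push((format!("filler-{}", i), words.join(" ")));
    }
    let mut queries = Vec::new();
    let mut qrels = HashMap::new();
    for q in 0..n_queries {
        // The query index keeps needles unique; the seeded suffix varies per seed.
        let needle = format!("needle{}_{:016x}", q, rng.next_u64());
        let doc_id = format!("needle-{}", q);
        // Hide the needle document at a seeded position in the corpus.
        let pos = (rng.next_u64() % (corpus.len() as u64 + 1)) as usize;
        corpus.insert(pos, (doc_id.clone(), format!("lorem {} ipsum", needle)));
        let qid = format!("q{}", q);
        queries.push((qid.clone(), needle));
        qrels.insert(qid, doc_id);
    }
    Synthetic { corpus, queries, qrels }
}
```

Since every value is derived from the seed, a test can regenerate the same fixture on demand instead of checking one in.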

v0.3.2

01 Jun 17:14
d373013

Tests

  • Cover ingestion graph knowledge gain interaction (#7)

v0.3.1

28 May 22:32
eed1def

Documentation

  • Align docs/decisions.md heading + AGENTS feature list with v0.2.0+ (#2)

v0.3.0

28 May 17:26
8ff2cfd

Build

  • Drop path override on rig-memvid dev-dep so release-plz can resolve crates.io fallback

CI

  • Add release-plz workflow and config

Added

  • Bootstrap confidence intervals on MetricReport::mean: new MetricCi
    type plus MetricReport::bootstrap_ci(iterations, level, seed) and the
    builder MetricReport::with_bootstrap_ci(...). Uses a deterministic
    SplitMix64 stream (no rand dependency), so the same seed reproduces
    the same interval. The optional MetricCi field is marked
    #[serde(default, skip_serializing_if = "Option::is_none")], so existing
    JSON reports stay schema-compatible. MetricDelta carries current_ci /
    baseline_ci alongside the existing mean delta.
  • Non-zero CI exit support on ReportDiff: is_clean(&gate) -> bool
    and exit_code(&gate) -> i32 (0 = pass, 1 = regression). Lets eval
    binaries gate CI with std::process::exit(diff.exit_code(&gate))
    without rebuilding the regression-walking logic at every call site.
  • observe feature: emit MultiReport and ReportDiff as
    rig-tap-compatible JSON envelopes on the rig_tap tracing target
    without depending on rig-tap. Introduces EvalEnvelope + EvalKind
    (eval.retrieval_report, eval.regression_diff),
    report_envelopes, diff_envelopes, and tracing helpers
    emit_report / emit_diff. Each event carries the full JSON envelope
    plus stable scalar rig_tap.kind / rig_tap.metric /
    rig_tap.regressed / rig_tap.conversation_id attributes so existing
    OpenTelemetry collectors route eval reports the same way they route
    prompt and tool events.
  • FreshnessReport and FreshnessQueryRollup for rolling
    StalenessReport / ConflictReport outputs into MultiReport.
    MultiReport::with_freshness attaches the dataset/per-query rollup, while
    with_freshness_metrics also appends score-like
    freshness.stale_free_rate@k and freshness.conflict_free_rate@k metric
    rows so freshness regressions trip the existing RegressionGate / diff
    path.
  • Opt-in MinHash-style chunk near-duplicate linting via
    NearDuplicateLintConfig, NearDuplicatePair,
    ChunkStats::near_duplicate_pairs, and
    ChunkLintWarning::NearDuplicateChunks. The default remains disabled so
    existing ingestion gates keep their current warning set until hosts opt in.
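MetricReport::bootstrap_ci(iterations, level, seed) is described as a bootstrap driven by a deterministic SplitMix64 stream. A minimal sketch of a percentile bootstrap over a slice of per-query scores; the free function bootstrap_mean_ci, its (lower, upper) return value and the nearest-rank percentile rule are assumptions, not the crate's MetricCi code.

```rust
/// SplitMix64: deterministic 64-bit generator, no `rand` dependency needed.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Percentile-bootstrap CI for the mean of `scores` (hypothetical helper).
/// Returns (lower, upper) at confidence `level`, e.g. 0.95.
pub fn bootstrap_mean_ci(scores: &[f64], iterations: usize, level: f64, seed: u64) -> Option<(f64, f64)> {
    if scores.is_empty() || iterations == 0 || !(0.0 < level && level < 1.0) {
        return None;
    }
    let n = scores.len();
    let mut rng = SplitMix64(seed);
    let mut means: Vec<f64> = (0..iterations)
        .map(|_| {
            // Resample n scores with replacement and record the resample mean.
            let sum: f64 = (0..n).map(|_| scores[(rng.next_u64() % n as u64) as usize]).sum();
            sum / n as f64
        })
        .collect();
    means.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank percentiles at alpha/2 and 1 - alpha/2.
    let alpha = 1.0 - level;
    let idx = |p: f64| ((p * iterations as f64).ceil() as usize).clamp(1, iterations) - 1;
    Some((means[idx(alpha / 2.0)], means[idx(1.0 - alpha / 2.0)]))
}
```

Sorting with f64::total_cmp keeps the percentile step well defined even if a NaN slips into the scores, and fixing the seed makes the interval reproducible across runs.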
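ReportDiff::is_clean(&gate) and exit_code(&gate) reduce a regression walk to a pass/fail process status. A self-contained sketch of that contract, using hypothetical stand-in types (MetricChange, Gate with a single max_drop tolerance, Diff) rather than the crate's ReportDiff and RegressionGate, and assuming higher metric values are better:

```rust
/// Hypothetical per-metric change between a baseline and a current report.
pub struct MetricChange {
    pub name: String,
    pub baseline_mean: f64,
    pub current_mean: f64,
}

/// Hypothetical gate: a drop larger than `max_drop` counts as a regression.
pub struct Gate {
    pub max_drop: f64,
}

/// Hypothetical stand-in for a report diff.
pub struct Diff {
    pub changes: Vec<MetricChange>,
}

impl Diff {
    /// Names of metrics whose mean fell by more than the gate allows
    /// (assumes higher is better for every metric).
    pub fn regressions(&self, gate: &Gate) -> Vec<&str> {
        self.changes
            .iter()
            .filter(|c| c.baseline_mean - c.current_mean > gate.max_drop)
            .map(|c| c.name.as_str())
            .collect()
    }

    pub fn is_clean(&self, gate: &Gate) -> bool {
        self.regressions(gate).is_empty()
    }

    /// 0 = pass, 1 = regression.
    pub fn exit_code(&self, gate: &Gate) -> i32 {
        if self.is_clean(gate) { 0 } else { 1 }
    }
}
```

An eval binary would then finish with std::process::exit(diff.exit_code(&gate)), so the CI job fails exactly when a gated metric regresses.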
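The freshness.stale_free_rate@k and freshness.conflict_free_rate@k rows are described as score-like so they can flow through RegressionGate, but the notes do not define them exactly. One plausible reading, sketched here purely as an assumption, is the share of queries whose top-k results contain no stale (or conflicting) document. The function name flag_free_rate_at_k is hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical: share of queries whose top-k results contain no flagged doc.
/// `ranked` holds each query's ranked doc ids; `flagged` is the set of stale
/// (or conflicting) doc ids. Higher is better, like recall or nDCG.
/// With no queries at all, the rate is taken to be 1.0 (an assumption).
pub fn flag_free_rate_at_k(ranked: &[Vec<&str>], flagged: &HashSet<&str>, k: usize) -> f64 {
    if ranked.is_empty() {
        return 1.0;
    }
    let clean = ranked
        .iter()
        .filter(|docs| docs.iter().take(k).all(|d| !flagged.contains(d)))
        .count();
    clean as f64 / ranked.len() as f64
}
```

Framing freshness as a "higher is better" rate is what lets the same drop-based regression check apply to it without special cases.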
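The near-duplicate lint is described as MinHash-style. A sketch of the technique, assuming word 3-gram shingles, a configurable number of seeded hash functions built on std's DefaultHasher, and a similarity threshold; none of these parameters come from NearDuplicateLintConfig, whose fields the notes do not list, and the function names are hypothetical. The estimate is the fraction of signature slots where two chunks agree, which approximates the Jaccard similarity of their shingle sets.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Word 3-gram shingles (chunks under three words become a single shingle).
fn shingles(text: &str) -> HashSet<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.len() < 3 {
        return std::iter::once(words.join(" ")).collect();
    }
    words.windows(3).map(|w| w.join(" ")).collect()
}

/// MinHash signature: for each seeded hash function, the minimum hash over shingles.
pub fn signature(text: &str, num_hashes: u64) -> Vec<u64> {
    let sh = shingles(text);
    (0..num_hashes)
        .map(|seed| {
            sh.iter()
                .map(|s| {
                    let mut h = DefaultHasher::new();
                    seed.hash(&mut h); // the seed selects a different hash function
                    s.hash(&mut h);
                    h.finish()
                })
                .min()
                .unwrap_or(u64::MAX)
        })
        .collect()
}

/// Estimated Jaccard similarity: fraction of agreeing signature slots.
pub fn estimated_jaccard(a: &[u64], b: &[u64]) -> f64 {
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len().max(1) as f64
}

/// Index pairs (i, j), i < j, whose estimated similarity meets `threshold`.
pub fn near_duplicate_pairs(chunks: &[&str], num_hashes: u64, threshold: f64) -> Vec<(usize, usize)> {
    let sigs: Vec<Vec<u64>> = chunks.iter().map(|c| signature(c, num_hashes)).collect();
    let mut pairs = Vec::new();
    for i in 0..sigs.len() {
        for j in (i + 1)..sigs.len() {
            if estimated_jaccard(&sigs[i], &sigs[j]) >= threshold {
                pairs.push((i, j));
            }
        }
    }
    pairs
}
```

The all-pairs loop is quadratic in the number of chunks; MinHash deployments commonly add LSH banding so only candidate pairs are compared.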