v0.1.0

@justram released this 23 Mar 09:04

pi-serini v0.1.0

Initial public release of pi-serini as a reusable, benchmark-driven pi search-agent workspace.

v0.1.0 establishes the current core product shape: index-driven benchmark execution, BM25-backed agentic search, benchmark-aware evaluation, and reproducible run artifacts.

Highlights

  • Index-driven benchmark and agentic search workflows for:
    • MS MARCO v1 Passage with dl19 and dl20
    • BrowseComp-Plus with q9, q100, q300, and qfull
  • BM25-backed pi search extension with search and document-reading flows over Lucene indexes
  • Shared BM25 RPC execution with single-process, shared-daemon, and sharded shared-daemon launch modes
  • Benchmark-aware retrieval evaluation, judge evaluation, summarization, and Markdown reporting
  • Reproducible run manifests via per-run benchmark_manifest_snapshot.json
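
For readers unfamiliar with the ranking function behind the BM25-backed search flows, the following is a minimal sketch of textbook BM25 scoring over an in-memory collection. It is not pi-serini's code and not Lucene's implementation (Lucene works over an inverted index and its scoring differs in detail). The `Doc` type, the `bm25Scores` function, and the defaults `k1 = 0.9`, `b = 0.4` are assumptions chosen for illustration.

```typescript
// Illustrative only: textbook BM25 over a tiny in-memory collection.
// Names (Doc, bm25Scores) and parameter defaults are assumptions, not pi-serini's API.
type Doc = { id: string; tokens: string[] };

function bm25Scores(
  query: string[],
  docs: Doc[],
  k1 = 0.9,
  b = 0.4,
): Map<string, number> {
  const n = docs.length;
  const avgLen =
    docs.reduce((sum, d) => sum + d.tokens.length, 0) / Math.max(n, 1);

  // Document frequency: how many documents contain each query term.
  const df = new Map<string, number>();
  for (const term of new Set(query)) {
    df.set(term, docs.filter((d) => d.tokens.includes(term)).length);
  }

  const scores = new Map<string, number>();
  for (const doc of docs) {
    let score = 0;
    for (const term of query) {
      const tf = doc.tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      const dfT = df.get(term) ?? 0;
      // Smoothed inverse document frequency; always positive.
      const idf = Math.log(1 + (n - dfT + 0.5) / (dfT + 0.5));
      // Length normalization: longer-than-average documents are penalized.
      const norm = k1 * (1 - b + b * (doc.tokens.length / avgLen));
      score += (idf * (tf * (k1 + 1))) / (tf + norm);
    }
    scores.set(doc.id, score);
  }
  return scores;
}

const scores = bm25Scores(
  ["bm25", "ranking"],
  [
    { id: "d1", tokens: ["bm25", "ranking", "function"] },
    { id: "d2", tokens: ["neural", "ranking"] },
    { id: "d3", tokens: ["lucene", "index"] },
  ],
);
```

In this toy collection `d1` matches both query terms and outranks `d2`, which matches one; `d3` matches neither and scores zero.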

Included in this release

Benchmarks

  • browsecomp-plus — default packaged benchmark
  • msmarco-v1-passage — MS MARCO v1 passage support for dl19 and dl20
  • benchmark-template — tiny local end-to-end demo benchmark for development and validation

Platform capabilities

  • Typed benchmark registry and benchmark-aware query/qrels/index resolution
  • Node.js/TypeScript-first orchestration entrypoints under src/orchestration/
  • Managed launch presets and operator surfaces under src/operator/
  • Internal and trec_eval-backed retrieval evaluation paths
  • Judge evaluation with benchmark-aware mode defaults and validation
  • BM25 comparison and tuning tooling
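
As a rough illustration of what a retrieval evaluation path computes, the sketch below implements nDCG@k with linear gains and a log2 rank discount, the convention trec_eval uses for its `ndcg_cut` measures. Which metrics pi-serini actually reports is not stated in these notes, so treat the metric choice, the `Qrels` type, and the `ndcgAtK` function as assumptions. The sketch also assumes each document id appears at most once in a ranking.

```typescript
// Illustrative only: nDCG@k with linear gain and log2 discount.
// Qrels and ndcgAtK are hypothetical names, not pi-serini's API.
type Qrels = Record<string, number>; // docId -> graded relevance

function dcg(gains: number[]): number {
  // Rank i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
  return gains.reduce((sum, g, i) => sum + g / Math.log2(i + 2), 0);
}

function ndcgAtK(ranking: string[], qrels: Qrels, k = 10): number {
  // Unjudged documents and negative judgments contribute zero gain.
  const gains = ranking.slice(0, k).map((id) => Math.max(qrels[id] ?? 0, 0));
  // Ideal ranking: all positively judged documents, best first.
  const ideal = Object.values(qrels)
    .filter((g) => g > 0)
    .sort((a, b) => b - a)
    .slice(0, k);
  const idealDcg = dcg(ideal);
  return idealDcg === 0 ? 0 : dcg(gains) / idealDcg;
}

const qrels: Qrels = { d1: 3, d2: 1, d3: 0 };
const perfect = ndcgAtK(["d1", "d2", "d3"], qrels);
const swapped = ndcgAtK(["d2", "d1", "d3"], qrels);
```

Here `perfect` is 1 because the ranking matches the ideal ordering, while `swapped` is lower because the most relevant document sits at rank 2.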

Scope note

This release is intentionally index-driven: benchmark runs execute against prepared Lucene indexes.

Document-ingestion-first indexing workflows built around Anserini IndexCollection are planned next, but are not part of v0.1.0.

Getting started

npm run setup:browsecomp-plus
npm run setup:msmarco-v1-passage

BENCHMARK=msmarco-v1-passage \
QUERY_SET=dl19 \
MODEL=openai-codex/gpt-5.4-mini \
npm run run:benchmark:query-set
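
The command above selects a run through three environment variables. A minimal sketch of how such variables could be validated against the benchmark and query-set names listed in these notes follows. The `QUERY_SETS` table, `RunConfig`, and `resolveRunConfig` are hypothetical names for illustration, not pi-serini's actual registry or orchestration API, and whether the real entrypoint applies defaults is not covered here.

```typescript
// Illustrative only: validate BENCHMARK / QUERY_SET / MODEL before a run.
// All names below are hypothetical, not pi-serini's actual API.
const QUERY_SETS = {
  "msmarco-v1-passage": ["dl19", "dl20"],
  "browsecomp-plus": ["q9", "q100", "q300", "qfull"],
} as const;

type BenchmarkId = keyof typeof QUERY_SETS;

interface RunConfig {
  benchmark: BenchmarkId;
  querySet: string;
  model: string;
}

function isBenchmarkId(value: string): value is BenchmarkId {
  return Object.keys(QUERY_SETS).includes(value);
}

function resolveRunConfig(env: Record<string, string | undefined>): RunConfig {
  const benchmark = env.BENCHMARK;
  const querySet = env.QUERY_SET;
  const model = env.MODEL;

  if (!benchmark || !isBenchmarkId(benchmark)) {
    throw new Error(`Unknown or missing BENCHMARK: ${benchmark}`);
  }
  const allowed: readonly string[] = QUERY_SETS[benchmark];
  if (!querySet || !allowed.includes(querySet)) {
    throw new Error(
      `QUERY_SET must be one of ${allowed.join(", ")} for ${benchmark}`,
    );
  }
  if (!model) {
    throw new Error("MODEL is required");
  }
  return { benchmark, querySet, model };
}

const config = resolveRunConfig({
  BENCHMARK: "msmarco-v1-passage",
  QUERY_SET: "dl19",
  MODEL: "openai-codex/gpt-5.4-mini",
});
```

In a Node.js entrypoint the same function could be called with `process.env`. Failing fast on a mismatched pair such as `msmarco-v1-passage` with `q9` avoids starting a run against the wrong index.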

See also:

  • README.md
  • CHANGELOG.md
  • docs/running-benchmarks.md
  • docs/evaluation.md