v0.3.0
This release keeps pi-serini current with the upstream Pi package migration while preserving the benchmark workflow surface.
Added
- Added a BrowseComp-Plus external-run adapter at `src/adapters/import_search_jsonl_run.ts` and the package script `npm run adapt:search-jsonl-run` for normalizing one-JSON-object-per-line search-session artifacts into native run directories.
- Added focused coverage for the external-run importer and response-confidence calibration helpers.
- Added README links from @ricky42613 for the Pi-Serini project page and released BrowseComp-Plus run datasets on Hugging Face.
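
The adapter's input format, one JSON object per line, can be read with a small parser. The following is a minimal sketch of that reading step only, not the adapter's actual code: the `SearchSessionRecord` shape, the `query_id` field, and the `parseJsonl` name are assumptions made for illustration, since these notes do not describe the artifact schema.

```typescript
// Hypothetical record shape: the real artifact schema is not given in these notes.
interface SearchSessionRecord {
  query_id: string;
  [key: string]: unknown;
}

// Parse one-JSON-object-per-line text. Blank lines are skipped, and any
// malformed entry is reported with its 1-based line number.
function parseJsonl(text: string): SearchSessionRecord[] {
  const records: SearchSessionRecord[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") return;
    let value: unknown;
    try {
      value = JSON.parse(trimmed);
    } catch {
      throw new Error(`Line ${index + 1}: invalid JSON`);
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      throw new Error(`Line ${index + 1}: expected a JSON object`);
    }
    const record = value as Record<string, unknown>;
    // Assumed required field; adjust to the real schema.
    if (typeof record.query_id !== "string") {
      throw new Error(`Line ${index + 1}: missing string "query_id"`);
    }
    records.push(record as SearchSessionRecord);
  });
  return records;
}
```

Failing on the first malformed line, with its line number, is one reasonable choice for an importer; a lenient variant could collect errors and continue instead.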
Changed
- Migrated Pi package dependencies and source imports from `@mariozechner/*` to `@earendil-works/*`.
- Updated `@earendil-works/pi-coding-agent` and `@earendil-works/pi-tui` to `^0.74.0` and refreshed `package-lock.json`.
- Replaced Ajv-backed TypeBox validation with TypeBox v1 native compiler APIs while preserving protocol validation behavior and structured error metadata.
- Updated judge-evaluation calibration to compare each response's self-reported confidence against gold-answer correctness.
Fixed
- Fixed benchmark launches against the current Pi CLI by using the explicit-extension-compatible `--no-builtin-tools` behavior.
- Fixed shared-BM25 liveness detection for root-relative log paths.
- Fixed sharded shared-BM25 merge metadata handling so merged runs synthesize canonical merged-level metadata instead of failing on shard-local metadata differences.
- Fixed calibration computation to include a final partial confidence bin.
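
To illustrate the final-partial-bin fix, here is a minimal sketch of a binned calibration error that keeps the last, shorter bin. It assumes an expected-calibration-error style metric over equal-count bins of samples sorted by confidence. The `Sample` type, the `calibrationError` name, and the `binSize` parameter are hypothetical, and the exact metric and bin size used by pi-serini are not stated in these notes.

```typescript
interface Sample {
  confidence: number; // self-reported confidence in [0, 1]
  correct: boolean;   // whether the response matched the gold answer
}

// Weighted mean of |mean confidence - accuracy| over bins of `binSize`
// samples, taken in order of increasing confidence.
function calibrationError(samples: Sample[], binSize: number): number {
  if (!Number.isInteger(binSize) || binSize <= 0) {
    throw new Error("binSize must be a positive integer");
  }
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a.confidence - b.confidence);
  let weighted = 0;
  // Stepping by binSize and slicing to the end keeps the final partial bin;
  // iterating only over floor(n / binSize) full bins would silently drop it.
  for (let start = 0; start < sorted.length; start += binSize) {
    const bin = sorted.slice(start, start + binSize);
    const meanConfidence =
      bin.reduce((sum, s) => sum + s.confidence, 0) / bin.length;
    const accuracy = bin.filter((s) => s.correct).length / bin.length;
    weighted += (bin.length / sorted.length) * Math.abs(meanConfidence - accuracy);
  }
  return weighted;
}
```

With three samples and `binSize = 2`, the third sample forms its own bin and contributes to the result with weight 1/3 rather than being ignored.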
Upgrade notes
- Install/update Pi to `0.74.0` or newer.
- Use `@earendil-works/pi-coding-agent` and `@earendil-works/pi-tui` in extension or SDK imports; the old `@mariozechner/*` Pi package names are retired upstream.
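
For extensions with many import sites, the rename can be applied mechanically. The helper below is a hypothetical sketch, not something shipped with pi-serini or Pi: `migrateImportSpecifier` and its constants are illustrative names, and it covers only the two packages named in these notes.

```typescript
const OLD_SCOPE = "@mariozechner/";
const NEW_SCOPE = "@earendil-works/";
// Only the two packages named in these notes; extend the list if you use others.
const MIGRATED_PACKAGES = ["pi-coding-agent", "pi-tui"];

// Map an old import specifier to its new scope, leaving anything else untouched.
function migrateImportSpecifier(specifier: string): string {
  for (const name of MIGRATED_PACKAGES) {
    const oldName = OLD_SCOPE + name;
    // Match the package itself or a subpath import, but not a longer package name.
    if (specifier === oldName || specifier.startsWith(oldName + "/")) {
      return NEW_SCOPE + specifier.slice(OLD_SCOPE.length);
    }
  }
  return specifier;
}
```

For example, `migrateImportSpecifier("@mariozechner/pi-tui")` returns `"@earendil-works/pi-tui"`, while unrelated specifiers pass through unchanged.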