v0.0.2
Evret 0.0.2
Added
- [new-metric] Added ERR@k metric for cascade-style graded relevance evaluation.
- [new-metric] Added RBP@k metric with tunable persistence/user-patience weighting.
- Added structured logging utilities: get_logger, configure_logging, and JSON log formatting.
- Added a tracing and monitoring notebook.
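For readers unfamiliar with the two new metrics, the following is a minimal sketch of the standard textbook definitions of ERR@k and RBP@k. It is not Evret's implementation: the function names err_at_k and rbp_at_k, their parameters, and the default persistence of 0.8 are assumptions made for illustration only, so check the metrics docs for the actual API.

```python
from typing import Sequence


def err_at_k(grades: Sequence[int], k: int, max_grade: int) -> float:
    """Expected Reciprocal Rank over the top-k results (hypothetical helper).

    grades[i] is the graded relevance of the result at rank i + 1.
    """
    score = 0.0
    p_reach = 1.0  # probability the user reaches the current rank
    for rank, grade in enumerate(grades[:k], start=1):
        # Cascade model: chance this result satisfies the user.
        p_satisfied = (2 ** grade - 1) / (2 ** max_grade)
        score += p_reach * p_satisfied / rank
        p_reach *= 1.0 - p_satisfied
    return score


def rbp_at_k(gains: Sequence[float], k: int, persistence: float = 0.8) -> float:
    """Rank-Biased Precision over the top-k results (hypothetical helper).

    gains[i] is a relevance gain in [0, 1]; persistence is the probability
    that the user moves on from one rank to the next.
    """
    if not 0.0 <= persistence < 1.0:
        raise ValueError("persistence must be in [0, 1)")
    weighted = sum(g * persistence ** i for i, g in enumerate(gains[:k]))
    return (1.0 - persistence) * weighted


if __name__ == "__main__":
    print(err_at_k([1, 0, 1], k=3, max_grade=1))
    print(rbp_at_k([1.0, 0.0, 1.0], k=3, persistence=0.5))
```

A higher persistence value spreads RBP weight deeper into the ranking, while ERR discounts later results more sharply once an earlier result is highly relevant.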
Changed
- Changed evaluation dataset semantics from relevant_doc_ids toward expected_answers.
- Improved TokenOverlapJudge matching logic, including negation handling and better overlap scoring.
- Reworked quickstart, architecture, dataset-format, metrics, and judge docs.
Full Changelog: v0.0.1...v0.0.2