v0.7.0
HALLMARK Benchmark Improvements for bibtex-check
Enhanced the reference fact-checker to catch significantly more hallucinated citations, targeting an improvement in HALLMARK benchmark F1 from 0.394 to ~0.80-0.88.
Added
- Pre-API year validation: future dates (`year > current_year`), implausible years (`< 1800`), and non-numeric years are flagged before any API calls, at zero cost
- DOI resolution check: HEAD request to `doi.org` catches fabricated DOIs before expensive API lookups
- Alias-aware venue matching: 17 ML/AI venue aliases (NeurIPS/NIPS, ICML, ICLR, CVPR, ICCV, etc.) with canonical name resolution; known-different venues always flagged
- Preprint-vs-published detection: queries Semantic Scholar to detect entries claiming a venue when only an arXiv preprint exists
- Streaming JSONL output: results flushed after each entry; partial results survive timeouts, crashes, and Ctrl+C
- S2 API key support for `bibtex-check`: `--s2-api-key` flag and `S2_API_KEY` env var for authenticated rate limits (1 req/s vs shared pool)
- New CLI flags: `--no-cache`, `--no-check-dois`, `--no-check-years`
- 5 new status codes: `future_date`, `invalid_year`, `doi_not_found`, `preprint_only`, `published_version_exists`
- 38 new tests (86 total for fact-checker)
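
The pre-API year validation can be pictured with a minimal sketch like the one below. It is not the project's code: the function name `validate_year` and its signature are hypothetical, and mapping both implausible and non-numeric years to `invalid_year` is an assumption based on the status codes listed above.

```python
from datetime import date
from typing import Optional

MIN_PLAUSIBLE_YEAR = 1800  # lower bound quoted in the release notes


def validate_year(raw_year: str, current_year: Optional[int] = None) -> Optional[str]:
    """Return a status code for a suspicious year, or None if it looks plausible.

    Hypothetical helper: the status names come from the release notes, but
    which code each case maps to is an assumption.
    """
    if current_year is None:
        current_year = date.today().year

    text = raw_year.strip()
    # Reject anything that is not a plain ASCII integer (e.g. "n.d.", "20xx", "").
    if not (text.isascii() and text.isdigit()):
        return "invalid_year"

    year = int(text)
    if year > current_year:
        return "future_date"
    if year < MIN_PLAUSIBLE_YEAR:
        return "invalid_year"
    return None
```

Because the check only inspects the BibTeX field itself, it can run before any network request, which is where the zero-cost claim comes from.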
Changed
- Venue comparison uses `venues_match()` with alias map instead of raw fuzzy score, which eliminates false matches between similar-named but distinct conferences (e.g., CVPR vs ICCV)
- `process_entries()` accepts optional `jsonl_path` for streaming output
- `FactCheckerConfig` gains `check_years` and `check_dois` boolean flags (both default `True`)
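
As an illustration of why an alias map helps, here is a rough sketch of alias-aware venue comparison. It shares only the name `venues_match` with the real function: the alias table, the `canonical_venue` helper, the `fuzzy_threshold` parameter, and the `difflib` fallback are all assumptions made for this example, and the actual map covers 17 aliases rather than the handful shown.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical alias table: lowercase venue string -> canonical key.
VENUE_ALIASES = {
    "neurips": "neurips",
    "nips": "neurips",
    "neural information processing systems": "neurips",
    "icml": "icml",
    "international conference on machine learning": "icml",
    "cvpr": "cvpr",
    "iccv": "iccv",
}


def canonical_venue(name: str) -> Optional[str]:
    """Look up a canonical key for a venue, or None if the venue is unknown."""
    key = " ".join(name.lower().split())  # lowercase and collapse whitespace
    return VENUE_ALIASES.get(key)


def venues_match(a: str, b: str, fuzzy_threshold: float = 0.85) -> bool:
    """Compare two venue strings, preferring the alias map over fuzzy scoring."""
    ca, cb = canonical_venue(a), canonical_venue(b)
    if ca is not None and cb is not None:
        # Both venues are known: they match only if they share a canonical key,
        # so CVPR vs ICCV is rejected however similar the strings look.
        return ca == cb
    # At least one venue is unknown: fall back to a plain similarity ratio.
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= fuzzy_threshold
```

With this shape, `venues_match("NIPS", "NeurIPS")` resolves through the alias table, while two known but different venues are never treated as a match regardless of their string similarity.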
Full Changelog: v0.6.1...v0.7.0