v0.7.0

@rpatrik96 released this 12 Feb 08:53
· 84 commits to main since this release
0fa2b8e

HALLMARK Benchmark Improvements for bibtex-check

Enhanced the reference fact-checker to catch significantly more hallucinated citations. The target is to raise the HALLMARK benchmark F1 from 0.394 to roughly 0.80-0.88.

Added

  • Pre-API year validation: future years (year > current_year), implausible years (< 1800), and non-numeric years are flagged before any API call, so the check costs no requests
  • DOI resolution check: HEAD request to doi.org catches fabricated DOIs before expensive API lookups
  • Alias-aware venue matching: 17 ML/AI venue aliases (NeurIPS/NIPS, ICML, ICLR, CVPR, ICCV, etc.) with canonical name resolution; known-different venues always flagged
  • Preprint-vs-published detection: queries Semantic Scholar to detect entries claiming a venue when only an arXiv preprint exists
  • Streaming JSONL output: results flushed after each entry; partial results survive timeouts, crashes, and Ctrl+C
  • S2 (Semantic Scholar) API key support for bibtex-check: --s2-api-key flag and S2_API_KEY env var for authenticated rate limits (1 req/s instead of the shared unauthenticated pool)
  • New CLI flags: --no-cache, --no-check-dois, --no-check-years
  • 5 new status codes: future_date, invalid_year, doi_not_found, preprint_only, published_version_exists
  • 38 new tests (86 total for fact-checker)
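
The pre-API year check and the per-entry streaming described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `check_year`, `stream_year_checks`, the record layout, and the "ok" status are hypothetical names chosen for the example, and mapping both implausible and non-numeric years to `invalid_year` is an assumption based on the status codes listed in this release.

```python
import json
from datetime import date
from typing import Iterable, Optional, TextIO, Tuple


def check_year(raw_year: str, current_year: Optional[int] = None) -> Optional[str]:
    """Return a status code for a suspicious year, or None if it looks plausible."""
    if current_year is None:
        current_year = date.today().year
    text = raw_year.strip()
    # Only plain ASCII digits count as numeric; "in press" or "²" do not.
    if not (text.isascii() and text.isdigit()):
        return "invalid_year"
    year = int(text)
    if year > current_year:
        return "future_date"
    if year < 1800:
        # Assumption: implausibly old years share the invalid_year status.
        return "invalid_year"
    return None


def stream_year_checks(
    entries: Iterable[Tuple[str, str]],
    handle: TextIO,
    current_year: Optional[int] = None,
) -> None:
    """Write one JSON line per (key, year) entry and flush after each one."""
    for key, raw_year in entries:
        # "ok" is a placeholder status invented for this sketch.
        status = check_year(raw_year, current_year) or "ok"
        handle.write(json.dumps({"key": key, "status": status}) + "\n")
        # Flushing per entry keeps earlier results on disk if the run is interrupted.
        handle.flush()
```

With `current_year=2025`, `check_year("2999", current_year=2025)` returns `"future_date"` and `check_year("in press", current_year=2025)` returns `"invalid_year"`. No network access is involved, which is why a check of this kind can run before any API lookup.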

Changed

  • Venue comparison uses venues_match() with the alias map instead of the raw fuzzy score, which eliminates false matches between similarly named but distinct conferences (e.g., CVPR vs ICCV)
  • process_entries() accepts optional jsonl_path for streaming output
  • FactCheckerConfig gains check_years and check_dois boolean flags (both default True)
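
To illustrate why an alias map behaves differently from a raw fuzzy score, here is a rough sketch of alias-aware venue matching. It is not the actual `venues_match()` implementation: `VENUE_ALIASES`, `canonical_venue`, `venues_match_sketch`, and the 0.9 threshold are hypothetical, the alias table is a small hand-picked subset rather than the 17 aliases shipped in this release, and `difflib` stands in for whatever fuzzy scorer the project uses.

```python
import re
from difflib import SequenceMatcher
from typing import Dict

# Hypothetical alias table for illustration only.
VENUE_ALIASES: Dict[str, str] = {
    "nips": "neurips",
    "neurips": "neurips",
    "advances in neural information processing systems": "neurips",
    "icml": "icml",
    "international conference on machine learning": "icml",
    "iclr": "iclr",
    "cvpr": "cvpr",
    "iccv": "iccv",
}
KNOWN_VENUES = set(VENUE_ALIASES.values())


def canonical_venue(name: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace, then resolve aliases."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return VENUE_ALIASES.get(cleaned, cleaned)


def venues_match_sketch(a: str, b: str, threshold: float = 0.9) -> bool:
    """Compare two venue strings, preferring canonical names over fuzzy scores."""
    canon_a, canon_b = canonical_venue(a), canonical_venue(b)
    if canon_a == canon_b:
        return True
    # Two known venues with different canonical names never match,
    # regardless of how similar the strings look.
    if canon_a in KNOWN_VENUES and canon_b in KNOWN_VENUES:
        return False
    # Fall back to a fuzzy ratio only when at least one venue is unknown.
    return SequenceMatcher(None, canon_a, canon_b).ratio() >= threshold
```

In this sketch, `venues_match_sketch("NIPS", "NeurIPS")` is true because both names resolve to the same canonical venue, while `venues_match_sketch("CVPR", "ICCV")` is false because both are known and distinct, so the fuzzy fallback is never consulted.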

Full Changelog: v0.6.1...v0.7.0