
v0.10.0


rpatrik96 released this on 28 May 06:39 (commit af83782)

Single-source surname keys + self-checking oracles to prevent comparison asymmetry

The recurring "subtle wrong verdict" bugs (particle surnames, DBLP homonym suffix, venue collapsing) shared one root cause: comparison asymmetry. The BibTeX-entry side and the API-record side reduced the same surname or venue through different normalization paths, producing false AUTHOR_MISMATCH / HALLUCINATED verdicts. This release makes the asymmetry structurally impossible and adds self-checking oracles.
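A minimal sketch of the failure mode, assuming a toy normalizer that strips diacritics and keeps the last name token. `surname_key`, `drifted_record_key`, and `authors_match` are hypothetical names, not the project's API. When the record side skips a step that the entry side performs, an accented API surname no longer matches the unaccented BibTeX spelling:

```python
import unicodedata


def surname_key(name: str) -> str:
    """Hypothetical shared normalizer: strip diacritics, keep the last token, lowercase."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.split()[-1].lower()


def drifted_record_key(name: str) -> str:
    """Hypothetical record-side path that forgot the diacritics step."""
    return name.split()[-1].lower()


def authors_match(entry_names, record_names, record_key) -> bool:
    # The entry side always uses surname_key; record_key is whatever the record side uses.
    entry_keys = {surname_key(n) for n in entry_names}
    record_keys = {record_key(n) for n in record_names}
    return entry_keys == record_keys


entry = ["Bernhard Scholkopf"]   # BibTeX entry typed without the umlaut
record = ["Bernhard Schölkopf"]  # API record with the umlaut

print(authors_match(entry, record, drifted_record_key))  # False: spurious mismatch
print(authors_match(entry, record, surname_key))         # True: same path on both sides
```

Sharing one function on both sides is what removes the asymmetry: a normalization step cannot be added to one side and forgotten on the other.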

Changed

  • PublishedRecord.surname_keys() is now the single source of truth for record-side surname keys: it routes each family name through the same last_name_from_person the entry side uses. All 7 comparison sites now consume it, including FieldFiller and WorkingPaperVerifier, which were still keying the record side on raw family names. The drift-prone _record_surnames helper was removed. No thresholds, weights, or verdict logic changed.
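The single-source pattern might look roughly like the sketch below. `PublishedRecord` here is a toy dataclass with a hypothetical `families` field, and `last_name_from_person` is a simplified stand-in (last whitespace token of the surname part, lowercased), not the project's implementation:

```python
from dataclasses import dataclass, field


def last_name_from_person(person: str) -> str:
    """Simplified stand-in: surname part of 'Last, First' or 'First Last', last token, lowercased."""
    surname_part = person.split(",", 1)[0] if "," in person else person
    tokens = surname_part.split()
    # Keeping only the last token makes particle placement irrelevant:
    # "van der Waals" and "Johannes van der Waals" both reduce to "waals".
    return tokens[-1].lower() if tokens else ""


@dataclass
class PublishedRecord:
    families: list[str] = field(default_factory=list)  # hypothetical: API family names

    def surname_keys(self) -> list[str]:
        # Record side reuses the entry-side reducer instead of keying raw family names.
        return [last_name_from_person(f) for f in self.families]


def entry_surname_keys(bibtex_authors: list[str]) -> list[str]:
    # Entry side goes through the very same function.
    return [last_name_from_person(a) for a in bibtex_authors]


record = PublishedRecord(families=["van der Waals", "Smith"])
entry = ["van der Waals, Johannes", "John Smith"]
assert record.surname_keys() == entry_surname_keys(entry)  # ["waals", "smith"] on both sides
```

Because comparison sites call `surname_keys()` instead of reducing family names themselves, a new comparison site cannot reintroduce the kind of drift the removed `_record_surnames` helper was prone to.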

Added

  • PublishedRecord.canonical_venue: a single record-side venue accessor, mirroring surname_keys.
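A sketch of a record-side venue accessor in the same spirit. It assumes a hypothetical `normalize_venue` helper (casefold, punctuation to spaces, collapsed whitespace) shared with the entry side, and it models `canonical_venue` as a property, which the release notes do not specify. The normalizer deliberately keeps every distinguishing word, so sibling journals stay distinct:

```python
import re
from dataclasses import dataclass


def normalize_venue(venue: str) -> str:
    """Hypothetical shared venue normalizer used by both the entry and record sides."""
    lowered = venue.casefold()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)  # punctuation becomes whitespace
    return " ".join(no_punct.split())            # collapse runs of whitespace


@dataclass
class PublishedRecord:
    venue: str  # hypothetical raw venue string from the API record

    @property
    def canonical_venue(self) -> str:
        # The single record-side accessor: callers never normalize self.venue themselves.
        return normalize_venue(self.venue)


record = PublishedRecord(venue="IEEE Trans. Signal Process.")
sibling = PublishedRecord(venue="IEEE Trans. Inf. Theory")

print(record.canonical_venue)  # ieee trans signal process
assert record.canonical_venue != sibling.canonical_venue  # siblings are not collapsed
```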

Fixed

  • last_name_from_person strips a trailing 4-digit DBLP homonym suffix ("Sun 0020" → "sun") at the key level, as defense in depth alongside the existing ingestion-time strip. A guard ensures an all-digits name is never emptied.
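A minimal sketch of key-level suffix stripping with the all-digits guard. It assumes a DBLP suffix is exactly four digits at the end of the name, separated by whitespace; the regex and this `last_name_from_person` are simplified illustrations, not the project's function:

```python
import re

# Hypothetical pattern: a 4-digit DBLP homonym suffix at the very end of the name,
# either preceded by whitespace or making up the whole string.
_DBLP_HOMONYM_SUFFIX = re.compile(r"(?:^|\s+)\d{4}$")


def last_name_from_person(person: str) -> str:
    name = person.strip()
    stripped = _DBLP_HOMONYM_SUFFIX.sub("", name)
    # Guard: if stripping would empty the name (e.g. it is all digits), keep it as-is.
    if stripped:
        name = stripped
    surname_part = name.split(",", 1)[0] if "," in name else name
    tokens = surname_part.split()
    return tokens[-1].lower() if tokens else ""


print(last_name_from_person("Sun 0020"))         # sun
print(last_name_from_person("Yizhou Sun 0020"))  # sun
print(last_name_from_person("0020"))             # 0020 (guard keeps it)
```

In this sketch the suffix is stripped before the surname token is picked; otherwise "Yizhou Sun 0020" would yield the key "0020".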

Tests

  • +54 tests across two new oracles: tests/test_record_roundtrip.py (a record converted to an entry must verify against that same record with zero field mismatches and a score that clears MATCH_THRESHOLD) and tests/test_metamorphic_symmetry.py (each past bug restated as an invariance: name order, diacritics, DBLP suffix, particle placement, score symmetry, sibling-journal non-collapse).
  • Full suite: 839 passed, 1 skipped (785 baseline + 54 new, zero regressions).
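The metamorphic oracle idea can be sketched as plain assertions: apply a transformation that should not change the verdict and check that it does not. `surname_key` and `author_score` (a Jaccard overlap of surname keys) are hypothetical stand-ins for the project's matcher, and only some of the listed invariances are shown:

```python
import re
import unicodedata

_DBLP_SUFFIX = re.compile(r"(?:^|\s+)\d{4}$")


def surname_key(person: str) -> str:
    """Hypothetical key: drop DBLP suffix, take last surname token, strip diacritics, lowercase."""
    name = person.strip()
    name = _DBLP_SUFFIX.sub("", name) or name  # never empty an all-digits name
    surname_part = name.split(",", 1)[0] if "," in name else name
    tokens = surname_part.split()
    surname = tokens[-1] if tokens else ""
    ascii_only = unicodedata.normalize("NFKD", surname).encode("ascii", "ignore").decode("ascii")
    return ascii_only.lower()


def author_score(a: list[str], b: list[str]) -> float:
    """Hypothetical author similarity: Jaccard overlap of surname-key sets."""
    ka = {surname_key(x) for x in a}
    kb = {surname_key(x) for x in b}
    union = ka | kb
    return len(ka & kb) / len(union) if union else 1.0


# (original, transformed) pairs that must reduce to the same key.
KEY_INVARIANCES = [
    ("Yizhou Sun", "Sun, Yizhou"),                      # name order
    ("Bernhard Schölkopf", "Bernhard Scholkopf"),       # diacritics
    ("Yizhou Sun", "Yizhou Sun 0020"),                  # DBLP homonym suffix
    ("Ludwig van Beethoven", "van Beethoven, Ludwig"),  # particle placement
]


def test_key_invariances() -> None:
    for original, transformed in KEY_INVARIANCES:
        assert surname_key(original) == surname_key(transformed), (original, transformed)


def test_score_symmetry() -> None:
    a = ["Yizhou Sun", "Bernhard Schölkopf"]
    b = ["Sun, Yizhou", "Jane Doe"]
    assert author_score(a, b) == author_score(b, a)


test_key_invariances()
test_score_symmetry()
```

Stating each past bug as an invariance, rather than as a single hard-coded input, means the check fails whenever a later normalization change breaks symmetry on either side.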

Full changelog: v0.9.2...v0.10.0