yasbd v0.1.2 - Bugfix release

@speedyk-005 speedyk-005 released this 01 Jun 21:40
· 125 commits to main since this release

Accuracy-focused release: 84-case golden benchmark, expanded abbreviations, faster regex compilation.

pip install --upgrade yasbd-lib

Added

  • 84-case golden benchmark suite (EN_GOLDEN_DATA.py): Covers abbreviations, ellipsis, contiguous terminators, parentheses, quotes, mixed CJK, decimal times, list markers, and exclamation-safe words. Used to compare all 7 libraries side-by-side.
  • Expanded abbreviations: Dozens of new abbreviations across all categories — reference (eq, ex, pp), date (Tue, Fri, Feb), street (Hwy, Ave, Blvd), title (Prof, Dr, Mr), and more.

Changed

  • Trie-based pattern building: Replaced the sorted "|".join() alternation with retrie.Trie for faster, more consistent abbreviation regex generation.
  • Abbreviation redistribution: Shared abbreviations (fr, ing, messrs, mlle, mme, etc.) moved to base class. Language-specific rules now only add their unique abbreviations.
  • Benchmarks rewritten: Cold/warm timing tables updated with real measured values; accuracy table and conclusion added.
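
To illustrate the trie-based pattern building, here is a minimal sketch of how a trie can turn a word list into a prefix-sharing regex. It is a pure-Python stand-in written for this note, not the retrie.Trie API and not the code yasbd ships; build_trie and trie_to_pattern are hypothetical names.

```python
import re


def build_trie(words):
    """Insert each word into a nested-dict trie; the "" key marks a word end."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True
    return root


def trie_to_pattern(node):
    """Render a trie node as a regex fragment that shares common prefixes."""
    ends_here = "" in node
    branches = [
        re.escape(ch) + trie_to_pattern(node[ch])
        for ch in sorted(node)
        if ch != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1:
        body = branches[0]
    else:
        body = "(?:" + "|".join(branches) + ")"
    if ends_here:
        # A word also ends at this node, so everything after it is optional.
        body = "(?:" + body + ")?"
    return body


# Hypothetical abbreviation sample, taken from the categories listed above.
abbreviations = ["prof", "pp", "dr", "drs", "mr", "mrs"]
pattern = trie_to_pattern(build_trie(abbreviations))
abbreviation_re = re.compile(pattern)
```

Because branches are emitted in sorted order and shared prefixes are merged, the generated pattern does not depend on the order in which abbreviations were added, and the regex engine never has to retry the same prefix across separate alternatives.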

Fixed

  • ModuleNotFoundError masking: boundary_detector.py no longer masks unrelated import errors when a language module exists but a sub-dependency is missing.
  • P.M. false positive: All-caps P.M. no longer caught by the acronym pattern (p\.m and a\.m explicitly excluded).
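
The ModuleNotFoundError fix can be pictured with the sketch below. It assumes the usual pattern of checking the exception's name attribute before falling back; load_language_module, its parameters, and the fallback behaviour are hypothetical and are not taken from boundary_detector.py.

```python
import importlib


def load_language_module(package, lang, fallback="en"):
    """Import package.lang, falling back only if that exact module is missing."""
    target = f"{package}.{lang}"
    try:
        return importlib.import_module(target)
    except ModuleNotFoundError as exc:
        # exc.name is the module that could not be found. If it is not the
        # language module itself, the language module exists and one of its
        # own imports failed, so the error is re-raised instead of hidden.
        if exc.name != target:
            raise
        return importlib.import_module(f"{package}.{fallback}")
```

With this check, a missing language module still triggers the fallback, while a missing sub-dependency surfaces with its real name in the traceback.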