yasbd v0.1.2 - Bugfix release
Accuracy-focused release: 84-case golden benchmark, expanded abbreviations, faster regex compilation.
pip install --upgrade yasbd-lib
Added
- 84-case golden benchmark suite (EN_GOLDEN_DATA.py): Covers abbreviations, ellipsis, contiguous terminators, parentheses, quotes, mixed CJK, decimal times, list markers, and exclamation-safe words. Used to compare all 7 libraries side-by-side.
- Expanded abbreviations: Dozens of new abbreviations across all categories, including reference (eq, ex, pp), date (Tue, Fri, Feb), street (Hwy, Ave, Blvd), title (Prof, Dr, Mr), and more.
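As a rough illustration of how a golden suite can score a segmenter, here is a minimal sketch. The case format, the `GOLDEN_CASES` name, and both helper functions are assumptions made for this example; the actual layout of EN_GOLDEN_DATA.py may differ.

```python
import re

# Hypothetical golden cases: (input text, expected sentences).
# The real EN_GOLDEN_DATA.py may use a different structure.
GOLDEN_CASES = [
    ("Hello world. How are you?", ["Hello world.", "How are you?"]),
    ("Dr. Smith arrived. He sat down.", ["Dr. Smith arrived.", "He sat down."]),
]


def naive_split(text):
    """Deliberately naive baseline: split after '.', '!' or '?' plus whitespace."""
    return re.split(r"(?<=[.!?])\s+", text)


def accuracy(segment, cases):
    """Fraction of cases where the segmenter output matches the expected list exactly."""
    passed = sum(1 for text, expected in cases if segment(text) == expected)
    return passed / len(cases)


score = accuracy(naive_split, GOLDEN_CASES)
```

The naive baseline passes the first case and fails the second because it splits after "Dr.", which is the kind of abbreviation error a golden suite is meant to expose. Any callable that maps a string to a list of sentences can be passed as `segment`, so several libraries can be scored against the same cases.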
Changed
- Trie-based pattern building: Replaced "|".join() sorting with retrie.Trie for faster, more consistent abbreviation regex generation.
- Abbreviation redistribution: Shared abbreviations (fr, ing, messrs, mlle, mme, etc.) moved to base class. Language-specific rules now only add their unique abbreviations.
- Benchmarks rewritten: Cold/warm timing tables updated with real measured values; accuracy table and conclusion added.
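To show the idea behind trie-based pattern building, here is a standard-library sketch. It is a stand-in for illustration only and does not use or reproduce the retrie API; the function names `build_trie` and `trie_to_pattern` are hypothetical.

```python
import re


def build_trie(words):
    """Insert each word into a nested-dict trie; the "" key marks end of word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_pattern(node):
    """Recursively turn a trie node into a regex fragment with shared prefixes."""
    is_end = "" in node
    # Sorting the child characters makes the output independent of input order.
    branches = [
        re.escape(ch) + trie_to_pattern(node[ch])
        for ch in sorted(k for k in node if k)
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not is_end:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # A word may end here while longer words continue, so the rest is optional.
    return body + "?" if is_end else body


abbreviations = ["Prof", "Dr", "Mr", "Mme", "Mlle"]
pattern = trie_to_pattern(build_trie(abbreviations))
regex = re.compile(pattern)
```

Because common prefixes are factored out once, the regex engine does not retry each alternative from the start, and the generated pattern is the same regardless of the order in which abbreviations are added.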
Fixed
- ModuleNotFoundError masking: boundary_detector.py no longer masks unrelated import errors when a language module exists but a sub-dependency is missing.
- P.M. false positive: All-caps P.M. no longer caught by the acronym pattern (p\.m and a\.m explicitly excluded).
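The ModuleNotFoundError fix follows a common pattern, which can be sketched as below. This is not the code in boundary_detector.py; `load_language_module` and its parameters are hypothetical names, and the only assumption is that language modules are loaded dynamically by name.

```python
import importlib


def load_language_module(package, lang_code):
    """Return the language module, or None if that language is not available.

    `package` and `lang_code` are hypothetical names used for illustration.
    """
    module_name = f"{package}.{lang_code}"
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # exc.name is the module that could not be found. If it is the
        # language module itself (or one of its parent packages), the
        # language is simply unsupported.
        missing = exc.name
        if missing is not None and (
            module_name == missing or module_name.startswith(missing + ".")
        ):
            return None
        # Otherwise the language module exists but one of its own imports
        # failed, so the original error is re-raised instead of being hidden.
        raise
```

Checking `exc.name` rather than catching every ModuleNotFoundError keeps "language not supported" distinct from "a dependency of the language module is not installed".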