yasbd-lib v0.2.0
What's Changed
🚀 Added
- Add configurable StreamCleaner cleanup stages by @speedyk-005 in #41
- _post_process_boundaries hook: Added language-aware sentence boundary correction without modifying the regex core pipeline (PR #39).
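As a rough illustration of the post-processing hook idea, the sketch below lets a core regex propose boundaries and gives a language subclass a chance to correct them afterwards. Only the method name `_post_process_boundaries` comes from these notes. The classes `BaseSplitter` and `EnglishSplitter`, the `BOUNDARY` pattern, the `NON_TERMINAL` list, and the hook's signature are assumptions made for the example and are not the library's actual API.

```python
import re


class BaseSplitter:
    """Hypothetical stand-in for a regex-based splitter (not yasbd-lib's real class)."""

    # Naive core rule: split on whitespace that follows '.', '!' or '?'.
    BOUNDARY = re.compile(r"(?<=[.!?])\s+")

    def split(self, text):
        sentences = self.BOUNDARY.split(text.strip())
        # The core regex stays untouched; corrections happen in the hook.
        return self._post_process_boundaries(sentences)

    def _post_process_boundaries(self, sentences):
        # Default hook: return the core pipeline's output unchanged.
        return sentences


class EnglishSplitter(BaseSplitter):
    # Assumed sample of abbreviations that should not end a sentence.
    NON_TERMINAL = ("Dr.", "Mr.", "Mrs.")

    def _post_process_boundaries(self, sentences):
        merged = []
        for sentence in sentences:
            # Re-join a fragment that was cut off right after a title.
            if merged and merged[-1].endswith(self.NON_TERMINAL):
                merged[-1] = merged[-1] + " " + sentence
            else:
                merged.append(sentence)
        return merged
```

With the input `"Dr. Smith arrived. He sat down."`, the base class in this sketch yields three fragments, while the English subclass merges the first two. The point of the design is that language-specific fixes live in one overridable method rather than in the shared patterns.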
⚙️ Changed
- Regex architecture refactor in base.py: Promoted local regex patterns into class-level attributes for consistency and reuse by @speedyk-005.
- STREET_ABBRVS merged into MID_SENTENCE_ABBRVS: Now strictly non-splitting; English restores boundary logic via post-processing hook by @speedyk-005.
- COMMON_ORG_NOUNS renamed to ORG_PROPER_NOUNS and restricted to proper nouns only by @speedyk-005.
- Geopolitical abbreviations normalization: Standardized casing across languages for consistent detection behavior by @speedyk-005.
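To show why class-level patterns help with reuse, here is a minimal sketch in which a subclass overrides only an attribute and the inherited method picks it up. The class names, the `TRAILING_ABBRV` attribute, and the abbreviation lists are hypothetical and chosen for the example; they do not reflect the actual contents of base.py.

```python
import re


class BaseRules:
    # Class-level pattern: compiled once and visible to subclasses.
    # The abbreviation list is an assumed sample.
    TRAILING_ABBRV = re.compile(r"\b(st|ave|blvd)\.$", re.IGNORECASE)

    def ends_with_abbrv(self, text):
        # Looks the pattern up through self, so subclass overrides apply.
        return self.TRAILING_ABBRV.search(text) is not None


class SpanishRules(BaseRules):
    # Override the attribute only. "ave" is an ordinary Spanish word ("bird"),
    # so this assumed list uses Spanish street abbreviations instead.
    TRAILING_ABBRV = re.compile(r"\b(av|avda)\.$", re.IGNORECASE)
```

In this sketch, `BaseRules().ends_with_abbrv("Vi un ave.")` is true while `SpanishRules().ends_with_abbrv("Vi un ave.")` is false, which is the kind of inheritance problem a per-language override can avoid.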
🐛 Fixed
- Fix Spanish sentence boundaries (#31) by @JheanLL in #31
- Add opening bracket to reference abbreviation lookahead (#35) by @Jah-yee in #35
- Fix false negative for Spanish 'ave' due to street abbrv inheritance (#37) by @JheanLL in #37
- Fix sentence splitting after a.m./p.m. before date tokens (#40) by @Rajesh270712 in #40
- Fix sentence splitting after mixed-case scientific units (#42) by @Rajesh270712 in #42
- Fix heading-aware sentence boundary detection (#44) by @speedyk-005 in #44
- Japanese over-matching boundary logic: Removed invalid \b dependency in CJK context by @speedyk-005.
- Time-date pipeline cleanup (English-specific logic): Ensures time/date handling is isolated to English rules by @Rajesh270712.
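For background on the \b issue in CJK text, the sketch below uses Python's standard `re` module to show that a word-boundary assertion does not fire between adjacent Japanese characters, because kanji and kana all count as word characters and the text has no spaces. The helper names and the sample sentence are made up for the example; this illustrates the general regex behavior, not the library's actual patterns.

```python
import re

SAMPLE = "これは本です。"  # "This is a book."


def find_with_boundary(text, word):
    # Requires a word boundary immediately before the word.
    return re.search(r"\b" + re.escape(word), text)


def find_plain(text, word):
    # No boundary assertion; matches the word wherever it appears.
    return re.search(re.escape(word), text)


# "本" and "で" are both word characters, so no \b exists between them.
boundary_hit = find_with_boundary(SAMPLE, "です")
plain_hit = find_plain(SAMPLE, "です")
```

Here `boundary_hit` is `None` while `plain_hit` is a match object, so a rule that depends on \b can behave unexpectedly in unsegmented scripts.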
New Contributors
- @JheanLL made their first contribution in #31
- @Jah-yee made their first contribution in #35
- @Rajesh270712 made their first contribution in #40
Full Changelog: v0.1.3...v0.2.0