
yasbd-lib v0.2.0


github-actions released this 04 Jun 23:59
· 69 commits to main since this release

What's Changed

🚀 Added

  • Add configurable StreamCleaner cleanup stages by @speedyk-005 in #41
  • _post_process_boundaries hook: Adds language-aware sentence boundary correction without modifying the core regex pipeline in #39
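These notes do not show the hook's signature. The following is a minimal sketch of the general pattern, assuming a base class whose regex core returns a list of sentence strings that a language subclass may then correct. `BaseSplitter`, `EnglishSplitter`, `BOUNDARY_RE` and `NON_SPLITTING` are hypothetical names; only `_post_process_boundaries` comes from these notes.

```python
import re


class BaseSplitter:
    """Hypothetical base class: the regex core only proposes boundaries."""

    # Hypothetical class-level pattern: split after ., ! or ? plus whitespace.
    BOUNDARY_RE = re.compile(r"(?<=[.!?])\s+")

    def split(self, text: str) -> list[str]:
        sentences = self.BOUNDARY_RE.split(text.strip())
        # Language-specific correction runs after the regex core,
        # which itself stays unchanged.
        return self._post_process_boundaries(sentences)

    def _post_process_boundaries(self, sentences: list[str]) -> list[str]:
        # Default: accept the regex output as-is.
        return sentences


class EnglishSplitter(BaseSplitter):
    """Hypothetical subclass that rejoins splits made after title abbreviations."""

    NON_SPLITTING = {"Dr.", "Mr.", "Mrs."}

    def _post_process_boundaries(self, sentences: list[str]) -> list[str]:
        merged: list[str] = []
        for sentence in sentences:
            if merged and merged[-1].split()[-1] in self.NON_SPLITTING:
                # Previous fragment ended in a title abbreviation: rejoin it.
                merged[-1] = f"{merged[-1]} {sentence}"
            else:
                merged.append(sentence)
        return merged
```

With this sketch, `EnglishSplitter().split("I met Dr. Smith. He was late.")` yields `["I met Dr. Smith.", "He was late."]`, while the base class alone yields three fragments. The benefit of a hook like this is that per-language corrections live in subclasses rather than in one shared regex.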

⚙️ Changed

  • Regex architecture refactor in base.py: Promoted local regex patterns into class-level attributes for consistency and reuse by @speedyk-005.
  • STREET_ABBRVS merged into MID_SENTENCE_ABBRVS: Street abbreviations are now strictly non-splitting; English restores their boundary logic via the _post_process_boundaries hook by @speedyk-005.
  • COMMON_ORG_NOUNS renamed to ORG_PROPER_NOUNS and restricted to proper nouns only by @speedyk-005.
  • Geopolitical abbreviation normalization: Standardized casing across languages for consistent detection by @speedyk-005.
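To illustrate the STREET_ABBRVS change, here is a minimal sketch assuming a simple token-based splitter. Street abbreviations sit in the non-splitting set, and an English-only pass re-splits when the following word looks like a sentence opener. The functions `split_base` and `english_post_process`, the `SENTENCE_STARTERS` heuristic, and the set contents are hypothetical and are not yasbd-lib's actual API or data.

```python
import re

# Hypothetical stand-in for the merged set: street abbreviations now live
# alongside other mid-sentence abbreviations and never end a sentence.
MID_SENTENCE_ABBRVS = frozenset({"e.g.", "i.e.", "St.", "Ave.", "Rd."})
STREET_ABBRVS = frozenset({"St.", "Ave.", "Rd."})  # hypothetical subset
# Illustrative heuristic only: words that usually open a new sentence.
SENTENCE_STARTERS = frozenset({"It", "The", "He", "She", "We", "They"})

TOKEN_RE = re.compile(r"\S+")


def split_base(text: str) -> list[str]:
    """Split on terminal punctuation, never after a MID_SENTENCE_ABBRVS token."""
    sentences, current = [], []
    for token in TOKEN_RE.findall(text):
        current.append(token)
        if token[-1] in ".!?" and token not in MID_SENTENCE_ABBRVS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def english_post_process(sentences: list[str]) -> list[str]:
    """Re-split after a street abbreviation when the next word opens a sentence."""
    result = []
    for sentence in sentences:
        tokens = sentence.split()
        start = 0
        for i in range(len(tokens) - 1):
            if tokens[i] in STREET_ABBRVS and tokens[i + 1] in SENTENCE_STARTERS:
                result.append(" ".join(tokens[start : i + 1]))
                start = i + 1
        result.append(" ".join(tokens[start:]))
    return result
```

For `"We moved to Oak Ave. It is quiet. See Elm St. near the park."`, `split_base` keeps `"Oak Ave. It is quiet."` together, and `english_post_process` then separates `"We moved to Oak Ave."` from `"It is quiet."` while leaving `"Elm St. near the park"` intact.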

🐛 Fixed

  • Fix Spanish sentence boundaries (#31) by @JheanLL in #31
  • Add opening bracket to reference abbreviation lookahead (#35) by @Jah-yee in #35
  • Fix false negative for Spanish 'ave' due to street abbrv inheritance (#37) by @JheanLL in #37
  • Fix sentence splitting after a.m./p.m. before date tokens (#40) by @Rajesh270712 in #40
  • Fix sentence splitting after mixed-case scientific units (#42) by @Rajesh270712 in #42
  • Fix heading-aware sentence boundary detection (#44) by @speedyk-005 in #44
  • Japanese boundary over-matching: Removed an invalid \b dependency in CJK contexts by @speedyk-005.
  • Time-date pipeline cleanup (English-specific logic): Ensures time/date handling is isolated to English rules by @Rajesh270712.
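The a.m./p.m. fix (#40) can be pictured as a regex rule that declines to split when the next token is a date. This is a minimal sketch using Python's `re` module; `DATE_TOKENS` and `MERIDIEM_SPLIT_RE` are hypothetical and are not yasbd-lib's actual pattern or word list.

```python
import re

# Hypothetical date-token list for illustration; the library's list may differ.
# Note that "May" is ambiguous (month vs. modal verb) in real text.
DATE_TOKENS = (
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)

# Split on the whitespace after "a.m." or "p.m." when the next word is
# capitalized, unless that word is a date token (the time then continues
# the same sentence, as in "9 a.m. Monday").
MERIDIEM_SPLIT_RE = re.compile(
    r"(?<=\b[ap]\.m\.)\s+(?=[A-Z])(?!(?:" + "|".join(DATE_TOKENS) + r")\b)"
)
```

Applied with `MERIDIEM_SPLIT_RE.split("Call me at 9 a.m. Monday or at 5 p.m. Then we decide.")`, the sketch keeps `"9 a.m. Monday"` together and splits only before `"Then"`. It covers only the meridiem case; other boundaries would come from the rest of the pipeline.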

New Contributors

Full Changelog: v0.1.3...v0.2.0