Skip to content

TextHumanize 0.32.0 — Whitespace fidelity & guards

Choose a tag to compare

@ksanyok ksanyok released this 11 Jun 09:51
· 4 commits to main since this release

Hardens text fidelity in selective modes and adds release guards, following the client-reported minimal=True whitespace bug.

Added

  • detect_ai_sentences() returns leading_ws/trailing_ws with a documented offset/text contract, so consumers can rejoin sentences without losing spacing.
  • README counter guard in scripts/dev_check.py — fails when advertised test/module counts drift.
  • Hypothesis property test — minimal mode never introduces glued sentence boundaries on arbitrary inputs.
  • Regression bank — selective modes — entries support "mode": "minimal"/"only_flagged"; the client whitespace case is now a permanent regression.

Changed

  • humanize_sentences() preserves original spacing — newlines and paragraph breaks survive (offset-based rejoin instead of single-space join); after-scores are averaged over all result spans.
  • Docs — the benchmark methodology documents that the default verdict threshold is precision-first (recall ~0.40); recall-first workflows should pick a threshold from detector_calibration().

Builds on 0.31.1, which fixed the original minimal=True / only_flagged=True space-after-period corruption.