TextHumanize 0.32.0 — Whitespace fidelity & guards
Hardens text fidelity in selective modes and adds release guards, following the client-reported minimal=True whitespace bug.
Added
detect_ai_sentences()returnsleading_ws/trailing_wswith a documented offset/textcontract, so consumers can rejoin sentences without losing spacing.- README counter guard in
scripts/dev_check.py— fails when advertised test/module counts drift. - Hypothesis property test — minimal mode never introduces glued sentence boundaries on arbitrary inputs.
- Regression bank — selective modes — entries support
"mode": "minimal"/"only_flagged"; the client whitespace case is now a permanent regression.
Changed
humanize_sentences()preserves original spacing — newlines and paragraph breaks survive (offset-based rejoin instead of single-space join); after-scores are averaged over all result spans.- Docs — the benchmark methodology documents that the default verdict threshold is precision-first (recall ~0.40); recall-first workflows should pick a threshold from
detector_calibration().
Builds on 0.31.1, which fixed the original minimal=True / only_flagged=True space-after-period corruption.