TextHumanize 0.32.0 — Whitespace fidelity & guards

ksanyok released this 11 Jun 09:51

· 4 commits to main since this release

c53a6fd

Hardens text fidelity in selective modes and adds release guards, following the client-reported minimal=True whitespace bug.

Added

detect_ai_sentences() returns leading_ws/trailing_ws with a documented offset/text contract, so consumers can rejoin sentences without losing spacing.
README counter guard in scripts/dev_check.py — fails when advertised test/module counts drift.
Hypothesis property test — minimal mode never introduces glued sentence boundaries on arbitrary inputs.
Regression bank — selective modes — entries support "mode": "minimal"/"only_flagged"; the client whitespace case is now a permanent regression.

Changed

humanize_sentences() preserves original spacing — newlines and paragraph breaks survive (offset-based rejoin instead of single-space join); after-scores are averaged over all result spans.
Docs — the benchmark methodology documents that the default verdict threshold is precision-first (recall ~0.40); recall-first workflows should pick a threshold from detector_calibration().

Builds on 0.31.1, which fixed the original minimal=True / only_flagged=True space-after-period corruption.

Assets 2