TextHumanize 0.31.0 — Provable quality
A provable-quality wave: calibrate the detector, broaden the corpus, and make watermark forensics fast.
Added
- Detector calibration: `detector_calibration()` sweeps decision thresholds over the labelled eval corpus and reports precision/recall/F1/accuracy per threshold, the best-F1 threshold (overall and per language), heavily-edited-AI recall, and an optional agreement comparison against externally supplied scores (e.g. GPTZero). Also `scripts/calibrate_detector.py`. (It surfaces that the 0.5 default favours precision over recall; best-F1 sits near 0.30.)
- Bigger eval corpus: 6 languages (EN/RU/UK/DE/FR/ES) × 10 domains, so the leaderboard, release snapshot and calibration are more representative.
- Statistical watermark coverage: a statistical-watermark sample makes `watermark_eval()` exercise both Unicode and statistical branches.
- Brand voice, enforced bans: `brand_voice_lock()` repairs forbidden substitutes from `banned_replacements` back to the canonical term.
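
The threshold sweep described under Detector calibration can be illustrated with a small standalone sketch. This is not the library's implementation: `sweep_thresholds`, `best_f1`, `ThresholdMetrics` and the toy scores below are hypothetical names and made-up data, used only to show how per-threshold precision/recall/F1/accuracy and a best-F1 threshold might be computed.

```python
from dataclasses import dataclass


@dataclass
class ThresholdMetrics:
    threshold: float
    precision: float
    recall: float
    f1: float
    accuracy: float


def sweep_thresholds(scores, labels, thresholds):
    """Compute metrics per threshold; a label of True means AI-written."""
    results = []
    for t in thresholds:
        preds = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        fn = sum((not p) and y for p, y in zip(preds, labels))
        tn = sum((not p) and (not y) for p, y in zip(preds, labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        accuracy = (tp + tn) / len(labels) if labels else 0.0
        results.append(ThresholdMetrics(t, precision, recall, f1, accuracy))
    return results


def best_f1(results):
    """Pick the threshold whose F1 is highest."""
    return max(results, key=lambda m: m.f1)


# Toy data (hypothetical): detector scores and ground-truth labels.
scores = [0.92, 0.71, 0.42, 0.35, 0.08, 0.45, 0.22, 0.05]
labels = [True, True, True, True, True, False, False, False]

results = sweep_thresholds(scores, labels, [0.1, 0.2, 0.3, 0.4, 0.5])
best = best_f1(results)
```

On this toy data the 0.5 threshold is perfectly precise but misses most AI samples, while the best-F1 threshold lands lower, which is the same kind of precision/recall trade-off the calibration report surfaces.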
Changed
- Faster watermark forensics: `watermark_report()` on 100k characters dropped from ~44s to under 0.5s (token-capped statistical scan + diff short-circuit).
Fixed
- Statistical false positives: a marginal `possible_watermark` verdict (z 1.5-2.0) no longer flips `has_watermarks` on ordinary prose.
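
The fix above can be pictured as a verdict mapping in which the marginal band is reported but does not set the flag. The sketch below is an assumption-laden illustration, not the library's code: `classify_z`, the `"watermark_detected"` and `"none"` verdict names, and the behaviour above z = 2.0 are hypothetical; only the 1.5-2.0 band and the `possible_watermark` / `has_watermarks` names come from the note.

```python
def classify_z(z, possible_at=1.5, detected_at=2.0):
    """Return (verdict, has_watermarks) for a statistical z-score.

    Cut-offs mirror the 1.5-2.0 marginal band from the release note;
    everything else here is a hypothetical illustration.
    """
    if z >= detected_at:
        # Assumed: a strong signal sets the flag.
        return "watermark_detected", True
    if z >= possible_at:
        # Marginal band: surfaced as a verdict, but the flag stays off.
        return "possible_watermark", False
    return "none", False


verdict, has_watermarks = classify_z(1.7)
```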
Full details in CHANGELOG.md.