TextHumanize 0.31.0 — Provable quality

ksanyok released this 08 Jun 21:55


b4519c7

A provable-quality wave: calibrate the detector, broaden the corpus, and make watermark forensics fast.

Added

Detector calibration: detector_calibration() sweeps decision thresholds over the labelled eval corpus and reports precision, recall, F1 and accuracy at each threshold. It also reports the best-F1 threshold (overall and per language), recall on heavily edited AI text, and, optionally, agreement with externally supplied scores (e.g. from GPTZero). A companion script, scripts/calibrate_detector.py, is included. The sweep shows that the default 0.5 threshold favours precision over recall; the best-F1 threshold sits near 0.30.
Bigger eval corpus: 6 languages (EN/RU/UK/DE/FR/ES) × 10 domains, making the leaderboard, release snapshot and calibration more representative.
Statistical watermark coverage: a new statistical-watermark sample means watermark_eval() now exercises both the Unicode and the statistical detection branches.
Brand voice, enforced bans: brand_voice_lock() now rewrites any forbidden substitute listed in banned_replacements back to its canonical term.
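The threshold sweep that detector_calibration() performs can be sketched in plain Python as follows. This is not TextHumanize's implementation: the names sweep_thresholds, best_f1 and best_f1_by_language, the (language, score, is_ai) sample shape, and the convention that a score at or above the threshold means "AI" are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ThresholdMetrics:
    threshold: float
    precision: float
    recall: float
    f1: float
    accuracy: float


def sweep_thresholds(scores, labels, thresholds):
    """Hypothetical sweep: a sample is predicted AI when score >= threshold."""
    results = []
    for t in thresholds:
        tp = fp = tn = fn = 0
        for score, is_ai in zip(scores, labels):
            predicted_ai = score >= t
            if predicted_ai and is_ai:
                tp += 1
            elif predicted_ai:
                fp += 1
            elif is_ai:
                fn += 1
            else:
                tn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        accuracy = (tp + tn) / len(scores) if scores else 0.0
        results.append(ThresholdMetrics(t, precision, recall, f1, accuracy))
    return results


def best_f1(results):
    # max() returns the first maximum, so ties go to the earliest threshold given
    return max(results, key=lambda m: m.f1)


def best_f1_by_language(samples, thresholds):
    """samples: iterable of (language, score, is_ai) tuples (assumed shape)."""
    grouped = defaultdict(lambda: ([], []))
    for lang, score, is_ai in samples:
        grouped[lang][0].append(score)
        grouped[lang][1].append(is_ai)
    return {
        lang: best_f1(sweep_thresholds(lang_scores, lang_labels, thresholds))
        for lang, (lang_scores, lang_labels) in grouped.items()
    }


# Toy data: three AI-written samples (one scoring low) and two human ones
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
labels = [True, True, True, False, False]
for row in sweep_thresholds(scores, labels, [0.3, 0.5]):
    print(row)
```

On this toy data the 0.5 threshold keeps precision at 1.0 but catches only one of the three AI samples, while 0.3 catches all of them. That is the same precision-versus-recall trade-off the calibration report describes, though these numbers are illustrative only.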
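The ban repair in brand_voice_lock() can be pictured as a single case-insensitive, whole-word substitution pass. The sketch below assumes, hypothetically, that banned_replacements maps each canonical term to a list of forbidden substitutes; the real parameter shape and matching rules in TextHumanize may differ, and repair_banned_substitutes is an illustrative name.

```python
import re


def repair_banned_substitutes(text, banned_replacements):
    """Rewrite forbidden substitutes back to their canonical term.

    banned_replacements is assumed to look like {canonical: [substitute, ...]}.
    """
    lookup = {}
    for canonical, substitutes in banned_replacements.items():
        for sub in substitutes:
            lookup[sub.lower()] = canonical
    if not lookup:
        return text
    # Longest alternatives first, so a longer phrase wins over a shorter one it contains
    alternation = "|".join(re.escape(s) for s in sorted(lookup, key=len, reverse=True))
    # \b keeps matches to whole words: "logins" is left alone
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    # A function replacement avoids backslash escapes in the canonical term
    return pattern.sub(lambda m: lookup.get(m.group(0).lower(), m.group(0)), text)


print(repair_banned_substitutes(
    "Please Login or log in to view your logins.",
    {"sign in": ["log in", "login"]},
))
```

Compiling one alternation, rather than looping over substitutes, means every match is rewritten in a single pass, so a canonical term can never be re-matched by a later rule. This sketch does not adapt the canonical term's capitalisation to sentence position.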

Changed

Faster watermark forensics: watermark_report() on a 100k-character input dropped from ~44 s to under 0.5 s, thanks to a token-capped statistical scan and a short-circuit in the diff step.
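The two speed-ups named above can be illustrated generically. The sketch assumes a green-list style statistical watermark scored with a one-proportion z-test; the bigram hash in is_green, gamma=0.5 and max_tokens=2000 are hypothetical choices, not TextHumanize's detector. Capping the scan bounds the work regardless of input length, and comparing strings before calling difflib skips the character diff entirely when nothing changed.

```python
import difflib
import hashlib
import math
import re
from itertools import islice

TOKEN_RE = re.compile(r"\w+")


def is_green(prev_token, token, gamma):
    # Hypothetical green-list test: hash the bigram and keep a gamma fraction of outcomes
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    return digest[0] / 256 < gamma


def capped_z_score(text, gamma=0.5, max_tokens=2000):
    """One-proportion z-test over at most max_tokens scored bigrams."""
    # finditer + islice stops tokenising once the cap is reached
    tokens = [m.group(0).lower() for m in islice(TOKEN_RE.finditer(text), max_tokens + 1)]
    n = len(tokens) - 1  # number of bigrams actually scored
    if n < 1:
        return 0.0
    hits = sum(is_green(a, b, gamma) for a, b in zip(tokens, tokens[1:]))
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def changed_spans(original, cleaned):
    """Non-equal opcodes between two strings, skipping the diff when they match."""
    if original == cleaned:
        return []  # short-circuit: identical strings need no character diff
    matcher = difflib.SequenceMatcher(None, original, cleaned, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

With the cap, a 100k-character document and its first few thousand tokens cost the same to score. The trade-off is that watermark evidence appearing only past the cap is not counted.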

Fixed

Statistical false positives: a marginal possible_watermark verdict (z-score between 1.5 and 2.0) no longer sets has_watermarks on ordinary prose.
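A minimal sketch of the verdict logic this fix implies: a z-score in the marginal band is still reported as possible_watermark but no longer raises has_watermarks. The 1.5 and 2.0 bounds come from the note above; whether each bound is inclusive, the "watermark" and "none" labels, and the helper names are assumptions.

```python
from dataclasses import dataclass

MARGINAL_Z = 1.5  # lower edge of the marginal band (from the release note)
STRONG_Z = 2.0    # assumed inclusive cut-off for a confident statistical hit


@dataclass
class StatisticalVerdict:
    z: float
    label: str  # "watermark", "possible_watermark" or "none" (labels partly assumed)
    has_watermarks: bool


def classify(z):
    if z >= STRONG_Z:
        return StatisticalVerdict(z, "watermark", True)
    if z >= MARGINAL_Z:
        # Marginal band: surface it in the report, but do not flag the text
        return StatisticalVerdict(z, "possible_watermark", False)
    return StatisticalVerdict(z, "none", False)


def has_watermarks(unicode_marks_found, z):
    """Hypothetical combination of the Unicode and statistical branches."""
    return unicode_marks_found or classify(z).has_watermarks


print(classify(1.7))
```

Keeping the marginal label visible while leaving the boolean false lets a forensic report show weak evidence without turning ordinary prose into a positive.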

Full details in CHANGELOG.md.
