
TextHumanize 0.31.0 — Provable quality


ksanyok released this 08 Jun 21:55

A provable-quality wave: calibrate the detector, broaden the corpus, and make watermark forensics fast.

Added

  • Detector calibration: detector_calibration() sweeps decision thresholds over the labelled eval corpus and reports precision, recall, F1 and accuracy at each threshold; the best-F1 threshold, both overall and per language; recall on heavily edited AI text; and, optionally, agreement with externally supplied scores (e.g. from GPTZero). A companion script, scripts/calibrate_detector.py, is also included. The calibration shows that the default threshold of 0.5 favours precision over recall; the best-F1 threshold sits near 0.30.
  • Bigger eval corpus: 6 languages (EN/RU/UK/DE/FR/ES) × 10 domains, so the leaderboard, release snapshot and calibration are more representative.
  • Statistical watermark coverage: a new statistical-watermark sample means watermark_eval() now exercises both its Unicode and its statistical branches.
  • Brand voice (enforced bans): brand_voice_lock() repairs forbidden substitutes listed in banned_replacements, restoring the canonical term.
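
The threshold sweep behind detector_calibration() can be illustrated with a minimal, self-contained sketch. It is not TextHumanize's implementation: the function names, the label convention (1 = AI, 0 = human) and the toy scores are assumptions for illustration, and only the per-threshold metrics and the best-F1 pick are shown. Per-language breakdowns, heavily-edited recall and external-score agreement are left out.

```python
from dataclasses import dataclass


@dataclass
class ThresholdStats:
    threshold: float
    precision: float
    recall: float
    f1: float
    accuracy: float


def sweep_thresholds(scores, labels, thresholds):
    """Hypothetical helper: metrics per threshold; labels are 1 for AI, 0 for human."""
    results = []
    for t in thresholds:
        # Predict "AI" when the detector score meets or exceeds the threshold.
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
        tn = len(labels) - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        accuracy = (tp + tn) / len(labels) if labels else 0.0
        results.append(ThresholdStats(t, precision, recall, f1, accuracy))
    return results


def best_f1(results):
    """Pick the threshold with the highest F1 (the first one wins on ties)."""
    return max(results, key=lambda r: r.f1)


# Toy scores made up for illustration; they are not from the TextHumanize corpus.
scores = [0.92, 0.61, 0.48, 0.40, 0.12, 0.75, 0.08, 0.52]
labels = [1, 1, 1, 0, 0, 1, 0, 1]
thresholds = [round(0.05 * i, 2) for i in range(1, 20)]  # 0.05 .. 0.95

results = sweep_thresholds(scores, labels, thresholds)
best = best_f1(results)
print(f"best-F1 threshold={best.threshold} F1={best.f1:.3f}")
```

Lowering the threshold trades precision for recall, which is the kind of trade-off a sweep like this makes visible.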

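A banned-term repair pass of the kind brand_voice_lock() performs can be sketched with word-boundary, case-insensitive regex replacement. This is an illustration only: it assumes banned_replacements maps each canonical term to a list of forbidden substitutes, which may not match the library's actual data shape, and enforce_banned_terms is a hypothetical name.

```python
import re


def enforce_banned_terms(text, banned_replacements):
    """Hypothetical sketch: replace forbidden substitutes with the canonical term.

    Assumes banned_replacements maps canonical term -> list of forbidden substitutes.
    """
    for canonical, substitutes in banned_replacements.items():
        # Try longer substitutes first so a longer phrase is not pre-empted
        # by a shorter substitute it contains.
        for sub in sorted(substitutes, key=len, reverse=True):
            pattern = re.compile(r"\b" + re.escape(sub) + r"\b", re.IGNORECASE)
            # A function replacement keeps backslashes in the canonical term literal.
            text = pattern.sub(lambda _match: canonical, text)
    return text


rules = {"Acme Cloud": ["acme-cloud", "AcmeCloud"], "sign in": ["log in", "login"]}
print(enforce_banned_terms("Please log in to AcmeCloud, then open the acme-cloud console.", rules))
```

Word boundaries keep the pass from rewriting substrings inside unrelated words; the trade-off is that the canonical casing is always used, even at the start of a sentence.
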
Changed

  • Faster watermark forensics: watermark_report() on 100k characters now takes under 0.5 s, down from about 44 s, thanks to a token-capped statistical scan and a diff short-circuit.
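
One plausible way a token cap bounds the cost of a statistical scan is sketched below. It assumes a green-list style watermark test (each token is hashed with its predecessor, and a fraction gamma of pairs counts as "green") and stops tokenizing after max_tokens; the hashing scheme, gamma and the cap value are hypothetical and not taken from TextHumanize. The diff short-circuit is not modelled.

```python
import hashlib
import itertools
import math
import re


def is_green(prev_token, token, gamma=0.5):
    """Hypothetical green-list test: hash the token pair into [0, 1)."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def capped_watermark_z(text, gamma=0.5, max_tokens=2000):
    """z-score of the green-pair count, computed on at most max_tokens + 1 tokens."""
    # islice stops the regex scan early, so work is bounded by the cap,
    # not by the length of the input text.
    matches = itertools.islice(re.finditer(r"\S+", text), max_tokens + 1)
    tokens = [m.group(0) for m in matches]
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    green = sum(1 for prev, tok in pairs if is_green(prev, tok, gamma))
    # Without a watermark, green ~ Binomial(n, gamma); standardize the count.
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


text = "the quick brown fox " * 25_000  # about 100k characters
print(round(capped_watermark_z(text, max_tokens=2000), 3))
```

The cap trades some statistical power on very long inputs for a runtime that no longer grows with document length.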

Fixed

  • Statistical false positives: a marginal possible_watermark verdict (z between 1.5 and 2.0) no longer sets has_watermarks to true on ordinary prose.
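
The fix amounts to keeping a marginal verdict out of the boolean flag. A minimal sketch, assuming 1.5 and 2.0 are inclusive lower bounds of their bands and using hypothetical labels ("watermark", "none") for the other two bands; only possible_watermark and has_watermarks come from these release notes.

```python
def watermark_verdict(z, marginal=1.5, confident=2.0):
    """Return (verdict, has_watermarks) for a statistical z-score.

    Hypothetical sketch: band labels other than "possible_watermark"
    and the inclusive boundaries are assumptions.
    """
    if z >= confident:
        return "watermark", True
    if z >= marginal:
        # Marginal evidence is still reported, but no longer flips the flag.
        return "possible_watermark", False
    return "none", False


for z in (0.8, 1.7, 2.6):
    print(z, watermark_verdict(z))
```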

Full details in CHANGELOG.md.