hearsay 0.3.0
The headline: hearsay now turns media into TTS/STT training datasets (the hearsay dataset <SOURCE> command, plus a web UI mode), alongside the existing markdown/JSON engine.
New
- Dataset export mode: slices audio on word-level timestamps (never mid-word) and writes LJSpeech (metadata.csv), NeMo (manifest.jsonl), and HF audiofolder layouts, along with a dataset_card.md and a dropped.jsonl.
- Quality filtering (on by default) and opt-in clipping detection.
- Optional speaker diarization via the hearsay[diarize] extra, with --diarize, --dominant-speaker (single-voice TTS), and --per-speaker.
- --normalize (two-pass EBU R128) and --pad (edge padding), with a de-click fade for clean clip boundaries.
- Resumable combined builds for playlists and feeds.
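To make "never mid-word" concrete: if every transcript word carries start and end times, clips can be formed by packing consecutive words and closing a clip only in the gap between two words. The sketch below illustrates that idea under stated assumptions; it is not hearsay's implementation. The Word type, group_words, and the max_len/max_gap thresholds are hypothetical. The two line helpers follow the public formats: LJSpeech's pipe-separated metadata.csv (id|text|normalized text, no header) and NeMo's JSON-lines manifest with audio_filepath, duration, and text.

```python
import json
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record from a word-level ASR transcript.
    text: str
    start: float  # seconds
    end: float  # seconds


def group_words(words, max_len=10.0, max_gap=0.5):
    """Greedily pack consecutive words into clips (hypothetical helper).

    A clip is closed *before* a word that would push it past max_len seconds,
    or when the pause before the next word exceeds max_gap seconds.
    Boundaries therefore always fall between words, never inside one.
    """
    clips, current = [], []
    for w in words:
        if current:
            too_long = w.end - current[0].start > max_len
            big_gap = w.start - current[-1].end > max_gap
            if too_long or big_gap:
                clips.append(current)
                current = []
        current.append(w)
    if current:
        clips.append(current)
    return clips


def ljspeech_line(clip_id, text):
    # LJSpeech metadata.csv row: id|transcription|normalized transcription.
    # Here the raw text is simply reused as the "normalized" column.
    return f"{clip_id}|{text}|{text}"


def nemo_line(audio_path, duration, text):
    # NeMo manifest row: one JSON object per line.
    return json.dumps({"audio_filepath": audio_path, "duration": round(duration, 3), "text": text})


words = [Word("hello", 0.0, 0.4), Word("world", 0.5, 0.9), Word("again", 2.0, 2.4)]
for i, clip in enumerate(group_words(words)):
    text = " ".join(w.text for w in clip)
    duration = clip[-1].end - clip[0].start
    print(ljspeech_line(f"clip_{i:04d}", text))
    print(nemo_line(f"wavs/clip_{i:04d}.wav", duration, text))
```

With these inputs the 1.1 s pause before "again" exceeds max_gap, so it starts a second clip while "hello world" stays together.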
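A de-click fade matters because a clip cut at a nonzero sample begins or ends with a step discontinuity, which is audible as a click. A common remedy is a few milliseconds of fade to zero at each edge, followed by silence padding. The sketch below shows that idea on plain mono float samples; pad_and_fade and its 100 ms pad and 5 ms linear fade are illustrative assumptions, not hearsay's actual defaults or code.

```python
def pad_and_fade(samples, sample_rate, pad_ms=100, fade_ms=5):
    """Fade both clip edges to zero, then surround the clip with silence.

    samples: sequence of mono float samples in [-1.0, 1.0].
    The linear fade removes the step discontinuity at the cut points, so
    the transition into the added silence does not click.
    """
    # Fade length in samples, capped so the two fades never overlap.
    n_fade = min(int(sample_rate * fade_ms / 1000), len(samples) // 2)
    out = [float(s) for s in samples]
    for i in range(n_fade):
        gain = i / n_fade  # 0.0 at the very edge, rising toward 1.0
        out[i] *= gain  # fade in
        out[-1 - i] *= gain  # fade out (mirror image)
    pad = [0.0] * int(sample_rate * pad_ms / 1000)
    return pad + out + pad


clip = [0.5] * 2205  # 100 ms of a constant signal at 22.05 kHz
padded = pad_and_fade(clip, 22050)
```

Fading before padding ensures the first and last clip samples are exactly zero, so they meet the silence without a jump.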
Notes
- No new required dependencies: audio is handled by the ffmpeg that hearsay already needs, and diarization is the only extra (opt-in).
- PyPI 0.2.0 (2026-06-14) was a maintenance release of the markdown engine, published before dataset mode existed; hence this release is 0.3.0.
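Since audio already goes through ffmpeg, a two-pass EBU R128 normalization like the one behind --normalize can be sketched with ffmpeg's loudnorm filter: pass one measures the input and prints JSON stats, and pass two feeds those measurements back with linear=true. These notes don't say which filter or targets hearsay uses, so treat this as an assumption-laden sketch. The helper names are hypothetical, and the targets (-23 LUFS, the EBU R128 programme target, with -2 dBTP and 7 LU) and the 22050 Hz output rate are illustrative choices.

```python
import json
import subprocess


def first_pass_cmd(src, target_i=-23.0, target_tp=-2.0, target_lra=7.0):
    # Pass one: measure only. Audio is discarded (-f null); stats go to stderr.
    return [
        "ffmpeg", "-hide_banner", "-i", src,
        "-af", f"loudnorm=I={target_i}:TP={target_tp}:LRA={target_lra}:print_format=json",
        "-f", "null", "-",
    ]


def parse_loudnorm_stats(stderr_text):
    # loudnorm prints a flat JSON object at the end of ffmpeg's log output.
    return json.loads(stderr_text[stderr_text.rindex("{"):])


def second_pass_filter(stats, target_i=-23.0, target_tp=-2.0, target_lra=7.0):
    # Pass two: feed the measured values back so the gain is applied linearly.
    return (
        f"loudnorm=I={target_i}:TP={target_tp}:LRA={target_lra}"
        f":measured_I={stats['input_i']}:measured_TP={stats['input_tp']}"
        f":measured_LRA={stats['input_lra']}:measured_thresh={stats['input_thresh']}"
        f":offset={stats['target_offset']}:linear=true"
    )


def normalize(src, dst, sample_rate=22050, **targets):
    # Requires ffmpeg on PATH. loudnorm can output at 192 kHz,
    # so -ar pins the result to the dataset's sample rate.
    p1 = subprocess.run(first_pass_cmd(src, **targets), capture_output=True, text=True, check=True)
    stats = parse_loudnorm_stats(p1.stderr)
    subprocess.run(
        ["ffmpeg", "-y", "-hide_banner", "-i", src,
         "-af", second_pass_filter(stats, **targets), "-ar", str(sample_rate), dst],
        check=True,
    )
```

A call such as normalize("clip.wav", "clip_norm.wav") would run both passes. Measuring first lets pass two apply a single linear gain instead of loudnorm's one-pass dynamic mode.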
Full details in CHANGELOG.md.