v0.3.0 — dataset export mode + clean clip boundaries

mudassar531 released this 14 Jun 19:52
217db92

hearsay 0.3.0

The headline: hearsay now turns media into TTS/STT training datasets (via hearsay dataset <SOURCE> or a new web UI mode), alongside the existing markdown/JSON engine.

New

  • Dataset export mode: slices audio at word-level timestamps (never mid-word) into LJSpeech (metadata.csv), NeMo (manifest.jsonl), and Hugging Face audiofolder layouts, and writes a dataset_card.md and a dropped.jsonl alongside them.
  • Quality filtering (on by default) and opt-in clipping detection.
  • Optional speaker diarization via the hearsay[diarize] extra, with --diarize, --dominant-speaker (for single-voice TTS), and --per-speaker.
  • --normalize (two-pass EBU R128 loudness normalization) and --pad (edge padding), with a de-click fade for clean clip boundaries.
  • Resumable combined builds for playlists and feeds.
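Slicing "never mid-word" means every clip boundary falls in the gap between one word's end timestamp and the next word's start. A minimal sketch, assuming word timings arrive as text plus start/end seconds; Word, Clip, slice_on_words, max_clip_s and min_gap_s are hypothetical names for illustration, not hearsay's API:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Clip:
    start: float
    end: float
    text: str


def slice_on_words(words: List[Word], max_clip_s: float = 10.0,
                   min_gap_s: float = 0.3) -> List[Clip]:
    """Group consecutive words into clips; cuts only ever fall between words.

    Hypothetical helper. A clip is closed before the next word if that word
    would push the clip past max_clip_s, or if the silence before it is at
    least min_gap_s. A single word longer than max_clip_s is kept whole.
    """
    clips: List[Clip] = []
    current: List[Word] = []
    for word in words:
        if current:
            too_long = word.end - current[0].start > max_clip_s
            pause = word.start - current[-1].end >= min_gap_s
            if too_long or pause:
                clips.append(_to_clip(current))
                current = []
        current.append(word)
    if current:
        clips.append(_to_clip(current))
    return clips


def _to_clip(words: List[Word]) -> Clip:
    # Clip edges are the first word's start and the last word's end timestamps.
    return Clip(words[0].start, words[-1].end, " ".join(w.text for w in words))
```

Splitting at pauses tends to leave natural silence around each cut, while the duration cap keeps clips bounded in length.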
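The three export layouts differ mainly in their index file. A sketch of one index row per layout, following the public conventions of each format (LJSpeech: pipe-delimited id|transcription|normalized transcription with audio under wavs/; NeMo: one JSON object per line with audio_filepath, duration and text; Hugging Face audiofolder: a metadata.csv with a required file_name column). The function names and the "transcription" column name are assumptions and may not match what hearsay actually writes:

```python
import csv
import io
import json
from typing import List, Optional, Tuple


def ljspeech_line(clip_id: str, text: str, normalized: Optional[str] = None) -> str:
    # LJSpeech metadata.csv: no header, '|' delimiter, no quoting;
    # the audio for this row lives at wavs/<clip_id>.wav.
    normalized = text if normalized is None else normalized
    for field in (clip_id, text, normalized):
        if "|" in field or "\n" in field:
            raise ValueError(f"cannot store {field!r} in a pipe-delimited row")
    return f"{clip_id}|{text}|{normalized}"


def nemo_manifest_line(audio_path: str, duration_s: float, text: str) -> str:
    # NeMo manifest.jsonl: one JSON object per line.
    return json.dumps({"audio_filepath": audio_path,
                       "duration": round(duration_s, 3),
                       "text": text}, ensure_ascii=False)


def audiofolder_metadata(rows: List[Tuple[str, str]]) -> str:
    # HF audiofolder metadata.csv: file_name is required;
    # "transcription" is an assumed (hypothetical) column name.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["file_name", "transcription"])
    writer.writerows(rows)
    return buf.getvalue()
```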
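Clipping detection, edge padding and a de-click fade are simple per-sample operations. A sketch on float samples in [-1.0, 1.0], assuming a linear fade and a near-full-scale threshold as the clipping heuristic; clipping_ratio, apply_fades, pad_edges and their defaults are hypothetical, not hearsay's actual values:

```python
from typing import List, Sequence


def clipping_ratio(samples: Sequence[float], threshold: float = 0.999) -> float:
    # Hypothetical heuristic: share of samples at (or within a hair of) full
    # scale. A high value suggests the recording clipped.
    if not samples:
        return 0.0
    return sum(1 for s in samples if abs(s) >= threshold) / len(samples)


def apply_fades(samples: Sequence[float], sample_rate: int,
                fade_s: float = 0.005) -> List[float]:
    # Linear fade-in and fade-out so the clip starts and ends at exactly 0.0.
    out = list(samples)
    n = min(int(round(fade_s * sample_rate)), len(out) // 2)
    for i in range(n):
        gain = i / n
        out[i] *= gain
        out[-1 - i] *= gain
    return out


def pad_edges(samples: Sequence[float], sample_rate: int, pad_s: float) -> List[float]:
    # Prepend and append pad_s seconds of digital silence.
    n = int(round(pad_s * sample_rate))
    return [0.0] * n + list(samples) + [0.0] * n
```

Fading before padding means each clip reaches zero amplitude exactly where the silent padding begins, so there is no discontinuity (and hence no click) at either join.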
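With diarization turns available, dominant-speaker filtering can be reduced to two steps: pick the speaker with the most talk time, then keep only clips that sit inside that speaker's turns with no one else talking. A sketch assuming turns are (speaker, start_s, end_s) tuples from whatever diarizer the hearsay[diarize] extra installs; dominant_speaker, keep_clip and the 0.95 threshold are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Turn = Tuple[str, float, float]  # (speaker label, start_s, end_s)


def dominant_speaker(turns: List[Turn]) -> Optional[str]:
    # Speaker with the largest total talk time, or None if there are no turns.
    totals: Dict[str, float] = defaultdict(float)
    for speaker, start, end in turns:
        totals[speaker] += max(0.0, end - start)
    return max(totals, key=totals.get) if totals else None


def _overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    return max(0.0, min(a1, b1) - max(a0, b0))


def keep_clip(start: float, end: float, turns: List[Turn], speaker: str,
              min_fraction: float = 0.95) -> bool:
    # Hypothetical rule: the target speaker must cover most of the clip
    # and no other speaker may overlap it at all.
    duration = end - start
    if duration <= 0:
        return False
    own = sum(_overlap(start, end, s, e) for spk, s, e in turns if spk == speaker)
    other = sum(_overlap(start, end, s, e) for spk, s, e in turns if spk != speaker)
    return own / duration >= min_fraction and other == 0
```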
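One common way to make a multi-source build resumable is to record each source only after it finishes, then skip recorded sources on the next run. A sketch assuming a JSON-lines state file; build_all, load_done and the state format are hypothetical and may differ from how hearsay tracks progress:

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_done(state_path: Path) -> Set[str]:
    # Sources recorded by earlier (possibly interrupted) runs.
    if not state_path.exists():
        return set()
    lines = state_path.read_text(encoding="utf-8").splitlines()
    return {json.loads(line)["source"] for line in lines if line.strip()}


def build_all(sources: Iterable[str], state_path: Path,
              process: Callable[[str], None]) -> List[str]:
    # Process each source once. A source is recorded only after process()
    # returns, so a crash mid-source means it is redone on the next run.
    done = load_done(state_path)
    processed: List[str] = []
    for source in sources:
        if source in done:
            continue
        process(source)
        with state_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"source": source}) + "\n")
        done.add(source)
        processed.append(source)
    return processed
```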

Notes

  • No new required dependencies: audio processing uses the ffmpeg that hearsay already needs, and diarization is the only extra (opt-in).
  • PyPI 0.2.0 (2026-06-14) was a maintenance release of the markdown engine that predates dataset mode, which is why this release is 0.3.0.
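Two-pass EBU R128 normalization can be done with ffmpeg alone. A sketch assuming ffmpeg's loudnorm filter: pass one measures loudness and prints a JSON summary to stderr, and pass two feeds those measurements back with linear=true. The helper names, the -23 LUFS / -1 dBTP / 7 LU targets and the 22050 Hz output rate are assumptions, not confirmed hearsay settings:

```python
import json
from typing import Dict, List


def first_pass_cmd(src: str, i: float = -23.0, tp: float = -1.0,
                   lra: float = 7.0) -> List[str]:
    # Measurement only: analyse with loudnorm and discard audio via the null muxer.
    return ["ffmpeg", "-hide_banner", "-i", src,
            "-af", f"loudnorm=I={i}:TP={tp}:LRA={lra}:print_format=json",
            "-f", "null", "-"]


def parse_measurements(stderr_text: str) -> Dict[str, str]:
    # loudnorm prints one flat JSON object at the end of stderr; take the last {...}.
    start = stderr_text.rfind("{")
    end = stderr_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON found in ffmpeg output")
    return json.loads(stderr_text[start:end + 1])


def second_pass_cmd(src: str, dst: str, m: Dict[str, str], i: float = -23.0,
                    tp: float = -1.0, lra: float = 7.0,
                    sample_rate: int = 22050) -> List[str]:
    # Apply the measured values; linear=true requests a single static gain.
    af = (f"loudnorm=I={i}:TP={tp}:LRA={lra}"
          f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
          f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
          f":offset={m['target_offset']}:linear=true")
    # loudnorm resamples internally, so set the output rate explicitly.
    return ["ffmpeg", "-hide_banner", "-y", "-i", src, "-af", af,
            "-ar", str(sample_rate), dst]
```

In use, run first_pass_cmd with subprocess.run(..., capture_output=True, text=True), pass result.stderr to parse_measurements, then run second_pass_cmd with the returned dict.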

Full details in CHANGELOG.md.