
Releases: mudassar531/hearsay

v0.3.0 — dataset export mode + clean clip boundaries

14 Jun 19:52
217db92


hearsay 0.3.0

The headline: hearsay now turns media into TTS/STT training datasets (hearsay dataset <SOURCE> + a web UI mode), alongside the existing markdown/JSON engine.

New

  • Dataset export mode — slice audio on word-level timestamps (never mid-word) into LJSpeech (metadata.csv), NeMo (manifest.jsonl), and HF audiofolder layouts, with a dataset_card.md and dropped.jsonl.
  • Quality filtering (on by default) + opt-in clipping detection.
  • Optional speaker diarization via the hearsay[diarize] extra: --diarize / --dominant-speaker (single-voice TTS) / --per-speaker.
  • --normalize (two-pass EBU R128) and --pad (edge padding) with a de-click fade for clean clip boundaries.
  • Resumable combined builds for playlists / feeds.
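To show what "slice on word-level timestamps (never mid-word)" plus quality filtering means in practice, here is a minimal sketch. It is not hearsay's code. The helpers `Word`, `group_words`, `build_rows`, `ljspeech_metadata` and `nemo_manifest`, the thresholds (10 s maximum clip length, 0.3 s pause, duration-only filtering) and the `clip_00000` naming are all assumptions for illustration. Clips are cut only in the gaps between words. Clips outside the duration bounds are recorded with a reason, the way dropped.jsonl is described above. Kept clips are rendered as LJSpeech metadata.csv rows (`id|text|normalized text`) and NeMo manifest lines (`audio_filepath`, `duration`, `text`).

```python
import json
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def group_words(words, max_len=10.0, min_gap=0.3):
    """Greedily group words into clips, cutting only between words.

    A new clip starts when the next word would push the clip past max_len
    seconds, or when the pause before it is at least min_gap seconds.
    """
    clips, current = [], []
    for w in words:
        if current:
            gap = w.start - current[-1].end
            too_long = w.end - current[0].start > max_len
            if too_long or gap >= min_gap:
                clips.append(current)
                current = []
        current.append(w)
    if current:
        clips.append(current)
    return clips


def build_rows(clips, min_len=1.0, max_len=10.0):
    """Split clips into kept rows (id, start, end, text) and dropped records."""
    kept, dropped = [], []
    for i, clip in enumerate(clips):
        clip_id = f"clip_{i:05d}"
        start, end = clip[0].start, clip[-1].end
        text = " ".join(w.text for w in clip)
        duration = end - start
        if duration < min_len or duration > max_len:
            # Mirrors the idea of dropped.jsonl: keep a reason for every rejection.
            dropped.append({"id": clip_id, "reason": "duration",
                            "duration": round(duration, 3), "text": text})
        else:
            kept.append((clip_id, start, end, text))
    return kept, dropped


def ljspeech_metadata(kept):
    # LJSpeech metadata.csv: id|transcription|normalized transcription, no header.
    # Simplification: the raw text is reused as the "normalized" column.
    return "\n".join(f"{cid}|{text}|{text}" for cid, _, _, text in kept)


def nemo_manifest(kept):
    # NeMo manifest: one JSON object per line (audio_filepath, duration, text).
    return "\n".join(
        json.dumps({"audio_filepath": f"wavs/{cid}.wav",
                    "duration": round(end - start, 3), "text": text})
        for cid, start, end, text in kept
    )
```

Writing the WAV slices themselves, for example with ffmpeg using each row's start and end, is left out to keep the sketch small.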
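The --pad and de-click fade idea can be sketched on raw samples. The hypothetical `pad_and_fade` helper below is an illustration, not hearsay's implementation. It ramps the clip's first and last few milliseconds linearly to zero, which removes the click from a cut that lands mid-waveform, and then adds silence on both edges. The 5 ms linear fade and 100 ms pad defaults are assumptions. hearsay's actual fade shape and lengths may differ.

```python
def pad_and_fade(samples, sample_rate, pad_s=0.1, fade_s=0.005):
    """Fade the clip's own edges to zero, then add silence on both sides.

    samples: list of floats in [-1.0, 1.0]; returns a new list.
    """
    # Never fade more than half the clip from each side.
    n_fade = min(round(fade_s * sample_rate), len(samples) // 2)
    out = list(samples)
    for i in range(n_fade):
        gain = i / n_fade  # 0.0 at the very edge, rising linearly towards 1.0
        out[i] *= gain
        out[-1 - i] *= gain
    pad = [0.0] * round(pad_s * sample_rate)
    return pad + out + pad
```

The fade is applied before padding so that the ramp sits on real audio rather than on the added silence.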
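"Two-pass EBU R128" usually means running FFmpeg's loudnorm filter twice. The first pass only measures the input and prints JSON. The second pass feeds those measurements back with `linear=true` so a single gain is applied. Below is a sketch that builds both command lines and parses the measurement. It does not describe how hearsay invokes ffmpeg. The function names, the targets (-23 LUFS integrated and -1 dBTP true peak from EBU R128, with loudnorm's default LRA of 7) and the 22050 Hz output rate are assumptions.

```python
import json

# Assumed targets: EBU R128 integrated loudness and max true peak; LRA is loudnorm's default.
TARGET = {"I": -23.0, "TP": -1.0, "LRA": 7.0}


def first_pass_args(src):
    """Measurement pass: analyze only, discard the audio output."""
    f = "loudnorm=I={I}:TP={TP}:LRA={LRA}:print_format=json".format(**TARGET)
    return ["ffmpeg", "-hide_banner", "-i", src, "-af", f, "-f", "null", "-"]


def parse_measurement(stderr_text):
    """loudnorm prints a flat JSON object at the end of ffmpeg's stderr."""
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    return json.loads(stderr_text[start:end])


def second_pass_args(src, dst, m, sample_rate=22050):
    """Apply pass: feed the measured values back and use linear normalization."""
    f = (
        "loudnorm=I={I}:TP={TP}:LRA={LRA}".format(**TARGET)
        + f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
        + f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
        + f":offset={m['target_offset']}:linear=true"
    )
    # loudnorm resamples internally, so pin the output sample rate explicitly.
    return ["ffmpeg", "-hide_banner", "-y", "-i", src, "-af", f,
            "-ar", str(sample_rate), dst]
```

In use, you would run `first_pass_args(...)` with `subprocess.run(..., capture_output=True, text=True)`, pass its `stderr` to `parse_measurement`, and then run the second command.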

Notes

  • No new required dependencies: audio processing uses the ffmpeg that hearsay already needs, and diarization is the only extra (opt-in).
  • PyPI 0.2.0 (2026-06-14) was a maintenance release of the markdown engine that predates dataset mode, which is why this release is 0.3.0.

Full details in CHANGELOG.md.

hearsay v0.1.0

13 Jun 10:36


crawl4ai for video & audio. One command turns any YouTube video, podcast episode, or local recording into clean, timestamped, chunked, LLM-ready markdown.

uv tool install hearsay        # or: pipx install hearsay
hearsay "https://www.youtube.com/watch?v=VIDEO_ID"

Highlights:

  • Captions-first (fast, no download) with automatic local Whisper fallback when a video has no captions; --transcribe forces Whisper transcription, and --no-vad disables voice-activity detection for music.
  • Readable paragraph grouping with real timestamps; chapters → sections.
  • Podcasts & YouTube playlists in batch; --json sidecar with a stable schema for RAG.
  • MCP server (hearsay mcp) so AI agents can ingest media themselves.
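"Readable paragraph grouping with real timestamps" can be pictured with a short sketch. It is not hearsay's algorithm or output format. The `Segment` type, the `fmt_ts`, `to_paragraphs` and `to_markdown` helpers, the 2 s pause threshold, the 600-character cap and the `[mm:ss]` prefix are assumptions for illustration. Caption or Whisper segments are merged into one paragraph until a long pause or the length cap, and each paragraph carries the start time of its first segment.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def fmt_ts(seconds):
    """Format seconds as mm:ss, or h:mm:ss past the first hour."""
    s = int(seconds)
    h, rem = divmod(s, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"


def to_paragraphs(segments, max_gap=2.0, max_chars=600):
    """Merge segments until a pause longer than max_gap or the length cap."""
    paragraphs, current = [], []
    for seg in segments:
        if current:
            gap = seg.start - current[-1].end
            length = sum(len(s.text) + 1 for s in current)
            if gap > max_gap or length + len(seg.text) > max_chars:
                paragraphs.append(current)
                current = []
        current.append(seg)
    if current:
        paragraphs.append(current)
    # Each paragraph is stamped with the real start time of its first segment.
    return [f"[{fmt_ts(p[0].start)}] " + " ".join(s.text.strip() for s in p)
            for p in paragraphs]


def to_markdown(title, segments):
    return f"# {title}\n\n" + "\n\n".join(to_paragraphs(segments)) + "\n"
```

Splitting on pauses tends to follow topic shifts better than a fixed segment count does. The character cap keeps chunks small enough for RAG.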

Requires Python 3.11+ and ffmpeg. MIT licensed.