Releases: mudassar531/hearsay
v0.3.0 — dataset export mode + clean clip boundaries
hearsay 0.3.0
The headline: hearsay now turns media into TTS/STT training datasets (`hearsay dataset <SOURCE>` plus a web UI mode), alongside the existing markdown/JSON engine.
New
- Dataset export mode: slice audio on word-level timestamps (never mid-word) into LJSpeech (`metadata.csv`), NeMo (`manifest.jsonl`), and HF `audiofolder` layouts, with a `dataset_card.md` and `dropped.jsonl`.
- Quality filtering (on by default) plus opt-in clipping detection.
- Optional speaker diarization via `hearsay[diarize]`: `--diarize` / `--dominant-speaker` (single-voice TTS) / `--per-speaker`.
- `--normalize` (two-pass EBU R128) and `--pad` (edge padding) with a de-click fade for clean clip boundaries.
- Resumable combined builds for playlists / feeds.
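Cutting only on word-level timestamps means every clip boundary coincides with a word's start or end time, so no clip can begin or end inside a word. The sketch below illustrates the idea under stated assumptions: words arrive as `(start, end, text)` tuples in seconds, and clips are packed greedily up to a hypothetical `max_s` span. The greedy strategy and the `Clip` / `slice_on_words` names are illustrative, not hearsay's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    start: float  # start time of the clip's first word, in seconds
    end: float    # end time of the clip's last word, in seconds
    text: str


def _make_clip(group):
    # Boundaries come straight from word timestamps, never from inside a word.
    return Clip(group[0][0], group[-1][1], " ".join(w[2] for w in group))


def slice_on_words(words, max_s=10.0):
    """Greedily pack consecutive words into clips (hypothetical helper).

    `words` is a list of (start, end, text) tuples sorted by start time.
    A clip is closed before adding a word that would stretch the span from
    the clip's first word start to that word's end beyond max_s. A single
    word longer than max_s still becomes its own clip rather than being split.
    """
    clips = []
    group = []
    for word in words:
        if group and word[1] - group[0][0] > max_s:
            clips.append(_make_clip(group))
            group = []
        group.append(word)
    if group:
        clips.append(_make_clip(group))
    return clips


words = [
    (0.0, 0.5, "hello"), (0.6, 1.0, "world"),
    (1.2, 1.8, "this"), (1.9, 2.5, "is"), (2.6, 3.4, "hearsay"),
]
for clip in slice_on_words(words, max_s=1.5):
    print(f"{clip.start:.1f}-{clip.end:.1f} {clip.text}")
# 0.0-1.0 hello world
# 1.2-2.5 this is
# 2.6-3.4 hearsay
```

A real splitter would likely prefer cutting at longer pauses rather than at the first word that overflows; greedy packing is simply the easiest way to show the word-boundary guarantee.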
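The three export layouts differ mainly in how each clip's transcript is recorded. A minimal sketch, assuming the widely used public conventions for each format: a pipe-separated LJSpeech `metadata.csv` with no header, a NeMo JSON-lines manifest with `audio_filepath` / `duration` / `text`, and a Hugging Face `audiofolder` `metadata.csv` whose required `file_name` column points at the audio. The helper names and the `transcription` column name are assumptions; hearsay's exact columns may differ.

```python
import csv
import io
import json


def ljspeech_line(clip_id, text):
    # LJSpeech metadata.csv row: id|transcription|normalized transcription,
    # pipe-separated with no header. Here the text is reused for both columns.
    if "|" in text:
        raise ValueError("LJSpeech rows are pipe-delimited; '|' in text would break them")
    return f"{clip_id}|{text}|{text}"


def nemo_line(audio_filepath, duration, text):
    # NeMo manifest.jsonl: one JSON object per clip, one clip per line.
    return json.dumps({"audio_filepath": audio_filepath, "duration": duration, "text": text})


def audiofolder_metadata(rows):
    # HF audiofolder metadata.csv: a header row plus a required `file_name`
    # column (audio path relative to the CSV). `transcription` is an assumed name.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["file_name", "transcription"])
    writer.writerows(rows)
    return buf.getvalue()


print(ljspeech_line("clip_0001", "hello world"))
print(nemo_line("wavs/clip_0001.wav", 1.0, "hello world"))
print(audiofolder_metadata([("wavs/clip_0001.wav", "hello, world")]), end="")
```

Using the `csv` module for the `audiofolder` file (rather than string joins) matters because transcripts often contain commas, which must be quoted.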
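Quality filtering and clipping detection can be framed as a per-clip keep/drop decision that also yields a reason string, the kind of record a drop log such as `dropped.jsonl` could hold. The thresholds, reason strings, and function names below are hypothetical. The clipping check measures the fraction of samples at or near full scale, which is one common heuristic and not necessarily the one hearsay uses.

```python
def clipping_ratio(samples, threshold=0.999):
    """Fraction of float samples (range -1.0..1.0) at or above `threshold` in magnitude."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if abs(s) >= threshold) / len(samples)


def check_clip(duration_s, text, samples, *, min_s=1.0, max_s=15.0,
               min_words=2, detect_clipping=False, max_clipped=0.001):
    """Return (keep, reason); reason is empty when the clip is kept.

    All thresholds here are hypothetical defaults for illustration.
    """
    if not min_s <= duration_s <= max_s:
        return False, "duration_out_of_range"
    if len(text.split()) < min_words:
        return False, "too_few_words"
    # Clipping detection is opt-in, mirroring the release note.
    if detect_clipping and clipping_ratio(samples) > max_clipped:
        return False, "clipped"
    return True, ""


loud = [1.0] * 10 + [0.1] * 990  # 1% of samples pinned at full scale
print(check_clip(3.0, "hello world", loud))
print(check_clip(3.0, "hello world", loud, detect_clipping=True))
```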
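For single-voice TTS, `--dominant-speaker` suggests keeping only the speaker who talks the most. A minimal sketch of that selection, assuming diarization yields `(start, end, speaker)` turns and that a clip is kept when at least a hypothetical 90% of its duration overlaps the chosen speaker's turns. The overlap rule and function names are assumptions, not hearsay's documented behavior.

```python
from collections import defaultdict


def dominant_speaker(turns):
    """Return the speaker label with the most total speaking time (or None)."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += max(0.0, end - start)
    if not totals:
        return None
    return max(totals, key=totals.get)


def clips_for_speaker(clips, turns, speaker, min_overlap=0.9):
    """Keep (start, end, text) clips mostly covered by `speaker`'s turns."""
    kept = []
    for c_start, c_end, text in clips:
        covered = 0.0
        for t_start, t_end, label in turns:
            if label == speaker:
                # Length of the intersection of the clip and this turn.
                covered += max(0.0, min(c_end, t_end) - max(c_start, t_start))
        length = c_end - c_start
        if length > 0 and covered / length >= min_overlap:
            kept.append((c_start, c_end, text))
    return kept


turns = [(0.0, 5.0, "A"), (5.0, 7.0, "B"), (7.0, 12.0, "A")]
clips = [(0.5, 4.5, "x"), (4.5, 6.0, "y"), (7.5, 11.0, "z")]
main = dominant_speaker(turns)
print(main, clips_for_speaker(clips, turns, main))
```

Clip `y` straddles the A/B handover, so it is dropped; that is the kind of mixed-voice clip a single-speaker TTS dataset wants to avoid.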
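Edge padding and a de-click fade both act on the ends of a clip: a cut that lands on a non-zero sample produces an audible click, and a short fade ramps the signal to zero first. A sketch on float samples in [-1.0, 1.0], assuming a linear ramp and hypothetical 5 ms fade / 100 ms pad defaults; hearsay's actual fade shape and lengths aren't stated in these notes.

```python
def pad_and_fade(samples, sample_rate, pad_s=0.1, fade_s=0.005):
    """Fade the clip's edges linearly to zero, then add pad_s of silence on each side."""
    out = list(samples)
    # Never fade more than half the clip, so the two ramps cannot overlap.
    n = min(int(round(fade_s * sample_rate)), len(out) // 2)
    for i in range(n):
        gain = i / n  # 0.0 at the very edge, rising toward 1.0
        out[i] *= gain
        out[-1 - i] *= gain
    pad = [0.0] * int(round(pad_s * sample_rate))
    return pad + out + pad


print(pad_and_fade([1.0] * 10, sample_rate=100, pad_s=0.02, fade_s=0.03))
```

Fading before padding means the clip joins the padding at exactly zero amplitude, so the padding itself cannot introduce a discontinuity.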
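Resumable builds for playlists and feeds amount to remembering which sources are already finished. A minimal sketch, assuming a JSON state file that lists finished sources and a caller-supplied `process` function (both hypothetical). The state is rewritten atomically after each success, so an interrupted build restarts at the first unfinished item.

```python
import json
from pathlib import Path


def build_all(sources, state_path, process):
    """Run process(source) for each source, skipping ones already recorded as done."""
    state_file = Path(state_path)
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    for source in sources:
        if source in done:
            continue
        process(source)
        done.add(source)
        # Write to a temp file, then rename over the old state, so a crash
        # mid-write never leaves a half-written state file behind.
        tmp = state_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(done)))
        tmp.replace(state_file)
    return done
```

Recording progress only after `process` returns means a source that fails partway is retried on the next run rather than silently skipped.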
Notes
- No new required dependencies: audio is handled by the ffmpeg that hearsay already needs, and diarization is the only extra (opt-in).
- PyPI 0.2.0 (2026-06-14) was a maintenance release of the markdown engine, before the dataset mode existed — hence this is 0.3.0.
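Two-pass EBU R128 normalization is commonly done with ffmpeg's `loudnorm` filter: a first pass measures the input and prints JSON, and a second pass feeds those measurements back for an accurate, linear gain. The sketch below only builds the ffmpeg arguments and parses the measurement; the -23 LUFS / -1 dBTP / 7 LU targets and the helper names are assumptions, not hearsay's settings.

```python
import json


def first_pass_args(path, i=-23.0, tp=-1.0, lra=7.0):
    # Measurement pass: loudnorm prints its stats as JSON on stderr; no output file.
    return ["ffmpeg", "-hide_banner", "-i", path,
            "-af", f"loudnorm=I={i}:TP={tp}:LRA={lra}:print_format=json",
            "-f", "null", "-"]


def parse_measurement(stderr_text):
    # The stats block is the last flat {...} object in ffmpeg's stderr output.
    start = stderr_text.rfind("{")
    end = stderr_text.rfind("}")
    return json.loads(stderr_text[start:end + 1])


def second_pass_filter(m, i=-23.0, tp=-1.0, lra=7.0):
    # Feed first-pass measurements back in. linear=true asks for a single gain
    # change; ffmpeg may fall back to dynamic mode if the targets can't be met linearly.
    return (f"loudnorm=I={i}:TP={tp}:LRA={lra}"
            f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
            f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
            f":offset={m['target_offset']}:linear=true")
```

In use, the first-pass arguments would be run with `subprocess.run(..., capture_output=True, text=True)`, its `stderr` passed to `parse_measurement`, and the resulting filter string used as `-af` in a second ffmpeg call that writes the output file.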
Full details in CHANGELOG.md.
hearsay v0.1.0
crawl4ai for video & audio. One command turns any YouTube video, podcast episode, or local recording into clean, timestamped, chunked, LLM-ready markdown.
```bash
uv tool install hearsay   # or: pipx install hearsay
hearsay "https://www.youtube.com/watch?v=VIDEO_ID"
```

Highlights:
- Captions-first (fast, no download) with automatic local Whisper fallback when a video has no captions; `--transcribe` to force it, `--no-vad` for music.
- Readable paragraph grouping with real timestamps; chapters → sections.
- Podcasts & YouTube playlists in batch; `--json` sidecar with a stable schema for RAG.
- MCP server (`hearsay mcp`) so AI agents can ingest media themselves.
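Grouping short caption segments into readable paragraphs while keeping real timestamps can be done by merging consecutive segments until a pause or a length limit is hit. A minimal sketch, assuming segments arrive as `(start, end, text)` in seconds; the 2-second gap, the 400-character limit, and the `[HH:MM:SS]` prefix are hypothetical choices, not hearsay's output format.

```python
def group_paragraphs(segments, max_gap_s=2.0, max_chars=400):
    """Merge caption segments into paragraphs that keep their first segment's start time."""
    paragraphs = []
    for start, end, text in segments:
        if paragraphs:
            last = paragraphs[-1]
            gap = start - last["end"]
            # Extend the current paragraph unless there was a long pause
            # or the paragraph would grow past max_chars.
            if gap <= max_gap_s and len(last["text"]) + 1 + len(text) <= max_chars:
                last["text"] += " " + text
                last["end"] = end
                continue
        paragraphs.append({"start": start, "end": end, "text": text})
    return paragraphs


def fmt_ts(seconds):
    """Format seconds as [HH:MM:SS]."""
    s = int(seconds)
    return f"[{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}]"


segs = [(0.0, 2.0, "Hi all."), (2.5, 4.0, "Welcome back."), (9.0, 11.0, "Today: audio.")]
for p in group_paragraphs(segs):
    print(f"{fmt_ts(p['start'])} {p['text']}")
# [00:00:00] Hi all. Welcome back.
# [00:00:09] Today: audio.
```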
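Turning chapters into sections comes down to assigning each paragraph to the last chapter that starts at or before it. A sketch using binary search over chapter start times, assuming chapters as `(start, title)` pairs sorted by time and paragraphs as `(start, text)`; the untitled leading section and the `##` headings are illustrative assumptions about the output.

```python
from bisect import bisect_right


def chapters_to_sections(chapters, paragraphs):
    """Group paragraph texts under the chapter containing each paragraph's start time."""
    sections = [{"title": None, "paragraphs": []}]  # content before the first chapter
    sections += [{"title": title, "paragraphs": []} for _, title in chapters]
    starts = [start for start, _ in chapters]
    for p_start, text in paragraphs:
        # Number of chapters that have begun by p_start; 0 means the leading section.
        idx = bisect_right(starts, p_start)
        sections[idx]["paragraphs"].append(text)
    if not sections[0]["paragraphs"]:
        sections.pop(0)  # drop the untitled lead-in if nothing landed there
    return sections


def to_markdown(sections):
    lines = []
    for sec in sections:
        if sec["title"]:
            lines.append(f"## {sec['title']}")
        lines.extend(sec["paragraphs"])
    return "\n\n".join(lines)


chapters = [(0.0, "Intro"), (60.0, "Main topic")]
paragraphs = [(5.0, "a"), (61.0, "b"), (120.0, "c")]
print(to_markdown(chapters_to_sections(chapters, paragraphs)))
```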
Requires Python 3.11+ and ffmpeg. MIT licensed.