A Claude Code skill that turns YouTube videos into cleanly-edited transcripts and structured summaries.
Given one or more YouTube URLs, it:
- Pulls the best available subtitle track via
yt-dlp(manual captions preferred, auto-generated as fallback). - Mechanically cleans the captions — fillers stripped, paragraphs grouped, chapters detected.
- Has Claude rewrite the transcript at paragraph granularity: speaker labels, sectioned by topic shift, voice preserved, no timestamps.
- Generates a separate summary file with the key ideas, notable quotes, and source metadata.
By default, files land in ~/yt-transcripts/:
~/yt-transcripts/
├── transcripts/
│ └── 20260101-some-video-title.md
└── summaries/
└── 20260101-some-video-title-summary.md
Override the location by setting YT_TRANSCRIPT_DIR:
export YT_TRANSCRIPT_DIR=~/Documents/yt-archive-
Clone this repo into your Claude Code skills directory:
git clone https://github.com/<your-fork>/yt-transcript ~/.claude/skills/yt-transcript
-
Install
yt-dlp:brew install yt-dlp # or: pip install yt-dlp -
(Optional) Set a custom output directory:
export YT_TRANSCRIPT_DIR=~/Documents/yt-archive
That's it. Restart Claude Code and /yt-transcript will be available.
In Claude Code:
/yt-transcript https://www.youtube.com/watch?v=...
/yt-transcript <url1> <url2> <url3>
/yt-transcript <url> --summary-only
- One URL or many — multiple URLs are processed sequentially.
--summary-onlyskips the slow paragraph-rewrite step and only produces the summary. The raw transcript is deleted afterward.- Shortened URLs (
youtu.be/...),/shorts/, and timestamped URLs all work.
- Claude Code
- Python 3.9+
yt-dlp- (Optional)
pandocif you want to convert outputs to.docx
The skill is two pieces:
transcribe.py— fetches the subtitle track, parses YouTube'sjson3caption format, deduplicates overlapping events, strips fillers, and groups tokens into paragraphs based on pause length and sentence boundaries. Outputs a "raw mechanical cleanup" markdown file.SKILL.md— instructs Claude to read the raw output and rewrite it semantically: identify speakers, group by topic, smooth surface stutters, preserve voice. Then write a separate summary.
The mechanical pass handles the tedious parts (dedupe, filler removal, paragraph chunking). The Claude pass handles the parts that need judgment (who's speaking, where do topics actually shift, what counts as a "key idea").
MIT — see LICENSE.