YouTube & Spotify podcast to LLM-ready transcript in one click. Runs locally, costs nothing.
*Demo video: How_to_transcribe.mp4*
Don't want to install anything? A hosted version is coming soon — no setup required. Join the waitlist →
```shell
git clone https://github.com/lifesized/youtube-transcriber.git
cd youtube-transcriber
npm run setup
npm run dev
```

Open http://localhost:19720 — paste a YouTube or Spotify podcast URL, hit Transcribe, done.

`npm run setup` installs all dependencies (yt-dlp, ffmpeg, Whisper, MLX on Apple Silicon) and configures everything automatically. Requires Node.js 18+, Python 3.8+, and a package manager (Homebrew / apt / dnf / pacman).
Transcribe any YouTube video or Spotify podcast episode directly from your browser without leaving the page. The extension opens as a persistent side panel — it stays open as you navigate between videos and detects each one automatically.
Note: The extension is not yet on the Chrome Web Store. Install it manually in a few steps while we go through the review process.
- Make sure the service is running (`npm run dev`)
- Open Chrome and go to `chrome://extensions`
- Enable Developer mode (toggle, top right)
- Click Load unpacked
- Select the `extension/` folder inside this repo
- Click the YouTube Transcriber icon in your toolbar to open the side panel
Navigate to any YouTube video or Spotify episode, open the side panel, and click Transcribe. The extension uses the same transcription pipeline as the web app — captions first, Whisper fallback, cloud providers if configured. For Spotify, it discovers the podcast's public RSS feed and transcribes the audio.
Transcripts open directly in the web app at http://localhost:19720 where you can search, copy, export, or send to an LLM.
A Chrome Web Store listing is coming soon — once published, installation will be one click.
Two options — pick the one that fits your setup:
*Demo video: Claude_Desktop_Transcribe_Summarize.mp4*
```shell
# Claude Code
cp -r contrib/claude-code ~/.claude/skills/youtube-transcriber

# OpenClaw
cp -r contrib/openclaw ~/.openclaw/skills/youtube-transcriber
```

Requires the app running at localhost:19720 (`npm run dev`). Gives you Whisper fallback, speaker diarization, persistent library, and all API features.

```shell
cp contrib/claude-code/SKILL-lite.md ~/.claude/skills/youtube-transcriber/SKILL.md
```

Works with just yt-dlp installed — no server needed. Extracts YouTube subtitles directly. Automatically upgrades to the full service when it detects it running.
| | Lite | Full |
|---|---|---|
| YouTube captions | Yes | Yes |
| Auto-generated subs | Yes | Yes |
| Whisper transcription | — | Yes |
| Speaker diarization | — | Yes |
| Persistent library | — | Yes |
| Setup | yt-dlp only | Node + Python + service |
```
"summarize https://youtube.com/watch?v=..."
"transcribe https://youtube.com/watch?v=..."
"s https://youtube.com/watch?v=..." / "t https://youtube.com/watch?v=..." / "ts https://youtube.com/watch?v=..."
```
Or just paste a YouTube URL — the skill auto-activates.
Works with Claude Desktop, Cursor, and any MCP-compatible client. During npm run setup, choose "yes" when prompted to install the MCP server.
```shell
npm run mcp:config   # prints config with your absolute path
```

Add the output to your client config (full setup guide):
| Client | Config file |
|---|---|
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Claude Code | .claude/mcp.json or claude mcp add |
| Cursor | .cursor/mcp.json |
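For Claude Desktop, the added entry typically looks like the sketch below. This is an illustration only — the server name, command, and script path are placeholders, and `npm run mcp:config` prints the exact values for your machine:

```json
{
  "mcpServers": {
    "youtube-transcriber": {
      "command": "node",
      "args": ["/absolute/path/to/youtube-transcriber/mcp/server.js"]
    }
  }
}
```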
Available tools: transcribe, transcribe_and_summarize, list_transcripts, search_transcripts, get_transcript, delete_transcript, summarize_transcript
Once configured, just type naturally in Claude Desktop:
| You type | What happens |
|---|---|
| `transcribe https://youtube.com/watch?v=...` | Fetches and returns the full transcript |
| `transcribe and summarize https://youtube.com/watch?v=...` | Fetches transcript, then Claude summarizes it |
| `t https://youtube.com/watch?v=...` | Shorthand for transcribe |
| `s https://youtube.com/watch?v=...` | Shorthand for summarize |
| `ts https://youtube.com/watch?v=...` | Shorthand for transcribe + summarize |
You can also just paste a YouTube URL — Claude will offer to transcribe it.
Paste a URL. The app grabs the transcript using the fastest method available on your system:
YouTube:
- YouTube Captions — fetches official captions when they exist (< 5 sec)
- Cloud Whisper — optional Groq, OpenRouter, or custom API with your own key (10-30 sec for 10 min)
- MLX Whisper — local GPU transcription on Apple Silicon (30-60 sec for 10 min)
- OpenAI Whisper — local CPU fallback that works everywhere (2-5 min for 10 min)
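The fallback order above can be sketched as a small shell function. This is illustrative only — the backend names and inputs are placeholders, not the app's actual internals:

```shell
# Illustrative sketch of the YouTube fallback order described above:
# captions -> cloud Whisper -> MLX Whisper -> local OpenAI Whisper
pick_backend() {
  has_captions=$1; has_cloud_key=$2; is_apple_silicon=$3
  if   [ "$has_captions" = yes ];     then echo "youtube-captions"
  elif [ "$has_cloud_key" = yes ];    then echo "cloud-whisper"
  elif [ "$is_apple_silicon" = yes ]; then echo "mlx-whisper"
  else                                     echo "openai-whisper"
  fi
}

pick_backend no yes no   # prints "cloud-whisper"
```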
Spotify Podcasts:
- Fetches episode metadata from Spotify's official API
- Discovers the podcast's public RSS feed via iTunes
- Downloads full episode audio from the podcast CDN
- Transcribes via Cloud Whisper or local Whisper
Spotify support requires `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` in `.env` (free from developer.spotify.com). Spotify-exclusive podcasts without a public RSS feed are not supported.
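A minimal `.env` fragment for Spotify support — the values are placeholders; substitute the credentials from your app at developer.spotify.com:

```shell
# .env — Spotify API credentials (placeholders)
SPOTIFY_CLIENT_ID="your-client-id"
SPOTIFY_CLIENT_SECRET="your-client-secret"
```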
Works fully offline by default for YouTube. Cloud Whisper is optional — bring your own API key to enable it.
- YouTube + Spotify — paste a YouTube video URL or Spotify podcast episode URL
- Local + cloud transcription — free local Whisper by default, optional cloud providers (Groq, OpenRouter, or custom endpoint) for faster results with your own API key
- Chrome extension — persistent side panel that transcribes YouTube videos and Spotify episodes from your browser
- Multi-language captions — request captions in any language YouTube supports (see Language Preference below)
- Summarize with LLM — send any transcript straight to ChatGPT or Claude. ChatGPT opens with the prompt pre-filled; Claude copies it to your clipboard so you can paste (⌘V) into a new chat
- Queue system — batch-process multiple videos
- Search & filter your transcript library
- Export as Markdown or copy to clipboard with timestamps
- Duplicate detection — same video won't be saved twice
- Speaker diarization — optional speaker identification with pyannote.audio
- SQLite storage — all data stays on your machine
- Fully offline-capable after initial setup
Full REST API docs: docs/API.md | OpenAPI spec: docs/openapi.yaml
Add one or more cloud providers in Settings (gear icon, bottom-left). Drag to reorder priority — the app tries each enabled provider in order, then falls back to local Whisper.
The fastest option — uses Groq's free Whisper API. No credit card required.
- Sign up at console.groq.com
- Go to API Keys → Create API Key
- Paste the key in Settings
Free tier limits: 14,400 audio-seconds per day (~4 hours). The Settings page shows a usage meter so you can track your quota.
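As a quick sanity check on that quota, assuming typical 10-minute videos:

```shell
# Groq free tier: 14,400 audio-seconds per day (~4 hours)
daily_quota=14400
video_seconds=$((10 * 60))               # one 10-minute video
echo $((daily_quota / video_seconds))    # prints 24 (10-minute videos per day)
echo $((daily_quota / 3600))             # prints 4  (hours per day)
```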
Access dozens of transcription models through a single API key, including Gemini 2.5 Flash.
- Sign up at openrouter.ai/keys
- Create an API key
- Paste the key in Settings — pick your model from the dropdown
Point to any OpenAI-compatible transcription API by providing a base URL, API key, and model name.
By default, the app fetches English captions. You can change this per-request or globally.
Per-request — pass lang in the API body:
```shell
curl -X POST http://localhost:19720/api/transcripts \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://youtube.com/watch?v=...", "lang": "es"}'
```

Multi-language priority — tries each language in order, falls back to first available:

```shell
-d '{"url": "...", "lang": "ja,en"}'   # Japanese preferred, English fallback
```

Global default — set in `.env`:

```shell
YTT_CAPTION_LANGS="zh-Hans,zh-Hant,en"
```

The MCP tools (`transcribe`, `transcribe_and_summarize`) also accept an optional `lang` parameter.
If the automated setup doesn't work or you prefer to do it yourself:
Expand manual steps
```shell
git clone https://github.com/lifesized/youtube-transcriber.git
cd youtube-transcriber

# Install Node dependencies
npm install

# Set up Python virtual environment
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Install Whisper
pip install openai-whisper

# Optional: MLX Whisper for Apple Silicon
pip install mlx-whisper

# Configure environment
cp .env.example .env
# Edit .env with your paths
```

Environment variables (`.env`):
```shell
DATABASE_URL="file:./dev.db"
WHISPER_CLI="/path/to/your/.venv/bin/whisper"
WHISPER_PYTHON_BIN="/path/to/your/.venv/bin/python3"

# Optional — local Whisper
# WHISPER_BACKEND="auto"        # auto, mlx, or openai
# WHISPER_DEVICE="auto"         # auto, cpu, mps
# WHISPER_TIMEOUT_MS="480000"

# Optional — cloud providers are configured in Settings (UI)
# Legacy env var still works for a single Groq key:
# WHISPER_CLOUD_API_KEY="gsk_..."
```

Windows paths:

```shell
WHISPER_CLI="C:\\Users\\YourName\\project\\.venv\\Scripts\\whisper.exe"
WHISPER_PYTHON_BIN="C:\\Users\\YourName\\project\\.venv\\Scripts\\python.exe"
```

After setup, verify everything is wired up correctly:

```shell
npm run test:setup
```

This checks Node.js, Python, ffmpeg, yt-dlp, Whisper, database, and environment configuration. Each check prints pass/fail with actionable fix messages. It runs automatically at the end of `npm run setup`.
For a running instance, hit the health endpoint:
```shell
curl http://localhost:19720/api/health
```

Returns JSON with per-check pass/fail — useful for Docker health checks or debugging. See docs/TESTING.md for the full test protocol.
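If you containerize the app, the endpoint can back a Dockerfile `HEALTHCHECK`. A sketch, assuming `curl` is available in the image and the service listens on 19720 inside the container:

```dockerfile
# Sketch: probe /api/health; curl -f exits non-zero on HTTP errors,
# which marks the container unhealthy after 3 failed retries.
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl -fsS http://localhost:19720/api/health || exit 1
```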
"spawn whisper ENOENT" error
- Check that the `WHISPER_CLI` and `WHISPER_PYTHON_BIN` paths in `.env` are correct
- Use absolute paths, not relative paths
- Restart the dev server after updating `.env`
Slow transcription
- Enable cloud Whisper for the fastest option: set `WHISPER_CLOUD_API_KEY` in `.env` (Groq free tier available)
- On Apple Silicon, install `mlx-whisper` for a 3-5x local speedup
- Use smaller Whisper models (`tiny`, `base`) for faster local results
- Set `WHISPER_BACKEND="mlx"` in `.env` to force MLX
Rate limiting / bot detection
- The app automatically tries multiple InnerTube clients
- Wait a few minutes and retry if YouTube blocks requests
- Disable VPN if you're getting consistent 403 errors
Contributions welcome — feel free to submit issues or pull requests.
If this saves you time, a ⭐ on GitHub helps others find it.
GNU Affero General Public License v3.0
Designed and built by lifesized.
Built with: Intent by Augment, Cursor, Codex, Claude Code, and Ghostty.