Watch and analyze videos using only your existing AI subscription. Works with YouTube, Vimeo, TikTok, X, Instagram (Reels), and any other site yt-dlp supports — plus local files.
Inspired by bradautomates/claude-video. Captions come straight from the source platform when available. For videos without captions (most Instagram Reels, TikToks, etc.), a local whisper.cpp runs on your machine — no API keys, no third-party services. As a last resort, falls back to vision-only analysis.
URL/file → yt-dlp → video.mp4 + (auto-captions.vtt | whisper.cpp transcript) → ffmpeg → frames/*.jpg
↓
Your agent reads selectively
yt-dlppulls the video (max 720p) and the platform's auto-generated captions if any.- If no captions,
whisper.cpp(running locally on CPU) transcribes the audio. Default model isbase(~140MB, multilingual), downloaded once to~/.cache/watch-yt/models/. ffmpegextracts frames at an adaptive FPS, downscaled to keep image-token cost low.- The orchestrator prints a markdown summary: metadata, transcript, frame index.
- Your agent reads the transcript and only the frames it actually needs to answer.
If both captioning paths fail (no captions, whisper.cpp not installed), the script switches to a denser frame sampling and the agent infers content from visuals alone.
Pick the trade-off you want:
| Mode | With transcript | Vision only | ~Image tokens |
|---|---|---|---|
fast |
15 frames @ 320px | 30 frames @ 480px | 10-20k |
balanced (default) |
25 frames @ 384px | 50 frames @ 512px | 25-60k |
accurate |
60 frames @ 512px | 100 frames @ 768px | 80-200k |
A transcript (captions or whisper.cpp output) adds a few hundred to a few thousand text tokens on top.
Works in Claude Code (CLI or desktop app), Cursor, Gemini CLI, or anything else with shell access:
"Install the watch skill from https://github.com/khou/watch-video"
The agent reads AGENTS.md, downloads the repo to ~/.local/share/watch-video, and symlinks it into ~/.claude/skills/watch (which Claude Code and Cursor auto-discover).
If you'd rather have Claude Code track updates for you, use the plugin marketplace instead:
/plugin marketplace add khou/watch-video
/plugin install watch-video@watch-video
brew install khou/watch-video/watch-video --HEAD
bash "$(brew --prefix watch-video)/libexec/install.sh"The first command installs ffmpeg, yt-dlp, and whisper-cpp as deps and drops the repo into $(brew --prefix)/opt/watch-video/libexec. The second symlinks it into ~/.claude/skills/watch for Claude Code and Cursor. Add --gemini to the final install.sh to also wire up Gemini CLI.
brew upgrade watch-video updates in place; the symlink follows automatically.
The same one-liner the agent runs, if you'd rather do it yourself:
mkdir -p ~/.local/share && \
curl -fsSL https://github.com/khou/watch-video/archive/refs/heads/main.tar.gz | \
tar xz -C ~/.local/share && \
rm -rf ~/.local/share/watch-video && \
mv ~/.local/share/watch-video-main ~/.local/share/watch-video && \
bash ~/.local/share/watch-video/install.shAdd --gemini to the final install.sh to also wire up Gemini CLI. If you've already cloned the repo, just bash install.sh from inside it.
ffmpeg, yt-dlp, and whisper.cpp install themselves the first time the script runs (Homebrew on macOS; apt/dnf/pacman + source build on Linux). No API keys. No third-party transcription service — Whisper runs locally.
macOS: install Homebrew first from https://brew.sh if you don't have it. Linux: building
whisper.cppneedsgit,cmake, andg++. The script skips it (and falls back to vision-only) if those aren't present. The consumer Claude Desktop app (claude.ai wrapper) is a separate product and isn't supported — its skills system is unrelated to Claude Code's.
Once installed, just ask the agent about a video URL:
Summarize https://www.youtube.com/watch?v=jNQXAC9IVRw
What does the speaker say about elephants in https://youtu.be/jNQXAC9IVRw ?
What's on the slide at 4:30 in https://youtu.be/<id> ?
To force a token budget:
Watch this in fast mode: https://...
Watch this in accurate mode: https://...
To skip the local-transcription fallback for a faster run on a caption-less video:
python3 scripts/watch.py <url> --no-transcribe
For full CLI options, run python3 scripts/watch.py --help.
MIT.