YouTube transcript ingestion and analysis pipeline — discover new videos, download transcripts with the full fallback chain (oEmbed → yt-dlp → yt-dlp+cookies → direct API → NotebookLM → Selenium → Whisper), and store results in CKS.
For implementation gotchas, recurring bugs, and lessons learned from live canaries, see CODEX_MEMORY.md.
# Check tracked channels for new videos
/yt-is sync
# Industrial Ingest (NLM Batch) - BEST FOR BACKLOG (worker-count dependent; benchmark sweep continues through 8 workers)
/yt-nlm
# Surgical Fetch (full transcript fallback chain)
/yt-is fetch

IMPORTANT: This package supports three different deployment modes. Choose the right one for your use case.
For: When you're actively developing this package and want instant feedback.
Setup:
# Windows (Junction - No admin required)
New-Item -ItemType Junction -Path "$CLAUDE_ROOT/skills\yt-is" -Target "$CLAUDE_ROOT/skills\yt-is"
New-Item -ItemType Junction -Path "$CLAUDE_ROOT/skills\yt-nlm" -Target "$CLAUDE_ROOT/skills\yt-nlm"

Key points:
- Skills are in P://.claude/skills/yt-is/ and P://.claude/skills/yt-nlm/
- Changes to skill files take effect immediately
- No reinstallation required
For: When you want yt-is and csf-source commands available in your terminal.
Setup:
# Symlink bin tools to a directory in your PATH
cmd /c "mklink P://bin/yt-is $CLAUDE_PLUGIN_ROOT/bin\yt-is"
cmd /c "mklink P://bin/csf-source $CLAUDE_PLUGIN_ROOT/bin\csf-source"

Key points:
- yt-is: channel management (sync, list, add, fetch)
- csf-source: backend for channel and transcript operations
- Both commands share the same SQLite database
For: Distributing this package to other users via marketplace or GitHub.
Setup:
# End users install via /plugin command
/plugin P://packages/yt-is
# Or from marketplace (when published)
/plugin install yt-is

Check all tracked YouTube channels for new videos and manage your channel list.
Commands:
- sync: Check all tracked channels for new videos
- list: List all tracked channels with metadata
- add <url>: Add a new channel or playlist to track
- fetch: Download transcripts for all pending videos using the full fallback chain
Escalation Chain (per video):
- yt-dlp (WEB client) — Fastest (~5 seconds), works for most public videos
- yt-dlp with cookies — For age-restricted videos
- Selenium Firefox — Fallback for bot-check failures (~15-30 seconds)
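The escalation order above can be sketched as a generic "try each stage until one succeeds" loop. This is a minimal illustration, not the package's actual code: the `Fetcher` type, `fetch_with_fallback`, and the three stand-in stage functions are hypothetical names, and the real stages would call yt-dlp or drive a browser.

```python
from collections.abc import Callable, Iterable
from typing import Optional

# Hypothetical type: a fetcher takes a video ID and returns transcript text,
# or None when it cannot produce one.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    video_id: str, chain: Iterable[tuple[str, Fetcher]]
) -> Optional[tuple[str, str]]:
    """Try each stage in order; return (stage_name, transcript) on first success."""
    for name, fetcher in chain:
        try:
            text = fetcher(video_id)
        except Exception:
            # A failing stage is not fatal; escalate to the next one.
            continue
        if text:
            return name, text
    return None


# Stand-in stages for illustration only (assumptions, not real fetchers).
def fast_stage(video_id: str) -> Optional[str]:
    raise RuntimeError("bot check")


def cookie_stage(video_id: str) -> Optional[str]:
    return None


def browser_stage(video_id: str) -> Optional[str]:
    return f"transcript for {video_id}"


chain = [
    ("yt-dlp", fast_stage),
    ("yt-dlp+cookies", cookie_stage),
    ("selenium", browser_stage),
]
result = fetch_with_fallback("abc123", chain)
```

Returning the stage name alongside the text makes it possible to record which stage produced each transcript.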
Extract YouTube transcripts using NotebookLM's batch notebook workflow.
Recommended approach: Worker-owned batch notebooks (one notebook per worker title, reused across batches; batch size 200) — uses nlm source content (raw text), has auth auto-recovery built in.
Auth contract for tests: nlm login covers the CLI path only. The DOM/spinner readiness path uses a separate persistent Chrome profile, which must be bootstrapped once with a signed-in browser session before DOM tests will work. yt-is also refreshes the NotebookLM CLI itself by running uv tool install --upgrade notebooklm-mcp-cli, unless YTIS_NLM_AUTO_UPDATE=0. It then probes nlm login --check and, if the latest release breaks login on this machine, falls back to the known-good pinned git spec set in YTIS_NLM_FALLBACK_SPEC.
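The refresh-then-probe behavior described above might look roughly like the following. Treat it as a sketch under assumptions: `refresh_nlm_cli`, the injectable `run` callable, and the returned status strings are hypothetical, and installing the pinned spec with `uv tool install --force` is a guess at how the fallback is applied. Only the `uv tool install --upgrade notebooklm-mcp-cli` and `nlm login --check` commands and the two environment variables come from the text.

```python
import os
import subprocess
from collections.abc import Callable, Mapping, Sequence
from typing import Optional

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def _run(argv: Sequence[str]) -> int:
    """Default runner: execute the command and report its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def refresh_nlm_cli(
    env: Optional[Mapping[str, str]] = None, run: Runner = _run
) -> str:
    """Return 'skipped', 'latest', 'fallback', or 'failed' (hypothetical labels)."""
    env = os.environ if env is None else env
    if env.get("YTIS_NLM_AUTO_UPDATE") == "0":
        return "skipped"  # auto-update disabled by the user

    # Upgrade to the latest release, then check that login still works.
    run(["uv", "tool", "install", "--upgrade", "notebooklm-mcp-cli"])
    if run(["nlm", "login", "--check"]) == 0:
        return "latest"

    # Latest release broke login: fall back to the pinned spec, if configured.
    spec = env.get("YTIS_NLM_FALLBACK_SPEC")
    if not spec:
        return "failed"
    run(["uv", "tool", "install", "--force", spec])  # assumed install command
    return "fallback" if run(["nlm", "login", "--check"]) == 0 else "failed"
```

Passing the runner in as a parameter keeps the decision logic testable without invoking `uv` or `nlm`.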
Old approach (deprecated): Ephemeral notebooks — one notebook per video, slow, wastes NotebookLM slots.
Channel management CLI wrapping csf-source.
yt-is sync # Check all tracked channels for new videos
yt-is list # List all tracked channels
yt-is add <url> # Add a new channel to track
yt-is fetch # Download pending transcripts (full fallback chain)
yt-is fetch --dry-run # Preview what would be fetched
yt-is fetch --source <url> # Process only one channel
yt-is fetch --workers 2 # Use 2 parallel workers

Backend implementation for channel and transcript operations.
csf-source list # List all tracked sources
csf-source add <url> # Add a new source
csf-source check <source> # Check one source for new videos
csf-source check-all # Check all sources for new videos
csf-source sync <source> # Process pending videos for a source
csf-source fetch # Download pending transcripts
csf-source fetch --dry-run # Preview what would be fetched

/yt-is sync
↓
RSS check → Gap detection → API resolution
↓
batch_status.sqlite (pending videos)
↓
/yt-nlm (Industrial Cloud Ingest) —— [PRIMARY: 99% Signal SNR]
↓ OR
/yt-is fetch (Surgical Local) —— [FALLBACK: 40% Signal SNR]
↓
transcripts.sqlite (Provenance-tracked Clean Store)
↓
Combined markdown batches → CKS / Obsidian / analysis tools
channel_metadata table (SQLite)
│
├─► yt-is sync ──► RSS check ──► Gap detection ──► API resolution
│                                                        │
│                                                        ▼
│                                          batch_status table (pending)
│
├─► yt-is fetch ──► FULL FALLBACK CHAIN ──► transcripts.sqlite
│
└─► /yt-nlm ──► Batch notebooks ──► nlm source content ──► transcripts.sqlite
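The "RSS check ──► Gap detection ──► API resolution" step in the diagrams can be illustrated with a small sketch. It assumes that a feed lists only a channel's most recent uploads, newest first, and that a gap is suspected when the feed contains no video already on record, so older uploads may have been missed and need resolving through the API. `detect_gap` and its arguments are hypothetical names, not the package's API.

```python
def detect_gap(feed_ids: list[str], known_ids: set[str]) -> tuple[list[str], bool]:
    """Return (new_ids, gap_suspected) for a feed listed newest first.

    Assumption: once we meet a video we already track, everything older
    in the feed is already tracked as well, so there is no gap.
    """
    new_ids: list[str] = []
    for video_id in feed_ids:
        if video_id in known_ids:
            return new_ids, False
        new_ids.append(video_id)
    # The whole feed was new: uploads older than the feed window may be
    # missing, so the caller should resolve them another way.
    return new_ids, bool(feed_ids)


new_ids, gap = detect_gap(["v5", "v4", "v3"], known_ids={"v3", "v2"})
```

In this example, `v5` and `v4` are new and the feed overlaps with known history, so no gap is suspected.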
- batch_status.sqlite — Channel metadata and video tracking
  - channel_metadata: tracked channels with playlist IDs
  - analysis_status: video status (pending/complete/failed), last_stage, failure_reason
- transcripts.sqlite — Cached transcripts keyed by video_id
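As a rough picture of the tracking store described above, the sketch below builds an in-memory SQLite database with the two named tables. The column layout is an assumption inferred from the field names in the list (`status`, `last_stage`, `failure_reason`, `video_id`); `channel_id`, `open_status_db`, `mark_failed`, and `pending_videos` are hypothetical, and the real schema may differ.

```python
import sqlite3

# Assumed schema: only the table names and the status fields come from the
# description above; every other column is illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS channel_metadata (
    channel_id  TEXT PRIMARY KEY,
    playlist_id TEXT
);
CREATE TABLE IF NOT EXISTS analysis_status (
    video_id       TEXT PRIMARY KEY,
    channel_id     TEXT,
    status         TEXT NOT NULL
                   CHECK (status IN ('pending', 'complete', 'failed')),
    last_stage     TEXT,
    failure_reason TEXT
);
"""


def open_status_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the tracking database and ensure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def mark_failed(conn: sqlite3.Connection, video_id: str, stage: str, reason: str) -> None:
    """Record which stage a video failed at, and why."""
    conn.execute(
        "UPDATE analysis_status SET status = 'failed', last_stage = ?, "
        "failure_reason = ? WHERE video_id = ?",
        (stage, reason, video_id),
    )
    conn.commit()


def pending_videos(conn: sqlite3.Connection) -> list[str]:
    """List video IDs still waiting for a transcript."""
    rows = conn.execute(
        "SELECT video_id FROM analysis_status WHERE status = 'pending' ORDER BY video_id"
    )
    return [row[0] for row in rows]


conn = open_status_db()
conn.executemany(
    "INSERT INTO analysis_status (video_id, status) VALUES (?, 'pending')",
    [("vid_a",), ("vid_b",)],
)
mark_failed(conn, "vid_b", "selenium", "bot check")
```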
| Variable | Required | Description |
|---|---|---|
| YOUTUBE_API_KEY | For gap resolution | YouTube Data API v3 key for filling RSS gaps |
| NLM_AUTH_TOKEN | For NotebookLM | NotebookLM session token |
| NLM_PROJECT_ID | For NotebookLM | GCP project ID for NotebookLM |
| YTIS_SCAN_STATUS_INTERVAL_S | Optional | Emit scan status heartbeats this often during /yt-is sync and csf-source fetch scans (default: 30) |
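To show how an optional setting such as YTIS_SCAN_STATUS_INTERVAL_S could be consumed, here is a minimal sketch. It assumes the value is a number of seconds (suggested by the `_S` suffix) and that invalid or non-positive values fall back to the documented default of 30. `scan_status_interval` and `should_emit` are hypothetical helpers.

```python
import os
from collections.abc import Mapping
from typing import Optional


def scan_status_interval(
    env: Optional[Mapping[str, str]] = None, default: float = 30.0
) -> float:
    """Read the heartbeat interval, falling back to the default on bad input."""
    env = os.environ if env is None else env
    raw = env.get("YTIS_SCAN_STATUS_INTERVAL_S", "")
    try:
        value = float(raw)
    except ValueError:
        return default
    # Reject zero, negative, and NaN values.
    return value if value > 0 else default


def should_emit(last_emit: float, now: float, interval: float) -> bool:
    """True when at least `interval` seconds have passed since the last heartbeat."""
    return now - last_emit >= interval


interval = scan_status_interval({"YTIS_SCAN_STATUS_INTERVAL_S": "5"})
```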
- Python 3.12+
- yt-dlp>=2024.0.0
- nlm CLI (NotebookLM command-line interface, auto-refreshed by yt-is via uv tool install --upgrade notebooklm-mcp-cli, with a pinned fallback if the latest build fails login)
- Firefox (for Selenium fallback)
yt-is/
├── bin/
│ ├── yt-is # Channel management CLI
│ └── csf-source # Backend implementation
├── csf/
│ ├── transcript.py # Transcript fetching (yt-dlp, NLM)
│ ├── batch_status.py # SQLite storage for video tracking
│ ├── source_enumerator.py # RSS + API enumeration
│ └── cache.py # Transcript caching
└── skills/
├── yt-is/SKILL.md # Channel management
├── yt-nlm/SKILL.md # NotebookLM batch extraction
└── yt-dlp/SKILL.md # Local yt-dlp transcript fetching
graph TB
User[/"User: /yt-is or /yt-nlm"/] --> Detect[Detect Skill Invoked]
Detect -->|yt-is| ChannelSkill[yt-is Skill]
Detect -->|yt-nlm| NLMSkill[yt-nlm Skill]
ChannelSkill --> CSFSource[csf-source backend]
NLMSkill --> NLMBatch[csf/nlm_batch.py]
CSFSource --> RSS[RSS Check]
CSFSource --> Gap[Gap Detection]
RSS --> DB[(batch_status.sqlite)]
Gap --> DB
NLMBatch --> NLM[NotebookLM Cloud]
NLM --> Cache[(transcripts.sqlite)]
DB --> NLMBatch
Key features:
- Automatic full fallback chain for transcript download
- Batch NotebookLM workflow with shared defaults in csf/nlm_config.py (notebook_batch_size = 50, notebook_source_cap = 50) and one notebook per worker title
- Auth auto-recovery for NotebookLM sessions
- Configurable NotebookLM policy via csf/nlm_config.py and the YTIS_NLM_* env vars it reads
- External transcript provider hook for custom sources
- Multi-terminal safe batch processing with InterProcessLock
- See PLAYBOOK_LINKS.md for the debugging playbook, handoff, and memory pointers.
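The shared-defaults idea from the feature list can be pictured with a small config object that environment variables may override. This is a sketch, not the contents of csf/nlm_config.py: the `NlmConfig` class, `load_config`, `chunk`, and the specific variable names `YTIS_NLM_NOTEBOOK_BATCH_SIZE` and `YTIS_NLM_NOTEBOOK_SOURCE_CAP` are assumptions. Only the two default values and the `YTIS_NLM_*` prefix come from the text.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NlmConfig:
    notebook_batch_size: int = 50
    notebook_source_cap: int = 50


def load_config(env: Optional[Mapping[str, str]] = None) -> NlmConfig:
    """Build the config from defaults, overridden by (hypothetical) env vars."""
    env = os.environ if env is None else env
    defaults = NlmConfig()

    def _int(name: str, fallback: int) -> int:
        try:
            return int(env[name])
        except (KeyError, ValueError):
            return fallback  # missing or malformed value keeps the default

    return NlmConfig(
        notebook_batch_size=_int(
            "YTIS_NLM_NOTEBOOK_BATCH_SIZE", defaults.notebook_batch_size
        ),
        notebook_source_cap=_int(
            "YTIS_NLM_NOTEBOOK_SOURCE_CAP", defaults.notebook_source_cap
        ),
    )


def chunk(video_ids: list[str], size: int) -> list[list[str]]:
    """Split pending video IDs into batches of at most `size` items."""
    return [video_ids[i : i + size] for i in range(0, len(video_ids), size)]


config = load_config({})
batches = chunk([f"vid{i}" for i in range(120)], config.notebook_batch_size)
```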