Small TypeScript CLI for resilient YouTube transcript ingestion.
The project fetches player metadata, selects caption tracks, downloads srv3 subtitles, normalizes them into plain text, and writes typed JSON artifacts for downstream analysis. The repo is intentionally scoped as a clean CLI/data-pipeline showcase rather than a larger app.
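The normalization step can be sketched as follows. This is a minimal, dependency-free sketch, not the project's code. It assumes srv3 cues look like `<p t="startMs" d="durationMs">...</p>`, optionally with `<s>` word segments and `<br/>` line breaks inside a cue. `Cue`, `parseSrv3`, `normalizeSrv3`, and `decodeEntities` are hypothetical names, and regex parsing stands in for whatever XML handling the repo actually uses.

```typescript
// Hypothetical sketch: flatten srv3 caption XML into compact plain text.
// Assumes cues look like <p t="startMs" d="durationMs">...</p>, optionally with
// <s> word-segment children and <br/> line breaks inside the cue.

interface Cue {
  startMs: number;
  durationMs: number;
  text: string;
}

const NAMED_ENTITIES: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
};

// Decode named and numeric XML entities in a single pass, so "&amp;lt;"
// becomes the literal text "&lt;" rather than "<".
function decodeEntities(s: string): string {
  return s.replace(/&(#x[0-9a-f]+|#\d+|amp|lt|gt|quot|apos);/gi, (match: string, name: string) => {
    if (name[0] !== "#") return NAMED_ENTITIES[name.toLowerCase()] ?? match;
    const isHex = name[1] === "x" || name[1] === "X";
    const code = parseInt(name.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}

function parseSrv3(xml: string): Cue[] {
  const cues: Cue[] = [];
  // Drop empty self-closing cues such as <p t="1" d="2"/> so the lazy match
  // below cannot run past them into the next cue.
  const body = xml.replace(/<p\b[^>]*\/>/g, "");
  const cueRe = /<p\b([^>]*)>([\s\S]*?)<\/p>/g;
  let m: RegExpExecArray | null;
  while ((m = cueRe.exec(body)) !== null) {
    const start = /\bt="(\d+)"/.exec(m[1]);
    const dur = /\bd="(\d+)"/.exec(m[1]);
    const raw = m[2]
      .replace(/<br\s*\/?>/gi, " ") // treat line breaks as spaces
      .replace(/<[^>]+>/g, ""); // strip <s> and any other inline tags
    const text = decodeEntities(raw).replace(/\s+/g, " ").trim();
    if (text) {
      cues.push({
        startMs: start ? Number(start[1]) : 0,
        durationMs: dur ? Number(dur[1]) : 0,
        text,
      });
    }
  }
  return cues;
}

// Join cue texts into one compact transcript string.
function normalizeSrv3(xml: string): string {
  return parseSrv3(xml)
    .map((cue) => cue.text)
    .join(" ");
}
```

The sketch keeps per-cue timings in `parseSrv3` so a caller could also emit timestamped text; `normalizeSrv3` discards them to produce a single compact string.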
- Fetches transcript data for a single video or a batch of video IDs
- Retries transient player request failures with backoff and cooldown windows
- Normalizes caption XML into compact text output
- Writes per-video records under `video_data/`
- Produces combined channel exports under `exports/`
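The retry behavior in the feature list can be sketched as below. This is an assumption about the approach, not the project's implementation: `withRetry` and `RetryOptions` are hypothetical names, the caller decides what counts as transient via `isTransient`, and a "cooldown window" is modeled as a longer pause taken after every `cooldownAfter`-th consecutive failure instead of the normal exponential backoff.

```typescript
// Hypothetical sketch: retry a request with exponential backoff, plus a longer
// cooldown pause after every `cooldownAfter`-th consecutive transient failure.

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // backoff before the first retry
  maxDelayMs: number; // upper bound for the exponential backoff
  cooldownAfter: number; // every N-th failure triggers a cooldown instead of a backoff
  cooldownMs: number; // length of a cooldown window
  isTransient: (err: unknown) => boolean; // decides whether an error is worth retrying
  sleep?: (ms: number) => Promise<void>; // injectable so tests need no real timers
}

const realSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(task: (attempt: number) => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? realSleep;
  const attempts = Math.max(1, opts.maxAttempts);
  for (let attempt = 1; ; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      // Give up on permanent errors or when the attempt budget is spent.
      if (!opts.isTransient(err) || attempt >= attempts) throw err;
      // Backoff doubles per attempt: base, 2*base, 4*base, ... capped at maxDelayMs.
      const backoff = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      const inCooldown = opts.cooldownAfter > 0 && attempt % opts.cooldownAfter === 0;
      await sleep(inCooldown ? opts.cooldownMs : backoff);
    }
  }
}
```

With `baseDelayMs: 100`, `cooldownAfter: 3`, and `cooldownMs: 5000`, three transient failures in a row produce pauses of 100 ms, 200 ms, and then a 5000 ms cooldown. Adding random jitter to `backoff` is a common refinement when several workers share the same rate limits.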

Quick start:

- Use Node `v22.22.0`
- Run `npm install`
- Copy `.env.example` to `.env` and set `YT_INTEL_API_KEYS`
- Fetch one video: `npm run fetch -- VIDEO_ID [lang] [channel]`
- Export combined channel data: `npm run export [channels...]`
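npm forwards everything after `--` to the script, so `VIDEO_ID [lang] [channel]` arrive as positional arguments. A minimal sketch of reading them, assuming the positional order shown in the usage line; the `parseFetchArgs` name, the 11-character video ID check, and the fallback to a configured default language are assumptions for illustration, not the CLI's confirmed behavior.

```typescript
// Hypothetical sketch: parse `npm run fetch -- VIDEO_ID [lang] [channel]`.

interface FetchArgs {
  videoId: string;
  lang: string;
  channel?: string;
}

// YouTube video IDs are 11 characters drawn from [A-Za-z0-9_-].
const VIDEO_ID_RE = /^[A-Za-z0-9_-]{11}$/;

function parseFetchArgs(argv: string[], defaultLang: string): FetchArgs {
  const [videoId, lang, channel] = argv;
  if (!videoId || !VIDEO_ID_RE.test(videoId)) {
    throw new Error("usage: npm run fetch -- VIDEO_ID [lang] [channel]");
  }
  // Missing optional arguments fall back to the configured language / no channel.
  return { videoId, lang: lang || defaultLang, channel: channel || undefined };
}
```

An entry point would call it as `parseFetchArgs(process.argv.slice(2), defaultLang)`, where `defaultLang` comes from whatever configuration the CLI loads.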
Preferred environment variables are app-scoped:

```dotenv
YT_INTEL_API_KEYS=your_api_key1,your_api_key2
YT_INTEL_USER_AGENTS=agent1,agent2
YT_INTEL_LANG=ru
YT_INTEL_MIN_DELAY=1.5
YT_INTEL_MAX_DELAY=3.5
YT_INTEL_LOG_LEVEL=info
```

Legacy names such as `API_KEYS` and `LANG` are still accepted for compatibility.
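One way to read these variables with the legacy fallbacks is sketched below. Several parts are assumptions: that every legacy name is the unprefixed form of its `YT_INTEL_*` counterpart (the source only confirms `API_KEYS` and `LANG`), that the delays are in seconds, and the default values themselves. `loadConfig`, `randomDelayMs`, and the helper names are hypothetical. The environment is passed in explicitly (for example `process.env`) so the function is easy to test.

```typescript
// Hypothetical sketch: load app config, preferring YT_INTEL_* names and
// falling back to assumed unprefixed legacy names.

type Env = Record<string, string | undefined>;

interface AppConfig {
  apiKeys: string[];
  userAgents: string[];
  lang: string;
  minDelaySec: number; // assumed to be seconds
  maxDelaySec: number;
  logLevel: string;
}

// Return the first non-empty value of YT_INTEL_<name>, then <name>.
function readEnv(env: Env, name: string): string | undefined {
  for (const key of [`YT_INTEL_${name}`, name]) {
    const value = env[key]?.trim();
    if (value) return value;
  }
  return undefined;
}

// Comma-separated list; blank entries are dropped.
function readList(env: Env, name: string): string[] {
  return (readEnv(env, name) ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function readNumber(env: Env, name: string, fallback: number): number {
  const raw = readEnv(env, name);
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

function loadConfig(env: Env): AppConfig {
  const apiKeys = readList(env, "API_KEYS");
  if (apiKeys.length === 0) throw new Error("Set YT_INTEL_API_KEYS (or legacy API_KEYS)");
  // Placeholder defaults; clamp so max is never below min.
  const minDelaySec = readNumber(env, "MIN_DELAY", 1.5);
  const maxDelaySec = Math.max(minDelaySec, readNumber(env, "MAX_DELAY", 3.5));
  return {
    apiKeys,
    userAgents: readList(env, "USER_AGENTS"),
    // Caution: legacy LANG is also the POSIX locale variable (e.g. "en_US.UTF-8").
    lang: readEnv(env, "LANG") ?? "en",
    minDelaySec,
    maxDelaySec,
    logLevel: readEnv(env, "LOG_LEVEL") ?? "info",
  };
}

// Random pause between requests, uniform in [min, max) seconds, returned in ms.
function randomDelayMs(cfg: AppConfig): number {
  return (cfg.minDelaySec + Math.random() * (cfg.maxDelaySec - cfg.minDelaySec)) * 1000;
}
```

Because the legacy `LANG` fallback collides with the shell locale variable, setting `YT_INTEL_LANG` explicitly is the safer choice in this sketch.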
Repository layout:

- `src/`: application code
- `tests/`: unit, CLI, and regression coverage
- `examples/`: small curated sample artifacts for quick review
- `video_data/`: per-video transcript records
- `exports/`: combined per-channel JSON exports
- `lists/`: channel input lists
- `archive/`: historical artifacts kept out of the main flow
Large local datasets are intentionally gitignored. The public repo contract is the code, docs, fixtures, and small representative samples, not the full working dataset from a local machine.
Verify a change with:

- `npm run typecheck`
- `npm test`
- `npm run build`
The current implementation uses the `youtubei/v1/player` endpoint as its source adapter. That detail is documented here because it affects reliability and maintenance, not because it is the project's main design goal.
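For orientation, a sketch of the caption-track selection step against a player response. The field path `captions.playerCaptionsTracklistRenderer.captionTracks`, the `kind: "asr"` marker for auto-generated tracks, and the `fmt=srv3` query parameter reflect commonly observed responses from this endpoint rather than a documented contract, so treat them as assumptions. The preference for manual captions over auto-generated ones, and the names `selectCaptionTrack` and `srv3Url`, are likewise hypothetical.

```typescript
// Hypothetical sketch: pick a caption track from a player response and build
// its srv3 download URL. Field names reflect observed responses, not a spec.

interface CaptionTrack {
  baseUrl: string;
  languageCode: string;
  kind?: string; // "asr" marks auto-generated captions (observed, not documented)
}

interface PlayerResponse {
  captions?: {
    playerCaptionsTracklistRenderer?: {
      captionTracks?: CaptionTrack[];
    };
  };
}

// Match "ru" against "ru" or regional variants like "ru-RU"; prefer manual tracks over ASR.
function selectCaptionTrack(player: PlayerResponse, lang: string): CaptionTrack | undefined {
  const tracks = player.captions?.playerCaptionsTracklistRenderer?.captionTracks ?? [];
  const wanted = lang.toLowerCase();
  const matches = tracks.filter((t) => {
    const code = t.languageCode.toLowerCase();
    return code === wanted || code.startsWith(`${wanted}-`);
  });
  return matches.find((t) => t.kind !== "asr") ?? matches[0];
}

// Request srv3 output by setting the fmt query parameter on the track URL.
function srv3Url(track: CaptionTrack): string {
  const url = new URL(track.baseUrl);
  url.searchParams.set("fmt", "srv3");
  return url.toString();
}
```

Isolating these response-shape assumptions in one small adapter function keeps the rest of the pipeline unaffected if the endpoint's payload changes.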
Further reading:

- `YT_INTEL.md` for the runbook and operational notes
- `README_DATA.md` for the data layout contract
- `examples/README.md` for tiny representative outputs
- `docs/architecture.md` for the code structure and design decisions