Transform YouTube videos into LLM-ready knowledge bases with a production-ready MCP backend.
Quick Start • Features • Documentation • MCP Tools
- **MCP Integration** - 12 AI-callable tools for seamless agent integration
- **Smart Chunking** - Semantic text chunking with timeline timestamps
- **Vector Embeddings** - 384-dimensional embeddings for semantic search
- **Full-Text Search** - Context-aware transcript search
- **SEO Intelligence** - AI-powered title, tag, and description optimization
- **Timeline Analysis** - Topic evolution and keyword density tracking
- **Microservices** - 11 independent, composable services
- **Type-Safe** - Pydantic models throughout
- **Async-First** - Non-blocking I/O operations
- **Multi-Backend** - ChromaDB, FAISS, Qdrant support
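To make the Smart Chunking idea concrete, here is a hypothetical sketch (not ytpipe's actual chunker): timestamped transcript segments are greedily merged into size-bounded chunks that inherit the first and last timestamps of the segments they contain. The `Chunk` type and `chunk_segments` helper are illustrative names only.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    start: float  # seconds into the video
    end: float

def chunk_segments(segments, max_chars=400):
    """Greedily merge (start, end, text) transcript segments into chunks
    no longer than max_chars, keeping the first/last timestamps."""
    chunks, buf, t0, t1 = [], [], None, None
    for start, end, text in segments:
        if buf and sum(len(t) for t in buf) + len(text) > max_chars:
            chunks.append(Chunk(" ".join(buf), t0, t1))
            buf, t0 = [], None
        if t0 is None:
            t0 = start
        buf.append(text)
        t1 = end
    if buf:
        chunks.append(Chunk(" ".join(buf), t0, t1))
    return chunks

segments = [(0.0, 2.5, "Hello and welcome."),
            (2.5, 6.0, "Today we cover vector search."),
            (6.0, 9.0, "Let's start with embeddings.")]
for c in chunk_segments(segments, max_chars=40):
    print(f"[{c.start:>5.1f}s] {c.text}")
```

Because chunks keep their source timestamps, every search hit can link straight back to the moment in the video where it was said.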
```bash
# Install
git clone https://github.com/leolech14/ytpipe.git
cd ytpipe
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Process a video
ytpipe "https://youtube.com/watch?v=dQw4w9WgXcQ"
```

Result: Metadata + Transcript + Semantic Chunks + Embeddings + Vector Storage
Start the MCP server:

```bash
python -m ytpipe.mcp.server
```

Then from Claude Code:

- "Process this video: https://youtube.com/watch?v=VIDEO_ID"
- "Search video dQw4w9WgXcQ for 'machine learning'"
- "Optimize SEO for video dQw4w9WgXcQ"
```bash
# Basic
ytpipe "https://youtube.com/watch?v=VIDEO_ID"

# Advanced
ytpipe URL --backend faiss --whisper-model large --verbose
```

Python API:

```python
from ytpipe.core.pipeline import Pipeline

pipeline = Pipeline(output_dir="./output")
result = await pipeline.process(url)

print(f"✅ {result.metadata.title}")
print(f"   Chunks: {len(result.chunks)}")
print(f"   Time: {result.processing_time:.1f}s")
```

MCP Tools:

| Tool | Description |
|---|---|
| `ytpipe_process_video` | Full pipeline |
| `ytpipe_download` | Download only |
| `ytpipe_transcribe` | Transcribe audio |
| `ytpipe_embed` | Generate embeddings |
| `ytpipe_search` | Full-text search |
| `ytpipe_find_similar` | Semantic search |
| `ytpipe_get_chunk` | Get chunk by ID |
| `ytpipe_get_metadata` | Get video info |
| `ytpipe_seo_optimize` | SEO recommendations |
| `ytpipe_quality_report` | Quality metrics |
| `ytpipe_topic_timeline` | Topic evolution |
| `ytpipe_benchmark` | Performance analysis |
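Under the hood, `ytpipe_find_similar`-style semantic search reduces to cosine similarity between a query embedding and the stored chunk embeddings. A minimal NumPy sketch, using random stand-in vectors rather than real sentence-transformers output (`find_similar` is a hypothetical helper, not ytpipe's API):

```python
import numpy as np

def find_similar(query_vec, chunk_vecs, top_k=3):
    """Rank stored chunk embeddings by cosine similarity to a query
    embedding and return (index, score) pairs, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity per stored chunk
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order]

rng = np.random.default_rng(0)
store = rng.normal(size=(5, 384))               # five 384-dim chunk vectors
query = store[2] + 0.01 * rng.normal(size=384)  # near-duplicate of chunk 2
print(find_similar(query, store, top_k=2))
```

A vector backend such as FAISS or Qdrant performs the same ranking, but with index structures that scale far beyond a brute-force matrix product.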
MCP Server (12 tools) → Pipeline Orchestrator → 11 Services → Pydantic Models
Services:
- Extractors (2): Download, Transcriber
- Processors (4): Chunker, Embedder, VectorStore, Docling
- Intelligence (4): Search, SEO, Timeline, Analyzer
- Exporters (1): Dashboard
8 Processing Phases:
1. Download → 2. Transcription → 3. Chunking → 4. Embeddings → 5. Export → 6. Dashboard → 7. Docling → 8. Vector Storage
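The phase ordering above can be sketched as a chain of awaited steps (a toy illustration of the async-first design; the real orchestrator's API differs, and `run_pipeline` here is a hypothetical name):

```python
import asyncio

async def run_pipeline(url):
    """Thread an artifact through the 8 phases in order; each phase
    yields control like a real non-blocking I/O operation would."""
    phases = ["download", "transcribe", "chunk", "embed",
              "export", "dashboard", "docling", "vector_store"]
    artifact = url
    for name in phases:
        await asyncio.sleep(0)   # stand-in for real async work
        artifact = f"{artifact} -> {name}"
    return artifact

result = asyncio.run(run_pipeline("VIDEO_ID"))
print(result)
```

Because each phase only consumes the previous phase's output, the services stay independently testable and composable.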
| Metric | Value |
|---|---|
| Processing Speed | 4-13x real-time |
| Memory Usage | <2GB peak |
| Chunk Quality | 85%+ high quality |
| Embedding Dimension | 384 |
- Python 3.8+
- FFmpeg (for audio extraction)
- 4GB+ RAM recommended
- GPU optional (CUDA for acceleration)
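A quick way to sanity-check the first two requirements before installing (this assumes the FFmpeg binary is on your `PATH` under its standard name `ffmpeg`):

```python
import shutil
import sys

def check_prereqs():
    """Return a list of missing prerequisites (empty if all satisfied)."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ required")
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    return problems

print(check_prereqs() or "All prerequisites satisfied")
```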
Contributions welcome! Please read CONTRIBUTING.md first.
MIT License - see LICENSE for details.
Built with:
- FastMCP - MCP server framework
- OpenAI Whisper - Speech-to-text
- sentence-transformers - Text embeddings
- Model Context Protocol - AI tool standard
Leonardo Lech
- Email: leonardo.lech@gmail.com
- GitHub: @leolech14
⭐ Star this repo if you find it useful!
Transform YouTube → Knowledge Base in seconds
