Transform YouTube videos into LLM-ready knowledge bases with a production-ready MCP backend.
Quick Start • Features • Documentation • MCP Tools
- **MCP Integration** - 12 AI-callable tools for seamless agent integration
- **Smart Chunking** - Semantic text chunking with timeline timestamps
- **Vector Embeddings** - 384-dimensional embeddings for semantic search
- **Full-Text Search** - Context-aware transcript search
- **SEO Intelligence** - AI-powered title, tag, and description optimization
- **Timeline Analysis** - Topic evolution and keyword density tracking
- **Microservices** - 11 independent, composable services
- **Type-Safe** - Pydantic models throughout
- **Async-First** - Non-blocking I/O operations
- **Multi-Backend** - ChromaDB, FAISS, Qdrant support
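To make the Smart Chunking idea concrete, here is a hypothetical sketch (not ytpipe's actual chunker): timestamped transcript segments are greedily merged into size-bounded chunks that inherit the first and last timestamps of the segments they contain. The `Chunk` type and `chunk_segments` helper are illustrative names only.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    start: float  # seconds into the video
    end: float

def chunk_segments(segments, max_chars=400):
    """Greedily merge (start, end, text) transcript segments into chunks
    no longer than max_chars, keeping the first/last timestamps."""
    chunks, buf, t0, t1 = [], [], None, None
    for start, end, text in segments:
        if buf and sum(len(t) for t in buf) + len(text) > max_chars:
            chunks.append(Chunk(" ".join(buf), t0, t1))
            buf, t0 = [], None
        if t0 is None:
            t0 = start
        buf.append(text)
        t1 = end
    if buf:
        chunks.append(Chunk(" ".join(buf), t0, t1))
    return chunks

segments = [(0.0, 2.5, "Hello and welcome."),
            (2.5, 6.0, "Today we cover vector search."),
            (6.0, 9.0, "Let's start with embeddings.")]
for c in chunk_segments(segments, max_chars=40):
    print(f"[{c.start:>5.1f}s] {c.text}")
```

Because chunks keep their source timestamps, every search hit can link straight back to the moment in the video where it was said.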
```bash
# Install
git clone https://github.com/leolech14/ytpipe.git
cd ytpipe
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Process a video
ytpipe "https://youtube.com/watch?v=dQw4w9WgXcQ"
```

Result: Metadata + Transcript + Semantic Chunks + Embeddings + Vector Storage
Start the MCP server:

```bash
python -m ytpipe.mcp.server
```

Then from Claude Code:

- "Process this video: https://youtube.com/watch?v=VIDEO_ID"
- "Search video dQw4w9WgXcQ for 'machine learning'"
- "Optimize SEO for video dQw4w9WgXcQ"
```bash
# Basic
ytpipe "https://youtube.com/watch?v=VIDEO_ID"

# Advanced
ytpipe URL --backend faiss --whisper-model large --verbose
```

Python API:

```python
from ytpipe.core.pipeline import Pipeline

pipeline = Pipeline(output_dir="./output")
result = await pipeline.process(url)

print(f"✅ {result.metadata.title}")
print(f"   Chunks: {len(result.chunks)}")
print(f"   Time: {result.processing_time:.1f}s")
```

MCP Tools:

| Tool | Description |
|---|---|
| `ytpipe_process_video` | Full pipeline |
| `ytpipe_download` | Download only |
| `ytpipe_transcribe` | Transcribe audio |
| `ytpipe_embed` | Generate embeddings |
| `ytpipe_search` | Full-text search |
| `ytpipe_find_similar` | Semantic search |
| `ytpipe_get_chunk` | Get chunk by ID |
| `ytpipe_get_metadata` | Get video info |
| `ytpipe_seo_optimize` | SEO recommendations |
| `ytpipe_quality_report` | Quality metrics |
| `ytpipe_topic_timeline` | Topic evolution |
| `ytpipe_benchmark` | Performance analysis |
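Under the hood, `ytpipe_find_similar`-style semantic search reduces to cosine similarity between a query embedding and the stored chunk embeddings. A minimal NumPy sketch, using random stand-in vectors rather than real sentence-transformers output (`find_similar` is a hypothetical helper, not ytpipe's API):

```python
import numpy as np

def find_similar(query_vec, chunk_vecs, top_k=3):
    """Rank stored chunk embeddings by cosine similarity to a query
    embedding and return (index, score) pairs, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity per stored chunk
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order]

rng = np.random.default_rng(0)
store = rng.normal(size=(5, 384))               # five 384-dim chunk vectors
query = store[2] + 0.01 * rng.normal(size=384)  # near-duplicate of chunk 2
print(find_similar(query, store, top_k=2))
```

A vector backend such as FAISS or Qdrant performs the same ranking, but with index structures that scale far beyond a brute-force matrix product.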
MCP Server (12 tools) → Pipeline Orchestrator → 11 Services → Pydantic Models
Services:
- Extractors (2): Download, Transcriber
- Processors (4): Chunker, Embedder, VectorStore, Docling
- Intelligence (4): Search, SEO, Timeline, Analyzer
- Exporters (1): Dashboard
8 Processing Phases:
1. Download → 2. Transcription → 3. Chunking → 4. Embeddings → 5. Export → 6. Dashboard → 7. Docling → 8. Vector Storage
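The phase ordering above can be sketched as a chain of awaited steps (a toy illustration of the async-first design; the real orchestrator's API differs, and `run_pipeline` here is a hypothetical name):

```python
import asyncio

async def run_pipeline(url):
    """Thread an artifact through the 8 phases in order; each phase
    yields control like a real non-blocking I/O operation would."""
    phases = ["download", "transcribe", "chunk", "embed",
              "export", "dashboard", "docling", "vector_store"]
    artifact = url
    for name in phases:
        await asyncio.sleep(0)   # stand-in for real async work
        artifact = f"{artifact} -> {name}"
    return artifact

result = asyncio.run(run_pipeline("VIDEO_ID"))
print(result)
```

Because each phase only consumes the previous phase's output, the services stay independently testable and composable.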
| Metric | Value |
|---|---|
| Processing Speed | 4-13x real-time |
| Memory Usage | <2GB peak |
| Chunk Quality | 85%+ high quality |
| Embedding Dimension | 384 |
- Python 3.8+
- FFmpeg (for audio extraction)
- 4GB+ RAM recommended
- GPU optional (CUDA for acceleration)
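A quick way to sanity-check the first two requirements before installing (this assumes the FFmpeg binary is on your `PATH` under its standard name `ffmpeg`):

```python
import shutil
import sys

def check_prereqs():
    """Return a list of missing prerequisites (empty if all satisfied)."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ required")
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    return problems

print(check_prereqs() or "All prerequisites satisfied")
```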
Contributions welcome! Please read CONTRIBUTING.md first.
MIT License - see LICENSE for details.
Built with:
- FastMCP - MCP server framework
- OpenAI Whisper - Speech-to-text
- sentence-transformers - Text embeddings
- Model Context Protocol - AI tool standard
Leonardo Lech
- Email: leonardo.lech@gmail.com
- GitHub: @leolech14
⭐ Star this repo if you find it useful!
Transform YouTube → Knowledge Base in seconds
