🎬 AI-Native YouTube Processing Pipeline - Transform videos into LLM-ready knowledge bases with MCP integration

leolech14/ytpipe

🎬 YTPipe - AI-Native YouTube Processing Pipeline

Python 3.8+ • License: MIT • MCP Compatible • Code style: black

Transform YouTube videos into LLM-ready knowledge bases with a production-ready MCP backend.

Quick Start • Features • Documentation • MCP Tools

✨ Features

  • 🤖 MCP Integration - 12 AI-callable tools for seamless agent integration
  • 🎯 Smart Chunking - Semantic text chunking with timeline timestamps
  • 🧠 Vector Embeddings - 384-dimensional embeddings for semantic search
  • 🔍 Full-Text Search - Context-aware transcript search
  • 📊 SEO Intelligence - AI-powered title, tag, and description optimization
  • ⏱️ Timeline Analysis - Topic evolution and keyword density tracking
  • 🏗️ Microservices - 11 independent, composable services
  • 🔐 Type-Safe - Pydantic models throughout
  • ⚡ Async-First - Non-blocking I/O operations
  • 🗄️ Multi-Backend - ChromaDB, FAISS, Qdrant support
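
To make the chunking-with-timestamps idea concrete, here is a minimal sketch. It is not ytpipe's implementation: it assumes transcript segments arrive as `(start, end, text)` tuples, and it uses simple size-based greedy merging as a stand-in for real semantic boundary detection. The names `Chunk` and `chunk_segments` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    text: str
    start: float  # seconds from the start of the video
    end: float


def chunk_segments(
    segments: List[Tuple[float, float, str]], max_chars: int = 200
) -> List[Chunk]:
    """Greedily merge (start, end, text) segments into chunks of at most max_chars.

    Assumption: size-based merging only; a semantic chunker would also
    look at sentence or topic boundaries.
    """
    chunks: List[Chunk] = []
    buf: List[str] = []
    start = end = 0.0
    for seg_start, seg_end, text in segments:
        # Flush the buffer if adding this segment would exceed the limit.
        if buf and len(" ".join(buf + [text])) > max_chars:
            chunks.append(Chunk(" ".join(buf), start, end))
            buf = []
        if not buf:
            start = seg_start  # a new chunk starts with this segment
        buf.append(text)
        end = seg_end
    if buf:
        chunks.append(Chunk(" ".join(buf), start, end))
    return chunks
```

Each chunk keeps the start time of its first segment and the end time of its last, so a search hit can be linked back to a position in the video.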

🚀 Quick Start

# Install
git clone https://github.com/leolech14/ytpipe.git
cd ytpipe
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Process a video
ytpipe "https://youtube.com/watch?v=dQw4w9WgXcQ"

Result: Metadata + Transcript + Semantic Chunks + Embeddings + Vector Storage


🎯 Usage Examples

MCP Server (AI Agents)

python -m ytpipe.mcp.server

Then from Claude Code:

"Process this video: https://youtube.com/watch?v=VIDEO_ID"
"Search video dQw4w9WgXcQ for 'machine learning'"
"Optimize SEO for video dQw4w9WgXcQ"

CLI (Humans)

# Basic
ytpipe "https://youtube.com/watch?v=VIDEO_ID"

# Advanced
ytpipe URL --backend faiss --whisper-model large --verbose

Python API (Developers)

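import asyncio

from ytpipe.core.pipeline import Pipeline


async def main() -> None:
    url = "https://youtube.com/watch?v=VIDEO_ID"
    pipeline = Pipeline(output_dir="./output")
    result = await pipeline.process(url)  # process() is awaited, so it must run inside a coroutine

    print(f"✅ {result.metadata.title}")
    print(f"   Chunks: {len(result.chunks)}")
    print(f"   Time: {result.processing_time:.1f}s")


asyncio.run(main())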

📋 MCP Tools

Pipeline (4 tools)

  • ytpipe_process_video - Full pipeline
  • ytpipe_download - Download only
  • ytpipe_transcribe - Transcribe audio
  • ytpipe_embed - Generate embeddings

Query (4 tools)

  • ytpipe_search - Full-text search
  • ytpipe_find_similar - Semantic search
  • ytpipe_get_chunk - Get chunk by ID
  • ytpipe_get_metadata - Get video info
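
For readers unfamiliar with semantic search, the sketch below shows the general idea behind a tool like ytpipe_find_similar: rank stored chunk embeddings by cosine similarity to a query embedding. It is an illustration under assumptions, not the tool's actual code. The function names and the in-memory `index` dictionary are hypothetical, and the toy vectors are 2-dimensional rather than 384-dimensional.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar(
    query: Sequence[float], index: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k (chunk_id, score) pairs, best match first."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical toy index: chunk id -> embedding.
index = {"chunk_a": [1.0, 0.0], "chunk_b": [0.0, 1.0], "chunk_c": [1.0, 1.0]}
matches = find_similar([1.0, 0.0], index, top_k=2)
```

A vector store backend such as FAISS performs the same kind of ranking with an index structure that avoids scanning every vector.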

Analytics (4 tools)

  • ytpipe_seo_optimize - SEO recommendations
  • ytpipe_quality_report - Quality metrics
  • ytpipe_topic_timeline - Topic evolution
  • ytpipe_benchmark - Performance analysis
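
As a rough illustration of what a topic timeline can measure, the following sketch computes keyword density per time bucket. It assumes chunks are available as `(start_seconds, text)` pairs and uses plain whitespace tokenization. The function name `keyword_timeline` is hypothetical and this is not the code behind ytpipe_topic_timeline.

```python
from collections import Counter
from typing import Dict, List, Tuple


def keyword_timeline(
    chunks: List[Tuple[float, str]], keyword: str, bucket_seconds: float = 60.0
) -> Dict[int, float]:
    """Return {bucket_index: keyword occurrences / total words} for each time bucket."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    needle = keyword.lower()
    for start, text in chunks:
        bucket = int(start // bucket_seconds)  # which time window this chunk starts in
        words = text.lower().split()
        totals[bucket] += len(words)
        hits[bucket] += words.count(needle)
    return {b: hits[b] / totals[b] for b in sorted(totals) if totals[b]}


# Hypothetical input: three chunks, two in the first minute and one in the second.
timeline = keyword_timeline(
    [(0.0, "python is fun"), (30.0, "I like python"), (70.0, "rust is fast")],
    keyword="python",
)
```

Plotting the resulting densities against bucket index shows where in the video a topic is concentrated.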

🏗️ Architecture

MCP Server (12 tools) → Pipeline Orchestrator → 11 Services → Pydantic Models

Services:

  • Extractors (2): Download, Transcriber
  • Processors (4): Chunker, Embedder, VectorStore, Docling
  • Intelligence (4): Search, SEO, Timeline, Analyzer
  • Exporters (1): Dashboard

8 Processing Phases:

  1. Download → 2. Transcription → 3. Chunking → 4. Embeddings →
  5. Export → 6. Dashboard → 7. Docling → 8. Vector Storage
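
The phase sequence above can be pictured as an orchestrator that awaits each phase in order and passes shared state along. The sketch below is a simplified illustration, not the project's orchestrator: `run_phases`, the state dictionary, and the two stub phases are hypothetical, and only two of the eight phases are shown.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

State = Dict[str, Any]
Phase = Callable[[State], Awaitable[State]]


async def run_phases(phases: List[Tuple[str, Phase]], state: State) -> State:
    """Await each phase in order, feeding it the state returned by the previous one."""
    for name, phase in phases:
        state = await phase(state)
        state.setdefault("completed", []).append(name)  # record progress
    return state


async def download(state: State) -> State:
    # Stub: a real phase would fetch the audio track here.
    return {**state, "audio": f"{state['video_id']}.wav"}


async def transcribe(state: State) -> State:
    # Stub: a real phase would run speech-to-text on the audio file.
    return {**state, "transcript": f"transcript of {state['audio']}"}


result = asyncio.run(
    run_phases(
        [("download", download), ("transcription", transcribe)],
        {"video_id": "VIDEO_ID"},
    )
)
```

Keeping each phase as an independent coroutine with the same signature makes it straightforward to run a subset of phases, which is what tools like ytpipe_download and ytpipe_transcribe suggest.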

📊 Performance

| Metric              | Value             |
|---------------------|-------------------|
| Processing Speed    | 4-13x real-time   |
| Memory Usage        | <2GB peak         |
| Chunk Quality       | 85%+ high quality |
| Embedding Dimension | 384               |
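
To translate the processing speed figure into wall-clock time, the sketch below assumes that "Nx real-time" means N seconds of video are processed per second of compute. That reading is an assumption, and `processing_time_range` is a hypothetical helper, not part of ytpipe.

```python
from typing import Tuple


def processing_time_range(
    duration_s: float, min_factor: float = 4.0, max_factor: float = 13.0
) -> Tuple[float, float]:
    """Return (fastest, slowest) expected processing time in seconds."""
    return duration_s / max_factor, duration_s / min_factor


# A one-hour video under the assumed 4-13x range.
fastest, slowest = processing_time_range(3600)
print(f"{fastest / 60:.1f} to {slowest / 60:.1f} minutes")
```

Under this assumption a one-hour video would take roughly 4.6 to 15 minutes, depending on hardware and the Whisper model chosen.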

🔧 Requirements

  • Python 3.8+
  • FFmpeg (for audio extraction)
  • 4GB+ RAM recommended
  • GPU optional (CUDA for acceleration)

📖 Documentation


🤝 Contributing

Contributions welcome! Please read CONTRIBUTING.md first.


📝 License

MIT License - see LICENSE for details.


🙏 Credits

Built with:


📧 Contact

Leonardo Lech


⭐ Star this repo if you find it useful!

Transform YouTube → Knowledge Base in seconds
