Production-grade pipeline for generating 15-20 minute AI videos from text prompts using open-source models.
This pipeline transforms a single text prompt into a complete long-form video by:
- Splitting the concept into scenes using an LLM
- Planning individual camera shots for each scene
- Generating 3-12 second video clips using AI models
- Maintaining continuity between clips via frame analysis
- Creating audio (narration + background music)
- Stitching everything with FFmpeg into a final MP4
User Prompt → Scene Splitter → Shot Planner → Clip Generator → Continuity Manager
                                                                      ↓
                              Audio Pipeline → FFmpeg Stitcher → Final Video
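To get a sense of the scale a run implies, the snippet below estimates how many individual clips the planner has to produce for a given target length, using the 8-second default clip duration from the `PipelineConfig` example further down (the real count depends on the shot planner's output):

```python
# Back-of-the-envelope clip count for a target duration at the default 8 s clip length.
import math

def clips_needed(target_minutes: float, clip_seconds: float = 8.0) -> int:
    return math.ceil(target_minutes * 60 / clip_seconds)

print(clips_needed(1.5))   # 90-second test video   -> 12 clips
print(clips_needed(12))    # 12-minute documentary  -> 90 clips
print(clips_needed(20))    # 20-minute upper target -> 150 clips
```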
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 16GB | 64GB |
| GPU VRAM | None (cloud) | 24GB+ (local) |
| Storage | 50GB | 200GB |
| CPU | 8 cores | 16+ cores |
Note: This pipeline is designed for cloud-first execution (Replicate, HuggingFace). Local video generation requires 12GB+ VRAM.
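If you do want to try local generation, a simple runtime check of available VRAM can decide whether to fall back to a cloud backend. This is an illustrative helper, not part of the pipeline, and it assumes PyTorch is installed:

```python
# Illustrative backend selection based on available VRAM (assumes PyTorch is installed).
import torch

def pick_backend(min_vram_gb: float = 12.0) -> str:
    if torch.cuda.is_available():
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if vram_gb >= min_vram_gb:
            return "local"
    return "replicate"  # cloud-first default

print(pick_backend())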
cd AI_VIDEO
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# For video generation
export REPLICATE_API_TOKEN="your_token_here"
# For scene splitting and HuggingFace models
export HF_TOKEN="your_token_here"

Get free tokens:
- Replicate: https://replicate.com (50 free predictions/month)
- HuggingFace: https://huggingface.co/settings/tokens (free tier available)
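Before kicking off a long run, it is worth confirming the tokens are actually visible to Python; a minimal check using the same variable names as the exports above:

```python
# Quick sanity check that the API tokens are set in the current environment.
import os

for name in ("REPLICATE_API_TOKEN", "HF_TOKEN"):
    value = os.environ.get(name)
    print(f"{name}: {'set' if value else 'MISSING'}")
```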
# Dry run to test setup
python -m pipeline.orchestrator --dry-run --example
# Generate a 90-second test video
python -m pipeline.orchestrator "A cinematic sunset over mountains" --duration 1.5
# Full 12-minute documentary
python -m pipeline.orchestrator "Your video concept here" --duration 12

AI_VIDEO/
├── pipeline/                  # Core pipeline modules
│   ├── __init__.py
│   ├── orchestrator.py        # Main coordinator
│   ├── scene_splitter.py      # Prompt → Scenes (LLM)
│   ├── shot_planner.py        # Scenes → Shots
│   ├── prompt_templates.py    # Jinja2 templates
│   ├── clip_worker.py         # Video generation
│   ├── continuity_manager.py  # Visual continuity
│   └── audio_pipeline.py      # TTS + Music
├── scripts/
│   └── ffmpeg_tools.sh        # FFmpeg commands
├── examples/
│   └── salt_flats_shots.json  # Sample shot sequence
├── tests/
│   └── test_prompts.py        # Unit tests
├── config.yaml                # Configuration
├── requirements.txt           # Dependencies
├── sample_run.sh              # Example script
└── manifest.json              # Project manifest
from pipeline import VideoPipeline
pipeline = VideoPipeline()
result = pipeline.run(
user_prompt="A serene Japanese garden in autumn, falling maple leaves, koi pond",
target_duration=5.0 # minutes
)
print(f"Final video: {result['final_video']}")from pipeline.orchestrator import PipelineConfig, VideoPipeline
config = PipelineConfig(
resolution=1080,
default_clip_duration=8.0,
primary_backend="replicate",
tts_engine="bark",
skip_music=False
)
pipeline = VideoPipeline(config)

python -m pipeline.orchestrator "Your prompt" \
--duration 10 \ # Target duration in minutes
--resolution 1080 \ # 720 or 1080
--backend replicate \ # replicate, huggingface, local
--skip-audio \ # Skip audio generation
--dry-run # Test without generation

The pipeline includes a complete example for a 12-minute documentary:
# View the sample shot breakdown
cat examples/salt_flats_shots.json
# Generate using the example
python -m pipeline.orchestrator --example --duration 2

Sample Scenes:
- Opening - Empty Horizon (vast salt flats, dusk)
- First Glimpse - The Traveler (silhouette in green coat)
- Lost Machine - The Plane (rusted aircraft in salt)
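You can also inspect the sample shot sequence programmatically. The key and field names below (`shots`, `scene`, `prompt`, `duration`) are assumptions for illustration; check the JSON file itself for the actual schema.

```python
# Inspect the bundled example shot list; key and field names here are assumptions,
# so adjust them to the file's actual schema.
import json

with open("examples/salt_flats_shots.json") as f:
    data = json.load(f)

shots = data.get("shots", data) if isinstance(data, dict) else data  # assumed top-level layout
print(f"{len(shots)} shots in the example")
for shot in shots[:3]:
    print(shot.get("scene"), "|", shot.get("prompt"), f"({shot.get('duration')}s)")
```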
Edit config.yaml to customize:
video:
  resolution: 720
  fps: 24
  default_clip_duration: 6

backend:
  primary: "replicate"
  fallback: "huggingface"

audio:
  tts:
    engine: "bark"
    voice: "v2/en_speaker_6"
  music:
engine: "musicgen"# Run unit tests
# Run unit tests
python -m pytest tests/ -v
# Validate FFmpeg installation
bash scripts/ffmpeg_tools.sh --validate
# Test module imports
python -c "from pipeline import VideoPipeline; print('OK')"The pipeline tracks:
| Metric | Description | Target |
|---|---|---|
| Continuity Score | Visual consistency between clips | >0.8 |
| Audio Sync | Narration/video alignment | ±0.5s |
| Color Match | Cross-clip color consistency | ΔE < 10 |
| Generation Success | Clips generated vs planned | >95% |
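The real scoring lives in `continuity_manager.py`. As a rough illustration of the idea behind the continuity score, you can compare the last frame of one clip against the first frame of the next, for example with normalized color histograms; this is a simplified stand-in, not the pipeline's actual metric, and assumes Pillow and NumPy:

```python
# Simplified stand-in for a continuity check: histogram similarity between two frames.
# Not the pipeline's actual metric; assumes Pillow and NumPy are installed.
import numpy as np
from PIL import Image

def frame_similarity(frame_a: str, frame_b: str, bins: int = 32) -> float:
    """Return a 0-1 similarity score based on per-channel color histograms."""
    def hist(path: str) -> np.ndarray:
        img = np.asarray(Image.open(path).convert("RGB").resize((256, 256)))
        channels = [np.histogram(img[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
        h = np.concatenate(channels).astype(float)
        return h / h.sum()

    a, b = hist(frame_a), hist(frame_b)
    return float(np.minimum(a, b).sum())  # histogram intersection: 1.0 = identical distributions

# Example: flag a cut that likely breaks visual continuity
# if frame_similarity("clip_03_last.png", "clip_04_first.png") < 0.8: ...
```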
flowchart TD
A[User Prompt] --> B[Scene Splitter]
B --> C[Shot Planner]
C --> D{For Each Shot}
D --> E[Get Continuity Context]
E --> F[Generate Clip]
F --> G[Extract Last Frame]
G --> H[Analyze for Continuity]
H --> D
D --> I[Audio Pipeline]
I --> J[TTS Narration]
I --> K[Music Generation]
J --> L[FFmpeg Stitch]
K --> L
F --> L
L --> M[Final MP4]
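The "Extract Last Frame" and stitch steps map onto standard FFmpeg invocations. The calls below show one way to drive them from Python and are only a sketch; the commands that ship with the repo are in `scripts/ffmpeg_tools.sh`.

```python
# Sketch of the FFmpeg steps from Python; the repo's own commands live in scripts/ffmpeg_tools.sh.
import subprocess

def extract_last_frame(clip: str, out_png: str) -> None:
    # Seek to ~0.5 s before the end of the clip and grab a single frame for continuity analysis.
    subprocess.run(
        ["ffmpeg", "-y", "-sseof", "-0.5", "-i", clip, "-update", "1", "-frames:v", "1", out_png],
        check=True,
    )

def concat_clips(clips: list[str], out_mp4: str) -> None:
    # Concat demuxer: write a file list, then stream-copy the clips into one MP4.
    with open("clips.txt", "w") as f:
        f.writelines(f"file '{c}'\n" for c in clips)
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "clips.txt", "-c", "copy", out_mp4],
        check=True,
    )
```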
- Batch Processing: Generate clips overnight to work within free tier limits (see the sketch after this list)
- Checkpointing: Pipeline saves progress; resume with `--continue-from`
- Local Testing: Use `--backend local` for fast pipeline testing (placeholders only)
- Resolution: Start with 720p; upscale final video if needed
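For the batch-processing tip, the documented Python API is enough to queue several prompts overnight. This sketch relies only on `VideoPipeline.run` as shown earlier; the prompt list is hypothetical.

```python
# Overnight batch run over several prompts using the documented VideoPipeline API.
from pipeline import VideoPipeline

prompts = [  # hypothetical list of concepts
    "A cinematic sunset over mountains",
    "A serene Japanese garden in autumn, falling maple leaves, koi pond",
]

pipeline = VideoPipeline()
for prompt in prompts:
    try:
        result = pipeline.run(user_prompt=prompt, target_duration=2.0)  # short runs to stay within free tiers
        print("done:", result["final_video"])
    except Exception as exc:  # keep going if one generation fails; rerun later from checkpoints
        print("failed:", prompt, exc)
```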
export REPLICATE_API_TOKEN="your_token"
# or
export HF_TOKEN="your_token"Install FFmpeg: https://ffmpeg.org/download.html
Use cloud backends instead:
python -m pipeline.orchestrator "prompt" --backend replicateWait and retry, or use checkpoints to resume later.
MIT License - see LICENSE file.
- Video Models: Stability AI, Replicate, HuggingFace
- Audio: Bark TTS, MusicGen
- Processing: FFmpeg