Production-ready pipeline for audio-driven animation in Blender
A configuration-first, modular system demonstrating Blender automation, audio analysis integration, and headless rendering architecture.
A fully functional pipeline that transforms audio files into animated videos with synchronized lip movements, beat-reactive gestures, and timed lyrics — all driven by YAML configuration files instead of manual animation.
But more importantly: A technical demonstration of production-ready Blender automation, showcasing:
- ✅ Configuration-first architecture (no code changes for different outputs)
- ✅ Headless rendering (cloud/container deployment ready)
- ✅ Modular 4-phase pipeline with clean separation of concerns
- ✅ Extensible plugin system (easy to add new animation modes)
- ✅ Real-world performance benchmarks (tested in cloud environments)
Use Case: Automated music video generation (lyric videos, podcasts, educational content)
Learning Value: Demonstrates Blender Python API patterns, audio analysis integration, and pipeline architecture rarely documented elsewhere.
# 1. Install dependencies
pip install -r requirements.txt
# 2. Install Blender 4.0+ and FFmpeg
# https://www.blender.org/download/
# https://ffmpeg.org/download.html
# 3. Run the pipeline with test config (renders in 4-6 minutes)
python main.py --config config_ultra_fast.yaml
# 4. Find output video
ls outputs/ultra_fast/ultra_fast.mp4

Result: 30-second video with animated mascot, lip sync, and lyrics.
- ARCHITECTURE.md - System design, data flow, extension points, deployment patterns
- DEVELOPER_GUIDE.md - Step-by-step tutorials for adding modes, effects, and audio analysis
- CASE_STUDIES.md - Real-world benchmarks, cloud rendering, performance optimization
- TESTING_GUIDE.md - Quality/speed configurations, testing workflow
- AUTOMATED_LYRICS_GUIDE.md - Whisper integration for auto lyrics timing
- POSITIONING_GUIDE.md - Scene layout and debug visualization
- PIPELINE_TEST_EVALUATION.md - Complete test results from cloud environment
- CROSS_PLATFORM_DEV_GUIDE.md - Windows/Linux development setup
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Phase 1   │────▶│   Phase 2    │────▶│   Phase 3   │
│  Audio Prep │     │  Rendering   │     │   Export    │
│             │     │              │     │             │
│  - Beats    │     │ - 2D/3D Mode │     │  - MP4      │
│  - Phonemes │     │ - Lip Sync   │     │  - H.264    │
│  - Lyrics   │     │ - Gestures   │     │ - Audio Sync│
└─────────────┘     └──────────────┘     └─────────────┘
       ↓                   ↓                    ↓
prep_data.json         PNG frames           final.mp4
Key Design Principles:
- Separation of concerns: Each phase independent, cacheable outputs
- Configuration over code: YAML drives all behavior
- Extensibility: Plugin-style animation modes
- Production-ready: Headless rendering, error handling, validation
See ARCHITECTURE.md for complete system design.
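For orientation, here is a minimal sketch of how an orchestrator like main.py can chain the three phases with cacheable intermediates. This is illustrative only: the real main.py adds validation, phase selection, and error handling, and the arguments forwarded to blender_script.py after `--` are an assumption.

```python
# Illustrative orchestration sketch -- not the actual main.py.
import subprocess
import sys

def run_pipeline(config_path: str) -> None:
    # Phase 1: audio analysis -> prep_data.json (cacheable, reusable across renders)
    subprocess.run([sys.executable, "prep_audio.py", "assets/song.wav",
                    "--output", "outputs/prep_data.json"], check=True)
    # Phase 2: headless Blender render -> PNG frames
    subprocess.run(["blender", "--background", "--python", "blender_script.py",
                    "--", "--config", config_path], check=True)
    # Phase 3: FFmpeg encode -> final MP4 with audio
    subprocess.run([sys.executable, "export_video.py",
                    "--frames", "outputs/frames", "--audio", "assets/song.wav",
                    "--output", "outputs/video.mp4"], check=True)

if __name__ == "__main__":
    run_pipeline("config.yaml")
```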
Phase 1: Audio Preprocessing
- Beat/onset detection (LibROSA)
- Phoneme extraction (Rhubarb Lip Sync or mock fallback)
- Lyrics parsing (manual or automated with Whisper)
- JSON output for downstream processing
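A minimal sketch of the beat-detection half of this phase, using the LibROSA calls named above; the JSON layout shown here is simplified (the real prep_data.json also carries phonemes and lyric timing).

```python
# Simplified beat analysis -> prep_data.json; the schema shown is illustrative.
import json
import librosa

def analyze_beats(audio_path: str, fps: int) -> dict:
    y, sr = librosa.load(audio_path)
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    return {
        "tempo_bpm": float(tempo),
        # Convert beat times (seconds) into video frame numbers at the render fps
        "beats": {"beat_frames": [int(round(t * fps)) for t in beat_times]},
    }

if __name__ == "__main__":
    with open("outputs/prep_data.json", "w") as f:
        json.dump(analyze_beats("assets/song.wav", fps=24), f, indent=2)
```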
Phase 2: Blender Rendering
- 2D Grease Pencil mode (fast, stylized)
- 3D mesh mode (planned)
- Hybrid mode (planned)
- Automated lip sync from phonemes
- Beat-synchronized gestures
- Timed lyric text objects
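The lip-sync step boils down to turning phoneme cues into keyframes. The sketch below shows that pattern with Blender shape keys on a mesh mouth; the project's 2D mode keys Grease Pencil mouth drawings instead, and the cue format here is a simplified assumption (Rhubarb's mouth shapes are labeled A-H).

```python
# Illustrative phoneme-cue -> keyframe pattern (shape-key variant, simplified).
import bpy

def apply_lipsync(mouth_obj, phoneme_cues, fps=24):
    # Assumes one shape key per Rhubarb mouth shape, e.g. "A".."H"
    key_blocks = mouth_obj.data.shape_keys.key_blocks
    for cue in phoneme_cues:                     # e.g. {"start": 1.25, "shape": "B"}
        frame = int(round(cue["start"] * fps))
        for kb in key_blocks:
            if kb.name == "Basis":
                continue
            # Enable the matching mouth shape and zero the rest at this frame
            kb.value = 1.0 if kb.name == cue["shape"] else 0.0
            kb.keyframe_insert(data_path="value", frame=frame)
```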
Phase 3: Video Export
- FFmpeg integration (H.264, H.265, VP9)
- Quality presets (low, medium, high, ultra)
- Preview mode for rapid iteration
- Audio synchronization
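Under the hood this phase is an FFmpeg invocation along these lines; the frame-name pattern matches the frame_0100.png naming shown in the debugging section, and the real export_video.py maps its quality presets onto encoder settings.

```python
# Illustrative frames + audio -> H.264 MP4 via FFmpeg (standard FFmpeg flags).
import subprocess

def encode_video(frames_dir, audio_path, output_path, fps=24, crf=18):
    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", f"{frames_dir}/frame_%04d.png",  # PNG frames from Phase 2
        "-i", audio_path,                      # original song, muxed back in for sync
        "-c:v", "libx264", "-crf", str(crf), "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",
        output_path,
    ], check=True)

encode_video("outputs/frames", "assets/song.wav", "outputs/video.mp4")
```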
Phase 4: 2D Animation System
- Image-to-stroke conversion
- Grease Pencil animation
- ~2x faster rendering than 3D
- Stylized artistic output
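For a sense of what the 2D mode automates, here is a compact sketch of building a single Grease Pencil stroke with the Blender 4.0-4.2 Python API (the 4.3+ Grease Pencil data API differs); the point list stands in for the project's image-to-stroke conversion, and none of this is the project's grease_pencil.py itself.

```python
# Illustrative Grease Pencil stroke creation; point data is a placeholder.
import bpy

def make_stroke(points_2d, frame_number=1):
    gp_data = bpy.data.grease_pencils.new("Mascot")
    gp_obj = bpy.data.objects.new("Mascot", gp_data)
    bpy.context.scene.collection.objects.link(gp_obj)

    layer = gp_data.layers.new("lines", set_active=True)
    frame = layer.frames.new(frame_number)
    stroke = frame.strokes.new()
    stroke.line_width = 30
    stroke.points.add(count=len(points_2d))
    for point, (x, z) in zip(stroke.points, points_2d):
        point.co = (x, 0.0, z)  # draw in the X/Z plane, facing the camera
    return gp_obj

make_stroke([(0.0, 0.0), (0.5, 0.8), (1.0, 0.0)])
```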
Headless Rendering
- Tested in Docker containers with Xvfb
- No GUI required
- Cloud deployment ready (AWS, GCP)
- See CASE_STUDIES.md for cloud setup
Performance Optimization
- Progressive quality configs (180p → 360p → 1080p)
- Render time: 4 min (ultra-fast) to 50 min (production) for 30s video
- Benchmarks included in CASE_STUDIES.md
Automated Lyrics
- Whisper integration for auto-transcription
- Gentle forced alignment
- Beat-based distribution
- See AUTOMATED_LYRICS_GUIDE.md
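A minimal sketch of the Whisper path, turning transcription segments into a timed lyrics file; the start/end/text line format is an assumption, not necessarily what auto_lyrics_whisper.py writes.

```python
# Illustrative Whisper transcription -> timed lyrics (output format assumed).
import whisper  # pip install openai-whisper

def transcribe_to_lyrics(audio_path, output_path, model_name="base"):
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    with open(output_path, "w", encoding="utf-8") as f:
        for seg in result["segments"]:
            # One lyric line per segment: start and end in seconds, then the text
            f.write(f'{seg["start"]:.2f}\t{seg["end"]:.2f}\t{seg["text"].strip()}\n')

transcribe_to_lyrics("assets/song.wav", "assets/lyrics.txt")
```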
No code changes needed - just swap YAML files:
# config_ultra_fast.yaml (testing - 4 min render)
video:
  resolution: [320, 180]
  fps: 12
  samples: 16

# config_quick_test.yaml (preview - 12 min render)
video:
  resolution: [640, 360]
  fps: 24
  samples: 32

# config.yaml (production - 50 min render)
video:
  resolution: [1920, 1080]
  fps: 24
  samples: 64

Run with: python main.py --config <config_file>
# Run complete pipeline (all 3 phases)
python main.py --config config.yaml
# Run individual phases
python main.py --config config.yaml --phase 1 # Audio prep only
python main.py --config config.yaml --phase 2 # Render only
python main.py --config config.yaml --phase 3 # Export only
# Validate configuration
python main.py --config config.yaml --validate

# Instead of manual lyrics.txt, auto-generate with Whisper
pip install openai-whisper
python auto_lyrics_whisper.py assets/song.wav --output assets/lyrics.txt
# Then run pipeline as normal
python main.py

# Use ultra-fast config for rapid iteration (4 min for 30s video)
python main.py --config config_ultra_fast.yaml
# Or use the quick test script
python quick_test.py --auto-lyrics --debug

See DEVELOPER_GUIDE.md for complete tutorials.
Quick example - Add particle system mode:
- Create `particle_system.py` with builder class
- Register in `blender_script.py` dispatcher
- Add `mode: "particles"` to config
- Run pipeline - no other code changes needed
Full tutorial with code samples in DEVELOPER_GUIDE.md
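As a rough sketch of what that builder and registration might look like (the class interface and dispatcher names here are hypothetical; DEVELOPER_GUIDE.md documents the real extension points):

```python
# particle_system.py -- hypothetical sketch of a new animation-mode builder.
import bpy

class ParticleSystemBuilder:
    def __init__(self, config, prep_data):
        self.config = config        # parsed YAML config
        self.prep_data = prep_data  # Phase 1 output (beats, phonemes, lyrics)

    def build(self, scene):
        # Add an emitter object with a particle system (animation details omitted)
        mesh = bpy.data.meshes.new("BeatEmitter")
        emitter = bpy.data.objects.new("BeatEmitter", mesh)
        scene.collection.objects.link(emitter)
        emitter.modifiers.new("beats", type='PARTICLE_SYSTEM')
        return emitter

# In blender_script.py, the dispatcher would then map the config's mode string to
# this class, e.g. MODE_BUILDERS["particles"] = ParticleSystemBuilder
```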
Example - Camera shake on beats:
# effects.py
import random
import mathutils

class CameraShakeEffect:
    def apply(self, camera, intensity=0.2):
        # self.prep_data is the parsed prep_data.json, loaded elsewhere in effects.py
        base = camera.location.copy()
        for beat_frame in self.prep_data['beats']['beat_frames']:
            # Add shake keyframes: nudge the camera by a small random offset on each beat
            offset = mathutils.Vector([random.uniform(-intensity, intensity) for _ in range(3)])
            camera.location = base + offset
            camera.keyframe_insert(data_path="location", frame=beat_frame)

Add to config:
effects:
  camera_shake:
    enabled: true
    intensity: 0.2

Full implementation in DEVELOPER_GUIDE.md
semantic-foragecast-engine/
├── main.py                      # Orchestrator
├── prep_audio.py                # Phase 1: Audio analysis
├── blender_script.py            # Phase 2: Blender automation
├── grease_pencil.py             # 2D animation mode
├── export_video.py              # Phase 3: FFmpeg export
├── config.yaml                  # Production config
├── config_ultra_fast.yaml       # Fast testing config
├── config_360p_12fps.yaml       # Mid-quality config
├── quick_test.py                # Automated testing script
├── auto_lyrics_whisper.py       # Automated lyrics (Whisper)
├── auto_lyrics_gentle.py        # Automated lyrics (Gentle)
├── auto_lyrics_beats.py         # Beat-based lyrics
├── assets/                      # Sample inputs
│   ├── song.wav                 # 30s test audio
│   ├── fox.png                  # Mascot image
│   └── lyrics.txt               # Timed lyrics
├── outputs/                     # Generated outputs
│   ├── ultra_fast/              # Fast test outputs
│   ├── test_360p/               # Mid-quality outputs
│   └── production/              # High-quality outputs
├── docs/                        # Documentation
│   ├── ARCHITECTURE.md          # System design
│   ├── DEVELOPER_GUIDE.md       # Extension tutorials
│   ├── CASE_STUDIES.md          # Benchmarks & examples
│   ├── TESTING_GUIDE.md         # Quality/speed configs
│   ├── AUTOMATED_LYRICS_GUIDE.md
│   └── POSITIONING_GUIDE.md
└── tests/                       # Unit tests
30-second video render times (tested in cloud container, CPU only):
| Config | Resolution | FPS | Samples | Render Time | File Size | Use Case |
|---|---|---|---|---|---|---|
| Ultra Fast | 320x180 | 12 | 16 | 4 min | 489 KB | Testing pipeline |
| 360p 12fps | 640x360 | 12 | 16 | 6 min | 806 KB | Quality check |
| Quick Test | 640x360 | 24 | 32 | 13 min | ~1.5 MB | Preview |
| Production | 1920x1080 | 24 | 64 | 50 min | ~8 MB | Final output |
Key finding: 360p @ 12fps is the sweet spot for development (6 min, good quality)
See CASE_STUDIES.md for complete benchmarks and optimization strategies.
Core:
- Python 3.11+
- Blender 4.0+ (Python API)
- FFmpeg 4.4+
Audio Analysis:
- LibROSA 0.10.1 (beat detection, tempo)
- Rhubarb Lip Sync (phoneme extraction)
- Whisper (optional, auto lyrics)
Rendering:
- Blender EEVEE engine
- Grease Pencil for 2D mode
- Xvfb for headless rendering
Configuration:
- PyYAML 6.0.1
- JSON for intermediate data
- Development: Windows 11, macOS, Linux
- Production: Ubuntu 22.04/24.04 (tested in Docker)
- Cloud: AWS EC2, GCP Compute (headless mode)
- Offline: No cloud dependencies required
See CROSS_PLATFORM_DEV_GUIDE.md for setup instructions.
Tested Use Cases:
- Music lyric videos - Automated generation for indie musicians
- Podcast visualization - Animated host for audio podcasts
- Educational content - Narrated lessons with animated teacher
- Brand mascot videos - Company mascot delivering announcements
Deployment Scenarios:
- Local rendering (Windows/Mac development)
- Docker containers (reproducible builds)
- Cloud rendering (AWS/GCP for batch processing)
- CI/CD integration (automated video generation)
See CASE_STUDIES.md for detailed case studies.
Problem: Few production-ready examples exist for Blender automation. Most tutorials show basic concepts but not real-world architecture.
Solution: This project demonstrates:
- How to structure a multi-phase pipeline
- Configuration-first design patterns
- Headless rendering in cloud environments
- Audio-driven procedural animation
- Extensible plugin architecture
Target Audience:
- Developers learning Blender Python API
- Pipeline engineers building automation tools
- DevOps teams deploying headless rendering
- Anyone needing automated video generation
# Run audio prep manually
python prep_audio.py assets/song.wav --output outputs/prep_data.json
# With lyrics
python prep_audio.py assets/song.wav --lyrics assets/lyrics.txt --output outputs/prep_data.json
# With Rhubarb for real phonemes (not mock)
python prep_audio.py assets/song.wav --rhubarb /path/to/rhubarb --output outputs/prep_data.json

Output: prep_data.json containing beats, phonemes, and lyrics timing
# Render with 2D Grease Pencil mode (fastest)
python main.py --config config.yaml --phase 2
# Enable debug visualization (colored position markers)
# Set debug_mode: true in config.yaml, then:
python main.py --config config.yaml --phase 2

Output: PNG frames in outputs/*/frames/
# Encode frames to video
python main.py --config config.yaml --phase 3
# Or use export_video.py directly
python export_video.py \
--frames outputs/frames \
--audio assets/song.wav \
--output outputs/video.mp4 \
--quality high

Output: Final MP4 video
# Method 1: Whisper (auto-transcribe, no lyrics needed)
pip install openai-whisper
python auto_lyrics_whisper.py assets/song.wav --output assets/lyrics.txt
# Method 2: Gentle (align known lyrics to audio)
docker run -p 8765:8765 lowerquality/gentle
python auto_lyrics_gentle.py --audio song.wav --lyrics text.txt --output lyrics.txt
# Method 3: Beat-based (distribute lyrics on beats)
python auto_lyrics_beats.py --prep-data prep_data.json --lyrics-text "Your lyrics here"

See AUTOMATED_LYRICS_GUIDE.md for detailed comparison.
video:
  resolution: [1920, 1080]      # Output resolution
  fps: 24                       # Frame rate
  render_engine: "EEVEE"        # EEVEE (fast) or CYCLES (quality)
  samples: 64                   # Render samples (16-256)
  codec: "libx264"              # Video codec
  quality: "high"               # low, medium, high, ultra

animation:
  mode: "2d_grease"             # 2d_grease, 3d, or hybrid
  enable_lipsync: true          # Phoneme-based lip sync
  enable_gestures: true         # Beat-synced movement
  enable_lyrics: true           # Timed lyric text
  gesture_intensity: 0.7        # 0.0-1.0

style:
  lighting: "jazzy"             # Lighting preset
  colors:
    primary: [0.8, 0.3, 0.9]
    secondary: [0.3, 0.8, 0.9]
    accent: [0.9, 0.8, 0.3]
  background: "solid"           # solid or hdri
  gp_style:                     # 2D mode only
    stroke_thickness: 3
    ink_type: "clean"           # clean, sketchy, wobbly
    enable_wobble: false
    wobble_intensity: 0.5

advanced:
  debug_mode: false             # Show position markers
  preview_mode: false           # Low-res preview
  preview_scale: 0.5            # Preview resolution scale
  threads: null                 # Render threads (null = auto)
  verbose: true                 # Detailed logging
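A minimal sketch of how a configuration-first entry point can load and sanity-check a file like this with PyYAML; the required sections and checks below are assumptions, and the real --validate step may check more (asset paths, mode names, codec support).

```python
# Illustrative config loading/validation with PyYAML (checks are assumptions).
import yaml

REQUIRED_SECTIONS = ("video", "animation", "style")

def load_config(path):
    with open(path, "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)
    missing = [s for s in REQUIRED_SECTIONS if s not in config]
    if missing:
        raise ValueError(f"{path} is missing sections: {', '.join(missing)}")
    width, height = config["video"]["resolution"]
    if min(width, height, config["video"]["fps"]) <= 0:
        raise ValueError("resolution and fps must be positive")
    return config

config = load_config("config_ultra_fast.yaml")
print(config["video"]["resolution"], config["animation"]["mode"])
```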
# Run all tests
python -m unittest discover tests/
# Test specific phase
python tests/test_prep_audio.py
python tests/test_export_video.py

# Test complete pipeline with ultra-fast config
python main.py --config config_ultra_fast.yaml
# Automated testing script
python quick_test.py

# Enable debug mode to visualize positioning
# In config.yaml: debug_mode: true
python main.py --config config.yaml --phase 2
# Check frame 100 for colored markers
ls outputs/*/frames/frame_0100.png

# Linux: Install via apt
sudo apt-get install blender
# Mac: Install via Homebrew
brew install --cask blender
# Windows: Download installer
# https://www.blender.org/download/

# Install Xvfb virtual display
sudo apt-get install xvfb
# Run with xvfb-run
xvfb-run -a python main.py --config config.yaml --phase 2

# Linux
sudo apt-get install ffmpeg
# Mac
brew install ffmpeg
# Windows: Download from https://ffmpeg.org/

Check positioning in config - text should be at y=-2.0, z=0.2:
- See POSITIONING_GUIDE.md
- Enable `debug_mode: true` to see position markers
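You can also inspect or nudge the lyric text object directly from Blender's Python console; the object name below is a guess, so check the Outliner for the actual name.

```python
# Hypothetical object name -- look it up in the Outliner if this returns None.
import bpy

lyric = bpy.data.objects.get("LyricText")
if lyric:
    print("current location:", tuple(lyric.location))
    lyric.location = (0.0, -2.0, 0.2)  # the position recommended above
```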
- Fork the repository
- Create feature branch: `git checkout -b feature/my-feature`
- Make changes with tests
- Update documentation
- Submit pull request
- New animation modes (3D, particle systems, etc.)
- Audio analysis improvements (melody extraction, harmony)
- Effects (camera movements, post-processing)
- Performance optimizations
- Bug fixes with tests
- Documentation improvements
See DEVELOPER_GUIDE.md for extension tutorials.
- Phase 1: Audio preprocessing
- Phase 2: Blender automation
- Phase 3: Video export
- Phase 4: 2D Grease Pencil mode
- Headless rendering support
- Automated lyrics (Whisper)
- Debug visualization
- Comprehensive documentation
- 3D mesh animation mode
- Hybrid mode (2D + 3D)
- Advanced effects (fog, particles, camera shake)
- Melody extraction and pitch-based animation
- Multi-character support
- Web UI for configuration
- Real-time preview
Q: Can I use this for commercial projects? A: Yes, MIT licensed. Attribution appreciated.
Q: Why is rendering slow?
A: Use config_ultra_fast.yaml for testing (4 min). Production 1080p takes 50 min for 30s video.
Q: Can I run this without Blender installed? A: No, Phase 2 requires Blender. But you can run Phase 1 (audio prep) standalone.
Q: Does this require GPU? A: No, CPU rendering works. GPU recommended for faster production renders.
Q: Can I deploy this in Docker? A: Yes, see CASE_STUDIES.md for cloud deployment example.
Q: Is this AI-generated? A: No, this is procedural animation based on audio analysis, not machine learning.
MIT License - See LICENSE file for details
- LibROSA - Audio analysis library
- Rhubarb Lip Sync - Phoneme extraction
- Blender - 3D creation suite
- FFmpeg - Video encoding
- Whisper - Speech recognition
- Documentation: See `docs/` directory
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ for the Blender automation community