An open-source AI video studio with an 8-agent director pipeline – from a one-line idea to a fully rendered video with voiceover, BGM, and storyboard.

OpenDirector is a Docker-first, self-hosted AI video production studio. Describe your idea in one sentence, and a team of 8 specialized AI agents collaborates to produce a complete video – storyboard, character designs, voiceover, background music, and rendered output.

Just `docker compose up` and start creating.
| AI Director Chat | Batch Production |
|---|---|
| ![]() | ![]() |

| Creation Editor | Storyboard Preview |
|---|---|
| ![]() | ![]() |
```
                     Your Idea
                         |
                         v
[Script Agent] --> [Art Style Agent] --> [Storyboard Agent]
        |                                       |
        v                                       v
[Character Agent] [Location Agent] [Voice Agent] [BGM Agent]
        |               |              |            |
        +--------+------+--------------+------------+
                 |
                 v
        [Media Agent] --> [Render Worker] --> Final Video
```
8 specialized agents work in a pipeline:
- Script Agent – generates the story outline and narrative structure
- Art Style Agent – selects from 34 built-in styles (e.g. Futuristic Neon Noir, Dreamscape Watercolor Anime, Documentary Realism)
- Storyboard Agent – breaks the story into scenes with shot descriptions and dialogue
- Character Agent – designs characters with visual prompts and assigns voice profiles
- Location Agent – creates environment concepts for each scene
- Voice Agent – assigns TTS voices matched to character personality and gender
- BGM Agent – generates background music based on story atmosphere
- Media Agent – orchestrates image/voice/music generation into final assets

Each agent is a LangGraph node that streams its output in real time – you can watch the plan build step by step.
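Conceptually, each agent takes the accumulated plan state, enriches it, and emits a snapshot before the next agent runs. A toy sketch of that flow (the names `RecipeState`, `scriptAgent`, etc. are hypothetical illustrations, not the actual OpenDirector code, which uses LangGraph nodes):

```typescript
// Toy pipeline in the spirit of the diagram above: each "agent" enriches a
// shared state object, and every intermediate state is yielded so a UI could
// render the plan step by step. Not the real LangGraph implementation.
type RecipeState = {
  idea: string;
  script?: string;
  artStyle?: string;
  scenes?: string[];
};

type AgentNode = (state: RecipeState) => RecipeState;

const scriptAgent: AgentNode = (s) => ({ ...s, script: `Outline for: ${s.idea}` });
const artStyleAgent: AgentNode = (s) => ({ ...s, artStyle: "Documentary Realism" });
const storyboardAgent: AgentNode = (s) => ({
  ...s,
  scenes: [`Scene 1 of "${s.script}"`, `Scene 2 of "${s.script}"`],
});

// Yields after every agent, which is what makes step-by-step streaming possible.
function* runPipeline(idea: string): Generator<RecipeState> {
  let state: RecipeState = { idea };
  for (const node of [scriptAgent, artStyleAgent, storyboardAgent]) {
    state = node(state);
    yield state; // streamed to the client as soon as this agent finishes
  }
}

const steps = Array.from(runPipeline("a cat learns to fly"));
```

In the real pipeline each node calls an LLM and streams tokens, but the shape – state in, enriched state out, emitted incrementally – is the same.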
- One-sentence input – the AI director auto-generates a complete plan: brief, story, storyboard, voiceover, images, and BGM
- 34 built-in art styles across 9 categories: Cinematic, Commercial, Futuristic, Retro, Anime, 3D, Illustration, Realistic, Experimental
- AI-generated story scripts, editable by hand
- AI voiceover with multiple voice options and real-time preview
- AI background music, auto-generated from the story's atmosphere
- Storyboard preview with synced image + voiceover + BGM playback
- 16:9 / 9:16 / 1:1 aspect ratios
- Export at 480p / 720p / 1080p
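For orientation, the aspect ratios and export tiers combine into concrete pixel dimensions. A hypothetical helper (not part of the codebase; it assumes the "p" value names the short side, as in common 480p/720p/1080p usage):

```typescript
// Hypothetical mapping from the supported aspect ratios and export tiers
// to pixel dimensions. Illustrative only, not OpenDirector code.
type Aspect = "16:9" | "9:16" | "1:1";
type Tier = 480 | 720 | 1080;

function dimensions(aspect: Aspect, tier: Tier): { width: number; height: number } {
  switch (aspect) {
    case "16:9": // landscape: tier is the height
      return { width: Math.round((tier * 16) / 9), height: tier };
    case "9:16": // portrait: tier is the width
      return { width: tier, height: Math.round((tier * 16) / 9) };
    case "1:1": // square
      return { width: tier, height: tier };
  }
}

dimensions("16:9", 1080); // { width: 1920, height: 1080 }
```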
- Enter topics and the AI auto-generates multiple scripts, then batch-produces short videos
- Configurable clip duration (2-10 seconds) to control the rhythm of material switches
- Chinese and English video scripts
- Multiple TTS voices with built-in Edge TTS (free) and real-time preview
- Subtitle generation with customizable font, size, color, position, and stroke
- Background music – random or specified local files, with adjustable volume
- HD, royalty-free video materials (Pexels / Pixabay); local files also supported
- Generate multiple output variations at once and pick the best one
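The clip-duration setting is what sets the cutting rhythm: shorter clips mean more frequent material switches for the same voiceover length. A hypothetical back-of-the-envelope calculation (the function and clamping behavior are illustrative, not from the codebase):

```typescript
// Hypothetical: given a target video length and a configured clip duration
// (clamped to the supported 2-10 s range), how many stock clips are needed?
function clipsNeeded(videoSeconds: number, clipSeconds: number): number {
  const clamped = Math.min(10, Math.max(2, clipSeconds)); // supported range: 2-10 s
  return Math.ceil(videoSeconds / clamped);
}

clipsNeeded(60, 5);  // 12 clips: a cut every 5 seconds
clipsNeeded(60, 15); // 6 clips: requested duration clamped down to 10 s
```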
- Multiple AI model providers – OpenAI, Google Gemini, DeepSeek, Qwen, MiniMax, Ollama, and more
- Pluggable media providers – AiHubMix, WaveSpeed; switch via environment variable
- Docker one-click deploy – `docker compose up` and you're ready
- Fully self-hosted – your data stays on your server
- Chinese and English UI
- Docker & Docker Compose
- (Optional) Node.js 20+ for local development
```bash
git clone https://github.com/seme-org/open-director.git
cd open-director
cp .env.example .env
# Edit .env with your API keys
docker compose up --build
```

Then open http://localhost:3000.
| Service | URL | Credentials |
|---|---|---|
| App | http://localhost:3000 | – |
| MinIO Console | http://localhost:9001 | opendirector / opendirector-secret |
| MySQL | localhost:3307 | See `.env.prod` |
| Redis | localhost:6379 | – |
OpenDirector supports multiple media generation providers through a pluggable architecture. Set MEDIA_PROVIDER in .env to choose:
AiHubMix is a unified AI API platform with free tiers for many models. Register and get an API key at aihubmix.com.
```bash
MEDIA_PROVIDER="aihubmix"
AIHUBMIX_API_KEY="sk-your-key"

# Image generation (free options available)
AIHUBMIX_IMAGE_MODEL="gemini-3.1-flash-image-preview-free"
AIHUBMIX_IMAGE_EDIT_MODEL="gemini-3.1-flash-image-preview-free"

# TTS (free via Edge TTS)
AIHUBMIX_TTS_MODEL="edge"
EDGE_TTS_VOICE="zh-CN-XiaoxiaoNeural"

# BGM (uses pre-uploaded tracks from database, randomly selected)
```

Free model options:
| Capability | Free Model | Notes |
|---|---|---|
| Image generation | `gemini-3.1-flash-image-preview-free` | Gemini image generation with a free tier |
| Image editing | `gemini-3.1-flash-image-preview-free` | Same model, used for character/scene editing |
| TTS | `edge` | Microsoft Edge TTS, completely free |
| BGM | Local tracks | Randomly selected from pre-uploaded database tracks |
| LLM | `gpt-4.1-free` | For recipe/script generation |
WaveSpeed provides high-quality AI models for media generation.
```bash
MEDIA_PROVIDER="wavespeed"
WAVESPEED_API_KEY="your-wavespeed-key"

# Optional: free alternatives
WAVESPEED_TTS_MODEL="edge"    # Free Edge TTS
WAVESPEED_MUSIC_MODEL="local" # Free local tracks from database
```

| Feature | AiHubMix | WaveSpeed |
|---|---|---|
| Free tier | Yes (limited calls) | No |
| Image models | Multiple (Gemini, GPT, etc.) | Nano Banana, Seedream |
| TTS | Edge TTS (free) or paid models | MiniMax or Edge TTS |
| BGM | Local tracks (database) | AI-generated or local tracks |
| Setup | Register at aihubmix.com | Register at wavespeed.ai |
The LLM is used for recipe generation, script writing, and the AI director. It uses the OpenAI-compatible API format.
```bash
OPENAI_API_KEY="your-key"
OPENAI_BASE_URL="https://api.openai.com/v1"
OPENAI_MODEL="gpt-4o-mini"
```

Supported providers:
- OpenAI (direct)
- AiHubMix (`https://aihubmix.com/v1`)
- Google Gemini (via OpenAI-compatible endpoint)
- Any OpenAI-compatible API (OpenRouter, LiteLLM, Ollama, etc.)
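Because every supported provider speaks the same chat-completions format, switching providers is only a matter of changing `OPENAI_BASE_URL` and `OPENAI_MODEL`. A minimal sketch of the request shape (the helper is hypothetical; the endpoint path and payload follow the standard OpenAI chat completions format):

```typescript
// Sketch of an OpenAI-compatible chat completion request. buildChatRequest
// is a hypothetical helper; the actual fetch is left commented out so the
// sketch stays self-contained.
function buildChatRequest(baseUrl: string, apiKey: string, model: string, prompt: string) {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/chat/completions`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

const req = buildChatRequest(
  "https://api.openai.com/v1", // swap for any OpenAI-compatible base URL
  "your-key",
  "gpt-4o-mini",
  "Write a one-line video idea."
);
// const res = await fetch(req.url, req.options);
```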
```bash
pnpm install
pnpm db:generate
pnpm dev
```

This starts the Next.js dev server on http://localhost:3000.
| File | Purpose |
|---|---|
| `.env.example` | Documented template for all variables |
| `.env` | Local machine overrides (git-ignored) |
| `.env.prod` | Docker Compose production defaults |
```
┌──────────────────────────────────────────────┐
│                 Next.js App                  │
│ (App Router, React 19, TypeScript, Tailwind) │
└─────────┬───────────┬───────────┬────────────┘
          │           │           │
     ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
     │  MySQL  │ │  Redis  │ │  MinIO  │
     │   8.4   │ │    7    │ │   S3    │
     └─────────┘ └────┬────┘ └─────────┘
                      │
               ┌──────▼──────┐
               │   Worker    │
               │ (FFCreator) │
               └─────────────┘
```
```
open-director/
├── apps/
│   ├── web/              # Next.js frontend + API routes + 8 AI agents
│   └── render/           # BullMQ render worker (FFCreator)
├── assets/
│   └── fonts/            # Subtitle rendering fonts
├── prisma/
│   └── schema.prisma     # Database schema (voices, art_styles, bgms, etc.)
├── docker-compose.yml
└── package.json
```
```
apps/web/src/server/agent/
├── media-provider.ts     # Types + factory + orchestrator
├── voices.ts             # TTS voice catalog (loaded from database)
├── art-styles.ts         # Art style catalog (loaded from database)
├── providers/
│   ├── wavespeed.ts      # WaveSpeed implementation
│   ├── aihubmix.ts       # AiHubMix implementation
│   ├── local-bgm.ts      # Local BGM (random track from database)
│   └── wavespeed.test.ts # Provider tests
└── graph/nodes/recipe/   # 8 LangGraph agent nodes
```
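The pluggable design in `media-provider.ts` can be pictured as an interface plus a factory keyed on `MEDIA_PROVIDER`. A simplified sketch (the real interface and provider implementations are richer than this; the stubbed methods here exist only to make the shape concrete):

```typescript
// Simplified sketch of the provider factory pattern. Not the actual
// interface from media-provider.ts; the stub implementations just return
// fake asset URLs so the wiring is visible.
interface MediaProvider {
  name: string;
  generateImage(prompt: string): Promise<string>;           // returns an asset URL
  synthesizeVoice(text: string, voice: string): Promise<string>;
}

const makeStubProvider = (name: string): MediaProvider => ({
  name,
  generateImage: async (prompt) => `${name}://image/${encodeURIComponent(prompt)}`,
  synthesizeVoice: async (text) => `${name}://tts/${encodeURIComponent(text)}`,
});

const providers: Record<string, MediaProvider> = {
  aihubmix: makeStubProvider("aihubmix"),
  wavespeed: makeStubProvider("wavespeed"),
};

// The orchestrator asks the factory once and never cares which backend runs.
function getMediaProvider(env: Record<string, string | undefined> = process.env): MediaProvider {
  const key = env.MEDIA_PROVIDER ?? "aihubmix";
  const provider = providers[key];
  if (!provider) throw new Error(`Unknown MEDIA_PROVIDER: ${key}`);
  return provider;
}
```

Adding a new backend then means implementing the interface in `providers/` and registering it in the factory, with no changes to the agent nodes that consume it.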
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS 4 |
| AI | LangChain + LangGraph |
| Database | Prisma + MySQL 8.4 |
| Queue | BullMQ + Redis |
| Storage | MinIO (S3-compatible) |
| Render | FFCreator (FFmpeg-based) |
| Auth | Custom credentials (Prisma-backed) |
| i18n | next-intl (English + Chinese) |
| Route | Description |
|---|---|
| `/` | Landing page |
| `/chat` | AI director studio |
| `/chat/[id]` | Existing conversation |
| `/creation/[id]` | Creation editor (storyboard preview + export) |
| `/space` | User workspace |
| `/batch` | Batch video production |
| `/signin`, `/signup` | Authentication |
| Endpoint | Description |
|---|---|
| `/api/agent-chat` | AI director chat (streaming) |
| `/api/threads` | Thread CRUD |
| `/api/messages` | Message CRUD |
| `/api/assets` | Asset management |
| `/api/recipes/thread/[id]` | Recipe operations |
| `/api/uploads/init`, `/complete` | File upload |
| `/api/render/quick-concat` | Video render |
| `/api/jobs/[id]` | Job status |
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | Set in `.env.prod` | MySQL connection string |
| `REDIS_HOST` | `redis` | Redis host |
| `REDIS_PORT` | `6379` | Redis port |
| `S3_ENDPOINT` | `http://minio:9000` | S3-compatible storage endpoint |
| `S3_ACCESS_KEY_ID` | `opendirector` | S3 access key |
| `S3_SECRET_ACCESS_KEY` | `opendirector-secret` | S3 secret key |
| `S3_BUCKET` | `open-director` | S3 bucket name |
| Variable | Default | Description |
|---|---|---|
| `MEDIA_PROVIDER` | `aihubmix` | Provider: `aihubmix` or `wavespeed` |
| `AIHUBMIX_API_KEY` | – | AiHubMix API key |
| `AIHUBMIX_IMAGE_MODEL` | `gemini-3.1-flash-image-preview-free` | Image generation model |
| `AIHUBMIX_TTS_MODEL` | `edge` | TTS model (`edge` for free) |
| `EDGE_TTS_VOICE` | `zh-CN-XiaoxiaoNeural` | Edge TTS voice |
| `WAVESPEED_API_KEY` | – | WaveSpeed API key |
| `WAVESPEED_TTS_MODEL` | `edge` | WaveSpeed TTS model (`edge` for free) |
| `WAVESPEED_MUSIC_MODEL` | `local` | WaveSpeed BGM model (`local` for database tracks) |
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | – | OpenAI-compatible API key |
| `OPENAI_BASE_URL` | – | API base URL |
| `OPENAI_MODEL` | `gpt-4o-mini` | Model name |
| Variable | Default | Description |
|---|---|---|
| `PEXELS_API_KEY` | – | Pexels API key for stock videos |
| `PIXABAY_API_KEY` | – | Pixabay API key for stock videos |
| `BATCH_TTS_PROVIDER` | `edge` | Batch TTS provider |
| `BATCH_EDGE_TTS_VOICE` | `zh-CN-XiaoxiaoNeural` | Batch Edge TTS voice |
```bash
docker compose up -d --build
```

This starts all services: MySQL, Redis, MinIO, the web app, and the render worker.
```bash
pnpm install
pnpm db:generate
pnpm db:migrate
pnpm build
pnpm start
```

- Fork the repository
- Create a feature branch: `git checkout -b feature/my-feature`
- Commit your changes: `git commit -m "feat: add my feature"`
- Push to the branch: `git push origin feature/my-feature`
- Open a Pull Request

- Run `pnpm typecheck` before committing
- Run `pnpm lint` to check code style
- Follow Conventional Commits for commit messages
- AI Digital Human – talking-head video generation with digital avatars
- Manga Drama – comic panel animation with expression switching and camera effects
- Multi-language voiceover – expand the TTS voice catalog with more languages



