
seme-org/open-director


OpenDirector

An open-source AI video studio with a 9-agent director pipeline: from a one-line idea to a fully rendered video with voiceover, BGM, and storyboard.

English | Chinese

๐ŸŒ Official Website: https://od.seme.cc


What is OpenDirector?

OpenDirector is a Docker-first, self-hosted AI video production studio. Describe your idea in one sentence, and a team of 9 specialized AI agents collaborates to produce a complete video, with optional web research, a storyboard, character designs, voiceover, background music, and rendered output.

Just run docker compose up and start creating.


Screenshots

[Screenshots: AI Director Chat, Batch Production, Creation Editor, Storyboard Preview]

Demo Videos

1.mp4
2.mp4
cmp3hhcas00g5hjrqfisin8y4.mp4
cmp3k3mxu006z8y7pxjl9iiuj.mp4

How It Works

Your Idea
   |
   v
[Research Agent] --> [Script Agent] --> [Art Style Agent] --> [Storyboard Agent]
                                                                      |
   +------------------------------------------------------------------+
   |
   v
[Character Agent] --> [Location Agent] --> [Voice Agent] --> [BGM Agent]
                                                                  |
                                                                  v
                                                            [Media Agent]
                                                                  |
                                                                  v
                                                           [Render Worker] --> Final Video

9 specialized agents work in a pipeline:

  1. Research Agent: uses OpenAI web_search_preview when needed to check known stories, factual references, brands, products, and source notes
  2. Script Agent: generates the story outline and narrative structure, using research notes when available
  3. Art Style Agent: selects from 34 built-in styles (e.g., Futuristic Neon Noir, Dreamscape Watercolor Anime, Documentary Realism)
  4. Storyboard Agent: breaks the story into scenes with shot descriptions and dialogue
  5. Character Agent: designs characters with visual prompts and assigns voice profiles
  6. Location Agent: creates environment concepts for each scene
  7. Voice Agent: assigns TTS voices matched to character personality and gender
  8. BGM Agent: generates background music based on the story's atmosphere
  9. Media Agent: orchestrates image, voice, and music generation into final assets

Each agent is a LangGraph node that streams its output in real time, so you can watch the plan build step by step. The shared graph state includes a research field with notes, cautions, and sources; the Script Agent consumes those notes without copying source text.
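The node-plus-shared-state pattern described above can be illustrated without LangGraph itself. The following is a minimal, dependency-free sketch that assumes hypothetical names (DirectorState, runPipeline, researchAgent, scriptAgent). It is not OpenDirector's graph code: it runs nodes in sequence instead of using LangGraph's StateGraph, and it does not stream.

```typescript
// Hypothetical sketch: agent nodes over a shared state, run in sequence.
// DirectorState, runPipeline, and both agents are illustrative names, not
// OpenDirector's LangGraph code; real nodes would call an LLM and stream output.

interface ResearchNotes {
  notes: string[];
  cautions: string[];
  sources: { title: string; url: string }[];
}

interface DirectorState {
  idea: string;
  research?: ResearchNotes; // optional: research runs only when needed
  script?: string;
  log: string[]; // names of nodes that have run, in order
}

// Each node reads the whole state and returns only the fields it updates.
type AgentNode = (state: DirectorState) => Partial<DirectorState>;

const researchAgent: AgentNode = (state) => ({
  research: {
    notes: [`Background for: ${state.idea}`],
    cautions: ["Summarize sources; do not copy their text"],
    sources: [],
  },
});

const scriptAgent: AgentNode = (state) => {
  // Use research notes as context when they are present.
  const context = state.research ? ` (using ${state.research.notes.length} research note(s))` : "";
  return { script: `Outline for "${state.idea}"${context}` };
};

// Merge each partial update into the shared state and record the node name.
function runPipeline(initial: DirectorState, nodes: Array<[string, AgentNode]>): DirectorState {
  let state = initial;
  for (const [name, node] of nodes) {
    state = { ...state, ...node(state), log: [...state.log, name] };
  }
  return state;
}

const result = runPipeline({ idea: "A robot learns to paint", log: [] }, [
  ["research", researchAgent],
  ["script", scriptAgent],
]);
console.log(result.log.join(" -> ")); // research -> script
console.log(result.script); // Outline for "A robot learns to paint" (using 1 research note(s))
```

In this design, nodes return partial updates and never mutate the shared state directly. That keeps each agent independent of the agents that run after it.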


Features

Creative Mode (AI Director Full Workflow)

  • Enter one sentence and the AI director generates a complete plan: brief, story, storyboard, voiceover, images, and BGM
  • Optional web research for known stories, factual references, brands, products, and public information
  • 34 built-in art styles across 9 categories: Cinematic, Commercial, Futuristic, Retro, Anime, 3D, Illustration, Realistic, Experimental
  • AI-generated story scripts that you can edit manually
  • AI voiceover with multiple voice options and real-time preview
  • AI background music, generated to match the story's atmosphere
  • Storyboard preview with synchronized image, voiceover, and BGM playback
  • Supports 16:9, 9:16, and 1:1 aspect ratios
  • Export at 480p / 720p / 1080p
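The aspect-ratio and resolution options combine into concrete frame sizes. This is a minimal sketch that assumes two things: the preset names the short side of the frame (so 1080p at 9:16 is 1080x1920), and dimensions are rounded to even numbers because yuv420p H.264 output requires them. outputSize is a hypothetical helper and not part of OpenDirector.

```typescript
// Sketch: pixel dimensions for each aspect ratio and export preset.
// Assumptions (not from OpenDirector): the preset fixes the short side, and the
// long side is rounded to an even number because yuv420p H.264 needs even sizes.
type AspectRatio = "16:9" | "9:16" | "1:1";
type Preset = "480p" | "720p" | "1080p";

function outputSize(ratio: AspectRatio, preset: Preset): { width: number; height: number } {
  const shortSide = parseInt(preset, 10); // "720p" -> 720
  const [w, h] = ratio.split(":").map(Number);
  // Scale the short side by the ratio, then round to the nearest even integer.
  const longSide = Math.round((shortSide * Math.max(w, h)) / Math.min(w, h) / 2) * 2;
  return w >= h ? { width: longSide, height: shortSide } : { width: shortSide, height: longSide };
}

for (const ratio of ["16:9", "9:16", "1:1"] as AspectRatio[]) {
  console.log(ratio, JSON.stringify(outputSize(ratio, "1080p")));
}
```

Under these assumptions, 480p at 16:9 comes out as 854x480, because 480 x 16/9 is about 853.3 and the nearest even width is 854.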

Batch Mode (Short Video Mass Production)

  • Enter topics and the AI generates multiple scripts, then batch-produces short videos
  • Configurable clip duration (2 to 10 seconds) to control how often the footage switches
  • Supports Chinese and English video scripts
  • Multiple TTS voices, including built-in Edge TTS (free), with real-time preview
  • Subtitle generation with customizable font, size, color, position, and stroke
  • Background music: random or specified local files, with adjustable volume
  • HD, royalty-free video footage from Pexels and Pixabay; local files are also supported
  • Generate multiple output variations at once, pick the best one
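The clip-duration setting effectively decides how many footage cuts a voiceover gets. The sketch below is hypothetical: planClips, its clamping rule, and the idea that the final slot absorbs the remainder are assumptions, not OpenDirector's batch code.

```typescript
// Hypothetical sketch: turn a narration length and a configured clip duration
// into clip slots, i.e. how often the stock footage switches. planClips and its
// clamping rule are illustrative assumptions, not OpenDirector's batch code.
const MIN_CLIP_SECONDS = 2;
const MAX_CLIP_SECONDS = 10;

function planClips(narrationSeconds: number, clipSeconds: number): number[] {
  // Keep the requested duration inside the documented 2-10 second range.
  const clip = Math.min(MAX_CLIP_SECONDS, Math.max(MIN_CLIP_SECONDS, clipSeconds));
  const slots: number[] = [];
  let remaining = narrationSeconds;
  while (remaining > 0) {
    const length = Math.min(clip, remaining); // the last slot may be shorter
    slots.push(length);
    remaining -= length;
  }
  return slots;
}

console.log(JSON.stringify(planClips(23, 5))); // [5,5,5,5,3]
console.log(JSON.stringify(planClips(12, 30))); // [10,2]
```

A shorter clip duration produces more, faster cuts for the same voiceover.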

General

  • Multiple AI model providers: OpenAI, Google Gemini, DeepSeek, Qwen, MiniMax, Ollama, and more
  • Pluggable media providers (AiHubMix, WaveSpeed), switchable via an environment variable
  • One-click Docker deployment: run docker compose up and you're ready
  • Fully self-hosted: your data stays on your server
  • Chinese and English UI

Quick Start

Prerequisites

  • Docker & Docker Compose
  • (Optional) Node.js 20+ for local development

One-command start

git clone https://github.com/seme-org/open-director.git
cd open-director
cp .env.example .env
# Edit .env with your API keys
docker compose up --build

Then open http://localhost:3000.

Default services

Service         URL                      Credentials
App             http://localhost:3000    (none)
MinIO Console   http://localhost:9001    opendirector / opendirector-secret
MySQL           localhost:3307           See .env.prod
Redis           localhost:6379           (none)

Media Generation

OpenDirector uses WaveSpeed for image generation. Character and location plates use a text-to-image model, while storyboard frames with character references must use an image-to-image/edit model so reference images are honored.

WAVESPEED_API_KEY="your-wavespeed-key"
WAVESPEED_IMAGE_MODEL="nano-banana"
WAVESPEED_IMAGE_TO_IMAGE_MODEL="nano-banana-2-edit"
EDGE_TTS_VOICE="zh-CN-XiaoxiaoNeural"

Do not set WAVESPEED_IMAGE_TO_IMAGE_MODEL to nano-banana: that alias routes to a text-to-image endpoint and can ignore character reference images.

Speech uses local Edge TTS. Background music uses local tracks from assets/bgm/default/.
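The model rule above (text-to-image for plates, an edit model whenever character references are attached) can be captured in a small guard. This sketch takes only the variable names and defaults from this README. pickImageModel, ImageJob, and the choice to throw on a misconfigured alias are hypothetical.

```typescript
// Sketch: choose the WaveSpeed model for one image job.
// Only the env variable names and defaults come from the README; pickImageModel,
// ImageJob, and throwing on a bad config are hypothetical choices.
type Env = Record<string, string | undefined>;

interface ImageJob {
  prompt: string;
  referenceImageUrls: string[]; // character reference images, if any
}

function pickImageModel(job: ImageJob, env: Env): string {
  const textToImage = env.WAVESPEED_IMAGE_MODEL || "nano-banana";
  const imageToImage = env.WAVESPEED_IMAGE_TO_IMAGE_MODEL || "nano-banana-2-edit";
  if (job.referenceImageUrls.length === 0) {
    return textToImage; // character and location plates
  }
  // Storyboard frames with references need an edit model: the plain
  // "nano-banana" alias routes to text-to-image and can ignore references.
  if (imageToImage === "nano-banana") {
    throw new Error("WAVESPEED_IMAGE_TO_IMAGE_MODEL must be an image-to-image/edit model");
  }
  return imageToImage;
}
```

Failing loudly at configuration time is one option. Another is to log a warning and fall back to the documented default edit model.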


LLM Configuration

The LLM is used for recipe generation, script writing, and the AI director. It uses the OpenAI-compatible API format.

OPENAI_API_KEY="your-key"
OPENAI_BASE_URL="https://api.openai.com/v1"
OPENAI_MODEL="gpt-4o-mini"

Use OpenAI directly or an OpenAI-compatible endpoint for the LLM.
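Because only the OpenAI-compatible format is required, the three variables above are enough to build a request. The sketch below follows the POST {base}/chat/completions shape of the OpenAI Chat Completions API. buildChatRequest is a hypothetical helper, and falling back to https://api.openai.com/v1 when OPENAI_BASE_URL is empty is an assumption.

```typescript
// Sketch: build a Chat Completions request from OPENAI_API_KEY, OPENAI_BASE_URL,
// and OPENAI_MODEL. The helper and the fallback base URL are assumptions; the
// POST {base}/chat/completions shape follows the OpenAI Chat Completions API.
type Env = Record<string, string | undefined>;

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(env: Env, messages: ChatMessage[]): HttpRequest {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) throw new Error("OPENAI_API_KEY is not set");
  // Strip trailing slashes so ".../v1/" and ".../v1" behave the same.
  const baseUrl = (env.OPENAI_BASE_URL || "https://api.openai.com/v1").replace(/\/+$/, "");
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ model: env.OPENAI_MODEL || "gpt-4o-mini", messages }),
    },
  };
}
```

On Node 18+ or in a browser, pass the returned url and init to fetch. Switching providers then only means changing OPENAI_BASE_URL and OPENAI_MODEL.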


Local Development

pnpm install
pnpm db:generate
pnpm dev

This starts the Next.js dev server on http://localhost:3000.

Environment files

File           Purpose
.env.example   Documented template for all variables
.env           Local machine overrides (git-ignored)
.env.prod      Docker Compose production defaults

Architecture

┌─────────────────────────────────────────────────┐
│                   Next.js App                   │
│  (App Router, React 19, TypeScript, Tailwind)   │
└────────┬──────────┬──────────┬──────────────────┘
         │          │          │
    ┌────▼───┐ ┌────▼───┐ ┌────▼───┐
    │ MySQL  │ │ Redis  │ │ MinIO  │
    │   8.4  │ │   7    │ │  S3    │
    └────────┘ └────┬───┘ └────────┘
                    │
              ┌─────▼───────┐
              │   Worker    │
              │ (FFCreator) │
              └─────────────┘

Monorepo structure

open-director/
├── apps/
│   ├── web/          # Next.js frontend + API routes + 9 AI agents
│   └── render/       # BullMQ render worker (FFCreator)
├── assets/
│   └── fonts/        # Subtitle rendering fonts
├── prisma/
│   └── schema.prisma # Database schema (voices, art_styles, bgms, etc.)
├── docker-compose.yml
└── package.json

Media provider architecture

apps/web/src/server/agent/
├── media-provider.ts          # Types + factory + orchestrator
├── schemas/
│   └── research.ts            # Research notes, cautions, and source schema
├── voices.ts                  # TTS voice catalog (loaded from database)
├── art-styles.ts              # Art style catalog (loaded from database)
├── providers/
│   ├── wavespeed.ts           # WaveSpeed implementation
│   ├── aihubmix.ts            # AiHubMix implementation
│   ├── local-bgm.ts           # Local BGM (random track from database)
│   └── wavespeed.test.ts      # Provider tests
└── graph/nodes/recipe/        # 9 LangGraph agent nodes
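According to the tree, media-provider.ts holds types plus a factory. The sketch below shows that pattern under stated assumptions. MEDIA_PROVIDER is a hypothetical variable name, since the README only says providers are switched via an environment variable. The generateImage method is also hypothetical, and the provider classes are stubs that return placeholder strings instead of calling WaveSpeed or AiHubMix.

```typescript
// Hypothetical sketch of a provider interface plus an env-driven factory.
// MEDIA_PROVIDER, generateImage, and the stub classes are assumptions, not
// OpenDirector's media-provider.ts; real providers would call remote APIs.
type Env = Record<string, string | undefined>;

interface MediaProvider {
  readonly name: string;
  // Resolves to a URL for the generated image.
  generateImage(prompt: string, referenceImageUrls?: string[]): Promise<string>;
}

class WaveSpeedProvider implements MediaProvider {
  readonly name = "wavespeed";
  async generateImage(prompt: string): Promise<string> {
    // Stub: a real implementation would call the WaveSpeed API here.
    return `wavespeed://image?prompt=${encodeURIComponent(prompt)}`;
  }
}

class AiHubMixProvider implements MediaProvider {
  readonly name = "aihubmix";
  async generateImage(prompt: string): Promise<string> {
    // Stub: a real implementation would call the AiHubMix API here.
    return `aihubmix://image?prompt=${encodeURIComponent(prompt)}`;
  }
}

// Registry of constructors, keyed by the (assumed) env value.
const registry: Record<string, () => MediaProvider> = {
  wavespeed: () => new WaveSpeedProvider(),
  aihubmix: () => new AiHubMixProvider(),
};

function createMediaProvider(env: Env): MediaProvider {
  // Defaulting to WaveSpeed mirrors the README's "uses WaveSpeed" note.
  const key = (env.MEDIA_PROVIDER || "wavespeed").toLowerCase();
  const factory = registry[key];
  if (!factory) throw new Error(`Unknown media provider: ${key}`);
  return factory();
}
```

With a registry, adding a provider means writing one class and adding one entry. The orchestrator keeps depending only on the MediaProvider interface.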

Tech stack

Layer      Technology
Frontend   Next.js 16, React 19, TypeScript, Tailwind CSS 4
AI         LangChain + LangGraph
Database   Prisma + MySQL 8.4
Queue      BullMQ + Redis
Storage    MinIO (S3-compatible)
Render     FFCreator (FFmpeg-based)
Auth       Custom credentials (Prisma-backed)
i18n       next-intl (English + Chinese)

Routes

Pages

Route              Description
/                  Landing page
/chat              AI director studio
/chat/[id]         Existing conversation
/creation/[id]     Creation editor (storyboard preview + export)
/space             User workspace
/batch             Batch video production
/signin, /signup   Authentication

API endpoints

Endpoint                       Description
/api/agent-chat                AI director chat (streaming)
/api/threads                   Thread CRUD
/api/messages                  Message CRUD
/api/assets                    Asset management
/api/recipes/thread/[id]       Recipe operations
/api/uploads/init, /complete   File upload
/api/render/quick-concat       Video render
/api/jobs/[id]                 Job status
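A client that starts a render typically polls the job status endpoint until the job finishes. In this sketch, the response shape ({ state, outputUrl }) and the state names, modeled on BullMQ job states, are assumptions, so check the route handler for the real contract. The fetcher is injected so the loop runs without a server. waitForJob and FetchJob are hypothetical names.

```typescript
// Sketch: poll /api/jobs/[id] until a render job reaches a terminal state.
// Assumptions: the endpoint returns JSON like { state, outputUrl? } with state
// names modeled on BullMQ ("waiting", "active", "completed", "failed", ...).
// The fetcher is injected so the loop can be tested without a server.
interface JobStatus {
  state: string;
  outputUrl?: string;
}

type FetchJob = (path: string) => Promise<JobStatus>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForJob(
  jobId: string,
  fetchJob: FetchJob,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob(`/api/jobs/${encodeURIComponent(jobId)}`);
    if (job.state === "completed" || job.state === "failed") return job;
    await sleep(intervalMs); // wait a fixed interval before the next poll
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}
```

In the browser, the fetcher could be (path) => fetch(path).then((r) => r.json()). A fixed interval keeps the sketch simple, and exponential backoff is a common refinement.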

Configuration

Required

Variable               Default               Description
DATABASE_URL           Set in .env.prod      MySQL connection string
REDIS_HOST             redis                 Redis host
REDIS_PORT             6379                  Redis port
S3_ENDPOINT            http://minio:9000     S3-compatible storage endpoint
S3_ACCESS_KEY_ID       opendirector          S3 access key
S3_SECRET_ACCESS_KEY   opendirector-secret   S3 secret key
S3_BUCKET              open-director         S3 bucket name
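The defaults in this table can be applied and checked once at startup. This is a hypothetical sketch: resolveConfig is not OpenDirector code. DATABASE_URL is treated as mandatory because the table gives no literal default for it; it comes from .env.prod.

```typescript
// Sketch: resolve required settings, applying the defaults from the table.
// DATABASE_URL has no literal default (it comes from .env.prod), so it is
// treated as mandatory. resolveConfig is hypothetical, not OpenDirector code.
type Env = Record<string, string | undefined>;

const DEFAULTS: Record<string, string | undefined> = {
  DATABASE_URL: undefined,
  REDIS_HOST: "redis",
  REDIS_PORT: "6379",
  S3_ENDPOINT: "http://minio:9000",
  S3_ACCESS_KEY_ID: "opendirector",
  S3_SECRET_ACCESS_KEY: "opendirector-secret",
  S3_BUCKET: "open-director",
};

function resolveConfig(env: Env): Record<string, string> {
  const resolved: Record<string, string> = {};
  const missing: string[] = [];
  for (const [key, fallback] of Object.entries(DEFAULTS)) {
    const value = env[key] || fallback; // empty strings fall back too
    if (value === undefined) missing.push(key);
    else resolved[key] = value;
  }
  if (missing.length > 0) throw new Error(`Missing required variables: ${missing.join(", ")}`);
  const port = Number(resolved.REDIS_PORT);
  if (!Number.isInteger(port) || port <= 0) throw new Error("REDIS_PORT must be a positive integer");
  return resolved;
}
```

The defaults redis and http://minio:9000 are Docker Compose service hostnames. Outside Compose you would likely override them, for example with localhost.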

Media generation

Variable                         Default                Description
WAVESPEED_API_KEY                (none)                 WaveSpeed API key
WAVESPEED_IMAGE_MODEL            nano-banana            Text-to-image model for character and location plates
WAVESPEED_IMAGE_TO_IMAGE_MODEL   nano-banana-2-edit     Image-to-image/edit model for storyboard frames with character references
EDGE_TTS_VOICE                   zh-CN-XiaoxiaoNeural   Local Edge TTS voice

LLM

Variable          Default       Description
OPENAI_API_KEY    (none)        OpenAI-compatible API key
OPENAI_BASE_URL   (none)        API base URL
OPENAI_MODEL      gpt-4o-mini   Model name

Batch mode

Variable               Default                Description
PEXELS_API_KEY         (none)                 Pexels API key for stock videos
PIXABAY_API_KEY        (none)                 Pixabay API key for stock videos
BATCH_TTS_PROVIDER     edge                   Batch TTS provider
BATCH_EDGE_TTS_VOICE   zh-CN-XiaoxiaoNeural   Batch Edge TTS voice

Deployment

Docker Compose (recommended)

docker compose up -d --build

This starts all services: MySQL, Redis, MinIO, web app, and render worker.

Manual

pnpm install
pnpm db:generate
pnpm db:migrate
pnpm build
pnpm start

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Commit your changes: git commit -m "feat: add my feature"
  4. Push to the branch: git push origin feature/my-feature
  5. Open a Pull Request

Development guidelines

  • Run pnpm typecheck before committing
  • Run pnpm lint to check code style
  • Follow Conventional Commits for commit messages
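A commit-msg hook can enforce the guideline above by checking the first line of the message. This minimal sketch covers only the header (type, optional scope, optional "!", and description). The specification itself defines feat and fix, and the extra types listed here follow the widely used @commitlint/config-conventional set rather than any rule stated in this repo. isConventionalHeader is a hypothetical helper.

```typescript
// Sketch: a minimal Conventional Commits header check, e.g. for a commit-msg hook.
// Only type, optional (scope), optional "!", and ": description" are checked;
// the type list mirrors @commitlint/config-conventional, an assumed convention.
const TYPES = ["feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore", "revert"];
const HEADER = new RegExp(`^(${TYPES.join("|")})(\\([\\w./-]+\\))?!?: \\S.*$`);

function isConventionalHeader(message: string): boolean {
  const header = message.split("\n", 1)[0]; // only the first line is the header
  return HEADER.test(header);
}

console.log(isConventionalHeader("feat: add my feature")); // true
console.log(isConventionalHeader("added my feature")); // false
```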

Roadmap

  • AI Digital Human: talking-head video generation with digital avatars
  • Manga Drama: comic panel animation with expression switching and camera effects
  • Multi-language voiceover: expand the TTS voice catalog with more languages

About

🎬 Open-source AI video studio: from one sentence to a finished video. Supports creative mode (the AI director's full automated workflow) and batch mode (bulk short video production).
