Skip to content

seme-org/open-director

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

7 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

OpenDirector

An open-source AI video studio with an 8-agent director pipeline โ€” from a one-line idea to a fully rendered video with voiceover, BGM, and storyboard.

English | ไธญๆ–‡


What is OpenDirector?

OpenDirector is a Docker-first, self-hosted AI video production studio. Describe your idea in one sentence, and a team of 8 specialized AI agents collaborate to produce a complete video โ€” with a storyboard, character designs, voiceover, background music, and rendered output.

Just docker compose up and start creating.


Screenshots

AI Director Chat Batch Production
AI Director Batch
Creation Editor Storyboard Preview
Creation Editor Storyboard Preview

How It Works

Your Idea
   |
   v
[Script Agent] --> [Art Style Agent] --> [Storyboard Agent]
   |                                         |
   v                                         v
[Character Agent]  [Location Agent]    [Voice Agent]  [BGM Agent]
   |                  |                   |              |
   +--------+---------+-------------------+--------------+
            |
            v
     [Media Agent] --> [Render Worker] --> Final Video

8 specialized agents work in a pipeline:

  1. Script Agent โ€” generates the story outline and narrative structure
  2. Art Style Agent โ€” selects from 34 built-in styles (e.g. Futuristic Neon Noir, Dreamscape Watercolor Anime, Documentary Realism)
  3. Storyboard Agent โ€” breaks the story into scenes with shot descriptions and dialogue
  4. Character Agent โ€” designs characters with visual prompts and assigns voice profiles
  5. Location Agent โ€” creates environment concepts for each scene
  6. Voice Agent โ€” assigns TTS voices matched to character personality and gender
  7. BGM Agent โ€” generates background music based on story atmosphere
  8. Media Agent โ€” orchestrates image/voice/music generation into final assets

Each agent is a LangGraph node that streams its output in real-time โ€” you can watch the plan build step by step.


Features

Creative Mode (AI Director Full Workflow)

  • Input one sentence, AI director auto-generates complete plan: brief, story, storyboard, voiceover, images, BGM
  • 34 built-in art styles across 9 categories: Cinematic, Commercial, Futuristic, Retro, Anime, 3D, Illustration, Realistic, Experimental
  • AI-generated story scripts, editable manually
  • AI voiceover with multiple voice options, real-time preview
  • AI background music, auto-generated based on story atmosphere
  • Storyboard preview with image + voiceover + BGM synced playback
  • Support 16:9 / 9:16 / 1:1 aspect ratios
  • Export at 480p / 720p / 1080p

Batch Mode (Short Video Mass Production)

  • Input topics, AI auto-generates multiple scripts, batch produce short videos
  • Configurable clip duration (2-10 seconds), control material switching rhythm
  • Support Chinese and English video scripts
  • Multiple TTS voices with built-in Edge TTS (free), real-time preview
  • Subtitle generation with customizable font, size, color, position, stroke
  • Background music โ€” random or specified local files, adjustable volume
  • Video materials are HD and royalty-free (Pexels / Pixabay), local files also supported
  • Generate multiple output variations at once, pick the best one

General

  • Multiple AI model providers โ€” OpenAI, Google Gemini, DeepSeek, Qwen, MiniMax, Ollama, and more
  • Pluggable media providers โ€” AiHubMix, WaveSpeed, switch via environment variable
  • Docker one-click deploy โ€” docker compose up and you're ready
  • Fully self-hosted โ€” data stays on your server
  • Chinese and English UI

Quick Start

Prerequisites

  • Docker & Docker Compose
  • (Optional) Node.js 20+ for local development

One-command start

git clone https://github.com/seme-org/open-director.git
cd open-director
cp .env.example .env
# Edit .env with your API keys
docker compose up --build

Then open http://localhost:3000.

Default services

Service URL Credentials
App http://localhost:3000 โ€”
MinIO Console http://localhost:9001 opendirector / opendirector-secret
MySQL localhost:3307 See .env.prod
Redis localhost:6379 โ€”

Media Generation Providers

OpenDirector supports multiple media generation providers through a pluggable architecture. Set MEDIA_PROVIDER in .env to choose:

AiHubMix (Recommended)

AiHubMix is a unified AI API platform with free tiers for many models. Register and get an API key at aihubmix.com.

MEDIA_PROVIDER="aihubmix"
AIHUBMIX_API_KEY="sk-your-key"

# Image generation (free options available)
AIHUBMIX_IMAGE_MODEL="gemini-3.1-flash-image-preview-free"
AIHUBMIX_IMAGE_EDIT_MODEL="gemini-3.1-flash-image-preview-free"

# TTS (free via Edge TTS)
AIHUBMIX_TTS_MODEL="edge"
EDGE_TTS_VOICE="zh-CN-XiaoxiaoNeural"

# BGM (uses pre-uploaded tracks from database, randomly selected)

Free model options:

Capability Free Model Notes
Image generation gemini-3.1-flash-image-preview-free Gemini image generation, has free tier
Image editing gemini-3.1-flash-image-preview-free Same model, for character/scene editing
TTS edge Microsoft Edge TTS, completely free
BGM Local tracks Randomly selected from pre-uploaded database tracks
LLM gpt-4.1-free For recipe/script generation

WaveSpeed

WaveSpeed provides high-quality AI models for media generation.

MEDIA_PROVIDER="wavespeed"
WAVESPEED_API_KEY="your-wavespeed-key"

# Optional: free alternatives
WAVESPEED_TTS_MODEL="edge"           # Free Edge TTS
WAVESPEED_MUSIC_MODEL="local"        # Free local tracks from database

Provider comparison

Feature AiHubMix WaveSpeed
Free tier Yes (limited calls) No
Image models Multiple (Gemini, GPT, etc.) Nano Banana, Seedream
TTS Edge TTS (free) or paid models MiniMax or Edge TTS
BGM Local tracks (database) AI-generated or local tracks
Setup Register at aihubmix.com Register at wavespeed.ai

LLM Configuration

The LLM is used for recipe generation, script writing, and the AI director. It uses OpenAI-compatible API format.

OPENAI_API_KEY="your-key"
OPENAI_BASE_URL="https://api.openai.com/v1"
OPENAI_MODEL="gpt-4o-mini"

Supported providers:

  • OpenAI (direct)
  • AiHubMix (https://aihubmix.com/v1)
  • Google Gemini (via OpenAI-compatible endpoint)
  • Any OpenAI-compatible API (OpenRouter, LiteLLM, Ollama, etc.)

Local Development

pnpm install
pnpm db:generate
pnpm dev

This starts the Next.js dev server on http://localhost:3000.

Environment files

File Purpose
.env.example Documented template for all variables
.env Local machine overrides (git-ignored)
.env.prod Docker Compose production defaults

Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                   Next.js App                   โ”‚
โ”‚  (App Router, React 19, TypeScript, Tailwind)   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚          โ”‚          โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”
    โ”‚ MySQL  โ”‚ โ”‚ Redis  โ”‚ โ”‚ MinIO  โ”‚
    โ”‚   8.4  โ”‚ โ”‚   7    โ”‚ โ”‚  S3    โ”‚
    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                    โ”‚
              โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”
              โ”‚   Worker   โ”‚
              โ”‚ (FFCreator) โ”‚
              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Monorepo structure

open-director/
โ”œโ”€โ”€ apps/
โ”‚   โ”œโ”€โ”€ web/          # Next.js frontend + API routes + 8 AI agents
โ”‚   โ””โ”€โ”€ render/       # BullMQ render worker (FFCreator)
โ”œโ”€โ”€ assets/
โ”‚   โ””โ”€โ”€ fonts/        # Subtitle rendering fonts
โ”œโ”€โ”€ prisma/
โ”‚   โ””โ”€โ”€ schema.prisma # Database schema (voices, art_styles, bgms, etc.)
โ”œโ”€โ”€ docker-compose.yml
โ””โ”€โ”€ package.json

Media provider architecture

apps/web/src/server/agent/
โ”œโ”€โ”€ media-provider.ts          # Types + factory + orchestrator
โ”œโ”€โ”€ voices.ts                  # TTS voice catalog (loaded from database)
โ”œโ”€โ”€ art-styles.ts              # Art style catalog (loaded from database)
โ”œโ”€โ”€ providers/
โ”‚   โ”œโ”€โ”€ wavespeed.ts           # WaveSpeed implementation
โ”‚   โ”œโ”€โ”€ aihubmix.ts            # AiHubMix implementation
โ”‚   โ”œโ”€โ”€ local-bgm.ts           # Local BGM (random track from database)
โ”‚   โ””โ”€โ”€ wavespeed.test.ts      # Provider tests
โ””โ”€โ”€ graph/nodes/recipe/        # 8 LangGraph agent nodes

Tech stack

Layer Technology
Frontend Next.js 16, React 19, TypeScript, Tailwind CSS 4
AI LangChain + LangGraph
Database Prisma + MySQL 8.4
Queue BullMQ + Redis
Storage MinIO (S3-compatible)
Render FFCreator (FFmpeg-based)
Auth Custom credentials (Prisma-backed)
i18n next-intl (English + Chinese)

Routes

Pages

Route Description
/ Landing page
/chat AI director studio
/chat/[id] Existing conversation
/creation/[id] Creation editor (storyboard preview + export)
/space User workspace
/batch Batch video production
/signin, /signup Authentication

API endpoints

Endpoint Description
/api/agent-chat AI director chat (streaming)
/api/threads Thread CRUD
/api/messages Message CRUD
/api/assets Asset management
/api/recipes/thread/[id] Recipe operations
/api/uploads/init, /complete File upload
/api/render/quick-concat Video render
/api/jobs/[id] Job status

Configuration

Required

Variable Default Description
DATABASE_URL Set in .env.prod MySQL connection string
REDIS_HOST redis Redis host
REDIS_PORT 6379 Redis port
S3_ENDPOINT http://minio:9000 S3-compatible storage endpoint
S3_ACCESS_KEY_ID opendirector S3 access key
S3_SECRET_ACCESS_KEY opendirector-secret S3 secret key
S3_BUCKET open-director S3 bucket name

Media generation

Variable Default Description
MEDIA_PROVIDER aihubmix Provider: aihubmix or wavespeed
AIHUBMIX_API_KEY โ€” AiHubMix API key
AIHUBMIX_IMAGE_MODEL gemini-3.1-flash-image-preview-free Image generation model
AIHUBMIX_TTS_MODEL edge TTS model (edge for free)
EDGE_TTS_VOICE zh-CN-XiaoxiaoNeural Edge TTS voice
WAVESPEED_API_KEY โ€” WaveSpeed API key
WAVESPEED_TTS_MODEL edge WaveSpeed TTS model (edge for free)
WAVESPEED_MUSIC_MODEL local WaveSpeed BGM model (local for database tracks)

LLM

Variable Default Description
OPENAI_API_KEY โ€” OpenAI-compatible API key
OPENAI_BASE_URL โ€” API base URL
OPENAI_MODEL gpt-4o-mini Model name

Batch mode

Variable Default Description
PEXELS_API_KEY โ€” Pexels API key for stock videos
PIXABAY_API_KEY โ€” Pixabay API key for stock videos
BATCH_TTS_PROVIDER edge Batch TTS provider
BATCH_EDGE_TTS_VOICE zh-CN-XiaoxiaoNeural Batch Edge TTS voice

Deployment

Docker Compose (recommended)

docker compose up -d --build

This starts all services: MySQL, Redis, MinIO, web app, and render worker.

Manual

pnpm install
pnpm db:generate
pnpm db:migrate
pnpm build
pnpm start

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Commit your changes: git commit -m "feat: add my feature"
  4. Push to the branch: git push origin feature/my-feature
  5. Open a Pull Request

Development guidelines

  • Run pnpm typecheck before committing
  • Run pnpm lint to check code style
  • Follow Conventional Commits for commit messages

Roadmap

  • AI Digital Human โ€” talking-head video generation with digital avatars
  • Manga Drama โ€” comic panel animation with expression switching and camera effects
  • Multi-language voiceover โ€” expand TTS voice catalog with more languages

About

๐ŸŽฌ ๅผ€ๆบ AI ่ง†้ข‘ๅทฅไฝœๅฎค โ€” ไปŽไธ€ๅฅ่ฏๅˆฐๆˆ็‰‡ใ€‚ๆ”ฏๆŒๅˆ›ๆ„ๆจกๅผ๏ผˆAI ๅฏผๆผ”ๅ…จๆต็จ‹่‡ชๅŠจ็”Ÿๆˆ๏ผ‰ๅ’Œๆ‰น้‡ๆจกๅผ๏ผˆๆ‰น้‡็”Ÿไบง็Ÿญ่ง†้ข‘๏ผ‰๏ผŒDocker ไธ€้”ฎ้ƒจ็ฝฒ๏ผ‰ใ€‚๐ŸŽฌ Open-source AI video studio โ€” from one sentence to final video. Supports creative mode (AI director full workflow) and batch mode (bulk short video production)

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages