Generate TikTok-style videos with voice, captions, avatar, and code overlays in one command.
Free TTS, animated captions, lip-synced avatar, preview intro, built-in script bank. No API key needed.
```bash
npx generate-video "JavaScript closures explained in 30 seconds"
```

Features:

- Voice: 400+ TTS voices via edge-tts (free, unlimited)
- Captions: word-synced animated captions with outline
- Avatar: lip-synced cartoon avatar (amplitude analysis)
- Preview: branded 1.5s intro frame
- Code overlay: syntax-highlighted code box
- Title overlay: centered, word-wrapped
- Logo: custom PNG in the top-left corner
- Script bank: 12 built-in topics (RAG, JS, etc.)
- Custom colors: background and accent
- Custom dimensions: vertical, landscape, square
Requirements:

- Node.js >= 14
- Python 3 with pip (edge-tts and Pillow are installed automatically)
- FFmpeg (`brew install ffmpeg` or `apt install ffmpeg`)
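To confirm the requirements are in place before a first run, a small Python check can help. This is a sketch, not part of generate-video: `check_requirements` is a hypothetical helper that only tests that the `node` and `ffmpeg` executables are on `PATH` and that the `edge_tts` and `PIL` (Pillow) modules are importable. It does not verify versions.

```python
import importlib.util
import shutil
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report which prerequisites are present (hypothetical helper, not part of generate-video).

    Only checks that things exist; versions (Node.js >= 14, Python 3) are not verified.
    """
    return {
        # Executables looked up on PATH
        "node": shutil.which("node") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Python modules: edge-tts installs as edge_tts, Pillow as PIL
        "edge_tts": importlib.util.find_spec("edge_tts") is not None,
        "PIL": importlib.util.find_spec("PIL") is not None,
    }


if __name__ == "__main__":
    for name, present in check_requirements().items():
        print(f"{name:<10} {'ok' if present else 'missing'}")
```

A missing Python module is harmless here, since the package installs edge-tts and Pillow itself. A missing `ffmpeg` has to be installed by hand.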
Basic usage:

```bash
npx generate-video "Your script text here"
```

With options:

```bash
npx generate-video "Your text" \
  --title "My Video" \
  --code "const x = 42;" \
  --avatar \
  --preview \
  --logo ./logo.png
```

Script bank:

```bash
# List all topics
npx generate-video --topics

# Generate from a built-in topic
npx generate-video --topic 0
npx generate-video --topic 5 --avatar --preview
```

Avatar:

```bash
npx generate-video "Your text" --avatar
```

Generates a cartoon avatar with 4 mouth positions synced to audio amplitude.
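The avatar's 4 mouth positions synced to audio amplitude can be pictured as bucketing per-frame loudness into the states closed, small, medium, and wide. The sketch below rests on several assumptions. It takes mono float samples in [-1, 1] as input, where the package instead gets amplitude from FFmpeg. The threshold values are made up. The names `frame_levels`, `mouth_state`, and `mouth_track` are hypothetical, not the package's API.

```python
import math
from typing import List, Sequence

# The four mouth states from the README, quietest to loudest.
MOUTH_STATES = ["closed", "small", "medium", "wide"]

# Hypothetical RMS cut-offs between neighbouring states (not the package's values).
THRESHOLDS = [0.02, 0.08, 0.2]


def frame_levels(samples: Sequence[float], sample_rate: int, fps: int) -> List[float]:
    """RMS loudness of the audio under each video frame (mono samples in [-1, 1])."""
    per_frame = sample_rate // fps  # audio samples per video frame
    levels = []
    for start in range(0, len(samples), per_frame):
        chunk = samples[start:start + per_frame]
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return levels


def mouth_state(level: float) -> str:
    """Pick a mouth state by counting how many thresholds the level reaches."""
    return MOUTH_STATES[sum(level >= t for t in THRESHOLDS)]


def mouth_track(samples: Sequence[float], sample_rate: int = 16000, fps: int = 30) -> List[str]:
    """One mouth state per video frame."""
    return [mouth_state(level) for level in frame_levels(samples, sample_rate, fps)]


if __name__ == "__main__":
    sr, fps = 16000, 30
    n = sr // fps
    silence = [0.0] * n
    tone = [0.5 * math.sin(2 * math.pi * 440 * i / sr) for i in range(n)]
    print(mouth_track(silence + tone, sr, fps))  # ['closed', 'wide']
```

Counting thresholds keeps the mapping monotonic, so louder audio never produces a smaller mouth. A real implementation would likely also smooth across frames to avoid flicker, which this sketch does not do.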
Preview intro:

```bash
npx generate-video "Your text" --title "My Video" --preview
npx generate-video "Your text" --preview --preview-bg ./background.png
npx generate-video "Your text" --preview --preview-duration 2.0
```

Adds a branded intro frame before the main content. The audio is delayed to stay in sync.
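When the audio is delayed to sync with the intro frame, every caption timestamp has to move by the same amount. The sketch below shows one way to do this, assuming the delay equals the preview duration. It uses FFmpeg's `adelay` filter, which takes one delay per audio channel in milliseconds, separated by `|`. The default of one channel assumes mono TTS audio. `shift_timings` and `adelay_filter` are hypothetical helpers, and the package may delay audio by a different mechanism.

```python
from typing import List, Tuple

# (word, start_seconds, end_seconds)
WordTiming = Tuple[str, float, float]


def shift_timings(words: List[WordTiming], preview_duration: float) -> List[WordTiming]:
    """Push every word later by the length of the intro frame."""
    return [(w, start + preview_duration, end + preview_duration) for w, start, end in words]


def adelay_filter(preview_duration: float, channels: int = 1) -> str:
    """Build an FFmpeg adelay filter that delays every channel by the preview length.

    adelay expects one delay per channel, in milliseconds, joined with '|'.
    """
    ms = round(preview_duration * 1000)
    return "adelay=" + "|".join([str(ms)] * channels)


if __name__ == "__main__":
    print(adelay_filter(1.5))                               # adelay=1500
    print(shift_timings([("Hello", 0.0, 0.5)], 1.5))        # [('Hello', 1.5, 2.0)]
```

On the command line, the same filter would be applied as `ffmpeg -i voice.mp3 -af adelay=1500 delayed.mp3` (file names are placeholders). For stereo input, pass `channels=2` so that both channels are delayed.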
Voices:

```bash
npx generate-video "Bonjour" --voice fr-FR-HenriNeural
npx generate-video "Fast speech" --rate "+30%"
npx generate-video "Deep voice" --pitch "-5Hz"

# List available voices, filtered by language code
npx generate-video --voices --lang en
```

Colors:

```bash
npx generate-video "Your text" --bg-color 1a1a2e --accent-color e94560
```
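The `--bg-color` and `--accent-color` values above are six-digit hex codes without a leading `#`. Pillow's drawing calls accept colors as `(r, g, b)` tuples, so the sketch below validates such a value and converts it. `parse_hex_color` is a hypothetical helper, and accepting an optional `#` is an assumption. The package may simply pass the string through in another form.

```python
import re
from typing import Tuple

# Six hex digits, optionally prefixed with '#'.
_HEX = re.compile(r"#?([0-9a-fA-F]{6})")


def parse_hex_color(value: str) -> Tuple[int, int, int]:
    """Convert '1a1a2e' (or '#1a1a2e') to an (r, g, b) tuple of 0-255 ints."""
    match = _HEX.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"expected 6 hex digits, got {value!r}")
    digits = match.group(1)
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (r, g, b)


if __name__ == "__main__":
    print(parse_hex_color("1a1a2e"))  # (26, 26, 46)
    print(parse_hex_color("e94560"))  # (233, 69, 96)
```

Rejecting three-digit shorthand such as `fff` keeps the accepted format identical to the examples.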
Dimensions:

```bash
npx generate-video "Your text" --width 1920 --height 1080  # Landscape
npx generate-video "Your text" --width 1080 --height 1080  # Square
```

The default is 720x1280 (vertical).

Existing audio:

```bash
npx generate-video "Caption text" --audio ./voiceover.mp3
```

Dry run:

```bash
npx generate-video "Your text" --dry-run
```

Options:

| Flag | Description | Default |
|---|---|---|
| `-v, --voice <name>` | TTS voice | `en-US-GuyNeural` |
| `-o, --output <file>` | Output path | Auto-generated |
| `-t, --title <text>` | Title overlay | none |
| `-c, --code <text>` | Code box overlay | none |
| `--logo <path>` | Logo image (PNG) | none |
| `--audio <path>` | Use an existing audio file instead of generating TTS | none |
| `-r, --rate <rate>` | Speech rate, e.g. `+30%` | Normal |
| `-p, --pitch <pitch>` | Voice pitch, e.g. `-5Hz` | Normal |
| `--avatar` | Enable lip-synced avatar | off |
| `--preview` | Add preview intro frame | off |
| `--preview-bg <path>` | Preview background image | none |
| `--preview-duration <s>` | Preview duration in seconds | 1.5 |
| `--topic <index>` | Use a built-in topic | none |
| `--topics` | List built-in topics | off |
| `--width <px>` | Video width | 720 |
| `--height <px>` | Video height | 1280 |
| `--fps <n>` | Frames per second | 30 |
| `--bg-color <hex>` | Background color | `0f172a` |
| `--accent-color <hex>` | Accent color | `7c3aed` |
| `--no-captions` | Disable captions | off |
| `--voices` | List TTS voices | off |
| `-l, --lang <code>` | Filter the voice list by language | none |
| `--dry-run` | Dry run (preview only) | off |
How it works:

- edge-tts generates voice audio with word-level timestamps (free)
- Pillow renders background frame with title, code box, logo
- Pillow renders animated caption frames synced to word timings
- Pillow generates avatar with 4 mouth states (closed, small, medium, wide)
- FFmpeg analyzes audio amplitude for lip-sync
- FFmpeg composites frames + audio into final video
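The caption step (animated caption frames synced to word timings) needs to know which word is being spoken at each video frame. The sketch below shows that lookup under a few assumptions. Word events are taken to be dicts shaped like edge-tts word-boundary chunks, with `text`, `offset`, and `duration`, where the last two are in 100-nanosecond units. `to_seconds` and `caption_track` are hypothetical names. The package's actual caption grouping and animation are not documented here.

```python
import bisect
from typing import Dict, List, Optional, Tuple

TICKS_PER_SECOND = 10_000_000  # offsets/durations assumed to be in 100-ns units

Word = Tuple[str, float, float]  # (text, start_s, end_s)


def to_seconds(events: List[Dict]) -> List[Word]:
    """Convert word-boundary events into (text, start, end) in seconds, sorted by start."""
    words = [
        (e["text"], e["offset"] / TICKS_PER_SECOND,
         (e["offset"] + e["duration"]) / TICKS_PER_SECOND)
        for e in events
    ]
    return sorted(words, key=lambda w: w[1])


def caption_track(words: List[Word], n_frames: int, fps: int = 30) -> List[Optional[str]]:
    """Word to highlight on each video frame, or None in the gaps between words."""
    starts = [start for _, start, _ in words]
    track = []
    for frame in range(n_frames):
        t = frame / fps  # timestamp of this frame in seconds
        i = bisect.bisect_right(starts, t) - 1  # last word starting at or before t
        track.append(words[i][0] if i >= 0 and t < words[i][2] else None)
    return track


if __name__ == "__main__":
    events = [
        {"text": "Hello", "offset": 0, "duration": 5_000_000},
        {"text": "world", "offset": 6_000_000, "duration": 4_000_000},
    ]
    print(caption_track(to_seconds(events), n_frames=6, fps=5))
    # ['Hello', 'Hello', 'Hello', 'world', 'world', None]
```

A binary search over start times keeps each per-frame lookup at O(log n). The example uses `fps=5` only to keep the output short. With a preview intro, every timestamp would first be shifted by the intro's duration.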
License: MIT