An AI agent skill for generating educational videos from a topic or Markdown outline. Built on Manim Community Edition with TTS narration via edge-tts.
Adapted from the Code2Video paradigm (arXiv:2510.01174) into a Claude Code skill format.
Step 1 — Install the skill:
# Into current project:
npx skills add https://github.com/likefallwind/claude2video
# Globally (available in all projects):
npx skills add https://github.com/likefallwind/claude2video --global
Step 2 — Install Python and system dependencies:
# Local install:
bash .agents/skills/claude2video/setup.sh
# Global install:
bash ~/.claude/skills/claude2video/setup.sh
Supports Ubuntu/Debian (including WSL2), Fedora, and macOS.
Step 3 — Use the skill in Claude Code:
@claude2video Generate an educational video explaining "photosynthesis"
The skill takes a learning topic string or a Markdown course outline file as input and produces a fully narrated educational video (MP4) through an 8-stage pipeline:
Planner (outline → storyboard → assets) → TTS → Coder → Critic → Audio-Video Merge → Final Assembly
Each section of the course becomes a separate Manim animation scene. Narration audio is synthesized first, and animation timing is then paced to match the audio exactly.
| Dependency | Install |
|---|---|
| Python 3.9+ | — |
| Manim Community v0.19+ | pip install manim |
| edge-tts | pip install edge-tts |
| ffmpeg | system package |
.
├── claude2video/ # Skill directory (installed by npx skills add)
│ ├── SKILL.md # Entry point — 8-stage pipeline execution guide
│ ├── planner.md # P_parse / P_outline / P_storyboard / P_asset prompts
│ ├── coder.md # P_coder + Duration Control + ScopeRefine prompts
│ ├── critic.md # P_refine (layout) + P_aesth (scoring) prompts
│ ├── tts.py # CLI: storyboard.json → audio/ + durations.json
│ ├── teaching_scene.py # TeachingScene base class (6×6 grid system)
│ ├── example_section.py # Working example — manim render -ql example_section.py
│ ├── setup.sh # System dependency installer
│ ├── requirements.txt # Python dependencies
│ └── examples/
│ └── biology_cells.md # Sample Markdown course outline (7th grade biology)
├── CLAUDE.md # Project instructions for Claude Code
├── .gitignore
└── README.md
Output is written to output/{topic}/ (gitignored):
output/{topic}/
├── outline.json # Stage 1 output
├── storyboard.json # Stage 2 output (includes narrations)
├── teaching_scene.py # Copied from skill dir
├── sections/
│ ├── section_1.py # Generated Manim scene
│ └── ...
├── audio/
│ ├── section_1/
│ │ ├── line_1.mp3 # Per-line TTS audio
│ │ └── section_1.mp3 # Merged section audio (with 0.5s gaps)
│ └── durations.json # Per-line durations for animation sync
├── sections_with_audio/
│ ├── section_1.mp4 # Video + audio merged
│ └── ...
├── media/ # Manim render output
└── final_video.mp4 # Final output
Note: Auto-triggering is unreliable — always invoke the skill explicitly for best results.
Option 1 — slash command:
/claude2video
Option 2 — direct mention in prompt:
@claude2video Generate an educational video explaining "the Pythagorean theorem"
Option 3 — explicit reference:
Use the claude2video skill to generate an educational video explaining "the Pythagorean theorem"
Before starting, the skill asks whether to generate sections in parallel (faster) or sequentially (allows step-by-step review).
Option A — Topic string:
Generate an educational video explaining "free fall"
Option B — Markdown outline file:
Generate a video from this outline file:
my_course.md
The Markdown outline format:
# Course Title
Target audience: high-school students
## Section 1: Concept Introduction
Content...
## Section 2: Formula Derivation
Content...
The section count in the outline is preserved exactly.
| Stage | Role | Description |
|---|---|---|
| 1 | Planner | Parse Markdown outline or generate outline from topic |
| 2 | Planner | Build storyboard (lecture_lines + narrations + animations) |
| 3 | Planner | Select & download reference image assets |
| 4 | TTS | Synthesize narration audio, output durations.json |
| 5 | Coder | Generate + render Manim section code (parallel or sequential) |
| 6 | Critic | Visual layout refinement via frame extraction (up to 3 rounds) |
| 7 | — | Merge section video + audio via ffmpeg |
| 8 | Critic | Optional aesthetic scoring + final concatenation |
All animations are placed using a named grid (A1–F6). The left column is reserved for lecture notes; the right area is the animation canvas.
A1 A2 A3 A4 A5 A6
B1 B2 B3 B4 B5 B6
lecture | C1 C2 C3 C4 C5 C6
D1 D2 D3 D4 D5 D6
E1 E2 E3 E4 E5 E6
F1 F2 F3 F4 F5 F6
Two positioning methods (no manual coordinates):
- self.place_at_grid(obj, 'B2') — snap to a grid point
- self.place_in_area(obj, 'A1', 'C3') — fit within a grid region
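As an illustration, resolving a named cell like 'B2' can reduce to simple arithmetic. This is only a sketch of the idea — the coordinate bounds below are invented for the example, and the real TeachingScene base class defines its own canvas geometry:

```python
def grid_to_point(cell, left=-2.0, right=6.5, top=3.5, bottom=-3.5):
    """Map a named cell ('A1'..'F6') to the center of that cell in a
    6x6 grid. Bounds are illustrative, not TeachingScene's actual values."""
    row = ord(cell[0]) - ord("A")   # letter A..F -> row 0..5 (top to bottom)
    col = int(cell[1]) - 1          # digit 1..6 -> column 0..5 (left to right)
    x = left + (col + 0.5) * (right - left) / 6
    y = top - (row + 0.5) * (top - bottom) / 6
    return (x, y)
```

Because placement is by name, generated code never hard-codes coordinates, which keeps the Critic's layout fixes local and mechanical.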
TTS durations are generated before code generation. The Coder receives line_durations and pads each animation block with self.wait() so video duration matches audio exactly.
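A minimal sketch of that padding step, assuming durations.json stores per-line lengths in seconds (the exact JSON shape shown in the comment is an assumption for illustration):

```python
def pad_wait(line_duration, anim_duration):
    """Seconds of self.wait() to append after an animation block so the
    block's total runtime matches the narration audio for that line."""
    return max(0.0, line_duration - anim_duration)

# durations.json might look like:
# {"section_1": {"line_1": 4.2, "line_2": 3.1}}
# If the animations for line_1 take 3.0s, the Coder appends
# self.wait(pad_wait(4.2, 3.0)) to keep video and audio in lockstep.
```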
Render failures are resolved through escalating scope:
- Line scope — fix the offending line (≤3 attempts)
- Block scope — rewrite the animation block (≤2 attempts)
- Global scope — regenerate the entire scene from scratch
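The escalation above could be sketched as follows (the function names and return values are hypothetical, not the skill's actual API; each callable stands for one repair attempt and returns True on a successful render):

```python
def repair_render(fix_line, fix_block, regenerate):
    """Escalate through repair scopes until a render succeeds."""
    for _ in range(3):        # line scope: up to 3 attempts
        if fix_line():
            return "line"
    for _ in range(2):        # block scope: up to 2 attempts
        if fix_block():
            return "block"
    regenerate()              # global scope: full scene regeneration
    return "global"
```

Cheap, local fixes are exhausted before any expensive full regeneration is triggered.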
- Paper: Code2Video: Automated Educational Video Generation Leveraging AI Coding Agents
- Reference implementation: https://github.com/showlab/Code2Video