An AI-powered video editor you talk to.
Drop your footage, describe your vision, get a finished edit.
```bash
# 1. Install Cursor Agent CLI
curl https://cursor.com/install -fsSL | bash

# 2. Start the FFmpeg MCP server
cd services/ffmpeg_mcp
pip install -r requirements.txt
python server.py   # runs on port 8100
```
```bash
# 3. Start the web app
cd app
cp .env.example .env   # set CURSOR_API_KEY
npm install
npm run dev            # http://localhost:3000
```
```bash
# 4. Install Remotion + transcription deps (root)
npm install   # remotion, @elevenlabs/elevenlabs-js
```

| Variable | Where | Purpose |
|---|---|---|
| `CURSOR_API_KEY` | `app/.env` | Cursor Agent CLI authentication |
| `ELEVENLABS_API_KEY` | `.env` | Transcription (Scribe v2) + sound effects |
| `REPLICATE_API_TOKEN` | `skills/video-gen/.env` | AI video generation (Wan 2.6) |
Set `OUTTAKE_AGENT_BACKEND=claude` in `app/.env` to use the Claude Code CLI instead of Cursor.
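For reference, the three env files might end up looking like this (key values are placeholders, and the comments are our reading of the table above):

```bash
# app/.env
CURSOR_API_KEY=your-cursor-key          # Cursor Agent CLI authentication
OUTTAKE_AGENT_BACKEND=claude            # optional: use Claude Code CLI instead

# .env (repo root)
ELEVENLABS_API_KEY=your-elevenlabs-key  # Scribe v2 transcription + Text-to-SFX

# skills/video-gen/.env
REPLICATE_API_TOKEN=your-replicate-key  # Wan 2.6 video generation
```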
| Capability | Powered by |
|---|---|
| Cut, concat, transcode, extract audio | FFmpeg via MCP tools |
| Animated word-by-word subtitles | ElevenLabs Scribe v2 + Remotion |
| Motion graphics (wave transitions, kinetic typography) | Remotion |
| Sound effects from text descriptions | ElevenLabs Text-to-SFX |
| AI video generation (text-to-video, image-to-video) | Replicate Wan 2.6 |
| Audio normalization, complex filters | FFmpeg (Bash) |
```
┌─────────────────────────────────────────┐
│         Next.js Web App (app/)          │
│                                         │
│  ┌────────────┐  ┌───────────────────┐  │
│  │ Chat Panel │  │   Video Preview   │  │
│  │   (SSE)    │  │ + Remotion Player │  │
│  └─────┬──────┘  └───────────────────┘  │
│        │         ┌───────────────────┐  │
│        │         │     Timeline      │  │
│        │         └───────────────────┘  │
└────────┼────────────────────────────────┘
         │
   /api/chat (POST)
         │
 Spawns Cursor Agent CLI
  (or Claude Code CLI)
         │
         ├──── FFmpeg MCP Server (port 8100)
         │        probe, cut, concat, transcode,
         │        scan_scenes, mix_sfx, extract_audio
         │
         ├──── Remotion (React video rendering)
         │        Animated subtitles (SubtitleJobPreview)
         │        Motion graphics (OuttakeMotion)
         │        Liquid wave transitions, kinetic typography
         │
         ├──── ElevenLabs API
         │        Scribe v2 transcription → word-level timestamps
         │        Text-to-SFX sound effects generation
         │
         └──── Replicate API
                  Wan 2.6 text-to-video / image-to-video
```
The web app spawns the agent CLI as a subprocess with `--print --output-format stream-json`. The agent orchestrates every tool autonomously: it analyzes footage with the FFmpeg MCP server, transcribes speech with ElevenLabs Scribe v2, renders animated subtitles and motion graphics with Remotion, generates sound effects via ElevenLabs Text-to-SFX, and can create AI video clips through Replicate.
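A minimal sketch of that subprocess bridge, assuming the CLI binary is named `cursor-agent` and emits one JSON object per stdout line (the helper names here are illustrative; the real adapter lives in `app/src/lib` and the real route in `app/src/app/api/chat/`):

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Re-frame one stream-json line from the agent CLI as an SSE frame.
export function toSSE(jsonLine: string): string {
  return `data: ${jsonLine.trim()}\n\n`;
}

// Spawn the agent CLI and forward stdout line-by-line as SSE frames (sketch).
export function runAgent(
  prompt: string,
  onEvent: (sse: string) => void,
): ChildProcess {
  const child = spawn("cursor-agent", [
    "--print",
    "--output-format", "stream-json",
    prompt,
  ]);
  let buf = "";
  child.stdout!.on("data", (chunk: Buffer) => {
    buf += chunk.toString("utf8");
    const lines = buf.split("\n");
    buf = lines.pop() ?? ""; // keep any partial trailing line for the next chunk
    for (const line of lines) {
      if (line.trim()) onEvent(toSSE(line));
    }
  });
  return child;
}
```

Buffering on newline boundaries matters because a stdout `data` chunk can end mid-JSON-object; the leftover fragment is carried over until the next chunk completes the line.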
```
outtake/
├── app/                     Next.js web application
│   └── src/
│       ├── app/
│       │   ├── api/chat/    Agent CLI subprocess + SSE streaming
│       │   └── page.tsx     Main editor layout
│       ├── components/      ChatPanel, Preview, Timeline, MediaBin
│       └── lib/             useChat hook, cursor-stream-adapter, timecode utils
├── src/                     Remotion compositions
│   ├── OuttakesCaption.tsx  Animated subtitle overlay
│   ├── OuttakeMotion.tsx    Motion graphics (wave, kinetic text, clapperboard)
│   └── Root.tsx             Composition registry
├── services/ffmpeg_mcp/     FFmpeg MCP server (Python, port 8100)
├── skills/
│   ├── sound-effects/       ElevenLabs SFX generation
│   └── video-gen/           Replicate Wan 2.6 video generation
├── transcribe-pipeline.mjs  ElevenLabs Scribe v2 → aligned.json
├── CLAUDE.md                Agent system prompt (skills, tools, workflow)
└── SYSTEM_PROMPT.md         Injected into agent at runtime
```
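To make the word-by-word subtitle flow concrete: once Scribe v2 produces word-level timestamps, a Remotion composition can map the current frame to the active word. A sketch, assuming a simple word shape — the `Word` type and `activeWordIndex` name are illustrative, not the project's actual API:

```typescript
type Word = { text: string; start: number; end: number }; // times in seconds

// Given Scribe-style word timestamps, return the index of the word active
// at a given frame, or -1 during silence. (Illustrative helper.)
export function activeWordIndex(
  words: Word[],
  frame: number,
  fps: number,
): number {
  const t = frame / fps; // convert frame number to seconds
  return words.findIndex((w) => t >= w.start && t < w.end);
}
```

Inside a Remotion component, `frame` would come from `useCurrentFrame()` and `fps` from `useVideoConfig()`, so the highlighted word updates deterministically on every rendered frame.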
- Frontend: Next.js 16, React 19, Tailwind CSS, Remotion Player
- Agent: Cursor Agent CLI (default) or Claude Code CLI
- Video Processing: FFmpeg via MCP (Model Context Protocol)
- Transcription: ElevenLabs Scribe v2 (word-level timestamps)
- Motion Graphics: Remotion (React-based video rendering)
- Sound Effects: ElevenLabs Text-to-SFX API
- Video Generation: Replicate Wan 2.6 (text-to-video, image-to-video)
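To illustrate the FFmpeg side of the stack, a cut operation ultimately reduces to assembling an ffmpeg argument list. A minimal sketch of that assembly — a hypothetical helper, not the server's actual code (which is Python); a production tool would also handle keyframe alignment, since stream-copy cuts snap to keyframes:

```typescript
// Build ffmpeg args for a fast stream-copy cut between two timestamps.
// (Illustrative only; the real tools live in services/ffmpeg_mcp.)
export function buildCutArgs(
  input: string,
  startSec: number,
  endSec: number,
  output: string,
): string[] {
  return [
    "-ss", String(startSec),           // seek to the cut start (input-side seek)
    "-i", input,
    "-t", String(endSec - startSec),   // clip duration
    "-c", "copy",                      // no re-encode: fast, but keyframe-bound
    "-y", output,                      // overwrite output if it exists
  ];
}
```

The resulting array would be passed straight to a process spawner (e.g. `spawn("ffmpeg", args)`), which avoids shell-quoting issues with user-supplied filenames.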
MIT
