🏆 Third Place at HackAI x Stan
AI-powered video editor with automatic transcription, timeline editing, and advanced effects like text-behind-person overlays.
Zinc provides a complete video editing workflow:
- Automatic Transcription using faster-whisper with word-level timestamps
- Timeline-Based Editing with visual clip management
- AI-Assisted Editing using Claude/Gemini for intelligent suggestions
- LangGraph Agent for orchestrating complex video editing workflows
- Text Behind Person effect using MediaPipe segmentation
- Color Grading with preset looks
- Remotion Integration for advanced video compositions
- Transcription: Upload videos and get accurate transcripts with word-level timing
- Timeline Editor: Visual timeline for arranging clips with drag-and-drop
- Text Behind Person: Overlay text that appears behind the subject using real-time person segmentation
- Color Grades: Apply cinematic color presets to your videos
- Streaming Subtitles: Word-by-word subtitle rendering synced to speech
- Audio Tracks: Add background music with volume control
- Transitions: Fade, dissolve, and other transition effects between clips
- Frontend: Next.js 16, React 19, Tailwind CSS, Remotion Player
- Backend: FastAPI, faster-whisper, FFmpeg, MediaPipe
- AI: Anthropic Claude SDK, Google Gemini SDK
- Workflow: LangGraph for orchestrating video editing operations
- Video Processing: FFmpeg for compositing, Remotion for effects
The video editing pipeline is orchestrated by a LangGraph agent that manages the entire workflow:
```
┌──────────────────────────────────────────────────────────────┐
│                      LangGraph Workflow                      │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌───────────┐     ┌───────────┐     ┌─────────────┐         │
│  │   START   │────▶│  Extract  │────▶│ Concatenate │         │
│  │           │     │   Clips   │     │             │         │
│  └───────────┘     └───────────┘     └─────────────┘         │
│                          │                  │                │
│                          ▼                  ▼                │
│              ┌─────────────────┐     ┌─────────────┐         │
│              │ For each clip   │     │  Mix Audio  │         │
│              │ (parallel):     │     │   Tracks    │         │
│              │ • FFmpeg cut    │     └─────────────┘         │
│              │ • Color grade   │            │                │
│              │ • Subtitles     │            ▼                │
│              │ • Overlays      │     ┌─────────────┐         │
│              │ • Transitions   │     │   Cleanup   │──▶ END  │
│              └─────────────────┘     └─────────────┘         │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
Nodes:
- Extract Clips: Processes each clip in parallel - applies FFmpeg cuts, color grading, streaming subtitles, text-behind-person overlays, and speed adjustments
- Concatenate: Joins all processed clips with transitions, mixes in audio tracks
- Cleanup: Removes temporary files
- Python 3.9+
- Node.js 18+
- FFmpeg
Clone the repository:

```bash
git clone https://github.com/yourusername/zinc.git
cd zinc
```

Backend:

```bash
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start the server
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Frontend:

```bash
cd frontend
npm install

# Configure environment (optional)
cp .env.local.example .env.local

# Start dev server
npm run dev
```

Install FFmpeg:

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
```

Visit http://localhost:3000 to start editing!
Upload one or more video files. They'll be automatically transcribed with word-level timestamps.
- Drag clips to rearrange
- Trim clips by adjusting start/end times
- Select transcript segments to create clips
Text Behind Person: Add text overlays that appear behind the subject using AI-powered person segmentation. Customize:
- Text content and position
- Font size and color
- Feathering and threshold for segmentation quality
Color Grading: Apply preset color grades to change the mood of your video.
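A preset boils down to an FFmpeg filter chain. A hypothetical "warm" grade might be applied like this (the function name and filter values are illustrative, not the project's actual presets):

```python
import subprocess


def apply_color_grade(src: str, dst: str) -> None:
    """Apply a hypothetical 'warm' look via FFmpeg's eq and colorbalance filters."""
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", src,
            # Boost saturation/contrast, push midtones toward red and away from blue.
            "-vf", "eq=saturation=1.3:contrast=1.05,colorbalance=rm=0.05:bm=-0.05",
            "-c:a", "copy",  # leave the audio stream untouched
            dst,
        ],
        check=True,
    )
```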
Subtitles: Enable streaming subtitles that highlight words as they're spoken.
Process your edit plan and download the final video.
```
zinc/
├── backend/                  # FastAPI backend
│   ├── app/
│   │   ├── main.py               # API endpoints (transcription, editing)
│   │   ├── video_editor.py       # FFmpeg-based video processing
│   │   ├── ffmpeg_compositor.py
│   │   └── person_segmentation.py
│   └── requirements.txt
│
├── frontend/                 # Next.js frontend
│   ├── app/
│   │   ├── page.tsx              # Main editor interface
│   │   ├── video-editor/         # Video editor page
│   │   ├── components/           # React components
│   │   ├── lib/                  # Utilities and API client
│   │   └── api/                  # API routes
│   └── package.json
│
├── remotion-renderer/        # Remotion video effects
│   └── src/
│
├── packages/                 # Shared packages
│   └── remotion-shared/
│
└── Documentation/            # Additional docs
```
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /transcribe | POST | Transcribe a video/audio file |
| /transcribe/batch | POST | Transcribe multiple files |
| /edit/process | POST | Process an edit plan |
| /audio/upload | POST | Upload background audio |
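A minimal Python client for the transcription endpoint might look like this (the multipart field name and the response shape are assumptions; check the FastAPI routes in backend/app/main.py for the exact contract):

```python
import requests

API_URL = "http://localhost:8000"


def transcribe(path: str) -> dict:
    """POST a local video/audio file to /transcribe and return the JSON transcript."""
    with open(path, "rb") as f:
        resp = requests.post(f"{API_URL}/transcribe", files={"file": f})
    resp.raise_for_status()
    return resp.json()
```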
Backend:

```bash
WHISPER_MODEL=base           # tiny, base, small, medium, large-v2, large-v3
WHISPER_DEVICE=auto          # auto, cpu, cuda
WHISPER_COMPUTE_TYPE=auto    # auto, int8, float16, float32
```

Frontend (.env.local):

```bash
NEXT_PUBLIC_API_URL=http://localhost:8000
```

The text-behind-person feature uses MediaPipe's selfie segmentation to create a mask of the person in the video. Text is then composited behind this mask, creating the illusion that the text sits behind the subject.
Parameters:
- threshold: Segmentation confidence threshold (0-1)
- feather_px: Edge softening in pixels
- temporal_smoothing: Frame-to-frame smoothing for stable masks
Run the backend tests:

```bash
cd backend
pytest
```

Format and lint:

```bash
# Backend
cd backend
black app/
isort app/

# Frontend
cd frontend
npm run lint
```

License: MIT
Contributions welcome! Please open an issue first to discuss what you'd like to change.