DubFlow is an audio translation/dubbing workspace. It accepts an uploaded audio/video file or a YouTube URL, transcribes the source audio with a local Whisper model, translates the transcript with Sarvam AI, synthesizes target-language speech with Sarvam TTS, stitches the generated segments into a final MP3, and exposes progress, transcripts, and download links through a React dashboard.
- Dubbing workspace for file uploads or YouTube URLs.
- Supported upload formats: MP3, MP4, WAV, M4A, WEBM.
- Target language selection for Indian-language dubbing.
- Source language selection: English, Hindi, or auto-detect.
- Local transcription using the configured OpenAI Whisper model.
- Sarvam AI translation provider.
- Sarvam TTS provider.
- Segment-by-segment translation and synthesis.
- Audio stitching with pydub and FFmpeg.
- Live job progress through Server-Sent Events.
- Redis-backed job storage, recent job listing, and translation cache.
- Unit and integration tests for core backend behavior.
- Python 3.11
- FastAPI
- Pydantic v2 and pydantic-settings
- Redis asyncio client
- sse-starlette for Server-Sent Events
- openai-whisper for local speech-to-text
- Sarvam AI APIs for translation and text-to-speech
- pydub and FFmpeg for audio stitching
- yt-dlp for YouTube audio extraction
- httpx for outbound API calls
- tenacity for retry handling
- structlog for structured logging
- pytest, pytest-asyncio, fakeredis, and httpx for tests
- React 19
- Vite 7
- React Router 7
- Tailwind CSS 4
- lucide-react icons
- ESLint
- Docker and Docker Compose
- Redis 7 Alpine
- Nginx 1.27 Alpine for serving the frontend and proxying backend routes
- A user submits either a media file or a YouTube URL.
- The backend creates a job record in Redis and starts the pipeline as a background task.
- Whisper transcribes the source audio into timed transcript segments.
- Sarvam translates each segment from the source language to the target language.
- Sarvam TTS synthesizes each translated segment into audio bytes.
- pydub stitches the synthesized chunks together with timing-aware gaps.
- The final output is written to outputs/<job_id>/dubbed_output.mp3.
- The job stores the output path, original transcript, translated text, progress, and status.
- The frontend polls job data and listens to SSE progress events for active jobs.
- The user can inspect transcripts, play the dubbed audio, and download the MP3.
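
The "timing-aware gaps" step above can be illustrated with a small planning function. This is a minimal sketch, not the code in stitcher.py: the `Clip` type and `plan_gaps` function are hypothetical names, and the rule it applies (pad with silence until the segment's original start time, never overlap) is an assumption about how the gaps are chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Hypothetical record for one synthesized segment."""
    start_ms: int     # start time of the source segment, from the transcript
    duration_ms: int  # length of the synthesized audio for that segment


def plan_gaps(clips: list[Clip]) -> list[int]:
    """Return the silence (in ms) to insert before each clip.

    Each clip is placed at its original start time when possible. If the
    previous clip ran past that point, the gap is zero so clips never overlap.
    """
    gaps: list[int] = []
    cursor = 0  # current length of the stitched timeline in ms
    for clip in clips:
        gap = max(0, clip.start_ms - cursor)
        gaps.append(gap)
        cursor += gap + clip.duration_ms
    return gaps


clips = [Clip(500, 1000), Clip(2000, 1500), Clip(3000, 400)]
print(plan_gaps(clips))
```

With pydub, each planned gap could then be rendered with `AudioSegment.silent(duration=gap)` and concatenated with the clip before exporting the MP3.
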
| Code | Language |
|---|---|
| en | English |
| hi | Hindi |
| auto | Auto-detect |
| Code | Language |
|---|---|
| hi | Hindi |
| bn | Bengali |
| ta | Tamil |
| te | Telugu |
| kn | Kannada |
| mr | Marathi |
| gu | Gujarati |
| pa | Punjabi |
| ml | Malayalam |
| or | Odia |
DubFlow/
|-- app/
| |-- api/
| | |-- job_store.py # Redis-backed job create/read/update helpers
| | |-- middleware.py # Request ID middleware
| | |-- provider_config.py # Provider metadata exposed to the UI
| | `-- routes.py # FastAPI system, config, job, stream, and download routes
| |-- cache/
| | `-- redis_cache.py # Redis connection and translation cache helpers
| |-- pipeline/
| | |-- orchestrator.py # End-to-end transcription -> translation -> TTS -> stitch flow
| | |-- stitcher.py # Audio stitching with pydub
| | |-- transcription.py # Transcription provider fallback wrapper
| | |-- translation.py # Translation provider and cache pipeline
| | `-- tts.py # TTS provider pipeline
| |-- providers/
| | |-- base.py # Provider interfaces
| | |-- transcription/
| | | `-- whisper_local.py # Local Whisper provider
| | |-- translation/
| | | `-- sarvam.py # Sarvam translation provider
| | `-- tts/
| | `-- sarvam.py # Sarvam TTS provider
| |-- config.py # Environment-driven settings
| |-- errors.py # Pipeline and provider error types
| |-- logging_config.py # structlog setup
| |-- main.py # FastAPI app factory and lifespan
| `-- models.py # Pydantic models and typed language/provider names
|-- frontend/
| |-- public/
| | `-- favicon.svg
| |-- src/
| | |-- api/
| | | `-- dubflow.js # Frontend API client
| | |-- components/
| | | |-- job/ # Job form, cards, progress, transcript viewer
| | | |-- layout/ # Navbar, sidebar, dashboard layouts
| | | `-- ui/ # Reusable UI primitives
| | |-- hooks/ # Config, submit, polling, and SSE hooks
| | |-- lib/ # Constants and className helper
| | |-- pages/ # Landing, dashboard, dub, jobs, detail, languages
| | |-- App.jsx # Client route definitions
| | |-- main.jsx # React entrypoint
| | `-- theme.css # Tailwind import and global design tokens
| |-- Dockerfile # Frontend build and Nginx image
| |-- nginx.conf # SPA serving and backend proxy config
| |-- package.json
| `-- vite.config.js
|-- tests/
| |-- integration/
| | `-- test_api.py # API integration tests
| `-- unit/ # Config, providers, cache, models, pipeline, store tests
|-- docker-compose.yml # Redis, backend, and frontend services
|-- Dockerfile # Backend image
|-- package.json # Root scripts
|-- pyproject.toml # Python package, deps, pytest, and FastAPI config
`-- README.md
| Method | Path | Purpose |
|---|---|---|
| GET | /health | Health check |
| GET | /api/v1/hello | Simple API smoke test |
| GET | /api/v1/config | Frontend provider configuration |
| POST | /api/v1/jobs | Submit a dubbing job |
| GET | /api/v1/jobs | List recent jobs |
| GET | /api/v1/jobs/{job_id} | Get one job status/result |
| GET | /api/v1/jobs/{job_id}/stream | Stream live job progress with SSE |
| GET | /api/v1/jobs/{job_id}/download | Download completed MP3 output |
| GET | /docs | FastAPI Swagger UI |
| GET | /openapi.json | OpenAPI schema |
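
A client of the /api/v1/jobs/{job_id}/stream route has to decode the Server-Sent Events wire format. The sketch below parses the standard `event:` and `data:` fields from an iterable of text lines. The `parse_sse` helper is a hypothetical name, and the `progress` event name and JSON payload in the example are assumptions; the actual events emitted by DubFlow are not documented here.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from SSE text lines.

    A blank line dispatches the buffered event, lines starting with ':'
    are comments (often used as keep-alives), and the event name defaults
    to "message" when no event field is given.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # an event with no data is not dispatched
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is stripped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream content, for illustration only.
sample = ["event: progress", 'data: {"progress": 40}', "", ": keep-alive", ""]
for name, payload in parse_sse(sample):
    print(name, payload)
```

In practice the lines could come from a streaming HTTP response, for example `httpx`'s `response.iter_lines()` inside a `client.stream("GET", url)` block.
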
POST /api/v1/jobs accepts multipart/form-data.
| Field | Required | Default | Notes |
|---|---|---|---|
| target_language | No | hi | One of the target language codes |
| source_language | No | en | en, hi, or auto |
| translation_provider | No | sarvam | Currently Sarvam |
| tts_provider | No | sarvam | Currently Sarvam |
| file | Conditional | none | Provide exactly one of file or youtube_url |
| youtube_url | Conditional | none | Provide exactly one of file or youtube_url |
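
The field rules above, in particular "exactly one of file or youtube_url", can be expressed as a small check. This is an illustrative sketch only: `validate_job_form` is a hypothetical helper, not the validation used in routes.py, and the error strings are made up for the example. The language codes and defaults are taken from the tables in this README.

```python
from typing import Optional

SOURCE_LANGUAGES = {"en", "hi", "auto"}
TARGET_LANGUAGES = {"hi", "bn", "ta", "te", "kn", "mr", "gu", "pa", "ml", "or"}


def validate_job_form(
    *,
    target_language: str = "hi",
    source_language: str = "en",
    has_file: bool = False,
    youtube_url: Optional[str] = None,
) -> list[str]:
    """Return a list of problems with a job submission (empty if valid)."""
    errors: list[str] = []
    if target_language not in TARGET_LANGUAGES:
        errors.append(f"unsupported target_language: {target_language}")
    if source_language not in SOURCE_LANGUAGES:
        errors.append(f"unsupported source_language: {source_language}")
    has_url = bool(youtube_url and youtube_url.strip())
    # Exactly one input is allowed: reject both "neither" and "both".
    if has_file == has_url:
        errors.append("provide exactly one of file or youtube_url")
    return errors


print(validate_job_form(has_file=True))       # valid: upload only
print(validate_job_form(target_language="fr"))  # two problems reported
```
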
Model and voice names are defined in .env.example. Copy it to .env.
For frontend-only local development, you can set this in frontend/.env if the backend is not served from the same origin:
VITE_API_BASE_URL=http://localhost:8000
- Docker
- Node.js 20+
- Python 3.11+
- Redis if running the backend outside Docker.
- FFmpeg if running audio stitching outside Docker.
- A Sarvam API key for translation and TTS.
The backend Dockerfile installs FFmpeg. The configured Whisper model is loaded at runtime.
- Copy .env.example to .env and set SARVAM_API_KEY.
- Start the stack:
npm run dev
This runs:
- redis on the internal Docker network.
- dubflow backend on port 8000 inside Docker.
- frontend Nginx server exposed on host port 80.
Open:
- App: http://localhost
- API docs through Nginx: http://localhost/docs
- Health check: http://localhost/health
Useful Docker scripts:
npm run up:backend
npm run up:frontend
npm run logs:backend
npm run logs:frontend
npm run down
Start Redis first. For example:
docker compose up -d redis
Create and activate a Python virtual environment:
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"
Run the backend:
npm run dev:backend
Backend URLs:
- API: http://localhost:8000
- Docs: http://localhost:8000/docs
- Health: http://localhost:8000/health
Install frontend dependencies:
npm --prefix frontend install
If the backend is running at http://localhost:8000, create frontend/.env:
VITE_API_BASE_URL=http://localhost:8000
Run Vite:
npm run dev:frontend
Open the Vite URL printed by the terminal, usually http://localhost:5173.
| Route | Page |
|---|---|
| / | Landing page |
| /dashboard | Studio dashboard |
| /dub | Create a dubbing job |
| /jobs | List submitted jobs |
| /jobs/:id | Job progress, transcripts, audio player, and download |
| /languages | Supported language coverage |
- Job state is stored in Redis under dubflow:job:<job_id>.
- A sorted Redis index tracks recent jobs.
- Job records have a 24-hour TTL.
- Translation cache entries use keys under dubflow:translation:*.
- Final MP3 outputs are written to outputs/<job_id>/dubbed_output.mp3.
- Docker Compose mounts ./outputs into the backend container so generated files persist on the host.
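
The naming conventions above can be summarized in a few helpers. This is a sketch under stated assumptions, not the code in job_store.py or redis_cache.py: all function names are hypothetical, and the suffix of the translation cache key (source, target, and a SHA-256 of the text) is an assumption, since only the dubflow:translation: prefix is documented.

```python
import hashlib
from pathlib import Path

JOB_TTL_SECONDS = 24 * 60 * 60  # job records have a 24-hour TTL


def job_key(job_id: str) -> str:
    """Redis key holding the state of one job."""
    return f"dubflow:job:{job_id}"


def output_path(job_id: str, root: Path = Path("outputs")) -> Path:
    """Location of the final MP3 for a job."""
    return root / job_id / "dubbed_output.mp3"


def translation_cache_key(text: str, source: str, target: str) -> str:
    """Cache key for one translated segment.

    ASSUMPTION: everything after the 'dubflow:translation:' prefix is a
    hypothetical layout chosen for this example.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"dubflow:translation:{source}:{target}:{digest}"


print(job_key("abc123"))
print(output_path("abc123").as_posix())
```

Hashing the segment text keeps keys at a fixed length regardless of segment size, and identical segments in different jobs map to the same cache entry.
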
Install Python dev dependencies first:
python -m pip install -e ".[dev]"
Run all tests:
npm run test
Run focused test files:
pytest tests/unit/test_pipeline.py -q
pytest tests/integration/test_api.py -q
Run frontend checks:
npm run lint:frontend
npm --prefix frontend run build
- OAuth, login, user accounts, roles, and per-user job ownership are not implemented.
- Job actions such as cancel, delete, retry, or re-run with different settings are not implemented.
- Subtitle exports, transcript file downloads, and final dubbed video rendering are not implemented; completed jobs currently produce an MP3 output.
- Additional provider integrations are not implemented beyond local Whisper transcription and Sarvam translation/TTS.
- Long-term storage, billing/quotas, and production-grade job history are not implemented; Redis job records expire and outputs are stored on the local filesystem.
Check that SARVAM_API_KEY is present in .env and that Docker Compose is loading the file.
Local Whisper and FFmpeg must be available. The Docker backend image installs them automatically. For local backend runs, install FFmpeg on the host and ensure Whisper dependencies are installed in the Python environment.
YouTube input depends on yt-dlp and FFmpeg. Some URLs can fail because of network, availability, or platform restrictions.
If using Docker Compose, open the app through http://localhost so Nginx can proxy /api, /docs, and /health.
If using Vite locally, set VITE_API_BASE_URL=http://localhost:8000.
Confirm that outputs/<job_id>/dubbed_output.mp3 exists and that the backend process has write permission to the outputs directory.





