DubFlow

DubFlow is an audio translation/dubbing workspace. It accepts an uploaded audio/video file or a YouTube URL, transcribes the source audio with a local Whisper model, translates the transcript with Sarvam AI, synthesizes target-language speech with Sarvam TTS, stitches the generated segments into a final MP3, and exposes progress, transcripts, and download links through a React dashboard.

Features

Dubbing workspace for file uploads or YouTube URLs.
Supported upload formats: MP3, MP4, WAV, M4A, WEBM.
Target language selection for Indian-language dubbing.
Source language selection: English, Hindi, or auto-detect.
Local transcription using the configured OpenAI Whisper model.
Sarvam AI translation provider.
Sarvam TTS provider.
Segment-by-segment translation and synthesis.
Audio stitching with pydub and FFmpeg.
Live job progress through Server-Sent Events.
Redis-backed job storage, recent job listing, and translation cache.
Unit and integration tests for core backend behavior.

Tech Stack

Backend

Python 3.11
FastAPI
Pydantic v2 and pydantic-settings
Redis asyncio client
sse-starlette for Server-Sent Events
openai-whisper for local speech-to-text
Sarvam AI APIs for translation and text-to-speech
pydub and FFmpeg for audio stitching
yt-dlp for YouTube audio extraction
httpx for outbound API calls
tenacity for retry handling
structlog for structured logging
pytest, pytest-asyncio, fakeredis, and httpx for tests

Frontend

React 19
Vite 7
React Router 7
Tailwind CSS 4
lucide-react icons
ESLint

DevOps

Docker and Docker Compose
Redis 7 Alpine
Nginx 1.27 Alpine for serving the frontend and proxying backend routes

How DubFlow Works

A user submits either a media file or a YouTube URL.
The backend creates a job record in Redis and starts the pipeline as a background task.
Whisper transcribes the source audio into timed transcript segments.
Sarvam translates each segment from the source language to the target language.
Sarvam TTS synthesizes each translated segment into audio bytes.
pydub stitches the synthesized chunks together with timing-aware gaps.
The final output is written to outputs/<job_id>/dubbed_output.mp3.
The job stores the output path, original transcript, translated text, progress, and status.
The frontend polls job data and listens to SSE progress events for active jobs.
The user can inspect transcripts, play the dubbed audio, and download the MP3.
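
The timing-aware stitching step above can be illustrated with a small sketch. It does not use pydub; it only computes how much silence would go before each synthesized chunk so that chunks start near their original timestamps. The `Segment` class, the `plan_gaps` function, and the rule that a late chunk gets no gap are assumptions made for illustration, not DubFlow's actual stitcher.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical timed segment (not DubFlow's real model)."""
    start_ms: int  # where the segment begins in the source audio
    tts_ms: int    # duration of the synthesized (dubbed) clip


def plan_gaps(segments: list[Segment]) -> list[int]:
    """Return the silence (in ms) to insert before each synthesized clip.

    Assumed rule: pad with silence up to the segment's original start time;
    if earlier clips ran long and the output is already past that time,
    insert no gap at all.
    """
    gaps: list[int] = []
    cursor_ms = 0  # current end position of the stitched output
    for seg in segments:
        gap = max(0, seg.start_ms - cursor_ms)
        gaps.append(gap)
        cursor_ms += gap + seg.tts_ms
    return gaps


segments = [
    Segment(start_ms=500, tts_ms=1200),   # gap 500, output now ends at 1700
    Segment(start_ms=2000, tts_ms=1500),  # gap 300, output now ends at 3500
    Segment(start_ms=3000, tts_ms=800),   # already past 3000, so gap 0
]
print(plan_gaps(segments))  # [500, 300, 0]
```

With pydub, each computed gap could be rendered as `AudioSegment.silent(duration=gap)` and concatenated with the corresponding clip; whether DubFlow does exactly this is not stated here.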

Supported Languages

Source Languages

Code	Language
`en`	English
`hi`	Hindi
`auto`	Auto-detect

Target Languages

Code	Language
`hi`	Hindi
`bn`	Bengali
`ta`	Tamil
`te`	Telugu
`kn`	Kannada
`mr`	Marathi
`gu`	Gujarati
`pa`	Punjabi
`ml`	Malayalam
`or`	Odia

Project Structure

DubFlow/
|-- app/
|   |-- api/
|   |   |-- job_store.py          # Redis-backed job create/read/update helpers
|   |   |-- middleware.py         # Request ID middleware
|   |   |-- provider_config.py    # Provider metadata exposed to the UI
|   |   `-- routes.py             # FastAPI system, config, job, stream, and download routes
|   |-- cache/
|   |   `-- redis_cache.py        # Redis connection and translation cache helpers
|   |-- pipeline/
|   |   |-- orchestrator.py       # End-to-end transcription -> translation -> TTS -> stitch flow
|   |   |-- stitcher.py           # Audio stitching with pydub
|   |   |-- transcription.py      # Transcription provider fallback wrapper
|   |   |-- translation.py        # Translation provider and cache pipeline
|   |   `-- tts.py                # TTS provider pipeline
|   |-- providers/
|   |   |-- base.py               # Provider interfaces
|   |   |-- transcription/
|   |   |   `-- whisper_local.py  # Local Whisper provider
|   |   |-- translation/
|   |   |   `-- sarvam.py         # Sarvam translation provider
|   |   `-- tts/
|   |       `-- sarvam.py         # Sarvam TTS provider
|   |-- config.py                 # Environment-driven settings
|   |-- errors.py                 # Pipeline and provider error types
|   |-- logging_config.py         # structlog setup
|   |-- main.py                   # FastAPI app factory and lifespan
|   `-- models.py                 # Pydantic models and typed language/provider names
|-- frontend/
|   |-- public/
|   |   `-- favicon.svg
|   |-- src/
|   |   |-- api/
|   |   |   `-- dubflow.js        # Frontend API client
|   |   |-- components/
|   |   |   |-- job/              # Job form, cards, progress, transcript viewer
|   |   |   |-- layout/           # Navbar, sidebar, dashboard layouts
|   |   |   `-- ui/               # Reusable UI primitives
|   |   |-- hooks/                # Config, submit, polling, and SSE hooks
|   |   |-- lib/                  # Constants and className helper
|   |   |-- pages/                # Landing, dashboard, dub, jobs, detail, languages
|   |   |-- App.jsx               # Client route definitions
|   |   |-- main.jsx              # React entrypoint
|   |   `-- theme.css             # Tailwind import and global design tokens
|   |-- Dockerfile                # Frontend build and Nginx image
|   |-- nginx.conf                # SPA serving and backend proxy config
|   |-- package.json
|   `-- vite.config.js
|-- tests/
|   |-- integration/
|   |   `-- test_api.py           # API integration tests
|   `-- unit/                     # Config, providers, cache, models, pipeline, store tests
|-- docker-compose.yml            # Redis, backend, and frontend services
|-- Dockerfile                    # Backend image
|-- package.json                  # Root scripts
|-- pyproject.toml                # Python package, deps, pytest, and FastAPI config
`-- README.md

API Overview

Method	Path	Purpose
`GET`	`/health`	Health check
`GET`	`/api/v1/hello`	Simple API smoke test
`GET`	`/api/v1/config`	Frontend provider configuration
`POST`	`/api/v1/jobs`	Submit a dubbing job
`GET`	`/api/v1/jobs`	List recent jobs
`GET`	`/api/v1/jobs/{job_id}`	Get one job status/result
`GET`	`/api/v1/jobs/{job_id}/stream`	Stream live job progress with SSE
`GET`	`/api/v1/jobs/{job_id}/download`	Download completed MP3 output
`GET`	`/docs`	FastAPI Swagger UI
`GET`	`/openapi.json`	OpenAPI schema
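
As a rough illustration of consuming the `/api/v1/jobs/{job_id}/stream` route, the sketch below parses Server-Sent Events framing from an iterable of text lines. The JSON payload shape (`status` and `progress` fields) and the status names are assumptions; the real event payload is not documented in this section. `parse_sse` is a hypothetical helper, not part of DubFlow.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event, decoding the joined `data:` lines as JSON.

    Basic SSE framing: `data:` lines accumulate, a blank line dispatches
    the event, and lines starting with ':' are comments (keep-alives).
    """
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("data:"):
            data.append(line[5:].removeprefix(" "))
        # other fields (event:, id:, retry:) are ignored in this sketch


# Hypothetical stream body; the field names and values are assumptions.
sample = [
    ": ping\n",
    'data: {"status": "translating", "progress": 40}\n',
    "\n",
    'data: {"status": "completed", "progress": 100}\n',
    "\n",
]
events = list(parse_sse(sample))
print(events[-1]["status"])  # completed
```

In practice the lines would come from a streaming HTTP response rather than a list; the parser itself only needs an iterable of strings.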

Job Submission Fields

POST /api/v1/jobs accepts multipart/form-data.

Field	Required	Default	Notes
`target_language`	No	`hi`	One of the target language codes
`source_language`	No	`en`	`en`, `hi`, or `auto`
`translation_provider`	No	`sarvam`	Currently Sarvam
`tts_provider`	No	`sarvam`	Currently Sarvam
`file`	Conditional	none	Provide exactly one of `file` or `youtube_url`
`youtube_url`	Conditional	none	Provide exactly one of `file` or `youtube_url`

Environment Variables

Copy .env.example to .env and set SARVAM_API_KEY. Model and voice names are also defined in .env.example.

For frontend-only local development, you can set this in frontend/.env if the backend is not served from the same origin:

VITE_API_BASE_URL=http://localhost:8000

Prerequisites

Docker
Node.js 20+
Python 3.11+
Redis if running the backend outside Docker.
FFmpeg if running audio stitching outside Docker.
A Sarvam API key for translation and TTS.

The backend Dockerfile installs FFmpeg. The configured Whisper model is loaded at runtime.

How to Run

Option 1: Full Stack with Docker Compose

Copy .env.example to .env and set SARVAM_API_KEY.
Start the stack:

npm run dev

This runs:

redis on the internal Docker network.
dubflow backend on port 8000 inside Docker.
frontend Nginx server exposed on host port 80.

Open:

App: http://localhost
API docs through Nginx: http://localhost/docs
Health check: http://localhost/health

Useful Docker scripts:

npm run up:backend
npm run up:frontend
npm run logs:backend
npm run logs:frontend
npm run down

Option 2: Backend Locally

Start Redis first. For example:

docker compose up -d redis

Create and activate a Python virtual environment:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"

Run the backend:

npm run dev:backend

Backend URLs:

API: http://localhost:8000
Docs: http://localhost:8000/docs
Health: http://localhost:8000/health

Option 3: Frontend Locally

Install frontend dependencies:

npm --prefix frontend install

If the backend is running at http://localhost:8000, create frontend/.env:

VITE_API_BASE_URL=http://localhost:8000

Run Vite:

npm run dev:frontend

Open the Vite URL printed by the terminal, usually http://localhost:5173.

Frontend Routes

Route	Page
`/`	Landing page
`/dashboard`	Studio dashboard
`/dub`	Create a dubbing job
`/jobs`	List submitted jobs
`/jobs/:id`	Job progress, transcripts, audio player, and download
`/languages`	Supported language coverage

Data and Storage

Job state is stored in Redis under dubflow:job:<job_id>.
A sorted Redis index tracks recent jobs.
Job records have a 24-hour TTL.
Translation cache entries use keys under dubflow:translation:*.
Final MP3 outputs are written to outputs/<job_id>/dubbed_output.mp3.
Docker Compose mounts ./outputs into the backend container so generated files persist on the host.
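
The naming conventions above can be summarized in a few helper functions. This is a sketch: the function names are hypothetical, and the hashing scheme used for the translation cache key is an assumption, since only the `dubflow:translation:` prefix is stated here.

```python
import hashlib
from pathlib import Path

JOB_TTL_SECONDS = 24 * 60 * 60  # the 24-hour TTL stated above


def job_key(job_id: str) -> str:
    """Redis key for a job record."""
    return f"dubflow:job:{job_id}"


def translation_cache_key(text: str, source: str, target: str) -> str:
    """Cache key under dubflow:translation:*.

    Assumed scheme: SHA-256 over the language pair and the text.
    """
    digest = hashlib.sha256(f"{source}:{target}:{text}".encode("utf-8")).hexdigest()
    return f"dubflow:translation:{digest}"


def output_path(job_id: str, root: Path = Path("outputs")) -> Path:
    """Location of the final MP3 for a job."""
    return root / job_id / "dubbed_output.mp3"


print(job_key("abc123"))                 # dubflow:job:abc123
print(output_path("abc123").as_posix())  # outputs/abc123/dubbed_output.mp3
```

Keying the cache on the language pair as well as the text keeps the same sentence translated into two target languages from colliding.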

Testing

Install Python dev dependencies first:

python -m pip install -e ".[dev]"

Run all tests:

npm run test

Run focused test files:

pytest tests/unit/test_pipeline.py -q
pytest tests/integration/test_api.py -q

Run frontend checks:

npm run lint:frontend
npm --prefix frontend run build

Current Limitations / Not Implemented Yet

OAuth, login, user accounts, roles, and per-user job ownership are not implemented.
Job actions such as cancel, delete, retry, or re-run with different settings are not implemented.
Subtitle exports, transcript file downloads, and final dubbed video rendering are not implemented; completed jobs currently produce an MP3 output.
Additional provider integrations are not implemented beyond local Whisper transcription and Sarvam translation/TTS.
Long-term storage, billing/quotas, and production-grade job history are not implemented; Redis job records expire and outputs are stored on the local filesystem.

Troubleshooting

Sarvam requests fail with auth errors

Check that SARVAM_API_KEY is present in .env and that Docker Compose is loading the file.

Jobs fail during transcription

Local Whisper and FFmpeg must be available. The Docker backend image installs them automatically. For local backend runs, install FFmpeg on the host and ensure Whisper dependencies are installed in the Python environment.

YouTube jobs fail

YouTube input depends on yt-dlp and FFmpeg. Some URLs can fail because of network, availability, or platform restrictions.

Frontend cannot reach the backend

If using Docker Compose, open the app through http://localhost so Nginx can proxy /api, /docs, and /health.

If using Vite locally, set VITE_API_BASE_URL=http://localhost:8000.

Completed job has no downloadable file

Confirm that outputs/<job_id>/dubbed_output.mp3 exists and that the backend process has write permission to the outputs directory.
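
The same check can be scripted. The sketch below assumes it runs from the project root as the same user as the backend process; `check_output` is a hypothetical helper, not part of DubFlow.

```python
import os
from pathlib import Path


def check_output(job_id: str, root: Path = Path("outputs")) -> list[str]:
    """Return a list of problems with a job's output file (empty if none)."""
    problems: list[str] = []
    mp3 = root / job_id / "dubbed_output.mp3"
    if not root.is_dir():
        problems.append(f"{root} does not exist")
    elif not os.access(root, os.W_OK):
        problems.append(f"{root} is not writable by this process")
    if not mp3.is_file():
        problems.append(f"{mp3} is missing")
    return problems


if __name__ == "__main__":
    # "example-job-id" is a placeholder; substitute a real job ID.
    for problem in check_output("example-job-id"):
        print(problem)
```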
