prawesh-12/dubflow

DubFlow

DubFlow is an audio translation and dubbing workspace. It accepts an uploaded audio/video file or a YouTube URL, transcribes the source audio with a local Whisper model, translates the transcript with Sarvam AI, synthesizes target-language speech with Sarvam TTS, and stitches the generated segments into a final MP3. A React dashboard exposes progress, transcripts, and download links.


Previews

Screenshots: Landing Page, Create New Dub, Dub Completed Preview, Total Dubs Page, Supported Languages Page.

Features

  • Dubbing workspace for file uploads or YouTube URLs.
  • Supported upload formats: MP3, MP4, WAV, M4A, WEBM.
  • Target language selection for Indian-language dubbing.
  • Source language selection: English, Hindi, or auto-detect.
  • Local transcription using the configured OpenAI Whisper model.
  • Sarvam AI translation provider.
  • Sarvam TTS provider.
  • Segment-by-segment translation and synthesis.
  • Audio stitching with pydub and FFmpeg.
  • Live job progress through Server-Sent Events.
  • Redis-backed job storage, recent job listing, and translation cache.
  • Unit and integration tests for core backend behavior.

Tech Stack

Backend

  • Python 3.11
  • FastAPI
  • Pydantic v2 and pydantic-settings
  • Redis asyncio client
  • sse-starlette for Server-Sent Events
  • openai-whisper for local speech-to-text
  • Sarvam AI APIs for translation and text-to-speech
  • pydub and FFmpeg for audio stitching
  • yt-dlp for YouTube audio extraction
  • httpx for outbound API calls
  • tenacity for retry handling
  • structlog for structured logging
  • pytest, pytest-asyncio, fakeredis, and httpx for tests

Frontend

  • React 19
  • Vite 7
  • React Router 7
  • Tailwind CSS 4
  • lucide-react icons
  • ESLint

DevOps

  • Docker and Docker Compose
  • Redis 7 Alpine
  • Nginx 1.27 Alpine for serving the frontend and proxying backend routes

How DubFlow Works

  1. A user submits either a media file or a YouTube URL.
  2. The backend creates a job record in Redis and starts the pipeline as a background task.
  3. Whisper transcribes the source audio into timed transcript segments.
  4. Sarvam translates each segment from the source language to the target language.
  5. Sarvam TTS synthesizes each translated segment into audio bytes.
  6. pydub stitches the synthesized chunks together with timing-aware gaps.
  7. The final output is written to outputs/<job_id>/dubbed_output.mp3.
  8. The job stores the output path, original transcript, translated text, progress, and status.
  9. The frontend polls job data and listens to SSE progress events for active jobs.
  10. The user can inspect transcripts, play the dubbed audio, and download the MP3.
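
Step 6 mentions timing-aware gaps. The sketch below is one way to picture that idea, not necessarily how stitcher.py does it: each synthesized chunk is placed at its segment's source start time where possible, with silence filling the space before it. The names Segment and plan_gaps are hypothetical, and the rule for overlapping chunks (follow immediately, no gap) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical view of one transcript segment, all times in milliseconds."""
    start_ms: int  # where the segment began in the source audio
    tts_ms: int    # duration of the synthesized chunk for this segment


def plan_gaps(segments: list[Segment]) -> list[int]:
    """Return the silence (ms) to insert before each synthesized chunk.

    Assumption: a chunk starts at its source start time unless the previous
    chunk is still playing, in which case it follows immediately (gap of 0).
    """
    gaps: list[int] = []
    cursor = 0  # end position of the output built so far
    for seg in segments:
        gap = max(0, seg.start_ms - cursor)
        gaps.append(gap)
        cursor += gap + seg.tts_ms
    return gaps


if __name__ == "__main__":
    demo = [Segment(0, 1200), Segment(2000, 900), Segment(2500, 700)]
    print(plan_gaps(demo))  # [0, 800, 0]
```

In the demo, the third chunk gets no gap because the second chunk runs past the third segment's source start time, so the output drifts later instead of overlapping.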

Supported Languages

Source Languages

Code  Language
en    English
hi    Hindi
auto  Auto-detect

Target Languages

Code  Language
hi    Hindi
bn    Bengali
ta    Tamil
te    Telugu
kn    Kannada
mr    Marathi
gu    Gujarati
pa    Punjabi
ml    Malayalam
or    Odia

Project Structure

DubFlow/
|-- app/
|   |-- api/
|   |   |-- job_store.py          # Redis-backed job create/read/update helpers
|   |   |-- middleware.py         # Request ID middleware
|   |   |-- provider_config.py    # Provider metadata exposed to the UI
|   |   `-- routes.py             # FastAPI system, config, job, stream, and download routes
|   |-- cache/
|   |   `-- redis_cache.py        # Redis connection and translation cache helpers
|   |-- pipeline/
|   |   |-- orchestrator.py       # End-to-end transcription -> translation -> TTS -> stitch flow
|   |   |-- stitcher.py           # Audio stitching with pydub
|   |   |-- transcription.py      # Transcription provider fallback wrapper
|   |   |-- translation.py        # Translation provider and cache pipeline
|   |   `-- tts.py                # TTS provider pipeline
|   |-- providers/
|   |   |-- base.py               # Provider interfaces
|   |   |-- transcription/
|   |   |   `-- whisper_local.py  # Local Whisper provider
|   |   |-- translation/
|   |   |   `-- sarvam.py         # Sarvam translation provider
|   |   `-- tts/
|   |       `-- sarvam.py         # Sarvam TTS provider
|   |-- config.py                 # Environment-driven settings
|   |-- errors.py                 # Pipeline and provider error types
|   |-- logging_config.py         # structlog setup
|   |-- main.py                   # FastAPI app factory and lifespan
|   `-- models.py                 # Pydantic models and typed language/provider names
|-- frontend/
|   |-- public/
|   |   `-- favicon.svg
|   |-- src/
|   |   |-- api/
|   |   |   `-- dubflow.js        # Frontend API client
|   |   |-- components/
|   |   |   |-- job/              # Job form, cards, progress, transcript viewer
|   |   |   |-- layout/           # Navbar, sidebar, dashboard layouts
|   |   |   `-- ui/               # Reusable UI primitives
|   |   |-- hooks/                # Config, submit, polling, and SSE hooks
|   |   |-- lib/                  # Constants and className helper
|   |   |-- pages/                # Landing, dashboard, dub, jobs, detail, languages
|   |   |-- App.jsx               # Client route definitions
|   |   |-- main.jsx              # React entrypoint
|   |   `-- theme.css             # Tailwind import and global design tokens
|   |-- Dockerfile                # Frontend build and Nginx image
|   |-- nginx.conf                # SPA serving and backend proxy config
|   |-- package.json
|   `-- vite.config.js
|-- tests/
|   |-- integration/
|   |   `-- test_api.py           # API integration tests
|   `-- unit/                     # Config, providers, cache, models, pipeline, store tests
|-- docker-compose.yml            # Redis, backend, and frontend services
|-- Dockerfile                    # Backend image
|-- package.json                  # Root scripts
|-- pyproject.toml                # Python package, deps, pytest, and FastAPI config
`-- README.md

API Overview

Method  Path                            Purpose
GET     /health                         Health check
GET     /api/v1/hello                   Simple API smoke test
GET     /api/v1/config                  Frontend provider configuration
POST    /api/v1/jobs                    Submit a dubbing job
GET     /api/v1/jobs                    List recent jobs
GET     /api/v1/jobs/{job_id}           Get one job status/result
GET     /api/v1/jobs/{job_id}/stream    Stream live job progress with SSE
GET     /api/v1/jobs/{job_id}/download  Download completed MP3 output
GET     /docs                           FastAPI Swagger UI
GET     /openapi.json                   OpenAPI schema
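
The stream endpoint uses Server-Sent Events, so a non-browser client has to parse the event framing itself. The sketch below shows a minimal parser for the data lines of an SSE stream. It assumes each event carries a JSON object; the progress and status fields in the demo are illustrative guesses, not a documented payload. The function name iter_sse_data is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event from decoded text lines.

    Assumption: every event's data field is a JSON object.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # The SSE format drops one optional space after the colon.
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            # A blank line ends the event; multi-line data joins with "\n".
            yield json.loads("\n".join(buffer))
            buffer = []
        # Comment lines (starting with ":") and other fields are ignored.


if __name__ == "__main__":
    sample = [
        'data: {"progress": 40, "status": "translating"}\n',
        "\n",
        ": keep-alive comment\n",
        'data: {"progress": 100, "status": "completed"}\n',
        "\n",
    ]
    for event in iter_sse_data(sample):
        print(event)
```

The function takes any iterable of text lines, so it can be fed from an HTTP client's streaming line iterator against /api/v1/jobs/{job_id}/stream.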

Job Submission Fields

POST /api/v1/jobs accepts multipart/form-data.

Field                 Required     Default  Notes
target_language       No           hi       One of the target language codes
source_language       No           en       en, hi, or auto
translation_provider  No           sarvam   Currently Sarvam
tts_provider          No           sarvam   Currently Sarvam
file                  Conditional  none     Provide exactly one of file or youtube_url
youtube_url           Conditional  none     Provide exactly one of file or youtube_url
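
As a rough client-side sketch of the rules above, the helper below assembles the text fields for a submission and enforces the "exactly one of file or youtube_url" constraint before any request is sent. The function build_job_form is hypothetical and is not part of DubFlow. The language sets are copied from the Supported Languages tables. The backend may validate differently.

```python
from typing import Optional

TARGET_LANGUAGES = {"hi", "bn", "ta", "te", "kn", "mr", "gu", "pa", "ml", "or"}
SOURCE_LANGUAGES = {"en", "hi", "auto"}


def build_job_form(
    target_language: str = "hi",
    source_language: str = "en",
    youtube_url: Optional[str] = None,
    file_path: Optional[str] = None,
) -> dict[str, str]:
    """Return the text form fields for POST /api/v1/jobs.

    The media file itself is not included: when file_path is given, the
    caller attaches it as the multipart part named "file".
    """
    if (youtube_url is None) == (file_path is None):
        raise ValueError("provide exactly one of file or youtube_url")
    if target_language not in TARGET_LANGUAGES:
        raise ValueError(f"unsupported target language: {target_language}")
    if source_language not in SOURCE_LANGUAGES:
        raise ValueError(f"unsupported source language: {source_language}")

    form = {
        "target_language": target_language,
        "source_language": source_language,
        "translation_provider": "sarvam",
        "tts_provider": "sarvam",
    }
    if youtube_url is not None:
        form["youtube_url"] = youtube_url
    return form


if __name__ == "__main__":
    print(build_job_form(target_language="ta", youtube_url="https://example.com/watch"))
```

The returned dictionary can be passed as form data to any HTTP client that supports multipart/form-data.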

Environment Variables

Copy .env.example to .env and set SARVAM_API_KEY. The template also defines the model and voice names.

For frontend-only local development, you can set this in frontend/.env if the backend is not served from the same origin:

VITE_API_BASE_URL=http://localhost:8000

Prerequisites

  • Docker
  • Node.js 20+
  • Python 3.11+
  • Redis if running the backend outside Docker.
  • FFmpeg if running audio stitching outside Docker.
  • A Sarvam API key for translation and TTS.

The backend Dockerfile installs FFmpeg. The configured Whisper model is loaded at runtime.


How to Run

Option 1: Full Stack with Docker Compose

  1. Copy .env.example to .env and set SARVAM_API_KEY.
  2. Start the stack:
npm run dev

This runs:

  • redis on the internal Docker network.
  • dubflow backend on port 8000 inside Docker.
  • frontend Nginx server exposed on host port 80.

Open:

  • App: http://localhost
  • API docs through Nginx: http://localhost/docs
  • Health check: http://localhost/health

Useful Docker scripts:

npm run up:backend
npm run up:frontend
npm run logs:backend
npm run logs:frontend
npm run down

Option 2: Backend Locally

Start Redis first. For example:

docker compose up -d redis

Create and activate a Python virtual environment:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"

Run the backend:

npm run dev:backend

Backend URLs:

  • API: http://localhost:8000
  • Docs: http://localhost:8000/docs
  • Health: http://localhost:8000/health

Option 3: Frontend Locally

Install frontend dependencies:

npm --prefix frontend install

If the backend is running at http://localhost:8000, create frontend/.env:

VITE_API_BASE_URL=http://localhost:8000

Run Vite:

npm run dev:frontend

Open the Vite URL printed by the terminal, usually http://localhost:5173.


Frontend Routes

Route       Page
/           Landing page
/dashboard  Studio dashboard
/dub        Create a dubbing job
/jobs       List submitted jobs
/jobs/:id   Job progress, transcripts, audio player, and download
/languages  Supported language coverage

Data and Storage

  • Job state is stored in Redis under dubflow:job:<job_id>.
  • A sorted Redis index tracks recent jobs.
  • Job records have a 24-hour TTL.
  • Translation cache entries use keys under dubflow:translation:*.
  • Final MP3 outputs are written to outputs/<job_id>/dubbed_output.mp3.
  • Docker Compose mounts ./outputs into the backend container so generated files persist on the host.
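
The naming scheme above can be written down as a few small helpers, which is handy when inspecting Redis or the outputs directory by hand. This is a sketch built only from the key pattern, TTL, and path stated in this section. The helper names are hypothetical and the real job_store.py may organize this differently.

```python
from pathlib import Path

# 24-hour TTL for job records, as stated above.
JOB_TTL_SECONDS = 24 * 60 * 60


def job_key(job_id: str) -> str:
    """Redis key holding the state of one job."""
    return f"dubflow:job:{job_id}"


def output_path(job_id: str, root: Path = Path("outputs")) -> Path:
    """Location of the final MP3 for a job, relative to the outputs root."""
    return root / job_id / "dubbed_output.mp3"


if __name__ == "__main__":
    print(job_key("abc123"))
    print(output_path("abc123").as_posix())
    print(JOB_TTL_SECONDS)
```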

Testing

Install Python dev dependencies first:

python -m pip install -e ".[dev]"

Run all tests:

npm run test

Run focused test files:

pytest tests/unit/test_pipeline.py -q
pytest tests/integration/test_api.py -q

Run frontend checks:

npm run lint:frontend
npm --prefix frontend run build

Current Limitations / Not Implemented Yet

  • OAuth, login, user accounts, roles, and per-user job ownership are not implemented.

  • Job actions such as cancel, delete, retry, or re-run with different settings are not implemented.

  • Subtitle exports, transcript file downloads, and final dubbed video rendering are not implemented; completed jobs currently produce an MP3 output.

  • Additional provider integrations are not implemented beyond local Whisper transcription and Sarvam translation/TTS.

  • Long-term storage, billing/quotas, and production-grade job history are not implemented; Redis job records expire and outputs are stored on the local filesystem.


Troubleshooting

Sarvam requests fail with auth errors

Check that SARVAM_API_KEY is present in .env and that Docker Compose is loading the file.

Jobs fail during transcription

Local Whisper and FFmpeg must be available. The Docker backend image installs them automatically. For local backend runs, install FFmpeg on the host and ensure Whisper dependencies are installed in the Python environment.

YouTube jobs fail

YouTube input depends on yt-dlp and FFmpeg. Some URLs can fail because of network, availability, or platform restrictions.

Frontend cannot reach the backend

If using Docker Compose, open the app through http://localhost so Nginx can proxy /api, /docs, and /health.

If using Vite locally, set VITE_API_BASE_URL=http://localhost:8000.

Completed job has no downloadable file

Confirm that outputs/<job_id>/dubbed_output.mp3 exists and that the backend process has write permission to the outputs directory.
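
The check above can be scripted. The sketch below is a hypothetical diagnostic helper, not part of DubFlow, and it assumes it runs as the same user as the backend process so that the permission check is meaningful.

```python
import os
from pathlib import Path


def diagnose_output(job_id: str, root: Path = Path("outputs")) -> str:
    """Report why a job's MP3 might be missing from the outputs directory."""
    target = root / job_id / "dubbed_output.mp3"
    if target.is_file():
        return "ok"
    if not root.is_dir():
        return "outputs directory missing"
    # os.access checks permissions for the current user only.
    if not os.access(root, os.W_OK):
        return "outputs directory not writable"
    return "output file missing"


if __name__ == "__main__":
    print(diagnose_output("example-job-id"))
```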

