# 🎬 Supernan – Hindi Video Dubbing Pipeline

**End-to-end pipeline: English Video → Hindi Dubbed Video with Voice Cloning + Lip Sync**

Stages:
1. Extract 15-second segment (ffmpeg)
2. Transcribe English speech (Whisper)
3. Translate to Hindi (Helsinki-NLP)
4. Synthesize Hindi speech with voice cloning (Coqui XTTS v2)
5. Sync audio durations (ffmpeg atempo)
6. Lip-sync video to Hindi audio (Wav2Lip)
7. Restore face quality (GFPGAN)

**Cost: ₹0 (Google Colab Free Tier T4 GPU)**
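
Stage 5 uses ffmpeg's `atempo` filter to fit each synthesized Hindi clip into the time slot of the original English segment. The sketch below shows one way to derive the filter string. It is an illustration, not the repository's implementation, and the function name `atempo_chain` is hypothetical. It chains several `atempo` instances so that each one stays between 0.5 and 2.0; ffmpeg's documentation notes that instances can be daisy-chained to reach larger tempo changes.

```python
def atempo_chain(source_sec: float, target_sec: float) -> str:
    """Build an ffmpeg audio filter string that retimes source_sec to target_sec."""
    if source_sec <= 0 or target_sec <= 0:
        raise ValueError("durations must be positive")

    factor = source_sec / target_sec  # > 1 speeds audio up, < 1 slows it down
    parts = []

    # Keep every atempo instance inside [0.5, 2.0] by splitting large factors.
    while factor > 2.0:
        parts.append("atempo=2.0")
        factor /= 2.0
    while factor < 0.5:
        parts.append("atempo=0.5")
        factor /= 0.5

    parts.append(f"atempo={factor:.4f}")
    return ",".join(parts)


# Hypothetical durations: a 4.2 s Hindi clip that must fit a 3.5 s slot.
print(atempo_chain(4.2, 3.5))
```

The resulting string would be passed to ffmpeg's `-filter:a` option. Large factors distort speech noticeably, so shortening the translated text may be a better option when the ratio is far from 1.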


## ✅ Step 0: Check GPU

In [None]:
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
import torch
print(f'PyTorch: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')

## 📦 Step 1: Install Dependencies

In [None]:
# Install system packages
!apt-get update -qq
!apt-get install -y -qq ffmpeg git libsndfile1
print('System packages installed ✓')

In [None]:
# Install Python packages (this takes ~3-5 minutes on first run)
!pip install openai-whisper transformers sentencepiece sacremoses -q
!pip install TTS -q
!pip install librosa soundfile pydub -q
!pip install basicsr facexlib realesrgan -q
!pip install opencv-python-headless -q
print('Python packages installed ✓')

## 📂 Step 2: Clone Pipeline Repository

In [None]:
import os

# ⚠️ Replace with your actual GitHub repo URL after pushing
REPO_URL = 'https://github.com/YOUR_USERNAME/supernan-hindi-dubbing.git'
REPO_DIR = '/content/supernan-hindi-dubbing'

if not os.path.isdir(REPO_DIR):
    !git clone {REPO_URL} {REPO_DIR}
    print('Repo cloned ✓')
else:
    !cd {REPO_DIR} && git pull
    print('Repo updated ✓')

os.chdir(REPO_DIR)
print(f'Working directory: {os.getcwd()}')

## 🎥 Step 3: Upload Source Video

Upload your video file to Colab or download it from Google Drive.

In [None]:
# Option A: Upload manually
from google.colab import files
uploaded = files.upload()
VIDEO_PATH = list(uploaded.keys())[0]
print(f'Uploaded: {VIDEO_PATH}')

In [None]:
# Option B: Download from Google Drive using gdown
# !pip install gdown -q
# FILE_ID = '1urRXU3HGjL30lXxQakqK_5rVjbH9XW3O'  # Supernan training video ID
# !gdown "https://drive.google.com/uc?id={FILE_ID}" -O /content/supernan_video.mp4  # URL quoted so the shell does not treat '?' as a glob
# VIDEO_PATH = '/content/supernan_video.mp4'

## ⚙️ Step 4: Configure Pipeline

In [None]:
# Pipeline configuration
SEGMENT_START = 15    # seconds
SEGMENT_END   = 30    # seconds
WHISPER_MODEL = 'small'  # 'base' for CPU, 'small'/'medium' for T4
ENABLE_FACE_RESTORE = True

print(f'Processing: {SEGMENT_START}s → {SEGMENT_END}s ({SEGMENT_END - SEGMENT_START}s clip)')
print(f'Whisper model: {WHISPER_MODEL}')
print(f'Face restoration: {ENABLE_FACE_RESTORE}')
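
A quick sanity check on these values can catch mistakes before the slow stages start. This is a minimal sketch: `validate_config` is a hypothetical helper, and `KNOWN_WHISPER_SIZES` is an assumed subset of the model sizes published by openai-whisper, not an exhaustive list.

```python
# Assumed subset of openai-whisper model sizes; extend as needed.
KNOWN_WHISPER_SIZES = {"tiny", "base", "small", "medium", "large"}


def validate_config(start_sec, end_sec, whisper_model):
    """Return the clip length in seconds, or raise ValueError on bad settings."""
    if start_sec < 0:
        raise ValueError("segment start must be non-negative")
    if end_sec <= start_sec:
        raise ValueError("segment end must be greater than segment start")
    if whisper_model not in KNOWN_WHISPER_SIZES:
        raise ValueError(f"unrecognized Whisper model size: {whisper_model!r}")
    return end_sec - start_sec


# Same values as the configuration cell, written out so the sketch runs alone.
clip_len = validate_config(15, 30, "small")
print(f"Config OK: {clip_len}s clip")
```

In the notebook, the call would take `SEGMENT_START`, `SEGMENT_END`, and `WHISPER_MODEL` instead of literals.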

## 🚀 Step 5: Run Pipeline

In [None]:
import subprocess, sys

cmd = [
    sys.executable, 'dub_video.py',
    '--input', VIDEO_PATH,
    '--start', str(SEGMENT_START),
    '--end',   str(SEGMENT_END),
    '--whisper-model', WHISPER_MODEL,
    '--output', '/content/final_dubbed.mp4',
]

if not ENABLE_FACE_RESTORE:
    cmd.append('--no-face-restore')

print('Running pipeline:', ' '.join(cmd))
result = subprocess.run(cmd, check=True)
print('\n✅ Pipeline complete!')
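
In notebook environments, output written by a child process does not always appear in the cell, which makes failures hard to diagnose. One possible alternative to the plain `subprocess.run` call is to read the child's output line by line and echo it. The helper name `run_streaming` is hypothetical, and the demo command is a stand-in for the `dub_video.py` invocation.

```python
import subprocess
import sys


def run_streaming(cmd):
    """Run cmd, echo its combined stdout/stderr as it arrives, return the lines."""
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors appear in order
        text=True,
        bufsize=1,                 # line-buffered on the reading side
    ) as proc:
        for line in proc.stdout:
            print(line, end="")
            lines.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return lines


# Stand-in command; in the notebook this would be the `cmd` list built above.
demo = [sys.executable, "-c", "print('stage 1 done'); print('stage 2 done')"]
log = run_streaming(demo)
```

Keeping the returned lines also makes it possible to save a log file next to the output video.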

## 🎬 Step 6: Preview Output

In [None]:
from IPython.display import Video, display
import os

output_file = '/content/final_dubbed.mp4'

if os.path.isfile(output_file):
    size_mb = os.path.getsize(output_file) / (1024 * 1024)
    print(f'Output file: {output_file} ({size_mb:.2f} MB)')
    display(Video(output_file, embed=True, width=640))
else:
    print('ERROR: Output file not found. Check pipeline logs above.')

## 💾 Step 7: Download Output

In [None]:
from google.colab import files
files.download('/content/final_dubbed.mp4')

---
## 🔧 Advanced: Run Individual Stages for Debugging

In [None]:
# Stage 1: Test extraction only
import sys
sys.path.insert(0, REPO_DIR)
from modules.extractor import extract_segment, extract_speaker_ref

# Reuse the Step 4 settings so debugging covers the same clip as the full run
vid, aud = extract_segment(VIDEO_PATH, start_sec=SEGMENT_START, end_sec=SEGMENT_END)
ref = extract_speaker_ref(aud)
print(f'Video clip: {vid}')
print(f'Audio clip: {aud}')
print(f'Speaker ref: {ref}')
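
For reference, a segment cut of this kind can be expressed as a single ffmpeg command. The sketch below only builds the argument list. It is an assumption about what such a step might look like, not the contents of `modules.extractor`; `build_extract_cmd` and the file names are hypothetical.

```python
def build_extract_cmd(src, dst, start_sec, end_sec):
    """Return an ffmpeg argument list that cuts [start_sec, end_sec) from src."""
    if end_sec <= start_sec:
        raise ValueError("end_sec must be greater than start_sec")
    return [
        "ffmpeg", "-y",                      # overwrite dst if it exists
        "-ss", str(start_sec),               # seek in the input before decoding
        "-i", src,
        "-t", str(end_sec - start_sec),      # clip duration, not end timestamp
        "-c:v", "libx264", "-c:a", "aac",    # re-encode so the cut need not start on a keyframe
        dst,
    ]


# Hypothetical file names; pass the list to subprocess.run to execute it.
cmd = build_extract_cmd("input.mp4", "clip.mp4", 15, 30)
print(" ".join(cmd))
```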

In [None]:
# Stage 2: Transcribe and display
from modules.transcriber import transcribe_audio
segments = transcribe_audio(aud, model_size='base')

print('Transcription:')
for seg in segments:
    print(f'  [{seg["start"]:.2f} → {seg["end"]:.2f}] {seg["text"]}')
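
Because each segment carries `start`, `end`, and `text`, the transcript can also be written out as subtitles for a quick visual check against the video. The following is a minimal sketch that assumes the segment dictionaries have those keys, as the loop above uses them. The helper names and the sample segments are illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments, text_key="text"):
    """Render segments as SRT text: index, time range, then the caption."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{seg[text_key].strip()}\n")
    return "\n".join(blocks)


# Made-up sample data standing in for the output of transcribe_audio.
sample = [
    {"start": 0.0, "end": 1.5, "text": " Hello and welcome. "},
    {"start": 1.5, "end": 4.25, "text": "Let us begin."},
]
print(segments_to_srt(sample))
```

Passing `text_key="hindi_text"` after translation would produce Hindi subtitles from the same segments.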

In [None]:
# Stage 3: Translate and display
from modules.translator import translate_segments
segments = translate_segments(segments)

print('Translation:')
for seg in segments:
    print(f'  EN: {seg["text"]}')
    print(f'  HI: {seg.get("hindi_text", "")}\n')
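
When inspecting translations, it helps to flag segments whose Hindi text came back empty, since those would otherwise produce silent gaps in the dubbed audio. This is a small sketch that assumes the `text` and `hindi_text` keys used in the cell above. The helper name `find_untranslated` and the sample data are hypothetical.

```python
def find_untranslated(segments):
    """Return indices of segments that have English text but no Hindi text."""
    missing = []
    for index, seg in enumerate(segments):
        has_source = bool(seg.get("text", "").strip())
        has_target = bool(seg.get("hindi_text", "").strip())
        if has_source and not has_target:
            missing.append(index)
    return missing


# Made-up sample data standing in for the output of translate_segments.
sample = [
    {"text": "Hello", "hindi_text": "नमस्ते"},
    {"text": "Wash your hands.", "hindi_text": "  "},
    {"text": "Thank you."},
    {"text": "", "hindi_text": ""},  # no speech, so nothing to translate
]
print(find_untranslated(sample))
```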

---
## 📊 Pipeline Timing Breakdown

In [None]:
import pandas as pd

timing_data = {
    'Stage': ['Extract (ffmpeg)', 'Transcribe (Whisper small)', 'Translate (Helsinki-NLP)', 
              'TTS (XTTS v2, GPU)', 'Audio Sync (ffmpeg)', 'Lip Sync (Wav2Lip GAN)', 'Face Restore (GFPGAN)'],
    'Est. Time (15s clip)': ['~2s', '~8s', '~3s', '~20s', '~2s', '~90s', '~60s'],
    'GPU Required': ['No', 'Optional', 'No', 'Recommended', 'No', 'Yes (strongly)', 'Recommended'],
    'Cost': ['₹0', '₹0', '₹0', '₹0', '₹0', '₹0', '₹0'],
}

df = pd.DataFrame(timing_data)
df
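
The per-stage estimates can be added up to get a rough end-to-end figure. The sketch below parses the same `'~Ns'` labels used in the table. `parse_seconds` is a hypothetical helper, and the numbers are the table's estimates, not measurements.

```python
import re

# Estimates copied from the 'Est. Time (15s clip)' column above.
estimates = ['~2s', '~8s', '~3s', '~20s', '~2s', '~90s', '~60s']


def parse_seconds(label: str) -> float:
    """Convert a label such as '~20s' to a number of seconds."""
    match = re.fullmatch(r"~?(\d+(?:\.\d+)?)s", label.strip())
    if match is None:
        raise ValueError(f"unexpected time label: {label!r}")
    return float(match.group(1))


seconds = [parse_seconds(label) for label in estimates]
total = sum(seconds)
# The last two entries are lip sync and face restoration.
heavy_share = (seconds[-2] + seconds[-1]) / total

print(f"Estimated total: ~{total:.0f}s (~{total / 60:.1f} min)")
print(f"Lip sync + face restore share: {heavy_share:.0%}")
```

By these estimates the full run takes about 185 seconds for a 15-second clip, with lip sync and face restoration accounting for roughly 81% of it.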