# Video Translator with Voice Cloning

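This notebook shows how to use the Video Translator project to dub a video into another language: it transcribes the speech, translates the text, synthesizes the translation in a cloned voice, and synchronizes the new audio with the original timing.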

**Version:** 0.3.0

## Features
- Automated speech-to-text transcription using Whisper
- Machine translation of transcribed text using MBart
- Voice cloning and text-to-speech synthesis with Tortoise-TTS
- Separation of vocals and background music using Demucs
- Advanced synchronization with natural pause detection
- Comprehensive synchronization metrics analysis

## 1. Installation

First, let's clone the repository and install the required dependencies:

In [None]:
# Clone the repository
!git clone https://github.com/yourusername/video-translator.git
%cd video-translator

# Install dependencies
!pip install -r requirements.txt
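
Before continuing, it can help to confirm that the installation succeeded. The sketch below uses only the standard library to report which modules cannot be found. The module names (`whisper`, `transformers`, `tortoise`, `demucs`) are assumptions inferred from the feature list above and may not match the project's actual `requirements.txt`.

```python
import importlib.util

# Assumed import names for the tools listed under Features;
# adjust them to match the project's requirements.txt.
ASSUMED_MODULES = ["whisper", "transformers", "tortoise", "demucs"]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


missing = find_missing_modules(ASSUMED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All assumed modules are importable.")
```

If anything is reported missing, re-running the install cell or restarting the runtime is usually the first thing to try.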

## 2. Upload Video

Upload the video you want to translate:

In [None]:
from google.colab import files
import os

# Create directory for input files
!mkdir -p input_files

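# Upload video file (files.upload() saves it to the current directory)
uploaded = files.upload()
if not uploaded:
    raise ValueError("No file was uploaded. Run this cell again and choose a video.")

# Move the first uploaded file into input_files/
uploaded_name = next(iter(uploaded))
!mv "$uploaded_name" input_files/
input_video_path = os.path.join("input_files", uploaded_name)
print(f"Video uploaded to: {input_video_path}")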
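
A quick sanity check on the uploaded file can catch mistakes before the long-running translation step. The sketch below only inspects the file extension. The extension list is hypothetical, since this notebook does not state which formats `autodub.py` accepts.

```python
from pathlib import Path

# Hypothetical list of container formats; the formats that autodub.py
# actually accepts are not documented in this notebook.
ASSUMED_VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm"}


def looks_like_video(path):
    """Return True if the file extension is in the assumed list (case-insensitive)."""
    return Path(path).suffix.lower() in ASSUMED_VIDEO_EXTENSIONS


print(looks_like_video("input_files/talk.MP4"))   # True
print(looks_like_video("input_files/notes.txt"))  # False
```

In the notebook, the same check would be called as `looks_like_video(input_video_path)`.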

## 3. Upload Voice Samples (Optional)

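If you want to clone a specific voice, upload voice samples (WAV files). The translation cell in step 5 reads `voice_samples_dir`, so run this cell even if you skip the upload: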

In [None]:
# Create directory for voice samples
!mkdir -p voice_samples

# Upload voice samples
print("Upload voice samples (WAV files):")
uploaded_voices = files.upload()
for filename in uploaded_voices.keys():
    !mv "$filename" voice_samples/
    print(f"Voice sample uploaded: {filename}")

voice_samples_dir = "voice_samples"
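
To check what was uploaded, the samples can be inspected with Python's standard `wave` module. This is a minimal sketch: `describe_wav` and `describe_samples` are helper names introduced here, and nothing in this notebook specifies the sample rate or duration that the voice cloning step expects.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / sample_rate
    return sample_rate, channels, duration


def describe_samples(directory):
    """Map each .wav file name in directory to its describe_wav() result."""
    return {p.name: describe_wav(p)
            for p in sorted(Path(directory).glob("*.wav"))}
```

Calling `describe_samples(voice_samples_dir)` returns one entry per sample. Note that `wave.open` raises `wave.Error` for files that are not uncompressed PCM, and that the `*.wav` pattern is case-sensitive on Linux.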

## 4. Configure Synchronization Options

Set up synchronization options for better alignment between original and translated audio:

In [None]:
import json

# Define synchronization options
sync_options = {
    "max_speed_factor": 1.3,        # Maximum speed up factor
    "min_speed_factor": 0.8,        # Minimum slow down factor
    "pause_threshold": -35,         # dB threshold for pause detection
    "min_pause_duration": 150,      # Minimum pause duration in ms
    "adaptive_timing": True,        # Use adaptive timing based on language
    "preserve_sentence_breaks": True # Preserve pauses between sentences
}

# Save synchronization options to a file
with open("sync_config.json", "w") as f:
    json.dump(sync_options, f, indent=2)

print("Synchronization options saved to sync_config.json")
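
How `autodub.py` applies these options is not shown in this notebook. One plausible reading of the two speed factors is that a synthesized clip is stretched or compressed to fit the duration of the original segment, but never beyond the configured limits. The sketch below illustrates that idea. The function name and the durations are made up for the example.

```python
def clamped_speed_factor(synth_ms, target_ms, min_factor=0.8, max_factor=1.3):
    """Speed factor that would fit synth_ms of audio into target_ms,
    limited to the range [min_factor, max_factor]."""
    if synth_ms <= 0 or target_ms <= 0:
        raise ValueError("durations must be positive")
    needed = synth_ms / target_ms  # > 1 means the clip must be sped up
    return max(min_factor, min(max_factor, needed))


print(clamped_speed_factor(3000, 2500))  # 1.2 (within limits)
print(clamped_speed_factor(4000, 2500))  # 1.3 (1.6 needed, capped at the maximum)
print(clamped_speed_factor(1500, 2500))  # 0.8 (0.6 needed, raised to the minimum)
```

When the required factor falls outside the limits, the clip cannot fit exactly, which is one reason per-segment timing errors can remain after synchronization.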

## 5. Translate Video

Now, let's translate the video using the autodub.py script:

In [None]:
# Define output path
output_video_path = "output_video.mp4"

# Define source and target languages
source_lang = "it"  # Italian
target_lang = "en"  # English

# Run the translation script
!python autodub.py \
    --input "$input_video_path" \
    --output "$output_video_path" \
    --source-lang "$source_lang" \
    --target-lang "$target_lang" \
    --voice-samples "$voice_samples_dir" \
    --sync-config "sync_config.json" \
    --keep-temp
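
The same command can also be run from Python, which makes it easier to stop the notebook when the script fails. The sketch below uses only the flags shown in the cell above. Leaving out `--voice-samples` when the directory is missing or empty assumes that `autodub.py` treats the flag as optional, which this notebook does not confirm.

```python
import os
import subprocess
import sys


def build_autodub_command(input_path, output_path, source_lang, target_lang,
                          voice_samples_dir=None, sync_config=None,
                          keep_temp=False):
    """Build the autodub.py argument list from the flags used in this notebook."""
    cmd = [sys.executable, "autodub.py",
           "--input", input_path,
           "--output", output_path,
           "--source-lang", source_lang,
           "--target-lang", target_lang]
    # Assumption: --voice-samples is optional, so skip it when no samples exist.
    if (voice_samples_dir and os.path.isdir(voice_samples_dir)
            and os.listdir(voice_samples_dir)):
        cmd += ["--voice-samples", voice_samples_dir]
    if sync_config:
        cmd += ["--sync-config", sync_config]
    if keep_temp:
        cmd.append("--keep-temp")
    return cmd


def run_autodub(cmd):
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

For example, `run_autodub(build_autodub_command(input_video_path, output_video_path, source_lang, target_lang, voice_samples_dir, "sync_config.json", keep_temp=True))` mirrors the shell invocation above.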

## 6. Analyze Synchronization Metrics

Let's analyze the synchronization metrics to evaluate how closely the dubbed audio follows the timing of the original:

In [None]:
import json
import matplotlib.pyplot as plt

# Load synchronization metrics
with open("sync_metrics.json", "r") as f:
    metrics = json.load(f)

# Print synchronization metrics
print("Synchronization Quality Metrics:")
print(f"  Overall alignment score: {metrics['overall_alignment_score']:.2f}")
print(f"  DTW score: {metrics['dtw_score']:.2f}")
print(f"  Average timing error: {metrics['avg_timing_error']:.2f} ms")
print(f"  Max timing error: {metrics['max_timing_error']:.2f} ms")
print(f"  Percentage of well-aligned segments: {metrics['percent_well_aligned']:.1f}%")

# Plot segment scores
segment_scores = [segment['sync_score'] for segment in metrics['segments']]
segment_delays = [segment['delay'] for segment in metrics['segments']]

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))

# Plot sync scores
ax1.bar(range(len(segment_scores)), segment_scores, color='skyblue')
ax1.set_title('Segment Synchronization Scores')
ax1.set_xlabel('Segment Index')
ax1.set_ylabel('Sync Score')
ax1.axhline(y=70, color='r', linestyle='--', label='Good Sync Threshold')
ax1.legend()

# Plot timing delays
ax2.bar(range(len(segment_delays)), segment_delays, color='lightgreen')
ax2.set_title('Segment Timing Delays')
ax2.set_xlabel('Segment Index')
ax2.set_ylabel('Delay (ms)')
ax2.axhline(y=0, color='k', linestyle='-')
ax2.axhline(y=100, color='r', linestyle='--', label='Acceptable Delay Threshold')
ax2.axhline(y=-100, color='r', linestyle='--')
ax2.legend()

plt.tight_layout()
plt.show()
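
The two thresholds drawn in the plots (a sync score of 70 and a delay of ±100 ms) can also be applied numerically. The sketch below counts the segments that satisfy both. It relies on the `sync_score` and `delay` keys used above, the example data is made up, and the result may differ from how the project computes `percent_well_aligned`.

```python
def summarize_segments(segments, min_score=70, max_abs_delay_ms=100):
    """Return (count, percent) of segments that meet both thresholds."""
    good = [s for s in segments
            if s["sync_score"] >= min_score
            and abs(s["delay"]) <= max_abs_delay_ms]
    percent = 100.0 * len(good) / len(segments) if segments else 0.0
    return len(good), percent


# Made-up segments for illustration only
example_segments = [
    {"sync_score": 85, "delay": 40},    # passes both thresholds
    {"sync_score": 65, "delay": 20},    # score too low
    {"sync_score": 90, "delay": -130},  # delay too large
    {"sync_score": 75, "delay": -90},   # passes both thresholds
]
print(summarize_segments(example_segments))  # (2, 50.0)
```

With real data, the call would be `summarize_segments(metrics['segments'])`.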

## 7. Preview and Download Translated Video

Finally, let's preview the translated video in the notebook and download it:

In [None]:
from IPython.display import HTML
from base64 import b64encode

# Display the video inline (the whole file is embedded as base64)
with open(output_video_path, 'rb') as video_file:
    mp4 = video_file.read()
data_url = f"data:video/mp4;base64,{b64encode(mp4).decode()}"
HTML(f"""
<video width="640" height="360" controls>
  <source src="{data_url}" type="video/mp4">
</video>
""")

In [None]:
# Download the translated video
files.download(output_video_path)

# Download the synchronization visualization (if available)
if 'visualization_file' in metrics and os.path.exists(metrics['visualization_file']):
    files.download(metrics['visualization_file'])
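
The inline preview embeds the entire video in the notebook as base64 text, which is about one third larger than the file itself, so long videos can make the notebook slow. The sketch below estimates the embedded size. The 50 MB cutoff is an arbitrary illustrative value, not a documented Colab limit.

```python
def base64_size(n_bytes):
    """Number of characters base64 produces for n_bytes of input (with padding)."""
    return 4 * ((n_bytes + 2) // 3)


def should_embed(n_bytes, limit_bytes=50 * 1024 * 1024):
    """Return True if the encoded size stays within limit_bytes.

    limit_bytes is an arbitrary cutoff chosen for this example.
    """
    return base64_size(n_bytes) <= limit_bytes


print(base64_size(3_000_000))  # 4000000
```

In the notebook, `should_embed(os.path.getsize(output_video_path))` could be checked before displaying the preview, falling back to the download alone for large files.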