The finished script should do the following:

1. Open a movie file
2. Extract the audio
3. Transcribe the audio to text
4. Translate the text from English to Spanish
5. Synthesize speech from the Spanish text
6. Dub the synthesized Spanish audio back onto the original video
7. Concatenate the translated video after the original, or return only the newly translated video.
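The steps above can be read as a chain of stages that each add one result to a shared state. The sketch below only illustrates that data flow: `run_pipeline` and every stage function are hypothetical stand-ins that label their output instead of touching real media, and none of them are part of this notebook.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, str]
Stage = Callable[[State], State]


def run_pipeline(video_path: str, stages: Iterable[Stage]) -> State:
    """Run each stage in order, merging what it returns into a shared state."""
    state: State = {"video_path": video_path}
    for stage in stages:
        state.update(stage(state))
    return state


# Stand-in stages (hypothetical): each one only labels its output so the
# flow from one step to the next is visible.
def extract_audio(state: State) -> State:
    return {"audio": f"audio<{state['video_path']}>"}


def transcribe(state: State) -> State:
    return {"text_en": f"text<{state['audio']}>"}


def translate(state: State) -> State:
    return {"text_es": f"es<{state['text_en']}>"}


def synthesize(state: State) -> State:
    return {"audio_es": f"speech<{state['text_es']}>"}


def dub(state: State) -> State:
    # Dubbing needs both the original video and the new audio track.
    return {"video_es": f"{state['video_path']}+{state['audio_es']}"}


STAGES = [extract_audio, transcribe, translate, synthesize, dub]
```

Calling `run_pipeline("movie.mov", STAGES)` returns a dictionary with one entry per step, ending in `video_es`. Keeping every intermediate result in the state is what lets the last step reach back for the original video path.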

In [2]:
!pip install ffmpeg-python openai-whisper moviepy pydub num2words
!pip install -q TTS
# Uninstall current PyTorch (if needed)
!pip uninstall -y torch torchvision torchaudio
# Install PyTorch 2.2.2 (stable) with CPU support (you can change to CUDA if needed)
!pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cpu

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.21.0+cu124 requires torch==2.6.0, but you have torch 2.2.0 which is incompatible.
nx-cugraph-cu12 25.2.0 requires networkx>=3.2, but you have networkx 2.8.8 which is incompatible.
scikit-image 0.25.2 requires networkx>=3.0, but you have networkx 2.8.8 which is incompatible.
Found existing installation: torch 2.2.0
Uninstalling torch-2.2.0:
  Successfully uninstalled torch-2.2.0
Found existing installation: torchvision 0.21.0+cu124
Uninstalling torchvision-0.21.0+cu124:
  Successfully uninstalled torchvision-0.21.0+cu124
Found existing installation: torchaudio 2.2.0
Uninstalling torchaudio-2.2.0:
  Successfully uninstalled torchaudio-2.2.0
Looking in indexes: https://download.pytorch.org/whl/cpu
Collecting torch==2.2.2
  Downloading https://download.pytorch.org/whl/cpu/torch-2.2.2%2B

In [11]:
import ffmpeg
import io
import tempfile
import whisper
from TTS.utils.radam import RAdam
import torch
#torch.serialization.add_safe_globals({RAdam.__module__ + '.' + RAdam.__name__: RAdam})
from TTS.api import TTS
import os
from num2words import num2words
import re
from moviepy.editor import VideoFileClip, AudioFileClip
from google.colab import drive


drive.mount('/content/drive')
filename = 'BE1_data.mov'
path = '/content/drive/MyDrive/TranslatorAPP/engineering/data'
filepath = os.path.join(path, filename)

class translate_batch_movie:
  def __init__(self, file_path, output_path = None, source_language='en', target_language='es'):
    self.file_path = file_path
    self.output_path = output_path
    self.source_language = source_language
    self.target_language = target_language

  def convert_numbers_to_words(self, text):
    return re.sub(r'\d+', lambda x: num2words(int(x.group()), lang=self.target_language), text)

  def transcribe_video(self, video_path: str, model_name: str = "base") -> str:
    # 1) Use ffmpeg-python to extract the audio as WAV bytes in memory
    out, _ = (
        ffmpeg
        .input(video_path)
        # ar='16000': resample to 16 kHz
        # ac=1: mono
        # format='wav', acodec='pcm_s16le': 16-bit PCM WAV
        .output('pipe:', format='wav', acodec='pcm_s16le', ac=1, ar='16000')
        .run(capture_stdout=True, capture_stderr=True)
    )

    # 2) transcribe() is given a filename here, so write the bytes to a
    #    NamedTemporaryFile (deleted as soon as the context exits)
    model = whisper.load_model(model_name)
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(out)
        tmp.flush()
        # 3) Transcribe with `language` set to the target language (Spanish by
        #    default). Whisper treats `language` as the language spoken in the
        #    audio, so getting target-language text from English speech this way
        #    is a workaround, not a documented translation mode.
        result = model.transcribe(tmp.name, language=self.target_language)

    # 4) Spell out digits as words in the target language before the text goes to TTS
    return self.convert_numbers_to_words(result["text"])

  def save_audio(self):
    # Call the method defined on this class and use the instance's own input path
    transcription = self.transcribe_video(self.file_path, model_name="small")
    print(f"Transcription in target language {self.target_language}:")
    print(transcription)

    wav_file = "x_transcribed.wav"
    if self.output_path is None:
      self.output_audio_filepath = os.path.splitext(self.file_path)[0] + wav_file
    else:
      self.output_audio_filepath = self.output_path + wav_file
    # This TTS model is Spanish-only, which matches the default target_language='es'
    tts = TTS(model_name="tts_models/es/css10/vits", progress_bar=False, gpu=False)
    tts.tts_to_file(text=transcription, file_path=self.output_audio_filepath)

  def replace_movie_audio(self):
    # Paths
    video_path = self.file_path
    new_audio_path = self.output_audio_filepath
    if self.output_path is None:
      # Derive the output name from the input path, whatever its extension
      output_path = os.path.splitext(self.file_path)[0] + "_translatedXyX.mp4"
    else:
      # Use the caller-supplied path as the output video file
      output_path = self.output_path

    # Load video and new audio
    video = VideoFileClip(video_path)
    new_audio = AudioFileClip(new_audio_path)
    # Set new audio
    video_with_new_audio = video.set_audio(new_audio)
    # Export final video
    video_with_new_audio.write_videofile(output_path, codec='libx264', audio_codec='aac')

  def run(self):
    # Transcribe and synthesize the new audio, then swap it into the video
    self.save_audio()
    self.replace_movie_audio()


t = translate_batch_movie(filepath)
t.run()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Transcription in target language es:
 Hola, test. uno, dos, tres. Mi nombre es Joe. Tiene un buen día.
 > tts_models/es/css10/vits is already downloaded.
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > initialization of speaker-embedding layers.
 > initialization of language-emb

MoviePy - Done.
Moviepy - Writing video /content/drive/MyDrive/TranslatorAPP/engineering/data/BE1_data_translatedXyX.mp4

Moviepy - Done !
Moviepy - video ready /content/drive/MyDrive/TranslatorAPP/engineering/data/BE1_data_translatedXyX.mp4
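
Both output files in the cell above are named after the input movie. As a minimal sketch of one way to derive such names with `os.path`, so the result does not depend on the input having a `.mov` extension: the function name `derive_output_paths`, the `output_dir` parameter, and the `_translated` suffix are all hypothetical and are not what the class above uses.

```python
import os
from typing import Optional, Tuple


def derive_output_paths(video_path: str, output_dir: Optional[str] = None) -> Tuple[str, str]:
    """Return (audio_path, video_path) for the translated outputs.

    Hypothetical helper: the "_translated" suffix is an illustrative choice.
    """
    # Strip the directory and the extension, whatever the extension is.
    stem = os.path.splitext(os.path.basename(video_path))[0]
    # Default to writing next to the input file.
    target_dir = output_dir if output_dir is not None else os.path.dirname(video_path)
    audio_out = os.path.join(target_dir, stem + "_translated.wav")
    video_out = os.path.join(target_dir, stem + "_translated.mp4")
    return audio_out, video_out
```

Because the stem comes from `os.path.splitext`, an input that is already an `.mp4` still gets a distinct output name instead of a path that collides with the input.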


In [54]:
import os
print(os.getcwd())   # Shows the current working directory
print(os.listdir())  # Lists files in the current directory

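# Note: `!cd` runs in a temporary subshell, so it does not change the notebook's
# working directory; use `%cd` when the change should persist.
!cd /content/drive/MyDrive/TranslatorAPP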

/content/drive/MyDrive/TranslatorAPP/translatorAPP
['.git', 'BE1.ipynb', 'README.md', 'research_access.txt']
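
The working directory printed above is unaffected by the `!cd` line in the same cell. A rough way to see why, in plain Python: a `cd` run through a child shell (which is how `!` commands run) ends with that shell, while `os.chdir` (which is what `%cd` does) changes the current process. The temporary directory is just a stand-in target for the demonstration.

```python
import os
import subprocess
import tempfile

start_dir = os.getcwd()

with tempfile.TemporaryDirectory() as target:
    # Like `!cd target`: the change happens inside a child shell that exits right away.
    subprocess.run(f'cd "{target}"', shell=True, check=True)
    after_subshell = os.getcwd()

    # Like `%cd target`: the change applies to this Python process.
    os.chdir(target)
    moved_to_target = os.path.samefile(os.getcwd(), target)

    # Go back before the temporary directory is deleted.
    os.chdir(start_dir)
```

After this runs, `after_subshell` still equals `start_dir`, while `moved_to_target` is `True`.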


In [53]:
# 1. Clone with token authentication
name = "research_access.txt"
with open(name, 'r') as f:
  # Strip the trailing newline so it does not end up inside the URL
  token = f.read().strip()
#%cd /content/drive/MyDrive/TranslatorAPP/engineering

repo_url = f"https://{token}@github.com/jkginfinite/translatorAPP.git"

#!git clone -b dev {repo_url}



# 2. Copy notebook
!cp /content/drive/MyDrive/TranslatorAPP/engineering/BE1.ipynb /content/drive/MyDrive/TranslatorAPP/engineering/translatorAPP/
%cd /content/drive/MyDrive/TranslatorAPP/translatorAPP/
# 3. Git config
!git config --global user.email "jkgprofessional@gmail.com"
!git config --global user.name "jkginfinite"

# 4. Commit and push
!git add BE1.ipynb
!git commit -m "Update BE1.ipynb with latest changes"
!git push origin dev

/content/drive/MyDrive/TranslatorAPP/translatorAPP
On branch dev
Your branch is up to date with 'origin/dev'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	research_access.txt

nothing added to commit but untracked files present (use "git add" to track)
Everything up-to-date
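
The cell above reads a token from a file and embeds it in the clone URL. A small sketch of that handling, under a few assumptions: the helper names `clean_token`, `build_repo_url`, and `redact` are hypothetical, the URL shape is the one already used in the cell, and percent-encoding the token is a precaution added here rather than something the notebook does.

```python
from urllib.parse import quote


def clean_token(raw: str) -> str:
    """Remove surrounding whitespace, such as the newline at the end of a file."""
    return raw.strip()


def build_repo_url(token: str, owner: str, repo: str) -> str:
    """Build an HTTPS URL with the token in the userinfo part (same shape as the cell above)."""
    # Percent-encode the token so unusual characters cannot break the URL.
    return f"https://{quote(token, safe='')}@github.com/{owner}/{repo}.git"


def redact(url: str, token: str) -> str:
    """Return the URL with the token masked, for printing or logging."""
    return url.replace(quote(token, safe=""), "***")
```

For example, `redact(build_repo_url(token, "jkginfinite", "translatorAPP"), token)` can be printed without exposing the token. Since `git status` above lists `research_access.txt` as untracked inside the repository folder, adding it to `.gitignore` is one way to keep it from being committed by accident.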
