<a href="https://colab.research.google.com/github/hoodini/whisper3-formatting-YuvalAI/blob/main/Whisper3_%2B_Subtitles_Formatting_%2B_Translation_(Yuval_Avidani).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **OpenAI's Whisper 3 transcriber + Subtitles Formatting by Yuval Avidani - יובל אבידני**

**Please support with Beer: https://linktree.com/hackit.co.il**

**IMPORTANT: V100/A100 GPU IS REQUIRED TO USE THIS NOTEBOOK! OTHERWISE THE NOTEBOOK WILL CRASH AND WILL SHOW CUDA MEMORY ERROR MESSAGES**

*This notebook has the following capabilities:*

Select between YouTube URL and Media Files Upload

Select the original language of the media file / YouTube video

YouTube videos are downloaded as MP4 and their audio is extracted to WAV

A file size check is made to adhere to Whisper's 25 MB file size limit

If the file is larger, the notebook uses Smart Chunking

It then lets you select the subtitle formatting (how many rows per subtitle and how many words per row), transcribes each chunk, and concatenates the results into a single TXT / SRT file

Optional: translate the TXT / SRT file into another language

The files can be downloaded using the last cell
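
The size check and chunking described above come down to simple arithmetic on uncompressed WAV data. The following is a minimal sketch, assuming 16-bit stereo audio at 44.1 kHz. The helper name `plan_chunks` and the example numbers are illustrative and not part of the notebook; the 25 MB threshold and 20 MB chunk size mirror the defaults used in cell 2.

```python
import math

def plan_chunks(total_frames, channels, sample_width, threshold_mb=25, max_size_mb=20):
    """Return (needs_split, frames_per_chunk, chunk_count) for a WAV file.

    Ignores the small WAV header, so sizes are approximate.
    """
    bytes_per_frame = channels * sample_width
    size_mb = total_frames * bytes_per_frame / (1024 * 1024)
    if size_mb <= threshold_mb:
        # Small enough to transcribe in one piece
        return False, total_frames, 1
    frames_per_chunk = (max_size_mb * 1024 * 1024) // bytes_per_frame
    chunk_count = math.ceil(total_frames / frames_per_chunk)
    return True, frames_per_chunk, chunk_count

# Ten minutes of 16-bit stereo audio at 44.1 kHz (roughly 100 MB of WAV data)
frames = 10 * 60 * 44100
needs_split, frames_per_chunk, chunk_count = plan_chunks(frames, channels=2, sample_width=2)
print(needs_split, frames_per_chunk, chunk_count)
```

Splitting on whole frames keeps every chunk a valid WAV file, which is why the chunk size is expressed in frames rather than bytes.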

**Instructions to Transcribe from YouTube / Media File:**

Run cells 1-2

Run cell 3 and choose your media source (YouTube URL / Upload Media File)

Select the formatting of the desired subtitles (number of rows per subtitle and number of words per row), then run cell 4

Run cell 5 to get transcription in SRT / TXT format

Run cell 6 to download the SRT / TXT file. Note: the files can also be downloaded from the file explorer on the sidebar.

**Instructions to Translate an existing SRT / TXT File:**

Run cells 1-2

Upload your TXT / SRT file by right-clicking the white area under 'files' in the sidebar tab and selecting 'upload', then choose your file

Run cell 6, select your file and the target language, and click 'Translate'

Once the 'Translated content saved to' message appears, the files are available to download directly from the sidebar

You can also run cell 7, select your file, and click 'Download' to download it

Note: the files can also be downloaded from the file explorer on the sidebar.

**Enjoy!**
**Yuval Avidani**


# **1. Install dependencies and import packages**

In [6]:
!pip install git+https://github.com/openai/whisper.git
!pip install pydub
!pip install tqdm
!pip install moviepy
!pip install ipywidgets
!pip install pytube
!pip install googletrans==4.0.0-rc1 ipywidgets

import ipywidgets as widgets
from IPython.display import display, clear_output
import whisper
import os
from pydub import AudioSegment
from google.colab import files
import moviepy.editor as mp
from pytube import YouTube
from tqdm.notebook import tqdm
import time
import wave

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-hbex4v4y
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-hbex4v4y
  Resolved https://github.com/openai/whisper.git to commit ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting googletrans==4.0.0-rc1
  Downloading googletrans-4.0.0rc1.tar.gz (20 kB)
  Preparing metadata (setup.py) ... done
Collecting httpx==0.13.3 (from googletrans==4.0.0-rc1)
  Downloading httpx-0.13.3-py3-none-any.whl (55 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 55.1/55.1 kB 5.1 MB/s eta 0:00:00
Collecting hstspreload (from httpx==0.13.3->googletrans==4.0.0-rc1)
  Downloading hstspreload-2024.2.1-py3-none-

# **2. Setting up the logic**

In [2]:
def get_wav_duration(filename):
    with wave.open(filename, 'r') as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        duration = frames / float(rate)
        return duration  # Duration in seconds

def combine_transcriptions(transcriptions, chunk_filenames):
    combined_segments = []
    time_offset = 0.0  # Time offset in seconds

    for i, transcription in enumerate(transcriptions):
        for segment in transcription['segments']:
            adjusted_segment = segment.copy()
            adjusted_segment['start'] += time_offset
            adjusted_segment['end'] += time_offset
            combined_segments.append(adjusted_segment)

        if i < len(chunk_filenames) - 1:
            # Update time_offset for the next chunk
            chunk_duration = get_wav_duration(chunk_filenames[i])  # Get the duration of the current chunk
            time_offset += chunk_duration

    return {'segments': combined_segments}

def convert_to_wav(filename):
    # Extract file name and extension
    file_name, file_extension = os.path.splitext(filename)
    file_extension = file_extension.lower()

    # Define output WAV filename
    output_filename = f"{file_name}.wav"

    # Process based on file extension
    if file_extension in ['.mp3', '.ogg', '.m4a', '.wav']:
        # For audio files
        audio = AudioSegment.from_file(filename)
        audio.export(output_filename, format="wav")
    elif file_extension in ['.mp4', '.mov', '.avi', '.mpeg']:
        # For video files
        video_clip = mp.VideoFileClip(filename)
        audio_clip = video_clip.audio
        audio_clip.write_audiofile(output_filename)
        audio_clip.close()
        video_clip.close()
    else:
        raise ValueError("Unsupported file format")

    return output_filename

def format_as_srt(segments, rows_per_subtitle=2, words_per_row=8):
    srt_content = []
    seq_number = 1

    for segment in segments:
        start_time = format_timestamp(segment["start"])
        words = segment["text"].split()
        word_index = 0  # Tracks the index of the word in the words list

        while word_index < len(words):
            # Determine the end time for this segment
            segment_length = len(words) - word_index
            next_index = min(word_index + words_per_row * rows_per_subtitle, len(words))  # Calculate end index based on rows and words per row
            end_time = format_timestamp(segment["start"] + (segment["end"] - segment["start"]) * next_index / len(words))

            # Dynamically split the words across the specified number of rows
            transcript_lines = [" ".join(words[i:min(i+words_per_row, len(words))]) for i in range(word_index, next_index, words_per_row)]
            transcript = "\n".join(transcript_lines)

            srt_content.append(f"{seq_number}\n{start_time} --> {end_time}\n{transcript}\n")
            seq_number += 1

            word_index = next_index
            start_time = end_time  # Update start time for the next segment

    return "\n".join(srt_content)


def format_as_text(segments):
    return "\n".join([segment["text"] for segment in segments])

def format_timestamp(seconds):
    # Work in whole milliseconds to avoid floating-point truncation errors
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, milliseconds = divmod(remainder, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{milliseconds:03}"

def split_wav_file(filename, max_size_mb=20):
    chunk_filenames = []
    file_size_mb = os.path.getsize(filename) / 1024 / 1024

    if file_size_mb > 25:
        with wave.open(filename, 'rb') as wav:
            frames_per_second = wav.getframerate()
            channels = wav.getnchannels()
            sampwidth = wav.getsampwidth()
            max_bytes = max_size_mb * 1024 * 1024

            # Calculate bytes per frame
            bytes_per_frame = channels * sampwidth
            # Calculate the maximum number of frames per chunk
            max_frames_per_chunk = max_bytes // bytes_per_frame

            frame_count = wav.getnframes()

            print(f"Splitting: {filename}")
            print(f"Frames per second: {frames_per_second}, Channels: {channels}, Sample width: {sampwidth}")
            print(f"Max frames per chunk: {max_frames_per_chunk}, Total frames: {frame_count}")

            for i in range(0, frame_count, max_frames_per_chunk):
                chunk_filename = f"{filename}_chunk_{i}.wav"
                chunk_filenames.append(chunk_filename)

                with wave.open(chunk_filename, 'wb') as chunk:
                    chunk.setnchannels(channels)
                    chunk.setsampwidth(sampwidth)
                    chunk.setframerate(frames_per_second)
                    frames_to_write = min(max_frames_per_chunk, frame_count - i)
                    chunk.writeframes(wav.readframes(frames_to_write))
                    print(f"Created chunk: {chunk_filename}, Frames: {frames_to_write}")

    return chunk_filenames

# Function to upload a file
def upload_file():
    with output:
        clear_output()
        uploaded = files.upload()
        if uploaded:
            filename = next(iter(uploaded))
            size_mb = os.path.getsize(filename) / (1024 * 1024)
            print(f"Uploaded File: {filename}, Size: {size_mb:.2f} MB")
            return filename
        return None

# Function to download a video from YouTube with progress bar
def download_youtube_video(url):
    with output:
        clear_output()
        yt = YouTube(url, on_progress_callback=on_progress)

        # Initialize the progress bar here
        global progress_bar
        progress_bar = tqdm(total=100, desc='Downloading', unit='%')

        stream = yt.streams.filter(progressive=True, file_extension='mp4').order_by('resolution').desc().first()
        filename = stream.download()
        size_mb = os.path.getsize(filename) / (1024 * 1024)
        print(f"Downloaded Video: {filename}, Size: {size_mb:.2f} MB")
        return filename

# Progress callback function for YouTube download
def on_progress(stream, chunk, bytes_remaining):
    total_size = stream.filesize
    bytes_downloaded = total_size - bytes_remaining
    percentage_of_completion = (bytes_downloaded / total_size) * 100
    progress_bar.n = percentage_of_completion
    progress_bar.refresh()


# Handlers for the UI elements
def handle_upload_button_click(b):
    global global_filename
    global_filename = upload_file()

    # After uploading, process the file to convert it to WAV if needed
    if global_filename:
        process_file(global_filename)

def handle_download_button_click(b):
    global global_filename
    global_filename = download_youtube_video(youtube_input.value)

    # After downloading, you might want to process the file
    if global_filename:
        process_file(global_filename)

def process_file(filename):
    global global_filename

    # Process the file (e.g., convert to WAV)
    audio, audio_filename = handle_media_file(filename)
    if audio:
        print("Conversion successful.")
        # Update the global filename to the new audio file
        global_filename = audio_filename
    else:
        print("Conversion failed.")

def on_dropdown_change(change):
    if change['new'] == 'upload':
        upload_button.layout.visibility = 'visible'
        youtube_input.layout.visibility = 'hidden'
        download_button.layout.visibility = 'hidden'
    elif change['new'] == 'youtube':
        upload_button.layout.visibility = 'hidden'
        youtube_input.layout.visibility = 'visible'
        download_button.layout.visibility = 'visible'
    else:
        upload_button.layout.visibility = 'hidden'
        youtube_input.layout.visibility = 'hidden'
        download_button.layout.visibility = 'hidden'

def handle_media_file(filename):
    try:
        file_name, file_extension = os.path.splitext(filename)
        file_extension = file_extension.lower()

        if file_extension in ['.mp3', '.wav', '.ogg', '.m4a']:
            return AudioSegment.from_file(filename), filename
        elif file_extension in ['.mov', '.avi', '.mpeg', '.mp4']:
            video = mp.VideoFileClip(filename)
            audio = video.audio
            audio_filename = f"{file_name}.wav"
            audio.write_audiofile(audio_filename)
            return AudioSegment.from_file(audio_filename), audio_filename
        else:
            raise ValueError("Unsupported file format")
    except Exception as e:
        print(f"An error occurred: {e}")
        return None, None

# Function to estimate the chunk duration based on file size
def estimate_chunk_duration(file_size_bytes, total_duration_ms, target_chunk_size_mb=25):
    avg_bitrate = (file_size_bytes * 8) / (total_duration_ms / 1000)  # bits per second
    target_chunk_size_bytes = target_chunk_size_mb * 1024 * 1024  # bytes
    estimated_duration_ms = (target_chunk_size_bytes * 1000) / avg_bitrate  # milliseconds

    # Ensure that the estimated duration is at least 1 millisecond
    return max(1, int(estimated_duration_ms))


# Function to split the audio file into smaller chunks
def split_audio(filename, target_chunk_size_mb=25):
    # handle_media_file returns an (AudioSegment, filename) tuple, or (None, None) on failure
    audio, audio_filename = handle_media_file(filename)
    if audio is None:
        print(f"Failed to process the file: {filename}")
        return []

    # Measure the audio file that was actually loaded (the extracted WAV for video input)
    file_size_bytes = os.path.getsize(audio_filename)
    total_duration_ms = len(audio)  # pydub reports length in milliseconds

    estimated_chunk_duration_ms = estimate_chunk_duration(file_size_bytes, total_duration_ms, target_chunk_size_mb)

    chunks = []
    for i in range(0, total_duration_ms, estimated_chunk_duration_ms):
        chunk = audio[i:i + estimated_chunk_duration_ms]
        chunk_filename = f"{filename}_part{i}.wav"
        chunk.export(chunk_filename, format="wav")
        chunks.append(chunk_filename)
        print(f"Created chunk: {chunk_filename}")

    return chunks


# Function to format time for SRT file
def format_time(milliseconds):
    seconds, milliseconds = divmod(milliseconds, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{int(hours):02}:{int(minutes):02}:{int(seconds):02},{int(milliseconds):03}"

def split_text(text):
    """Split text into chunks with 4-5 words in the first line and 5-6 in the second line."""
    words = text.split()
    lines = []
    current_line = []

    for word in words:
        current_line.append(word)
        # Check if the current line is the first line and has 4-5 words, or the second line with 5-6 words
        if (len(lines) == 0 and len(current_line) >= 4) or (len(lines) == 1 and len(current_line) >= 5):
            lines.append(' '.join(current_line))
            current_line = []

        # Break if two lines are filled
        if len(lines) == 2:
            break

    # Add the remaining words as a separate line if any
    if current_line:
        lines.append(' '.join(current_line))

    return lines

def format_srt_segment(counter, start_time, end_time, text):
    """Format an SRT segment."""
    formatted_text = "\n".join(split_text(text))
    return f"{counter}\n{format_time(start_time)} --> {format_time(end_time)}\n{formatted_text}\n\n"

_whisper_model = None  # Loaded lazily and reused across chunks

def transcribe_file(filename):
    """Transcribe an audio file using Whisper."""
    global _whisper_model

    # Extract the file extension
    _, file_extension = os.path.splitext(filename)
    file_extension = file_extension.lower()

    # Print the type of file being processed
    if file_extension in ['.wav', '.mp3', '.ogg']:
        print(f"Processing an audio file: {filename}")
    elif file_extension in ['.mp4', '.avi', '.mov']:
        print(f"Processing a video file: {filename}")
    else:
        print(f"Processing an unknown file type: {filename}")

    # Load the model only once; reloading large-v3 for every chunk is slow
    if _whisper_model is None:
        _whisper_model = whisper.load_model("large-v3")
    result = _whisper_model.transcribe(filename)
    return result
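
To make the timestamp and SRT helpers in this cell concrete, here is a small standalone sketch of how one Whisper-style segment becomes several SRT entries. The segment values are made up for illustration, and `to_srt_timestamp` and `segment_to_srt` are hypothetical names that rework the idea behind `format_timestamp` and `format_as_srt` rather than copying them verbatim. The segment's time span is shared between entries in proportion to their word count, which is an approximation since Whisper segments here carry no per-word timing.

```python
def to_srt_timestamp(seconds):
    """Convert seconds (float) to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"

def segment_to_srt(segment, words_per_subtitle, first_index=1):
    """Split one segment into SRT entries, sharing its time span by word count."""
    words = segment["text"].split()
    span = segment["end"] - segment["start"]
    entries = []
    for n, i in enumerate(range(0, len(words), words_per_subtitle)):
        j = min(i + words_per_subtitle, len(words))
        start = segment["start"] + span * i / len(words)
        end = segment["start"] + span * j / len(words)
        entries.append(
            f"{first_index + n}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n"
            + " ".join(words[i:j]) + "\n"
        )
    return entries

# A made-up segment of nine words lasting four seconds
demo = {"start": 62.5, "end": 66.5, "text": "this is a made up example sentence here now"}
entries = segment_to_srt(demo, words_per_subtitle=3)
print("\n".join(entries))
```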

# **3. Select between YouTube URL or Upload Media File**

In [14]:
# UI Elements
# Define the widgets
dropdown = widgets.Dropdown(
    options=[('Select Option', None), ('Upload File', 'upload'), ('YouTube Video', 'youtube')],
    description='Action:'
)
upload_button = widgets.Button(description="Upload File", layout=widgets.Layout(visibility='hidden'))
youtube_input = widgets.Text(placeholder='Enter YouTube URL here', layout=widgets.Layout(visibility='hidden'))
download_button = widgets.Button(description="Download YouTube Video", layout=widgets.Layout(visibility='hidden'))
output = widgets.Output()

# Assign handlers to buttons and dropdown
upload_button.on_click(handle_upload_button_click)
download_button.on_click(handle_download_button_click)
dropdown.observe(on_dropdown_change, names='value')

# Display UI
display(dropdown, upload_button, youtube_input, download_button, output)

Dropdown(description='Action:', options=(('Select Option', None), ('Upload File', 'upload'), ('YouTube Video',…

Button(description='Upload File', layout=Layout(visibility='hidden'), style=ButtonStyle())

Text(value='', layout=Layout(visibility='hidden'), placeholder='Enter YouTube URL here')

Button(description='Download YouTube Video', layout=Layout(visibility='hidden'), style=ButtonStyle())

Output()

MoviePy - Writing audio in /content/This is not Morgan Freeman  -  A Deepfake Singularity.wav




MoviePy - Done.
Conversion successful.


# **4. Choose Formatting (number of rows / words per subtitle)**

In [16]:
#@title Choose formatting options for SRT
rows_per_subtitle = 1 #@param [1, 2, 3, 4, 5] {type:"raw"}
words_per_row = 1 #@param [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] {type:"raw"}

print(f"Rows per subtitle: {rows_per_subtitle}, Words per row: {words_per_row}")


Rows per subtitle: 1, Words per row: 1
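
The two settings above control how words are grouped: each subtitle holds up to `rows_per_subtitle * words_per_row` words, broken into rows of `words_per_row` words. The sketch below shows only that grouping step, without timestamps. `layout_subtitles` is an illustrative helper, not a function from this notebook.

```python
def layout_subtitles(text, rows_per_subtitle=2, words_per_row=8):
    """Group words into subtitles, each returned as a list of row strings."""
    words = text.split()
    per_subtitle = rows_per_subtitle * words_per_row
    subtitles = []
    for start in range(0, len(words), per_subtitle):
        block = words[start:start + per_subtitle]
        rows = [
            " ".join(block[i:i + words_per_row])
            for i in range(0, len(block), words_per_row)
        ]
        subtitles.append(rows)
    return subtitles

sample = "one two three four five six seven eight nine ten"
layout = layout_subtitles(sample, rows_per_subtitle=2, words_per_row=3)
for rows in layout:
    print(rows)
```

With 1 row and 1 word per row, as selected in the run above, every subtitle contains a single word.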


# **5. Transcribe and save SRT / TXT Files**

In [17]:
# global_filename = 'notAlone.wav'

uploaded_filename = global_filename
wav_filename = convert_to_wav(uploaded_filename)

# Split the file into chunks if necessary
chunk_filenames = split_wav_file(wav_filename)

if not chunk_filenames:  # No chunks were created, likely due to the file not exceeding the size threshold
    chunk_filenames = [wav_filename]  # Proceed with the original file

all_transcriptions = [transcribe_file(chunk) for chunk in chunk_filenames]

# Combine the transcriptions
combined_result = combine_transcriptions(all_transcriptions, chunk_filenames)

# Format the combined transcriptions
formatted_srt = format_as_srt(combined_result['segments'], rows_per_subtitle=rows_per_subtitle, words_per_row=words_per_row)
formatted_text = format_as_text(combined_result['segments'])

# Save the formatted transcriptions to files
with open("transcription.srt", "w", encoding='utf-8') as srt_file:
    srt_file.write(formatted_srt)

with open("transcription.txt", "w", encoding='utf-8') as text_file:
    text_file.write(formatted_text)

# Add the success message here
print("Transcription process completed successfully! Check the transcription.srt and transcription.txt files.")


Processing an audio file: /content/This is not Morgan Freeman  -  A Deepfake Singularity.wav
Transcription process completed successfully! Check the transcription.srt and transcription.txt files.
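
When a file is split, each chunk is transcribed on its own, so its segment times start again from zero. The sketch below illustrates the offset step that `combine_transcriptions` performs, using made-up segments and chunk durations. `merge_chunk_segments` is a hypothetical standalone helper that takes the durations directly instead of reading them from WAV files.

```python
def merge_chunk_segments(chunk_results, chunk_durations):
    """Shift each chunk's segment times by the total duration of the chunks before it."""
    merged = []
    offset = 0.0
    for result, duration in zip(chunk_results, chunk_durations):
        for seg in result["segments"]:
            # Build a new dict so the original transcription is left untouched
            merged.append({**seg, "start": seg["start"] + offset, "end": seg["end"] + offset})
        offset += duration
    return merged

# Two made-up chunks: the first lasts 120 s, the second 95 s
chunk_results = [
    {"segments": [{"start": 0.0, "end": 4.0, "text": "first chunk"}]},
    {"segments": [{"start": 1.0, "end": 3.5, "text": "second chunk"}]},
]
chunk_durations = [120.0, 95.0]

merged = merge_chunk_segments(chunk_results, chunk_durations)
for seg in merged:
    print(seg["start"], seg["end"], seg["text"])
```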


# **6. Download results as SRT / TXT**

In [None]:
import ipywidgets as widgets
import base64
from IPython.display import display, Javascript

def create_download_link(filename, content):
    b64 = base64.b64encode(content.encode())
    payload = b64.decode()
    js_download = f"""
    var link = document.createElement('a');
    link.href = "data:text/plain;base64,{payload}";
    link.download = "{filename}";
    document.body.appendChild(link);
    link.click();
    document.body.removeChild(link);
    """
    return Javascript(js_download)

# Function to handle download
def download_file(b):
    format_choice = download_dropdown.value
    filename = f"transcription.{format_choice.lower()}"

    if format_choice == 'SRT':
        content = formatted_srt
    else:  # TXT format
        content = formatted_text

    js = create_download_link(filename, content)
    display(js)

# Dropdown for selecting file format to download
download_dropdown = widgets.Dropdown(
    options=['SRT', 'TXT'],
    description='Download:',
    disabled=False,
)

# Button to trigger the download
download_button = widgets.Button(description="Download File")

# Display the dropdown and button
display(download_dropdown, download_button)

# Bind the button click event to the download function
download_button.on_click(download_file)


Dropdown(description='Download:', options=('SRT', 'TXT'), value='SRT')

Button(description='Download File', style=ButtonStyle())

<IPython.core.display.Javascript object>
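
The download cell works by embedding the file content in a `data:` URI that the browser saves as a file. The sketch below isolates that encoding step so it can be checked without a browser. `build_data_uri` is an illustrative name, and the sample subtitle text is made up.

```python
import base64

def build_data_uri(content, mime_type="text/plain"):
    """Encode text as a base64 data URI, as used for the download link."""
    payload = base64.b64encode(content.encode("utf-8")).decode("ascii")
    return f"data:{mime_type};base64,{payload}"

# Non-ASCII text survives because it is encoded as UTF-8 before base64
srt_text = "1\n00:00:00,000 --> 00:00:01,000\nשלום\n"
uri = build_data_uri(srt_text)

header, payload = uri.split(",", 1)
round_trip = base64.b64decode(payload).decode("utf-8")
print(header)
print(round_trip == srt_text)
```

Because the whole file travels inside the URI, this approach suits small text files such as SRT and TXT; for anything large, downloading from the file explorer on the sidebar is the safer route.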

# Bonus: Translate to ANY language!

1. Run the cell and select the desired file
2. Select the target language you wish to translate to
3. Click Translate and BOOM! you have it!

In [19]:
# Import necessary libraries
from googletrans import Translator, LANGUAGES
import ipywidgets as widgets
from IPython.display import display, clear_output
import os
import re

# Function to list files with specific extensions
def get_files_with_extension(extensions):
    files = [f for f in os.listdir('.') if any(f.endswith(ext) for ext in extensions)]
    if not files:
        return ['No files found']
    return files

# Initialize the progress bar widget
progress_bar = widgets.IntProgress(
    value=0,
    min=0,  # Minimum value
    max=1,  # The max value will be set dynamically
    description='Translating:',
    bar_style='',  # Options: 'success', 'info', 'warning', 'danger', ''
    orientation='horizontal'
)

def translate_text(text, dest_language, update_progress=lambda x: None):
    translator = Translator()
    translated_text = ""

    if file_dropdown.value.lower().endswith('.srt'):
        blocks = re.split('(\n\n|\r\n\r\n)', text)  # Split by double newline to get blocks
        total_blocks = sum(1 for block in blocks if '-->' in block)
        progress_bar.max = total_blocks  # Set the max value of the progress bar
        current_block = 0

        for block in blocks:
            if '-->' in block:
                lines = block.strip().split('\n')
                index_line = lines[0]  # Subtitle index
                timecode_line = lines[1]  # Timecode line
                text_lines = lines[2:]  # Subtitle text lines

                # Joining text lines with a space to preserve natural sentence flow for translation
                joined_text = ' '.join(text_lines).strip()

                if joined_text:
                    try:
                        translated = translator.translate(joined_text, dest=dest_language)
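                        # Redistribute the translated words over the original number of lines
                        translated_words = translated.text.split()
                        words_per_line = max(1, -(-len(translated_words) // len(text_lines)))
                        translated_lines = '\n'.join(
                            ' '.join(translated_words[i:i + words_per_line])
                            for i in range(0, len(translated_words), words_per_line)
                        )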
                    except Exception as e:
                        translated_lines = str(e)
                else:
                    translated_lines = ''

                # Construct the translated block with correct newline handling
                translated_block = f"{index_line}\n{timecode_line}\n{translated_lines}"
                translated_text += translated_block

                # Add a blank line to separate from the next block
                if current_block < total_blocks - 1:
                    translated_text += '\n\n'

                current_block += 1
                update_progress(current_block)
    else:
        lines = text.split('\n')
        total_lines = len(lines)
        progress_bar.max = total_lines
        for i, line in enumerate(lines):
            if line.strip() != "":
                try:
                    translated = translator.translate(line, dest=dest_language)
                    translated_text += translated.text
                except Exception as e:
                    translated_text += str(e)
                # Ensure correct newline handling for non-SRT files
                if i < total_lines - 1:
                    translated_text += '\n'
            else:
                translated_text += '\n'
            update_progress(i + 1)

    return translated_text.rstrip()  # Remove any trailing whitespace for a clean end of the file

# Function to handle file selection and language selection
def on_translate_button_clicked(b):
    with output:
        clear_output(wait=True)
        display(progress_bar)
        print("Translation started, please wait...")

    selected_file = file_dropdown.value
    target_language = language_dropdown.value

    if selected_file == 'No files found':
        with output:
            print("No file selected or no files available.")
        return

    with open(selected_file, 'r', encoding='utf-8') as file:
        content = file.read()

    def update_progress(value):
        progress_bar.value = value

    translated_content = translate_text(content, target_language, update_progress)

    # Use splitext so filenames containing extra dots keep their full base name
    base_name, extension = os.path.splitext(selected_file)
    output_file = f"translated_{base_name}_{target_language}{extension}"
    with open(output_file, 'w', encoding='utf-8') as file:
        file.write(translated_content)

    with output:
        print(f"Translated content saved to {output_file}")
        progress_bar.value = 0  # Reset the progress bar

# Creating and displaying widgets
output = widgets.Output()
file_dropdown = widgets.Dropdown(options=get_files_with_extension(['.txt', '.srt']), description='Select file:')
language_dropdown = widgets.Dropdown(options=[(name, code) for code, name in LANGUAGES.items()], description='Target Language:')
translate_button = widgets.Button(description='Translate')
translate_button.on_click(on_translate_button_clicked)

display(file_dropdown, language_dropdown, translate_button, output)


Dropdown(description='Select file:', options=('translated_transcription_yi.srt', 'transcription.txt', 'transcr…

Dropdown(description='Target Language:', options=(('afrikaans', 'af'), ('albanian', 'sq'), ('amharic', 'am'), …

Button(description='Translate', style=ButtonStyle())

Output()
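
The SRT branch of the translation cell relies on the structure of an SRT block: an index line, a timecode line, then one or more text lines. Only the text should be translated. The sketch below shows that idea in isolation. `translate_srt` is a hypothetical helper, and `fake_translate` is a stand-in (it just upper-cases the text) for a real call such as `translator.translate(text, dest=...).text`, so the example runs offline.

```python
import re

def translate_srt(srt_text, translate):
    """Apply `translate` to subtitle text only, keeping index and timecode lines."""
    blocks = re.split(r"\r?\n\r?\n", srt_text.strip())
    out = []
    for block in blocks:
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            out.append(block)  # Not a subtitle block; keep it unchanged
            continue
        # Join the rows so the translator sees a whole sentence fragment
        text = " ".join(lines[2:]).strip()
        translated = translate(text) if text else ""
        out.append("\n".join([lines[0], lines[1], translated]))
    return "\n\n".join(out) + "\n"

def fake_translate(text):
    """Placeholder translator used only for this demonstration."""
    return text.upper()

srt = (
    "1\n00:00:00,000 --> 00:00:02,000\nhello\nworld\n\n"
    "2\n00:00:02,000 --> 00:00:04,000\ngoodbye\n"
)
translated_srt = translate_srt(srt, fake_translate)
print(translated_srt)
```

Passing the translator in as a function keeps the parsing logic testable without network access and makes it easy to swap translation backends.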