
# Subtitle creation with `whisperx`

In this notebook, we use the `whisperx` tool to transcribe an audio file and turn the transcription into subtitles. The steps below set up the environment, mount Google Drive, run the transcription, and convert the resulting JSON output into the SRT subtitle format.

The `whisperx` tool allows us to process audio files to obtain detailed transcriptions. This notebook serves as a hands-on guide to execute this process from start to finish within the Google Colab environment.


The following commands are used to install specific libraries directly from their GitHub repositories:

- The first command installs the `whisperX_subs` library from the repository maintained by `lrubiorod`.
- The second command installs the `whisperx` library from the repository maintained by `m-bain`.

**Important Reminder**: After installing these libraries, you might encounter a warning indicating that certain packages were previously imported in the runtime. To ensure that you are using the newly installed versions of the packages, it's recommended to **restart the Colab runtime**. Look out for messages like:

***WARNING: The following packages were previously imported in this runtime:
[pydevd_plugins] You must restart the runtime in order to use newly installed versions.***

To restart the runtime, open the "Runtime" menu at the top and select "Restart runtime" (this entry may be labelled "Restart session" in newer versions of Colab).

In [None]:
!pip install git+https://github.com/lrubiorod/whisperX_subs
!pip install git+https://github.com/m-bain/whisperx.git
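
After restarting the runtime, it can be worth confirming that both packages can be found before continuing. The check below is a minimal sketch using only the standard library. The helper name `is_installed` is hypothetical, and the module names `whisperx` and `whisperx_subs` are assumed from the command and import used later in this notebook.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Module names assumed from the rest of this notebook
for name in ("whisperx", "whisperx_subs"):
    status = "found" if is_installed(name) else "MISSING (re-run the install cell)"
    print(f"{name}: {status}")
```

If either package is reported as missing, re-running the install cell and restarting the runtime is the first thing to try.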

To access files stored in Google Drive from within the Colab environment, you need to mount Google Drive first. The following lines do this:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In the code below, we set the path to an audio file stored in the user's Google Drive, along with the processing options used later in the notebook.

- `directory_path`: The path to the directory where the audio file is located. The JSON and SRT output files are written to this directory as well.
- `file_name`: The name of the audio file without its extension.
- `ext`: The extension of the audio file, in this case `.wav`.
- `path_to_input`: The full path to the audio file, built by concatenating the directory, file name, and extension.
- `max_char`: A character limit passed to `convert_json_to_srt` in the final step, presumably the maximum number of characters per subtitle.
- `source_language`: The language code of the audio, passed to `whisperx` through `--language`.

Let's set these variables:

In [None]:
# Define the path to the directory containing the audio file in Google Drive
directory_path = "/content/drive/My Drive/[DIRECTORY PATH]/"

# Name of the audio file (without extension)
file_name = "[FILE NAME]"

# Audio file extension
ext = ".wav"

# Construct the full path to the audio file
path_to_input = directory_path + file_name + ext

# Set the maximum number of characters (passed to convert_json_to_srt in the final step)
max_char = 50

# Specify the language code for processing.
# For a list of available language codes, see: https://github.com/m-bain/whisperX/blob/main/whisperx/utils.py
source_language = "en"
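
Because `path_to_input` is built by plain string concatenation, a typo in any of its parts only surfaces once `whisperx` starts. The sketch below shows one way to check the path up front. `check_audio_path` is a hypothetical helper, and the path in the usage lines is a placeholder rather than a real file.

```python
from pathlib import Path


def check_audio_path(path: str) -> Path:
    """Return `path` as a Path, raising FileNotFoundError if it is not a file."""
    candidate = Path(path)
    if not candidate.is_file():
        raise FileNotFoundError(f"Audio file not found: {candidate}")
    return candidate


# Placeholder path for illustration; in the notebook, pass `path_to_input` instead
try:
    check_audio_path("/content/drive/My Drive/example/audio.wav")
    print("Audio file found.")
except FileNotFoundError as err:
    print(err)
```

Running a check like this right after mounting Drive also confirms that the mount succeeded.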


The code below provides two different commands for processing audio using the `whisperx` tool.

- **Basic Command**: The first command processes the audio file using the default settings.
- **Advanced Command**: The second command (currently commented out) uses advanced models for alignment. To use this advanced command, you need to:
  1. Comment out the first command.
  2. Uncomment the second command.
  3. Provide a Hugging Face access token by replacing `[HUGGING FACE TOKEN]`.

To generate a Hugging Face token, visit [this link](https://huggingface.co/settings/tokens). Ensure that you've accepted the user agreement for the following models to enable Speaker Diarization:
  - [Segmentation](https://huggingface.co/pyannote/segmentation)
  - [Voice Activity Detection (VAD)](https://huggingface.co/pyannote/voice-activity-detection)
  - [Speaker Diarization](https://huggingface.co/pyannote/speaker-diarization)

Let's set up the commands:

In [None]:
# Basic audio processing command
!whisperx "{path_to_input}" \
--model large-v2 \
--language "{source_language}" \
--output_format json \
--output_dir "{directory_path}"

# To use advanced models for alignment, comment out the command above and uncomment the lines below.
# Provide your Hugging Face token after accepting the user agreements for the required models.
#hf_token = "[HUGGING FACE TOKEN]"
#!whisperx "{path_to_input}" \
#--model large-v2 \
#--language "{source_language}" \
#--align_model WAV2VEC2_ASR_LARGE_LV60K_960H \
#--batch_size 32 \
#--hf_token $hf_token \
#--output_format json \
#--output_dir "{directory_path}"
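
Once the command finishes, a JSON file named after the audio file should appear in the output directory. The sketch below shows one way to inspect such a file before converting it. It assumes a top-level `segments` list whose items carry `start`, `end`, and `text` fields. That layout and the helper name `summarize_segments` are assumptions to verify against your own output, and the sample string is made-up illustration data, not real `whisperx` output.

```python
import json


def summarize_segments(json_text: str) -> list[str]:
    """Return one readable line per segment (assumed keys: start, end, text)."""
    data = json.loads(json_text)
    lines = []
    for seg in data.get("segments", []):
        lines.append(f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {seg['text'].strip()}")
    return lines


# Made-up sample mimicking the assumed layout
sample = '{"segments": [{"start": 0.0, "end": 1.5, "text": " Hello there."}]}'
for line in summarize_segments(sample):
    print(line)
```

In the notebook, the same helper could be applied to the contents of the generated `.json` file instead of `sample`.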


The following code accomplishes three main tasks:

1. **Data Extraction**: Retrieves the transcription data from a JSON file that corresponds to the audio file.
2. **Conversion & Translation**: Processes this data to convert it into the SRT format. Additionally, there's a provision for translating the subtitles to a different language (e.g., English to Spanish).
3. **Saving**: After conversion (and optional translation), the result is saved as a new `.srt` file.

Let's proceed with these operations:


In [None]:
# Import required module from the `whisperx_subs` package
import whisperx_subs.whisperx_subs as ws

# Load the content from the associated JSON file
json_path = directory_path + file_name + ".json"
with open(json_path, "r", encoding="utf-8") as file:
    json_content = file.read()

# By default, do not translate into any additional language
langs_to_translate = []
# To translate, uncomment the line below and list the target language codes.
#langs_to_translate = ['es','fr']

# Convert the JSON content into SRT format
srt_results = ws.convert_json_to_srt(json_content, max_char, langs_to_translate)

# Save each result to its own .srt file; translated files get a "(lang)" suffix
for result, lang in zip(srt_results, [source_language] + langs_to_translate):
    suffix = f"({lang})" if lang != source_language else ""
    srt_path = directory_path + file_name + suffix + ".srt"
    with open(srt_path, "w", encoding="utf-8") as file:
        file.write(result)

print('\nProcess completed!')
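
For reference, an SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the subtitle text. The sketch below formats a single cue and wraps its text at a character limit, which is presumably similar in spirit to what `max_char` controls. It is not the `whisperx_subs` implementation: `to_srt_timestamp` and `make_cue` are hypothetical helpers written only for illustration.

```python
import textwrap


def to_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def make_cue(index: int, start: float, end: float, text: str, max_char: int = 50) -> str:
    """Build one SRT cue, wrapping the text so no line exceeds `max_char`."""
    wrapped = textwrap.fill(text.strip(), width=max_char)
    return f"{index}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{wrapped}\n"


# Illustrative values only
print(make_cue(1, 0.0, 3.25, "Hello and welcome to this short demonstration of subtitles.", max_char=30))
```

Comparing a generated `.srt` file against this layout is a quick way to confirm that the conversion produced well-formed cues.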
