A notebook by [Mark Redito](https://markredito.com), with inspiration from [Weights and Biases](https://wandb.ai/wandb_fc/gentle-intros/reports/OpenAI-Whisper-How-to-Transcribe-Your-Audio-to-Text-for-Free-with-SRTs-VTTs---VmlldzozNDczNTI0).

This notebook uses OpenAI's Whisper model for audio transcription ([GitHub](https://github.com/openai/whisper)).

Model: `medium.en`, the medium-sized, English-only Whisper model. For faster inference, use a GPU runtime.

In [None]:
# @title Step 1: Connect your Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# @title Step 2: Specify the values in this section

# @markdown Where's the file coming from?
input_format = "gdrive" #@param ["gdrive", "local"]

# @markdown Enter the path of the audio file to be transcribed. To grab the path, click the folder icon in the left nav bar, select "drive", then "my drive", and browse to the folder that holds your audio file. Right-click the audio file, choose "Copy Path", and paste the path into the field below.
file = "" #@param {type:"string"}

#@markdown Tick the checkbox if you'd like to save the transcription as a text file
plain = True #@param {type:"boolean"}

#@markdown Tick the checkbox if you'd like to save the transcription as an SRT file
srt = True #@param {type:"boolean"}
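
Since the path is pasted in by hand, it can help to check it before loading the model. The snippet below is only a sketch: `check_audio_path` and the `AUDIO_SUFFIXES` list are hypothetical names that are not part of this notebook, and the extension list is an illustrative assumption rather than a statement of what Whisper accepts.

```python
from pathlib import Path

# Illustrative, non-exhaustive list of extensions (an assumption, not Whisper's rule)
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def check_audio_path(file: str) -> Path:
    """Hypothetical helper: return the path as a Path, or raise if it looks unusable."""
    cleaned = file.strip()
    if not cleaned:
        raise ValueError("No file path was entered in Step 2.")
    path = Path(cleaned)
    if not path.is_file():
        raise FileNotFoundError(f"No file found at: {path}")
    if path.suffix.lower() not in AUDIO_SUFFIXES:
        raise ValueError(f"Unexpected file extension: {path.suffix!r}")
    return path
```

Calling `check_audio_path(file)` right after filling in the form would surface a mistyped path immediately, instead of after the model has been downloaded.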

In [None]:
# @title Step 3: Install dependencies and set configuration
# Dependencies
!pip install -q git+https://github.com/openai/whisper.git

import os, re
import torch
from pathlib import Path

import whisper
from whisper.utils import get_writer

# Use CUDA, if available
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load the desired model directly onto the selected device
model = whisper.load_model("medium.en", device=DEVICE)

In [None]:
# @title Step 4: Transcribe
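def transcribe_file(model, file, plain, srt, options=None):
    """Transcribe `file` with Whisper and save .txt and/or .srt output next to it."""
    file_path = Path(file)
    print(f"Transcribing file: {file_path}\n")

    # Outputs are written to the same folder as the audio file
    output_directory = file_path.parent

    # Run Whisper (medium.en is English-only, so the language is fixed to "en")
    result = model.transcribe(str(file_path), verbose=False, language="en")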

    # Create a default options dictionary
    default_options = {
        "max_line_width": None,
        "max_line_count": None,
        "highlight_words": False
        # Add other default options as needed
    }


    if plain:
        txt_path = file_path.with_suffix(".txt")
        print("\nCreating text file")

        # Split after sentence-ending punctuation so each sentence keeps its
        # final mark, then number the sentences starting from 1
        sentences = re.split(r"(?<=[.!?])\s+", result["text"].strip())
        numbered_sentences = [f"{i}. {sentence}" for i, sentence in enumerate(sentences, start=1)]

        with open(txt_path, "w", encoding="utf-8") as txt:
            txt.write("\n".join(numbered_sentences))

    if srt:
        print("\nCreating SRT file")
        srt_writer = get_writer("srt", str(output_directory))
        # The writer derives the output name from the audio path; fall back to
        # the default options when none were passed in
        srt_writer(result, str(file_path), options or default_options)

    return result

# "gdrive" and "local" inputs are both plain file paths at this point,
# so the same call handles either source
result = transcribe_file(model, file, plain, srt)

# And that's it! We're done!
Check your Google Drive for the output files. They are saved in the same folder as the audio file.

To transcribe another file, go to [Step 2](#scrollTo=Lpz8LdGqJTH_&line=1&uniqifier=1), enter the new file path, and click the "Play" button. Then go straight to [Step 4](#scrollTo=40Z_lFo6LJwX&line=1&uniqifier=1) and press "Play" again.
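
For readers curious about what ends up in the SRT file, here is a rough sketch of the format. Whisper's result includes a `segments` list whose entries carry `start`, `end` (in seconds), and `text`. The two functions below are hypothetical names, and this is a simplified illustration of the SRT layout, not the writer that Step 4 actually uses. The sample segments are made up.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts that have "start", "end", and "text" keys."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        # Each cue is: running number, time range, then the spoken text
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    # Cues are separated by a blank line
    return "\n".join(blocks)


# Made-up segments, shaped like the entries in result["segments"]
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 4.0, "text": " Bye."},
]
print(segments_to_srt(sample_segments))
```

Each cue is a running number, a `start --> end` time range, and the text, with a blank line between cues. This is why SRT files can be opened and corrected in any text editor.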

---

