<a target="_blank" href="https://colab.research.google.com/github/keatonkraiger/Whisper-Transcription-Tutorial/blob/main/Whisper_Tutorial.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

<a target="_blank" href="https://youtu.be/i4Sgg-ptRzs">
  <img src="https://upload.wikimedia.org/wikipedia/commons/e/ef/Youtube_logo.png" alt="Tutorial Video" width="24" height="24"/>
</a>


- Click the above button to open this notebook in Google Colab. 
- We then recommend making a copy of the notebook in your Google Drive so you can save your work (File -> Save a copy in Drive).
- You may also view the accompanying supplementary tutorial video [here](https://youtu.be/i4Sgg-ptRzs).

# Using OpenAI's Whisper
OpenAI's Whisper is a general-purpose speech recognition model described in their 2022 [paper](https://arxiv.org/abs/2212.04356). This notebook is a practical introduction to using Whisper in Google Colab.

Before diving into Whisper, it's important to set up your environment correctly. This guide walks you through the process so that you can use Whisper effectively even if you are not technically inclined. Note that you don't need to run Whisper in Colab; we use it here for convenience and for its access to a GPU.

## Setting Up Google Colab
Google Colab provides a convenient platform to run Python code in the cloud, with access to powerful computing resources, including GPUs. We recommend enabling GPU acceleration, which will speed up your transcription. A GPU is not strictly necessary to use Whisper, however. Here's how you can enable it on Colab:


1.   Click on 'Runtime' in the top menu.
2.   Select 'Change runtime type'.
3.   In the dialog that appears, under 'Hardware accelerator', choose 'GPU' (the specific GPU type does not matter much for this tutorial).
4.   Click 'Save'.

With the GPU enabled, your notebook will run faster, especially when working with large models like Whisper.

We can also check how much VRAM our GPU has with the `!nvidia-smi` command, which tells us how large a model we can use.

In [None]:
!nvidia-smi
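If you would rather check for a GPU from Python than read the `nvidia-smi` table, the following is a minimal sketch using only the standard library. The helper name `gpu_summary` is hypothetical (it is not part of Whisper or Colab); it simply calls `nvidia-smi` with its CSV query options when the tool is available.

```python
import shutil
import subprocess

def gpu_summary():
    """Return one 'name, total memory' string per visible NVIDIA GPU (empty list if none)."""
    # nvidia-smi is only present when an NVIDIA driver is installed.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = gpu_summary()
print(gpus if gpus else "No NVIDIA GPU detected; Whisper will run on the CPU.")
```

An empty list usually means the runtime was started without a GPU, in which case you can revisit the runtime settings described above.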

### Important Note

In the following cells, you will often see an `!` symbol before the text/commands. This is because Colab cells expect Python *code* by default.

By using `!`, we are telling Colab that we are typing a shell command instead of a piece of Python code. If you do not include the `!`, Colab assumes you are running Python code, as in the cell below.

In [None]:
print('Hello world!')
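To make the distinction concrete, here is a small sketch contrasting the two. Outside a notebook, the closest standard-library equivalent of a `!` line is `subprocess.run`, which starts a separate program. In this illustration the running Python binary (`sys.executable`) stands in for the external command so that the example works on any machine.

```python
import subprocess
import sys

# Plain Python: evaluated directly by the notebook's Python interpreter.
print("Hello from Python")

# Roughly what `!some-command` does: start a separate program and show its output.
# sys.executable is used here only as a stand-in for an external command.
completed = subprocess.run(
    [sys.executable, "-c", "print('Hello from a separate process')"],
    capture_output=True,
    text=True,
)
print(completed.stdout.strip())
```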

## Getting Started with Whisper

Whisper is available through OpenAI's GitHub repository. To use Whisper, you need to install it along with its dependencies. This guide takes you through the process step by step. You may also follow along with the GitHub page in parallel. First, to install Whisper:



1.   **Install Whisper**: Run the command `!pip install -U openai-whisper` in a Colab cell to install the latest release of Whisper.
2.   **Install FFmpeg**: Whisper requires FFmpeg for audio processing. Use the command `!apt install ffmpeg` in Colab to install it.
3.   **Additional Dependencies:** In some cases, you might need additional dependencies like `setuptools-rust`. If you encounter any installation errors, run `!pip install setuptools-rust`.



In [None]:
!pip install -U openai-whisper
!apt install ffmpeg
!pip install setuptools-rust

## Available Models

Whisper comes in several model sizes, each offering a trade-off between speed and accuracy: larger models are generally more accurate but slower and need more memory. Depending on your task, you can choose the model that best fits your needs. Importantly, **if** you are using a GPU, its VRAM limits the largest model you can load.

In this notebook, we will use the *medium* model.
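As a rough illustration of how VRAM constrains the choice, the sketch below picks the largest model that fits a given amount of memory. The figures are the approximate requirements listed in Whisper's README and should be treated as guides rather than guarantees; `largest_model_for` is a hypothetical helper, not part of Whisper.

```python
# Approximate VRAM needs in GB per Whisper's README, ordered from smallest to largest model.
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def largest_model_for(vram_gb):
    """Return the largest model whose approximate requirement fits in vram_gb, or None."""
    fitting = [name for name, need in APPROX_VRAM_GB.items() if need <= vram_gb]
    # The dict is ordered smallest to largest, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None

print(largest_model_for(6))
```

Compare the result with the memory reported by `nvidia-smi` and leave some headroom, since other processes may also be using the GPU.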

## Get your audio file ready!

Of course, our goal is to transcribe an audio file, so we first need to upload the file to Colab. This can be done by:

1. Clicking the folder icon at the bottom of the left sidebar.
2. Clicking the upload file button or dragging and dropping your file into the left sidebar.

**Be aware**: <ins>Colab sessions *will not* save files that have been uploaded or created. Be sure to save files that are created during sessions before exiting them.</ins>

**File Upload**: If you are uploading large files, you may instead want to mount your Google Drive in Colab. You can then move files from Drive into Colab, which is often faster than uploading them individually. To do this, click the Mount Drive button at the top of the left sidebar.
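Drive can also be mounted from code rather than with the button. The sketch below wraps Colab's `google.colab.drive.mount` call in a hypothetical helper, `mount_drive`, that does nothing when the notebook is not running in Colab. The mount point `/content/drive` is the conventional choice and is assumed here.

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; return the mount point, or None elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        print("Not running in Colab; nothing to mount.")
        return None
    drive.mount(mount_point)  # asks for authorization the first time it runs
    return mount_point

mounted_at = mount_drive()
```

Once mounted, your files typically appear under `/content/drive/MyDrive`, and you can pass a path from that folder to Whisper directly.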

## Using Whisper

Once again, following the GitHub documentation, we can run Whisper from the command line.

Let's first test Whisper with the `whisper --help` command:

In [None]:
!whisper --help

The help command above lists the command-line arguments we can use. Most are unnecessary for our case, but some, such as `--output_format` or `--language`, may be more applicable.

Let's now try to transcribe an audio file! The following command will transcribe the speech in an audio file. We will use these arguments:

1.   `--model medium`: specify the model size
2.   `--task transcribe`: to do transcription
3.   `--output_dir transcription`: to save the files in the directory `transcription`
4.   `--output_format all`: to give the transcription in all formats. Otherwise, you may select the one you prefer.

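Some caveats:


*   If you are running this without a GPU, you should include `--device cpu`. The example command below passes this flag; remove it if you enabled the GPU, because it forces Whisper to run on the CPU.
*   If your input file is not in English, you can specify the language with `--language x`, where x is the language spoken in the audio file (for example, `--language es` or `--language Spanish`).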


We must specify the location of the audio file. In this case, it is `/content/AllStar.mp3`. You can find this path by right-clicking the audio file in the file browser and selecting 'Copy path'.

In [None]:
!whisper /content/AllStar.mp3 --model medium --task transcribe --output_dir transcription --output_format all --device cpu
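Whisper can also be called from Python instead of the command line. The sketch below uses the `whisper.load_model` and `model.transcribe` calls from Whisper's README; the wrapper `transcribe_file` is a hypothetical helper, and it assumes Whisper has already been installed with the cell above.

```python
import os

def transcribe_file(audio_path, model_name="medium", language=None):
    """Transcribe one audio file with Whisper's Python API and return the text."""
    # Fail early with a clear message if the path is wrong.
    if not os.path.isfile(audio_path):
        raise FileNotFoundError(f"Audio file not found: {audio_path}")
    import whisper  # imported here so the path check works even before installation
    model = whisper.load_model(model_name)  # downloads the weights on first use
    # language=None lets Whisper detect the spoken language itself.
    result = model.transcribe(audio_path, language=language)
    return result["text"]

# Example call, using the path from the cell above:
# text = transcribe_file("/content/AllStar.mp3")
# print(text[:200])
```

Unlike the command-line version, this returns the text to your notebook rather than writing output files, which is handy if you want to process the transcript further in Python.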

Once the cell has completed, we can download the transcription files that are saved to the `transcription` directory. 

Colab doesn't allow you to download an entire directory, so we instead `zip` it and download the zip file.

In [None]:
!zip -r transcriptions.zip transcription
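The same archive can be created from Python with the standard library's `shutil.make_archive`. This is a minimal sketch; `zip_directory` is a hypothetical helper, and the default names simply mirror the `zip` command above.

```python
import os
import shutil

def zip_directory(directory="transcription", archive_name="transcriptions"):
    """Create <archive_name>.zip containing `directory` and return the archive's path."""
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"No such directory: {directory}")
    directory = os.path.abspath(directory)
    return shutil.make_archive(
        archive_name,
        "zip",
        root_dir=os.path.dirname(directory),    # folder that contains the directory
        base_dir=os.path.basename(directory),   # keep the directory name inside the zip
    )

# In Colab, after running the transcription cell:
# zip_path = zip_directory()
# from google.colab import files  # Colab-only module
# files.download(zip_path)        # triggers a browser download
```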

## Helpful Commands



*   Often, you will not want the transcription in every format. It may instead be better to pick the most useful one for your case and pass only that as the `--output_format`.
*   You can transcribe multiple audio files at a time. Use the same command, but include two or more audio files (e.g. `!whisper audio1.mp3 audio2.mp3 ...`).
*   Whisper may also be used to translate speech into English by setting `--task translate`.
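The options above can be combined freely. As a sketch, the hypothetical helper below assembles a `whisper` command line from the flags covered in this notebook; it only builds the command and does not run it. The file names are placeholders.

```python
import shlex

def build_whisper_command(audio_files, model="medium", task="transcribe",
                          output_dir="transcription", output_format="txt",
                          language=None):
    """Assemble a `whisper` command line from the options covered in this notebook."""
    if not audio_files:
        raise ValueError("Provide at least one audio file")
    command = ["whisper", *audio_files,
               "--model", model, "--task", task,
               "--output_dir", output_dir, "--output_format", output_format]
    if language is not None:
        command += ["--language", language]
    return command

# Two placeholder files, translated to English, with subtitles as the only output format.
cmd = build_whisper_command(["audio1.mp3", "audio2.mp3"],
                            task="translate", output_format="srt")
print(shlex.join(cmd))
```

You can paste the printed line after a `!` in a Colab cell, or pass `cmd` to `subprocess.run` when working outside a notebook.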



## Closing Remarks

Using Whisper in Colab is convenient because it allows you to use a GPU for free! However, you can also use Whisper without a GPU on your local computer. To do this, follow the steps in Whisper's [GitHub Setup](https://github.com/openai/whisper#setup). If you choose to do so, here are some considerations:



1.   You will need to install Python to run Whisper. On macOS, one way to do this is to first install [Homebrew](https://docs.brew.sh/Installation) from your terminal with the command `/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`. We recommend following Homebrew's [installation instructions](https://brew.sh/) or this [guide](https://docs.python-guide.org/starting/install3/osx/).
2.   Once Homebrew is installed, you can install Python by running `brew install python` in your terminal.
3.   With Python and Homebrew installed, we recommend making a directory to work in. In your terminal, move to your desktop and create a directory: `cd Desktop; mkdir Whisper; cd Whisper`.

Note: if you wish to work on your personal MacBook and install Homebrew, you will also need to install the `Xcode` tools. If given the option, we recommend installing the `Command Line Tools` instead, as it is a much smaller package.

Finally, while these closing instructions are written primarily for Mac users, the installation is very similar across platforms. See Whisper's GitHub for more information.
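After a local installation, a quick way to confirm that everything is reachable is to check that the relevant commands are on your `PATH`. The sketch below uses only the standard library; `check_local_setup` is a hypothetical helper, and the tool names listed are the ones this notebook relies on.

```python
import shutil

def check_local_setup(tools=("python3", "ffmpeg", "whisper")):
    """Report, for each command-line tool, whether it can be found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

for tool, found in check_local_setup().items():
    print(f"{tool}: {'found' if found else 'missing'}")
```

Anything reported as missing needs to be installed, or its install location added to your `PATH`, before the `whisper` command will work in your terminal.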

<img src="https://media.istockphoto.com/id/1294688589/photo/red-cat-with-blurred-the-poster-in-the-frame-with-the-words-thank-you.jpg?s=612x612&w=0&k=20&c=T84nHSu52sOQvrmnksdDNo2UByqJ7yXn1srkuodXdps=" alt="Page Image">