# OpenAI Whisper Notebook

## Section 1 - Notebook setup

The following command will pull and install the latest commit from [OpenAI's Whisper repository](https://github.com/openai/whisper) along with its Python dependencies.

In [None]:
!pip install git+https://github.com/openai/whisper.git

You'll also want to set Colab's hardware accelerator to GPU. Open 'View resources' from the drop-down list next to the RAM/Disk bars (or use the Runtime menu), select 'Change runtime type', and choose GPU as the hardware accelerator.
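Whisper's `fp16` option runs inference in half precision, which only pays off on a GPU. On a CPU, `transcribe()` warns and falls back to FP32, which is why the cells below pass `fp16=False`. Here's a minimal sketch, assuming PyTorch is available (Whisper installs it as a dependency), that picks the flag from the hardware the runtime actually has. The `TRANSCRIBE_KWARGS` name is our own, not part of Whisper:

```python
# Choose fp16 from the available hardware (assumes PyTorch; degrades to CPU settings without it)
try:
    import torch
    use_gpu = torch.cuda.is_available()
except ImportError:
    # No PyTorch at all: treat the runtime as CPU-only
    use_gpu = False

# Half precision only makes sense on a CUDA device
TRANSCRIBE_KWARGS = {"fp16": use_gpu}
print("GPU available:", use_gpu, "| transcribe kwargs:", TRANSCRIBE_KWARGS)
```

You could then call, for example, `model.transcribe("eleanor_oliphant_long.m4a", language="en", **TRANSCRIBE_KWARGS)` instead of hard-coding `fp16=False`.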

## Section 2 - High level model access

### 2.1 - English to English Transcription

In this sub-section we'll upload one or more audio files containing English speech and transcribe the content of that audio into English text. So first things first, let's upload the audio:

In [None]:
from google.colab import files
uploaded = files.upload() # run this to get an upload widget
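`files.upload()` returns a dict mapping each uploaded file name to its contents as bytes, so you don't have to hard-code the file name in the next cell. A minimal sketch that picks out the audio files from that dict; the extension list and the `audio_filenames` helper are our own assumptions, and `example_upload` is a stand-in so the code runs outside Colab:

```python
# Hypothetical allow-list of extensions to treat as audio
AUDIO_EXTENSIONS = (".m4a", ".mp3", ".wav", ".flac", ".ogg")

def audio_filenames(uploaded):
    """Return the uploaded file names that look like audio, in upload order.

    `uploaded` is the dict returned by files.upload(): {filename: bytes}.
    """
    return [name for name in uploaded if name.lower().endswith(AUDIO_EXTENSIONS)]

# Stand-in upload so the sketch runs outside Colab
example_upload = {"eleanor_oliphant_long.m4a": b"\x00" * 1024, "notes.txt": b"hello"}
print(audio_filenames(example_upload))  # ['eleanor_oliphant_long.m4a']
```

Once the model is loaded, looping over `audio_filenames(uploaded)` and calling `model.transcribe(path, language="en", fp16=False)` on each entry handles the "one or more files" case.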

Next, we'll load Whisper and ask it to transcribe the audio file we just uploaded:

In [None]:
import whisper

model = whisper.load_model("base.en")
result = model.transcribe("eleanor_oliphant_long.m4a", language="en", fp16=False)
print(result["text"])
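Besides `text`, the dict returned by `transcribe()` has a `segments` list whose entries carry `start` and `end` times (in seconds) along with that segment's `text`. A minimal sketch that prints a timestamped transcript; the helper names are our own, and `example_result` is made-up stand-in data, not real Whisper output:

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"

def timestamped_lines(result):
    """Yield one '[start --> end] text' line per segment of a transcribe() result."""
    for seg in result["segments"]:
        start, end = format_timestamp(seg["start"]), format_timestamp(seg["end"])
        yield f"[{start} --> {end}] {seg['text'].strip()}"

# Made-up stand-in for model.transcribe(...) output (only the keys used here)
example_result = {
    "text": " Hello there. How are you?",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 65.04, "text": " How are you?"},
    ],
}
for line in timestamped_lines(example_result):
    print(line)
```

With the real output from the cell above, `for line in timestamped_lines(result): print(line)` shows where each sentence falls in the recording.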

### 2.2 - French to English Translation

In this sub-section we'll upload one or more audio files containing French speech and translate the content of that audio into English text. Let's upload the audio:

In [None]:
from google.colab import files
uploaded = files.upload() # run this to get an upload widget

Let's first see how Whisper fares transcribing French speech to French text:

In [None]:
model = whisper.load_model("base")
result = model.transcribe("amelie_original.m4a", language='fr', fp16=False)
print(result["text"])

Now let's see how well it translates French speech to English text:

In [None]:
model = whisper.load_model("base")
result = model.transcribe("amelie_original.m4a", language='fr', task='translate', fp16=False)
print(result["text"])

Let's try the same translation with the larger `small` model, which is generally more accurate than `base` at the cost of speed and memory:

In [None]:
model = whisper.load_model("small")
result = model.transcribe("amelie_original.m4a", language='fr', task='translate', fp16=False)
print(result["text"])
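Rather than eyeballing the `base` and `small` translations side by side, one option (our addition, not something Whisper provides) is a word-level diff with Python's standard `difflib`. A minimal sketch; `word_diff` is a hypothetical helper and the two sentences are made-up placeholders. In the notebook you'd first save each cell's `result["text"]` under its own name:

```python
import difflib

def word_diff(a, b):
    """Return (similarity ratio, list of changed word spans) between two texts."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is 'replace', 'delete' or 'insert'
            changes.append((tag, " ".join(a_words[i1:i2]), " ".join(b_words[j1:j2])))
    return matcher.ratio(), changes

# Made-up placeholders standing in for the base and small translations
base_text = "the quick brown fox jumps over the dog"
small_text = "the quick red fox leaps over the dog"
ratio, changes = word_diff(base_text, small_text)
print(f"similarity: {ratio:.0%}")
for change in changes:
    print(change)
```

Diffing on words rather than characters keeps the output readable when the two models pick different synonyms.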

## Section 3 - Low level model access

Below we'll look at lower-level access to Whisper: preparing the audio ourselves, then calling `model.detect_language()` and `whisper.decode()` directly:

In [None]:
model = whisper.load_model('small')

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio('amelie_original.m4a')
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)
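`pad_or_trim()` is needed because the encoder always sees a fixed 30-second window. At Whisper's 16 kHz sample rate that is 480,000 samples, and with a hop length of 160 samples the log-Mel spectrogram has 3,000 frames. Anything beyond 30 seconds is cut off, so this low-level path only looks at the start of the clip, whereas `transcribe()` slides over the whole file. A minimal pure-Python sketch of that pad/trim arithmetic; the constant names mirror those in `whisper.audio`, but `pad_or_trim_list` is our own list-based illustration, not the library's code:

```python
SAMPLE_RATE = 16_000                    # Whisper resamples all audio to 16 kHz
CHUNK_LENGTH = 30                       # seconds per encoder window
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH  # 480,000 samples
HOP_LENGTH = 160                        # samples between spectrogram frames
N_FRAMES = N_SAMPLES // HOP_LENGTH      # 3,000 frames

def pad_or_trim_list(samples, length=N_SAMPLES):
    """Trim a list of samples to `length`, or right-pad it with zeros."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))

# A 10-second clip gets zero-padded; a 45-second clip gets cut to 30 seconds
short_clip = [0.1] * (10 * SAMPLE_RATE)
long_clip = [0.1] * (45 * SAMPLE_RATE)
print(len(pad_or_trim_list(short_clip)), len(pad_or_trim_list(long_clip)), N_FRAMES)
```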

### 3.1 - Language detection

In [None]:
# detect the spoken language
_, probs = model.detect_language(mel)
lang = max(probs, key=probs.get)
prob = "{0:.0%}".format(max(probs.values()))

# print the language that scored the highest likelihood
print(f'Detected language (and probability): {lang} ({prob})')
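`probs` maps each language code to the model's probability for it, so taking only the `max` hides how close the runner-up was. A minimal sketch that lists the top few candidates instead; `top_languages` is a hypothetical helper and the numbers in `example_probs` are invented for illustration:

```python
def top_languages(probs, k=3):
    """Return the k most likely (language code, probability) pairs, highest first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]

# Made-up stand-in for the probs dict returned by model.detect_language(mel)
example_probs = {"fr": 0.91, "en": 0.05, "it": 0.02, "es": 0.01, "de": 0.01}
for code, p in top_languages(example_probs):
    print(f"{code}: {p:.0%}")
```

Calling `top_languages(probs)` on the real dict from the cell above shows whether the detection was a clear call or a near tie.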

### 3.2 - French to English Translation

In [None]:
# decode the audio
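# fp16=False matches the earlier cells; the DecodingOptions default (fp16=True) can fail on a CPU runtime
options = whisper.DecodingOptions(language='fr', task='translate', fp16=False)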
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)
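The object returned by `whisper.decode()` carries more than `text`: it also exposes `avg_logprob`, `no_speech_prob` and `compression_ratio`, which `transcribe()` compares against its `logprob_threshold` (default -1.0), `no_speech_threshold` (0.6) and `compression_ratio_threshold` (2.4) arguments. Here's a simplified sketch, loosely modeled on those defaults rather than reproducing Whisper's exact fallback logic, that flags a suspicious single-window decode. `FakeResult` and `quality_warnings` are hypothetical names so the code runs without a model:

```python
from dataclasses import dataclass

@dataclass
class FakeResult:
    """Hypothetical stand-in with the DecodingResult fields used below."""
    text: str
    avg_logprob: float
    no_speech_prob: float
    compression_ratio: float

def quality_warnings(result, logprob_threshold=-1.0,
                     no_speech_threshold=0.6, compression_ratio_threshold=2.4):
    """Return human-readable warnings for a decode result (simplified heuristic)."""
    issues = []
    if result.compression_ratio > compression_ratio_threshold:
        issues.append("text looks repetitive (high compression ratio)")
    if result.avg_logprob < logprob_threshold:
        issues.append("low average log-probability")
    if result.no_speech_prob > no_speech_threshold:
        issues.append("window may contain no speech")
    return issues

good = FakeResult("Hello.", avg_logprob=-0.3, no_speech_prob=0.02, compression_ratio=1.2)
bad = FakeResult("la la la la la la", avg_logprob=-1.4, no_speech_prob=0.7, compression_ratio=3.1)
print(quality_warnings(good))
print(quality_warnings(bad))
```

Because the real result object has the same attribute names, `quality_warnings(result)` should also work on the output of the cell above.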