# OpenAI's Whisper

This is a simple example of using OpenAI's Whisper, a general-purpose speech recognition model.

In [None]:
!pip install git+https://github.com/openai/whisper.git
!pip install gradio

## Available models

I copied the following list of available models from [Whisper's repo](https://github.com/openai/whisper):

|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~32x      |
|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~16x      |
| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~6x       |
| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
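
The table can also drive model selection in code. The following is a minimal sketch, assuming you already know roughly how much GPU memory is free; `MODEL_VRAM_GB` and `pick_model` are hypothetical names that are not part of Whisper, and the VRAM figures are the approximate values from the table.

```python
# Approximate VRAM needs in GB, taken from the table above, smallest model first.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(available_vram_gb, english_only=False):
    """Return the name of the largest model that fits in the given VRAM budget.

    Hypothetical helper for illustration; not part of the whisper package.
    """
    chosen = None
    for name, vram_gb in MODEL_VRAM_GB:
        if vram_gb <= available_vram_gb:
            chosen = name  # later entries are larger, so keep overwriting
    if chosen is None:
        raise ValueError("Not enough VRAM for any model in the table")
    # The table lists no English-only variant for `large`.
    if english_only and chosen != "large":
        return chosen + ".en"
    return chosen
```

The returned name can then be passed to `whisper.load_model`, for example `whisper.load_model(pick_model(6))`.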


In [6]:
import whisper
import gradio as gr

model = whisper.load_model("large")

100%|█████████████████████████████████████| 2.87G/2.87G [00:48<00:00, 64.2MiB/s]


We are going to use two functions to process audio files: `transcribe` returns the text in the language that was spoken, and `translate` returns an English translation of it.

In [7]:
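def transcribe(file):
    """Return the transcript of an audio file in its spoken language."""
    # best_of: number of candidates sampled when decoding with temperature > 0
    options = dict(task="transcribe", best_of=5)
    text = model.transcribe(file, **options)["text"]
    return text.strip()

def translate(file):
    """Return an English translation of the speech in an audio file."""
    options = dict(task="translate", best_of=5)
    text = model.transcribe(file, **options)["text"]
    return text.strip()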
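
The two functions differ only in the `task` option, so they can be folded into one helper. This is a sketch rather than the notebook's own approach: `run_whisper` is a hypothetical name, and the model is passed in explicitly (instead of read from the global `model`) so the helper can be exercised with any object that exposes a compatible `transcribe` method.

```python
def run_whisper(model, file, task="transcribe", best_of=5):
    """Run `task` on an audio file and return the stripped text.

    Hypothetical helper; `model` is expected to behave like the object
    returned by whisper.load_model().
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"Unsupported task: {task!r}")
    # Same call as in the two functions above, with the task as a parameter.
    result = model.transcribe(file, task=task, best_of=best_of)
    return result["text"].strip()
```

With this helper, `run_whisper(model, file)` matches `transcribe(file)`, and `run_whisper(model, file, task="translate")` matches `translate(file)`.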

## Gradio interface

Here is a simple Gradio interface that lets you record audio directly from your computer's microphone, with two buttons to transcribe or translate the recording.

This interface was inspired by this [Hugging Face notebook](https://colab.research.google.com/drive/1xO45FeNFBYfN6GyyUr3nEa08S0iHnWKM?usp=sharing).

In [8]:
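# Note: `source=`, `gr.Box`, and `.style()` are Gradio 3.x APIs.
# Gradio 4 removed `gr.Box` and `.style()` and renamed `source` to `sources`.
block = gr.Blocks()

with block:
    with gr.Group():
        # Record from the microphone and pass the recording on as a file path
        audio = gr.Audio(
            show_label=False,
            source="microphone",
            type="filepath"
        )
        with gr.Box():
            with gr.Row().style(equal_height=True):
                transcribe_button = gr.Button("Transcribe")
                translate_button = gr.Button("Translate")

        # Both buttons write their result to the same textbox
        textbox = gr.Textbox(show_label=False)

        transcribe_button.click(transcribe, inputs=[audio], outputs=[textbox])
        translate_button.click(translate, inputs=[audio], outputs=[textbox])

block.launch()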

Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://13218.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces: https://huggingface.co/spaces


(<gradio.routes.App at 0x7feb5de33310>,
 'http://127.0.0.1:7861/',
 'https://13218.gradio.app')

You can also upload individual audio files and transcribe or translate them using the functions directly.

In [None]:
print(transcribe("mondaymins.mp3"))

Monday Minutes Good morning everyone and welcome to the latest edition of Monday Minutes. In our feature this week, check out our latest Wednesday webinar covering permanent recruitment. The slide deck and video recording are now available on the CXD Google site. In Hub Corner, this week's Wednesday webinar will focus on the RM6194 Backoffice Software Agreement. Make sure to tune in for all the updates. And you can now visit the CCS Service Status page, which can be used to check the status of our various customer portals. In CRM and Salesforce, new automatically created opportunities have been added to the system. And you can catch up on feedback erases. Read the CRM and Salesforce document to get the full update. In Category Updates and Engagement, please review a recent correction from the Aggregation team regarding an upcoming furniture opportunity for customers. The contract notice for RM6313 Demand Management and Renewals DPS has now been published. The DDAT Buyers Community of P
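
Calling the functions directly also makes it easy to process several files in one go. The sketch below is an assumption-laden illustration: `transcribe_folder` is a hypothetical helper, the extension list is an arbitrary choice, and `transcribe_fn` can be any callable that takes a file path and returns text, such as the `transcribe` function defined earlier.

```python
from pathlib import Path


def transcribe_folder(folder, transcribe_fn, extensions=(".mp3", ".wav", ".m4a")):
    """Apply `transcribe_fn` to every matching audio file in `folder`.

    Hypothetical helper; returns a dict mapping file name to text.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and files that do not look like audio
        if path.is_file() and path.suffix.lower() in extensions:
            results[path.name] = transcribe_fn(str(path))
    return results
```

For example, `transcribe_folder("recordings", transcribe)` would transcribe each audio file in a `recordings` folder (a placeholder name), and passing `translate` instead would produce English translations.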