# Web App Demonstrating OpenAI's Whisper Speech Recognition Model

This is a Colab notebook that lets you upload audio files and transcribe them with [OpenAI's free Whisper speech recognition model](https://openai.com/blog/whisper/).

To use it, choose `Runtime->Run All` from the Colab menu. You can upload your own audio samples using the folder icon on the left of this page, which opens a file system you can drag files into. The cells below show examples of how to run the transcription.
You can also save the transcript to your Google Drive, provided you grant the notebook access to your Drive when prompted.

## Install the Whisper Code

In [None]:
! pip install git+https://github.com/openai/whisper.git -q

## Load the ML Model

In [36]:
import whisper

model = whisper.load_model("base")


## Check we have a GPU

You should see the output `device(type='cuda', index=0)` below. If you don't, you may be on a CPU-only Colab instance, which will run more slowly. Go to `Runtime->Change Runtime Type` and select a GPU hardware accelerator to fix this.

In [None]:
model.device
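
The same check can also be made before the model is loaded. The sketch below is a minimal illustration, assuming PyTorch (which Whisper depends on) is the backend; `pick_device` is a hypothetical helper name and not part of Whisper.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu".

    pick_device is a hypothetical helper used for illustration only.
    """
    # Fall back to CPU if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Whisper would run on: {device}")
```

Passing the result to `whisper.load_model("base", device=device)` should place the model on that device explicitly.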

## Define the Transcribe Function

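Now that the code is installed and the model is loaded, define the function that takes an audio file path as input and returns the recognized text (it also logs the language it detects). Note that it pads or trims the audio to 30 seconds, so only the first 30 seconds of a longer file are transcribed.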

In [38]:
def transcribe(audio):

    # load audio and pad/trim it to fit 30 seconds
    audio = whisper.load_audio(audio)
    audio = whisper.pad_or_trim(audio)

    # make log-Mel spectrogram and move to the same device as the model
    mel = whisper.log_mel_spectrogram(audio).to(model.device)

    # detect the spoken language
    _, probs = model.detect_language(mel)
    print(f"Detected language: {max(probs, key=probs.get)}")

    # decode the audio
    options = whisper.DecodingOptions()
    result = whisper.decode(model, mel, options)
    return result.text
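
To see what the pad/trim step implies, here is a rough sketch of the idea on a plain Python list. It is not Whisper's implementation (which works on arrays); `pad_or_trim_list` is a hypothetical name, and the constants assume Whisper's 16 kHz sample rate and 30-second input window.

```python
SAMPLE_RATE = 16_000                      # assumed: Whisper works on 16 kHz audio
CHUNK_SECONDS = 30                        # assumed: one model input window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window


def pad_or_trim_list(samples, length=N_SAMPLES):
    """Illustrative stand-in for whisper.pad_or_trim on a plain list."""
    if len(samples) >= length:
        # Trim: everything after the first 30 seconds is dropped.
        return samples[:length]
    # Pad: fill the rest of the window with silence (zeros).
    return samples + [0.0] * (length - len(samples))


short_clip = pad_or_trim_list([0.1] * (SAMPLE_RATE * 5))    # 5 s of audio
long_clip = pad_or_trim_list([0.1] * (SAMPLE_RATE * 40))    # 40 s of audio
print(len(short_clip), len(long_clip))
```

Both clips end up exactly one window long, which is why a longer recording has to be split into chunks before calling `transcribe` if the whole file is needed.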


In [1]:
import time

## File Upload Facility

Run the cell below and choose an audio file in the upload prompt that appears.

In [None]:

from google.colab import files
uploaded = files.upload()



In [None]:
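from IPython.display import Audio, display

# files.upload() returns a dict keyed by filename; take the first upload
filename = next(iter(uploaded))
print(filename)

# Show an audio player for the uploaded file
display(Audio(filename))

# transcribe() only covers the first 30 seconds of the file
hard_text = transcribe(filename)
print(hard_text)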
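
If several files are uploaded at once, taking the first dictionary key may pick a non-audio file. The following is a small sketch of a more defensive selection; `first_audio_file` and the extension list are assumptions made for illustration, and the dictionary stands in for the `{filename: bytes}` mapping that `files.upload()` returns.

```python
import os

# Assumed set of extensions; adjust to the formats you actually use.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def first_audio_file(uploaded):
    """Return the first uploaded filename with a known audio extension."""
    for name in uploaded:
        if os.path.splitext(name)[1].lower() in AUDIO_EXTENSIONS:
            return name
    raise ValueError("No audio file found among: " + ", ".join(uploaded))


# Stand-in for the dict returned by files.upload().
fake_upload = {"notes.txt": b"hello", "Interview.MP3": b"\x00\x01"}
print(first_audio_file(fake_upload))
```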

In [None]:
from pydub import AudioSegment
from IPython.display import Audio, display
import os

# Load audio
audio = AudioSegment.from_file(filename)

# 25 seconds per chunk, in milliseconds (stays under Whisper's 30-second window)
frame_duration = 25 * 1000
num_chunks = len(audio) // frame_duration + (1 if len(audio) % frame_duration > 0 else 0)

# Optional: create a folder for chunks
os.makedirs("chunks", exist_ok=True)

text = ""
# Export each chunk to a WAV file and transcribe it
for i in range(num_chunks):
    start = i * frame_duration
    end = min((i + 1) * frame_duration, len(audio))
    chunk = audio[start:end]

    # Export to WAV file
    chunk_filename = f"chunks/chunk_{i+1}.wav"
    chunk.export(chunk_filename, format="wav")

    # print(f"🔊 Playing chunk {i+1}: {start/1000:.2f}s to {end/1000:.2f}s")
    # display(Audio(chunk_filename))

    text_chunks = transcribe(chunk_filename)
    print(text_chunks)

    # Add a space so words at chunk boundaries do not run together
    text += text_chunks + " "
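
The chunk arithmetic used in the loop can be checked in isolation. This is a minimal sketch with a hypothetical helper, `chunk_bounds`, that reproduces the same start/end calculation on plain integers, with no audio libraries required.

```python
def chunk_bounds(total_ms, frame_ms):
    """Return (start, end) millisecond pairs that cover total_ms."""
    # Ceiling division: one extra chunk when a remainder is left over.
    num_chunks = -(-total_ms // frame_ms)
    return [
        (i * frame_ms, min((i + 1) * frame_ms, total_ms))
        for i in range(num_chunks)
    ]


# A 60.5 s recording split into 25 s frames gives three chunks,
# the last one shorter than the others.
for start, end in chunk_bounds(60_500, 25_000):
    print(f"{start / 1000:.1f}s to {end / 1000:.1f}s")
```

Because each frame is 25 seconds, every exported chunk fits inside the 30-second window that `transcribe` pads or trims to, so no audio is discarded.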




In [None]:
# print(text)

In [None]:
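import os

from google.colab import drive

drive.mount('/content/drive')

# Strip the audio extension (any format) to build the transcript filename
base_name = os.path.splitext(filename)[0]
file_path = f"/content/drive/My Drive/{base_name}.txt"
print(file_path)

with open(file_path, 'w') as f:
    f.write(text)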
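
Deriving the transcript name from the audio name is easiest with `os.path.splitext`, which strips only the final extension and works for any audio format, not just `.mp3`. The sketch below assumes a hypothetical helper, `transcript_path`, and the Drive mount folder used in this notebook.

```python
import os


def transcript_path(audio_filename, folder="/content/drive/My Drive"):
    """Build the .txt path for a transcript next to the given folder.

    transcript_path is a hypothetical helper used for illustration only.
    """
    # Drop any directory part, then strip only the last extension.
    stem = os.path.splitext(os.path.basename(audio_filename))[0]
    return f"{folder}/{stem}.txt"


print(transcript_path("interview.mp3"))
print(transcript_path("talk.final.wav"))
```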