<a href="https://colab.research.google.com/github/edgarbc/audio_transcriber/blob/main/my_audio_transcriber_whisper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# My audio transcriber

Automated audio transcriber using Whisper from OpenAI.

by Edgar Bermudez - edgar.bermudez@gmail.com

November, 2022.

In [None]:
# to handle audio files
!pip install pydub
from pydub import AudioSegment

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub
Successfully installed pydub-0.25.1


In [None]:
# install whisper
!pip install git+https://github.com/openai/whisper.git


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-wg8vpt2g
  Running command git clone -q https://github.com/openai/whisper.git /tmp/pip-req-build-wg8vpt2g
Collecting transformers>=4.19.0
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting ffmpeg-python==0.2.0
  Downloading ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Building wheels for collected packages: whisper
  Build

In [None]:
# mount Google Drive to access the audio files previously saved there
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
from glob import glob

In [None]:
import whisper
model = whisper.load_model("base")

100%|███████████████████████████████████████| 139M/139M [00:03<00:00, 47.6MiB/s]


## Parameters and definitions

In [None]:
data_dir = 'drive/MyDrive/BaileyAndSoda/data/'
sound_file = 'ZOOM_13DEC2021_LR.WAV'
print(data_dir)

drive/MyDrive/BaileyAndSoda/data/


In [None]:
# make sure we are in the right place
!pwd
!ls -lah 'drive/MyDrive/BaileyAndSoda/data/'


/content
total 1.3G
-rw------- 1 root root  43M Dec  3 18:20 MZ000003.WAV
-rw------- 1 root root 317M Dec  3 18:20 MZ000004.WAV
-rw------- 1 root root 143M Dec  3 18:20 MZ000005.WAV
-rw------- 1 root root  69M Dec  3 18:20 MZ000006.WAV
-rw------- 1 root root 168M Dec  3 18:21 MZ000007.WAV
-rw------- 1 root root  46M Dec  3 18:21 MZ000008.WAV
-rw------- 1 root root 454M Dec  3 18:21 MZ000009.WAV
-rw------- 1 root root  57M Dec  3 18:21 MZ000010.WAV


## Load audio file

This assumes that the audio files are saved in Google Drive.

In [None]:

# load the audio file from its path (the source file is a WAV, so use from_wav rather than from_mp3)
sound = AudioSegment.from_wav(data_dir + sound_file)
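
If the recordings may arrive in different formats, one option is to derive the format from the file extension rather than hard-coding the loader. The sketch below is only an illustration: `audio_format` is a hypothetical helper, not part of pydub or of this notebook.

```python
from pathlib import Path

def audio_format(path):
    """Return the lowercase file extension without the dot, e.g. 'wav'.

    Hypothetical helper: it only inspects the file name, not the file contents.
    """
    return Path(path).suffix.lower().lstrip('.')

print(audio_format('drive/MyDrive/BaileyAndSoda/data/ZOOM_13DEC2021_LR.WAV'))
```

The result can then be passed to pydub's generic loader, for example `AudioSegment.from_file(path, format=audio_format(path))`.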


## Audio file slicing

Slice the audio file into segments of approximately 10 minutes, transcribe each one, and save the transcripts to text files.

In [None]:
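import math

# total time in mins of the file
total_mins = sound.duration_seconds / 60
print('total duration (mins): ' + str(total_mins))

slice_size = 10 # slice size (mins)

# round up so the final partial slice is included
# (int(...) + 1 would add an empty slice when the duration is an exact multiple)
num_slices = math.ceil(total_mins / slice_size)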

total duration (mins): 71.19170370370371


In [None]:

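interval = 10  # slice length (mins)
offset = 20    # overlap with the previous slice (secs), so speech cut at a boundary appears in both slices

for i in range(num_slices):
  # pydub indexes audio in milliseconds
  if i == 0:
    start_time = 0
  else:
    # start slightly early to overlap with the previous slice
    start_time = 1000 * ((i * interval * 60) - offset)
  end_time = 1000 * ((i + 1) * interval * 60)
  print(start_time)
  print(end_time)
  # take the corresponding slice (pydub truncates a slice that runs past the end of the audio)
  sound_slice = sound[start_time:end_time]

  # create a file name
  fname = 'slice_' + str(i) + '.mp3'
  print(data_dir + fname)
  # save it to file
  sound_slice.export(data_dir + fname, format='mp3')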


0
600000
drive/MyDrive/BaileyAndSoda/data/slice_0.mp3
580000
1200000
drive/MyDrive/BaileyAndSoda/data/slice_1.mp3
1180000
1800000
drive/MyDrive/BaileyAndSoda/data/slice_2.mp3
1780000
2400000
drive/MyDrive/BaileyAndSoda/data/slice_3.mp3
2380000
3000000
drive/MyDrive/BaileyAndSoda/data/slice_4.mp3
2980000
3600000
drive/MyDrive/BaileyAndSoda/data/slice_5.mp3
3580000
4200000
drive/MyDrive/BaileyAndSoda/data/slice_6.mp3
4180000
4800000
drive/MyDrive/BaileyAndSoda/data/slice_7.mp3
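
The boundary arithmetic can also be checked without loading any audio. The following is a minimal sketch, assuming the same 10-minute slices and 20-second overlap used above; `slice_bounds_ms` is a hypothetical helper introduced here for illustration, and unlike the loop above it clamps the last slice to the end of the recording explicitly.

```python
import math

def slice_bounds_ms(total_ms, slice_min=10, overlap_s=20):
    """Return a list of (start_ms, end_ms) pairs covering total_ms.

    Every slice after the first starts overlap_s seconds early, and the
    last slice is clamped to the end of the recording.
    """
    slice_ms = slice_min * 60 * 1000
    overlap_ms = overlap_s * 1000
    num_slices = math.ceil(total_ms / slice_ms)
    bounds = []
    for i in range(num_slices):
        start = max(0, i * slice_ms - overlap_ms)
        end = min(total_ms, (i + 1) * slice_ms)
        bounds.append((start, end))
    return bounds

# duration printed earlier in the notebook, converted to milliseconds
total_ms = int(71.19170370370371 * 60 * 1000)
bounds = slice_bounds_ms(total_ms)
print(len(bounds))
print(bounds[:2])
```

Each pair can be used directly as `sound[start:end]`, since pydub indexes audio in milliseconds.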


In [None]:
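# match only the exported slices; sorted() gives a predictable order (glob makes no ordering guarantee)
slice_files = sorted(glob(data_dir + 'slice_*.mp3'))
print(slice_files)

for slice_file in slice_files:
  # transcribe the slice with the Whisper model loaded earlier
  result = model.transcribe(slice_file)

  # save the transcript next to the audio, e.g. slice_0.mp3 -> slice_0.txt
  text_fname = slice_file[:-4] + '.txt'
  with open(text_fname, 'w') as text_file:
    text_file.write(result['text'])
  print(text_fname + ' transcribed!')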


['drive/MyDrive/BaileyAndSoda/data/slice_0.mp3', 'drive/MyDrive/BaileyAndSoda/data/slice_1.mp3', 'drive/MyDrive/BaileyAndSoda/data/slice_2.mp3', 'drive/MyDrive/BaileyAndSoda/data/slice_3.mp3', 'drive/MyDrive/BaileyAndSoda/data/slice_4.mp3', 'drive/MyDrive/BaileyAndSoda/data/slice_5.mp3', 'drive/MyDrive/BaileyAndSoda/data/slice_6.mp3', 'drive/MyDrive/BaileyAndSoda/data/slice_7.mp3']
drive/MyDrive/BaileyAndSoda/data/slice_0.txt transcribed!
drive/MyDrive/BaileyAndSoda/data/slice_1.txt transcribed!
drive/MyDrive/BaileyAndSoda/data/slice_2.txt transcribed!
drive/MyDrive/BaileyAndSoda/data/slice_3.txt transcribed!
drive/MyDrive/BaileyAndSoda/data/slice_4.txt transcribed!
drive/MyDrive/BaileyAndSoda/data/slice_5.txt transcribed!
drive/MyDrive/BaileyAndSoda/data/slice_6.txt transcribed!
drive/MyDrive/BaileyAndSoda/data/slice_7.txt transcribed!
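
The per-slice text files could then be combined into a single transcript. The sketch below is one possible approach, not part of the original notebook: `slice_index` and `merge_transcripts` are hypothetical helpers, and the files are ordered by their numeric index so that `slice_10` would follow `slice_9` rather than `slice_1`.

```python
import re
from pathlib import Path

def slice_index(path):
    """Extract the integer N from a file name like 'slice_N.txt'."""
    match = re.search(r'slice_(\d+)', Path(path).name)
    if match is None:
        raise ValueError(f'not a slice file: {path}')
    return int(match.group(1))

def merge_transcripts(text_files, out_path):
    """Concatenate slice transcripts in numeric order and write them to out_path.

    Returns the list of input files in the order they were merged.
    """
    ordered = sorted(text_files, key=slice_index)
    parts = [Path(f).read_text(encoding='utf-8').strip() for f in ordered]
    Path(out_path).write_text('\n\n'.join(parts) + '\n', encoding='utf-8')
    return ordered
```

Because consecutive slices overlap by about 20 seconds, the merged text will repeat a few sentences at each seam; this sketch leaves those duplicates in place.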


In [None]:
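# simple web demo: upload or record audio and get its transcript (requires the gradio package)
from transformers import pipeline
import gradio as gr

# Hugging Face port of Whisper ("small" checkpoint), separate from the "base" model loaded above
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-small")

def inference(speech_file):
  # speech_file is the path to the audio, as passed by gr.Audio(type="filepath")
  return pipe(speech_file)["text"]

gr.Interface(inference, gr.Audio(type="filepath"), "text").launch()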

Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`

Using Embedded Colab Mode (NEW). If you have issues, please use share=True and file an issue at https://github.com/gradio-app/gradio/
Note: opening the browser inspector may crash Embedded Colab Mode.

To create a public link, set `share=True` in `launch()`.


(<gradio.routes.App at 0x7f73edf43bd0>, 'http://127.0.0.1:7860/', None)