Installing the Libs

In [3]:
!pip install transformers
!pip install datasets
!pip install pydub

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.11.1 tokenizers-0.13.2 transformers-4.25.1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting datasets
  Downloading datasets-2.8.0-py3-none-any.whl (452 kB)

Importing Libs

In [103]:
import os

import torch
from pydub import AudioSegment
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset


In [31]:
import torchaudio
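
pydub hands mp3 decoding to an external ffmpeg (or avconv) binary, so `AudioSegment.from_mp3` is likely to fail when neither is on the PATH. A minimal check using only the standard library; the helper name `find_audio_backend` is made up for this sketch and is not part of pydub:

```python
import shutil


def find_audio_backend():
    """Return the path of the first decoder binary found on PATH, or None."""
    for binary in ("ffmpeg", "avconv"):
        location = shutil.which(binary)
        if location is not None:
            return location
    return None


backend = find_audio_backend()
if backend is None:
    print("No ffmpeg/avconv found: mp3 conversion with pydub will likely fail.")
else:
    print(f"Audio backend: {backend}")
```

Running this once before the first mp3 conversion makes a missing decoder visible up front instead of inside the conversion step.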

**WhisperTranscriptorAPI Definition**


In [105]:
class WhisperTranscriptorAPI:
    '''

    This module is based on OpenAI Whisper for audio transcription.
    It uses WhisperProcessor and WhisperForConditionalGeneration for
    automatic speech recognition (ASR). Whisper is an encoder-decoder
    (sequence-to-sequence) model, not a CTC model.

    example:
          whisper_transcriptor=WhisperTranscriptorAPI(model_path='openai/whisper-tiny.en')
          
    '''
    #----------------------- constructor -------------------------------------
    #
    def __init__(self,model_path=''):

        '''

        1) Load the processor, which converts raw audio into the
        log-Mel input features (PyTorch tensors) that Whisper expects.

        2) Load the model, which turns those features into token ids;
        the processor's tokenizer later decodes the ids into text.

        args:
          model_path: Hugging Face repo id of the Whisper model,
          for example: openai/whisper-tiny.en


        '''

        self.model_path = model_path
        self.processor = WhisperProcessor.from_pretrained(self.model_path) 
        self.model = WhisperForConditionalGeneration.from_pretrained(self.model_path)
        self.OUTPUT_DIR= "temp"


    #--------------------------- convert mp3 to wav ---------------------------
    #

    def save_fn(self,filepath=''):
        '''

        Convert an mp3 file to wav.

        1) Read the audio segment from the path
        2) Resample the audio to 16 kHz
        3) Save the wav file in the temp directory


        args:
            filepath (str): path of the audio file, for example audio.mp3

        returns:
            path of the generated temporary wav file, or None if the
            input file is missing or the conversion fails
        '''

        path = filepath
        filename = os.path.basename(filepath)  # file name without its directory
        stem = os.path.splitext(filename)[0]  # drop the extension, whatever its length
        save_path = self.OUTPUT_DIR
        os.makedirs(save_path, exist_ok=True)  # create the output directory if needed
        if os.path.exists(path):
            try:
                sound = AudioSegment.from_mp3(path)  # read the mp3 file
                sound = sound.set_frame_rate(16000)  # resample to 16 kHz
                wav_path = f"{save_path}/{stem}.wav"
                sound.export(wav_path, format="wav")  # save as *.wav
                return wav_path
            except Exception:
                print(path)  # conversion failed; report the offending path


    #------------------------- generate transcript ----------------------------
    #
    def genrate_transcript(self,audio_path=''):
        
        '''
        Generate the transcript of an audio file using Whisper.

        1) Read the .wav audio using torchaudio (mp3 input is first
        converted to wav with save_fn)
        2) Process the audio using WhisperProcessor
        3) Generate the predicted token ids using the Whisper model
        4) Decode the token ids into text using the processor's tokenizer

        args: 
            audio_path (str): path of a wav or mp3 file

        returns:
            transcription (str): generated transcript
        '''
        self.audio_path = audio_path
        exten = os.path.splitext(self.audio_path)[1]
        if exten == '.mp3':
            self.audio_path = self.save_fn(self.audio_path)
        wave, fs = torchaudio.load(self.audio_path)  # wave: (channels, samples)
        # tensor to numpy
        wave = wave.numpy()
        if len(wave.shape) >= 2:
            wave = wave[0, :]  # keep only the first channel
        # NOTE: fs is not passed to the processor, which assumes 16 kHz input
        inputs = self.processor(wave, return_tensors="pt")
        input_features = inputs.input_features
        generated_ids = self.model.generate(inputs=input_features)
        # decode the token ids into text with the processor's tokenizer
        transcription = self.processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        return transcription
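
The wav path in `save_fn` is built from the input file name. Slicing a fixed four characters off the end only works for three-letter extensions, so a name such as `talk.flac` would keep a trailing dot. A small sketch of deriving the path with `os.path.splitext` instead; the function name `wav_path_for` and the example file names are illustrative assumptions, not part of the class above:

```python
import os


def wav_path_for(audio_path, output_dir="temp"):
    """Build the output wav path for an input audio file of any extension."""
    # basename drops the directory, splitext drops the extension
    stem = os.path.splitext(os.path.basename(audio_path))[0]
    return f"{output_dir}/{stem}.wav"


print(wav_path_for("ali.mp3"))
print(wav_path_for("recordings/talk.flac"))
```

Note that `genrate_transcript` compares the extension to `'.mp3'` exactly, so an upper-case `.MP3` file would skip the conversion; lower-casing the extension before the comparison is one way to cover that case.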

Model Initialization

In [96]:
whisper_transcriptor=WhisperTranscriptorAPI(model_path='openai/whisper-tiny.en')

Experiments:

In [97]:
transcript1=whisper_transcriptor.genrate_transcript(audio_path='audio4.mp3')

It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.
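
The warning above appears because the processor is called without `sampling_rate`, so it cannot verify that the audio matches the 16 kHz rate Whisper expects. mp3 inputs are resampled to 16 kHz by `save_fn`, but a wav file passed in directly may have any rate. A rough sketch for checking a wav header with the standard-library `wave` module, which reads PCM wav files only; the helper names and `TARGET_SAMPLE_RATE` are assumptions made for this example:

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # rate the Whisper feature extractor is configured for


def wav_sample_rate(path):
    """Read the sample rate from the header of a PCM wav file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target=TARGET_SAMPLE_RATE):
    """True when the file's rate differs from the rate the model expects."""
    return wav_sample_rate(path) != target
```

When the rates match, the value returned by `torchaudio.load` can be passed along as `self.processor(wave, sampling_rate=fs, return_tensors="pt")`, which also silences the warning. When they differ, the waveform would need resampling first, for instance with `torchaudio.functional.resample(waveform, orig_freq, new_freq)` on the tensor before it is converted to numpy.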


In [98]:
transcript1

" Dr. Mortimer looked at homes with an air of professional interest, and Sir Henry Baker'sville turned a pair of puzzled dark eyes upon me."

In [100]:
transcript2=whisper_transcriptor.genrate_transcript(audio_path='abc/audio5.wav')

It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.


In [101]:
transcript2

' When I was taking my walk on the same day I met Zawiski with Labradi. I did not try to avoid them.'

In [102]:
transcript3=whisper_transcriptor.genrate_transcript(audio_path='ali.mp3')

It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.


In [91]:
transcript3

' A guru is an expert who has devoted many years to study and contemplation.'

In [93]:
transcript4=whisper_transcriptor.genrate_transcript(audio_path='temp/ali.wav')

It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.


temp/ali.wav




In [94]:
transcript4

' A guru is an expert who has devoted many years to study and contemplation.'
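
Each transcript above starts with a leading space, which comes from how the tokenizer decodes the first word. A minimal post-processing sketch that trims it; `clean_transcript` is a hypothetical helper, not part of the processor API:

```python
def clean_transcript(text):
    """Trim the ends and collapse internal runs of whitespace to one space."""
    return " ".join(text.split())


raw = ' A guru is an expert who has devoted many years to study and contemplation.'
print(repr(clean_transcript(raw)))
```

The same call could wrap the return value of `genrate_transcript` if clean strings are needed downstream, for example when comparing transcripts against reference text.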