# **Audio to Text using OpenAI's Whisper API**

1. Familiarize yourself with the Whisper API: Learn how to use it to convert audio input to text.
2. Collect audio data: Collect a sample audio data set to test your application. The data set should include different types of audio input, such as speeches, songs, and conversations.
3. Convert audio input to text: Use the Whisper API to convert the audio input to text. Measure the accuracy of each transcription against a reference transcript of the original audio.
4. Send text output to the OpenAI Davinci model: Once you have the transcript, pass it to the OpenAI Davinci model to generate a logical response. This step might involve natural language processing (NLP) techniques to analyze and understand the content of the text.
5. Convert text output to audio: After generating a response with the Davinci model, convert the text back to audio, for example with a text-to-speech (TTS) library.
6. Document and share your work: Document the code, data sets, and methodology you used, and share your findings in a GitHub repository.
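Steps 3 to 5 form a simple chain: audio in, transcript, generated reply, audio out. The sketch below shows that chain under the assumption that each stage is wrapped in a function you supply; the names `run_pipeline`, `transcribe`, `respond` and `synthesize` are hypothetical and not part of any OpenAI library. The demo at the bottom uses stub stages so it runs without a model or API key.

```python
from typing import Callable


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str, str], None],
    out_path: str = "response.mp3",
) -> dict:
    """Run the audio -> text -> reply -> audio loop from steps 3 to 5.

    Each stage is injected as a callable, so local Whisper vs. the hosted API,
    Davinci vs. another model, or gTTS vs. another TTS engine can be swapped freely.
    """
    transcript = transcribe(audio_path)  # step 3: speech to text
    reply = respond(transcript)          # step 4: generate a response
    synthesize(reply, out_path)          # step 5: text back to speech
    return {"transcript": transcript, "reply": reply, "audio": out_path}


# Demo with stub stages (no model or network needed)
saved = {}
result = run_pipeline(
    "clip.mp3",
    transcribe=lambda path: "hello there",
    respond=lambda text: text.upper(),
    synthesize=lambda text, path: saved.update({path: text}),
)
print(result)
```

Injecting the stages keeps the orchestration testable offline and mirrors how this notebook switches between the local Whisper model and the hosted API.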

In [5]:
# check whether a GPU is available (it makes Whisper much faster)
!nvidia-smi
# the notebook also runs on CPU when no GPU is available, as in this session

/bin/bash: line 1: nvidia-smi: command not found
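Instead of reading `nvidia-smi` output by eye, the device can be chosen in Python. This is a minimal sketch, assuming PyTorch is installed (it is a dependency of `openai-whisper`); `pick_device` is a hypothetical helper that falls back to `"cpu"` when PyTorch is missing or no GPU is visible.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # pulled in by openai-whisper
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# half precision (fp16) only helps on a GPU; on CPU Whisper warns and uses fp32 anyway
use_fp16 = device == "cuda"
print(device, use_fp16)
```

The result can then be passed on, e.g. `whisper.load_model("medium", device=device)` and `model.transcribe(path, fp16=use_fp16)`.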


## Install openai

In [6]:
# install openai
!pip install openai

Collecting openai
  Downloading openai-1.10.0-py3-none-any.whl (225 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.26.0-py3-none-any.whl (75 kB)
Collecting typing-extensions<5,>=4.7 (from openai)
  Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.2-py3-none-any.whl (76 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
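The `APIRemovedInV1` errors later in this notebook come from calling the pre-1.0 interface (`openai.Audio`, `openai.Completion`) with `openai>=1.0` installed. A quick version check before writing API code avoids that mismatch; this sketch uses only the standard library, and `major_version` / `uses_v1_client` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(dist_name: str) -> Optional[int]:
    """Return the installed major version of a distribution, or None if it is not installed."""
    try:
        return int(version(dist_name).split(".")[0])
    except PackageNotFoundError:
        return None


def uses_v1_client(major: Optional[int]) -> bool:
    """openai>=1.0 replaced module-level calls (openai.Audio, openai.Completion)
    with methods on an OpenAI() client object."""
    return major is not None and major >= 1


major = major_version("openai")
print("openai major version:", major, "| v1 client API:", uses_v1_client(major))
```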

In [7]:
#install latest release of whisper
!pip install -U openai-whisper

Collecting openai-whisper
  Downloading openai-whisper-20231117.tar.gz (798 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tiktoken (from openai-whisper)
  Downloading tiktoken-0.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)

In [8]:
# alternatively, install the latest development version of Whisper (and its dependencies) from GitHub
!pip install git+https://github.com/openai/whisper.git

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-f_z4894o
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-f_z4894o
  Resolved https://github.com/openai/whisper.git to commit ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
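The `openai-whisper` package decodes audio files by calling the `ffmpeg` command-line tool, so a missing `ffmpeg` makes every `transcribe` call fail. A small sanity check, sketched here with the standard library only (`ffmpeg_available` is a hypothetical helper):

```python
import shutil


def ffmpeg_available() -> bool:
    """Whisper shells out to ffmpeg to decode audio, so it must be on PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg not found; on Debian/Ubuntu (including Colab) try: apt-get install -y ffmpeg")
```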


# 1. Loading the Whisper (medium) model

In [9]:
import openai
import whisper

# load the local "medium" checkpoint (downloaded on first use, about 1.42 GB)
model = whisper.load_model("medium")

100%|█████████████████████████████████████| 1.42G/1.42G [00:15<00:00, 97.3MiB/s]


In [10]:
# helper that plays an audio file inline in the notebook
from IPython.display import Audio, display

def display_audio(media_file):
  display(Audio(media_file, autoplay=True))

In [11]:
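# helper that transcribes an audio file with the local Whisper model and prints the text
def audio2text(media_file):
  # fp16=False: half precision is not supported on CPU, so decode in fp32
  result = model.transcribe(media_file, fp16=False)
  print(result["text"])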
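Besides `result["text"]`, Whisper's `transcribe` returns `result["segments"]`, a list of dicts with `"start"`, `"end"` (in seconds) and `"text"` keys. The sketch below turns those segments into timestamped lines, which makes it easier to line a transcript up with the audio when checking accuracy. `format_timestamp` and `format_segments` are hypothetical helpers, and the demo uses hand-made segments rather than real model output.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:06.3f}"


def format_segments(segments) -> str:
    """Turn Whisper's result["segments"] into one timestamped line per segment."""
    return "\n".join(
        f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    )


# hand-made example segments in the same shape Whisper returns
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello."},
    {"start": 2.5, "end": 65.5, "text": " Bye."},
]
print(format_segments(sample))
```

With the model loaded above: `print(format_segments(model.transcribe(media_file, fp16=False)["segments"]))`.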

# 2. Dataset: movie dialogues, songs, speeches and conversations

## Audio Clip 1 (Movie)

In [None]:
# download audio from given site
!wget -O audio.mp3 http://www.moviesoundclips.net/movies1/batmanbegins/bats.mp3

--2024-01-23 19:24:57--  http://www.moviesoundclips.net/movies1/batmanbegins/bats.mp3
Resolving www.moviesoundclips.net (www.moviesoundclips.net)... 198.54.115.219
Connecting to www.moviesoundclips.net (www.moviesoundclips.net)|198.54.115.219|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 52128 (51K) [audio/mpeg]
Saving to: ‘audio.mp3’


2024-01-23 19:24:58 (106 KB/s) - ‘audio.mp3’ saved [52128/52128]



In [None]:
display_audio("audio.mp3")
audio2text("audio.mp3")

 Bats are not pernil. Bats may be, but even a billionaire playboy, sir, three o'clock is pushing him.
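Step 3 of the plan asks for an accuracy measurement, and this clip shows why: some words in the output are clearly misheard. Word error rate (WER), the word-level edit distance divided by the number of reference words, is a standard metric for this. The sketch below is a plain-Python implementation; the reference transcript has to be typed in by hand, and the regex tokenizer is a simplification aimed at English text.

```python
import re


def words(text: str) -> list:
    """Lowercase and keep word characters and apostrophes, so punctuation is not counted as an error."""
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

Usage: `word_error_rate(reference_text, transcript)`, where `reference_text` is your own hand-typed transcript of the clip and `transcript` is `model.transcribe(...)["text"]`. A WER of 0.0 means a perfect match.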


## Audio Clip 2 (Audio from a person)

In [12]:
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [13]:
audio_path = '/content/drive/MyDrive/Colab Notebooks/audios/Satpal_audio.mp4'



In [14]:
display_audio(audio_path)
audio2text(audio_path)

 Hi, my name is Satpal. I am pursuing electronics and communication engineering from Rajiv Gandhi Institute of Petroleum Technology and my hobbies are I love dancing and to read some books.


In [15]:
# try a Hindi audio clip
aud_hin='/content/drive/MyDrive/Colab Notebooks/audios/History_india.m4a'

In [16]:
display_audio(aud_hin)
audio2text(aud_hin)

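 वैदिक सब्यता सरस्वती नदी के तट्ये शेत्र जिसमे आदुनिक भारत के पंजाब और हर्याना राज आटे हैं में विक्सित हुई। आम तॉरपर अधिकतर विद्वान वैदिक सब्यता काकाल 2000 इसा पूर्व से 600 इसा पूर्व के बीच में मानते हैं।
 (English translation: The Vedic civilization developed in the region along the banks of the Sarasvati river, which includes the present-day Indian states of Punjab and Haryana. Most scholars generally date the Vedic civilization to between 2000 BCE and 600 BCE.)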
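For non-English clips like this one, Whisper's `transcribe` accepts decoding options such as `language` (an ISO code like `"hi"`, which skips automatic language detection) and `task="translate"` (produce English text instead of a same-language transcript). The hypothetical helper below just assembles those keyword arguments; it is a sketch, and translation quality will vary by model size and audio quality.

```python
from typing import Optional


def whisper_decode_options(language: Optional[str] = None, translate: bool = False) -> dict:
    """Build keyword arguments for whisper's model.transcribe().

    language: ISO code such as "hi"; when given, language detection is skipped.
    translate: task="translate" asks Whisper for English output instead of a transcript.
    """
    options = {"fp16": False, "task": "translate" if translate else "transcribe"}
    if language is not None:
        options["language"] = language
    return options


print(whisper_decode_options("hi", translate=True))
# with the model loaded above:
# result = model.transcribe(aud_hin, **whisper_decode_options("hi", translate=True))
```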


## Audio Clip 3 (Speech)

In [None]:
!wget -O audio1.mp3 http://www.moviesoundclips.net/movies1/pirates4/kingsmen.mp3

--2024-01-23 20:13:10--  http://www.moviesoundclips.net/movies1/pirates4/kingsmen.mp3
Resolving www.moviesoundclips.net (www.moviesoundclips.net)... 198.54.115.219
Connecting to www.moviesoundclips.net (www.moviesoundclips.net)|198.54.115.219|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 174632 (171K) [audio/mpeg]
Saving to: ‘audio1.mp3’


2024-01-23 20:13:14 (53.2 KB/s) - ‘audio1.mp3’ saved [174632/174632]



In [None]:
display_audio("audio1.mp3")
audio2text("audio1.mp3")

 Gentlemen, I shall not ask any more of any man than what that man can deliver. But I do ask this. Are we not King's men? Aye. On the King's mission? Aye. I did not note any fear in the eyes of the Spanish as they passed us by. Are we not King's men? Aye. Aye.


## Audio Clip 4 (Conversations)

In [None]:
audio_file2='/content/drive/MyDrive/Colab Notebooks/audios/phone-call-about-work.mp3'
display_audio(audio_file2)
audio2text(audio_file2)

 Listen, read, repeat. The phone call. Hello, this is 123 Company. Hello, my name is John. I want to work at your company. Do you need more people? Oh yes, we do need more people to work. You can come get an application. Do you know the place? Oh yes, thank you. I know the place. It is close to my home. I will come get the application. That is good. Anytime is good. I will see you soon. Thank you. Goodbye.


# 3. Audio input to text using the hosted Whisper API

In [None]:
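# pre-1.0 interface: works with openai==0.28, removed in openai>=1.0
import os
import openai

def audio2text_usingapi(file_path):
  # read the key (generated in your OpenAI account) from the environment instead of hard-coding it
  openai.api_key = os.environ["OPENAI_API_KEY"]
  with open(file_path, "rb") as f1:  # the with-block closes the file afterwards
    response = openai.Audio.transcribe("whisper-1", f1)
  return response['text']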

In [None]:
!pip install openai --upgrade


Collecting openai
  Using cached openai-1.9.0-py3-none-any.whl (223 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 0.28.0
    Uninstalling openai-0.28.0:
      Successfully uninstalled openai-0.28.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
Successfully installed openai-1.9.0


In [None]:
text_f = audio2text_usingapi(audio_path)
print(text_f)

APIRemovedInV1: 

You tried to access openai.Audio, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742
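Under `openai>=1.0` the same transcription goes through a client object: `client.audio.transcriptions.create(model="whisper-1", file=...)`, and the result exposes the text as an attribute (`.text`) rather than a dict key. The sketch below assumes the key is stored in the `OPENAI_API_KEY` environment variable, which `OpenAI()` reads by default. `audio2text_v1` is a hypothetical name, and the optional `client` parameter exists only so the function can be tested without network access.

```python
def audio2text_v1(file_path: str, client=None) -> str:
    """Transcribe an audio file with the hosted whisper-1 model via the openai>=1.0 client."""
    if client is None:
        from openai import OpenAI
        # OpenAI() picks up the key from the OPENAI_API_KEY environment variable
        client = OpenAI()
    with open(file_path, "rb") as audio_file:  # closed automatically after the request
        transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
    return transcript.text


# text_f = audio2text_v1(audio_path)
# print(text_f)
```

Never paste an API key into a notebook you plan to share; anyone who sees the key can bill requests to your account.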


In [None]:
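# pre-1.0 interface (openai==0.28); raises APIRemovedInV1 under openai>=1.0
# note: text-davinci-003 was retired by OpenAI in January 2024
import os
import openai

# read the API key from the environment instead of hard-coding it
openai.api_key = os.environ["OPENAI_API_KEY"]

# Example prompt and API call; text_f is the transcript returned by audio2text_usingapi
prompt = f"Translate the following English text to French: '{text_f}'"
response = openai.Completion.create(
  engine="text-davinci-003",
  prompt=prompt,
  max_tokens=100
)

# Extract and print the generated text
generated_text = response.choices[0].text.strip()
print(generated_text)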


APIRemovedInV1: 

You tried to access openai.Completion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742
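The v1 equivalent of `openai.Completion.create` is `client.completions.create`, with the generated text at `response.choices[0].text`. Because `text-davinci-003` was retired in January 2024, this sketch defaults to `gpt-3.5-turbo-instruct`, the completion model OpenAI listed as its replacement; that model choice is a substitution, not what the original notebook used. `complete_text` is a hypothetical helper, and the optional `client` parameter is only there for offline testing.

```python
def complete_text(prompt: str, client=None, model: str = "gpt-3.5-turbo-instruct",
                  max_tokens: int = 100) -> str:
    """Send a prompt to the Completions endpoint through the openai>=1.0 client."""
    if client is None:
        from openai import OpenAI
        client = OpenAI()  # key taken from the OPENAI_API_KEY environment variable
    response = client.completions.create(model=model, prompt=prompt, max_tokens=max_tokens)
    return response.choices[0].text.strip()


# text_f must already hold a transcript, e.g. from the Whisper cells above
# print(complete_text(f"Translate the following English text to French: '{text_f}'"))
```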


## Audio Clip 5 (French song)

In [None]:
display_audio("audio_french.mp3")

ValueError: rate must be specified when data is a numpy array or list of audio samples.
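A likely cause of this `ValueError`: `IPython.display.Audio` treats a string that is neither an existing file path nor an `http` URL as raw sample data, and raw samples need a `rate`. So a missing or misspelled `audio_french.mp3` in the current runtime produces this message. The hypothetical `display_audio_safe` below checks the path first; it is a sketch that imports IPython only when there is something to play.

```python
import os


def display_audio_safe(media_file: str, autoplay: bool = True) -> bool:
    """Play a local file or URL inline; report a missing local file instead of raising."""
    if not media_file.startswith(("http://", "https://")) and not os.path.exists(media_file):
        print(f"File not found: {media_file!r}; upload it or fix the path first.")
        return False
    from IPython.display import Audio, display
    display(Audio(media_file, autoplay=autoplay))
    return True


# display_audio_safe("audio_french.mp3")
```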

In [None]:
audio2text_usingapi("audio_french.mp3")

"Il s'est barré beau malin d'ennui A la recherche d'une nouvelle lueur Ça m'rappelle quand on prenait le train"

## Audio Clip 6 (Speech in another language: Kazakh)

In [None]:
display_audio("audio_lang.mp3")

In [None]:
audio2text_usingapi("audio_lang.mp3")

'Ерлан Тұрғымбаев – Қазақстан Республикасының үшкестер министрі, Батыс Қазақстан Қызылорда областарының жергілікті атқарушу орғандары екі жыл бойынша жоспарланған ұшыраларды ұрындама егеледі.'

# 4. Complete the transcribed text using the Davinci model

In [None]:
import os
import openai

# pre-1.0 interface (openai==0.28); read the key from the environment instead of hard-coding it
openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
  model="text-davinci-003", # Davinci completion model (retired by OpenAI in January 2024)
  prompt=text_f,            # transcript from the Whisper API step above
  temperature=0.7,
  max_tokens=256,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)

NameError: name 'text_f' is not defined

In [None]:
text_completion = response.choices[0].text

In [None]:
print(text_completion)

 person was astounding! They just kept saying that they weren't allowed to give it out and I had to respect their wishes. I can understand respecting people's privacy, but this person seemed to be taking it to an extreme. In the end, I had to just accept that I wasn't going to be getting the answer I wanted and move on.


In [None]:
text_final = text_f+'...'+text_completion

In [None]:
# the transcript followed by the Davinci completion
print(text_final)

It wasn't like I was asking for the code to a nuclear bunker or anything like that, but the amount of resistance I got from this... person was astounding! They just kept saying that they weren't allowed to give it out and I had to respect their wishes. I can understand respecting people's privacy, but this person seemed to be taking it to an extreme. In the end, I had to just accept that I wasn't going to be getting the answer I wanted and move on.


In [None]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

from nltk import word_tokenize, pos_tag
tokens = word_tokenize(text_final)
print(pos_tag(tokens))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


[('It', 'PRP'), ('was', 'VBD'), ("n't", 'RB'), ('like', 'IN'), ('I', 'PRP'), ('was', 'VBD'), ('asking', 'VBG'), ('for', 'IN'), ('the', 'DT'), ('code', 'NN'), ('to', 'TO'), ('a', 'DT'), ('nuclear', 'JJ'), ('bunker', 'NN'), ('or', 'CC'), ('anything', 'NN'), ('like', 'IN'), ('that', 'DT'), (',', ','), ('but', 'CC'), ('the', 'DT'), ('amount', 'NN'), ('of', 'IN'), ('resistance', 'NN'), ('I', 'PRP'), ('got', 'VBD'), ('from', 'IN'), ('this', 'DT'), ('...', ':'), ('person', 'NN'), ('was', 'VBD'), ('astounding', 'VBG'), ('!', '.'), ('They', 'PRP'), ('just', 'RB'), ('kept', 'VBD'), ('saying', 'VBG'), ('that', 'IN'), ('they', 'PRP'), ('were', 'VBD'), ("n't", 'RB'), ('allowed', 'VBN'), ('to', 'TO'), ('give', 'VB'), ('it', 'PRP'), ('out', 'IN'), ('and', 'CC'), ('I', 'PRP'), ('had', 'VBD'), ('to', 'TO'), ('respect', 'VB'), ('their', 'PRP$'), ('wishes', 'NNS'), ('.', '.'), ('I', 'PRP'), ('can', 'MD'), ('understand', 'VB'), ('respecting', 'VBG'), ('people', 'NNS'), ("'s", 'POS'), ('privacy', 'NN'), ('
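The raw `pos_tag` list is long; a tag frequency count is a quicker way to see what kind of text the model produced (pronouns, past-tense verbs, and so on), which is one simple form of the NLP analysis mentioned in step 4. This sketch uses `collections.Counter`; `pos_summary` is a hypothetical helper, and the sample is copied from the first tokens of the output above.

```python
from collections import Counter


def pos_summary(tagged, top: int = 5):
    """Count Penn Treebank tags in nltk.pos_tag output, skipping pure-punctuation tokens."""
    counts = Counter(tag for word, tag in tagged if any(ch.isalnum() for ch in word))
    return counts.most_common(top)


# first few (token, tag) pairs from the output above
sample = [("It", "PRP"), ("was", "VBD"), ("n't", "RB"), ("like", "IN"), ("I", "PRP"),
          ("was", "VBD"), ("asking", "VBG"), ("!", ".")]
print(pos_summary(sample, top=2))
```

On the full text: `pos_summary(pos_tag(word_tokenize(text_final)))`.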

# 5. Convert Text output to Audio

In [None]:
!pip install gTTS

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gTTS
  Downloading gTTS-2.3.1-py3-none-any.whl (28 kB)
Installing collected packages: gTTS
Successfully installed gTTS-2.3.1


In [None]:
from gtts import gTTS

language = 'en'  # gTTS language code

# slow=False reads the text at normal speed
obj1 = gTTS(text=text_final, lang=language, slow=False)

# save the synthesized speech to an MP3 file
obj1.save("packt.mp3")

In [None]:
display_audio("packt.mp3")
audio2text("packt.mp3")

 It wasn't like I was asking for the code to a nuclear bunker or anything like that. But the amount of resistance I got from this person was astounding. They just kept saying that they weren't allowed to give it out and I had to respect their wishes. I can understand respecting people's privacy, but this person seemed to be taking it to an extreme. In the end, I had to just accept that I wasn't going to be getting the answer I wanted and move on.
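The re-transcription above closely matches `text_final`, apart from punctuation. A round-trip score makes that comparison explicit: normalize both texts to lowercase word lists and compare them with `difflib.SequenceMatcher` from the standard library. This is a sketch; `round_trip_similarity` is a hypothetical helper, and the ratio it returns is a similarity score, not WER.

```python
import difflib
import re


def normalized_words(text: str) -> list:
    """Lowercase and drop punctuation so "this... person" and "this person" compare equal."""
    return re.findall(r"[\w']+", text.lower())


def round_trip_similarity(original: str, re_transcribed: str) -> float:
    """Similarity in [0, 1] between two word sequences (1.0 means identical words)."""
    matcher = difflib.SequenceMatcher(None, normalized_words(original), normalized_words(re_transcribed))
    return matcher.ratio()


# audio2text only prints, so call the model directly to get the text back:
# print(round_trip_similarity(text_final, model.transcribe("packt.mp3", fp16=False)["text"]))
```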
