# Using SpeechRecognition to convert speech to text

In this tutorial, we will install a speech-to-text library and run it on an audio file to extract the spoken words as text.

First install `SpeechRecognition` and `PyAudio` using `pip`.

In [None]:
!pip install SpeechRecognition
!sudo apt-get install portaudio19-dev # ubuntu-based system dependency
!pip install pyaudio

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
portaudio19-dev is already the newest version (19.6.0-1.1).
0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.


Let's import `SpeechRecognition` and create a `Recognizer` whose methods will be used for speech-to-text.

In [None]:
import speech_recognition as sr
import pyaudio
recognizer = sr.Recognizer()

Let's now upload a file to convert to text. The first file I am using is a .wav file of Kanye West's POWER acapella.

[Google Drive Folder](https://drive.google.com/drive/folders/1FFCNxg51TiUFICosmcTy56feQ9n-Occp?usp=sharing)

(FYI: If you want to add your own songs, you need to trim them down to less than 10 MB for them to parse.)
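One way to get a long song under the size limit is to keep only its first part. The sketch below is a minimal example using Python's built-in `wave` module. `trim_wav`, its parameters, and the output file name in the usage note are hypothetical names chosen for illustration, and the code assumes an uncompressed PCM .wav file.

```python
import wave


def trim_wav(src_path, dst_path, max_seconds):
    """Copy at most the first `max_seconds` of a PCM WAV file to dst_path.

    Hypothetical helper for illustration; returns the number of frames kept.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Frames to keep = frames per second * requested length,
        # capped at the length of the original file.
        frames_to_keep = min(src.getnframes(),
                             int(src.getframerate() * max_seconds))
        frames = src.readframes(frames_to_keep)

    with wave.open(dst_path, "wb") as dst:
        # Same channel count, sample width, and frame rate as the source.
        dst.setparams(params)
        # The frame count in the header is corrected when the file is closed.
        dst.writeframes(frames)
    return frames_to_keep
```

For example, `trim_wav("./POWER.wav", "./POWER_short.wav", max_seconds=60)` would write the first minute to a new file. As a rough guide, 16-bit mono audio at 44.1 kHz takes about 88 KB per second, so about 110 seconds fits under 10 MB.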

You can use any file you want for transcription. For now, however, just download the file from the Google Drive folder linked above and upload it to your Colab notebook. Or, if you are running this notebook locally, just download it to the same folder as this Python notebook.

> Note: No matter what file you use, make sure it's in mono format, not stereo.
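If your file turns out to be stereo, one option is to convert it before transcribing. The following is a minimal sketch using only the standard library (`wave`, `array`). `to_mono` is a hypothetical helper name, and the code assumes 16-bit PCM audio, averaging the channels of each frame to produce a single channel. An audio editor or a dedicated audio library would work just as well.

```python
import array
import sys
import wave


def to_mono(src_path, dst_path):
    """Write a mono copy of a 16-bit PCM WAV file by averaging its channels.

    Hypothetical helper for illustration; returns the original channel count.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM audio")

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved (L, R, L, R, ...): average each frame's channels.
    mono = array.array("h", (
        sum(samples[i:i + n_channels]) // n_channels
        for i in range(0, len(samples), n_channels)
    ))
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())
    return n_channels
```

The return value tells you whether the source was stereo (2) or already mono (1), in which case the output is simply a copy of the input audio.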



In [None]:
# Replace "POWER.wav" with your file path
with sr.AudioFile("./POWER.wav") as source:
    audio_file = recognizer.record(source)

try:
    text = recognizer.recognize_google(audio_file, language = 'en-US')
    print("You said in the audio file: {}".format(text))
except sr.UnknownValueError:
    print("Sorry, could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))


You said in the audio file: I'm living in that 21st century doing something mean to it do it better than anybody you ever seen do is screams from the haters got a nice ring to it I guess every superhero need his theme music no one man should have all that power the clock ticking I just count the hours stop tripping up tripping off the power


If you get something along the lines of:

```
You said in the audio file: I'm living in that 21st century doing something mean to it do it better than anybody you ever seen do is screams from the haters got a nice ring to it I guess every superhero need his theme music no one man should have all that power the clock ticking I just count the hours stop tripping up tripping off the power
```

You did it!


Now, let's try transcribing a song with background vocals.

This time, I am using Devil in a New Dress by Kanye West *with* the background sound included.

In [None]:
# Replace "DIAND.wav" with your file path
with sr.AudioFile("./DIAND.wav") as source:
    audio_file = recognizer.record(source)

try:
    text = recognizer.recognize_google(audio_file, language = 'en-US')
    print("You said in the audio file: {}".format(text))
except sr.UnknownValueError:
    print("Sorry, could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))


Sorry, could not understand audio


If you're lucky, it will return some garbage text. Most of the time, though, the recognizer raises an `UnknownValueError` and the cell prints "Sorry, could not understand audio". This is most likely because the beat confuses the `recognize_google` method.

There are a few more songs in the Google Drive folder that you can experiment with. Some are acapella, and some still have their beats.
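To try several files in one go, the same steps can be wrapped in a function and looped over. This is only a sketch: `transcribe_with_google` and `transcribe_many` are hypothetical helper names, not part of the `SpeechRecognition` library, and the file names in the usage note are placeholders for whatever you downloaded. A file the recognizer cannot understand maps to `None`, while a `RequestError` (for example, no network connection) is left to propagate because it is not specific to one file.

```python
def transcribe_with_google(path):
    """Return the transcript of one WAV file, or None if it is not understood."""
    # Imported inside the function so the batch helper below can also be
    # used with a different transcription function.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio, language="en-US")
    except sr.UnknownValueError:
        return None


def transcribe_many(paths, transcribe_one=transcribe_with_google):
    """Map each file path to its transcript (None where transcription failed)."""
    return {path: transcribe_one(path) for path in paths}
```

Calling `transcribe_many(["./POWER.wav", "./DIAND.wav"])` would return a dictionary you can scan to see which songs were transcribed and which were not.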