In [76]:
pip install SpeechRecognition




In [77]:
pip install pyaudio




In [78]:
import speech_recognition as sr

In [79]:
recognizer=sr.Recognizer()

In [80]:
recognizer.energy_threshold=300

### **Speech Recognition Functions**
The Recognizer class does all the work of recognizing speech from audio data, with one method for each supported API.

- recognize_houndify(): Houndify by SoundHound
- recognize_ibm(): IBM Speech to Text
- recognize_sphinx(): CMU Sphinx (requires installing PocketSphinx)
- recognize_google(): Google Web Speech API
- recognize_google_cloud(): Google Cloud Speech (requires installing the google-cloud-speech package)

recognize_sphinx() has the advantage of working offline with the CMU Sphinx engine. The others require a stable internet connection.

Google offers its own API, recognize_google(), which is free and does not require you to supply an API key. Its one drawback is that it limits you when you try to process audio data of longer duration.
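
One possible way around the length limit is to process a long recording in shorter pieces. The sketch below only computes the (offset, duration) pairs that would cover a recording. The helper name `chunk_windows` and the 10-second chunk length are illustrative assumptions, not part of the speech_recognition library, and the source does not state what the actual limit is.

```python
def chunk_windows(total_seconds, chunk_seconds):
    """Return (offset, duration) pairs that cover total_seconds in order."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # The last window may be shorter than chunk_seconds
        duration = min(chunk_seconds, total_seconds - offset)
        windows.append((offset, duration))
        offset += duration
    return windows


# Hypothetical 25-second recording split into chunks of at most 10 seconds
for offset, duration in chunk_windows(25, 10):
    print(f"offset={offset:.1f}s duration={duration:.1f}s")
```

Each pair could then be passed to recognizer.record(source, offset=..., duration=...) on a freshly opened AudioFile. Cutting at fixed boundaries can split a word in half, so treat this as a starting point only.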

### **Audio Preprocessing**

If you get an error while passing in the audio data, it is usually due to the wrong data type or format for the audio file. To avoid this, preprocessing the audio data is a must. There is a class specifically for preprocessing the audio file, called AudioFile.
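
Before handing a file to AudioFile, it can help to confirm that it really is a readable PCM WAV file. The sketch below uses only Python's standard wave module. The helper names `write_test_tone` and `describe_wav` are made up for this illustration, and the generated tone stands in for a real recording.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono 16-bit PCM sine tone to path (stand-in for real audio)."""
    n_frames = int(seconds * rate)
    samples = (int(8000 * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n_frames))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def describe_wav(path):
    """Return basic PCM parameters, or None if the file is not a readable WAV."""
    try:
        with wave.open(path, "rb") as wav:
            return {
                "channels": wav.getnchannels(),
                "sample_width_bytes": wav.getsampwidth(),
                "sample_rate": wav.getframerate(),
                "seconds": wav.getnframes() / wav.getframerate(),
            }
    except (wave.Error, EOFError):
        return None


tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "tone.wav")
bad_path = os.path.join(tmp_dir, "not-audio.wav")

write_test_tone(good_path)
with open(bad_path, "wb") as f:
    f.write(b"this is not a WAV file")

print(describe_wav(good_path))  # a dict of parameters
print(describe_wav(bad_path))   # None, because the header is invalid
```

A file that fails this check would need converting to PCM WAV (or another format that AudioFile supports) before recognition.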

For this example, you can download any audio file. If you would rather use your own voice, run the code below to record one.

In [88]:
import speech_recognition as sr

# obtain audio from the microphone
recognizer = sr.Recognizer()


with sr.Microphone() as source:
    print("Say something!")
    audio = recognizer.listen(source)

# write audio to a WAV file
with open("microphone-results.wav", "wb") as f:
    f.write(audio.get_wav_data())
    print("Audio saved as microphone-results.wav")


# Optional: recognize the recorded speech with Google Speech Recognition
# try:
#     text = recognizer.recognize_google(audio)
#     print("Google Speech Recognition thinks you said:", text)
# except sr.UnknownValueError:
#     print("Google Speech Recognition could not understand audio")
# except sr.RequestError as e:
#     print("Could not request results from Google Speech Recognition service; {0}".format(e))
# except Exception as e:
#     print("Error:", e)


Say something!
Audio saved as microphone-results.wav


### **Audio processing code**

In [89]:
import speech_recognition as sr
recognizer=sr.Recognizer()

audio_file=sr.AudioFile("microphone-results.wav")
type(audio_file)

speech_recognition.AudioFile

### **Record**
Now let’s convert the **audio file into audio data** with the help of the record() method.

In [90]:
audio_file = sr.AudioFile("microphone-results.wav")

# record() reads the whole file from the source into an AudioData object
with audio_file as source:
    audio_data = recognizer.record(source)

type(audio_data)

speech_recognition.audio.AudioData

### **Duration:**
Duration is used when you want only a specific part of the audio data. Say you want only the first 5 seconds of the entire audio: set the duration parameter to 5.

In [91]:
audio_file = sr.AudioFile("microphone-results.wav")
with audio_file as source:
  audio_file = recognizer.record(source, duration = 5.0)
  result = recognizer.recognize_google(audio_data=audio_file)

result

'good morning are you everyone'

### **Offset:** 
Offset is mainly used to cut off some seconds at the start of the audio data. If you don’t need the first 5 seconds of the audio, set the offset parameter to 5. The cell below uses a much smaller offset of 0.01 seconds.

In [101]:
audio_file = sr.AudioFile("microphone-results.wav")

with audio_file as source:
  audio_file = recognizer.record(source, offset = 0.01)
  result = recognizer.recognize_google(audio_data=audio_file)

result

'good morning are you everyone'
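
To make the two parameters concrete, the sketch below reads a window of frames from a WAV file using only the standard wave module. It is not how record() is implemented. It only illustrates the idea that offset skips frames from the start and duration caps how many frames are read. The helpers `make_silence` and `read_window` are hypothetical names, and the in-memory silent recording stands in for a real file.

```python
import io
import wave


def make_silence(seconds, rate=8000):
    """Build an in-memory mono 16-bit WAV of silence (stand-in for a recording)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


def read_window(wav_file, offset=0.0, duration=None):
    """Return raw frames starting at offset seconds, lasting duration seconds."""
    with wave.open(wav_file, "rb") as wav:
        rate = wav.getframerate()
        # Skip the first `offset` seconds, clamped to the end of the file
        start = min(int(offset * rate), wav.getnframes())
        wav.setpos(start)
        remaining = wav.getnframes() - start
        # With no duration, read everything that is left
        count = remaining if duration is None else min(int(duration * rate), remaining)
        return wav.readframes(count)


recording = make_silence(3.0)
frames = read_window(recording, offset=1.0, duration=0.5)
print(len(frames) // 2, "frames read")  # 2 bytes per 16-bit mono frame
```

Combining both parameters selects a slice from the middle of the recording: here, half a second starting one second in.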

### **The Effect of Noise on Speech Recognition**
We won’t get noise-free data every time. All audio files have some degree of noise in them from the start, and this unhandled noise will affect the accuracy of the speech recognition system.

Let’s look at an example of how a noisy audio file affects the accuracy of the speech recognition system. You can download the audio file “jackhammer.wav” here. Don’t forget to save the audio file in your current working directory.

First, let’s run the code and see what output we get.

In [111]:
recognizer = sr.Recognizer()
jackhammer = sr.AudioFile('microphone-results.wav')
with jackhammer as source:
    audio = recognizer.record(source)
    
recognizer.recognize_google(audio)

'good morning are you everyone'

This won’t work well on noisy audio, but there is a solution: the adjust_for_ambient_noise method of the Recognizer class. Let’s re-run it.

In [None]:
jackhammer = sr.AudioFile("microphone-results.wav")
with jackhammer as source:
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    audio = recognizer.record(source)

recognizer.recognize_google(audio)
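
The idea behind the calibration can be sketched without the library. The recognizer's energy_threshold (set to 300 earlier) is the level above which audio is treated as speech, and adjust_for_ambient_noise adjusts that threshold using a sample of the background. The code below is a simplified stand-in, not the library's algorithm: `rms`, `sine`, and `calibrate_threshold` are hypothetical helpers, the ratio of 1.5 is an assumed margin, and the sine waves stand in for real audio.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of integer PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(amplitude, n=1600, period=40):
    """Integer sine samples; a stand-in for real audio."""
    return [int(amplitude * math.sin(2 * math.pi * i / period)) for i in range(n)]


def calibrate_threshold(ambient_samples, ratio=1.5):
    """Hypothetical one-shot calibration: a fixed margin above the ambient level."""
    return rms(ambient_samples) * ratio


ambient = sine(200)   # quiet background hum
speech = sine(6000)   # much louder foreground signal

threshold = calibrate_threshold(ambient)
print(f"ambient rms={rms(ambient):.0f}, threshold={threshold:.0f}")
print("speech above threshold:", rms(speech) > threshold)
print("ambient above threshold:", rms(ambient) > threshold)
```

With a threshold set just above the background level, the loud signal counts as speech and the hum does not. When the noise is as loud as the speech itself, no threshold can separate the two, so calibration alone cannot fix such audio.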

### **The adjust_for_ambient_noise method**
The adjust_for_ambient_noise method works for this audio and gives a closer output, but the result is still not correct. For cases like this, we need to pre-process the audio file, which can be done with audio editing software or a Python package (such as SciPy).

If you don’t want to use an audio file, you can also use your voice, and speech_recognition will write down what you say.
Capturing the voice through the microphone requires the pyaudio library.

In [116]:
import speech_recognition as sr

# Initialize recognizer class (for recognizing the speech)
recognizer = sr.Recognizer()

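# Read from the microphone as the source,
# listen to the speech, and store it in the audio_text variable
with sr.Microphone() as source:
    print("Start Talking")
    audio_text = recognizer.listen(source)
    print("Time over, thank you")

    try:
        # using Google speech recognition
        print("Text: " + recognizer.recognize_google(audio_text))
    except sr.UnknownValueError:
        # raised when the speech could not be understood
        print("Sorry, I did not get that")
    except sr.RequestError as e:
        # raised when the API cannot be reached or the request fails
        print("Could not request results: {0}".format(e))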

Start Talking
Time over, thank you
Text: hey good morning 12345
