# Introduction to Speech Recognition with Python 

# Speech Recognition from Audio Files

In [1]:
import speech_recognition as speech_recog

To convert speech to text, the only class we need is the Recognizer class from the speech_recognition module. Depending on the underlying API used to convert speech to text, the Recognizer class provides the following methods:

recognize_bing(): Uses Microsoft Bing Speech API
recognize_google(): Uses Google Speech API
recognize_google_cloud(): Uses Google Cloud Speech API
recognize_houndify(): Uses Houndify API by Sound Hound
recognize_ibm(): Uses IBM Speech to Text API
recognize_sphinx(): Uses PocketSphinx API
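
Since every method in this list follows the same `recognize_<engine>()` naming pattern, the engine can be chosen by name at run time. The following is only a minimal sketch of that idea: `transcribe` and `FakeRecognizer` are hypothetical names introduced here, and the fake class stands in for a real Recognizer object so the example runs without network access or API keys.

```python
def transcribe(recognizer, audio, engine="google", **kwargs):
    """Call recognizer.recognize_<engine>(audio, **kwargs)."""
    method = getattr(recognizer, "recognize_" + engine, None)
    if not callable(method):
        raise ValueError(f"unsupported engine: {engine!r}")
    return method(audio, **kwargs)


class FakeRecognizer:
    """Hypothetical stand-in for a Recognizer so the sketch runs offline."""

    def recognize_google(self, audio, language="en-US"):
        return f"google:{language}:{audio}"

    def recognize_sphinx(self, audio):
        return f"sphinx:{audio}"


print(transcribe(FakeRecognizer(), "clip", engine="sphinx"))
```

With the real library, the same helper would presumably be called as `transcribe(speech_recog.Recognizer(), audio_content)`, where `audio_content` is an AudioData object.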

## Reading the Audio File

In [2]:
sample_audio = speech_recog.AudioFile('OSR_us_000_0018_8k.wav')

The recognize_google() method requires an AudioData object from the speech_recognition module as a parameter.

To convert our audio file to an AudioData object, we can use the record() method of the Recognizer class.

In [8]:
recog = speech_recog.Recognizer()
with sample_audio as audio_file:
    audio_content = recog.record(audio_file)

Now we can simply pass the audio_content object to the recognize_google() method of the Recognizer object, and the audio will be converted to text.

In [9]:
recog.recognize_google(audio_content)

"what's the low to the left shoulder take the winding path reach the lake no closely the size of the gas tank wife degrees off is dirty face men to call before you go out the Redwood Valley strain and hung limp the stray cat gave birth to kittens the young girl gave no clear response the meal was cooked before the bell rang what Joy there is a living"

## Setting Duration and Offset Values

Instead of transcribing the complete speech, you can also transcribe a particular segment of the audio file. For instance, if you want to transcribe only the first 10 seconds of the audio file, you need to pass 10 as the value for the duration parameter of the record() method.

In [10]:
with sample_audio as audio_file:
    audio_content = recog.record(audio_file, duration=10)
    
recog.recognize_google(audio_content)

"what's the low to the left shoulder take the winding path reach the lake no closely the size of the gas"

You can also skip part of the audio file from the beginning using the offset parameter. For instance, if you do not want to transcribe the first 4 seconds of the audio, pass 4 as the value for the offset parameter. The duration is counted from the offset, so the call below transcribes seconds 4 through 14.

In [11]:
with sample_audio as audio_file:
    audio_content = recog.record(audio_file, offset=4, duration=10)
    
recog.recognize_google(audio_content)

"it's a winding path reach the lake no closely the size of the gas tank wipe degrees off is dirty face"

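To make the effect of offset and duration concrete, here is a small sketch that selects the same kind of time window from a WAV file using only the standard-library `wave` module. It is a mental model, not the library's implementation of record(): `read_segment` and `make_silent_wav` are hypothetical helpers, and the 8 kHz mono 16-bit format is an assumption made for the synthetic test clip.

```python
import io
import wave


def make_silent_wav(seconds, rate=8000):
    """Build an in-memory mono 16-bit WAV of the given length (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * (rate * seconds))
    buf.seek(0)
    return buf


def read_segment(wav_file, offset=0.0, duration=None):
    """Return (raw_frames, sample_rate) for the window starting at offset seconds."""
    with wave.open(wav_file, "rb") as w:
        rate = w.getframerate()
        total = w.getnframes()
        start = min(int(offset * rate), total)  # skip the first `offset` seconds
        w.setpos(start)
        remaining = total - start
        count = remaining if duration is None else min(int(duration * rate), remaining)
        return w.readframes(count), rate


frames, rate = read_segment(make_silent_wav(30), offset=4, duration=10)
print(len(frames) / (2 * rate))  # length of the selected window in seconds
```

If the requested window runs past the end of the file, the sketch simply returns whatever audio is left.
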
## Handling Noise

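An audio file can contain noise for several reasons, and noise can reduce the accuracy of speech-to-text transcription. To compensate, the Recognizer class provides the adjust_for_ambient_noise() method, which takes the audio source (here, the opened AudioFile) as a parameter rather than an AudioData object. By default it reads the first second of the source to calibrate the recognizer's energy threshold, so it must be called before record().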

In [12]:
with sample_audio as audio_file:
    recog.adjust_for_ambient_noise(audio_file)
    audio_content = recog.record(audio_file)
    
recog.recognize_google(audio_content)

"what's the low to the left shoulder take the winding path leads to make no closely the size of the gas tank white degrees off is dirty face before you go out the wristlet Bali strain and hung limp the stray cat gave birth to kittens the young girl gave no clear response the meal was cooked before the bell rang what Joy there is a living"

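The idea behind this kind of calibration can be sketched as follows: measure the loudness of a short stretch of background noise, then treat only audio clearly louder than that as speech. This is a simplified illustration under stated assumptions, not the library's actual algorithm. The helper names, the `ratio=1.5` margin, and the 16-bit little-endian mono sample format are all choices made for this example.

```python
import math
import struct


def rms(frames):
    """Root-mean-square amplitude of little-endian 16-bit mono PCM bytes."""
    count = len(frames) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frames[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def calibrate_threshold(noise_frames, ratio=1.5):
    """Pick a threshold some margin above the measured noise floor."""
    return rms(noise_frames) * ratio


def is_speech(frames, threshold):
    """Treat a chunk as speech only if it is louder than the threshold."""
    return rms(frames) > threshold


# Synthetic samples: quiet background noise versus a much louder signal
noise = struct.pack("<4h", 100, -100, 100, -100)
speech = struct.pack("<4h", 3000, -3000, 3000, -3000)

threshold = calibrate_threshold(noise)
print(threshold, is_speech(noise, threshold), is_speech(speech, threshold))
```
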
# Speech Recognition from Live Microphone

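To capture audio from a microphone, we first need to create an object of the Microphone class from the speech_recognition module. We can also list the names of the audio devices available on the system with the list_microphone_names() method.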

In [13]:
mic = speech_recog.Microphone()

In [14]:
speech_recog.Microphone.list_microphone_names()

[' - Input',
 'Microphone (Realtek(R) Audio)',
 'Stereo Mix (Realtek(R) Audio)',
 ' - Output',
 'Speakers (Realtek(R) Audio)',
 'Primary Sound Capture Driver',
 'Microphone (Realtek(R) Audio)',
 'Stereo Mix (Realtek(R) Audio)',
 'Primary Sound Driver',
 'Speakers (Realtek(R) Audio)',
 'Speakers (Realtek(R) Audio)',
 'Stereo Mix (Realtek(R) Audio)',
 'Microphone (Realtek(R) Audio)',
 'Speakers (Realtek HD Audio output)',
 'Stereo Mix (Realtek HD Audio Stereo input)',
 'SPDIF Out (Realtek HDA SPDIF Out)',
 'Microphone (Realtek HD Audio Mic input)',
 'Speakers (Nahimic mirroring Wave Speaker)']
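
When several devices are listed, a specific one can be selected by passing its position in this list as the `device_index` argument of the Microphone class. The sketch below looks up that index by name. `find_device_index` is a hypothetical helper, and the shortened `names` list is copied from the output above so the example runs without audio hardware.

```python
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword, else None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # Skip empty or missing names defensively
        if name and keyword in name.lower():
            return index
    return None


# Shortened copy of the device list shown above
names = [
    " - Input",
    "Microphone (Realtek(R) Audio)",
    "Stereo Mix (Realtek(R) Audio)",
]

index = find_device_index(names, "microphone")
print(index)

# With the real library, the result would be used like this:
# mic = speech_recog.Microphone(device_index=index)
```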

The next step is to capture the audio from the microphone. To do so, call the listen() method of the Recognizer class. Like the record() method, the listen() method returns a speech_recognition.AudioData object, which can then be passed to the recognize_google() method.

In [18]:
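with mic as audio_file:
    print("Speak Please")
    # Calibrate the energy threshold against background noise before listening
    recog.adjust_for_ambient_noise(audio_file)
    audio = recog.listen(audio_file)
    print("Converting Speech to Text...")

try:
    print("You said: " + recog.recognize_google(audio))
except speech_recog.UnknownValueError:
    # Raised when the speech could not be understood
    print("Error: could not understand the audio")
except speech_recog.RequestError as e:
    # Raised when the API is unreachable or rejects the request
    print("Error: " + str(e))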

Speak Please
Converting Speech to Text...
You said: conclusion useful application in the domain of human-computer interaction and automatic speech transcription explain the process of the brain and explains how to translate speech to text
