## Speech Recognition With Python
[Source: Real Python tutorial on speech recognition with Python](https://realpython.com/python-speech-recognition/#how-speech-recognition-works-an-overview)

In [1]:
!pip install SpeechRecognition



In [2]:
import speech_recognition as sr
sr.__version__

'3.8.1'

In [3]:
r = sr.Recognizer()

### Working With Audio Files

In [4]:
harvard = sr.AudioFile('./data/harvard.wav')
with harvard as source:
    audio = r.record(source)

type(audio)

speech_recognition.AudioData

In [5]:
# Tip: play harvard.wav in a media player (for example, Windows Media Player) to compare the audio with the transcription below.

r.recognize_google(audio)

'the stale smell of old beer lingers it takes heat to bring out the odor a cold dip restores health and zest a salt pickle taste fine with ham tacos al Pastore are my favorite a zestful food is be hot cross bun'

### Capturing Segments With offset and duration

In [6]:
with harvard as source:
    # The duration keyword argument stops the recording after the given number of seconds.
    audio = r.record(source, duration=4)

print(r.recognize_google(audio))

the stale smell of old beer lingers


In [7]:
with harvard as source:
    audio1 = r.record(source, duration=4)
    audio2 = r.record(source, duration=4)

print(r.recognize_google(audio1))
print(r.recognize_google(audio2))

the stale smell of old beer lingers
it takes heat to bring out the odor a cold dip


In [8]:
with harvard as source:
    audio = r.record(source, offset=4, duration=3)

print(r.recognize_google(audio))

it takes heat to bring out the odor


In [9]:
# Be aware that a poorly chosen offset can cut into words and produce a poor transcription:
with harvard as source:
    audio = r.record(source, offset=4.8, duration=2.8)

print(r.recognize_google(audio))

Peta bring out the odor ecole

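The `offset` and `duration` arguments used above can also be generated programmatically to walk through a file in fixed-size windows. The following is a minimal sketch, not part of SpeechRecognition: `wav_duration_seconds` and `segment_windows` are hypothetical helpers that rely only on the standard library `wave` module.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds (standard library only)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def segment_windows(total_seconds, chunk_seconds):
    """Yield (offset, duration) pairs covering the file in consecutive chunks."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    offset = 0.0
    while offset < total_seconds:
        # The last chunk is shortened so it does not run past the end.
        yield offset, min(chunk_seconds, total_seconds - offset)
        offset += chunk_seconds
```

Each pair could then be passed to `r.record(source, offset=offset, duration=duration)` inside a fresh `with harvard as source:` block. Fixed windows can still split words, as cell 9 shows, so chunk boundaries may need manual tuning.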

### The Effect of Noise on Speech Recognition

In [10]:
jackhammer = sr.AudioFile('./data/jackhammer.wav')
with jackhammer as source:
    audio = r.record(source)

r.recognize_google(audio)

'the snail smelly old gear vendors'

In [11]:
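# One thing to try is the adjust_for_ambient_noise() method of the Recognizer class.
# It reads the start of the stream to calibrate for the noise level, so that portion is consumed before record() runs.
with jackhammer as source:
    r.adjust_for_ambient_noise(source)
    audio = r.record(source)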

r.recognize_google(audio)

'still smell like old beer drinkers'

In [12]:
# The duration keyword argument sets how many seconds of audio adjust_for_ambient_noise() analyzes (the default is 1 second).
with jackhammer as source:
    r.adjust_for_ambient_noise(source, duration=0.5)
    audio = r.record(source)

r.recognize_google(audio)

'the stale smell of old gear vendors'

In [13]:
# If you find yourself running up against these issues frequently, you may have to resort to some pre-processing of the audio. 
# This can be done with audio editing software or a Python package (such as SciPy) that can apply filters to the files.
# For now, just be aware that ambient noise in an audio file can cause problems and must be addressed in order to maximize 
# the accuracy of speech recognition.

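As one illustration of the pre-processing mentioned above, here is a minimal sketch of a first-order (RC) high-pass filter in pure Python. It is a swapped-in technique, not SciPy and not part of SpeechRecognition: `high_pass` is a hypothetical helper, and the cutoff frequency is an assumption to tune by ear. A filter like this only attenuates low-frequency rumble, so it may or may not help with a given recording.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz):
    """Apply a first-order RC high-pass filter to a sequence of numeric samples."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    filtered = [float(samples[0])]
    for previous, current in zip(samples, samples[1:]):
        # Standard discrete form: y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        filtered.append(alpha * (filtered[-1] + current - previous))
    return filtered
```

The samples could come from a WAV file read with the standard library `wave` and `array` modules; the filtered result would need to be clipped and written back to a file before being loaded with `sr.AudioFile`.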
In [14]:
r.recognize_google(audio, show_all=True)

{'alternative': [{'transcript': 'does still smell old gear vendors',
   'confidence': 0.78407794},
  {'transcript': 'the snail smelly old gear vendors'},
  {'transcript': 'the still smell old gear vendors'},
  {'transcript': 'the still smell of old gear vendors'},
  {'transcript': 'the snail smell of old gear vendors'}],
 'final': True}

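In the response above, only the first alternative carries a `confidence` value. A minimal sketch for picking a transcript from such a response follows. `best_transcript` is a hypothetical helper, and the assumption that an unrecognized clip yields an empty or `alternative`-less response is mine, not something shown in this notebook.

```python
def best_transcript(response):
    """Pick the most confident transcript from a show_all=True style response."""
    # Assumption: an empty response (or one without alternatives) means no match.
    if not response or not response.get("alternative"):
        return None
    alternatives = response["alternative"]
    # Entries without a "confidence" key rank below any scored entry;
    # if none are scored, max() returns the first alternative.
    best = max(alternatives, key=lambda alt: alt.get("confidence", -1.0))
    return best.get("transcript")
```

Calling `best_transcript(r.recognize_google(audio, show_all=True))` would then return a single string, or `None` when nothing was recognized.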
## Working With Microphones

In [15]:
!pip install pyaudio
# Note: this pip command did not work in my environment. I ended up installing PyAudio manually in the Anaconda env 'tensorflow',
# which is where this notebook runs.



In [17]:
import speech_recognition as sr
r = sr.Recognizer()

In [18]:
mic = sr.Microphone()

In [19]:
sr.Microphone.list_microphone_names()

['Microsoft Sound Mapper - Input',
 'Headset Microphone (Plantronics',
 'Microphone Array (Synaptics Aud',
 'Microsoft Sound Mapper - Output',
 'Headset Earphone (Plantronics B',
 'Speakers (Synaptics Audio)',
 'Microphone 1 (Synaptics Audio capture)',
 'Microphone 2 (Synaptics Audio capture)',
 'Microphone 3 (Synaptics Audio capture)',
 'Output 1 (Synaptics Audio headphone)',
 'Output 2 (Synaptics Audio headphone)',
 'Input (Synaptics Audio headphone)',
 'Microphone Array 1 (Synaptics Audio capture)',
 'Microphone Array 2 (Synaptics Audio capture)',
 'Microphone Array 3 (Synaptics Audio capture)',
 'Output 1 (Synaptics Audio output)',
 'Output 2 (Synaptics Audio output)',
 'Input (Synaptics Audio output)',
 'Headset Microphone (Plantronics BT600)',
 'Headset Earphone (Plantronics BT600)']

In [None]:
# Pass an index from the list above to choose a specific microphone, e.g. mic = sr.Microphone(device_index=1) selects the Plantronics headset microphone.
# Omit device_index to let the library use the system default microphone.

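Device indices can change between machines, so looking a microphone up by name may be more robust than hard-coding `device_index=1`. A minimal sketch, where `find_device_index` is a hypothetical helper that works on the list returned by `sr.Microphone.list_microphone_names()`:

```python
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword, or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # Guard against missing names before doing a case-insensitive match.
        if name is not None and keyword in name.lower():
            return index
    return None
```

With the list above, `find_device_index(names, "plantronics")` gives the headset microphone's index, which can be passed to `sr.Microphone(device_index=...)`. The list also contains output devices such as speakers, so the keyword should be specific enough to match an input device.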
In [34]:
with mic as source:
    audio = r.listen(source)

In [36]:
r.recognize_google(audio)

"hello good morning I'm Chuck I'm from Mountain View California"

In [None]:
# To handle ambient noise, call the adjust_for_ambient_noise() method of the Recognizer class first, as with the audio files above.
with mic as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)

In [None]:
# After running the above code, wait a second for adjust_for_ambient_noise() to do its thing, then try speaking “hello” into the
# microphone. Again, you will have to wait a moment for the interpreter prompt to return before trying to recognize the speech.

# Audio that cannot be matched to text by the API raises an UnknownValueError exception. 

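Because unintelligible audio raises `UnknownValueError`, a call to `recognize_google()` is usually wrapped in a `try` block. The sketch below shows one way to do that. `transcribe_or_none` is a hypothetical helper, and handling `sr.RequestError` (raised by the library when the API cannot be reached) is an addition not covered in this notebook.

```python
def transcribe_or_none(recognizer, audio):
    """Return the transcript, or None when no transcript could be obtained."""
    # Imported here so the helper is self-contained; in this notebook
    # `sr` is already imported at the top.
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The API could not match the audio to any text.
        return None
    except sr.RequestError as error:
        # The API was unreachable or rejected the request.
        print(f"Could not reach the recognition service: {error}")
        return None
```

Returning `None` in both cases keeps the caller simple; code that needs to distinguish silence from a network failure could re-raise the `RequestError` instead.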
In [37]:
text = r.recognize_google(audio)
print(text)

hello good morning I'm Chuck I'm from Mountain View California
