# Speech Understanding 
# Lecture 9: The SpeechRecognition Module


### Mark Hasegawa-Johnson, KCGI, December 17, 2022

In today's lecture, we will learn how to use the <a href="https://pypi.org/project/SpeechRecognition/">SpeechRecognition</a> module to access high-performance commercial and open-source speech recognizers.

Here are the contents:
1. <a href="#section_1">Installing SpeechRecognition</a>
1. <a href="#section_2">Using speech_recognition from the microphone</a>
1. <a href="#section_3">Using speech_recognition to perform a web search</a>
1. <a href="#section_4">Using speech_recognition from an audio file</a>
1. <a href="#homework">Homework</a>


<a id='section_1'></a>

## 1. Installing SpeechRecognition

The SpeechRecognition module itself is installed using pip.  Microphone input also requires the PyAudio package, which can be installed using conda.  If you have Anaconda installed, you can try the following two commands:

In [1]:
!pip install SpeechRecognition



In [2]:
!conda install pyaudio -y

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/jhasegaw/opt/anaconda3

  added / updated specs:
    - pyaudio


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2022.10.11 |       hecd8cb5_0         125 KB
    conda-22.11.1              |   py38hecd8cb5_4         941 KB
    openssl-1.1.1s             |       hca72f7f_0         2.8 MB
    portaudio-19.6.0           |       h647c56a_4          75 KB
    pyaudio-0.2.11             |   py38h1de35cc_2         203 KB
    ruamel.yaml-0.17.21        |   py38hca72f7f_0         179 KB
    ruamel.yaml.clib-0.2.6     |   py38hca72f7f_1         124 KB
    ------------------------------------------------------------
                                           Total:         4.4 MB

The following NEW packages will be INSTALLED:

  portaudio          p

The SpeechRecognition package is a Python interface that connects, in the back end, to many different speech recognizers.  The full list is at https://pypi.org/project/SpeechRecognition/
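Before going further, it may be worth confirming that both packages can actually be imported. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for this illustration, and `speech_recognition` and `pyaudio` are the import names of the two packages installed above.

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# 'speech_recognition' is the import name of the SpeechRecognition package;
# 'pyaudio' is the import name of PyAudio (needed only for microphone input).
missing = missing_modules(["speech_recognition", "pyaudio"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("Both modules are available.")
```

If `pyaudio` is reported as missing, the microphone examples will probably fail, but transcribing an audio file should still be possible.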

To start with, let's use the Google speech recognizer.  This one only works if you're connected to the internet.

In [11]:
import speech_recognition as sr
speech = sr.Recognizer()
print('Python is listening...')
with sr.Microphone() as source:
    speech.adjust_for_ambient_noise(source)
    audio = speech.listen(source)
    inp = speech.recognize_google(audio)
    print('You just said',inp,'.')

Python is listening...
result2:
[]


UnknownValueError: 

<a id='section_2'></a>

## 2. Using speech_recognition from the microphone

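We can use Python's <a href="https://docs.python.org/3/tutorial/errors.html">exception handling</a> to keep the program running when the speech recognizer cannot understand what we say (`sr.UnknownValueError`) or when the request to the Google service fails (`sr.RequestError`):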

In [10]:
import speech_recognition as sr
speech = sr.Recognizer()

while True:
    print('Python is listening...')
    with sr.Microphone() as source:
        speech.adjust_for_ambient_noise(source)
        try:
            audio = speech.listen(source)
            inp = speech.recognize_google(audio)
            print('You just said',inp,'.')
        except sr.UnknownValueError:
            continue
        except sr.RequestError:
            continue
        except sr.WaitTimeoutError:
            continue
        if inp=="stop listening":
            print('Goodbye!')
            break

Python is listening...
result2:
{   'alternative': [   {   'confidence': 0.84996074,
                           'transcript': 'does this work'},
                       {'transcript': 'what is this work'},
                       {'transcript': 'is this work'},
                       {'transcript': 'Justice work'},
                       {'transcript': 'what does this work'}],
    'final': True}
You just said does this work .
Python is listening...
result2:
{   'alternative': [{'confidence': 0.98762906, 'transcript': 'stop listening'}],
    'final': True}
You just said stop listening .
Goodbye!
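The `result2:` printouts above show that Google returns a list of alternatives, only some of which carry a confidence score. As a sketch, assuming the raw result has the same shape as those printouts (in the library, `recognize_google(audio, show_all=True)` returns the raw result instead of a single string), a small helper could pick the most confident transcript. The name `best_transcript` is hypothetical, and the sample data is copied from the output above.

```python
def best_transcript(result):
    """Return the highest-confidence transcript, or None if nothing was recognized."""
    # An empty list (as in the failed attempt in Section 1) means no hypothesis.
    if not isinstance(result, dict) or not result.get("alternative"):
        return None
    # Alternatives without a 'confidence' key are treated as confidence 0.
    best = max(result["alternative"], key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"]

# Sample copied from the printed output above.
sample = {
    "alternative": [
        {"confidence": 0.84996074, "transcript": "does this work"},
        {"transcript": "what is this work"},
        {"transcript": "is this work"},
        {"transcript": "Justice work"},
        {"transcript": "what does this work"},
    ],
    "final": True,
}

print(best_transcript(sample))  # does this work
print(best_transcript([]))      # None
```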


<a id='section_3'></a>

## 3. Using speech_recognition to perform a web search

The speech recognizer can now be used to give text input for any application.  For example, let's try using it to search the web.  

To start with, here's how we open a web page in Python:


In [12]:
import webbrowser
webbrowser.open("http://wsj.com")

True

Now let's use the speech recognizer to dictate the address of the web page:

In [13]:
import speech_recognition as sr
import webbrowser
speech = sr.Recognizer()

while True:
    print('Python is listening...')
    with sr.Microphone() as source:
        speech.adjust_for_ambient_noise(source)
        try:
            audio = speech.listen(source)
            inp = speech.recognize_google(audio)
            print('You just said',inp,'.')
            # str.replace returns a new string, so the result must be assigned
            inp = inp.replace('browser ', '')
            webbrowser.open("http://" + inp)
        except sr.UnknownValueError:
            continue
        except sr.RequestError:
            continue
        except sr.WaitTimeoutError:
            continue
        if inp=="stop listening":
            print('Goodbye!')
            break

Python is listening...
result2:
{   'alternative': [{'confidence': 0.98762906, 'transcript': 'wsj.com'}],
    'final': True}
You just said wsj.com .
Python is listening...
result2:
[]
Python is listening...
result2:
{   'alternative': [{'confidence': 0.98762906, 'transcript': 'cnn.com'}],
    'final': True}
You just said cnn.com .
Python is listening...
result2:
[]
Python is listening...
result2:
{   'alternative': [{'confidence': 0.98762906, 'transcript': 'wikipedia.org'}],
    'final': True}
You just said wikipedia.org .
Python is listening...
result2:
{   'alternative': [{'confidence': 0.98762906, 'transcript': 'wiktionary.org'}],
    'final': True}
You just said wiktionary.org .
Python is listening...
result2:
{   'alternative': [{'confidence': 0.98762906, 'transcript': 'stop listening'}],
    'final': True}
You just said stop listening .
Goodbye!


Finally, let's use speech recognition to perform a web search.  To do that, all we need is to replace this line:

```webbrowser.open("http://" + inp)```

...with this one:

```webbrowser.open("http://google.com/search?q=" + inp)```
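Recognized speech usually contains spaces, and sometimes characters such as `&` that have a special meaning in a URL. A hedged sketch of one way to handle this is to encode the query with `urllib.parse.quote_plus` from the standard library before appending it. The helper name `search_url` and its `wake_word` parameter are assumptions made for this illustration; unlike the `replace` call used in this lecture, it removes the word "browser" only at the start of the utterance.

```python
from urllib.parse import quote_plus

def search_url(spoken_text, wake_word="browser "):
    """Build a Google search URL from recognized speech."""
    # Drop the wake word only if the utterance starts with it.
    if spoken_text.startswith(wake_word):
        spoken_text = spoken_text[len(wake_word):]
    # quote_plus turns spaces into '+' and escapes characters such as '&'.
    return "http://google.com/search?q=" + quote_plus(spoken_text)

print(search_url("browser how many inches in a foot"))
# http://google.com/search?q=how+many+inches+in+a+foot
```

The returned string can be passed to `webbrowser.open` in place of the concatenated URL.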

In [14]:
import speech_recognition as sr
import webbrowser
speech = sr.Recognizer()

while True:
    print('Python is listening...')
    with sr.Microphone() as source:
        speech.adjust_for_ambient_noise(source)
        try:
            audio = speech.listen(source)
            inp = speech.recognize_google(audio)
            print('You just said',inp,'.')
            # str.replace returns a new string, so the result must be assigned
            inp = inp.replace('browser ', '')
            webbrowser.open("http://google.com/search?q=" + inp)
        except sr.UnknownValueError:
            continue
        except sr.RequestError:
            continue
        except sr.WaitTimeoutError:
            continue
        if inp=="stop listening":
            print('Goodbye!')
            break

Python is listening...
result2:
{   'alternative': [   {   'confidence': 0.987629,
                           'transcript': 'browser how many inches in a foot'}],
    'final': True}
You just said browser how many inches in a foot .
Python is listening...
result2:
[]
Python is listening...
result2:
{   'alternative': [   {   'confidence': 0.98762906,
                           'transcript': 'yards in a mile'}],
    'final': True}
You just said yards in a mile .
Python is listening...
result2:
{   'alternative': [   {   'confidence': 0.987629,
                           'transcript': 'how many meters in a kilometer'}],
    'final': True}
You just said how many meters in a kilometer .
Python is listening...
result2:
{   'alternative': [   {   'confidence': 0.95494229,
                           'transcript': 'how many Korean Won per Japanese '
                                         'Yen'},
                       {'transcript': 'how many Korean won a Japanese Yen'},
                     

<a id="section_4"></a>

## 4. Using speech_recognition from an audio file

If you have an audio file, you can use the speech_recognition module to transcribe it.  For example, let's download the audio file we used in lecture 4:

In [15]:
import urllib.request, soundfile, IPython

example_url = "https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav"
webdata = urllib.request.urlopen(example_url).read()
with open("webdata.wav", "wb") as f:
    f.write(webdata)
    
speech_wave, speech_rate = soundfile.read("webdata.wav")
IPython.display.Audio(data=speech_wave, rate=speech_rate)
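As a quick sanity check on a downloaded file, the standard-library `wave` module can report its basic properties. This is only a sketch, and it assumes the file is an ordinary PCM WAV file; the helper name `describe_wav` is made up for this example.

```python
import wave

def describe_wav(path):
    """Return channel count, sample rate (Hz) and duration (s) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_sec": frames / rate,
        }

# After the download cell above has been run:
# print(describe_wav("webdata.wav"))
```

If `wave.open` raises `wave.Error`, the file is probably not in a format that this module understands.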

Now let's use speech_recognition to transcribe it:

In [16]:
import speech_recognition as sr
speech = sr.Recognizer()
with sr.AudioFile("webdata.wav") as source:
    audio = speech.record(source)
    inp = speech.recognize_google(audio)
    print('The person in this audio file said:',inp,'.')

result2:
{   'alternative': [   {   'confidence': 0.78506243,
                           'transcript': 'she has a duck suit and Gracie '
                                         'washer all year'},
                       {   'transcript': 'she has a duck suit and greasy '
                                         'washer all year'},
                       {   'transcript': 'she has a duck suit and greasy water '
                                         'all year'},
                       {   'transcript': 'she has a duck suit and greasy wash '
                                         'water all year'},
                       {   'transcript': 'she has a duck suit and Gracie watch '
                                         'for all year'}],
    'final': True}
The person in this audio file said: she has a duck suit and Gracie washer all year .


<a id='homework'></a>

## Homework for Week 9

Create a text file called `week9.py`.

This file should `def` a function called `transcribe_wavefile`, with the following parameters:
* Input: str = name of the input file
* Return: str = recognized text  

Here is a template that you can cut and paste, to get started:

In [18]:
import speech_recognition as sr

def transcribe_wavefile(filename):
    '''
    Use sr.AudioFile(filename) as the source,
    recognize from that source,
    and return the recognized text.
    '''
    raise NotImplementedError("You need to write this part!")
    return inp

Test whether your code works by running the following block:

In [20]:
import week9, importlib
importlib.reload(week9)

inp = week9.transcribe_wavefile("webdata.wav")
print(inp)

result2:
{   'alternative': [   {   'confidence': 0.78506243,
                           'transcript': 'she has a duck suit and Gracie '
                                         'washer all year'},
                       {   'transcript': 'she has a duck suit and greasy '
                                         'washer all year'},
                       {   'transcript': 'she has a duck suit and greasy water '
                                         'all year'},
                       {   'transcript': 'she has a duck suit and greasy wash '
                                         'water all year'},
                       {   'transcript': 'she has a duck suit and Gracie watch '
                                         'for all year'}],
    'final': True}
she has a duck suit and Gracie washer all year


When the block above is working, try uploading your text file `week9.py` to <a href="https://www.gradescope.com/">Gradescope</a>.  The autograder checks the following things:

1. Did you submit a text file called `week9.py`?
1. Does your text file contain a function called `transcribe_wavefile`?
1. Does `week9.transcribe_wavefile` return a string?
1. Does `week9.transcribe_wavefile("webdata.wav")` return the string `she has a duck suit and Gracie washer all year`?
1. Does `week9.transcribe_wavefile` also work if applied to a secret audio file that is different from `webdata.wav`?
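
Before uploading, you might run a rough local version of the second and third checks yourself. The sketch below is not the actual autograder; the function name `precheck` is hypothetical, and it simply tests whatever module object you pass in.

```python
def precheck(module, wavefile="webdata.wav"):
    """Return a list of problems found; an empty list means the checks passed."""
    func = getattr(module, "transcribe_wavefile", None)
    if not callable(func):
        return ["no callable named transcribe_wavefile"]
    problems = []
    # Call the function and make sure it returns a string.
    result = func(wavefile)
    if not isinstance(result, str):
        problems.append("returned %s instead of str" % type(result).__name__)
    return problems

# Typical use, once week9.py exists in the current directory:
# import week9
# print(precheck(week9) or "Looks OK")
```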