# Guess the Word

In the previous Notebook, we built and trained a Speech Recognition engine ourselves. Another possibility is to use one of the Speech Recognition services that are available for use online through an API. Many of these services offer Python SDKs, so Speech Recognition becomes really easy.

Some important packages for speech recognition are:

- apiai
- assemblyai
- google-cloud-speech
- pocketsphinx
- SpeechRecognition
- watson-developer-cloud
- wit

There is one package that stands out in terms of ease-of-use: *SpeechRecognition*. Instead of having to build scripts for accessing microphones and processing audio files from scratch, SpeechRecognition will have you up and running in just a few minutes.

The SpeechRecognition library acts as a wrapper for several popular speech APIs and is thus extremely flexible. One of these, the Google Web Speech API, supports a default API key that is hard-coded into the SpeechRecognition library. That means you can get started without having to sign up for a service. So, we'll use the SpeechRecognition library for our proof of concept.

## 1. Installing SpeechRecognition

You can install SpeechRecognition with the following command:

In [1]:
pip install SpeechRecognition

Collecting SpeechRecognition
  Using cached SpeechRecognition-3.10.0-py2.py3-none-any.whl (32.8 MB)
Installing collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.10.0
Note: you may need to restart the kernel to use updated packages.





Once installed, you should verify the installation by typing:

In [2]:
import speech_recognition as sr
sr.__version__

'3.10.0'
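
If you would rather check the installation without importing the package, the standard library can look up the version of an installed distribution. The following is a minimal sketch; the helper name `installed_version` is our own and is not part of SpeechRecognition.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Prints a version string when the package is installed, otherwise None.
print(installed_version("SpeechRecognition"))
```

A result of `None` means that pip installed the package into a different environment than the one your kernel is using.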

## 2. The Recognizer Class

All of the magic in SpeechRecognition happens with the *Recognizer class* (for more info: https://pypi.org/project/SpeechRecognition/).

Once a Recognizer instance is created, you can use seven methods for recognizing speech from an audio source using various APIs. These are:

- recognize_bing(): Microsoft Bing Speech
- recognize_google(): Google Web Speech API
- recognize_google_cloud(): Google Cloud Speech - requires installation of the google-cloud-speech package
- recognize_houndify(): Houndify by SoundHound
- recognize_ibm(): IBM Speech to Text
- recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx
- recognize_wit(): Wit.ai

Since SpeechRecognition ships with a default API key for the Google Web Speech API, you can get started with it right away. For this reason, we’ll use the Web Speech API in this Notebook. The other six APIs all require authentication with either an API key or a username/password combination. Remark: with the default Google Web Speech API key, you’ll be limited to only 50 requests per day, and there is no way to raise this quota. 
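
The exact set of recognize_*() methods may differ between library versions, so it can be handy to list what your installed version offers. The sketch below only inspects attribute names; the helper `list_recognize_methods` is a hypothetical name of our own, and it works on any object you pass in.

```python
def list_recognize_methods(recognizer):
    """Return the sorted names of all callable `recognize_*` attributes."""
    return sorted(
        name
        for name in dir(recognizer)
        if name.startswith("recognize_") and callable(getattr(recognizer, name))
    )


# With the real library (assumes SpeechRecognition is installed):
# import speech_recognition as sr
# print(list_recognize_methods(sr.Recognizer()))
```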

SpeechRecognition makes working with audio files easy thanks to its handy *AudioFile class*. This class can be initialized with the path to an audio file and provides methods for reading and working with the file’s contents. The *record()* method records the data from the entire file into an AudioData instance.

So the complete code to read an audio file and recognize any speech in it is:

In [3]:
r = sr.Recognizer()

harvard = sr.AudioFile('./audio_files/harvard.wav')
with harvard as source:
    audio = r.record(source)
    
r.recognize_google(audio)

'the still smell of old beer lingers it takes heat to bring out the odour a cold dip restores health exist a salt pickle taste fine with him go past or my favourite exist for food is the hot cross bun'

It's not perfect, but it does do a decent job, doesn't it?

## Exercise

- Open the file in the audio_files folder, listen to it, and check whether the transcription is correct.
- Download an audio file and try to recognize the speech.

## 3. Capturing segments with offset and duration

If you only want to capture a portion of the speech in a file, the record() method accepts a duration keyword argument that stops the recording after a specified number of seconds. In addition to specifying a recording duration, the record() method can be given a specific starting point using the offset keyword argument. This value represents the number of seconds from the beginning of the file to ignore before starting to record.

In [4]:
with harvard as source:
    audio = r.record(source, duration=4)
r.recognize_google(audio)

'the stale smell of old beer lingers'

In [5]:
with harvard as source:
    audio = r.record(source, offset=4.7, duration=2.8)

r.recognize_google(audio)

'exceed to bring out the odour'
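
The offset and duration arguments can also be combined to work through a longer file piece by piece. As a rough sketch, the hypothetical helper `chunk_windows` below (our own name, not part of the library) computes the (offset, duration) pairs; the commented usage assumes a file length of 18 seconds, which is only an illustrative value.

```python
import math


def chunk_windows(total_seconds, chunk_seconds):
    """Return (offset, duration) pairs that cover `total_seconds` without overlap."""
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds must be > 0")
    count = math.ceil(total_seconds / chunk_seconds)
    return [
        # The last window is shortened so it does not run past the end.
        (i * chunk_seconds, min(chunk_seconds, total_seconds - i * chunk_seconds))
        for i in range(count)
    ]


print(chunk_windows(10, 4))  # [(0, 4), (4, 4), (8, 2)]

# Usage with the objects from the cells above (needs the audio file and network access):
# for offset, duration in chunk_windows(18, 6):   # 18 s is an assumed file length
#     with harvard as source:                     # each `with` block starts at the beginning of the file
#         audio = r.record(source, offset=offset, duration=duration)
#     print(r.recognize_google(audio))
```

Keep in mind that fixed windows can cut a word in half, which is exactly what produced 'exceed to bring out the odour' above, and that every window costs one API request.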

## 4. The effect of noise

Noise is a fact of life. All audio recordings have some degree of noise in them, and unhandled noise can wreck the accuracy of speech recognition apps.

Listen to `jackhammer.wav` in the audio_files folder. A lot of noise, right? The correct transcription should be: "the stale smell of old beer lingers". Let's try to recognize it from the audio file.

In [6]:
r = sr.Recognizer()

jackhammer = sr.AudioFile('./audio_files/jackhammer.wav')
with jackhammer as source:
    audio = r.record(source)

r.recognize_google(audio)

'smelling fingers'

Depending on the library version and on what the API returns, you will probably get either nonsense output like the one above or an error (see the next paragraph).


Audio that cannot be matched to text by the API raises an UnknownValueError exception. You should always wrap calls to the API with try and except blocks to handle this exception. You know about exception handling from your other programming courses, right?

In [7]:
r = sr.Recognizer()

jackhammer = sr.AudioFile('./audio_files/jackhammer.wav')
with jackhammer as source:
    audio = r.record(source)
    
try:
    print(r.recognize_google(audio))
except sr.RequestError:
    print("API was unreachable or unresponsive")
except sr.UnknownValueError:
    print("Unable to recognize speech")

smelling fingers


Either the exception was handled or the API returned a nonsense value; in both cases we weren't able to transcribe the file correctly. One thing we can try is the `adjust_for_ambient_noise()` method of the Recognizer class.

In [8]:
r = sr.Recognizer()

jackhammer = sr.AudioFile('./audio_files/jackhammer.wav')
with jackhammer as source:
    r.adjust_for_ambient_noise(source)
    audio = r.record(source)

r.recognize_google(audio)

'spell smell during windows'

Not exactly what we expected of the sentence "the stale smell of old beer lingers". But at least we've got some output. 

When working with noisy files, it can be helpful to see the actual API response. Most APIs return a JSON string containing many possible transcriptions. The `recognize_google()` method will always return the most likely transcription unless you force it to give you the full response.

You can do this by setting the show_all keyword argument of the recognize_google() method to True.

In [9]:
r.recognize_google(audio, show_all=True)

{'alternative': [{'transcript': 'spell smell during windows',
   'confidence': 0.60805869},
  {'transcript': 'spell smell off your fingers'},
  {'transcript': 'still smell up your windows'},
  {'transcript': 'still smell your fingers'},
  {'transcript': 'spell smell'}],
 'final': True}
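
Once you have the full response, you will usually want to pick one alternative from it in code. The sketch below assumes the response has the shape shown above (a dictionary with an 'alternative' list) and treats alternatives without a 'confidence' value as less certain than those with one; that ranking rule is our own assumption, and `best_alternative` is a hypothetical helper name.

```python
def best_alternative(response):
    """Return (transcript, confidence) for the most plausible alternative.

    Returns (None, None) when the response holds no alternatives.
    """
    if not isinstance(response, dict):
        return None, None
    alternatives = response.get("alternative") or []
    if not alternatives:
        return None, None
    # Assumption: a missing confidence ranks below any reported confidence.
    # On ties, max() keeps the first alternative, i.e. the API's own order.
    best = max(alternatives, key=lambda alt: alt.get("confidence", -1.0))
    return best.get("transcript"), best.get("confidence")


response = {
    "alternative": [
        {"transcript": "spell smell during windows", "confidence": 0.60805869},
        {"transcript": "spell smell off your fingers"},
    ],
    "final": True,
}
print(best_alternative(response))  # ('spell smell during windows', 0.60805869)
```

Looking at the confidence value lets you reject a transcription that the API itself is unsure about, instead of blindly trusting the first result.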

Well, still not what we wanted. That means we need to do more preprocessing on the input data, with better noise cancelling or filtering techniques. However, this is out of scope for this course.

## 5. Working with microphones

To access your microphone with *SpeechRecognition*, you’ll have to install the PyAudio package. This can be done as follows:

In [10]:
pip install PyAudio
# If pip installing PyAudio doesn't work on your Windows machine, you might want to try:
#!pip install pipwin
#!pipwin install pyaudio

Collecting PyAudio
  Using cached PyAudio-0.2.13-cp310-cp310-win_amd64.whl (164 kB)
Installing collected packages: PyAudio
Successfully installed PyAudio-0.2.13
Note: you may need to restart the kernel to use updated packages.





You can get a list of microphone names by calling the `list_microphone_names()` static method of the Microphone class.

In [13]:
sr.Microphone.list_microphone_names()

['Microsoft Sound Mapper - Input',
 'Microphone (HP USB-C Dock Audio',
 'Microphone (Realtek(R) Audio)',
 'Microsoft Sound Mapper - Output',
 'Speakers (Realtek(R) Audio)',
 'Headphones (HP USB-C Dock Audio',
 'Primary Sound Capture Driver',
 'Microphone (HP USB-C Dock Audio Headset)',
 'Microphone (Realtek(R) Audio)',
 'Primary Sound Driver',
 'Speakers (Realtek(R) Audio)',
 'Headphones (HP USB-C Dock Audio Headset)',
 'Headphones (HP USB-C Dock Audio Headset)',
 'Speakers (Realtek(R) Audio)',
 'Microphone (Realtek(R) Audio)',
 'Microphone (HP USB-C Dock Audio Headset)',
 'Speakers (Realtek HD Audio output)',
 'Microphone (Realtek HD Audio Mic input)',
 'Headphones (Realtek HD Audio 2nd output)',
 'Stereo Mix (Realtek HD Audio Stereo input)',
 'Microphone Array (Realtek HD Audio Mic Array input)',
 'Output (AMD HD Audio HDMI out #1)',
 'Microphone (HP USB-C Dock Audio Headset)',
 'Headphones (HP USB-C Dock Audio Headset)']

You can access the default system microphone by creating an instance of the Microphone class. If your system has no default microphone (such as on a Raspberry Pi), or if you want to use a microphone other than the default, you will need to specify which one to use by supplying a device index (`mic = sr.Microphone(device_index=3)`). The device index is the position of the device in the list returned by `list_microphone_names()`.
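
Counting positions in a long list by hand is error-prone, so you may prefer to look up the index by name. This is a minimal sketch: `find_microphone_index` is a hypothetical helper of our own, and matching on part of the name is only a heuristic, because the list also contains output devices such as speakers.

```python
def find_microphone_index(names, keyword):
    """Return the index of the first device whose name contains `keyword`, else None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # Some entries may be None if a device reports no name.
        if name is not None and keyword in name.lower():
            return index
    return None


names = [
    "Microsoft Sound Mapper - Input",
    "Microphone (HP USB-C Dock Audio",
    "Microphone (Realtek(R) Audio)",
    "Microsoft Sound Mapper - Output",
]
print(find_microphone_index(names, "realtek"))  # 2

# With the real library (assumes PyAudio is installed):
# index = find_microphone_index(sr.Microphone.list_microphone_names(), "realtek")
# mic = sr.Microphone(device_index=index)
```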

You can capture input from the microphone using the listen() method of the Recognizer class inside of the with block. This method takes an audio source as its first argument and records input from the source **until silence is detected**.

Try speaking "hello" into your microphone. Wait a moment for the interpreter prompt to display again, then execute the recognize statement. Maybe try some other words, like "f*** you". Seems like Google is very polite. And maybe try to transcribe a complete sentence?

In [16]:
import speech_recognition as sr
r = sr.Recognizer()
mic = sr.Microphone()

with mic as source:
    audio = r.listen(source)

KeyboardInterrupt: 

In [15]:
r.recognize_google(audio)

UnknownValueError: 

**If the prompt never returns, your microphone is most likely picking up too much ambient noise.** As before, the adjust_for_ambient_noise() method helps here. Since input from a microphone is far less predictable than input from an audio file, it is a good idea to call it every time you listen for microphone input. Wrapping the recognition call in try and except blocks is a good idea as well.

In [17]:
import speech_recognition as sr
r = sr.Recognizer()
mic = sr.Microphone()

with mic as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
    
try:
    print(r.recognize_google(audio))
except sr.RequestError:
    print("API was unreachable or unresponsive")
except sr.UnknownValueError:
    print("Unable to recognize speech")

acting if this works


## 6. Putting It All Together: A "Guess the Word" Game

Now that you’ve seen the basics of recognizing speech with the SpeechRecognition package, let’s put your newfound knowledge to use and write a small game that picks a random word from a list and gives the user three attempts to guess the word.

Here is the full code (try to understand it; this shouldn't be a problem):

In [None]:
import random 
import time

import speech_recognition as sr


def recognize_speech_from_mic(recognizer, microphone):
    """Transcribe speech recorded from `microphone`.

    Returns a dictionary with three keys:
    "success": a boolean indicating whether or not the API request was
               successful
    "error":   `None` if no error occurred, otherwise a string containing
               an error message if the API could not be reached or
               speech was unrecognizable
    "transcription": `None` if speech could not be transcribed,
               otherwise a string containing the transcribed text
    """
    # check that recognizer and microphone arguments are appropriate type
    if not isinstance(recognizer, sr.Recognizer):
        raise TypeError("`recognizer` must be `Recognizer` instance")

    if not isinstance(microphone, sr.Microphone):
        raise TypeError("`microphone` must be `Microphone` instance")

    # adjust the recognizer sensitivity to ambient noise and record audio
    # from the microphone
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)

    # set up the response object
    response = {
        "success": True,
        "error": None,
        "transcription": None
    }

    # try recognizing the speech in the recording
    # if a RequestError or UnknownValueError exception is caught,
    #     update the response object accordingly
    try:
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        # API was unreachable or unresponsive
        response["success"] = False
        response["error"] = "API unavailable"
    except sr.UnknownValueError:
        # speech was unintelligible
        response["error"] = "Unable to recognize speech"

    return response


if __name__ == "__main__":
    # set the list of words, max number of guesses, and prompt limit
    WORDS = ["apple", "banana", "grape", "orange", "mango", "lemon"]
    NUM_GUESSES = 3
    PROMPT_LIMIT = 5

    # create recognizer and mic instances
    recognizer = sr.Recognizer()
    microphone = sr.Microphone()

    # get a random word from the list
    word = random.choice(WORDS)

    # format the instructions string
    instructions = (
        "I'm thinking of one of these words:\n"
        "{words}\n"
        "You have {n} tries to guess which one.\n"
    ).format(words=', '.join(WORDS), n=NUM_GUESSES)

    # show instructions and wait 3 seconds before starting the game
    print(instructions)
    time.sleep(3)

    for i in range(NUM_GUESSES):
        # get the guess from the user
        # if a transcription is returned, break out of the loop and
        #     continue
        # if no transcription returned and API request failed, break
        #     loop and continue
        # if API request succeeded but no transcription was returned,
        #     re-prompt the user to say their guess again. Do this up
        #     to PROMPT_LIMIT times
        for j in range(PROMPT_LIMIT):
            print('Guess {}. Speak!'.format(i+1))
            guess = recognize_speech_from_mic(recognizer, microphone)
            if guess["transcription"]:
                break
            if not guess["success"]:
                break
            print("I didn't catch that. What did you say?\n")

        # if there was an error, stop the game
        if guess["error"]:
            print("ERROR: {}".format(guess["error"]))
            break

        # show the user the transcription
        print("You said: {}".format(guess["transcription"]))

        # determine if guess is correct and if any attempts remain
        guess_is_correct = guess["transcription"].lower() == word.lower()
        user_has_more_attempts = i < NUM_GUESSES - 1

        # determine if the user has won the game
        # if not, repeat the loop if user has more attempts
        # if no attempts left, the user loses the game
        if guess_is_correct:
            print("Correct! You win!")
            break
        elif user_has_more_attempts:
            print("Incorrect. Try again.\n")
        else:
            print("Sorry, you lose!\nI was thinking of '{}'.".format(word))
            break
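
The game compares the transcription to the word with a plain lowercase equality check, so a guess like "an apple" counts as wrong. If you want to be more forgiving, a looser comparison could look like the sketch below. This is an optional variation, not part of the game above; `normalize` and `guess_matches` are hypothetical helper names, and the approach assumes single-word answers like the ones in WORDS.

```python
import string


def normalize(text):
    """Lowercase the text, strip punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def guess_matches(transcription, word):
    """Return True if `word` occurs as a whole word in the transcription."""
    return normalize(word) in normalize(transcription).split()


print(guess_matches("An apple!", "apple"))   # True
print(guess_matches("pineapple", "apple"))   # False
```

To try it, you could replace the `guess_is_correct` line with `guess_is_correct = guess_matches(guess["transcription"], word)`.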

## 7. Recognizing Speech in languages other than English - Exercise

Throughout this Notebook, we’ve been recognizing speech in English, which is the default language for each recognize_*() method of the SpeechRecognition package. However, it is absolutely possible to recognize speech in other languages, and this is quite simple to accomplish.

To recognize speech in a different language, set the **language** keyword argument of the recognize_*() method to a string corresponding to the desired language. Most of the methods accept a BCP-47 language tag, such as 'en-US' for American English, 'fr-FR' for French or **'nl-NL'** for Dutch.
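
As a small sketch of how this could look in code, the helper below maps a language name to one of the tags mentioned above and passes it on through the `language` keyword argument of `recognize_google()`. The names `LANGUAGE_TAGS` and `transcribe_in` are our own, and the mapping only covers the three examples from the text.

```python
# Hypothetical lookup table, limited to the tags mentioned in the text.
LANGUAGE_TAGS = {"english": "en-US", "french": "fr-FR", "dutch": "nl-NL"}


def transcribe_in(recognizer, audio, language_name):
    """Transcribe `audio` with the Google Web Speech API in the named language."""
    try:
        tag = LANGUAGE_TAGS[language_name.lower()]
    except KeyError:
        raise ValueError(f"No language tag known for {language_name!r}") from None
    return recognizer.recognize_google(audio, language=tag)


# Usage with the real library (needs a microphone and network access):
# import speech_recognition as sr
# r = sr.Recognizer()
# with sr.Microphone() as source:
#     r.adjust_for_ambient_noise(source)
#     audio = r.listen(source)
# print(transcribe_in(r, audio, "Dutch"))
```

As with the earlier examples, wrap the call in try and except blocks when you use it on real microphone input.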

Try to use the microphone to transcribe a Dutch (or other language) sentence.