## Rule-Based Dialog Systems: ASR and NLU
Goals of today's exercise:
- Transcribe audio using the Google ASR cloud service
- Transcribe audio using Vosk as a local speech recognition engine
- Natural language understanding (NLU) with the Stanford Stanza library


Today we work with SpeechRecognition, a library that wraps several speech recognizers and online APIs behind one interface. We use the Google Web Speech API as a cloud service and Vosk as a speech recognizer that runs locally.

Install the following library: https://pypi.org/project/SpeechRecognition/

In [None]:
pip install SpeechRecognition

In [None]:
import speech_recognition as sr
print(f"We use speech_recognition version: {sr.__version__}")

To record microphone input and pass it to the speech recognizer, we use the PyAudio library. Install PyAudio with pip. Depending on the operating system, you may first need to install additional packages, in particular the PortAudio library, with Homebrew (macOS) or apt (Ubuntu):

In [None]:
pip install pyaudio

In [None]:
from pyaudio import PyAudio

p = PyAudio()
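try:
    # Raises an OSError (IOError) if no default input device exists
    print(p.get_default_input_device_info())
except OSError:
    print("No mics available")
finally:
    p.terminate()  # release the PortAudio resources again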

### Transcribe a spoken utterance

The following tutorial provides good instructions on how to use the SpeechRecognition library: https://www.simplilearn.com/tutorials/python-tutorial/speech-recognition-in-python 

In [None]:
import speech_recognition as sr
recognizer = sr.Recognizer()

Capture a spoken utterance with the microphone. Say, for example: "Call Paul on his mobile number".

In [None]:
with sr.Microphone() as source:
    print("Speak something...")
    audio_data = recognizer.listen(source)
    print("Audio data recorded.")

Now we send the captured utterance to the Google Web Speech API. If the API is not reachable, the ```recognizer``` raises a ```RequestError```. If the passed ```audio_data``` does not contain an intelligible utterance, the ```recognizer``` raises an ```UnknownValueError```; depending on the library version, a call with ```show_all=True``` may return an empty result instead. We catch both exceptions and give the user appropriate feedback on the console.

We would like to look at several recognition results and their confidence. With the ```show_all``` parameter, the ```recognizer``` returns the raw API response, which lists several alternatives and which we print as JSON.

In [None]:
import json

def recognize_google(audio_data):
    try:
        result = recognizer.recognize_google(audio_data, language='en-US', show_all=True)
        print(f"[Google] Recognition results: {json.dumps(result, sort_keys=True, indent=4)}")
        return result
    except sr.RequestError:
        print("[Google] RequestError: Could not access Google Web Speech API")
    except sr.UnknownValueError:
        print("[Google] UnknownValueError: Sorry, I do not understand")

google_result = recognize_google(audio_data)
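
The raw result can be reduced to a single transcript. The following is a minimal sketch, assuming the response is a dictionary with an ```alternative``` list whose entries carry a ```transcript``` and, for some entries only, a ```confidence```. The helper name ```best_alternative``` and the sample data are hypothetical and only illustrate the shape; check the output printed by the cell above against it.

```python
from typing import Any, Optional, Tuple


def best_alternative(result: Any) -> Optional[Tuple[str, Optional[float]]]:
    """Return (transcript, confidence) of the most confident alternative, or None."""
    # None (an exception was caught) or an empty result (nothing recognized)
    if not isinstance(result, dict):
        return None
    alternatives = result.get("alternative") or []
    if not alternatives:
        return None
    # Alternatives without a confidence value rank behind those that have one
    best = max(alternatives, key=lambda alt: alt.get("confidence", -1.0))
    return best["transcript"], best.get("confidence")


# Illustrative data only, not real API output
sample = {
    "alternative": [
        {"transcript": "call Paul on his mobile number", "confidence": 0.92},
        {"transcript": "call Paul on his mobile numbers"},
    ],
    "final": True,
}
print(best_alternative(sample))
```

Calling ```best_alternative(google_result)``` returns ```None``` when recognition failed, so later cells can test for that before passing text on.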


Next we define a function ```recognize_vosk(audio_data, ..)``` that runs the recognition and returns the result. It is good programming practice to pass all necessary objects as parameters rather than relying on global variables from the Jupyter Notebook. This allows the function to be copied into other notebooks without introducing global dependencies. The Vosk recognizer runs locally without sending data to the cloud. It requires the ```vosk``` package (```pip install vosk```) and a model, which SpeechRecognition expects in a folder named ```model``` in the current working directory. Therefore, we first download a model from https://alphacephei.com/vosk/models:

In [None]:
import os
import urllib.request
import zipfile

# Download and unpack the model only if the 'model' folder does not exist yet
if not os.path.exists('model'):
    model_name = 'vosk-model-small-en-us-0.15.zip'
    model_url = 'https://alphacephei.com/vosk/models/' + model_name
    urllib.request.urlretrieve(model_url, model_name)

    with zipfile.ZipFile(model_name, 'r') as zip_ref:
        zip_ref.extractall()

    # The archive unpacks into a folder named after the model; rename it to 'model'
    os.rename(model_name.replace('.zip', ''), 'model')

In [None]:
import json
from typing import Optional

def recognize_vosk(audio_data: sr.AudioData, recognizer: sr.Recognizer, language: str = 'en') -> Optional[dict]:
    try:
        # recognize_vosk returns a JSON string, which we parse into a dict
        text = recognizer.recognize_vosk(audio_data, language=language)
        return json.loads(text)
    except sr.RequestError:
        print("[Vosk] RequestError: The Vosk recognizer could not process the request")
    except sr.UnknownValueError:
        print("[Vosk] UnknownValueError: Sorry, I do not understand")
    return None

vosk_result = recognize_vosk(audio_data, recognizer)
print(f"[Vosk] Recognition Result: {vosk_result}")
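
A failed recognition does not necessarily show up as an exception. The sketch below assumes that the parsed Vosk result is a dictionary with a ```text``` field and that silence or unintelligible audio yields an empty string there; the helper name ```vosk_transcript``` is hypothetical.

```python
from typing import Any, Optional


def vosk_transcript(result: Any) -> Optional[str]:
    """Return the recognized text, or None if there is nothing usable."""
    # None means recognize_vosk caught an exception
    if not isinstance(result, dict):
        return None
    text = str(result.get("text", "")).strip()
    # Treat an empty transcript like a failed recognition
    return text or None


# Illustrative inputs only, not real recognizer output
print(vosk_transcript({"text": "call paul on his mobile number"}))
print(vosk_transcript({"text": ""}))
```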

## NLU

We would like to analyze the results from Google and Vosk with Stanza, the Stanford NLP Group's Python library for natural language analysis: https://stanfordnlp.github.io/stanza/ . Install it:


In [None]:
pip install stanza

We build a Stanza pipeline from the processors tokenize, pos, lemma, depparse, and ner to interpret the results. For more information about the Stanza pipeline see: https://stanfordnlp.github.io/stanza/neural_pipeline.html

In [None]:
import stanza

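nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,lemma,depparse,ner')

# Either recognizer may have returned nothing usable, so check before indexing
if google_result and google_result.get('alternative'):
    print("Google NLU result: ", nlp(google_result['alternative'][0]['transcript']))

if vosk_result and vosk_result.get('text'):
    print("Vosk NLU result: ", nlp(vosk_result['text']))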
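
Printing the whole document is verbose. A more compact view can be sketched as follows, assuming the object returned by ```nlp(...)``` exposes ```sentences```, each with ```words``` that carry ```text```, ```lemma```, ```upos```, ```head``` (1-based index of the governor, 0 for the root) and ```deprel```, plus ```ents``` with ```text``` and ```type```. The helper name ```summarize``` is hypothetical, and the stand-in objects below only mimic that shape so the cell runs without a model; they are not real Stanza output.

```python
from types import SimpleNamespace as NS


def summarize(doc):
    """Return one line per word (with its governor) and one line per entity."""
    lines = []
    for sentence in doc.sentences:
        for word in sentence.words:
            # head is 1-based within the sentence; 0 marks the root
            governor = sentence.words[word.head - 1].text if word.head > 0 else "ROOT"
            lines.append(f"{word.text}\t{word.lemma}\t{word.upos}\t{word.deprel} -> {governor}")
    for ent in doc.ents:
        lines.append(f"entity: {ent.text} ({ent.type})")
    return lines


# Stand-in objects with the assumed attribute names, not produced by Stanza
words = [
    NS(text="Call", lemma="call", upos="VERB", head=0, deprel="root"),
    NS(text="Paul", lemma="Paul", upos="PROPN", head=1, deprel="obj"),
]
doc = NS(sentences=[NS(words=words)], ents=[NS(text="Paul", type="PERSON")])
print("\n".join(summarize(doc)))
```

With a loaded pipeline, the same function can be applied to a real document, for example ```summarize(nlp(vosk_result['text']))```.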