# LAB 5 - Voice for our chatbot

In [2]:
#!pip install PyJWT==1.7.1
#### The 'ibm_watson' package acts as a wrapper. It removes a lot of the hard work, especially for the speech services.
#### The 'bs4' (BeautifulSoup4) package enables us to take the output from Watson Assistant and strip away the HTML, to keep only the raw text.
#!pip install ibm_watson bs4

## Import the right modules - For this lab, you'll need to import the following:

os - to interact with the filesystem (deleting each audio file once it has been read).

glob.glob - to find audio files.

bs4 - to extract text from HTML.

IPython - to play audio from Watson Text to Speech from within the notebook.

ibm_cloud_sdk_core.authenticators.IAMAuthenticator - to help with API Key-based authentication.

ibm_watson:

    a) SpeechToTextV1 - the Speech to Text service wrapper.

    b) AssistantV2 - the Assistant service wrapper.

    c) TextToSpeechV1 - the Text to Speech service wrapper.

In [3]:
import os
from glob import glob
from bs4 import BeautifulSoup
import IPython
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import SpeechToTextV1
from ibm_watson import AssistantV2
from ibm_watson import TextToSpeechV1

### Implementing Speech to Text
To implement the **Speech to Text** service, you first need to instantiate your service wrapper. To do so, create a new instance of "SpeechToTextV1". You'll need to pass your API key through an IAMAuthenticator, and then set the endpoint URL, which you can find just under the API Key on the service instance page on IBM Cloud.
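As a side note on the API key and endpoint URL mentioned above, here is a minimal sketch of reading them from environment variables instead of typing them into the notebook. The helper name `load_watson_credentials` and the `<PREFIX>_APIKEY` / `<PREFIX>_URL` variable names are assumptions made for illustration, not something this lab requires.

```python
import os

def load_watson_credentials(prefix, env=os.environ):
    """Return (api_key, service_url) read from <PREFIX>_APIKEY and <PREFIX>_URL.

    Both the helper and the variable naming scheme are hypothetical.
    """
    api_key = env.get(f"{prefix}_APIKEY")
    url = env.get(f"{prefix}_URL")
    if not api_key or not url:
        raise KeyError(f"Set {prefix}_APIKEY and {prefix}_URL before running this cell")
    return api_key, url

# Possible usage with the wrapper described above (requires real credentials):
# api_key, url = load_watson_credentials("SPEECH_TO_TEXT")
# recognition_service = SpeechToTextV1(authenticator=IAMAuthenticator(api_key))
# recognition_service.set_service_url(url)
```

Keeping keys out of the notebook means it can be shared without exposing them.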

Next, define a function called "recognize_audio()". This function is simple: it waits for a new audio file to appear in the current working directory (using SPEECH_EXTENSION). As soon as one appears, it reads the file, deletes it from the filesystem, and passes the audio to Watson, which responds with JSON.

To parse this JSON, you navigate the hierarchy to get to the transcription that Watson is most confident in. This is how it's done:

    "["results"][0]" - this will get the first set of results from Watson's response.

    "["alternatives"][0]" - of all the alternative transcriptions, it'll get the first (most likely) one.

    "["transcript"]" - of all the data Watson returns, only take the transcript string ("str" type in Python).
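The navigation above can be tried without calling the service. The sketch below uses a hand-written dictionary in the same shape; its values are made up. The helper name `best_transcript` and the empty-result guard are additions for illustration, on the assumption that the "results" list can be empty when no speech is recognized.

```python
# Hand-written response in the shape described above; the values are invented.
sample_result = {
    "results": [
        {
            "alternatives": [
                {"transcript": "hello watson ", "confidence": 0.94},
            ],
            "final": True,
        }
    ],
    "result_index": 0,
}

def best_transcript(result):
    """Return the top transcript, or an empty string when nothing was recognized."""
    results = result.get("results", [])
    if not results:
        return ""
    # First result set, first (most likely) alternative, transcript string only.
    return results[0]["alternatives"][0]["transcript"].strip()

print(best_transcript(sample_result))
```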

In [4]:
recognition_service = SpeechToTextV1(IAMAuthenticator('FJDFwpG1AC42sE-qTnYEvDiUmrrpgHe7y87159y_kNeC'))
recognition_service.set_service_url('https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/62d3030a-919d-4c8e-8217-8d1fb9c10b17')
SPEECH_EXTENSION = "*.webm"
SPEECH_AUDIOTYPE = "audio/webm"

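import io
import time

def recognize_audio():
    # Poll (without spinning the CPU) until a recording appears in the working directory.
    while len(glob(SPEECH_EXTENSION)) == 0:
        time.sleep(0.1)
    filename = glob(SPEECH_EXTENSION)[0]
    # Read the audio into memory so the file can be closed and deleted before the request.
    with open(filename, "rb") as audio_file:
        audio = io.BytesIO(audio_file.read())
    os.remove(filename)  # portable replacement for shelling out to "rm"
    result = recognition_service.recognize(audio=audio, content_type=SPEECH_AUDIOTYPE).get_result()
    return result["results"][0]["alternatives"][0]["transcript"]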
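The waiting step in "recognize_audio()" never gives up if no recording arrives. As one possible variation, here is a sketch of the same polling idea with a timeout. The helper name `wait_for_file` and its `timeout` and `poll_interval` parameters are hypothetical and not part of the lab.

```python
import time
from glob import glob

def wait_for_file(pattern, timeout=30.0, poll_interval=0.1):
    """Return the first path matching pattern, or None if none appears within timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        matches = sorted(glob(pattern))
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            return None
        # Sleep briefly so the loop does not occupy a CPU core while waiting.
        time.sleep(poll_interval)
```

Returning None on timeout lets the calling cell decide whether to retry or stop, rather than blocking the notebook indefinitely.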


### Conversing with Watson Assistant - 
To make it easier to communicate with the **Assistant** service, let's define a helper function! This function will take some text from the user and return Watson's response. Before it can be defined, we need to instantiate the wrapper around the Assistant service itself. To do so, create a new instance of "AssistantV2". You'll need to provide your **API Key** via an IAMAuthenticator through the "authenticator" argument. You'll also need to provide a **version of the AssistantV2 service** - in this case, we're using "2019-02-28". You should check the documentation for the current version. You'll also need to define the **Assistant ID of your assistant**. Finally, you'll need to specify your **endpoint URL** - you can find this on your service instance page right under the API Key.

#### Finally, we'll go ahead and ask the Assistant to create a new "session". 
With a session, Watson can automatically keep track of the context of a conversation. This means you don't need to handle the context and pass it back and forth with Watson manually. To differentiate between sessions, each one has a session ID, which we store in "session_id". You can now define the "message_assistant" function. The working of this function is simple:

    1 Message the assistant with the user's utterance and the current session ID, and get a JSON response.

    2 Return the raw text extracted from the HTML of the first response that Watson returned.

In [6]:
assistant = AssistantV2(version='2019-02-28', authenticator=IAMAuthenticator('Je3R-nT3a2GZyoA06_LLEIMh_O1l_HYVaY29OmvOxNa1'))
assistant.set_service_url('https://api.us-south.assistant.watson.cloud.ibm.com/instances/77c27d91-0c1c-4009-9e8d-071e32681d25')
ASSISTANT_ID = "2fde6b1d-9c89-45a6-8b7a-a53db38c27f6"
session_id = assistant.create_session(assistant_id =ASSISTANT_ID).get_result()["session_id"]

def message_assistant(text):
    response = assistant.message(assistant_id=ASSISTANT_ID,
                                 session_id=session_id,
                                 input={'message_type': 'text', 'text': text}).get_result()
    # Name the parser explicitly so BeautifulSoup does not warn about guessing one.
    return BeautifulSoup(response["output"]["generic"][0]["text"], "html.parser").get_text()
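The lookup `response["output"]["generic"][0]["text"]` assumes the first entry Watson returns is a text response. Here is a minimal sketch of a more defensive lookup, run against a hand-written response with made-up content. The helper name `first_text_reply` is hypothetical.

```python
def first_text_reply(response):
    """Return the text of the first 'text' entry in output.generic, or '' if there is none."""
    for item in response.get("output", {}).get("generic", []):
        if item.get("response_type") == "text":
            return item.get("text", "")
    return ""

# Hand-written response in the shape navigated above; the content is invented.
sample_response = {
    "output": {
        "generic": [
            {"response_type": "text", "text": "<p>Hello! How can I help?</p>"},
        ]
    }
}

print(first_text_reply(sample_response))
```

The returned string still contains any HTML, so it would be passed through BeautifulSoup exactly as in "message_assistant".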


### Hearing Watson's response - 
To enable a truly end-to-end intuitive and interactive experience, let's use **Text to Speech** to synthesize audio and have Watson speak! Start by initializing the "TextToSpeechV1" wrapper. Pass it your API Key through an IAMAuthenticator, and your API endpoint, which you can find right under the API Key in your service dashboard on IBM Cloud. Then, define a new function called "speak_text". This is what it'll do:

    1 Open a new file "temp.wav".

    2 Take the text that Watson needs to speak and pass it to the "synthesis_service.synthesize()" function. Tell it we want WAV audio back, and tell it we want the "en-US_AllisonV3Voice" voice. You can see more voices [here](https://cloud.ibm.com/apidocs/text-to-speech?cm_mmc=Email_Newsletter--DeveloperEd%2BTech--WWWW--SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-CB0106EN-SkillsNetwork-20719128&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ).

    3 Write Watson's response to the "temp.wav" file.

    4 Play the "temp.wav" file.

In [7]:
synthesis_service = TextToSpeechV1(IAMAuthenticator('9ikPoYJv4fNVPM9Wu-8JCLM0AryW_ANGt97YxZZCnfRs'))
synthesis_service.set_service_url('https://api.us-south.text-to-speech.watson.cloud.ibm.com/instances/6e6b30fe-6a58-4216-bdfe-2a286a39eba9')

def speak_text(text):
    with open('temp.wav', 'wb') as audio_file:
        response = synthesis_service.synthesize(text, accept='audio/wav', voice="en-US_AllisonV3Voice").get_result()
        audio_file.write(response.content)
    return IPython.display.Audio("temp.wav", autoplay=True)


### Putting the pieces together - 
Because of the way these functions work, putting them together is as easy as chaining them together! By calling "recognize_audio()", you're waiting for the user to provide some input. Then, that is passed to the "message_assistant()" function. The output of that function is passed to "speak_text", which provides output to the user! 

### To interact with the chatbot in this lab, simply run this cell for every utterance. To be specific:

    1 Run the following cell.

    2 Record audio.

    3 Wait until you hear Watson's response.

    4 Repeat until you're done.

In [None]:
speak_text(message_assistant(recognize_audio()))
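Re-running the cell by hand is the approach this lab uses. As an alternative, the chain could be wrapped in a loop. This is a sketch only: the helper name `converse`, the `max_turns` limit, and the `stop_word` check are all hypothetical. The three steps are passed in as functions so the loop can be tried with stand-ins before connecting the real services.

```python
def converse(listen, reply, speak, max_turns=3, stop_word="goodbye"):
    """Run listen -> reply -> speak until max_turns is reached or the user says stop_word."""
    transcript = []
    for _ in range(max_turns):
        heard = listen()        # e.g. recognize_audio
        answer = reply(heard)   # e.g. message_assistant
        speak(answer)           # e.g. a wrapper around speak_text
        transcript.append((heard, answer))
        if stop_word in heard.lower():
            break
    return transcript

# Stand-in functions so the loop can be exercised without the Watson services.
scripted = iter(["hello", "goodbye"])
print(converse(lambda: next(scripted), lambda text: "You said: " + text, print))
```

With the real functions, "speak_text" returns an audio object rather than playing it, so inside a loop it would likely need to be shown explicitly, for example `speak=lambda t: IPython.display.display(speak_text(t))`.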

### That's all! Now, by running this cell every time you wish to speak to Watson, you'll be able to interact in a natural, vocal manner.