<h1 style="text-align: center;text-transform: uppercase;">Conversational Based Agent</h1>

<br>

In this project, you will build an end-to-end voice conversational agent that takes spoken audio as input and synthesizes a spoken response. The chatbot agent runs locally on your computer, although the speech recognition step calls the Azure Speech service.

<img style="width:550px; height:300px;" src="assets/intro.png">

This Jupyter notebook consists of the following parts:
1. __Speech Recognition:__ <br>In this part, you will create a speech recognizer that converts your voice into text.<br><br>
2. __Chatbot:__ <br>This is the core of your conversational agent. You will build a chatbot that answers your questions. <br><br>
3. __Text to Speech:__ <br>After you get the answer from your chatbot, it should be converted into speech, which is what you will build in this part. <br><br>
4. __Finalize your Conversational Based Agent:__ <br>In the final step, you will put everything together and create your Conversational Based Agent.

<br>

# 1. Speech Recognition

---

In this part, we will use <a href="https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/">Microsoft Azure</a> for performing the speech recognition. Using the Speech service is easy and affordable. 


<br>

### 1.1. Create your Azure Account

---


Before doing any speech recognition task, you need to follow these steps to set up your Microsoft Azure account:

1. If you do not have a Microsoft account, you can sign up for one free of charge at the <a href="https://account.microsoft.com/account">Microsoft account portal</a>. <br><br>

2. Once you have a Microsoft account, go to the <a href="https://azure.microsoft.com/en-gb/free/ai/">Azure sign-up page</a>, select Start free, and create a new Azure account using a Microsoft account. <br><br>

3. Sign in to the <a href="https://portal.azure.com/">Azure portal</a> using your Microsoft account. <br><br>

4. Select Create a resource at the top left of the portal. If you do not see Create a resource, you can always find it by selecting the collapsed menu in the upper left:

<img width="500px" src="assets/collapsed-nav.png">
<br><br>

5. In the New window, type "speech" in the search box and press ENTER. <br><br>

6. In the search results, select Speech.

<img width="700px" src="assets/speech-search.png">
<br><br>

7. Select Create, then:
    - 7.1. Give a unique name for your new resource.
    - 7.2. Choose the Azure subscription that the new resource is associated with to determine how the fees are billed.
    - 7.3. Choose the <a href="https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/regions">region</a>.
    - 7.4. Choose either a free (F0) or paid (S0) pricing tier.
    - 7.5. Create a new resource group for this Speech subscription or assign the subscription to an existing resource group. Resource groups help you keep your various Azure subscriptions organized.
    - 7.6. Select Create. This will take you to the deployment overview and display deployment progress messages.
<br><br>

It takes a few moments to deploy your new Speech resource. Once deployment is complete, select __Go to resource__ and in the left navigation pane select __Keys__ to display your Speech service subscription keys. Each subscription has two keys; you can use either key in your application. To copy a key into your code editor or another location, select the copy button next to the key, then switch windows and paste the clipboard contents where you need them.

<br>

### 1.2. Perform the Speech Recognition Task

---

This section shows you how to use the Speech Service through the Speech SDK for Python. It illustrates how the SDK can be used to recognize speech from microphone input.

First, set up some general items. Import the Speech SDK for Python:

In [42]:
# Import the libraries
import azure.cognitiveservices.speech as speechsdk

Set up the subscription info for the Speech Service:

In [43]:
# Subscription key and service region for the Speech service (replace the placeholders with your own values)
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"

Create an instance of a speech config with specified subscription key and service region. Replace with your own subscription key and service region (e.g., "westus").

In [44]:
# Create an instance of a speech config with specified subscription key and service region
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
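
Hard-coding a subscription key in a notebook makes it easy to leak the key when the notebook is shared. Below is a minimal sketch of one alternative, assuming the key and region are stored in environment variables. The variable names `SPEECH_KEY` and `SPEECH_REGION` and the helper `load_speech_credentials` are hypothetical choices for this illustration; they are not names required by Azure or by the Speech SDK.

```python
import os


def load_speech_credentials(env=os.environ):
    """Read the Speech key and region from environment variables.

    SPEECH_KEY and SPEECH_REGION are hypothetical names chosen for this sketch.
    """
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")

    # Fail early with a readable message instead of sending an empty key to the service
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION before running this cell.")

    return key, region
```

The returned pair could then be assigned to `speech_key` and `service_region` in place of the string placeholders.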

Create a recognizer with the given settings. Since no explicit audio config is specified, the default microphone will be used (make sure the audio settings are correct).

In [45]:
# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config = speech_config)

The call below starts speech recognition and returns after a single utterance is recognized. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed. The call returns the recognized text as its result.

__Note:__ Since `recognize_once()` returns only a single utterance, it is suitable only for single shot recognition like command or query. For long-running multi-utterance recognition, use `start_continuous_recognition()` instead.

In [46]:
# Speech recognition for a single utterance 
print("Say something...")
result = speech_recognizer.recognize_once()
print("Finished!")

Say something...
Finished!


In [59]:
print("Recognized speech: \t", result.text)

Recognized speech: 	 Hello, how are you doing?


<br>

# 2. Chatbot

---


In this part, you will create a deep learning based conversational agent. This agent will be able to interact with users and understand their questions. More specifically, you will start with loading the dataset, cleaning and preprocessing them, and then you will feed them into a neural network.

<br>

### 2.1. Load and Clean the Chatterbot Dataset

---

In this project, we have provided you with multiple dataset files. Each of these files contains conversations about a specific topic, such as humor, food, movies, science, or history. You can read the description of each dataset below:

| Name of Dataset | Description |
| :----:| :----: |
| botprofile.yml | Personality of Your Chatbot |
| humor.yml | Joke and Humor |
| emotion.yml | Emotional Conversations |
| politics.yml | Political Conversations |
| ai.yml | General Questions about AI |
| computers.yml | Conversations about Computers |
| history.yml | Q&A about Historical Facts and Events |
| psychology.yml | Psychological Conversations |
| food.yml | Food Related Conversations |
| literature.yml | Conversations about Different Books, Authors, Genres |
| money.yml | Conversations about Money, Investment, Economy |
| trivia.yml | Trivia Questions and Answers |
| gossip.yml | Gossipy Conversations |
| conversations.yml | Common Conversations |
| greetings.yml | Different Ways of Greeting |
| sports.yml | Conversations about Sports |
| movies.yml | Conversations about Movies |
| science.yml | Conversations about Science |
| health.yml | Health Related Questions and Answers |


Feel free to modify these datasets in the way you want the chatbot to behave. 

In [6]:
# Import the libraries
import yaml
from yaml import Loader
import glob
import datetime
from tqdm import tqdm

In [13]:
# Function for loading all of the yml files
def load_chatterbot_dataset():
    
    # Initialize empty lists for questions and answers
    questions, answers = [], []
    
    # Get the list of all dataset names
    dataset_names = glob.glob("datasets/chatterbot/*.yml")
    
    # Iterate through each dataset name (tqdm shows a progress bar)
    for i_dataset_name in tqdm(dataset_names):
        
        # Load the conversations stored in the dataset
        with open(i_dataset_name) as file:
            conversations = yaml.load(file, Loader = Loader)["conversations"]
            
        # Iterate through each conversation
        for i_conversation in conversations:
            
            # If length is two
            if len(i_conversation) == 2:
                
                # Append the question to 'questions' list
                questions.append(i_conversation[0])
                
                # Append the answer to 'answers' list
                answers.append(i_conversation[1])
            
            # If length is more than two
            elif len(i_conversation) > 2:
                
                # Pair the first line with every later line of the conversation
                for index in range(len(i_conversation) - 1):
                    
                    # Append the question and answer
                    questions.append(i_conversation[0])
                    answers.append(i_conversation[index+1])
                    
    return questions, answers

In [14]:
# Get the questions and answers
questions_chatterbot, answers_chatterbot = load_chatterbot_dataset()

100%|██████████| 19/19 [00:00<00:00, 52.21it/s]


In [17]:
print("Total Question & Answers: ", len(questions_chatterbot))

Total Question & Answers:  869
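
To make the pairing rule in `load_chatterbot_dataset()` concrete, here is a small sketch that applies the same rule to a single conversation: the first line is treated as the question and every later line becomes one answer to it. The helper name `pair_conversation` and the sample conversation are made up for this illustration.

```python
def pair_conversation(conversation):
    """Pair the first line of a conversation with every later line."""
    first, replies = conversation[0], conversation[1:]
    return [(first, reply) for reply in replies]


# A made-up conversation with one question and two possible replies
pairs = pair_conversation([
    "What is AI?",
    "Artificial intelligence.",
    "A branch of computer science.",
])

for question, answer in pairs:
    print(question, "->", answer)
```

A conversation with three lines therefore contributes two question-answer pairs, both sharing the same question.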


<br>

### 2.2. Load and Clean the Stanford Dataset

---

The second dataset that we are going to use is the <a href="https://rajpurkar.github.io/SQuAD-explorer">Stanford Question Answering Dataset (SQuAD 2.0)</a>. SQuAD is a reading comprehension dataset and a standard benchmark for QA models. The dataset is publicly available on the website.

In [19]:
# Import the libraries
import json

In [20]:
# Load the dataset
with open('datasets/stanford question answering/dev-v2.0.json', 'r') as f:
    train_data = json.load(f)
    
# Flatten the topics into a single list of paragraphs
train_data = [item for topic in train_data['data'] for item in topic['paragraphs'] ]

In [21]:
def load_stanford_dataset():
    # Initialize the total questions and answers
    total_questions, total_answers = [], []

    # Iterate through train_data
    for td in train_data:

        # Get the list of questions and answers
        qas = td["qas"]

        # Iterate through each question and answer
        for i_qas in qas:

            # Get the question
            question = i_qas["question"]
            
            # Use the first entry of "answers" if there is one
            if i_qas.get("answers"):
                answer = i_qas["answers"][0]["text"]
                
            # Otherwise fall back to the first entry of "plausible_answers"
            elif i_qas.get("plausible_answers"):
                answer = i_qas["plausible_answers"][0]["text"]
                
            # If neither exists, skip this question
            else:
                continue
            
            # Append the questions and answers into the total questions and answers
            total_questions.append(question)
            total_answers.append(answer)
            
    return total_questions, total_answers

In [22]:
# Get the questions and answers
questions_stanford, answers_stanford = load_stanford_dataset()

In [23]:
print("Total Question & Answers: ", len(questions_stanford))

Total Question & Answers:  11858
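
The nesting that `load_stanford_dataset()` walks through can be hard to picture from the code alone. The sketch below runs the same extraction on a tiny hand-written dictionary that imitates the layout used above (`data` → `paragraphs` → `qas`). The sample content and the helper name `extract_pairs` are invented for this illustration and are not taken from SQuAD itself.

```python
# Hand-written stand-in for the JSON layout (not real SQuAD content)
sample = {
    "data": [{
        "paragraphs": [{
            "qas": [
                {"question": "Q1?", "answers": [{"text": "A1"}]},
                {"question": "Q2?", "answers": [], "plausible_answers": [{"text": "A2"}]},
                {"question": "Q3?", "answers": []},
            ]
        }]
    }]
}


def extract_pairs(squad):
    """Collect (question, first answer) pairs, skipping questions without any answer."""
    pairs = []
    for topic in squad["data"]:
        for paragraph in topic["paragraphs"]:
            for qa in paragraph["qas"]:
                # Prefer "answers"; fall back to "plausible_answers" when it is empty
                candidates = qa.get("answers") or qa.get("plausible_answers") or []
                if candidates:
                    pairs.append((qa["question"], candidates[0]["text"]))
    return pairs


pairs = extract_pairs(sample)
print(pairs)
```

The third question has neither kind of answer, so it is dropped, which mirrors the `continue` branch in the function above.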


<br>

### 2.3. Combine all Datasets

---

Now let's combine the Chatterbot dataset with Stanford Question Answering Dataset.

In [26]:
# Combine datasets
questions = questions_chatterbot + questions_stanford
answers = answers_chatterbot + answers_stanford
print("Total Question & Answers: ", len(questions))

Total Question & Answers:  12727


In [28]:
# Get a smaller sample of dataset (for memory purposes)
sample_size = 8500
questions = questions[:sample_size]
answers = answers[:sample_size]
print("Total Question & Answers: ", len(questions))

Total Question & Answers:  8500


<br>

### 2.4. Data Preprocessing

---

After cleaning the dataset, you should preprocess it by following the steps below:

1. Lower case the text.
2. Decontract the text (e.g. she's -> she is, they're -> they are, etc.).
3. Remove the punctuation (e.g. !, ?, $, %, #, @, ^, etc.).
4. Tokenization.
5. Pad the sequences to be the same length.

In [45]:
# import the libraries
import numpy as np
import contractions
import re
from tensorflow.keras import preprocessing, utils
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [46]:
# Function for preprocessing the given text
def preprocess_text(text):
    
    # Lowercase the text
    text = text.lower()
    
    # Decontracting the text (e.g. it's -> it is)
    text = contractions.fix(text)
    
    # Remove the punctuation
    text = re.sub(r"[^a-zA-Z0-9]", " ", text)
    
    return text
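
To see what the first three preprocessing steps do to a sentence without installing anything, here is a standard-library-only sketch. It is an approximation: it replaces the `contractions` package with a tiny hand-written lookup table, and it adds one extra step (collapsing repeated spaces) that `preprocess_text()` does not perform. The name `preprocess_sketch` and the table contents are assumptions made for this example.

```python
import re

# Tiny hand-written contraction table (assumption for this sketch only;
# the notebook relies on the `contractions` package instead)
CONTRACTIONS = {"she's": "she is", "isn't": "is not", "they're": "they are"}


def preprocess_sketch(text):
    # 1. Lowercase the text
    text = text.lower()

    # 2. Expand the contractions listed in the table
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)

    # 3. Replace every character that is not a letter or digit with a space
    text = re.sub(r"[^a-z0-9]", " ", text)

    # Extra step, not in the notebook: collapse runs of spaces
    return " ".join(text.split())


print(preprocess_sketch("She's here, isn't she?"))
```

Because punctuation is replaced by spaces rather than deleted, the notebook's own output keeps double spaces where a comma was followed by a space, as you can see in the sample answers printed further down.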

In [47]:
# Preprocess the questions
questions_preprocessed = []
for i_question in tqdm(questions):
    questions_preprocessed.append(preprocess_text(i_question))
    
# Preprocess the answers
answers_preprocessed = []
for i_answer in tqdm(answers):
    answers_preprocessed.append(preprocess_text(i_answer))    

100%|██████████| 869/869 [00:00<00:00, 75283.49it/s]
100%|██████████| 869/869 [00:00<00:00, 65740.49it/s]


In [48]:
# Take a look at the preprocessed questions and answers
for i in range(4):
    print("Question {}: \n".format(i), questions_preprocessed[i])
    print("")
    print("Answer {}: \n".format(i), answers_preprocessed[i])
    print("--------------------------------------------------------------------------")

Question 0: 
 have you read the communist

Answer 0: 
 yes  marx had made some interesting observations 
--------------------------------------------------------------------------
Question 1: 
 what is a government

Answer 1: 
 ideally it is a representative of the people 
--------------------------------------------------------------------------
Question 2: 
 what is greenpeace

Answer 2: 
 global organization promoting environmental activism 
--------------------------------------------------------------------------
Question 3: 
 what is capitalism

Answer 3: 
 the economic system in which all or most of the means of production and distribution  as land  factories  railroads  etc   are privately owned and operated for profit  originally under fully competitive conditions 
--------------------------------------------------------------------------


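To ensure that every training example is a string, we first filter out every question-answer pair in which either item is not a string.

In [49]:
# Keep only the pairs in which both the question and the answer are strings
questions_filtered, answers_with_tags = [], []
for i_question, i_answer in zip(questions, answers):
    if isinstance(i_question, str) and isinstance(i_answer, str):
        questions_filtered.append(i_question)
        answers_with_tags.append(i_answer)
questions = questions_filtered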

After preprocessing the dataset, we add a start tag (e.g. `<START>`) and an end tag (e.g. `<END>`) to each answer. The tags are added only to answers, not to questions, because the Seq2Seq decoder needs a start token to begin generating and an end token to know when to stop.

In [50]:
# Add <START> and <END> tag to each sentence
answers = list()
for i in range(len(answers_with_tags)):
    answers.append('<START> ' + answers_with_tags[i] + ' <END>')

In [51]:
answers[:5]

['<START> yes, marx had made some interesting observations. <END>',
 '<START> ideally it is a representative of the people. <END>',
 '<START> global organization promoting environmental activism. <END>',
 '<START> the economic system in which all or most of the means of production and distribution, as land, factories, railroads, etc., are privately owned and operated for profit, originally under fully competitive conditions. <END>',
 '<START> an established system of political administration by which a nation, state, district, etc. is governed. <END>']

Now it's time to tokenize our dataset. We use the Keras `Tokenizer` class, which vectorizes a text corpus by turning each text into either a sequence of integers (each integer being the index of a token in a dictionary) or a vector whose coefficient for each token can be binary, based on word count, based on tf-idf, etc.


In [52]:
# Initialize the tokenizer (its default filters strip punctuation, so "<START>" is stored as "start")
tokenizer = preprocessing.text.Tokenizer()

# Fit the tokenizer to questions and answers
tokenizer.fit_on_texts(questions + answers)

# Get the total vocab size
VOCAB_SIZE = len(tokenizer.word_index) + 1

print( 'VOCAB SIZE : {}'.format(VOCAB_SIZE))

VOCAB SIZE : 1975
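
The sketch below imitates, in plain Python, the idea behind turning texts into integer sequences: count the words, give the most frequent word the smallest index, and reserve index 0 for padding. It is a simplified stand-in and not the Keras implementation; the helper names and the regular expression are assumptions of this example. It also shows why the tags are later looked up as `start` and `end`: once the text is lowercased and the angle brackets are dropped, `<START>` becomes the plain word `start`.

```python
import re
from collections import Counter

WORD_PATTERN = r"[a-z0-9']+"  # simplified word pattern used only in this sketch


def build_word_index(texts):
    """Map each word to an integer, most frequent first (0 is left free for padding)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(WORD_PATTERN, text.lower()))
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_ids(texts, word_index):
    """Convert each text into a list of word indexes, ignoring unknown words."""
    return [
        [word_index[w] for w in re.findall(WORD_PATTERN, text.lower()) if w in word_index]
        for text in texts
    ]


corpus = ["<START> hi there <END>", "hi"]
word_index = build_word_index(corpus)
print(word_index)
print(texts_to_ids(corpus, word_index))
```

As in the notebook, the vocabulary size would be `len(word_index) + 1`, the extra slot being the padding index 0.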


In [54]:
### encoder input data

# Tokenize the questions
tokenized_questions = tokenizer.texts_to_sequences(questions)

# Get the length of longest sequence
maxlen_questions = max([len(x) for x in tokenized_questions])

# Pad the sequences
padded_questions = pad_sequences(tokenized_questions, maxlen=maxlen_questions, padding='post')

# Convert the sequences into array
encoder_input_data = np.array(padded_questions)

print(encoder_input_data.shape, maxlen_questions)

(869, 22) 22
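
Post-padding simply appends zeros until every sequence has the same length. A minimal pure-Python sketch of the idea follows; the helper `pad_post` is hypothetical, and unlike the Keras function it cuts over-long sequences at the end, which does not matter here because the target length is the length of the longest sequence.

```python
def pad_post(sequences, maxlen=None, value=0):
    """Right-pad integer sequences with `value` so they all have length `maxlen`."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    return [list(seq[:maxlen]) + [value] * (maxlen - len(seq)) for seq in sequences]


print(pad_post([[5, 9, 2], [7]]))
```

Index 0 is never assigned to a real word, which is what lets the embedding layers later treat it as padding via `mask_zero = True`.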


In [58]:
### decoder input data

# Tokenize the answers
tokenized_answers = tokenizer.texts_to_sequences(answers)

# Get the length of longest sequence
maxlen_answers = max([len(x) for x in tokenized_answers])

# Pad the sequences
padded_answers = pad_sequences(tokenized_answers, maxlen=maxlen_answers, padding='post')

# Convert the sequences into array
decoder_input_data = np.array(padded_answers)

print(decoder_input_data.shape, maxlen_answers)

(869, 45) 45


In [59]:
### decoder_output_data

# Iterate through index of tokenized answers
for i in range(len(tokenized_answers)) :

    # Drop the first token (the start tag) so the target is the decoder input shifted one step ahead
    tokenized_answers[i] = tokenized_answers[i][1:]

# Pad the tokenized answers
padded_answers = pad_sequences(tokenized_answers, maxlen = maxlen_answers, padding = 'post')

# One hot encode
onehot_answers = utils.to_categorical(padded_answers, VOCAB_SIZE)

# Convert to numpy array
decoder_output_data = np.array(onehot_answers)

print(decoder_output_data.shape)

(869, 45, 1975)
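
The relationship between the decoder input and the decoder target is easier to see on a single short answer. In the sketch below, the target is the input shifted left by one position, so at each step the decoder receives one token and is trained to predict the next one. The token ids and the helper `make_decoder_pair` are hypothetical and chosen only for this illustration.

```python
def make_decoder_pair(answer_ids, maxlen, pad_id=0):
    """Return (decoder_input, decoder_target) for one tokenized answer."""
    # Decoder input: the full answer, padded at the end
    decoder_input = answer_ids + [pad_id] * (maxlen - len(answer_ids))

    # Decoder target: the same answer without its first token, padded at the end
    shifted = answer_ids[1:]
    decoder_target = shifted + [pad_id] * (maxlen - len(shifted))

    return decoder_input, decoder_target


# Hypothetical ids: 1 = start tag, 2 = end tag, 7 and 8 = ordinary words
dec_in, dec_out = make_decoder_pair([1, 7, 8, 2], maxlen=5)
print(dec_in)
print(dec_out)
```

In the notebook, the target sequence is additionally one-hot encoded so that it can be compared with the softmax output of the decoder.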


In [60]:
# Saving all the arrays to storage
np.save("enc_in_data.npy", encoder_input_data)
np.save("dec_in_data.npy", decoder_input_data)
np.save("dec_tar_data.npy", decoder_output_data)

In [61]:
# Load all the arrays from storage
encoder_input_data = np.load("enc_in_data.npy")
decoder_input_data = np.load("dec_in_data.npy")
decoder_output_data = np.load("dec_tar_data.npy")
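
The one-hot target array is by far the largest of the three, because its size grows with the number of examples, the answer length, and the vocabulary size at the same time. The rough estimate below uses the shape printed above and assumes 4 bytes per element (32-bit floats); the helper name `dense_array_bytes` is made up for this sketch.

```python
def dense_array_bytes(shape, bytes_per_item=4):
    """Estimate the memory needed by a dense array of the given shape."""
    total = bytes_per_item
    for dim in shape:
        total *= dim
    return total


size = dense_array_bytes((869, 45, 1975))
print(f"{size:,} bytes (about {size / 1024**2:.0f} MiB)")
```

This is a likely reason for sampling the dataset earlier "for memory purposes": multiplying the number of examples by ten multiplies this figure by ten as well, before even counting the larger vocabulary.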

<br>

### 2.5. Train the Seq2Seq Model

---

In this section, we will use an architecture called Sequence to Sequence (or Seq2Seq). This model is used because the length of the input sequence (question) does not have to match the length of the output sequence (answer). The model consists of an encoder and a decoder.
- __Encoder:__ <br> This part of the network reads the input sequence and passes the final state of its recurrent layer to the decoder. <br><br>
- __Decoder:__ <br> This part of the network takes the final state of the encoder's last recurrent layer and uses it as the initial state of the decoder's first recurrent layer.

<br>

<img src="assets/encoder_decoder.png">

<br>

Let's start by importing all the necessary libraries in Keras.

In [62]:
# Import the libraries
import tensorflow.keras
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.activations import softmax
from tensorflow.keras.callbacks import ModelCheckpoint

Below you can play around with hyperparameters for improving the model's accuracy.

In [63]:
# Hyper parameters
BATCH_SIZE = 32
EPOCHS = 100
LEARNING_RATE = 1e-3

In the following block of code, you will implement the encoder. You can follow the steps below:

1.   Create an input for the Encoder.
2.   Create an embedding layer.
3.   Create an LSTM layer which also returns the states.
4.   Get the hidden state (state h) and cell state (state c) inside a list.

In [64]:
### Encoder Input
embed_dim = 200
num_lstm = 200

# Input for encoder
encoder_inputs = Input(shape = (None, ))

# Embedding layer
# Why mask_zero = True? https://www.tensorflow.org/guide/keras/masking_and_padding
encoder_embedding = Embedding(input_dim = VOCAB_SIZE, output_dim = embed_dim, mask_zero = True)(encoder_inputs)

# LSTM layer (that returns states in addition to output)
encoder_outputs, state_h, state_c = LSTM(units = num_lstm, return_state = True)(encoder_embedding)

# Get the states for encoder
encoder_states = [state_h, state_c]



After creating your encoder, it's time to implement the decoder. You can follow the steps below:

1.   Create an input for the decoder.
2.   Create an embedding layer.
3.   Create an LSTM layer that returns states and sequences.
4.   Create a dense layer.
5.   Get the output.

In [65]:
### Decoder

# Input for decoder
decoder_inputs = Input(shape = (None,  ))

# Embedding layer
decoder_embedding = Embedding(input_dim = VOCAB_SIZE, output_dim = embed_dim, mask_zero = True)(decoder_inputs)

# LSTM layer (that returns states and sequences as well)
decoder_lstm = LSTM(units = num_lstm, return_state = True, return_sequences = True)

# Get the output of LSTM layer, using the initial states from the encoder
decoder_outputs, _, _ = decoder_lstm(inputs = decoder_embedding, initial_state = encoder_states)

# Dense layer
decoder_dense = Dense(units = VOCAB_SIZE, activation = softmax) 

# Get the output of Dense layer
output = decoder_dense(decoder_outputs)

Now that you have implemented the encoder and decoder, it's time to create your model, which takes two inputs: the encoder's input and the decoder's input. It then outputs the decoder's output.

In [66]:
# Create the model
model = Model([encoder_inputs, decoder_inputs], output)

In [67]:
# Compile the model
model.compile(optimizer = RMSprop(learning_rate = LEARNING_RATE), loss = "categorical_crossentropy")

In [68]:
# Summary
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, None, 200)    395000      input_1[0][0]                    
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 200)    395000      input_2[0][0]                    
______________________________________________________________________________________________
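
The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for embedding, LSTM, and dense layers with the sizes defined earlier (vocabulary 1975, embedding size 200, 200 LSTM units). Only the embedding rows are visible in the summary shown here, so the LSTM and dense figures are computed from the formulas rather than read from the output.

```python
def embedding_params(vocab_size, embed_dim):
    # One vector of length embed_dim per vocabulary entry
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output
    return input_dim * units + units


print("Embedding:", embedding_params(1975, 200))  # matches the 395000 in the summary
print("LSTM:     ", lstm_params(200, 200))
print("Dense:    ", dense_params(200, 1975))
```

Most of the parameters sit in the layers whose size depends on the vocabulary, so a larger vocabulary makes the model noticeably bigger.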

In [None]:
# Train the model
model.fit(x = [encoder_input_data , decoder_input_data], 
          y = decoder_output_data, 
          batch_size = BATCH_SIZE, 
          epochs = EPOCHS) 

In [30]:
# Save the final model
timestamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
model.save(filepath = './saved models/final_weight.h5') 
print("Model Weight Saved!")

Model Weight Saved!


In [70]:
# Load the final model
model.load_weights('./saved models/final_weight.h5') 
print("Model Weight Loaded!")

Model Weight Loaded!


<br>

### 2.6. Inference

---

Now it's time to use our model for inference. In other words, we will ask a question to our chatbot and it will answer us.

In [71]:
# Function for making inference
def make_inference_models():
    
    # Create a model that takes encoder's input and outputs the states for encoder
    encoder_model = Model(encoder_inputs, encoder_states)
    
    # Create two inputs for decoder which are hidden state (or state h) and cell state (or state c)
    decoder_state_input_h = Input(shape = (num_lstm, ))
    decoder_state_input_c = Input(shape = (num_lstm, ))
    
    # Store the two inputs for decoder inside a list
    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
    
    # Pass the inputs through LSTM layer you have created before
    decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedding, initial_state = decoder_states_inputs)
    
    # Store the outputted hidden state and cell state from LSTM inside a list
    decoder_states = [state_h, state_c]

    # Pass the output from LSTM layer through the dense layer you have created before
    decoder_outputs = decoder_dense(decoder_outputs)

    # Create a model that takes decoder_inputs and decoder_states_inputs as inputs and outputs decoder_outputs and decoder_states
    decoder_model = Model([decoder_inputs] + decoder_states_inputs,
                          [decoder_outputs] + decoder_states)
    
    return encoder_model , decoder_model

In [72]:
# Function for converting strings to tokens
def str_to_tokens(sentence:str):

    # Lowercase the sentence and split it into words
    words = sentence.lower().split()

    # Initialize a list for tokens
    tokens_list = list()

    # Iterate through words
    for word in words:

        # Append the word index inside tokens list, skipping words the tokenizer has never seen
        if word in tokenizer.word_index:
            tokens_list.append(tokenizer.word_index[word])

    # Pad the sequences to be the same length
    return pad_sequences([tokens_list] , maxlen = maxlen_questions, padding = 'post')

In [41]:
# Initialize the model for inference
enc_model , dec_model = make_inference_models()

# Iterate through the number of times you want to ask question
for _ in range(5):

    # Get the input and predict it with the encoder model
    states_values = enc_model.predict(str_to_tokens(preprocess_text(input('Enter question : '))))

    # Initialize the target sequence with zero - array([[0.]])
    empty_target_seq = np.zeros(shape = (1, 1))

    # Update the target sequence with index of "start"
    empty_target_seq[0, 0] = tokenizer.word_index["start"]

    # Initialize the stop condition with False
    stop_condition = False

    # Initialize the decoded words with an empty string
    decoded_translation = ''

    # While stop_condition is false
    while not stop_condition :

        # Predict the (target sequence + the output from encoder model) with decoder model
        dec_outputs , h , c = dec_model.predict([empty_target_seq] + states_values)

        # Get the index for sampled word
        sampled_word_index = np.argmax(dec_outputs[0, -1, :])

        # Initialize the sampled word with None
        sampled_word = None

        # Iterate through words and their indexes
        for word, index in tokenizer.word_index.items() :

            # If the index is equal to sampled word's index
            if sampled_word_index == index :

                # Add the word to the decoded string
                decoded_translation += ' {}'.format(word)

                # Update the sampled word
                sampled_word = word
        
        # If sampled word is equal to "end" OR the length of decoded string is more than what is allowed
        if sampled_word == 'end' or len(decoded_translation.split()) > maxlen_answers:

            # Make the stop_condition to true
            stop_condition = True
            
        # Reset the target sequence to zero - array([[0.]])    
        empty_target_seq = np.zeros(shape = (1, 1))  

        # Update the target sequence with the index of the sampled word
        empty_target_seq[0, 0] = sampled_word_index

        # Get the state values
        states_values = [h, c] 

    # Print the decoded string
    print(decoded_translation[:-3])

Enter question : Hello
 hi 
Enter question : Hi
 hello 
Enter question : How are you doing?
 i am doing well how about you 
Enter question : Can i ask you a question?
 sure ask 
Enter question : What are your interests?
 i am interested in all kinds of things we can talk about anything 
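
The loop above is a form of greedy decoding: at every step it keeps only the most likely next word and feeds it back into the decoder. The sketch below isolates that control flow from Keras by replacing the decoder with a toy function. The names `greedy_decode`, `toy_step`, and the transition table `NEXT` are all hypothetical and exist only for this illustration.

```python
def greedy_decode(step_fn, start_id, end_id, max_len):
    """Repeatedly pick the highest-scoring next token until end_id or max_len."""
    token, state, decoded = start_id, None, []
    while len(decoded) < max_len:
        scores, state = step_fn(token, state)
        token = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if token == end_id:
            break
        decoded.append(token)
    return decoded


# Toy stand-in for the decoder: a fixed table of "most likely next token"
NEXT = {1: 3, 3: 4, 4: 2}


def toy_step(token, state):
    scores = [0.0] * 5
    scores[NEXT[token]] = 1.0
    return scores, state


print(greedy_decode(toy_step, start_id=1, end_id=2, max_len=10))  # [3, 4]
```

Here the end token is left out of the result directly, whereas the notebook loop appends it and then trims the last characters of the decoded string before printing.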


<br>

# 4. Text to Speech

---

In this section, we will use a library called pyttsx3 which does the text-to-speech conversion. Unlike alternative libraries, this works offline and is compatible with both Python 2 and 3.

In [47]:
# Import the libraries
import pyttsx3

In [48]:
# Construct a new TTS engine instance
engine = pyttsx3.init()

In [49]:
# Get all of the voices
voices = engine.getProperty('voices')

# Loop over voices and print their descriptions
for index, voice in enumerate(voices):
    print("Voice {}: ".format(index))
    print(" - ID: %s" % voice.id)
    print(" - Name: %s" % voice.name)
    print(" - Languages: %s" % voice.languages)
    print(" - Gender: %s" % voice.gender)
    print(" - Age: %s" % voice.age)
    print("")

Voice 0: 
 - ID: com.apple.speech.synthesis.voice.Alex
 - Name: Alex
 - Languages: ['en_US']
 - Gender: VoiceGenderMale
 - Age: 35

Voice 1: 
 - ID: com.apple.speech.synthesis.voice.alice
 - Name: Alice
 - Languages: ['it_IT']
 - Gender: VoiceGenderFemale
 - Age: 35

Voice 2: 
 - ID: com.apple.speech.synthesis.voice.alva
 - Name: Alva
 - Languages: ['sv_SE']
 - Gender: VoiceGenderFemale
 - Age: 35

Voice 3: 
 - ID: com.apple.speech.synthesis.voice.amelie
 - Name: Amelie
 - Languages: ['fr_CA']
 - Gender: VoiceGenderFemale
 - Age: 35

Voice 4: 
 - ID: com.apple.speech.synthesis.voice.anna
 - Name: Anna
 - Languages: ['de_DE']
 - Gender: VoiceGenderFemale
 - Age: 35

Voice 5: 
 - ID: com.apple.speech.synthesis.voice.carmit
 - Name: Carmit
 - Languages: ['he_IL']
 - Gender: VoiceGenderFemale
 - Age: 35

Voice 6: 
 - ID: com.apple.speech.synthesis.voice.damayanti
 - Name: Damayanti
 - Languages: ['id_ID']
 - Gender: VoiceGenderFemale
 - Age: 35

Voice 7: 
 - ID: com.apple.speech.synthesis.

In [50]:
### Voice properties    

# Speaking rate in words per minute
engine.setProperty(name = 'rate', value = 180)    

# Volume 0-1
engine.setProperty(name = 'volume', value = 0.9)

# Voice ID (this one is macOS-specific; pick an ID from the list printed above)
en_voice_id = "com.apple.speech.synthesis.voice.daniel.premium"
engine.setProperty('voice', en_voice_id)
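
Since voice IDs differ from one operating system to another, a hard-coded ID may not exist on your machine. The sketch below shows one way to select a voice by language instead, using the `id` and `languages` attributes printed above. The helper `pick_voice_id` is hypothetical, the voices in the example are fake stand-ins, and the format of `languages` varies between speech drivers, so the prefix match may need adjusting on your system.

```python
from types import SimpleNamespace


def pick_voice_id(voices, language_prefix="en"):
    """Return the id of the first voice whose language starts with the given prefix."""
    for voice in voices:
        for lang in getattr(voice, "languages", None) or []:
            # Some drivers may report languages as bytes rather than str
            if isinstance(lang, bytes):
                lang = lang.decode("utf-8", errors="ignore")
            if str(lang).lower().startswith(language_prefix.lower()):
                return voice.id
    return None


# Fake voice objects standing in for the entries of engine.getProperty('voices')
fake_voices = [
    SimpleNamespace(id="voice.alice", languages=["it_IT"]),
    SimpleNamespace(id="voice.alex", languages=["en_US"]),
]
print(pick_voice_id(fake_voices))
```

If the function returns an ID, it can be passed to `engine.setProperty('voice', ...)`; if it returns `None`, the engine's default voice is left unchanged.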

In [51]:
# Convert the text to speech
engine.say("You've got mail!")
engine.say("The pyttsx3 module supports native Windows and Mac speech APIs but also supports espeak, making it the best available text-to-speech package.")
engine.runAndWait() 

<br>

# 5. Finalize your Conversational Based Agent

---

Now it's time to put everything together so you can chain speech-to-text, text-to-text, and text-to-speech in a single step. For this, you will create a button; after you click it, you can speak and your model will speak back to you.

In [91]:
# Import the libraries 
import ipywidgets as widgets
from IPython.display import display
from text_to_text import text_to_text

In [92]:
# Conversational based agent 
def agent():
    button = widgets.Button(description="Click Here for Talking!")
    output = widgets.Output()
    display(button, output)
    def on_button_clicked(b):
        with output:
            # Speech recognition
            print("Say Something...")
            text = speech_recognizer.recognize_once().text
            print(" - YOU SAID: ", text)
            # Text-to-text
            response = text_to_text(text, enc_model, dec_model, str_to_tokens, preprocess_text, tokenizer, maxlen_answers)
            print(" + AGENT: ", response)
            # Text to speech
            engine.say(response)
            engine.runAndWait() 
            print("")
    button.on_click(on_button_clicked)

In [59]:
# Talk to your agent
agent()

Button(description='Click Here for Talking!', style=ButtonStyle())

Output()
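
The button handler above chains three steps: listen, respond, and speak. The sketch below expresses that chain as one small function that receives the three steps as arguments, which makes it possible to try the flow without a microphone or a trained model. It also adds a fallback reply for the case where the recognizer returns no text, which may happen when nothing is heard. The function `run_turn` and the fallback message are assumptions of this example, not part of the notebook.

```python
def run_turn(listen, respond, speak, fallback="Sorry, I did not catch that."):
    """Run one listen -> respond -> speak turn and return the spoken reply."""
    text = (listen() or "").strip()

    # Only query the chatbot when something was actually recognized
    reply = respond(text) if text else fallback

    speak(reply)
    return reply


# Try the flow with simple stand-ins for the three components
spoken = []
reply = run_turn(lambda: "hello", lambda text: text.upper(), spoken.append)
print(reply, spoken)
```

In the notebook, `listen` would wrap `speech_recognizer.recognize_once().text`, `respond` would wrap the `text_to_text` call, and `speak` would call `engine.say` followed by `engine.runAndWait()`.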

# Good Job!