# 1. Applications
- Shopping and e-commerce
- News and content discovery
- Customer service
- Medical
- Legal

## A Simple FAQ (frequently asked questions) Bot
*Amazon ML FAQ to be used for a FAQ bot*

| Questions                                                                                                               | Answer                                                                                                                                                                                                                                                      |
|-------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| What can I do with Amazon Machine Learning? How can I use Amazon Machine Learning? What can Amazon Machine Learning do? | You can use Amazon Machine Learning to create a wide variety of predictive applications. For example, you can use Amazon Machine Learning to help you build applications that flag suspicious transactions, detect fraudulent orders, forecast demand, etc. |
| What algorithm does Amazon Machine Learning use to generate models? How does Amazon Machine Learning build models?      | Amazon Machine Learning currently uses an industry-standard logistic regression algorithm to generate models.                                                                                                                                               |
| Are there limits to the size of the dataset I can use for training? What is the maximum size of training dataset?       | Amazon Machine Learning can train models on datasets up to 100 GB in size.                                                                                                                                                                                  |
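
A minimal sketch of how a table like this could drive an FAQ bot. It assumes plain token-overlap (Jaccard) similarity between the user's query and each stored question, which is only one of many possible matchers. The names `FAQ`, `tokenize`, `jaccard`, and `answer` are hypothetical, the threshold is arbitrary, and the first answer is abridged.

```python
import re

# Each entry pairs several phrasings of a question with one answer.
FAQ = [
    (
        [
            "What can I do with Amazon Machine Learning?",
            "How can I use Amazon Machine Learning?",
            "What can Amazon Machine Learning do?",
        ],
        "You can use Amazon Machine Learning to create a wide variety of predictive applications.",
    ),
    (
        [
            "What algorithm does Amazon Machine Learning use to generate models?",
            "How does Amazon Machine Learning build models?",
        ],
        "Amazon Machine Learning currently uses an industry-standard logistic regression algorithm to generate models.",
    ),
    (
        [
            "Are there limits to the size of the dataset I can use for training?",
            "What is the maximum size of training dataset?",
        ],
        "Amazon Machine Learning can train models on datasets up to 100 GB in size.",
    ),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def answer(query, threshold=0.3):
    """Return the answer whose best-matching question is most similar to the query."""
    query_tokens = tokenize(query)
    best_score, best_answer = 0.0, None
    for questions, reply in FAQ:
        for question in questions:
            score = jaccard(query_tokens, tokenize(question))
            if score > best_score:
                best_score, best_answer = score, reply
    if best_score < threshold:
        return "Sorry, I don't know the answer to that."
    return best_answer


print(answer("What is the maximum size of a training dataset?"))
```

Storing several phrasings per answer, as the table does, lets even a simple matcher like this cover common variations of the same question.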

# 2. A Taxonomy of Chatbots
- Exact answer or FAQ bot with limited conversations
- Flow-based bot
- Open-ended bot

![](images/typeofchatbot.png)

## Goal-Oriented Dialog
People naturally hold conversations to accomplish a goal by seeking relevant information.

## Chitchat
Humans also engage in unstructured, open-domain conversations without any specific goal. These human-human conversations involve free-form, opinionated discussions about various topics.

# 3. A Pipeline for Building Dialog Systems
![](images/pipefordiago.png)

<center>Pipeline for a dialog system</center>

- Speech recognition: algorithms transcribe speech into natural-language text.
- Natural language understanding (NLU): the system analyzes the transcribed text and tries to “understand” it.
- Dialog and task manager: gathers information over the conversation, systematically decides which pieces are important, and devises a strategy for responding.
- Natural language generation: generates a response in a human-readable form according to the strategy devised by the dialog manager.
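
The stages above can be sketched as a chain of functions. This is a toy illustration under stated assumptions: speech recognition is skipped (the input is already text), the NLU step is a keyword rule, and all function names, intents, and responses are hypothetical.

```python
def understand(text):
    """NLU stub: map text to an intent and slots with keyword rules."""
    words = text.lower().split()
    intent = "order_pizza" if "pizza" in words else "unknown"
    slots = {}
    for size in ("small", "medium", "large"):
        if size in words:
            slots["size"] = size
    return {"intent": intent, "slots": slots}


def decide(nlu_result):
    """Dialog manager stub: choose the next action from the NLU result."""
    if nlu_result["intent"] != "order_pizza":
        return {"action": "fallback"}
    if "size" not in nlu_result["slots"]:
        return {"action": "ask_size"}
    return {"action": "confirm_order", "size": nlu_result["slots"]["size"]}


def generate(decision):
    """NLG stub: turn the chosen action into a human-readable response."""
    if decision["action"] == "ask_size":
        return "What size pizza would you like?"
    if decision["action"] == "confirm_order":
        return f"Got it, one {decision['size']} pizza."
    return "Sorry, I didn't understand that."


def respond(transcribed_text):
    """Run one turn through NLU, the dialog manager, and NLG."""
    return generate(decide(understand(transcribed_text)))


print(respond("I want a large pizza"))
print(respond("I want a pizza"))
```

Keeping the three stages as separate functions mirrors the pipeline: each one can later be swapped for a trained model without touching the others.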

# 4. Dialog Systems in Detail
- Dialog act or intent: the aim of a user command.
- Slot or entity: holds information about specific entities related to the intent.
- Dialog state or context: an ontological construct that contains the information about the dialog act together with state-value pairs.

![](images/terminology.png)
<center>Example of different terminology used in chatbots</center>
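
One way to picture these terms in code is a small container that holds the current intent and the slot values collected so far. This is a sketch only: the `DialogState` class and the `pizzaTopping` slot are hypothetical, not part of any framework.

```python
from dataclasses import dataclass, field


@dataclass
class DialogState:
    """Tracks the current intent and the slot values gathered so far."""
    intent: str = ""
    slots: dict = field(default_factory=dict)

    def update(self, intent=None, **new_slots):
        """Merge information from the latest user turn into the state."""
        if intent:
            self.intent = intent
        self.slots.update(new_slots)
        return self


state = DialogState()
state.update(intent="orderPizza", pizzaSize="large")
state.update(pizzaTopping="mushroom")  # a later turn adds another slot
print(state)
```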

## PizzaStop Chatbot
Dialogflow is a conversational agent–building platform by Google. It provides
the tools to understand and generate natural language and to manage the
conversation.

### BUILDING OUR DIALOGFLOW AGENT
1. First, we need to create an agent.
![](images/create-dialogflow.png)
<center>Creating an agent using Dialogflow</center>

2. You’ll then be redirected to another page with options that
allow you to create the bot.

![](images/dialogflow-UI.png)
<center>Dialogflow UI after creating an agent</center>

3. Now, we need to add the intents and entities we care about to
our agent.
4. Now, we’ll create the first intent: orderPizza. As we create a
new intent, we have to provide training examples, called
“training phrases,” to enable the bot to detect variations of
responses that belong to the intent.
5. Since we’ve added an intent, we need to add the respective
entities to remember important information provided by the
user. Create an entity named pizzaSize, enable “fuzzy
matching” (which matches entities even if they’re only
approximately the same), and provide the necessary values.
6. Now, let’s go back to the Intents block to add additional
information to the Action and Parameters section.
7. We also need to provide sample responses.
8. So far, we’ve added a simple intent and entities.
9. We can add many more intents and entities to make our agent
robust.
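
To make the steps above more concrete, here is a plain-Python approximation of what the agent stores. It is not the Dialogflow API: the dictionaries are hypothetical stand-ins for what we enter in the console, and fuzzy matching is imitated with the standard library's `difflib.get_close_matches`, which may behave differently from Dialogflow's own matcher.

```python
import difflib

# Hypothetical stand-ins for the intent and entity defined in the console.
ORDER_PIZZA_INTENT = {
    "name": "orderPizza",
    "training_phrases": [
        "I want a pizza",
        "I would like to order a large pizza",
        "Can I get a medium pizza",
    ],
    "parameters": ["pizzaSize"],
    "responses": ["Sure, one {pizzaSize} pizza coming up."],
}

PIZZA_SIZE_ENTITY = {
    "name": "pizzaSize",
    "values": ["small", "medium", "large"],
}


def match_entity(utterance, entity, cutoff=0.8):
    """Return the first entity value that approximately matches a word in the utterance."""
    for word in utterance.lower().split():
        close = difflib.get_close_matches(word, entity["values"], n=1, cutoff=cutoff)
        if close:
            return close[0]
    return None


# "medum" is a typo, but it is close enough to "medium" to be matched.
size = match_entity("I want a medum pizza", PIZZA_SIZE_ENTITY)
print(ORDER_PIZZA_INTENT["responses"][0].format(pizzaSize=size))
```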

### TESTING OUR AGENT
![](images/test_pizza-oder.png)
<center>Making a simple order using our agent</center>

# 5. Deep Dive into Components of a Dialog System
![](images/restaurent-booking.png)
<center>Conversation about restaurant booking</center>

## Dialog Act Classification
Dialog act classification is the task of identifying the role a user utterance
plays in the context of the dialog. Identifying the intent helps us understand what the user is asking for and
take actions accordingly.

## Identifying Slots
Once we’ve extracted the intents, we want to move on to extracting
entities. Extracting entities is also important for generating correct and
appropriate responses to the user’s input. 
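
Slot annotations in datasets such as ATIS are commonly given as one IOB tag per token. The sketch below shows how tagged tokens could be grouped into slot values; it assumes the tags have already been predicted by some model, the sentence is made up, and `extract_slots` is a hypothetical helper.

```python
def extract_slots(tokens, tags):
    """Group IOB-tagged tokens into a {slot_name: value} dictionary."""
    slots = {}
    current_name, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins; store the previous one, if any.
            if current_name:
                slots[current_name] = " ".join(current_tokens)
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # Continuation of the slot that is currently open.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent tag) closes the open slot.
            if current_name:
                slots[current_name] = " ".join(current_tokens)
            current_name, current_tokens = None, []
    if current_name:
        slots[current_name] = " ".join(current_tokens)
    return slots


tokens = ["flights", "from", "new", "york", "to", "boston"]
tags = ["O", "O", "B-fromloc.city_name", "I-fromloc.city_name", "O", "B-toloc.city_name"]
print(extract_slots(tokens, tags))
```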

## Response Generation
- Fixed responses
- Use of templates
- Automatic generation
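
The first two strategies can be sketched in a few lines. The dialog acts, slot names, and wording below are hypothetical, and automatic generation (which needs a trained model) is left out.

```python
import random

# Fixed responses: one canned reply per dialog act.
FIXED_RESPONSES = {
    "greet": "Hello! How can I help you today?",
    "goodbye": "Thanks for stopping by.",
}

# Templates: replies with placeholders that are filled from the slots.
TEMPLATES = {
    "confirm_booking": [
        "I have booked a table for {people} at {time}.",
        "Your table for {people} is reserved at {time}.",
    ],
}


def generate_response(dialog_act, slots=None, rng=random):
    """Return a fixed response if one exists, otherwise fill a template with slot values."""
    if dialog_act in FIXED_RESPONSES:
        return FIXED_RESPONSES[dialog_act]
    template = rng.choice(TEMPLATES[dialog_act])
    return template.format(**(slots or {}))


print(generate_response("greet"))
print(generate_response("confirm_booking", {"people": 4, "time": "7 pm"}))
```

Choosing among several templates at random is a cheap way to make replies feel less repetitive while keeping full control over what the bot says.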

## Dialog Examples with Code Walkthrough
### DATASETS
*Goal-oriented datasets from various domains and their usage*

| Dataset  | Domain           | Usage                                                                                                                                                                                                                               |
|----------|------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ATIS     | AirTicketBooking | Benchmark for intent classification and slot filling. This is a single-domain dataset, hence entities and intents are restricted to one domain.                                                                                     |
| SNIPS    | Multidomain      | Benchmark for intent classification and slot filling. This is a multidomain dataset, hence the entities belong to multiple domains. Multiple-domain datasets are challenging to model due to their variability.                     |
| DSTC     | Restaurants      | Benchmark for dialog state tracking or joint determination of intent and slots. This is similarly a single-domain dataset, but the entities are expressed more in terms of annotations and contain more metadata.                   |
| MultiWoZ | Multidomain      | Benchmark for dialog state tracking or joint determination of intent and slots that spans multiple domains. Because of the same variability, modeling this dataset is more challenging than modeling single-domain ones.            |

### DIALOG ACT PREDICTION
In this notebook, we demonstrate various CNN and RNN architectures for the task of intent detection on the ATIS dataset. The ATIS dataset is a standard benchmark dataset for the task of intent detection. ATIS stands for Airline Travel Information System.

### Imports

In [4]:
#general imports
import os
import sys
import random
random.seed(0) #for reproducibility of results

#basic imports
import numpy as np
import pandas as pd


#NN imports
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from keras.layers import Dense, Input, GlobalMaxPooling1D
from keras.layers import Conv1D, MaxPooling1D, Embedding, LSTM
from keras.models import Model, Sequential
from keras.initializers import Constant

#encoder
from sklearn.preprocessing import LabelEncoder

### Data Loading
We load the data with the help of a few functions from utils.py, which is included in the "Data" folder under this repository's Ch6 folder.

In [11]:
# Training Data
from data.utils import fetch_data, read_method

# Use forward slashes in the path: in 'data\data2\atis...', Python reads "\a" as an escape character
sents, labels, intents = fetch_data('data/data2/atis.train.w-intent.iob')

train_sentences = [" ".join(i) for i in sents]

train_texts = train_sentences
train_labels = intents.tolist()

# Collect the indices of examples whose label contains "#"
vals = []

for i in range(len(train_labels)):
    if "#" in train_labels[i]:
        vals.append(i)

# Remove them in reverse order so that the remaining indices stay valid
for i in vals[::-1]:
    train_labels.pop(i)
    train_texts.pop(i)

print("Number of training sentences :", len(train_texts))
print("Number of unique intents :", len(set(train_labels)))

for i in zip(train_texts[:5], train_labels[:5]):
    print(i)

In [None]:
# Testing Data
from data.utils import fetch_data, read_method

sents,labels,intents = fetch_data('data/data2/atis.test.w-intent.iob')

test_sentences = [" ".join(i) for i in sents]

test_texts = test_sentences
test_labels = intents.tolist()

new_labels = set(test_labels) - set(train_labels)

vals = []

for i in range(len(test_labels)):
    if "#" in test_labels[i]:
        vals.append(i)
    elif test_labels[i] in new_labels:
        print(test_labels[i])
        vals.append(i)
        
for i in vals[::-1]:
    test_labels.pop(i)
    test_texts.pop(i)

print ("Number of testing sentences :",len(test_texts))
print ("Number of unique intents :",len(set(test_labels)))

for i in zip(test_texts[:5], test_labels[:5]):
    print(i)

### Data Preprocessing

In [None]:
MAX_SEQUENCE_LENGTH = 300
MAX_NUM_WORDS = 20000 
EMBEDDING_DIM = 100 
VALIDATION_SPLIT = 0.3

In [None]:
tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(train_texts)
train_sequences = tokenizer.texts_to_sequences(train_texts) #Converting text to a vector of word indexes
test_sequences = tokenizer.texts_to_sequences(test_texts)
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))

In [None]:
le = LabelEncoder()
le.fit(train_labels)
train_labels = le.transform(train_labels)
test_labels = le.transform(test_labels)

In [None]:
#Padding the sequences so they can be fed into the neural network. Max seq. len is MAX_SEQUENCE_LENGTH (300) as set earlier
#initial padding of 0s, until vector is of size MAX_SEQUENCE_LENGTH
trainvalid_data = pad_sequences(train_sequences, maxlen=MAX_SEQUENCE_LENGTH)
test_data = pad_sequences(test_sequences, maxlen=MAX_SEQUENCE_LENGTH)
trainvalid_labels = to_categorical(train_labels)

test_labels = to_categorical(np.asarray(test_labels), num_classes= trainvalid_labels.shape[1])

# split the training data into a training set and a validation set
indices = np.arange(trainvalid_data.shape[0])
np.random.seed(0) # random.seed(0) in the imports cell does not seed NumPy's generator
np.random.shuffle(indices)
trainvalid_data = trainvalid_data[indices]
trainvalid_labels = trainvalid_labels[indices]
num_validation_samples = int(VALIDATION_SPLIT * trainvalid_data.shape[0])
x_train = trainvalid_data[:-num_validation_samples]
y_train = trainvalid_labels[:-num_validation_samples]
x_val = trainvalid_data[-num_validation_samples:]
y_val = trainvalid_labels[-num_validation_samples:]
#This is the data we will use for CNN and RNN training
print('Splitting the train data into train and valid is done')
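
To illustrate the padding step, the pure-Python sketch below imitates what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`) on a single sequence. `pad_left` is a hypothetical helper written only for illustration; the notebook itself uses the Keras function.

```python
def pad_left(sequence, maxlen, value=0):
    """Left-pad (or left-truncate) a list of word indexes to exactly maxlen items."""
    sequence = sequence[-maxlen:]  # keep the last maxlen items if the sequence is too long
    return [value] * (maxlen - len(sequence)) + sequence


print(pad_left([4, 12, 7], 6))                 # [0, 0, 0, 4, 12, 7]
print(pad_left([1, 2, 3, 4, 5, 6, 7, 8], 6))   # [3, 4, 5, 6, 7, 8]
```

Because ATIS queries are short, most rows of the padded matrix consist largely of leading zeros when the maximum length is 300.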

### Modeling
Embedding Matrix

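We first build an embedding matrix from the pre-trained GloVe vectors, with one row for each word index in our tokenizer's vocabulary.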

In [None]:
print('Preparing embedding matrix.')

# first, build index mapping words in the embeddings set
# to their embedding vector

# Download GloVe 6B from here: https://nlp.stanford.edu/projects/glove/
BASE_DIR = 'Data'
GLOVE_DIR = os.path.join(BASE_DIR, 'glove.6B')

embeddings_index = {}
with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding="utf-8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

print('Found %s word vectors in Glove embeddings.' % len(embeddings_index))
#print(embeddings_index["google"])

# prepare embedding matrix - rows are the words from word_index, columns are the embeddings of that word from glove.
num_words = min(MAX_NUM_WORDS, len(word_index)) + 1
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word_index.items():
    if i > MAX_NUM_WORDS:
        continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in the embedding index keep their all-zeros row
        embedding_matrix[i] = embedding_vector

# load these pre-trained word embeddings into an Embedding layer
# note that we set trainable = False so as to keep the embeddings fixed
embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)
print("Preparing of embedding matrix is done")

### CNN with Pre-Trained Embeddings

In [None]:
print('Define a 1D CNN model.')

cnnmodel = Sequential()
cnnmodel.add(embedding_layer)
cnnmodel.add(Conv1D(128, 5, activation='relu'))
cnnmodel.add(MaxPooling1D(5))
cnnmodel.add(Conv1D(128, 5, activation='relu'))
cnnmodel.add(MaxPooling1D(5))
cnnmodel.add(Conv1D(128, 5, activation='relu'))
cnnmodel.add(GlobalMaxPooling1D())
cnnmodel.add(Dense(128, activation='relu'))
cnnmodel.add(Dense(len(trainvalid_labels[0]), activation='softmax'))

cnnmodel.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])

cnnmodel.summary()

#Train the model. Tune to validation set. 
cnnmodel.fit(x_train, y_train,
          batch_size=128,
          epochs=1, validation_data=(x_val, y_val))
#Evaluate on test set:
score, acc = cnnmodel.evaluate(test_data, test_labels)
print('Test accuracy with CNN:', acc)

### CNN-Embedding Layer
Here, we train a CNN model whose embedding layer is learned during training instead of being initialized from the pre-trained embeddings.

In [None]:
print("Defining and training a CNN model, training embedding layer on the fly instead of using pre-trained embeddings")
cnnmodel = Sequential()
cnnmodel.add(Embedding(MAX_NUM_WORDS, 128))
cnnmodel.add(Conv1D(128, 5, activation='relu'))
cnnmodel.add(MaxPooling1D(5))
cnnmodel.add(Conv1D(128, 5, activation='relu'))
cnnmodel.add(MaxPooling1D(5))
cnnmodel.add(Conv1D(128, 5, activation='relu'))
cnnmodel.add(GlobalMaxPooling1D())
cnnmodel.add(Dense(128, activation='relu'))
cnnmodel.add(Dense(len(trainvalid_labels[0]), activation='softmax'))

cnnmodel.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])

cnnmodel.summary()

#Train the model. Tune to validation set. 
cnnmodel.fit(x_train, y_train,
          batch_size=128,
          epochs=1, validation_data=(x_val, y_val))
#Evaluate on test set:
score, acc = cnnmodel.evaluate(test_data, test_labels)
print('Test accuracy with CNN:', acc)

### RNN-Embedding Layer
Here, we train an RNN model whose embedding layer is learned during training instead of being initialized from the pre-trained embeddings.

In [None]:
print("Defining and training an LSTM model, training embedding layer on the fly")

#modified from: 

rnnmodel = Sequential()
rnnmodel.add(Embedding(MAX_NUM_WORDS, 128))
rnnmodel.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
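#each sentence has exactly one intent, so we use softmax with categorical cross-entropy, as in the CNN models
rnnmodel.add(Dense(len(trainvalid_labels[0]), activation='softmax'))
rnnmodel.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])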

rnnmodel.summary()

print('Training the RNN')
rnnmodel.fit(x_train, y_train,
          batch_size=32,
          epochs=1,
          validation_data=(x_val, y_val))
score, acc = rnnmodel.evaluate(test_data, test_labels,
                            batch_size=32)
print('Test accuracy with RNN:', acc)

### LSTM with Pre-Trained Embeddings

In [None]:
print("Defining and training an LSTM model, using pre-trained embedding layer")

#modified from: 

rnnmodel2 = Sequential()
rnnmodel2.add(embedding_layer)
rnnmodel2.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
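#each sentence has exactly one intent, so we use softmax with categorical cross-entropy, as in the CNN models
rnnmodel2.add(Dense(len(trainvalid_labels[0]), activation='softmax'))
rnnmodel2.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])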

rnnmodel2.summary()

print('Training the RNN')
rnnmodel2.fit(x_train, y_train,
          batch_size=32,
          epochs=1,
          validation_data=(x_val, y_val))
score, acc = rnnmodel2.evaluate(test_data, test_labels,
                            batch_size=32)
print('Test accuracy with RNN:', acc)

# 6. Other Dialog Pipelines
## End-to-End Approach
We can also build a chatbot using sequence-to-sequence (seq2seq) models.
Imagine that the input of the model is the user utterance: a sequence of
words. As the output, it generates another sequence of words, which is
the response from the bot. Seq2seq models are end-to-end trainable, so
we don’t have to maintain multiple modules, and they are generally
LSTM based.

## Deep Reinforcement Learning for Dialogue Generation
As the comparison below shows, the reinforcement learning–based model generates a
more diverse response instead of collapsing into a generic default
response.

![](images/reinforcement.png)
<center>Comparison of deep reinforcement learning and a seq2seq model</center>

## Human-in-the-Loop
The machine may
improve its performance if humans intervene in its learning process
and reward or penalize it for correct or incorrect responses.
These rewards or penalties act as feedback for the model.
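
As a toy illustration of this feedback loop, the sketch below keeps a running score per candidate response and prefers the responses that humans have rewarded. It is not a learning algorithm from any particular system: the `FeedbackRanker` class and the +1/-1 reward scheme are assumptions made for the example.

```python
from collections import defaultdict


class FeedbackRanker:
    """Ranks candidate responses by the cumulative human reward they have received."""

    def __init__(self):
        self.scores = defaultdict(float)

    def choose(self, candidates):
        """Pick the candidate with the highest score so far (ties go to the first one)."""
        return max(candidates, key=lambda c: self.scores[c])

    def feedback(self, response, reward):
        """Apply a human reward (+1) or penalty (-1) to a response."""
        self.scores[response] += reward


ranker = FeedbackRanker()
candidates = ["I don't know.", "Your table is booked for 7 pm."]

first = ranker.choose(candidates)   # no feedback yet, so the first candidate wins
ranker.feedback(first, -1)          # a human penalizes the generic reply
print(ranker.choose(candidates))    # the other candidate is now preferred
```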

# 7. Rasa NLU
![](images/rasa.png)
<center>Rasa chatbot interface and interactive learning framework</center>

- Context-based conversations
- Interactive learning
- Data annotation
- API integration
- Customize your models in Rasa

# 8. A Case Study: Recipe Recommendations
## Utilizing Existing Frameworks
We’ll start with Dialogflow, the cloud API. Before we start, we need to define
entities like we did before, such as ingredients, cuisine, calorie level, and
cooking time. We can build an ontology for the cooking domain and
identify the number of slots we’d like our chatbot to support.
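
A minimal sketch of how such a slot ontology could steer the conversation, independent of any framework: the bot asks about the first slot that is still missing. The slot keys follow the entities listed above, while the prompts and the `next_prompt` helper are hypothetical.

```python
# Hypothetical slot ontology for the recipe bot; the prompts are made up.
RECIPE_SLOTS = {
    "ingredients": "Which ingredients would you like to use?",
    "cuisine": "Which cuisine do you prefer?",
    "calorie_level": "Do you want a low-, medium-, or high-calorie dish?",
    "cooking_time": "How much time do you have for cooking?",
}


def next_prompt(filled_slots):
    """Return the question for the first unfilled slot, or None when all are filled."""
    for slot, prompt in RECIPE_SLOTS.items():
        if slot not in filled_slots:
            return prompt
    return None


print(next_prompt({"ingredients": ["tomato", "basil"]}))
```

Once `next_prompt` returns `None`, every slot is filled and the bot has enough information to look up a recipe.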