# **COM3029 Group Project**

## Project Description

This project aims to deliver a chatbot that will act as a conversational diary. The user will be able to add entries to their own digital diary through natural conversation with the chatbot. Additionally, the chatbot will allow the user to request details regarding previous entries in the diary. Each user will be personally identifiable to the chatbot by providing the chatbot with their name and a special phrase or word that will unlock their diary information.

The diary will store user information regarding places they visited, people they met and how they felt that day.

## **Q1- Model Serving Decisions**

In this section we will discuss the different model serving options that we researched, what we chose for the final implementation, and what led us to those choices.

### **Model Serving Options**

#### ***Model embedded in the app*** 

Embedding the model is the most direct way to use it in an application. By bundling the model file with the application code, the application can interact with the model directly and fetch predictions on demand. This infrastructure is simple and easy to set up, and it allows the user to interact with the chatbot offline.

However, this approach does not scale well, as model files are often large. We found that, especially with transformer-based architectures, model files are often over 400 MB, which requires a lot of bandwidth for the end user's initial setup. Additionally, this means that almost 1 GB of data is transferred every time the model needs to be updated.

#### ***Model served as an API***

An alternative to embedding the model is to wrap the model's binary file in a microservice that makes the model accessible to applications. Here we can use a pickle or dump of the model's Python object, which can then be deserialised and exposed through an endpoint for applications to interact with. This means that, regardless of the model's complexity, it can be saved and loaded in the same way.
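As a rough illustration of the serialise-then-serve idea, the sketch below pickles a stand-in "model" and loads it back as a serving process would at start-up. The `model_params` dictionary and the `predict` function are hypothetical placeholders, not the project's classifier.

```python
import pickle

# Hypothetical stand-in for trained model parameters (plain data, not a real model).
model_params = {"greeting": ["hello", "hi"], "goodbye": ["bye"]}


def predict(params, text):
    # Return the first label that has a keyword among the words of the text.
    words = text.lower().split()
    for label, keywords in params.items():
        if any(word in keywords for word in words):
            return label
    return "unknown"


# Serialise the object to bytes; these bytes are what would be written to disk.
blob = pickle.dumps(model_params)

# The serving process deserialises once at start-up and reuses the object per request.
served_params = pickle.loads(blob)
print(predict(served_params, "Hello there"))
```

The same save and load calls apply regardless of how complex the pickled object is, which is the property the API approach relies on.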

#### ***Model serving choice***

We chose to serve the model as an API as it is a simple approach that we had experience with, and it would allow applications to interact with the model without having to embed large files into the application.

### **API Considerations**

For this project we wanted to deliver the project in such a way that it follows a realistic deployment process that would be appropriate for the delivery of a production level application. 

Various model serving options using an API approach were explored to determine the best way to deliver the application.

#### ***Django***

Django is a very well-known framework for building full-stack web applications. With the Django REST framework, it can expose REST endpoints to clients, which handle HTTP methods such as GET and POST to pass client requests to the host.

However, high performance can be difficult to achieve using Django as it has a significantly larger codebase than the other solutions we explored. It also has a monolithic workflow that can complicate things, as Django includes many functions that are not necessary for a simple project.

#### ***FastAPI***

FastAPI is a fast, high-performance web framework that allows developers to build APIs using Python.

It is a good option as it offers a solid basis for creating scalable products. It also supports GraphQL as an alternative to REST.

While REST is the de facto standard for web APIs, it can cause request overfetching when multiple endpoints are created.

By comparison, GraphQL is a query language that uses one endpoint, and the return values depend on the client's request.

As our project only required using two endpoints at most, GraphQL was not considered as necessary for our process.

#### ***Flask***

Flask is another web framework that can be classified as a micro-framework.

It is a lightweight approach that enables rapid development of simple prototypes. It is also easily extended to cover many use cases, such as serving models from an endpoint. It uses REST to create endpoints for client requests to the server. Flask is considered the most polished and feature-rich micro-framework.

#### ***Bottle***

Bottle is similar to Flask. The main difference is that it is only a wrapper around a server. It is not as extensible as Flask, nor does it scale to include the additional modules that Flask can.

#### ***Final model serving approach***

As we decided to use a REST API for our endpoints, we chose Flask as our API framework because we found it easy to set up and develop on. Additionally, Flask contained enough functionality for our use case without extra bulk, as discussed.
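To give a sense of how little code a Flask endpoint needs, here is a minimal sketch. The `/intent` route, the JSON field names, and the rule-based `predict_intent` placeholder are assumptions made for illustration and are not the project's actual endpoints.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_intent(message):
    # Placeholder for a real model call; a fixed rule keeps the sketch runnable.
    return "greeting" if "hello" in message.lower() else "out_of_scope"


@app.route("/intent", methods=["POST"])
def intent_endpoint():
    # Read the JSON body sent by the client and return the prediction as JSON.
    payload = request.get_json(silent=True) or {}
    message = payload.get("message", "")
    return jsonify({"intent": predict_intent(message)})


if __name__ == "__main__":
    # Exercise the endpoint in-process with Flask's test client (no server needed).
    with app.test_client() as client:
        reply = client.post("/intent", json={"message": "Hello bot"})
        print(reply.get_json())
```

Swapping the placeholder for a loaded model is the only change needed to serve real predictions from the same route.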

## **Q2- Web Service and Architectural Choices**

In this section, we discuss the process of building a web service to host our chosen models for the chat bot.

For each component (intent classification, NER, dialogue flow, and the chatbot's response mechanisms) we detail what models were chosen, how responses or predictions are fetched by the web service, and how they interact with other components of the model when necessary.

### **Core components**

To begin with, we walk through the individual components and their implementations, starting with the dialogue flow manager, which forms the basis of our chatbot functionality, followed by the implementation of our intent classifier, sentiment analyser, and the NER model.

#### **Dialogue Flow Manager**

We decided to implement a heuristics-based approach for our dialogue flow manager (DFM) as it performed better than our attempts with AI models during the research stage. The dialogue for our chatbot is controlled by a state machine that uses the intent classifier to determine a state change. The flow of the dialogue can be viewed below.

![Dialog flow](images/dialogflow.jpg)

The DFM relies on the intent classifier to determine the intent of the user's message and uses this to set the state.
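The idea of an intent-driven state machine can be sketched as a lookup table from (state, intent) pairs to the next state. The state names and transitions below are illustrative assumptions; the project's real flow has more states than shown here.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical subset of dialogue states, for illustration only.
    GREETING = auto()
    MENU = auto()
    ADD_ENTRY = auto()
    QUERY_ENTRY = auto()
    GOODBYE = auto()


# (current state, predicted intent) -> next state
TRANSITIONS = {
    (State.GREETING, "greeting"): State.MENU,
    (State.MENU, "add_entry"): State.ADD_ENTRY,
    (State.MENU, "query_entry"): State.QUERY_ENTRY,
    (State.MENU, "goodbye"): State.GOODBYE,
    (State.ADD_ENTRY, "goodbye"): State.GOODBYE,
    (State.QUERY_ENTRY, "goodbye"): State.GOODBYE,
}


def next_state(state, intent):
    # Stay in the current state when the intent has no transition defined.
    return TRANSITIONS.get((state, intent), state)


state = State.GREETING
for intent in ["greeting", "add_entry", "goodbye"]:
    state = next_state(state, intent)
    print(intent, "->", state.name)
```

Keeping the transitions in one table makes it easy to compare the code against the dialogue flow diagram.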


#### **Intent Classification Component**

The intent classifier determines what the user wants to do with the chatbot. The following intents are included in the intent classification model to be used to determine the state of the chatbot:

* greeting - any messages that indicate a user is saying hi to the bot
* yes - any messages that indicate a positive agreement
* no - messages that indicate a negative response
* goodbye - when a user indicates leaving the chat, exiting, or saying bye to the bot
* add_entry - custom intent that indicates a user wants to add a diary entry
* query_entry - any messages where the user wants to look back on previous entries

Additionally, there was an intent included for out-of-scope messages, such as "What is the time?" which are outside of the chatbot's intended use.

Any message that the user sends is passed to the intent classifier and is used by the dialogue flow manager to determine the chatbot's response.

##### Model choice, dataset, and training results

The intent classification model chosen for this application is a CNN-based architecture (inspired by Yoon Kim's <a href="https://doi.org/10.48550/arXiv.1408.5882">TextCNN</a>) featuring fastText word embeddings. This model architecture was chosen as it performed well during research completed for the individual coursework component of this module. Additionally, this model is more lightweight than the corresponding BERT or transformer-based models.

As intent classification will be performed on all messages, a lighter model was chosen over potentially more accurate but bulkier models. (Usage of transformer based models is demonstrated in the NER and Sentiment Classification components which are only called once when the user makes a diary entry, so efficiency was less important in those cases.)

We trained the intent classifier model on a custom intent dataset inspired by <a href="https://archive.ics.uci.edu/ml/datasets/CLINC150">CLINC150</a>, with manually generated dataset entries for the add_entry and query_entry intents. (The .csv of intents can be found in the training_documentation folder.)

The custom dataset contains over 1000 sample message entries and is visualised as below.

![Intent distribution](images/intents.png)

The code to train the model is provided in notebook format for reference purposes in the 'training_documentation/intent classifier' folder. The model created from training on the custom dataset performed well against the validation set of 10% of the dataset, reaching a validation accuracy of 96.23% and a loss of 0.0795. The intent classifier works sufficiently well at predicting the intent of a message. Graphs of the training loss and accuracy against validation loss and accuracy are included below.

![Accuracy graph](images/accuracy_intents.png) ![Loss graph](images/loss_intents.png)

The model is then saved in Keras' legacy h5 format, and the label encoder and vectoriser used for training are pickled. These three components form the model and are saved in the intent_classifier folder to be used in the intent handler for the web service.

The file "intent_handler.py" loads the model, vectoriser, and label encoder, and provides a method to return the intent as a string when passed the user's input.

##### Loading the model


```python
import pickle

import numpy as np  # used by predict_intent
import tensorflow as tf
# TF >= 2.6; older versions expose this layer under layers.experimental.preprocessing
from tensorflow.keras.layers import TextVectorization
from tensorflow.keras.models import load_model

# load model from folder
model = load_model("intent_classifier/intents.h5")

# unpickle vectoriser configs
with open("intent_classifier/vectoriser.pkl", "rb") as f:
    from_disk = pickle.load(f)

# rebuild the vectoriser from its saved config and weights
vectoriser = TextVectorization.from_config(from_disk['config'])
vectoriser.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))  # call adapt on dummy data (necessary due to a keras bug)
vectoriser.set_weights(from_disk['weights'])

# unpickle the label encoder
with open("intent_classifier/encoder.pkl", "rb") as f:
    encoder = pickle.load(f)

# get the label mappings
class_names = encoder.classes_

# create the end-to-end pipeline model for intent prediction
input_str = tf.keras.Input(shape=(1,), dtype="string")
x = vectoriser(input_str)  # vectorise
output = model(x)
intent_classifier = tf.keras.Model(input_str, output)
```

##### Method to return intent

```python
def predict_intent(user_input):
    user_input = clean(user_input) # clean the text
    prediction = intent_classifier.predict([[user_input]]) # get model prediction
    intent = class_names[np.argmax(prediction[0])] # return label
    chatbot_logger.log_prediction("Predicting user intent", user_input, intent) # log
    return intent
```

##### Cleaning text


The pipeline additionally includes text cleaning, as this was performed on the training data. We lowercase the user's message, collapse repeated whitespace, and replace punctuation and digits with spaces before passing it through the model to generate an intent.

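```python
import string

# characters replaced with spaces: punctuation, the pound sign, and digits
UNWANTED = string.punctuation + '£' + string.digits
REMOVE = str.maketrans(UNWANTED, ' ' * len(UNWANTED))


def clean(message):
    # lowercase the message and collapse runs of whitespace
    message = " ".join(word.lower() for word in message.split())
    # replace each unwanted character with a space
    return message.translate(REMOVE)
```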
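One detail of the translation-table approach is that unwanted characters are replaced with spaces rather than deleted, so neighbouring words never get fused together. The short sketch below rebuilds an equivalent table on its own and applies it to a made-up message to show the effect.

```python
import string

# Characters to blank out: ASCII punctuation, the pound sign, and digits.
unwanted = string.punctuation + "£" + string.digits

# maketrans with two equal-length strings maps each unwanted character to a space.
table = str.maketrans(unwanted, " " * len(unwanted))

# Made-up example message; the digit and the exclamation mark both become spaces.
cleaned = "Met Sam at 5pm!".lower().translate(table)
print(repr(cleaned))
print(cleaned.split())
```

Extra spaces left behind are harmless if the downstream vectoriser splits on whitespace, which is the default behaviour of Keras' `TextVectorization`.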

#### **Sentiment Classification Model**

We wanted to develop the chatbot to be able to classify user diary entries by the following emotions:

* happiness
* sadness
* anxiety
* anger
* neutral / no emotion

The chatbot will take a user's diary entry, analyse the content, and return an emotion to be saved as part of the diary entry.

##### Model choice, dataset, and training results

To implement the sentiment analysis component, we chose to fine-tune a BERT model using the Hugging Face Transformers library, specifically the <a href="https://huggingface.co/bert-base-uncased">BERT base uncased</a> pre-trained model.

We chose BERT because sentiment analysis is performed only once per diary entry, as mentioned previously, so inference speed matters less here. Additionally, BERT performed best at predicting a diary entry's emotion during research and testing. Because diary entries are longer than standard response messages, detecting the emotion of an entry relies on the context of words throughout the whole entry, and BERT historically performs well on such tasks.

A dataset collated from three different sources was used to train this model. We used entries from <a href="http://yanran.li/dailydialog.html">DailyDialog</a>, <a href="https://www.site.uottawa.ca/~diana/resources/emotion_stimulus_data/">Emotion Stimulus</a>, and <a href="https://github.com/sinmaniphel/py_isear_dataset">ISEAR</a>.

We preprocessed the datasets so that they could be combined consistently, as they were labelled differently (e.g., "joy" vs "happy" vs "happiness" vs encoded emotion labels). We removed any emotions that were irrelevant, such as love, and saved this dataset for training BERT. Some of the emotions had far more entries than others, as shown in the distribution below. Neutral had over 800,000 entries, compared to anxiety, which contained only around 1,000, so we randomly undersampled the neutral and happiness classes to 3,000 each. (The final dataset can be found in "training_documentation/sentiment_analysis/emotions_final.csv".)
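Random undersampling of over-represented classes can be sketched as below. This is a minimal illustration using only the standard library; the `undersample` helper, the toy rows, and the cap of 4 are hypothetical, and the cap is applied to every class, which leaves classes already below it unchanged.

```python
import random
from collections import Counter, defaultdict


def undersample(rows, cap, seed=0):
    """Keep at most `cap` randomly chosen rows per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    sampled = []
    for items in by_label.values():
        if len(items) > cap:
            items = rng.sample(items, cap)  # sample without replacement
        sampled.extend(items)
    return sampled


# Toy data: 10 neutral rows and 3 anxiety rows (sizes are illustrative only).
rows = [(f"neutral text {i}", "neutral") for i in range(10)]
rows += [(f"anxious text {i}", "anxiety") for i in range(3)]

balanced = undersample(rows, cap=4)
print(Counter(label for _, label in balanced))
```

Fixing the seed keeps the sampled subset reproducible between training runs.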

![Emotions](images/emotions_dist.png)

The code for fine-tuning BERT on our custom dataset is provided in notebook format for reference in "training_documentation/sentiment_analysis". We split the dataset into training, validation, and test sets, then pre-processed the text with Hugging Face's AutoTokenizer and fine-tuned BERT using AutoModelForSequenceClassification.
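A three-way split of this kind can be sketched with a seeded shuffle. The 80/10/10 proportions and the `split_dataset` helper are assumptions for illustration; the proportions actually used are recorded in the training notebook.

```python
import random


def split_dataset(rows, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle the rows and cut them into train, validation, and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_fraction)
    n_test = int(len(rows) * test_fraction)
    val = rows[:n_val]
    test = rows[n_val:n_val + n_test]
    train = rows[n_val + n_test:]
    return train, val, test


# Integers stand in for dataset rows to keep the sketch short.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Because the three slices come from one shuffled list, no row can appear in more than one split.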

The evaluation metrics from training were as follows:

```python 
{'eval_loss': 1.1812270879745483,
 'eval_accuracy': 0.8042553191489362,
 'eval_f1': 0.8043095167183316,
 'eval_runtime': 1.3743,
 'eval_samples_per_second': 342.003,
 'eval_steps_per_second': 21.83,
 'epoch': 10.0}
```

The test prediction metrics were:

```python
{'test_loss': 1.189466953277588,
 'test_accuracy': 0.781021897810219,
 'test_f1': 0.7798639468951565,
 'test_runtime': 3.5763,
 'test_samples_per_second': 306.459,
 'test_steps_per_second': 19.293}
```

Once the model has been fine-tuned on our data, it can be saved and shared (for this project, it is in the model folder of the root folder). It can be used to form an end-to-end pipeline that returns a predicted emotion based on a user's input.

##### Loading the model

The file "sentiment_handler.py" of the webservice loads the model as a text classification pipeline and provides a method for the webservice to analyse text for an emotion.

```python
from transformers import pipeline

classifier = pipeline("text-classification", model='model') # load model in as a pipeline that returns the top prediction only
# the pipeline automatically cleans and tokenizes the text as required for BERT as well as making the prediction
```

##### Method to return emotion

```python
# dictionary of text emojis to represent emotions, could be replaced with real emojis if a GUI is implemented
moods = {
    "happy": ":D",
    "sad": ":C",
    "anxious": ":z",
    "angry": ">:C",
    "tired": "(z_Z)",
    "bored": ":|",
    "neutral": ":L"}


def predict_sentiment(user_input):
    sent_pred = classifier(user_input) # use pipeline to predict emotion
    sentiment = sent_pred[0]['label'] # return prediction as string
    chatbot_logger.log_prediction("Making Sentiment Prediction", user_input, sentiment) # log
    return sentiment


def get_emoticon(user_input):
    sentiment = predict_sentiment(user_input) # get emotion
    emoticon = moods.get(sentiment, moods["neutral"]) # get emoticon, falling back to neutral for unmapped labels
    return sentiment, emoticon
```


The emotion is returned as a string to be stored with an entry.

One modification was needed to ensure the pipeline returned emotion names rather than encoded label numbers. This required a one-time manual edit of the file "model/config.json" to change the id2label and label2id dictionaries so that they show an emotion rather than a generic label such as LABEL_3.
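The edit to the two dictionaries can be sketched as follows. The config excerpt and the three-emotion order are hypothetical; in practice the order must match the integer encoding used when the training labels were prepared.

```python
import json

# Hypothetical excerpt of a Hugging Face config.json before editing.
config = {
    "id2label": {"0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2"},
    "label2id": {"LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2},
}

# Assumed class order; it must match the encoding used during training.
emotions = ["happiness", "sadness", "anger"]

# Rebuild both mappings so predictions carry readable emotion names.
config["id2label"] = {str(i): name for i, name in enumerate(emotions)}
config["label2id"] = {name: i for i, name in enumerate(emotions)}

print(json.dumps(config, indent=2))
```

Generating both mappings from one list avoids the two dictionaries drifting out of sync, which is easy to do when editing the file by hand.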

#### **Named Entity Recognition**

Simple Transformers' [NERModel](https://simpletransformers.ai/docs/ner-model/) was used to train our named entity recognition component. We used the [conll2003](https://huggingface.co/datasets/conll2003) dataset as it focuses on language-independent named entity recognition, which we found performed well for our chatbot domain, where users may have non-English names. As most of the data in conll2003 comes from newspapers, we were able to extract date and time tokens, re-label them, and train the model on them. The model itself is a pre-trained BERT model (bert-base-cased). We found that it achieved high accuracy in entity recognition and returned predictions that were easy to process.
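As a sketch of how such predictions might be post-processed, the function below merges word-level tags into whole entities. It assumes each sentence prediction is a list of single-key dictionaries mapping a word to its tag, with B-/I- prefixes as in conll2003; the `group_entities` helper and the sample prediction are made up for illustration.

```python
def group_entities(prediction):
    """Merge B-/I- tagged words into (entity_text, entity_type) pairs."""
    entities = []
    current_words, current_type = [], None
    for item in prediction:
        (word, tag), = item.items()  # each item holds exactly one word/tag pair
        if tag == "O":
            prefix, entity_type = "O", None
        else:
            prefix, _, entity_type = tag.partition("-")
        starts_new = prefix == "B" or entity_type != current_type
        # Close the running entity when we leave it or a new one begins.
        if current_words and (entity_type is None or starts_new):
            entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
        if entity_type is not None:
            current_words.append(word)
            current_type = entity_type
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


# Made-up prediction for one sentence, in the assumed word-to-tag format.
sample = [{"I": "O"}, {"met": "O"}, {"Anna": "B-PER"}, {"Smith": "I-PER"},
          {"in": "O"}, {"London": "B-LOC"}]
print(group_entities(sample))
```

Grouping by entity type in this way gives the people and places that a diary entry needs to store.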

### **Implementation**

To run the Flask app, run the command "python build_and_run.py" in a terminal.

Requirements:

* Python version: 3.9.7

Example interaction with the chatbot class:

```python
>>> from chatbot import Chatbot
>>> dear_bot = Chatbot("Dear Bot")
>>> dear_bot.say_greeting()
{'response': 'Hey there! Have you used Dear Bot before?', 'state': 12}
>>> dear_bot.get_response("no")
STATE.CHECK_IF_NEW
no
{'response': "No worries, let's create a profile for you. What's your name?",
 'state': 2}
```

## **Q3- Basic Functionality Testing**

Once the Flask server is running, a client can send REST requests to the app to interact with the bot.
This functionality can be tested in the `test_endpoints.ipynb` notebook.
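A client request of this kind can be sketched with the standard library alone. The `/message` path and the JSON field name are hypothetical placeholders (the real routes are exercised in the notebook), and port 5000 is simply Flask's default development port.

```python
import json
import urllib.request


def build_message_request(base_url, message):
    """Build (but do not send) a JSON POST request carrying the user's message."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/message",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_message_request("http://127.0.0.1:5000", "hello")
print(req.get_method(), req.full_url)

# With the server running, the request could be sent like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes the payload easy to inspect before the server is involved.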

## **Q4- Performance of Chatbot**

### **Size**

## **Q5- Basic Monitoring Capability**

To log user input and the chatbot’s responses, we created a wrapper around Python's built-in logging module and extended it with functions to log various information as needed.
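A wrapper of this kind might look roughly like the sketch below. The `ChatbotLogger` class name, the message formats, and the signatures of `log_conversation` and `log_bot_state` are assumptions; only the three-argument form of `log_prediction` is taken from how it is called in the handlers.

```python
import logging


class ChatbotLogger:
    """Thin wrapper around the standard logging module (illustrative sketch)."""

    def __init__(self, name="chatbot", level=logging.INFO):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(level)

    def log_conversation(self, user_input, response):
        # Record one exchange between the user and the bot.
        self.logger.info("USER: %s | BOT: %s", user_input, response)

    def log_bot_state(self, state):
        # Record the current dialogue state.
        self.logger.info("STATE: %s", state)

    def log_prediction(self, description, user_input, prediction):
        # Record a model prediction together with the input that produced it.
        self.logger.info("%s | input=%r | prediction=%r",
                         description, user_input, prediction)


logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
chatbot_logger = ChatbotLogger()
chatbot_logger.log_prediction("Predicting user intent", "hello", "greeting")
```

Routing every message through one wrapper keeps the log format consistent across the intent, sentiment, and NER handlers.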

#### Conversation logging

![log_conversation function](images/log_conversation.png)

`log_conversation` is used to log an interaction between the user and the chatbot.


##### Example conversation log

![Example conversation log](images/user_chatbot_interaction.png "Example conversation log")

#### State logging

![log_bot_state function](images/log_state.png)

`log_bot_state` can be used to log the dialog state of the chatbot.


##### Example state logging

![Example state log](images/logged_state.png)

#### Prediction logging

![log_prediction function](images/log_prediction.png)

`log_prediction` is used to log any prediction the chatbot makes about user input such as intent and NER predictions.

##### Example prediction logging

***NER Logging***

![log ner](images/log_ner.png)

***Intent Logging***

![log intent](images/logged_intent.png)

## **Q6- CI/CD Build and Deployment**

To deploy this project, we will use a build management and continuous integration server software to host the server-side of the chatbot application. The build management software of choice is TeamCity by JetBrains. The codebase for the project itself will be stored on GitHub. The TeamCity server will have a project created, where the version control system (VCS) root will use the main branch for the project repository on GitHub.

![VCS Root in TeamCity](images/TeamCity_VCS.png "VCS Root in TeamCity")

Once the VCS Root has been set up to track the main branch, a build configuration will be created to automate the deployment of the server application. We will call the step "run dear_bot", as this is the Python program that must be run to start the server program.

![Build Configuration in TeamCity](images/TeamCity_Build_Config.png "Build Configuration in TeamCity")

We can then edit the build steps to execute a command of our choosing. This could be command line level input, or if the build agent supports it, it can be the direct execution of a file. For this project, the server will be deployed onto a local machine rather than on a machine on the cloud. It is known that this machine has the necessary Python dependencies for the execution of dear_bot.py, so the build step can be set to directly execute this file.

![Build Steps in TeamCity](images/build_steps.png "Build Steps in TeamCity")

Additionally, the requirements to be installed are specified in a text file. This ensures that, if the server is deployed on a different machine, the required dependencies are installed automatically through TeamCity.

![Requirements installation in TeamCity](images/requirements.png "Requirements installation in TeamCity")

We can then run the build in TeamCity to deploy the server on the local machine.

![Running the build in TeamCity](images/running_build.png "Running the build in TeamCity")

## **Q7- Recording**

The recording can be found in the submission zip: 