## Conversational AI
### Introduction
Chatbots are computer programs designed to simulate human conversation, typically through a natural language interface. They are widely used to provide instant responses to routine queries and are often deployed in customer-facing roles such as website help desks, online banking support, or mobile apps. One of their major advantages is that they operate continuously 24 hours a day, 7 days a week, meaning users can receive support at any time without waiting for a human operator.

From a business perspective, chatbots help reduce the burden on human support teams by handling frequently asked questions and directing users to relevant resources. This allows customer service staff to focus their attention on more complex cases that require empathy, discretion, or nuanced understanding. Studies have also shown that, if designed effectively, chatbots can enhance customer satisfaction and loyalty, particularly when users feel a sense of personality or brand alignment in the chatbot's tone and behaviour.

Although the terms *chatbot* and *conversational AI* are often used interchangeably, they refer to distinct approaches. The difference lies primarily in the level of sophistication and adaptability of the system:

- *Chatbots*, in their traditional form, are rule-based systems. They rely on a series of predefined decision trees or "if-this-then-that" logic to guide their responses. These rules are hand-coded by developers and typically require frequent manual updates to handle new queries or refine their performance. You'll often see this type of chatbot appear on websites as a pop-up interface, prompting visitors to ask questions or choose from a limited set of options.

- *Conversational AI*, on the other hand, refers to systems that use advanced natural language processing (NLP) and machine learning (ML) to understand and respond to user input. Rather than matching exact keywords to pre-written responses, conversational AI can interpret meaning from phrasing, context, and user intent. This makes the system more flexible and capable of responding to questions phrased in different ways, including variations it may not have seen before.
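
To make the contrast concrete, here is a minimal sketch of the rule-based approach. The keywords, replies, and the `rule_based_reply` function are hypothetical examples rather than part of this practical's code; the point is that a question phrased without one of the hand-coded keywords falls straight through to the fallback.

```python
# Hypothetical hand-written rules: each keyword maps to one fixed reply.
RULES = {
    "hours": "We are open 9am to 5pm, Monday to Friday.",
    "price": "Our pricing is listed on the website.",
    "hello": "Hello, how can I help?",
}
FALLBACK = "Sorry, I don't understand. Please ask about hours or price."

def rule_based_reply(message):
    """Return the reply for the first rule whose keyword appears in the message."""
    text = message.lower()
    for keyword, reply in RULES.items():
        if keyword in text:
            return reply
    return FALLBACK

print(rule_based_reply("What are your opening hours?"))
# Same request, different wording: no keyword matches, so the fallback is used.
print(rule_based_reply("When do you open?"))
```

Handling the second phrasing would require a developer to add another rule by hand, which is exactly the maintenance burden that conversational AI aims to reduce.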

One of the key strengths of conversational AI is that it can be improved over time, because real-world queries from users provide additional training data. When the system fails to correctly identify a user's request, that interaction can be logged and reviewed. Annotating these cases with the correct *intent* allows the model to learn from its mistakes and update its understanding without reprogramming the entire logic manually. This data-driven approach makes conversational AI systems much more scalable and adaptable in dynamic real-world settings.
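
The review loop described above could be sketched as follows. This assumes misclassified queries have been logged and annotated by a reviewer; the log format and the `add_reviewed_patterns` helper are hypothetical, and the intent structure is a simplified version of the one used later in this practical.

```python
import json

# A tiny stand-in for a loaded intent file (hypothetical content).
intents = {
    "intents": [
        {"tag": "greeting", "patterns": ["Hi", "Hello"], "responses": ["Hello!"]},
        {"tag": "goodbye", "patterns": ["Bye"], "responses": ["Goodbye!"]},
    ]
}

# Queries the model got wrong, each annotated by a reviewer with the correct intent.
reviewed_log = [
    {"text": "Good morning", "correct_tag": "greeting"},
    {"text": "See you later", "correct_tag": "goodbye"},
]

def add_reviewed_patterns(intents, reviewed_log):
    """Append each reviewed query to the patterns of its correct intent."""
    by_tag = {intent["tag"]: intent for intent in intents["intents"]}
    for entry in reviewed_log:
        patterns = by_tag[entry["correct_tag"]]["patterns"]
        if entry["text"] not in patterns:  # avoid duplicate training examples
            patterns.append(entry["text"])
    return intents

updated = add_reviewed_patterns(intents, reviewed_log)
print(json.dumps(updated, indent=2))
```

The model only benefits from the new examples once it has been retrained on the updated data.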

In this practical, we will explore the creation of a simple chatbot demonstrating how a basic conversational AI model can be trained to understand and respond to user questions. Rather than using hand-coded rules, we will develop a small neural network that learns patterns from training data consisting of questions and their corresponding intents.

As a creative example, we will train the model using phrases that a user might pose to *Jane Austen*, a renowned British novelist, about her life, works, and legacy. This will showcase how conversational AI can be used in educational or cultural settings, beyond commercial applications.

For comparison, we will also provide a second dataset representing a more traditional use case: a business-oriented chatbot designed to handle enquiries related to services, pricing, and support. 

### The dataset

The dataset defines the types of questions or prompts the model can recognise and respond to. The training data is composed of two elements: *patterns* and *intents*:

- *Patterns* are example phrases or sentences that a user might say. These are written in natural language and represent the many ways a person could express a given request - even the same request with different vocabulary.
- Each pattern is linked to a specific *intent*, which is a label that categorises what the user is trying to achieve, such as greeting the assistant, asking a question, or making a request.

For this practical, we have created two custom datasets. We'll start by using the *Jane Austen dataset* (`austin_intents.json`) as our default. This data is currently used to train a neural network intended for use in an immersive exhibition, where the responses and 3D visuals will be projected via a holographic display. It provides a creative, cultural example based on the life of *Jane Austen*, which was hand-constructed using factual information gathered from public sources such as Wikipedia. This dataset is intended to simulate the kinds of questions one might pose to a virtual assistant portraying the author, for example as part of an educational or museum installation.

We have also included a more typical business-oriented dataset (`business_intents.json`). This simulates a chatbot that might be deployed by a company to automate responses to high-volume enquiries. Many large organisations already use such systems to help users find answers quickly without involving a human agent. This is a hand-curated dataset built around common user enquiries such as opening hours, payment methods, or account issues.  

Here is a sample of the structure of the training data we have, written in JSON format:

```json
{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Hi", "Is anyone there?", "Hello", "Good day"],
      "responses": ["Hello, thanks for visiting", "Hi there, how can I help?"]
    },
    {
      "tag": "thanks",
      "patterns": ["Thanks", "Thank you", "That's helpful"],
      "responses": ["Happy to help!", "Any time!", "My pleasure"]
    },
    {
      "tag": "bot_or_not",
      "patterns": ["Is this a bot?", "Are you a machine?", "Am I talking to a person?"],
      "responses": ["That's for me to know, and you to find out.", "I'd really rather not say."]
    }
  ]
}
```
Each intent has a `tag` (its name), a list of `patterns` (example user inputs), and a set of `responses` (pre-written replies the bot can choose from). When the model is trained on this data, it learns to associate different phrases with their respective intent labels. During inference, when a user types a message, the model attempts to classify it into one of these known intents and then selects an appropriate response. To add some variety, we provide several possible responses for each intent and select one at random to present to the user. This simple trick makes the chatbot come across as smarter than it actually is.
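
As a minimal sketch of the random-selection trick, the snippet below looks up an intent by its tag and picks one of its replies. The `pick_response` helper is a hypothetical name used for illustration; the data is a cut-down copy of the sample shown above.

```python
import random

# A cut-down copy of the sample intents shown above.
intents = {
    "intents": [
        {"tag": "greeting",
         "responses": ["Hello, thanks for visiting", "Hi there, how can I help?"]},
        {"tag": "thanks",
         "responses": ["Happy to help!", "Any time!", "My pleasure"]},
    ]
}

def pick_response(intents, tag):
    """Return a randomly chosen reply for the given intent tag, or None if unknown."""
    for intent in intents["intents"]:
        if intent["tag"] == tag:
            return random.choice(intent["responses"])
    return None

# Asking several times may give different replies for the same intent.
for _ in range(3):
    print(pick_response(intents, "thanks"))
```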

### Installing Python libraries

In [None]:
!python -m pip install tensorflow numpy matplotlib --user

To switch between datasets, you can simply comment or uncomment the relevant filename in the code below that loads the data, allowing you to experiment with both applications of conversational AI:

In [None]:
import json

# To run the Jane Austen conversation AI use this file:
intent_file = 'austin_intents.json'

# To run the business version uncomment this line:

# intent_file = 'business_intents.json'

with open('./data/' + intent_file) as json_data:
    intents = json.load(json_data)

print(intents)

### Preprocessing
We start by gathering all the sentences (patterns) from the dataset and converting them to lower case. This normalisation step ensures that words like "Hello" and "hello" are treated the same. We also remove punctuation such as question marks, which can introduce unnecessary variation in how the model sees different patterns.
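
The effect of this normalisation can be seen in a small sketch. The `normalise` function here is a simplified, hypothetical stand-in that strips all punctuation; the tokeniser defined in this practical is slightly more careful and keeps apostrophes.

```python
import string

def normalise(text):
    """Lower-case the text, strip punctuation, and split it into words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()

print(normalise("Hello?"))            # ['hello']
print(normalise("hello"))             # ['hello']
print(normalise("Is anyone there?"))  # ['is', 'anyone', 'there']
```

After normalisation, "Hello?" and "hello" produce the same token, so the model sees them as the same input.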

As we go through the data, we also compile a list of all the individual words used across the entire dataset. This list forms the model's vocabulary, which is the set of known words it can recognise when making predictions. This vocabulary defines the "language" of our conversational AI and therefore its extent limits the words it will be able to understand and classify.

Each sentence will be converted into a numerical format, a bag-of-words vector, so that it can be fed into a neural network. The same goes for the intent labels, which are transformed into numerical one-hot vectors for classification.

Our preprocessing step extracts and cleans the phrases and intents from the dataset, tokenises each input phrase, and collects the intent labels used as training outputs. At the end, we have built up a complete vocabulary that defines the scope of the assistant's knowledge:

In [None]:
import re
import numpy as np

ignore_words = ["?", "the", "and", "a"]

# Define our own custom tokeniser for simplicity
def simple_tokenise(text):
    """
    Tokenise a sentence into words. Very simple, but you can expand on this.
    Removes punctuation (except apostrophes) and splits on whitespace.
    """
    text = text.lower()
    
    # Keep apostrophes inside words (e.g. "don't") but remove other punctuation
    text = re.sub(r"[^\w\s']", '', text)
    return text.split()

words = [] # The full vocabulary of all words across patterns.
documents = [] # A list of (tokenised_pattern, intent) pairs.
classes = []   # A list of all unique intent labels.

# 'intents' is our loaded JSON dictionary
for intent in intents['intents']:
    for pattern in intent['patterns']:
        # tokenise
        w = simple_tokenise(pattern)
        # Add to our vocabulary
        words.extend(w)

        # add to documents in our corpus
        documents.append((w, intent['tag']))
        # add to our classes list
        if intent['tag'] not in classes:
            classes.append(intent['tag'])

# Remove ignored words and duplicates (tokens are already lower case)
words = [w for w in words if w not in ignore_words]
words = sorted(set(words))

# Sort the intent labels (duplicates were already skipped above)
classes = sorted(set(classes))

print(len(documents), "documents")
print(len(classes), "intent classes", classes)

# All words in our vocabulary (no stemming is applied)
print(len(words), "unique words", words)

It's important to note that our approach is *domain-specific*. The chatbot will only be able to understand and respond to questions that fall within the scope of the topics it has been trained on. For example, a chatbot trained to answer questions about Jane Austen will not know how to respond to questions about weather forecasts or sports scores. General-purpose conversational agents, like Siri or Alexa, require vastly larger datasets and more complex models to handle the wide variety of topics and language variations they encounter in real-world use.

Before we can train our neural network, we convert our raw input, the user phrases and their corresponding intents, into a format the model can understand. This involves transforming each sentence into a *bag of words* vector, and each intent label into a *one-hot encoded* target vector.

The bag of words approach works here by checking which known words from our vocabulary appear in a given sentence. For each word in the vocabulary, we place a `1` in the vector if the word is present in the sentence, and a `0` if it is not. This creates a fixed-length binary vector for every training pattern, where the position of each element corresponds to a specific word in the vocabulary.

Alongside this, we create a one-hot vector for the *output class*. This is a list of zeros the same length as the number of intent labels, with a `1` in the position corresponding to the correct intent for that sentence. Together, the input vectors and their matching output vectors form our full training dataset.
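
A worked example on toy data may help here. The vocabulary, intent labels, and helper names below are hypothetical and much smaller than the real dataset; note that a word outside the vocabulary ("exactly") is simply ignored.

```python
vocabulary = ["born", "hello", "were", "when", "you"]  # sorted toy vocabulary
classes = ["birth", "greeting"]                        # sorted toy intent labels

def bag_of_words(tokens, vocabulary):
    """1 for each vocabulary word present in the tokens, 0 otherwise."""
    return [1 if word in tokens else 0 for word in vocabulary]

def one_hot(label, classes):
    """A vector of zeros with a single 1 at the position of the label."""
    row = [0] * len(classes)
    row[classes.index(label)] = 1
    return row

tokens = ["when", "were", "you", "born", "exactly"]  # "exactly" is not in the vocabulary
print(bag_of_words(tokens, vocabulary))  # [1, 0, 1, 1, 1]
print(one_hot("birth", classes))         # [1, 0]
```

Every input vector has the same length as the vocabulary and every output vector has the same length as the list of intents, regardless of how long the original sentence was.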

Once this has been done for all training examples, we shuffle the dataset to remove any unintended ordering effects and then separate the data into two lists: one containing the input vectors (`train_X`) and one containing the corresponding output labels (`train_Y`). These are then converted to NumPy arrays, which are the standard format expected by most deep learning frameworks:

In [None]:
import numpy as np
import random

training = []  # This will hold our full training dataset
output_empty = [0] * len(classes)  # A zeroed template for one-hot encoding output labels

# Loop through each training example
for doc in documents:
    bag = []  # Initialise an empty bag-of-words vector for this sentence
    pattern_words = doc[0]  # Get the tokenised words from the current pattern

    # For each word in the vocabulary, set 1 if it appears in the pattern, else 0
    for w in words:
        bag.append(1 if w in pattern_words else 0)

    # Create a one-hot encoded vector for the intent (output label)
    output_row = list(output_empty)  # Start with all zeros
    output_row[classes.index(doc[1])] = 1  # Set a 1 at the index of the correct intent

    # Add the (input_vector, output_vector) pair to the training set
    training.append((bag, output_row))

# Randomly shuffle the training set to prevent bias during training
random.shuffle(training)

# Split the data into input (X) and output (Y) sets
train_X = [item[0] for item in training]  # Feature vectors (bag-of-words)
train_Y = [item[1] for item in training]  # One-hot intent labels

# Convert lists to NumPy arrays (required by many ML libraries)
train_X = np.array(train_X)
train_Y = np.array(train_Y)

# Shape of our data
print(len(train_X), len(train_Y))

### The model
The model we are building is a feedforward neural network (also known as a multilayer perceptron), which is well-suited to basic classification tasks such as identifying user intents from short phrases. It consists of an input layer, a couple of hidden layers, and an output layer.

Each input is a bag-of-words vector representing the words found in the user's question. The output is a probability distribution across the possible intent labels, calculated using a softmax activation function, which allows the model to choose the most likely intent for each input sentence.
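
As an illustration of what softmax does, the sketch below turns a set of raw output scores into probabilities that sum to 1 and then picks the most likely intent. The scores and intent labels are made up for the example; in the real model the scores come from the final layer.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

classes = ["goodbye", "greeting", "thanks"]  # hypothetical intent labels
scores = [0.5, 2.0, 1.0]                     # made-up raw outputs of the final layer

probs = softmax(scores)
best = probs.index(max(probs))

print([round(p, 3) for p in probs])
print("Predicted intent:", classes[best])
```

The largest score always receives the largest probability, so choosing the most likely intent amounts to taking the position of the maximum.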

We will use TensorFlow's `Sequential` model, then specify how many units each layer has, what activation functions to use (in this case, *ReLU* for the hidden layers and *softmax* for the output), and how the model should learn (using categorical cross-entropy loss and the Adam optimiser). We also inspect a summary of the model's shape and parameters, which helps us verify that it has been set up correctly.
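
When reading the model summary, it can help to know where the parameter counts come from. A fully connected layer has one weight per input-unit pair plus one bias per unit. The sketch below applies this to two hidden layers of 8 units, as used in this practical; the vocabulary size and number of intents are hypothetical values chosen for illustration, so your own summary will show different totals.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

vocab_size = 50  # hypothetical length of the bag-of-words vector
n_classes = 10   # hypothetical number of intents

# Input size, two hidden layers of 8 units, then the output layer
layer_sizes = [vocab_size, 8, 8, n_classes]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [408, 72, 90]
print(sum(counts))  # 570
```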

This simple network is a solid baseline for intent classification and can be extended later with more layers, different activation functions, or other improvements such as dropout for regularisation if we need them:

In [None]:
import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Define the model
model = Sequential()

# First hidden layer with 8 units; input_shape is the length of the bag-of-words vector
model.add(Dense(8, input_shape=(len(train_X[0]),), activation='relu'))

# Second hidden layer with 8 units
model.add(Dense(8, activation='relu'))

# Output layer with softmax activation for classification
model.add(Dense(len(train_Y[0]), activation='softmax'))

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])


# Summary of the model
model.summary()

### Train the model
Once the model has been defined, the next step is to train it on our prepared dataset. Training involves feeding the model examples (input vectors and their corresponding intent labels) and allowing it to adjust its internal weights to minimise error.
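
The "error" being minimised is the categorical cross-entropy loss chosen when the model was compiled. As a rough sketch for a single example, it is the negative logarithm of the probability the model assigned to the correct intent. The function name and probability values below are illustrative; Keras computes this for us and averages it over each batch.

```python
import math

def categorical_cross_entropy(target, predicted):
    """Loss for one example: minus the sum of target * log(predicted) over classes."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

target = [0, 1, 0]               # one-hot label: the second intent is correct
confident = [0.05, 0.90, 0.05]   # model is confident and correct
unsure = [0.40, 0.30, 0.30]      # model gives the correct intent only 30%

print(round(categorical_cross_entropy(target, confident), 4))
print(round(categorical_cross_entropy(target, unsure), 4))
```

A confident, correct prediction gives a loss close to zero, while an uncertain or wrong prediction gives a larger loss, which is why the loss falls as the model learns.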

In this example, we specify the number of epochs and the batch size (how many examples are processed at once before updating the model). We also set `verbose=1` to display the progress of training in the console, showing the loss and accuracy after each epoch.
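
To see how epochs and batch size interact, the sketch below counts how many weight updates take place. The batch size and number of epochs match the values used in this practical, while the number of training examples is a hypothetical figure; the real count is printed by the preprocessing code.

```python
import math

n_examples = 100  # hypothetical number of training patterns
batch_size = 8    # examples processed before each weight update
num_epochs = 100  # full passes through the training data

# The last batch of an epoch may be smaller than batch_size, hence the ceiling
updates_per_epoch = math.ceil(n_examples / batch_size)
total_updates = updates_per_epoch * num_epochs

print(updates_per_epoch)  # 13
print(total_updates)      # 1300
```

A smaller batch size means more, noisier updates per epoch; a larger one means fewer, smoother updates.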

After training is complete, we save the model to a file (`model.keras`). This allows us to reuse the trained model later, for example when deploying it in a chatbot application, without having to retrain it each time:

In [None]:
num_epochs = 100

# Fit the model and store training history
history = model.fit(train_X, train_Y,
                    epochs=num_epochs,
                    batch_size=8,
                    verbose=1)

# Save the trained model
model.save('model.keras')

### Evaluation
We have not created test data in this practical, so we can only evaluate how training went. The figures below come from one training run, and yours will differ slightly because the weights are initialised randomly and the data is shuffled. In the first epoch, the model reported a moderate accuracy of around 64% and a loss slightly above 1.10. Because Keras reports metrics averaged over all batches in an epoch, this figure already reflects learning that took place during the first pass through the data. Over the first 20 epochs, we observe a consistent decline in the training loss, paired with a corresponding increase in accuracy, reaching above 75% by epoch 25. This early phase indicates that the model was effectively learning the underlying structure of the problem and adjusting its parameters meaningfully.

As training continued through epochs 30 to 60, the model sustained this positive trend. Accuracy frequently rose above 85% and eventually passed 90%, with loss steadily decreasing to values well below 0.5. This middle period reflects a strong learning phase, where the model refined its decision boundaries and reduced its prediction errors. Around epochs 40–60, there were some fluctuations in loss and accuracy, which is expected in longer training cycles, possibly due to the optimiser navigating more complex regions of the parameter space or slight overfitting on certain mini-batches.

In the final third of training, from epoch 70 onward, the model demonstrated impressive consistency and performance. Accuracy peaked at nearly 99% (epochs 88 and 99), and loss dropped as low as 0.1274 by the end. Such performance on training data suggests that the model has converged successfully and has fitted the training patterns closely. However, without a validation or test set, we cannot determine whether the model is overfitting, and high training accuracy alone does not guarantee good performance on unseen phrasing. What the steady improvement and stable accuracy in the later epochs do show is that the training process itself is behaving well.
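
One way to address the missing validation set is to hold back a portion of the examples before training. The sketch below is a minimal illustration on placeholder data; the `train_validation_split` helper and the 20% fraction are assumptions, not part of this practical. Keras's `model.fit` also accepts `validation_split` and `validation_data` arguments for the same purpose.

```python
import random

def train_validation_split(examples, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the examples and hold back a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Placeholder (pattern, intent) pairs standing in for the real training examples
examples = [(f"pattern {i}", f"intent {i % 3}") for i in range(10)]

train, val = train_validation_split(examples)
print(len(train), "training examples,", len(val), "validation examples")
```

If accuracy on the held-back examples stays well below the training accuracy, that is a sign the model is memorising the training patterns rather than generalising.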

We can now visualise how the model performed during training by plotting our two key metrics over time: accuracy and loss. The plot shows how the model's accuracy improved over the training epochs, which indicates how well the model is learning to classify the correct intents. The training loss reflects how far the model's predictions were from the correct answers. Ideally, this should decrease over time:

In [None]:
import matplotlib.pyplot as plt

# Plot training accuracy and loss over epochs
plt.figure(figsize=(15, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Metric')
plt.title('Training Metrics Over Epochs')
plt.legend()

# Tidy the layout, then show the plot
plt.tight_layout()
plt.show()

### Prediction

The final stage is to make predictions based on the user's natural language input. We first define some helper functions, which convert the user's sentence into a bag of words: a vector of 0s and 1s showing whether each word in our vocabulary is present. We can then predict the intent and select a suitable response at random:

In [None]:
import re
import numpy as np
import random

# Preprocess the input sentence by tokenising it
def clean_up_sentence(sentence):
    return simple_tokenise(sentence)  # Custom tokeniser function used earlier in our pipeline

# Convert a tokenised sentence into a bag-of-words (BoW) vector
# This creates a binary vector where each position corresponds to a known word:
# 1 if the word is present in the input sentence, 0 otherwise
def bow(sentence, words):
    sentence_words = clean_up_sentence(sentence)
    bag = [1 if w in sentence_words else 0 for w in words]
    return np.array(bag)

# Predict the class label for an input sentence and return a matching response
def predict(sentence):
    # Convert the input sentence into a bag-of-words vector
    word_vec = bow(sentence, words)

    # Run the vector through the trained model to obtain class probabilities
    pred_y = model.predict(np.array([word_vec]))[0]  # Predict returns a batch, take first element

    # Find the class with the highest predicted probability
    high_prob = max(pred_y)
    idx = np.argmax(pred_y)
    intent_label = classes[idx]  # Get the corresponding intent tag from the class list

    # Search for the corresponding intent and pick a response at random
    for intent in intents['intents']:
        if intent['tag'] == intent_label:
            answer = random.choice(intent['responses'])

            # Print out the prediction and response
            print(">>", sentence)
            print("Predicted class:", intent_label)
            print("Response:", answer)
            print()
            return


### Jane Austen

This example assumes you trained the model on the `austin_intents.json` file. If you trained it on the `business_intents.json` file instead, you will see some interesting responses, which demonstrate how conversational AI fails when posed a question that it has not been designed or trained for. In any case, let's have a conversation with Jane Austen:

In [None]:
predict("Hi there!")
predict("Do I know you?")
predict("Are you famous?")
predict("when were you born?")
predict("Am I talking to a robot?")
predict("Well, good bye then...")

### Customer query

This example assumes you trained the model on the `business_intents.json` file. If you used `austin_intents.json`, then, as mentioned, you might see some bizarre responses. Otherwise, this might be how a customer interacts with a chatbot designed to address common customer queries:

In [None]:
predict("Hi")
predict("When do you open?")
predict("Can I pay by card?")
predict("Am I talking to a robot?")
predict("I'd like to close my account.")
predict("Well, good bye then...")


### What have we learnt?

Now that you've seen how to run the chatbot using two example intent files, one creative and one business-focused, you're encouraged to experiment further by creating your own version. You can do this by editing or replacing the intent file with your own set of *patterns*, *intent labels*, and *responses*, tailored to a topic or context that interests you.
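
As a starting point, the sketch below builds a small intent file of your own, checks that every intent has the three required fields, and saves it as JSON. The library help desk content, the `check_intents` helper, and the file name are all hypothetical; the structure mirrors the sample shown earlier.

```python
import json
import os
import tempfile

# A hypothetical intent file for a library help desk
my_intents = {
    "intents": [
        {
            "tag": "opening_hours",
            "patterns": ["When are you open?", "What time do you close?"],
            "responses": ["We are open 9am to 6pm on weekdays."],
        },
        {
            "tag": "renew_book",
            "patterns": ["Can I renew my book?", "How do I extend a loan?"],
            "responses": ["You can renew a loan from your online account."],
        },
    ]
}

def check_intents(data):
    """Raise ValueError if an intent lacks a tag, patterns, or responses; return the tags."""
    tags = []
    for intent in data["intents"]:
        for key in ("tag", "patterns", "responses"):
            if not intent.get(key):
                raise ValueError(f"Intent is missing '{key}': {intent}")
        tags.append(intent["tag"])
    if len(tags) != len(set(tags)):
        raise ValueError("Intent tags must be unique")
    return tags

tags = check_intents(my_intents)

# Written to a temporary folder here; the loading code in this practical reads from ./data/
path = os.path.join(tempfile.gettempdir(), "my_intents.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(my_intents, f, indent=2)

print("Saved", len(tags), "intents to", path)
```

To train on your own file, save it in the `data` folder and set `intent_file` to its name in the loading cell. Aim for several varied patterns per intent so the model sees more than one way of phrasing each request.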

This kind of intent-based chatbot is flexible and adaptable. In a real-world scenario, you might build a training dataset using real email exchanges, support tickets, or help desk queries. Doing so would allow you to train a chatbot that can automatically handle routine customer service interactions, saving time and resources while improving response times.

Interestingly, this technology also raises questions that go beyond simple automation. For example, Microsoft has filed a <a href="https://pdfpiw.uspto.gov/.piw?PageNum=0&docid=10853717&IDKey=&HomeUrl=%2F" target="_blank">patent</a> for a chatbot that can simulate the personality and responses of a real person, living or deceased, based on their digital footprint. This opens the door to new and thought-provoking uses of conversational AI, from virtual memorials to personalised education, but also invites ethical questions about identity, consent, and digital legacy.

Whether you're building a practical assistant or a creative prototype, this practical shows how powerful and accessible conversational AI has become. To create your own chatbot, simply replace the dataset with your own; as you have seen, it does not have to be large to be functional.
