How to Add Conversation Memory to Your Chatbot
==============================================

A tutorial for adding conversation memory to a chatbot.

## What is it?

This article demonstrates how to take an existing chatbot that can answer questions from a document of private knowledge and enhance it to support a full back-and-forth conversation between the chatbot and the user. The chatbot maintains the context of the conversation, remembering what the user has previously asked and how the chatbot has previously responded.

The chatbot uses an AI large language model (LLM) with retrieval augmented generation (RAG) to answer questions from a PDF or HTML file. By enhancing it with a history of the conversation, the user can ask follow-up questions without having to repeat details of the subject.

## What can it do?

The chatbot with conversation memory builds on an existing [project](https://github.com/primaryobjects/chatbot/blob/main/chatbot.ipynb) that creates an LLM-based chatbot from a private knowledge base.

The original [chatbot](https://github.com/primaryobjects/chatbot/blob/main/chatbot.ipynb) could answer one-off questions from content within the ingested document. However, if the user asked follow-up questions, the chatbot would not maintain a history and would perform an entirely new search in the document to try to answer the question. If specific keywords were missing from the query, the chatbot would be unlikely to give a suitable response.

This enhanced version of the chatbot works by maintaining a conversation history in an in-memory list. The list could also be adapted to use a distributed store, such as [Redis](https://pypi.org/project/redis/). At each round of interaction with the chatbot, we store the user's query, the context knowledge found within the document, and the chatbot's response. This list of conversation items is provided in the prompt to the chatbot as additional context during each subsequent query in the conversation.
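
As a rough illustration of the idea above, the sketch below hides the list behind a small class so that the backing store could be swapped out later. The class name `InMemoryHistoryStore` and its methods are hypothetical and are not part of the original project. Each round is serialized to a JSON string only to mirror what a list-based external store would typically hold.

```python
import json


class InMemoryHistoryStore:
    """Hypothetical store that keeps each conversation round as a JSON string."""

    def __init__(self):
        # A plain Python list; a distributed store could take its place.
        self._items = []

    def append(self, user_input, search_result_context, llm_response):
        # Bundle the three parts of one round and store them as text.
        entry = {
            "user_input": user_input,
            "search_result_context": search_result_context,
            "llm_response": llm_response,
        }
        self._items.append(json.dumps(entry))

    def all(self):
        # Decode every stored round back into a dictionary, oldest first.
        return [json.loads(item) for item in self._items]


store = InMemoryHistoryStore()
store.append("Who wrote the report?", "Authors: A, B, C", "A, B and C wrote it.")
print(store.all()[0]["llm_response"])
```

Keeping the rest of the chatbot dependent only on `append` and `all` means the storage choice stays an isolated detail.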

## How does it work?

The chatbot enhanced with conversation memory works using the following steps.

1. Use the original chatbot to load a PDF or HTML file of private knowledge.
2. The user asks a query.
3. Retrieve a search result from the document that best matches the user's query to use as context.
4. The LLM uses the context to answer the question and respond to the user.
5. Store the user's query, context, and LLM response in a list.
6. Upon subsequent queries, include the entire conversation list as additional context.

Note that the prompt grows at each round of the conversation as the history expands. Keep track of the length of the conversation relative to the maximum number of tokens that the LLM can accept in a single prompt.
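
One possible way to keep the prompt within bounds is to drop the oldest rounds once the history exceeds a budget. The sketch below is only an assumption about how this could be done: the function `trim_history` and the `max_chars` parameter are hypothetical, and the character count is a rough proxy for tokens rather than an exact measure.

```python
def trim_history(history, max_chars=8000):
    """Keep the most recent entries whose combined text fits within max_chars.

    The newest entry is always kept, even if it alone exceeds the budget.
    """
    kept = []
    total = 0
    # Walk backwards so that the newest rounds are considered first.
    for entry in reversed(history):
        size = sum(len(str(value)) for value in entry.values())
        if kept and total + size > max_chars:
            break
        kept.append(entry)
        total += size
    # Restore chronological order before returning.
    return list(reversed(kept))


# Three invented rounds of 30 characters each.
sample = [
    {"user_input": "q" * 10, "search_result_context": "c" * 10, "llm_response": "r" * 10}
    for _ in range(3)
]
print(len(trim_history(sample, max_chars=70)))
```

With a budget of 70 characters, only the two most recent rounds fit, so the oldest one is dropped. A tokenizer-based count would be more precise if the model's tokenizer is available.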

## Setting up the chatbot

We begin by defining the methods for the original LLM chatbot. These include processing a PDF or HTML file and loading it into a knowledge base, searching the document for the best matching result to the user's query, and calling the LLM for a response while using the search result as context.

In [40]:
%%capture
%pip install numpy pandas scikit-learn nltk PyPDF2 cohere beautifulsoup4 requests python-dotenv
from dotenv import load_dotenv
load_dotenv(override=True)

NUMBER_OF_MATCHES = 3 # Number of matching items to provide as context to the AI
CHUNK_SIZE = 99999 # Size of each matching item (99999 = entire document as context)

import os
import nltk
import cohere
import requests
from nltk.tokenize import sent_tokenize
from nltk.stem import PorterStemmer
from PyPDF2 import PdfReader
from bs4 import BeautifulSoup

nltk.download('punkt')
ps = PorterStemmer()

def process_text(text, chunk_size=CHUNK_SIZE):
    sentences = sent_tokenize(text)
    original_chunks = []
    processed_chunks = []
    chunk = ""
    for sentence in sentences:
        if len(chunk) + len(sentence) > chunk_size:
            original_chunks.append(chunk)
            processed_chunks.append(' '.join([ps.stem(word) for word in chunk.split()]))
            chunk = sentence
        else:
            chunk += " " + sentence
    if chunk:
        original_chunks.append(chunk)
        processed_chunks.append(' '.join([ps.stem(word) for word in chunk.split()]))
    return original_chunks, processed_chunks

def read_pdf(file_path):
    with open(file_path, 'rb') as file:
        reader = PdfReader(file)
        text = ''
        for page in reader.pages:
            text += page.extract_text()
    return process_text(text)

def read_html(file_path):
    with open(file_path, 'r') as file:
        soup = BeautifulSoup(file, 'html.parser')
        text = soup.get_text()
        return process_text(text)

def read_txt(file_path):
    with open(file_path, 'r') as file:
        text = file.read()
        return process_text(text)

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
documents = []  # This will hold all processed documents
original_documents = []  # This will hold all original documents
vectors = None

def process_and_add_document(file_path, file_type):
    if file_type == 'pdf':
        original_chunks, processed_chunks = read_pdf(file_path)
    elif file_type == 'html':
        original_chunks, processed_chunks = read_html(file_path)
    elif file_type == 'txt':
        original_chunks, processed_chunks = read_txt(file_path)
    else:
        raise ValueError('Unsupported file type')
    
    original_documents.extend(original_chunks)  # Store the original text chunks
    vectors = add_document(processed_chunks)
    return vectors

def add_document(text):
    documents.extend(text)
    vectors = vectorizer.fit_transform(documents)
    return vectors

def find_best_matches(query, top_n=NUMBER_OF_MATCHES):
    query_processed = process_text(query)[1]  # Get the processed version of the query
    query_vector = vectorizer.transform(query_processed)
    similarities = (query_vector * vectors.T).toarray()
    best_match_indices = similarities.argsort()[0][-top_n:][::-1]  # Get the indices of the top N matches
    return [original_documents[i] for i in best_match_indices], [documents[i] for i in best_match_indices]

co = cohere.ClientV2(os.getenv('COHERE_API_KEY'))

def get_cohere_response(query, context):
    messages = [
        {"role": "system", "content": "You are an AI assistant. Use the provided context to answer the user's query accurately in a short and concise response. Do not generate information that is not present in the context. If the context does not contain the answer, inform the user that the information is not available."},
        {"role": "system", "content": context},
        {"role": "user", "content": query}
    ]

    response = co.chat(
        model='command-r-plus-08-2024',
        messages=messages
    )
    return response.message.content[0].text.strip()

def reset_database():
    global documents, original_documents, vectors
    documents = []
    original_documents = []
    vectors = None

def initialize(file_name):
    file_type = file_name.split('.')[-1]
    return process_and_add_document(file_name, file_type)

def process_chat(user_query, is_debug = False):
    original_best_matches, processed_best_matches = find_best_matches(user_query)
    context = "\n\n".join(original_best_matches)  # Concatenate the top 3 best matches as context
    if is_debug:
        print(f"Context: {context}")
    response = get_cohere_response(user_query, context)
    return response, context

def chat(user_query, is_debug = False):
    return process_chat(user_query, is_debug)[0]

# Download the sample files from the provided URLs.
def download_sample_files():
    sample_files = [
        {
            "url": "https://www.ipcc.ch/report/ar6/wg1/downloads/outreach/IPCC_AR6_WGI_SummaryForAll.pdf",
            "file_name": "climatechange.pdf"
        },
        {
            "url": "https://medium.com/illumination/i-tried-10-decaf-coffees-as-a-first-time-coffee-drinker-heres-what-i-found-a8c5fb93a40e",
            "file_name": "coffee.html"
        }
    ]

    for file in sample_files:
        response = requests.get(file["url"])
        response.raise_for_status()  # Fail early if the download did not succeed.
        # Save the file to the current working directory.
        with open(file["file_name"], 'wb') as f:
            f.write(response.content)

    # Return a list of file names.
    return [file["file_name"] for file in sample_files]

# Download the sample files used by the chatbot.
file_names = download_sample_files()

## Running the original chatbot

Let's see an example of the original chatbot answering questions in a conversation. Notice how when the subject matter is included in the query, a suitable response is returned from the LLM. However, when subsequent questions are asked about the original topic, the LLM has no prior knowledge and is unable to answer the questions correctly.

The original chatbot, quite simply, forgets the conversation topic!

In [41]:
# Setup chatbot.
reset_database()
vectors = initialize('climatechange.pdf')

# Ask a question on the topic of authors.
print(chat('Who are the first 5 authors from the report?'))

The first five authors of the report are:
1. Govindasamy Bala
2. Deliang Chen
3. Tamsin Edwards
4. Sandro Fuzzi
5. Thian Yew Gan


## Asking follow-up questions

Next, let's ask follow-up questions on the same topic as you would in a natural conversation. We won't mention the topic "authors" again as we want the chatbot to remember the history of the conversation. Of course, since we have not added memory yet, it won't remember!

In [42]:
print(chat('Which one is the first?'))
print(chat('Which one is last?'))

The first item in the list is the Intergovernmental Panel on Climate Change (IPCC).
The last item in the list is "Thank you to everyone who contributed to this summary."


## What went wrong?

In the above example, we asked three consecutive questions about the authors of the document.

The first question explicitly mentioned "authors" as the subject of the query, and a suitable response listed the first five authors found in the document. The second and third questions continued on the same topic: the user asked which one was first, followed by which one was last, without ever mentioning the topic "authors". Since the original chatbot keeps no history of the conversation, it performed an entirely new search of the document for each follow-up question and returned answers unrelated to the authors.
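
To illustrate why a follow-up question without the keyword can land on the wrong part of a document, the sketch below scores two chunks of text against each query by counting shared words. This is a deliberately simplified stand-in for the TF-IDF search used by the chatbot, and both the chunk text and the helper names `tokens` and `overlap` are invented for this example.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(query, chunk):
    """Count the words that the query and the chunk have in common."""
    return len(tokens(query) & tokens(chunk))


# Invented chunks standing in for sections of a document.
chunks = {
    "authors": "The authors of this report are listed here by name.",
    "summary": "The first part of the summary explains which changes came first.",
}

for query in ["Who are the authors of the report?", "Which one is the first?"]:
    scores = {name: overlap(query, text) for name, text in chunks.items()}
    best = max(scores, key=scores.get)
    print(f"{query!r} -> best match: {best} {scores}")
```

The query that names the authors matches the authors chunk, while the follow-up shares more words with the unrelated chunk, so a keyword-based search has nothing to tie it back to the earlier topic.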

## Enhancing the chatbot with memory

We can update the chatbot to include a memory of the conversation by saving the user's query, context, and LLM response at each round. We can later load the conversation and include it as additional context in the prompt to the LLM.

Including the prior conversation in the prompt effectively gives the LLM a memory!

## Adding to the conversation

We can define a new method for saving the current query from the user as an item in the conversation. Each saved item will include the query, context, and LLM response.

In [43]:
# Initialize an empty list to store the conversation history
conversation_history = []

def add_to_conversation(user_input, search_result_context, llm_response):
    """
    Adds a conversation entry to the conversation history.

    Parameters:
    user_input (str): The user's input.
    search_result_context (str): The context found from the search results.
    llm_response (str): The response from the LLM.
    """
    conversation_entry = {
        "user_input": user_input,
        "search_result_context": search_result_context,
        "llm_response": llm_response
    }
    conversation_history.append(conversation_entry)

## Saving the conversation

Next, we can define a new, enhanced chat method that saves the conversation using the method that we've just defined.

In [44]:
def conversation(user_query, is_debug = False):
    # Join the conversation history into a single string to be used in the LLM prompt context.
    history = "\n\n=========================\n\n".join([f"User: {entry['user_input']}\nSearch Result Context: {entry['search_result_context']}\nLLM Response: {entry['llm_response']}" for entry in conversation_history])

    # Format the query to include the entire conversation history.
    query = f"This is the entire conversation up to this point. Use this as additional context when answering the question from the user.\n CONTEXT: {history}\n\n"
    query += f"\n\n=========================\n\n USER QUERY: {user_query}"

    # Get the response from the LLM.
    response, context = process_chat(query, is_debug)

    # Add the conversation entry to the history.
    add_to_conversation(user_query, context, response)

    return response


## Let's try that again

We can now run the chatbot enhanced with a conversation memory, and see how it can answer follow-up questions in the conversation.

In [45]:
conversation_history = []

# Ask a series of questions on the topic of authors.
print(conversation('Who are the first 5 authors from the report?'))
print(conversation('Which one is the first?'))
print(conversation('Which one is fifth?'))

The first five authors of the report are:
1. Govindasamy Bala
2. Deliang Chen
3. Tamsin Edwards
4. Sandro Fuzzi
5. Thian Yew Gan
Govindasamy Bala is the first author of the report.
The fifth author is Thian Yew Gan.


## Verifying the conversation history

To verify that the chatbot is actually keeping track of the conversation, let's ask it what the first question was. The chatbot must use the conversation memory as context in order to answer this question.

In [46]:
print(conversation('What was the first question that I asked?'))

The first question you asked was, "Who are the first 5 authors from the report?"


## Remembering the topic

Does the chatbot remember the topic of the conversation?

In [47]:
print(conversation('What is the topic of this conversation?'))

The topic of this conversation is the first five authors of the report.
