## Introduction

This lesson explores LangChain memory in more detail: a set of components designed to help chatbots maintain context and hold more coherent conversations. The traditional approach to chatbot development handles each user prompt independently, without regard to the interaction history, which can lead to an inconsistent and unsatisfying user experience. LangChain's memory components store and manage previous chat messages and inject them into the chain. This is essential for chatbots, which need to remember earlier interactions.

By default, LLMs are stateless: each incoming request is handled separately, with no knowledge of previous interactions. To overcome this limitation, LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains and agents that use memory. Agents can also be given access to a suite of tools and, based on the user's input, decide which tool to use.

## Import Libs & Setup

In [None]:
#| include: false
!pip install -q langchain==0.0.208 openai tiktoken python-dotenv


In [None]:
from dotenv import load_dotenv

!echo "OPENAI_API_KEY='<OPENAI_API_KEY>'" > .env

load_dotenv()

True

## Types of Conversational Memory

There are several types of conversational memory, each with its own advantages and disadvantages. We'll discuss some of them here; let's briefly go over each one:

### ConversationBufferMemory

This memory implementation stores the entire chat history and injects it into the prompt as a single string. Its advantage is that it captures the full conversation and is simple to implement and use. On the other hand, it becomes less efficient as the conversation grows: every request resends the whole history, which raises cost and latency, and a long enough history will eventually exceed the model's token limit.

ConversationBufferMemory does not trim the history on its own, so once the accumulated conversation exceeds the model's token limit, the request fails (see the Recap section below). To avoid this, monitor the number of tokens in the buffer and manage the conversation accordingly: for example, shorten the input text or remove older or less relevant parts of the conversation so the token count stays within the model's limits. Keep in mind that whatever you remove is lost from the conversation context.
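To make the growth concrete, here is a minimal plain-Python sketch of what a buffer-style memory does conceptually. It is not LangChain's implementation: `SimpleBufferMemory` and `build_prompt` are hypothetical names, and the `Human:`/`AI:` prefixes simply mirror the history format shown in the outputs later in this lesson.

```python
class SimpleBufferMemory:
    """Hypothetical buffer memory: keeps every exchange, never drops any."""

    def __init__(self, human_prefix: str = "Human", ai_prefix: str = "AI") -> None:
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.turns: list[tuple[str, str]] = []

    def save_context(self, user_input: str, ai_output: str) -> None:
        # Append the latest exchange; the buffer only ever grows.
        self.turns.append((user_input, ai_output))

    def load_history(self) -> str:
        # Render all exchanges as one string, oldest first.
        lines = []
        for user_input, ai_output in self.turns:
            lines.append(f"{self.human_prefix}: {user_input}")
            lines.append(f"{self.ai_prefix}: {ai_output}")
        return "\n".join(lines)


def build_prompt(memory: SimpleBufferMemory, user_input: str) -> str:
    # Every request carries the whole history, so the prompt keeps getting longer.
    return f"Current conversation:\n{memory.load_history()}\nHuman: {user_input}\nAI:"


memory = SimpleBufferMemory()
memory.save_context("Hello!", "Hi there!")
first = build_prompt(memory, "How are you?")
memory.save_context("How are you?", "Great, thanks.")
second = build_prompt(memory, "Bye!")
print(len(first), len(second))
```

Because nothing is ever removed, the prompt length (and therefore the token count) rises with every turn, which is exactly why the token limit eventually becomes a problem.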

First, as in the previous lesson, let's see how ConversationBufferMemory can be used with ConversationChain. The OpenAI wrapper reads your API key from an environment variable named `OPENAI_API_KEY`. Remember to install the necessary packages with the following command:
`pip install langchain==0.0.208 deeplake openai tiktoken`

In [None]:
from langchain.memory import ConversationBufferMemory
from langchain.llms import OpenAI
from langchain.chains import ConversationChain

llm = OpenAI(model_name="text-davinci-003", temperature=0)
conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)
conversation.predict(input="Hello!")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hello!
AI:

> Finished chain.


" Hi there! It's nice to meet you again. What can I do for you today?"

This allows the chatbot to provide a personalized approach while maintaining a consistent conversation with the user.

Next, we apply the same approach to a customer support chatbot, again using ConversationBufferMemory. This chatbot handles basic requests for a fictitious online store and maintains context throughout the conversation. The code below creates a prompt template for the customer support chatbot and a conversation chain with buffer memory.

In [None]:
from langchain import OpenAI, LLMChain, PromptTemplate
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

template = """You are a customer support chatbot for a highly advanced customer support AI
for an online store called "Galactic Emporium," which specializes in selling unique,
otherworldly items sourced from across the universe. You are equipped with an extensive
knowledge of the store's inventory and possess a deep understanding of interstellar cultures.
As you interact with customers, you help them with their inquiries about these extraordinary
products, while also sharing fascinating stories and facts about the cosmos they come from.

{chat_history}
Customer: {customer_input}
Support Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "customer_input"],
    template=template
)
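chat_history=""

# Note: ConversationChain falls back to its default prompt ({history} and {input}),
# so the custom `prompt` above is not wired into this chain; the outputs below
# come from the default prompt.
convo_buffer = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory()
)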

By storing the conversation history, the chatbot can handle customer inquiries while maintaining context, which lets it give more coherent and relevant responses. You can inspect the prompt of any chain through its `prompt.template` attribute; here we print the prompt of the first `conversation` chain:

In [None]:
print(conversation.prompt.template)

The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:
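The default prompt above uses `{history}` and `{input}`, while the Galactic Emporium template uses `{chat_history}` and `{customer_input}`, which is why `convo_buffer` doesn't pick it up. In LangChain 0.0.208, a template with different variable names is typically attached to an `LLMChain` together with `ConversationBufferMemory(memory_key="chat_history")`; check the documentation for your version. The plain-Python sketch below only illustrates what happens on each turn: rendered history fills one placeholder and the new message fills the other. `TEMPLATE` is a shortened, hypothetical version of the store prompt, and `render_turn` is a hypothetical helper.

```python
# Shortened, hypothetical version of the Galactic Emporium prompt.
TEMPLATE = """You are a customer support chatbot for "Galactic Emporium".

{chat_history}
Customer: {customer_input}
Support Chatbot:"""


def render_turn(history: list[tuple[str, str]], customer_input: str) -> str:
    # Render past exchanges with the same speaker labels the template uses.
    chat_history = "\n".join(
        f"Customer: {question}\nSupport Chatbot: {answer}" for question, answer in history
    )
    return TEMPLATE.format(chat_history=chat_history, customer_input=customer_input)


history = [("Do you sell pet toys?", "Yes, we have chew toys and plush toys.")]
prompt_text = render_turn(history, "How much are the chew toys?")
print(prompt_text)
```

One design detail to watch: LangChain's buffer memory labels turns `Human`/`AI` by default (configurable via `human_prefix` and `ai_prefix`), so matching those labels to the template's `Customer`/`Support Chatbot` wording keeps the prompt consistent.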


Now we will call the chatbot several times to mimic a user who wants information about pet toys. Each call returns a dictionary with the `input`, the `history` that memory injected into the prompt, and the model's `response`. Notice how `history` records every previous Human request and AI response.

In [None]:
convo_buffer("I'm interested in buying items from your store")

{'input': "I'm interested in buying items from your store",
 'history': '',
 'response': ' Great! We have a wide selection of items available for purchase. What type of items are you looking for?'}

In [None]:
convo_buffer("I want toys for my pet, do you have those?")

{'input': 'I want toys for my pet, do you have those?',
 'history': "Human: I'm interested in buying items from your store\nAI:  Great! We have a wide selection of items available for purchase. What type of items are you looking for?",
 'response': ' Yes, we do! We have a variety of pet toys, including chew toys, interactive toys, and plush toys. Do you have a specific type of toy in mind?'}

In [None]:
convo_buffer("I'm interested in price of a chew toys, please")

{'input': "I'm interested in price of a chew toys, please",
 'history': "Human: I'm interested in buying items from your store\nAI:  Great! We have a wide selection of items available for purchase. What type of items are you looking for?\nHuman: I want toys for my pet, do you have those?\nAI:  Yes, we do! We have a variety of pet toys, including chew toys, interactive toys, and plush toys. Do you have a specific type of toy in mind?",
 'response': " Sure! We have a range of chew toys available, with prices ranging from $5 to $20. Is there a particular type of chew toy you're interested in?"}

### Token count

With ConversationBufferMemory, the cost of using the AI model grows with the number of tokens in the conversation, because the whole history is sent with every request. Large Language Models (LLMs) like ChatGPT also have a token limit, and since API usage is billed per token, the more tokens used, the more expensive the requests become.

To count the tokens in a chat, you can use the tiktoken package, which provides the tokenizers used by OpenAI models such as gpt-3.5-turbo. Here is an example that counts the tokens in the content of each message of a conversation:

In [None]:
import tiktoken

# Load the gpt-3.5-turbo tokenizer once instead of on every call.
tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo")

def count_tokens(text: str) -> int:
    """Return the number of gpt-3.5-turbo tokens in `text`."""
    return len(tokenizer.encode(text))

# Named `messages` so it doesn't shadow the `conversation` chain defined earlier.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
]

# Sum the tokens in each message's content (per-message formatting overhead is not counted).
total_tokens = sum(count_tokens(message["content"]) for message in messages)

print(f"Total tokens in the conversation: {total_tokens}")

Total tokens in the conversation: 29
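The count above covers message content only. Chat models also spend a few tokens per message on formatting, so the real request is somewhat larger. The sketch below follows the OpenAI Cookbook's rough heuristic; the overhead values (3 per message, 3 to prime the reply) are the Cookbook's figures for gpt-3.5-turbo-0613 and should be treated as an approximation that varies by model. `estimate_chat_tokens` is a hypothetical helper, and a whitespace split stands in for the real tokenizer so the example runs without tiktoken.

```python
from typing import Callable


def estimate_chat_tokens(
    messages: list[dict[str, str]],
    encode: Callable[[str], list],
    tokens_per_message: int = 3,
    reply_priming: int = 3,
) -> int:
    """Estimate the tokens a chat request will consume.

    Encodes every field of every message (role and content), adds a fixed
    per-message formatting overhead, and adds a few tokens that prime the
    assistant's reply. The optional "name" field adjustment is ignored.
    """
    total = reply_priming
    for message in messages:
        total += tokens_per_message
        for value in message.values():
            total += len(encode(value))
    return total


# Whitespace splitting stands in for a real tokenizer. With tiktoken you would pass
# tiktoken.encoding_for_model("gpt-3.5-turbo").encode instead.
demo = [{"role": "user", "content": "hello there"}]
print(estimate_chat_tokens(demo, lambda text: text.split()))  # 3 + 1 + 2 + 3 = 9
```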


When a conversation accumulates a large number of tokens, processing it costs more money and compute, so managing tokens effectively matters. Strategies include limiting memory size with ConversationBufferWindowMemory or summarizing older interactions with ConversationSummaryBufferMemory. Both approaches keep the token count, and with it the cost and computational load, under control.

### ConversationBufferWindowMemory

This class limits memory size by keeping only the K most recent interactions. It maintains a sliding window over these interactions, so the buffer never grows too large. Because it stores a fixed number of recent exchanges, it is more efficient than ConversationBufferMemory and less likely to exceed the model's token limit. The downside is that it doesn't retain the entire chat history: the chatbot loses context when important information falls outside the window.

Let's see how ConversationBufferWindowMemory retains only recent interactions in practice.

We're going to build a chatbot that acts as a virtual tour guide for a fictional art gallery. The chatbot will use ConversationBufferWindowMemory to remember the last few interactions and provide relevant information about the artworks.

Create a prompt template for the tour guide chatbot:

In [None]:
from langchain.memory import ConversationBufferWindowMemory
from langchain import OpenAI, LLMChain, PromptTemplate

template = """Your name is ArtVenture, a cutting-edge virtual tour guide for
 an art gallery that showcases masterpieces from alternate dimensions and
 timelines. Your advanced AI capabilities allow you to perceive and understand
 the intricacies of each artwork, as well as their origins and significance in
 their respective dimensions. As visitors embark on their journey with you
 through the gallery, you weave enthralling tales about the alternate histories
 and cultures that gave birth to these otherworldly creations.

{chat_history}
Visitor: {visitor_input}
Tour Guide:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "visitor_input"],
    template=template
)

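chat_history=""

# Note: ConversationChain uses its default prompt ({history} and {input}), so the
# custom ArtVenture `prompt` above is not applied to this chain.
convo_buffer_win = ConversationChain(
    llm=llm,
    memory=ConversationBufferWindowMemory(k=3, return_messages=True)
)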

The value of k (3 in this case) is the number of interactions kept in memory, where one interaction is a human message plus the AI's reply. In other words, the memory holds the last 3 exchanges (6 messages) of the conversation. Setting return_messages=True makes the memory return the history as a list of message objects (HumanMessage, AIMessage) instead of a single string, which is useful when working with chat models.
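The window behavior can be sketched in plain Python with a `collections.deque` whose `maxlen` is k: appending to a full deque silently discards the oldest entry. This is a hypothetical illustration of the idea (`WindowMemory` is not a LangChain class), counting whole exchanges, not individual messages, against k.

```python
from collections import deque


class WindowMemory:
    """Hypothetical sliding-window memory that keeps the last k exchanges."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen drops the oldest exchange once it is full.
        self.exchanges: deque[tuple[str, str]] = deque(maxlen=k)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.exchanges.append((user_input, ai_output))

    def load_messages(self) -> list[tuple[str, str]]:
        # Flatten the exchanges into (role, text) messages, oldest first.
        messages = []
        for user_input, ai_output in self.exchanges:
            messages.append(("human", user_input))
            messages.append(("ai", ai_output))
        return messages


memory = WindowMemory(k=3)
for i in range(1, 5):
    memory.save_context(f"question {i}", f"answer {i}")

print(len(memory.load_messages()))  # 3 exchanges -> 6 messages
print(memory.load_messages()[0])    # ('human', 'question 2'): exchange 1 was dropped
```

After four exchanges with k=3, the first one is gone, mirroring what the `history` field shows on the fifth call in the conversation that follows.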

The following calls show a sample conversation with the chatbot. Each call prints the input, the history loaded from memory, and the response. Notice that by the fifth call the first exchange has dropped out of `history`, because only the last three exchanges are kept.

In [None]:
convo_buffer_win("What is your name?")

{'input': 'What is your name?',
 'history': [],
 'response': " My name is AI-1. It's nice to meet you!"}

In [None]:
convo_buffer_win("What can you do?")

{'input': 'What can you do?',
 'history': [HumanMessage(content='What is your name?', additional_kwargs={}, example=False),
  AIMessage(content=" My name is AI-1. It's nice to meet you!", additional_kwargs={}, example=False)],
 'response': " I can help you with a variety of tasks. I can answer questions, provide information, and even help you with research. I'm also capable of learning new things, so I'm always expanding my capabilities."}

In [None]:
convo_buffer_win("Do you mind give me a tour, I want to see your galery?")

{'input': 'Do you mind give me a tour, I want to see your galery?',
 'history': [HumanMessage(content='What is your name?', additional_kwargs={}, example=False),
  AIMessage(content=" My name is AI-1. It's nice to meet you!", additional_kwargs={}, example=False),
  HumanMessage(content='What can you do?', additional_kwargs={}, example=False),
  AIMessage(content=" I can help you with a variety of tasks. I can answer questions, provide information, and even help you with research. I'm also capable of learning new things, so I'm always expanding my capabilities.", additional_kwargs={}, example=False)],
 'response': " Sure! I'd be happy to give you a tour of my gallery. I have a variety of images, videos, and other media that I can show you. Would you like to start with images or videos?"}

In [None]:
convo_buffer_win("what is your working hours?")

{'input': 'what is your working hours?',
 'history': [HumanMessage(content='What is your name?', additional_kwargs={}, example=False),
  AIMessage(content=" My name is AI-1. It's nice to meet you!", additional_kwargs={}, example=False),
  HumanMessage(content='What can you do?', additional_kwargs={}, example=False),
  AIMessage(content=" I can help you with a variety of tasks. I can answer questions, provide information, and even help you with research. I'm also capable of learning new things, so I'm always expanding my capabilities.", additional_kwargs={}, example=False),
  HumanMessage(content='Do you mind give me a tour, I want to see your galery?', additional_kwargs={}, example=False),
  AIMessage(content=" Sure! I'd be happy to give you a tour of my gallery. I have a variety of images, videos, and other media that I can show you. Would you like to start with images or videos?", additional_kwargs={}, example=False)],
 'response': " I'm available 24/7! I'm always here to help you with whatever you need."}

In [None]:
convo_buffer_win("See you soon.")

{'input': 'See you soon.',
 'history': [HumanMessage(content='What can you do?', additional_kwargs={}, example=False),
  AIMessage(content=" I can help you with a variety of tasks. I can answer questions, provide information, and even help you with research. I'm also capable of learning new things, so I'm always expanding my capabilities.", additional_kwargs={}, example=False),
  HumanMessage(content='Do you mind give me a tour, I want to see your galery?', additional_kwargs={}, example=False),
  AIMessage(content=" Sure! I'd be happy to give you a tour of my gallery. I have a variety of images, videos, and other media that I can show you. Would you like to start with images or videos?", additional_kwargs={}, example=False),
  HumanMessage(content='what is your working hours?', additional_kwargs={}, example=False),
  AIMessage(content=" I'm available 24/7! I'm always here to help you with whatever you need.", additional_kwargs={}, example=False)],
 'response': ' Sure thing! I look forw

### ConversationSummaryMemory

ConversationSummaryMemory is a memory management strategy that, instead of storing raw messages, uses an LLM to summarize the conversation as it happens. It extracts the key information from previous interactions and condenses it into a shorter, more manageable running summary. Here are its pros and cons.

Advantages:

- **Condensing conversation information**
By summarizing the conversation, it reduces the number of tokens needed to store the conversation history, which helps when working with token-limited models like GPT-3.
- **Flexibility**
You can configure this type of memory to return the history as a list of messages or as a plain-text summary, so it works with both chat models and plain-text LLMs.
- **Direct summary prediction**
The predict_new_summary method lets you directly obtain a new summary from a list of messages and the previous summary, giving you more control over the summarization process.

Disadvantages:

- **Loss of information**
Summarizing the conversation might lead to a loss of information, especially if the summary is too short or omits important details from the conversation.
- **Increased complexity**
Compared to simpler memory types like ConversationBufferMemory, which just stores the raw conversation history, ConversationSummaryMemory needs an extra LLM call to update the summary, which adds latency and cost and can affect the chatbot's performance.
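The mechanism behind summary memory can be sketched in plain Python: after each turn, the previous summary and the new lines are merged into an updated summary. In ConversationSummaryMemory that merge is an LLM call; here it is an injected function so the sketch runs without an API key. `RunningSummaryMemory` and `toy_summarizer` are hypothetical, and the toy summarizer is deliberately crude, which also illustrates the "loss of information" drawback.

```python
from typing import Callable

# Hypothetical summarizer signature: (previous_summary, new_lines) -> new_summary.
Summarizer = Callable[[str, str], str]


class RunningSummaryMemory:
    """Hypothetical summary memory: only a running summary is kept."""

    def __init__(self, summarize: Summarizer) -> None:
        self.summarize = summarize
        self.summary = ""

    def save_context(self, user_input: str, ai_output: str) -> None:
        # One extra summarization call per turn: the "increased complexity" cost.
        new_lines = f"Human: {user_input}\nAI: {ai_output}"
        self.summary = self.summarize(self.summary, new_lines)

    def load_history(self) -> str:
        # Only the condensed summary is injected into the prompt.
        return self.summary


def toy_summarizer(previous: str, new_lines: str) -> str:
    # Stand-in for an LLM: keeps only the human side of each exchange.
    human_line = new_lines.splitlines()[0].removeprefix("Human: ")
    return f"{previous} The human said: {human_line}".strip()


memory = RunningSummaryMemory(toy_summarizer)
memory.save_context("Hi, what's up?", "Not much, how about you?")
memory.save_context("Writing docs for LangChain.", "Sounds great!")
print(memory.load_history())
```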

The summary memory is plugged into a ConversationChain. We use OpenAI's text-davinci-003 (other models, such as gpt-3.5-turbo, also work) to initialize the chain. The chain's prompt template has a {history} placeholder, which the memory fills with the running summary of the conversation between the human and the AI.

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

# Create a ConversationChain with ConversationSummaryMemory
conversation_with_summary = ConversationChain(
    llm=llm,
    memory=ConversationSummaryMemory(llm=llm),
    verbose=True
)

# Example conversation
response = conversation_with_summary.predict(input="Hi, what's up?")
print(response)



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, what's up?
AI:

> Finished chain.
 Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?


### ConversationSummaryBufferMemory

In this step, we use the predict method to converse with the AI, this time with ConversationSummaryBufferMemory storing both a summary of older interactions and a buffer of recent ones. We start with a prompt template that sets the stage for the chatbot.

In [None]:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["topic"],
    template="The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\nCurrent conversation:\n{topic}",
)

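This prompt template mirrors ConversationChain's default prompt and sets up a friendly conversation between a human and an AI. The chain below uses that default prompt, so the template is shown here for reference.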

In [None]:
from langchain.memory import ConversationSummaryBufferMemory
from langchain.chains import ConversationChain

conversation_with_summary = ConversationChain(
    llm=llm,
    memory=ConversationSummaryBufferMemory(llm=llm, max_token_limit=40),
    verbose=True
)
conversation_with_summary.predict(input="Hi, what's up?")
conversation_with_summary.predict(input="Just working on writing some documentation!")
response = conversation_with_summary.predict(input="For LangChain! Have you heard of it?")
print(response)



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, what's up?
AI:

> Finished chain.


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, what's up?
AI:  Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?
Human: Just working on writing some documentation!
AI:

> Finished chain.


> Entering new ConversationChain chain...

This type combines the ideas of keeping a buffer of recent interactions in memory and compiling old interactions into a summary. It uses token length rather than the number of interactions to determine when to flush interactions. This memory type allows us to maintain a coherent conversation while also keeping a summary of the conversation and recent interactions.
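Here is a plain-Python sketch of the summary-plus-buffer idea, under stated assumptions: recent messages stay verbatim, and whenever the buffer exceeds `max_token_limit` (counted by a whitespace word split standing in for a real tokenizer), the oldest messages are moved out and folded into the summary. `SummaryBufferMemory` and `toy_summarizer` are hypothetical; in LangChain the folding step is an LLM call, and this sketch applies the limit to the raw buffer only.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    # Whitespace word count as a stand-in for a real tokenizer such as tiktoken.
    return len(text.split())


class SummaryBufferMemory:
    """Hypothetical summary-plus-buffer memory flushed by token length."""

    def __init__(self, summarize: Callable[[str, list[str]], str], max_token_limit: int) -> None:
        self.summarize = summarize  # stand-in for the LLM summarization call
        self.max_token_limit = max_token_limit
        self.buffer: list[str] = []
        self.summary = ""

    def _buffer_tokens(self) -> int:
        return sum(count_tokens(message) for message in self.buffer)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.buffer += [f"Human: {user_input}", f"AI: {ai_output}"]
        pruned: list[str] = []
        # Flush by token length, not by number of interactions.
        while self.buffer and self._buffer_tokens() > self.max_token_limit:
            pruned.append(self.buffer.pop(0))
        if pruned:
            self.summary = self.summarize(self.summary, pruned)

    def load_history(self) -> str:
        parts = [f"Summary: {self.summary}"] if self.summary else []
        return "\n".join(parts + self.buffer)


def toy_summarizer(previous: str, pruned: list[str]) -> str:
    # Stand-in for an LLM: just records how many messages were condensed so far.
    count = int(previous.split()[0]) if previous else 0
    return f"{count + len(pruned)} earlier messages condensed"


memory = SummaryBufferMemory(toy_summarizer, max_token_limit=10)
memory.save_context("Hi, what's up?", "Not much.")
memory.save_context("Just writing some documentation!", "Nice, for which project?")
print(memory.load_history())
```

With a 10-token budget, the second exchange pushes the buffer to 17 words, so the first exchange (7 words) is flushed into the summary and the latest exchange stays verbatim.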

Advantages:

- Ability to remember distant interactions through summarization while keeping recent interactions in their raw, information-rich form
- Flexible token management that lets you control the maximum number of tokens used for memory, which can be adjusted as needed

Disadvantages:

- Requires more tweaking on what to summarize and what to maintain within the buffer window
- May still exceed context window limits for very long conversations

Comparison with other memory management strategies:

- Offers a balanced approach that can handle both distant and recent interactions effectively
- More economical in token usage than keeping the full history, while providing the benefits of both memory management strategies

With this approach, interactions that no longer fit in the buffer are condensed into a brief overview that is continuously added to the ongoing summary of all previous interactions. Compared to ConversationBufferWindowMemory or ConversationSummaryMemory alone, this gives ConversationSummaryBufferMemory its balance between remembering distant interactions and keeping token usage in check.

## Recap and Strategies

ConversationBufferMemory never trims the history, so once the prompt it builds exceeds the model's token limit, the request fails with an error because the model can't process that many tokens.

To manage this situation, you can apply different strategies:

### Delete oldest message

One approach is to delete the oldest messages from the chat transcript once the token limit is reached. This can degrade the quality of the conversation over time, as the model gradually loses the context of the earlier parts of the conversation.
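A minimal sketch of this strategy, assuming the history is a list of message strings and using a whitespace word count as a stand-in for a real tokenizer: drop messages from the front until the rest fit the budget. `trim_oldest` is a hypothetical helper, not a LangChain API.

```python
from typing import Callable


def whitespace_tokens(text: str) -> int:
    # Stand-in for a real tokenizer; swap in tiktoken for accurate counts.
    return len(text.split())


def trim_oldest(
    messages: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> list[str]:
    """Drop messages from the front until the remaining ones fit within max_tokens."""
    trimmed = list(messages)  # copy so the caller's history is left untouched
    total = sum(count_tokens(message) for message in trimmed)
    while trimmed and total > max_tokens:
        total -= count_tokens(trimmed.pop(0))
    return trimmed


history = ["Human: hello", "AI: hi there friend", "Human: what is LangChain", "AI: a framework"]
print(trim_oldest(history, max_tokens=7))
```

In a real chatbot you would usually drop whole exchanges (a human message together with its reply) and keep any system prompt pinned at the front, so the remaining history still reads coherently.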

### Limit the duration of the chat

Another approach is to limit the duration of the chat to a maximum number of tokens or a certain number of rounds. Once that limit is reached, rather than letting the conversation continue while the model loses context, you can ask the user to start a new conversation and clear the message history. The new chat then has the full token limit available.
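A sketch of this policy under simple assumptions: a hypothetical `ChatSession` class counts rounds and approximate tokens (whitespace words stand in for real tokens), reports when either cap is hit, and `reset()` starts a fresh chat with an empty history and the full budget. The limits used below are arbitrary examples.

```python
class ChatSession:
    """Hypothetical helper that caps a chat by rounds and by approximate tokens."""

    def __init__(self, max_rounds: int, max_tokens: int) -> None:
        self.max_rounds = max_rounds
        self.max_tokens = max_tokens
        self.reset()

    def reset(self) -> None:
        # A brand-new chat: empty history and the full budget available again.
        self.history: list[str] = []
        self.rounds = 0
        self.tokens = 0

    def record(self, user_input: str, ai_output: str) -> None:
        # Store one round and update the counters.
        self.history += [f"Human: {user_input}", f"AI: {ai_output}"]
        self.rounds += 1
        # Whitespace word count as a stand-in for a real tokenizer.
        self.tokens += len(user_input.split()) + len(ai_output.split())

    @property
    def limit_reached(self) -> bool:
        return self.rounds >= self.max_rounds or self.tokens >= self.max_tokens


session = ChatSession(max_rounds=2, max_tokens=1000)
session.record("Hello!", "Hi there!")
print(session.limit_reached)  # False: 1 round, 3 tokens
session.record("Tell me about memory.", "Sure, here is an overview.")
if session.limit_reached:
    print("You've reached the end of this chat. Please start a new conversation.")
    session.reset()
```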

### ConversationBufferWindowMemory method

This method limits the number of tokens used by maintaining a fixed-size window that stores only the most recent interactions (the last k exchanges).

→ This is suitable for remembering recent interactions, not distant interactions.

### ConversationSummaryBufferMemory approach

This method combines features of ConversationSummaryMemory and ConversationBufferWindowMemory: it summarizes the earliest interactions in a conversation while keeping the most recent ones in their raw, information-rich form, up to a specified token limit.

→ This allows the model to remember both recent and distant interactions, but may require more tweaking of what to summarize and what to keep in the buffer. It is important to keep track of the number of tokens and send the model only prompts that fit within the token limit.

→ You can use OpenAI's tiktoken library to count tokens and manage them effectively.

### Token limit

At the time of writing, the maximum token limit for the gpt-3.5-turbo model is 4,096 tokens. This limit applies to input and output tokens combined. If the conversation contains too many tokens to fit, you'll need to truncate, omit, or shrink the text until it does. Note that if a message is removed from the input, the model loses all knowledge of it.

→ To handle this, you can split the input text into smaller parts and process them separately, or apply other strategies to truncate, skip, or shrink the text until it fits within the limit. One way to work with large documents is batch processing: break the text into smaller chunks and process each one separately, while carrying over some context from the surrounding chunks. You can read more about this technique [here](https://marco-gonzalez.medium.com/breaking-the-token-limit-how-to-work-with-large-amounts-of-text-in-chatgpt-da18c798d882).
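One common way to carry context between chunks is to let consecutive chunks overlap. The sketch below is a hypothetical helper, `chunk_tokens`, that splits a token sequence into fixed-size chunks sharing `overlap` items with the previous chunk; words stand in for tokens so it runs without tiktoken. It is one possible chunking scheme, not necessarily the exact technique from the linked article.

```python
from typing import TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: list[T], chunk_size: int, overlap: int) -> list[list[T]]:
    """Split `tokens` into chunks of at most `chunk_size` items.

    Consecutive chunks share `overlap` items, so each chunk carries a little
    context from the one before it.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[list[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the text
    return chunks


# Words stand in for tokens here; with tiktoken you would chunk the list returned
# by encode() and call decode() on each chunk before sending it to the model.
words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(" ".join(chunk))
```

A larger overlap preserves more context across chunk boundaries at the price of sending more (duplicated) tokens overall.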

When choosing to implement conversational memory for your LangChain chatbot, consider factors such as conversation duration, model token limits, and the importance of maintaining the full conversation history. Each type of memory implementation offers its own benefits and trade-offs, so it's essential to choose the one that best meets your chatbot's needs. 

## Conclusion

Choosing the best memory implementation for your chatbot will depend on understanding your chatbot's goals, user expectations, and the desired balance between memory efficiency and conversation continuity. By carefully considering these aspects, you can make an informed decision and ensure that your chatbot provides a consistent and engaging chat experience.

In addition to these memory types, another way to give your chat models memory is a vector store such as the previously introduced Deep Lake, which stores vector representations of past interactions and retrieves the most relevant ones, enabling richer context in more complex interactions.
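The retrieval idea can be illustrated with a toy example. This is a hypothetical, self-contained sketch, not Deep Lake's or LangChain's API: a bag-of-words `Counter` stands in for a real embedding model, cosine similarity ranks stored messages against a query, and only the best matches would be injected into the prompt.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. A real setup would call an
    # embedding model and store the vectors in a vector database.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class RetrievalMemory:
    """Hypothetical memory that stores past messages and returns the most similar."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = RetrievalMemory()
memory.add("my dog loves chew toys")
memory.add("the gallery opens at nine")
memory.add("I prefer plush toys for my cat")
print(memory.search("which toys does my dog like", k=1))
```

Unlike the buffer-based memories, retrieval returns relevant messages regardless of how long ago they occurred, which is what makes it attractive for long-running conversations.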

Further Reading:

[https://github.com/idontcalculate/langchain/blob/main/types-of-memory.ipynb](https://github.com/idontcalculate/langchain/blob/main/types-of-memory.ipynb)

## Acknowledgements

I'd like to express my thanks to the wonderful [LangChain & Vector Databases in Production Course](https://learn.activeloop.ai/courses/langchain) by Activeloop, which I completed, and acknowledge the use of some images and other materials from the course in this article.