# LangChain: Memory

## Outline
* ConversationBufferMemory
* ConversationBufferWindowMemory
* ConversationTokenBufferMemory
* ConversationSummaryMemory

When you interact with these models, they naturally don't remember what you said earlier or any of the previous conversation. This is an issue when you're building applications such as chatbots and want to have a conversation with them. In this section we'll cover memory: how to remember previous parts of the conversation and feed them into the language model so that it can maintain a conversational flow as you interact with it.

LangChain offers multiple sophisticated options for managing this memory.

## ConversationBufferMemory

In [None]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import warnings
warnings.filterwarnings('ignore')

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory


In [None]:
llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)

In [None]:
conversation.predict(input="Hi, my name is Andrew")

*OUTPUT*

"Hello Andrew! It's nice to meet you. How can I assist you today?"

If you want, you can set the `verbose` argument to `True` to see what LangChain is actually doing when you run `predict`.

In [None]:
llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

In [None]:
conversation.predict(input="What is 1+1?")

*OUTPUT*
```
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Andrew
AI: Hello Andrew! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI:

> Finished chain.
'1+1 is equal to 2.'
```

In [None]:
conversation.predict(input="What is my name?")

*OUTPUT*
```
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Andrew
AI: Hello Andrew! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1+1 is equal to 2.
Human: What is my name?
AI:

> Finished chain.
'Your name is Andrew.'
```

Notice that as the conversation continues, the memory (the history of the conversation) gets longer and longer.

In [None]:
print(memory.buffer)

*OUTPUT*
```
Human: Hi, my name is Andrew
AI: Hello Andrew! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1+1 is equal to 2.
Human: What is my name?
AI: Your name is Andrew.
```

In [None]:
memory.load_memory_variables({})
# The curly braces here are an empty dictionary.

*OUTPUT*
```
{'history': "Human: Hi, my name is Andrew\nAI: Hello Andrew! It's nice to meet you. How can I assist you today?\nHuman: What is 1+1?\nAI: 1+1 is equal to 2.\nHuman: What is my name?\nAI: Your name is Andrew."}
```

In [None]:
memory = ConversationBufferMemory()

In [None]:
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})

In [None]:
print(memory.buffer)

*OUTPUT*
```
Human: Hi
AI: What's up
```

In [None]:
memory.load_memory_variables({})

*OUTPUT*
```
{'history': "Human: Hi\nAI: What's up"}
```

In [None]:
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})

In [None]:
memory.load_memory_variables({})

*OUTPUT*
```
{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}
```

When you use a large language model for a chat conversation, the large language model itself is actually stateless. It does not remember the conversation you've had so far, and each transaction, that is, each call to the API endpoint, is independent. Chatbots appear to have memory only because there's usually wrapper code that provides the full conversation so far as context to the LLM. The memory can therefore explicitly store the turns, or utterances, so far.

![Memory](immagini/05_memory.png)
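To make the wrapper-code idea concrete, here is a minimal sketch in plain Python. It does not use LangChain or a real model: `echo_llm` is a hypothetical stand-in for a stateless LLM call, and `SimpleChat` is an illustrative name, not a library class. The point is only that the full history is rebuilt into the prompt on every call.

```python
from typing import Callable, List, Tuple


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a stateless LLM call: it sees only `prompt`."""
    # Count the "Human:" turns so the reply shows how much context arrived.
    human_turns = prompt.count("Human:")
    return f"I can see {human_turns} human turn(s) in this prompt."


class SimpleChat:
    """Wrapper code that supplies the full conversation on every call."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm
        self.turns: List[Tuple[str, str]] = []  # (human, ai) pairs so far

    def history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

    def predict(self, user_input: str) -> str:
        # Rebuild the whole prompt each time: the model itself keeps no state.
        prefix = self.history()
        prompt = (prefix + "\n" if prefix else "") + f"Human: {user_input}\nAI:"
        reply = self.llm(prompt)
        self.turns.append((user_input, reply))
        return reply


chat = SimpleChat(echo_llm)
first = chat.predict("Hi, my name is Andrew")
second = chat.predict("What is my name?")
print(first)
print(second)
```

The second reply reports two human turns only because the wrapper resent the first one; the stand-in model never stored anything itself.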

As the conversation gets longer, the stored history grows too, and sending all those tokens to the LLM, which usually charges based on the number of tokens it needs to process, becomes more expensive. So LangChain provides several convenient kinds of memory to store and accumulate the conversation. The following sections cover the different types.
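As a rough back-of-the-envelope illustration of why cost grows, the sketch below adds up the prompt tokens sent over a whole conversation when every call resends all earlier turns. It assumes, purely for illustration, that each turn adds the same number of tokens (50 here) and ignores the system prompt and the AI replies; `total_prompt_tokens` is a hypothetical helper, not a LangChain function.

```python
def total_prompt_tokens(num_turns: int, tokens_per_turn: int) -> int:
    """Total tokens sent over a conversation when every call resends all prior turns."""
    # Call i carries i turns' worth of tokens (the history plus the new input).
    return sum(i * tokens_per_turn for i in range(1, num_turns + 1))


for n in (5, 10, 20):
    # Assumed figure: 50 tokens per turn.
    print(n, total_prompt_tokens(n, tokens_per_turn=50))
```

Under these assumptions, doubling the number of turns roughly quadruples the total tokens sent, which is why unbounded buffer memory gets expensive.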

## ConversationBufferWindowMemory

Here I import `ConversationBufferWindowMemory`, which keeps only a window of memory. Setting __k=1__ specifies that I want it to remember just one conversational exchange: one utterance from me and one utterance from the chatbot.

In [None]:
from langchain.memory import ConversationBufferWindowMemory

In [None]:
memory = ConversationBufferWindowMemory(k=1)               

In [None]:
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})


In [None]:
memory.load_memory_variables({})

*OUTPUT*
```
{'history': 'Human: Not much, just hanging\nAI: Cool'}
```
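The windowing behavior seen above can be sketched in plain Python with a bounded queue. This is an illustrative approximation, not LangChain's implementation; `WindowMemorySketch` is a made-up name, and the method names merely echo the ones used in the cells above.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WindowMemorySketch:
    """Keeps only the last k (human, ai) exchanges. Illustrative, not LangChain's code."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen silently discards the oldest item when it is full.
        self.exchanges: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save_context(self, inputs: Dict[str, str], outputs: Dict[str, str]) -> None:
        self.exchanges.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self) -> Dict[str, str]:
        lines = []
        for human, ai in self.exchanges:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {"history": "\n".join(lines)}


window = WindowMemorySketch(k=1)
window.save_context({"input": "Hi"}, {"output": "What's up"})
window.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(window.load_memory_variables())
```

With `k=1`, the first exchange is pushed out as soon as the second one is saved, matching the output shown above.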

In [None]:
llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)

In practice, you probably won't use this with `k=1`; you would set `k` to a larger number. Still, this prevents the memory from growing without limit as the conversation gets longer.

In [None]:
conversation.predict(input="Hi, my name is Andrew")

*OUTPUT*
```
"Hello Andrew! It's nice to meet you. How can I assist you today?"
```

In [None]:
conversation.predict(input="What is 1+1?")

*OUTPUT*
```
'1+1 is equal to 2.'
```

In [None]:
conversation.predict(input="What is my name?")

*OUTPUT*
```
"I'm sorry, but I don't have access to personal information."
```

With `ConversationTokenBufferMemory`, the memory limits the number of tokens saved. Because a lot of LLM pricing is based on tokens, this maps more directly to the cost of the LLM calls.

## ConversationTokenBufferMemory

In [None]:
#!pip install tiktoken

In [None]:
from langchain.memory import ConversationTokenBufferMemory
from langchain.llms import OpenAI
llm = ChatOpenAI(temperature=0.0)

In [None]:
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=30)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [None]:
memory.load_memory_variables({})

*OUTPUT*
```
{'history': 'AI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}
```

With a high enough token limit, the memory holds the whole conversation, starting with "AI is what?!". If I decrease the limit, as with the 30 tokens above, it chops off the earlier parts of the conversation and retains only the most recent exchanges that fit within the token limit.

Passing `llm=llm` tells the memory to count tokens the way the `ChatOpenAI` model does, since different LLMs count tokens differently.
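The pruning rule can be sketched without LangChain or `tiktoken`. In the sketch below, `count_words` is a hypothetical token counter (one token per whitespace-separated word), so the limits are not comparable to real OpenAI token counts, and `TokenBufferSketch` is an illustrative name rather than a library class. It drops the oldest messages one at a time until the buffer fits.

```python
from typing import Callable, Dict, List


def count_words(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


class TokenBufferSketch:
    """Keeps the most recent messages that fit in a token budget. Illustrative only."""

    def __init__(self, max_token_limit: int,
                 count_tokens: Callable[[str], int] = count_words) -> None:
        self.max_token_limit = max_token_limit
        self.count_tokens = count_tokens
        self.messages: List[str] = []

    def _total(self) -> int:
        return sum(self.count_tokens(m) for m in self.messages)

    def save_context(self, inputs: Dict[str, str], outputs: Dict[str, str]) -> None:
        self.messages.append(f"Human: {inputs['input']}")
        self.messages.append(f"AI: {outputs['output']}")
        # Drop the oldest messages until the buffer fits within the limit.
        while self.messages and self._total() > self.max_token_limit:
            self.messages.pop(0)

    def load_memory_variables(self) -> Dict[str, str]:
        return {"history": "\n".join(self.messages)}


def replay(limit: int) -> str:
    """Save the same three exchanges used above under a given (word-based) limit."""
    mem = TokenBufferSketch(max_token_limit=limit)
    mem.save_context({"input": "AI is what?!"}, {"output": "Amazing!"})
    mem.save_context({"input": "Backpropagation is what?"}, {"output": "Beautiful!"})
    mem.save_context({"input": "Chatbots are what?"}, {"output": "Charming!"})
    return mem.load_memory_variables()["history"]


print(repr(replay(8)))    # small budget: only the tail of the conversation survives
print(repr(replay(100)))  # large budget: everything is kept
```

Because messages are dropped individually rather than as whole exchanges, the retained history can start with an `AI:` line, as it does in the 30-token output above.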

In [None]:
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [None]:
memory.load_memory_variables({})

*OUTPUT*
```
{'history': 'Human: AI is what?!\nAI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}
```

## ConversationSummaryMemory

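The idea behind the conversation summary buffer memory is this: instead of limiting the memory to a fixed number of tokens based on the most recent utterances, or to a fixed number of conversational exchanges, let's use an LLM to write a summary of the conversation so far and let that be the memory. The class used below is `ConversationSummaryBufferMemory`, which starts summarizing once the stored conversation exceeds `max_token_limit`.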

In [None]:
from langchain.memory import ConversationSummaryBufferMemory

In [None]:
# create a long string
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=400)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})

In [None]:
memory.load_memory_variables({})

*OUTPUT*
```
{'history': "Human: Hello\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool\nHuman: What is on the schedule today?\nAI: There is a meeting at 8am with your product team. You will need your powerpoint presentation prepared. 9am-12pm have time to work on your LangChain project which will go quickly because Langchain is such a powerful tool. At Noon, lunch at the italian resturant with a customer who is driving from over an hour away to meet you to understand the latest in AI. Be sure to bring your laptop to show the latest LLM demo."}
```

In the output above, the history consists of the verbatim `Human` and `AI` turns.

The last response is that long schedule, so this memory now has quite a lot of text in it. The memory variables contain the entire piece of text because 400 tokens was sufficient to store all of it. If I reduce the max token limit to, say, 100 tokens, the conversation summary buffer memory uses an LLM (the OpenAI endpoint in this case, because that's what we set `llm` to) to generate a summary of the conversation so far.

In the output below, the history is a single `System` summary message.

In [None]:
# create a long string
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})

In [None]:
memory.load_memory_variables({})

*OUTPUT*
```
{'history': 'System: The human and AI exchange greetings. The human asks about the schedule for the day. The AI provides a detailed schedule, including a meeting with the product team, work on the LangChain project, and a lunch meeting with a customer interested in AI. The AI emphasizes the importance of bringing a laptop to showcase the latest LLM demo during the lunch meeting.'}
```
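The overall mechanism can be sketched in plain Python as follows. This is an approximation under stated assumptions, not LangChain's code: `naive_summarizer` is a hypothetical stand-in for the LLM summarization call (it only counts the messages it was given), `count_words` is a hypothetical word-based token counter, and `SummaryBufferSketch` is an illustrative name.

```python
from typing import Callable, Dict, List


def count_words(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


def naive_summarizer(previous_summary: str, dropped: List[str]) -> str:
    """Hypothetical stand-in for the LLM call that would write a real summary."""
    note = f"{len(dropped)} earlier message(s) condensed."
    return (previous_summary + " " + note).strip()


class SummaryBufferSketch:
    """Keeps recent messages verbatim and folds overflow into a running summary."""

    def __init__(self, max_token_limit: int,
                 summarize: Callable[[str, List[str]], str] = naive_summarizer,
                 count_tokens: Callable[[str], int] = count_words) -> None:
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.count_tokens = count_tokens
        self.summary = ""
        self.messages: List[str] = []

    def save_context(self, inputs: Dict[str, str], outputs: Dict[str, str]) -> None:
        self.messages.append(f"Human: {inputs['input']}")
        self.messages.append(f"AI: {outputs['output']}")
        dropped: List[str] = []
        # Move the oldest messages out of the buffer until it fits the budget.
        while self.messages and sum(map(self.count_tokens, self.messages)) > self.max_token_limit:
            dropped.append(self.messages.pop(0))
        if dropped:
            # Fold the overflow into the running summary instead of discarding it.
            self.summary = self.summarize(self.summary, dropped)

    def load_memory_variables(self) -> Dict[str, str]:
        lines = ([f"System: {self.summary}"] if self.summary else []) + self.messages
        return {"history": "\n".join(lines)}


def replay(limit: int) -> str:
    mem = SummaryBufferSketch(max_token_limit=limit)
    mem.save_context({"input": "Hello"}, {"output": "What's up"})
    mem.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
    return mem.load_memory_variables()["history"]


print(repr(replay(400)))  # fits: verbatim Human/AI turns only
print(repr(replay(5)))    # overflow: a System summary line appears
```

With a generous limit the history is just the verbatim turns; with a tight one, older messages are replaced by a `System:` line, which is the same shape as the two outputs shown in this section.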

In [None]:
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

In [None]:
conversation.predict(input="What would be a good demo to show?")

*OUTPUT*
```
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human and AI exchange greetings. The human asks about the schedule for the day. The AI provides a detailed schedule, including a meeting with the product team, work on the LangChain project, and a lunch meeting with a customer interested in AI. The AI emphasizes the importance of bringing a laptop to showcase the latest LLM demo during the lunch meeting.
Human: What would be a good demo to show?
AI:

> Finished chain.

'A good demo to show during the lunch meeting with the customer interested in AI would be the latest LLM (Language Model) demo. The LLM is a cutting-edge AI model that can generate human-like text based on a given prompt. It has been trained on a vast amount of data and can generate coherent and contextually relevant responses. By showcasing the LLM demo, you can demonstrate the capabilities of our AI technology and how it can be applied to various industries and use cases.'
```

In [None]:
memory.load_memory_variables({})

*OUTPUT*
```
{'history': 'System: The human and AI exchange greetings and discuss the schedule for the day. The AI provides a detailed schedule, including a meeting with the product team, work on the LangChain project, and a lunch meeting with a customer interested in AI. The AI emphasizes the importance of bringing a laptop to showcase the latest LLM demo during the lunch meeting. The human asks what would be a good demo to show, and the AI suggests showcasing the latest LLM (Language Model) demo. The LLM is a cutting-edge AI model that can generate human-like text based on a given prompt. It has been trained on a vast amount of data and can generate coherent and contextually relevant responses. By showcasing the LLM demo, the AI believes they can demonstrate the capabilities of their AI technology and how it can be applied to various industries and use cases.'}
```

The video discussed several types of memory available in LangChain. These memory types include:

1. Buffer Memories: These memories limit what they store based on the number of conversational exchanges or tokens. They can also summarize tokens above a certain limit.

2. Vector Data Memory: This is a powerful memory type that stores word embeddings or text embeddings. It allows the retrieval of relevant blocks of text using these embeddings.

3. Force Memory: The details of this memory type were not explained in the summary.

4. Entity Memories: These memories are used to remember specific details about individuals or entities. For example, facts about a specific friend can be stored as an entity in an explicit way.

When implementing applications using LangChain, developers can use multiple types of memory, for example combining conversation memory with entity memory. This allows the system to remember both a summary of the conversation and important facts about specific people involved.

Additionally, developers may choose to store the entire conversation in a conventional database (such as a key-value store or SQL database) for auditing purposes or further system improvement.
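As a minimal sketch of that last idea, the snippet below logs every turn to a SQL database using Python's built-in `sqlite3` module. The table name `turns`, its columns, and the helper functions are all assumptions made for illustration; a real application would choose its own schema and storage backend.

```python
import sqlite3
from typing import List, Tuple


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a conversation log. The schema here is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS turns ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT NOT NULL, "
        "role TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )
    return conn


def log_turn(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    # Using the connection as a context manager commits on success.
    with conn:
        conn.execute(
            "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def load_session(conn: sqlite3.Connection, session_id: str) -> List[Tuple[str, str]]:
    """Return every (role, content) pair for one session, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM turns WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()


conn = open_log()  # in-memory database for the demo; pass a file path to persist
log_turn(conn, "demo", "Human", "Hi, my name is Andrew")
log_turn(conn, "demo", "AI", "Hello Andrew!")
print(load_session(conn, "demo"))
```

Keeping the full log separate from the LLM-facing memory means the prompt can stay short (windowed or summarized) while nothing is lost for auditing.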