### LangChain: Memory

#### Outline
- ConversationBufferMemory
- ConversationBufferWindowMemory
- ConversationTokenBufferMemory
- ConversationSummaryMemory

Large language models (LLMs) do not, by themselves, remember what you said earlier in a conversation. This is a problem when building applications such as chatbots, where the user expects a continuous conversation.

This is where memory comes in: it is how we store the previous parts of the conversation and feed them back into the language model, so that the exchange has a conversational flow.

LangChain offers several options for managing this memory.

In [1]:
import os
import openai

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import warnings
warnings.filterwarnings('ignore')

# openai.api_key = os.environ['OPENAI_API_KEY']

### ConversationBufferMemory

In [2]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

We start with an example that uses LangChain to manage a chatbot conversation.
We set the LLM to OpenAI's chat interface with temperature 0, use a ConversationBufferMemory as the memory, and build a conversation chain from the two.

In [9]:
llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

We start the conversation by passing an input to conversation.predict. The conversation chain returns a reply based on that input.

In [10]:
conversation.predict(input="Hi, my name is Onkar")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Onkar
AI:

> Finished chain.


"Hello Onkar! It's nice to meet you. How can I assist you today?"

In [11]:
conversation.predict(input="What is 1+1?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Onkar
AI: Hello Onkar! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI:

> Finished chain.


'1+1 is equal to 2.'

In [12]:
conversation.predict(input="What is my name?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Onkar
AI: Hello Onkar! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1+1 is equal to 2.
Human: What is my name?
AI:

> Finished chain.


'Your name is Onkar.'

In [14]:
# memory.buffer stores the conversation that has happened so far
print(memory.buffer)

Human: Hi, my name is Onkar
AI: Hello Onkar! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1+1 is equal to 2.
Human: What is my name?
AI: Your name is Onkar.


In [15]:
memory.load_memory_variables({})

{'history': "Human: Hi, my name is Onkar\nAI: Hello Onkar! It's nice to meet you. How can I assist you today?\nHuman: What is 1+1?\nAI: 1+1 is equal to 2.\nHuman: What is my name?\nAI: Your name is Onkar."}

ConversationBufferMemory stores the conversation that has happened so far. This includes everything that the AI and the human have said.

We can also add inputs and outputs to a ConversationBufferMemory explicitly, using save_context.

In [16]:
memory = ConversationBufferMemory()

In [17]:
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})

In [18]:
print(memory.buffer)

Human: Hi
AI: What's up


In [21]:
# Status of memory at this point of time
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up"}

We can add additional data to the memory by saving additional context. 

In [22]:
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})

In [24]:
# State of the memory at this point: LangChain has remembered this much of the conversation so far
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}

When we use a large language model for a chat conversation, the model itself is stateless: it does not remember the conversation you have had so far, and each call to the API endpoint is independent.

Chatbots appear to have memory only because there is usually wrapper code that provides the full conversation so far as context to the LLM. The memory explicitly stores the turns, or utterances, exchanged so far.

This stored memory is passed to the LLM as additional context, so that it can generate an output as if it were simply taking the next conversational turn, knowing what has been said before.
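
The wrapper idea can be sketched in plain Python. This is only an illustration, not LangChain's implementation: `fake_llm`, `history`, and `chat` are hypothetical names, and `fake_llm` is a stand-in that merely reports how many human messages it can see in the prompt it receives.

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical stateless model: it sees only the prompt passed in."""
    # Count the human turns visible in the prompt, to show what context arrived.
    seen = prompt.count("Human:")
    return f"I can see {seen} human message(s) in this prompt."


history = []  # wrapper-side memory: the model itself stores nothing


def chat(user_input: str) -> str:
    # Rebuild the full conversation on every call and send it as context.
    lines = history + [f"Human: {user_input}", "AI:"]
    reply = fake_llm("\n".join(lines))
    # Save the new exchange so the next call can include it.
    history.append(f"Human: {user_input}")
    history.append(f"AI: {reply}")
    return reply


print(chat("Hi, my name is Onkar"))
print(chat("What is my name?"))
```

The second call reports two human messages only because the wrapper resent the first one; the model function keeps no state between calls.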

As the conversation gets longer, the amount of memory needed grows, and so does the cost of sending all those tokens to the LLM, which usually charges based on the number of tokens it processes.
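
A rough back-of-the-envelope calculation shows how quickly this adds up. The sketch below assumes, purely for illustration, that every exchange adds the same number of tokens and that the full history is resent on each call; `total_tokens_sent` is a hypothetical helper, and the figure of 50 tokens per exchange is made up.

```python
def total_tokens_sent(num_turns: int, tokens_per_turn: int) -> int:
    """Tokens sent across a whole conversation when the full history is resent.

    Assumes every exchange adds the same number of tokens (a simplification).
    """
    total = 0
    for turn in range(1, num_turns + 1):
        # Call number `turn` carries all exchanges so far, including this one.
        total += turn * tokens_per_turn
    return total


for n in (10, 100):
    print(n, "turns ->", total_tokens_sent(n, tokens_per_turn=50), "tokens sent")
```

Under these assumptions, a conversation ten times longer sends roughly a hundred times as many tokens in total, because the total grows with the square of the number of turns.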

So LangChain provides several convenient kinds of memory to store and accumulate the conversation. 

### ConversationBufferWindowMemory

So far we have discussed ConversationBufferMemory. We now look at a different type of memory, ConversationBufferWindowMemory, which keeps only a window of the most recent exchanges.

In [26]:
from langchain.memory import ConversationBufferWindowMemory

In [27]:
memory = ConversationBufferWindowMemory(k=1)               

In [28]:
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})


In [29]:
memory.load_memory_variables({})

{'history': 'Human: Not much, just hanging\nAI: Cool'}

We have set the memory to a ConversationBufferWindowMemory with k=1, so only one conversational exchange is remembered: one utterance from the human and one from the chatbot.
As a result, the output of memory.load_memory_variables({}) drops the first context saved and contains only the most recent one.
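
The windowing behaviour can be imitated with a fixed-length queue. The sketch below is a hypothetical stand-in (`SimpleWindowMemory` is not a LangChain class) that mirrors the save_context and load_memory_variables calls used above, assuming the window is counted in whole exchanges.

```python
from collections import deque


class SimpleWindowMemory:
    """Hypothetical sketch: keeps only the last k exchanges."""

    def __init__(self, k: int):
        # A deque with maxlen=k silently discards the oldest item when full.
        self.exchanges = deque(maxlen=k)

    def save_context(self, inputs, outputs):
        self.exchanges.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs):
        lines = []
        for human, ai in self.exchanges:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {"history": "\n".join(lines)}


window = SimpleWindowMemory(k=1)
window.save_context({"input": "Hi"}, {"output": "What's up"})
window.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(window.load_memory_variables({}))
```

With k=1 the first exchange is discarded as soon as the second is saved, which matches the output shown above.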

In [34]:
llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

In [35]:
conversation.predict(input="Hi, my name is Onkar")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Onkar
AI:

> Finished chain.


"Hello Onkar! It's nice to meet you. How can I assist you today?"

In [36]:
conversation.predict(input="What is 1+1?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Onkar
AI: Hello Onkar! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI:

> Finished chain.


'1+1 is equal to 2.'

In [37]:
conversation.predict(input="What is my name?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: What is 1+1?
AI: 1+1 is equal to 2.
Human: What is my name?
AI:

> Finished chain.


"I'm sorry, but I don't have access to personal information about individuals unless it has been shared with me in the course of our conversation."

Since k=1, the LLM was not able to answer the last question: the memory holds only the most recent exchange, and the earlier exchange containing the name has been dropped.

In practice, we would not use k=1. Instead, we set k to a larger number, which still prevents the memory from growing without limit as the conversation gets longer.

### ConversationTokenBufferMemory

With ConversationTokenBufferMemory, the memory limits the number of tokens saved. Since LLM pricing is based on tokens, this maps more directly to the cost of the LLM calls.

Here we set the max token limit to 30 and save a few exchanges.

In [38]:
from langchain.memory import ConversationTokenBufferMemory
from langchain.llms import OpenAI
llm = ChatOpenAI(temperature=0.0)

In [43]:
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=30)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [44]:
memory.load_memory_variables({})

{'history': 'AI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}

With a token limit of 30, the memory removes the earlier parts of the conversation and retains the most recent messages, subject to not exceeding the token limit. The LLM passed to the memory is used to count the tokens.

Thus, with max_token_limit=30, the initial part of the conversation is removed.

If we increase the token limit to 100, the full conversation is retained.
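
The pruning logic can be sketched without an LLM. In the sketch below, `count_tokens` and `prune_to_limit` are hypothetical helpers, and counting whitespace-separated words is only a stand-in for a real tokenizer, so the limit of 8 is chosen for this toy counter and does not correspond to the limit of 30 used above.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def prune_to_limit(messages, max_token_limit):
    """Drop the oldest messages until the total token count fits the limit."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_token_limit:
        kept.pop(0)  # remove the oldest message first
    return kept


messages = [
    "Human: AI is what?!",
    "AI: Amazing!",
    "Human: Backpropagation is what?",
    "AI: Beautiful!",
    "Human: Chatbots are what?",
    "AI: Charming!",
]
print(prune_to_limit(messages, max_token_limit=8))
```

Because messages are dropped one at a time rather than in pairs, the retained history can begin with an AI message, as it does in the output above.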

### ConversationSummaryMemory

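So far, we have limited the memory either to a fixed number of tokens from the most recent utterances or to a fixed number of conversational exchanges. Now, we will use an LLM to write a summary of the conversation so far and let that serve as the memory. The example below uses ConversationSummaryBufferMemory, which keeps recent messages verbatim up to a token limit and summarizes anything older.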


In [45]:
from langchain.memory import ConversationSummaryBufferMemory

In [53]:
# create a long string
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})

In [54]:
memory.load_memory_variables({})

{'history': 'System: The human and AI exchange greetings. The human asks about the schedule for the day. The AI provides a detailed schedule, including a meeting with the product team, work on the LangChain project, and a lunch meeting with a customer interested in AI. The AI emphasizes the importance of bringing a laptop to showcase the latest LLM demo during the lunch meeting.'}

Here, if we set max_token_limit to 400, the memory can store all of the text verbatim. But if we reduce max_token_limit to 100, ConversationSummaryBufferMemory uses an LLM (the OpenAI endpoint here) to generate a summary of the conversation.

Now, we will have a conversation with the LLM by creating a conversation chain. We will ask: "What would be a good demo to show?"

In [56]:

conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

In [57]:
conversation.predict(input="What would be a good demo to show?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human and AI exchange greetings. The human asks about the schedule for the day. The AI provides a detailed schedule, including a meeting with the product team, work on the LangChain project, and a lunch meeting with a customer interested in AI. The AI emphasizes the importance of bringing a laptop to showcase the latest LLM demo during the lunch meeting.
Human: What would be a good demo to show?
AI:

> Finished chain.


'A good demo to show during the lunch meeting with the customer interested in AI would be the latest LLM (Language Model) demo. The LLM is a cutting-edge AI model that can generate human-like text based on a given prompt. It has been trained on a vast amount of data and can generate coherent and contextually relevant responses. By showcasing the LLM demo, you can demonstrate the capabilities of AI in natural language processing and how it can be applied to various industries and use cases.'

In [58]:
memory.load_memory_variables({})

{'history': 'System: The human and AI exchange greetings. The human asks about the schedule for the day. The AI provides a detailed schedule, including a meeting with the product team, work on the LangChain project, and a lunch meeting with a customer interested in AI. The AI emphasizes the importance of bringing a laptop to showcase the latest LLM demo during the lunch meeting. A good demo to show during the lunch meeting with the customer interested in AI would be the latest LLM (Language Model) demo. The LLM is a cutting-edge AI model that can generate human-like text based on a given prompt. It has been trained on a vast amount of data and can generate coherent and contextually relevant responses. By showcasing the LLM demo, you can demonstrate the capabilities of AI in natural language processing and how it can be applied to various industries and use cases.'}

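Because we set verbose=True, we can also see the prompt sent to the LLM by the conversation chain.

Here, the summary of the earlier conversation is passed to the LLM as a System message, followed by the new question as the Human turn. After the call, the stored history shown above has incorporated the AI response as well.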

ConversationSummaryBufferMemory tries to keep explicit storage of the messages up to max_token_limit.
In this case, the explicit storage is capped at 100 tokens.

The LLM generates a summary for anything beyond that limit.
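
The combination of a verbatim buffer and a running summary can be sketched as follows. Everything here is hypothetical: `SimpleSummaryBuffer` is not LangChain's class, words stand in for tokens, and `fake_summarize` replaces the LLM call with a placeholder note so that the sketch runs offline.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def fake_summarize(previous_summary: str, dropped: list) -> str:
    # Hypothetical stand-in for the LLM call that writes the summary.
    note = f"[{len(dropped)} earlier message(s) summarized]"
    return (previous_summary + " " + note).strip()


class SimpleSummaryBuffer:
    def __init__(self, max_token_limit: int):
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.recent = []  # messages still stored verbatim

    def save_context(self, inputs, outputs):
        self.recent.append(f"Human: {inputs['input']}")
        self.recent.append(f"AI: {outputs['output']}")
        dropped = []
        # Move the oldest messages out of the verbatim buffer while over the limit.
        while self.recent and sum(map(count_tokens, self.recent)) > self.max_token_limit:
            dropped.append(self.recent.pop(0))
        if dropped:
            self.summary = fake_summarize(self.summary, dropped)

    def load_memory_variables(self, _inputs):
        parts = ([f"System: {self.summary}"] if self.summary else []) + self.recent
        return {"history": "\n".join(parts)}


buf = SimpleSummaryBuffer(max_token_limit=6)
buf.save_context({"input": "Hello"}, {"output": "What's up"})
buf.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(buf.load_memory_variables({})["history"])
```

In a real setup the placeholder note would be replaced by a summary written by the LLM, which is the part that costs an extra model call.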

We used a chat as the example for these memory types, but they can be used in other applications too.
They are useful whenever new snippets of text or new information keep arriving and we want to keep the total memory used to store this growing list of facts capped, rather than growing arbitrarily long.

We saw buffer memories that limit storage based on the number of conversational exchanges or tokens, and a memory that summarizes anything above a certain token limit.

LangChain supports additional memory types as well.
One of the most powerful is vector data memory, in which a vector database stores word or text embeddings.
With this type of memory, we can retrieve the most relevant blocks of text from the vector database.
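
Retrieval by similarity can be illustrated with a toy example. The sketch below is an assumption-heavy simplification: a bag-of-words count vector stands in for a learned embedding, a Python list stands in for the vector database, and `embed`, `cosine`, and `most_relevant` are hypothetical helpers rather than LangChain calls.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Assumption: a bag-of-words count vector stands in for a learned embedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


stored_texts = [
    "There is a meeting at 8am with your product team",
    "Lunch at noon with a customer at the italian restaurant",
    "Bring your laptop to show the latest LLM demo",
]
# The "vector store": each text saved alongside its vector.
store = [(text, embed(text)) for text in stored_texts]


def most_relevant(query: str) -> str:
    query_vec = embed(query)
    # Return the stored text whose vector is closest to the query vector.
    return max(store, key=lambda item: cosine(query_vec, item[1]))[0]


print(most_relevant("which demo should I show on my laptop"))
```

Only the retrieved text, not the whole history, would then be passed to the LLM as context.
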
LangChain also supports entity memories, which are useful when you want to remember details about specific people or other entities. For example, if you talk about a specific friend, LangChain can remember facts about that friend, who is an entity, in an explicit way.
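
The idea of storing facts per entity can be sketched with a dictionary. This is a hypothetical illustration only: `entity_facts`, `remember`, and `recall` are made-up names, and the facts are entered by hand here, whereas an entity memory would need some way to extract them from the conversation.

```python
from collections import defaultdict

# Hypothetical entity store: entity name -> list of remembered facts.
entity_facts = defaultdict(list)


def remember(entity: str, fact: str) -> None:
    entity_facts[entity].append(fact)


def recall(entity: str) -> str:
    # .get avoids creating an empty entry for unknown entities.
    facts = entity_facts.get(entity, [])
    if not facts:
        return f"{entity}: (no facts stored)"
    return f"{entity}: " + "; ".join(facts)


remember("Onkar", "is building a LangChain project")
remember("Onkar", "has lunch with a customer at noon")
print(recall("Onkar"))
```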

We can also combine multiple memory types, such as a conversation memory plus an entity memory to recall individuals.
This way, we can remember a summary of the conversation together with explicitly stored facts about important people in it.

In addition to these memory types, we can also store the entire conversation in a conventional database, such as a key-value store or an SQL database.
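
As one possible way to do this, the sketch below stores conversation turns in SQLite using Python's standard sqlite3 module. The table layout and the helper names `save_message` and `load_history` are assumptions made for illustration, not part of LangChain.

```python
import sqlite3

# In-memory database for the sketch; a file path would make it persistent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "session_id TEXT NOT NULL, "
    "speaker TEXT NOT NULL, "
    "content TEXT NOT NULL)"
)


def save_message(session_id: str, speaker: str, content: str) -> None:
    # Parameterized query avoids SQL injection from user-supplied text.
    conn.execute(
        "INSERT INTO messages (session_id, speaker, content) VALUES (?, ?, ?)",
        (session_id, speaker, content),
    )
    conn.commit()


def load_history(session_id: str) -> str:
    rows = conn.execute(
        "SELECT speaker, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return "\n".join(f"{speaker}: {content}" for speaker, content in rows)


save_message("demo-session", "Human", "Hi")
save_message("demo-session", "AI", "What's up")
print(load_history("demo-session"))
```

Keeping the full log in a database is independent of what is sent to the LLM: any of the memory types above can still decide which part of the stored history goes into the prompt.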