# LangChain: Memory

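LLMs are stateless by default. Each call is independent, so the model does not remember earlier turns of a conversation. LangChain's memory classes add this by storing past messages and feeding them back into the prompt, for example: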
- ConversationBufferMemory
- ConversationBufferWindowMemory
- ConversationTokenBufferMemory
- ConversationSummaryBufferMemory

---

## Setup

In [1]:
import openai
import os
from dotenv import load_dotenv, find_dotenv

# Load the Azure OpenAI settings (OPENAI_*) from a local .env file
_ = load_dotenv(find_dotenv())
# Module-level configuration, in the style used by `openai` versions before 1.0
openai.api_type = os.environ.get("OPENAI_API_TYPE")
openai.api_base = os.environ.get("OPENAI_API_BASE")
openai.api_key = os.environ.get("OPENAI_API_KEY")
openai.api_version = os.environ.get("OPENAI_API_VERSION")

## ConversationBufferMemory
- Unlimited memory: stores every exchange in the conversation.
- The entire conversation is sent to the LLM on every call.
- Can quickly reach the model's token (context-window) limit.
- Costly, since the number of tokens sent to the LLM grows with every turn.
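To make the "entire conversation is sent every time" point concrete, here is a minimal stdlib-only sketch of the idea. It is not LangChain's implementation: `SimpleBufferMemory` and `build_prompt` are hypothetical names, and the history and prompt layout only mimic what the verbose outputs in this notebook show. Character count is used as a rough proxy for tokens.

```python
# Hypothetical sketch of the idea behind ConversationBufferMemory (not LangChain code):
# every exchange is stored, and the whole history goes into every prompt.

PREAMBLE = (
    "The following is a friendly conversation between a human and an AI. "
    "The AI is talkative and provides lots of specific details from its context. "
    "If the AI does not know the answer to a question, it truthfully says it does not know."
)


class SimpleBufferMemory:
    """Stores every (human, ai) exchange without any limit."""

    def __init__(self):
        self.exchanges = []  # list of (human_text, ai_text) tuples

    def save_context(self, inputs, outputs):
        # Same dict shape as the save_context calls used in this notebook.
        self.exchanges.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs):
        lines = []
        for human, ai in self.exchanges:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {"history": "\n".join(lines)}


def build_prompt(memory, user_input):
    # The full history is inserted on every call, so the prompt keeps growing.
    history = memory.load_memory_variables({})["history"]
    return f"{PREAMBLE}\n\nCurrent conversation:\n{history}\nHuman: {user_input}\nAI:"


buffer_sketch = SimpleBufferMemory()
buffer_sketch.save_context({"input": "Hi"}, {"output": "What's up?"})
buffer_sketch.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
print(buffer_sketch.load_memory_variables({})["history"])

# Prompt size (characters, a rough proxy for tokens) grows with each turn.
prompt_sizes = []
for _ in range(3):
    prompt_sizes.append(len(build_prompt(buffer_sketch, "What's my name?")))
    buffer_sketch.save_context({"input": "What's my name?"}, {"output": "I don't know."})
print(prompt_sizes)
```

Because nothing is ever dropped, each entry in `prompt_sizes` is larger than the previous one, which is exactly why cost and token usage climb with conversation length.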

In [2]:
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

In [3]:
chat = AzureChatOpenAI(
    deployment_name="gpt4",
    temperature=0,
)

In [4]:
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=chat, memory=memory, verbose=True)

In [5]:
conversation.predict(input="Hi, my name is Alice?")



[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi, my name is Alice?
AI:[0m

[1m> Finished chain.[0m


"Hello Alice! It's nice to meet you. I'm an AI here to help you with any questions or topics you'd like to discuss. What can I help you with today?"

In [6]:
conversation.predict(input="What's 1+1?")



[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Alice?
AI: Hello Alice! It's nice to meet you. I'm an AI here to help you with any questions or topics you'd like to discuss. What can I help you with today?
Human: What's 1+1?
AI:[0m

[1m> Finished chain.[0m


'1+1 equals 2.'

In [7]:
conversation.predict(input="What's my name?")



[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Alice?
AI: Hello Alice! It's nice to meet you. I'm an AI here to help you with any questions or topics you'd like to discuss. What can I help you with today?
Human: What's 1+1?
AI: 1+1 equals 2.
Human: What's my name?
AI:[0m

[1m> Finished chain.[0m


'Your name is Alice.'

In [8]:
memory.buffer

"Human: Hi, my name is Alice?\nAI: Hello Alice! It's nice to meet you. I'm an AI here to help you with any questions or topics you'd like to discuss. What can I help you with today?\nHuman: What's 1+1?\nAI: 1+1 equals 2.\nHuman: What's my name?\nAI: Your name is Alice."

In [9]:
memory.load_memory_variables({})

{'history': "Human: Hi, my name is Alice?\nAI: Hello Alice! It's nice to meet you. I'm an AI here to help you with any questions or topics you'd like to discuss. What can I help you with today?\nHuman: What's 1+1?\nAI: 1+1 equals 2.\nHuman: What's my name?\nAI: Your name is Alice."}

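### Adding to memory manually

`save_context` appends an input/output exchange to the memory directly, without calling the LLM.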

In [10]:
memory2 = ConversationBufferMemory()
memory2.save_context({"input": "Hi"}, {"output": "What's up?"})
memory2.buffer

"Human: Hi\nAI: What's up?"

In [11]:
memory2.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
memory2.buffer

"Human: Hi\nAI: What's up?\nHuman: Nothing much, just hanging.\nAI: Cool."

## ConversationBufferWindowMemory
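- Keeps only a sliding window of the most recent exchanges.
- `k` sets how many exchanges (Human + AI message pairs) are included in the history.
- Keeps the prompt size bounded, which helps avoid the token limit. The bound is on the number of exchanges, not tokens, so very long messages can still exceed it.
- Less costly than `ConversationBufferMemory`, since the number of tokens sent per call does not keep growing with the length of the conversation.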
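A minimal stdlib-only sketch of the windowing idea, assuming the same `save_context` / `load_memory_variables` shapes used in this notebook. `SimpleWindowMemory` is a hypothetical class, not LangChain's code: like the LangChain output below, it keeps every exchange internally but renders only the last `k` into the history.

```python
# Hypothetical sketch of the windowing idea behind ConversationBufferWindowMemory.
# All exchanges are kept; only the last `k` are rendered into the history string.


class SimpleWindowMemory:
    def __init__(self, k):
        self.k = k
        self.exchanges = []  # every (human, ai) pair ever saved

    def save_context(self, inputs, outputs):
        self.exchanges.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs):
        # [-k:] keeps the k most recent exchanges; guard k == 0 because
        # [-0:] would return the whole list instead of an empty one.
        window = self.exchanges[-self.k:] if self.k > 0 else []
        lines = []
        for human, ai in window:
            lines += [f"Human: {human}", f"AI: {ai}"]
        return {"history": "\n".join(lines)}


window_sketch = SimpleWindowMemory(k=2)
window_sketch.save_context({"input": "Hi"}, {"output": "What's up?"})
window_sketch.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
window_sketch.save_context({"input": "What's 1+1?"}, {"output": "2"})
print(window_sketch.load_memory_variables({})["history"])
```

The rendered history matches the `load_memory_variables` output shown for `k=2` in the next cells: the first exchange ("Hi" / "What's up?") is stored but no longer sent.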

In [12]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=2)
memory.save_context({"input": "Hi"}, {"output": "What's up?"})
memory.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
memory.save_context({"input": "What's 1+1?"}, {"output": "2"})
memory.buffer  # shows all stored messages, not just the window

[HumanMessage(content='Hi', additional_kwargs={}, example=False),
 AIMessage(content="What's up?", additional_kwargs={}, example=False),
 HumanMessage(content='Nothing much, just hanging.', additional_kwargs={}, example=False),
 AIMessage(content='Cool.', additional_kwargs={}, example=False),
 HumanMessage(content="What's 1+1?", additional_kwargs={}, example=False),
 AIMessage(content='2', additional_kwargs={}, example=False)]

In [13]:
memory.load_memory_variables({})  # shows only the last k exchanges

{'history': "Human: Nothing much, just hanging.\nAI: Cool.\nHuman: What's 1+1?\nAI: 2"}

In [14]:
conversation = ConversationChain(llm=chat, memory=memory, verbose=True)
conversation.predict(input="What's my name?")



[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Nothing much, just hanging.
AI: Cool.
Human: What's 1+1?
AI: 2
Human: What's my name?
AI:[0m

[1m> Finished chain.[0m


"I'm sorry, I don't know your name."

## ConversationTokenBufferMemory
- Keeps only the most recent messages that fit within `max_token_limit` tokens; older messages are dropped.
- Gives more control over cost, since the number of tokens sent to the LLM is capped.
- Requires an `llm` in addition to `max_token_limit`, because token counts depend on the model's tokenizer (different LLMs count tokens differently).
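The pruning rule can be sketched without any LLM, assuming a hypothetical `count_tokens` function. Here it is a whitespace word count, purely a stand-in for a real tokenizer (which is why LangChain asks for the `llm`). `SimpleTokenBufferMemory` is also hypothetical and only illustrates the idea: append the new exchange, then drop the oldest messages until the total fits.

```python
# Hypothetical sketch of token-limited pruning (not LangChain's implementation).


def count_tokens(text):
    # Stand-in tokenizer: counts whitespace-separated words.
    # A real setup would use the model's own tokenizer instead.
    return len(text.split())


class SimpleTokenBufferMemory:
    def __init__(self, max_token_limit, token_counter=count_tokens):
        self.max_token_limit = max_token_limit
        self.count = token_counter
        self.messages = []  # (role, text) pairs, oldest first

    def _total_tokens(self):
        return sum(self.count(f"{role}: {text}") for role, text in self.messages)

    def save_context(self, inputs, outputs):
        self.messages.append(("Human", inputs["input"]))
        self.messages.append(("AI", outputs["output"]))
        # Drop the oldest messages until the buffer fits within the limit.
        while self.messages and self._total_tokens() > self.max_token_limit:
            self.messages.pop(0)

    def load_memory_variables(self, _inputs):
        return {"history": "\n".join(f"{r}: {t}" for r, t in self.messages)}


token_sketch = SimpleTokenBufferMemory(max_token_limit=12)
token_sketch.save_context({"input": "Hi"}, {"output": "What's up?"})
token_sketch.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
token_sketch.save_context({"input": "What's 1+1?"}, {"output": "2"})
print(token_sketch.load_memory_variables({})["history"])
```

With this word-count stand-in and a limit of 12, the third exchange pushes the total to 17, so "Human: Hi" and "AI: What's up?" are pruned. The numbers differ from the real `max_token_limit=40` example because the tokenizers differ.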

In [15]:
from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=chat, max_token_limit=40)
memory.save_context({"input": "Hi"}, {"output": "What's up?"})
memory.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
memory.save_context({"input": "What's 1+1?"}, {"output": "2"})
memory.buffer  # the oldest messages were pruned to fit within max_token_limit

[HumanMessage(content='Nothing much, just hanging.', additional_kwargs={}, example=False),
 AIMessage(content='Cool.', additional_kwargs={}, example=False),
 HumanMessage(content="What's 1+1?", additional_kwargs={}, example=False),
 AIMessage(content='2', additional_kwargs={}, example=False)]

In [16]:
memory.load_memory_variables({})  # only the most recent messages within max_token_limit

{'history': "Human: Nothing much, just hanging.\nAI: Cool.\nHuman: What's 1+1?\nAI: 2"}

## ConversationSummaryBufferMemory
- Keeps the most recent messages verbatim, plus an LLM-generated summary of the older ones.
- Requires an `llm` along with `max_token_limit`.
- Tokens are counted using the provided `llm`.
- When the buffer exceeds `max_token_limit`, the oldest messages are summarized by the `llm`. The summary is stored as a `System` message, followed by the most recent messages (up to `max_token_limit` tokens) kept unsummarized.
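The summary-buffer behaviour can be sketched the same way as the token buffer, except that pruned messages are folded into a running summary instead of being discarded. Everything here is hypothetical: `count_tokens` is a whitespace word count standing in for a tokenizer, and `summarize` is a stub standing in for the LLM summarization call (it only records what was folded in, it does not really condense anything).

```python
# Hypothetical sketch of the summary-buffer idea (not LangChain's implementation).


def count_tokens(text):
    # Stand-in tokenizer: whitespace-separated words.
    return len(text.split())


def summarize(previous_summary, pruned_lines):
    # Stub for the LLM call that would merge pruned_lines into previous_summary.
    note = f"Earlier: {'; '.join(pruned_lines)}"
    return f"{previous_summary} {note}".strip()


class SimpleSummaryBufferMemory:
    def __init__(self, max_token_limit, summarizer=summarize, token_counter=count_tokens):
        self.max_token_limit = max_token_limit
        self.summarize = summarizer
        self.count = token_counter
        self.summary = ""
        self.messages = []  # recent (role, text) pairs kept verbatim

    def _total_tokens(self):
        return sum(self.count(f"{role}: {text}") for role, text in self.messages)

    def save_context(self, inputs, outputs):
        self.messages.append(("Human", inputs["input"]))
        self.messages.append(("AI", outputs["output"]))
        pruned = []
        # Move the oldest messages out of the verbatim buffer until it fits.
        while self.messages and self._total_tokens() > self.max_token_limit:
            role, text = self.messages.pop(0)
            pruned.append(f"{role}: {text}")
        if pruned:
            # Pruned messages are folded into the summary rather than dropped.
            self.summary = self.summarize(self.summary, pruned)

    def load_memory_variables(self, _inputs):
        lines = [f"System: {self.summary}"] if self.summary else []
        lines += [f"{r}: {t}" for r, t in self.messages]
        return {"history": "\n".join(lines)}


summary_sketch = SimpleSummaryBufferMemory(max_token_limit=12)
summary_sketch.save_context({"input": "Hi"}, {"output": "What's up?"})
summary_sketch.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
summary_sketch.save_context({"input": "What's 1+1?"}, {"output": "2"})
print(summary_sketch.load_memory_variables({})["history"])
```

The resulting history has the same shape as the LangChain output below: a `System:` line carrying the summary, followed by the unsummarized recent messages.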

In [17]:
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(llm=chat, max_token_limit=40)
memory.save_context({"input": "Hi"}, {"output": "What's up?"})
memory.save_context({"input": "Nothing much, just hanging."}, {"output": "Cool."})
memory.save_context({"input": "What's 1+1?"}, {"output": "2"})
memory.buffer  # only the unsummarized recent messages

[HumanMessage(content='Nothing much, just hanging.', additional_kwargs={}, example=False),
 AIMessage(content='Cool.', additional_kwargs={}, example=False),
 HumanMessage(content="What's 1+1?", additional_kwargs={}, example=False),
 AIMessage(content='2', additional_kwargs={}, example=False)]

In [18]:
memory.load_memory_variables({})  # the summary plus the unsummarized recent messages

{'history': "System: The human greets the AI, and the AI responds by asking what's up.\nHuman: Nothing much, just hanging.\nAI: Cool.\nHuman: What's 1+1?\nAI: 2"}

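Note: the execution count skips from 18 to 20, and the prompt below already contains one exchange about the capital of India, so this cell appears to have been run twice; only the output of the second run is shown.

In [20]: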
conversation = ConversationChain(llm=chat, memory=memory, verbose=True)
conversation.predict(input="What is the capital of India?")



[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human greets the AI, and the AI responds by asking what's up.
AI: 2
Human: What is the capital of India?
AI: The capital of India is New Delhi.
Human: What is the capital of India?
AI:[0m

[1m> Finished chain.[0m


'The capital of India is New Delhi.'

In [21]:
memory.load_memory_variables({})  # shows the updated summary plus the most recent messages

{'history': "System: The human greets the AI, and the AI responds by asking what's up. The AI answers the human's question about the capital of India, which is New Delhi.\nHuman: What is the capital of India?\nAI: The capital of India is New Delhi."}

## Additional Memory Types
- **Vector data memory**: Stores texts in a vector database and retrieves the texts most similar/relevant to the current input.
- **Entity memory**: Uses an LLM to remember details about specific entities (such as people, places, or organizations).
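The retrieval idea behind vector data memory can be sketched with the standard library, assuming a bag-of-words count vector as a stand-in for a real embedding model and a plain list as a stand-in for a vector database. `embed`, `cosine` and `SimpleRetrievalMemory` are hypothetical names; the point is only that memory is looked up by similarity to the input rather than by recency.

```python
# Hypothetical sketch of similarity-based memory retrieval (not LangChain code).
import math
import re
from collections import Counter


def embed(text):
    # Stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SimpleRetrievalMemory:
    def __init__(self, top_k=1):
        self.top_k = top_k
        self.store = []  # (text, vector) pairs; a real setup would use a vector database

    def save(self, text):
        self.store.append((text, embed(text)))

    def relevant(self, query):
        # Rank stored texts by similarity to the query and return the best top_k.
        q = embed(query)
        ranked = sorted(self.store, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[: self.top_k]]


retrieval_sketch = SimpleRetrievalMemory(top_k=1)
retrieval_sketch.save("Human: My favourite sport is cricket. AI: Nice!")
retrieval_sketch.save("Human: What's 1+1? AI: 2")
retrieval_sketch.save("Human: The capital of India is New Delhi. AI: Correct.")
print(retrieval_sketch.relevant("What sport is my favourite?"))
```

Only the most relevant stored exchange is returned, regardless of how long ago it was saved, which is the main difference from the window and token-buffer approaches.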

> Note: You can also combine multiple memory types, and you can store the conversation in a database (such as a key-value store or a SQL database).
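As an illustration of storing the conversation in a SQL database, here is a minimal sketch using Python's built-in `sqlite3`. The table layout and the `save_exchange` / `load_history` helpers are hypothetical and are not LangChain's message-history classes; they just show one row per message, grouped by a session id, with the history rebuilt in insertion order.

```python
# Hypothetical sketch: persisting conversation messages in SQLite (stdlib only).
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead to persist across runs
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " session_id TEXT NOT NULL,"
    " role TEXT NOT NULL,"
    " content TEXT NOT NULL)"
)


def save_exchange(session_id, human, ai):
    with conn:  # commits the transaction if no exception is raised
        conn.executemany(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            [(session_id, "Human", human), (session_id, "AI", ai)],
        )


def load_history(session_id):
    # Rebuild the "Human: ... / AI: ..." history for one session, oldest first.
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return "\n".join(f"{role}: {content}" for role, content in rows)


save_exchange("alice", "Hi, my name is Alice.", "Hello Alice!")
save_exchange("bob", "Hi", "What's up?")
save_exchange("alice", "What's my name?", "Your name is Alice.")
print(load_history("alice"))
```

Keying rows by session keeps separate users' conversations apart; the loaded history string could then be handed to any of the memory strategies above.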