# LangChain with Memory

- ConversationBufferMemory
- ConversationBufferWindowMemory
- ConversationTokenBufferMemory
- ConversationSummaryBufferMemory

In [2]:
# Load environment variables from the .env file

from dotenv import load_dotenv
_ = load_dotenv()

In [4]:
from langchain.chat_models import init_chat_model

llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai")

## ConversationBufferMemory

Stores the entire conversation from the start in a buffer, appending each new exchange as it happens. The full buffer is passed as context every time the LLM is invoked.

In [6]:
from langchain.chains import ConversationChain  # Deprecated in favor of "RunnableWithMessageHistory"; not used in this cell
from langchain.memory import ConversationBufferMemory

conversationBufferMemory = ConversationBufferMemory()  # Create the memory object

# Each exchange with the LLM is appended to the memory with save_context
conversationBufferMemory.save_context({"input" : "Hi I am Sai Kiran!"}, { "output" : "Hi Sai, how may I help you!"})
conversationBufferMemory.save_context({"input" : "What is 1+1?"}, { "output" : "The answer is 2"})

# Displays the complete context
print(conversationBufferMemory.load_memory_variables({}))


{'history': 'Human: Hi I am Sai Kiran!\nAI: Hi Sai, how may I help you!\nHuman: What is 1+1?\nAI: The answer is 2'}
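
To make the mechanism concrete, here is a minimal plain-Python sketch of a buffer memory. It is not LangChain's implementation: the class `SimpleBufferMemory` and its internals are hypothetical, and only the method names and the `Human:`/`AI:` formatting are borrowed from the cell above.

```python
class SimpleBufferMemory:
    """Hypothetical stand-in that keeps every exchange, oldest first."""

    def __init__(self):
        self.messages = []  # list of (role, text) tuples

    def save_context(self, inputs, outputs):
        # Append one human/AI exchange to the end of the buffer.
        self.messages.append(("Human", inputs["input"]))
        self.messages.append(("AI", outputs["output"]))

    def load_memory_variables(self, _inputs):
        # Join the whole buffer into a single prompt-ready string.
        history = "\n".join(f"{role}: {text}" for role, text in self.messages)
        return {"history": history}


memory = SimpleBufferMemory()
memory.save_context({"input": "Hi I am Sai Kiran!"}, {"output": "Hi Sai, how may I help you!"})
memory.save_context({"input": "What is 1+1?"}, {"output": "The answer is 2"})
print(memory.load_memory_variables({}))
```

Because nothing is ever removed, the history string grows with every exchange, which is the cost the remaining memory types try to control.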


## ConversationBufferWindowMemory

Similar to `ConversationBufferMemory`, but it keeps only the most recent `k` exchanges, where `k` is set at instantiation.

For example, if the conversation has run for 10 turns and `k = 2`, the context is trimmed to the last 2 exchanges, each exchange being one human message and the AI reply to it.

In [10]:
from langchain.chains import ConversationChain  # Deprecated in favor of "RunnableWithMessageHistory"; not used in this cell
from langchain.memory import ConversationBufferWindowMemory

conversationBufferWindowMemory = ConversationBufferWindowMemory(k=2)

# Each exchange with the LLM is appended to the memory with save_context
conversationBufferWindowMemory.save_context({"input" : "Hi I am Sai Kiran!"}, { "output" : "Hi Sai, how may I help you!"})
conversationBufferWindowMemory.save_context({"input" : "What is 1+1?"}, { "output" : "The answer is 2"})
conversationBufferWindowMemory.save_context({"input" : "How it is going?"}, { "output" : "Its going better."})
conversationBufferWindowMemory.save_context({"input" : "Hey!"}, { "output" : "I am good."})

# Displays only the latest k exchanges
print(conversationBufferWindowMemory.load_memory_variables({}))

{'history': 'Human: How it is going?\nAI: Its going better.\nHuman: Hey!\nAI: I am good.'}
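
The windowing step can be sketched in plain Python as a slice over the message list. This is an illustration under the assumption that every exchange is exactly two messages (human, then AI); the helper `last_k_exchanges` is a hypothetical name, not a LangChain function.

```python
def last_k_exchanges(messages, k):
    """Return only the messages from the most recent k human/AI exchanges."""
    # Each exchange is two messages (human + AI), so keep the last 2 * k.
    return messages[-2 * k:] if k > 0 else []


messages = [
    ("Human", "Hi I am Sai Kiran!"), ("AI", "Hi Sai, how may I help you!"),
    ("Human", "What is 1+1?"), ("AI", "The answer is 2"),
    ("Human", "How it is going?"), ("AI", "Its going better."),
    ("Human", "Hey!"), ("AI", "I am good."),
]

window = last_k_exchanges(messages, k=2)
history = "\n".join(f"{role}: {text}" for role, text in window)
print(history)
```

With `k=2` the first two exchanges fall outside the window, so the introduction and the arithmetic question are no longer visible to the model.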


## ConversationTokenBufferMemory

Instead of a turn count `k`, this memory is bounded by a token limit specified at instantiation. For example, with a limit of 100 tokens, the oldest messages are dropped until the remaining history fits within the limit, regardless of what they contain. Messages are dropped whole, which is why the trimmed history in the output below begins with an AI reply rather than a human message.

This memory is useful when token usage is strictly limited.

In [13]:
from langchain.chains import ConversationChain  # Deprecated in favor of "RunnableWithMessageHistory"; not used in this cell
from langchain.memory import ConversationTokenBufferMemory

conversationTokenBufferMemory = ConversationTokenBufferMemory(
    llm=llm,  # Passed in because token counting differs from model to model
    max_token_limit=30,  # Maximum number of tokens kept in the history
)

# Each exchange with the LLM is appended to the memory with save_context
conversationTokenBufferMemory.save_context({"input" : "Hi I am Sai Kiran!"}, { "output" : "Hi Sai, how may I help you!"})
conversationTokenBufferMemory.save_context({"input" : "What is 1+1?"}, { "output" : "The answer is 2"})
conversationTokenBufferMemory.save_context({"input" : "How it is going?"}, { "output" : "Its going better."})
conversationTokenBufferMemory.save_context({"input" : "Hey!"}, { "output" : "I am good."})

# Displays the context that fits within the token limit
print(conversationTokenBufferMemory.load_memory_variables({}))

{'history': 'AI: The answer is 2\nHuman: How it is going?\nAI: Its going better.\nHuman: Hey!\nAI: I am good.'}
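
The pruning behaviour can be sketched without an LLM. The example below assumes a deliberately crude token counter (one token per whitespace-separated word, role prefix included), so its counts will not match a real model's tokenizer and the limit of 20 is chosen only for this illustration. `count_tokens` and `prune_to_token_limit` are hypothetical helpers.

```python
def count_tokens(messages):
    # Assumption: one token per whitespace-separated word, prefix included.
    # A real model tokenizer would give different counts.
    return sum(len(f"{role}: {text}".split()) for role, text in messages)


def prune_to_token_limit(messages, max_token_limit):
    """Drop whole messages from the front until the rest fits the limit."""
    pruned = list(messages)  # copy so the caller's list is untouched
    while pruned and count_tokens(pruned) > max_token_limit:
        pruned.pop(0)  # discard the oldest message
    return pruned


messages = [
    ("Human", "Hi I am Sai Kiran!"), ("AI", "Hi Sai, how may I help you!"),
    ("Human", "What is 1+1?"), ("AI", "The answer is 2"),
    ("Human", "How it is going?"), ("AI", "Its going better."),
    ("Human", "Hey!"), ("AI", "I am good."),
]

pruned = prune_to_token_limit(messages, max_token_limit=20)
print("\n".join(f"{role}: {text}" for role, text in pruned))
```

Since pruning works message by message rather than exchange by exchange, the surviving history can start in the middle of an exchange, as it does here.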


## ConversationSummaryBufferMemory

Like `ConversationTokenBufferMemory`, this memory takes a `max_token_limit`. Instead of simply discarding older messages once the limit is exceeded, it uses the LLM to fold them into a running summary as exchanges are saved. The context returned by `load_memory_variables({})` is therefore the summary followed by the most recent raw messages that fit within `max_token_limit`.

In [16]:
from langchain.chains import ConversationChain  # Deprecated in favor of "RunnableWithMessageHistory"; not used in this cell
from langchain.memory import ConversationSummaryBufferMemory

# A long string describing the day's schedule, used as the AI's reply
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

conversationSummaryBufferMemory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)

conversationSummaryBufferMemory.save_context({"input": "Hello"}, {"output": "What's up"})
conversationSummaryBufferMemory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
conversationSummaryBufferMemory.save_context({"input": "What is on the schedule today?"}, {"output": f"{schedule}"})

# Displays the summary + conversation in adherence to the max_token_limit that was set. 
print(conversationSummaryBufferMemory.load_memory_variables({}))

{'history': 'System: The human greets the AI. The AI responds. The human states they are not doing much. The AI acknowledges this. The human asks the AI about its schedule.\nAI: There is a meeting at 8am with your product team. You will need your powerpoint presentation prepared. 9am-12pm have time to work on your LangChain project which will go quickly because Langchain is such a powerful tool. At Noon, lunch at the italian resturant with a customer who is driving from over an hour away to meet you to understand the latest in AI. Be sure to bring your laptop to show the latest LLM demo.'}
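
The summarize-then-keep-recent idea can be sketched as follows. This is an illustration only: `summarize` is a hypothetical stub standing in for the LLM call, the word-based `count_tokens` is an assumption rather than a real tokenizer, and the limit of 12 is chosen to fit the shortened example data.

```python
def count_tokens(messages):
    # Assumption: one token per whitespace-separated word, prefix included.
    return sum(len(f"{role}: {text}".split()) for role, text in messages)


def summarize(existing_summary, dropped):
    # Hypothetical stub: a real implementation would ask the LLM to merge
    # the dropped messages into the existing summary.
    note = f"[summary of {len(dropped)} earlier message(s)]"
    return f"{existing_summary} {note}".strip()


def prune_with_summary(messages, summary, max_token_limit):
    """Move the oldest messages into the summary until the rest fits."""
    recent, dropped = list(messages), []
    while recent and count_tokens(recent) > max_token_limit:
        dropped.append(recent.pop(0))
    if dropped:
        summary = summarize(summary, dropped)
    return summary, recent


messages = [
    ("Human", "Hello"), ("AI", "What's up"),
    ("Human", "Not much, just hanging"), ("AI", "Cool"),
    ("Human", "What is on the schedule today?"),
    ("AI", "There is a meeting at 8am with your product team."),
]

summary, recent = prune_with_summary(messages, summary="", max_token_limit=12)
lines = [f"System: {summary}"] + [f"{role}: {text}" for role, text in recent]
print("\n".join(lines))
```

As in the output above, the older turns survive only as a `System:` summary line, while the latest message is kept verbatim.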
