# Conversational Memory for LangChain

Conversational memory allows our chatbots and agents to remember previous interactions within a conversation. Without conversational memory, our chatbots would only ever be able to respond to the last message they received, essentially forgetting all previous messages with each new message.

Naturally, conversations require our chatbots to be able to respond over multiple interactions and refer to previous messages to understand the context of the conversation.

> ⚠️ **Important**

Although the **LangGraph** library is currently the recommended way to implement chats with memory, since it handles complex workflows efficiently (for example: 🗓️ saving notes to Google Calendar, 🗄️ querying relational databases, and 🤖 using MCPs), in this notebook we will use **LangChain** to explain the basic concepts of conversational memory.

This will make it easier to understand before moving on to more advanced tools.

## LangChain's Memory Types
LangChain versions `0.0.x` shipped with several conversational memory types. Most of these are due for deprecation, but they are still valuable for understanding the different approaches we can take to building conversational memory.

Throughout the notebook we will be referring to these older memory types and then rewriting them using the recommended RunnableWithMessageHistory class. We will learn about:

- **ConversationBufferMemory:** the simplest and most intuitive form of conversational memory, keeping track of a conversation without any additional bells and whistles.
- **ConversationBufferWindowMemory:** similar to ConversationBufferMemory, but only keeps track of the last k interactions (each interaction being one human message and one AI reply).
- **ConversationSummaryMemory:** rather than keeping track of the entire conversation, this memory type keeps track of a summary of the conversation.
- **ConversationSummaryBufferMemory:** merges the ConversationSummaryMemory and ConversationTokenBufferMemory types.

We'll work through each of these memory types in turn, and rewrite each one using the RunnableWithMessageHistory class.

In [32]:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv()
os.environ["LANGSMITH_PROJECT"] = "llm-training-05-rag-p4"

llm = ChatOpenAI(
    model="gpt-4.1-nano",
    api_key=os.getenv("OPENAI_API_KEY"),
    temperature=0.7,
    verbose=True
)

### 1. ConversationBufferMemory with RunnableWithMessageHistory
When implementing `RunnableWithMessageHistory`, we use LangChain Expression Language (**LCEL**), which requires us to define our prompt template and LLM components.

In [35]:
from langchain.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    ChatPromptTemplate
)

system_prompt = "You are a helpful assistant called Zeta."

prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{query}"),
])
pipeline = prompt_template | llm


To add memory, we wrap our pipeline in a `RunnableWithMessageHistory` object. This object takes a few input parameters. One of them is `get_session_history`, a function that returns a chat message history object for a given session ID. We define this function ourselves:

In [36]:
from langchain_core.chat_history import InMemoryChatMessageHistory

chat_map = {}
def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

We also need to tell our runnable which variable name to use for the chat history (i.e. `history`) and which to use for the user's query (i.e. `query`).

In [37]:
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history"
)
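
To make the moving parts concrete, here is a rough plain-Python sketch of what a history-aware wrapper does on each call: look up the session's history, place it between the system prompt and the new query, call the model, then store both the query and the reply. It is an illustration under simplifying assumptions, not LangChain's actual implementation; `fake_model`, `invoke_with_history`, and the `(role, content)` tuples are hypothetical stand-ins.

```python
from typing import Callable

Message = tuple[str, str]  # (role, content)

# Hypothetical in-memory store: one message list per session ID
chat_map: dict[str, list[Message]] = {}


def get_chat_history(session_id: str) -> list[Message]:
    """Return the history for a session, creating it on first use."""
    return chat_map.setdefault(session_id, [])


def fake_model(messages: list[Message]) -> str:
    """Stand-in for an LLM call: reports how many messages it received."""
    return f"I received {len(messages)} messages."


def invoke_with_history(
    query: str,
    session_id: str,
    model: Callable[[list[Message]], str] = fake_model,
    system_prompt: str = "You are a helpful assistant called Zeta.",
) -> str:
    history = get_chat_history(session_id)
    # Prompt layout mirrors the template: system, then history, then the new query
    prompt = [("system", system_prompt), *history, ("human", query)]
    reply = model(prompt)
    # Store both sides of the exchange so the next call can see them
    history.append(("human", query))
    history.append(("ai", reply))
    return reply


first = invoke_with_history("Hi, my name is Roger", "id_123")
second = invoke_with_history("What's my name?", "id_123")
other = invoke_with_history("What's my name?", "id_456")
print(first, second, other, sep="\n")
```

The second call in session `id_123` sees the earlier exchange, while session `id_456` starts from an empty history. That is the same isolation the session ID provides in the real wrapper.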

In [45]:
# Test the pipeline with history
pipeline_with_history.invoke(
    {"query": "where is the library?"},
    config={"configurable": {"session_id": "id_123"}}
)

AIMessage(content="Could you please specify which library you're referring to? Are you asking about a specific library at your location, a public library nearby, or a particular library related to your work at LangSmith? \n\nProviding a bit more detail will help me give you the most accurate information.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 54, 'prompt_tokens': 187, 'total_tokens': 241, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_38343a2f8f', 'id': 'chatcmpl-C1ZriMrpNW2hOOOBUgzYh7z1wQTFp', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--2c757058-92c3-4c53-b50d-407bd1fbbb8f-0', usage_metadata={'input_tokens': 187, 'output_tokens': 54, 'total_tokens': 241, 'input_token_detai

In [44]:
chat_map["id_123"].messages

[HumanMessage(content='Hi, my name is Roger', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hello, Roger! Nice to meet you. How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 26, 'total_tokens': 42, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_38343a2f8f', 'id': 'chatcmpl-C1Zp33V2RwhnYr3rsXMREiICJEQN8', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--519e9459-140f-4e0a-a144-7f3fb3d80e62-0', usage_metadata={'input_tokens': 26, 'output_tokens': 16, 'total_tokens': 42, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}),
 HumanMessage(content="What's my name", additio

In [6]:
# Test the pipeline with history, getting a response using the same session ID
pipeline_with_history.invoke(
    {"query": "What's my name?"},
    config={"configurable": {"session_id": "id_123"}}
)

AIMessage(content='Your name is Roger.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 54, 'total_tokens': 59, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_38343a2f8f', 'id': 'chatcmpl-C1YgKNHyddqquiwkwLkFMC9ZfJYmA', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--f30e3886-71d5-43c8-b07e-6a724d4fb35a-0', usage_metadata={'input_tokens': 54, 'output_tokens': 5, 'total_tokens': 59, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [7]:
# Test the pipeline with history, getting a response using a new session ID
pipeline_with_history.invoke(
    {"query": "What's my name?"},
    config={"configurable": {"session_id": "id_456"}}
)

AIMessage(content="Hello! I don't know your name unless you tell me. What's your name?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 24, 'total_tokens': 40, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_38343a2f8f', 'id': 'chatcmpl-C1YgKXC5xHiUE0FR6quo4GsJGomB3', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--ce869a36-08c1-4931-a396-385d408b928a-0', usage_metadata={'input_tokens': 24, 'output_tokens': 16, 'total_tokens': 40, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

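So far the chat history lives in a Python dictionary, so it is lost when the process exits. To persist it across restarts, we can swap in a database-backed history. We will use **PostgreSQL** for this example, with a database hosted on **NeonDB**.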

In [8]:
# Create the table (if it doesn't exist) to store chat history
from langchain_postgres import PostgresChatMessageHistory
import psycopg
import os

CONNECTION_STRING = os.getenv("POSTGRESQL_CONNECTION_STRING")
sync_connection = psycopg.connect(CONNECTION_STRING)

table_name = "chat_history"
PostgresChatMessageHistory.create_tables(sync_connection, table_name)
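
What a database-backed history stores can be illustrated without a running Postgres server. The sketch below uses Python's built-in `sqlite3` as a stand-in and keeps one row per message, keyed by session ID. The `SQLiteChatHistory` class and its table schema are hypothetical and are not the schema that `langchain_postgres` creates; the point is only to show the idea of reloading a session's messages from a table.

```python
import json
import sqlite3


class SQLiteChatHistory:
    """Hypothetical persistent history: one row per message, keyed by session ID."""

    def __init__(self, conn: sqlite3.Connection, table_name: str, session_id: str):
        self.conn = conn
        self.table = table_name  # assumed to be a trusted identifier, not user input
        self.session_id = session_id
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, "
            "message TEXT NOT NULL)"
        )

    def add_messages(self, messages: list[dict]) -> None:
        # Serialize each message to JSON and append it to the session's rows
        self.conn.executemany(
            f"INSERT INTO {self.table} (session_id, message) VALUES (?, ?)",
            [(self.session_id, json.dumps(m)) for m in messages],
        )
        self.conn.commit()

    @property
    def messages(self) -> list[dict]:
        # Read back only this session's rows, in insertion order
        rows = self.conn.execute(
            f"SELECT message FROM {self.table} WHERE session_id = ? ORDER BY id",
            (self.session_id,),
        ).fetchall()
        return [json.loads(row[0]) for row in rows]


conn = sqlite3.connect(":memory:")  # a file path instead would survive restarts
history_a = SQLiteChatHistory(conn, "chat_history", "session-a")
history_a.add_messages([
    {"role": "human", "content": "Hi, my name is Roger"},
    {"role": "ai", "content": "Hello, Roger!"},
])

# A second object for the same session reads the same rows back
reloaded = SQLiteChatHistory(conn, "chat_history", "session-a")
other = SQLiteChatHistory(conn, "chat_history", "session-b")
print(reloaded.messages)
print(other.messages)
```

Because the messages live in the table rather than in the Python object, any new history object created with the same session ID sees the same conversation.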

In [21]:
from langchain.memory import ConversationBufferWindowMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableWithMessageHistory
from langchain_core.messages import SystemMessage, AIMessage, HumanMessage
import uuid

# Generate a unique session ID for this conversation
session_id = str(uuid.uuid4())

# Initialize the chat history manager
chat_history = PostgresChatMessageHistory(
    table_name,
    session_id,
    sync_connection=sync_connection
)

chat_history.add_messages(
    [
        SystemMessage(content="You are a helpful assistant."),
        HumanMessage(content="Hello My name is Roger I live in Cajamarca, Peru, how are you?"),
        AIMessage(content="I'm doing well, thank you! How can I assist you today?"),
        HumanMessage(content="What is my name?"),
        AIMessage(content="Your name is Roger."),
        HumanMessage(content="Where do I live?"),
        AIMessage(content="You live in Cajamarca, Peru."),
        HumanMessage(content="How old am I?"),
        AIMessage(content="I'm not sure how old you are, but I can help you with other questions."),
    ]
)


pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    lambda s: PostgresChatMessageHistory(table_name, s, sync_connection=sync_connection),
    input_messages_key="query",   # must match the prompt input variable
    history_messages_key="history"  # must match the prompt history variable
)

pipeline_with_history.invoke(
    {"query": "Hello, how are you?"},
    config={"configurable": {"session_id": session_id}}
)

AIMessage(content="Hello! I'm doing well, thank you. How can I assist you today?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 144, 'total_tokens': 160, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_38343a2f8f', 'id': 'chatcmpl-C1YmPzdceLc0IIyZEGi6WG9yxWQVi', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--63ffd87b-73c3-4c40-a82e-af82c4d130e9-0', usage_metadata={'input_tokens': 144, 'output_tokens': 16, 'total_tokens': 160, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

### 2. ConversationBufferWindowMemory
The `ConversationBufferWindowMemory` type is similar to `ConversationBufferMemory`, but only keeps track of the **last k interactions** (with `k=3`, that is the last three human/AI exchanges, or six messages). There are a few reasons why we would want to keep only the most recent messages:

- ⚡️ **Very Important**: More messages mean more tokens are sent with each request, and more tokens increase latency and cost. 💸⏳
- LLMs tend to perform worse when given more tokens, making them more likely to deviate from instructions, hallucinate, or "forget" information provided to them. Conciseness is key to high-performing LLMs.
- If we keep all messages, we will eventually hit the LLM's context window limit. Capping the history at a window size k makes this far less likely.

The buffer window solves many problems that we encounter with the standard buffer memory, while still being a very simple and intuitive form of conversational memory.
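
The windowing itself amounts to a slice over the message list. The sketch below is a plain-Python illustration, assuming messages strictly alternate between human and AI so that one interaction equals two messages; `last_k_interactions` is a hypothetical helper, not part of LangChain.

```python
def last_k_interactions(
    messages: list[tuple[str, str]], k: int
) -> list[tuple[str, str]]:
    """Keep the final k human/AI exchanges, i.e. the last 2 * k messages."""
    if k <= 0:
        # Guard needed because messages[-0:] would return the whole list
        return []
    return messages[-2 * k:]


conversation = [
    ("human", "Hi, my name is Roger"),
    ("ai", "Hey Roger, what's up? I'm an AI model called Zeta."),
    ("human", "I'm researching the different types of conversational memory."),
    ("ai", "That's interesting, what are some examples?"),
    ("human", "I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory."),
    ("ai", "That's interesting, what's the difference?"),
    ("human", "Buffer memory just stores the entire conversation, right?"),
    ("ai", "That makes sense, what about ConversationBufferWindowMemory?"),
    ("human", "Buffer window memory stores the last k messages, dropping the rest."),
    ("ai", "Very cool!"),
]

window = last_k_interactions(conversation, k=3)
for role, content in window:
    print(f"{role}: {content}")
```

With `k=3`, the first four messages fall outside the window, including the one where the user gives their name. This is the trade-off of the approach: the prompt stays bounded, but anything older than the window is gone.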

In [28]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=3, return_messages=True)

memory.chat_memory.add_user_message("Hi, my name is Roger")
memory.chat_memory.add_ai_message("Hey Roger, what's up? I'm an AI model called Zeta.")
memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what are some examples?")
memory.chat_memory.add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
memory.chat_memory.add_ai_message("That's interesting, what's the difference?")
memory.chat_memory.add_user_message("Buffer memory just stores the entire conversation, right?")
memory.chat_memory.add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
memory.chat_memory.add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
memory.chat_memory.add_ai_message("Very cool!")

memory.load_memory_variables({})

{'history': [HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Very cool!', additional_kwargs={}, response_metadata={})]}

In [29]:
from langchain.chains import ConversationChain

chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
chain.invoke({"input": "what is my name again?"})



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}), HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}), AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}), HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}), AIMe

{'input': 'what is my name again?',
 'history': [HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Very cool!', additional_kwargs={}, response_metadata={})],
 'response': "I'm sorry, but I don't know your name since you haven't told me. Would you like to share it?"}

### 3. ConversationSummaryMemory
Next up we have `ConversationSummaryMemory`. This memory type keeps track of a summary of the conversation rather than the entire conversation. This is useful for long conversations where we don't need every message verbatim, but we do want to keep some thread of the full conversation.
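
The idea can be sketched in plain Python before looking at the real class: after every exchange, the stored summary is replaced by a new one that folds in the latest messages, and only that summary is sent with the next prompt. In the sketch below, `RunningSummaryMemory` and `toy_summarizer` are hypothetical; the toy summarizer simply keeps the first six words of each human turn, whereas the real memory type asks an LLM to write the summary (which costs an extra model call per turn).

```python
from typing import Callable


class RunningSummaryMemory:
    """Hypothetical sketch: store one evolving summary instead of every message."""

    def __init__(self, summarize: Callable[[str, str, str], str]):
        self.summarize = summarize  # (old_summary, human_msg, ai_msg) -> new summary
        self.summary = ""

    def build_prompt(self, user_input: str) -> str:
        # Only the summary and the new input are sent, never the raw transcript
        return f"Current conversation:\n{self.summary}\nHuman: {user_input}\nAI:"

    def save_exchange(self, user_input: str, ai_reply: str) -> None:
        # Replace the old summary with one that includes the latest exchange
        self.summary = self.summarize(self.summary, user_input, ai_reply)


def toy_summarizer(old_summary: str, human_msg: str, ai_msg: str) -> str:
    """Stand-in for an LLM summarization call: keeps six words per human turn."""
    note = " ".join(human_msg.split()[:6])
    return f"{old_summary} | {note}".strip(" |")


memory = RunningSummaryMemory(toy_summarizer)
memory.save_exchange("hello there my name is James", "Hi James!")
memory.save_exchange(
    "I am researching the different types of conversational memory.",
    "Great topic.",
)
prompt = memory.build_prompt("What is my name?")
print(prompt)
```

The prompt size now depends on the length of the summary rather than on the number of turns, at the price of losing whatever detail the summarizer leaves out.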

As before, we'll start with the original memory class before reimplementing it with the RunnableWithMessageHistory class.

In [30]:
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)



In [31]:
chain.invoke({"input": "hello there my name is James"})
chain.invoke({"input": "I am researching the different types of conversational memory."})
chain.invoke({"input": "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory."})
chain.invoke({"input": "Buffer memory just stores the entire conversation"})
chain.invoke({"input": "Buffer window memory stores the last k messages, dropping the rest."})



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: hello there my name is James
AI:

> Finished chain.


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
The human introduces himself as James, and the AI responds warmly, greeting James and offering assistance.
Human: I am researching the different types of conversational memory.
AI:

> Finished chain.


> Entering new Convers

{'input': 'Buffer window memory stores the last k messages, dropping the rest.',
 'history': 'The human, named James, mentions researching ConversationBufferMemory and ConversationBufferWindowMemory, and the AI explains that ConversationBufferMemory stores the full conversation history for context, while ConversationBufferWindowMemory maintains a sliding window of recent messages to optimize performance; the AI offers to provide more details or guidance on their use. The human clarifies that BufferMemory stores the entire conversation, and the AI confirms this, explaining the advantages and trade-offs of each approach, and offers to assist further.',
 'response': "That's correct! ConversationBufferWindowMemory keeps a fixed number of the most recent messages—in other words, the last *k* messages—while discarding older ones beyond that window. This approach helps manage memory usage and maintains relevant context without overwhelming the system with the entire conversation history. It's