LangChain versions `0.0.x` consisted of various conversational memory types. Most of these are due for deprecation but still hold value in understanding the different approaches that we can take to building conversational memory.

Throughout the notebook we will be referring to these _older_ memory types and then rewriting them using the recommended `RunnableWithMessageHistory` class. We will learn about:

* `ConversationBufferMemory`: the simplest and most intuitive form of conversational memory, keeping track of a conversation without any additional bells and whistles.
* `ConversationBufferWindowMemory`: similar to `ConversationBufferMemory`, but only keeps track of the last `k` messages.
* `ConversationSummaryMemory`: rather than keeping track of the entire conversation, this memory type keeps track of a summary of the conversation.
* `ConversationSummaryBufferMemory`: merges the `ConversationSummaryMemory` and `ConversationTokenBufferMemory` types.

We'll work through each of these memory types in turn, and rewrite each one using the `RunnableWithMessageHistory` class.

In [1]:
from langchain_groq import ChatGroq

model = "llama3-8b-8192"

llm = ChatGroq(model = model, temperature = 0.0)

In [2]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)

  memory = ConversationBufferMemory(return_messages = True)


In [3]:
"""There are several ways that we can add messages to our memory, using the save_context method we can add a user query (via the input key) and the AI's response (via the output key). So, to create the following conversation:

User: Hi, my name is James
AI: Hey James, what's up? I'm an AI model called Zeta.
User: I'm researching the different types of conversational memory.
AI: That's interesting, what are some examples?
User: I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.
AI: That's interesting, what's the difference?
User: Buffer memory just stores the entire conversation, right?
AI: That makes sense, what about ConversationBufferWindowMemory?
User: Buffer window memory stores the last k messages, dropping the rest.
AI: Very cool!"""

"There are several ways that we can add messages to our memory, using the save_context method we can add a user query (via the input key) and the AI's response (via the output key). So, to create the following conversation:\n\nUser: Hi, my name is James\nAI: Hey James, what's up? I'm an AI model called Zeta.\nUser: I'm researching the different types of conversational memory.\nAI: That's interesting, what are some examples?\nUser: I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.\nAI: That's interesting, what's the difference?\nUser: Buffer memory just stores the entire conversation, right?\nAI: That makes sense, what about ConversationBufferWindowMemory?\nUser: Buffer window memory stores the last k messages, dropping the rest.\nAI: Very cool!"

In [4]:
memory.save_context(
    {"input": "Hi, my name is James"},  # user message
    {"output": "Hey James, what's up? I'm an AI model called Zeta."}  # AI response
)
memory.save_context(
    {"input": "I'm researching the different types of conversational memory."},  # user message
    {"output": "That's interesting, what are some examples?"}  # AI response
)
memory.save_context(
    {"input": "I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory."},  # user message
    {"output": "That's interesting, what's the difference?"}  # AI response
)
memory.save_context(
    {"input": "Buffer memory just stores the entire conversation, right?"},  # user message
    {"output": "That makes sense, what about ConversationBufferWindowMemory?"}  # AI response
)
memory.save_context(
    {"input": "Buffer window memory stores the last k messages, dropping the rest."},  # user message
    {"output": "Very cool!"}  # AI response
)

In [5]:
memory.load_memory_variables({})

{'history': [HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMess

In [6]:
"""With this other method, we pass individual user and AI messages via the add_user_message and add_ai_message methods. To reproduce what we did above, we do:"""

'With this other method, we pass individual user and AI messages via the add_user_message and add_ai_message methods. To reproduce what we did above, we do:'

In [7]:
memory = ConversationBufferMemory(return_messages=True)

memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, what's up? I'm an AI model called Zeta.")
memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what are some examples?")
memory.chat_memory.add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
memory.chat_memory.add_ai_message("That's interesting, what's the difference?")
memory.chat_memory.add_user_message("Buffer memory just stores the entire conversation, right?")
memory.chat_memory.add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
memory.chat_memory.add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
memory.chat_memory.add_ai_message("Very cool!")

memory.load_memory_variables({})

{'history': [HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMess

In [8]:
from langchain.chains import ConversationChain

chain = ConversationChain(
    llm = llm,
    memory = memory,
    verbose = True
)

  chain = ConversationChain(


In [9]:
chain.invoke({"input" : "What is my name?"})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}), AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}), HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}), HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what's the differ

{'input': 'What is my name?',
 'history': [HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, resp

ConversationBufferMemory with RunnableWithMessageHistory

In [10]:
from langchain.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    ChatPromptTemplate    
)

system_prompt = "You are a helpful assistant called Hari"

prompt_template = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate.from_template(system_prompt),
        MessagesPlaceholder(variable_name = "history"),
        HumanMessagePromptTemplate.from_template("{query}")
    ]
)

In [11]:
pipeline = prompt_template | llm

In [12]:
from langchain_core.chat_history import InMemoryChatMessageHistory

chat_map = {}
def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

In [13]:
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history = get_chat_history,
    input_messages_key = "query",
    history_messages_key = "history"
)

In [14]:
pipeline_with_history.invoke(
    {"query" : "Hi!!, my name is James!!"},
    config = {"session_id": "id_123"}
)

AIMessage(content="Hi James! Nice to meet you! I'm Hari, your helpful assistant. How can I assist you today? Do you have any questions, need help with something, or just want to chat? I'm all ears!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 46, 'prompt_tokens': 29, 'total_tokens': 75, 'completion_time': 0.036649251, 'prompt_time': 0.003791949, 'queue_time': 0.274704124, 'total_time': 0.0404412}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_343314801a', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--2a70c7ec-5149-4e47-85e0-5081e2d7fde3-0', usage_metadata={'input_tokens': 29, 'output_tokens': 46, 'total_tokens': 75})

In [15]:
pipeline_with_history.invoke(
    {"query" : "What is my name again?"},
    config = {"session_id" : "id_123"}
)

AIMessage(content='Your name is James!', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 90, 'total_tokens': 96, 'completion_time': 0.004842927, 'prompt_time': 0.010484886, 'queue_time': 0.275507837, 'total_time': 0.015327813}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_343314801a', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--23db663e-7fed-4384-9413-048f424a0fda-0', usage_metadata={'input_tokens': 90, 'output_tokens': 6, 'total_tokens': 96})

ConversationBufferWindowMemory

The `ConversationBufferWindowMemory` type is similar to `ConversationBufferMemory`, but only keeps track of the last `k` messages. There are a few reasons why we would want to keep only the last `k` messages:

* More messages mean more tokens are sent with each request, more tokens increases latency _and_ cost.

* LLMs tend to perform worse when given more tokens, making them more likely to deviate from instructions, hallucinate, or _"forget"_ information provided to them. Conciseness is key to high performing LLMs.

* If we keep _all_ messages we will eventually hit the LLM's context window limit, by adding a window size `k` we can ensure we never hit this limit.

The buffer window solves many problems that we encounter with the standard buffer memory, while still being a very simple and intuitive form of conversational memory.

In [16]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k = 8, return_messages = True)

# if k = 4 it says 'response': "I apologize, but I don't have that information.
# if k = 8 it says  'response': 'Your name is James!'}

  memory = ConversationBufferWindowMemory(k = 8, return_messages = True)


In [17]:
memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, what's up? I'm an AI model called Zeta.")
memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what are some examples?")
memory.chat_memory.add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
memory.chat_memory.add_ai_message("That's interesting, what's the difference?")
memory.chat_memory.add_user_message("Buffer memory just stores the entire conversation, right?")
memory.chat_memory.add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
memory.chat_memory.add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
memory.chat_memory.add_ai_message("Very cool!")

memory.load_memory_variables({})

{'history': [HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMess

In [18]:
chain = ConversationChain(
    llm = llm,
    memory = memory,
    verbose = True
)

In [19]:
chain.invoke({"input" : "what is my name again?"})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}), AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}), HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}), HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what's the differ

{'input': 'what is my name again?',
 'history': [HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}

ConversationBufferWindowMemory with RunnableWithMessageHistory

To implement this memory type using the `RunnableWithMessageHistory` class, we can use the same approach as before. We define our `prompt_template` and `llm` as before, and then wrap our pipeline in a `RunnableWithMessageHistory` object.

For the window feature, we need to define a custom version of the `InMemoryChatMessageHistory` class that removes any messages beyond the last `k` messages.

In [20]:
from pydantic import BaseModel, Field
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage

class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    messages : list[BaseModel] = Field(default_factory = list)
    k : int = Field(default_factory = int)

    def __init__(self, k: int):
        super().__init__(k=k)
        print(f"Initializing BufferWindowMessageHistory with k={k}")

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages.
        """
        self.messages.extend(messages)
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        """Clears the history"""
        self.messages = []

In [21]:
chat_map = {}

def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    print(f"get_chat_history called with session_id = {session_id} and k = {k}")
    if session_id not in chat_map:
        chat_map[session_id] = BufferWindowMessageHistory(k=k)

    return chat_map[session_id]

In [22]:
from langchain_core.runnables import ConfigurableFieldSpec

pipeline_with_history2 = RunnableWithMessageHistory(
    pipeline,
    get_session_history = get_chat_history,
    input_messages_key = "query",
    history_messages_key = "history",
    history_factory_config = [
        ConfigurableFieldSpec(
            id = "session_id",
            annotation = str,
            name = "Session ID",
            description = "The session ID to use for the chat history",
            default = "id_default"
        ),
        ConfigurableFieldSpec(
            id = "k",
            annotation = int,
            name = "k",
            description = "the number of messages to keep in the history",
            default = 4
        )
    ]
)

In [23]:
pipeline_with_history2.invoke(
    {"query" : "Hi my name is James"},
    config = {"configurable" : {"session_id" : "id_k4", "k": 4}}
)

get_chat_history called with session_id = id_k4 and k = 4
Initializing BufferWindowMessageHistory with k=4


AIMessage(content="Nice to meet you, James! I'm Hari, your helpful assistant. How can I assist you today? Do you have a specific question, topic you'd like to discuss, or perhaps a task you'd like me to help you with? I'm all ears!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 55, 'prompt_tokens': 27, 'total_tokens': 82, 'completion_time': 0.044344162, 'prompt_time': 0.00580987, 'queue_time': 0.321504253, 'total_time': 0.050154032}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_c0b3855449', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--ee3f0fcb-2a4e-422d-af8c-702d5c9e7d4f-0', usage_metadata={'input_tokens': 27, 'output_tokens': 55, 'total_tokens': 82})

In [24]:
chat_map["id_k4"].clear()

# manually insert history
chat_map["id_k4"].add_user_message("Hi, my name is James")
chat_map["id_k4"].add_ai_message("I'm an AI model called Zeta.")
chat_map["id_k4"].add_user_message("I'm researching the different types of conversational memory.")
chat_map["id_k4"].add_ai_message("That's interesting, what are some examples?")
chat_map["id_k4"].add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
chat_map["id_k4"].add_ai_message("That's interesting, what's the difference?")
chat_map["id_k4"].add_user_message("Buffer memory just stores the entire conversation, right?")
chat_map["id_k4"].add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
chat_map["id_k4"].add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
chat_map["id_k4"].add_ai_message("Very cool!")

chat_map["id_k4"].messages

[HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Very cool!', additional_kwargs={}, response_metadata={})]

Now let's see at which `k` value our LLM remembers our name — from the above we can already see that with `k=4` our name is not mentioned, so when running with `k=4` we should expect the LLM to forget our name:

In [25]:
pipeline_with_history2.invoke(
    {"query": "what is my name again?"},
    config={"configurable": {"session_id": "id_k4", "k": 4}}
)

get_chat_history called with session_id = id_k4 and k = 4


AIMessage(content="I'm afraid I don't know your name! I'm Hari, your helpful assistant, and I'm here to assist you with any questions or topics you'd like to discuss.", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 37, 'prompt_tokens': 85, 'total_tokens': 122, 'completion_time': 0.02970184, 'prompt_time': 0.018627459, 'queue_time': 0.396367089, 'total_time': 0.048329299}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_c0b3855449', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--1c342208-7a99-4d0c-9069-97db8e15106b-0', usage_metadata={'input_tokens': 85, 'output_tokens': 37, 'total_tokens': 122})

In [26]:
"""Now let's initialize a new session with k=14."""

pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"session_id": "id_k14", "k": 14}
)

AIMessage(content="Nice to meet you, James! I'm Hari, your helpful assistant. How can I assist you today? Do you have a specific question, task, or topic you'd like to discuss? I'm all ears!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 45, 'prompt_tokens': 28, 'total_tokens': 73, 'completion_time': 0.036199137, 'prompt_time': 0.00366413, 'queue_time': 0.274607006, 'total_time': 0.039863267}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_c0b3855449', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--925abc57-9f63-4481-ac9e-35061ced2b3d-0', usage_metadata={'input_tokens': 28, 'output_tokens': 45, 'total_tokens': 73})

In [27]:
"""We'll manually insert the remaining messages as before:"""

chat_map["id_k14"].add_user_message("I'm researching the different types of conversational memory.")
chat_map["id_k14"].add_ai_message("That's interesting, what are some examples?")
chat_map["id_k14"].add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
chat_map["id_k14"].add_ai_message("That's interesting, what's the difference?")
chat_map["id_k14"].add_user_message("Buffer memory just stores the entire conversation, right?")
chat_map["id_k14"].add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
chat_map["id_k14"].add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
chat_map["id_k14"].add_ai_message("Very cool!")

chat_map["id_k14"].messages

[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Nice to meet you, James! I'm Hari, your helpful assistant. How can I assist you today? Do you have a specific question, task, or topic you'd like to discuss? I'm all ears!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 45, 'prompt_tokens': 28, 'total_tokens': 73, 'completion_time': 0.036199137, 'prompt_time': 0.00366413, 'queue_time': 0.274607006, 'total_time': 0.039863267}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_c0b3855449', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--925abc57-9f63-4481-ac9e-35061ced2b3d-0', usage_metadata={'input_tokens': 28, 'output_tokens': 45, 'total_tokens': 73}),
 HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
 AIMessage(content="That's interesting, what are some examples?", additio

In [28]:
"""Now let's see if the LLM remembers our name:"""
pipeline_with_history2.invoke(
    {"query" : "What is my name?"},
    config = {"session_id" : "id_k14", "k": 14}
)

get_chat_history called with session_id = id_k14 and k = 14


AIMessage(content='Your name is James!', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 207, 'total_tokens': 213, 'completion_time': 0.004868247, 'prompt_time': 0.023711761, 'queue_time': 0.268952713, 'total_time': 0.028580008}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_5b339000ab', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--daef931a-8737-4664-ace8-3da3dee37771-0', usage_metadata={'input_tokens': 207, 'output_tokens': 6, 'total_tokens': 213})

In [29]:
chat_map["id_k14"].messages

[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Nice to meet you, James! I'm Hari, your helpful assistant. How can I assist you today? Do you have a specific question, task, or topic you'd like to discuss? I'm all ears!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 45, 'prompt_tokens': 28, 'total_tokens': 73, 'completion_time': 0.036199137, 'prompt_time': 0.00366413, 'queue_time': 0.274607006, 'total_time': 0.039863267}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_c0b3855449', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--925abc57-9f63-4481-ac9e-35061ced2b3d-0', usage_metadata={'input_tokens': 28, 'output_tokens': 45, 'total_tokens': 73}),
 HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
 AIMessage(content="That's interesting, what are some examples?", additio

In [30]:
"""That's it! We've rewritten our buffer window memory using the recommended RunnableWithMessageHistory class."""

"That's it! We've rewritten our buffer window memory using the recommended RunnableWithMessageHistory class."

## 3. `ConversationSummaryMemory`

Next up we have `ConversationSummaryMemory`, this memory type keeps track of a summary of the conversation rather than the entire conversation. This is useful for long conversations where we don't need to keep track of the entire conversation, but we do want to keep some thread of the full conversation.

As before, we'll start with the original memory class before reimplementing it with the `RunnableWithMessageHistory` class.

In [31]:
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm = llm)

  memory = ConversationSummaryMemory(llm = llm)


In [32]:
chain = ConversationChain(
    llm = llm,
    memory = memory,
    verbose = True
)

In [33]:
chain.invoke({"input": "hello there my name is James"})
chain.invoke({"input": "I am researching the different types of conversational memory."})
chain.invoke({"input": "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory."})
chain.invoke({"input": "Buffer memory just stores the entire conversation"})
chain.invoke({"input": "Buffer window memory stores the last k messages, dropping the rest."})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: hello there my name is James
AI:[0m

[1m> Finished chain.[0m


[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Current summary: None (starting from scratch)

New lines of conversation:
Human: hello there my name is James
AI: Nice to meet you, James! I'm Ada, your friendly AI companion. I've been trained on a vast amount of text data, including books

{'input': 'Buffer window memory stores the last k messages, dropping the rest.',
 'history': "Here is the new summary:\n\nJames introduces himself to Ada, a friendly AI companion, and shares his research on conversational memory. Ada provides an overview of the different types of conversational memory, including episodic memory, semantic memory, working memory, and social memory, and mentions that while there are no specific AI models that mimic human conversational memory, there are research papers and projects focused on developing AI systems that can learn and adapt to conversational contexts. James is specifically interested in ConversationBufferMemory and ConversationBufferWindowMemory, which Ada explains as the ability to store and retrieve information from previous conversations and keep track of the conversation's progression, respectively. Buffer memory, also known as ConversationBufferMemory, is a type of conversational memory that stores the entire conversation, including th

In [34]:
chain.invoke({"input" : "What is my name again?"})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Here is the new summary:

James introduces himself to Ada, a friendly AI companion, and shares his research on conversational memory. Ada provides an overview of the different types of conversational memory, including episodic memory, semantic memory, working memory, and social memory, and mentions that while there are no specific AI models that mimic human conversational memory, there are research papers and projects focused on developing AI systems that can learn and adapt to conversational contexts. James is specifically interested in ConversationBufferMemory and ConversationBufferWindowMemory, which Ada explains as the ability to store and retr

{'input': 'What is my name again?',
 'history': "Here is the new summary:\n\nJames introduces himself to Ada, a friendly AI companion, and shares his research on conversational memory. Ada provides an overview of the different types of conversational memory, including episodic memory, semantic memory, working memory, and social memory, and mentions that while there are no specific AI models that mimic human conversational memory, there are research papers and projects focused on developing AI systems that can learn and adapt to conversational contexts. James is specifically interested in ConversationBufferMemory and ConversationBufferWindowMemory, which Ada explains as the ability to store and retrieve information from previous conversations and keep track of the conversation's progression, respectively. Buffer memory, also known as ConversationBufferMemory, is a type of conversational memory that stores the entire conversation, including the context, tone, and content, allowing the AI

Let's implement this memory type using the `RunnableWithMessageHistory` class. As with the window buffer memory, we need to define a custom implementation of the `InMemoryChatMessageHistory` class. We'll call this one `ConversationSummaryMessageHistory`.

In [35]:
from langchain_core.messages import SystemMessage

class ConversationSummaryMessageHistory(BaseChatMessageHistory, BaseModel):
   messages: list[BaseMessage] = Field(default_factory=list)
   llm: ChatGroq = Field(default_factory=ChatGroq)

   def __init__(self, llm: ChatGroq):
       super().__init__(llm=llm)

   def add_messages(self, messages: list[BaseMessage]) -> None:
       """Add messages to the history and generate a summary."""
       
       # Get existing summary if it exists
       existing_summary = ""
       if self.messages and isinstance(self.messages[0], SystemMessage):
           existing_summary = self.messages[0].content
       else:
           existing_summary = "No previous summary."
       
       # Extend current messages with new ones
       self.messages.extend(messages)
       
       # Construct the summary chat messages
       summary_prompt = ChatPromptTemplate.from_messages([
           SystemMessagePromptTemplate.from_template(
               "Given the existing conversation summary and the new messages, "
               "generate a new summary of the conversation. Ensuring to maintain "
               "as much relevant information as possible."
           ),
           HumanMessagePromptTemplate.from_template(
               "Existing conversation summary:\n{existing_summary}\n\n"
               "New messages:\n{messages}"
           )
       ])
       
       # Format the messages and invoke the LLM
       new_summary = self.llm.invoke(
           summary_prompt.format_messages(
               existing_summary=existing_summary,
               messages=[x.content for x in messages]
           )
       )
       
       # Replace the existing history with a single system summary message
       self.messages = [SystemMessage(content=new_summary.content)]

   def clear(self) -> None:
       """Clear the history."""
       self.messages = []

In [36]:
chat_map = {}
def get_chat_history(session_id: str, llm: ChatGroq) -> ConversationSummaryMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = ConversationSummaryMessageHistory(llm=llm)
    # return the chat history
    return chat_map[session_id]

In [37]:
pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatGroq,
            name="LLM",
            description="The LLM to use for the conversation summary",
            default=llm,
        )
    ]
)

In [38]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"session_id": "id_123", "llm": llm}
)

AIMessage(content="Nice to meet you, James! I'm Hari, your helpful assistant. How can I assist you today? Do you have a specific question, task, or topic you'd like to discuss? I'm all ears!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 45, 'prompt_tokens': 28, 'total_tokens': 73, 'completion_time': 0.035637169, 'prompt_time': 0.004347322, 'queue_time': 0.272295133, 'total_time': 0.039984491}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_343314801a', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--34c54fca-4a6a-4d94-b0f8-68c279b09d17-0', usage_metadata={'input_tokens': 28, 'output_tokens': 45, 'total_tokens': 73})

In [39]:
chat_map["id_123"].messages

[SystemMessage(content="New conversation summary:\nThe conversation has just started, and James has introduced himself. Hari, the helpful assistant, has responded with a friendly greeting and asked James what he needs help with today, whether it's a specific question, task, or topic to discuss.", additional_kwargs={}, response_metadata={})]

In [40]:
pipeline_with_history.invoke(
    {"query": "I'm researching the different types of conversational memory."},
    config={"session_id": "id_123", "llm": llm}
)

chat_map["id_123"].messages

[SystemMessage(content='New Conversation Summary:\n\nJames has started a conversation with Hari, a helpful assistant, to discuss conversational memory. James expressed his interest in researching the different types of conversational memory. Hari responded by providing an overview of the topic, explaining that conversational memory refers to the ability to recall and retrieve information from previous conversations. Hari then listed four types of conversational memory: episodic, semantic, procedural, and working memory. Hari asked James which type of conversational memory he is interested in learning more about or if there is a specific aspect of conversational memory he would like to explore further.', additional_kwargs={}, response_metadata={})]

In [41]:
for msg in [
    "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.",
    "Buffer memory just stores the entire conversation",
    "Buffer window memory stores the last k messages, dropping the rest."
]:
    
    pipeline_with_history.invoke(
        {"query" : msg},
        config = {"session_id" : "id_123", "llm" : llm}
    )

In [42]:
chat_map["id_123"].messages

[SystemMessage(content="Here is the updated conversation summary:\n\nJames has been discussing conversational memory with Hari, a helpful assistant. James initially expressed interest in researching the different types of conversational memory, and Hari provided an overview of the topic, explaining that conversational memory refers to the ability to recall and retrieve information from previous conversations. Hari listed four types of conversational memory: episodic, semantic, procedural, and working memory. Hari then asked James which type of conversational memory he is interested in learning more about or if there is a specific aspect of conversational memory he would like to explore further.\n\nIn the latest development, James mentioned that he has been looking at ConversationBufferMemory and ConversationBufferWindowMemory. Hari responded by providing more information on these types of conversational memory. ConversationBufferMemory refers to the ability to recall and retrieve speci

In [43]:
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"session_id": "id_123", "llm": llm}
)

AIMessage(content='Your name is James!', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 468, 'total_tokens': 474, 'completion_time': 0.004862027, 'prompt_time': 0.052122569, 'queue_time': 0.269650614, 'total_time': 0.056984596}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_5b339000ab', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--abca901a-6196-4147-b92e-c16ea4fb131e-0', usage_metadata={'input_tokens': 468, 'output_tokens': 6, 'total_tokens': 474})

ConversationSummaryBufferMemory


Our final memory type acts as a combination of `ConversationSummaryMemory` and `ConversationBufferMemory`. It keeps the buffer for the conversation up until the previous `n` tokens, anything beyond that limit is summarized then dropped from the buffer. Producing something like:


```
# ~~ a summary of previous interactions
The user named James introduced himself and the AI responded, introducing itself as an AI model called Zeta.
James then said he was researching the different types of conversational memory and Zeta asked for some
examples.
# ~~ the most recent messages
Human: I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.
AI: That's interesting, what's the difference?
Human: Buffer memory just stores the entire conversation
AI: That makes sense, what about ConversationBufferWindowMemory?
Human: Buffer window memory stores the last k messages, dropping the rest.
AI: Very cool!
```


In [46]:
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain

# This keeps last N conversation exchanges without token counting
memory = ConversationBufferWindowMemory(
    k=5,  # Keep last 5 conversation turns
    return_messages=True
)

chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

response = chain.invoke({"input": "Hi, my name is James"})
print(response)



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[]
Human: Hi, my name is James
AI:[0m

[1m> Finished chain.[0m
{'input': 'Hi, my name is James', 'history': [], 'response': "Nice to meet you, James! I'm Ada, your friendly AI companion. I've been trained on a vast amount of text data, including books, articles, and conversations. I can provide information on a wide range of topics, from science and history to entertainment and culture. I'm also designed to learn and improve over time, so please feel free to ask me anything and I'll do my best to assist you. By the way, did you know that I was named after Ada Lovelace, the world's first computer programmer?"}


In [47]:
for msg in [
    "I'm researching the different types of conversational memory.",
    "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.",
    "Buffer memory just stores the entire conversation",
    "Buffer window memory stores the last k messages, dropping the rest."
]:
    chain.invoke({"input": msg})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}), AIMessage(content="Nice to meet you, James! I'm Ada, your friendly AI companion. I've been trained on a vast amount of text data, including books, articles, and conversations. I can provide information on a wide range of topics, from science and history to entertainment and culture. I'm also designed to learn and improve over time, so please feel free to ask me anything and I'll do my best to assist you. By the way, did you know that I was named after Ada Lovelace, the world's first computer programmer?", additional_kwargs={}, response_metadata={})]
Human: I'

As with the previous memory types, we will implement this memory type again using the `RunnableWithMessageHistory` class. In our implementation we will modify the buffer window to be based on the number of messages rather than number of tokens. This tweak will make our implementation more closely aligned with original buffer window.

We will implement all of this via a new `ConversationSummaryBufferMessageHistory` class.

In [48]:
class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatGroq = Field(default_factory=ChatGroq)
    k: int = Field(default_factory=int)

    def __init__(self, llm: ChatGroq, k: int):
        super().__init__(llm=llm, k=k)

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages and summarizing the messages that we
        drop.
        """
        existing_summary: SystemMessage | None = None
        old_messages: list[BaseMessage] | None = None
        # see if we already have a summary message
        if len(self.messages) > 0 and isinstance(self.messages[0], SystemMessage):
            print(">> Found existing summary")
            existing_summary = self.messages.pop(0)
        # add the new messages to the history
        self.messages.extend(messages)
        # check if we have too many messages
        if len(self.messages) > self.k:
            print(
                f">> Found {len(self.messages)} messages, dropping "
                f"oldest {len(self.messages) - self.k} messages.")
            # pull out the oldest messages...
            old_messages = self.messages[:self.k]
            # ...and keep only the most recent messages
            self.messages = self.messages[-self.k:]
        if old_messages is None:
            print(">> No old messages to update summary with")
            # if we have no old_messages, we have nothing to update in summary
            return
        # construct the summary chat messages
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation. Ensuring to maintain "
                "as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{old_messages}"
            )
        ])
        # format the messages and invoke the LLM
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary,
                old_messages=old_messages
            )
        )
        print(f">> New summary: {new_summary.content}")
        # prepend the new summary to the history
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []

In [49]:
chat_map = {}
def get_chat_history(session_id: str, llm: ChatGroq, k: int) -> ConversationSummaryBufferMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = ConversationSummaryBufferMessageHistory(llm=llm, k=k)
    # return the chat history
    return chat_map[session_id]

In [50]:
pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatGroq,
            name="LLM",
            description="The LLM to use for the conversation summary",
            default=llm,
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4,
        )
    ]
)

In [51]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"session_id": "id_123", "llm": llm, "k": 4}
)
chat_map["id_123"].messages

>> No old messages to update summary with


[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Nice to meet you, James! I'm Hari, your helpful assistant. How can I assist you today? Do you have a specific question, task, or topic you'd like to discuss? I'm all ears!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 45, 'prompt_tokens': 28, 'total_tokens': 73, 'completion_time': 0.036372574, 'prompt_time': 0.003664515, 'queue_time': 0.27038561, 'total_time': 0.040037089}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_5b339000ab', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None}, id='run--d7a22357-658b-4510-a7ab-eb07ebad9ac4-0', usage_metadata={'input_tokens': 28, 'output_tokens': 45, 'total_tokens': 73})]

In [52]:
for i, msg in enumerate([
    "I'm researching the different types of conversational memory.",
    "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.",
    "Buffer memory just stores the entire conversation",
    "Buffer window memory stores the last k messages, dropping the rest."
]):
    print(f"---\nMessage {i+1}\n---\n")
    pipeline_with_history.invoke(
        {"query": msg},
        config={"session_id": "id_123", "llm": llm, "k": 4}
    )

---
Message 1
---

>> No old messages to update summary with
---
Message 2
---

>> Found 6 messages, dropping oldest 2 messages.
>> New summary: Here is a new summary of the conversation:

The conversation starts with James introducing himself and Hari, the helpful assistant, responding with a friendly greeting and asking how James can be assisted. James then mentions that he is researching conversational memory, and Hari provides a detailed explanation of the different types of conversational memory, including episodic, semantic, procedural, working, schematic, conversational memory for goals, and conversational memory for relationships. Hari notes that these types of memory are not mutually exclusive and often overlap or interact with each other.
---
Message 3
---

>> Found existing summary
>> Found 6 messages, dropping oldest 2 messages.
>> New summary: Here is the updated conversation summary:

The conversation starts with James introducing himself and Hari, the helpful assistant, 

# Custom ConversationSummaryBufferMemory with `k`

## What I Did
- Replaced `max_token_limit` with `k` parameter
- Created hybrid memory that keeps last `k` messages + summarizes dropped ones
- Uses Groq LLM for summarization (no Hugging Face dependency)

## Key Change: `k` vs `max_token_limit`

| Original | My Version |
|----------|------------|
| `max_token_limit=500` | `k=4` |
| Triggers summary when tokens > 500 | Triggers summary when messages > 4 |
| Complex token counting needed | Simple message counting |

## How It Works
Messages: [1, 2, 3, 4, 5, 6] with k=4
↓
Keep: [Summary of 1,2] + [3, 4, 5, 6]

## Benefits of Using `k`
- ✅ **Simpler**: No token counting complexity
- ✅ **Predictable**: Always exactly `k` recent messages 
- ✅ **No dependencies**: Avoids `transformers` library issues
- ✅ **Preserves context**: Old info becomes summary, not lost
- ✅ **Groq-native**: Uses your existing LLM for summarization

## Result
A memory system that's easier to manage but still intelligent about preserving conversation history.RetryClaude does not have the ability to run the code it generates yet.Claude can make mistakes. Please double-check responses.