# Migrating off ConversationBufferWindowMemory or ConversationTokenBufferMemory

Follow this guide if you're trying to migrate off one of the old memory classes listed below:


| Memory Type                          | Description                                                                                                                                          |
|---------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
| `ConversationBufferWindowMemory`      | Keeps the last `n` turns of the conversation. Drops the oldest turn when the buffer is full.                                                          |
| `ConversationTokenBufferMemory`       | Keeps only the most recent messages in the conversation under the constraint that the total number of tokens in the conversation does not exceed a certain limit. |
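To make the two policies in the table concrete, here is a minimal plain-Python sketch. It is not how the memory classes are implemented: messages are simple `(role, text)` tuples, and `keep_last_turns`, `keep_within_budget`, and the word-based `count_words` counter are hypothetical names chosen for illustration.

```python
from typing import Callable

# (role, text); a stand-in for LangChain message objects
Message = tuple[str, str]


def keep_last_turns(history: list[Message], n: int) -> list[Message]:
    """Window policy: keep the last n turns (one turn = human message + AI reply)."""
    return history[-2 * n:] if n > 0 else []


def keep_within_budget(
    history: list[Message],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> list[Message]:
    """Token policy: walk backwards, keeping messages while the budget allows."""
    kept: list[Message] = []
    used = 0
    for message in reversed(history):
        cost = count_tokens(message[1])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def count_words(text: str) -> int:
    """Hypothetical token counter: whitespace-separated words, not real tokens."""
    return len(text.split())


history = [
    ("human", "hi there"),
    ("ai", "hello how can I help"),
    ("human", "tell me a joke"),
    ("ai", "why did the chicken cross the road"),
]

windowed = keep_last_turns(history, n=1)  # the last human/AI pair
budgeted = keep_within_budget(history, max_tokens=16, count_tokens=count_words)
```

With these inputs, the window policy keeps the final two messages, while the 16-word budget also has room for the earlier AI reply.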






<details open>

## Set up

In [1]:
%%capture --no-stderr
%pip install --upgrade --quiet langchain-openai langchain langchain-community

In [2]:
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass()

</details>

## Implementing Conversation History Processing

Each of the following memory types applies specific logic to handle the conversation history:

| Memory Type                          | Description                                                                                                                                          |
|---------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
| `ConversationBufferWindowMemory`      | Keeps the last `n` turns of the conversation. Drops the oldest turn when the buffer is full.                                                          |
| `ConversationTokenBufferMemory`       | Keeps only the most recent messages in the conversation under the constraint that the total number of tokens in the conversation does not exceed a certain limit. |
| `ConversationSummaryMemory`           | Continually summarizes the conversation history. The summary is updated after each conversation turn. The abstraction returns the summary of the conversation history. |
| `ConversationSummaryBufferMemory`     | Provides a running summary of the conversation together with the most recent messages in the conversation under the constraint that the total number of tokens in the conversation does not exceed a certain limit. |

The general approach involves writing the necessary logic for processing conversation history and integrating it at the correct point.

We’ll start by building a simple processor using LangChain's built-in [trim_messages](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.trim_messages.html) function, and then demonstrate how to integrate it into your application.

You can later replace this basic setup with more advanced logic tailored to your specific needs.


:::important

We’ll begin by exploring a straightforward method that involves applying processing logic to the entire conversation history.

While this approach is easy to test and implement, it has a downside: as the conversation grows, so does the latency, since the logic is re-applied to all previous exchanges at each turn.

More advanced strategies focus on incrementally updating the conversation history to avoid redundant processing.

For instance, the langgraph [how-to guide on summarization](https://langchain-ai.github.io/langgraph/how-tos/memory/add-summary-conversation-history/) demonstrates
how to maintain a running summary of the conversation while discarding older messages, ensuring they aren't re-processed during later turns.
:::


### ConversationBufferWindowMemory, ConversationTokenBufferMemory

<details open>

In [12]:
from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    HumanMessage,
    SystemMessage,
    trim_messages,
)


full_message_history = [
    SystemMessage("you're a good assistant, you always respond with a joke."),
    HumanMessage("i wonder why it's called langchain"),
    AIMessage(
        'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
    ),
    HumanMessage("and who is harrison chasing anyways"),
    AIMessage(
        "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
    ),
    HumanMessage("why is 42 always the answer?"),
    AIMessage(
        "Because it’s the only number that’s constantly right, even when it doesn’t add up!"
    ),
    HumanMessage("What did the cow say?"),
]


def message_processor(messages: list[BaseMessage]) -> list[BaseMessage]:
    """A sample message processor that:

    1. Keeps the system message
    2. Keeps up to max number of messages
    3. Makes sure that the last message is a HumanMessage

    You will likely want to instead count based on tokens,
    and/or increase the number of messages.

    Please see the API reference for trim_messages for more details.

    https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.trim_messages.html
    """
    return trim_messages(
        messages,
        token_counter=len,  # <-- Will just count the number of messages rather than tokens
        max_tokens=5,  # <-- allow up to 5 messages.
        strategy="last",
        include_system=True,
        allow_partial=False,
    )


message_processor(full_message_history)

[SystemMessage(content="you're a good assistant, you always respond with a joke.", additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='why is 42 always the answer?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Because it’s the only number that’s constantly right, even when it doesn’t add up!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='What did the cow say?', additional_kwargs={}, response_metadata={})]

</details>

## LCEL: Add a pre-processor in front of the chat model

The simplest way to add complex conversation management is to introduce a pre-processing step in front of the chat model and pass the full conversation history to that step.

This approach is conceptually simple and works in many situations. For example, if you're using a [RunnableWithMessageHistory](/docs/how_to/message_history/), wrap the chat model combined with the pre-processor instead of wrapping the chat model alone.

The obvious downside of this approach is that latency increases as the conversation history grows, for two reasons:

1. As the conversation gets longer, more data may need to be fetched from whatever store you're using to store the conversation history (if not storing it in memory).
2. The pre-processing logic will end up doing a lot of redundant computation, repeating computation from previous steps of the conversation.
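The second point can be illustrated with a small back-of-the-envelope count. The sketch below assumes each turn adds one human message before the model call and one AI reply after it, and that the pre-processor reads the whole history on every call. `messages_scanned` is a hypothetical helper, not part of LangChain.

```python
def messages_scanned(turns: int) -> int:
    """Count message visits when a pre-processor re-reads the full history each turn."""
    history_length = 0
    scanned = 0
    for _ in range(turns):
        history_length += 1  # the new human message arrives
        scanned += history_length  # the pre-processor reads the whole history
        history_length += 1  # the model's reply is stored
    return scanned


for turns in (10, 100, 1000):
    print(turns, messages_scanned(turns))
```

Under these assumptions the total work grows quadratically: `n` turns cost `n * n` message visits, which is why incremental strategies matter for long conversations.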

:::caution

If you're using tools, remember to bind the tools to the model before adding a pre-processing step to it!

:::

<details open>

In [15]:
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

model = ChatOpenAI()


@tool
def what_did_the_cow_say() -> str:
    """Check to see what the cow said."""
    return "foo"

model_with_tools = model.bind_tools([what_did_the_cow_say])

# highlight-next-line
model_with_preprocessor = message_processor | model_with_tools

# full_message_history is defined in the previous code block.
# We pass it explicitly to model_with_preprocessor for illustrative purposes.
# If you're using `RunnableWithMessageHistory`, the history will be automatically
# read from the source that you configure.
model_with_preprocessor.invoke(full_message_history)

AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_g7l7AFBHnfInA9ps2DpnkrsU', 'function': {'arguments': '{}', 'name': 'what_did_the_cow_say'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 126, 'total_tokens': 142, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-3df002ca-da48-4b91-ad7f-c8e3ad5c7550-0', tool_calls=[{'name': 'what_did_the_cow_say', 'args': {}, 'id': 'call_g7l7AFBHnfInA9ps2DpnkrsU', 'type': 'tool_call'}], usage_metadata={'input_tokens': 126, 'output_tokens': 16, 'total_tokens': 142})

</details>

If you need to implement more efficient logic and want to keep using `RunnableWithMessageHistory` for now, subclass [BaseChatMessageHistory](https://api.python.langchain.com/en/latest/chat_history/langchain_core.chat_history.BaseChatMessageHistory.html) and
define appropriate logic for `add_messages` (logic that doesn't simply append to the history, but instead re-writes it).
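
As a rough illustration of `add_messages` logic that re-writes the stored history rather than appending to it, consider the sketch below. It is a dependency-free stand-in, not a LangChain class: `WindowedHistory` and `max_messages` are hypothetical names, and messages are plain strings. In a real application the class would subclass `BaseChatMessageHistory` and store message objects.

```python
class WindowedHistory:
    """Stand-in history store exposing `messages`, `add_messages`, and `clear`."""

    def __init__(self, max_messages: int) -> None:
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        self.max_messages = max_messages
        self.messages: list[str] = []

    def add_messages(self, new_messages: list[str]) -> None:
        # Re-write the stored history: combine old and new messages, then keep
        # only the most recent ones. Trimming happens once, at write time.
        combined = self.messages + list(new_messages)
        self.messages = combined[-self.max_messages:]

    def clear(self) -> None:
        self.messages = []


history = WindowedHistory(max_messages=4)
history.add_messages(["human: hi", "ai: hello"])
history.add_messages(["human: what's my name?", "ai: I don't know yet"])
history.add_messages(["human: I'm Bob", "ai: nice to meet you, Bob"])
```

Because the store never holds more than `max_messages` entries, each read returns an already-trimmed history and no pre-processing step has to re-scan older messages.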

Unless you have a good reason to implement this solution, you should use LangGraph instead.

## LangGraph

### Agent Executor with a pre-built agent

If you're migrating off a pre-built LangChain agent that uses memory, you can create a pre-built agent using [create_react_agent](https://langchain-ai.github.io/langgraph/reference/prebuilt/#create_react_agent).

To add memory pre-processing to the agent, you can do the following:

```python

...

# highlight-start
def state_modifier(state) -> list[BaseMessage]:
    """Given the agent state, return a list of messages for the chat model."""
    # We're using the message processor defined above.
    return message_processor(state['messages'])
# highlight-end    

app = create_react_agent(
    model,
    tools=[get_user_age], 
    checkpointer=memory,
    # highlight-next-line
    state_modifier=state_modifier
)

...

```

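At each turn of the conversation, `state_modifier` receives the agent state and returns the list of messages passed to the chat model. The full example below runs two turns on the same thread: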

In [14]:
import uuid

from langchain_core.messages import BaseMessage, HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent


@tool
def get_user_age(name: str) -> str:
    """Use this tool to find the user's age."""
    # This is a placeholder for the actual implementation
    if "bob" in name.lower():
        return "42 years old"
    return "41 years old"


memory = MemorySaver()
model = ChatOpenAI()


# highlight-start
def state_modifier(state) -> list[BaseMessage]:
    """Given the agent state, return a list of messages for the chat model."""
    # We're using the message processor defined above.
    return message_processor(state["messages"])


# highlight-end

app = create_react_agent(
    model,
    tools=[get_user_age],
    checkpointer=memory,
    # highlight-next-line
    state_modifier=state_modifier,
)

# The thread id is a unique key that identifies
# this particular conversation.
# We'll just generate a random uuid here.
thread_id = uuid.uuid4()
config = {"configurable": {"thread_id": thread_id}}

# Tell the AI that our name is Bob, and ask it to use a tool to confirm
# that it's capable of working like an agent.
input_message = HumanMessage(content="hi! I'm bob. What is my age?")

for event in app.stream({"messages": [input_message]}, config, stream_mode="values"):
    event["messages"][-1].pretty_print()

# Confirm that the chat bot has access to previous conversation
# and can respond to the user saying that the user's name is Bob.
input_message = HumanMessage(content="do you remember my name?")

for event in app.stream({"messages": [input_message]}, config, stream_mode="values"):
    event["messages"][-1].pretty_print()


hi! I'm bob. What is my age?
Tool Calls:
  get_user_age (call_kTEpBUbRFbKE3DZolG9tFrgD)
 Call ID: call_kTEpBUbRFbKE3DZolG9tFrgD
  Args:
    name: bob
Name: get_user_age

42 years old

Bob, you are 42 years old.

do you remember my name?

Yes, your name is Bob.


### ConversationSummaryMemory / ConversationSummaryBufferMemory

It’s essential to summarize conversations efficiently to prevent growing latency as the conversation history grows.

Please follow the guide [how to add summary of the conversation history](https://langchain-ai.github.io/langgraph/how-tos/memory/add-summary-conversation-history/) to learn how to
handle conversation summarization efficiently with LangGraph.
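
To give a feel for the incremental idea before reading that guide, here is a minimal plain-Python sketch of a running summary combined with a buffer of recent messages. It is not the LangGraph implementation: `RunningSummaryBuffer`, `max_recent`, and `join_summarizer` are hypothetical names, and the summarizer is a stub that concatenates text where a real one would call a chat model.

```python
from typing import Callable


class RunningSummaryBuffer:
    """Keeps a running summary plus the most recent messages."""

    def __init__(
        self, summarize: Callable[[str, list[str]], str], max_recent: int
    ) -> None:
        self.summarize = summarize
        self.max_recent = max_recent
        self.summary = ""
        self.recent: list[str] = []

    def add(self, message: str) -> None:
        self.recent.append(message)
        overflow = len(self.recent) - self.max_recent
        if overflow > 0:
            # Fold only the overflowing messages into the summary, then discard
            # them so they are never re-processed on later turns.
            evicted = self.recent[:overflow]
            self.recent = self.recent[overflow:]
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> list[str]:
        """Messages to send to the model: the summary (if any), then recent ones."""
        prefix = [f"summary: {self.summary}"] if self.summary else []
        return prefix + self.recent


def join_summarizer(previous_summary: str, evicted: list[str]) -> str:
    """Hypothetical summarizer stub: a real one would call a chat model here."""
    return " | ".join(filter(None, [previous_summary, *evicted]))


buffer = RunningSummaryBuffer(join_summarizer, max_recent=2)
for text in ["hi, I'm Bob", "hello Bob", "what is my age?", "you are 42"]:
    buffer.add(text)
```

Each call to `add` summarizes at most the messages that just fell out of the buffer, so the per-turn cost stays roughly constant instead of growing with the conversation.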

## Next steps

Explore persistence with LangGraph:

* [LangGraph quickstart tutorial](https://langchain-ai.github.io/langgraph/tutorials/introduction/)
* [How to add persistence ("memory") to your graph](https://langchain-ai.github.io/langgraph/how-tos/persistence/)
* [How to manage conversation history](https://langchain-ai.github.io/langgraph/how-tos/memory/manage-conversation-history/)
* [How to add summary of the conversation history](https://langchain-ai.github.io/langgraph/how-tos/memory/add-summary-conversation-history/)

Add persistence with simple LCEL (favor langgraph for more complex use cases):

* [How to add message history](/docs/how_to/message_history/)

Working with message history:

* [How to trim messages](/docs/how_to/trim_messages)
* [How to filter messages](/docs/how_to/filter_messages/)
* [How to merge message runs](/docs/how_to/merge_message_runs/)