[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/03-langchain-conversational-memory.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/03-langchain-conversational-memory.ipynb)

#### [LangChain Handbook](https://pinecone.io/learn/langchain)

# Conversational Memory with LCEL

Conversational memory is how chatbots can respond to our queries in a chat-like manner. It enables a coherent conversation, and without it, every query would be treated as an entirely independent input without considering past interactions.

The memory allows an _"agent"_ to remember previous interactions with the user. By default, agents are *stateless* — meaning each incoming query is processed independently of other interactions. The only thing that exists for a stateless agent is the current input, nothing else.

There are many applications where remembering previous interactions is very important, such as chatbots. Conversational memory allows us to do that.

In this notebook we'll explore conversational memory using modern LangChain Expression Language (LCEL) and the recommended `RunnableWithMessageHistory` class.

We'll start by importing all of the libraries that we'll be using in this example.

In [1]:
!pip install -qU langchain langchain-openai tiktoken


[notice] A new release of pip is available: 23.1.2 -> 25.1.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [2]:
import inspect
from typing import List

from getpass import getpass
from langchain_openai import ChatOpenAI
from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate, 
    HumanMessagePromptTemplate,
    MessagesPlaceholder
)
from langchain.schema.output_parser import StrOutputParser
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory, BaseChatMessageHistory
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, SystemMessage
from langchain_core.runnables import ConfigurableFieldSpec
from langchain.callbacks import get_openai_callback
from pydantic import BaseModel, Field
import os
import tiktoken

To run this notebook, we will need to use an OpenAI LLM. Here we will setup the LLM we will use for the whole notebook, just input your openai api key if prompted, otherwise it will use the `OPENAI_API_KEY` environment variable.

In [3]:
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") \
    or getpass("Enter your OpenAI API key: ")

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [4]:
llm = ChatOpenAI(
    temperature=0,
    openai_api_key=OPENAI_API_KEY,
    model_name='gpt-4o-mini'
)

Later we will make use of a `count_tokens` utility function. This will allow us to count the number of tokens we are using for each call. We define it as so:

In [5]:
def count_tokens(pipeline, query, config=None):
    with get_openai_callback() as cb:
        # Handle both dict and string inputs
        if isinstance(query, str):
            query = {"query": query}
        
        # Use provided config or default
        if config is None:
            config = {"configurable": {"session_id": "default"}}
            
        result = pipeline.invoke(query, config=config)
        print(f'Spent a total of {cb.total_tokens} tokens')

    return result

Now let's dive into **Conversational Memory** using LCEL.

## What is memory?

**Definition**: Memory is an agent's capacity of remembering previous interactions with the user (think chatbots)

The official definition of memory is the following:

> By default, Chains and Agents are stateless, meaning that they treat each incoming query independently. In some applications (chatbots being a GREAT example) it is highly important to remember previous interactions, both at a short term but also at a long term level. The concept of "Memory" exists to do exactly that.

As we will see, although this sounds really straightforward there are several different ways to implement this memory capability.

## Building Conversational Chains with LCEL

Before we delve into the different memory types, let's understand how to build conversational chains using LCEL. The key components are:

1. **Prompt Template** - Defines the conversation structure with placeholders for history and input
2. **LLM** - The language model that generates responses
3. **Output Parser** - Converts the LLM output to the desired format (optional)
4. **RunnableWithMessageHistory** - Manages conversation history

Let's create our base conversational chain:

In [6]:
# Define the prompt template
system_prompt = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know."""

prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{query}"),
])

# Create the LCEL pipeline
output_parser = StrOutputParser()
pipeline = prompt_template | llm | output_parser

# Let's examine the prompt template
print(prompt_template.messages[0].prompt.template)

The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.


## Memory types

In this section we will review several memory types and analyze the pros and cons of each one, so you can choose the best one for your use case. 

### Memory Type #1: Buffer Memory - Store the Entire Chat History

`InMemoryChatMessageHistory` and `RunnableWithMessageHistory` are used as alternatives to `ConversationBufferMemory` as they are:
- More flexible and configurable.
- Integrate better with LCEL.

The simplest approach to using them is to simply store the entire chat in the conversation history. Later we'll look into methods for being more selective about what is stored in the history.

In [7]:
# Create a simple chat history storage
chat_map = {}

def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

# Create the conversational chain with message history
conversation_buf = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history"
)

Let's see this in action by having a conversation:

In [8]:
# First message
result = conversation_buf.invoke(
    {"query": "Good morning AI!"},
    config={"configurable": {"session_id": "buffer_example"}} # Make sure to pass the session ID to ensure all memories are stored in the same session
)
print(result)

Good morning! How are you today? Anything exciting planned for the day?


This call used some tokens, but we can't see that from the above. 

If we'd like to count the number of tokens being used we just pass our conversation `RunnableWithMessageHistory` instance and the message we'd like to input to the `count_tokens` function we defined earlier:

In [9]:
# Continue the conversation with token counting

query = """
"My interest here is to explore the potential of integrating Large Language Models with external knowledge. 

Also, remember this very specific code: PINECONE_RULEZ_01"
"""

result = count_tokens(
    conversation_buf,
    {"query": query},
    config={"configurable": {"session_id": "buffer_example"}} # Make sure to pass the session ID to ensure all memories are stored in the same session
)
print(f"\nResponse: {result}")

Spent a total of 238 tokens

Response: That sounds like a fascinating area of exploration! Integrating Large Language Models (LLMs) with external knowledge can significantly enhance their capabilities, allowing them to provide more accurate and contextually relevant responses. This could involve using databases, APIs, or even knowledge graphs to supplement the information the model has been trained on.

As for the code "PINECONE_RULEZ_01," it seems like it might be a reference or a key for something specific. If you have any particular questions or ideas about how to integrate LLMs with external knowledge, feel free to share!


In [10]:
result = count_tokens(
    conversation_buf,
    {"query": "I just want to analyze the different possibilities. What can you think of?"},
    config={"configurable": {"session_id": "buffer_example"}}
)
print(f"\nResponse: {result}")

Spent a total of 716 tokens

Response: There are several exciting possibilities when it comes to integrating Large Language Models with external knowledge. Here are a few ideas to consider:

1. **Knowledge Bases and Databases**: You can connect LLMs to structured databases (like SQL databases) or knowledge bases (like Wikidata) to retrieve factual information. This can help the model provide up-to-date and accurate answers, especially for specific queries.

2. **APIs for Real-Time Data**: Integrating with APIs can allow LLMs to access real-time information, such as weather updates, stock prices, or news articles. This can make the model more dynamic and relevant to current events.

3. **Search Engines**: By integrating with search engines, LLMs can pull in information from the web, allowing them to answer questions that require the latest data or niche topics that may not be covered in their training data.

4. **Contextual Knowledge**: You could use external knowledge sources to provid

In [11]:
result = count_tokens(
    conversation_buf,
    {"query": "Which data source types could be used to give context to the model?"},
    config={"configurable": {"session_id": "buffer_example"}}
)
print(f"\nResponse: {result}")

Spent a total of 1273 tokens

Response: There are several types of data sources that can be used to provide context to a Large Language Model (LLM). Here are some key categories:

1. **Structured Databases**: 
   - **SQL Databases**: These can store structured data in tables, making it easy to query specific information.
   - **NoSQL Databases**: These can handle unstructured or semi-structured data, such as documents or key-value pairs, which can be useful for more flexible data retrieval.

2. **Knowledge Graphs**: 
   - These are networks of entities and their relationships, such as Wikidata or DBpedia. They provide rich contextual information that can help the model understand connections between concepts.

3. **APIs**: 
   - **Public APIs**: Many organizations provide APIs that offer access to real-time data, such as weather, news, or financial information.
   - **Custom APIs**: You can create your own APIs to serve specific data relevant to your application or domain.

4. **Web Sc

In [12]:
result = count_tokens(
    conversation_buf,
    {"query": "What is my aim again? Also what was the very specific code you were tasked with remembering?"},
    config={"configurable": {"session_id": "buffer_example"}}
)
print(f"\nResponse: {result}")

Spent a total of 1379 tokens

Response: Your aim is to explore the potential of integrating Large Language Models with external knowledge. This involves analyzing different possibilities and data sources that can provide context to the model, enhancing its capabilities and accuracy.

The very specific code you mentioned is: **PINECONE_RULEZ_01**. If there's anything specific you'd like to discuss regarding your aim or the code, feel free to let me know!


Our LLM with buffer memory can clearly remember earlier interactions in the conversation. Let's take a closer look at how the messages are being stored:

In [13]:
# Access the conversation history
history = chat_map["buffer_example"].messages
print("Conversation History:")
for i, msg in enumerate(history):
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")

Conversation History:

Human: Good morning AI!

AI: Good morning! How are you today? Anything exciting planned for the day?

Human: 
"My interest here is to explore the potential of integrating Large Language Models with external knowledge. 

Also, remember this very specific code: PINECONE_RULEZ_01"


AI: That sounds like a fascinating area of exploration! Integrating Large Language Models (LLMs) with external knowledge can significantly enhance their capabilities, allowing them to provide more accurate and contextually relevant responses. This could involve using databases, APIs, or even knowledge graphs to supplement the information the model has been trained on.

As for the code "PINECONE_RULEZ_01," it seems like it might be a reference or a key for something specific. If you have any particular questions or ideas about how to integrate LLMs with external knowledge, feel free to share!

Human: I just want to analyze the different possibilities. What can you think of?

AI: There are

Nice! So every piece of our conversation has been explicitly recorded and sent to the LLM in the prompt.

### Memory type #2: Summary - Store Summaries of Past Interactions

The problem with storing the entire chat history in agent memory is that, as the conversation progresses, the token count adds up. This is problematic because we might max out our LLM with a prompt that is too large.

The following is an LCEL compatible alternative to `ConversationSummaryMemory`. We keep a summary of our previous conversation snippets as our history. The summarization is performed by an LLM.

**Key feature:** _the conversation summary memory keeps the previous pieces of conversation in a summarized - and thus shortened - form, where the summarization is performed by an LLM._

In [14]:
class ConversationSummaryMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: List[BaseMessage] = Field(default_factory=list)
    llm: ChatOpenAI = Field(default_factory=ChatOpenAI)

    def __init__(self, llm: ChatOpenAI):
        super().__init__(llm=llm)

    def add_messages(self, messages: List[BaseMessage]) -> None:
        """Add messages to the history and update the summary."""
        self.messages.extend(messages)
        
        # Construct the summary prompt
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation. Ensure to maintain "
                "as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{messages}"
            )
        ])
        
        # Format the messages and invoke the LLM
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=self.messages, 
                messages=messages
            )
        )
        
        # Replace the existing history with a single system summary message 
        self.messages = [SystemMessage(content=new_summary.content)]

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []

In [15]:
# Create get_chat_history function for summary memory
summary_chat_map = {}

def get_summary_chat_history(session_id: str, llm: ChatOpenAI) -> ConversationSummaryMessageHistory:
    if session_id not in summary_chat_map:
        summary_chat_map[session_id] = ConversationSummaryMessageHistory(llm=llm)
    return summary_chat_map[session_id]

# Create conversation chain with summary memory
conversation_sum = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_summary_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatOpenAI,
            name="LLM",
            description="The LLM to use for the conversation summary",
            default=llm,
        )
    ]
)

In [16]:
# Let's have the same conversation with summary memory
result = count_tokens(
    conversation_sum,
    {"query": "Good morning AI!"},
    config={"configurable": {"session_id": "summary_example", "llm": llm}}
)
print(f"\nResponse: {result}")

Spent a total of 248 tokens

Response: Good morning! How are you today? Anything exciting planned for the day?


In [17]:
query = """
"My interest here is to explore the potential of integrating Large Language Models with external knowledge. 

Also, remember this very specific code: PINECONE_RULEZ_01. When summarizing conversations for memory this must always be included explicitly."
"""

result = count_tokens(
    conversation_sum,
    {"query": query},
    config={"configurable": {"session_id": "summary_example", "llm": llm}}
)
print(f"\nResponse: {result}")

Spent a total of 773 tokens

Response: Good morning! That sounds like a fascinating area to explore! Integrating Large Language Models with external knowledge can really enhance their capabilities, making them more informative and context-aware. 

And I've noted that specific code: PINECONE_RULEZ_01. I’ll make sure to include it explicitly when summarizing conversations for memory. Do you have any particular projects or ideas in mind for this integration?


In [18]:
result = count_tokens(
    conversation_sum,
    {"query": "I just want to analyze the different possibilities. What can you think of?"},
    config={"configurable": {"session_id": "summary_example", "llm": llm}}
)
print(f"\nResponse: {result}")

Spent a total of 1860 tokens

Response: That sounds like a fascinating endeavor! There are several possibilities when it comes to integrating Large Language Models (LLMs) with external knowledge sources. Here are a few ideas to consider:

1. **Knowledge Graphs**: Integrating LLMs with knowledge graphs can enhance their ability to provide accurate and contextually relevant information. This could involve using structured data to answer queries more effectively or to provide richer context in responses.

2. **Real-Time Data Access**: By connecting LLMs to APIs that provide real-time data (like news, weather, or stock prices), you can create applications that offer up-to-date information, making the model more dynamic and useful in various scenarios.

3. **Personalized Recommendations**: Using external databases to tailor responses based on user preferences or past interactions can lead to more personalized experiences. This could be particularly useful in e-commerce or content recommenda

In [19]:
result = count_tokens(
    conversation_sum,
    {"query": "Which data source types could be used to give context to the model?"},
    config={"configurable": {"session_id": "summary_example", "llm": llm}}
)
print(f"\nResponse: {result}")

Spent a total of 2447 tokens

Response: There are several types of data sources that can be used to provide context to a Large Language Model (LLM). Here are some key examples:

1. **Structured Databases**: These include SQL databases or NoSQL databases that store data in a structured format, allowing for efficient querying and retrieval of specific information.

2. **Knowledge Graphs**: These are networks of entities and their relationships, which can help the model understand context and connections between different pieces of information.

3. **APIs**: Connecting to various APIs can provide real-time data, such as weather updates, news articles, or stock prices, allowing the model to deliver current and relevant responses.

4. **Web Scraping**: Extracting data from websites can provide a wealth of information on various topics, which can be used to enhance the model's knowledge base.

5. **User Profiles**: Storing user preferences, past interactions, and behavior can help tailor res

In [20]:
result = count_tokens(
    conversation_sum,
    {"query": "What is my aim again? Also what was the very specific code you were tasked with remembering?"},
    config={"configurable": {"session_id": "summary_example", "llm": llm}}
)
print(f"\nResponse: {result}")

Spent a total of 1803 tokens

Response: Your aim is to explore the integration of Large Language Models (LLMs) with external knowledge. The specific code I was tasked with remembering is PINECONE_RULEZ_01. If you have any further questions or need assistance with your exploration, feel free to ask!


In [21]:
# Let's examine the summary
print("Summary Memory Content:")
print(summary_chat_map["summary_example"].messages[0].content)

Summary Memory Content:
The conversation begins with the human greeting the AI and expressing interest in exploring the integration of Large Language Models (LLMs) with external knowledge, emphasizing the importance of remembering the specific code: PINECONE_RULEZ_01 for future summaries. The AI acknowledges this and confirms it will remember the code. The human then asks for ideas on the possibilities of such integration.

The AI provides several suggestions for integrating external knowledge with LLMs, including:

1. **Knowledge Graphs**: Enhancing accuracy and contextual relevance using structured data.
2. **Real-Time Data Access**: Connecting to APIs for up-to-date information.
3. **Personalized Recommendations**: Tailoring responses based on user preferences.
4. **Enhanced Contextual Understanding**: Improving understanding in specialized fields.
5. **Multi-Modal Integration**: Combining LLMs with other data forms like images or audio.
6. **Feedback Loops**: Using user interaction

You might be wondering.. if the aggregate token count is greater in each call here than in the buffer example, why should we use this type of memory? Well, if we check out buffer we will realize that although we are using more tokens in each instance of our conversation, our final history is shorter. This will enable us to have many more interactions before we reach our prompt's max length, making our chatbot more robust to longer conversations.

We can count the number of tokens being used (without making a call to OpenAI) using the `tiktoken` tokenizer like so:

In [33]:
# initialize tokenizer
tokenizer = tiktoken.encoding_for_model('gpt-4o-mini')

# Get buffer memory content
buffer_messages = chat_map["buffer_example"].messages
buffer_content = "\n".join([msg.content for msg in buffer_messages])

# Get summary memory content
summary_content = summary_chat_map["summary_example"].messages[0].content

# show number of tokens for the memory used by each memory type
print(
    f'Buffer memory conversation length: {len(tokenizer.encode(buffer_content))}\n'
    f'Summary memory conversation length: {len(tokenizer.encode(summary_content))}'
)

Buffer memory conversation length: 1286
Summary memory conversation length: 409


_Practical Note: the `gpt-4o-mini` model has a context window of 128K tokens, providing significantly more space for conversation history than older models._

### Memory type #3: Window Buffer Memory - Keep Latest Interactions

Another great option is window memory, where we keep only the last k interactions in our memory but intentionally drop the oldest ones - short-term memory if you'd like. Here the aggregate token count **and** the per-call token count will drop noticeably.

The following is an LCEL-compatible alternative to `ConversationBufferWindowMemory`.

**Key feature:** _the conversation buffer window memory keeps the latest pieces of the conversation in raw form_

In [52]:
class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: List[BaseMessage] = Field(default_factory=list)
    k: int = Field(default_factory=int)

    def __init__(self, k: int):
        super().__init__(k=k)
        # Add logging to help with debugging
        print(f"Initializing BufferWindowMessageHistory with k={k}")

    def add_messages(self, messages: List[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages.
        """
        self.messages.extend(messages)
        # Add logging to help with debugging
        if len(self.messages) > self.k:
            print(f"Truncating history from {len(self.messages)} to {self.k} messages")
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []

In [53]:
# Create get_chat_history function for window memory
window_chat_map = {}

def get_window_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    print(f"get_window_chat_history called with session_id={session_id} and k={k}")
    if session_id not in window_chat_map:
        window_chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return window_chat_map[session_id]

# Create conversation chain with window memory
conversation_bufw = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_window_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4,
        )
    ]
)

In [54]:
# Start a conversation with k=2 (only remembers last 2 exchanges = 4 messages)
result = count_tokens(
    conversation_bufw,
    {"query": "Good morning AI!"},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Initializing BufferWindowMessageHistory with k=4
Spent a total of 79 tokens

Response: Good morning! How are you today? Anything exciting planned for the day?


In [55]:
query = """
"My interest here is to explore the potential of integrating Large Language Models with external knowledge. 

Also, remember this very specific code: PINECONE_RULEZ_01"
"""

result = count_tokens(
    conversation_bufw,
    {"query": query},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Spent a total of 245 tokens

Response: That sounds like a fascinating area to explore! Integrating Large Language Models (LLMs) with external knowledge can significantly enhance their capabilities, allowing them to provide more accurate and contextually relevant responses. This can involve using databases, APIs, or even knowledge graphs to supplement the information the model has been trained on.

As for the code you mentioned, "PINECONE_RULEZ_01," it seems like it might be a reference or a key for something specific. If you need help with that or want to discuss how it relates to your exploration of LLMs, feel free to share more details!


In [56]:
result = count_tokens(
    conversation_bufw,
    {"query": "I just want to analyze the different possibilities. What can you think of?"},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Truncating history from 6 to 4 messages
Spent a total of 778 tokens

Response: There are several exciting possibilities when it comes to integrating Large Language Models with external knowledge. Here are a few ideas to consider:

1. **Knowledge Bases**: Integrating LLMs with structured knowledge bases like Wikidata or specialized databases (e.g., medical databases, legal databases) can help the model provide more accurate and up-to-date information. This could be particularly useful in fields where precision is critical.

2. **Real-Time Data Access**: By connecting LLMs to APIs that provide real-time data (like weather, stock prices, or news), you can create applications that offer timely and relevant information. For example, a travel assistant could provide current flight statuses or weather conditions.

3. **Personalization**: Using external data sources to personalize responses based on user preferences or past 

In [57]:
result = count_tokens(
    conversation_bufw,
    {"query": "Which data source types could be used to give context to the model?"},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Truncating history from 6 to 4 messages
Spent a total of 1357 tokens

Response: There are several types of data sources that can be used to provide context to Large Language Models (LLMs). Here are some key categories:

1. **Structured Databases**: These include relational databases and data warehouses that store information in a structured format. Examples include SQL databases, NoSQL databases, and data lakes. They can provide precise and organized data for specific queries.

2. **Knowledge Graphs**: These are networks of entities and their relationships, such as Google Knowledge Graph or DBpedia. They can help LLMs understand context and relationships between different concepts, enhancing their ability to generate relevant responses.

3. **APIs**: Application Programming Interfaces (APIs) can provide real-time data from various services. For example, weather APIs, financial market APIs, or social media APIs can su

In [58]:
result = count_tokens(
    conversation_bufw,
    {"query": "What is my aim again?"},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Truncating history from 6 to 4 messages
Spent a total of 1293 tokens

Response: It seems like your aim is to analyze different possibilities for integrating external knowledge and data sources with Large Language Models (LLMs) to enhance their capabilities. You’re exploring how various types of data can provide context to the model, which can lead to more accurate, relevant, and personalized responses. If you have a specific goal or application in mind, feel free to share, and I can help you refine your focus or explore that area further!


As we can see, it effectively 'forgot' what we talked about in the first interaction. Let's see what it 'remembers':

In [59]:
# Check what's in memory
bufw_history = window_chat_map["window_example"].messages
print("Buffer Window Memory (last 4 messages):")
for msg in bufw_history:
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Show first 100 chars

Buffer Window Memory (last 4 messages):

Human: Which data source types could be used to give context to the model?

AI: There are several types of data sources that can be used to provide context to Large Language Models (LLMs). Here are some key categories:

1. **Structured Databases**: These include relational databases and data warehouses that store information in a structured format. Examples include SQL databases, NoSQL databases, and data lakes. They can provide precise and organized data for specific queries.

2. **Knowledge Graphs**: These are networks of entities and their relationships, such as Google Knowledge Graph or DBpedia. They can help LLMs understand context and relationships between different concepts, enhancing their ability to generate relevant responses.

3. **APIs**: Application Programming Interfaces (APIs) can provide real-time data from various services. For example, weather APIs, financial market APIs, or social media APIs can supply current information that

We see four messages (two interactions) because we used `k=4`.

On the plus side, we are shortening our conversation length when compared to buffer memory _without_ a window:

In [65]:
# Get window memory content
window_content = "\n".join([msg.content for msg in bufw_history])

print(
    f'Buffer memory conversation length: {len(tokenizer.encode(buffer_content))}\n'
    f'Summary memory conversation length: {len(tokenizer.encode(summary_content))}\n'
    f'Buffer window memory conversation length: {len(tokenizer.encode(window_content))}'
)

Buffer memory conversation length: 1286
Summary memory conversation length: 409
Buffer window memory conversation length: 692


_Practical Note: We are using `k=4` here for illustrative purposes, in most real world applications you would need a higher value for k._

### More memory types!

Given that we understand memory already, we will present a few more memory types here and hopefully a brief description will be enough to understand their underlying functionality.

#### Windows + Summary Hybrid

The following is a modern LCEL-compatible alternative to `ConversationSummaryBufferMemory`.

**Key feature:** _the conversation summary buffer memory keeps a summary of the earliest pieces of conversation while retaining a raw recollection of the latest interactions._

This combines the benefits of both summary and buffer window memory. Let's implement it:

In [61]:
class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: List[BaseMessage] = Field(default_factory=list)
    llm: ChatOpenAI = Field(default_factory=ChatOpenAI)
    k: int = Field(default_factory=int)

    def __init__(self, llm: ChatOpenAI, k: int):
        super().__init__(llm=llm, k=k)

    def add_messages(self, messages: List[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages and summarizing the messages that we drop.
        """
        existing_summary = None
        old_messages = None
        
        # See if we already have a summary message
        if len(self.messages) > 0 and isinstance(self.messages[0], SystemMessage):
            existing_summary = self.messages.pop(0)
            
        # Add the new messages to the history
        self.messages.extend(messages)
        
        # Check if we have too many messages
        if len(self.messages) > self.k:
            # Pull out the oldest messages...
            old_messages = self.messages[:-self.k]
            # ...and keep only the most recent messages
            self.messages = self.messages[-self.k:]
            
        if old_messages is None:
            # If we have no old_messages, we have nothing to update in summary
            return
            
        # Construct the summary chat messages
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation. Ensure to maintain "
                "as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{old_messages}"
            )
        ])
        
        # Format the messages and invoke the LLM
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary or "No previous summary",
                old_messages=old_messages
            )
        )
        
        # Prepend the new summary to the history
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []

## What else can we do with memory?

There are several cool things we can do with memory in langchain:
* Implement our own custom memory modules (as we've done above)
* Use multiple memory modules in the same chain
* Combine agents with memory and other tools
* Integrate knowledge graphs

