[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/03-langchain-conversational-memory.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/03-langchain-conversational-memory.ipynb)

#### [LangChain Handbook](https://pinecone.io/learn/langchain)

# Conversational Memory

Conversational memory is how chatbots can respond to our queries in a chat-like manner. It enables a coherent conversation, and without it, every query would be treated as an entirely independent input without considering past interactions.

Memory allows an _"agent"_ to remember previous interactions with the user. By default, agents are *stateless*, meaning each incoming query is processed independently of other interactions. The only thing that exists for a stateless agent is the current input, nothing else.

There are many applications where remembering previous interactions is very important, such as chatbots. Conversational memory allows us to do that.

In this notebook we'll explore this form of memory in the context of the LangChain library.

We'll start by importing all of the libraries that we'll be using in this example.

In [36]:
!pip install -qU langchain openai tiktoken




In [37]:
import inspect

from getpass import getpass
from langchain import OpenAI
from langchain.chains import LLMChain, ConversationChain
from langchain.chains.conversation.memory import (ConversationBufferMemory, 
                                                  ConversationSummaryMemory, 
                                                  ConversationBufferWindowMemory,
                                                  ConversationKGMemory)
from langchain.callbacks import get_openai_callback
import tiktoken

To run this notebook, we need an OpenAI LLM. Here we set up the LLM that we will use for the whole notebook; just enter your OpenAI API key when prompted.

In [38]:
import os
# Read the key from the environment if it is set; otherwise prompt for it.
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY") or getpass("OpenAI API key: ")

In [39]:
from langchain_openai import ChatOpenAI

# Initialize with a modern model
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # or "gpt-4" or other available models
    temperature=0.7,
    
)

Later we will make use of a `count_tokens` utility function. This will allow us to count the number of tokens we use for each call. We define it as follows:

In [40]:
def count_tokens(chain, query):
    with get_openai_callback() as cb:
        result = chain.run(query)
        print(f'Spent a total of {cb.total_tokens} tokens')

    return result

Now let's dive into **Conversational Memory**.

## What is Memory?

Memory is an agent's capacity to remember previous interactions with the user (think chatbots). By default, chains are stateless, meaning they treat each incoming query independently. For applications like chatbots, it's crucial to remember previous interactions.

Let's build a conversational chain using LCEL that demonstrates how memory works. We'll break it down step by step:

First, initialize a message history to store our conversations. This is a simple in-memory store that keeps track of all messages.

In [41]:

from langchain.memory import ChatMessageHistory
history = ChatMessageHistory()

Next, create a prompt template that structures our conversation. This template has three parts:
1. A system message that sets the AI's behavior
2. A placeholder for our conversation history
3. The user's current input

In [42]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    # 1. System message sets the behavior
    ("system", "You are a friendly and helpful AI assistant. Be truthful and concise."),
    # 2. MessagesPlaceholder will contain all previous messages
    MessagesPlaceholder(variable_name="chat_history"),
    # 3. The current user input
    ("user", "{input}")
])

Let's examine what our prompt template contains:

First, we see the input variables that our prompt expects:

In [58]:
print(f"prompt input variables: {prompt.input_variables}")

prompt input variables: ['chat_history', 'input']


This shows us that our prompt needs two pieces of information to work:
- `chat_history`: Where our previous conversation messages will go
- `input`: Where the user's current message will go

Next, let's look at the actual structure of our prompt:

In [59]:

# Show the first component (System Message)
print("\n1. System Message:")
print(f"Type: {type(prompt.messages[0]).__name__}")
print(f"Template: {prompt.messages[0].prompt.template}")

# Show the second component (Messages Placeholder)
print("\n2. Messages Placeholder:")
print(f"Type: {type(prompt.messages[1]).__name__}")
print(f"Variable Name: {prompt.messages[1].variable_name}")

# Show the third component (Human Message Template)
print("\n3. Human Message Template:")
print(f"Type: {type(prompt.messages[2]).__name__}")
print(f"Template: {prompt.messages[2].prompt.template}")


1. System Message:
Type: SystemMessagePromptTemplate
Template: You are a friendly and helpful AI assistant. Be truthful and concise.

2. Messages Placeholder:
Type: MessagesPlaceholder
Variable Name: chat_history

3. Human Message Template:
Type: HumanMessagePromptTemplate
Template: {input}


This output reveals the three-part structure of our conversation template:

1. A **System Message** that gives the AI its instructions and personality. This is fixed and doesn't change between conversations.

2. A **Messages Placeholder** named `chat_history` - this is where all previous messages in the conversation will be inserted. This helps the AI remember what was discussed before.

3. A **Human Message Template** that takes the current `{input}` - this is where each new user message will go.

This structure ensures that every conversation includes:
- Consistent AI behavior (via the system message)
- Full conversation context (via chat_history)
- The current user's input (via the input template)
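To make the three-part structure concrete, here is a minimal plain-Python sketch of how such a template could be filled in on each turn. It is not LangChain's implementation: `build_messages` and the `(role, content)` tuples are hypothetical stand-ins chosen for illustration.

```python
# Plain-Python sketch of how the three-part template is filled in.
# `build_messages` is a hypothetical helper, not a LangChain function.

SYSTEM_TEXT = "You are a friendly and helpful AI assistant. Be truthful and concise."

def build_messages(chat_history, user_input):
    """Return the (role, content) pairs in the order the model would receive them."""
    messages = [("system", SYSTEM_TEXT)]     # 1. fixed system message
    messages.extend(chat_history)            # 2. prior turns fill the placeholder
    messages.append(("human", user_input))   # 3. the current user input
    return messages

# First turn: the history is empty, so only two messages are sent
first = build_messages([], "Hi! Please remember the number 42.")

# Second turn: the earlier exchange sits between the system and user messages
past = [("human", "Hi! Please remember the number 42."), ("ai", "Noted: 42.")]
second = build_messages(past, "What number did I mention?")

for role, content in second:
    print(f"{role}: {content}")
```

The point of the sketch is the ordering: the system message always comes first, the history grows in the middle, and the new input always comes last.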

Now let's build our chain using LCEL's composition syntax.

In [60]:
chain = (
    {
        # Extract the user's input from the incoming dictionary
        "input": lambda x: x["input"],
        # Get the current conversation history
        "chat_history": lambda x: history.messages
    }
    | prompt  # Format everything into our prompt template
    | llm     # Send to the LLM for response
)


Using LCEL's `|` operator, we clearly show the flow of data:
   - First, prepare the input data and history
   - Then, format it into our prompt
   - Finally, send it to the LLM for response
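As a rough mental model of the `|` operator, the sketch below chains three steps with Python's `__or__` method. This is a simplified illustration under our own assumptions, not how LangChain's runnables are implemented; `Step`, `prepare`, `format_prompt`, and `fake_llm` are hypothetical names, and the "LLM" is a plain function.

```python
class Step:
    """Minimal stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)

# Three toy steps mirroring the flow: prepare data, format the prompt, call the model
prepare = Step(lambda x: {"input": x["input"], "chat_history": []})
format_prompt = Step(lambda d: f"history={d['chat_history']} | user={d['input']}")
fake_llm = Step(lambda text: f"(model saw) {text}")

pipeline = prepare | format_prompt | fake_llm
result = pipeline.invoke({"input": "Hi!"})
print(result)
```

Each step only needs to accept the previous step's output, which is why the dictionary, the prompt, and the model can be composed left to right.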

Let's use our chain!

In [46]:
# Send a message and get a response
user_input = "Hi! How are you? Also remember the number 42, it's important, or so I've been told."
print(f"User Input: {user_input}")
response = chain.invoke({"input": f"{user_input}"})
print(f"Response: {response.content}")

User Input: Hi! How are you? Also remember the number 42, it's important, or so I've been told.
Response: Hello! I'm here and ready to help. Thank you for sharing the number 42 with me. If you have any questions or need assistance, feel free to ask!


We explicitly update our history after each interaction, giving us full control over what gets remembered.


In [47]:
# Update our conversation history with both messages
history.add_user_message(user_input)
history.add_ai_message(response.content)

Now `history` contains both messages and will be included in the next interaction.

In [50]:
# Show the total number of messages stored
print(f"Number of messages in history: {len(history.messages)}")

# Show the first message (user's message)
print("\nFirst message (User):")
print(f"Type: {type(history.messages[0]).__name__}")
print(f"Content: {history.messages[0].content}")

# Show the second message (AI's response)
print("\nSecond message (AI):")
print(f"Type: {type(history.messages[1]).__name__}")
print(f"Content: {history.messages[1].content}")

Number of messages in history: 2

First message (User):
Type: HumanMessage
Content: Hi! How are you? Also remember the number 42, it's important, or so I've been told.

Second message (AI):
Type: AIMessage
Content: Hello! I'm here and ready to help. Thank you for sharing the number 42 with me. If you have any questions or need assistance, feel free to ask!


You can continue the conversation by invoking the chain again - it will automatically include the previous interaction:

In [49]:
# Continue the conversation
user_input = "What was the number in my first message to you?"
print(f"User Input: {user_input}")
response2 = chain.invoke({"input": f"{user_input}"})
print(f"Response: {response2.content}")
history.add_user_message(user_input)
history.add_ai_message(response2.content)

User Input: What was the number in my first message to you?
Response: The number you shared in your first message was 42. If you have any more questions or need assistance, feel free to ask!


We can now check the updated `history`.

In [55]:
# Show the total number of messages stored
print(f"Number of messages in history: {len(history.messages)}")

# Show the first message (user's message)
print("\nFirst message (User):")
print(f"Type: {type(history.messages[0]).__name__}")
print(f"Content: {history.messages[0].content}")

# Show the second message (AI's response)
print("\nSecond message (AI):")
print(f"Type: {type(history.messages[1]).__name__}")
print(f"Content: {history.messages[1].content}")

# Show the third message (user's message)
print("\nThird message (User):")
print(f"Type: {type(history.messages[2]).__name__}")
print(f"Content: {history.messages[2].content}")

# Show the fourth message (AI's response)
print("\nFourth message (AI):")
print(f"Type: {type(history.messages[3]).__name__}")
print(f"Content: {history.messages[3].content}")

Number of messages in history: 4

First message (User):
Type: HumanMessage
Content: Hi! How are you? Also remember the number 42, it's important, or so I've been told.

Second message (AI):
Type: AIMessage
Content: Hello! I'm here and ready to help. Thank you for sharing the number 42 with me. If you have any questions or need assistance, feel free to ask!

Third message (User):
Type: HumanMessage
Content: What was the number in my first message to you?

Fourth message (AI):
Type: AIMessage
Content: The number you shared in your first message was 42. If you have any more questions or need assistance, feel free to ask!
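Because the history is updated by hand, the stored text can drift from what was actually sent to the model. One way to guard against that is a small helper that calls the model and records the same strings in one place. The sketch below uses only plain Python; `SimpleHistory`, `run_turn`, and `fake_respond` are hypothetical stand-ins, not LangChain classes.

```python
class SimpleHistory:
    """Tiny stand-in for a message history: a list of (role, content) pairs."""

    def __init__(self):
        self.messages = []

    def add_user_message(self, text):
        self.messages.append(("human", text))

    def add_ai_message(self, text):
        self.messages.append(("ai", text))

def run_turn(respond, history, user_input):
    """Call the model, then record exactly what was sent and what came back."""
    reply = respond(history.messages, user_input)
    history.add_user_message(user_input)  # the same string that was sent
    history.add_ai_message(reply)
    return reply

def fake_respond(past_messages, user_input):
    # Hypothetical model: reports how many earlier messages it could see
    return f"I can see {len(past_messages)} earlier messages."

demo_history = SimpleHistory()
run_turn(fake_respond, demo_history, "Hi! Remember the number 42.")
second_reply = run_turn(fake_respond, demo_history, "What was the number?")

print(second_reply)
print(len(demo_history.messages))
```

With a real chain, `respond` would wrap `chain.invoke` and return the content of the response.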


## Memory types

In this section we will review several memory types and analyze the pros and cons of each one, so you can choose the best one for your use case.

### Memory type #1: Buffer Memory with RunnableWithMessageHistory

The modern approach to conversational memory in LangChain uses `RunnableWithMessageHistory` to maintain a history of the conversation. It can be used in place of the now-deprecated `ConversationBufferMemory` class. Both maintain the history automatically, so we no longer need to update it by hand.


**Key Features of RunnableWithMessageHistory:**
1. **Complete Message Preservation**
   - Every message is stored exactly as it was sent or received
   - No summarization, modification, or alteration of the content
   - Maintains the full context of sessions

2. **Automatic History Management**
   - New messages are automatically added to the conversation history
   - Previous messages are automatically included in the context for each new interaction
   - Each session maintains its own separate history

Think of it like a perfect secretary who:
- Records every word exactly as spoken
- Automatically keeps track of who said what and when
- Makes sure each session's history is available for the next interaction
- Keeps different sessions separate and organized

Let's set up our chain with memory:

In [91]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnableWithMessageHistory
from langchain.memory import ChatMessageHistory

# Create a store to maintain histories
message_histories = {}

def get_session_history(session_id: str):
    if session_id not in message_histories:
        message_histories[session_id] = ChatMessageHistory()
    return message_histories[session_id]

# Create our components
llm = ChatOpenAI()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly and helpful AI assistant. Be truthful and concise."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}")
])

# Create our chain with history
chain_with_history = RunnableWithMessageHistory(
    prompt | llm,
    get_session_history,  # Use our store-based getter
    input_messages_key="input",
    history_messages_key="chat_history"
)

# Start a conversation session
session_id = "demo_session"

def count_tokens(chain, message):
    """Run one turn in the demo session; token usage, when reported, is in `response.usage_metadata`."""
    response = chain.invoke(
        {"input": message},
        config={"configurable": {"session_id": session_id}}
    )
    return response



Let's try some interactions and check what the model remembers:

In [92]:
# Try some conversations
first_user_msg = "Good morning AI! Remember, Remember, the 5th of November! Gumdrops, treacle and pop!"
print(f"First User Input: {first_user_msg}")
response = count_tokens(chain_with_history, f"{first_user_msg}")
print("First Response:", response.content)

# Get the history after the first message
history = chain_with_history.get_session_history(session_id)
print("\nConversation History after first message:")
for message in history.messages:
    print(f"{message.type}: {message.content}")

# Try another message
response = count_tokens(chain_with_history, "I mentioned a date and some foods previously, what were they?")
print("\nSecond response:", response.content)



First User Input: Good morning AI! Remember, Remember, the 5th of November! Gumdrops, treacle and pop!
First Response: Good morning! It seems like you're referencing the UK traditional rhyme "Remember, Remember the 5th of November" related to Guy Fawkes Night. Have a great day!

Conversation History after first message:
human: Good morning AI! Remember, Remember, the 5th of November! Gumdrops, treacle and pop!
ai: Good morning! It seems like you're referencing the UK traditional rhyme "Remember, Remember the 5th of November" related to Guy Fawkes Night. Have a great day!

Second response: You mentioned the date "5th of November" and the foods "gumdrops, treacle, and pop."


Our LLM with message history can remember earlier interactions in the conversation. Let's examine how the conversation is being saved. We can do this by accessing the message history for our session:


In [93]:
# Get the history after the second message
history = chain_with_history.get_session_history(session_id)
print("\nConversation History after second message:")
for message in history.messages:
    print(f"{message.type}: {message.content}")


Conversation History after second message:
human: Good morning AI! Remember, Remember, the 5th of November! Gumdrops, treacle and pop!
ai: Good morning! It seems like you're referencing the UK traditional rhyme "Remember, Remember the 5th of November" related to Guy Fawkes Night. Have a great day!
human: I mentioned a date and some foods previously, what were they?
ai: You mentioned the date "5th of November" and the foods "gumdrops, treacle, and pop."


Nice! So every piece of our conversation is being explicitly recorded and made available to the LLM through the `MessagesPlaceholder` in our prompt.

The main differences from the old approach are:
1. Uses modern LCEL composition with the `|` operator
2. Supports multiple conversation sessions through `session_id`
3. More explicit control over how history is stored and accessed
4. Better integration with other LCEL components
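The session-based design in point 2 can be illustrated without any LLM calls. The sketch below is a plain-Python approximation of a per-session store; `session_store`, `get_history`, and `record_turn` are hypothetical names, and the replies are hard-coded.

```python
from collections import defaultdict

# session_id -> list of (role, content) pairs; a plain-dict stand-in for the store
session_store = defaultdict(list)

def get_history(session_id):
    """Return the history for this session, creating it on first use."""
    return session_store[session_id]

def record_turn(session_id, user_text, ai_text):
    """Append one human/AI exchange to the given session only."""
    history = get_history(session_id)
    history.append(("human", user_text))
    history.append(("ai", ai_text))

record_turn("alice", "My favourite number is 42.", "Got it.")
record_turn("bob", "I like treacle.", "Noted.")
record_turn("alice", "What is my favourite number?", "42.")

print(len(get_history("alice")))
print(len(get_history("bob")))
```

Each `session_id` maps to its own list, so one user's messages never appear in another user's context.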

### Memory type #2: Summarization Memory with LCEL

The problem with storing the complete conversation history is that, as the conversation progresses, the token count of our context adds up. This is problematic because we might exceed the LLM's context window with a prompt that is too large to be processed.

**Key feature:** _summarization maintains a condensed version of the conversation history while keeping token usage under control_

Here's how we can implement this using modern LCEL:

*Note: the implementation below builds on `ChatMessageHistory` and is not a drop-in replacement for `ConversationSummaryMemory`, which does not seem to have a direct LCEL equivalent for now. See: https://github.com/langchain-ai/langchain/discussions/25904*

In [5]:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ChatMessageHistory
from langchain.schema import SystemMessage
import tiktoken

# Create a message history to store our conversations
history = ChatMessageHistory()

# Create our chat prompt template with a system message and history
prompt = ChatPromptTemplate.from_messages([
    SystemMessage(content="You are a helpful AI assistant."),
    MessagesPlaceholder(variable_name="chat_history"),  # Where history is injected
    ("human", "{input}")  # The latest user input
])

# Create our summarization prompt
summary_prompt = ChatPromptTemplate.from_messages([
    ("system", """Progressively summarize the lines of conversation provided, adding onto the previous summary returning a new summary.
    
    EXAMPLE
    Current summary: The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good.
    
    New lines of conversation:
    Human: Why do you think artificial intelligence is a force for good?
    AI: Because artificial intelligence will help humans reach their full potential.
    
    New summary: The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.
    END OF EXAMPLE
    
    Current summary: {current_summary}
    
    New lines of conversation:
    {new_lines}
    
    New summary:""")
])

# Create our LLM
# Initialize with a modern model
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # or "gpt-4" or other available models
    temperature=0,
    
)

# Function to summarize new messages
def summarize_messages(current_summary: str, new_messages: list) -> str:
    # Format new messages into text
    new_lines = "\n".join([
        f"{'Human' if msg.type == 'human' else 'AI'}: {msg.content}" 
        for msg in new_messages
    ])
    
    # Get new summary from LLM
    chain = summary_prompt | llm
    result = chain.invoke({
        "current_summary": current_summary,
        "new_lines": new_lines
    })
    return result.content

# Create our conversation chain
chain = (
    prompt 
    | llm
)

# Helper function to count tokens
def count_tokens(text: str) -> int:
    tokenizer = tiktoken.encoding_for_model('gpt-3.5-turbo')
    return len(tokenizer.encode(text))

# Example usage:
summary = ""  # Start with empty summary
messages_buffer = []  # Buffer for recent messages

def chat(user_input: str) -> str:
    global summary, messages_buffer
    
    # Build the context: the running summary (if any) plus the unsummarized recent messages
    context = list(messages_buffer)
    if summary:
        context.insert(0, SystemMessage(content=f"Summary of the conversation so far: {summary}"))
    
    # Get AI response; the prompt appends the new user input itself,
    # so the input must not already be part of the context
    response = chain.invoke({
        "input": user_input,
        "chat_history": context
    })
    
    # Record both sides of the turn in the full history and in the buffer
    history.add_user_message(user_input)
    history.add_ai_message(response.content)
    messages_buffer.extend(history.messages[-2:])
    
    # If buffer gets too large, fold it into the summary
    if len(messages_buffer) > 4:  # Adjust this number as needed
        summary = summarize_messages(summary, messages_buffer)
        messages_buffer = []  # Clear buffer after summarizing
        
    return response.content

This implementation offers several advantages:
1. Uses LCEL's pipe operator (|) for cleaner chain composition
2. Leverages ChatPromptTemplate for better prompt management
3. Separates the summarization logic into a dedicated function
4. Maintains both a summary and a buffer of recent messages
5. Gives you explicit control over when summarization happens

You can use it like this:


In [13]:
# Example conversation, ask about the lyrics to the song 'I Remember' by deadmau5 and Kaskade
first_user_msg = "Good morning AI! Which deadmau5 song starts with 'Feeling the past moving in. Letting a new day begin.'?" 
print(f"First User Input: {first_user_msg}")
response = chat(first_user_msg)
print(f"First Response: {response}")

response = response.lower()
if "i remember" in response:
    second_user_msg = "That's correct! Remind me, what were the lyrics I gave you as a clue?" 
else:
    second_user_msg = "That's incorrect! The song is 'I Remember', featuring Kaskade. Remind me, what were the lyrics I gave you as a clue?" 


print(f"\nSecond User Input: {second_user_msg}")
response = chat(second_user_msg)
print(f"Second Response: {response}")


# Check token counts
print(f'\nSummary token count: {count_tokens(summary)}')
print(f'Recent messages token count: {sum(count_tokens(msg.content) for msg in messages_buffer)}')

First User Input: Good morning AI! Which deadmau5 song starts with 'Feeling the past moving in. Letting a new day begin.'?
First Response: The deadmau5 song that starts with the lyrics "Feeling the past moving in. Letting a new day begin." is "Strobe."

Second User Input: That's incorrect! The song is 'I Remember', featuring Kaskade. Remind me, what were the lyrics I gave you as a clue?
Second Response: I apologize for the mistake. The lyrics you gave me as a clue were "Feeling the past moving in. Letting a new day begin."

Summary token count: 127
Recent messages token count: 119


*Practical note: modern LLMs such as GPT-3.5-turbo and GPT-4 Turbo have context windows of roughly 16K to 128K tokens, but it's still important to manage token usage efficiently, both for cost reasons and to keep the most relevant context prominent.*

The following is an experimental alternative that implements the same idea with `RunnableWithMessageHistory`, keeping a running summary per session alongside the most recent messages:

In [17]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnableWithMessageHistory
from langchain.memory import ChatMessageHistory

# Create stores for histories and summaries
message_histories = {}
summaries = {}

def get_session_history(session_id: str):
    """Initialize or retrieve session history"""
    if session_id not in message_histories:
        message_histories[session_id] = ChatMessageHistory()
        summaries[session_id] = ""
    return message_histories[session_id]

def print_memory_state(session_id="default"):
    """Print the current state of memory including summary and recent messages"""
    print("\nCurrent Memory State:")
    print("=" * 50)
    
    if session_id in summaries:
        summary = summaries[session_id]
        if summary:
            print("\nSummary:")
            print("-" * 20)
            print(summary)
    
    if session_id in message_histories:
        history = message_histories[session_id]
        print("\nRecent Messages:")
        print("-" * 20)
        for i, msg in enumerate(history.messages, 1):
            print(f"{i}. {msg.type}: {msg.content}")
    else:
        print("\nNo messages in history yet")
    print("=" * 50 + "\n")

# Initialize the LLM
llm = ChatOpenAI(temperature=0.7)

# Create prompts
summary_prompt = ChatPromptTemplate.from_messages([
    ("system", """Summarize the key points from this conversation, maintaining important details and context.
    Current summary: {current_summary}
    New conversation:
    {new_lines}
    New summary:""")
])

conversation_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly and helpful AI assistant. Be truthful and concise."),
    ("system", "Previous conversation summary: {summary}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}")
])

def create_chain():
    # Create base chain
    chain = conversation_prompt | llm
    
    # Wrap with history
    chain_with_history = RunnableWithMessageHistory(
        chain,
        get_session_history,
        input_messages_key="input",
        history_messages_key="chat_history"
    )
    
    return chain_with_history

# Initialize chain
chain = create_chain()

def chat(message: str, session_id: str = "default"):
    """Send a message and get a response, managing history and summarization"""
    # Ensure session exists
    if session_id not in message_histories:
        get_session_history(session_id)
    
    # Get response
    response = chain.invoke(
        {"input": message, "summary": summaries[session_id]},
        {"configurable": {"session_id": session_id}}
    )
    
    # After response, check if we need to summarize
    history = message_histories[session_id]
    if len(history.messages) > 4:  # Summarize once the history exceeds 4 messages
        messages_to_summarize = history.messages[:-2]  # Keep last 2 messages fresh
        
        # Format messages for summarization
        new_lines = "\n".join([
            f"{'Human' if msg.type == 'human' else 'AI'}: {msg.content}"
            for msg in messages_to_summarize
        ])
        
        # Get new summary
        summary_chain = summary_prompt | llm
        summary_result = summary_chain.invoke({
            "current_summary": summaries[session_id],
            "new_lines": new_lines
        })
        
        # Update summary and trim history
        summaries[session_id] = summary_result.content
        history.messages = history.messages[-2:]
    
    return response

In [18]:
# Let's test with a conversation about deadmau5
print("Test 1: Initial Question")
response = chat("Good morning AI! Which deadmau5 song starts with 'Feeling the past moving in. Letting a new day begin.'?")
print(f"Response: {response}")
print_memory_state()

print("\nTest 2: Follow-up Question")
response = chat("That's correct if you said 'I Remember'! Can you remind me what lyrics I quoted?")
print(f"Response: {response}")
print_memory_state()

print("\nTest 3: Adding More Context")
response = chat("Who collaborated with deadmau5 on this song?")
print(f"Response: {response}")
print_memory_state()

print("\nTest 4: Testing Memory")
response = chat("What was the first thing I asked you about?")
print(f"Response: {response}")
print_memory_state()

print("\nTest 5: Adding New Information")
response = chat("The song was released in 2008 as part of deadmau5's album 'Random Album Title'. Remember this fact.")
print(f"Response: {response}")
print_memory_state()

print("\nTest 6: Testing Long-term Memory")
response = chat("When was the song released and in which album?")
print(f"Response: {response}")
print_memory_state()

print("\nTest 7: Adding More Context")
response = chat("The song's music video features time-lapse footage of cityscapes. What were those opening lyrics again?")
print(f"Response: {response}")
print_memory_state()

print("\nTest 8: Final Memory Test")
response = chat("Let's recap: What do you remember about the song 'I Remember' by deadmau5?")
print(f"Response: {response}")
print_memory_state()

Test 1: Initial Question
Response: content='The deadmau5 song that starts with the lyrics "Feeling the past moving in. Letting a new day begin." is "Strobe."' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 31, 'prompt_tokens': 62, 'total_tokens': 93, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-BaShRHJdJrWMCIoMmYBfa7FLuIG4x', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None} id='run--e61658eb-cff0-4bf1-8e63-d542fa7eeba0-0' usage_metadata={'input_tokens': 62, 'output_tokens': 31, 'total_tokens': 93, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}

Current Memory State:

Recent Messages:
--------------------
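1. human: Good morning AI! Which deadmau5 song starts with 'Feeling the past moving in. Letting a new day begin.'?
2. ai: The deadmau5 song that starts with the lyrics "Feeling the past moving in. Letting a new day begin." is "Strobe."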



**Initial State (Tests 1-2)**

In Test 1, we see only the first exchange stored with no summary:
```
Current Memory State:
Recent Messages:
1. human: Good morning AI! Which deadmau5 song starts with 'Feeling the past moving in. Letting a new day begin.'?
2. ai: The deadmau5 song that starts with the lyrics "Feeling the past moving in. Letting a new day begin." is "Strobe."
```

By Test 2, we accumulate 4 messages but still no summary (as expected, since we haven't exceeded our threshold):
```
Recent Messages:
1. human: Good morning AI! Which deadmau5 song starts with 'Feeling the past moving in. Letting a new day begin.'?
2. ai: The deadmau5 song that starts with the lyrics "Feeling the past moving in. Letting a new day begin." is "Strobe."
3. human: That's correct if you said 'I Remember'! Can you remind me what lyrics I quoted?
4. ai: The lyrics you quoted are from the song "I Remember" by deadmau5 and Kaskade.
```

**First Summarization Trigger (Test 3)**

When the third exchange pushes the history past four messages, we see our first summarization:
```
Summary:
The human asked the AI about a deadmau5 song with specific lyrics, and the AI initially mentioned "Strobe" as the song, but the human corrected it to be "I Remember." The AI then correctly identified the lyrics quoted by the human from the song "I Remember" by deadmau5 and Kaskade.

Recent Messages:
1. human: Who collaborated with deadmau5 on this song?
2. ai: "I Remember" is a collaboration between deadmau5 and Kaskade.
```

This shows the system correctly:
1. Summarized the older conversation
2. Kept only the most recent exchange
3. Maintained the key correction about the song name

**Memory Integration (Tests 4-5)**

In Test 4, when asked about the first question, the system combines summary and recent context:
```
Summary:
The human asked the AI about a deadmau5 song with specific lyrics, and the AI initially mentioned "Strobe" as the song...

Recent Messages:
3. human: What was the first thing I asked you about?
4. ai: You initially asked about a deadmau5 song with specific lyrics.
```

By Test 5, we see the summary updating with new information:
```
Summary:
The human inquired about the collaboration on the song "I Remember," to which the AI correctly stated it was a collaboration between deadmau5 and Kaskade...

Recent Messages:
1. human: The song was released in 2008 as part of deadmau5's album 'Random Album Title'. Remember this fact.
2. ai: Got it! "I Remember" was released in 2008 as part of deadmau5's album 'Random Album Title'.
```

**Final Memory Integration (Tests 7-8)**

By the end, we see complete integration of information. The final response combines facts from both summary and recent messages:
```
Recent Messages:
3. human: Let's recap: What do you remember about the song 'I Remember' by deadmau5?
4. ai: "I Remember" is a collaborative track by deadmau5 and Kaskade released in 2008 as part of deadmau5's album 'Random Album Title.' The song features the opening lyrics "Feeling the past moving in / Letting a new day begin" and its music video includes time-lapse footage of cityscapes."
```

This final response demonstrates successful retention of:
- Release date and album (from earlier summary)
- Opening lyrics (from recent messages)
- Collaboration information (from earlier summary)
- Music video details (from recent context)

The system is clearly maintaining both long-term memory through summaries and short-term memory through recent messages, exactly as designed. Each summarization preserves key information while keeping the most recent context fully intact.


### Memory type #3: ConversationBufferWindowMemory

Another great option for these cases is `ConversationBufferWindowMemory`, which keeps only the last few interactions in memory and intentionally drops the oldest ones: short-term memory, if you like. Because the buffer stops growing, both the aggregate token count **and** the per-call token count drop noticeably. We control the size of this window with the `k` parameter.

**Key feature:** _the conversation buffer window memory keeps the latest pieces of the conversation in raw form_

In [60]:
conversation_bufw = ConversationChain(
    llm=llm, 
    memory=ConversationBufferWindowMemory(k=1)
)

In [61]:
count_tokens(
    conversation_bufw, 
    "Good morning AI!"
)

Spent a total of 85 tokens


" Good morning! It's a beautiful day today, isn't it? How can I help you?"

In [62]:
count_tokens(
    conversation_bufw, 
    "My interest here is to explore the potential of integrating Large Language Models with external knowledge"
)

Spent a total of 178 tokens


' Interesting! Large Language Models are a type of artificial intelligence that can process natural language and generate text. They can be used to generate text from a given context, or to answer questions about a given context. Integrating them with external knowledge can help them to better understand the context and generate more accurate results. Do you have any specific questions about this integration?'

In [63]:
count_tokens(
    conversation_bufw, 
    "I just want to analyze the different possibilities. What can you think of?"
)

Spent a total of 233 tokens


' There are many possibilities for integrating Large Language Models with external knowledge. For example, you could use external knowledge to provide additional context to the model, or to provide additional training data. You could also use external knowledge to help the model better understand the context of a given text, or to help it generate more accurate results.'

In [64]:
count_tokens(
    conversation_bufw, 
    "Which data source types could be used to give context to the model?"
)

Spent a total of 245 tokens


' Data sources that could be used to give context to the model include text corpora, structured databases, and ontologies. Text corpora provide a large amount of text data that can be used to train the model and provide additional context. Structured databases provide structured data that can be used to provide additional context to the model. Ontologies provide a structured representation of knowledge that can be used to provide additional context to the model.'

In [65]:
count_tokens(
    conversation_bufw, 
    "What is my aim again?"
)

Spent a total of 186 tokens


' Your aim is to use data sources to give context to the model.'

As we can see, it effectively 'forgot' what we talked about in the first interaction. Let's see what it 'remembers'. Given that we set `k` to `1`, we would expect it to remember only the last interaction.

We need to call the `load_memory_variables` method here because, in this memory type, the buffer is first passed through this method before being sent to the LLM.

In [66]:
bufw_history = conversation_bufw.memory.load_memory_variables(
    inputs={}
)['history']

In [67]:
print(bufw_history)

Human: What is my aim again?
AI:  Your aim is to use data sources to give context to the model.


Makes sense. 
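To make the windowing idea concrete, here is a minimal pure-Python sketch of a buffer that stores every turn but only exposes the last `k` when the history is read. The class `WindowMemorySketch` and its method names are hypothetical and written only for illustration; this is not LangChain's implementation, just one way the behaviour seen above could come about.

```python
class WindowMemorySketch:
    """Toy windowed buffer: stores every turn, exposes only the last k (hypothetical)."""

    def __init__(self, k=1):
        self.k = k
        self.turns = []  # list of (human_text, ai_text) pairs, oldest first

    def save_context(self, human_text, ai_text):
        # Nothing is dropped at write time in this sketch.
        self.turns.append((human_text, ai_text))

    def load_history(self):
        # The window is applied at read time: only the last k turns are rendered.
        window = self.turns[-self.k:] if self.k > 0 else []
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in window)


memory = WindowMemorySketch(k=1)
memory.save_context("Good morning AI!", "Good morning! How can I help you?")
memory.save_context(
    "What is my aim again?",
    "Your aim is to use data sources to give context to the model.",
)
print(memory.load_history())
```

With `k=1`, only the final exchange is rendered, which matches the history printed above.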

On the plus side, we are shortening our conversation length when compared to buffer memory _without_ a window:

In [68]:
print(
    f'Buffer memory conversation length: {len(tokenizer.encode(conversation_buf.memory.buffer))}\n'
    f'Summary memory conversation length: {len(tokenizer.encode(conversation_sum.memory.buffer))}\n'
    f'Buffer window memory conversation length: {len(tokenizer.encode(bufw_history))}'
)

Buffer memory conversation length: 334
Summary memory conversation length: 219
Buffer window memory conversation length: 26


_Practical Note: We are using `k=1` here for illustrative purposes; in most real-world applications you would need a higher value for `k`._
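One rough way to reason about a suitable `k` is to ask how many of the most recent turns fit within a token budget. The sketch below assumes a crude whitespace-based token count (`approx_tokens`) and a hypothetical helper `largest_k_within_budget`; a real tokenizer would give different numbers, so treat this purely as an illustration.

```python
def approx_tokens(text):
    # Crude approximation: whitespace-separated words, not real model tokens.
    return len(text.split())


def largest_k_within_budget(turns, budget):
    """Return the largest k such that the last k turns fit within `budget` tokens."""
    used = 0
    k = 0
    # Walk backwards from the newest turn, adding turns while they still fit.
    for human_text, ai_text in reversed(turns):
        cost = approx_tokens(human_text) + approx_tokens(ai_text)
        if used + cost > budget:
            break
        used += cost
        k += 1
    return k


turns = [
    ("Good morning AI!", "Good morning! How can I help you?"),
    ("Which data sources could give context?",
     "Text corpora, structured databases, and ontologies."),
    ("What is my aim again?",
     "Your aim is to use data sources to give context to the model."),
]

print(largest_k_within_budget(turns, budget=30))
```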

### More memory types!

Now that we understand how memory works, we will present a few more memory types; hopefully a brief description of each will be enough to understand their underlying functionality.

#### ConversationSummaryBufferMemory

**Key feature:** _the conversation summary buffer memory keeps a summary of the earliest pieces of conversation while retaining a raw recollection of the latest interactions._
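The idea can be sketched in plain Python as follows. This is a toy under stated assumptions: `SummaryBufferSketch`, `fake_summarize`, and the whitespace-based `approx_tokens` are all hypothetical names, and the "summarizer" is a string-joining placeholder where a real system would call an LLM.

```python
def approx_tokens(text):
    # Crude approximation: whitespace-separated words, not real model tokens.
    return len(text.split())


def fake_summarize(previous_summary, dropped_turns):
    # Placeholder for an LLM call: just records what the human said.
    topics = "; ".join(h for h, _ in dropped_turns)
    prefix = previous_summary + " " if previous_summary else ""
    return prefix + f"Earlier, the human said: {topics}."


class SummaryBufferSketch:
    """Keeps recent turns raw and folds older turns into a running summary."""

    def __init__(self, max_tokens, summarize=fake_summarize):
        self.max_tokens = max_tokens
        self.summarize = summarize
        self.summary = ""
        self.recent = []  # list of (human_text, ai_text) pairs, oldest first

    def _recent_tokens(self):
        return sum(approx_tokens(h) + approx_tokens(a) for h, a in self.recent)

    def save_context(self, human_text, ai_text):
        self.recent.append((human_text, ai_text))
        dropped = []
        # Evict the oldest turns until the raw buffer fits, always keeping the newest.
        while self._recent_tokens() > self.max_tokens and len(self.recent) > 1:
            dropped.append(self.recent.pop(0))
        if dropped:
            self.summary = self.summarize(self.summary, dropped)


memory = SummaryBufferSketch(max_tokens=12)
memory.save_context("My name is Sam", "Nice to meet you, Sam!")
memory.save_context("I like mangoes", "Mangoes are delicious.")
print(memory.summary)
print(memory.recent)
```

After the second turn the raw buffer exceeds the budget, so the first turn moves into the summary while the latest turn stays verbatim.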

#### ConversationKnowledgeGraphMemory

This is a super cool memory type that was introduced just [recently](https://twitter.com/LangChainAI/status/1625158388824043522). It is based on the concept of a _knowledge graph_, which recognizes different entities and connects them in pairs with a predicate, resulting in (subject, predicate, object) triplets. This lets us compress a lot of information into highly significant snippets that can be fed into the model as context. If you want to understand this memory type in more depth, you can check out [this](https://apex974.com/articles/explore-langchain-support-for-knowledge-graph) blog post.

**Key feature:** _the conversation knowledge graph memory keeps a knowledge graph of all the entities that have been mentioned in the interactions together with their semantic relationships._

In [69]:
# you may need to install this library
# !pip install -qU networkx

In [70]:
conversation_kg = ConversationChain(
    llm=llm, 
    memory=ConversationKGMemory(llm=llm)
)

In [71]:
count_tokens(
    conversation_kg, 
    "My name is human and I like mangoes!"
)

Spent a total of 1565 tokens


" Hi Human! My name is AI. It's nice to meet you. I like mangoes too! Did you know that mangoes are a great source of vitamins A and C?"

The memory keeps a knowledge graph of everything it has learned so far.

In [72]:
conversation_kg.memory.kg.get_triples()

[('human', 'human', 'name'), ('human', 'mangoes', 'likes')]
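Judging from the output above, the printed tuples appear to be ordered as (subject, object, predicate). As a rough illustration of how such triples can be stored and turned back into short context snippets, here is a minimal sketch; `TripleStoreSketch` and its methods are hypothetical and unrelated to the actual graph class used by the library.

```python
from collections import defaultdict


class TripleStoreSketch:
    """Toy store for (subject, predicate, object) triples (hypothetical)."""

    def __init__(self):
        # subject -> list of (predicate, object) pairs
        self._edges = defaultdict(list)

    def add_triple(self, subject, predicate, obj):
        self._edges[subject].append((predicate, obj))

    def facts_about(self, subject):
        # Render every stored relation for this subject as a short snippet.
        return [f"{subject} {predicate} {obj}"
                for predicate, obj in self._edges.get(subject, [])]


kg = TripleStoreSketch()
kg.add_triple("human", "name", "human")
kg.add_triple("human", "likes", "mangoes")
print(kg.facts_about("human"))
```

Only the facts about entities mentioned in the current input would need to be passed to the model, which is where the compression comes from.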

#### ConversationEntityMemory

**Key feature:** _the conversation entity memory keeps a recollection of the main entities that have been mentioned, together with their specific attributes._

This works in much the same way as the knowledge graph memory (`ConversationKGMemory`); you can refer to the [docs](https://python.langchain.com/en/latest/modules/memory/types/entity_summary_memory.html) if you want to see it in action.
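As a loose sketch of the concept, entity memory can be pictured as a dictionary from entity names to the facts collected about them. Everything below is an assumption made for illustration: the class `EntityStoreSketch`, its methods, and the naive substring matching stand in for the LLM-driven entity extraction a real implementation would use.

```python
class EntityStoreSketch:
    """Toy entity memory: entity name -> list of known facts (hypothetical)."""

    def __init__(self):
        self.entities = {}

    def remember(self, entity, fact):
        self.entities.setdefault(entity, []).append(fact)

    def context_for(self, text):
        # Naive matching: an entity counts as mentioned if its name occurs in the text.
        lowered = text.lower()
        lines = []
        for entity, facts in self.entities.items():
            if entity.lower() in lowered:
                lines.append(f"{entity}: {' '.join(facts)}")
        return "\n".join(lines)


store = EntityStoreSketch()
store.remember("Sam", "Sam likes mangoes.")
store.remember("Sam", "Sam works on LLM integrations.")
store.remember("Alex", "Alex prefers apples.")
print(store.context_for("What does Sam like?"))
```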

## What else can we do with memory?

There are several cool things we can do with memory in LangChain. We can:
* implement our own custom memory module
* use multiple memory modules in the same chain
* combine agents with memory and other tools

If this piques your interest, we suggest you take a look at the memory [how-to](https://langchain.readthedocs.io/en/latest/modules/memory/how_to_guides.html) section in the docs!
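To give a flavour of the first item in the list above, here is a minimal sketch of what a custom memory could look like. It is a plain Python class that mirrors the method names used by the memory objects in this notebook (`load_memory_variables`, `save_context`, `clear`); the class `LastQuestionMemory` and the `last_question` key are hypothetical, and a real module would subclass the library's base memory class rather than stand alone.

```python
class LastQuestionMemory:
    """Hypothetical custom memory that remembers only the most recent human input."""

    memory_key = "last_question"

    def __init__(self):
        self.last_question = ""

    @property
    def memory_variables(self):
        # Names of the variables this memory contributes to the prompt.
        return [self.memory_key]

    def load_memory_variables(self, inputs):
        # Called before the model runs: returns the values to inject into the prompt.
        return {self.memory_key: self.last_question}

    def save_context(self, inputs, outputs):
        # Called after the model runs: keep only the latest human input.
        self.last_question = inputs.get("input", "")

    def clear(self):
        self.last_question = ""


memory = LastQuestionMemory()
memory.save_context({"input": "Good morning AI!"}, {"response": "Good morning!"})
memory.save_context({"input": "What is my aim again?"}, {"response": "..."})
print(memory.load_memory_variables({}))
```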