# Technique 2: Conversation Summary Memory - Step-by-Step Guide

## Overview

This notebook provides a step-by-step guide to implementing **Conversation Summary Memory** using LangChain's modern LCEL (LangChain Expression Language) pattern.

### What is Summary Memory?

Instead of sending every past message to the model on each turn (which gets expensive as conversations grow), summary memory:
- Maintains a **running summary** of older messages
- Keeps only **recent messages** in full detail
- Automatically **summarizes** when a threshold is reached
- **Reduces token usage** for long conversations

### Key Benefits
- ✅ Efficient for long conversations
- ✅ Automatically compresses information
- ✅ Can handle very long conversation histories
- ✅ Uses modern LangChain v1.0+ patterns

### Trade-offs
- ⚠️ Some detail may be lost in summarization
- ⚠️ Requires additional LLM calls for summarization
- ⚠️ Summary quality depends on the summarization prompt


## Step 1: Import Required Libraries

First, let's import all the necessary libraries for this implementation.


In [None]:
# Core LangChain imports
from langchain_openai import ChatOpenAI
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Utilities
from dotenv import load_dotenv
import os
import sys
import pathlib
from typing import Dict

# Make the parent directory (which contains the `utils` package) importable.
# This must run BEFORE any `from utils...` import below.
sys.path.append(str(pathlib.Path().absolute().parent))

# Custom summary history implementation
from utils.custom_chat_histories import SummaryChatMessageHistory

# Token counting utilities
from utils.token_counter import (
    count_tokens,
    count_messages_tokens,
    print_token_stats,
    print_token_summary
)

# Load environment variables (for API keys)
load_dotenv()

print("✅ All imports successful!")


## Step 2: Understanding SummaryChatMessageHistory

The `SummaryChatMessageHistory` class is a custom implementation that:
1. Stores all messages internally (`_messages`)
2. Automatically summarizes when a threshold is reached (default: 5 messages)
3. Returns a summary + recent messages when accessed
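The class itself lives in `utils/custom_chat_histories.py`, which isn't shown here. To make the three behaviors above concrete, here is a minimal, framework-free sketch of the same idea. It is *not* the actual source: `SummaryHistorySketch`, `Message`, `fake_summarize`, and the `keep_recent` parameter are hypothetical names, the summarizer is a plain function standing in for the `summary_llm` call, and how many recent messages survive each summarization is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    role: str  # "human", "ai", or "system"
    content: str


@dataclass
class SummaryHistorySketch:
    """Hypothetical stand-in for SummaryChatMessageHistory (not its actual source)."""

    summarize: Callable[[str, List[Message]], str]  # (old summary, messages) -> new summary
    summary_threshold: int = 5  # summarize once this many unsummarized messages pile up
    keep_recent: int = 2  # assumption: newest messages kept verbatim after summarizing
    summary: str = ""
    _messages: List[Message] = field(default_factory=list)  # full transcript
    _recent: List[Message] = field(default_factory=list)  # not yet folded into the summary

    def add_message(self, message: Message) -> None:
        self._messages.append(message)
        self._recent.append(message)
        if len(self._recent) >= self.summary_threshold:
            # Fold all but the newest `keep_recent` messages into the running summary.
            cut = len(self._recent) - self.keep_recent
            older, self._recent = self._recent[:cut], self._recent[cut:]
            self.summary = self.summarize(self.summary, older)

    @property
    def messages(self) -> List[Message]:
        # What the prompt's history placeholder receives: [summary] + recent messages.
        if not self.summary:
            return list(self._recent)
        return [Message("system", f"Conversation summary: {self.summary}")] + self._recent


def fake_summarize(previous: str, older: List[Message]) -> str:
    # Stand-in for the summary LLM call: just concatenates message contents.
    joined = " | ".join(m.content for m in older)
    return f"{previous} | {joined}" if previous else joined


sketch = SummaryHistorySketch(summarize=fake_summarize)
for turn in range(1, 4):
    sketch.add_message(Message("human", f"q{turn}"))
    sketch.add_message(Message("ai", f"a{turn}"))

print(len(sketch._messages), [m.content for m in sketch.messages])
```

After three turns the full transcript still holds all six messages, but `messages` returns only the summary plus the newest unsummarized ones. In the real class, `summarize` would be an LLM call with a summarization prompt, which is the extra cost listed under Trade-offs.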

Let's examine how it works:


In [None]:
# Create a summary LLM (used for summarization)
summary_llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,  # Low temperature for consistent summarization
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

# Create a summary history instance
summary_history = SummaryChatMessageHistory(summary_llm=summary_llm)

print(f"Summary threshold: {summary_history.summary_threshold} messages")
print(f"Current messages: {len(summary_history._messages)}")
print(f"Current summary: {summary_history.summary or '(empty)'}")


## Step 3: Create Session History Store

We need a way to store and retrieve chat histories for different sessions. This allows multiple conversations to run independently.


In [None]:
# Store for chat message histories (session_id -> history)
store: Dict[str, BaseChatMessageHistory] = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """
    Get or create a summary chat message history for a session.
    
    This function is called by RunnableWithMessageHistory to retrieve
    the history for a specific session.
    """
    if session_id not in store:
        # Create a new summary LLM for this session
        summary_llm = ChatOpenAI(
            model="gpt-4o",
            temperature=0,  # Low temperature for consistent summarization
            openai_api_key=os.getenv("OPENAI_API_KEY")
        )
        # Create a new SummaryChatMessageHistory instance
        store[session_id] = SummaryChatMessageHistory(summary_llm=summary_llm)
    
    return store[session_id]

# Test the function
test_history = get_session_history("test_session")
print("✅ Created history for session: test_session")
print(f"   Type: {type(test_history).__name__}")
print(f"   Messages: {len(test_history.messages)}")


## Step 4: Create the LLM and Prompt Template

Now we'll create the main LLM for conversations and a prompt template that includes a placeholder for message history.


In [None]:
# Initialize the main LLM for conversations
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.7,  # Higher temperature for more natural conversations
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

# Create a prompt template with message history placeholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant. Have a natural conversation with the user."),
    MessagesPlaceholder(variable_name="history"),  # This will be filled with summary + recent messages
    ("human", "{input}")  # User's current input
])

print("✅ LLM and prompt template created!")
print(f"   LLM Model: {llm.model_name}")
print(f"   Prompt variables: {prompt.input_variables}")


## Step 5: Build the Chain with LCEL

LCEL (LangChain Expression Language) allows us to chain components together using the `|` operator. We'll create a chain that:
1. Takes the prompt template
2. Pipes it to the LLM
3. Wraps it with message history management
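Conceptually, these three pieces fit together as "compose, then wrap". The toy sketch below mimics that shape in plain Python: `a | b` runs `a` and feeds its output to `b`, and the wrapper loads a session's history before the call and saves the new human/AI pair after it. Everything here (`Step`, `with_history`, the `toy_*` names, dict-based messages) is hypothetical; the real `RunnableWithMessageHistory` works with LangChain message objects and a `config` dict, and handles more cases than this.

```python
from typing import Any, Callable, Dict, List


class Step:
    """Toy runnable: wraps a function and supports `a | b` composition."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


def with_history(chain: Step, get_history: Callable[[str], List[Dict[str, str]]]):
    """Toy load -> inject -> invoke -> save cycle for one session."""

    def invoke(inputs: Dict[str, str], session_id: str) -> str:
        history = get_history(session_id)
        # Load: pass a copy of the stored messages under the "history" key.
        output = chain.invoke({"history": list(history), "input": inputs["input"]})
        # Save: append this turn's human input and AI output after the call.
        history.append({"role": "human", "content": inputs["input"]})
        history.append({"role": "ai", "content": output})
        return output

    return invoke


# Usage with stand-ins for the prompt template and the LLM (no API calls).
toy_prompt = Step(lambda d: d["history"] + [{"role": "human", "content": d["input"]}])
toy_llm = Step(lambda msgs: f"seen {len(msgs)} message(s)")
toy_store: Dict[str, List[Dict[str, str]]] = {}
toy_chat = with_history(toy_prompt | toy_llm, lambda sid: toy_store.setdefault(sid, []))

r1 = toy_chat({"input": "Hi, I'm Bob"}, "s1")
r2 = toy_chat({"input": "What's my name?"}, "s1")
r3 = toy_chat({"input": "Hello"}, "s2")
print(r1, "|", r2, "|", r3)
```

The second call in session `s1` sees three messages (the stored pair plus the new input), while session `s2` starts empty, which is the same per-session isolation the `store` dictionary from Step 3 provides.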


In [None]:
# Step 5.1: Create the base chain using LCEL
# The | operator chains the prompt template to the LLM
chain = prompt | llm

print("✅ Base chain created using LCEL")
print("   Chain: prompt | llm")

# Step 5.2: Wrap with message history
# RunnableWithMessageHistory automatically:
# - Retrieves history using get_session_history
# - Adds new messages to history after each call
# - Passes history to the prompt template
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,  # Function to get/create history for a session
    input_messages_key="input",  # Key for user input
    history_messages_key="history",  # Key for message history in prompt
)

print("✅ Chain wrapped with message history")
print("   Now the chain will automatically manage conversation history!")


## Step 6: Test the Implementation

Let's test our implementation with a simple conversation to see how it works.


In [None]:
# Create a new session for testing
session_id = "demo_session"
config = {"configurable": {"session_id": session_id}}

# First message
print("=" * 60)
print("Message 1")
print("=" * 60)
user_input = "Hi, I'm Bob and I work as a data scientist"
response1 = chain_with_history.invoke(
    {"input": user_input},
    config=config
)
print(f"User: {user_input}")
print(f"Agent: {response1.content}\n")

# Check the history
history = get_session_history(session_id)
print(f"Messages in history: {len(history.messages)}")
print(f"Summary: {history.summary or '(not created yet)'}\n")


In [None]:
# Continue the conversation
print("=" * 60)
print("Message 2")
print("=" * 60)
user_input = "I specialize in machine learning and deep learning"
response2 = chain_with_history.invoke(
    {"input": user_input},
    config=config
)
print(f"User: {user_input}")
print(f"Agent: {response2.content}\n")

history = get_session_history(session_id)
print(f"Messages in history: {len(history.messages)}")
print(f"Summary: {history.summary or '(not created yet)'}\n")


## Step 7: Understanding How Summarization Works

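`SummaryChatMessageHistory` summarizes automatically once the threshold is reached. Each turn stores two messages (the user input and the AI reply), so after the two turns above the history holds 4 messages. Let's add three more turns to push it past the 5-message threshold: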


In [None]:
# Three more turns (two messages each) take the history past the 5-message threshold
conversations = [
    "I've been working on NLP projects for 5 years",
    "My favorite programming language is Python",
    "I enjoy working with neural networks"
]

for i, user_input in enumerate(conversations, 1):
    print(f"Message {i+2}: {user_input}")
    response = chain_with_history.invoke(
        {"input": user_input},
        config=config
    )
    print(f"Agent: {response.content[:100]}...\n")
    
    history = get_session_history(session_id)
    print(f"  Messages in history: {len(history.messages)}")
    summary_preview = f"{history.summary[:100]}..." if history.summary else "(not created yet)"
    print(f"  Summary: {summary_preview}")
    print()


## Step 8: Verify Summarization

After reaching the threshold, older messages should be summarized. Let's check:


In [None]:
history = get_session_history(session_id)

print("=" * 60)
print("History Status After Multiple Messages")
print("=" * 60)
print(f"Total messages stored internally: {len(history._messages)}")
print(f"Messages returned (summary + recent): {len(history.messages)}")
print(f"\nSummary:")
print(f"{history.summary}\n" if history.summary else "(No summary yet)\n")

print("Messages returned (summary first, then recent messages):")
for i, msg in enumerate(history.messages, 1):
    # msg.type is "human", "ai", or "system"; the summary message's type
    # depends on how SummaryChatMessageHistory builds it
    print(f"  {i}. {msg.type}: {msg.content[:80]}...")


## Step 9: Test Memory Recall

Now let's test if the agent remembers information from earlier in the conversation (which should be in the summary):


In [None]:
# Ask about information from earlier in the conversation
print("=" * 60)
print("Testing Memory Recall")
print("=" * 60)

questions = [
    "What's my profession?",
    "What programming language do I prefer?",
    "How long have I been working on NLP?"
]

for question in questions:
    print(f"\nUser: {question}")
    response = chain_with_history.invoke(
        {"input": question},
        config=config
    )
    print(f"Agent: {response.content}\n")


## Step 10: Complete Implementation Function

Here's a single function that combines Steps 4 and 5. It reuses `get_session_history` (and its `store`) from Step 3, so run that cell first:


In [None]:
def create_summary_memory_agent():
    """Create an agent with summary memory using LCEL pattern."""
    
    # Initialize the LLM
    llm = ChatOpenAI(
        model="gpt-4o",
        temperature=0.7,
        openai_api_key=os.getenv("OPENAI_API_KEY")
    )
    
    # Create a prompt template with message history placeholder
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful AI assistant. Have a natural conversation with the user."),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}")
    ])
    
    # Create the chain using LCEL
    chain = prompt | llm
    
    # Wrap with message history (summary history)
    chain_with_history = RunnableWithMessageHistory(
        chain,
        get_session_history,
        input_messages_key="input",
        history_messages_key="history",
    )
    
    return chain_with_history

print("✅ Complete implementation function created!")


## Step 11: Full Demonstration with Token Counting

Let's run a complete demonstration that shows token usage:


In [None]:
def memory_tokens(history) -> int:
    """Tokens in what the history hands to the prompt (summary + recent messages).

    `history.messages` already includes the summary message, so the summary
    must NOT be counted a second time.
    """
    return count_messages_tokens(history.messages) if history.messages else 0


def demonstrate_summary_memory():
    """Demonstrate summary memory with token counting."""
    print("=" * 60)
    print("Technique 2: Conversation Summary Memory (LCEL Pattern)")
    print("=" * 60)
    print("Using modern LangChain v1.0+ patterns with RunnableWithMessageHistory")
    print()

    chain = create_summary_memory_agent()
    session_id = "demo_session_full"
    config = {"configurable": {"session_id": session_id}}

    # Simulate a longer conversation
    conversations = [
        "Hi, I'm Bob and I work as a data scientist",
        "I specialize in machine learning and deep learning",
        "I've been working on NLP projects for 5 years",
        "My favorite programming language is Python",
        "What's my profession?",
        "What programming language do I prefer?",
        "How long have I been working on NLP?"
    ]

    total_input_tokens = 0
    total_output_tokens = 0

    for user_input in conversations:
        print(f"User: {user_input}")

        # Approximate input tokens: new user message + history (summary + recent).
        # The system prompt is not counted.
        history = get_session_history(session_id)
        input_tokens = count_tokens(user_input) + memory_tokens(history)
        total_input_tokens += input_tokens

        response = chain.invoke({"input": user_input}, config=config)
        print(f"Agent: {response.content}")

        # Count output tokens
        output_tokens = count_tokens(response.content)
        total_output_tokens += output_tokens

        # Memory after this turn (the same history object, now updated)
        print_token_stats(input_tokens, output_tokens, memory_tokens(history))
        print()

    # Show the stored summary
    print("\n" + "-" * 60)
    print("Stored Summary:")
    print("-" * 60)
    history = get_session_history(session_id)
    print(getattr(history, "summary", "") or "(Summary will be created after more messages)")
    print(f"\nMessages returned (summary + recent): {len(history.messages)}")
    print()

    # Show total token usage
    print_token_summary(
        total_input_tokens,
        total_output_tokens,
        memory_tokens(history)
    )

# Uncomment to run the full demonstration
# demonstrate_summary_memory()
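
The counts above come from `utils/token_counter.py`, which isn't shown. If you want to follow the arithmetic without that module, a rough stand-in is the commonly cited rule of thumb of about 4 characters per token for English text. This is only a heuristic, not a tokenizer; `approx_tokens` and `approx_messages_tokens` are hypothetical helpers, and exact counts need a real tokenizer such as `tiktoken`.

```python
import math
from typing import Iterable


def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (English heuristic, not a tokenizer)."""
    return math.ceil(len(text) / 4)


def approx_messages_tokens(contents: Iterable[str]) -> int:
    """Sum of per-message estimates; ignores per-message formatting overhead."""
    return sum(approx_tokens(c) for c in contents)


print(approx_tokens("Hi, I'm Bob and I work as a data scientist"))
print(approx_messages_tokens(["What's my profession?", "You're a data scientist."]))
```

Expect these estimates to drift from the real counts, especially for code, non-English text, or unusual punctuation.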


## Summary

### Key Concepts Learned:

1. **SummaryChatMessageHistory**: Custom class that automatically summarizes older messages
2. **RunnableWithMessageHistory**: Wraps a chain to automatically manage message history
3. **LCEL Pattern**: Modern way to chain components using the `|` operator
4. **Session Management**: Using a store dictionary to manage multiple conversation sessions
5. **Token Efficiency**: Summary memory reduces token usage for long conversations

### How It Works:

1. Messages are stored internally in `_messages`
2. When threshold (5 messages) is reached, older messages are summarized
3. The `messages` property returns: `[summary_message] + recent_messages`
4. The LLM still sees the gist of earlier turns, but the prompt no longer grows with every message in the transcript
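
To see why this matters for long conversations, here is a back-of-the-envelope model of the prompt tokens sent over many turns. All sizes are assumptions for illustration: every message is 50 tokens, the summary stays at a fixed 150 tokens, and 4 recent messages are kept. The model ignores the system prompt, output tokens, and the summarization calls themselves; `prompt_tokens` and `total_prompt_tokens` are hypothetical helpers.

```python
def prompt_tokens(turn: int, tokens_per_message: int, summary: bool = False,
                  summary_tokens: int = 150, recent_messages: int = 4) -> int:
    """History + new user message tokens sent on a given turn (1-indexed)."""
    prior = 2 * (turn - 1)  # human + AI messages stored before this turn
    if summary and prior > recent_messages:
        # Summary memory: a fixed-size summary plus a few recent messages.
        history = summary_tokens + recent_messages * tokens_per_message
    else:
        # Buffer memory (or summary memory before it kicks in): full history.
        history = prior * tokens_per_message
    return history + tokens_per_message


def total_prompt_tokens(turns: int, **kwargs) -> int:
    """Cumulative prompt tokens across turns 1..turns."""
    return sum(prompt_tokens(t, **kwargs) for t in range(1, turns + 1))


buffer_total = total_prompt_tokens(20, tokens_per_message=50)
summary_total = total_prompt_tokens(20, tokens_per_message=50, summary=True)
print(f"buffer: {buffer_total} tokens, summary: {summary_total} tokens")
```

Under these assumptions, 20 turns send 20,000 prompt tokens with a full buffer versus 7,250 with summary memory. The buffer's per-turn prompt grows linearly, so its cumulative cost grows quadratically, while summary memory levels off. In practice the summary grows somewhat and each summarization is an extra LLM call, which narrows the gap.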

### Next Steps:

- Try adjusting `summary_threshold` in `SummaryChatMessageHistory`
- Experiment with different summarization prompts
- Compare token usage with basic buffer memory