# Technique 1: Basic Conversation Buffer Memory - Step by Step Guide

## Overview

This notebook provides a step-by-step guide to implementing **Basic Conversation Buffer Memory** using LangChain's modern LCEL (LangChain Expression Language) pattern.

### What is Buffer Memory?

Buffer memory is the simplest form of conversational memory:
- Stores **all messages** in a buffer
- Passes **complete conversation history** to the LLM on each call
- No compression or summarization
- **Preserves complete context**
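
The idea can be sketched without any library. The following is a minimal illustration, not LangChain's implementation: `fake_llm` is a hypothetical stand-in for a real model call, and messages are plain `(role, content)` tuples rather than LangChain message objects.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content); simplified stand-in for real message objects


def fake_llm(messages: List[Message]) -> str:
    """Hypothetical model stub: reports how many messages it received."""
    return f"I received {len(messages)} message(s)."


def chat(buffer: List[Message], user_input: str) -> str:
    """Send the entire buffer plus the new input, then store both sides of the turn."""
    request = buffer + [("human", user_input)]  # complete history, nothing dropped
    reply = fake_llm(request)
    buffer.append(("human", user_input))
    buffer.append(("ai", reply))
    return reply


buffer: List[Message] = []
first_reply = chat(buffer, "Hi, my name is Alice")
second_reply = chat(buffer, "What's my name?")

print(first_reply)
print(second_reply)
print(f"Messages in buffer: {len(buffer)}")
```

The second call sees the two stored messages plus the new question, which is what lets a real model answer "What's my name?".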

### Key Benefits
- ✅ Simple and straightforward
- ✅ Preserves complete conversation context
- ✅ No information loss
- ✅ Uses LCEL with `RunnableWithMessageHistory` rather than the legacy memory classes

### Trade-offs
- ⚠️ Can become expensive with long conversations (more tokens)
- ⚠️ May hit token limits with very long conversations
- ⚠️ No automatic summarization or compression
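
To make the cost trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every message (human or AI) has the same size, `tokens_per_message`, and it ignores the system prompt. Real conversations will vary.

```python
def input_tokens_at_turn(turn: int, tokens_per_message: int) -> int:
    """Tokens sent on a given turn (1-based): all prior messages plus the new user message."""
    prior_messages = 2 * (turn - 1)  # each earlier turn stored one human and one AI message
    return (prior_messages + 1) * tokens_per_message


def cumulative_input_tokens(turns: int, tokens_per_message: int) -> int:
    """Total input tokens sent across the whole conversation."""
    return sum(input_tokens_at_turn(t, tokens_per_message) for t in range(1, turns + 1))


per_turn = [input_tokens_at_turn(t, 50) for t in range(1, 5)]
total_10 = cumulative_input_tokens(10, 50)
total_20 = cumulative_input_tokens(20, 50)

print(f"Input tokens on turns 1-4: {per_turn}")
print(f"Cumulative input after 10 turns: {total_10}")
print(f"Cumulative input after 20 turns: {total_20}")
```

Under these assumptions the per-turn input grows linearly, while the cumulative total grows with the square of the number of turns: doubling the conversation length roughly quadruples the total input tokens.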

### Use Case
Short to medium-length conversations where you need complete context.


## Step 1: Import Required Libraries

First, let's import all the necessary libraries for this implementation.


In [None]:
# Core LangChain imports
from langchain_openai import ChatOpenAI
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Utilities
from dotenv import load_dotenv
import os
import sys
from typing import Dict

# Token counting utilities
import pathlib
sys.path.append(str(pathlib.Path().absolute().parent))
from utils.token_counter import (
    count_tokens, 
    count_messages_tokens,
    print_token_stats,
    print_token_summary
)

# Load environment variables (for API keys)
load_dotenv()

print("✅ All imports successful!")
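
The `utils.token_counter` module is local to this project and its source is not shown here. If you don't have it, a rough stand-in like the one below may be enough to keep the later cells running. This is a sketch under stated assumptions, not the original helper: it estimates about four characters per token (a common rule of thumb for English text) instead of using a real tokenizer, and the signatures are inferred from how the functions are called in this notebook.

```python
import math
from dataclasses import dataclass
from typing import Iterable

CHARS_PER_TOKEN = 4  # rough heuristic for English text; an assumption, not a tokenizer


def count_tokens(text: str) -> int:
    """Approximate the token count of a string."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def count_messages_tokens(messages: Iterable) -> int:
    """Approximate total tokens across objects exposing a `.content` attribute."""
    return sum(count_tokens(str(m.content)) for m in messages)


def print_token_stats(input_tokens: int, output_tokens: int, memory_tokens: int) -> None:
    """Print per-turn estimates."""
    print(f"  Input: ~{input_tokens} | Output: ~{output_tokens} | Memory: ~{memory_tokens}")


def print_token_summary(total_input: int, total_output: int, final_memory: int) -> None:
    """Print whole-conversation estimates."""
    print(f"Total input: ~{total_input}")
    print(f"Total output: ~{total_output}")
    print(f"Final memory: ~{final_memory}")


@dataclass
class SimpleMessage:
    """Hypothetical message type used only to exercise the helpers."""
    content: str


sample = [SimpleMessage("Hi, my name is Alice"), SimpleMessage("Hello Alice!")]
sample_total = count_messages_tokens(sample)
print_token_stats(count_tokens("Hi, my name is Alice"), count_tokens("Hello Alice!"), sample_total)
```

Because the estimate is character-based, treat the numbers as relative indicators of growth rather than exact billing figures.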


## Step 2: Understanding ChatMessageHistory

The `ChatMessageHistory` class is LangChain's basic in-memory chat history implementation. It:
1. Stores all messages in a list
2. Provides methods to add messages (`add_message`)
3. Returns all messages when accessed (`messages` property)

Let's examine how it works:


In [None]:
# Create a simple chat message history instance
history = ChatMessageHistory()

print(f"Type: {type(history).__name__}")
print(f"Current messages: {len(history.messages)}")
print(f"Messages: {history.messages}")

# Add a message
history.add_message(HumanMessage(content="Hello!"))
print("\nAfter adding a message:")
print(f"Message count: {len(history.messages)}")
print(f"First message: {history.messages[0].content}")


## Step 3: Create Session History Store

We need a way to store and retrieve chat histories for different sessions. This allows multiple conversations to run independently.


In [None]:
# Store for chat message histories (session_id -> history)
store: Dict[str, BaseChatMessageHistory] = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """
    Get or create a chat message history for a session.
    
    This function is called by RunnableWithMessageHistory to retrieve
    the history for a specific session. Each session gets its own
    independent conversation history.
    """
    if session_id not in store:
        # Create a new ChatMessageHistory instance for this session
        store[session_id] = ChatMessageHistory()
    
    return store[session_id]

# Test the function
test_history = get_session_history("test_session")
print(f"✅ Created history for session: test_session")
print(f"   Type: {type(test_history).__name__}")
print(f"   Messages: {len(test_history.messages)}")

# Test with another session
test_history2 = get_session_history("another_session")
print(f"\n✅ Created separate history for session: another_session")
print(f"   Messages: {len(test_history2.messages)}")
print(f"   Different instances: {test_history is not test_history2}")


## Step 4: Create the LLM and Prompt Template

Now we'll create the main LLM for conversations and a prompt template that includes a placeholder for message history.


In [None]:
# Initialize the main LLM for conversations
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.7,  # Higher temperature for more natural conversations
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

# Create a prompt template with message history placeholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant. Have a natural conversation with the user."),
    MessagesPlaceholder(variable_name="history"),  # This will be filled with all previous messages
    ("human", "{input}")  # User's current input
])

print("✅ LLM and prompt template created!")
print(f"   LLM Model: {llm.model_name}")
print(f"   Prompt variables: {prompt.input_variables}")
print(f"   History placeholder: 'history'")
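
Conceptually, the template expands into one flat message list at call time: the system message first, then every stored message in order, then the new user input. The sketch below mimics that ordering with plain tuples. `render_prompt` is a hypothetical helper written for illustration, not a LangChain function.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)

SYSTEM_TEXT = "You are a helpful AI assistant. Have a natural conversation with the user."


def render_prompt(history: List[Message], user_input: str) -> List[Message]:
    """Mimic the template: system message, then the full history, then the current input."""
    return [("system", SYSTEM_TEXT), *history, ("human", user_input)]


history = [
    ("human", "Hi, my name is Alice"),
    ("ai", "Nice to meet you, Alice!"),
]
rendered = render_prompt(history, "What's my name?")

for role, content in rendered:
    print(f"{role}: {content}")
```

With an empty history the placeholder contributes nothing, so the first turn consists of only the system message and the user's input.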


## Step 5: Build the Chain with LCEL

LCEL (LangChain Expression Language) lets us compose components with the `|` operator. We'll build the chain in three steps:
1. Start from the prompt template
2. Pipe it into the LLM
3. Wrap the result with message history management


In [None]:
# Step 5.1: Create the base chain using LCEL
# The | operator chains the prompt template to the LLM
chain = prompt | llm

print("✅ Base chain created using LCEL")
print("   Chain: prompt | llm")
print("   This chain takes input, formats it with the prompt, and sends it to the LLM")

# Step 5.2: Wrap with message history
# RunnableWithMessageHistory automatically:
# - Retrieves history using get_session_history
# - Adds new messages to history after each call
# - Passes ALL messages to the prompt template (buffer memory)
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,  # Function to get/create history for a session
    input_messages_key="input",  # Key for user input
    history_messages_key="history",  # Key for message history in prompt
)

print("\n✅ Chain wrapped with message history")
print("   Now the chain will automatically:")
print("   1. Retrieve conversation history for the session")
print("   2. Pass ALL messages to the LLM (buffer memory)")
print("   3. Add new messages to history after each call")
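
To demystify the wrapper, here is a simplified, library-free sketch of the same three-step behavior. It does not reflect how `RunnableWithMessageHistory` is implemented internally; `with_history`, `echo_chain`, `get_history`, and the tuple-based messages are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)
sessions: Dict[str, List[Message]] = {}


def get_history(session_id: str) -> List[Message]:
    """Get or create the message list for a session."""
    return sessions.setdefault(session_id, [])


def echo_chain(messages: List[Message]) -> str:
    """Hypothetical chain stub: reports how much context it was given."""
    return f"context size: {len(messages)}"


def with_history(chain: Callable[[List[Message]], str]) -> Callable[[str, str], str]:
    """Wrap a chain so each call loads, uses, and then updates the session history."""

    def invoke(user_input: str, session_id: str) -> str:
        history = get_history(session_id)                        # 1. retrieve
        reply = chain(history + [("human", user_input)])         # 2. pass ALL messages
        history.extend([("human", user_input), ("ai", reply)])   # 3. store the new turn
        return reply

    return invoke


chat_with_memory = with_history(echo_chain)
a1 = chat_with_memory("Hi, my name is Alice", "session_a")
a2 = chat_with_memory("What's my name?", "session_a")
b1 = chat_with_memory("Hello", "session_b")

print(a1, a2, b1, sep="\n")
```

The second call in `session_a` sees a larger context, while `session_b` starts from an empty history, mirroring the per-session isolation set up in Step 3.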


## Step 6: Test the Implementation - First Message

Let's test our implementation with a simple conversation to see how buffer memory works.


In [None]:
# Create a new session for testing
session_id = "demo_session"
config = {"configurable": {"session_id": session_id}}

# First message
print("=" * 60)
print("Message 1: Introduction")
print("=" * 60)

response1 = chain_with_history.invoke(
    {"input": "Hi, my name is Alice"},
    config=config
)

print(f"User: Hi, my name is Alice")
print(f"Agent: {response1.content}\n")

# Check the history
history = get_session_history(session_id)
print(f"Messages stored in history: {len(history.messages)}")
print(f"Message breakdown:")
for i, msg in enumerate(history.messages, 1):
    if isinstance(msg, HumanMessage):
        print(f"  {i}. Human: {msg.content}")
    elif isinstance(msg, AIMessage):
        print(f"  {i}. AI: {msg.content}")


## Step 7: Test Memory - Second Message

Now let's send a second message and see how the agent remembers the previous conversation.


In [None]:
print("=" * 60)
print("Message 2: Testing Memory")
print("=" * 60)

response2 = chain_with_history.invoke(
    {"input": "What's my name?"},
    config=config
)

print(f"User: What's my name?")
print(f"Agent: {response2.content}\n")

# Check the history - should now have 4 messages (2 human + 2 AI)
history = get_session_history(session_id)
print(f"Total messages stored: {len(history.messages)}")
print(f"\nFull conversation history:")
for i, msg in enumerate(history.messages, 1):
    if isinstance(msg, HumanMessage):
        print(f"  {i}. Human: {msg.content}")
    elif isinstance(msg, AIMessage):
        print(f"  {i}. AI: {msg.content[:80]}...")


## Step 8: Understanding Buffer Memory Behavior

With buffer memory, **all messages** are stored and passed to the LLM. Let's add more messages and observe how the history grows:


In [None]:
# Add more messages to see buffer memory in action
conversations = [
    "I'm a software engineer. What do I do?",
    "What's my name again?"
]

print("=" * 60)
print("Adding More Messages")
print("=" * 60)

# Start numbering at 3 because messages 1 and 2 were sent in Steps 6 and 7
for i, user_input in enumerate(conversations, 3):
    print(f"\nMessage {i}: {user_input}")
    response = chain_with_history.invoke(
        {"input": user_input},
        config=config
    )
    print(f"Agent: {response.content[:100]}...\n")
    
    history = get_session_history(session_id)
    print(f"  Total messages in buffer: {len(history.messages)}")
    print(f"  (All messages are stored and passed to the LLM)")


## Step 9: View Complete Conversation History

Let's see the complete conversation history stored in the buffer:


In [None]:
history = get_session_history(session_id)

print("=" * 60)
print("Complete Conversation History (Buffer Memory)")
print("=" * 60)
print(f"Total messages: {len(history.messages)}\n")

for i, msg in enumerate(history.messages, 1):
    if isinstance(msg, HumanMessage):
        print(f"{i}. Human: {msg.content}")
    elif isinstance(msg, AIMessage):
        print(f"{i}. AI: {msg.content[:100]}...")
    print()

print("=" * 60)
print("Key Point: ALL messages are stored and passed to the LLM")
print("This is the 'buffer' - it keeps everything!")
print("=" * 60)


## Step 10: Understanding Token Usage

With buffer memory, token usage grows with each message. Let's see how tokens accumulate:


In [None]:
# Create a fresh session for token counting
session_id_tokens = "token_demo"
config_tokens = {"configurable": {"session_id": session_id_tokens}}

conversations = [
    "Hi, my name is Alice",
    "What's my name?",
    "I'm a software engineer. What do I do?",
    "What's my name again?"
]

print("=" * 60)
print("Token Usage Analysis")
print("=" * 60)

for i, user_input in enumerate(conversations, 1):
    print(f"\n--- Turn {i} ---")
    print(f"User: {user_input}")
    
    # Estimate input tokens (user message + all prior history).
    # The system prompt is not included, so this slightly undercounts.
    history = get_session_history(session_id_tokens)
    previous_count = len(history.messages)  # snapshot before invoke() appends to history
    input_tokens = count_tokens(user_input)
    if history.messages:
        input_tokens += count_messages_tokens(history.messages)

    response = chain_with_history.invoke(
        {"input": user_input},
        config=config_tokens
    )

    # `history` is the same object the wrapper appends to, so it now includes this turn
    output_tokens = count_tokens(response.content)
    memory_tokens = count_messages_tokens(history.messages) if history.messages else 0

    print(f"Agent: {response.content[:80]}...")
    print(f"Input tokens: {input_tokens} (user message + {previous_count} previous messages)")
    print(f"Output tokens: {output_tokens}")
    print(f"Memory tokens: {memory_tokens} (all {len(history.messages)} stored messages)")
    
print("\n" + "=" * 60)
print("Notice: Token usage increases with each turn!")
print("This is because ALL messages are included in each request.")
print("=" * 60)
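
One subtlety when measuring: `get_session_history` returns the same history object that the wrapper appends to, so any count read after `invoke` already includes the new human and AI messages. Values meant to describe the state before the call should be captured first. The sketch below shows the effect with a plain list standing in for the history; `fake_invoke` is a hypothetical stub, not part of LangChain.

```python
from typing import List

history: List[str] = ["human: Hi, my name is Alice", "ai: Hello Alice!"]


def fake_invoke(store: List[str], user_input: str) -> str:
    """Hypothetical stub that mutates the shared list, as a history-aware chain would."""
    reply = "ai: Your name is Alice."
    store.append(f"human: {user_input}")
    store.append(reply)
    return reply


count_before = len(history)  # snapshot taken before the call
fake_invoke(history, "What's my name?")
count_after = len(history)   # same object, now two messages longer

print(f"Before the call: {count_before} messages")
print(f"After the call:  {count_after} messages")
```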


## Step 11: Complete Implementation Function

Here's a single function that combines Steps 4 and 5. It relies on the `get_session_history` function defined in Step 3:


In [None]:
def create_buffer_memory_agent():
    """Create an agent with basic buffer memory using modern LCEL pattern."""
    
    # Initialize the LLM
    llm = ChatOpenAI(
        model="gpt-4o",
        temperature=0.7,
        openai_api_key=os.getenv("OPENAI_API_KEY")
    )
    
    # Create a prompt template with message history placeholder
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful AI assistant. Have a natural conversation with the user."),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}")
    ])
    
    # Create the chain using LCEL
    chain = prompt | llm
    
    # Wrap with message history (this provides the buffer memory)
    chain_with_history = RunnableWithMessageHistory(
        chain,
        get_session_history,
        input_messages_key="input",
        history_messages_key="history",
    )
    
    return chain_with_history

print("✅ Complete implementation function created!")
print("\nThis function:")
print("  1. Creates an LLM")
print("  2. Creates a prompt template with history placeholder")
print("  3. Chains them together with LCEL")
print("  4. Wraps with RunnableWithMessageHistory for automatic history management")


## Step 12: Full Demonstration with Token Counting

Let's run a complete demonstration that shows token usage:


In [None]:
def demonstrate_buffer_memory():
    """Demonstrate basic buffer memory with token counting."""
    print("=" * 60)
    print("Technique 1: Basic Conversation Buffer Memory (LCEL Pattern)")
    print("=" * 60)
    print("Using LCEL with RunnableWithMessageHistory")
    print()
    
    chain = create_buffer_memory_agent()
    session_id = "demo_session_full"
    config = {"configurable": {"session_id": session_id}}
    
    # Simulate a conversation
    conversations = [
        "Hi, my name is Alice",
        "What's my name?",
        "I'm a software engineer. What do I do?",
        "What's my name again?"
    ]
    
    total_input_tokens = 0
    total_output_tokens = 0
    
    for user_input in conversations:
        print(f"User: {user_input}")
        
        # Count input tokens (user message + history)
        input_tokens = count_tokens(user_input)
        history = get_session_history(session_id)
        if history.messages:
            input_tokens += count_messages_tokens(history.messages)
        total_input_tokens += input_tokens
        
        response = chain.invoke(
            {"input": user_input},
            config=config
        )
        print(f"Agent: {response.content}")
        
        # Count output tokens
        output_tokens = count_tokens(response.content)
        total_output_tokens += output_tokens
        
        # Count current memory tokens
        history = get_session_history(session_id)
        memory_tokens = count_messages_tokens(history.messages) if history.messages else 0
        
        print_token_stats(input_tokens, output_tokens, memory_tokens)
        print()
    
    # Show the stored memory
    print("\n" + "-" * 60)
    print("Stored Memory (All Messages):")
    print("-" * 60)
    history = get_session_history(session_id)
    for message in history.messages:
        if isinstance(message, HumanMessage):
            print(f"Human: {message.content}")
        elif isinstance(message, AIMessage):
            print(f"AI: {message.content}")
    print()
    
    # Show total token usage
    final_memory = count_messages_tokens(history.messages) if history.messages else 0
    print_token_summary(
        total_input_tokens, 
        total_output_tokens, 
        final_memory
    )

# Uncomment to run the full demonstration
# demonstrate_buffer_memory()
