# Introduction to Memory in LangChain and LangGraph
In LLM applications, the "memory" component is crucial for maintaining conversation state, because each model call is stateless: the model only sees what you send it. LangChain's memory module has evolved significantly. Modern LangChain offers several approaches, depending on whether you want to store a full conversation, manage messages with middleware, or build agent-based systems with tools. Understanding these tradeoffs is essential when teaching how LLMs can maintain context over extended interactions.

This notebook demonstrates various memory approaches in modern LangChain and LangGraph, such as:
- RunnableWithMessageHistory (for basic chains)
- LangGraph with checkpointers (for stateful workflows)
- Memory management strategies (trimming and deleting messages; summarization is covered on Thursday)

In [1]:
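# Install the required packages
# (langchain-community provides ChatMessageHistory, which is imported in Section 1)
!pip install langchain langchain-openai langchain-community langgraph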



## 1. RunnableWithMessageHistory (Basic Chains)

Key Concepts:
- Session Management: Each conversation gets a unique session_id
- Automatic History: Messages are automatically stored and retrieved
- Simple API: Works with any LCEL chain (chain = prompt | llm)

When to Use:
- Building simple chatbots without tools
- Short to medium-length conversations
- Rapid prototyping and demos
- Applications where full state control isn't needed

When Not to Use:
- Very long conversations (risk of context overflow)
- Complex multi-step workflows
- Applications requiring custom state beyond messages
- Systems needing fine-grained state control

Architecture:
User Input → Chain → LLM → Response
                ↓              ↓
        Session Store ← History Retrieved

### 1.1 - Import Dependencies

In [2]:
# Import the necessary components from LangChain
from langchain_openai import ChatOpenAI  # OpenAI chat model wrapper
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder  # Prompt templates
from langchain_core.runnables.history import RunnableWithMessageHistory  # Memory wrapper
from langchain_community.chat_message_histories import ChatMessageHistory  # In-memory storage
from langchain_core.chat_history import BaseChatMessageHistory  # Base class for history

# Import the userdata module from google.colab to securely access user-specific data.
from google.colab import userdata

# Retrieve the OpenAI API key from Colab's user data.
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

### 1.2 - Initialize the Language Model

In [3]:
# Initialize the language model with a fixed temperature (to ensure consistent responses)
llm = ChatOpenAI(temperature=0.0,
                 model="gpt-4o-mini",
                 openai_api_key=OPENAI_API_KEY)

### 1.3 - Set Up Session Storage

In [4]:
# Create a dictionary to store conversation histories
# In production, you would use a database (Redis, PostgreSQL, etc.)
# Key: session_id (string) → Value: ChatMessageHistory object
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """
    Retrieve or create a chat history for a given session.

    This function is called by RunnableWithMessageHistory to get the
    conversation history for a specific session_id. If the session
    doesn't exist, it creates a new ChatMessageHistory object.

    Args:
        session_id: Unique identifier for the conversation session

    Returns:
        ChatMessageHistory object containing the conversation history

    IMPORTANT: In production, replace this with a database-backed solution:
        - Redis: Fast, in-memory storage for session data
        - PostgreSQL: Persistent storage with SQL queries
        - MongoDB: Document-based storage for JSON-like data
    """
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

### 1.4 - Create the Prompt Template

In [5]:
# Create a prompt template with a placeholder for conversation history
# The MessagesPlaceholder is where previous messages will be inserted
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),  # System message sets behavior
    MessagesPlaceholder(variable_name="history"),  # Previous conversation goes here
    ("human", "{input}")  # Current user input
])

### 1.5 - Build the Chain with LCEL

In [6]:
# Create the chain using LCEL (LangChain Expression Language)
# The pipe operator (|) chains the prompt and model together
# Data flows: input → prompt (formats) → llm (generates) → output
chain = prompt | llm

### 1.6 - Wrap Chain with Message History


In [7]:
# Wrap the chain with RunnableWithMessageHistory to enable memory
chain_with_history = RunnableWithMessageHistory(
    chain,                          # The chain to wrap
    get_session_history,            # Function to retrieve/create history
    input_messages_key="input",     # Key in input dict that contains user message
    history_messages_key="history"  # Key in prompt where history is inserted
)

How it Works:
1. User calls chain_with_history.invoke({"input": "..."}, config={...})
2. RunnableWithMessageHistory extracts session_id from config
3. Calls get_session_history(session_id) to get conversation history
4. Inserts history into the "history" placeholder in the prompt
5. Runs the chain with the complete prompt
6. Saves the new exchange (user input + AI response) to history
7. Returns the response
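To make the seven steps concrete, here is a plain-Python sketch of the same control flow. It is a simplified model, not LangChain's actual implementation: messages are `(role, text)` tuples, and `toy_store`, `invoke_with_history`, and `toy_model` are hypothetical names. The toy model just reports how many earlier human messages it was given, which shows that history is loaded before the call and saved after it.

```python
from collections.abc import Callable

# Hypothetical simplified types: a message is a (role, text) tuple
Message = tuple[str, str]
toy_store: dict[str, list[Message]] = {}

def invoke_with_history(model: Callable[[list[Message]], str], user_input: str, config: dict) -> str:
    session_id = config["configurable"]["session_id"]        # step 2: read session_id
    history = toy_store.setdefault(session_id, [])           # step 3: get or create history
    prompt = [("system", "You are a helpful assistant."),    # step 4: fill the placeholder
              *history,
              ("human", user_input)]
    reply = model(prompt)                                    # step 5: run the chain
    history.extend([("human", user_input), ("ai", reply)])   # step 6: save the exchange
    return reply                                             # step 7: return the response

def toy_model(messages: list[Message]) -> str:
    # Hypothetical model: counts human messages before the current input
    prior = sum(1 for role, _ in messages[:-1] if role == "human")
    return f"I can see {prior} earlier human message(s)."

cfg = {"configurable": {"session_id": "user_session_1"}}
first = invoke_with_history(toy_model, "Hi, my name is David", cfg)
second = invoke_with_history(toy_model, "What is my name?", cfg)

print(first)                              # I can see 0 earlier human message(s).
print(second)                             # I can see 1 earlier human message(s).
print(len(toy_store["user_session_1"]))   # 4
```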

### 1.7 - Have a Multi-Turn Conversation

In [8]:
# Conversation Turn #1: Introduction
# The user introduces themselves
chain_with_history.invoke(
    {"input": "Hi, my name is David"},
    config={"configurable": {"session_id": "user_session_1"}}
)

AIMessage(content='Hi David! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 23, 'total_tokens': 33, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaPmwI8Bu6Xfg9yxvPvNdgSFTiBEa', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--97ed0dca-1bd3-4734-814b-12b5045816d9-0', usage_metadata={'input_tokens': 23, 'output_tokens': 10, 'total_tokens': 33, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [9]:
# Conversation Turn #2: Simple Question
# Test that the model works for basic queries
chain_with_history.invoke(
    {"input": "What is 1+1?"},
    config={"configurable": {"session_id": "user_session_1"}}
)

AIMessage(content='1 + 1 equals 2.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 48, 'total_tokens': 56, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaPmyYFNqqmOOYGks60WbDPgFMeug', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--6f510782-3745-4796-8fe5-f9e2f892e464-0', usage_metadata={'input_tokens': 48, 'output_tokens': 8, 'total_tokens': 56, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [10]:
# Conversation Turn #3: Memory Test
# This tests whether the model remembers information from Turn #1

chain_with_history.invoke(
    {"input": "What is my name?"},
    config={"configurable": {"session_id": "user_session_1"}}
)

AIMessage(content='Your name is David.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 69, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaPn0C6znrtzSmdcUgTC21l4ilE8s', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--e90c7ece-a718-4c13-889e-829a5ae3c4d4-0', usage_metadata={'input_tokens': 69, 'output_tokens': 5, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

### 1.8 - Inspect the Stored Conversation History

In [11]:
# Print the message history to show the complete conversation
history = store["user_session_1"]

# Iterate through all messages and display them
# Messages alternate: human → ai → human → ai → ...
for i, message in enumerate(history.messages, 1):
    # message.type is either "human" or "ai"
    # message.content is the actual text
    print(f"{i}. {message.type.upper()}: {message.content}")

1. HUMAN: Hi, my name is David
2. AI: Hi David! How can I assist you today?
3. HUMAN: What is 1+1?
4. AI: 1 + 1 equals 2.
5. HUMAN: What is my name?
6. AI: Your name is David.


## 2. LangGraph with Checkpointers (Stateful Workflows)
Key Concepts:
- StateGraph: A graph where each node can read and modify state
- MessagesState: Built-in state schema that tracks conversation messages
- Checkpointer: Persistence layer that saves state snapshots
- Thread ID: Unique identifier for a conversation thread (like session_id)
- Nodes: Functions that process state and return updates
- Edges: Connections that define the control flow between nodes


When to Use LangGraph:
- Complex multi-step workflows (agents, pipelines)
- Applications that need to branch based on conditions
- Systems requiring state inspection and debugging
- Production applications needing durable persistence
- Workflows with multiple tools or decision points

When Not to Use:
- Simple single-turn Q&A (use basic chains instead)
- Rapid prototyping where complexity isn't needed
- Very simple chatbots without workflow logic

Architecture:
Basic Chain (Section 1):
  User → Chain → LLM → Response
           ↓
      Session Store

LangGraph (Section 2):
  User → Graph State → Node(s) → LLM → Update State → Response
                         ↓                    ↓
                    Checkpoint         Checkpoint
                    (before)           (after)


Key Differences: LangGraph saves the entire state at each step, enabling:
- Resume from any point
- Time-travel debugging
- Complex branching workflows
- Full state inspection

### 2.1 - Import LangGraph Components

- StateGraph: The main class for building stateful graphs
- MessagesState: Pre-built state schema with "messages" key
- START: Special node representing the entry point to the graph
- InMemorySaver: Checkpointer that stores state in memory (for development)

In [12]:
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.checkpoint.memory import InMemorySaver

### 2.2 - Initialize the Checkpointer


In [13]:
# Create a checkpointer to enable persistent memory
# InMemorySaver stores checkpoints in RAM (lost when program stops)
checkpointer = InMemorySaver()

### 2.3 - Define a Node Function


In [14]:
def call_model(state: MessagesState):
    """
    Node that calls the LLM with the current conversation state.

    In LangGraph, nodes are functions that:
    1. Receive the current state as input
    2. Perform some operation (call LLM, tool, etc.)
    3. Return a dictionary of state updates

    Args:
        state: MessagesState containing "messages" key with conversation history

    Returns:
        Dictionary with "messages" key containing the new AI response

    STATE UPDATE MECHANISM:
    When you return {"messages": [response]}, LangGraph APPENDS the response
    to the existing messages list. This is because MessagesState uses the
    add_messages reducer, which concatenates new messages to the history.
    """
    # Get all messages from state (includes full conversation history)
    messages = state["messages"]

    # Call the LLM with the complete message history
    response = llm.invoke(messages)

    # Return the response as a state update
    # This will be appended to the messages list
    return {"messages": [response]}


### 2.4 - Build the State Graph

In [15]:
# Create a StateGraph with MessagesState schema
# MessagesState provides a "messages" key that stores conversation history
builder = StateGraph(MessagesState)

# Add the call_model function as a node named "call_model"
# Node names are strings; they're used to reference nodes in edges
builder.add_node("call_model", call_model)

# Add an edge from START to call_model
# This means: when the graph starts, go to the call_model node
builder.add_edge(START, "call_model")

<langgraph.graph.state.StateGraph at 0x7de20ebbc8f0>

### 2.5 - Compile the Graph with Checkpointer
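What compile() Does:
1. Validates the graph structure (e.g., every edge points to a defined node and the graph has an entry point; cycles are allowed, which is what makes agent loops possible)
2. Builds the executable graph from the nodes and edges
3. Integrates the checkpointer for state persistence
4. Returns a runnable object with .invoke(), .stream(), etc.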

Checkpointer Behavior:
- Before executing: Loads the previous state from the latest checkpoint (if one exists)
- After each step: Saves a new checkpoint with the updated state
- Each checkpoint has a unique ID and timestamp
- You can "time travel" by loading any previous checkpoint


In [16]:
# Compile with checkpointer - this enables short-term memory
graph = builder.compile(checkpointer=checkpointer)

### 2.6 - First Conversation in Thread 1
What Happens during Invoke:
1. Check if a checkpoint exists for thread_1 (no, this is the first message)
2. Initialize state: {"messages": [HumanMessage("Hi! I'm David")]}
3. Save a checkpoint of the input
4. Execute the call_model node
5. The node returns: {"messages": [AIMessage("...")]}
6. Merge into state: {"messages": [HumanMessage, AIMessage]}
7. Save a checkpoint of the updated state
8. Return the final state

In [17]:
# Configuration specifies which thread (conversation) to use
# thread_id is like session_id but for stateful workflows
config1 = {"configurable": {"thread_id": "thread_1"}}

# First interaction: User introduces themselves
user_content = "Hi! I'm David"
response1 = graph.invoke(
    {"messages": [{"role": "user", "content": user_content}]},
    config1
)

print(f"User: {response1['messages'][-2].content}")  # -2 = second to last (user message)
print(f"Assistant: {response1['messages'][-1].content}")  # -1 = last (AI response)

User: Hi! I'm David
Assistant: Hi David! How can I assist you today?


### 2.7 - Continue Conversation in Same Thread
What Happens during Second Invoke:
1. Check if checkpoint exists for thread_1 (YES - load it)
2. Load previous state: {"messages": [HumanMessage("Hi! I'm David"), AIMessage("...")]}
3. Merge new input: {"messages": [...previous..., HumanMessage("What's my name?")]}
4. Save checkpoint
5. Execute call_model node with FULL history
6. Node sees all 3 messages and can reference "David" from first message
7. Save final checkpoint
8. Return state with all 4 messages

In [18]:
# Second interaction in same thread: Test if model remembers name
user_content = "What's my name?"
response2 = graph.invoke(
    {"messages": [{"role": "user", "content": user_content}]},
    config1  # SAME thread_id as before
)

print(f"User: {response2['messages'][-2].content}")
print(f"Assistant: {response2['messages'][-1].content}\n")
print(f"\nTotal messages in thread_1: {len(response2['messages'])}")

User: What's my name?
Assistant: Your name is David! How can I help you today?


Total messages in thread_1: 4


### 2.8 - New Thread with Isolated Memory
What Happens in New Thread:
1. Check if checkpoint exists for thread_2 (NO - new thread)
2. Initialize fresh state: {"messages": [HumanMessage("What's my name?")]}
3. Execute call_model node
4. The node sees only ONE message
5. The model cannot answer, because the name was never mentioned in this thread
6. Save checkpoint for thread_2


In [19]:
user_content = "What's my name?"

# New thread - memory is isolated
# A separate conversation in thread "thread_2"
config2 = {"configurable": {"thread_id": "thread_2"}}

response3 = graph.invoke(
    {"messages": [{"role": "user", "content": user_content}]},
    config2  # DIFFERENT thread_id
)
print(f"Thread 2 - User: {response3['messages'][-2].content}")
print(f"Assistant: {response3['messages'][-1].content}")

Thread 2 - User: What's my name?
Assistant: I'm sorry, but I don't have access to personal information about you unless you've shared it in this conversation. How can I assist you today?


### 2.9 - Return to Thread 1 to Verify Memory Persistence


In [20]:
user_content = "What's my name?"

# Continue the original conversation in thread_1 (after visiting thread_2)
response4 = graph.invoke(
    {"messages": [{"role": "user", "content": user_content}]},
    config1
)
print(f"User: {response4['messages'][-2].content}")
print(f"Assistant: {response4['messages'][-1].content}\n")

User: What's my name?
Assistant: Your name is David. If there's anything else you'd like to discuss or ask, feel free!



Key Takeaways:

1. Thread Isolation:
   - Each thread_id has its own separate conversation history
   - Changes in one thread don't affect other threads
   - Like having multiple separate chat windows

2. Checkpoint Persistence:
   - State is automatically saved after each step
   - Can resume conversations at any time
   - Full history is preserved across invocations

3. State Accumulation:
   - Messages accumulate in the state with each interaction
   - Each invoke adds to the existing history (doesn't replace)
   - Full context is available to the model

4. LangGraph vs Basic Chains:
   - Basic chains: Session-based, simpler API
   - LangGraph: Thread-based, full state control, better for complex workflows

## 3. Trim Messages (Managing Context Window)
The Context Window Problem:
- GPT-4o-mini: ~128K tokens context window
- Long conversation: 50+ exchanges can add up to 100K+ tokens (about 2K tokens per exchange)
- What happens when you exceed the limit?
  → Error: "This model's maximum context length is X tokens..."
  → Solution: Trim messages to stay under the limit
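The figures above are simple arithmetic: 100K tokens over 50 exchanges is about 2,000 tokens per exchange. The helper below is a back-of-the-envelope sketch; `max_exchanges` and the per-exchange and reserved-token figures are illustrative assumptions, not measurements, and real counts depend on the tokenizer and message lengths.

```python
def max_exchanges(context_window: int, tokens_per_exchange: int, reserved_tokens: int = 0) -> int:
    """Rough number of full user+assistant exchanges that fit in the window.

    reserved_tokens covers anything that is always sent or generated
    (system prompt, room for the next reply, etc.).
    """
    available = context_window - reserved_tokens
    return max(0, available // tokens_per_exchange)

GPT_4O_MINI_WINDOW = 128_000  # approximate context window quoted above

print(max_exchanges(GPT_4O_MINI_WINDOW, 2_000))         # 64
print(max_exchanges(GPT_4O_MINI_WINDOW, 2_000, 8_000))  # 60
```

At roughly 2K tokens per exchange, a conversation runs out of room a little past 60 exchanges, which is why 50+ exchanges is treated as the danger zone.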

Key Concepts:
- Context Window: Maximum number of tokens the LLM can process at once
- Trimming: Keeping only the N most recent messages
- RemoveMessage: Special message type that deletes messages from state
- Deletion from State: Unlike just reading fewer messages, we remove old messages from the thread's current state (earlier checkpoints still contain them)

Trimming Strategies:
1. Keep last N messages (what we'll demonstrate)
2. Keep last N tokens (more precise but requires token counting)
3. Keep first message + last N messages (preserves system prompt)
4. Smart trimming (keep important messages, remove filler)

When to Use Trimming:
- Long-running conversations (customer support, tutoring)
- When recent context is most important
- When you want to control memory usage
- When older conversation history isn't critical

Tradeoffs:
- Pros: Stays within context limits, reduces token costs, faster processing
- Cons: Loses conversation history, may forget important early context

Comparison with Other Strategies:
- Trimming: Fast, simple, but loses information
- Summarization (covered on Thursday): Preserves key info but costs more tokens
- External memory: Store full history in database, retrieve relevant parts

### 3.1 - Import RemoveMessage
- RemoveMessage is a special message type that signals deletion
- When you return {"messages": [RemoveMessage(id=msg_id)]}, LangGraph removes that message from the state

IMPORTANT: RemoveMessage only works on state keys that use the add_messages reducer, such as the "messages" key of MessagesState. The reducer is what recognizes RemoveMessage and deletes the message with the matching ID.

In [21]:
from langchain_core.messages import RemoveMessage

### 3.2 - Define Node with Trimming Logic


In [22]:
def call_model_with_trimming(state: MessagesState):
    """
    Node that trims messages before calling model AND updates state.

    This function implements a simple trimming strategy:
    - Keep only the last 4 messages (2 user + 2 assistant messages)
    - Permanently delete older messages from the state
    - Call the LLM with only the recent context

    TRIMMING LOGIC BREAKDOWN:
    1. Check if we have more than 4 messages
    2. If yes: Keep last 4, create RemoveMessage for the rest
    3. Call LLM with trimmed context
    4. Return deletions + new response

    Args:
        state: MessagesState with full conversation history

    Returns:
        Dictionary with messages to delete and new AI response
    """
    messages = state["messages"]

    # Only keep last 4 messages (2 exchanges) to demonstrate trimming
    if len(messages) > 4:
        # Trim to keep only recent messages
        trimmed = messages[-4:]

        # Delete old messages from state permanently
        messages_to_delete = [RemoveMessage(id=m.id) for m in messages[:-4]]

        # Call model with trimmed messages
        response = llm.invoke(trimmed)

        # Return both the new response and deletions
        return {"messages": messages_to_delete + [response]}
    else:
        # Not enough messages to trim yet
        response = llm.invoke(messages)
        return {"messages": [response]}
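Because the node runs after the new user message has already been merged into state, the trimmed window does not always start on a user turn, and the checkpoint settles at five messages rather than four. The plain-Python trace below is a simplified model of this node plus the add_messages reducer (strings stand in for messages; `simulate_trimming` is a hypothetical helper), and it reproduces the pattern seen in 3.6.

```python
KEEP_LAST = 4  # same threshold as call_model_with_trimming

def simulate_trimming(turns: int) -> tuple[list[str], list[int]]:
    """Trace the state's message list across `turns` user/assistant exchanges."""
    state: list[str] = []
    sizes: list[int] = []
    for t in range(1, turns + 1):
        state.append(f"human{t}")       # user input is merged before the node runs
        if len(state) > KEEP_LAST:
            state = state[-KEEP_LAST:]  # older messages are removed via RemoveMessage
        state.append(f"ai{t}")          # the model's reply is appended
        sizes.append(len(state))
    return state, sizes

final_state, sizes = simulate_trimming(5)
print(sizes)        # [2, 4, 5, 5, 5]
print(final_state)  # ['ai3', 'human4', 'ai4', 'human5', 'ai5']
```

Five turns correspond to the four setup exchanges plus the "What's my name?" question: the first message left in the checkpoint is an AI reply, and the turn that contained the name was deleted two turns earlier.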

### 3.3 - Build Graph with Trimming

In [23]:
# Build a new graph with our trimming node
trim_builder = StateGraph(MessagesState)
trim_builder.add_node("call_model", call_model_with_trimming)
trim_builder.add_edge(START, "call_model")

# Compile with a NEW checkpointer (separate from previous examples)
trim_graph = trim_builder.compile(checkpointer=InMemorySaver())

# Use a unique thread_id for this demonstration
config = {"configurable": {"thread_id": "trim_thread"}}

### 3.4 - Build Up Conversation History

In [24]:
print("Building up conversation history...")
# Exchange 1: User introduces themselves
trim_graph.invoke({"messages": [{"role": "user", "content": "Hi, my name is David"}]}, config)

# Exchange 2: Discuss Python
trim_graph.invoke({"messages": [{"role": "user", "content": "I love Python programming"}]}, config)

# Exchange 3: Discuss teaching
trim_graph.invoke({"messages": [{"role": "user", "content": "I also enjoy teaching"}]}, config)

# Exchange 4: Request a poem
trim_graph.invoke({"messages": [{"role": "user", "content": "Write me a poem about NLP"}]}, config)

Building up conversation history...


{'messages': [AIMessage(content="That's great to hear, David! Python is a versatile and powerful programming language. What do you enjoy most about it? Are you working on any specific projects or learning something new?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 36, 'prompt_tokens': 35, 'total_tokens': 71, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaPnAD956YM9Q7I4j2eQMsE789gg5', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--3fb354e8-ece6-4168-9ddf-99b2d54a68e4-0', usage_metadata={'input_tokens': 35, 'output_tokens': 36, 'total_tokens': 71, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}),

### 3.5 - Test Memory After Trimming

In [25]:
# Now ask about early conversation - it will be trimmed out
response = trim_graph.invoke({"messages": [{"role": "user", "content": "What's my name?"}]}, config)
print(f"\nAfter 4 exchanges, asking: What's my name?")
print(f"Assistant: {response['messages'][-1].content}")


After 4 exchanges, asking: What's my name?
Assistant: I'm sorry, but I don't have access to personal information about users unless you share it with me during our conversation. If you'd like to tell me your name, I'd be happy to use it!


### 3.6 - Verify State Contents

In [26]:
# Get the current state to see what's actually stored
trim_state = trim_graph.get_state(config)
print(f"\nTotal messages in checkpoint: {len(trim_state.values['messages'])}")
print("\nActual messages stored:")

for i, msg in enumerate(trim_state.values['messages'], 1):
    print(f"  {i}. {msg.type}: {msg.content[:70]}...")


Total messages in checkpoint: 5

Actual messages stored:
  1. ai: That's wonderful! Teaching can be incredibly rewarding, especially whe...
  2. human: Write me a poem about NLP...
  3. ai: In the realm where words take flight,  
A dance of language, pure deli...
  4. human: What's my name?...
  5. ai: I'm sorry, but I don't have access to personal information about users...


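Key Takeaways:

1. Deletion from State:
   - RemoveMessage deletes messages from the thread's current state, so later turns never see them
   - Different from just reading fewer messages
   - Keeps the current state small, although earlier checkpoints in the thread's history still contain the removed messages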

2. Context Window Management:
   - Keeps conversation within token limits
   - Prevents "context length exceeded" errors
   - Makes responses faster (less to process)

3. Tradeoffs: Efficiency vs Memory:
   - Gains: Faster, cheaper, stays within limits
   - Cost: Loses conversation history, may forget important info

4. When to Use:
   - Long-running conversations
   - When recent context matters most
   - When you need strict token control
   - Customer support (recent issue more important than history)

5. Alternatives to Consider:
   - Summarization (covered on Thursday): Keeps a compressed history
   - External memory: Store full history, retrieve relevant parts
   - Smart trimming: Keep important messages, remove filler