# Add Memory to Chatbot

To make your application chat back and forth with users, you need to store prior conversations and relevant context. LLMs are stateless: each prompt is answered on its own, with no knowledge of earlier exchanges. To give the model historical information, you need a memory system that keeps track of previous conversations and context.

A simple way to build a chatbot memory system is to store and reuse the history of all chat interactions between the user and the model. The state of this system can be stored as a list of messages, updated by appending the new messages after each turn, and added to the prompt by inserting the list into it.

In [4]:
from langchain_core.prompts import ChatPromptTemplate
# from langchain_openai import ChatOpenAI
# from langchain_ollama import ChatOllama
from langchain_mistralai import ChatMistralAI
import os

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful assistant. Answer all questions to the best 
        of your ability."""),
    ("placeholder", "{messages}"),
])

if 'MISTRAL_API_KEY' not in os.environ:
    raise ValueError('MISTRAL_API_KEY required to work with ChatMistralAI')

# model = ChatOpenAI()
# model = ChatOllama(model='gemma3:latest', temperature=0)
model = ChatMistralAI(
    model='mistral-small-latest',
    temperature=0
)

chain = prompt | model

response = chain.invoke({
    "messages": [
        ("human","""Translate this sentence from English to French: I love 
            programming."""),
        ("ai", "J'adore programmer."),
        ("human", "What did you just say?"),
    ],
})
response.content

'I translated the sentence "I love programming" from English to French as "J\'adore programmer."'
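
The same idea can be sketched without any library. The snippet below is a minimal illustration, not LangChain code: `fake_model` and `chat_turn` are hypothetical names, and the "model" only reports how many messages it received, which makes it easy to see that the full history is passed on every turn.

```python
# Minimal sketch: keep chat history as a list of (role, text) tuples.
# `fake_model` is a hypothetical stand-in for a real chat model call.

def fake_model(messages):
    """Pretend LLM: reports how many messages it could see."""
    return f"I can see {len(messages)} message(s) in this conversation."

history = []  # the whole memory "state" is just this list

def chat_turn(user_text):
    # 1. Append the new user message to the stored history.
    history.append(("human", user_text))
    # 2. Send the full history (not just the last message) to the model.
    reply = fake_model(history)
    # 3. Append the model's reply so the next turn can see it too.
    history.append(("ai", reply))
    return reply

first = chat_turn("Translate 'I love programming' to French.")
second = chat_turn("What did you just say?")
print(first)
print(second)
```

On the second turn the stand-in model receives three messages: the first question, the first reply, and the new question.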

This approach to storing interactions comes with some challenges:
- You need to update memory atomically after each interaction.
- You'll have to store these memories in durable storage, such as a relational database (RDBMS).
- You have to control how many and which messages are stored for later use, and how many of those are used for new interactions.
- You'll want to inspect and modify state outside a call to an LLM.
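
To make these challenges concrete, here is a rough sketch of what a hand-rolled store might look like using Python's built-in `sqlite3` module. The table layout and the helper names `save_turn` and `load_recent` are assumptions made for illustration; they are not part of LangChain or LangGraph.

```python
import sqlite3

# Hypothetical schema: one row per message, ordered by an autoincrement id.
conn = sqlite3.connect(":memory:")  # use a file path for real durability
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "thread_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
)

def save_turn(thread_id, human_text, ai_text):
    """Write the human message and the AI reply in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
            [(thread_id, "human", human_text), (thread_id, "ai", ai_text)],
        )

def load_recent(thread_id, limit):
    """Return the most recent `limit` messages, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE thread_id = ? "
        "ORDER BY id DESC LIMIT ?",
        (thread_id, limit),
    ).fetchall()
    return list(reversed(rows))

save_turn("1", "hi, my name is Jack!", "Hi Jack!")
save_turn("1", "what is my name?", "Your name is Jack.")
print(load_recent("1", 2))
```

Even this small version has to deal with transactions, ordering, and limits by hand, which is the kind of work a dedicated persistence layer takes over.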

## LangGraph Intro

LangGraph was designed to enable developers to implement multiactor, multistep, stateful cognitive architectures called graphs.
By combining multiple LLMs, or LLMs with tools such as web search, you can build complex systems. An application with multiple actors needs a coordination layer to do these things:
- Define the actors involved and how they hand off work to each other (edges of that graph).
- Schedule execution of each actor at an appropriate time - in parallel if needed - with deterministic results.

When actors hand off work to one another, you need to make sense of the back-and-forth between them: the order in which things happen, how many times each actor is called, and so on. Communication across steps requires tracking state; otherwise, when you call the LLM actor a second time, you will get the same result as the first time. It's helpful to pull this state out of each actor and have all actors collaborate on updating a single central state. With a single central state, you can:
- snapshot and store the central state during or after each computation
- pause and resume execution, which makes it easy to recover from errors
- implement human-in-the-loop controls

Each graph is made up of the following components:
- State: The data received from outside the application, then modified and produced by the application while it's running.
- Nodes: Each step to be taken. Nodes are Python functions that receive the current state as input and can return an update to that state.
- Edges: The connections between nodes. Edges determine the path taken from the first node to the last, and they can be fixed or conditional.
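
The relationship between these three components can be illustrated with a toy runner in plain Python. This is only a sketch to build intuition and is not how LangGraph is implemented; the `START` and `END` strings, the node functions, and `run` are all hypothetical names.

```python
# Toy illustration only: this is NOT how LangGraph is implemented.
START, END = "__start__", "__end__"  # sentinel names used by this sketch

# Nodes: plain functions that take the state and return a partial update.
def greet(state):
    return {"log": state["log"] + ["greet"]}

def shout(state):
    return {"text": state["text"].upper(), "log": state["log"] + ["shout"]}

nodes = {"greet": greet, "shout": shout}

# Edges: a fixed edge maps a node name to the next node name;
# a conditional edge is a function that picks the next node from the state.
edges = {
    START: "greet",
    "greet": lambda state: "shout" if state["text"].endswith("!") else END,
    "shout": END,
}

def run(state):
    current = START
    while True:
        target = edges[current]
        current = target(state) if callable(target) else target
        if current == END:
            return state
        # Merge the node's update into the central state.
        state = {**state, **nodes[current](state)}

excited = run({"text": "hi!", "log": []})
calm = run({"text": "hi", "log": []})
print(excited)
print(calm)
```

The first run follows the conditional edge to `shout`; the second goes straight from `greet` to the end.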

LangGraph offers utilities to visualize these graphs and numerous features to debug their workings while developing them. These graphs can then be deployed to serve production workloads at scale. 

You can add LangGraph using `uv` with the command below.

```shell
uv add langgraph
```

### Create StateGraph

First, create `StateGraph`.

To define a graph, first define its state. The state definition consists of the shape (schema) of the graph state and reducer functions that specify how to apply updates to it. In this case, the state is a dictionary with a single key, `messages`. The `messages` key is annotated with the `add_messages` reducer function, which tells LangGraph to append new messages to the existing list.

State keys without an annotation are overwritten by each update, so they store the most recent value. You can also write your own reducer, which receives two arguments:
1. The current state
2. The next value being written to the state

It returns the next state, which is the result of merging the current state with the new value.
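
A custom reducer is just a two-argument function. The sketch below defines a hypothetical reducer, `keep_last_three`, and a simplified `apply_update` helper that mimics the merge rule described above (reducer for annotated keys, overwrite otherwise). The helper is an assumption for illustration only and is not LangGraph's actual merge code.

```python
from typing import Annotated, TypedDict, get_type_hints

def keep_last_three(current: list, new: list) -> list:
    """Hypothetical reducer: append new items, then keep only the last three."""
    return (current + new)[-3:]

class DemoState(TypedDict):
    recent: Annotated[list, keep_last_three]  # merged via the reducer
    topic: str                                # no reducer: overwritten

def apply_update(state: dict, update: dict) -> dict:
    """Simplified illustration of reducer-based merging (not LangGraph's code)."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types expose their extra arguments as __metadata__.
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata:
            merged[key] = metadata[0](state[key], value)
        else:
            merged[key] = value
    return merged

state = {"recent": ["a", "b"], "topic": "greetings"}
state = apply_update(state, {"recent": ["c", "d"], "topic": "names"})
print(state)
```

In a real graph, the same reducer would be attached to a state key in the same way `add_messages` is attached to `messages`.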


In [5]:
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


class State(TypedDict):
    # Messages have the type "list". The `add_messages` 
    # function in the annotation defines how this state should 
    # be updated (in this case, it appends new messages to the 
    # list, rather than replacing the previous messages)
    messages: Annotated[list, add_messages]

builder = StateGraph(State)

Every `node` you define will receive the current `State` as input and return a value that updates that state. Nodes represent units of work and are typically just functions. Next, add a `chatbot` node.

In [6]:
# from langchain_ollama import ChatOllama

# model = ChatOllama(
#     model = 'gemma3:latest',
#     temperature=0.1
# )
from langchain_mistralai import ChatMistralAI

model = ChatMistralAI(
    model = 'mistral-small-latest',
    temperature=0.1
)

def chatbot(state: State):
    answer = model.invoke(state['messages'])
    return {'messages': [answer]}

# The first argument is the unique node name
# The second argument is the function or Runnable to run
builder.add_node("chatbot", chatbot)

<langgraph.graph.state.StateGraph at 0x11a9fdc70>

This node receives the current state, calls the LLM, and returns an update to the state containing the new message produced by the LLM. Finally, add the edges.

In [7]:
builder.add_edge(START, 'chatbot')
builder.add_edge('chatbot', END)

graph = builder.compile()

The first edge tells the graph where to start its work each time you run it. The second tells the graph where it should exit (LangGraph will stop execution once there are no more nodes to run). Finally, `compile()` turns the graph into a runnable object with `invoke` and `stream` methods.

You can also draw a visual representation of the graph using the code below.

```python
graph.get_graph().draw_mermaid_png()
```

You can run the graph with the `stream()` method.

In [8]:
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage

input = {"messages": [HumanMessage('hi!')]}
for chunk in graph.stream(input):
    print(chunk)

{'chatbot': {'messages': [AIMessage(content='Hello! 😊 How can I help you today?', additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 5, 'total_tokens': 18, 'completion_tokens': 13}, 'model_name': 'mistral-small-latest', 'model': 'mistral-small-latest', 'finish_reason': 'stop'}, id='run--1a6cd2d2-3891-4945-a58e-a267fdcfae9e-0', usage_metadata={'input_tokens': 5, 'output_tokens': 13, 'total_tokens': 18})]}}


The graph has input of the same shape as the defined `State` object.

## Adding Memory to StateGraph

LangGraph has built-in persistence. To use it, recompile your graph with a checkpointer attached (a checkpointer is a storage adapter for LangGraph). LangGraph ships with a base class that you can subclass to create an adapter for your favorite database, along with several adapters maintained by LangChain:
- An in-memory adapter
- A SQLite adapter
- A Postgres adapter

In [9]:
from langgraph.checkpoint.memory import MemorySaver

graph = builder.compile(checkpointer=MemorySaver())

This returns a runnable object with the same methods, but now it stores the state at the end of each step, so every invocation after the first doesn't start from a blank state. Every time the graph is called, it starts by using the checkpointer to fetch the most recent saved state and combines the new input with the previous state.

In [10]:
thread1 = {"configurable": {"thread_id": "1"}}
result_1 = graph.invoke(
    { "messages": [HumanMessage("hi, my name is Jack!")] }, 
    thread1
)
result_2 = graph.invoke(
    { "messages": [HumanMessage("what is my name?")] }, 
    thread1
)

In LangGraph, each interaction belongs to a particular history of interactions called a thread. Threads are created automatically when first used. Threads allow an LLM application to serve multiple users with independent conversations that are never mixed up.

The first time the `chatbot` node is called, it receives a single message and returns another, and both are saved in the state. The second time you execute the graph on the same thread, the `chatbot` node is called with three messages.
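
The thread mechanism can be pictured as a dictionary keyed by `thread_id`. The following is a toy sketch under that assumption, not the `MemorySaver` implementation; `checkpoints`, `fake_chatbot`, and `invoke` are hypothetical names, and the stand-in chatbot only reports how many messages it received.

```python
# Toy sketch of thread-scoped persistence; all names here are hypothetical.
checkpoints = {}  # thread_id -> list of saved (role, text) messages

def fake_chatbot(messages):
    """Stand-in for the LLM node: reports how many messages it received."""
    return ("ai", f"received {len(messages)} message(s)")

def invoke(new_messages, config):
    thread_id = config["configurable"]["thread_id"]
    # Fetch the latest saved state for this thread (empty on first use).
    messages = checkpoints.get(thread_id, []) + new_messages
    messages = messages + [fake_chatbot(messages)]
    checkpoints[thread_id] = messages  # save the state after the step
    return messages

thread1 = {"configurable": {"thread_id": "1"}}
thread2 = {"configurable": {"thread_id": "2"}}

invoke([("human", "hi, my name is Jack!")], thread1)
second = invoke([("human", "what is my name?")], thread1)
other = invoke([("human", "hello")], thread2)
print(second[-1])
print(other[-1])
```

The second call on thread `"1"` sees three messages, while the first call on thread `"2"` sees only one, because each thread keeps its own history.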

You can inspect and update the state directly.

In [11]:
graph.get_state(thread1)

StateSnapshot(values={'messages': [HumanMessage(content='hi, my name is Jack!', additional_kwargs={}, response_metadata={}, id='c52cc6cc-61de-4187-9888-3f1a692b0634'), AIMessage(content='Hi Jack! Nice to meet you. 😊 How can I help you today?', additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 10, 'total_tokens': 29, 'completion_tokens': 19}, 'model_name': 'mistral-small-latest', 'model': 'mistral-small-latest', 'finish_reason': 'stop'}, id='run--b7b42cb5-a0f3-4dc8-b707-6f30b16f54c3-0', usage_metadata={'input_tokens': 10, 'output_tokens': 19, 'total_tokens': 29}), HumanMessage(content='what is my name?', additional_kwargs={}, response_metadata={}, id='86d9411c-b81e-48e3-951b-f76e52d265d3'), AIMessage(content='Your name is **Jack**—that’s what you told me! 😊\n\n(Unless you’d like to change it or clarify anything, of course!)', additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 36, 'total_tokens': 70, 'completion_tokens': 34}, 'mo

You can update the state like this.

In [12]:
graph.update_state(thread1, {'messages': [HumanMessage("I like LLMs!")]})

{'configurable': {'thread_id': '1',
  'checkpoint_ns': '',
  'checkpoint_id': '1f0e5b2d-0d37-6560-8005-80a66adc0a06'}}

The code above appended one message to the list of messages in the state.

In [13]:
graph.get_state(thread1)

StateSnapshot(values={'messages': [HumanMessage(content='hi, my name is Jack!', additional_kwargs={}, response_metadata={}, id='c52cc6cc-61de-4187-9888-3f1a692b0634'), AIMessage(content='Hi Jack! Nice to meet you. 😊 How can I help you today?', additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 10, 'total_tokens': 29, 'completion_tokens': 19}, 'model_name': 'mistral-small-latest', 'model': 'mistral-small-latest', 'finish_reason': 'stop'}, id='run--b7b42cb5-a0f3-4dc8-b707-6f30b16f54c3-0', usage_metadata={'input_tokens': 10, 'output_tokens': 19, 'total_tokens': 29}), HumanMessage(content='what is my name?', additional_kwargs={}, response_metadata={}, id='86d9411c-b81e-48e3-951b-f76e52d265d3'), AIMessage(content='Your name is **Jack**—that’s what you told me! 😊\n\n(Unless you’d like to change it or clarify anything, of course!)', additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 36, 'total_tokens': 70, 'completion_tokens': 34}, 'mo

## Modifying Chat History

You can modify chat history in three ways: trimming, filtering and merging messages.

### 1. Trimming Messages

LLMs have limited context windows, and the final prompt sent to the model must not exceed that limit. If you have a long chat history and want to load it into the prompt, you may need to trim messages. LangChain provides a built-in `trim_messages` helper that supports various strategies for doing this.

The example below keeps the most recent messages that fit within `max_tokens` by setting `strategy` to `"last"`.

In [19]:
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    trim_messages,
)
# from langchain_ollama import ChatOllama
from langchain_mistralai import ChatMistralAI

trimmer = trim_messages(
    max_tokens=25,
    strategy="last",
    # token_counter=ChatOllama(model="gemma3:latest"),
    token_counter=ChatMistralAI(model='mistral-small-latest'),
    include_system=True,
    allow_partial=False,
    start_on="human",
)

messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm bob"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="what's 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]

trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]

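The `strategy` controls whether to keep messages from the beginning or the end of the list. The `token_counter` is an LLM whose tokenizer is used to count tokens in a way appropriate to that model. Setting `include_system=True` keeps the system message, and `allow_partial=False` drops a message entirely rather than cutting it short. The `start_on` parameter makes the trimmed history begin with a `HumanMessage` (after the system message), so an `AIMessage` is never kept without the `HumanMessage` it responds to.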
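
To see the `"last"` strategy in isolation, here is a simplified plain-Python sketch. It is not the `trim_messages` implementation: `count_tokens` is a hypothetical counter that treats each whitespace-separated word as one token, and `trim_last` only approximates the behavior of `include_system=True`, `allow_partial=False`, and `start_on="human"`.

```python
def count_tokens(message):
    """Hypothetical counter: one token per whitespace-separated word."""
    return len(message[1].split())

def trim_last(messages, max_tokens):
    """Keep the system message plus the most recent messages within budget."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept = []
    # Walk backwards from the newest message, keeping whole messages only.
    for message in reversed(rest):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.insert(0, message)
        budget -= cost
    # Approximate start_on="human": drop leading messages until a human one.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return system + kept

messages = [
    ("system", "you're a good assistant"),
    ("human", "hi! I'm bob"),
    ("ai", "hi!"),
    ("human", "what's 2 + 2"),
    ("ai", "4"),
    ("human", "thanks"),
    ("ai", "no problem!"),
]
trimmed = trim_last(messages, max_tokens=9)
print(trimmed)
```

With a budget of 9, the reply `"4"` would fit, but it is dropped because the question it answers does not, so the trimmed history starts on the human message `"thanks"`.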

### 2. Filtering Messages

LangChain's `filter_messages` makes it easy to filter the chat history by type, ID or name.

In [20]:
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    filter_messages,
)

messages = [
    SystemMessage("you are a good assistant", id="1"),
    HumanMessage("example input", id="2", name="example_user"),
    AIMessage("example output", id="3", name="example_assistant"),
    HumanMessage("real input", id="4", name="bob"),
    AIMessage("real output", id="5", name="alice"),
]

filter_messages(messages, include_types="human")

[HumanMessage(content='example input', additional_kwargs={}, response_metadata={}, name='example_user', id='2'),
 HumanMessage(content='real input', additional_kwargs={}, response_metadata={}, name='bob', id='4')]

You can see that this keeps only the `HumanMessage` entries. The code below, on the other hand, excludes messages by name and by ID, and includes them by message type.

In [22]:
filter_messages(messages, exclude_names=["example_user", "example_assistant"])

filter_messages(
    messages, 
    include_types=[HumanMessage, AIMessage], 
    exclude_ids=["3"]
)

[HumanMessage(content='example input', additional_kwargs={}, response_metadata={}, name='example_user', id='2'),
 HumanMessage(content='real input', additional_kwargs={}, response_metadata={}, name='bob', id='4'),
 AIMessage(content='real output', additional_kwargs={}, response_metadata={}, name='alice', id='5')]

You can also use `filter_messages` declaratively: called without messages, it returns a runnable that you can compose with other components in a chain.

In [23]:
model = ChatMistralAI(model='mistral-small-latest')
filter_ = filter_messages(exclude_names=['example_user', 'example_assistant'])
chain = filter_ | model

### 3. Merging Consecutive Messages

LangChain's `merge_message_runs` makes it easy to merge consecutive messages of the same type.

In [24]:
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    merge_message_runs,
)

messages = [
    SystemMessage("you're a good assistant."),
    SystemMessage("you always respond with a joke."),
    HumanMessage(
        [{"type": "text", "text": "i wonder why it's called langchain"}]
    ),
    HumanMessage("and who is harrison chasing anyway"),
    AIMessage(
        '''Well, I guess they thought "WordRope" and "SentenceString" just 
        didn\'t have the same ring to it!'''
    ),
    AIMessage("""Why, he's probably chasing after the last cup of coffee in the 
        office!"""),
]

merge_message_runs(messages)

[SystemMessage(content="you're a good assistant.\nyou always respond with a joke.", additional_kwargs={}, response_metadata={}),
 HumanMessage(content=[{'type': 'text', 'text': "i wonder why it's called langchain"}, 'and who is harrison chasing anyway'], additional_kwargs={}, response_metadata={}),
 AIMessage(content='Well, I guess they thought "WordRope" and "SentenceString" just \n        didn\'t have the same ring to it!\nWhy, he\'s probably chasing after the last cup of coffee in the \n        office!', additional_kwargs={}, response_metadata={})]

In [25]:
model = ChatMistralAI(model='mistral-small-latest')
merger = merge_message_runs()
chain = merger | model
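
The core of merging can be sketched in a few lines with `itertools.groupby`. This is a simplified illustration that assumes messages are `(role, text)` tuples with plain string content; `merge_runs` is a hypothetical helper, and unlike `merge_message_runs` it does not handle content given as a list of blocks.

```python
from itertools import groupby

def merge_runs(messages):
    """Join the text of consecutive same-role messages with newlines."""
    merged = []
    # groupby yields one group per run of adjacent items with the same key.
    for role, run in groupby(messages, key=lambda message: message[0]):
        merged.append((role, "\n".join(text for _, text in run)))
    return merged

messages = [
    ("system", "you're a good assistant."),
    ("system", "you always respond with a joke."),
    ("human", "i wonder why it's called langchain"),
    ("human", "and who is harrison chasing anyway"),
    ("ai", "probably the last cup of coffee!"),
]
merged = merge_runs(messages)
for role, text in merged:
    print(role, repr(text))
```

Five input messages collapse into three, one per run of the same role.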