# Build a chatbot with memory

If you build a chatbot with LangChain, there are several ways to give it `memory`:
- RunnableWithMessageHistory: see the [API reference](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html) for details and the [chatbot tutorial](https://python.langchain.com/v0.2/docs/tutorials/chatbot/) for a walkthrough.
- BaseChatMessageHistory
- LangGraph persistence: https://langchain-ai.github.io/langgraph/concepts/persistence/


In this notebook, we will use LangGraph persistence to persist the conversation memory.

**Do not create a folder named `langchain` in your project. Otherwise you will get the error `module 'langchain' has no attribute 'verbose'`.**

In [2]:
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage, AIMessage

## 1. Build a basic chatbot

In this step, we build a basic chat model with `langchain_ollama`. By default, the LLM has no memory of previous turns. To make it aware of the earlier conversation, we must include the previous messages in the context of the current query.


In [3]:
model_name = "steamdj/llama3.1-cpu-only"
model = ChatOllama(model=model_name, temperature=0)

In [4]:
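name = "pengfei"
hm1 = f"Hi! I'm {name}!"  # first human message
am1 = f"Hello {name}! How can I assist you today?"  # scripted AI reply, used later to build a history by hand
hm2 = "What's my name?"  # follow-up question that can only be answered with memory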

In [5]:
model.invoke([HumanMessage(content=f"{hm1}")])

AIMessage(content="Nice to meet you, Pengfei! How's your day going so far? Is there something on your mind that you'd like to chat about or is this just a friendly hello?", additional_kwargs={}, response_metadata={'model': 'steamdj/llama3.1-cpu-only', 'created_at': '2025-01-21T09:40:50.989692469Z', 'done': True, 'done_reason': 'stop', 'total_duration': 88136500980, 'load_duration': 34434082812, 'prompt_eval_count': 18, 'prompt_eval_duration': 5332000000, 'eval_count': 39, 'eval_duration': 48328000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-ad10f293-e961-42d5-ac2c-d899de42e336-0', usage_metadata={'input_tokens': 18, 'output_tokens': 39, 'total_tokens': 57})

In [6]:
model.invoke([HumanMessage(content=f"{hm2}")])

AIMessage(content="I'm happy to chat with you, but I don't actually know your name. This is the beginning of our conversation, and we haven't had a chance to discuss it yet! If you'd like to share your name with me, I'd be delighted to learn it.", additional_kwargs={}, response_metadata={'model': 'steamdj/llama3.1-cpu-only', 'created_at': '2025-01-21T09:41:18.168121829Z', 'done': True, 'done_reason': 'stop', 'total_duration': 24299335557, 'load_duration': 346888718, 'prompt_eval_count': 15, 'prompt_eval_duration': 2397000000, 'eval_count': 57, 'eval_duration': 21532000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-3ed26120-12e3-45bd-864f-5375dd7d3681-0', usage_metadata={'input_tokens': 15, 'output_tokens': 57, 'total_tokens': 72})

As you can see, the model has no memory of our conversation. To give the LLM memory, we need to build the conversation history manually and pass it along with each call.

In [11]:
model.invoke(
    [
        HumanMessage(content=hm1),
        AIMessage(content=am1),
        HumanMessage(content=hm2),
    ]
)

AIMessage(content='Your name is Pengfei!', additional_kwargs={}, response_metadata={'model': 'steamdj/llama3.1-cpu-only', 'created_at': '2025-01-21T09:02:30.620387429Z', 'done': True, 'done_reason': 'stop', 'total_duration': 46436931813, 'load_duration': 17402189426, 'prompt_eval_count': 45, 'prompt_eval_duration': 24298000000, 'eval_count': 8, 'eval_duration': 3649000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-3c110ec0-1721-42b7-8318-4bd55d6dc44b-0', usage_metadata={'input_tokens': 45, 'output_tokens': 8, 'total_tokens': 53})

Now we can see that the model gives the correct response!
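Building the history by hand can be wrapped in a small helper that appends every turn to a running list. The following is only a minimal sketch and not part of LangChain: `chat_turn` and `stub_model` are hypothetical names, messages are plain `(role, text)` tuples, and the stub stands in for `model.invoke` so the example runs without an Ollama server.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text), e.g. ("human", "Hi!")


def stub_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for an LLM: it only 'knows' what is in `messages`."""
    prefix = "Hi! I'm "
    for role, text in messages:
        if role == "human" and text.startswith(prefix):
            name = text[len(prefix):].rstrip("!")
            return f"Your name is {name}!"
    return "I don't know your name."


def chat_turn(
    invoke: Callable[[List[Message]], str],
    history: List[Message],
    user_text: str,
) -> str:
    """Append the user message, call the model on the full history, store the reply."""
    history.append(("human", user_text))
    reply = invoke(history)
    history.append(("ai", reply))
    return reply


# Without any history, the model cannot answer.
no_memory_reply = stub_model([("human", "What's my name?")])

# With an accumulated history, it can.
history: List[Message] = []
chat_turn(stub_model, history, "Hi! I'm pengfei!")
with_memory_reply = chat_turn(stub_model, history, "What's my name?")

print(no_memory_reply)
print(with_memory_reply)
```

In this sketch the caller has to own the `history` list and pass it to every call, which quickly becomes tedious once there are several conversations.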

This is the basic idea underpinning a chatbot's ability to interact conversationally. So how do we best implement this?

## 2. Message persistence

[LangGraph](https://langchain-ai.github.io/langgraph/) is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. It implements a built-in persistence layer, making it ideal for chat applications that support multiple conversational turns.

You can visit their [PyPI page](https://pypi.org/project/langgraph/) or [GitHub page](https://github.com/langchain-ai/langgraph) for more information.

In this tutorial, we will embed our chat model into a minimal LangGraph application which allows us to automatically persist the message history, simplifying the development of multi-turn applications.

LangGraph comes with a simple in-memory checkpointer, which we use below. See its documentation for more detail, including how to use different persistence backends (e.g., SQLite or Postgres).

In [7]:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# Define a new graph
workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": response}


# Define the (single) node in the graph
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

# Add memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

In [8]:

# thread_id identifies the conversation whose history is loaded and saved
config1 = {"configurable": {"thread_id": "abc123"}}

In [9]:
# check the query message
print(f"query message: {hm1}")

input_messages = [HumanMessage(hm1)]
output = app.invoke({"messages": input_messages}, config1)
output["messages"][-1].pretty_print()  # output contains all messages in state


Nice to meet you, Pengfei! How's your day going so far? Is there something on your mind that you'd like to chat about or is this just a friendly hello?


In [10]:
# check the query message
print(f"query message: {hm2}")

# now let's check if the chatbot remembers my name or not
input_messages = [HumanMessage(hm2)]
output = app.invoke({"messages": input_messages}, config1)
output["messages"][-1].pretty_print()


query message: What's my name?

Your name is Pengfei!


In [11]:
# check the query message
print(f"query message: {hm2}")

# define a new config with a different thread id
config2 = {"configurable": {"thread_id": "abc234"}}

# now let's check if the chatbot remembers my name or not
input_messages = [HumanMessage(hm2)]
output = app.invoke({"messages": input_messages}, config2)
output["messages"][-1].pretty_print()

query message: What's my name?

I'm happy to chat with you, but I don't actually know your name. This is the beginning of our conversation, and we haven't had a chance to discuss it yet! If you'd like to share your name with me, I'd be delighted to learn it.


Notice that with a new config, the conversation memory starts from zero, because its thread id differs from the one in config1. Now let's check whether the memory is still there with config1.

In [12]:
# check the query message
print(f"query message: {hm2}")

# now let's check if the chatbot remembers my name or not
input_messages = [HumanMessage(hm2)]
output = app.invoke({"messages": input_messages}, config1)
output["messages"][-1].pretty_print()

query message: What's my name?

Your name is Pengfei! (I remember from earlier!)


Notice that the chatbot still remembers my name when given the right thread id. This shows that the conversation history is persisted per thread by the checkpointer (here, the in-memory `MemorySaver`).
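To make the role of `thread_id` concrete, here is a toy illustration of thread-scoped persistence. It is not how LangGraph's checkpointer is implemented: `ToyCheckpointer` and its methods are hypothetical names, and a plain dictionary keyed by thread id stands in for the storage backend.

```python
from typing import Dict, List


class ToyCheckpointer:
    """Hypothetical in-memory store: one independent message list per thread id."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[str]] = {}

    def append(self, config: dict, message: str) -> None:
        thread_id = config["configurable"]["thread_id"]
        # Create the thread's list on first use, then add the message.
        self._threads.setdefault(thread_id, []).append(message)

    def history(self, config: dict) -> List[str]:
        thread_id = config["configurable"]["thread_id"]
        # Return a copy so callers cannot mutate the stored history by accident.
        return list(self._threads.get(thread_id, []))


store = ToyCheckpointer()
config1 = {"configurable": {"thread_id": "abc123"}}
config2 = {"configurable": {"thread_id": "abc234"}}

store.append(config1, "Hi! I'm pengfei!")
store.append(config1, "What's my name?")
store.append(config2, "What's my name?")

print(store.history(config1))  # both messages of the first thread
print(store.history(config2))  # only the single message of the second thread
```

Each thread id maps to its own history, so a question asked under `config2` never sees the introduction that was stored under `config1`.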

## 3. Async support