Let's build a chatbot with LangChain, LangGraph, and the Claude 3.5 Sonnet LLM on Amazon Bedrock that can hold a conversation and remember previous interactions. It follows the chatbot tutorial from the LangChain documentation:

[LangChain Chatbot](https://python.langchain.com/docs/tutorials/chatbot/)

Let's install the necessary libraries and configure our AWS credentials!

In [1]:
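# Quote the version specifier so the shell does not treat ">" as an output redirect.
# The langchain package itself is needed for init_chat_model below.
%pip install langchain langchain-core "langgraph>0.2.27" langchain-aws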

In [2]:
# Let's check the version of langchain
import langchain
print(langchain.__version__)

0.3.25


In [3]:
# RunnableWithMessageHistory is LangChain's wrapper for adding chat history to a chain.
# It is imported here but not used below; this notebook gets its memory from LangGraph's checkpointer instead.
from langchain_core.runnables import RunnableWithMessageHistory

In [5]:
# Ensure your AWS credentials are configured

from langchain.chat_models import init_chat_model

model = init_chat_model("anthropic.claude-3-5-sonnet-20240620-v1:0", model_provider="bedrock_converse")
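Before calling Bedrock, it can help to check that the environment actually exposes credentials. The sketch below only inspects a few standard AWS environment variables; `describe_aws_env` is a hypothetical helper written for this notebook, and it will report nothing if your credentials come from `~/.aws/credentials` or an attached role instead.

```python
import os

# Environment variables commonly consulted by the AWS SDK for credentials.
CREDENTIAL_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_PROFILE")
# Environment variables commonly consulted for the region.
REGION_VARS = ("AWS_REGION", "AWS_DEFAULT_REGION")


def describe_aws_env(env=None):
    """Report which AWS-related variables are set, without printing their values."""
    env = os.environ if env is None else env
    return {
        "credentials": [name for name in CREDENTIAL_VARS if env.get(name)],
        "region": [name for name in REGION_VARS if env.get(name)],
    }


print(describe_aws_env())
```

An empty `credentials` list is not necessarily an error, but it is a useful first thing to look at if `init_chat_model` or the first `invoke` fails with an authentication error.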

Let's first use the model directly. ChatModels are instances of LangChain "Runnables", which means they expose a standard interface for interacting with them. To call the model, we pass a list of messages to the .invoke method.

In [6]:
from langchain_core.messages import HumanMessage

model.invoke([HumanMessage(content="Hi! My name is Surya Komandooru.")])

AIMessage(content="Hello Surya Komandooru! It's nice to meet you. How can I assist you today? Is there anything specific you'd like to know or discuss?", additional_kwargs={}, response_metadata={'ResponseMetadata': {'RequestId': 'a0636207-277c-48f4-91c1-a0edf7c1cb7a', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 01 Jun 2025 03:33:36 GMT', 'content-type': 'application/json', 'content-length': '314', 'connection': 'keep-alive', 'x-amzn-requestid': 'a0636207-277c-48f4-91c1-a0edf7c1cb7a'}, 'RetryAttempts': 0}, 'stopReason': 'end_turn', 'metrics': {'latencyMs': [1188]}, 'model_name': 'anthropic.claude-3-5-sonnet-20240620-v1:0'}, id='run--9687478e-4c44-4af3-8ef1-ef24973d5503-0', usage_metadata={'input_tokens': 21, 'output_tokens': 39, 'total_tokens': 60, 'input_token_details': {'cache_creation': 0, 'cache_read': 0}})

The model on its own does not have any concept of state. For example, if you ask a follow-up question:

In [7]:
model.invoke([HumanMessage(content="Can you tell me what my name is?")])

AIMessage(content="I apologize, but I don't have access to your personal information, including your name. As an AI language model, I don't have any information about the individuals I interact with unless it's provided within the conversation. If you'd like, you can tell me your name, and I'll be happy to use it in our conversation.", additional_kwargs={}, response_metadata={'ResponseMetadata': {'RequestId': '57b6e1ef-7f79-488e-af09-d47759d45040', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 01 Jun 2025 03:33:38 GMT', 'content-type': 'application/json', 'content-length': '498', 'connection': 'keep-alive', 'x-amzn-requestid': '57b6e1ef-7f79-488e-af09-d47759d45040'}, 'RetryAttempts': 0}, 'stopReason': 'end_turn', 'metrics': {'latencyMs': [1806]}, 'model_name': 'anthropic.claude-3-5-sonnet-20240620-v1:0'}, id='run--1754f5cd-da8e-4144-8de2-db67a6723974-0', usage_metadata={'input_tokens': 16, 'output_tokens': 72, 'total_tokens': 88, 'input_token_details': {'cache_creation': 0, 'cac

We can see that it doesn't take the previous conversation turn into account and cannot answer the question. This makes for a terrible chatbot experience!

To get around this, we need to pass the entire conversation history into the model. Let's see what happens when we do that:

In [8]:
from langchain_core.messages import AIMessage

model.invoke(
    [
        HumanMessage(content="Hi! My name is Surya Komandooru."),
        AIMessage(content="Hello Surya Komandooru! It's nice to meet you. How can I assist you today? Is there anything specific you'd like to know or discuss?"),
        HumanMessage(content="Can you tell me what my name is?"),
    ]
)

AIMessage(content='Your name is Surya Komandooru, as you introduced yourself at the beginning of our conversation.', additional_kwargs={}, response_metadata={'ResponseMetadata': {'RequestId': 'ff6b5124-3c92-4124-8c7e-4091235d1575', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 01 Jun 2025 03:33:38 GMT', 'content-type': 'application/json', 'content-length': '276', 'connection': 'keep-alive', 'x-amzn-requestid': 'ff6b5124-3c92-4124-8c7e-4091235d1575'}, 'RetryAttempts': 0}, 'stopReason': 'end_turn', 'metrics': {'latencyMs': [683]}, 'model_name': 'anthropic.claude-3-5-sonnet-20240620-v1:0'}, id='run--d64ab3fe-117a-4e54-9d8c-d9ee7ab5db2e-0', usage_metadata={'input_tokens': 72, 'output_tokens': 26, 'total_tokens': 98, 'input_token_details': {'cache_creation': 0, 'cache_read': 0}})

In [9]:
from langchain_core.messages import AIMessage

model.invoke(
    [
        HumanMessage(content="Hi! My name is Surya Komandooru."),
        AIMessage(content="Hello Surya Komandooru! It's nice to meet you. How can I assist you today? Is there anything specific you'd like to know or discuss?"),
        HumanMessage(content="Can you tell me what my last name is? Can you assist me with adding 1+1?"),
    ]
)

AIMessage(content="Certainly! I'd be happy to help you with both questions.\n\n1. Your last name, as you've provided, is Komandooru.\n\n2. Regarding the addition:\n   1 + 1 = 2\n\nThis is a basic arithmetic operation. The sum of 1 and 1 is always 2.\n\nIs there anything else you'd like assistance with?", additional_kwargs={}, response_metadata={'ResponseMetadata': {'RequestId': '99c7a0da-544b-4e7a-baee-23537aea443a', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 01 Jun 2025 03:33:41 GMT', 'content-type': 'application/json', 'content-length': '467', 'connection': 'keep-alive', 'x-amzn-requestid': '99c7a0da-544b-4e7a-baee-23537aea443a'}, 'RetryAttempts': 0}, 'stopReason': 'end_turn', 'metrics': {'latencyMs': [2175]}, 'model_name': 'anthropic.claude-3-5-sonnet-20240620-v1:0'}, id='run--9f57dae4-68cf-45bc-b02e-1be1fc6a9c3c-0', usage_metadata={'input_tokens': 85, 'output_tokens': 89, 'total_tokens': 174, 'input_token_details': {'cache_creation': 0, 'cache_read': 0}})

And now we can see that we got good responses!
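Doing this by hand means keeping the list ourselves and appending every turn before the next call. A minimal sketch of that bookkeeping, using plain tuples in place of HumanMessage/AIMessage and a hypothetical `stub_model` in place of the Bedrock model (so it runs without credentials):

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text); a stand-in for HumanMessage/AIMessage


def chat_turn(
    history: List[Message],
    user_text: str,
    model_fn: Callable[[List[Message]], str],
) -> str:
    """Append the user turn, call the model on the full history, store the reply."""
    history.append(("human", user_text))
    reply = model_fn(history)
    history.append(("ai", reply))
    return reply


def stub_model(messages: List[Message]) -> str:
    """Hypothetical model: reports how many messages it was shown."""
    return f"I can see {len(messages)} message(s)."


history: List[Message] = []
first = chat_turn(history, "Hi! My name is Surya.", stub_model)
second = chat_turn(history, "Can you tell me what my name is?", stub_model)
print(first)
print(second)
```

The second call sees the first exchange as well as the new question, which is exactly what the model needs in order to answer it.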

LangGraph implements a built-in persistence layer, making it ideal for chat applications that support multiple conversational turns.

Wrapping our chat model in a minimal LangGraph application allows us to automatically persist the message history, simplifying the development of multi-turn applications.

LangGraph comes with a simple in-memory checkpointer, which we use below.

In [10]:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# Define a new graph
workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": response}


# Define the (single) node in the graph
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

# Add memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

We now need to create a config that we pass into the runnable every time. This config contains information that is not part of the input directly, but is still useful. In this case, we want to include a thread_id. This should look like:

In [11]:
config = {"configurable": {"thread_id": "abc123"}}

This enables us to support multiple conversation threads with a single application, a common requirement when your application has multiple users.
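Conceptually, the thread_id acts as a key under which the conversation is stored. The toy store below illustrates only that idea; `ThreadStore` is a hypothetical class for this sketch and is not how LangGraph's MemorySaver is implemented.

```python
from collections import defaultdict
from typing import Dict, List


class ThreadStore:
    """Toy per-thread message store; NOT LangGraph's MemorySaver."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[str]] = defaultdict(list)

    def append(self, config: dict, message: str) -> List[str]:
        """Add a message to the thread named in the config and return its history."""
        thread_id = config["configurable"]["thread_id"]
        self._threads[thread_id].append(message)
        return list(self._threads[thread_id])


store = ThreadStore()
thread_a = {"configurable": {"thread_id": "abc123"}}
thread_b = {"configurable": {"thread_id": "abc345"}}

store.append(thread_a, "Hi! I'm Surya.")
history_a = store.append(thread_a, "What's my name?")
history_b = store.append(thread_b, "What's my name?")
print(history_a)
print(history_b)
```

The two threads never see each other's messages, which is the property that lets one application serve many users.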

We can then invoke the application:

In [12]:
query = "Hi! I'm Surya."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()  # output contains all messages in state


Hello Surya! It's nice to meet you. How can I assist you today? Feel free to ask me any questions or let me know if there's anything you'd like to discuss or need help with.


In [13]:
query = "What's my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Your name is Surya. You introduced yourself at the beginning of our conversation.


Great! Our chatbot now remembers things about us. If we change the config to reference a different thread_id, we can see that it starts the conversation fresh.

In [15]:
config = {"configurable": {"thread_id": "abc345"}}

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


I apologize, but I don't have any information about your name or personal details. As an AI language model, I don't have access to personal information about individual users. Each conversation starts fresh, and I don't retain information from previous interactions. If you'd like me to use a specific name for you during our conversation, you're welcome to let me know what you'd prefer to be called.


However, we can always go back to the original conversation, since the in-memory checkpointer still holds it for as long as this process is running:

In [16]:
config = {"configurable": {"thread_id": "abc123"}}

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Your name is Surya. You told me this when you first introduced yourself.


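This is how we can support a chatbot having conversations with many users!

For async support, make the call_model node an async function and use .ainvoke when invoking the application. Note that the graph below is compiled with a fresh MemorySaver, so the history saved earlier for thread abc123 is not available and the model no longer knows the name: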

In [17]:
# Async function for node:
async def call_model(state: MessagesState):
    response = await model.ainvoke(state["messages"])
    return {"messages": response}


# Define graph as before:
workflow = StateGraph(state_schema=MessagesState)
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)
app = workflow.compile(checkpointer=MemorySaver())

# Async invocation:
output = await app.ainvoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


I apologize, but I don't have access to your personal information, including your name. As an AI language model, I don't have any prior context or stored information about individual users. Each conversation starts fresh, and I don't retain information from previous interactions. If you'd like me to use a specific name for you during our conversation, you can let me know, and I'll be happy to address you by that name.
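Async nodes pay off mainly when several requests are in flight at once. The sketch below shows that pattern with `asyncio.gather`; `fake_ainvoke` is a hypothetical stand-in for an awaitable model call, so no Bedrock access is assumed.

```python
import asyncio


async def fake_ainvoke(thread_id: str, text: str) -> str:
    """Hypothetical stand-in for an awaitable model call."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"[{thread_id}] reply to: {text}"


async def serve_many() -> list:
    # gather() runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(
        fake_ainvoke("abc123", "Hi! I'm Surya."),
        fake_ainvoke("abc345", "What's my name?"),
    )


replies = asyncio.run(serve_many())
print(replies)
```

Inside a notebook an event loop is already running, so there you would write `replies = await serve_many()` instead of calling `asyncio.run`.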


Right now, all we've done is add a simple persistence layer around the model. We can start to make the chatbot more complicated and personalized by adding in a prompt template.

Prompt Templates help to turn raw user information into a format that the LLM can work with. In this case, the raw user input is just a message, which we are passing to the LLM. Let's now make that a bit more complicated. First, let's add in a system message with some custom instructions (but still taking messages as input). Next, we'll add in more input besides just the messages.

To add in a system message, we will create a ChatPromptTemplate. We will utilize MessagesPlaceholder to pass all the messages in.

In [18]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You talk like a pirate. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)
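To make the template's role concrete, here is a rough plain-Python imitation of what it produces: the system text is filled in first, and the conversation is spliced in where the placeholder sits. `render_prompt` is a hypothetical helper for illustration only, not the ChatPromptTemplate implementation.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text)


def render_prompt(
    system_template: str, messages: List[Message], **variables: str
) -> List[Message]:
    """Fill the system template, then splice the conversation in after it."""
    system_text = system_template.format(**variables)
    return [("system", system_text), *messages]


rendered = render_prompt(
    "You talk like a pirate. Answer all questions to the best of your ability.",
    [("human", "Hi! I'm Jim.")],
)
for role, text in rendered:
    print(f"{role}: {text}")
```

The model therefore always sees the system instruction first, followed by whatever messages are currently in the state.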

We can now update our application to incorporate this template:

In [19]:
workflow = StateGraph(state_schema=MessagesState)


def call_model(state: MessagesState):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": response}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

We invoke the application in the same way:

In [None]:
config = {"configurable": {"thread_id": "abc678"}}
query = "Hi! I'm Jim."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()

In [21]:
config = {"configurable": {"thread_id": "abc678"}}
query = "Hi! I'm Pan."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Shiver me timbers! First it be Jim, now it be Pan? Ye be changin' names quicker than a chameleon changes colors on the poop deck! But no matter, Pan it is then! 

Ahoy there, Pan! I be right pleased to meet ye, ye scurvy dog! What sort of mischief or adventure be ye seekin' on this fine day? Be ye ready to sail the seven seas, or are ye just lookin' to wet yer whistle with some tales from a salty old sea dog? Speak up, matey, for these old ears have weathered many a storm!


In [22]:
query = "What is my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Blimey! Ye've got me in a right pickle, ye have! Let me consult me weathered old memory...

Arr, last ye said, yer name be Pan. But ye started off callin' yerself Jim, ye did. So unless ye be some sort of shapeshiftin' sea witch, I reckon Pan be the name ye want me to use.

But I'll tell ye true, matey - if ye can't keep track o' yer own name, ye might want to have it tattooed on yer arm like a proper pirate! That way, ye'll always know who ye be, even after a barrel o' grog!

So, Pan (or Jim, or whoever ye be), what say ye? Have I got it right, or have ye changed yer name again while I was flappin' me gums?


Awesome! Let's now make our prompt a little bit more complicated. Let's assume that the prompt template now looks something like this:

In [32]:
prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability in {language}.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

Note that we have added a new language input to the prompt. Our application now has two parameters: the input messages and the language. We should update our application's state to reflect this:

In [33]:
from typing import Sequence

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict


class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    language: str


workflow = StateGraph(state_schema=State)


def call_model(state: State):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

In [38]:
config = {"configurable": {"thread_id": "abc456"}}
query = "Hi! My name is Surya. I like to watch kdramas and listen to k-pop."
language = "Korean Roman Transcript"

input_messages = [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)
output["messages"][-1].pretty_print()


annyeonghaseyo, Surya ssi! keideuramawa keipapeul joahaseundani jeongmal jotsumnida.

keideuramaeseo eotteon jakpumeul gajang joahaseyo? yojeum ingi issneun deuramadeul jungeseoneun "Deo Geullori", "Itaewon Keullasseu", "Seungriui Jogeon" deungi isseumnida.

keipap bunya-eseo-neun eotteon atiseucheu-na geurupeul juro deushinayo? BTS, BLACKPINK, TWICE, EXO gateun yumyeong geurupdeulbuteo NewJeans, IVE gateun sinyae geurupdeul kkaji dayanghan atiseudeuri issjyo.

hanguk munhwae gwansimi maneusin geot gataseo, jeongmal jotseumnida. han-eul deo baeugo sipeusin saenggagi isseumyeon malsseumhae juseyo. jeoneun dangsinui gwansimeul deureoseo gippeumnida!


Note that the entire state is persisted, so we can omit parameters like language if no changes are desired:

In [39]:
query = "What is my name?"

input_messages = [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages},
    config,
)
output["messages"][-1].pretty_print()


Dangsin-ui ireum-eun Surya imnida.
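The reason language could be omitted is that each invocation is merged into the saved state rather than replacing it. A simplified sketch of that merge, assuming messages are appended and every other key is overwritten only when supplied; `merge_state` is hypothetical, and the real add_messages reducer also handles message IDs, which this sketch ignores.

```python
from typing import Any, Dict


def merge_state(saved: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Append new messages; overwrite other keys only when the update provides them."""
    merged = dict(saved)
    for key, value in update.items():
        if key == "messages":
            merged["messages"] = list(saved.get("messages", [])) + list(value)
        else:
            merged[key] = value
    return merged


saved = {"messages": ["Hi! My name is Surya."], "language": "Korean Roman Transcript"}
state = merge_state(saved, {"messages": ["What is my name?"]})
print(state["language"])
print(state["messages"])
```

Because the update carried no language key, the value from the earlier turn is still in the state when the prompt is rendered.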


One important concept to understand when building chatbots is how to manage conversation history. If left unmanaged, the list of messages will grow unbounded and potentially overflow the context window of the LLM. Therefore, it is important to add a step that limits the size of the messages you are passing in.

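**Importantly, you will want to do this BEFORE the prompt template but AFTER you load the previous messages from the persisted history.**

We can do this by adding a simple step in front of the prompt that trims the messages key. LangChain's trim_messages helper does this. Below, it enforces a token budget (max_tokens), keeps the most recent messages (strategy="last"), always keeps the system message (include_system=True), never cuts a message in half (allow_partial=False), and makes the kept history start on a human message (start_on="human").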

In [49]:
from langchain_core.messages import SystemMessage, trim_messages

trimmer = trim_messages(
    max_tokens=65,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human",
)

messages = [
    SystemMessage(content="you're a smart and helpful assistant"),
    HumanMessage(content="hi! I'm Surya"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like mint ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="what does 4 + 4 = ?"),
    AIMessage(content="8"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]

trimmer.invoke(messages)

[SystemMessage(content="you're a smart and helpful assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content="hi! I'm Surya", additional_kwargs={}, response_metadata={}),
 AIMessage(content='hi!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like mint ice cream', additional_kwargs={}, response_metadata={}),
 AIMessage(content='nice', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='what does 4 + 4 = ?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='8', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]
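The idea behind strategy="last" can be sketched in plain Python. This is a simplified imitation under stated assumptions, not the trim_messages implementation: `count_tokens` is a hypothetical counter that treats each whitespace-separated word as one token, whereas the trimmer above asks the model to count.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text)


def count_tokens(message: Message) -> int:
    """Hypothetical counter: one token per whitespace-separated word."""
    return len(message[1].split())


def trim_last(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep the system message plus the most recent messages that fit the budget."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m) for m in system)

    kept: List[Message] = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.insert(0, message)
        budget -= cost

    # Mirror start_on="human": drop leading messages until a human turn.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return system + kept


history = [
    ("system", "you're a smart and helpful assistant"),
    ("human", "hi! I'm Surya"),
    ("ai", "hi!"),
    ("human", "what does 4 + 4 = ?"),
    ("ai", "8"),
    ("human", "thanks"),
    ("ai", "no problem!"),
]
trimmed = trim_last(history, max_tokens=16)
for role, text in trimmed:
    print(f"{role}: {text}")
```

With this small budget only the system message and the final human/AI exchange survive, so the introduction containing the name is no longer part of what the model would see.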

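In the trimmer.invoke(messages) output, every message is returned, which means this short history fits within the 65-token budget as counted in that run. With a smaller max_tokens, the oldest messages after the system message are dropped first.

To use it in our chain, we just need to run the trimmer before we pass the messages input to our prompt.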

In [50]:
workflow = StateGraph(state_schema=State)


def call_model(state: State):
    trimmed_messages = trimmer.invoke(state["messages"])
    prompt = prompt_template.invoke(
        {"messages": trimmed_messages, "language": state["language"]}
    )
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

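Whenever the trimmer drops the early messages, the model can no longer answer questions about them. The output below was recorded in an earlier run (note the cell number) in which the introduction had apparently been trimmed away, so the model does not know our name: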

In [47]:
config = {"configurable": {"thread_id": "abc567"}}
query = "What is my name?"
language = "English"

input_messages = messages + [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)
output["messages"][-1].pretty_print()


I apologize for the confusion in my previous response. I made a mistake. The truth is, I don't actually know your name. You haven't told me your name in our conversation, and I don't have any information about your identity. If you'd like, you can tell me your name and I'll remember it for our conversation. Otherwise, I'm happy to continue chatting without knowing your name.


But if we ask about information that is within the last few messages, it remembers:

In [51]:
config = {"configurable": {"thread_id": "abc678"}}
query = "What math problem did I ask?"
language = "English"

input_messages = messages + [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)
output["messages"][-1].pretty_print()


You asked "what does 4 + 4 = ?", which is a basic addition problem.


Yay! We now have a functioning chatbot!