# Introduction

## Build a Conversational Web Search ReAct Agent with NVIDIA NIMs

This playbook demonstrates how to enable tool calling with memory on the Llama 3.1 70B Instruct model using LangGraph and Tavily. Model inference runs through the NVIDIA NIMs API. We first implement each component on its own: persistent memory in LangGraph, then a ReAct agent with only the Tavily web search tool. Finally, we combine both into a ReAct agent with conversational memory and web search capabilities.

**Notebook Goals:**

Build a ReAct agent with:
1. Conversational Memory (remembers context, names, roles)
2. Web Search Tool (real-time search using Tavily)
3. NVIDIA Llama 3.1 70B Instruct for inference
4. LangGraph to orchestrate reasoning, memory, and tool use


## Key Components and Why They're Used

| Component              | Purpose                                                        |
|------------------------|----------------------------------------------------------------|
| LangGraph              | Orchestrates state transitions, memory, and tool use           |
| MemorySaver            | Saves state so the agent remembers conversation                |
| TavilySearchResults    | Real-time web search API for factual augmentation              |
| create_react_agent()   | Prebuilt function to create a LangGraph-powered ReAct agent with memory + tools |
| ChatNVIDIA             | Calls NVIDIA's Llama 3.1 70B via API to generate responses     |


# Setup

## Instructions to Configure Jupyter Notebook Environment

**Step 1: Set Up Your Accounts and API Keys (free)**

- Generate an NVIDIA NIMs API Key: [Log in to your account, navigate to your account settings, and generate an API key](https://build.nvidia.com/meta/llama-3_1-70b-instruct).
- Generate a LangSmith API Key: [Log into your account and generate an API key](https://smith.langchain.com/settings)
- Generate a Tavily Web Search Key: [Log into your account and generate an API key](https://app.tavily.com/)

**Step 2: Download the packages**

- Create a Jupyter notebook kernel with the required packages (listed in `requirements.txt`)

For running the notebook locally:

In [2]:
# !pip install --upgrade pip

In [2]:
!pip install -r requirements.txt


For running the notebook on Colab:

In [1]:
# Quote the version specifier so the shell does not treat ">" as a redirect.
# !pip install "langgraph>=0.2.0" langchain==0.3.18 langchain-community==0.3.17 langchain-core==0.3.49 langchain-nvidia-ai-endpoints==0.3.9 langchain-text-splitters==0.3.6



In [1]:
!ipython kernel install --user --name=venv_jup

Installed kernelspec venv_jup in /Users/jewel/Library/Jupyter/kernels/venv_jup


Import necessary packages

In [1]:
import os
import json
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

from langchain_community.tools.tavily_search import TavilySearchResults
from langgraph.prebuilt import create_react_agent

Load the API keys for:
1. NVIDIA NIMs
2. LangSmith (for logging and dashboards)
3. Tavily (web search)

In [2]:
# Never commit a real key to a shared notebook; replace the placeholder locally.
os.environ["NVIDIA_API_KEY"] = "<INSERT NVIDIA API KEY>"

With tracing enabled on this endpoint, the conversation history can be viewed in the LangSmith dashboard.

In [1]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGSMITH_API_KEY"] = "<INSERT LANGSMITH API KEY>"
os.environ["LANGCHAIN_PROJECT"] = "<INSERT ANY PROJECT NAME>"

os.environ["TAVILY_API_KEY"] = "<INSERT TAVILY API KEY>"
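
Hard-coded keys are easy to leak when a notebook is shared. As one possible alternative, the sketch below prompts for a key only when it is not already set in the environment. The helper name `ensure_env` and its `prompt` parameter are hypothetical and not part of any library used in this notebook.

```python
import getpass
import os


def ensure_env(name, prompt=getpass.getpass):
    """Return os.environ[name], asking for it interactively only if it is missing.

    `ensure_env` is a hypothetical helper for this sketch. `prompt` is injectable
    so the function can be exercised without a terminal.
    """
    if not os.environ.get(name):
        os.environ[name] = prompt(f"Enter {name}: ")
    return os.environ[name]


# In the notebook, one might call (left commented so nothing prompts on import):
# for key_name in ("NVIDIA_API_KEY", "LANGSMITH_API_KEY", "TAVILY_API_KEY"):
#     ensure_env(key_name)
```

Because `getpass.getpass` hides the typed value, the key does not end up in the cell output or the saved notebook.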


For more information on LangGraph visit their [Quickstart Guide](https://langchain-ai.github.io/langgraph/tutorials/introduction/#requirements).

## LLM Initialisation

In [4]:
# Generate response
llm = ChatNVIDIA(
    model="nvdev/meta/llama-3.1-70b-instruct",
    nvidia_api_key=os.environ["NVIDIA_API_KEY"],
)

In [6]:
llm

ChatNVIDIA(base_url='https://integrate.api.nvidia.com/v1', model='nvdev/meta/llama-3.1-70b-instruct')

# 🛠️ First Capability: Memory-Augmented Chatbot (Persistent Memory with LangGraph)

This section adds conversational memory to a LangGraph graph.

Let's build a chatbot that remembers user input (e.g., the user's name), using LangGraph and `MemorySaver`.



This `State` class is a typed dictionary for storing a conversation’s history. The `messages` key is annotated so new AI or user messages get appended, enabling a memory-based ReAct agent to retain context and maintain an ongoing conversation.

In [7]:
class State(TypedDict):
    messages: Annotated[list, add_messages]
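
To make the role of the annotation concrete, here is a plain-Python sketch of the reducer idea: a function attached to a field via `Annotated` decides how an update is merged into the existing value. It is not LangGraph's implementation. `append_messages`, `SketchState`, and `apply_update` are hypothetical names, and the real `add_messages` does more than concatenate (for example, it also handles message IDs).

```python
from typing import Annotated, TypedDict, get_type_hints


def append_messages(existing: list, new: list) -> list:
    """Hypothetical stand-in for a reducer: concatenate old and new messages."""
    return existing + new


class SketchState(TypedDict):
    messages: Annotated[list, append_messages]  # merged via the reducer
    turn_count: int                             # no reducer: overwritten


def apply_update(state: dict, update: dict) -> dict:
    """Merge `update` into `state`, using a field's reducer when one is annotated."""
    hints = get_type_hints(SketchState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints.get(key), "__metadata__", ())
        if metadata and callable(metadata[0]) and key in merged:
            merged[key] = metadata[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": [("user", "Hi! My name is Will")], "turn_count": 1}
state = apply_update(
    state,
    {"messages": [("ai", "Nice to meet you, Will!")], "turn_count": 2},
)
print(state["messages"])    # both messages are kept, in order
print(state["turn_count"])  # plain field was replaced
```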

We create a graph that uses our `State` class to remember all user and AI messages.

In [8]:
graph_builder = StateGraph(State)

Define a function that takes the current conversation state and returns a new AI-generated message by invoking the language model on the existing message history.


In [9]:
def chatbot(state: State):
    return {"messages": [llm.invoke(state["messages"])]}

These lines add the chatbot function as a node in the LangGraph workflow, then define the execution flow to start at START, run the chatbot node, and proceed to END after the chatbot generates a response.

In [10]:
# Adding the chatbot node to our graph with a unique name.
graph_builder.add_node("chatbot", chatbot)

# Defining the flow: Start → Chatbot
graph_builder.add_edge(START, "chatbot")

# After the chatbot node, the workflow goes to the End.
graph_builder.add_edge("chatbot", END)

<langgraph.graph.state.StateGraph at 0x122471850>

This cell initializes a `MemorySaver` to persist conversation state between interactions and compiles the LangGraph workflow with it as the checkpointer, so the agent retains and updates its message history across turns.

In [11]:
memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)

This code sets a unique session identifier (`thread_id`), feeds the user's message into the LangGraph workflow, streams the resulting states, and prints the latest message in the conversation at each step.

In [12]:
# Identifier for the specific chat
thread_id = "memory_only_2"

config = {"configurable": {"thread_id": thread_id}}

user_input = "Hi! My name is Will"

# Initial state (with user's message) is fed into the workflow
# Chatbot node takes the current conversation and generates a response using llm.invoke, and then appends that response to "messages" list
events = graph.stream(
    {"messages": [{"role": "user", "content": user_input}]},
    config,
    stream_mode="values",
)
for event in events:
    event["messages"][-1].pretty_print()


Hi! My name is Will

Nice to meet you, Will! Is there something I can help you with or would you like to chat for a bit?


This helper converts objects such as AI or user messages into JSON-serializable values, so we can print the chatbot's memory in a readable form.

In [13]:
def custom_serializer(obj):
    """
    Converts the given object into a JSON-serializable dictionary or string.
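    1. If the object has a model_dump() method (Pydantic v2) or a dict() method, use that.
    2. If it has 'role' and 'content' attributes (LangChain messages), map them to a dict.
    3. Otherwise, return its string representation.
    """
    if hasattr(obj, "model_dump"):  # preferred: avoids the Pydantic v2 deprecation warning
        return obj.model_dump()
    if hasattr(obj, "dict"):
        return obj.dict()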
    if hasattr(obj, "role") and hasattr(obj, "content"):
        return {"role": obj.role, "content": obj.content}
    return str(obj)
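
The serializer is meant to be passed as the `default=` hook of `json.dumps`, which calls it only for objects the encoder cannot handle itself. The sketch below shows that mechanism with stand-ins: `FakeMessage` and `sketch_serializer` are hypothetical and only mimic the `role`/`content` branch.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat message object."""
    role: str
    content: str


def sketch_serializer(obj):
    """Called by json.dumps only for values it cannot encode on its own."""
    if hasattr(obj, "role") and hasattr(obj, "content"):
        return {"role": obj.role, "content": obj.content}
    return str(obj)  # fallback for anything else, e.g. a set


checkpoint_like = {
    "v": 2,                                      # plain int: hook not called
    "messages": [FakeMessage("user", "Hi! My name is Will")],
    "tags": {"demo"},                            # a set: falls through to str()
}
text = json.dumps(checkpoint_like, indent=2, default=sketch_serializer)
print(text)
```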

The next cell demonstrates conversational memory: the chatbot recalls the user's name from the earlier turn.

Why this works:
- MemorySaver stores messages. The LLM sees the whole conversation and answers accordingly.

In [14]:
user_input = "Remember my name?"

events = graph.stream(
    {"messages": [{"role": "user", "content": user_input}]},
    config,
    stream_mode="values",
)
for event in events:
    event["messages"][-1].pretty_print()


Remember my name?

Your name is Will. I'll remember it for our conversation, so feel free to forget and have a nice chat.
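
The behavior above can be pictured as a store keyed by `thread_id`: each turn loads that thread's history, appends the new exchange, and saves it back. The following is a minimal plain-Python sketch of that idea, not how `MemorySaver` is implemented. `ThreadMemorySketch`, `counting_model`, and `run_turn` are hypothetical names.

```python
from collections import defaultdict


class ThreadMemorySketch:
    """Hypothetical in-memory store: one message list per thread_id."""

    def __init__(self):
        self._store = defaultdict(list)

    def load(self, thread_id):
        return list(self._store[thread_id])

    def save(self, thread_id, messages):
        self._store[thread_id] = list(messages)


def counting_model(messages):
    """Stand-in for llm.invoke: reports how many user messages it can see."""
    user_turns = [content for role, content in messages if role == "user"]
    return ("ai", f"I can see {len(user_turns)} user message(s) in this thread.")


def run_turn(memory, thread_id, user_input):
    history = memory.load(thread_id)       # restore earlier turns
    history.append(("user", user_input))
    reply = counting_model(history)        # the model sees the whole history
    history.append(reply)
    memory.save(thread_id, history)        # persist for the next turn
    return reply


memory_sketch = ThreadMemorySketch()
first = run_turn(memory_sketch, "thread_a", "Hi! My name is Will")
second = run_turn(memory_sketch, "thread_a", "Remember my name?")
other = run_turn(memory_sketch, "thread_b", "Remember my name?")
print(second[1])
print(other[1])
```

A different `thread_id` starts from an empty history, which is why a fresh identifier gives a fresh conversation.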


### Explainability into how the state changes

`messages` contains the conversation history as a list of message objects, shown here serialized as dictionaries, with each **content** key storing the message sent by the AI or the human. By looking into the memory of the current conversation (`"thread_id": "memory_only_2"`), we can observe that the AI and human messages are the same ones we've seen before.

In [15]:
print("\n====================== What is happening behind the scenes:======================")
print(json.dumps(memory.get(config), indent=2, default=custom_serializer))


{
  "v": 2,
  "ts": "2025-05-08T06:53:47.232764+00:00",
  "id": "1f02bd93-0a25-641e-8004-ca6cb44da52a",
  "channel_versions": {
    "__start__": "00000000000000000000000000000005.0.5455720153875686",
    "messages": "00000000000000000000000000000006.0.8058260632204819",
    "branch:to:chatbot": "00000000000000000000000000000006.0.3594041242282897"
  },
  "versions_seen": {
    "__input__": {},
    "__start__": {
      "__start__": "00000000000000000000000000000004.0.5037110255252699"
    },
    "chatbot": {
      "branch:to:chatbot": "00000000000000000000000000000005.0.310394043256624"
    }
  },
  "channel_values": {
    "messages": [
      {
        "content": "Hi! My name is Will",
        "additional_kwargs": {},
        "response_metadata": {},
        "type": "human",
        "name": null,
        "id": "9a4ff422-582e-4844-9253-5fb86fca49e1",
        "example": false
      },
      {
        "content": "Nice to meet you, Will! Is there something I can help you with or would you 

Let's take a closer look into how `messages` changes throughout the conversation.

The cell below runs two turns of the conversation and logs `messages` at each turn so we can study how it changes.

<details>
  <summary><strong>More info</strong></summary>

`chatbot_with_logging` works like the earlier `chatbot` function, with logging added:

- The LLM is invoked with the current state of `messages`.

- **new_message**: Single AI message you receive based on the conversational memory

- The **new_message** will be appended to `messages` due to the previously declared `add_messages` annotation in State.

</details>

![image](img/flowchart.png)

In [16]:
def chatbot_with_logging(state: State):
    print("\n")
    print("Current conversation history:", state["messages"])
    print("\n")
    new_message = llm.invoke(state["messages"])
    print("New message from LLM:")
    print(json.dumps(new_message, indent=2, default=custom_serializer))
    print("\n")
    return {"messages": [new_message]}

graph_builder_with_logging = StateGraph(State)

# Adding the chatbot node to our graph with a unique name.
graph_builder_with_logging.add_node("chatbot", chatbot_with_logging)

# Defining the flow: Start → Chatbot
graph_builder_with_logging.add_edge(START, "chatbot")

# After the chatbot node, the workflow goes to the End.
graph_builder_with_logging.add_edge("chatbot", END)

memory_with_logging = MemorySaver()
graph_with_logging = graph_builder_with_logging.compile(checkpointer=memory_with_logging)

# Should be different from the previous in order to get a fresh conversation
thread_id = "memory_only_with_logging_2"

config = {"configurable": {"thread_id": thread_id}}

user_input = "Hi! My name is Will"

events = graph_with_logging.stream(
    {"messages": [{"role": "user", "content": user_input}]},
    config,
    stream_mode="values",
)
for event in events:
    event["messages"][-1].pretty_print()

user_input_2 = "Remember my name?"

events_2 = graph_with_logging.stream(
    {"messages": [{"role": "user", "content": user_input_2}]},
    config,
    stream_mode="values",
)
for event in events_2:
    event["messages"][-1].pretty_print()


Hi! My name is Will


Current conversation history: [HumanMessage(content='Hi! My name is Will', additional_kwargs={}, response_metadata={}, id='c0e7da92-0c83-42a2-b450-e5effbc2af91')]


New message from LLM:
{
  "content": "Nice to meet you, Will! Is there something I can help you with or would you like to chat for a bit?",
  "additional_kwargs": {},
  "response_metadata": {
    "role": "assistant",
    "content": "Nice to meet you, Will! Is there something I can help you with or would you like to chat for a bit?",
    "token_usage": {
      "prompt_tokens": 16,
      "total_tokens": 41,
      "completion_tokens": 25
    },
    "finish_reason": "stop",
    "model_name": "nvdev/meta/llama-3.1-70b-instruct"
  },
  "type": "ai",
  "name": null,
  "id": "run-e7550a04-02c0-44a8-af05-2b1e770bc426-0",
  "example": false,
  "tool_calls": [],
  "invalid_tool_calls": [],
  "usage_metadata": {
    "input_tokens": 16,
    "output_tokens": 25,
    "total_tokens": 41
  },
  "role": "assistant"
}




In [17]:
print("\n====================== What is happening behind the scenes:======================")
print(json.dumps(memory_with_logging.get(config), indent=2, default=custom_serializer))


{
  "v": 2,
  "ts": "2025-04-30T09:32:15.814621+00:00",
  "id": "1f025a60-0e2c-68fa-8004-c865bb2e14a3",
  "channel_versions": {
    "__start__": "00000000000000000000000000000005.0.3756749851617761",
    "messages": "00000000000000000000000000000006.0.5323631868781991",
    "branch:to:chatbot": "00000000000000000000000000000006.0.31351555008916043"
  },
  "versions_seen": {
    "__input__": {},
    "__start__": {
      "__start__": "00000000000000000000000000000004.0.3446049711246755"
    },
    "chatbot": {
      "branch:to:chatbot": "00000000000000000000000000000005.0.9332388485926778"
    }
  },
  "channel_values": {
    "messages": [
      {
        "content": "Hi! My name is Will",
        "additional_kwargs": {},
        "response_metadata": {},
        "type": "human",
        "name": null,
        "id": "20f84808-78e8-47f7-b3f2-478e61517f9f",
        "example": false
      },
      {
        "content": "Nice to meet you, Will! Is there something I can help you with or would yo



<summary><strong>What do we observe?</strong></summary>

Human messages and generated AI messages are appended to `messages` in chronological order, as LangChain `HumanMessage` and `AIMessage` objects respectively.

This graph depicts how `messages` changes, which is then stored using `MemorySaver()` by `StateGraph` from LangGraph.


### Optional: Using .invoke instead of .stream

Use `.stream` when you want the output in chunks, typically when a frontend displays the chatbot's response interactively. Use `.invoke` when you only need the final state. The effect of streaming is not visible in this Jupyter notebook.

In [25]:
thread_id = "testing_invoke"

config = {"configurable": {"thread_id": thread_id}}

result = graph.invoke(
    {"messages": [{"role": "user", "content": "What is the capital of France?"}]},
    config=config,
)
print(result["messages"][-1].content)

The capital of France is Paris.
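
The relationship between the two calls can be sketched in plain Python: streaming yields a snapshot of the state after each step, while invoking returns only the last snapshot. This is an illustration of the idea under simplified assumptions, not LangGraph code. `stream_values`, `invoke_final`, and `canned_chatbot` are hypothetical names.

```python
def stream_values(initial_messages, nodes):
    """Yield the full state before any node runs and again after each node."""
    state = {"messages": list(initial_messages)}
    yield state
    for node in nodes:
        update = node(state)
        state = {"messages": state["messages"] + update["messages"]}
        yield state


def invoke_final(initial_messages, nodes):
    """Run the same steps but return only the final state."""
    final_state = None
    for final_state in stream_values(initial_messages, nodes):
        pass
    return final_state


def canned_chatbot(state):
    """Stand-in for the chatbot node with a fixed reply."""
    return {"messages": [("ai", "The capital of France is Paris.")]}


question = [("user", "What is the capital of France?")]
snapshots = list(stream_values(question, [canned_chatbot]))
final = invoke_final(question, [canned_chatbot])
print(len(snapshots))            # one snapshot for the input, one after the node
print(final["messages"][-1][1])
```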


# 🕵️ Second Capability: ReAct Agent with Tavily tool (No Memory)

Let's now explore how to use the Tavily web search tool with LangGraph's `create_react_agent`. No conversational memory is implemented yet; we execute a single run to showcase the web search capability.

In [17]:
search = TavilySearchResults(max_results=5, include_images=True)
tools = [search]
agent = create_react_agent(llm, tools)

response = agent.invoke(
    {"messages": [{"role": "user", "content": "What are some tools for education?"}]})

print("\n================================== Ai Message ==================================")
print(response["messages"][-1].content)

print(json.dumps(response, indent=2, default=custom_serializer))




Based on the search results, here are some tools for education:

1. Educational Tools (EduTools) - a portal that centralizes digital resources and applications to support teaching and learning.
2. Bloomz - an interactive app that helps teachers and schools share updates, events, and photos with parents securely.
3. Plickers - a tool used to assess students' knowledge, assign homework, and record grades.
4. Printed materials like textbooks, workbooks, handouts, and worksheets.
5. Projectors and screens to display slideshows, videos, animations, and multimedia content.
6. Visual aids like posters, charts, graphs, and diagrams.
7. Audio-visual materials such as educational videos, documentaries, podcasts, and audio recordings.
8. Models, puzzles, blocks, and other hands-on materials, especially in subjects like mathematics and science.
9. Video conferencing tools and learning management systems.
10. Digital feedback tools.
11. Canva - a graphic design platform that empowers educators to 


<summary><strong>What do we observe?</strong></summary>

The output above shows what the Tavily tool call returns: the top 5 web results (the count is set by `max_results`).

- `images`: Images from the linked websites, which can be used as thumbnails if required.
- `results`: Specific information for each of the 5 web results, such as `url`, `title`, `content` (brief description), `score` (relevancy scoring).
- `content`: LLM answer to the query based on the web search results.
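
As a small illustration of working with the `results` entries, the sketch below filters and orders them by `score` and formats each as a title with its URL. The field names follow the description above. The sample entries are made-up placeholders, not real search output, and `top_results` is a hypothetical helper.

```python
# Placeholder entries that only mimic the described shape (url, title, content, score).
sample_results = [
    {"url": "https://example.com/a", "title": "Placeholder article A",
     "content": "Placeholder summary A.", "score": 0.42},
    {"url": "https://example.com/b", "title": "Placeholder article B",
     "content": "Placeholder summary B.", "score": 0.91},
    {"url": "https://example.com/c", "title": "Placeholder article C",
     "content": "Placeholder summary C.", "score": 0.67},
]


def top_results(results, min_score=0.5, limit=3):
    """Keep results at or above min_score, highest score first, as 'title (url)'."""
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return [f'{r["title"]} ({r["url"]})' for r in kept[:limit]]


for line in top_results(sample_results):
    print(line)
```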

# 🧠 Third Capability: ReAct Agent with Memory + Tavily

Now, we are implementing the ReAct agent with both the Tavily web search tool and conversational memory.

For reference, this is the `State` (a typed dictionary) created earlier.

In [19]:
class State(TypedDict):
    # The "messages" key is a list.
    # The `add_messages` annotation tells our system to **append** new messages
    # to this list instead of replacing the entire list.
    messages: Annotated[list, add_messages]

To pass a custom state schema to `create_react_agent`, the schema here declares two fields in addition to `messages`: `is_last_step` and `remaining_steps`. The `is_last_step` flag and the `remaining_steps` counter coordinate the decision-making process in the ReAct loop, in particular when the Tavily tool calls should stop.

In [18]:
class TavilyState(TypedDict):
    messages: Annotated[list, add_messages]
    is_last_step: bool  # flag, so annotated as a boolean rather than a string
    remaining_steps: int

In [19]:
# Print each streamed state; tuple messages cannot use pretty_print, so they are printed as-is.
def print_stream(graph, inputs, config):
    for s in graph.stream(inputs, config, stream_mode="values"):
        message = s["messages"][-1]
        if isinstance(message, tuple): # To enable tool calling printing
            print(message)
        else:
            message.pretty_print()

Lastly, we create the ReAct agent. The prebuilt `create_react_agent` function builds the same kind of LangGraph graph we assembled by hand earlier, so we do not have to create our own `StateGraph`.

Memory is passed in the same way as before, using `MemorySaver()` as the checkpointer.

The Tavily tool is added for web search, giving a ReAct agent with both conversational memory and web search capabilities.

![flowchart-tavily.png](img/flowchart-tavily.png)

In [20]:
search = TavilySearchResults(max_results=5, include_images=True)
tools = [search]
memory_tavily = MemorySaver()

graph_with_tavily = create_react_agent(llm, tools, state_schema=TavilyState, checkpointer=memory_tavily)
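
To give a feel for what such an agent does internally, here is a plain-Python sketch of a ReAct-style loop with a step budget: the model either answers or requests a tool, tool output is appended to the history, and the loop stops when the budget runs out. It is a simplified illustration, not the `create_react_agent` implementation. `fake_search`, `scripted_model`, and `react_loop` are hypothetical, and no web request is made.

```python
def fake_search(query):
    """Stub tool: returns a placeholder string instead of real search results."""
    return f"(stub results for: {query})"


def scripted_model(messages):
    """Stand-in for the LLM: asks for one search when the user mentions articles."""
    role, content = messages[-1]
    if role == "user" and "articles" in content:
        return {"tool": "search", "args": {"query": content}}
    if role == "tool":
        return {"answer": f"Here is what I found: {content}"}
    return {"answer": "Nice to meet you!"}


def react_loop(messages, model, tools, remaining_steps=5):
    """Alternate between model decisions and tool calls until an answer or budget end."""
    messages = list(messages)
    while remaining_steps > 0:
        remaining_steps -= 1
        decision = model(messages)
        if "tool" not in decision:            # the model chose to answer directly
            messages.append(("ai", decision["answer"]))
            return messages
        result = tools[decision["tool"]](**decision["args"])
        messages.append(("tool", result))     # feed the observation back in
    messages.append(("ai", "Stopping: step budget exhausted."))
    return messages


tools_sketch = {"search": fake_search}
greeting_run = react_loop([("user", "Hi, I'm Will!")], scripted_model, tools_sketch)
search_run = react_loop(
    [("user", "Can you give me some articles on AI?")], scripted_model, tools_sketch
)
print([role for role, _ in greeting_run])
print([role for role, _ in search_run])
```

The step budget plays a role similar to `remaining_steps` in the state schema: it guarantees the loop ends even if the model keeps requesting tools.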



In [21]:
thread_id = "memory_and_tavily_2"

In [22]:
config = {"configurable": {"thread_id": thread_id}}

inputs = {"messages": [("user", "Hi, I'm Will! Nice to meet you. I'm a teacher.")]}
print_stream(graph_with_tavily, inputs, config)


Hi, I'm Will! Nice to meet you. I'm a teacher.

Nice to meet you too, Will! It's great to hear that you're a teacher. What subject do you teach, if I might ask?


The LLM determines whether tool calling is required. It has correctly decided not to call Tavily as web search capabilities are not required.

Let's look at what's happening behind the scenes.

In [23]:
print("\n====================== What is happening behind the scenes:======================")
print(json.dumps(memory_tavily.get(config), indent=2, default=custom_serializer))


{
  "v": 2,
  "ts": "2025-05-08T07:06:29.407678+00:00",
  "id": "1f02bdaf-6ecf-6dd0-8001-c9e7e3a4abbe",
  "channel_versions": {
    "__start__": "00000000000000000000000000000002.0.6920170914791764",
    "messages": "00000000000000000000000000000003.0.2788241362496966",
    "branch:to:agent": "00000000000000000000000000000003.0.8998049767662768"
  },
  "versions_seen": {
    "__input__": {},
    "__start__": {
      "__start__": "00000000000000000000000000000001.0.11357132653854018"
    },
    "agent": {
      "branch:to:agent": "00000000000000000000000000000002.0.6002830501976488"
    }
  },
  "channel_values": {
    "messages": [
      {
        "content": "Hi, I'm Will! Nice to meet you. I'm a teacher.",
        "additional_kwargs": {},
        "response_metadata": {},
        "type": "human",
        "name": null,
        "id": "dd149ef1-4398-4da2-be15-f99f5420e276",
        "example": false
      },
      {
        "content": "Nice to meet you too, Will! It's great to hear that y

<summary><strong>What do we observe?</strong></summary>

In the second element of `messages` (the AI message), you can observe `"tool_calls": []`. This correctly shows no tools were called in this run.

Let's now test the conversational memory property. Will the LLM be able to remember the user's job without the user mentioning it a second time? And, will it also be able to search for relevant web articles using Tavily?

In [26]:
inputs = {"messages": [("user", "Can you give me some articles on how AI can help me with my job?")]}
print_stream(graph_with_tavily, inputs, config)


Can you give me some articles on how AI can help me with my job?

<|python_tag|>tavily_search_results_json{"query": "AI for educators"}


Great! The output shows the model issuing a call to the Tavily tool (`tavily_search_results_json`) with a search query.

Let's take a closer look.

In [25]:
print("\n====================== What is happening behind the scenes:======================")
print(json.dumps(memory_tavily.get(config), indent=2, default=custom_serializer))


{
  "v": 2,
  "ts": "2025-05-08T07:07:49.671462+00:00",
  "id": "1f02bdb2-6c44-65fe-8004-ca1a06ed29c1",
  "channel_versions": {
    "__start__": "00000000000000000000000000000005.0.276390468609507",
    "messages": "00000000000000000000000000000006.0.4662012362191299",
    "branch:to:agent": "00000000000000000000000000000006.0.5937524713461495"
  },
  "versions_seen": {
    "__input__": {},
    "__start__": {
      "__start__": "00000000000000000000000000000004.0.6277740333844445"
    },
    "agent": {
      "branch:to:agent": "00000000000000000000000000000005.0.08908218451655314"
    }
  },
  "channel_values": {
    "messages": [
      {
        "content": "Hi, I'm Will! Nice to meet you. I'm a teacher.",
        "additional_kwargs": {},
        "response_metadata": {},
        "type": "human",
        "name": null,
        "id": "dd149ef1-4398-4da2-be15-f99f5420e276",
        "example": false
      },
      {
        "content": "Nice to meet you too, Will! It's great to hear that yo

<summary><strong>What do we observe?</strong></summary>

Now the same `"tool_calls"` field we observed before is no longer empty. `tavily_search_results_json` is the name of the tool that was called when the user asked for web articles. The query `AI for educators` shows that the agent used conversational memory: it recalled from the first human message that the user is a teacher.
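
Scanning a long checkpoint by eye for non-empty `"tool_calls"` gets tedious. The sketch below walks a list of serialized messages and collects every tool call it finds. `find_tool_calls` is a hypothetical helper, and the sample messages are abbreviated stand-ins shaped like the serialized output above, with a placeholder `id`.

```python
def find_tool_calls(messages):
    """Return (message_index, tool_name, tool_args) for each tool call in AI messages."""
    calls = []
    for index, message in enumerate(messages):
        if message.get("type") != "ai":
            continue
        for call in message.get("tool_calls") or []:
            calls.append((index, call.get("name"), call.get("args")))
    return calls


# Abbreviated stand-ins for serialized messages; not a real checkpoint dump.
sample_messages = [
    {"type": "human", "content": "Hi, I'm Will! Nice to meet you. I'm a teacher."},
    {"type": "ai", "content": "Nice to meet you too, Will!", "tool_calls": []},
    {"type": "human",
     "content": "Can you give me some articles on how AI can help me with my job?"},
    {"type": "ai", "content": "",
     "tool_calls": [{"name": "tavily_search_results_json",
                     "args": {"query": "AI for educators"},
                     "id": "call_placeholder"}]},
]

for index, name, args in find_tool_calls(sample_messages):
    print(index, name, args)
```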