<a href="https://www.nvidia.com/dli"> <img src="images/nvidia_header.png" style="margin-left: -30px; width: 300px; float: left;"> </a>

# Build Your First Agent in LangGraph

Welcome! In this notebook you will learn the basics of how to use agents in LangGraph by building an agent that is an expert on any topic of your choosing. We give the agent this expert knowledge by implementing retrieval-augmented generation (RAG): when your agent receives a query, it first retrieves relevant content from external data (for example, a web page) before generating a response. This grounds responses in the retrieved content, rather than only in the internal knowledge an LLM gained during training. In addition to retrieving data from an external knowledge source, our agent will also be able to search the Internet (via Tavily) and Wikipedia.

Throughout this process, you will learn the basics of building agentic workflows, including concepts like tool-calling and agentic graphs.

## 0. Setup

Let's first install LangGraph. We will also install related add-on packages for document processing, embedding, and other integrations that our agent will need.

> **Note:** All dependencies for this notebook are managed via `pyproject.toml`. Run `uv sync` in the `module-1-fundamentals/` directory before starting this notebook.

## 1. Create Your Knowledge Base
First, take a moment to pick the topic your agent will be able to answer questions about. In this example, we picked NVIDIA's GTC DC conference. Then, pick a few public web pages that contain information about your selected topic. For example, these could be the wiki page for your favorite video game, the Wikipedia page on cats, your personal website, etc. For best results, we recommend sticking to a single topic rather than mixing several.

Our agent will reference these web pages to try to answer the user's query. If these pages don't supply the necessary information, it will fall back to using internet or Wikipedia search.

Paste the URLs of your selected pages below. **Note: it is very important that the pages you select contain substantial written text rather than images, links, or other non-text elements.**

In [None]:
# Paste your URLs here
raw_urls = [ 
    "https://www.nvidia.com/gtc/dc/?ncid=pa-srch-goog-131-prsp-txt-en-us-3-l7-gos-bus&_bt=772787826392&_bk=nvidia%20gtc%20dc&_bm=p&_bn=g&_bg=188557822681&gad_source=1&gad_campaignid=22981589250&gbraid=0AAAAAD4XAoG7_nse0aK9mviFl5Z5RoNNX&gclid=Cj0KCQjw0Y3HBhCxARIsAN7931VWvSOUh6erHQf6IOfaX5UGgtQEm8DdzQlp5pzfH8AAgRmFHO9sJpMaAgkzEALw_wcB",
    "https://www.nvidia.com/gtc/dc/keynote/?ncid=pa-srch-goog-131-prsp-txt-en-us-7-l2-gos-sl2&_bt=772702593727&_bk=gtc%20dc&_bm=p&_bn=g&_bg=188557822681&gad_source=1&gad_campaignid=22981589250&gbraid=0AAAAAD4XAoG7_nse0aK9mviFl5Z5RoNNX&gclid=Cj0KCQjw0Y3HBhCxARIsAN7931XPCMy9dIpKn3SpUtg-uWUuoXz_7kQRfxah0flh9dMqs_qa5DTAaBMaAvPsEALw_wcB"
]

urls = [u.strip() for u in raw_urls if u.strip()]
print(f"Using {len(urls)} URL(s): {urls}")

### 1.1 Load Web Content into Documents

Now, we need to load in these web pages in a format that the agent can use later on.

**Document loaders** in LangChain are utility functions that ingest content from various sources (websites, PDFs, databases, etc.) and convert them into a standardized `Document` format. This makes it easy to work with content from different sources in a consistent way -- whether you're loading from a web page, a local file, or an API.

Each `Document` object contains:
- `page_content`: The actual text content
- `metadata`: Information about the source (URL, title, etc.)

In particular, **WebBaseLoader** is a document loader that fetches and parses web pages. It:
- Downloads the HTML content from a URL
- Extracts the main text content (stripping out most HTML tags, scripts, styles)
- Returns the content as LangChain `Document` objects with metadata (like the source URL)

This standardized format is what allows LangChain to process, split, embed, and search across content from different sources.
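
To make the `Document` shape concrete, here is a minimal stand-in written as a plain dataclass. `SimpleDocument` and the example sources are hypothetical names used only for illustration; this is not LangChain's class, just the same two fields.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Illustrative stand-in for a LangChain Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Content from two different kinds of sources ends up in the same shape
web_doc = SimpleDocument("GTC DC keynote details...", {"source": "https://example.com/keynote"})
pdf_doc = SimpleDocument("Agenda for day one...", {"source": "agenda.pdf", "page": 1})

for doc in (web_doc, pdf_doc):
    print(doc.metadata["source"], "->", doc.page_content[:20])
```

Because every source is reduced to text plus metadata, later steps (splitting, embedding, searching) do not need to know where the content came from.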

In [None]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document

Note: you may get a warning that reads "USER_AGENT environment variable not set, consider setting it to identify your requests." You can ignore this for the duration of this tutorial. The warning appears because when **WebBaseLoader** fetches web pages, it sends a "User-Agent" string that identifies what's making the request (like how browsers identify themselves as "Chrome" or "Firefox"). Right now it's using a generic one, but it's good practice to set a custom one when you're scraping websites in production.
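
If you would rather silence the warning, one option is to set the variable yourself before the `langchain_community` import runs. The value below is a made-up placeholder; pick something that identifies your own project.

```python
import os

# Hypothetical identifier: replace it with a string that describes your project
os.environ["USER_AGENT"] = "my-first-langgraph-agent/0.1"
print(os.environ["USER_AGENT"])
```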

In [None]:
# Load your websites as documents
docs = []
for u in urls:
    loader = WebBaseLoader(u)
    docs.extend(loader.load())

print(f"Loaded {len(docs)} documents")

Sanity Check: Let's take a quick look at what we loaded. This will show the first 1000 characters of the first document.

In [None]:
docs[0].page_content.strip()[:1000]

### 1.2 Chunk and index documents

Next, we’ll split the texts into smaller chunks and make the chunks searchable by indexing them and storing them in an in-memory vector store.


#### Splitting Documents into Chunks

Let's import LangChain's **RecursiveCharacterTextSplitter** tool, which breaks down the documents we loaded into smaller, manageable chunks. Splitting into chunks is important because:
- LLMs have token limits -- they can't process entire web pages at once
- Smaller chunks mean each chunk is more specific, and therefore when we search through and use these chunks in the RAG process, it's more likely that we pull only relevant information
  

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter


The `RecursiveCharacterTextSplitter` works by:
- Trying to split on natural boundaries (paragraphs, then sentences, then words)
- Respecting a maximum `chunk_size` (`from_tiktoken_encoder` indicates that chunk size is measured in tokens)
- Adding `chunk_overlap` between chunks so context isn't lost at boundaries
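
The effect of `chunk_size` and `chunk_overlap` can be seen in a simplified sliding-window sketch. This is not how `RecursiveCharacterTextSplitter` is implemented: it ignores natural boundaries entirely, and whitespace-separated words stand in for tokens. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Fixed-size sliding window over a token list (simplified illustration)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


words = "one two three four five six seven eight nine ten".split()
chunks = chunk_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last word of the previous one, which is the context that overlap preserves across a boundary.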

In [None]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=800, chunk_overlap=150 # Use 800 tokens per chunk, with a 150 token overlap
)

doc_splits = text_splitter.split_documents(docs) if docs else []
print(f"Created {len(doc_splits)} chunks")

#### Creating a Vector Store for Semantic Search

Great! Now that we have our documents in chunks, we need a way to search through them. Instead of keyword matching, we'll use **semantic search** -- finding relevant chunks based on meaning, not just exact word matches.

LangChain's **FastEmbedEmbeddings** wraps an embedding model: a neural network that maps text into numerical vectors, also known as **embeddings**. The embedding model has been trained in such a way that phrases with similar meaning end up with similar numerical vectors -- even if they use completely different words. For example, "AI conference" and "machine learning event" would have similar vectors. You might hear people describe this similarity as being "close together in vector space."
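
One common way to measure "closeness" between two vectors is cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embedding models produce vectors with hundreds of dimensions, and the numbers here were not produced by any model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
toy_embeddings = {
    "AI conference": [0.9, 0.8, 0.1],
    "machine learning event": [0.8, 0.9, 0.2],
    "kitten eye color": [0.1, 0.2, 0.9],
}

query_vector = toy_embeddings["AI conference"]
for phrase, vector in toy_embeddings.items():
    print(f"{phrase!r}: {cosine_similarity(query_vector, vector):.3f}")
```

The two conference-like phrases score close to 1, while the unrelated phrase scores much lower.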

In [None]:
from langchain_community.embeddings import FastEmbedEmbeddings

Now we need a place to store our embeddings (the vectors).

**FAISS** (Facebook AI Similarity Search) is a vector database built by Meta that efficiently stores and searches these embeddings. It's:
- Fast -- optimized for similarity search across millions of vectors
- In-memory -- great for prototyping
- Free and open-source

In [None]:
from langchain_community.vectorstores import FAISS

Finally, we need a way to access the vector database. To do this, we will use a **retriever**, which is a standardized search interface. When a user asks a question, the retriever will find the top `k` most relevant document chunks based on semantic similarity. These chunks are then given to the LLM to provide grounded context for answering questions. In this example, we've set `k=4`.

Note that in the code below, we use LangChain's `.as_retriever` to *wrap* the FAISS vectorstore and create that standard search interface. We could actually search the vector database directly (without the retriever interface) by doing something like `results = vectorstore.similarity_search("my question", k=4)`. But generally, we use the `as_retriever` wrapper for interface consistency, since it works with any vector database (FAISS, Pinecone, Chroma, etc.) and plugs in nicely to RAG workflows in LangChain.
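
Conceptually, a retriever ranks stored chunks by similarity to the query and returns the top `k`. The class below is a dependency-free sketch of that idea, not the LangChain retriever: `ToyRetriever` is a hypothetical name, the vectors are made up, and the query is passed in as a vector already (a real retriever embeds the query string first).

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyRetriever:
    """Hypothetical stand-in for a retriever: returns the k most similar chunks."""

    def __init__(self, chunks_with_vectors, k=4):
        self.store = chunks_with_vectors  # list of (text, vector) pairs
        self.k = k

    def invoke(self, query_vector):
        ranked = sorted(self.store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in ranked[: self.k]]


store = [
    ("Keynote speaker list", [0.9, 0.1]),
    ("Venue and parking", [0.2, 0.9]),
    ("Session catalog", [0.7, 0.3]),
]
toy_retriever = ToyRetriever(store, k=2)
top_chunks = toy_retriever.invoke([1.0, 0.0])
print(top_chunks)
```

Swapping the storage backend would only change what happens inside `invoke`, which is the point of having a uniform retriever interface.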

In [None]:
embeddings = FastEmbedEmbeddings() # Initiate FastEmbedEmbeddings embedding model
vectorstore = FAISS.from_documents(doc_splits, embeddings) # Initiate FAISS vectorstore and turn our split up documents into vector embeddings
retriever = vectorstore.as_retriever(search_kwargs={"k": 4}) # Initiate retriever object

print("Retriever ready" if retriever else "No retriever (no docs loaded)")

This entire process of creating embeddings for document chunks and building them into a searchable structure is referred to as document **indexing**.

## 2. Create the Retriever Tool

A core step of building agents is to give them access to **tools** -- the means by which the LLM can interact with the real world in real time, performing actions that go beyond its training data. For example, some tools we could give to an LLM include the ability to use a calculator, search the web, call an API to fetch the local weather, or retrieve information from a vector database (like what we're building here!). Think of this as giving the LLM access to the tools we humans use on a daily basis.

Agents are powerful because they can autonomously decide *when* to use a tool based on the input query. Advanced reasoning agents can even chain multiple tools together to solve complex problems. For example, the agent could first search a vectorstore to retrieve information, then use a calculator tool to process numbers from that information, and finally use an Internet search tool to get some extra details.

Under the hood, a tool is just a function that the agent calls. We give agents access to tools in a standardized way -- for each tool, we must configure:
- A description of what the tool does, so the agent knows when to use it, and what output it should expect to receive back
- A description of what inputs the tool expects to receive (the function parameters), so it can call the tool properly

**`create_retriever_tool`** is a LangChain utility that wraps retrievers in a standardized tool interface, giving the retriever search interface a proper name and description so that the agent knows when to use it and how.
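
To see what "a function plus a name and description" amounts to, here is a minimal sketch using a plain dataclass. `SimpleTool`, `word_count`, and `count_words` are hypothetical names for illustration; LangChain's real tool classes carry more machinery (input schemas, async support, callbacks).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    """Illustrative only: the pieces of information a tool carries."""
    name: str
    description: str             # tells the agent when to use the tool
    func: Callable[[str], str]   # the function that actually runs

    def invoke(self, tool_input: dict) -> str:
        return self.func(tool_input["query"])


def word_count(text: str) -> str:
    return str(len(text.split()))


counter_tool = SimpleTool(
    name="count_words",
    description="Count the words in a piece of text. Input: the text to count.",
    func=word_count,
)
print(counter_tool.invoke({"query": "agents call tools"}))
```

The agent never sees the function body. It only reads the name and description, so those two strings are what determine when the tool gets picked.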

In [None]:
from langchain.tools.retriever import create_retriever_tool

**You should change the tool name and description below to match the type of websites you have selected.**

In [None]:
retriever_tool = create_retriever_tool(
    retriever, # Our retriever interface, created previously
    "retrieve_info_about_GTC_conference", # Tool name -- customize this to match your use case
    "Search and return relevant passages from GTC DC related websites.", # Tool description -- customize this to match your use case
)

Let's manually test our tool. Note that the output may look messy, but worry not! -- the LLM is surprisingly capable at extracting relevant content. In this example, we query about GTC DC. **You should change the query to ask a question that your selected web pages can answer.**

Under the hood, the retriever is making a vector out of our query phrase, finding the most similar embeddings in our vector database, and returning the results. How neat!

In [None]:
retriever_tool.invoke({"query": "Who are the speakers at GTC DC?"})

## 3. Adding an Internet Search Tool

In addition to our retrieval tool, we'll give the agent access to real-time internet search. This allows the agent to find current information beyond what's in our static vector database -- for example, breaking news, recent updates, or topics not covered in our documents.

We'll use **Tavily**, a search API optimized for AI agents. LangChain provides a pre-built wrapper called `TavilySearch` that makes integration simple.

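Now, go to [Tavily](https://app.tavily.com/home) and sign up for a free account to generate your API key. Make the key available as the `TAVILY_API_KEY` environment variable (for example, by adding it to your `.env` file) before running the cell below.

In [None]:
import os
from dotenv import find_dotenv, load_dotenv
from langchain_tavily import TavilySearch

load_dotenv(find_dotenv()) # Load TAVILY_API_KEY from a .env file, if one exists

tavily_search = TavilySearch(max_results=3) # Define the search tool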

Let's wrap our search interface as a LangChain tool to give it a proper name and description.

In [None]:
from langchain.tools import Tool

internet_search_tool = Tool(
    name="internet_search",
    description=(
        "Search the live internet for current, time-sensitive information. Returns diverse but less curated results from various sources across the web. "
        "Best for: breaking news, recent events (2024-2025), upcoming schedules, latest announcements, and any information that requires real-time updates. "
        "Use this when recency is critical."
    ),
    func=lambda query: tavily_search.invoke({"query": query})
)

Let's test it out.

In [None]:
internet_search_tool.invoke({"query": "When is GTC DC in 2025"})

## 4. Adding a Wikipedia Search Tool

Similar to the internet search tool, let's also give our agent access to a Wikipedia search tool. This tool will return more comprehensive results and well-established facts about topics, but results may not be as up to date as the internet search tool, and topics are limited to ones which have their own Wikipedia page.

In [None]:
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

wiki_search_tool = Tool(
    name="wikipedia_search", 
    description=(
        "Search Wikipedia for definitions and background information about specific people, places, organizations, or concepts. "
        "Use this ONLY for basic informational questions like 'what is X', 'who is X', 'tell me about X', or 'give me information about X' "
        "where X is a well-known proper noun or established concept. "
        "Examples: 'What is quantum computing?', 'Who is Marie Curie?', 'Tell me about the Roman Empire'"
    ),
    func=wikipedia.run
)

Let's test it out! **Note that this tool is prone to confusing similar topics and pulling the wrong Wikipedia page. You may have to experiment with prompt engineering to get it to identify your intended page.**

In [None]:
wiki_search_tool.invoke({"query": "Nvidia GTC DC"})

## 5. Creating the LangGraph

Now that we have our tools, let's put together the agentic system using **LangGraph**. LangGraph workflows are built from two core concepts: nodes and edges.

- **Nodes**: Processing steps that do work (call LLMs, execute tools, transform data). 
- **Edges**: Connections between nodes that control data flow
  - **Regular edges**: Direct connections, "after Node A, always go to Node B"
  - **Conditional edges**: Decision points: "go to Node B if X, otherwise go to Node C"

Note that the line between a node and an edge is somewhat flexible. As a rule of thumb, if a piece of logic involves heavy processing or LLM calls, make it a node; if it's lightweight routing logic (if/else), use a conditional edge. Also, by definition, edges cannot write to the graph's State (which tracks information as the workflow runs), while nodes can -- more on State later.

### 5.1 Our Workflow

Here's a quick breakdown of the agentic workflow we will build, visualized in the flow chart below.

A user query comes in. A **conditional edge** asks: "Should I use the retrieval tool?"

- *If no*: Go to the **Generate node** and use the LLM's general knowledge to answer. Workflow finishes.
   
- *If yes*: Go to the **Retrieval Tool node**, which calls our `retriever_tool`
   

After `retriever_tool` retrieves documents, another **conditional edge** asks: "Is the retrieved context relevant?"
   
- *If yes*: Go to **Generate node** and use the retrieved context to answer. Workflow finishes.
   
- *If no*: Go to **Search Tool node** and use the search tools (`internet_search_tool` or `wiki_search_tool`) to find information from the internet or Wikipedia, then go to **Generate node** for the final answer.

![Alt text](images/intro-to-agents_diagram.png)

### 5.2 Extending This Workflow

Notice that in this example, after **Search Tool node** finishes, we don't check the relevance of the search results returned. But if we wanted to, we could add another **conditional edge** to check relevance, just as we did after the **Retrieval Tool node**. If the relevance check fails, we could even send the original user query into a "Rewrite Query" node that uses an LLM to improve the query's clarity, so that the next time the retrieval tool or search tool is run, it will hopefully find more relevant results.

This is a great demonstration of the power of LangGraph, which lies in its composability -- you can build arbitrarily complex workflows by connecting nodes and edges in whatever way makes sense for your use case. **Well-engineered agentic workflows bring together the determinism necessary for production workloads, and the creative flexibility and intelligence of generative AI.** For more advanced patterns like query rewriting and multi-step reasoning, check out [LangChain's Agentic RAG tutorial](https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/).

### 5.3 State & Functions

To create a LangGraph workflow, we first need to define the graph structure. **StateGraph** is the most common type of graph in LangGraph (there's also `MessageGraph` for chat-specific workflows, but `StateGraph` is more flexible). It's called "StateGraph" because it manages a **state dictionary** that flows through your workflow. Every `StateGraph` has a `START` and an `END`, which are special entry and exit points for the graph.


In [None]:
from langgraph.graph import StateGraph, START, END

#### How State Works

Every graph has a `state` -- a dictionary that gets passed from node to node via edges. Think of it as shared memory that nodes can read from and write to.

For example, right after the user query is input, our state might look like this:
```python
state = {
    "query": "What speakers are at GTC?",
    "context": [],      # Any context we retrieve goes here
    "answer": "",       # The final answer goes here
}
```

Let's define our `state` dictionary.

In [None]:
from typing import TypedDict, Optional, Literal, List

class State(TypedDict):
    query: str                                                       # Original user question
    context_source: Optional[Literal["retrieval","search","none"]]   # Context source (retrieval or search)
    context_chunks: List[str]                                        # Context
    answer: str                                                      # Final answer

#### Defining Functions
All **nodes** and **conditional edges** are defined by a function.

**Node** functions process data and update the state. For example:
```python
def retrieval_node(state):
    # Do retrieval stuff
    return {"context_chunks": [...]}  # Fills in the context part of state with what was retrieved

def generate_node(state):
    # Generate answer
    return {"answer": "..."}  # Fills in the final answer part of state
```
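
A node returns only the keys it wants to change; the rest of the state is left alone. The sketch below models that with a plain dictionary merge, assuming the default behavior in which a returned key replaces the previous value. `apply_update` and `toy_retrieval_node` are hypothetical helpers, not LangGraph functions.

```python
def apply_update(state: dict, update: dict) -> dict:
    """Simplified model of a state update: returned keys overwrite, others are kept."""
    return {**state, **update}


def toy_retrieval_node(state: dict) -> dict:
    # Only the key this node is responsible for is returned
    return {"context_chunks": [f"passage about {state['query']}"]}


state = {"query": "GTC speakers", "context_chunks": [], "answer": ""}
state = apply_update(state, toy_retrieval_node(state))
print(state)
```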

**Conditional edge** functions read state and decide where to route. For example, this conditional edge will check if the "documents" key in the state has been written to or not. If it has been filled, the function returns the "docs_found" flag, otherwise it returns the "no_docs" flag. Later steps in the workflow will use this flag to make routing decisions.
```python
def route_after_retrieve(state):
    if state["documents"]:
        return "docs_found"
    else:
        return "no_docs"
```

#### Assembling the Graph
Once we have defined the necessary node and conditional edge functions, we need to link them all together into a graph. To do this, we use the `add_node`, `add_edge`, and `add_conditional_edges` methods.

For example, here we are adding "retrieve", "generate", and "search" nodes using the `add_node` method.

```python
graph = StateGraph(State) # Initialize the graph

graph.add_node("retrieve", retrieval_node) # Add a "retrieve" node, which runs a retrieval_node function
graph.add_node("generate", generate_node)  # Add a "generate" node, which runs a generate_node function
graph.add_node("search", search_node) # Add a "search" node, which runs a search_node function
...
```

We also need to add edges. If we want to unconditionally go from one node to another, we would add a Direct Edge using `add_edge`. For example, if we always want to generate an answer after the search tool (regardless of what the search tool returns) we could add a direct edge between the "search" and "generate" nodes. Note that nodes are always identified by their unique string name.
```python
graph.add_edge("search", "generate")
```

Meanwhile, if we want conditional logic, we could use `add_conditional_edges`. For example, after the "retrieve" node, we want conditional logic to go to the "generate" node or the "search" node depending on if relevant documents were found during retrieval. The actual logic for checking relevance should already be implemented in the `route_after_retrieve` function; all the conditional edge needs to do is specify which flag corresponds to which node to route to. In this case, if the `route_after_retrieve` function returned `docs_found`, then we should route to the "generate" node. If the `route_after_retrieve` function returned the `no_docs` flag, then we should route to the "search" node.

```python
graph.add_conditional_edges(
    "retrieve", # Begin at this node
    route_after_retrieve, # The function that actually does the relevance checking logic (implementation not shown yet)
    {
        "docs_found": "generate",  # If route_after_retrieve returned "docs_found" flag, go to "generate" node
        "no_docs": "search"        # If route_after_retrieve returned "no_docs" flag, go to "search" node
    }
)
```
Note that the flag defined in our `route_after_retrieve` function is a string in this case, but it can be an int, bool, etc. Generally, strings are encouraged for code readability.
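
To build intuition for how nodes, direct edges, and conditional edges interact at run time, here is a toy executor in plain Python. It is only a mental model under simplifying assumptions (no LangGraph involved, a `FINISH` marker standing in for the end of the graph, stub node functions), not how LangGraph is implemented.

```python
FINISH = "__finish__"  # placeholder end marker for this sketch only


def retrieve(state):
    docs = ["speaker list"] if "speaker" in state["query"] else []
    return {"documents": docs}


def search(state):
    return {"documents": ["web result"]}


def generate(state):
    return {"answer": f"Answer based on {state['documents'][0]}"}


def route_after_retrieve(state):
    return "docs_found" if state["documents"] else "no_docs"


nodes = {"retrieve": retrieve, "search": search, "generate": generate}
direct_edges = {"search": "generate", "generate": FINISH}
conditional_edges = {
    "retrieve": (route_after_retrieve, {"docs_found": "generate", "no_docs": "search"}),
}


def run(query, entry="retrieve"):
    state, current, visited = {"query": query}, entry, []
    while current != FINISH:
        visited.append(current)
        state.update(nodes[current](state))   # the node writes to state
        if current in conditional_edges:      # the routing function returns a flag
            router, mapping = conditional_edges[current]
            current = mapping[router(state)]
        else:                                 # direct edge: always the same target
            current = direct_edges[current]
    return state, visited


print(run("Who is the keynote speaker?")[1])
print(run("What is the venue?")[1])
```

The routing function only reads state and returns a flag; the mapping dictionary is what turns that flag into the next node.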

### 5.4 Implementing What We Learned

#### Functions for Nodes

Let's revisit the flowchart of our workflow from above and actually implement it. Based on the flowchart, we can see that there will be a total of three functions for nodes (blue boxes):
- `retrieve_node`: calls the retriever tool
- `search_node`: calls the search tool
- `generate_node`: all LLM response generation tasks

In this example, for code clarity, we choose to handle all LLM response generation in a single node, regardless of whether the response uses retrieval or search context. Alternatively, we could split `generate_node` into three different nodes, for example: `generate_node`, `generate_with_retrieval`, and `generate_with_search`.


Let's implement the **"generate"** node function first. This node will handle all LLM response generation, regardless of what context is used (if any).

We will use NVIDIA's API endpoints to access a llama-3.1-70b model. Get an NVIDIA API key [here](https://build.nvidia.com/meta/llama-3_1-70b-instruct) (click "View Code" in the top right, and then "Generate API Key"). Make the key available as the `NVIDIA_API_KEY` environment variable (for example, in a `.env` file); the cell below loads it.

In [None]:
import os
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())

Next, we need to create an **LLM client**, which is a Python object that provides an interface for interacting with the LLM by handling authentication, request formatting, and response handling.

In [None]:
from langchain_openai import ChatOpenAI

llm_client = ChatOpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
    model="meta/llama-3.1-70b-instruct"
)

Now let's make a helper function for making LLM calls, which the `generate_node` function can use. Later on, when we make our conditional edges, we will also call this helper.

In [None]:
def call_llm(prompt: str, temperature: float = 0.0) -> str:
    response = llm_client.invoke(
        [{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.content

Now let's code the actual `generate_node` function. This function feeds the `query` and `context_chunks` state fields into the LLM as input, in order to generate the final answer. Notice that if the `context_chunks` field contains context, we use a prompt which explicitly asks the LLM to reference the context. If the `context_chunks` field is empty, we use a different prompt.

In [None]:
def generate_node(state: State) -> dict:
    query = state["query"] # reference State for the user query
    context = "\n\n".join(state.get("context_chunks", []))[:3000] # reference State for context, if any (truncated to 3000 characters)
    
    # Build prompt
    if context:
        prompt = f"""Use the following context to answer the question.
        Question: {query}
        Context: {context}
        Answer in 2-3 sentences:"""
    else: # No context given
        prompt = f"""Answer the following question based on your knowledge. Be concise (2-3 sentences).
        Question: {query}
        Answer:"""
    
    # Generate answer
    answer = call_llm(prompt)
    return {"answer": answer}

Let's try it out. Suppose we have some example State such as `example_state_1`, where we have a user query and accompanying context. Watch `generate_node` generate an answer based on the context.

In [None]:
example_state_1 = {
    "query": "Who are the speakers for GTC DC 2025?",
    "context_chunks": [
        "<li>Roland Busch, President and CEO, Siemens AG</li>",
        "<li>Young Liu, Chairman and CEO, Foxconn</li>"
    ]
}

output = generate_node(example_state_1)
print(output)

Now let's tackle the **"search"** node. This node should be able to use both the `internet_search_tool` and the `wiki_search_tool` to find context. We'll give it the freedom to decide which of the two tools it wants to use, depending on which one is likely to provide the best context for answering the user query.

To enable this, we need to define an array of available `tools`. Then, we'll use a **ReAct** agent, which is a special type of agent that has tool-calling abilities. ReAct is a concept that stands for "Reasoning and Acting" -- the agent follows a loop where it:
1. Reasons about what information it needs
2. Acts by calling a tool
3. Observes the result
4. Repeats if needed, or synthesizes a final answer

We can reuse the LLM client we made earlier, with one extra step: we have to **bind** our tools to the LLM client using `.bind_tools()` to give the LLM access to the tool schemas (names, descriptions, and parameters).

Then, we pass both the bound LLM and the tools array into `create_react_agent`, which is LangGraph's prebuilt module that handles the ReAct reasoning loop for us: if the LLM outputs that it *wants* to call a tool, then the `create_react_agent` module will do the actual tool execution and feed the tool output back to the LLM, then repeat this loop until the LLM outputs that it has enough information to provide a final answer.
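
The loop described above can be sketched without any LLM at all. In the toy version below, `scripted_llm` is a hypothetical fake model that requests one tool call and then answers, and `lookup` is a stub tool. The message dictionaries are a simplification for illustration, not LangGraph's message types or the internals of `create_react_agent`.

```python
def scripted_llm(messages):
    """Fake LLM: asks for one tool call, then answers. Stands in for a real model."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "assistant", "content": "", "tool_call": {"name": "lookup", "args": "GTC DC"}}
    return {
        "role": "assistant",
        "content": f"Final answer using: {tool_results[-1]['content']}",
        "tool_call": None,
    }


def lookup(query):
    return f"stub result for {query}"  # hypothetical tool body


toy_tools = {"lookup": lookup}


def react_loop(question, llm, tools, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = llm(messages)                                      # 1. reason
        messages.append(reply)
        call = reply["tool_call"]
        if call is None:                                           # 4. final answer
            return messages
        observation = tools[call["name"]](call["args"])            # 2. act
        messages.append({"role": "tool", "content": observation})  # 3. observe
    return messages


history = react_loop("When is GTC DC?", scripted_llm, toy_tools)
for m in history:
    print(m["role"], "|", m["content"] or m.get("tool_call"))
```

The `max_steps` cap is a design choice worth keeping in any real loop, since a model that keeps requesting tools would otherwise never terminate.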

In the implementation below, we've included verbose print statements so that you can observe the ReAct loop happening before the LLM arrives at its final answer.

In [None]:
from langgraph.prebuilt import create_react_agent

tools = [internet_search_tool, wiki_search_tool] # Define available tools
llm_with_tools = llm_client.bind_tools(tools) # Bind tools to llm client

def search_node(state: State) -> dict:
    query = state["query"] # Read user query from State
    
    agent = create_react_agent(llm_with_tools, tools) # Create ReAct agent
    result = agent.invoke({ # Prompt the agent to find context
        "messages": [{"role": "user", "content": f"Find comprehensive information to answer this question: {query}"}]
    })


    ################## Print statement to show ReAct Loop ##################
    print("=== FULL REACT LOOP ===\n")
    for i, message in enumerate(result["messages"]):
        print(f"--- Step {i+1} ---")
        print(f"Type: {type(message).__name__}")
        
        # Show tool calls if they exist
        if hasattr(message, "tool_calls") and message.tool_calls:
            print(f"TOOL CALLS: {message.tool_calls}")
        
        # Show content if it exists
        if hasattr(message, "content") and message.content:
            content = str(message.content)
            if len(content) > 500:
                print(f"Content (truncated): {content[:500]}...")
            else:
                print(f"Content: {content}")
        
        print()
    ########################################################################
    
    search_result = []
    for message in result["messages"]: # Extract all text content from the ReAct loop, so we can use it as context
        if hasattr(message, "content") and message.content:
            search_result.append(message.content)
    
    return {
        "context_source": "search",
        "context_chunks": search_result
    }

Let's test it! Since our question includes a recent year, the agent should choose to use the internet search tool to retrieve date-sensitive information.

In [None]:
example_state_2 = {
    "query": "Who are the speakers for GTC DC 2025?",
}

output = search_node(example_state_2)
print("\n\n=== Output that gets written back to the state as the context: ===")
print(output)

Now, let's ask a question about established knowledge or encyclopedic information. We expect the agent to choose the Wikipedia tool this time.

In [None]:
example_state_3 = {
    "query": "What is the GTC DC conference?",
}

output = search_node(example_state_3)
print("\n\n=== Output that gets written back to the state as the context: ===")
print(output)

Next, let's implement the **"retrieve"** node. This node is more straightforward. It simply calls the retrieve tool we made earlier.

In [None]:
def retrieve_node(state: State) -> dict:
    query = state["query"] # Read user query from State
    retrieval_result = retriever_tool.invoke({"query": query}) # Call the retriever tool
        
    return { # Update context fields in State
        "context_source": "retrieval",
        "context_chunks": [retrieval_result], # the retrieval_result response from the retriever tool is a string, so we put it into a list to be consistent with the context_chunks definition in State
    }

Note that both the **"retrieve"** and **"search"** nodes directly update the state’s `context_source` and `context_chunks` fields so that they always reflect the most recently executed node. `context_chunks` will then be used by the **"generate"** node to craft the final answer. `context_source` is not used in our workflow, but we are keeping it for logging purposes.

#### Functions for Conditional Edges

Next, let's build the conditional edges of our workflow. Referencing the flowchart again, we see that there should be two conditional edge functions:
- `should_retrieve`: Will retrieving documents help in answering this question (should we do retrieval?)
- `check_relevance`: Is the retrieved context relevant for answering the question?

Let's implement them. Both use an LLM to decide on routing.
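
Routing on an LLM reply means turning free-form text into one of a fixed set of labels. A substring test (the approach used in this notebook) is the simplest option, but it also matches replies such as "do not retrieve". A stricter alternative is an exact match with a fallback, sketched here with a hypothetical helper `parse_choice` that is not part of the notebook's workflow.

```python
def parse_choice(reply: str, allowed: list, default: str) -> str:
    """Map a free-form LLM reply to one of the allowed labels, else fall back."""
    cleaned = reply.strip().lower().strip(".'\"! ")
    return cleaned if cleaned in allowed else default


print(parse_choice(" Retrieve. ", ["retrieve", "generate"], default="generate"))
print(parse_choice("I would not retrieve here", ["retrieve", "generate"], default="generate"))
```

Which default to fall back to is a judgment call: defaulting to retrieval costs an extra lookup, while defaulting to generation risks an ungrounded answer.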

**For the `should_retrieve` method, you should alter the prompt to fit your specific document content (in the example, the documents are about GTC DC)**.

In [None]:
def should_retrieve(state: State) -> Literal["should_retrieve", "should_NOT_retrieve"]:
    query = state["query"]

    # Alter this prompt to fit your document content
    prompt = f"""Does this question require retrieving specific information from documents about GTC DC, or can it be answered with general knowledge?
             Question: {query}
             Reply with ONLY 'retrieve' or 'generate' (no explanation):"""
    
    decision = call_llm(prompt).strip().lower()
    if "retrieve" in decision:
        return "should_retrieve"
    else:
        return "should_NOT_retrieve"


def check_relevance(state: State) -> Literal["not_relevant", "is_relevant"]:
    query = state["query"]
    context = "\n\n".join(state.get("context_chunks", []))  # Merge all chunks into one string
    
    if not context:
        return "not_relevant"
        
    prompt = f"""Are these retrieved documents relevant to answering the question?
            Question: {query}
            Documents: {context[:2000]}
            Reply with ONLY 'yes' or 'no':"""
    
    decision = call_llm(prompt).strip().lower()
    if "yes" in decision:
        return "is_relevant"
    else:
        return "not_relevant"

Let's try it out. Below, we call `should_retrieve` with an example query that likely does not require document retrieval. Our conditional edge function should therefore return `should_NOT_retrieve`.

In [None]:
example_state_4 = {
    "query": "What is 2+2?"
}
print(should_retrieve(example_state_4))

Now let's try out `check_relevance`.

We call it with an example state that has a query and some pre-populated context that is unhelpful towards answering the query. Our conditional edge function therefore returns `not_relevant`.

In [None]:
example_state_5 = {
    "query": "What is the GTC DC venue address?",
    "context_chunks": [
        "All kittens are born with blue eyes",  # Comma added so the two chunks are not silently concatenated
        "Their eyes lack the pigment melanin, which produces brown, green, or blue eye colors"
    ]
}

print(check_relevance(example_state_5))
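
Both edge functions above use a substring check (`"retrieve" in decision`, `"yes" in decision`), which can misroute when the model adds extra words, for example "No need to retrieve anything". One possible way to tighten this is sketched below. The helper `parse_decision` is hypothetical and not part of the notebook or of LangGraph; it looks only at the first word of the reply and falls back to a default label when that word is not an expected one.

```python
def parse_decision(raw: str, allowed: tuple, default: str) -> str:
    """Map free-form LLM output to one of the allowed labels, else the default."""
    words = raw.strip().lower().split()
    # Keep only the first word and strip surrounding punctuation or quotes
    first = words[0].strip(".,:;!'\"`*") if words else ""
    return first if first in allowed else default


# A clean reply is accepted as-is (after normalization)
print(parse_decision("Yes.", ("yes", "no"), default="no"))

# A substring check for "retrieve" would match here; the first-word check does not
print(parse_decision("No need to retrieve anything", ("retrieve", "generate"), default="generate"))

# An empty or unexpected reply falls back to the default
print(parse_decision("", ("yes", "no"), default="no"))
```

Choosing the default is a design decision: defaulting to `"no"` in `check_relevance` sends ambiguous cases to the search node, which costs an extra tool call but avoids answering from irrelevant context.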

#### Building the Complete LangGraph

Finally, we can put all the nodes and edges together. 

In [None]:
workflow = StateGraph(State) # Initialize the graph

# First, add all nodes
workflow.add_node("retrieve", retrieve_node)
workflow.add_node("search", search_node)
workflow.add_node("generate", generate_node)

# Then, define the flow of the workflow from START to END
# Connect START to the conditional edge -- Should we retrieve context from the web page documents?
workflow.add_conditional_edges(
    START,
    should_retrieve,  # Decision function
    {
        "should_NOT_retrieve": "generate", # If should_NOT_retrieve, go to generate node. Workflow finishes.
        "should_retrieve": "retrieve"      # If should_retrieve, go to retrieve node
    }
)

# After retrieve node, check relevance of the context retrieved
workflow.add_conditional_edges(
    "retrieve",
    check_relevance,  # Decision function
    {
        "is_relevant": "generate",  # If is_relevant, go to generate node. Workflow finishes.
        "not_relevant": "search"    # If not_relevant, go to search node
    }
)

# After search node, always go to generate node. Workflow finishes.
workflow.add_edge("search", "generate")

# After the generate node, the workflow finishes.
workflow.add_edge("generate", END)

# Compile the graph
graph = workflow.compile()

## 6. Run the Agent

And we're done! Now we can ask any question, and our agent will use retrieval-augmented generation and tool-calling to provide a response.

**To summarize:** the agent first decides whether to retrieve context from our web page documents or to generate an answer directly from its internal knowledge. If it chooses retrieval but the retrieved content is not relevant, the agent falls back to the search node. There, it is free to pick between Wikipedia search and internet search, depending on which it thinks will return better context for the user's question.
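
The routing described in this summary can be traced with a small plain-Python sketch. It does not call LangGraph or an LLM: the two booleans stand in for the decisions that `should_retrieve` and `check_relevance` would make, and the function `trace_path` is a hypothetical helper used only for illustration.

```python
def trace_path(needs_retrieval: bool, context_is_relevant: bool) -> list:
    """Return the sequence of nodes visited for a given pair of routing decisions."""
    path = ["START"]
    if needs_retrieval:               # stands in for should_retrieve
        path.append("retrieve")
        if not context_is_relevant:   # stands in for check_relevance
            path.append("search")
    path += ["generate", "END"]
    return path


# Enumerate the three distinct paths through the workflow
for needs, relevant in [(False, False), (True, True), (True, False)]:
    print(" -> ".join(trace_path(needs, relevant)))
```

Every path ends at the generate node, so there are exactly three routes through the graph: no context, retrieved context, and searched context.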

Below is a utility function to call the agentic system and print verbose logging. 

In [None]:
# The history parameter is accepted but not actually used yet. It could be implemented
# later to keep persistent conversation history between messages.
def ask_question(question: str, history=None):
    print(f"\n{'='*60}")
    print(f"Question: {question}")
    print(f"{'='*60}\n")

    initial_state = { # Set initial state
        "query": question,
        "context_source": None,
        "context_chunks": [], 
        "answer": ""
    }
    
    final_state = graph.invoke(initial_state) # Run the graph
    
    # Print final answer!
    print(f"\n{'='*60}")
    print(f"Context Source: {final_state['context_source']}")
    print(f"Answer:\n{final_state['answer']}")
    print(f"{'='*60}\n")
    print(f"Context Used:\n{final_state['context_chunks']}")

    response = f"{final_state['answer']}\n\n_Source: {final_state['context_source']}_"
    return response
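
The `history` parameter is left unimplemented above. As a rough sketch of one way it could be used, the hypothetical helper below folds the most recent messages into the question text before it is placed in the state. It assumes `history` is a list of dictionaries with `role` and `content` keys; if your chat front end passes history in a different shape (for example, pairs of user and assistant messages), the formatting line would need to be adapted.

```python
def build_query_with_history(question: str, history=None, max_turns: int = 3) -> str:
    """Prefix the question with the most recent conversation messages."""
    if not history or max_turns <= 0:
        return question  # Nothing to add, so use the question unchanged
    # Assumption: each turn is one user message plus one assistant message
    recent = history[-2 * max_turns:]
    lines = [f"{m['role']}: {m['content']}" for m in recent]
    return "Conversation so far:\n" + "\n".join(lines) + f"\n\nCurrent question: {question}"


example_history = [
    {"role": "user", "content": "Who is speaking at GTC DC?"},
    {"role": "assistant", "content": "Here is the list of speakers."},
]
print(build_query_with_history("When is the keynote?", example_history))
```

The returned string could be stored under `"query"` in `initial_state`. Keeping only the last few turns is a deliberate choice here, since the routing prompts embed the query and grow with it.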

### Test the Agent

Let's try it out! Try to trigger a response that requires retrieval. In this example, we ask a question about GTC DC.

**Make sure to edit the question below so that it's about a topic you included in your web page knowledge base, to trigger the agent to choose retrieval.**

In [None]:
# Example usage - should be retrieval
ask_question("Who are the speakers at NVIDIA GTC DC 2025?")

Now let's try to trigger the agent to use search. In this example, we ask about GTC DC 2024, but our web page knowledge base only includes information about GTC DC 2025, so the agent needs to fall back on searching instead.

**Make sure to edit the question below so that the answer cannot be found by looking at your web page knowledge base.** Pay attention to whether the agent chose to use an internet search tool or the Wikipedia tool (you can determine this under the "FULL REACT LOOP" print statements).

In [None]:
# Example usage - should be search
ask_question("In which month was GTC DC 2024 held?")

Finally, let's try a question that requires no context at all; this question can be answered solely using the model's internal knowledge that it learned during training.

**To test the no-context behavior, double-check that the input question does not coincide with the topic you used for your web page knowledge base.**

In [None]:
# Example usage - context source should be None
ask_question("Why are kittens' eyes blue?")

### Closing Remarks

Our agent demonstrates a mix of traditional engineering and agentic autonomy. For our retrieve and generate nodes, we use deterministic logic to create a predictable workflow. These nodes always execute their assigned calls, and they're not given a choice to "try again" or pick which tool to call. Meanwhile, the conditional edges in our workflow use an LLM to make routing decisions, so we harness the non-deterministic reasoning capabilities of the LLM, but the set of possible pathways is still scoped (there are only two possible paths at each decision point). On the other hand, the search node is given complete freedom to use a ReAct agent that can reason about which tool to use, and call tools on its own until it reaches a satisfactory answer.

The key to designing a useful agentic workflow lies in deciding where to apply autonomy and where to enforce determinism, in order to harness the intelligence of generative AI without creating a messy and unpredictable workflow that is unsuitable for production use cases.

### Gradio User Interface

Finally, we can even spin up a quick **Gradio** interface to chat with our agent! Gradio is a Python library that lets you quickly create web-based user interfaces for AI applications, which is especially useful for demos and prototypes.

In [None]:
import gradio as gr

gr.ChatInterface(ask_question).launch( # Launch the gradio user interface
    share=True, # Creates a shareable public URL for your interface
    server_name="0.0.0.0",
)

## Next Steps

Continue to the next notebook: [03_Surge_of_Agents.ipynb](03_Surge_of_Agents.ipynb) to explore different agent frameworks including OpenAI, LangChain, LangGraph, and CrewAI.