# Bringing Your Own LangGraph Agent to NeMo Agent Toolkit

In this notebook, we'll show you how to integrate an existing LangGraph agent with the NeMo Agent Toolkit (NAT).

You'll learn how to wrap LangGraph agents so they work smoothly with NAT. This lets you use NAT features such as MCP compatibility, observability, optimization, and profiling with your existing LangGraph agent systems without refactoring them.

**Key Difference**: Unlike traditional LangChain agents that use the `AgentExecutor` pattern, LangGraph uses a **graph-based architecture** with nodes and edges, providing more flexibility and control over agent execution flow.


# Table of Contents
- [0.0) Setup](#setup)
  - [0.1) Prerequisites](#prereqs)
  - [0.2) API Keys](#api-keys)
  - [0.3) Installing NeMo Agent Toolkit](#installing-nat)
- [1.0) Defining an 'Existing' LangGraph Agent](#defining-existing-agent)
- [2.0) Existing Agent Migration](#migration)
  - [2.1) Migration Part 1: Transforming Your Existing Agent into a Workflow](#migration-part-1)
  - [2.2) Migration Part 2: Making Your Agent Configurable](#migration-part-2)
  - [2.3) Migration Part 3: Integration with NeMo Agent Toolkit](#migration-part-3)
  - [2.4) Migration Part 4: A Zero-Code Configuration](#migration-part-4)
- [3) Next Steps](#next-steps)

<span style="color:rgb(0, 31, 153); font-style: italic;">Note: In Google Colab use the Table of Contents tab to navigate.</span>


<a id="setup"></a>
# 0.0) Setup


<a id="prereqs"></a>
## 0.1) Prerequisites


- **Platform:** Linux, macOS, or Windows
- **Python:** version 3.11, 3.12, or 3.13
- **Python Packages:** `pip`


<a id="api-keys"></a>
## 0.2) API Keys


For this notebook, you will need the following API keys to run all examples end-to-end:

- **NVIDIA Build:** You can obtain an NVIDIA Build API Key by creating an [NVIDIA Build](https://build.nvidia.com) account and generating a key at https://build.nvidia.com/settings/api-keys
- **Tavily:** You can obtain a Tavily API Key by creating a [Tavily](https://www.tavily.com/) account and generating a key at https://app.tavily.com/home

Then you can run the cell below:


In [None]:
import getpass
import os

# For local NIM deployment, you may not need an NVIDIA API key
# If your local NIM requires authentication, set it here
if "NVIDIA_API_KEY" not in os.environ:
    # For local NIM, you can use a placeholder or skip this
    nvidia_api_key = getpass.getpass("Enter your NVIDIA API key (or press Enter for local NIM): ")
    os.environ["NVIDIA_API_KEY"] = nvidia_api_key or "not-needed-for-local-nim"

if "TAVILY_API_KEY" not in os.environ:
    tavily_api_key = getpass.getpass("Enter your Tavily API key: ")
    os.environ["TAVILY_API_KEY"] = tavily_api_key


## 🚀 Using Local NIM Deployment

This notebook has been configured to use a **local NVIDIA NIM** deployment instead of the NVIDIA Build API.

**Your Local NIM Configuration:**
- **Endpoint**: `http://0.0.0.0:8000/v1`
- **Model**: `meta-llama/llama-3.1-8b-instruct`

**To verify your local NIM is running:**
```bash
curl -X 'POST' \
  'http://0.0.0.0:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct",
    "messages": [{"role":"user", "content":"Hello!"}],
    "max_tokens": 64
  }'
```

**Benefits of Local NIM:**
- ✅ No API key required (or use local auth)
- ✅ Lower latency (no internet roundtrip)
- ✅ Better privacy (data stays local)
- ✅ No rate limits
- ✅ Cost savings (no per-token charges)


### Configuration Changes for Local NIM

The following configurations have been updated throughout this notebook to use your local NIM:

**1. Config Files Updated:**
- `langgraph_agent_workflow/configs/config.yml`
- `langgraph_agent_workflow/configs/config_8b.yml`
- `langgraph_agent_workflow/configs/config_with_profiling.yml`

**2. Key Changes Made:**

```yaml
llms:
  nim_llm:
    _type: nim
    model_name: meta-llama/llama-3.1-8b-instruct  # Changed from meta/llama-3.3-70b-instruct
    base_url: http://0.0.0.0:8000/v1              # Added: Points to your local NIM
    temperature: 0.2
    max_tokens: 2048
```

**3. For Direct Code Usage (Python scripts):**

If you're creating standalone scripts, use:

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(
    model="meta-llama/llama-3.1-8b-instruct",
    base_url="http://0.0.0.0:8000/v1",  # Your local NIM endpoint
    temperature=0.2,
    max_tokens=2048,
    api_key="not-needed-for-local-nim"  # Use placeholder or actual key if required
)
```

**4. Testing Your Local NIM:**

Run this cell to verify connectivity:


In [None]:
# Test your local NIM connection
import requests
import json

try:
    response = requests.post(
        'http://0.0.0.0:8000/v1/chat/completions',
        headers={
            'accept': 'application/json',
            'Content-Type': 'application/json'
        },
        json={
            "model": "meta-llama/llama-3.1-8b-instruct",
            "messages": [{"role": "user", "content": "Say 'Hello from local NIM!' in one sentence."}],
            "max_tokens": 64
        },
        timeout=10
    )
    
    if response.status_code == 200:
        result = response.json()
        print("✅ Local NIM is running!")
        print(f"Response: {result['choices'][0]['message']['content']}")
    else:
        print(f"❌ Error: Status {response.status_code}")
        print(response.text)

except requests.exceptions.ConnectionError:
    print("❌ Cannot connect to local NIM at http://0.0.0.0:8000")
    print("Please ensure your NIM is running.")
except Exception as e:
    print(f"❌ Error: {e}")


### 📝 Summary of Changes for Local NIM Usage

**What Changed:**

| Component | Original (NVIDIA Build) | Updated (Local NIM) |
|-----------|------------------------|---------------------|
| **Endpoint** | `https://integrate.api.nvidia.com/v1` | `http://0.0.0.0:8000/v1` |
| **Model Name** | `meta/llama-3.3-70b-instruct` | `meta-llama/llama-3.1-8b-instruct` |
| **API Key** | Required from NVIDIA Build | Optional/Placeholder |
| **Network** | Internet required | Local only |

**Additional Notes:**

1. **Model Availability**: Ensure your local NIM has the `meta-llama/llama-3.1-8b-instruct` model loaded
2. **Multiple Models**: If you have other models in your NIM, you can change `model_name` to any available model
3. **Authentication**: If your local NIM requires authentication, set the `api_key` parameter appropriately
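4. **Network**: `0.0.0.0` is a bind address rather than a routable destination; if a client cannot connect to it (for example on Windows), replace it with `localhost` or the host's specific IP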

**Checking Available Models:**

```bash
curl http://0.0.0.0:8000/v1/models
```

This will list all models available in your local NIM deployment.
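
The same check can be done from Python. The sketch below assumes the endpoint returns an OpenAI-style model list (`{"data": [{"id": ...}]}`), which OpenAI-compatible servers typically expose at `/v1/models`. The helper names `parse_model_ids` and `list_model_ids` are illustrative and not part of NIM or NAT.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_model_ids(base_url: str = "http://0.0.0.0:8000/v1", timeout: float = 10.0) -> list[str]:
    """Fetch and parse the model list from a running endpoint (needs a live server)."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


# Offline demonstration with a hand-written payload in the assumed shape.
sample = {"object": "list", "data": [{"id": "meta-llama/llama-3.1-8b-instruct", "object": "model"}]}
print(parse_model_ids(sample))
```

With the NIM running, `list_model_ids()` should return the names you can use for `model_name` in the config files.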


<a id="installing-nat"></a>
## 0.3) Installing NeMo Agent Toolkit


The recommended way to install NAT is through `pip` or `uv pip`.

First, we will install `uv`, which offers parallel downloads and faster dependency resolution.


### Installing Dependencies

**Note**: If you encounter `No module named pip` errors, the cells below will automatically fix this by:
1. Ensuring pip is installed in your environment
2. Using the correct Python interpreter from your kernel

**Alternative methods** if you continue to have issues:
- Option 1: Run in terminal: `python3 -m ensurepip --default-pip`
- Option 2: Reinstall the virtual environment
- Option 3: Use system Python instead of venv


In [None]:
# Diagnostic: Check your Python environment
import sys
import os

print(f"Python executable: {sys.executable}")
print(f"Python version: {sys.version}")
print(f"Virtual env: {os.getenv('VIRTUAL_ENV', 'Not in a virtual environment')}")

# Try to import pip
try:
    import pip
    print(f"✅ pip is installed (version {pip.__version__})")
except ImportError:
    print("❌ pip is not installed - will be fixed in next cell")


In [None]:
import sys
!{sys.executable} -m ensurepip --default-pip
!{sys.executable} -m pip install uv


Note: you may need to restart the kernel to use updated packages.


In [None]:
# Alternative: If the above doesn't work, run this in your terminal:
# cd to your project directory, then run:
# python3 -m venv .venv --system-site-packages
# source .venv/bin/activate  # On macOS/Linux
# .venv\Scripts\activate     # On Windows
# python -m ensurepip --upgrade
# pip install jupyter ipykernel
# python -m ipykernel install --user --name=.venv

print("If you're still having issues, run the commands above in your terminal,")
print("then restart the Jupyter kernel and select the .venv kernel.")


NeMo Agent Toolkit can be installed through the PyPI `nvidia-nat` package.

There are several optional subpackages available for NAT. The `langchain` subpackage contains useful components for integrating and running with [LangChain](https://python.langchain.com/docs/introduction/) and [LangGraph](https://langchain-ai.github.io/langgraph/).

**Note**: LangGraph is part of the LangChain ecosystem and is included when you install `nvidia-nat[langchain]`. This single installation provides both LangChain and LangGraph dependencies.

Since LangGraph will be used later in this notebook, let's install NAT with the optional `langchain` subpackage:


In [40]:
%%bash
uv pip show -q "nvidia-nat-langchain"
if [ $? -ne 0 ]; then
    uv pip install "nvidia-nat[langchain]"
else
    echo "nvidia-nat[langchain] is already installed"
fi


nvidia-nat[langchain] is already installed


Let's verify that both LangChain and LangGraph are available:


<a id="defining-existing-agent"></a>
# 1.0) Defining an 'Existing' LangGraph Agent

In this case study, we will use a simple, self-contained LangGraph agent as a proxy for your 'existing' agent. This agent comes equipped with a search tool that is capable of retrieving context from the internet using the Tavily API.

**Key Difference from LangChain**: Unlike traditional LangChain agents that use the `AgentExecutor` pattern, LangGraph uses a **graph-based architecture** with nodes and edges. This provides more flexibility and control over the agent's execution flow.

The cell below defines a simple LangGraph agent with a string input query.


In [None]:
%%writefile langgraph_agent.py
import os
import sys

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_tavily import TavilySearch
from langgraph.prebuilt import create_react_agent

def existing_agent_main():
    if len(sys.argv) < 2:
        print("Usage: python langgraph_agent.py \"Your question here\"")
        sys.exit(1)
    user_input = sys.argv[1]

    # Initialize a tool to search the web
    search = TavilySearch(
        max_results=5,
        api_key=os.getenv("TAVILY_API_KEY")
    )

    # Initialize a LLM client (using local NIM)
    llm = ChatNVIDIA(
        model="meta-llama/llama-3.1-8b-instruct",
        base_url="http://0.0.0.0:8000/v1",
        temperature=0.2,
        max_tokens=2048,
        api_key=os.getenv("NVIDIA_API_KEY", "not-needed-for-local-nim")
    )

    # Create tools list
    tools = [search]

    # Create a LangGraph ReAct agent using the prebuilt function
    # This creates a StateGraph with agent and tool nodes automatically
    graph = create_react_agent(
        model=llm,
        tools=tools,
    )

    # Invoke the agent with a user query
    # LangGraph uses message-based state
    response = graph.invoke({"messages": [("user", user_input)]})

    # Extract and print the final response
    final_message = response["messages"][-1]
    print(final_message.content)

if __name__ == "__main__":
    existing_agent_main()



Overwriting langgraph_agent.py


There are three main components to this LangGraph agent:

* **a web search tool (Tavily)** - for retrieving information from the internet

* **an LLM (Llama 3.1 8B via local NIM)** - for reasoning and generating responses

* **a graph-based agent system (LangGraph's `create_react_agent`)** - for orchestrating the agent's execution

The agent is constructed using LangGraph's `create_react_agent` function, which automatically creates a state graph with:
- An **agent node** that calls the LLM
- **Tool nodes** for executing tools
- **Conditional edges** for routing between agent and tools

We pass the requested input into the graph and get a response back through the message state.
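
To make the routing concrete, here is a plain-Python sketch of the loop that such a graph runs. It is an illustration only, not LangGraph's implementation: `fake_llm`, `fake_search`, and `run_react_loop` are hypothetical stand-ins so the example runs offline without any API keys.

```python
def fake_llm(messages: list[dict]) -> dict:
    """Stand-in for the LLM: request a search first, then answer once a tool result exists."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        question = messages[-1]["content"]
        return {"role": "assistant", "content": "", "tool_call": {"name": "search", "args": question}}
    return {"role": "assistant", "content": f"Based on the search: {tool_results[-1]['content']}"}


def fake_search(query: str) -> str:
    """Stand-in for the web search tool."""
    return f"(stub result for '{query}')"


def run_react_loop(user_input: str, llm=fake_llm, tools=None, max_steps: int = 5) -> list[dict]:
    """Alternate between the agent step and the tool step until no tool is requested."""
    tools = tools or {"search": fake_search}
    messages = [{"role": "user", "content": user_input}]  # message-based state
    for _ in range(max_steps):
        reply = llm(messages)              # "agent node": call the model
        messages.append(reply)
        call = reply.get("tool_call")      # "conditional edge": tool requested?
        if call is None:
            break                          # no tool call means the answer is final
        result = tools[call["name"]](call["args"])  # "tool node": run the tool
        messages.append({"role": "tool", "content": result})
    return messages


state = run_react_loop("Who won the last World Cup?")
print([m["role"] for m in state])
print(state[-1]["content"])
```

The last entry in the returned list plays the same role as `response["messages"][-1]` in the agent above: it is the final assistant message.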

All of the components in use come from LangGraph/LangChain, but any other framework or example could also work.

Next we will run this sample agent to validate that it works.


**Note on LangGraph vs LangChain Parameters**: 

The LangChain version of this example uses `max_results=2`, `temperature=0.0`, and `max_completion_tokens=1024`. With those settings, LangGraph's `create_react_agent` sometimes produces responses of different quality than LangChain's `AgentExecutor`.

This is because:
- **Different default system prompts** between the frameworks
- **Different agent execution patterns** (graph-based vs. executor-based)
- **Different tool result handling** in the reasoning loop

If you see incomplete responses like *"was not specified in the search results"*, you can improve this by:
```python
# Increase search results for more context
search = TavilySearch(max_results=5, api_key=os.getenv("TAVILY_API_KEY"))

# Slightly higher temperature and more tokens (using local NIM)
llm = ChatNVIDIA(
    model="meta-llama/llama-3.1-8b-instruct", 
    base_url="http://0.0.0.0:8000/v1",
    temperature=0.2, 
    max_tokens=2048
)

# Create the agent
tools = [search]
graph = create_react_agent(model=llm, tools=tools)
```

The agent defined above already uses these values (`max_results=5`, `temperature=0.2`, `max_tokens=2048`). Let's run it to see how it performs:


In [42]:
!python langgraph_agent.py "Who won the last World Cup?"


  llm = ChatNVIDIA(


The current World Cup holder is the Argentina national team, who defeated the France national team in the 2022 World Cup final in Qatar with a score of 3-3 (4-2 pens).


<a id="migration"></a>
# 2.0) Existing Agent Migration

<a id="migration-part-1"></a>
## 2.1) Migration Part 1: Transforming Your Existing Agent into a Workflow

NAT supports bringing your own agent into the framework. The primary entry point for agent execution is a NAT workflow, so as a first pass at migration we will create a new workflow:


In [43]:
!nat workflow create langgraph_agent_workflow


Workflow 'langgraph_agent_workflow' already exists.

Now that we've created a workflow directory for the new agent, we will migrate the agent's functional code into it. In the next cell, we adapt the code from `existing_agent_main()` into a new function, `langgraph_agent_workflow_function()`, which encapsulates the same functionality but is decorated and registered for NAT workflow compatibility.


In [None]:
%%writefile langgraph_agent_workflow/src/langgraph_agent_workflow/langgraph_agent_workflow.py
import logging

from pydantic import Field

from nat.builder.builder import Builder
from nat.builder.framework_enum import LLMFrameworkEnum
from nat.builder.function_info import FunctionInfo
from nat.cli.register_workflow import register_function
from nat.data_models.function import FunctionBaseConfig

logger = logging.getLogger(__name__)


class LangGraphAgentWorkflowFunctionConfig(FunctionBaseConfig, name="langgraph_agent_workflow"):
    pass


@register_function(config_type=LangGraphAgentWorkflowFunctionConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN])
async def langgraph_agent_workflow_function(_config: LangGraphAgentWorkflowFunctionConfig, _builder: Builder):
    import os

    from langchain_nvidia_ai_endpoints import ChatNVIDIA
    from langchain_tavily import TavilySearch
    from langgraph.prebuilt import create_react_agent

    # Initialize a tool to search the web
    search = TavilySearch(
        max_results=5,
        api_key=os.getenv("TAVILY_API_KEY")
    )

    # Initialize a LLM client (using local NIM)
    llm = ChatNVIDIA(
        model="meta-llama/llama-3.1-8b-instruct",
        base_url="http://0.0.0.0:8000/v1",
        temperature=0.2,
        max_tokens=2048,
        api_key=os.getenv("NVIDIA_API_KEY", "not-needed-for-local-nim")
    )

    # Create tools list
    tools = [search]

    # Create a LangGraph ReAct agent using the prebuilt function
    # This creates a StateGraph with agent and tool nodes automatically
    graph = create_react_agent(
        model=llm,
        tools=tools,
    )

    async def _response_fn(input_message: str) -> str:
        # Use the async API so the agent call does not block the event loop
        response = await graph.ainvoke({"messages": [("user", input_message)]})
        final_message = response["messages"][-1]
        return final_message.content

    yield FunctionInfo.from_fn(_response_fn, description="A simple LangGraph agent capable of basic internet search")


Overwriting langgraph_agent_workflow/src/langgraph_agent_workflow/langgraph_agent_workflow.py


As you can see above, this is almost the exact same code as your 'existing' LangGraph agent, but has been refactored to fit within a NAT function registration.

The only differences are:
1. The definition of a closure function `_response_fn` which captures the instantiated graph and uses that to invoke the agent and return the response
2. The use of the `@register_function` decorator
3. The async function signature for NAT compatibility

**Key Difference from LangChain Migration**:
- LangChain agents use `agent_executor.invoke({"input": ..., "chat_history": []})` and return `response["output"]`
- LangGraph agents use `graph.invoke({"messages": [("user", ...)]})` and return `response["messages"][-1].content`
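
Because the two invocation shapes differ, it can help to isolate them in small helpers. The sketch below uses hypothetical helper names (`to_graph_input`, `final_text`) and a stand-in message object so that it runs without LangGraph installed. It also handles the case where a message's `content` is a list of content blocks rather than a plain string, which LangChain message types allow.

```python
from types import SimpleNamespace


def to_graph_input(user_text: str) -> dict:
    """Build the message-based state that a LangGraph agent expects as input."""
    return {"messages": [("user", user_text)]}


def final_text(response: dict) -> str:
    """Return the text of the last message in a LangGraph response."""
    content = response["messages"][-1].content
    if isinstance(content, str):
        return content
    # Otherwise assume a list of blocks: plain strings or {"type": "text", "text": ...} dicts.
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)


# Stand-in responses; a real one would come from graph.invoke(to_graph_input(...)).
plain = {"messages": [SimpleNamespace(content="Argentina won in 2022.")]}
blocks = {"messages": [SimpleNamespace(content=[{"type": "text", "text": "Argentina "}, "won."])]}
print(final_text(plain))
print(final_text(blocks))
```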

We can also simplify the workflow configuration to:


In [None]:
%%writefile langgraph_agent_workflow/configs/config.yml
workflow:
  _type: langgraph_agent_workflow
  
# Note: Using local NIM deployment


Overwriting langgraph_agent_workflow/configs/config.yml


Then we can run the new workflow:


In [46]:
!nat run --config_file langgraph_agent_workflow/configs/config.yml --input "Who won the last World Cup?"


2025-12-04 11:05:01 - INFO     - nat.cli.commands.start:192 - Starting NAT from config file: 'langgraph_agent_workflow/configs/config.yml'

Configuration Summary:
--------------------
Workflow Type: langgraph_agent_workflow
Number of Functions: 0
Number of Function Groups: 0
Number of LLMs: 0
Number of Embedders: 0
Number of Memory: 0
Number of Object Stores: 0
Number of Retrievers: 0
Number of TTC Strategies: 0
Number of Authentication Providers: 0

2025-12-04 11:05:16 - INFO     - nat.front_ends.console.console_front_end_plugin:102 - --------------------------------------------------
Workflow Result:
['The current World Cup holder is the Argentina national team, who defeated the France national team in the 2022 World Cup final in Qatar with a score of 3-3 (4-2 pens).']
--------------------------------------------------

<a id="migration-part-2"></a>
## 2.2) Migration Part 2: Making Your Agent Configurable

Now that we have a working NAT workflow, let's make it more configurable. We'll parameterize the model name, temperature, and other settings so they can be controlled through the YAML configuration file.

This makes the agent more flexible and easier to experiment with different configurations without changing the code.

**Improving Response Quality**: If you noticed vague responses in the previous run, these are the settings to tune. The defaults below are already raised above the values used in the LangChain version of this example:
- `max_search_results`: 5 instead of 2 (more context)
- `temperature`: 0.2 instead of 0.0 (less conservative reasoning)
- `max_tokens`: 2048 instead of 1024 (fuller responses)
- `verbose`: enabled, to log the input and the generated response


In [None]:
%%writefile langgraph_agent_workflow/src/langgraph_agent_workflow/langgraph_agent_workflow.py
import logging


from pydantic import Field

from nat.builder.builder import Builder
from nat.builder.framework_enum import LLMFrameworkEnum
from nat.builder.function_info import FunctionInfo
from nat.cli.register_workflow import register_function
from nat.data_models.function import FunctionBaseConfig

logger = logging.getLogger(__name__)


class LangGraphAgentWorkflowFunctionConfig(FunctionBaseConfig, name="langgraph_agent_workflow"):
    """Configuration for the LangGraph agent workflow."""
    model_name: str = Field(
        default="meta-llama/llama-3.1-8b-instruct",
        description="The name of the LLM model to use"
    )
    base_url: str = Field(
        default="http://0.0.0.0:8000/v1",
        description="Base URL for the local NIM endpoint"
    )
    temperature: float = Field(
        default=0.2,
        description="Temperature for LLM sampling",
        ge=0.0,
        le=1.0
    )
    max_tokens: int = Field(
        default=2048,
        description="Maximum number of completion tokens in the response",
        gt=0
    )
    max_search_results: int = Field(
        default=5,
        description="Maximum number of search results to retrieve",
        gt=0,
        le=10
    )
    verbose: bool = Field(
        default=True,
        description="Enable verbose logging"
    )


@register_function(config_type=LangGraphAgentWorkflowFunctionConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN])
async def langgraph_agent_workflow_function(config: LangGraphAgentWorkflowFunctionConfig, _builder: Builder):
    import os

    from langchain_nvidia_ai_endpoints import ChatNVIDIA
    from langchain_tavily import TavilySearch
    from langgraph.prebuilt import create_react_agent

    if config.verbose:
        logger.info(f"Initializing LangGraph agent with model: {config.model_name}")

    # Initialize a tool to search the web
    search = TavilySearch(
        max_results=config.max_search_results,
        api_key=os.getenv("TAVILY_API_KEY")
    )

    # Initialize a LLM client with configurable parameters (using local NIM)
    llm = ChatNVIDIA(
        model=config.model_name,
        base_url=config.base_url,
        temperature=config.temperature,
        max_tokens=config.max_tokens,
        api_key=os.getenv("NVIDIA_API_KEY", "not-needed-for-local-nim")
    )

    # Create tools list
    tools = [search]

    # Create a LangGraph ReAct agent
    graph = create_react_agent(
        model=llm,
        tools=tools,
    )

    async def _response_fn(input_message: str) -> str:
        """Execute the LangGraph agent and return the response."""
        if config.verbose:
            logger.info(f"Processing input: {input_message}")
        
        # Use the async API so the agent call does not block the event loop
        response = await graph.ainvoke({"messages": [("user", input_message)]})
        final_message = response["messages"][-1]
        
        if config.verbose:
            logger.info(f"Generated response: {final_message.content}")
        
        return final_message.content

    yield FunctionInfo.from_fn(_response_fn, description="A configurable LangGraph agent capable of internet search")


Overwriting langgraph_agent_workflow/src/langgraph_agent_workflow/langgraph_agent_workflow.py


Now we can create a more detailed configuration file that takes advantage of these parameters:


In [48]:
%%writefile langgraph_agent_workflow/configs/config.yml
workflow:
  _type: langgraph_agent_workflow
  model_name: meta/llama-3.3-70b-instruct
  temperature: 0.2
  max_tokens: 2048  # must match the field name in LangGraphAgentWorkflowFunctionConfig
  max_search_results: 5
  verbose: true


Overwriting langgraph_agent_workflow/configs/config.yml


We need to reinstall the workflow for the changes to take effect:


In [49]:
!nat workflow reinstall langgraph_agent_workflow


Reinstalling workflow 'langgraph_agent_workflow'...


Workflow 'langgraph_agent_workflow' reinstalled successfully.

Let's test the updated configurable workflow:


In [50]:
!nat run --config_file langgraph_agent_workflow/configs/config.yml --input "Who won the last World Cup?"


2025-12-04 11:05:22 - INFO     - nat.cli.commands.start:192 - Starting NAT from config file: 'langgraph_agent_workflow/configs/config.yml'
2025-12-04 11:05:22 - INFO     - langgraph_agent_workflow.langgraph_agent_workflow:53 - Initializing LangGraph agent with model: meta/llama-3.3-70b-instruct

Configuration Summary:
--------------------
Workflow Type: langgraph_agent_workflow
Number of Functions: 0
Number of Function Groups: 0
Number of LLMs: 0
Number of Embedders: 0
Number of Memory: 0
Number of Object Stores: 0
Number of Retrievers: 0
Number of TTC Strategies: 0
Number of Authentication Providers: 0

2025-12-04 11:05:22 - INFO     - langgraph_agent_workflow.langgraph_agent_workflow:81 - Processing input: Who won the last World Cup?
2025-12-04 11:05:24 - INFO     - langgraph_agent_workflow.langgraph_agent_workflow:87 - Generated response: The current World Cup holder is the Argentina national team. They defeated the France national team in the 2022 World Cup final in Qatar with a sc

<a id="migration-part-3"></a>
## 2.3) Migration Part 3: Integration with NeMo Agent Toolkit

Now let's take it a step further and integrate the LangGraph agent with other NAT components. We can use NAT's built-in LLM management and make the agent use NAT-managed LLMs instead of directly instantiating them.

This provides several benefits:
- Better observability and tracing
- Consistent LLM usage across workflows
- Easy model switching through configuration
- Integration with NAT's profiling and optimization tools


In [51]:
%%writefile langgraph_agent_workflow/src/langgraph_agent_workflow/langgraph_agent_workflow.py
import logging

from pydantic import Field

from nat.builder.builder import Builder
from nat.builder.framework_enum import LLMFrameworkEnum
from nat.builder.function_info import FunctionInfo
from nat.cli.register_workflow import register_function
from nat.data_models.component_ref import LLMRef
from nat.data_models.function import FunctionBaseConfig

logger = logging.getLogger(__name__)


class LangGraphAgentWorkflowFunctionConfig(FunctionBaseConfig, name="langgraph_agent_workflow"):
    """Configuration for the LangGraph agent workflow integrated with NAT."""
    llm_name: LLMRef = Field(
        description="Reference to the NAT-managed LLM to use for the agent"
    )
    max_search_results: int = Field(
        default=2,
        description="Maximum number of search results to retrieve",
        gt=0,
        le=10
    )
    verbose: bool = Field(
        default=False,
        description="Enable verbose logging"
    )


@register_function(config_type=LangGraphAgentWorkflowFunctionConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN])
async def langgraph_agent_workflow_function(config: LangGraphAgentWorkflowFunctionConfig, builder: Builder):
    import os

    from langchain_tavily import TavilySearch
    from langgraph.prebuilt import create_react_agent

    if config.verbose:
        logger.info(f"Initializing LangGraph agent with NAT-managed LLM: {config.llm_name}")

    # Get the LLM from NAT's builder with LangChain wrapper
    llm = await builder.get_llm(config.llm_name, wrapper_type=LLMFrameworkEnum.LANGCHAIN)

    # Initialize a tool to search the web
    search = TavilySearch(
        max_results=config.max_search_results,
        api_key=os.getenv("TAVILY_API_KEY")
    )

    # Create tools list
    tools = [search]

    # Create a LangGraph ReAct agent using NAT-managed LLM
    graph = create_react_agent(
        model=llm,
        tools=tools,
    )

    async def _response_fn(input_message: str) -> str:
        """Execute the LangGraph agent and return the response."""
        if config.verbose:
            logger.info(f"Processing input: {input_message}")
        
        # Use the async API so the agent call does not block the event loop
        response = await graph.ainvoke({"messages": [("user", input_message)]})
        final_message = response["messages"][-1]
        
        if config.verbose:
            logger.info(f"Generated response: {final_message.content}")
        
        return final_message.content

    yield FunctionInfo.from_fn(
        _response_fn, 
        description="A NAT-integrated LangGraph agent capable of internet search using Tavily"
    )


Overwriting langgraph_agent_workflow/src/langgraph_agent_workflow/langgraph_agent_workflow.py


---
**⚙️ Local NIM Quick Check:**

Before running the workflow, ensure your local NIM is accessible:

```bash
# Quick health check
curl http://0.0.0.0:8000/v1/models

# Test inference
curl -X POST http://0.0.0.0:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "meta-llama/llama-3.1-8b-instruct", "messages": [{"role":"user","content":"Hi"}], "max_tokens": 10}'
```

---


Now update the configuration to use NAT's LLM management:


In [None]:
%%writefile langgraph_agent_workflow/configs/config.yml
llms:
  nim_llm:
    _type: nim
    model_name: meta-llama/llama-3.1-8b-instruct
    base_url: http://0.0.0.0:8000/v1
    temperature: 0.2
    max_tokens: 2048

workflow:
  _type: langgraph_agent_workflow
  llm_name: nim_llm
  max_search_results: 5
  verbose: true


Overwriting langgraph_agent_workflow/configs/config.yml


Reinstall and test:


In [53]:
!nat workflow reinstall langgraph_agent_workflow


Reinstalling workflow 'langgraph_agent_workflow'...
Workflow 'langgraph_agent_workflow' reinstalled successfully.

In [54]:
!nat run --config_file langgraph_agent_workflow/configs/config.yml --input "Who won the last World Cup?"


2025-12-04 11:05:30 - INFO     - nat.cli.commands.start:192 - Starting NAT from config file: 'langgraph_agent_workflow/configs/config.yml'
2025-12-04 11:05:30 - INFO     - langgraph_agent_workflow.langgraph_agent_workflow:40 - Initializing LangGraph agent with NAT-managed LLM: nim_llm

Configuration Summary:
--------------------
Workflow Type: langgraph_agent_workflow
Number of Functions: 0
Number of Function Groups: 0
Number of LLMs: 1
Number of Embedders: 0
Number of Memory: 0
Number of Object Stores: 0
Number of Retrievers: 0
Number of TTC Strategies: 0
Number of Authentication Providers: 0

2025-12-04 11:05:30 - INFO     - langgraph_agent_workflow.langgraph_agent_workflow:63 - Processing input: Who won the last World Cup?
2025-12-04 11:05:32 - INFO     - langgraph_agent_workflow.langgraph_agent_workflow:69 - Generated response: The winner of the last World Cup was Argentina. They defeated France in the 2022 World Cup final with a score of 3-3 (4-2 pens).
2025-12-04 11:05:32 - INFO 

<a id="migration-part-4"></a>
## 2.4) Migration Part 4: A Zero-Code Configuration

Now that we have a fully integrated LangGraph agent, we can leverage NAT's configuration system to easily switch between different LLMs, adjust parameters, or even compose multiple agents together, all through YAML configuration.

For example, you can easily test different models:


In [None]:
%%writefile langgraph_agent_workflow/configs/config_8b.yml
llms:
  nim_llm:
    _type: nim
    model_name: meta-llama/llama-3.1-8b-instruct
    base_url: http://0.0.0.0:8000/v1
    temperature: 0.2
    max_tokens: 2048

workflow:
  _type: langgraph_agent_workflow
  llm_name: nim_llm
  max_search_results: 5
  verbose: true

Overwriting langgraph_agent_workflow/configs/config_8b.yml


In [56]:
!nat run --config_file langgraph_agent_workflow/configs/config_8b.yml --input "Who won the last World Cup?"


2025-12-04 11:05:34 - INFO     - nat.cli.commands.start:192 - Starting NAT from config file: 'langgraph_agent_workflow/configs/config_8b.yml'
2025-12-04 11:05:34 - INFO     - langgraph_agent_workflow.langgraph_agent_workflow:40 - Initializing LangGraph agent with NAT-managed LLM: nim_llm

Configuration Summary:
--------------------
Workflow Type: langgraph_agent_workflow
Number of Functions: 0
Number of Function Groups: 0
Number of LLMs: 1
Number of Embedders: 0
Number of Memory: 0
Number of Object Stores: 0
Number of Retrievers: 0
Number of TTC Strategies: 0
Number of Authentication Providers: 0

2025-12-04 11:05:34 - INFO     - langgraph_agent_workflow.langgraph_agent_workflow:63 - Processing input: Who won the last World Cup?
2025-12-04 11:05:37 - INFO     - langgraph_agent_workflow.langgraph_agent_workflow:69 - Generated response: The current World Cup holder is the Argentina national team. They defeated the French national team in the 2022 World Cup final in Qatar with a score of 

You can also serve your LangGraph agent as an API endpoint:


## Troubleshooting: Improving Response Quality

### Problem: Vague or Incomplete Responses

If you see responses like *"The winner of the last World Cup was not specified in the search results"* when the information is available, here's why and how to fix it:

#### Common Causes:

1. **Too Few Search Results** (max_results=2)
   - Only 2 results may miss key information
   - **Solution**: Increase to 5-10 results

2. **Temperature Too Low** (temperature=0.0)
   - Makes the model overly conservative
   - **Solution**: Use 0.2-0.3 for better reasoning

3. **Token Limits** (max_tokens=1024)
   - May cut off the agent's reasoning
   - **Solution**: Increase to 2048+

4. **Search Tool Limitations**
   - Tavily may not always return the best results
   - **Solution**: Try different queries or add multiple search tools

#### Quick Fix Example:

```python
# Instead of:
search = TavilySearch(max_results=2)
llm = ChatNVIDIA(model="...", temperature=0.0, max_tokens=1024)

# Use:
search = TavilySearch(max_results=5)
llm = ChatNVIDIA(model="...", temperature=0.2, max_tokens=2048)
```

#### In YAML Config:

```yaml
workflow:
  _type: langgraph_agent_workflow
  model_name: meta/llama-3.3-70b-instruct
  temperature: 0.2        # Higher for better reasoning
  max_tokens: 2048        # More room for complete answers
  max_search_results: 5   # More context
  verbose: true           # See what's happening
```

#### Alternative: Add a Custom System Prompt

```python
from langgraph.prebuilt import create_react_agent

system_message = """You are a helpful assistant with access to search tools.
When you find information in search results, provide direct, complete answers.
Always cite the source of your information."""

graph = create_react_agent(
    model=llm,
    tools=tools,
    # Older LangGraph releases named this argument `messages_modifier` / `state_modifier`
    prompt=system_message,
)
```

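This matters more for LangGraph agents than for LangChain's `AgentExecutor`: `create_react_agent` adds no system prompt unless you pass one, whereas `AgentExecutor`-based agents are typically built from a prompt template that already carries instructions.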


In [57]:
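# Uncomment to serve the agent. `nat serve` is a long-running server, so the cell blocks until you interrupt it.
# If a local NIM is already listening on port 8000 (as in config_8b.yml), choose a different --port.
# !nat serve --config_file langgraph_agent_workflow/configs/config.yml --host 0.0.0.0 --port 8000
# Then visit http://localhost:8000/docs for the API documentation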


<a id="next-steps"></a>
# 3) Next Steps

Congratulations! You've successfully integrated a LangGraph agent with the NeMo Agent Toolkit. Here are some next steps to explore:

## Advanced LangGraph Features

1. **Add Custom Tools**: Extend your agent with custom tools beyond web search
   ```python
   from langchain.tools import tool
   
   @tool
   def custom_calculator(expression: str) -> str:
       """Evaluate a mathematical expression."""
       # Caution: eval() executes arbitrary code; never pass it untrusted input.
       return str(eval(expression))
   
   tools = [search, custom_calculator]
   ```

2. **Build Custom Graphs**: Create specialized workflows with custom state
   ```python
   from langgraph.graph import StateGraph, START, END
   from typing import TypedDict, Annotated
   from langgraph.graph.message import add_messages
   
   class AgentState(TypedDict):
       messages: Annotated[list, add_messages]
       context: str
   
   graph = StateGraph(AgentState)
   graph.add_node("planner", planner_node)
   graph.add_node("executor", executor_node)
   # Add edges and compile
   ```

3. **Multi-Agent Systems**: Compose multiple LangGraph agents together
4. **Human-in-the-Loop**: Add approval steps in your graph
5. **Conditional Routing**: Use conditional edges for complex logic
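
The calculator tool in item 1 relies on `eval()`, which will execute whatever the model passes in. A safer variant can be sketched with the standard-library `ast` module by walking the parsed expression and allowing only arithmetic. The `safe_eval` name is hypothetical, and exponentiation is deliberately left out to avoid very expensive inputs.

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str) -> float:
    """Evaluate a purely arithmetic expression without calling eval().

    Raises ValueError for anything other than numbers and + - * /.
    """

    def _eval(node: ast.AST) -> float:
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)


print(safe_eval("2 + 3 * 4"))
```

Inside the `@tool` function you would then return `str(safe_eval(expression))` instead of `str(eval(expression))`.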

## NAT Integration Features

6. **Observability**: Add tracing and monitoring
   ```yaml
   general:
     telemetry:
       tracing:
         phoenix:
           _type: phoenix
           endpoint: http://localhost:6006/v1/traces
           project: langgraph_agent
   
   workflow:
     _type: langgraph_agent_workflow
     llm_name: nim_llm
   ```

7. **Memory Integration**: Add persistent memory
   ```yaml
   memory:
     _type: redis
     host: localhost
     port: 6379
   
   workflow:
     _type: langgraph_agent_workflow
     llm_name: nim_llm
     memory_name: redis
   ```

8. **Evaluation**: Use NAT's evaluation tools
   ```bash
   nat eval --config_file config.yml --dataset eval_dataset.json
   ```

## Key Differences: LangGraph vs LangChain Agents

| Feature | LangChain Agent | LangGraph Agent |
|---------|----------------|-----------------|
| **Architecture** | AgentExecutor | StateGraph |
| **State Management** | Dict-based | Message-based with custom state |
| **Input Format** | `{"input": ..., "chat_history": []}` | `{"messages": [("user", ...)]}` |
| **Output Format** | `response["output"]` | `response["messages"][-1].content` |
| **Flexibility** | Limited | High (custom nodes/edges) |
| **Multi-Agent** | Difficult | Native support |
| **Human-in-Loop** | Manual implementation | Built-in support |
| **Conditional Logic** | Limited | Full control with conditional edges |
| **Production Use** | Good for simple cases | Better for complex workflows |
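
The input and output rows of the table are the part most likely to trip up a migration. As a minimal sketch, the two adapters below convert a plain question into LangGraph's message-based input and pull the final answer back out. The function names are hypothetical, and the `SimpleNamespace` objects only stand in for real message objects that expose a `.content` attribute.

```python
from types import SimpleNamespace
from typing import Any


def to_langgraph_input(question: str) -> dict[str, Any]:
    """Wrap a plain question in the message-based input LangGraph expects."""
    return {"messages": [("user", question)]}


def from_langgraph_output(response: dict[str, Any]) -> str:
    """Return the text of the final message in a LangGraph response."""
    return response["messages"][-1].content


# Stand-in for a real graph result: objects exposing a `.content` attribute.
fake_response = {
    "messages": [
        SimpleNamespace(content="Who won the last World Cup?"),
        SimpleNamespace(content="Argentina won the 2022 World Cup."),
    ]
}
print(to_langgraph_input("Who won the last World Cup?"))
print(from_langgraph_output(fake_response))
```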

## When to Use LangGraph

- **Complex agent workflows** with conditional logic
- **Multi-agent collaboration** scenarios
- When you need **fine-grained control** over execution
- **Human-in-the-loop** workflows
- Building **production-grade agentic applications**
- **State management** across multiple steps
- **Parallel execution** of independent tasks

## Additional Resources

- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [LangGraph Tutorials](https://langchain-ai.github.io/langgraph/tutorials/)
- [NeMo Agent Toolkit Documentation](https://docs.nvidia.com/nemo-agent-toolkit/)
- [NVIDIA NIM](https://docs.nvidia.com/nim/)
- [LangGraph Examples](https://github.com/langchain-ai/langgraph/tree/main/examples)

## Summary

This notebook demonstrated three progressive levels of LangGraph integration with NAT:

1. **V1 (Basic)**: Quick wrap of existing agent with minimal changes
2. **V2 (Configurable)**: Parameters exposed in YAML for easy experimentation
3. **V3 (Full Integration)**: NAT-managed components for production use

Choose the version that best fits your needs, and progressively enhance your integration as your requirements grow!


# 4) Adding Observability and Profiling

Now that you have a working LangGraph agent, let's add observability and profiling to understand how your agent is performing and where time is being spent.

## 4.1) Observability with Phoenix

Phoenix provides real-time tracing and visualization of your agent's execution, showing each step, token usage, and latency.

### Step 1: Install Phoenix


In [58]:
%%bash
# Install Phoenix server and NAT integration
uv pip show -q "arize-phoenix"
if [ $? -ne 0 ]; then
    echo "Installing Phoenix server..."
    uv pip install arize-phoenix
fi

uv pip show -q "nvidia-nat-phoenix"
if [ $? -ne 0 ]; then
    echo "Installing NAT Phoenix integration..."
    uv pip install "nvidia-nat[phoenix]"
fi

echo "✅ Phoenix installation complete!"
echo "Phoenix version: $(python -c 'import phoenix; print(phoenix.__version__)' 2>/dev/null || echo 'installed')"


✅ Phoenix installation complete!


Phoenix version: 12.19.0


### Step 2: Start Phoenix Server

Phoenix runs as a local server that collects and visualizes traces from your agent.

**Important**: Phoenix must be running before you run your agent. Start it either as a background process from the notebook or in a separate terminal; both options are covered below.


In [59]:
%env PHOENIX_HOST=0.0.0.0


env: PHOENIX_HOST=0.0.0.0


Next, start the Phoenix server in the background:


In [60]:
%%bash --bg
# Phoenix will run on port 6006
phoenix serve


### ⚠️ Important: Don't Run `!phoenix serve` in a Notebook Cell

**Why not:**
- It's a long-running server that will block the notebook cell
- You won't be able to run other cells
- The server stops when you interrupt the cell

**Instead:** Use a separate terminal (see the step-by-step instructions below) or use the background method in the next cell:


In [61]:
import subprocess
import time
import requests

PHOENIX_URL = "http://localhost:6006"

# Start Phoenix in the background
try:
    # Kill any existing Phoenix processes (pkill is available on Linux/macOS)
    subprocess.run(["pkill", "-f", "phoenix"], stderr=subprocess.DEVNULL)
    time.sleep(2)

    # Discard server output: an unread PIPE can fill up and stall the server
    phoenix_process = subprocess.Popen(
        ["phoenix", "serve"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

    # Poll until the UI responds, for up to about 10 seconds
    print("Starting Phoenix...")
    for i in range(10):
        try:
            response = requests.get(PHOENIX_URL, timeout=1)
            if response.status_code == 200:
                print("✅ Phoenix started successfully!")
                print(f"🌐 Access UI at: {PHOENIX_URL}")
                print(f"📊 Process ID: {phoenix_process.pid}")
                break
        except requests.RequestException:
            pass  # Server is not accepting connections yet
        time.sleep(1)
        print(f"   Waiting... ({i+1}/10)")
    else:
        print("⚠️  Phoenix may not have started. Check manually.")

except FileNotFoundError:
    print("❌ 'phoenix' command not found.")
    print("\nPlease use a separate terminal instead:")
    print("  python -m phoenix.server.main serve")


Starting Phoenix...
   Waiting... (1/10)
   Waiting... (2/10)
   Waiting... (3/10)
✅ Phoenix started successfully!
🌐 Access UI at: http://localhost:6006
📊 Process ID: 306342


**To stop Phoenix later:**
```python
import subprocess
subprocess.run(["pkill", "-f", "phoenix"])
print("Phoenix stopped")
```

**To check if Phoenix is running:**
```python
import requests
try:
    response = requests.get("http://localhost:6006", timeout=1)
    print(f"✅ Phoenix is running (status: {response.status_code})")
except requests.RequestException:
    print("❌ Phoenix is not running")
```


### ✅ Simplified Approach (Most Reliable)

**Step-by-step Phoenix setup:**

1. **Open a NEW terminal window**

2. **Navigate to the notebook directory** (adjust the path to match your checkout):
   ```bash
   cd /home/ubuntu/NeMo-Agent-Toolkit/examples/notebooks
   ```

3. **Start Phoenix:**
   ```bash
   phoenix serve
   ```
   
   If that doesn't work, try:
   ```bash
   python -m phoenix.server.main serve
   ```

4. **You should see:**
   ```
   Phoenix server running on http://127.0.0.1:6006
   ```

5. **Open in browser:** http://localhost:6006

6. **Keep that terminal open** - closing it stops Phoenix

**That's it!** Phoenix is now ready to collect traces. Proceed to the next cell to configure your agent.
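
Before starting Phoenix in a new terminal, it can help to check whether port 6006 is already taken, for example by a background process started from an earlier cell. The sketch below uses only the standard library; `is_port_open` is a hypothetical helper name.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open("localhost", 6006):
        print("Port 6006 is in use; Phoenix may already be running.")
    else:
        print("Port 6006 is free.")
```

A `True` result only says that something is listening on the port; opening http://localhost:6006 confirms that it is actually Phoenix.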


### Step 3: Update Config with Tracing

Now we'll update the workflow configuration to enable Phoenix tracing. We'll append the telemetry section to the existing config:


In [62]:
!cp langgraph_agent_workflow/configs/config.yml langgraph_agent_workflow/configs/config_with_tracing.yml


In [63]:
%%writefile -a langgraph_agent_workflow/configs/config_with_tracing.yml

general:
  telemetry:
    logging:
      console:
        _type: console
        level: WARN
    tracing:
      phoenix:
        _type: phoenix
        endpoint: http://localhost:6006/v1/traces
        project: langgraph_agent


Appending to langgraph_agent_workflow/configs/config_with_tracing.yml
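
Because `%%writefile -a` appends blindly, a config that already contains a `general:` section would end up with two top-level `general` keys, which YAML parsers tend either to reject or to resolve by keeping only one of them. As a rough safeguard, the sketch below scans the text for repeated top-level keys. It is a plain regular-expression check rather than a YAML parser, and `duplicate_top_level_keys` is a hypothetical helper.

```python
import re
from collections import Counter


def duplicate_top_level_keys(yaml_text: str) -> list[str]:
    """Return top-level mapping keys that appear more than once."""
    # A top-level key starts in column 0 and is followed by a colon.
    keys = re.findall(r"^([A-Za-z_][\w-]*):", yaml_text, flags=re.MULTILINE)
    return sorted(key for key, count in Counter(keys).items() if count > 1)


example = """\
general:
  use_uvloop: true
workflow:
  _type: langgraph_agent_workflow
general:
  telemetry: {}
"""
print(duplicate_top_level_keys(example))
```

To check the real file, pass in the contents of `config_with_tracing.yml`, for example via `pathlib.Path(...).read_text()`; an empty list means no top-level key is repeated.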


### Step 4: Run Agent with Tracing

Now run your agent and traces will be automatically sent to Phoenix:


In [64]:
!nat run --config_file langgraph_agent_workflow/configs/config_with_tracing.yml --input "Who won the last World Cup?"


2025-12-04 11:05:46 - INFO     - nat.cli.commands.start:192 - Starting NAT from config file: 'langgraph_agent_workflow/configs/config_with_tracing.yml'

Configuration Summary:
--------------------
Workflow Type: langgraph_agent_workflow
Number of Functions: 0
Number of Function Groups: 0
Number of LLMs: 1
Number of Embedders: 0
Number of Memory: 0
Number of Object Stores: 0
Number of Retrievers: 0
Number of TTC Strategies: 0
Number of Authentication Providers: 0

--------------------------------------------------
Workflow Result:
['The current World Cup holder is the Argentina national team. They defeated the France national team in the 2022 World Cup final in Qatar with a score of 3-3 (4-2 pens).']
--------------------------------------------------

### Step 5: View Traces in Phoenix UI

After running the agent, open the Phoenix UI in your browser to see:

1. **Visit**: http://localhost:6006
2. **View**:
   - Complete trace of agent execution
   - Each LLM call with prompts and responses
   - Tool calls and their results
   - Token usage per step
   - Latency breakdown
   - Full execution timeline

**What You'll See:**
- 🔍 **Spans**: Each operation (LLM call, tool call) as a span
- ⏱️ **Timing**: How long each step took
- 📊 **Token Usage**: Input/output tokens per LLM call
- 🔗 **Flow**: Visual graph of agent execution
- 📝 **Prompts & Responses**: Full text of all interactions

This is incredibly valuable for:
- Debugging agent behavior
- Optimizing performance
- Understanding token usage
- Identifying bottlenecks


### Step 2: Create Evaluation Dataset

Profiling in NAT works through the `nat eval` command. First, create a simple evaluation dataset:


In [65]:
%%writefile langgraph_agent_workflow/data/eval_data.json
[
    {
        "id": "1",
        "question": "Who won the last World Cup?",
        "answer": "Argentina won the 2022 FIFA World Cup. They defeated France in the final with a score of 3-3 (4-2 on penalties) in Qatar on December 18, 2022."
    },
    {
        "id": "2",
        "question": "What year did the last World Cup take place?",
        "answer": "The last FIFA World Cup took place in 2022. It was held in Qatar from November 21 to December 18, 2022."
    },
    {
        "id": "3",
        "question": "Which country hosted the 2022 World Cup?",
        "answer": "Qatar hosted the 2022 FIFA World Cup. It was the first World Cup held in the Middle East and the first held in November-December rather than the traditional June-July timeframe."
    }
]


Overwriting langgraph_agent_workflow/data/eval_data.json
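
A malformed dataset usually only surfaces partway through an evaluation run, so a quick structural check beforehand can save time. The sketch below assumes the `id`, `question`, and `answer` fields used in the file above; the helper names are hypothetical and this is not a NAT API.

```python
import json
from pathlib import Path

# Field names taken from the dataset written above.
REQUIRED_FIELDS = ("id", "question", "answer")


def validate_eval_entries(entries: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the data looks fine."""
    problems = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        for field in REQUIRED_FIELDS:
            if not str(entry.get(field, "")).strip():
                problems.append(f"entry {index}: missing or empty '{field}'")
        entry_id = entry.get("id")
        if entry_id is not None and entry_id in seen_ids:
            problems.append(f"entry {index}: duplicate id {entry_id!r}")
        seen_ids.add(entry_id)
    return problems


def validate_eval_file(path: str) -> list[str]:
    """Load a JSON dataset from disk and validate its entries."""
    return validate_eval_entries(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling `validate_eval_file("langgraph_agent_workflow/data/eval_data.json")` should return an empty list for the dataset above.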


### Step 3: Update Config with Profiler Settings

Add profiler configuration to enable detailed performance analysis. The complete config file is written in section 4.2 below.


## 4.2) Profiling with NAT's Built-in Profiler

NAT includes a built-in profiler, driven by `nat eval`, that records traces of LLM and tool calls, token usage, and runtime for each request, and can forecast workflow runtime and token usage.

### Enable Profiling


In [66]:
%%bash
# Install profiling dependencies
uv pip show -q "memray"
if [ $? -ne 0 ]; then
    uv pip install "nvidia-nat[profiling]"
else
    echo "Profiling tools are already installed"
fi


Using Python 3.12.12 environment at: /home/ubuntu/.venv
Audited 1 package in 9ms


In [None]:
%%writefile langgraph_agent_workflow/configs/config_with_profiling.yml
llms:
  nim_llm:
    _type: nim
    model_name: meta-llama/llama-3.1-8b-instruct
    base_url: http://0.0.0.0:8000/v1
    temperature: 0.2
    max_tokens: 2048

general:
  telemetry:
    logging:
      console:
        _type: console
        level: INFO
    tracing:
      phoenix:
        _type: phoenix
        endpoint: http://localhost:6006/v1/traces
        project: langgraph_agent

workflow:
  _type: langgraph_agent_workflow
  llm_name: nim_llm
  max_search_results: 5
  verbose: true

eval:
  general:
    output_dir: ./profile_output
    verbose: true
    dataset:
      _type: json
      file_path: ./langgraph_agent_workflow/data/eval_data.json
    
    profiler:
      token_uniqueness_forecast: true
      workflow_runtime_forecast: true
      compute_llm_metrics: true
      csv_exclude_io_text: true


Overwriting langgraph_agent_workflow/configs/config_with_profiling.yml


### Create Config with Profiling

The config file written above combines the LLM, tracing, workflow, and evaluation settings, with the profiler options under `eval.general.profiler`.


---

## 🔧 Troubleshooting: Python Environment Issues

### Problem: "No module named pip"

This error occurs when pip is not installed in your virtual environment.

**Quick Fix (Run in notebook):**

```python
import sys
!{sys.executable} -m ensurepip --default-pip
!{sys.executable} -m pip install --upgrade pip
```

**Permanent Fix (Run in terminal):**

```bash
# 1. Navigate to your project directory
cd /path/to/your/project

# 2. Deactivate current venv if active
deactivate

# 3. Remove old venv
rm -rf .venv

# 4. Create new venv with system packages
python3 -m venv .venv --system-site-packages

# 5. Activate it
source .venv/bin/activate  # macOS/Linux
# OR
.venv\Scripts\activate  # Windows

# 6. Ensure pip is installed
python -m ensurepip --upgrade
python -m pip install --upgrade pip

# 7. Install Jupyter in the venv
pip install jupyter ipykernel

# 8. Register the kernel
python -m ipykernel install --user --name=nat-env --display-name="Python (NAT)"

# 9. Restart Jupyter and select the "Python (NAT)" kernel
```

**Verify Installation:**

```python
import sys
import pip
print(f"Python: {sys.executable}")
print(f"pip version: {pip.__version__}")
```

### Problem: Wrong Python Environment

**Symptoms:**
- Packages installed but not found
- Import errors after installation

**Solution:**

1. **Check which Python is being used:**
   ```python
   import sys
   print(sys.executable)
   ```

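2. **In Jupyter, install into the kernel's environment with either:**
   ```python
   %pip install package_name
   # or, equivalently
   import sys
   !{sys.executable} -m pip install package_name
   ```
   Avoid a bare shell call, which may pick up a different Python's `pip`:
   ```python
   !pip install package_name
   ```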

### Problem: Virtual Environment Not Activated

**Check if in venv:**
```python
import os
print(os.getenv('VIRTUAL_ENV', 'Not in a virtual environment'))
```

**Activate venv in terminal:**
```bash
source /home/ubuntu/.venv/bin/activate  # Your path
```
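
The `VIRTUAL_ENV` variable is set by the activation script, so a kernel launched directly from the environment's interpreter may not have it even though it is running inside a venv. Comparing `sys.prefix` with `sys.base_prefix` checks the interpreter itself. A minimal sketch, with `in_virtualenv` as a hypothetical helper name:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"In virtual environment: {in_virtualenv()}")
```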

---


---

## 🔧 Troubleshooting Local NIM Issues

### Common Issues and Solutions

#### 1. Connection Refused / Cannot Connect

**Symptoms:**
```
ConnectionError: Cannot connect to local NIM at http://0.0.0.0:8000
```

**Solutions:**
- Verify NIM is running: `docker ps | grep nim` or check your NIM process
- Try using `localhost` instead of `0.0.0.0`:
  ```yaml
  base_url: http://localhost:8000/v1
  ```
- Check firewall settings
- Verify the port (8000) is correct for your NIM deployment

#### 2. Model Not Found

**Symptoms:**
```
Error: model 'meta-llama/llama-3.1-8b-instruct' not found
```

**Solutions:**
- List available models:
  ```bash
  curl http://0.0.0.0:8000/v1/models
  ```
- Update `model_name` in config to match an available model
- Ensure your NIM container has the model downloaded
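
The model list can also be checked programmatically. The sketch below assumes the endpoint returns an OpenAI-style body of the form `{"data": [{"id": ...}]}`; the sample payload is illustrative rather than captured from a real NIM, and the helper names are hypothetical.

```python
def available_model_ids(models_payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style `/v1/models` response body."""
    return [item["id"] for item in models_payload.get("data", []) if "id" in item]


def model_is_served(models_payload: dict, model_name: str) -> bool:
    """Return True if model_name is among the ids the endpoint reports."""
    return model_name in available_model_ids(models_payload)


# Illustrative payload, not captured from a real NIM.
sample_payload = {
    "object": "list",
    "data": [{"id": "meta/llama-3.1-8b-instruct", "object": "model"}],
}
print(available_model_ids(sample_payload))
print(model_is_served(sample_payload, "meta-llama/llama-3.1-8b-instruct"))
```

The check is an exact string match, so a small naming difference such as `meta/...` versus `meta-llama/...` is enough to produce a "model not found" error.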

#### 3. Slow Response Times

**Symptoms:**
- Requests taking longer than expected
- Timeouts

**Solutions:**
- Check GPU utilization: `nvidia-smi`
- Reduce `max_tokens` in config
- Ensure NIM has enough GPU memory
- Check if other processes are using the GPU

#### 4. Authentication Errors

**Symptoms:**
```
401 Unauthorized
```

**Solutions:**
- If your NIM requires auth, set proper API key:
  ```python
  import os

  os.environ["NVIDIA_API_KEY"] = "your-local-nim-key"
  ```
- Or configure it in the LLM config:
  ```yaml
  llms:
    nim_llm:
      api_key: your-local-key
  ```

#### 5. Testing NIM Directly (Bypass NAT)

If workflows aren't working, test the NIM directly:

```python
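import requests

response = requests.post(
    "http://0.0.0.0:8000/v1/chat/completions",
    json={
        "model": "meta-llama/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Test"}],
        "max_tokens": 50,
    },
    timeout=60,  # Fail instead of hanging if the NIM is unresponsive
)
response.raise_for_status()  # Surface HTTP errors such as 401 or 404
print(response.json())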
```

### Performance Tuning for Local NIM

**For Better Throughput:**
```yaml
llms:
  nim_llm:
    model_name: meta-llama/llama-3.1-8b-instruct
    base_url: http://0.0.0.0:8000/v1
    temperature: 0.7
    max_tokens: 1024  # Reduce for faster responses
    top_p: 0.9
```

**For Better Quality (Slower):**
```yaml
llms:
  nim_llm:
    model_name: meta-llama/llama-3.1-8b-instruct
    base_url: http://0.0.0.0:8000/v1
    temperature: 0.2  # More focused
    max_tokens: 4096  # Longer responses
    top_p: 0.95
```

### Monitoring Your Local NIM

**Check NIM Logs:**
```bash
# If running in Docker
docker logs <nim-container-id>

# Check GPU usage
watch -n 1 nvidia-smi
```

**Monitor Performance:**
- Use the profiling features in this notebook (section 4.2)
- Enable verbose logging in configs
- Use Phoenix tracing to see request/response times

---


### Run Evaluation with Profiling

Run the evaluation which will automatically generate profiling data:


In [68]:
!mkdir -p langgraph_agent_workflow/data
!nat eval --config_file langgraph_agent_workflow/configs/config_with_profiling.yml


2025-12-04 11:06:08 - INFO     - nat.eval.evaluate:446 - Starting evaluation run with config file: langgraph_agent_workflow/configs/config_with_profiling.yml
2025-12-04 11:06:08 - INFO     - phoenix.config:1750 - 📋 Ensuring phoenix working directory: /home/ubuntu/.phoenix
2025-12-04 11:06:08 - INFO     - phoenix.inferences.inferences:112 - Dataset: phoenix_inferences_3e86ea26-8d55-4f3e-8539-cc3842854e18 initialized
2025-12-04 11:06:10 - INFO     - langgraph_agent_workflow.langgraph_agent_workflow:40 - Initializing LangGraph agent with NAT-managed LLM: nim_llm
Running workflow:   0%|                                   | 0/3 [00:00<?, ?it/s]2025-12-04 11:06:10 - INFO     - nat.observability.exporter_manager:269 - Started exporter 'phoenix'
2025-12-04 11:06:10 - INFO     - nat.observability.exporter_manager:269 - Started exporter 'phoenix'
2025-12-04 11:06:10 - INFO     - nat.observability.exporter_manager:269 - Started exporter 'phoenix'
2025-12-04 11:06:10 - INFO     - langgraph_agent

### Step 4: View Profiling Results

After the evaluation completes, check the profiling output:


In [69]:
!echo "=== Profiling Output Files ==="
!ls -lh ./profile_output/ 2>/dev/null || echo "Run the evaluation cell above first"
!echo ""
!echo "Expected files:"
!echo "  - all_requests_profiler_traces.json  (Raw LLM traces)"
!echo "  - inference_optimization.json        (Performance metrics)"
!echo "  - standardized_data_all.csv          (Token usage data)"


=== Profiling Output Files ===
total 296K
-rw-rw-r-- 1 ubuntu ubuntu 212K Dec  4 11:07 all_requests_profiler_traces.json
-rw-rw-r-- 1 ubuntu ubuntu 1.7K Dec  4 11:07 inference_optimization.json
-rw-rw-r-- 1 ubuntu ubuntu 6.3K Dec  4 11:07 standardized_data_all.csv
-rw-rw-r-- 1 ubuntu ubuntu  70K Dec  4 11:07 workflow_output.json

Expected files:


  - all_requests_profiler_traces.json  (Raw LLM traces)
  - inference_optimization.json        (Performance metrics)
  - standardized_data_all.csv          (Token usage data)
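
The same check can be done in Python, which is handy when a later cell should stop early if the evaluation has not produced its files. A minimal sketch, using the three file names listed above; `missing_profile_outputs` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the expected-files list above.
EXPECTED_FILES = (
    "all_requests_profiler_traces.json",
    "inference_optimization.json",
    "standardized_data_all.csv",
)


def missing_profile_outputs(output_dir: str) -> list[str]:
    """Return the expected profiler files that are absent from output_dir."""
    directory = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_profile_outputs("./profile_output")
    print("All profiler outputs present" if not missing else f"Missing: {missing}")
```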


### Understanding the Results

After running evaluation with profiling, you'll find several files in `./profile_output/`:

#### Core Output Files

**1. `all_requests_profiler_traces.json`**
- Raw traces of all LLM interactions
- Tool input and output data
- Runtime measurements for each component
- Complete execution metadata

**2. `inference_optimization.json`**
- Workflow performance metrics with confidence intervals
- 90%, 95%, and 99% confidence intervals for latency
- Throughput statistics
- Workflow runtime predictions
- Token usage forecasts

**3. `standardized_data_all.csv`**
- Standardized usage data in CSV format
- Prompt tokens and completion tokens per request
- LLM input/output text
- Framework information (LangGraph)
- Timing and metadata for each evaluation question

#### Advanced Analysis Files (if enabled in config)

**4. Analysis Reports**
- **Bottleneck analysis**: Identifies slowest components in your workflow
- **Concurrency analysis**: Shows parallel execution opportunities
- **Token uniqueness forecast**: Predicts token efficiency for future queries
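
To build intuition for the confidence intervals reported in `inference_optimization.json`, the sketch below computes a normal-approximation interval for mean latency from a list of samples. This is a textbook calculation under stated assumptions, not necessarily the method NAT uses, and with only a handful of evaluation questions the approximation is rough.

```python
import math
from statistics import NormalDist, mean, stdev


def latency_confidence_interval(
    latencies: list[float], confidence: float = 0.95
) -> tuple[float, float]:
    """Normal-approximation confidence interval for the mean latency."""
    if len(latencies) < 2:
        raise ValueError("need at least two samples")
    center = mean(latencies)
    standard_error = stdev(latencies) / math.sqrt(len(latencies))
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error


samples = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up latencies in seconds
for level in (0.90, 0.95, 0.99):
    low, high = latency_confidence_interval(samples, level)
    print(f"{level:.0%}: {low:.2f}s to {high:.2f}s")
```

Higher confidence levels give wider intervals, so a 99% interval always contains the 90% interval computed from the same samples.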

### Key Metrics to Watch

| Metric | Description | Where to Find |
|--------|-------------|---------------|
| **Total Latency** | End-to-end response time | `inference_optimization.json` |
| **Token Usage** | Input/output tokens per request | `standardized_data_all.csv` |
| **LLM Time** | Time spent in LLM calls | `all_requests_profiler_traces.json` |
| **Tool Time** | Time spent in search/tools | Trace JSON, individual tool spans |
| **Cost Estimate** | Approximate API costs | Calculate from token counts |
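
The cost estimate in the last row is simple arithmetic once you have token counts. The sketch below shows the calculation; the per-1,000-token prices are placeholders to be replaced with your provider's actual rates, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_price_per_1k: float,
    completion_price_per_1k: float,
) -> float:
    """Approximate cost: tokens / 1000 * price, summed over input and output."""
    prompt_cost = (prompt_tokens / 1000) * prompt_price_per_1k
    completion_cost = (completion_tokens / 1000) * completion_price_per_1k
    return prompt_cost + completion_cost


# Placeholder prices, not real rates for any provider.
print(estimate_cost(2000, 500, prompt_price_per_1k=0.5, completion_price_per_1k=1.0))
```

For a self-hosted NIM there is no per-token bill, so the same token counts are better read as a proxy for GPU time.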

### Example: Viewing Results

```python
import json
import pandas as pd

# View optimization metrics. The exact keys depend on your NAT version,
# so list them before reading specific values.
with open('./profile_output/inference_optimization.json') as f:
    metrics = json.load(f)
print(f"Top-level metrics: {list(metrics)}")

# View detailed usage data; inspect the columns before aggregating.
df = pd.read_csv('./profile_output/standardized_data_all.csv')
print(f"Columns: {list(df.columns)}")

# Sum the token columns if present (column names may vary by version).
for col in ('prompt_tokens', 'completion_tokens'):
    if col in df.columns:
        print(f"{col}: {df[col].sum()}")
```

### Optimization Tips

1. **High LLM Time**: 
   - Use smaller models (8B instead of 70B)
   - Reduce max_tokens
   - Cache common queries

2. **High Tool Time**:
   - Reduce max_search_results
   - Use faster search APIs
   - Implement tool result caching

3. **High Memory**:
   - Reduce conversation history
   - Clear unused variables
   - Use streaming responses

4. **High Token Usage**:
   - Optimize prompts
   - Reduce search result content
   - Use more focused tool descriptions
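
Several of these tips come down to caching: repeated queries and repeated tool calls do not need to hit the LLM or the search API again. As one possible approach, the sketch below wraps a function in a small time-to-live cache. The `ttl_cache` decorator and `fake_search` function are illustrative stand-ins, not part of NAT or LangGraph.

```python
import time
from functools import wraps
from typing import Callable


def ttl_cache(ttl_seconds: float) -> Callable:
    """Cache results per positional-argument tuple for ttl_seconds."""

    def decorator(func: Callable) -> Callable:
        cache: dict[tuple, tuple[float, object]] = {}

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # Fresh enough: skip the real call
            result = func(*args)
            cache[args] = (now, result)
            return result

        return wrapper

    return decorator


calls = []


@ttl_cache(ttl_seconds=60)
def fake_search(query: str) -> str:
    """Stand-in for a slow search tool; records each real invocation."""
    calls.append(query)
    return f"results for {query}"


fake_search("world cup")
fake_search("world cup")  # Served from the cache
print(len(calls))  # prints 1
```

A short TTL suits web search, where results go stale; arguments must be hashable for this simple version to work.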


## 4.3) Combined: Observability + Profiling Best Practices

### Recommended Development Workflow

1. **Development** (Local):
   - Enable verbose logging
   - Use Phoenix for tracing
   - Profile periodically

2. **Testing** (Pre-Production):
   - Enable profiling for all test runs
   - Monitor memory usage
   - Track token consumption

3. **Production**:
   - Lightweight tracing (sample rate)
   - Continuous performance monitoring
   - Alert on anomalies

### Example Combined Configuration

```yaml
llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.3-70b-instruct
    temperature: 0.2
    max_tokens: 2048

general:
  telemetry:
    logging:
      console:
        _type: console
        level: INFO
    tracing:
      phoenix:
        _type: phoenix
        endpoint: http://localhost:6006/v1/traces
        project: langgraph_agent

workflow:
  _type: langgraph_agent_workflow
  llm_name: nim_llm
  max_search_results: 5
  verbose: true

eval:
  general:
    output_dir: ./profile_output
    dataset:
      _type: json
      file_path: ./langgraph_agent_workflow/data/eval_data.json
    profiler:
      token_uniqueness_forecast: true
      workflow_runtime_forecast: true
      compute_llm_metrics: true
      csv_exclude_io_text: true
```

### Quick Debugging Commands

```bash
# Open the Phoenix UI (macOS; on Linux use xdg-open or visit the URL directly)
open http://localhost:6006

# Check profiling results
ls -lh ./profile_output/

# Pretty-print the performance metrics
python -m json.tool ./profile_output/inference_optimization.json

# Peek at the token usage data
head -n 5 ./profile_output/standardized_data_all.csv
```

### Benefits of Observability + Profiling

✅ **Faster Debugging**: See exactly what the agent is doing  
✅ **Performance Optimization**: Identify bottlenecks quickly  
✅ **Cost Management**: Track token usage and API calls  
✅ **Quality Assurance**: Verify agent behavior  
✅ **Production Readiness**: Monitor health in real-time
