<p> <center> <a href="../start_here.ipynb.ipynb">Home Page</a> </center> </p>

<div>
    <span style="float: left; width: 33%; text-align: left;"><a href="03_low_level_mcp.ipynb">Previous Notebook</a></span>
    <span style="float: left; width: 34%; text-align: center;">
        <a href="01_inference_endpoint.ipynb">1</a>
        <a href="02_introduction_mcp.ipynb">2</a>
        <a href="03_low_level_mcp.ipynb">3</a>
        <a >4</a>
        <a href="05_challenge.ipynb">5</a>
    </span>
    <span style="float: left; width: 33%; text-align: right;"><a href="05_challenge.ipynb">Next Notebook</a></span>
</div>

## Learning objectives

By the end of this notebook, you will be able to:
- Define LangGraph State schemas and build chatbot workflows using StateGraph, nodes, and edges
- Connect NVIDIA NIM endpoints as the LLM backend using the `nvidia` model provider
- Stream graph responses using `graph.stream()` for real-time output
- Implement structured output with Pydantic models for parseable LLM responses

## Setup Environment 

In the first notebook, we learned how to set up our generated `NVIDIA API KEY`. As a requirement for this notebook, you must set up the key as enviroment variable `NVIDIA_API_KEY` to pull the NIMs docker images of your choice. If you haven't gotten your key, please visit the NVIDIA NIMs API [homepage](https://build.nvidia.com/explore/discover) and generate your API Key. Please run the cell below, input your `NVIDIA API KEY` in the display textbox, and press the enter key on your keyboard.

In [None]:
import os
import getpass

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvapi_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
    os.environ["NGC_API_KEY"] = nvapi_key

## Introduction to LangGraph

In [1]:
from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from pydantic import BaseModel
from typing import Literal

In [2]:
from langchain.chat_models import init_chat_model
from langchain_nvidia_ai_endpoints import ChatNVIDIA

  from .autonotebook import tqdm as notebook_tqdm


This creates a LangChain chat model connected to NVIDIA NIM. The `init_chat_model()` function handles all the configuration automatically—just specify the model ID and provider, and you're ready to start generating responses.

In [8]:
MODEL_ID = 'meta/llama-3.2-3b-instruct'
llm = init_chat_model(model=MODEL_ID, model_provider="nvidia",max_tokens=32)

In [None]:
# Comment this out if you are using the local endpoint with right local port
# LOCAL_CONTAINER_PORT = 11579
# llm = ChatNVIDIA(base_url="http://0.0.0.0:{}/v1".format(CONTAINER_PORT), model="meta/llama-3.2-3b-instruct")

In [None]:
# llm.get_available_models()

In this section, we'll construct a simple agentic workflow using LangGraph's StateGraph. Here's what we'll do:

1. Create a `StateGraph` with the state schema
2. Add nodes using `add_node(name, function)`
3. Add edges using `add_edge(source, target)`
4. Compile the graph before execution

In [4]:
class State(TypedDict):
    """
    Graph state schema.
    - messages: List of conversation messages with automatic append behavior
    """
    messages: Annotated[list, add_messages]

The State holds the conversation history using the `add_messages` reducer, which automatically appends new messages to the list.

Nodes are Python functions that receive state, perform actions, and return updated state.

In [5]:
def chatbot(state: State):
    """
    Chatbot node that invokes the LLM with conversation history.
    Returns updated state with the assistant's response.
    """
    return {"messages": [llm.invoke(state["messages"])]}

In [6]:
graph_builder = StateGraph(State)

# Add the chatbot node
graph_builder.add_node("chatbot", chatbot)

# Connect START -> chatbot (entry point)
graph_builder.add_edge(START, "chatbot")

# Compile the graph
graph = graph_builder.compile()

Use `graph.invoke()` to get synchronous complete responses

In [9]:
graph.invoke({"messages": [{"role": "user", "content": "What is chicago known for?"}]})

{'messages': [HumanMessage(content='What is chicago known for?', additional_kwargs={}, response_metadata={}, id='59951a82-5f27-41ee-827c-f3ae0f77f531'),
  AIMessage(content='Chicago is known for its rich history, vibrant culture, and numerous attractions. Here are some of the top things Chicago is known for:\n\n1. **Architecture**:', additional_kwargs={}, response_metadata={'role': 'assistant', 'content': 'Chicago is known for its rich history, vibrant culture, and numerous attractions. Here are some of the top things Chicago is known for:\n\n1. **Architecture**:', 'token_usage': {'prompt_tokens': 41, 'total_tokens': 73, 'completion_tokens': 32}, 'finish_reason': 'length', 'model_name': 'meta/llama-3.2-3b-instruct'}, id='lc_run--019beede-0638-73b0-8b1e-36e3ceef1eec-0', tool_calls=[], invalid_tool_calls=[], usage_metadata={'input_tokens': 41, 'output_tokens': 32, 'total_tokens': 73}, role='assistant')]}

Use `graph.stream()` to get synchronous token-by-token responses, improving user experience.

In [None]:
def stream_graph_updates(user_input: str):
    """Stream responses from the graph for real-time output."""
    for event in graph.stream({"messages": [{"role": "user", "content": user_input}]}):
        for value in event.values():
            print("Assistant:", value["messages"][-1].content)

In [None]:
stream_graph_updates("what is portland known for?")

## Structured Output with Pydantic

Applications often need LLM responses in parseable formats (e.g., JSON) for downstream processing. NVIDIA NIM supports structured generation using guided JSON schemas. We use Pydantic's `BaseModel` to define the expected output structure. The `Literal` type restricts the output to specific values.

In [10]:
from pydantic import BaseModel
from typing import Literal

class UserIntent(BaseModel):
    """The user's current intent in the conversation"""
    intent: Literal["naruto", "bleach"]

Reference: [NIM Structured Generation Docs](https://docs.nvidia.com/nim/large-language-models/latest/structured-generation.html)

In [11]:
llm_structured = init_chat_model(model=MODEL_ID, model_provider="nvidia").with_structured_output(UserIntent, strict=True)

In [None]:
# llm = ChatNVIDIA(base_url="http://0.0.0.0:{}/v1".format(CONTAINER_PORT), model=MODEL_ID).with_structured_output(UserIntent, strict=True)

Use `.with_structured_output()` to enforce the Pydantic schema on LLM responses.

In [12]:
# Test: Classify user intent based on anime question
res = llm_structured.invoke([
    {'role':'system','content':'You are an anime encyclopedia. Classify if the user is asking a question on naruto or bleach.'},
    {'role':'user','content':'who is sasuke?'}
])

In [13]:
print(f'intent: {res}')

intent: intent='naruto'


## Memory

AI applications need memory to share context across multiple interactions.

In LangGraph, you can add two types of memory:
1) Short term memory (thread-level persistence) - this enables agents to track multi-turn conversations.
2) Long term memory - use this to store user-specific or application-specific data across conversations.

We will only utilise short term memory in this tutorial & challenge.

In [None]:
from langgraph.checkpoint.memory import InMemorySaver  
from langgraph.graph import StateGraph
import json
from langchain_core.messages import convert_to_openai_messages

checkpointer = InMemorySaver()  

graph = graph_builder.compile(checkpointer=checkpointer)  

res = graph.invoke(
    {"messages": [{"role": "user", "content": "what is cuda?"}]},
    {"configurable": {"thread_id": "1"}},
    
)

In [None]:
json.dumps(convert_to_openai_messages(res['messages']))

Expected output:

```json
[{"role": "user", "content": "what is cuda?"}, {"role": "assistant", "content": "CUDA (Parallel Computation Engine) is a parallel computing platform and programming model developed by NVIDIA. It allows developers to harness the power of multiple graphics processing units ("}]
```

In [22]:
res = graph.invoke(
    {"messages": [{"role": "user", "content": "what was my previous question?"}]},
    {"configurable": {"thread_id": "1"}},  
)

In [None]:
json.dumps(convert_to_openai_messages(res['messages']))

Expected output:

```json
[{"role": "user", "content": "what is cuda?"}, {"role": "assistant", "content": "CUDA (Parallel Computation Engine) is a parallel computing platform and programming model developed by NVIDIA. It allows developers to harness the power of multiple graphics processing units ("}, {"role": "user", "content": "what was my previous question?"}, {"role": "assistant", "content": "Your previous question was: \\"what is cuda?\\""}]
```

By running `graph.invoke` with the same thread id, i.e. ` {"configurable": {"thread_id": "1"}}`, the graph keeps track of previous conversations and is able to utilise its history to continue the conversation

## Interrupts

Interrupts allow you to pause graph execution at specific points and wait for external input before continuing.  
This enables human-in-the-loop patterns where you need external input to proceed.  
When an interrupt is triggered, LangGraph saves the graph state using its persistence layer and waits indefinitely until you resume execution.

In [None]:
from langgraph.checkpoint.memory import InMemorySaver
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.types import Command, interrupt

class FormState(TypedDict):
    age: int | None

def dummy_start_node(state: FormState):
    print('start node')

def get_age_node(state: FormState):
    prompt = "What is your age?"

    while True:
        answer = interrupt(prompt)  # payload surfaces in result["__interrupt__"]

        if isinstance(answer, int) and answer > 0:
            return {"age": answer}

        prompt = f"'{answer}' is not a valid age. Please enter a positive number."

memory = InMemorySaver()

builder = StateGraph(FormState)
builder.add_node("dummy_start_node",dummy_start_node)
builder.add_node("collect_age", get_age_node)
builder.add_edge(START,"dummy_start_node")
builder.add_edge("dummy_start_node","collect_age")
builder.add_edge("collect_age", END)

graph = builder.compile(checkpointer=memory)

config = {"configurable": {"thread_id": "form-1"}}
first = graph.invoke({"age": None}, config=config)
print(first["__interrupt__"])  # -> [Interrupt(value='What is your age?', ...)]

# Provide invalid data; the node re-prompts
retry = graph.invoke(Command(resume="thirty"), config=config)
print(retry["__interrupt__"])  # -> [Interrupt(value="'thirty' is not a valid age...", ...)]

# Provide valid data; loop exits and state updates
final = graph.invoke(Command(resume=30), config=config)
print(final["age"])  # -> 30

Expected output:

```
start node.  
[Interrupt(value='What is your age?', id='93589a0b323f03eeaa19f89000f5216c')].  
[Interrupt(value="'thirty' is not a valid age. Please enter a positive number.", id='93589a0b323f03eeaa19f89000f5216c')]. 
30
```

The graph starts with `dummy_start_node` which prints 'start node'.

In `graph.invoke({"age": None}, config=config)`, `{"age": None}` is passed as the state in the graph. This returns an interrupt of 'What is your age?"

In `graph.invoke(Command(resume="thirty"), config=config)`, the state still retains the value of the initial invocation, i.e. `{"age": None}`; the value "thirty" in `Command(resume="thirty")` is returned to the variable 'answer' in `get_age_node`.

In `graph.invoke(Command(resume=30), config=config)`, the state does not change as above. The value '30' in `Command(resume=30)` is returned to the variable 'answer' in `get_age_node` and this is returned as the final value without an interrupt.

<b>It is important to note that when running `graph.invoke(Command(resume=30), config=config)`, the node that raised an interrupt is rerun entirely; thus everything in the function `get_age_node` gets rerun from `prompt = "What is your age?"` each time.</b>

## Agent Skills

Agent skills are sets of instructions, scripts, and resources that agents can discover and load dynamically to perform better at specific tasks.  
At its core, a skill is a folder containing a SKILL.md file. At a minimum, the skill.md file should contain 'name' and 'description' metadata fields. It can also contain instructions specifying the capabilities of the skill.

Skills use progressive disclosure to manage context efficiently.
* Discovery: At startup, agents load only the name and description of each available skill, just enough to know when it might be relevant.
* Activation: When a task matches a skill’s description, the agent reads the instructions of the skill.
* Execution: The agent follows the instructions found in the skill.

Compared to the MCP protocol where the entire input schema has to be part of the agent's context right from the beginning, only the name and description of each skill is fed into the agent's context at the start and instructions are only loaded on a as needed basis.

Skills can also optionally include scripts, references and assets. We'll skip these for the purpose of this tutorial & challenge.

### Skills Implementation

Create the `skills` folder. This will serve as the directory for all skills. We will only work with 1 skill for this tutorial - `sales-analytics`

In [26]:
!mkdir -p skills
!mkdir -p skills/sales-analytics

Following the [standard specification for skills](https://agentskills.io/specification), we

1. Populate the name and description in the frontmatter. 
2. Fill up the instructions after the frontmatter. [optional]

Following the [guidance](https://agentskills.io/specification#progressive-disclosure), the name and description fields makes up approximately 100 tokens while the instructions should be less than 5000 tokens.

Standard SKILL.md template
```
---
name: skill-name
description: A description of what this skill does and when to use it.
---
<instructions in markdown here>
```


In [27]:
%%writefile skills/sales-analytics/SKILL.md
---
name: sales-analytics
description: Database schema and business logic for sales data analysis including customers, orders, and revenue.
---
# Sales Analytics Schema

## Tables

### customers
- customer_id (PRIMARY KEY)
- name
- email
- signup_date
- status (active/inactive)
- customer_tier (bronze/silver/gold/platinum)

### orders
- order_id (PRIMARY KEY)
- customer_id (FOREIGN KEY -> customers)
- order_date
- status (pending/completed/cancelled/refunded)
- total_amount
- sales_region (north/south/east/west)

### order_items
- item_id (PRIMARY KEY)
- order_id (FOREIGN KEY -> orders)
- product_id
- quantity
- unit_price
- discount_percent

## Business Logic

**Active customers**: status = 'active' AND signup_date <= CURRENT_DATE - INTERVAL '90 days'

**Revenue calculation**: Only count orders with status = 'completed'. Use total_amount from orders table, which already accounts for discounts.

**Customer lifetime value (CLV)**: Sum of all completed order amounts for a customer.

**High-value orders**: Orders with total_amount > 1000

## Example Query

-- Get top 10 customers by revenue in the last quarter
SELECT
    c.customer_id,
    c.name,
    c.customer_tier,
    SUM(o.total_amount) as total_revenue
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
  AND o.order_date >= CURRENT_DATE - INTERVAL '3 months'
GROUP BY c.customer_id, c.name, c.customer_tier
ORDER BY total_revenue DESC
LIMIT 10

Writing skills/sales-analytics/SKILL.md


Import the necessary libraries.  `skills_ref` includes the helper functions that lists, validates and parses agent skills. This is based on the official [repo](https://github.com/agentskills/agentskills/tree/main) released by Anthropic.

In [None]:
from typing import NotRequired
from langchain.tools import tool
from langchain.agents import create_agent
from langchain.agents.middleware import ModelRequest, ModelResponse, AgentMiddleware, AgentState
from langchain.messages import SystemMessage
from typing import Callable
from pathlib import Path
from skills_ref.utils import list_skills
from skills_ref.parser import read_instruction
from skills_ref.models import SkillProperties

Define the state used by agents to store skills metadata 

In [52]:
class SkillsState(AgentState):
    """State for the skills middleware."""

    skills_metadata: NotRequired[list[SkillProperties]]
    """List of loaded skill metadata (name, description)."""

Create the skill loading tool that loads the instructions from a skill's SKILL.md file.

In [None]:
# Create skill loading tool
@tool
def load_skill(skills_dir: Path,skill_name: str) -> str:
    """Load the full instructions of a skill into the agent's context.

    Use this when you need detailed information about how to handle a specific
    type of request. This will provide you with comprehensive instructions,
    policies, and guidelines for the skill area.

    Args:
        skill_name: The name of the skill to load (e.g., "qna agent")
    """
    content = read_instruction(skills_dir / skill_name)
    if content:
        return f"Loaded skill: {skill_name}\n\ncontent"
    else:
        skills = list_skills(skills_dir)
        available = ", ".join(skills.name for s in skills)
        return f"Skill '{skill_name}' not found. Available skills: {available}"



Create custom middleware that injects skill descriptions into the system prompt.  
The full list of properties and functions that can be implemented in the Agent Middleware interface can be found [here]((https://github.com/langchain-ai/langchain/blob/c930062f69bbf72d0147db2e2db1940777966ffe/libs/langchain_v1/langchain/agents/middleware/types.py#L343-L756)).  
We are only interested in `state_schema`,`tools`, `before_agent` and `wrap_model_call` for the purpose of this tutorial & challenge.  
We specify `tools = [load_skill]` to allow the agent to utilize the `load_skill` tool to read the instructions of the skills.  
In `before_agent`, we retrieve the metadata(name, description) of the skills dynamically from the system directory and update the agent's `state_schema`. Refer to [utils.py](./skills_ref/utils.py) for more details.  
In `wrap_model_call`, we retrieve the skills metadata from the agent's state, build the skills addendum and append it to the system prompt.

In [None]:
# Create skill middleware
class SkillMiddleware(AgentMiddleware):
    """Middleware that injects skill descriptions into the system prompt."""

    state_schema = SkillsState

    # Register the load_skill tool as a class variable
    tools = [load_skill]

    def __init__(self,skills_dir):
        self.skills_dir = skills_dir
    
    def before_agent(self, state:SkillsState, runtime):
        skills = list_skills(self.skills_dir)
        return SkillsState(skills_metadata=skills)

    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        """Sync: Inject skill descriptions into system prompt."""

        skills = request.state.get("skills_metadata", [])
        skills_list = []
        for skill in skills:
            skills_list.append(
                f"- **{self.skills_dir / skill.name}**: {skill.description}"
            )
        self.skills_prompt = "\n".join(skills_list)

        # Build the skills addendum
        skills_addendum = (
            f"\n\n## Available Skills\n\n{self.skills_prompt}\n\n"
            "Use the load_skill tool when you need detailed information "
            "about handling a specific type of request."
        )

        # Append to system message content blocks
        new_content = list(request.system_message.content_blocks) + [
            {"type": "text", "text": skills_addendum}
        ]
        new_system_message = SystemMessage(content=new_content)
        modified_request = request.override(system_message=new_system_message)
        return handler(modified_request)

In [34]:
from langchain.chat_models import init_chat_model
# model_id = "deepseek-ai/deepseek-v3.2"
model_id = 'moonshotai/kimi-k2-thinking'
nvidia_model = init_chat_model(model=model_id,base_url="https://integrate.api.nvidia.com/v1",model_provider="nvidia")

In [35]:
from pydantic import BaseModel, Field
class SQLOutput(BaseModel):
    sql: str = Field(description="runnable SQL query")

In [36]:
def create_sql_agent(skills_dir,debug=False):
    # Create the agent with skill support
    agent = create_agent(
        nvidia_model,
        system_prompt=(
            "You are a SQL query assistant that generates runnable SQL query for a music database."
        ),
        middleware=[SkillMiddleware(skills_dir)],
        # checkpointer=InMemorySaver(),
        response_format=SQLOutput,
        debug=debug
    )
    return agent

In [40]:
skills_dir = Path.cwd().resolve() / 'skills'
agent = create_sql_agent(skills_dir,debug=True)

In [None]:
result = agent.invoke(  
    {
        "messages": [
            {
                "role": "user",
                "content": (
                    "Write a SQL query to find all customers "
                    "who made orders over $1000 in the last month"
                ),
            }
        ]
    }
)

# Print the conversation
for message in result["messages"]:
    if hasattr(message, 'pretty_print'):
        message.pretty_print()
    else:
        print(f"{message.type}: {message.content}")

Expected output:

```
================================ Human Message =================================

Write a SQL query to find all customers who made orders over $1000 in the last month
================================== Ai Message ==================================
Tool Calls:
  load_skill (functions.load_skill:0)
 Call ID: functions.load_skill:0
  Args:
    skills_dir: <root_dir>/agentic-ai-bootcamp/tutorial/jupyter_notebook/skills
    skill_name: sales-analytics
================================= Tool Message =================================
Name: load_skill

Loaded skill: sales-analytics

content
================================== Ai Message ==================================
Tool Calls:
  SQLOutput (functions.SQLOutput:1)
 Call ID: functions.SQLOutput:1
  Args:
    sql: SELECT DISTINCT c.customer_id, c.name, c.email, c.phone, o.order_date, o.total_amount, o.order_id
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
  AND o.order_date < DATE_TRUNC('month', CURRENT_DATE)
  AND o.total_amount > 1000
ORDER BY o.total_amount DESC, c.customer_id;
================================= Tool Message =================================
Name: SQLOutput

Returning structured response: sql="SELECT DISTINCT c.customer_id, c.name, c.email, c.phone, o.order_date, o.total_amount, o.order_id\nFROM customers c\nJOIN orders o ON c.customer_id = o.customer_id\nWHERE o.order_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')\n  AND o.order_date < DATE_TRUNC('month', CURRENT_DATE)\n  AND o.total_amount > 1000\nORDER BY o.total_amount DESC, c.customer_id;"
```

In [None]:
print(result['structured_response'].sql)

Expected output:

```
SELECT DISTINCT c.customer_id, c.name, c.email, c.phone, o.order_date, o.total_amount, o.order_id
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
  AND o.order_date < DATE_TRUNC('month', CURRENT_DATE)
  AND o.total_amount > 1000
ORDER BY o.total_amount DESC, c.customer_id;
```

## Links and Resources

- [LangGraph repo](https://github.com/langchain-ai/langgraph)
- [LangGraph short term memory](https://docs.langchain.com/oss/python/langgraph/add-memory#add-short-term-memory)
- [LangGraph Interrupts](https://docs.langchain.com/oss/python/langgraph/interrupts)
- [LangChain Agent Skills](https://docs.langchain.com/oss/python/langchain/multi-agent/skills-sql-assistant)
- [Agent skills open protocol](https://agentskills.io/home)
- [LangChain NVIDIA](https://github.com/langchain-ai/langchain-nvidia)

---

## Licensing

Copyright © 2025 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.

<p> <center> <a href="../start_here.ipynb.ipynb">Home Page</a> </center> </p>

<div>
    <span style="float: left; width: 33%; text-align: left;"><a href="03_low_level_mcp.ipynb">Previous Notebook</a></span>
    <span style="float: left; width: 34%; text-align: center;">
        <a href="01_inference_endpoint.ipynb">1</a>
        <a href="02_introduction_mcp.ipynb">2</a>
        <a href="03_low_level_mcp.ipynb">3</a>
        <a >4</a>
        <a href="05_agent_skills.ipynb">5</a>
        <a href="06_challenge.ipynb">6</a>
    </span>
    <span style="float: left; width: 33%; text-align: right;"><a href="05_challenge.ipynb">Next Notebook</a></span>
</div>