# NPS Agent with MLflow Tracing & Agent-as-a-Judge

Query the National Parks Service using LlamaStack + MCP, with MLflow tracing and automated evaluation.

**Prerequisites:**
- LlamaStack server on `localhost:8321`
- NPS MCP server on `localhost:3005`
- `OPENAI_API_KEY` in environment
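
The prerequisites above can be checked before running any cells. The following is a minimal sketch, not part of LlamaStack or MLflow: `port_open` and `check_prerequisites` are hypothetical helper names, and the check only confirms that something accepts TCP connections on each port, not that the service behind it is healthy.

```python
import os
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_prerequisites(env=None, probe=port_open) -> dict:
    """Report which of the three prerequisites appear to be satisfied.

    `env` and `probe` are injectable so the check can be exercised
    without real servers; by default they use os.environ and port_open.
    """
    env = os.environ if env is None else env
    return {
        "llama_stack": probe("localhost", 8321),
        "nps_mcp": probe("localhost", 3005),
        "openai_api_key": bool(env.get("OPENAI_API_KEY")),
    }
```

If the API key lives in the `.env` file, call `check_prerequisites()` after `load_dotenv(...)` has run; otherwise the key will be reported as missing even though the notebook would load it.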

In [1]:
import os
from dotenv import load_dotenv

# Load .env from parent directory (agents_tracing-eval_mlflow/.env)
env_path = os.path.join(os.path.dirname(os.getcwd()), ".env")
load_dotenv(env_path)

import mlflow
from mlflow.entities import SpanType, AssessmentSource, AssessmentSourceType
from mlflow.genai.judges import make_judge
from llama_stack_client import LlamaStackClient
from typing import Literal




## Configuration

In [2]:
# Configuration
LLAMA_STACK_URL = "http://localhost:8321/"
NPS_MCP_URL = "http://localhost:3005/sse/"
MODEL_ID = "openai/gpt-4o"
JUDGE_MODEL = "openai:/gpt-4o"

In [3]:

db_path = os.path.join(os.getcwd(), "mlflow.db")
mlflow.set_tracking_uri(f"sqlite:///{db_path}")
mlflow.set_experiment("nps-agent")
print(f"MLflow database: {db_path}")


MLflow database: /Users/nnarendr/Documents/Repos/agents/agents_tracing-eval_mlflow/nps_agent/mlflow.db


## Agent Function

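`query_nps` sends the prompt to LlamaStack with the NPS MCP server attached as a tool. The `@mlflow.trace` decorator captures the execution as a trace, and the inner `mlflow.start_span` block records the model call as a child span.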

In [4]:
@mlflow.trace(name="query_nps", span_type=SpanType.AGENT)
def query_nps(prompt: str, model: str = MODEL_ID) -> str:
    """Query the National Parks Service agent."""
    client = LlamaStackClient(base_url=LLAMA_STACK_URL)
    
    with mlflow.start_span(name="mcp_tool_call", span_type=SpanType.LLM) as span:
        span.set_inputs({"model": model, "prompt": prompt})
        response = client.responses.create(
            model=model,
            input=prompt,
            tools=[{"type": "mcp", "server_url": NPS_MCP_URL, "server_label": "NPS tools"}]
        )
        span.set_outputs({"response_id": response.id, "status": response.status})
    
    # Extract text response
    for output in response.output:
        if output.type in ("text", "message") and hasattr(output, 'content') and output.content:
            return output.content[0].text
    return ""
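
The extraction loop at the end of `query_nps` returns only the first content part of the first matching output item, and an empty string when nothing matches. The sketch below isolates that logic so it can be tried without a running server. `extract_text` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that assume response items expose `type`, `content`, and `text` attributes the way the cell above uses them.

```python
from types import SimpleNamespace


def extract_text(outputs) -> str:
    """Return the text of the first message-like output item, or ""."""
    for item in outputs:
        content = getattr(item, "content", None)
        if item.type in ("text", "message") and content:
            return content[0].text
    return ""


# Stand-in objects: one non-message item (illustrative type value)
# followed by one message item carrying a single text part.
fake_outputs = [
    SimpleNamespace(type="mcp_call"),
    SimpleNamespace(type="message", content=[SimpleNamespace(text="Hello")]),
]
print(extract_text(fake_outputs))
```

Because the function stops at the first match, any later message items or additional content parts are ignored; that matches the behavior of the cell above.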

## Agent-as-a-Judge

An Agent-as-a-Judge is a judge that evaluates the agent's trace after execution. Instead of looking only at inputs and outputs, it uses tools to inspect the full execution:
- What spans were created
- What tools were called
- How long each step took

The `{{ trace }}` template variable in the instructions tells MLflow to give the judge these inspection tools.
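
To make the three bullet points concrete, here is a rough sketch of the kind of information a trace exposes. It is not how the judge's tools are implemented. `summarize_spans` is a hypothetical helper, the timings are made up, and the stand-in objects assume each span has `name`, `span_type`, `start_time_ns`, and `end_time_ns` attributes, as MLflow span objects do.

```python
from types import SimpleNamespace


def summarize_spans(spans):
    """Return (name, span_type, duration_ms) for each span, in order."""
    rows = []
    for span in spans:
        duration_ms = (span.end_time_ns - span.start_time_ns) / 1e6
        rows.append((span.name, span.span_type, duration_ms))
    return rows


# Stand-ins shaped like the two spans this notebook creates.
# The timestamps are invented for illustration only.
fake_spans = [
    SimpleNamespace(name="query_nps", span_type="AGENT",
                    start_time_ns=0, end_time_ns=5_000_000_000),
    SimpleNamespace(name="mcp_tool_call", span_type="LLM",
                    start_time_ns=1_000_000, end_time_ns=4_900_000_000),
]
for name, kind, ms in summarize_spans(fake_spans):
    print(f"{name:<15} {kind:<6} {ms:.0f} ms")
```

With a real trace, the same function could be given `trace.data.spans`, provided every span has finished so that `end_time_ns` is set.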

In [5]:
# Agent-as-a-Judge scorer
nps_judge = make_judge(
    name="nps_agent_evaluator",
    instructions=(
        "Evaluate the NPS agent's performance in {{ trace }}.\n\n"
        "Check for:\n"
        "1. Response Quality: Did the agent correctly identify parks and provide accurate information?\n"
        "2. Tool Usage: Were the correct NPS MCP tools used (search_parks, get_park_events, etc.)?\n"
        "3. Completeness: Did the agent answer all parts of the user's question?\n\n"
        "Rate as: 'good', 'acceptable', or 'poor'"
    ),
    feedback_value_type=Literal["good", "acceptable", "poor"],
    model=JUDGE_MODEL,
)

In [6]:
def evaluate_trace(trace):
    """Run Agent-as-a-Judge evaluation and log to MLflow."""
    feedback = nps_judge(trace=trace)
    
    trace_id = trace.info.trace_id
    mlflow.log_feedback(
        trace_id=trace_id,
        name="nps_agent_evaluation",
        value=feedback.value,
        rationale=feedback.rationale,
        source=AssessmentSource(
            source_type=AssessmentSourceType.LLM_JUDGE,
            source_id=f"agent-as-a-judge/{JUDGE_MODEL}",
        ),
    )
    
    print(f"\nEvaluation: {feedback.value}")
    print(f"Rationale: {feedback.rationale}")
    return feedback


## Run Agent & Evaluate

1. Send a query to the NPS agent
2. Get the MLflow trace from the execution
3. Pass the trace to the judge for evaluation
4. Log the feedback to MLflow (visible in the Assessments panel)

In [8]:
prompt = "Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them."

result = query_nps(prompt)
print(f"Response:\n{result}")

# Evaluate the trace
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)
evaluate_trace(trace)



Response:
Here is some information about national parks in Rhode Island and their upcoming events:

### Blackstone River Valley National Historical Park
This park explores America's entry into the Industrial Age, powered by the Blackstone River. The historical significance of Samuel Slater's cotton spinning mill in Pawtucket, RI is a highlight.

- **Website:** [Visit the park's website](https://www.nps.gov/blrv/index.htm)
- **Location:** Primarily in Rhode Island and Massachusetts

**Upcoming Events:**
- **Revolutionary War Pension Files Transcription Event:** Engage in transcription events across different dates and locations to help transcribe records for the 250th anniversary of American independence. Events are scheduled for February 10, February 20, and March 11, 2026.
- **Old Slater Mill Tour:** Join guided tours to explore the mill that started the American Industrial Revolution.
- **Take Me Fishing:** Family-friendly fishing events at the Blackstone River State Park.
- **Nature



Evaluation: acceptable
Rationale: The agent's response provides information on several national parks located in Rhode Island, including Blackstone River Valley National Historical Park, Roger Williams National Memorial, Touro Synagogue National Historic Site, and Washington-Rochambeau Revolutionary Route National Historic Trail. The response includes details about each park, such as their historical significance and location, as well as information on upcoming events for most of these parks, except for Touro Synagogue and Washington-Rochambeau, for which no events were found.

1. **Response Quality:** The agent correctly identified and provided relevant details about the parks in Rhode Island, accurately describing their characteristics and historical context. However, the response could have been strengthened by covering more parks or regional options thoroughly or confirming and clarifying the presence of events.

2. **Tool Usage:** The trace shows that the agent used the appropria

Feedback(name='nps_agent_evaluator', source=AssessmentSource(source_type='LLM_JUDGE', source_id='openai:/gpt-4o'), trace_id='tr-f2b0be1c48eb1500efd7d935ba13a61e', run_id=None, rationale='The agent\'s response provides information on several national parks located in Rhode Island, including Blackstone River Valley National Historical Park, Roger Williams National Memorial, Touro Synagogue National Historic Site, and Washington-Rochambeau Revolutionary Route National Historic Trail. The response includes details about each park, such as their historical significance and location, as well as information on upcoming events for most of these parks, except for Touro Synagogue and Washington-Rochambeau, for which no events were found.\n\n1. **Response Quality:** The agent correctly identified and provided relevant details about the parks in Rhode Island, accurately describing their characteristics and historical context. However, the response could have been strengthened by covering more park

## View Traces in MLflow UI

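Start the MLflow UI from the directory that contains `mlflow.db` to view traces and assessments. If the `nps-agent` experiment does not appear, point the UI at the database explicitly by adding `--backend-store-uri sqlite:///mlflow.db` to the command: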

```bash
mlflow ui --port 5001
```

Then open http://localhost:5001 in your browser.

### How to Navigate

1. **Select the Experiment** - Click on `nps-agent` in the left sidebar
2. **Go to Traces tab** - Click the "Traces" tab to see all agent executions
3. **View Trace Details** - Click on any Trace ID to open the trace detail view
   - You'll see the span hierarchy showing the agent execution (query_nps → mcp_tool_call)
   - Click on individual spans to see inputs/outputs for each step
4. **View Assessments** - In the trace detail view, look for the assessments side-panel on the right
   - This shows the Agent-as-a-Judge evaluation results.