## Reference
- https://learn.deeplearning.ai/courses/building-code-agents-with-hugging-face-smolagents

## Overview: Monitoring and Evaluating your Agent
- This notebook shows how to record and monitor agent runs with Phoenix, then evaluate the agent's tool usage from the collected traces.

## Setup Tracing

In [1]:
PROJECT_NAME = "Customer-Success"

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
from phoenix.otel import register
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Setup tracing interface
tracer_provider = register(
    project_name=PROJECT_NAME,
    # Alternative: endpoint=get_phoenix_endpoint() + "v1/traces"
    endpoint=os.getenv('DLAI_LOCAL_URL').format(port='6006') + "v1/traces",
)
SmolagentsInstrumentor().instrument(tracer_provider=tracer_provider)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: Customer-Success
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://s172-29-123-128p6006.lab-aws-production.deeplearning.ai/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.
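
The endpoint passed to `register` is built by filling a `{port}` placeholder in `DLAI_LOCAL_URL` and appending `v1/traces`. A minimal sketch of that construction follows. The helper name `build_traces_endpoint` and the `localhost` fallback are assumptions for running outside the course environment, not part of the original notebook.

```python
import os

# Assumed fallback for a locally running Phoenix server; adjust as needed.
DEFAULT_TEMPLATE = "http://localhost:{port}/"


def build_traces_endpoint(template=None, port=6006):
    """Build the collector URL from a '{port}' template (hypothetical helper)."""
    if template is None:
        # os.getenv returns None when the variable is unset, so fall back explicitly.
        template = os.getenv("DLAI_LOCAL_URL") or DEFAULT_TEMPLATE
    base = template.format(port=port)
    # Keep exactly one slash between the base URL and the traces path.
    return base.rstrip("/") + "/v1/traces"


# Example with a made-up template, with or without a trailing slash.
example = build_traces_endpoint("https://example.invalid/p{port}/")
print(example)
```

Calling `os.getenv('DLAI_LOCAL_URL').format(...)` directly raises an `AttributeError` when the variable is unset, which is why the sketch checks for a missing value first.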



In [3]:
load_dotenv()  # load variables from the local .env file (imported in the previous cell)

from huggingface_hub import login

login(os.getenv('HF_API_KEY'))

In [4]:
from smolagents import HfApiModel

model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct", provider="together")

model([{"role": "user", "content": "Hello!"}])

ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='Hello! How can I assist you today?', tool_calls=[], raw=ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content='Hello! How can I assist you today?', tool_call_id=None, tool_calls=[]), logprobs=None, seed=11989984481295479000)], created=1760462508, id='oFB3X69-62bZhn-98e8c153ab8e28c0', model='Qwen/Qwen2.5-Coder-32B-Instruct', system_fingerprint=None, usage=ChatCompletionOutputUsage(completion_tokens=10, prompt_tokens=31, total_tokens=41, cached_tokens=0), object='chat.completion', prompt=[]))

In [5]:
# URL of the Phoenix UI, where the recorded traces are displayed:
print(os.environ.get('DLAI_LOCAL_URL').format(port='6006'))

https://s172-29-123-128p6006.lab-aws-production.deeplearning.ai/


## Trace an agent run

In [6]:
from smolagents import HfApiModel, CodeAgent

agent = CodeAgent(model=model, tools=[])

> Note: the following call sometimes times out because traces are sent to the tracing service over the network. If this happens, run the cell again.


In [7]:
agent.run("What is the 100th Fibonacci number?")

354224848179261915075
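
The "try it again" advice for timeouts can be automated with a small retry wrapper. This is only a sketch: `run_with_retry` is a hypothetical helper, and the exception types it retries on are an assumption, since the notebook does not say which exception the timeout raises.

```python
import time


def run_with_retry(fn, attempts=3, delay_s=1.0,
                   retry_on=(TimeoutError, ConnectionError)):
    """Call fn() and retry on the given exception types (hypothetical helper)."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_s)  # brief pause before the next attempt
    raise last_error


# Stand-in for agent.run(...): times out twice, then succeeds.
calls = {"n": 0}


def flaky_run():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("tracing endpoint timed out")
    return "done"


result = run_with_retry(flaky_run, attempts=3, delay_s=0.0)
print(result, calls["n"])
```

In the notebook, the wrapped call would be something like `run_with_retry(lambda: agent.run("What is the 100th Fibonacci number?"))`. Note that each retry runs the agent again and so produces another trace.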

In [8]:
# URL of the Phoenix UI, where the trace of the run above is displayed:
print(os.environ.get('DLAI_LOCAL_URL').format(port='6006'))

https://s172-29-123-128p6006.lab-aws-production.deeplearning.ai/


## Setup ice cream production system

In [9]:
from smolagents import tool
from typing import Dict

menu_prices = {"crepe nutella": 1.50, "vanilla ice cream": 2, "maple pancake": 1.}

ORDER_BOOK = {}

@tool
def place_order(quantities: Dict[str, int], session_id: int) -> None:
    """Places a pre-order of snacks.

    Args:
        quantities: a dictionary with names as keys and quantities as values
        session_id: the id for the client session
    """
    global ORDER_BOOK
    assert isinstance(quantities, dict), "Incorrect type for the input dictionary!"
    # all() is required: a non-empty list is always truthy, so a bare list would never fail.
    assert all(key in menu_prices for key in quantities), f"All food names should be within {menu_prices.keys()}"
    ORDER_BOOK[session_id] = quantities

@tool
def get_prices(quantities: Dict[str, int]) -> str:
    """Gets price for certain quantities of ice cream.

    Args:
        quantities: a dictionary with names as keys and quantities as values
    """
    assert isinstance(quantities, dict), "Incorrect type for the input dictionary!"
    # all() is required: a non-empty list is always truthy, so a bare list would never fail.
    assert all(key in menu_prices for key in quantities), f"All food names should be within {menu_prices.keys()}"
    total_price = sum(menu_prices[key] * value for key, value in quantities.items())
    return (
        f"Given the current menu prices:\n{menu_prices}\nThe total price for your order would be: ${total_price}"
    )
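
The menu check in both tools only rejects unknown items if the per-item membership results are combined with `all()`. A bare list of booleans is truthy whenever it is non-empty, so asserting on it accepts any order. A small sketch of the difference, using a hypothetical helper `unknown_items` that is not part of the notebook:

```python
menu = {"crepe nutella": 1.50, "vanilla ice cream": 2, "maple pancake": 1.0}
order = {"banana split": 1}  # not on the menu

per_key = [key in menu for key in order]  # one boolean per ordered item
list_is_truthy = bool(per_key)            # True, because the list is non-empty
all_known = all(per_key)                  # False, because one item is unknown


def unknown_items(quantities, menu_prices):
    """Return the ordered names that are missing from the menu (hypothetical helper)."""
    return sorted(key for key in quantities if key not in menu_prices)


print(list_is_truthy, all_known, unknown_items(order, menu))
```

Returning the offending names, as `unknown_items` does, also gives the agent a more actionable error message than listing the whole menu.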

In [10]:
order_agent = CodeAgent(
    tools=[place_order, get_prices],
    model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct", provider="together")
)

In [11]:
order_agent.run(
    "Could I come and collect one crepe nutella?",
    additional_args={"session_id": 192}
)

'Order placed for one crepe nutella.'

### Try multiple orders

In [12]:
client_requests = [
    ("Could I come and collect one crepe nutella?", "place_order"),
    ("What would be the price for 1 crêpe nutella + 2 pancakes?", "get_prices"),
    ("How did you start your ice-cream business?", None),
    ("What's the weather at the Louvre right now?", None),
    ("I'm not sure if I should order. I want a vanilla ice cream. but if it's more expensive than $1, I don't want it. If it's below, I'll order it, please.", "place_order")
]

In [13]:
for request in client_requests:
    order_agent.run(
        request[0],
        additional_args={"session_id": 0, "menu_prices": menu_prices}
    )

In [14]:
import phoenix as px

spans = px.Client().get_spans_dataframe(project_name=PROJECT_NAME)
spans.head(20)



context.span_id,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.llm.input_messages,attributes.llm.token_count.prompt,attributes.llm.token_count.total,attributes.input.mime_type,attributes.llm.output_messages,attributes.output.mime_type,attributes.llm.model_name,attributes.llm.token_count.completion,attributes.output.value,attributes.smolagents
95ad829e129e0829,FinalAnswerTool,TOOL,2418534023e07741,2025-10-18 00:26:39.803757+00:00,2025-10-18 00:26:39.803812+00:00,OK,,[],95ad829e129e0829,6568e03bb59b1fb6164c3dca58a5a568,...,,,,,,,,,,
2942aa75af120501,HfApiModel.__call__,LLM,2418534023e07741,2025-10-18 00:26:39.761749+00:00,2025-10-18 00:26:39.776183+00:00,OK,,[],2942aa75af120501,6568e03bb59b1fb6164c3dca58a5a568,...,"[{'message.role': 'system', 'message.content':...",2463.0,2512.0,application/json,"[{'message.role': 'assistant', 'message.conten...",application/json,Qwen/Qwen2.5-Coder-32B-Instruct,49.0,"{""role"": ""assistant"", ""content"": ""Thought: The...",
2418534023e07741,Step 2,CHAIN,a05f81fe82415e92,2025-10-18 00:26:39.761382+00:00,2025-10-18 00:26:39.842943+00:00,OK,,[],2418534023e07741,6568e03bb59b1fb6164c3dca58a5a568,...,,,,,,,,,Execution logs:\nLast output from code snippet...,
6b97beb345363fb1,HfApiModel.__call__,LLM,cbd0953e3ac959be,2025-10-18 00:26:39.678072+00:00,2025-10-18 00:26:39.691814+00:00,OK,,[],6b97beb345363fb1,6568e03bb59b1fb6164c3dca58a5a568,...,"[{'message.role': 'system', 'message.content':...",2260.0,2349.0,application/json,"[{'message.role': 'assistant', 'message.conten...",application/json,Qwen/Qwen2.5-Coder-32B-Instruct,89.0,"{""role"": ""assistant"", ""content"": ""Thought: I n...",
cbd0953e3ac959be,Step 1,CHAIN,a05f81fe82415e92,2025-10-18 00:26:39.677782+00:00,2025-10-18 00:26:39.727847+00:00,OK,,[],cbd0953e3ac959be,6568e03bb59b1fb6164c3dca58a5a568,...,,,,,,,,,Execution logs:\nThe price of vanilla ice crea...,
a05f81fe82415e92,CodeAgent.run,AGENT,,2025-10-18 00:26:39.671107+00:00,2025-10-18 00:26:39.927617+00:00,OK,,[],a05f81fe82415e92,6568e03bb59b1fb6164c3dca58a5a568,...,,4723.0,4861.0,,,,,138.0,I will not order the vanilla ice cream.,"{'additional_args': '{""session_id"": 0, ""menu_p..."
fa492173f735a58f,FinalAnswerTool,TOOL,e104d7786183e481,2025-10-18 00:26:39.562181+00:00,2025-10-18 00:26:39.562235+00:00,OK,,[],fa492173f735a58f,7c6fed9c920c246072e17a5de83cbe58,...,,,,,,,,,,
427768a1403952c9,HfApiModel.__call__,LLM,e104d7786183e481,2025-10-18 00:26:39.512031+00:00,2025-10-18 00:26:39.529900+00:00,OK,,[],427768a1403952c9,7c6fed9c920c246072e17a5de83cbe58,...,"[{'message.role': 'system', 'message.content':...",2771.0,2834.0,application/json,"[{'message.role': 'assistant', 'message.conten...",application/json,Qwen/Qwen2.5-Coder-32B-Instruct,63.0,"{""role"": ""assistant"", ""content"": ""Thought: The...",
e104d7786183e481,Step 3,CHAIN,5d8cb1481f3ec88d,2025-10-18 00:26:39.511644+00:00,2025-10-18 00:26:39.592119+00:00,OK,,[],e104d7786183e481,7c6fed9c920c246072e17a5de83cbe58,...,,,,,,,,,Execution logs:\nLast output from code snippet...,
c4732fbe59dfaf6e,HfApiModel.__call__,LLM,7463c8d94a7ca8ed,2025-10-18 00:26:39.419437+00:00,2025-10-18 00:26:39.433313+00:00,OK,,[],c4732fbe59dfaf6e,7c6fed9c920c246072e17a5de83cbe58,...,"[{'message.role': 'system', 'message.content':...",2442.0,2587.0,application/json,"[{'message.role': 'assistant', 'message.conten...",application/json,Qwen/Qwen2.5-Coder-32B-Instruct,145.0,"{""role"": ""assistant"", ""content"": ""Thought: Sin...",
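
The span table can be queried like any pandas DataFrame. As a minimal sketch, the snippet below rebuilds a few rows of one trace from the output above as toy data and sums the token counts of its LLM spans. The column names come from the output; `toy_spans` and `tokens_per_trace` are illustrative names, and with live data `spans` would take the place of `toy_spans`.

```python
import pandas as pd

TRACE_ID = "6568e03bb59b1fb6164c3dca58a5a568"

# Toy rows copied from the spans of one trace shown above.
toy_spans = pd.DataFrame({
    "name": ["HfApiModel.__call__", "HfApiModel.__call__", "Step 2", "CodeAgent.run"],
    "span_kind": ["LLM", "LLM", "CHAIN", "AGENT"],
    "context.trace_id": [TRACE_ID] * 4,
    "attributes.llm.token_count.total": [2512.0, 2349.0, None, 4861.0],
})

# Keep only the LLM calls, then total their tokens per trace.
llm_spans = toy_spans[toy_spans["span_kind"] == "LLM"]
tokens_per_trace = llm_spans.groupby("context.trace_id")[
    "attributes.llm.token_count.total"
].sum()

print(tokens_per_trace[TRACE_ID])
```

For this trace the two LLM calls add up to 4861 tokens, which matches the total reported on the `CodeAgent.run` span in the output above.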


### Add processing to extract desired information

In [15]:
import pandas as pd
import json

agents = spans[spans['span_kind'] == 'AGENT'].copy()
agents['task'] = agents['attributes.input.value'].apply(
    lambda x: json.loads(x).get('task') if isinstance(x, str) else None
)

tools = spans.loc[
    spans['span_kind'] == 'TOOL',
    ["attributes.tool.name", "attributes.input.value", "context.trace_id"]
].copy()

tools_per_task = agents[
    ["name", "start_time", "task", "context.trace_id"]
].merge(
    tools,
    on="context.trace_id",
    how="left",
)
tools_per_task.head()

| | name | start_time | task | context.trace_id | attributes.tool.name | attributes.input.value |
|---|---|---|---|---|---|---|
| 0 | CodeAgent.run | 2025-10-18 00:26:39.671107+00:00 | I'm not sure if I should order. I want a vanil... | 6568e03bb59b1fb6164c3dca58a5a568 | final_answer | {"args": ["I will not order the vanilla ice cr... |
| 1 | CodeAgent.run | 2025-10-18 00:26:39.319057+00:00 | What's the weather at the Louvre right now? | 7c6fed9c920c246072e17a5de83cbe58 | final_answer | {"args": ["Sunny"], "sanitize_inputs_outputs":... |
| 2 | CodeAgent.run | 2025-10-18 00:26:39.034116+00:00 | How did you start your ice-cream business? | a7995db6a56cde455cb7b2550abe9bfb | final_answer | {"args": ["I started my ice-cream business wit... |
| 3 | CodeAgent.run | 2025-10-18 00:26:39.034116+00:00 | How did you start your ice-cream business? | a7995db6a56cde455cb7b2550abe9bfb | get_prices | {"args": [], "sanitize_inputs_outputs": false,... |
| 4 | CodeAgent.run | 2025-10-18 00:26:38.751541+00:00 | What would be the price for 1 crêpe nutella + ... | 80e1794844b40a8cc621163d24c3297b | final_answer | {"args": [3.5], "sanitize_inputs_outputs": fal... |
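
The left merge on `context.trace_id` yields one row per (agent run, tool call) pair and keeps agent runs that have no matching tool span, with a missing tool name. A minimal sketch on toy data shows this behavior; the trace ids `t1` and `t2` and the task strings are made up for illustration.

```python
import pandas as pd

# Two agent runs; only the first one has tool spans.
agents_toy = pd.DataFrame({
    "context.trace_id": ["t1", "t2"],
    "task": ["order a crepe", "small talk"],
})
tools_toy = pd.DataFrame({
    "context.trace_id": ["t1", "t1"],
    "attributes.tool.name": ["place_order", "final_answer"],
})

# how="left" keeps every agent run, even without a matching tool span.
merged = agents_toy.merge(tools_toy, on="context.trace_id", how="left")

# Collapse to one set of tool names per task, ignoring missing values.
tools_by_task = merged.groupby("task")["attributes.tool.name"].apply(
    lambda names: set(names.dropna())
)

print(merged)
print(tools_by_task)
```

Dropping missing values before building the set avoids a stray `nan` entry for runs that called no tools at all.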


### Now, compare tool calls with expected tool calls

In [16]:
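from typing import Optional

def score_request(expected_tool: Optional[str], tool_calls: set) -> bool:
    """Return True if the set of tools the agent called matches the expectation."""
    if expected_tool is None:
        # No tool expected: the agent should only have called final_answer.
        return tool_calls == {"final_answer"}
    return expected_tool in tool_calls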

results = []
for request, expected_tool in client_requests:
    tool_calls = set(tools_per_task.loc[tools_per_task["task"] == request, "attributes.tool.name"].tolist())
    results.append(
        {
            "request": request,
            "tool_calls_performed": tool_calls,
            "is_correct": score_request(expected_tool, tool_calls)
        }
    )
pd.DataFrame(results)

| | request | tool_calls_performed | is_correct |
|---|---|---|---|
| 0 | Could I come and collect one crepe nutella? | {final_answer, place_order} | True |
| 1 | What would be the price for 1 crêpe nutella + ... | {get_prices, final_answer} | True |
| 2 | How did you start your ice-cream business? | {get_prices, final_answer} | False |
| 3 | What's the weather at the Louvre right now? | {final_answer} | True |
| 4 | I'm not sure if I should order. I want a vanil... | {final_answer} | False |
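
The per-request results can be reduced to a single score. As a sketch, the snippet below copies the `is_correct` column from the table above into a toy DataFrame and computes the fraction of correct requests. `results_toy`, `accuracy`, and `failed` are illustrative names, and with live data `pd.DataFrame(results)` would take the place of `results_toy`.

```python
import pandas as pd

# is_correct values copied from the results table above (rows 0 to 4).
results_toy = pd.DataFrame({"is_correct": [True, True, False, True, False]})

# The mean of a boolean column is the fraction of True values.
accuracy = results_toy["is_correct"].mean()

# Row indices of the requests that were scored as incorrect.
failed = results_toy.index[~results_toy["is_correct"]].tolist()

print(f"accuracy: {accuracy:.0%}, failed rows: {failed}")
```

Here three of the five requests are scored as correct. The failed rows are the ones worth opening in the Phoenix UI to inspect the full trace.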
