
# Developing Agents in Databricks

This notebook will give you a quick tour of some of the agent development features you can use in your Databricks workspace. These tools will enable you to quickly prototype, evaluate, and iterate on agents and agentic workflows in Databricks.

This notebook will cover:
- Model Serving/Foundation Models in Databricks
- MLflow Tracing
- Agent Evaluations

This guide is by no means exhaustive; for more, see the [Databricks docs](https://docs.databricks.com/aws/en/generative-ai/agent-framework/build-genai-apps).

In [0]:
%pip install -U -q openai mlflow databricks-agents
%restart_python

## Problem Setup

These examples will focus on prototyping a chat application for a delivery-focused restaurant. The application will enable users to ask questions about their outstanding orders and issue simulated API calls to update their orders or escalate their issues to a human.

### Raw Data

We will be using a small sample of **simulated delivery event data**. Each record captures an event in an order's lifecycle, from creation to delivery. Key columns include:

- **event_id:** Unique event identifier.
- **event_type:** Type of event (e.g., order_created, driver_ping, delivered).
- **ts:** Event timestamp.
- **order_id:** Unique order identifier.
- **sequence:** Position of the event within its order's lifecycle, starting at 0 for order_created.
- **body:** JSON string with event-specific details, for example:
  - customer_lat, customer_lon, customer_addr, items (from order_created)
  - delivered_lat, delivered_lon (from delivered)
  - progress_pct, loc_lat, loc_lon (from driver_ping)
  - eta_mins (from driver_picked_up)
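Because `body` is stored as a JSON string, its fields only become usable once it is parsed. The sketch below flattens one hypothetical `driver_ping` event (shortened from the sample data) by merging the parsed body into the top-level record. It assumes body keys never collide with top-level column names, which holds for this dataset.

```python
import json

# A single hypothetical driver_ping event, shortened from the sample data below
event = {
    "event_id": "e1",
    "event_type": "driver_ping",
    "order_id": "o1",
    "sequence": 5,
    "body": '{"progress_pct": 50.0, "loc_lat": 41.9030141, "loc_lon": -87.6578659}',
}

def flatten_event(event):
    """Merge the parsed JSON body into the top-level record, dropping the raw string."""
    flat = {k: v for k, v in event.items() if k != "body"}
    flat.update(json.loads(event["body"]))
    return flat

flat = flatten_event(event)
print(flat["progress_pct"])  # 50.0
```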

In [0]:
raw_data = [
    # Completed Order - Really Late (Order ID: 15913d059bcd458aad7e567c3ca533fd)
    {"event_id": "0fb437bb36d44b6489528d426d09fdb3", "event_type": "order_created", "ts": "2025-05-15T23:59:15.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "15913d059bcd458aad7e567c3ca533fd", "sequence": 0, "body": '{"customer_lat": 41.8997961, "customer_lon": -87.6625103, "customer_addr": "7561 Main St", "items": [{"id": 101, "name": "Tuna Sashimi", "price": 7.5, "qty": 2}]}', "day": "2025-05-15"},
    {"event_id": "f8e9aa45e1264c6a80afed082b21291d", "event_type": "gk_started", "ts": "2025-05-15T23:59:38.420Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "15913d059bcd458aad7e567c3ca533fd", "sequence": 1, "body": "{}", "day": "2025-05-15"},
    {"event_id": "7dff70ccf63e4c9d95dd7c48588f3f71", "event_type": "gk_finished", "ts": "2025-05-16T00:13:49.207Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "15913d059bcd458aad7e567c3ca533fd", "sequence": 2, "body": "{}", "day": "2025-05-16"},
    {"event_id": "bf48e524d12a46e499d48a9d26a5266c", "event_type": "gk_ready", "ts": "2025-05-16T00:16:15.067Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "15913d059bcd458aad7e567c3ca533fd", "sequence": 3, "body": "{}", "day": "2025-05-16"},
    {"event_id": "426a44af92be4524a2a3012c4cbfefd2", "event_type": "driver_picked_up", "ts": "2025-05-16T00:27:42.399Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "15913d059bcd458aad7e567c3ca533fd", "sequence": 4, "body": '{"route_points": [[41.9045205, -87.6533404], [41.8997961, -87.6625103]], "eta_mins": 2.6}', "day": "2025-05-16"},
    {"event_id": "3c3ee480f01e47f3be58c99aa9692d9c", "event_type": "driver_ping", "ts": "2025-05-16T00:28:42.399Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "15913d059bcd458aad7e567c3ca533fd", "sequence": 5, "body": '{"progress_pct": 50.0, "loc_lat": 41.9030141, "loc_lon": -87.6578659}', "day": "2025-05-16"},
    {"event_id": "9eb460e121da4129b193194dd7b52994", "event_type": "delivered", "ts": "2025-05-16T00:30:21.142Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "15913d059bcd458aad7e567c3ca533fd", "sequence": 6, "body": '{"delivered_lat": 41.8997961, "delivered_lon": -87.6625103}', "day": "2025-05-16"},

    # Completed Order - On Time (Order ID: 07dac347d4e54841b24326bac267c20d)
    {"event_id": "7d274c0bc7c14a0eb17c3b334a9f0ba3", "event_type": "order_created", "ts": "2025-05-15T23:49:31.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "07dac347d4e54841b24326bac267c20d", "sequence": 0, "body": '{"customer_lat": 41.8975713, "customer_lon": -87.6868809, "customer_addr": "2038 Main St", "items": [{"id": 21, "name": "Keto Bowl", "price": 9.16, "qty": 3}]}', "day": "2025-05-15"},
    {"event_id": "8fff3c7a6c7f48d9a80a684ff3969fe9", "event_type": "gk_started", "ts": "2025-05-15T23:52:03.827Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "07dac347d4e54841b24326bac267c20d", "sequence": 1, "body": "{}", "day": "2025-05-15"},
    {"event_id": "95b11ebf00894dbab4ec8203a6c15648", "event_type": "gk_finished", "ts": "2025-05-16T00:01:58.088Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "07dac347d4e54841b24326bac267c20d", "sequence": 2, "body": "{}", "day": "2025-05-16"},
    {"event_id": "8cf1c8abfb0346ef8a50ac35fc9536f0", "event_type": "gk_ready", "ts": "2025-05-16T00:03:57.197Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "07dac347d4e54841b24326bac267c20d", "sequence": 3, "body": "{}", "day": "2025-05-16"},
    {"event_id": "647f74d8eac84f9cafd428db6c105761", "event_type": "driver_picked_up", "ts": "2025-05-16T00:11:04.227Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "07dac347d4e54841b24326bac267c20d", "sequence": 4, "body": '{"route_points": [[41.9045205, -87.6533404], [41.8975713, -87.6868809]], "eta_mins": 5.4}', "day": "2025-05-16"},
    {"event_id": "afa201852f7f42daa0c4dd4107d0751b", "event_type": "driver_ping", "ts": "2025-05-16T00:12:04.227Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "07dac347d4e54841b24326bac267c20d", "sequence": 5, "body": '{"progress_pct": 20.0, "loc_lat": 41.9034802, "loc_lon": -87.6588746}', "day": "2025-05-16"},
    {"event_id": "b4c2dd62af884839bb36f0c42b9e0abd", "event_type": "driver_ping", "ts": "2025-05-16T00:13:04.227Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "07dac347d4e54841b24326bac267c20d", "sequence": 6, "body": '{"progress_pct": 40.0, "loc_lat": 41.9033638, "loc_lon": -87.6664876}', "day": "2025-05-16"},
    {"event_id": "8315c58a333c44e8a45aa9ca27b6f277", "event_type": "driver_ping", "ts": "2025-05-16T00:14:04.227Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "07dac347d4e54841b24326bac267c20d", "sequence": 7, "body": '{"progress_pct": 60.0, "loc_lat": 41.9032523, "loc_lon": -87.6732925}', "day": "2025-05-16"},
    {"event_id": "a4a5ad8daf984257bfe2a77c849fa5ad", "event_type": "driver_ping", "ts": "2025-05-16T00:15:04.227Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "07dac347d4e54841b24326bac267c20d", "sequence": 8, "body": '{"progress_pct": 80.0, "loc_lat": 41.9030657, "loc_lon": -87.6842046}', "day": "2025-05-16"},
    {"event_id": "7ce7bace386c42a6abb9770f1df1a1b3", "event_type": "delivered", "ts": "2025-05-16T00:16:25.371Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "07dac347d4e54841b24326bac267c20d", "sequence": 9, "body": '{"delivered_lat": 41.8975713, "delivered_lon": -87.6868809}', "day": "2025-05-16"},

    # In Progress - gk_started (Order ID: 7ff80e031bb645a9b8942cd21ff4f946)
    {"event_id": "order_created_7ff80e031bb6", "event_type": "order_created", "ts": "2025-05-21T09:00:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "7ff80e031bb645a9b8942cd21ff4f946", "sequence": 0, "body": '{"customer_lat": 41.9114875, "customer_lon": -87.6495344, "customer_addr": "123 Main St", "items": [{"id": 1, "name": "Burger", "price": 12.00, "qty": 1}]}', "day": "2025-05-21"},
    {"event_id": "gk_started_7ff80e031bb6", "event_type": "gk_started", "ts": "2025-05-21T09:05:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "7ff80e031bb645a9b8942cd21ff4f946", "sequence": 1, "body": "{}", "day": "2025-05-21"},

    # In Progress - Out for Delivery (On Time) (Order ID: dec7915028bf4a5d96f6f31b7ee60b17)
    {"event_id": "order_created_dec7915028bf", "event_type": "order_created", "ts": "2025-05-21T09:10:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "dec7915028bf4a5d96f6f31b7ee60b17", "sequence": 0, "body": '{"customer_lat": 41.8867966, "customer_lon": -87.6456691, "customer_addr": "456 Oak Ave", "items": [{"id": 2, "name": "Pizza", "price": 18.50, "qty": 1}]}', "day": "2025-05-21"},
    {"event_id": "gk_started_dec7915028bf", "event_type": "gk_started", "ts": "2025-05-21T09:15:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "dec7915028bf4a5d96f6f31b7ee60b17", "sequence": 1, "body": "{}", "day": "2025-05-21"},
    {"event_id": "gk_finished_dec7915028bf", "event_type": "gk_finished", "ts": "2025-05-21T09:20:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "dec7915028bf4a5d96f6f31b7ee60b17", "sequence": 2, "body": "{}", "day": "2025-05-21"},
    {"event_id": "gk_ready_dec7915028bf", "event_type": "gk_ready", "ts": "2025-05-21T09:22:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "dec7915028bf4a5d96f6f31b7ee60b17", "sequence": 3, "body": "{}", "day": "2025-05-21"},
    {"event_id": "driver_picked_up_dec7915028bf", "event_type": "driver_picked_up", "ts": "2025-05-21T09:25:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "dec7915028bf4a5d96f6f31b7ee60b17", "sequence": 4, "body": '{"route_points": [[41.9045205, -87.6533404], [41.8867966, -87.6456691]], "eta_mins": 5.0}', "day": "2025-05-21"},
    {"event_id": "driver_ping_dec7915028bf_1", "event_type": "driver_ping", "ts": "2025-05-21T09:26:30.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "dec7915028bf4a5d96f6f31b7ee60b17", "sequence": 5, "body": '{"progress_pct": 30.0, "loc_lat": 41.8980000, "loc_lon": -87.6470000}', "day": "2025-05-21"},
    {"event_id": "driver_ping_dec7915028bf_2", "event_type": "driver_ping", "ts": "2025-05-21T09:28:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "dec7915028bf4a5d96f6f31b7ee60b17", "sequence": 6, "body": '{"progress_pct": 60.0, "loc_lat": 41.8900000, "loc_lon": -87.6460000}', "day": "2025-05-21"},

    # In Progress - Out for Delivery (Running Late) (Order ID: 34af35a043334a24b836f6dc1a6e5c2b)
    {"event_id": "order_created_34af35a04333", "event_type": "order_created", "ts": "2025-05-21T08:50:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "34af35a043334a24b836f6dc1a6e5c2b", "sequence": 0, "body": '{"customer_lat": 41.9288242, "customer_lon": -87.6658665, "customer_addr": "789 Pine Rd", "items": [{"id": 3, "name": "Sushi Set", "price": 25.00, "qty": 1}]}', "day": "2025-05-21"},
    {"event_id": "gk_started_34af35a04333", "event_type": "gk_started", "ts": "2025-05-21T08:55:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "34af35a043334a24b836f6dc1a6e5c2b", "sequence": 1, "body": "{}", "day": "2025-05-21"},
    {"event_id": "gk_finished_34af35a04333", "event_type": "gk_finished", "ts": "2025-05-21T09:00:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "34af35a043334a24b836f6dc1a6e5c2b", "sequence": 2, "body": "{}", "day": "2025-05-21"},
    {"event_id": "gk_ready_34af35a04333", "event_type": "gk_ready", "ts": "2025-05-21T09:02:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "34af35a043334a24b836f6dc1a6e5c2b", "sequence": 3, "body": "{}", "day": "2025-05-21"},
    {"event_id": "driver_picked_up_34af35a04333", "event_type": "driver_picked_up", "ts": "2025-05-21T09:05:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "34af35a043334a24b836f6dc1a6e5c2b", "sequence": 4, "body": '{"route_points": [[41.9045205, -87.6533404], [41.9288242, -87.6658665]], "eta_mins": 5.0}', "day": "2025-05-21"},
    {"event_id": "driver_ping_34af35a04333_1", "event_type": "driver_ping", "ts": "2025-05-21T09:06:30.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "34af35a043334a24b836f6dc1a6e5c2b", "sequence": 5, "body": '{"progress_pct": 20.0, "loc_lat": 41.9100000, "loc_lon": -87.6600000}', "day": "2025-05-21"},
    {"event_id": "driver_ping_34af35a04333_2", "event_type": "driver_ping", "ts": "2025-05-21T09:08:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "34af35a043334a24b836f6dc1a6e5c2b", "sequence": 6, "body": '{"progress_pct": 40.0, "loc_lat": 41.9180000, "loc_lon": -87.6620000}', "day": "2025-05-21"},

    # In Progress - Order Created (Order ID: e5ab1bc493ea4f7a98accd29301d713b)
    {"event_id": "order_created_e5ab1bc493ea", "event_type": "order_created", "ts": "2025-05-21T09:25:00.000Z", "gk_id": "6c47adae5f884d7aa0d7bd500517fe94", "order_id": "e5ab1bc493ea4f7a98accd29301d713b", "sequence": 0, "body": '{"customer_lat": 41.8990524, "customer_lon": -87.6478849, "customer_addr": "999 Maple Dr", "items": [{"id": 4, "name": "Salad", "price": 9.00, "qty": 2}]}', "day": "2025-05-21"},
]


## Data Preprocessing
For almost any use case, you will want to do some preprocessing and cleaning to make the data suitable for your application. In this simple example, we only need the current state of each order, not the whole sequence of events, so let's create a table showing the key details and current state of each order.
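Conceptually, an order's "current state" is simply its event with the highest `sequence`. The dependency-free sketch below shows that reduction on a few hypothetical events (not the notebook's data), deliberately listed out of order; the pandas code that follows achieves the same by sorting on `sequence` and taking the last event per order.

```python
# Hypothetical events for two orders, deliberately out of sequence order
events = [
    {"order_id": "a", "sequence": 0, "event_type": "order_created"},
    {"order_id": "a", "sequence": 1, "event_type": "gk_started"},
    {"order_id": "b", "sequence": 0, "event_type": "order_created"},
    {"order_id": "b", "sequence": 2, "event_type": "delivered"},
    {"order_id": "b", "sequence": 1, "event_type": "driver_picked_up"},
]

def latest_status(events):
    """Return {order_id: event_type} for the highest-sequence event of each order."""
    latest = {}
    for ev in events:
        current = latest.get(ev["order_id"])
        if current is None or ev["sequence"] > current["sequence"]:
            latest[ev["order_id"]] = ev
    return {oid: ev["event_type"] for oid, ev in latest.items()}

print(latest_status(events))  # {'a': 'gk_started', 'b': 'delivered'}
```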

In [0]:
import pandas as pd
import json

df = pd.DataFrame(raw_data)

# Parse the 'body' column (it's a JSON string)
df['body_parsed'] = df['body'].apply(json.loads)

# Sort by order_id and sequence to easily get the latest event
df = df.sort_values(['order_id', 'sequence'])

# Group by order_id and aggregate
condensed_df = df.groupby('order_id').agg(
    # Extract order details from the first event (order_created)
    customer_lat=('body_parsed', lambda x: x.iloc[0].get('customer_lat')),
    customer_lon=('body_parsed', lambda x: x.iloc[0].get('customer_lon')),
    customer_addr=('body_parsed', lambda x: x.iloc[0].get('customer_addr')),
    items=('body_parsed', lambda x: x.iloc[0].get('items')),

    # Extract eta_mins from driver_picked_up event if it exists
    eta_mins=('body_parsed', lambda x: next((item.get('eta_mins') for item in x if 'eta_mins' in item), None)),

    # Extract the latest ping event's progress_pct
    progress_pct=('body_parsed', lambda x: next((item.get('progress_pct') for item in reversed(list(x)) if isinstance(item, dict) and 'progress_pct' in item), None)),

    # Get the latest status (event_type) and its timestamp
    latest_status=('event_type', 'last'),
    latest_ts=('ts', 'last')
).reset_index()

display(condensed_df)

Now we have some sample data we can easily access and use with AI models and tools. 

## Using AI Models from Databricks Model Serving

You have access to a range of state-of-the-art models via Databricks Model Serving, and you can call them from Databricks notebooks using the OpenAI Python SDK. Here we will use the `databricks-meta-llama-3-3-70b-instruct` model. You do not need an API key from the model provider; instead, you authenticate with a Databricks token, which the code below creates for you.
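The OpenAI client only needs two Databricks-specific values: a token, and a base URL formed by appending `/serving-endpoints` to the workspace host, as the code below does with `w.config.host`. The `model` argument is then the serving endpoint name. The helper here is hypothetical and simply guards against a trailing slash in case the host string has one; the host shown is made up.

```python
def serving_base_url(workspace_host: str) -> str:
    """Build the OpenAI-compatible base URL for a Databricks workspace (hypothetical helper)."""
    return workspace_host.rstrip("/") + "/serving-endpoints"

# Example with a made-up workspace host
print(serving_base_url("https://example.cloud.databricks.com/"))
# https://example.cloud.databricks.com/serving-endpoints
```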

To use the Anthropic Claude models available through Model Serving, you will need to add your credit card information to your Databricks account using the "manage trial" link in the top-right corner of the Databricks UI. You will not be charged for usage until you consume your trial credits or they expire.

See the [Agent Library Integrations](2025_agent_hackathon_resources/databricks_agent_library_integrations.ipynb) notebook for more guidance on which models to use, compatibility notes, etc.

In [0]:
from openai import OpenAI
from databricks.sdk import WorkspaceClient

# Create a short-lived (40-minute) Databricks token for this session
w = WorkspaceClient()
tmp_token = w.tokens.create(lifetime_seconds=2400).token_value

# Point the OpenAI client at the workspace's model serving endpoints
client = OpenAI(
    api_key=tmp_token,
    base_url=f"{w.config.host}/serving-endpoints",
)

chat_completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are an AI assistant"},
        {"role": "user", "content": "Tell me about Large Language Models"},
    ],
    model="databricks-meta-llama-3-3-70b-instruct",
    max_tokens=256,
)
print(chat_completion.choices[0].message.content)

Note that you can also configure [external models](https://docs.databricks.com/aws/en/generative-ai/external-models/) if you want to use models from other providers with your own API keys.

## Tracing for Agent Observability

[MLflow tracing](https://docs.databricks.com/aws/en/mlflow/mlflow-tracing) gives you detailed observability into your agentic workflows, capturing execution information for each workflow step, including LLM responses, tool calls, retrieval steps, and more.

You can enable tracing for many common GenAI providers and agent frameworks with one line of code: `mlflow.<library>.autolog()`. Because we are calling the model through the OpenAI SDK (even though the model itself is a Meta Llama model), we use `mlflow.openai.autolog()`.
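Conceptually, a trace is made of spans, each recording one step's name, type, inputs, output, and timing. The toy decorator below is *not* MLflow's implementation; it is a hypothetical, dependency-free sketch of the kind of information a span captures, using the same `"TOOL"` span type label that appears later in this notebook.

```python
import functools
import time

TRACE = []  # toy in-memory "trace": one dict per recorded span

def toy_trace(span_type):
    """Hypothetical decorator that records a span's inputs, output, and duration."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            TRACE.append({
                "name": fn.__name__,
                "span_type": span_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "duration_s": time.perf_counter() - start,
            })
            return result
        return wrapper
    return decorator

@toy_trace("TOOL")
def lookup(order_id):
    # Hypothetical tool returning a canned status
    return {"order_id": order_id, "status": "delivered"}

lookup("abc")
print(TRACE[0]["name"], TRACE[0]["span_type"], TRACE[0]["output"])
```

MLflow's real spans additionally nest into parent/child trees and are persisted to the experiment, which is what makes them reviewable after the fact.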

In [0]:
import mlflow

mlflow.openai.autolog()

chat_completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are an AI assistant"},
        {"role": "user", "content": "Tell me about Large Language Models"},
    ],
    model="databricks-meta-llama-3-3-70b-instruct",
    max_tokens=256,
)


You can view these traces right in the notebook. They will also be collected for downstream review and analysis; you can find them in the Experiments tab.

## Add a Tool Call

Now let's create a tool for checking on the order status. In a real-world application, this customer support chat would take place in the context of a user logged into the app, so the model would already have access to that user's orders and would not provide information about other users.

In [0]:
tools_list = [
    {
        "type": "function",
        "function": {
            "name": "get_user_order_status",
            "description": "Retrieves the current status and details of all orders for a specified user ID. Use this tool when a user asks about their orders.",
            "parameters": {
                "type": "object",
                "properties": {
                    "user_id": {
                        "type": "string",
                        "description": "The unique identifier of the user whose orders are to be fetched (e.g., 'user_alice', 'user_bob').",
                    }
                },
                "required": ["user_id"],
            },
        },
    }
]

### Define a tool-calling loop

Here we will define a simple loop that takes the user's question, calls the tool if needed, reviews the results of the tool call, and responds to the user.

While we implement this loop manually, you can also use agent frameworks such as LlamaIndex, LangGraph, smolagents, or CrewAI.
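The model never runs tools itself: it returns a tool name plus a JSON string of arguments, and our code decides what to execute. A common pattern is a registry mapping tool names from the spec to Python callables. The sketch below is hypothetical (`fake_db` and `execute_tool_call` are illustrative names) and also overwrites any model-supplied `user_id` with the logged-in user, so a prompt like "show me Bob's orders" cannot reach another user's data.

```python
import json

def get_user_order_status(user_id):
    """Hypothetical stand-in for the notebook's order_status function."""
    fake_db = {"user_alice": [{"order_id": "o1", "latest_status": "delivered"}]}
    return fake_db.get(user_id, [])

# Map each tool name declared in the tool spec to the Python callable that implements it
TOOL_REGISTRY = {"get_user_order_status": get_user_order_status}

def execute_tool_call(name, arguments_json, logged_in_user):
    """Run one tool call and return its result as a JSON string."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    args = json.loads(arguments_json)  # the model sends arguments as a JSON string
    args["user_id"] = logged_in_user   # never trust a model-supplied user ID
    return json.dumps(fn(**args))

# The model asks for Bob's orders, but the logged-in user is Alice
print(execute_tool_call("get_user_order_status", '{"user_id": "user_bob"}', "user_alice"))
```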

First, let's map the orders to sample user IDs. We will use these to simulate being logged in to a delivery app. The chatbot will only be able to access information about the selected/"logged-in" user.

In [0]:
from ipywidgets import widgets

# Map simulated user IDs to the orders they own
user_order_mapping = {
    "user_alice": ["07dac347d4e54841b24326bac267c20d", "15913d059bcd458aad7e567c3ca533fd"],
    "user_bob": ["34af35a043334a24b836f6dc1a6e5c2b"],
    "user_charlie": ["7ff80e031bb645a9b8942cd21ff4f946", "dec7915028bf4a5d96f6f31b7ee60b17", "e5ab1bc493ea4f7a98accd29301d713b"]
}

user_dropdown = widgets.Dropdown(
    options=list(user_order_mapping.keys()),
    description='Select User:',
    disabled=False,
)

# Track the selected ("logged-in") user, starting from the dropdown's default value
selected_user_id_global = user_dropdown.value

def on_user_change(change):
    global selected_user_id_global
    selected_user_id_global = change['new']

# Only fire the callback when the selected value changes
user_dropdown.observe(on_user_change, names='value')
display(user_dropdown)



Next, we'll write `order_status`, the Python function that implements the `get_user_order_status` tool spec defined above. Note that we use the `@mlflow.trace` decorator so that every call to this function is traced.

In [0]:
@mlflow.trace(span_type="TOOL")
def order_status(user_id):
    """Return the condensed records for every order owned by `user_id`."""
    user_orders = user_order_mapping.get(user_id)

    if user_orders:
        # Use a distinct local name so it doesn't shadow the function itself
        statuses = []
        for order_id in user_orders:
            order = condensed_df[condensed_df['order_id'] == order_id].iloc[0]
            statuses.append(order.to_dict())
        return statuses
    else:
        return "No orders found for the specified user."

Lastly, we define the tool calling loop itself. This loop takes a user's query and the user ID as arguments and returns the assistant's final answer.
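The control flow of such a loop can be exercised offline before wiring in the real client. The sketch below swaps in a hypothetical scripted `fake_model` for the LLM and uses simplified message dicts (real OpenAI tool calls nest the name and arguments under `function`), so it illustrates only the loop's shape: call the model, run every requested tool, feed the results back, and stop on a plain-text answer or after `max_turns`.

```python
import json

def fake_model(messages):
    """Hypothetical stand-in for the LLM: requests the tool once, then answers."""
    if messages[-1]["role"] == "user":
        return {"role": "assistant", "content": None,
                "tool_calls": [{"id": "call_1", "name": "get_user_order_status",
                                "arguments": json.dumps({"user_id": "user_bob"})}]}
    return {"role": "assistant", "content": f"Tool said: {messages[-1]['content']}", "tool_calls": []}

def run_loop(user_query, call_model, tools, max_turns=5):
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            return reply["content"], messages  # final text answer
        for call in reply["tool_calls"]:  # answer every tool call, not just the first
            result = tools[call["name"]](**json.loads(call["arguments"]))
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": str(result)})
    return "Sorry, I couldn't complete that request.", messages

answer, transcript = run_loop("Where is my order?", fake_model,
                              {"get_user_order_status": lambda user_id: "out for delivery"})
print(answer)  # Tool said: out for delivery
```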

In [0]:
def run_tool_loop(user_query, user_context_id, max_turns=5):
    # Initialize messages once with the system prompt and the initial user query
    messages = [
        {"role": "system", "content": f"You are an AI assistant for user {user_context_id}."},
        {"role": "user", "content": user_query},
    ]
    with mlflow.start_span(name="run_tool_loop", span_type="CHAIN") as span:
        span.set_inputs({"user_query": user_query, "user_context_id": user_context_id})
        for _ in range(max_turns):  # cap the number of model calls to avoid an endless loop
            response = client.chat.completions.create(
                messages=messages,
                model="databricks-meta-llama-3-3-70b-instruct",
                max_tokens=256,
                tools=tools_list,
                tool_choice="auto",
            )
            assistant_message = response.choices[0].message
            messages.append(assistant_message)  # add the assistant's full response, including tool calls

            if not assistant_message.tool_calls:
                break  # no tool call: the LLM gave its final text answer

            # Every tool call in the assistant turn needs a matching tool message
            for tool_call in assistant_message.tool_calls:
                try:
                    json.loads(tool_call.function.arguments)  # surface malformed arguments as errors
                    # Scope the lookup to the logged-in user, ignoring any user_id the model supplied
                    tool_result_str = str(order_status(user_context_id))
                except Exception as e:  # catch-all for argument parsing or tool execution errors
                    tool_result_str = json.dumps({"error": f"Tool execution failed: {str(e)}"})

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,  # links this result to the model's tool call
                    "name": tool_call.function.name,
                    "content": tool_result_str,  # tool results must be strings
                })
            # Loop continues: the tool results are sent back to the LLM for summarization

        final_answer = assistant_message.content or "Sorry, I couldn't complete that request."
        span.set_outputs(final_answer)

    return final_answer

In [0]:
response = run_tool_loop("What is the status of my order", user_dropdown.value)
print(response)

## Evaluate our Agent

We can use the `databricks-agents` SDK together with MLflow's evaluation API (`mlflow.evaluate`) to evaluate our agent. We start by defining a set of natural-language guidelines that specify how we want the agent to respond.
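Guidelines like these are judged by an LLM, but parts of them can also be checked deterministically as a cheap complement. For example, the brevity guideline forbids volunteering order IDs, which in this dataset are 32-character lowercase hex strings. The heuristic below is a hypothetical add-on, not how Databricks Agent Evaluation scores guidelines.

```python
import re

# Order IDs in this dataset are 32 lowercase hex characters
ORDER_ID = re.compile(r"\b[0-9a-f]{32}\b")

def leaks_order_id(response: str) -> bool:
    """Heuristic check for one part of the brevity guideline: no raw order IDs."""
    return bool(ORDER_ID.search(response))

print(leaks_order_id("Your order is on its way!"))  # False
print(leaks_order_id("Order 34af35a043334a24b836f6dc1a6e5c2b is out for delivery."))  # True
```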

In [0]:
global_guidelines = {
    "brevity_and_minimalism": [
        "When a user asks for a general order status, the response must provide only a very brief, high-level summary (e.g., 'Your order is on its way,' 'It has been delivered,' or 'There appears to be an issue with your order'). The response must avoid volunteering additional specific details like order IDs, item lists, precise timestamps, ETA, or progress percentages unless the user explicitly asks for these further details."
    ],
    "clarity_and_friendliness": [
        "The agent's responses must be phrased in clear, simple language that is easy for any user to understand, and it must consistently maintain a polite, empathetic, and friendly tone throughout the interaction."
    ],
    "proactive_problem_guidance": [
        "If an order status retrieved by the tool indicates a clear problem, delay, or failure (e.g., 'delivery_attempt_failed', 'on_hold_pending_user_action', 'payment_declined'), the agent's response must include at least one concrete, actionable suggestion or piece of guidance for the user (e.g., 'You might want to contact support at [number/email],' or 'You can try to reschedule by visiting [link/platform section]'). Simply stating the problem without offering a path forward is insufficient."
    ],
    "scope_and_relevance_management": [
        "The agent must primarily address inquiries related to order status, order details, and closely related logistics. If the user's request is clearly and significantly outside this defined scope (e.g., asking for weather forecasts, general knowledge, or to perform unrelated tasks), the response must politely state its functional limitations and avoid attempting to answer the unrelated query."
    ],
    "honesty_about_capabilities": [
        "If the user requests the agent to perform an action beyond its current capabilities, the response must clearly and truthfully state that it cannot perform such actions. The agent must not imply or falsely claim it can take actions for which it has no tools or authorization. At present, the agent can only provide information about order status. It cannot take any actions on behalf of the user."
    ],
}

In [0]:
eval_set = [
    {
        # Record the same user the response above was generated for
        "request": {"user": user_dropdown.value, "user_query": "What is the status of my order"},
        "response": response,
    }
]

mlflow.evaluate(
    data=eval_set,
    model_type="databricks-agent",
    evaluator_config={"databricks-agent": {"global_guidelines": global_guidelines}},
)

### Generate an Evaluation Dataset
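Evaluation is much more informative with a larger set of examples. Let's write a few more test prompts, covering both routine questions and out-of-scope requests, and generate a response for each.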
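For broader coverage, prompts can also be generated combinatorially rather than written one by one. The hypothetical sketch below crosses each simulated user with a couple of routine queries and adds a cross-user probe for every pair of users, the kind of case that checks whether the agent stays within the logged-in user's data.

```python
import itertools

users = ["user_alice", "user_bob", "user_charlie"]
queries = ["What is the status of my order?", "Please cancel my order."]

# Every user asks every routine query
generated = [{"user": u, "user_query": q} for u, q in itertools.product(users, queries)]

# Cross-user probes: each user asks about every other user's order
generated += [
    {"user": u, "user_query": f"What is the status of {other.split('_')[1].title()}'s order?"}
    for u in users for other in users if other != u
]
print(len(generated))  # 12
```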

In [0]:
prompts = [{"user": "user_bob", "user_query": "What is the status of my order"},
           {"user": "user_bob", "user_query": "I haven't received my order yet!!"},
           {"user": "user_bob", "user_query": "What is the status of Alice's order?"},
           {"user": "user_bob", "user_query": "I need to cancel my order! Please cancel my order."},
           {"user": "user_alice", "user_query": "What is the status of my order?"},
           {"user": "user_alice", "user_query": "I accidentally ordered twice! Please help."},
           {"user": "user_charlie", "user_query": "Hi! I really liked what I ordered last time but I don't remember what it was, can you tell me?"},
           {"user": "user_charlie", "user_query": "Please create a hello world SQL query for me."}]

In [0]:
eval_set = [{"request": prompt, "response": run_tool_loop(prompt["user_query"], prompt["user"])} for prompt in prompts]
mlflow.evaluate(
    data=eval_set,
    model_type="databricks-agent",
    evaluator_config={"databricks-agent": {"global_guidelines": global_guidelines}},
)

We can review the results of the evaluation in the Experiments tab.

![](./images/dev_1_eval.png)

The evaluation results suggest that a good next step in improving the agent would be to clarify the expected format for order status responses. We won't go through that whole process here, but this highlights how agent evaluations provide a convenient way to clarify your quality criteria and get valuable cues on how to update your agents.
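One possible first iteration is to make the expected format explicit in the system prompt, reusing wording from the guidelines. The function below is a hypothetical sketch of such a revised prompt; it is not tested against the evaluation, and you would re-run the evaluation after swapping it into the system message of `run_tool_loop` to see whether it actually helps.

```python
def build_system_prompt(user_id: str) -> str:
    """Hypothetical revised system prompt that spells out the expected answer format."""
    return "\n".join([
        f"You are a friendly customer-support assistant for user {user_id}.",
        "When asked about order status, answer in one or two short sentences",
        "(for example: 'Your order is on its way.').",
        "Do not mention order IDs, timestamps, ETAs, or progress percentages",
        "unless the user explicitly asks for them.",
        "You can only look up order status; you cannot cancel, change, or refund orders.",
    ])

prompt = build_system_prompt("user_bob")
print(prompt.splitlines()[0])
```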

In general, this notebook illustrates a simple workflow you may wish to follow when developing your own agents: first define a very simple agent, then define your success criteria, then evaluate the agent, and finally use the evaluation results to iteratively improve it.