# Tool-calling Agent

This is an auto-generated notebook created by an AI playground export. In this notebook, you will:
- Author a tool-calling agent with [MLflow's `ResponsesAgent`](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.pyfunc.html#mlflow.pyfunc.ResponsesAgent) that uses the OpenAI client
- Manually test the agent's output
- Evaluate the agent with Mosaic AI Agent Evaluation
- Log and deploy the agent

Run this notebook on serverless compute or on a cluster with a Databricks Runtime (DBR) version below 17.

**_NOTE:_** This notebook uses the OpenAI SDK, but the AI Agent Framework is compatible with any agent authoring framework, including LlamaIndex or LangGraph. To learn more, see the [Authoring Agents](https://docs.databricks.com/generative-ai/agent-framework/author-agent) Databricks documentation.

## Prerequisites

- Address all `TODO`s in this notebook.

In [0]:
%pip install -U -qqqq backoff databricks-openai uv databricks-agents mlflow-skinny[databricks]
dbutils.library.restartPython()

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyter-server 1.23.4 requires anyio<4,>=3.1.0, but you have anyio 4.12.1 which is incompatible.
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


## Define the agent in code
The agent code below lives in a single cell so that the `%%writefile` magic command can write it to a local Python file (`agent.py`), which is then used for logging and deployment.

For more examples of tools to add to your agent, see [docs](https://docs.databricks.com/generative-ai/agent-framework/agent-tool.html).
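
The agent code below registers Unity Catalog functions as tools. Tool names in the OpenAI-style spec use double underscores where the UC function name uses dots, and `create_tool_info` reverses that mapping with `tool_name.replace("__", ".")` before executing the function. The following is a minimal sketch of that round trip on a hand-written spec; the `to_tool_name` and `to_uc_name` helpers and the example spec are illustrative assumptions, not part of `databricks_openai`.

```python
def to_tool_name(uc_function_name: str) -> str:
    """Hypothetical helper: UC function name -> tool name (dots become '__')."""
    return uc_function_name.replace(".", "__")


def to_uc_name(tool_name: str) -> str:
    """Mirrors `tool_name.replace("__", ".")` in create_tool_info."""
    return tool_name.replace("__", ".")


# Hand-written spec in the shape create_tool_info reads (illustrative only).
example_spec = {
    "type": "function",
    "function": {
        "name": to_tool_name("takamol_demo.nitaqat_tools.get_overall_summary"),
        "description": "Example description",
        "parameters": {"type": "object", "properties": {}},
        "strict": True,
    },
}

# create_tool_info drops the "strict" key from the spec, if present.
example_spec["function"].pop("strict", None)

tool_name = example_spec["function"]["name"]
uc_name = to_uc_name(tool_name)
print(tool_name)
print(uc_name)
```

Because the mapping is a plain string replacement, it would not round-trip for a catalog, schema, or function name that itself contains a double underscore.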

In [0]:
%%writefile agent.py
import json
from typing import Any, Callable, Generator, Optional
from uuid import uuid4
import warnings

import backoff
import mlflow
import openai
from databricks.sdk import WorkspaceClient
from databricks_openai import UCFunctionToolkit, VectorSearchRetrieverTool
from mlflow.entities import SpanType
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
    ResponsesAgentStreamEvent,
    output_to_responses_items_stream,
    to_chat_completions_input,
)
from openai import OpenAI
from pydantic import BaseModel
from unitycatalog.ai.core.base import get_uc_function_client

############################################
# Define your LLM endpoint and system prompt
############################################
LLM_ENDPOINT_NAME = "databricks-meta-llama-3-3-70b-instruct"

SYSTEM_PROMPT = """"""


###############################################################################
## Define tools for your agent, enabling it to retrieve data or take actions
## beyond text generation
## To create and see usage examples of more tools, see
## https://docs.databricks.com/generative-ai/agent-framework/agent-tool.html
###############################################################################
class ToolInfo(BaseModel):
    """
    Class representing a tool for the agent.
    - "name" (str): The name of the tool.
    - "spec" (dict): JSON description of the tool (matches OpenAI Responses format)
    - "exec_fn" (Callable): Function that implements the tool logic
    """

    name: str
    spec: dict
    exec_fn: Callable


def create_tool_info(tool_spec, exec_fn_param: Optional[Callable] = None):
    tool_spec["function"].pop("strict", None)
    tool_name = tool_spec["function"]["name"]
    udf_name = tool_name.replace("__", ".")

    # Define a wrapper that accepts kwargs for the UC tool call,
    # then passes them to the UC tool execution client
    def exec_fn(**kwargs):
        function_result = uc_function_client.execute_function(udf_name, kwargs)
        if function_result.error is not None:
            return function_result.error
        else:
            return function_result.value
    return ToolInfo(name=tool_name, spec=tool_spec, exec_fn=exec_fn_param or exec_fn)


TOOL_INFOS = []

# You can use UDFs in Unity Catalog as agent tools
# TODO: Add additional tools
UC_TOOL_NAMES = ["takamol_demo.nitaqat_tools.*"]

uc_toolkit = UCFunctionToolkit(function_names=UC_TOOL_NAMES)
uc_function_client = get_uc_function_client()
for tool_spec in uc_toolkit.tools:
    TOOL_INFOS.append(create_tool_info(tool_spec))


# (Optional) Use Databricks vector search indexes as tools
# See https://docs.databricks.com/generative-ai/agent-framework/unstructured-retrieval-tools.html
# for details
VECTOR_SEARCH_TOOLS = []
# TODO: Add vector search indexes as tools or delete this block
# VECTOR_SEARCH_TOOLS.append(
#     VectorSearchRetrieverTool(
#         index_name="",
#         # filters="..."
#     )
# )
for vs_tool in VECTOR_SEARCH_TOOLS:
    TOOL_INFOS.append(create_tool_info(vs_tool.tool, vs_tool.execute))



class ToolCallingAgent(ResponsesAgent):
    """
    Class representing a tool-calling Agent
    """

    def __init__(self, llm_endpoint: str, tools: list[ToolInfo]):
        """Initializes the ToolCallingAgent with tools."""
        self.llm_endpoint = llm_endpoint
        self.workspace_client = WorkspaceClient()
        self.model_serving_client: OpenAI = (
            self.workspace_client.serving_endpoints.get_open_ai_client()
        )
        self._tools_dict = {tool.name: tool for tool in tools}

    def get_tool_specs(self) -> list[dict]:
        """Returns tool specifications in the format OpenAI expects."""
        return [tool_info.spec for tool_info in self._tools_dict.values()]

    @mlflow.trace(span_type=SpanType.TOOL)
    def execute_tool(self, tool_name: str, args: dict) -> Any:
        """Executes the specified tool with the given arguments."""
        return self._tools_dict[tool_name].exec_fn(**args)

    def call_llm(self, messages: list[dict[str, Any]]) -> Generator[dict[str, Any], None, None]:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", message="PydanticSerializationUnexpectedValue")
            for chunk in self.model_serving_client.chat.completions.create(
                model=self.llm_endpoint,
                messages=to_chat_completions_input(messages),
                tools=self.get_tool_specs(),
                stream=True,
            ):
                chunk_dict = chunk.to_dict()
                if len(chunk_dict.get("choices", [])) > 0:
                    yield chunk_dict

    def handle_tool_call(
        self,
        tool_call: dict[str, Any],
        messages: list[dict[str, Any]],
    ) -> ResponsesAgentStreamEvent:
        """
        Execute a tool call, append its output to the running message history,
        and return a ResponsesAgentStreamEvent containing the tool output.
        """
        args = json.loads(tool_call["arguments"])
        result = str(self.execute_tool(tool_name=tool_call["name"], args=args))

        tool_call_output = self.create_function_call_output_item(tool_call["call_id"], result)
        messages.append(tool_call_output)
        return ResponsesAgentStreamEvent(type="response.output_item.done", item=tool_call_output)

    def call_and_run_tools(
        self,
        messages: list[dict[str, Any]],
        max_iter: int = 10,
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
        for _ in range(max_iter):
            last_msg = messages[-1]
            if last_msg.get("role", None) == "assistant":
                return
            elif last_msg.get("type", None) == "function_call":
                yield self.handle_tool_call(last_msg, messages)
            else:
                yield from output_to_responses_items_stream(
                    chunks=self.call_llm(messages), aggregator=messages
                )

        yield ResponsesAgentStreamEvent(
            type="response.output_item.done",
            item=self.create_text_output_item("Max iterations reached. Stopping.", str(uuid4())),
        )

    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        session_id = None
        if request.custom_inputs and "session_id" in request.custom_inputs:
            session_id = request.custom_inputs.get("session_id")
        elif request.context and request.context.conversation_id:
            session_id = request.context.conversation_id

        if session_id:
            mlflow.update_current_trace(
                metadata={
                    "mlflow.trace.session": session_id,
                }
            )

        outputs = [
            event.item
            for event in self.predict_stream(request)
            if event.type == "response.output_item.done"
        ]
        return ResponsesAgentResponse(output=outputs, custom_outputs=request.custom_inputs)

    def predict_stream(self, request: ResponsesAgentRequest) -> Generator[ResponsesAgentStreamEvent, None, None]:
        session_id = None
        if request.custom_inputs and "session_id" in request.custom_inputs:
            session_id = request.custom_inputs.get("session_id")
        elif request.context and request.context.conversation_id:
            session_id = request.context.conversation_id

        if session_id:
            mlflow.update_current_trace(
                metadata={
                    "mlflow.trace.session": session_id,
                }
            )

        messages = to_chat_completions_input([i.model_dump() for i in request.input])
        if SYSTEM_PROMPT:
            messages.insert(0, {"role": "system", "content": SYSTEM_PROMPT})
        yield from self.call_and_run_tools(messages=messages)


# Enable MLflow autologging for OpenAI calls and set the agent as the model to log
mlflow.openai.autolog()
AGENT = ToolCallingAgent(llm_endpoint=LLM_ENDPOINT_NAME, tools=TOOL_INFOS)
mlflow.models.set_model(AGENT)

Writing agent.py
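
The control flow in `call_and_run_tools` can be hard to follow through the streaming and tracing code. The sketch below reproduces only its branching with plain dictionaries: stop when the last message is from the assistant, execute a pending `function_call`, and otherwise ask the model again. The `fake_llm` function and the `multiply` tool are hypothetical stand-ins, so this is an illustration of the loop rather than the agent's actual implementation.

```python
import json
from typing import Any, Callable

# Stand-in tool registry; the real agent builds its registry from UC functions.
TOOLS: dict[str, Callable[..., Any]] = {
    "multiply": lambda a, b: a * b,
}


def fake_llm(messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical model stub: request a tool once, then answer with its result."""
    outputs = [m for m in messages if m.get("type") == "function_call_output"]
    if not outputs:
        return {
            "type": "function_call",
            "call_id": "call_1",
            "name": "multiply",
            "arguments": json.dumps({"a": 4, "b": 3}),
        }
    return {"role": "assistant", "content": f"The result is {outputs[-1]['output']}."}


def run_loop(messages: list[dict[str, Any]], max_iter: int = 10) -> list[dict[str, Any]]:
    """Same branching as call_and_run_tools, without streaming or tracing."""
    for _ in range(max_iter):
        last = messages[-1]
        if last.get("role") == "assistant":
            return messages  # the model produced a final answer
        if last.get("type") == "function_call":
            args = json.loads(last["arguments"])
            output = str(TOOLS[last["name"]](**args))
            messages.append(
                {"type": "function_call_output", "call_id": last["call_id"], "output": output}
            )
        else:
            messages.append(fake_llm(messages))
    # Reached only when no final answer arrived within max_iter steps.
    messages.append({"role": "assistant", "content": "Max iterations reached. Stopping."})
    return messages


history = run_loop([{"role": "user", "content": "What is 4*3?"}])
print(history[-1]["content"])
```

Each model call and each tool execution consumes one iteration, so `max_iter` bounds the combined number of steps rather than the number of tool calls.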


## Test the agent

Interact with the agent to test its output. Since we manually traced methods within `ResponsesAgent`, you can view the trace for each step the agent takes, with any LLM calls made via the OpenAI SDK automatically traced by autologging.

Replace this placeholder input with an appropriate domain-specific example for your agent.
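
The test requests in this section pass a `session_id` in `custom_inputs`. In `predict`, that value takes precedence over `context.conversation_id` when tagging the trace. Below is a simplified, dictionary-based sketch of that precedence; `resolve_session_id` is a hypothetical helper, and the real code reads attributes on a `ResponsesAgentRequest` rather than dictionary keys.

```python
from typing import Any, Optional


def resolve_session_id(request: dict[str, Any]) -> Optional[str]:
    """Return the session ID using the same precedence as predict."""
    custom_inputs = request.get("custom_inputs") or {}
    if "session_id" in custom_inputs:
        # An explicit session_id wins, even if a conversation_id is also present.
        return custom_inputs["session_id"]
    context = request.get("context") or {}
    return context.get("conversation_id") or None


explicit = resolve_session_id(
    {"custom_inputs": {"session_id": "test-session-123"}, "context": {"conversation_id": "conv-1"}}
)
fallback = resolve_session_id({"context": {"conversation_id": "conv-1"}})
missing = resolve_session_id({"input": []})
print(explicit, fallback, missing)
```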

In [0]:
dbutils.library.restartPython()

In [0]:
from agent import AGENT

AGENT.predict(
    {"input": [{"role": "user", "content": "what is 4*3 in python"}], "custom_inputs": {"session_id": "test-session-123"}},
)



ResponsesAgentResponse(tool_choice=None, truncation=None, id=None, created_at=None, error=None, incomplete_details=None, instructions=None, metadata=None, model=None, object='response', output=[OutputItem(type='function_call', id='chatcmpl_3fad4718-656a-4d39-9f44-27b1499db6e8', call_id='call_79cdf2e8-9e93-42ae-8ccf-9f91d187fe27', name='takamol_demo__nitaqat_tools__get_overall_summary', arguments='{}'), OutputItem(type='function_call_output', call_id='call_79cdf2e8-9e93-42ae-8ccf-9f91d187fe27', output='total_establishments,total_workforce,total_saudis,overall_saudization_pct,platinum_count,high_green_count,mid_green_count,low_green_count,yellow_count,red_count,compliant_count,non_compliant_count\n50,10144,2798,27.58,4,12,11,7,7,9,34,16\n'), OutputItem(type='message', id='chatcmpl_271e4079-2fa7-4534-8d9a-7b22aa17fc1d', content=[{'text': 'In Python, the result of 4*3 is 12.', 'type': 'output_text'}], role='assistant')], parallel_tool_calls=None, temperature=None, tools=None, top_p=None, m

Trace(trace_id=tr-4e7124a420c8e446ef435ebbafa29548)
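
The response above interleaves `function_call`, `function_call_output`, and `message` items. When only the assistant's final text is needed, it can be pulled out of the output items. The sketch below works on plain dictionaries abbreviated from the printed response; `final_text` is a hypothetical helper, and converting a real response to dictionaries first (for example with `model_dump`, as the streaming cell does for events) is an assumption.

```python
from typing import Any


def final_text(output_items: list[dict[str, Any]]) -> str:
    """Concatenate the output_text parts of all assistant message items."""
    parts = []
    for item in output_items:
        if item.get("type") != "message" or item.get("role") != "assistant":
            continue  # skip tool calls and tool outputs
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content["text"])
    return "".join(parts)


# Items abbreviated from the response printed above (IDs and tool output omitted).
items = [
    {"type": "function_call", "name": "takamol_demo__nitaqat_tools__get_overall_summary", "arguments": "{}"},
    {"type": "function_call_output", "output": "(tool output omitted)"},
    {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "In Python, the result of 4*3 is 12."}],
    },
]
answer = final_text(items)
print(answer)
```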

In [0]:
for chunk in AGENT.predict_stream(
    {"input": [{"role": "user", "content": "What is 4*3 in Python?"}], "custom_inputs": {"session_id": "test-session-123"}}
):
    print(chunk.model_dump(exclude_none=True))

{'type': 'response.output_item.done', 'item': {'type': 'function_call', 'id': 'chatcmpl_03b34b9c-4ff9-46d0-868f-45d4a01758d1', 'call_id': 'call_303a6fb4-8afc-44ac-8d72-451a67bb7c2b', 'name': 'takamol_demo__nitaqat_tools__get_overall_summary', 'arguments': '{}'}}




{'type': 'response.output_item.done', 'item': {'type': 'function_call_output', 'call_id': 'call_303a6fb4-8afc-44ac-8d72-451a67bb7c2b', 'output': 'total_establishments,total_workforce,total_saudis,overall_saudization_pct,platinum_count,high_green_count,mid_green_count,low_green_count,yellow_count,red_count,compliant_count,non_compliant_count\n50,10144,2798,27.58,4,12,11,7,7,9,34,16\n'}}
{'type': 'response.output_text.delta', 'item_id': 'chatcmpl_75a0a0da-f2c1-40e3-b90f-d7abbc86702f', 'delta': ''}
{'type': 'response.output_text.delta', 'item_id': 'chatcmpl_75a0a0da-f2c1-40e3-b90f-d7abbc86702f', 'delta': 'In Python, the result of 4*3 is 12.'}
{'type': 'response.output_text.delta', 'item_id': 'chatcmpl_75a0a0da-f2c1-40e3-b90f-d7abbc86702f', 'delta': ''}
{'type': 'response.output_item.done', 'item': {'id': 'chatcmpl_75a0a0da-f2c1-40e3-b90f-d7abbc86702f', 'content': [{'text': 'In Python, the result of 4*3 is 12.', 'type': 'output_text'}], 'role': 'assistant', 'type': 'message'}}


Trace(trace_id=tr-0d32a914715ea3ffb7cbcb358777042e)
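
The streamed output above contains `response.output_text.delta` events in addition to the `response.output_item.done` events. A client that renders text incrementally would typically accumulate the deltas per `item_id`. A minimal sketch using events copied from the output above; `collect_stream_text` is a hypothetical helper that operates on the dictionaries produced by `chunk.model_dump(exclude_none=True)`.

```python
from typing import Any, Iterable


def collect_stream_text(events: Iterable[dict[str, Any]]) -> dict[str, str]:
    """Group response.output_text.delta events by item_id and join their deltas."""
    texts: dict[str, str] = {}
    for event in events:
        if event.get("type") == "response.output_text.delta":
            item_id = event["item_id"]
            texts[item_id] = texts.get(item_id, "") + event.get("delta", "")
    return texts


ITEM_ID = "chatcmpl_75a0a0da-f2c1-40e3-b90f-d7abbc86702f"
# Delta events copied from the streamed output above.
events = [
    {"type": "response.output_text.delta", "item_id": ITEM_ID, "delta": ""},
    {"type": "response.output_text.delta", "item_id": ITEM_ID, "delta": "In Python, the result of 4*3 is 12."},
    {"type": "response.output_text.delta", "item_id": ITEM_ID, "delta": ""},
]
streamed = collect_stream_text(events)
print(streamed[ITEM_ID])
```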

## Log the `agent` as an MLflow model
Determine which Databricks resources to specify for automatic authentication passthrough at deployment time.
- **TODO**: If your Unity Catalog Function queries a [vector search index](https://docs.databricks.com/generative-ai/agent-framework/unstructured-retrieval-tools.html) or leverages [external functions](https://docs.databricks.com/generative-ai/agent-framework/external-connection-tools.html), you need to include the dependent vector search index and UC connection objects, respectively, as resources. See [docs](https://docs.databricks.com/generative-ai/agent-framework/log-agent.html#specify-resources-for-automatic-authentication-passthrough) for more details.

Log the agent as code from the `agent.py` file. See [MLflow - Models from Code](https://mlflow.org/docs/latest/models.html#models-from-code).

In [0]:
# Determine Databricks resources to specify for automatic auth passthrough at deployment time
import mlflow
from agent import UC_TOOL_NAMES, VECTOR_SEARCH_TOOLS, LLM_ENDPOINT_NAME
from mlflow.models.resources import DatabricksFunction, DatabricksServingEndpoint
from pkg_resources import get_distribution
from databricks_openai import UCFunctionToolkit

resources = [DatabricksServingEndpoint(endpoint_name=LLM_ENDPOINT_NAME)]
for tool in VECTOR_SEARCH_TOOLS:
    resources.extend(tool.resources)

# Expand UC_TOOL_NAMES wildcards to actual function names
uc_toolkit = UCFunctionToolkit(function_names=UC_TOOL_NAMES)
for tool_spec in uc_toolkit.tools:
    # Extract the actual function name from the tool spec
    actual_function_name = tool_spec["function"]["name"].replace("__", ".")
    # TODO: If the UC function includes dependencies like external connection or vector search, please include them manually.
    # See the TODO in the markdown above for more information.
    resources.append(DatabricksFunction(function_name=actual_function_name))

input_example = {
    "input": [
        {
            "role": "user",
            "content": "Hello, what can you help me with?"
        }
    ],
    "custom_inputs": {
        "session_id": "test-session"
    }
}

with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        name="agent",
        python_model="agent.py",
        input_example=input_example,
        pip_requirements=[
            "databricks-openai",
            "backoff",
            f"databricks-connect=={get_distribution('databricks-connect').version}",
        ],
        resources=resources,
    )

🔗 View Logged Model at: https://dbc-c970b4e0-df3d.cloud.databricks.com/ml/experiments/1642291144890862/models/m-65ccebf70a554634bd3d4728c3ba9940?o=7474651595665617
2026/01/09 11:03:13 INFO mlflow.pyfunc: Predicting on input example to validate output
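
The logging cell above pins `databricks-connect` to the installed version with `pkg_resources.get_distribution`. Since `pkg_resources` is deprecated in recent setuptools releases, the same lookup can be done with the standard library's `importlib.metadata`. The sketch below is one possible alternative; `pinned_requirement` is a hypothetical helper, and falling back to an unpinned name when the package is missing is a design choice of this sketch, not behavior of the original cell.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirement(package: str) -> str:
    """Return 'package==<installed version>', or the bare name if not installed."""
    try:
        return f"{package}=={version(package)}"
    except PackageNotFoundError:
        # Fall back to an unpinned requirement instead of raising.
        return package


print(pinned_requirement("databricks-connect"))
```

The returned string can be placed directly in the `pip_requirements` list.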


## Evaluate the agent with [Agent Evaluation](https://docs.databricks.com/mlflow3/genai/eval-monitor)

You can edit the requests or expected responses in your evaluation dataset and rerun the evaluation as you iterate on your agent, using MLflow to track the computed quality metrics.

Evaluate your agent with one of our [predefined LLM scorers](https://docs.databricks.com/mlflow3/genai/eval-monitor/predefined-judge-scorers), or try adding [custom metrics](https://docs.databricks.com/mlflow3/genai/eval-monitor/custom-scorers).
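
As an example of the kind of logic a custom metric might contain, the sketch below checks whether the agent called at least one tool before giving its final answer, which is a reasonable expectation for domain questions such as the ones in the evaluation dataset. It is a plain function over output items as dictionaries; `used_tool_before_answer` is a hypothetical name, and wiring it into `mlflow.genai.evaluate` would still require wrapping it as a custom scorer as described in the custom metrics documentation linked above.

```python
from typing import Any


def used_tool_before_answer(output_items: list[dict[str, Any]]) -> bool:
    """True if a function_call item appears before the first assistant message."""
    saw_tool_call = False
    for item in output_items:
        if item.get("type") == "function_call":
            saw_tool_call = True
        elif item.get("type") == "message" and item.get("role") == "assistant":
            return saw_tool_call
    return False  # no assistant message at all


# Illustrative inputs, not real agent output.
with_tool = [
    {"type": "function_call", "name": "takamol_demo__nitaqat_tools__get_compliance_alerts"},
    {"type": "function_call_output", "output": "(tool output omitted)"},
    {"type": "message", "role": "assistant", "content": []},
]
without_tool = [{"type": "message", "role": "assistant", "content": []}]

print(used_tool_before_answer(with_tool), used_tool_before_answer(without_tool))
```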

In [0]:
import mlflow
from mlflow.genai.scorers import RelevanceToQuery, Safety, RetrievalRelevance, RetrievalGroundedness

eval_dataset = [
    {
        "inputs": {
            "input": [
                {
                    "role": "user",
                    "content": "Which establishments need the most urgent attention?"
                }
            ]
        },
        "expected_response": None
    },
    {
        "inputs": {
            "input": [
                {
                    "role": "user",
                    "content": "Which establishments need the most urgent attention?"
                },
                {
                    "role": "assistant",
                    "content": "To identify the establishments that need the most urgent attention, we need to retrieve the compliance alerts for each establishment.\n\n<uc_function_call>{\"id\":\"2Z7E0cMCxbWZtxeFGDPg\",\"name\":\"takamol_demo.nitaqat_tools.get_compliance_alerts\",\"arguments\":\"{\\\"establishment_name\\\":\\\"all\\\"}\"}</uc_function_call>\n\n<uc_function_result>{\"id\":\"2Z7E0cMCxbWZtxeFGDPg\",\"content\":\"{\\\"is_truncated\\\":false,\\\"columns\\\":[\\\"establishment_name\\\",\\\"alert_type\\\",\\\"severity\\\",\\\"message\\\",\\\"recommended_action\\\",\\\"created_date\\\"],\\\"rows\\\":[[\\\"Royal Apex Financial\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Yellow. Action required to maintain or improve compliance status.\\\",\\\"Hire 4 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Premier Summit Contracting\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Red. Action required to maintain or improve compliance status.\\\",\\\"Hire 3 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Elite Global Resorts\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Red. Action required to maintain or improve compliance status.\\\",\\\"Hire 8 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Arabian Vista Clinic Group\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Red. Action required to maintain or improve compliance status.\\\",\\\"Hire 7 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Al Atlas Development Co.\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Red. Action required to maintain or improve compliance status.\\\",\\\"Hire 3 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"National Global Resorts\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Yellow. 
Action required to maintain or improve compliance status.\\\",\\\"Hire 4 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Al Apex Factory\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Red. Action required to maintain or improve compliance status.\\\",\\\"Hire 7 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Golden Prime Mobile Solutions\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Yellow. Action required to maintain or improve compliance status.\\\",\\\"Hire 10 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Royal Phoenix IT Solutions\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Yellow. Action required to maintain or improve compliance status.\\\",\\\"Hire 6 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Premier Oasis Hotels\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Red. Action required to maintain or improve compliance status.\\\",\\\"Hire 9 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Golden Oasis Stores\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Red. Action required to maintain or improve compliance status.\\\",\\\"Hire 5 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Al Prime Networks\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Yellow. Action required to maintain or improve compliance status.\\\",\\\"Hire 8 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"National Oasis Industrial\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Red. Action required to maintain or improve compliance status.\\\",\\\"Hire 2 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Golden Vista Telecom\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Yellow. 
Action required to maintain or improve compliance status.\\\",\\\"Hire 6 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Saudi Atlas Medical Center\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Red. Action required to maintain or improve compliance status.\\\",\\\"Hire 6 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Elite Horizon Tourism\\\",\\\"Zone Risk\\\",\\\"High\\\",\\\"Current zone is Yellow. Action required to maintain or improve compliance status.\\\",\\\"Hire 4 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"United Phoenix Engineering\\\",\\\"Zone Risk\\\",\\\"Medium\\\",\\\"Current zone is Low_Green. Action required to maintain or improve compliance status.\\\",\\\"Hire 10 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"United Crown Systems\\\",\\\"Zone Risk\\\",\\\"Medium\\\",\\\"Current zone is Low_Green. Action required to maintain or improve compliance status.\\\",\\\"Hire 5 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"United Crown Systems\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"5 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Elite Palm Builders\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"4 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Al Palm Engineering\\\",\\\"Zone Risk\\\",\\\"Medium\\\",\\\"Current zone is Low_Green. 
Action required to maintain or improve compliance status.\\\",\\\"Hire 9 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Al Palm Engineering\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"4 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Gulf Summit Capital\\\",\\\"Zone Risk\\\",\\\"Medium\\\",\\\"Current zone is Low_Green. Action required to maintain or improve compliance status.\\\",\\\"Hire 10 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Arabian Vista Clinic Group\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"5 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Royal Metro Capital\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"4 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Al Apex Factory\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"2 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Arabian Star Health Services\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"3 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Golden Prime Mobile Solutions\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"1 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Gulf Palm Tourism\\\",\\\"Zone Risk\\\",\\\"Medium\\\",\\\"Current zone is Low_Green. 
Action required to maintain or improve compliance status.\\\",\\\"Hire 8 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Al Crescent Connect\\\",\\\"Zone Risk\\\",\\\"Medium\\\",\\\"Current zone is Low_Green. Action required to maintain or improve compliance status.\\\",\\\"Hire 10 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Al Crescent Connect\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"4 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Golden Oasis Stores\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"1 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Al Summit Banking Services\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"3 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"National Falcon Digital\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"1 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Royal Crescent Commerce LLC\\\",\\\"Zone Risk\\\",\\\"Medium\\\",\\\"Current zone is Low_Green. 
Action required to maintain or improve compliance status.\\\",\\\"Hire 8 Saudi nationals to move to the next zone.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Royal Summit Commerce LLC\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"2 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Al Prime Networks\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"4 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Golden Vista Telecom\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"2 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Golden Crescent Clinic Group\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"5 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"],[\\\"Arabian Global Telecom\\\",\\\"Contract Expiry\\\",\\\"Medium\\\",\\\"4 Saudi employee contract(s) expiring within 30 days.\\\",\\\"Review and renew contracts to maintain Saudization rate.\\\",\\\"2026-01-06T03:46:55.000+0000\\\"]]}\"}</uc_function_result>\n\nThe establishments that need the most urgent attention are:\n\n1. Premier Summit Contracting - Current zone is Red, and they need to hire 3 Saudi nationals to move to the next zone.\n2. Elite Global Resorts - Current zone is Red, and they need to hire 8 Saudi nationals to move to the next zone.\n3. Arabian Vista Clinic Group - Current zone is Red, and they need to hire 7 Saudi nationals to move to the next zone.\n4. Al Atlas Development Co. - Current zone is Red, and they need to hire 3 Saudi nationals to move to the next zone.\n5. 
Al Apex Factory - Current zone is Red, and they need to hire 7 Saudi nationals to move to the next zone.\n6. Premier Oasis Hotels - Current zone is Red, and they need to hire 9 Saudi nationals to move to the next zone.\n7. Golden Oasis Stores - Current zone is Red, and they need to hire 5 Saudi nationals to move to the next zone.\n8. National Oasis Industrial - Current zone is Red, and they need to hire 2 Saudi nationals to move to the next zone.\n9. Saudi Atlas Medical Center - Current zone is Red, and they need to hire 6 Saudi nationals to move to the next zone.\n\nThese establishments are at high risk and require immediate attention to maintain or improve their compliance status."
                },
                {
                    "role": "user",
                    "content": "Compare saudization rates of Retail sector versus Healthcare sector"
                }
            ]
        },
        "expected_response": None
    }
]

eval_results = mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=lambda input: AGENT.predict({"input": input, "custom_inputs": {"session_id": "evaluation-session"}}),
    scorers=[RelevanceToQuery(), Safety()], # add more scorers here if they're applicable
)

# Review the evaluation results in the MLflow UI (see console output)

2026/01/09 11:00:44 INFO mlflow.genai.utils.data_validation: Testing model prediction with the first sample in the dataset. To disable this check, set the MLFLOW_GENAI_EVAL_SKIP_TRACE_VALIDATION environment variable to True.


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-071344ce-014b-439e-bb94-37f481af5dac/lib/python3.10/site-packages/mlflow/tracing/fluent.py:490, in start_span(name, span_type, attributes, trace_destination)
    488 trace_manager = InMemoryTraceManager.get_instance()
    489 mlflow_span = trace_manager.get_span_from_id(
--> 490     request_id, encode_span_id(otel_span.context.span_id)
    491 )
    492 mlflow_span.set_span_type(span_type)

AttributeError: 'NonRecordingSpan' object has no attribute 'context'

During handling of the above exception, another exception occurred:

Value

## Perform pre-deployment validation of the agent
Before registering and deploying the agent, run pre-deployment checks with the [mlflow.models.predict()](https://mlflow.org/docs/latest/python_api/mlflow.models.html#mlflow.models.predict) API. See the [documentation](https://docs.databricks.com/machine-learning/model-serving/model-serving-debug.html#validate-inputs) for details.

In [0]:
mlflow.models.predict(
    model_uri=f"runs:/{logged_agent_info.run_id}/agent",
    input_data={"input": [{"role": "user", "content": "Hello!"}], "custom_inputs": {"session_id": "validation-session"}},
    env_manager="uv",
)

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/12 [00:00<?, ?it/s]

2026/01/09 11:00:01 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/12 [00:00<?, ?it/s]

2026/01/09 11:00:05 INFO mlflow.utils.virtualenv: Creating a new environment in /tmp/virtualenv_envs/mlflow-5804cae42c486e0a5c4d1efd9890ca6d437c8e9d with python version 3.10.12 using uv
Using CPython 3.10.12 interpreter at: /usr/bin/python3.10
Creating virtual environment at: /tmp/virtualenv_envs/mlflow-5804cae42c486e0a5c4d1efd9890ca6d437c8e9d
Activate with: source /tmp/virtualenv_envs/mlflow-5804cae42c486e0a5c4d1efd9890ca6d437c8e9d/bin/activate
2026/01/09 11:00:06 INFO mlflow.utils.virtualenv: Installing dependencies
Using Python 3.10.12 environment at: /tmp/virtualenv_envs/mlflow-5804cae42c486e0a5c4d1efd9890ca6d437c8e9d
Resolved 3 packages in 79ms
Downloading setuptools (1.1MiB)
Downloading pip (2.0MiB)
 Downloaded setuptools
 Downloaded pip
Prepared 3 packages in 112ms
Installed 3 packages in 18m

{"object": "response", "output": [{"type": "message", "id": "chatcmpl_20cb0825-cf0c-4ed2-822d-0574bb127a42", "content": [{"text": "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?", "type": "output_text"}], "role": "assistant"}], "custom_outputs": {"session_id": "validation-session"}}

2026/01/09 11:00:22 INFO mlflow.tracing.export.async_export_queue: Flushing the async trace logging queue before program exit. This may take a while...
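
The validation call above prints the agent's response as a JSON document. As a rough sketch of how that output could be checked programmatically, the snippet below pulls the assistant text out of a response with the same shape as the one printed above. The helper name `extract_output_text` and the `sample_response` variable are hypothetical and are not part of MLflow; the sketch assumes the response keeps the `output` / `content` / `output_text` layout seen in this run.

```python
import json


def extract_output_text(response_json: str) -> str:
    """Concatenate every `output_text` block found in a response JSON string.

    Hypothetical helper: assumes the layout printed by the validation cell,
    i.e. {"output": [{"type": "message", "content": [{"type": "output_text", ...}]}]}.
    """
    response = json.loads(response_json)
    parts = []
    for item in response.get("output", []):
        # Skip anything that is not an assistant message (e.g. tool call items).
        if item.get("type") != "message":
            continue
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)


# Sample payload mirroring the structure of the validation output (text shortened).
sample_response = json.dumps(
    {
        "object": "response",
        "output": [
            {
                "type": "message",
                "role": "assistant",
                "content": [{"type": "output_text", "text": "Hello! It's nice to meet you."}],
            }
        ],
        "custom_outputs": {"session_id": "validation-session"},
    }
)

print(extract_output_text(sample_response))
```

A check like this can make it easier to spot an empty or malformed response before moving on to registration.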


## Register the model to Unity Catalog

Update the `catalog`, `schema`, and `model_name` below to register the MLflow model to Unity Catalog.

In [0]:
mlflow.set_registry_uri("databricks-uc")

# TODO: define the catalog, schema, and model name for your UC model
catalog = "takamol_demo"
schema = "labor_market"
model_name = "agentic_ai_model"
UC_MODEL_NAME = f"{catalog}.{schema}.{model_name}"

# register the model to UC
uc_registered_model_info = mlflow.register_model(
    model_uri=logged_agent_info.model_uri, name=UC_MODEL_NAME
)

Registered model 'takamol_demo.labor_market.agentic_ai_model' already exists. Creating a new version of this model...


Downloading artifacts:   0%|          | 0/12 [00:00<?, ?it/s]

Uploading artifacts:   0%|          | 0/13 [00:00<?, ?it/s]

🔗 Created version '1' of model 'takamol_demo.labor_market.agentic_ai_model': https://dbc-c970b4e0-df3d.cloud.databricks.com/explore/data/models/takamol_demo/labor_market/agentic_ai_model/version/1?o=7474651595665617
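
The registration cell builds a three-level Unity Catalog name of the form `catalog.schema.model_name`. A minimal sketch of a guard that could run before `mlflow.register_model` is shown below; `build_uc_model_name` is a hypothetical helper, not an MLflow or Databricks API, and it only checks that each part is non-empty and free of periods so that the joined name splits into exactly three parts.

```python
def build_uc_model_name(catalog: str, schema: str, model_name: str) -> str:
    """Join the three parts into `catalog.schema.model_name`.

    Hypothetical helper: it rejects empty parts and parts containing a period,
    since either would make the joined name ambiguous.
    """
    parts = {"catalog": catalog, "schema": schema, "model_name": model_name}
    for label, value in parts.items():
        if not value or not value.strip():
            raise ValueError(f"{label} must not be empty")
        if "." in value:
            raise ValueError(f"{label} must not contain a period: {value!r}")
    return ".".join(parts.values())


# Same values as the registration cell above.
print(build_uc_model_name("takamol_demo", "labor_market", "agentic_ai_model"))
```

Failing early on an unfilled `TODO` value can be cheaper than waiting for the registration call to reject the name.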


## Deploy the agent

In [0]:
from databricks import agents

# NOTE: pass scale_to_zero=True to agents.deploy() to enable scale-to-zero for cost savings.
# This is not recommended for production workloads, as capacity is not guaranteed when scaled to zero.
# Scaled-to-zero endpoints may take extra time to respond when queried, while they scale back up.
agents.deploy(UC_MODEL_NAME, uc_registered_model_info.version, tags={"endpointSource": "playground"})

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

For more information, see: https://docs.databricks.com/aws/en/generative-ai/agent-framework/feedback-model



    Deployment of takamol_demo.labor_market.agentic_ai_model version 1 initiated.  This can take up to 15 minutes and the Review App & Query Endpoint will not work until this deployment finishes.

    View status: https://dbc-c970b4e0-df3d.cloud.databricks.com/ml/endpoints/agents_takamol_demo-labor_market-agentic_ai_model/?o=7474651595665617
    Review App: https://dbc-c970b4e0-df3d.cloud.databricks.com/ml/review-v2/ce6bd467742d4bc3adc0bc60349771ac/chat?o=7474651595665617

You can refer back to the links above from the endpoint detail page at https://dbc-c970b4e0-df3d.cloud.databricks.com/ml/endpoints/agents_takamol_demo-labor_market-agentic_ai_model/?o=7474651595665617.

To set up monitoring for your deployed agent, see:
https://docs.databricks.com/aws/en/mlflow3/genai/eval-monitor/production-monitoring


Deployment(model_name='takamol_demo.labor_market.agentic_ai_model', model_version='1', endpoint_name='agents_takamol_demo-labor_market-agentic_ai_model', served_entity_name='takamol_demo-labor_market-agentic_ai_model_1', query_endpoint='https://dbc-c970b4e0-df3d.cloud.databricks.com/serving-endpoints/agents_takamol_demo-labor_market-agentic_ai_model/served-models/takamol_demo-labor_market-agentic_ai_model_1/invocations?o=7474651595665617', endpoint_url='https://dbc-c970b4e0-df3d.cloud.databricks.com/ml/endpoints/agents_takamol_demo-labor_market-agentic_ai_model/?o=7474651595665617', review_app_url='https://dbc-c970b4e0-df3d.cloud.databricks.com/ml/review-v2/ce6bd467742d4bc3adc0bc60349771ac/chat?o=7474651595665617')
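
Once the deployment finishes, the `query_endpoint` URL in the `Deployment` object above can be called over HTTPS. The following is a sketch only, using the standard library: `build_payload` and `query_agent` are hypothetical helpers, the payload shape is assumed to match the one passed to `mlflow.models.predict()` earlier in this notebook, and the bearer token is assumed to be a Databricks access token that you supply. The request is not sent here, because the endpoint may take up to 15 minutes to become ready.

```python
import json
import urllib.request
from typing import Any, Optional


def build_payload(message: str, session_id: Optional[str] = None) -> dict[str, Any]:
    """Build a request body shaped like the one used for pre-deployment validation."""
    payload: dict[str, Any] = {"input": [{"role": "user", "content": message}]}
    if session_id is not None:
        payload["custom_inputs"] = {"session_id": session_id}
    return payload


def query_agent(query_endpoint: str, token: str, payload: dict[str, Any]) -> dict[str, Any]:
    """POST the payload to the serving endpoint and return the parsed JSON reply.

    Assumption: the endpoint accepts a bearer token in the Authorization header.
    """
    request = urllib.request.Request(
        query_endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Build the body only; call query_agent(...) yourself once the endpoint is ready.
payload = build_payload("Hello!", session_id="validation-session")
print(json.dumps(payload))
```

Keeping payload construction separate from the network call lets the request body be inspected or unit-tested without a live endpoint.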

## Next steps

After your agent is deployed, you can chat with it in AI Playground to perform additional checks, share it with SMEs in your organization for feedback, or embed it in a production application. See the [docs](https://docs.databricks.com/generative-ai/deploy-agent.html) for details.