## Evaluating the Fabric Data Agent

### 1. Initial Configurations
#### 1.1 Libraries and packages

In [1]:
# Install initial packages and sdk
%pip install openai==1.70.0 httpx==0.27.2

Collecting openai==1.70.0
  Downloading openai-1.70.0-py3-none-any.whl.metadata (25 kB)
Collecting httpx==0.27.2
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.4.0 (from openai==1.70.0)
  Downloading jiter-0.11.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)
Collecting pydantic<3,>=1.9.0 (from openai==1.70.0)
  Downloading pydantic-2.11.9-py3-none-any.whl.metadata (68 kB)
Collecting typing-extensions<5,>=4.11 (from openai==1.70.0)
  Downloading typing_extensions-4.15.0-py3-none-any.whl.metadata (3.3 kB)
Collecting httpcore==1.* (from httpx==0.27.2)
  Downloading httpcore-1.0.9-py3-none-any.whl.metadata (21 kB)
Collecting h11>=0.16 (from httpcore==1.*->httpx==0.27.2)
  Downloading h11-0.16.0-py3-none-any.whl.metadata (8.3 kB)
Collecting annotated-types>=0.6.0 (from pydantic<3,>=1.9.0->openai==1.70.0)


In [2]:
%pip install -U fabric-data-agent-sdk

Collecting fabric-data-agent-sdk
  Downloading fabric_data_agent_sdk-0.1.14a0-py3-none-any.whl.metadata (5.7 kB)
Collecting semantic-link-labs==0.9.10 (from fabric-data-agent-sdk)
  Downloading semantic_link_labs-0.9.10-py3-none-any.whl.metadata (26 kB)
Collecting azure-kusto-data>=4.5.0 (from fabric-data-agent-sdk)
  Downloading azure_kusto_data-5.0.5-py2.py3-none-any.whl.metadata (4.2 kB)
Collecting azure-identity==1.17.1 (from fabric-data-agent-sdk)
  Downloading azure_identity-1.17.1-py3-none-any.whl.metadata (79 kB)
Collecting markdown2==2.5.3 (from fabric-data-agent-sdk)
  Downloading markdown2-2.5.3-py3-none-any.whl.metadata (2.1 kB)
Collecting anytree (from semantic-link-labs==0.9.10->fabric-data-agent-sdk)
  Downloading anytree-2.13.0-py3-none-any.whl.metadata (8.0 kB)
Collecting polib (from semantic-link-labs==0.9.10->fabric-data-agent-sdk)
  Downloadi
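
Since the install cells pin exact versions, it can help to confirm what actually ended up in the session before importing anything. The following is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical, and the pins are copied from the first install cell.

```python
from importlib import metadata

# Pins copied from the first install cell; adjust if you change them.
PINNED = {"openai": "1.70.0", "httpx": "0.27.2"}


def check_pins(pins):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# An empty dict means every pinned package is present at the expected version.
print(check_pins(PINNED) or "All pinned versions match.")
```

If a mismatch is reported after `%pip install`, restarting the session kernel is usually the first thing to try.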

In [7]:
# Import generic libraries and packages
import requests
import json
import pprint
import typing as t
import time
import uuid
import pandas as pd

# Import openai libraries and packages
import openai, httpx
from openai import OpenAI
from openai._exceptions import APIStatusError
from openai._models import FinalRequestOptions
from openai._types import Omit
from openai._utils import is_given
from synapse.ml.mlflow import get_mlflow_env_config
from sempy.fabric._token_provider import SynapseTokenProvider
from fabric.dataagent.evaluation import evaluate_data_agent  # evaluate the agent
from fabric.dataagent.evaluation import get_evaluation_summary  # get the evaluation summary

### 2. Define the questions to ask the Agent

Next, we define a few sample questions, each paired with its expected answer, to use during the evaluation.

In [4]:
# Define a sample evaluation set with user questions and their expected answers.
df = pd.DataFrame(
    columns=["question", "expected_answer"],
    data=[
        ["Dime el resultado neto que se ha obtenido en diciembre de 2025", "141.492"],
        ["¿Cuál es el cliente con mayor volumen de ventas para la comunidad autónoma (CCAA) de Valencia en 2025?", "Carpintería Martínez"],
        ["Muestrame el valor acumulado del EBITD para julio de 2024", "3.237.742"],
        ["¿Cuál es el tercer producto con mayor volumen de ventas para la comunidad autónoma (CCAA) de Galicia en 2025?", "Serrucho manual"]
    ]
)

df.head()

| | question | expected_answer |
|---|---|---|
| 0 | Dime el resultado neto que se ha obtenido en d... | 141.492 |
| 1 | ¿Cuál es el cliente con mayor volumen de venta... | Carpintería Martínez |
| 2 | Muestrame el valor acumulado del EBITD para ju... | 3.237.742 |
| 3 | ¿Cuál es el tercer producto con mayor volumen ... | Serrucho manual |
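
Before sending the set to the agent, a quick sanity check can catch empty or duplicated questions that would distort the results. This is a minimal sketch, not part of the SDK; `find_eval_set_issues` is a hypothetical helper that works on plain `(question, expected_answer)` pairs so it runs without pandas.

```python
def find_eval_set_issues(rows):
    """Return human-readable problems found in (question, expected_answer) pairs."""
    issues = []
    seen = set()
    for i, (question, expected) in enumerate(rows):
        q = (question or "").strip()
        if not q:
            issues.append(f"row {i}: empty question")
        elif q.lower() in seen:
            issues.append(f"row {i}: duplicate question")
        else:
            seen.add(q.lower())
        # Treat None and whitespace-only strings as a missing expected answer.
        if not str(expected if expected is not None else "").strip():
            issues.append(f"row {i}: empty expected answer")
    return issues


# With the DataFrame defined above, the rows could be passed as:
#   find_eval_set_issues(df.itertuples(index=False, name=None))
sample = [
    ("Dime el resultado neto que se ha obtenido en diciembre de 2025", "141.492"),
    ("Muestrame el valor acumulado del EBITD para julio de 2024", "3.237.742"),
]
print(find_eval_set_issues(sample))
```

An empty list means no issues were found by these particular checks; it says nothing about whether the expected answers themselves are correct.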


### 3. Evaluate the Agent

The next step is to run the evaluation with the `evaluate_data_agent` function. It compares the agent's responses with the expected answers and stores the evaluation metrics in the output tables.

In [5]:
# Name of your Data Agent
data_agent_name = "FinancialAnalystAssistant_FDA"

# (Optional) Name of the workspace if the Data Agent is in a different workspace
#workspace_name = None

# (Optional) Name of the output table to store evaluation results (default: "evaluation_output")
# Two tables will be created:
# - "<table_name>": contains summary results (e.g., accuracy)
# - "<table_name>_steps": contains detailed reasoning and step-by-step execution
table_name = "demo_evaluation_output"

# Specify the Data Agent stage: "production" (default) or "sandbox"
data_agent_stage = "production"

# Run the evaluation and get the evaluation ID
evaluation_id = evaluate_data_agent(
    df,
    data_agent_name,
    #workspace_name=workspace_name,
    table_name=table_name,
    data_agent_stage=data_agent_stage
)

print(f"Unique ID for the current evaluation run: {evaluation_id}")

**🔗Failed Thread(s):**
- [thread_4qs87rNoJkii1hoo60tk5CnB](https://app.fabric.microsoft.com/workloads/de-ds/dataagents/80caf30a-9a48-4a48-81fd-f3762bba1365/externalThread?debug.aiSkillThreadIdOverride=thread_4qs87rNoJkii1hoo60tk5CnB&debug.aiSkillViewPublishedOverride=0)
- [thread_N8A2aOGrYiLJYDEgQHdZFRAc](https://app.fabric.microsoft.com/workloads/de-ds/dataagents/80caf30a-9a48-4a48-81fd-f3762bba1365/externalThread?debug.aiSkillThreadIdOverride=thread_N8A2aOGrYiLJYDEgQHdZFRAc&debug.aiSkillViewPublishedOverride=0)
- [thread_rP1RsbQODMIn24VAvzASyl7V](https://app.fabric.microsoft.com/workloads/de-ds/dataagents/80caf30a-9a48-4a48-81fd-f3762bba1365/externalThread?debug.aiSkillThreadIdOverride=thread_rP1RsbQODMIn24VAvzASyl7V&debug.aiSkillViewPublishedOverride=0)
- [thread_Em5D1EX201clPJ13r06tXdkp](https://app.fabric.microsoft.com/workloads/de-ds/dataagents/80caf30a-9a48-4a48-81fd-f3762bba1365/externalThread?debug.aiSkillThreadIdOverride=thread_Em5D1EX201clPJ13r06tXdkp&debug.aiSkillViewPublishedOverride=0)

100%|██████████| 4/4 [00:30<00:00,  7.60s/it]


Unique ID for the current evaluation run: c5d79162-6e3f-4e57-9dcc-6819fc5a4c87
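
The failed-thread links carry the thread ID in the `debug.aiSkillThreadIdOverride` query parameter, which is handy for matching a link to the `thread_url` values returned by `get_evaluation_details`. A minimal sketch with the standard library follows; `thread_id_from_url` is a hypothetical helper, and the parameter name is simply what appears in the links above, so it may change between SDK versions.

```python
from urllib.parse import parse_qs, urlparse


def thread_id_from_url(url):
    """Return the thread ID embedded in a failed-thread link, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("debug.aiSkillThreadIdOverride", [])
    return values[0] if values else None


# First link from the output above.
url = (
    "https://app.fabric.microsoft.com/workloads/de-ds/dataagents/"
    "80caf30a-9a48-4a48-81fd-f3762bba1365/externalThread"
    "?debug.aiSkillThreadIdOverride=thread_4qs87rNoJkii1hoo60tk5CnB"
    "&debug.aiSkillViewPublishedOverride=0"
)
print(thread_id_from_url(url))
```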


We can then retrieve a summary of the evaluation, which counts how many of the Agent's responses were judged true, false, or unclear.

In [9]:
# Retrieve a summary of the evaluation results
# Use a separate name so the evaluation set in `df` is not overwritten
summary_df = get_evaluation_summary(table_name)
summary_df.head()

| index | evaluation_id | T | F | ? | % |
|---|---|---|---|---|---|
| 0 | c5d79162-6e3f-4e57-9dcc-6819fc5a4c87 | 0 | 4 | 0 | 0.0 |
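
To make the summary columns concrete, here is a rough sketch of how such a row could be derived from per-question judgements. It is not the SDK's implementation: the function `summarize` is hypothetical, and it assumes that `%` is the share of true judgements over all questions, which is consistent with the row above but is not confirmed by the output itself.

```python
from collections import Counter


def summarize(judgements):
    """Count True / False / None (unclear) judgements and compute a percentage.

    Assumption: "%" = 100 * true / total. The SDK may define it differently.
    """
    counts = Counter(judgements)
    total = len(judgements)
    true_n = counts.get(True, 0)
    return {
        "T": true_n,
        "F": counts.get(False, 0),
        "?": counts.get(None, 0),
        "%": 100.0 * true_n / total if total else 0.0,
    }


# Four questions, all judged False, as in the run above.
print(summarize([False, False, False, False]))
```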


### 4. Inspect the results in detail

To delve deeper into how the Data Agent responded to each individual question, use the get_evaluation_details function. This function returns a detailed breakdown of the evaluation run, including the agent's actual responses, whether they match the expected answer, and a link to the evaluation thread.

In [12]:
from fabric.dataagent.evaluation import get_evaluation_details

# Table name used during evaluation
table_name = "demo_evaluation_output"

# Whether to return all evaluation rows (True) or only failures (False)
get_all_rows = False

# Whether to print a summary of the results
verbose = True

# Retrieve evaluation details for a specific run
eval_details = get_evaluation_details(
    evaluation_id,
    table_name,
    get_all_rows=get_all_rows,
    verbose=verbose
)

| index | question | expected_answer | actual_answer | evaluation_judgement | thread_url |
|---|---|---|---|---|---|
| 0 | Dime el resultado neto que se ha obtenido en diciembre de 2025 | 141.492 | No dispongo del dato del Resultado Neto para diciembre de 2025 con la información disponible actualmente. Si necesitas el resultado de otro mes o año, házmelo saber. | False | thread_4qs87rNoJkii1hoo60tk5CnB |
| 1 | Muestrame el valor acumulado del EBITD para julio de 2024 | 3.237.742 | El valor acumulado del EBITDA para julio de 2024 es de 1.526.084 €. | False | thread_rP1RsbQODMIn24VAvzASyl7V |
| 2 | ¿Cuál es el cliente con mayor volumen de ventas para la comunidad autónoma (CCAA) de Valencia en 2025? | Carpintería Martínez | No se han encontrado datos de ventas para la comunidad autónoma de Valencia en el año 2025, o bien aún no existen registros para ese periodo o región en el sistema. Si necesitas información de otro año o comunidad, por favor indícalo. | False | thread_Em5D1EX201clPJ13r06tXdkp |
| 3 | ¿Cuál es el tercer producto con mayor volumen de ventas para la comunidad autónoma (CCAA) de Galicia en 2025? | Serrucho manual | El tercer producto con mayor volumen de ventas en Galicia en 2025 es el Poste tratado 8x8. | False | thread_N8A2aOGrYiLJYDEgQHdZFRAc |
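
For numeric questions, a deterministic cross-check can complement the judgement column: parse the numbers out of the agent's answer and see whether the expected figure is among them. The sketch below is only an illustration. The helpers `numbers_in` and `mentions_expected` are hypothetical, and they assume integers written with `.` as the thousands separator, as in the expected answers above; decimals are not handled.

```python
import re

# Matches "1.526.084" style integers first, then plain digit runs such as "2024".
_NUMBER = re.compile(r"\d{1,3}(?:\.\d{3})+|\d+")


def numbers_in(text):
    """Return every integer found in `text`, reading '.' as a thousands separator."""
    return [int(m.replace(".", "")) for m in _NUMBER.findall(text)]


def mentions_expected(expected, actual):
    """True/False if the expected number appears in the answer; None if not numeric."""
    expected_numbers = numbers_in(expected)
    if not expected_numbers:
        return None
    return expected_numbers[0] in numbers_in(actual)


answer = "El valor acumulado del EBITDA para julio de 2024 es de 1.526.084 €."
print(numbers_in(answer))
print(mentions_expected("3.237.742", answer))
```

For the EBITDA row this confirms a real numeric mismatch (the agent returned a different figure), whereas the two "no data" rows point to missing data or filters rather than a wrong calculation.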
