# **Using Semantic Models in Fabric Data Agent**

This notebook contains utilities to help configure and evaluate the Data Agent for accuracy and performance. Refer to the blog xxx and checklist yyyy for more details.

**Prerequisites**:
- A paid F2 or higher Fabric capacity, or a Power BI Premium per capacity (P1 or higher) capacity with Microsoft Fabric enabled.
- The Fabric data agent tenant setting is enabled.
- Cross-geo processing for AI is enabled.
- Cross-geo storing for AI is enabled.
- At least one of these, with data: a warehouse, a lakehouse, one or more Power BI semantic models, a KQL database, or an ontology.
- The "Power BI semantic models via XMLA endpoints" tenant switch is enabled for Power BI semantic model data sources.

This notebook runs only as a Fabric notebook.


In [1]:
%pip install semantic-link-labs fabric-data-agent-sdk -q

Reason for being yanked: Yanked due to conflicts with CVE-2024-35195 mitigation
Note: you may need to restart the kernel to use updated packages.


In [2]:
import sempy.fabric as fabric

dataset = "agentic_semantic_model"
workspace = "Dynamic_UI_Agentic_App"
workspace_id = fabric.resolve_workspace_id(workspace)
data_agent_name = "Transactions Data Agent"
data_agent_id = "0b3af9aa-bdfd-4211-bac9-4c9ab96fa174"

## Run Model BPA

**Documentation** : https://learn.microsoft.com/en-us/power-bi/transform-model/service-notebooks#best-practice-analyzer

The Best Practice Analyzer (BPA) offers tips to improve the design and performance of your semantic model. By default, the BPA checks a set of more than 60 rules against your semantic model and summarizes the results. The rules used in the BPA come from experts within Microsoft and the Fabric Community. The suggestions for improvement are organized into five categories: Performance, DAX Expressions, Error Prevention, Maintenance, and Formatting.

For Data Agent performance and accuracy, prioritize the Performance, DAX Expressions, and Error Prevention suggestions.
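One way to act on this prioritization is to filter the BPA findings by category before reviewing them. The sketch below is a minimal illustration, not the BPA's own API: it assumes the findings are available as rows with a `Category` key (an assumption, since the output in this notebook is grouped per category but does not print the category name). The rule and object names are copied from the BPA output in this notebook; the category labels attached to them are assumed for illustration.

```python
# Categories this notebook recommends prioritizing for Data Agent work.
PRIORITY_CATEGORIES = {"Performance", "DAX Expressions", "Error Prevention"}

# Sample findings. Rule and object names come from the BPA output in this
# notebook; the "Category" values are assumptions made for this sketch.
findings = [
    {"Category": "Performance",
     "Rule Name": "Do not use floating point data types",
     "Object Name": "'UserAsks'[response_time_seconds]"},
    {"Category": "DAX Expressions",
     "Rule Name": "No two measures should have the same definition",
     "Object Name": "Deposits Amount"},
    {"Category": "Formatting",
     "Rule Name": "Do not summarize numeric columns",
     "Object Name": "'transactions'[amount]"},
    {"Category": "Maintenance",
     "Rule Name": "Visible objects with no description",
     "Object Name": "'ContentIssues'[agent_id]"},
]


def prioritize(rows, categories=PRIORITY_CATEGORIES):
    """Return only the findings whose category is in the priority set."""
    return [row for row in rows if row["Category"] in categories]


priority_findings = prioritize(findings)
for finding in priority_findings:
    print(f"{finding['Category']}: {finding['Rule Name']} -> {finding['Object Name']}")
```

Findings outside the priority set (for example missing descriptions) still matter for the Data Agent and are covered by the description checks later in this notebook.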

In [3]:
fabric.run_model_bpa(dataset=dataset, workspace=workspace)

Rule Name,Object Type,Object Name,Severity
"Inactive relationships that are never activated: Inactive relationships are activated using the USERELATIONSHIP function. If an inactive relationship is not referenced in any measure via this function, the relationship will not be used. It should be determined whether the relationship is not necessary or to activate the relationship via this method.",Relationship,'chat_history'[user_id] -> 'users'[id],⚠️
"Inactive relationships that are never activated: Inactive relationships are activated using the USERELATIONSHIP function. If an inactive relationship is not referenced in any measure via this function, the relationship will not be used. It should be determined whether the relationship is not necessary or to activate the relationship via this method.",Relationship,'session_duration_date'[user_id] -> 'users'[id],⚠️
"Inactive relationships that are never activated: Inactive relationships are activated using the USERELATIONSHIP function. If an inactive relationship is not referenced in any measure via this function, the relationship will not be used. It should be determined whether the relationship is not necessary or to activate the relationship via this method.",Relationship,'transactions'[to_account_id] -> 'accounts'[id],⚠️
"Measures should not be direct references of other measures: This rule identifies measures which are simply a reference to another measure. As an example, consider a model with two measures: [MeasureA] and [MeasureB]. This rule would be triggered for MeasureB if MeasureB's DAX was MeasureB:=[MeasureA]. Such duplicative measures should be removed.",Measure,Net Cash Flow per User,⚠️
"Measures should not be direct references of other measures: This rule identifies measures which are simply a reference to another measure. As an example, consider a model with two measures: [MeasureA] and [MeasureB]. This rule would be triggered for MeasureB if MeasureB's DAX was MeasureB:=[MeasureA]. Such duplicative measures should be removed.",Measure,Total Incoming per User,⚠️
"Measures should not be direct references of other measures: This rule identifies measures which are simply a reference to another measure. As an example, consider a model with two measures: [MeasureA] and [MeasureB]. This rule would be triggered for MeasureB if MeasureB's DAX was MeasureB:=[MeasureA]. Such duplicative measures should be removed.",Measure,Total Outgoing per User,⚠️
No two measures should have the same definition: Two measures with different names and defined by the same DAX expression should be avoided to reduce redundancy.,Measure,Deposits Amount,⚠️
No two measures should have the same definition: Two measures with different names and defined by the same DAX expression should be avoided to reduce redundancy.,Measure,Income by Category,⚠️
No two measures should have the same definition: Two measures with different names and defined by the same DAX expression should be avoided to reduce redundancy.,Measure,Outgoing Payments Amount,⚠️
No two measures should have the same definition: Two measures with different names and defined by the same DAX expression should be avoided to reduce redundancy.,Measure,Spending by Category,⚠️

Rule Name,Object Type,Object Name,Severity
"Do not summarize numeric columns: Numeric columns (integer, decimal, double) should have their SummarizeBy property set to ""None"" to avoid accidental summation in Power BI (create measures instead).",Column,'UserAsks'[response_time_seconds],⚠️
"Do not summarize numeric columns: Numeric columns (integer, decimal, double) should have their SummarizeBy property set to ""None"" to avoid accidental summation in Power BI (create measures instead).",Column,'accounts'[balance],⚠️
"Do not summarize numeric columns: Numeric columns (integer, decimal, double) should have their SummarizeBy property set to ""None"" to avoid accidental summation in Power BI (create measures instead).",Column,'chat_history'[completion_tokens],⚠️
"Do not summarize numeric columns: Numeric columns (integer, decimal, double) should have their SummarizeBy property set to ""None"" to avoid accidental summation in Power BI (create measures instead).",Column,'chat_history'[prompt_tokens],⚠️
"Do not summarize numeric columns: Numeric columns (integer, decimal, double) should have their SummarizeBy property set to ""None"" to avoid accidental summation in Power BI (create measures instead).",Column,'chat_history'[response_time_ms],⚠️
"Do not summarize numeric columns: Numeric columns (integer, decimal, double) should have their SummarizeBy property set to ""None"" to avoid accidental summation in Power BI (create measures instead).",Column,'chat_history'[total_tokens],⚠️
"Do not summarize numeric columns: Numeric columns (integer, decimal, double) should have their SummarizeBy property set to ""None"" to avoid accidental summation in Power BI (create measures instead).",Column,'tool_usage'[tokens_used],⚠️
"Do not summarize numeric columns: Numeric columns (integer, decimal, double) should have their SummarizeBy property set to ""None"" to avoid accidental summation in Power BI (create measures instead).",Column,'transactions'[amount],⚠️
First letter of objects must be capitalized: The first letter of object names should be capitalized to maintain professional quality.,Column,'ContentIssues'[agent_id],ℹ️
First letter of objects must be capitalized: The first letter of object names should be capitalized to maintain professional quality.,Column,'ContentIssues'[content],ℹ️

Rule Name,Object Type,Object Name,Severity
"Visible objects with no description: Add descriptions to objects. These descriptions are shown on hover within the Field List in Power BI Desktop. Additionally, you can leverage these descriptions to create an automated data dictionary.",Column,'ContentIssues'[agent_id],ℹ️
"Visible objects with no description: Add descriptions to objects. These descriptions are shown on hover within the Field List in Power BI Desktop. Additionally, you can leverage these descriptions to create an automated data dictionary.",Column,'ContentIssues'[content],ℹ️
"Visible objects with no description: Add descriptions to objects. These descriptions are shown on hover within the Field List in Power BI Desktop. Additionally, you can leverage these descriptions to create an automated data dictionary.",Column,'ContentIssues'[content_filter_results],ℹ️
"Visible objects with no description: Add descriptions to objects. These descriptions are shown on hover within the Field List in Power BI Desktop. Additionally, you can leverage these descriptions to create an automated data dictionary.",Column,'ContentIssues'[hate_filtered],ℹ️
"Visible objects with no description: Add descriptions to objects. These descriptions are shown on hover within the Field List in Power BI Desktop. Additionally, you can leverage these descriptions to create an automated data dictionary.",Column,'ContentIssues'[hate_severity],ℹ️
"Visible objects with no description: Add descriptions to objects. These descriptions are shown on hover within the Field List in Power BI Desktop. Additionally, you can leverage these descriptions to create an automated data dictionary.",Column,'ContentIssues'[jailbreak_detected],ℹ️
"Visible objects with no description: Add descriptions to objects. These descriptions are shown on hover within the Field List in Power BI Desktop. Additionally, you can leverage these descriptions to create an automated data dictionary.",Column,'ContentIssues'[jailbreak_filtered],ℹ️
"Visible objects with no description: Add descriptions to objects. These descriptions are shown on hover within the Field List in Power BI Desktop. Additionally, you can leverage these descriptions to create an automated data dictionary.",Column,'ContentIssues'[message_type],ℹ️
"Visible objects with no description: Add descriptions to objects. These descriptions are shown on hover within the Field List in Power BI Desktop. Additionally, you can leverage these descriptions to create an automated data dictionary.",Column,'ContentIssues'[self_harm_filtered],ℹ️
"Visible objects with no description: Add descriptions to objects. These descriptions are shown on hover within the Field List in Power BI Desktop. Additionally, you can leverage these descriptions to create an automated data dictionary.",Column,'ContentIssues'[self_harm_severity],ℹ️

Rule Name,Object Type,Object Name,Severity
"Avoid using views when using Direct Lake mode: In Direct Lake mode, views will always fall back to DirectQuery. Thus, in order to obtain the best performance use lakehouse tables instead of views.",Model,Model,⚠️
Check if bi-directional and many-to-many relationships are valid: Bi-directional and many-to-many relationships may cause performance degradation or even have unintended consequences. Make sure to check these specific relationships to ensure they are working as designed and are actually necessary.,Relationship,'chat_history'[tool_call_id] -> 'tool_usage'[tool_call_id],⚠️
"Consider a star-schema instead of a snowflake architecture: Generally speaking, a star-schema is the optimal architecture for tabular models. That being the case, there are valid cases to use a snowflake approach. Please check your model and consider moving to a star-schema architecture.",Table,UserAsks,⚠️
"Consider a star-schema instead of a snowflake architecture: Generally speaking, a star-schema is the optimal architecture for tabular models. That being the case, there are valid cases to use a snowflake approach. Please check your model and consider moving to a star-schema architecture.",Table,accounts,⚠️
"Consider a star-schema instead of a snowflake architecture: Generally speaking, a star-schema is the optimal architecture for tabular models. That being the case, there are valid cases to use a snowflake approach. Please check your model and consider moving to a star-schema architecture.",Table,session_duration_date,⚠️
Date/calendar tables should be marked as a date table: This rule looks for tables that contain the words 'date' or 'calendar' as they should likely be marked as a date table.,Table,session_duration_date,⚠️
"Do not use floating point data types: The ""Double"" floating point data type should be avoided, as it can result in unpredictable roundoff errors and decreased performance in certain scenarios. Use ""Int64"" or ""Decimal"" where appropriate (but note that ""Decimal"" is limited to 4 digits after the decimal sign).",Column,'UserAsks'[response_time_seconds],⚠️
Many-to-many relationships should be single-direction: Many-to-many relationships should be single-direction,Relationship,'chat_history'[tool_call_id] -> 'tool_usage'[tool_call_id],⚠️
"Model should have a date table: Generally speaking, models should generally have a date table. Models that do not have a date table generally are not taking advantage of features such as time intelligence or may not have a properly structured architecture.",Model,Model,⚠️


## Model memory analyzer

**Documentation**: https://learn.microsoft.com/en-us/power-bi/transform-model/service-notebooks#model-memory-analyzer

The Memory Analyzer shows memory and storage statistics about the objects in your semantic model, such as Tables, Columns, Hierarchies, Partitions, and Relationships. Use these statistics to identify opportunities for performance optimization and memory reduction in your semantic model.

- Remove any tables and columns that are not part of the Data Agent scope.
- The size of tables and columns affects DAX query performance and therefore adds to the Data Agent response time.



In [4]:
fabric.model_memory_analyzer(dataset=dataset, workspace=workspace)



Unnamed: 0,Dataset Name,Total Size,Table Count,Column Count,Compatibility Level,Default Mode
0,agentic_semantic_model,152.14 KB,12,108,1604,Import

Unnamed: 0,Table Name,Type,Row Count,Total Size,Dictionary Size,Data Size,Hierarchy Size,Relationship Size,User Hierarchy Size,Partitions,Columns,% DB
0,transactions,Table,57,108940,107588,1352,0,0,0,1,8,68.91%
1,users,Table,7,18098,17426,672,0,0,0,1,4,11.45%
2,ContentIssues,Table,0,12712,10264,2448,0,0,0,1,17,8.04%
3,UserAsks,Table,0,4216,3264,952,0,0,0,1,6,2.67%
4,chat_history,Table,228,2992,144,2848,0,0,0,1,20,1.89%
5,session_duration_date,Table,0,2816,2136,680,0,0,0,1,4,1.78%
6,ai_widgets,Table,12,2040,144,1896,0,0,0,1,13,1.29%
7,tool_usage,Table,53,1632,144,1488,0,0,0,1,10,1.03%
8,tool_definitions,Table,6,1496,144,1352,0,0,0,1,9,0.95%
9,accounts,Table,16,1248,144,1104,0,0,0,1,7,0.79%

Unnamed: 0,Table Name,Partition Name,Mode,Record Count,Segment Count,Records per Segment
0,chat_history,chat_history,DirectLake,228,1,228.0
1,transactions,transactions,DirectLake,57,1,57.0
2,tool_usage,tool_usage,DirectLake,53,1,53.0
3,chat_sessions,chat_sessions,DirectLake,24,1,24.0
4,accounts,accounts,DirectLake,16,1,16.0
5,ai_widgets,ai_widgets,DirectLake,12,1,12.0
6,users,users,DirectLake,7,1,7.0
7,tool_definitions,tool_definitions,DirectLake,6,1,6.0
8,agent_definitions,agent_definitions,DirectLake,1,1,1.0
9,ContentIssues,ContentIssues,DirectLake,0,1,0.0

Unnamed: 0,Table Name,Column Name,Type,Data Type,Cardinality,Total Size,Data Size,Dictionary Size,Hierarchy Size,% Table,% DB,Encoding,Is Resident,Temperature,Last Accessed
0,transactions,id,Data,String,1,20524,176,20348,0,18.86%,13.17%,Hash,True,0.0518279900807387,2026-01-05 17:26:01.033333
1,transactions,from_account_id,Data,String,1,17846,168,17678,0,16.40%,11.46%,Hash,True,0.0518117947264693,2026-01-05 17:26:00.690000
2,transactions,to_account_id,Data,String,1,17818,160,17658,0,16.38%,11.44%,Hash,True,0.051808857465878,2026-01-05 17:26:00.626667
3,transactions,category,Data,String,1,17682,168,17514,0,16.25%,11.35%,Hash,True,0.0518147589344055,2026-01-05 17:26:00.753333
4,users,name,Data,String,1,17418,136,17282,0,97.01%,11.18%,Hash,True,0.0518007463150709,2026-01-05 17:26:00.456667
5,transactions,type,Data,String,1,17290,144,17146,0,15.89%,11.10%,Hash,True,0.0517948717938883,2026-01-05 17:26:00.330000
6,transactions,status,Data,String,1,17236,136,17100,0,15.84%,11.06%,Hash,True,0.051791934533297,2026-01-05 17:26:00.266667
7,ContentIssues,jailbreak_detected,Data,String,1,720,136,584,0,5.82%,0.46%,Hash,False,,NaT
8,session_duration_date,session_id,Data,String,1,720,136,584,0,29.03%,0.46%,Hash,False,,NaT
9,ContentIssues,trace_id,Data,String,1,720,136,584,0,5.82%,0.46%,Hash,False,,NaT

Unnamed: 0,Table Name,Column Name,Type,Data Type,Cardinality,Total Size,Data Size,Dictionary Size,Hierarchy Size,% Table,% DB,Encoding,Is Resident,Temperature,Last Accessed
0,transactions,id,Data,String,1,20524,176,20348,0,18.86%,13.17%,Hash,True,0.0518279900807387,2026-01-05 17:26:01.033333
1,transactions,category,Data,String,1,17682,168,17514,0,16.25%,11.35%,Hash,True,0.0518147589344055,2026-01-05 17:26:00.753333
2,transactions,from_account_id,Data,String,1,17846,168,17678,0,16.40%,11.46%,Hash,True,0.0518117947264693,2026-01-05 17:26:00.690000
3,transactions,to_account_id,Data,String,1,17818,160,17658,0,16.38%,11.44%,Hash,True,0.051808857465878,2026-01-05 17:26:00.626667
4,users,name,Data,String,1,17418,136,17282,0,97.01%,11.18%,Hash,True,0.0518007463150709,2026-01-05 17:26:00.456667
5,transactions,type,Data,String,1,17290,144,17146,0,15.89%,11.10%,Hash,True,0.0517948717938883,2026-01-05 17:26:00.330000
6,transactions,status,Data,String,1,17236,136,17100,0,15.84%,11.06%,Hash,True,0.051791934533297,2026-01-05 17:26:00.266667
7,ContentIssues,session_id,Data,String,1,720,136,584,0,5.82%,0.46%,Hash,False,,NaT
8,ContentIssues,trace_id,Data,String,1,720,136,584,0,5.82%,0.46%,Hash,False,,NaT
9,ContentIssues,user_id,Data,String,1,720,136,584,0,5.82%,0.46%,Hash,False,,NaT

Unnamed: 0,From Object,To Object,Multiplicity,Max From Cardinality,Max To Cardinality,Used Size,Missing Rows
0,'chat_history'[session_id],'chat_sessions'[session_id],m:1,1,1,0,0
1,'chat_history'[agent_id],'agent_definitions'[agent_id],m:1,1,1,0,0
2,'chat_history'[tool_id],'tool_definitions'[tool_id],m:1,1,1,0,0
3,'chat_history'[trace_id],'UserAsks'[trace_id],m:1,1,1,0,0
4,'chat_history'[trace_id],'ContentIssues'[trace_id],m:1,1,1,0,0
5,'UserAsks'[session_id],'session_duration_date'[session_id],m:1,1,1,0,0
6,'chat_history'[tool_call_id],'tool_usage'[tool_call_id],m:m,1,1,0,0
7,'UserAsks'[user_id],'users'[id],m:1,1,1,0,0
8,'session_duration_date'[user_id],'users'[id],m:1,1,1,0,0
9,'ai_widgets'[user_id],'users'[id],m:1,1,1,0,0

Unnamed: 0,Table Name,Hierarchy Name,Used Size
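To make the first recommendation concrete, the sketch below lists tables that fall outside the Data Agent scope together with their size. It is an illustration only: the sizes are copied from the `Total Size` column of the ten tables listed in the Memory Analyzer output above, and the scope (`accounts`, `transactions`, `users`) is assumed to match the tables selected in the AI schema later in this notebook. The helper name `out_of_scope_tables` is hypothetical.

```python
# Total Size per table, copied from the Memory Analyzer output above
# (only the ten tables listed there).
table_sizes = {
    "transactions": 108940,
    "users": 18098,
    "ContentIssues": 12712,
    "UserAsks": 4216,
    "chat_history": 2992,
    "session_duration_date": 2816,
    "ai_widgets": 2040,
    "tool_usage": 1632,
    "tool_definitions": 1496,
    "accounts": 1248,
}

# Assumption: the Data Agent scope is the set of tables selected in the AI schema.
in_scope = {"accounts", "transactions", "users"}


def out_of_scope_tables(sizes, scope):
    """Return (table, size) pairs for tables outside the scope, largest first."""
    rows = [(name, size) for name, size in sizes.items() if name not in scope]
    return sorted(rows, key=lambda row: row[1], reverse=True)


candidates = out_of_scope_tables(table_sizes, in_scope)
out_of_scope_size = sum(size for _, size in candidates)
total_size = sum(table_sizes.values())

for name, size in candidates:
    print(f"{name}: {size}")
print(f"Out-of-scope share of the listed tables: {out_of_scope_size / total_size:.1%}")
```

In this small model the savings are negligible; the same check becomes more useful on production-sized models, where unused tables can dominate memory.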


In [17]:
df_columns = fabric.list_columns(dataset=dataset, workspace=workspace)[['Table Name', 'Column Name', 'Description']]
df_tables = fabric.list_tables(dataset=dataset, workspace=workspace)[['Name', 'Description']]
df_measures = fabric.list_measures(dataset=dataset, workspace=workspace)[['Measure Name','Measure Expression','Measure Description']]


Unnamed: 0,Measure Name,Measure Expression,Measure Description
0,trace_with_tool,CALCULATE(DISTINCTCOUNT(chat_history[trace_id]...,
1,high_value_tasks,DISTINCTCOUNT(tool_usage[trace_id])/DISTINCTCO...,
2,Total Transaction Amount,SUM ( 'transactions'[amount] ),
3,Total Outgoing Amount,"CALCULATE ( SUM ( 'transactions'[amount] ), FI...",
4,Total Incoming Amount,"CALCULATE ( SUM ( 'transactions'[amount] ), US...",
5,Net Cash Flow,[Total Incoming Amount] - [Total Outgoing Amount],
6,Number of Transactions,COUNTROWS ( 'transactions' ),
7,Average Transaction Amount,"DIVIDE ( [Total Transaction Amount], [Number o...",
8,Completed Transaction Amount,"CALCULATE ( [Total Transaction Amount], 'trans...",
9,Pending Transaction Amount,"CALCULATE ( [Total Transaction Amount], 'trans...",


## Table Description

Table descriptions give the AI context about the purpose of each table and how to use it when generating DAX queries. Tables selected in Prep for AI > AI Data Schema should have clear and concise descriptions.


The cell below lists the tables without a description:


In [27]:
display(df_tables[df_tables['Description'].isna() | (df_tables['Description'] == '')])

## Column Names and Descriptions

Column names and descriptions help the DAX generation tool understand the purpose and contents of each column, especially when names are ambiguous. Clear names and descriptions reduce misinterpretation and improve response accuracy.

The cells below list columns without a description and column names that appear in more than one table. If these columns are part of the Data Agent schema, consider renaming them or adding descriptions to clarify their purpose. Also look for columns with similar meanings: for example, `region` and `geography` may seem related, but without descriptions clarifying the difference the AI may misinterpret them.



#### Columns without description

In [26]:
display(df_columns[df_columns['Description'].isna() | (df_columns['Description'] == '')])
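The filter above catches missing (`NaN`) and empty descriptions, but a description that contains only whitespace would pass it. The following is a minimal sketch of a stricter check; the helper name `is_blank` and the sample rows are hypothetical and only illustrate the idea.

```python
import math


def is_blank(description):
    """Return True when a description is None, NaN, or only whitespace."""
    if description is None:
        return True
    if isinstance(description, float) and math.isnan(description):
        return True
    return str(description).strip() == ""


# Hypothetical (table, column, description) rows for illustration.
columns = [
    ("transactions", "amount", "Transaction value in the account currency"),
    ("transactions", "status", "   "),
    ("accounts", "balance", None),
    ("users", "name", float("nan")),
]

missing = [(table, column) for table, column, desc in columns if is_blank(desc)]
print(missing)
```

With a pandas dataframe the same helper can be applied as a mask, for example `df_columns[df_columns['Description'].apply(is_blank)]`, and the same approach works for the table and measure description checks.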

#### Duplicate column names

In [37]:
# Either rename columns to distinguish them, or add descriptions if you cannot rename them.
# Id/key columns can be left as is. Focus on columns that are used in metric calculations.

counts = df_columns['Column Name'].value_counts()
counts[counts > 1]

Column Name
user_id                   7
session_id                6
created_at                6
trace_id                  4
id                        4
name                      4
updated_at                3
description               3
content                   3
agent_id                  3
tool_id                   3
tool_name                 2
tool_call_id              2
status                    2
message_type              2
content_filter_results    2
title                     2
tool_input                2
date                      2
tool_output               2
Name: count, dtype: int64
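The counts alone do not show which tables share a name, and most of the top entries are id/key columns that can be left as is. The sketch below groups duplicated names by table and skips key-like columns. The `(table, column)` pairs are a small subset taken from the data source listing in this notebook; the rule that treats `id` and `*_id` as key columns is an assumed heuristic, and both helper names are hypothetical.

```python
from collections import defaultdict

# (table, column) pairs: a subset taken from the data source listing in this notebook.
pairs = [
    ("accounts", "created_at"), ("accounts", "id"),
    ("accounts", "name"), ("accounts", "user_id"),
    ("chat_sessions", "created_at"), ("chat_sessions", "session_id"),
    ("chat_sessions", "title"), ("chat_sessions", "user_id"),
    ("ContentIssues", "content"), ("ContentIssues", "message_type"),
    ("ContentIssues", "session_id"),
    ("chat_history", "content"), ("chat_history", "message_type"),
    ("chat_history", "session_id"), ("chat_history", "user_id"),
]


def is_key_column(name):
    """Assumed heuristic: treat 'id' and '*_id' columns as keys."""
    return name == "id" or name.endswith("_id")


def duplicate_columns(table_column_pairs, skip=is_key_column):
    """Map each non-key column name used in more than one table to those tables."""
    tables_by_column = defaultdict(list)
    for table, column in table_column_pairs:
        if not skip(column):
            tables_by_column[column].append(table)
    return {col: tables for col, tables in tables_by_column.items() if len(tables) > 1}


dupes = duplicate_columns(pairs)
for column, tables in sorted(dupes.items()):
    print(f"{column}: {', '.join(tables)}")
```

In the notebook, the pairs could be built from `df_columns[['Table Name', 'Column Name']]`, for instance with `itertuples(index=False)`.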

## Measures Description

The measures listed below do not have a description. Measures that are included in the AI Schema should have a description to improve response accuracy.

In [39]:
display(df_measures[df_measures['Measure Description'].isna() | (df_measures['Measure Description'] == '')])

## Using Data Agent Python SDK

The Fabric Data Agent Python SDK provides programmatic access to Fabric Data Agent artifacts. It is designed for code-first users and simplifies the creation, management, and use of Fabric data agents within Microsoft Fabric notebooks. It offers a set of straightforward APIs to integrate and manage data sources, automate workflow operations, and interact with the Fabric Data Agent through a client based on the OpenAI Assistants API.

**Note that for semantic model data sources, you cannot use the SDK to update the `Prep for AI` configuration.**

- Documentation: https://learn.microsoft.com/en-us/fabric/data-science/fabric-data-agent-sdk
- Notebooks: https://github.com/microsoft/fabric-samples/tree/main/docs-samples/data-science/data-agent-sdk



In [3]:
from fabric.dataagent.client import FabricDataAgentManagement

data_agent = FabricDataAgentManagement(data_agent_name)

##### Get existing configuration

In [4]:
data_agent.get_configuration()

DataAgentConfiguration(instructions='\r\n\r\nAlways breakdown the question into segments to think step by step. Return the reasoning logic in the answer.\r\n\r\nToday is December 31, 2025\r\n')

##### Get list of data sources

The cell below lists the data sources that are available and selected. Use `data_agent.get_datasources()` to get all data sources added to the agent.

In the example below, the first data source is inspected using `[0]`. The output shows all tables; tables marked with `*` are the selected tables.

In [5]:
datasource = data_agent.get_datasources()[0]
datasource.pretty_print()

 accounts *
  | account_number
  | account_type
  | Accounts per User
  | Average Account Balance
  | balance
  | created_at
  | Credit Utilization
  | id
  | name
  | Number of Accounts
  | Total Account Balance
  | Total Balance - Checking
  | Total Balance - Credit
  | Total Balance - Savings
  | user_id
 chat_history
  | agent_id
  | completion_tokens
  | content
  | content_filter_results
  | finish_reason
  | message_id
  | message_type
  | model_name
  | prompt_tokens
  | response_time_ms
  | session_id
  | tool_call_id
  | tool_id
  | tool_input
  | tool_name
  | tool_output
  | total_tokens
  | trace_end
  | trace_id
  | trace_with_tool
  | user_id
 chat_sessions
  | created_at
  | high_value_tasks
  | session_id
  | title
  | updated_at
  | user_id
 ContentIssues
  | agent_id
  | content
  | content_filter_results
  | hate_filtered
  | hate_severity
  | jailbreak_detected
  | jailbreak_filtered
  | message_type
  | self_harm_filtered
  | self_harm_severity
  | session_id
  | 

'agentic_semantic_model'

##### Tables part of AI Schema

Below you can see the list of tables that are selected and will be used in the Data Agent.

In [6]:
datasource = data_agent.get_datasources()[0]
config = datasource.get_configuration()
selected_tables = [t for t in config['elements'] if t['is_selected']]

print('> Selected tables')
for table in selected_tables:
    print("-- " + table['display_name'])

> Selected tables
-- accounts
-- transactions
-- users


##### List all columns and measures selected

In [7]:
import pandas as pd
config = datasource.get_configuration()

rows = []
for table in config['elements']:
    if table['is_selected']:
        for child in table['children']:
            rows.append({
                'Table': table['display_name'],
                'Column/Measure': child['display_name'],
                'Type': child['type'].split('.')[-1],
                'Data Type': child.get('data_type'),
                'Description': child.get('description')
            })

df_selected = pd.DataFrame(rows)
df_selected

Unnamed: 0,Table,Column/Measure,Type,Data Type,Description
0,accounts,account_number,column,String,
1,accounts,account_type,column,String,
2,accounts,Accounts per User,measure,Double,
3,accounts,Average Account Balance,measure,Decimal,
4,accounts,balance,column,Decimal,
5,accounts,created_at,column,DateTime,
6,accounts,Credit Utilization,measure,Decimal,
7,accounts,id,column,String,
8,accounts,name,column,String,
9,accounts,Number of Accounts,measure,Int64,
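Every `Description` value in the output above is empty, so it can help to track how many selected columns and measures are described per table. The sketch below walks a configuration dictionary with the same keys used in the cell above (`elements`, `is_selected`, `children`, `display_name`, `description`). The sample configuration, including its `type` strings, is hypothetical, and the helper name `description_coverage` is not part of the SDK.

```python
def description_coverage(config):
    """Per selected table, return (described_children, total_children)."""
    coverage = {}
    for table in config["elements"]:
        if not table.get("is_selected"):
            continue
        children = table.get("children", [])
        # A description counts only if it is non-empty after trimming whitespace.
        described = sum(1 for c in children if (c.get("description") or "").strip())
        coverage[table["display_name"]] = (described, len(children))
    return coverage


# Hypothetical configuration mirroring the keys used in the cell above.
sample_config = {
    "elements": [
        {"display_name": "accounts", "is_selected": True, "children": [
            {"display_name": "balance", "type": "x.column",
             "description": "Current account balance"},
            {"display_name": "account_type", "type": "x.column", "description": ""},
            {"display_name": "Credit Utilization", "type": "x.measure",
             "description": None},
        ]},
        {"display_name": "chat_history", "is_selected": False, "children": [
            {"display_name": "content", "type": "x.column", "description": None},
        ]},
    ]
}

coverage = description_coverage(sample_config)
for table, (described, total) in coverage.items():
    print(f"{table}: {described}/{total} described")
```

Passing the result of `datasource.get_configuration()` instead of `sample_config` should give the coverage for the real data source, assuming the configuration has the shape used in the cell above.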


In [4]:
import time
import uuid
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

from fabric.dataagent.client import FabricOpenAI


def _as_thread_id(x):
    """Normalize whatever the SDK returns into a plain thread_id string."""
    if isinstance(x, str):
        return x
    if hasattr(x, "id"):
        return x.id
    if isinstance(x, dict) and "id" in x:
        return x["id"]
    if isinstance(x, (tuple, list)) and len(x) > 0:
        return _as_thread_id(x[0])
    raise TypeError(f"Unknown thread return type: {type(x)} value={x!r}")


def ask_dataagent(
    fabric_client: FabricOpenAI,
    assistant_id: str,
    question: str,
    *,
    thread_id: str | None = None,
    thread_tag: str | None = None,
    new_thread: bool = False,
    poll_interval_s: float = 2.0,
    poll_timeout_s: float = 300.0,
    steps_retry: int = 10,
    steps_retry_sleep_s: float = 1.0,
):
    """
    Returns: (answer_text, run_steps, messages, thread_id)

    Thread behavior:
      - If thread_id provided: use it
      - Else if new_thread=True: create new thread via unique tag
      - Else: use get_or_create_thread(tag=thread_tag or "default")
    """

    # thread
    if thread_id:
        tid = thread_id
    else:
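        if new_thread:
            # Always generate a unique tag so a fresh thread is created,
            # even when a thread_tag is supplied (it is used as a prefix).
            tag = f"{thread_tag or 'thread'}_{uuid.uuid4().hex}"
        else:
            tag = thread_tag or "default"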

        t = fabric_client.get_or_create_thread(tag=tag)
        tid = _as_thread_id(t)

    # message
    fabric_client.beta.threads.messages.create(
        thread_id=tid,
        role="user",
        content=question,
    )

    # start run
    run = fabric_client.beta.threads.runs.create(
        thread_id=tid,
        assistant_id=assistant_id,
    )

    # poll run
    terminal = {"completed", "failed", "cancelled", "expired", "incomplete"}
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    while run.status not in terminal:
        if time.monotonic() - start > poll_timeout_s:
            raise TimeoutError(f"Run timed out after {poll_timeout_s}s (last status={run.status})")
        time.sleep(poll_interval_s)
        run = fabric_client.beta.threads.runs.retrieve(thread_id=tid, run_id=run.id)

    if run.status != "completed":
        last_error = getattr(run, "last_error", None)
        incomplete_details = getattr(run, "incomplete_details", None)
        raise RuntimeError(f"Run ended with status={run.status} last_error={last_error} incomplete_details={incomplete_details}")

    # get messages
    messages_obj = fabric_client.beta.threads.messages.list(thread_id=tid, order="asc")
    messages = getattr(messages_obj, "data", messages_obj)

    # extract answer text
    answer_text = None
    for m in reversed(messages):
        if getattr(m, "role", None) == "assistant" and getattr(m, "content", None):
            parts = []
            for block in m.content:
                if getattr(block, "text", None) and getattr(block.text, "value", None):
                    parts.append(block.text.value)
            if parts:
                answer_text = "\n".join(parts)
                break

    # run steps
    run_steps_obj = None
    steps_data = None
    for _ in range(steps_retry):
        run_steps_obj = fabric_client.beta.threads.runs.steps.list(thread_id=tid, run_id=run.id)
        steps_data = getattr(run_steps_obj, "data", None)
        if steps_data:
            break
        time.sleep(steps_retry_sleep_s)

    return answer_text, run_steps_obj, messages, tid



fabric_client = FabricOpenAI(artifact_name=data_agent_name)
assistant = fabric_client.beta.assistants.create(model="gpt-4o")

answer, run_steps, messages, tid = ask_dataagent(
    fabric_client,
    assistant_id=assistant.id,
    question="Which accounts have the highest credit utilization, and what is the total outstanding credit?",
    thread_tag="store_042",
    new_thread=False,
)


print("\nANSWER:\n", answer)

steps = getattr(run_steps, "data", []) or []



ANSWER:
 The accounts with the highest credit utilization are:

- Student Credit Card: 1200
- Business Credit Card: 750
- Platinum Credit Card: 490

The total outstanding credit across all credit accounts is 2,440.

If you need information about more accounts or further breakdowns, let me know!


<u>**Note**</u>: The `semantic-link` library can execute DAX queries in a notebook, which is helpful for programmatic evaluation and execution. See the example below:



In [8]:
fabric.evaluate_dax(dataset=dataset, workspace=workspace, dax_string="EVALUATE TOPN(1, transactions)")

Unnamed: 0,transactions[id],transactions[from_account_id],transactions[to_account_id],transactions[amount],transactions[type],transactions[category],transactions[status],transactions[created_at]
0,txn_fcfa2c51-91d1-4159-a0ae-f8e75b3d3ca0,acc_862a10b2-fb81-4022-b141-1da4bd0061e9,,1000.0,payment,Groceries,completed,2025-12-04 19:04:56.233333
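Ground-truth values obtained with a DAX query can then be compared against the Data Agent's free-text answer. The sketch below is one simple way to do that: it extracts numbers from the answer (accepting thousands separators) and checks whether an expected value is among them. The helper names are hypothetical, the regular expression is a heuristic that will also pick up numbers inside dates or identifiers, and the expected value is hard-coded here rather than read from a query result.

```python
import re


def numbers_in(text):
    """Extract numbers from free text, accepting thousands separators such as '2,440'."""
    found = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return [float(n.replace(",", "")) for n in found]


def answer_mentions(answer, expected, tolerance=0.01):
    """Return True if any number in the answer is within `tolerance` of `expected`."""
    return any(abs(n - expected) <= tolerance for n in numbers_in(answer))


# Answer text copied from the Data Agent response shown earlier in this notebook.
agent_answer = (
    "The accounts with the highest credit utilization are:\n"
    "- Student Credit Card: 1200\n"
    "- Business Credit Card: 750\n"
    "- Platinum Credit Card: 490\n"
    "The total outstanding credit across all credit accounts is 2,440."
)

# In practice `expected` would come from a ground-truth DAX query result.
print(answer_mentions(agent_answer, 2440))
print(answer_mentions(agent_answer, 999))
```

A numeric match is only a coarse signal; it does not confirm that the number is attached to the right entity, so manual review of mismatches and spot checks of matches remain useful.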


##### Programmatic Data Agent Response

If you have a list of questions identified during your scoping exercise, you can automate the process of generating responses from the Data Agent for evaluation. You can also download the `df_questions` dataframe as a CSV to collaborate with stakeholders during the evaluation exercise. It is strongly recommended that you use this type of programmatic evaluation to iteratively assess, refine, and optimize the Data Agent throughout development and beyond. 

See [Evaluate a Fabric Data Agent notebook](https://github.com/microsoft/fabric-samples/blob/main/docs-samples/data-science/data-agent-sdk/Fabric-DataAgent-Evaluation-sample.ipynb) for more details and examples.

In [20]:
import pandas as pd

df_questions = pd.DataFrame({
    "questions": [
        "What is the net cash flow per user, and who has the highest net cash flow?",
        "How much are customers spending by category (and which categories are highest)?",
        "What is the total account balance breakdown by account type (checking vs savings vs credit)?",
        "How many active users are there, and what is the average transaction amount for their activity?",
        "What are the totals for completed vs pending vs failed transactions (amount and count)?",
        "Which accounts have the highest credit utilization, and what is the total outstanding credit?",
    ]
})

THREAD_TAG = "batch_context_001"
responses = []

for q in df_questions["questions"]:
    answer, _, _, thread_id = ask_dataagent(
        fabric_client,
        assistant_id=assistant.id,
        question=q,
        thread_tag=THREAD_TAG,
        new_thread=False,
    )
    responses.append(answer)

df_questions["data_agent_response"] = responses

display(df_questions[["questions", "data_agent_response"]])
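In the loop above, a single failed or timed-out run raises an exception and ends the whole batch. The sketch below shows one possible wrapper that records the error and elapsed time per question and keeps going. It is self-contained: `fake_ask` is a stand-in for the real Data Agent call, and `run_batch` is a hypothetical helper, not part of the SDK.

```python
import time


def run_batch(questions, ask_fn):
    """Ask each question via `ask_fn`, recording answer, error, and elapsed seconds."""
    results = []
    for question in questions:
        started = time.monotonic()
        try:
            answer, error = ask_fn(question), None
        except Exception as exc:  # continue so one failed run does not abort the batch
            answer, error = None, f"{type(exc).__name__}: {exc}"
        results.append({
            "question": question,
            "answer": answer,
            "error": error,
            "seconds": round(time.monotonic() - started, 3),
        })
    return results


def fake_ask(question):
    """Stand-in for the Data Agent call; fails on one question to show error capture."""
    if "fail" in question:
        raise RuntimeError("Run ended with status=failed")
    return f"answer to: {question}"


results = run_batch(["q1", "please fail", "q3"], fake_ask)
for row in results:
    print(row)
```

In this notebook, `ask_fn` could be something like `lambda q: ask_dataagent(fabric_client, assistant_id=assistant.id, question=q, new_thread=True)[0]`, and `pd.DataFrame(results)` would give a dataframe for review. Using `new_thread=True` keeps each question independent of earlier answers, whereas the shared thread tag above lets context carry over between questions; which is preferable depends on what the evaluation is meant to measure.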
