### Fabric Data Agent Client Implementation

### Introduction
This notebook demonstrates how to set up and manage a Fabric Data Agent to query and analyze data. Through step-by-step instructions and examples, you will learn how to install the necessary packages, create a new Data Agent or fetch an existing one, connect datasources, and run queries for insights. This will help you or your team quickly get started with data-driven decision-making in Fabric.

### Installation and Prerequisites
- A Microsoft Fabric environment or subscription is required.
- Fabric Capacity: F64 (or higher)  
- Tenant Switches: Enable AI skill, Copilot, cross-geo processing, and cross-geo storage. 
- Data Sources: Warehouse, Lakehouse, Power BI semantic models, KQL databases.
- Ensure the Python package "fabric-data-agent-sdk" is installed (as shown in the following cell).
- You also need "sempy" and other dependencies from "fabric" to leverage Fabric Data Agent features. These can be preinstalled in your Fabric environment.

In [1]:
%pip install fabric-data-agent-sdk

Collecting fabric-data-agent-sdk
  Downloading fabric_data_agent_sdk-0.1.8a0-py3-none-any.whl.metadata (4.7 kB)
Collecting openai>=1.57.0 (from fabric-data-agent-sdk)
  Downloading openai-1.93.0-py3-none-any.whl.metadata (29 kB)
Collecting httpx==0.27.2 (from fabric-data-agent-sdk)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting semantic-link-labs==0.9.10 (from fabric-data-agent-sdk)
  Downloading semantic_link_labs-0.9.10-py3-none-any.whl.metadata (26 kB)
Collecting azure-kusto-data>=4.5.0 (from fabric-data-agent-sdk)
  Downloading azure_kusto_data-5.0.4-py2.py3-none-any.whl.metadata (4.2 kB)
Collecting markdown2==2.5.3 (from fabric-data-agent-sdk)
  Downloading markdown2-2.5.3-py3-none-any.whl.metadata (2.1 kB)
Collecting httpcore==1.* (from httpx==0.27.2->fabric-data-agent-sdk)
  Downloading httpcore-1.0.9-py3-none-any.whl.metadata (21 kB)
Collecting anytree (from semantic-link-labs==0.9.10->fabric-data-agent-sdk)
  Downloading anytree-2.13.0-py3-none-any.wh

### Import Data Agent methods and specify Data Agent name

Import the methods used to create, manage, and delete a Data Agent, and assign a name for the agent. (The Data Agent was previously called AI Skill; method names will be updated to match in the near future.)

In [2]:
# Specify the DataAgent
from fabric.dataagent.client import (
    FabricDataAgentManagement,
    create_data_agent,
    delete_data_agent,
)

data_agent_name = "agent_sample"

#### New vs Pre-existing Agent
- A Data Agent is either newly created or fetched if it already exists.
- To create a new Data Agent, use `create_data_agent`.
- If an agent with the same name already exists, `create_data_agent` can fail with an error reporting a name conflict. In that case, use `FabricDataAgentManagement` instead, e.g. `data_agent = FabricDataAgentManagement(data_agent_name)`.
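
The create-or-fetch choice described above can be wrapped in a small helper. This is only a sketch: `get_or_create_agent` is a hypothetical name, and because the exact exception raised on a name conflict is not shown in this notebook, the helper catches any exception from the create call and falls back to fetching.

```python
def get_or_create_agent(name, create_fn, fetch_fn):
    """Return a newly created agent, or fetch the existing one if creation fails.

    create_fn and fetch_fn are passed in so the helper stays independent of
    the SDK; in this notebook they would be create_data_agent and
    FabricDataAgentManagement.
    """
    try:
        return create_fn(name)
    except Exception as exc:
        # Assumption: the conflict error type is unknown here, so any failure
        # triggers the fallback. If the fetch also fails, its error propagates.
        print(f"Could not create '{name}' ({exc}); fetching the existing agent.")
        return fetch_fn(name)
```

In this notebook it could be called as `data_agent = get_or_create_agent(data_agent_name, create_data_agent, FabricDataAgentManagement)`. Catching every exception is a deliberately broad choice; narrow it once you know which error your SDK version raises.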

In [3]:
# Create a new Data Agent; to fetch an existing one, use FabricDataAgentManagement(data_agent_name) instead
data_agent = create_data_agent(data_agent_name)
# By default, the instructions and description are empty; we update them later in the notebook
data_agent.get_configuration()

DataAgentConfiguration(instructions=None, user_description=None)

After the Data Agent is created, it appears as a new item in your workspace. If you are using a pre-existing agent, make sure `data_agent_name` is set to its name.

#### Add User Instructions
You can update the configuration by adding instructions and a user description. `user_instructions` corresponds to `model_notes` in the UI. Note that these instructions are for the agent itself and are not used directly by the tools the agent calls. Support for adding notes to tools is planned for the next update. For now, you can tell the agent, in its own instructions, how it should use specific tools.

In [4]:
user_instructions = """You are an expert analyst. For *any* user question that requires you to query a database, instead of answering it directly,
you should give the user a detailed response around the question, from the *available database added as a datasource*. You should do this by extending
the user question into 3 distinct questions to independently query the database with. Here is an example that shows how to do it:\n If the question is 
"what is the top selling product in 2019?". Expand these questions to gather more information, such as asking "what is the top selling 3 products in 2019"
to learn not only the top one, but how it compares to the others following it. Then ask "what was the top 3 products sold in 2018" to learn about the
previous year. Then ask "what were the top 3 best-selling products across all years?" to learn about the overall response. Then query the database for
each question independently. \nThis way we are learning not only the best selling product, but how it compares with top 3, learn about the previous year,
and learn about all time. After getting the answer for each question, formulate your response to answer the original user question with these additional
details. This will give the user a more comprehensive look at their original query. We gave an example with one sample question but you should follow these
instructions for any user questions that requires a database look up."""

#### Update and Verify
`update_configuration` updates the agent's instructions. You can verify the change by calling `get_configuration`.

In [5]:
data_agent.update_configuration(
    instructions=user_instructions,
)
data_agent.get_configuration()

DataAgentConfiguration(instructions='You are an expert analyst. For *any* user question that requires you to query a database, instead of answering it directly,\nyou should give the user a detailed response around the question, from the *available database added as a datasource*. You should do this by extending\nthe user question into 3 distinct questions to independently query the database with. Here is an example that shows how to do it:\n If the question is \n"what is the top selling product in 2019?". Expand these questions to gather more information, such as asking "what is the top selling 3 products in 2019"\nto learn not only the top one, but how it compares to the others following it. Then ask "what was the top 3 products sold in 2018" to learn about the\nprevious year. Then ask "what were the top 3 best-selling products across all years?" to learn about the overall response. Then query the database for\neach question independently. \nThis way we are learning not only the best 

#### Adding Datasources
Use the `add_datasource` method to add the datasources the agent should draw insights from. This sample adds a Lakehouse.
If you have already added the lakehouse, you don't need to add it again; check which datasources are connected first with `data_agent.get_datasources()`.

In [6]:
# add a lakehouse
lakehouse_name = "AdventureWorks"
data_agent.add_datasource(lakehouse_name, type="lakehouse")

Datasource(11564ef5-8f2a-4bc7-ad9e-004ab3c3c80b)

#### Exploring Data Within the Notebook
- Assign the lakehouse (or another added datasource) to `datasource` and inspect its tables and columns.

In [7]:
# we can check which datasources are added to the Data Agent
data_agent.get_datasources()

[Datasource(11564ef5-8f2a-4bc7-ad9e-004ab3c3c80b)]

#### Publish the Data Agent

In [8]:
data_agent.publish()

In [9]:
datasource = data_agent.get_datasources()[0]

In [10]:
datasource.pretty_print()

 dbo
  | dimaccount
  |  | AccountKey
  |  | ParentAccountKey
  |  | AccountCodeAlternateKey
  |  | ParentAccountCodeAlternateKey
  |  | AccountDescription
  |  | AccountType
  |  | Operator
  |  | CustomMembers
  |  | ValueType
  | factinternetsales
  |  | ProductKey
  |  | OrderDateKey
  |  | DueDateKey
  |  | ShipDateKey
  |  | CustomerKey
  |  | PromotionKey
  |  | CurrencyKey
  |  | SalesTerritoryKey
  |  | SalesOrderNumber
  |  | SalesOrderLineNumber
  |  | RevisionNumber
  |  | OrderQuantity
  |  | UnitPrice
  |  | ExtendedAmount
  |  | UnitPriceDiscountPct
  |  | DiscountAmount
  |  | ProductStandardCost
  |  | TotalProductCost
  |  | SalesAmount
  |  | TaxAmt
  |  | Freight
  |  | OrderDate
  |  | DueDate
  |  | ShipDate
  | dimemployee
  |  | EmployeeKey
  |  | ParentEmployeeKey
  |  | EmployeeNationalIDAlternateKey
  |  | SalesTerritoryKey
  |  | FirstName
  |  | LastName
  |  | MiddleName
  |  | NameStyle
  |  | Title
  |  | HireDate
  |  | BirthDate
  |  | LoginID
  |  | E

'AdventureWorks'

#### Select the Tables to Ask Questions About
- By default, no table is selected. A `*` after a table name indicates that the table is selected.
- Use `datasource.select` to pick the tables relevant to the questions you plan to ask.
- Calling `datasource.select` without any arguments will enable all data sources for the Data Agent.
- Selecting a table will also select all columns in the table.
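
When several tables are needed, the repeated `select` calls can be folded into a loop. The sketch below assumes only what this notebook shows, namely that the datasource object accepts `select(schema, table)`; `select_tables` is a hypothetical helper name, not part of the SDK.

```python
def select_tables(datasource, schema, tables):
    """Select each table in `tables` under `schema` on the given datasource."""
    selected = []
    for table in tables:
        # Same call shape as datasource.select("dbo", "dimcustomer")
        datasource.select(schema, table)
        selected.append(table)
    return selected
```

For the tables used here, this would be `select_tables(datasource, "dbo", ["dimcustomer", "dimproduct", "factinternetsales"])`.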

In [11]:
datasource.select("dbo", "dimcustomer")
datasource.select("dbo", "dimproduct")
datasource.select("dbo", "factinternetsales")
datasource.pretty_print()

 dbo
  | dimaccount
  |  | AccountKey
  |  | ParentAccountKey
  |  | AccountCodeAlternateKey
  |  | ParentAccountCodeAlternateKey
  |  | AccountDescription
  |  | AccountType
  |  | Operator
  |  | CustomMembers
  |  | ValueType
  | factinternetsales *
  |  | ProductKey
  |  | OrderDateKey
  |  | DueDateKey
  |  | ShipDateKey
  |  | CustomerKey
  |  | PromotionKey
  |  | CurrencyKey
  |  | SalesTerritoryKey
  |  | SalesOrderNumber
  |  | SalesOrderLineNumber
  |  | RevisionNumber
  |  | OrderQuantity
  |  | UnitPrice
  |  | ExtendedAmount
  |  | UnitPriceDiscountPct
  |  | DiscountAmount
  |  | ProductStandardCost
  |  | TotalProductCost
  |  | SalesAmount
  |  | TaxAmt
  |  | Freight
  |  | OrderDate
  |  | DueDate
  |  | ShipDate
  | dimemployee
  |  | EmployeeKey
  |  | ParentEmployeeKey
  |  | EmployeeNationalIDAlternateKey
  |  | SalesTerritoryKey
  |  | FirstName
  |  | LastName
  |  | MiddleName
  |  | NameStyle
  |  | Title
  |  | HireDate
  |  | BirthDate
  |  | LoginID
  |  |

'AdventureWorks'

#### Add Data Source Instructions
Call `update_configuration` on the datasource to set datasource-specific instructions.

In [13]:
ds_notes = """ \
When answering about a product, make sure to include the Product Name in dimproduct in the answer. 
Best selling product should be determined by sales volume, not sales amount. 
If you answer questions about quantities, make sure to include the quantity. 
"""
datasource.update_configuration(instructions=ds_notes)
datasource.get_configuration()["additional_instructions"]

' When answering about a product, make sure to include the Product Name in dimproduct in the answer. \nBest selling product should be determined by sales volume, not sales amount. \nIf you answer questions about quantities, make sure to include the quantity. \n'

#### Add IDs to Fabric Client

Now that the Data Agent is created and the relevant datasources and tables are selected, we can set up the Fabric client. Pass `data_agent_name` to the imported `FabricOpenAI` class to create the `fabric_client` instance, then create an assistant and a thread.

In [14]:
import sempy.fabric as fabric
from fabric.dataagent.client import FabricOpenAI


fabric_client = FabricOpenAI(artifact_name=data_agent_name)
assistant = fabric_client.beta.assistants.create(model="gpt-4o")
thread = fabric_client.beta.threads.create()

print(assistant.id)
print(thread.id)

asst_JvNQ3k50sAcAcTRz8FQYc9Yn
thread_6xqBDTYEqH94uJsAOPYzDoTV


#### Example Showing Message Submission and Query to Response Steps
- This example appends a message to a thread, creates a run, checks its status, and reads the final response.
- Submit the message with the `create` method on `messages`, passing an example question as `content`.
- Then create a run for that `thread.id`.

In [15]:
# Create a message to append to our thread
fabric_client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What was the best selling product by volume in 2013?",
)
run = fabric_client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=assistant.id
)


#### Checking Status
The cell below shows how to check the status of the run. If you ask multiple questions, use this check to track each one in the interim, because the questions are queued. Only one run can be active per thread at a time.

In [16]:
import time

# Poll every 2 seconds until the run leaves the queued/in_progress states
while run.status in ("queued", "in_progress"):
    time.sleep(2)
    run = fabric_client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )
print(run.status)

completed
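
The polling loop above waits indefinitely if a run never leaves the queue. A variant with a timeout might look like the following sketch. `wait_for_run` is a hypothetical helper; it takes the retrieval call as a parameter so it does not depend on the client, and the default interval and timeout are arbitrary choices.

```python
import time


def wait_for_run(retrieve, run, poll_seconds=2.0, timeout_seconds=300.0):
    """Poll until the run leaves queued/in_progress, or raise TimeoutError.

    `retrieve` is any callable that takes the current run object and
    returns a refreshed copy of it.
    """
    deadline = time.monotonic() + timeout_seconds
    while run.status in ("queued", "in_progress"):
        if time.monotonic() > deadline:
            raise TimeoutError(
                f"Run still {run.status} after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
        run = retrieve(run)
    return run
```

With the client from this notebook, the call could be `run = wait_for_run(lambda r: fabric_client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=r.id), run)`. The returned run can end in a state other than `completed`, so check `run.status` before reading messages.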


#### Retrieving and Displaying Results
Once the run is completed, we can retrieve and display the results generated by the Data Agent. The following steps demonstrate how to access and present the response data.


- The agent's response can be read from the message list for the given thread ID.
- To improve readability, we define a `pretty_print` helper that prints every message.

In [17]:
# Retrieve all messages in the thread, oldest first
response = fabric_client.beta.threads.messages.list(thread_id=thread.id, order="asc")

In [18]:
# Pretty printing helper
def pretty_print(messages):
    print("# Messages")
    for m in messages:
        print(f"{m.role}: {m.content[0].text.value}")
    print()

In [19]:
pretty_print(response)

# Messages
user: What was the best selling product by volume in 2013?
assistant: To determine the best selling product by volume in 2013, we need to gather some more information. I will break down the question into three distinct queries to get a comprehensive understanding:

1. What were the top 3 best selling products by volume in 2013?
2. What were the top 3 best selling products by volume in 2012 (to compare with 2013)?
3. What are the top 3 best selling products by volume across all years?

I will query the database with these questions now.
assistant: ### Best Selling Products by Volume in 2013
1. **Water Bottle - 30 oz.**: 4080 units sold
2. **Patch Kit/8 Patches**: 3026 units sold
3. **Mountain Tire Tube**: 2926 units sold

### Best Selling Products by Volume in 2012
1. **Mountain-200 Black, 46**: 206 units sold
2. **Mountain-200 Black, 42**: 191 units sold
3. **Mountain-200 Silver, 46**: 182 units sold

### Best Selling Products by Volume Across All Years
1. **Water Bottle - 3
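
The `pretty_print` helper reads only `content[0]` of each message. If you want just the final answer, or need to tolerate messages whose first content part is not text, a slightly more defensive reader could look like this sketch. Both function names are hypothetical, and the code assumes the same message shape used above (`role`, plus `content` parts that may carry `text.value`).

```python
def message_text(message):
    """Join the text parts of one message, skipping non-text content."""
    parts = []
    for part in message.content:
        text = getattr(part, "text", None)
        if text is not None:
            parts.append(text.value)
    return "\n".join(parts)


def last_assistant_text(messages):
    """Return the text of the most recent assistant message, or None."""
    answer = None
    for m in messages:  # assumes oldest-first order, as with order="asc"
        if m.role == "assistant":
            answer = message_text(m)
    return answer
```

Calling `last_assistant_text(response)` on the list retrieved above would return the detailed answer rather than the agent's preliminary plan, since the plan is an earlier assistant message.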

#### Some debugging and verification tools: `run_steps`
We can use `run_steps` to inspect the details of the run. The steps can be parsed to extract specifics such as the generated SQL query or intermediate answers.


In [20]:
run_steps = fabric_client.beta.threads.runs.steps.list(
    thread_id=thread.id,
    run_id=run.id,
)

In [None]:
run_steps.data

[RunStep(id='step_fabcszJg43mz8x0QN5WSv9OfJhuM', assistant_id='asst_JvNQ3k50sAcAcTRz8FQYc9Yn', cancelled_at=None, completed_at=1751321848, created_at=1751321840, expired_at=None, failed_at=None, last_error=None, metadata=None, object='thread.run.step', run_id='run_fab9dJacMm34B9RJoV0dBnTHXrpY', status='completed', step_details=ToolCallsStepDetails(tool_calls=[FunctionToolCall(id='call_LIPrkfBX27gUQfjEoB5Pz4yg', function=Function(arguments='{"datasource_name":"AdventureWorks","datasource_type":"LakehouseTables","datasource_artifact_id":"11564ef5-8f2a-4bc7-ad9e-004ab3c3c80b","datasource_workspace_id":"67422a4f-c3fa-484a-9a48-9a120894e9fc","natural_language_query":"What are the top 3 best selling products by volume across all years?","code":"\\u0060\\u0060\\u0060sql\\nSELECT TOP 3 \\n    p.EnglishProductName, \\n    SUM(f.OrderQuantity) AS TotalOrderQuantity\\nFROM \\n    dbo.factinternetsales f\\nJOIN \\n    dbo.dimproduct p ON f.ProductKey = p.ProductKey\\nGROUP BY \\n    p.EnglishProdu
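
In the output above, each tool call carries a JSON string in `function.arguments` with keys such as `natural_language_query` and `code`, where `code` holds the SQL wrapped in a fenced block. A minimal sketch for pulling those two fields out is shown below. `extract_query` is a hypothetical helper, and it assumes the argument layout stays as in this output.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def extract_query(arguments_json):
    """Return (natural_language_query, code) from a tool call's JSON arguments."""
    args = json.loads(arguments_json)
    lines = args.get("code", "").strip().splitlines()
    # Drop a leading fence line (fence plus optional language tag), if present
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    # Drop a trailing fence line, if present
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return args.get("natural_language_query"), "\n".join(lines).strip()
```

To apply it, loop over `run_steps.data`, and for each step whose `step_details` has `tool_calls`, pass `call.function.arguments` to `extract_query`. Steps that only create a message have no tool calls and should be skipped.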

#### Clean Up
- Once you have finished working with the run, you can delete the thread. Skip this step if you want to keep the conversation history.
- If you no longer need the Data Agent, you can delete it as well.

In [22]:
fabric_client.beta.threads.delete(thread.id)

ThreadDeleted(id='thread_6xqBDTYEqH94uJsAOPYzDoTV', deleted=None, object=None, messages=None, metadata=None)

In [23]:
delete_data_agent(data_agent_name)
