### Fabric Data Agent Client Implementation

### Introduction
This notebook demonstrates how to set up and manage a Fabric data agent to query and analyze data. Through step-by-step instructions and examples, you will learn how to install the necessary packages, create a data agent or fetch an existing one, connect data sources, and run queries for insights. This will help you and your team get started with Fabric data agent quickly, ask questions over data that lives in Fabric OneLake, and build a culture of data-driven decision-making.

### Installation and Prerequisites
- A Microsoft Fabric environment or subscription.
- Fabric capacity: a paid F2 or higher Fabric capacity.
- Tenant switches: enable Data Agent, Copilot, cross-geo processing, and cross-geo storage.
- Data sources: Warehouse, Lakehouse, Power BI semantic models, KQL databases.
- The Python package `fabric-data-agent-sdk`, installed in a Fabric Python notebook as shown in the following cell.
- `sempy` and other dependencies from `fabric`, which are needed to use Fabric Data Agent features. These can be preinstalled in your Fabric environment.
- For Power BI semantic model data sources, the "Power BI semantic models via XMLA endpoints" tenant switch must be enabled.
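
Before running the install cell, it can help to check whether the SDK is already present in the environment. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the distribution name is assumed to match the one given in the list above; the wheel in your environment may be registered under a slightly different name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution is registered under the name used in the prerequisites.
sdk_version = installed_version("fabric-data-agent-sdk")
if sdk_version is None:
    print("fabric-data-agent-sdk not found; run the %pip install cell.")
else:
    print(f"fabric-data-agent-sdk {sdk_version} is installed.")
```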

In [None]:
%pip install fabric-data-agent-sdk

Processing /lakehouse/default/Files/fabric.dataagent_sdk-0.0.1a0-py3-none-any.whl
Collecting openai>=1.57.0 (from fabric.dataagent-sdk==0.0.1a0)
  Downloading openai-1.65.2-py3-none-any.whl (473 kB)
Collecting httpx==0.27.2 (from fabric.dataagent-sdk==0.0.1a0)
  Downloading httpx-0.27.2-py3-none-any.whl (76 kB)
Collecting httpcore==1.* (from httpx==0.27.2->fabric.dataagent-sdk==0.0.1a0)
  Downloading httpcore-1.0.7-py3-none-any.whl (78 kB)
Collecting distro<2,>=1.7.0 (from openai>=1.57.0->fabric.dataagent-sdk==0.0.1a0)
  Downloading distro-1.9.0-py3-none-any.whl (20 kB)
Collecting jiter<1,>=0.4.0 (from openai>=1.57.0->fabri

### Import Data Agent methods and specify Data Agent name

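Assign a name for the Data Agent. Data Agent was previously called AI Skill, so some identifiers, such as `ai_skill_name` below, still use the old name; method names will be updated to the new name in the near future. Then import the methods used to manage, create, and delete a Data Agent from `fabric.dataagent.client`.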

In [None]:
# Specify the DataAgent
from fabric.dataagent.client import FabricDataAgentManagement, create_data_agent, delete_data_agent

ai_skill_name = "agent_sample"

#### New vs Pre-existing Agent
- The Data Agent is either newly created or fetched from a preexisting one.
- To create a new Data Agent, use `create_data_agent`.
- If an agent with the same name already exists, `create_data_agent` can fail with an error message that mentions a name conflict. In that case, use `FabricDataAgentManagement` instead, e.g. `data_agent = FabricDataAgentManagement(ai_skill_name)`.
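
The create-or-fetch choice described above can be wrapped in a small helper. This is only a sketch: `get_or_create` is a hypothetical name, and because the exact exception raised on a name conflict is not stated here, the helper catches any `Exception` and falls back to fetching. If the failure has another cause (for example, permissions), the fetch is likely to fail too and surface the error.

```python
def get_or_create(name, create, fetch):
    """Try to create the agent; if creation fails, fetch the existing one.

    `create` and `fetch` are callables that take the agent name.
    """
    try:
        return create(name)
    except Exception as err:  # Assumption: the conflict exception type is unknown.
        print(f"Creation failed ({err}); fetching the existing agent instead.")
        return fetch(name)
```

In this notebook, the call would look like `data_agent = get_or_create(ai_skill_name, create_data_agent, FabricDataAgentManagement)`.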

In [None]:
# create or fetch DataAgent
data_agent = create_data_agent(ai_skill_name)
# by default the instructions and description for the Data Agent will be empty, we will update them later in the notebook
data_agent.get_configuration()

DataAgentConfiguration(instructions=None, user_description=None)

After the Data Agent is created, it appears as a new item in your workspace. If you are using a preexisting one, make sure its name is assigned to `ai_skill_name`.

#### Add User Instructions
You can update the configuration by adding instructions and a user description. `user_instructions` corresponds to `model_notes` in the UI. Note that these instructions are for the agent itself and are not used directly by the tools the agent calls. Adding notes to tools is planned for the next update. For now, you can tell the agent, in its own instructions, how it should use specific tools.

In [None]:
user_instructions = """You are an expert analyst. For *any* user question that requires you to query a database, instead of answering it directly,
you should give the user a detailed response around the question, from the *available database added as a datasource*. You should do this by extending
the user question into 3 distinct questions to independently query the database with. Here is an example that shows how to do it:\n If the question is 
"what is the top selling product in 2019?". Expand these questions to gather more information, such as asking "what is the top selling 3 products in 2019"
to learn not only the top one, but how it compares to the others following it. Then ask "what was the top 3 products sold in 2018" to learn about the
previous year. Then ask "what were the top 3 best-selling products across all years?" to learn about the overall response. Then query the database for
each question independently. \nThis way we are learning not only the best selling product, but how it compares with top 3, learn about the previous year,
and learn about all time. After getting the answer for each question, formulate your response to answer the original user question with these additional
details. This will give the user a more comprehensive look at their original query. We gave an example with one sample question but you should follow these
instructions for any user questions that requires a database look up."""

#### Update and Verify
`update_configuration` updates the instructions and also sets `user_description` for the Fabric Agent API. You can verify the change by calling `get_configuration`.

In [None]:
data_agent.update_configuration(
    instructions=user_instructions,
    user_description="Instructions for Fabric Data Agent to assist with insights from the AdventureWorks dataset. ",
)
data_agent.get_configuration()

DataAgentConfiguration(instructions='You are an expert analyst. For *any* user question that requires you to query a database, instead of answering it directly,\nyou should give the user a detailed response around the question, from the *available database added as a datasource*. You should do this by extending\nthe user question into 3 distinct questions to independently query the database with. Here is an example that shows how to do it:\n If the question is \n"what is the top selling product in 2019?". Expand these questions to gather more information, such as asking "what is the top selling 3 products in 2019"\nto learn not only the top one, but how it compares to the others following it. Then ask "what was the top 3 products sold in 2018" to learn about the\nprevious year. Then ask "what were the top 3 best-selling products across all years?" to learn about the overall response. Then query the database for\neach question independently. \nThis way we are learning not only the best 

#### Adding Datasources
Use the `add_datasource` method to add the datasources you want insights from. This sample adds a Lakehouse.
If you have already added the lakehouse, you don't need to add it again; check first whether the datasource is already connected.
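
One way to avoid adding the lakehouse twice is to look at `get_datasources()` before calling `add_datasource`. The sketch below is a simplification for the single-lakehouse case in this notebook: it only checks whether any datasource is attached and does not compare names. `ensure_datasource` is a hypothetical helper name.

```python
def ensure_datasource(agent, name, source_type="lakehouse"):
    """Attach a datasource only when the agent has none; return the one to use.

    Simplification: only checks whether `get_datasources()` is empty.
    """
    existing = agent.get_datasources()
    if existing:
        # Something is already connected, so reuse the first datasource.
        return existing[0]
    return agent.add_datasource(name, type=source_type)
```

With the objects defined in this notebook, `ensure_datasource(data_agent, lakehouse_name)` would replace the direct `add_datasource` call.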

In [None]:
# add a lakehouse
lakehouse_name = "AdventureWorks"
data_agent.add_datasource(lakehouse_name, type="lakehouse")

Datasource(c379c093-3137-4547-84ca-824299351f9c)

#### Exploring Data Within the Notebook
- Assign the lakehouse (or any other datasource you added) to `datasource` and inspect its tables and column fields.

In [None]:
# we can check which datasources are added to the Data Agent
data_agent.get_datasources()

[Datasource(c379c093-3137-4547-84ca-824299351f9c)]

Publish the Data Agent:

In [None]:
data_agent.publish()

In [None]:
datasource = data_agent.get_datasources()[0]

In [None]:
datasource.pretty_print()

 dbo
  | dimaccount
  |  | AccountKey
  |  | ParentAccountKey
  |  | AccountCodeAlternateKey
  |  | ParentAccountCodeAlternateKey
  |  | AccountDescription
  |  | AccountType
  |  | Operator
  |  | CustomMembers
  |  | ValueType
  | factinternetsales
  |  | ProductKey
  |  | OrderDateKey
  |  | DueDateKey
  |  | ShipDateKey
  |  | CustomerKey
  |  | PromotionKey
  |  | CurrencyKey
  |  | SalesTerritoryKey
  |  | SalesOrderNumber
  |  | SalesOrderLineNumber
  |  | RevisionNumber
  |  | OrderQuantity
  |  | UnitPrice
  |  | ExtendedAmount
  |  | UnitPriceDiscountPct
  |  | DiscountAmount
  |  | ProductStandardCost
  |  | TotalProductCost
  |  | SalesAmount
  |  | TaxAmt
  |  | Freight
  |  | OrderDate
  |  | DueDate
  |  | ShipDate
  | dimemployee
  |  | EmployeeKey
  |  | ParentEmployeeKey
  |  | EmployeeNationalIDAlternateKey
  |  | SalesTerritoryKey
  |  | FirstName
  |  | LastName
  |  | MiddleName
  |  | NameStyle
  |  | Title
  |  | HireDate
  |  | BirthDate
  |  | LoginID
  |  | E

#### Select particular tables that you need to ask questions about.
- By default, no table is selected. A `*` after a table name indicates that the table is selected.
- Use `datasource.select` to pick the tables relevant to the questions you expect to ask, or all tables.
- Selecting a table also selects all of its columns.
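
When several tables are needed, the repeated `datasource.select` calls can be driven from a list. This is a small sketch; `select_tables` is a hypothetical helper, and it relies only on the `select(schema, table)` call used in this notebook.

```python
def select_tables(datasource, schema, tables):
    """Call `datasource.select(schema, table)` for each table name, in order."""
    for table in tables:
        datasource.select(schema, table)
```

For the tables used here, that would be `select_tables(datasource, "dbo", ["dimcustomer", "dimproduct", "factinternetsales"])`.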

In [None]:
datasource.select("dbo", "dimcustomer")
datasource.select("dbo", "dimproduct")
datasource.select("dbo", "factinternetsales")
datasource.pretty_print()

 dbo
  | dimaccount
  |  | AccountKey
  |  | ParentAccountKey
  |  | AccountCodeAlternateKey
  |  | ParentAccountCodeAlternateKey
  |  | AccountDescription
  |  | AccountType
  |  | Operator
  |  | CustomMembers
  |  | ValueType
  | factinternetsales *
  |  | ProductKey
  |  | OrderDateKey
  |  | DueDateKey
  |  | ShipDateKey
  |  | CustomerKey
  |  | PromotionKey
  |  | CurrencyKey
  |  | SalesTerritoryKey
  |  | SalesOrderNumber
  |  | SalesOrderLineNumber
  |  | RevisionNumber
  |  | OrderQuantity
  |  | UnitPrice
  |  | ExtendedAmount
  |  | UnitPriceDiscountPct
  |  | DiscountAmount
  |  | ProductStandardCost
  |  | TotalProductCost
  |  | SalesAmount
  |  | TaxAmt
  |  | Freight
  |  | OrderDate
  |  | DueDate
  |  | ShipDate
  | dimemployee
  |  | EmployeeKey
  |  | ParentEmployeeKey
  |  | EmployeeNationalIDAlternateKey
  |  | SalesTerritoryKey
  |  | FirstName
  |  | LastName
  |  | MiddleName
  |  | NameStyle
  |  | Title
  |  | HireDate
  |  | BirthDate
  |  | LoginID
  |  |

#### Add IDs to Fabric Client

Now that the Data Agent is created and the relevant datasources and tables are selected, we can set up the Fabric client. Pass `ai_skill_name` to the imported `FabricOpenAI` class to create the `fabric_client` instance, then create an assistant and a thread.

In [None]:
import sempy.fabric as fabric
from fabric.dataagent.client import FabricOpenAI


fabric_client = FabricOpenAI(artifact_name=ai_skill_name)
assistant = fabric_client.beta.assistants.create(model="gpt-4o")
thread = fabric_client.beta.threads.create()

print(assistant.id)
print(thread.id)

asst_4ZxQTwBAe8hSULQ7FHzHlwfX
thread_zIqnt8BPaijC9Irh3rXr9shH


#### Example Showing Message Submission and Query to Response Steps
- The flow is: append a message to a thread, create a run, check the run status, and read the final response.
- Submit the message with the `create` method on `messages`, passing an example question as `content`.
- Then create a run for that `thread.id`.

In [None]:
# Create a message to append to our thread
fabric_client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What was the best selling product by volume in 2013?",
)
run = fabric_client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=assistant.id
)

#### Checking Status
The cell below shows how to check the status of the run. If multiple questions are asked, use this to check the status of each one in the interim, because they are queued. Only one run can be active per thread at a time.

In [None]:
import time
# Wait for completion
while run.status == "queued" or run.status == "in_progress":
    run = fabric_client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )
    time.sleep(2)
print(run.status)

completed
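
The polling loop above waits indefinitely if a run never leaves the `queued` or `in_progress` state. A possible variant with a timeout is sketched below. `wait_for_run` and its parameters are hypothetical names, and the default timeout is an arbitrary choice rather than a documented limit.

```python
import time


def wait_for_run(refresh, run, poll_seconds=2.0, timeout_seconds=300.0):
    """Poll until the run leaves queued/in_progress, or raise on timeout.

    `refresh` is a callable that takes the current run and returns a fresh copy.
    """
    deadline = time.monotonic() + timeout_seconds
    while run.status in ("queued", "in_progress"):
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Run still {run.status!r} after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
        run = refresh(run)
    return run
```

In this notebook, `refresh` could be `lambda r: fabric_client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=r.id)`.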


#### Retrieving and Displaying Results
Once the run is completed, we can retrieve and display the results generated by the Data Agent. The following steps demonstrate how to access and present the response data.

- The agent's response can be read from the messages list for the given thread ID.
- To improve readability, we can use a `pretty_print` function for all the messages.

In [None]:
# Retrieve all messages in the thread, oldest first
response = fabric_client.beta.threads.messages.list(thread_id=thread.id, order="asc")

In [None]:
# Pretty printing helper
def pretty_print(messages):
    print("# Messages")
    for m in messages:
        print(f"{m.role}: {m.content[0].text.value}")
    print()

In [None]:
pretty_print(response)

# Messages
user: What was the best selling product by volume in 2013?
assistant: To answer your question comprehensively, I'll need to gather additional details. Specifically, I will look into:

1. The top 3 best selling products by volume in 2013.
2. The top 3 best selling products by volume in 2012.
3. The top 3 best selling products by volume across all years.

After gathering this information, I can provide a detailed comparison and answer your original question. I'll start by querying these three pieces of information.
assistant: Here is a comprehensive look at the best selling products by volume:

### Top 3 Best Selling Products by Volume in 2013:
1. **Water Bottle - 30 oz.** with a total quantity of 4080 units.
2. **Patch Kit/8 Patches** with a total quantity of 3026 units.
3. **Mountain Tire Tube** with a total quantity of 2926 units.

### Top 3 Best Selling Products by Volume in 2012:
1. **Mountain-200 Black, 46** with a total quantity of 206 units.
2. **Mountain-200 Black, 42
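
The messages list contains the whole conversation, so on a thread with several questions it may be useful to keep only the replies to the most recent one. A minimal sketch, assuming the messages are in ascending order and each has a `role` attribute as in the output above; `replies_after_last_user` is a hypothetical helper name.

```python
def replies_after_last_user(messages):
    """Return the messages that follow the most recent user message.

    Expects messages in ascending (oldest first) order.
    """
    messages = list(messages)
    # Index of the last user message, or -1 when there is none.
    last_user = max(
        (i for i, m in enumerate(messages) if m.role == "user"),
        default=-1,
    )
    return messages[last_user + 1:]
```

The result can be passed to the same printing helper, e.g. `pretty_print(replies_after_last_user(response))`.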

#### Some debugging and verification tools: `run_steps`
We can use `run_steps` to inspect the details of the run. The steps can be parsed to separate out specifics such as the generated SQL query or intermediate answers.


In [None]:
run_steps = fabric_client.beta.threads.runs.steps.list(
    thread_id=thread.id,
    run_id=run.id,
)

In [None]:
run_steps.data

[RunStep(id='step_3y7Fr6yFWGwqve5n8qn6FLoC', assistant_id='asst_4ZxQTwBAe8hSULQ7FHzHlwfX', cancelled_at=None, completed_at=1741044324, created_at=1741044322, expired_at=None, failed_at=None, last_error=None, metadata=None, object='thread.run.step', run_id='5864aeeb-47a0-4bc1-8246-99a290c45e6d', status='completed', step_details=MessageCreationStepDetails(message_creation=MessageCreation(message_id='msg_VBpGY52TT0M7j6thtAn4Vhpj'), type='message_creation'), thread_id='thread_zIqnt8BPaijC9Irh3rXr9shH', type='message_creation', usage=Usage(completion_tokens=285, prompt_tokens=1857, total_tokens=2142, prompt_token_details={'cached_tokens': 0}), expires_at=None),
 RunStep(id='step_znxVXb24YnGtGfIyunz6jAoI', assistant_id='asst_4ZxQTwBAe8hSULQ7FHzHlwfX', cancelled_at=None, completed_at=1741044321, created_at=1741044309, expired_at=None, failed_at=None, last_error=None, metadata=None, object='thread.run.step', run_id='5864aeeb-47a0-4bc1-8246-99a290c45e6d', status='completed', step_details=ToolCa
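
The raw step list is hard to read, and in the output above the newest step comes first. As a rough sketch, the steps can be condensed into a chronological summary using only fields visible in that output (`type`, `status`, `created_at`, `completed_at`). `summarize_steps` is a hypothetical helper name.

```python
def summarize_steps(steps):
    """Return (type, status, seconds) tuples for run steps, oldest first.

    `created_at` and `completed_at` are Unix timestamps in seconds.
    """
    ordered = sorted(steps, key=lambda s: s.created_at)
    summary = []
    for step in ordered:
        if step.completed_at is None:
            seconds = None  # The step has not finished yet.
        else:
            seconds = step.completed_at - step.created_at
        summary.append((step.type, step.status, seconds))
    return summary
```

Calling `summarize_steps(run_steps.data)` would give one tuple per step, which makes it easier to see how long each step took.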

#### Clean Up
- After you have finished with the run, you can delete the thread. If you want to retain the conversation history, skip this step.
- If you no longer need the Data Agent, you can delete it as well.
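
If a thread is only needed for a single question, one option is to tie its lifetime to a `with` block so that it is deleted even when an error occurs in between. This is a sketch, not part of the SDK: `temporary_thread` is a hypothetical helper that relies only on the `beta.threads.create()` and `beta.threads.delete(thread_id)` calls used in this notebook.

```python
from contextlib import contextmanager


@contextmanager
def temporary_thread(client):
    """Create a thread and delete it on exit, even if the body raises."""
    thread = client.beta.threads.create()
    try:
        yield thread
    finally:
        client.beta.threads.delete(thread.id)
```

Usage would look like `with temporary_thread(fabric_client) as thread: ...`, with the message submission and polling placed inside the block.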

In [None]:
fabric_client.beta.threads.delete(thread.id)

ThreadDeleted(id='thread_zIqnt8BPaijC9Irh3rXr9shH', deleted=None, object=None, messages=None, metadata=None)

In [None]:
delete_data_agent(ai_skill_name)
