# Query a Fabric Data Agent from a Notebook (Preview)

This notebook shows how to call your **published Fabric Data Agent** programmatically using the OpenAI Assistants-style API.

> **Replace `PUBLISHED_URL` below** with the value you see under your agent **Settings → Published URL**.

**What this notebook does**
- Authenticates to Fabric with a Microsoft Entra ID (formerly AAD) bearer token, obtained here through an interactive browser sign-in.
- Calls your **published Data Agent** via the OpenAI SDK.
- Polls with a timeout (prevents infinite loops).
- Cleans up created threads to conserve capacity.
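The "poll with a timeout" idea from the list above can be sketched independently of the agent. This is a minimal illustration, not the notebook's own helper: `poll_until`, `fetch`, and `is_done` are hypothetical names introduced here, and the defaults simply mirror the 2-second interval and 300-second timeout used later in the notebook.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout_sec: float = 300.0,
    interval_sec: float = 2.0,
) -> T:
    """Call fetch() repeatedly until is_done(result) is true or the deadline passes."""
    # time.monotonic() is not affected by system clock adjustments.
    deadline = time.monotonic() + timeout_sec
    result = fetch()
    while not is_done(result):
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Polling exceeded {timeout_sec}s (last result={result!r})"
            )
        time.sleep(interval_sec)
        result = fetch()
    return result
```

With a real run, `fetch` would wrap the call that retrieves the run and `is_done` would check its status against the set of end states.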


## 1) Install dependencies (safe to re-run)

In [20]:
%pip install "openai==1.70.0"
%pip install "synapseml==1.0.5"
%pip install azure-identity pandas tqdm

^C
Note: you may need to restart the kernel to use updated packages.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
agent-framework-core 1.0.0b251028 requires aiofiles>=24.1.0, which is not installed.
agent-framework-core 1.0.0b251028 requires openai<2,>=1.99.0, but you have openai 1.70.0 which is incompatible.
mem0ai 1.0.0 requires openai>=1.90.0, but you have openai 1.70.0 which is incompatible.
semantic-kernel 1.28.1 requires pydantic!=2.10.0,!=2.10.1,!=2.10.2,!=2.10.3,<2.12,>=2.0, but you have pydantic 2.12.3 which is incompatible.


Collecting openai==1.70.0
  Using cached openai-1.70.0-py3-none-any.whl.metadata (25 kB)
Using cached openai-1.70.0-py3-none-any.whl (599 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 1.109.1
    Uninstalling openai-1.109.1:
      Successfully uninstalled openai-1.109.1
Successfully installed openai-1.70.0
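The log above shows pip replacing openai 1.109.1 with 1.70.0 and reporting conflicts with other installed packages. After restarting the kernel, it can help to confirm which versions the kernel actually sees. The following is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, and the pinned versions are copied from the install cell.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Versions pinned in the install cell of this notebook.
PINNED = {"openai": "1.70.0", "synapseml": "1.0.5"}

for pkg, pinned in PINNED.items():
    found = installed_version(pkg)
    status = "OK" if found == pinned else "MISMATCH"
    print(f"{pkg}: pinned={pinned} installed={found} [{status}]")
```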


## 2) Configure the client

In [23]:
# pip install azure-identity openai==1.70.0

# ---- REQUIRED: paste your Published URL here ----
PUBLISHED_URL = "https://msitapi.fabric.microsoft.com/v1/workspaces/00ae18cb-e789-4d42-be8d-a5b47e524e22/aiskills/1e363010-6005-48da-b370-7906177a760e/aiassistant/openai"

import typing as t
import time, uuid

from azure.identity import InteractiveBrowserCredential
from openai import OpenAI
from openai._models import FinalRequestOptions
from openai._types import Omit
from openai._utils import is_given

# ---------- Dev sign-in ----------
# Opens a browser once; caches token locally
SCOPE = "https://api.fabric.microsoft.com/.default"
# If you see 401/403, swap to:
# SCOPE = "https://analysis.windows.net/powerbi/api/.default"

_cred = InteractiveBrowserCredential()


def _get_bearer() -> str:
    return _cred.get_token(SCOPE).token


class FabricOpenAI(OpenAI):
    """
    OpenAI client wrapper that:
      - Uses your Fabric Data Agent Published URL as base_url
      - Injects AAD Bearer token and correlation id
      - Pins 'api-version' as query param
    """

    def __init__(
        self, api_version: str = "2024-05-01-preview", **kwargs: t.Any
    ) -> None:
        self.api_version = api_version
        default_query = kwargs.pop("default_query", {})
        default_query["api-version"] = self.api_version
        super().__init__(
            api_key="",  # not used
            base_url=PUBLISHED_URL,  # IMPORTANT: your agent endpoint
            default_query=default_query,
            **kwargs,
        )

    def _prepare_options(self, options: FinalRequestOptions) -> None:
        headers: dict[str, str | Omit] = (
            {**options.headers} if is_given(options.headers) else {}
        )
        headers["Authorization"] = f"Bearer {_get_bearer()}"
        headers.setdefault("Accept", "application/json")
        headers.setdefault("ActivityId", str(uuid.uuid4()))
        options.headers = headers
        return super()._prepare_options(options)


client = FabricOpenAI()
print("Client configured. You're signed in with InteractiveBrowserCredential().")

Client configured. You're signed in with InteractiveBrowserCredential().
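A mistyped Published URL only surfaces later as a failed request, so a quick shape check before building the client may save time. The sketch below assumes the URL always follows the pattern seen in the example above (`/v1/workspaces/<guid>/aiskills/<guid>/aiassistant/openai`); that pattern is inferred from this one URL, not taken from Fabric documentation, and `parse_published_url` is a hypothetical helper.

```python
import re
from typing import Dict
from urllib.parse import urlparse

# Path shape inferred from the example Published URL in this notebook (an assumption).
_PATH_RE = re.compile(
    r"^/v1/workspaces/(?P<workspace_id>[0-9a-fA-F-]{36})"
    r"/aiskills/(?P<agent_id>[0-9a-fA-F-]{36})/aiassistant/openai/?$"
)


def parse_published_url(url: str) -> Dict[str, str]:
    """Split a Published URL into host, workspace id, and agent id, or raise ValueError."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https":
        raise ValueError("Published URL should start with https://")
    match = _PATH_RE.match(parsed.path)
    if match is None:
        raise ValueError(f"Unexpected path shape: {parsed.path!r}")
    return {"host": parsed.netloc, **match.groupdict()}
```

Calling `parse_published_url(PUBLISHED_URL)` before `FabricOpenAI()` would raise early on a truncated or pasted-in-part URL.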


## 3) Helper to ask the Data Agent

In [24]:
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)

def ask_data_agent(
    question: str, poll_interval_sec: int = 2, timeout_sec: int = 300
) -> str:
    """
    Sends a question to the published Fabric Data Agent and returns the text reply.
    Cleans up the thread after completion.
    """
    try:
        # Create "assistant" placeholder (model is ignored by Fabric agent)
        assistant = client.beta.assistants.create(model="not-used")

        # Create a new thread for this Q&A
        thread = client.beta.threads.create()

        try:
            # Post the user message
            client.beta.threads.messages.create(
                thread_id=thread.id,
                role="user",
                content=question,
            )

            # Start a run (the data agent actually does the work)
            run = client.beta.threads.runs.create(
                thread_id=thread.id, assistant_id=assistant.id
            )

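            # Poll until the run leaves the queued/in-progress states, or until timeout.
            # "expired" and "incomplete" are also end states in the Assistants API.
            terminal = {"completed", "failed", "cancelled", "expired", "incomplete", "requires_action"}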
            start = time.time()
            while run.status not in terminal:
                if time.time() - start > timeout_sec:
                    raise TimeoutError(
                        f"Run polling exceeded {timeout_sec}s (last status={run.status})"
                    )
                time.sleep(poll_interval_sec)
                run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

            if run.status != "completed":
                # Get detailed error information
                error_msg = f"❌ Run ended: {run.status}"
                if hasattr(run, 'last_error') and run.last_error:
                    error_code = getattr(run.last_error, 'code', 'unknown')
                    error_message = getattr(run.last_error, 'message', 'No details')
                    error_msg += f"\n   Error Code: {error_code}\n   Error Message: {error_message}"
                    
                    # Add helpful context for common errors
                    if "MWC token" in error_message or "NL2Ontology" in error_message:
                        error_msg += "\n\nüí° This is a Fabric Data Agent configuration issue."
                        error_msg += "\n   The agent endpoint exists but has internal setup problems."
                        error_msg += "\n   Please check:"
                        error_msg += "\n   - Agent is properly published in Fabric workspace"
                        error_msg += "\n   - Data sources are configured and accessible"
                        error_msg += "\n   - Workspace permissions are correct"
                    elif "EntityNotFound" in error_message or "404" in error_message:
                        error_msg += "\n\nüí° The Fabric Data Agent endpoint doesn't exist or was deleted."
                        error_msg += "\n   Please verify the Published URL in your Fabric workspace."
                        
                return error_msg

            # Collect messages in ascending order and concatenate text parts
            msgs = client.beta.threads.messages.list(thread_id=thread.id, order="asc")
            out_chunks = []
            for m in msgs.data:
                if m.role == "assistant":
                    for c in m.content:
                        if getattr(c, "type", None) == "text":
                            out_chunks.append(c.text.value)
            return "\n".join(out_chunks).strip() or "[No text content returned]"

        finally:
            # Always attempt cleanup
            try:
                client.beta.threads.delete(thread_id=thread.id)
            except Exception:
                pass
                
    except Exception as e:
        return f"❌ Exception during agent query: {type(e).__name__}: {str(e)}"


print("✅ Helper ready: call ask_data_agent('your question')")
print("   Deprecation warnings suppressed for cleaner output.")

✅ Helper ready: call ask_data_agent('your question')


## 4) Quick sanity tests

In [25]:
print(ask_data_agent("For NorthCoast Air and SkyBridge Airlines, provide the total number of flights and the total kilometers flown in the last year. Give the results in order, with airline names, total flights, and total kilometers."))

Here are the results for the last year:

1. SkyBridge Airlines - Total Flights: 208, Total Kilometers: 507,971 km
2. NorthCoast Air - Total Flights: 192, Total Kilometers: 492,576 km

The numbers are ordered by airline name, with total flights and total kilometers shown for each.


In [27]:
print(ask_data_agent("what are the most repeated routes for SkyBridge Airlines and NorthCoast Airlines in the last year?"))

[No text content returned]
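The `[No text content returned]` result above is the fallback string the helper returns when a completed run has no assistant text parts. One way to handle it is to retry a few times before giving up. This is only a sketch: `ask_with_retry` is a hypothetical wrapper, the retry count and backoff are arbitrary, and retrying is not guaranteed to produce an answer if the agent cannot resolve the question.

```python
import time
from typing import Callable

# Fallback string returned by ask_data_agent when no text parts are found.
EMPTY_SENTINEL = "[No text content returned]"


def ask_with_retry(
    ask: Callable[[str], str],
    question: str,
    max_attempts: int = 3,
    backoff_sec: float = 2.0,
) -> str:
    """Call ask(question) until it returns something other than the empty sentinel."""
    last = EMPTY_SENTINEL
    for attempt in range(1, max_attempts + 1):
        last = ask(question)
        if last != EMPTY_SENTINEL:
            return last
        if attempt < max_attempts:
            # Linear backoff: wait a little longer after each empty reply.
            time.sleep(backoff_sec * attempt)
    return last
```

Usage would look like `ask_with_retry(ask_data_agent, "your question")`. Each attempt creates a new thread and run, so keep `max_attempts` small.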


In [4]:
# A. Connectivity / scope test
print(ask_data_agent("What data sources do you have access to?"))

# B. Schema probe (adjust table names to your selections)
print(ask_data_agent("List 10 columns from factinternetsales and 10 from dimcustomer."))

# C. Business question
print(ask_data_agent("Total Internet Sales by year and month; return a small table."))

I have access to data from clinical glucose monitoring studies, specifically focused on comparing the accuracy and reliability of two glucose monitoring products: Product A and Product B. The data includes the following:

- Glucose ranges (in mg/dL): Different glycemic levels for analysis (e.g., hypoglycemia, euglycemia, hyperglycemia).
- MARD percentages: Mean Absolute Relative Difference, a key accuracy metric for glucose monitors.
- Accuracy within ±20 mg/dL or ±20%: The percentage of sensor readings that are within ±20 mg/dL or ±20% of reference (lab) values.
- Total readings: Number of glucose readings analyzed per product in each glucose range.

The dataset comes from referenced clinical studies comparing CGM product performance and is structured to allow evaluation of product accuracy across all critical glucose ranges.

If you have questions about product accuracy, performance in specific glucose ranges, comparative analysis, or clinically relevant implications, I can retri

In [6]:
print(
    ask_data_agent(
        "In which glucose ranges does Product A underperform compared to Product B, and what clinical impact could this have?"
    )
)

Product A underperforms compared to Product B in all glucose ranges evaluated. Specifically:

- In each glucose range, Product A has a higher MARD percentage (meaning greater average error) and lower accuracy (fewer readings within ±20 mg/dL or ±20% of reference) than Product B.
- This underperformance is especially pronounced in the hypoglycemic ranges:
  - For glucose <54 mg/dL: Product A MARD is 16.3% (vs. 12.2% for Product B), and accuracy is 60.8% (vs. 78.1% for Product B).
  - For glucose 54 to 69 mg/dL: Product A MARD is 13.5% (vs. 9.3% for Product B), and accuracy is 68.7% (vs. 85.4% for Product B).
- Across the hyperglycemic range (>250 mg/dL), Product A also lags: MARD is 10.8% (vs. 7.0% for Product B); accuracy is 80.2% (vs. 94.5% for Product B).

Clinical Impact:
- Inaccurate detection of hypoglycemia (<70 mg/dL) increases the risk of unrecognized and untreated low glucose events, which can lead to severe medical emergencies (e.g., seizures, unconsciousness).
- In hypergl

## 5) Example analytics question

In [None]:
q = """
Calculate the average percentage increase in sales amount for repeat purchases
for every zipcode (repeat = any purchase after the first for that customer).
Return the results in a compact table: zipcode, avg_pct_increase.
"""
print(ask_data_agent(q))
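When several questions need to be run in one go, it may be convenient to collect the answers with timings instead of printing them one by one. The sketch below is an illustration under assumptions: `run_batch` is a hypothetical helper, and the `ok` flag relies on the helper's convention of returning error text that starts with "❌" or the `[No text content returned]` fallback.

```python
import time
from typing import Callable, Dict, List

EMPTY_SENTINEL = "[No text content returned]"
ERROR_PREFIX = "❌"  # assumed prefix of error strings returned by ask_data_agent


def run_batch(ask: Callable[[str], str], questions: List[str]) -> List[Dict[str, object]]:
    """Ask each question in order and record the answer, elapsed time, and a success flag."""
    rows: List[Dict[str, object]] = []
    for question in questions:
        start = time.perf_counter()
        answer = ask(question)
        rows.append(
            {
                "question": question.strip(),
                "answer": answer,
                "seconds": round(time.perf_counter() - start, 2),
                "ok": not answer.startswith(ERROR_PREFIX) and answer != EMPTY_SENTINEL,
            }
        )
    return rows
```

Since pandas is installed in step 1, the rows can be viewed as a table with `pd.DataFrame(run_batch(ask_data_agent, [q]))`.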