# Example Query Validator: Boost Your Data Agent’s Accuracy  

This notebook helps you **validate and improve your Data Agent example queries (few-shot examples)**. These examples pair **natural language questions with SQL queries** to guide the Data Agent in generating more accurate and context-aware SQL and answers.  

## What You’ll Learn  
- How to use the `fabric_data_agent_sdk` to validate and refine your few-shot examples.  
- How to load synthetic data and example sets for testing.  
- How to interpret validation feedback and make improvements that enhance your Data Agent’s output.  

## Why Validation Matters  
Few-shot examples are a powerful way to guide the Data Agent’s behavior. Poorly constructed or invalid examples can reduce performance and accuracy. This notebook walks you through **automated validation and iterative improvement** so you can ensure your examples consistently deliver high-quality results.  

---


## Overview

This notebook demonstrates how to use the `fabric_data_agent_sdk` to validate and get feedback on your few-shot examples. The goal is to help you improve the examples you provide to the Data Agent, so it generates better SQL and answers for your data tasks.

## Step 1: Install the Data Agent SDK
To begin, install the `fabric_data_agent_sdk` package. This SDK provides the validator and feedback tools for your few-shot examples.

You can install from PyPI or a local wheel file:

```python
%pip install -q fabric_data_agent_sdk
# Or install from a local wheel:
# %pip install -q /lakehouse/default/Files/fabric_data_agent_sdk-0.1.12a0-py3-none-any.whl
```

> **Tip:** If you have installation issues, check your environment's internet access or file path.
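If the install output is noisy, one way to confirm the package actually landed is to ask the standard library's package metadata for its version. This is a minimal sketch using `importlib.metadata`. The distribution name `fabric_data_agent_sdk` is assumed to match the pip install name above. Note that the importable module is `fabric.dataagent`, which has a different name from the distribution.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution name, taken from the %pip install command above.
version = installed_version("fabric_data_agent_sdk")
print(f"fabric_data_agent_sdk: {version or 'not installed'}")
```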

## What Are Few-Shot Examples?  
Few-shot examples are **curated pairs of natural-language questions and corresponding SQL queries** that teach the Data Agent how to respond to new questions. By providing these examples, you give the agent concrete patterns to follow—improving its ability to generate accurate, schema-aware SQL. Poorly designed examples, however, can confuse the agent and reduce accuracy.  
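Concretely, a few-shot set is just a list of question/SQL pairs. The sketch below uses the same dictionary keys (`"natural language"` and `"sql"`) that this notebook builds in Step 2. The questions themselves are hypothetical. The table and column names are borrowed from the benchmark JSON preview later in the notebook.

```python
# Hypothetical few-shot examples in the same shape this notebook builds in Step 2:
# each item pairs a "natural language" question with the "sql" that answers it.
few_shot_examples = [
    {
        "natural language": "What is the total sales amount for each company?",
        "sql": "SELECT Company, SUM(SalesAmount) AS TotalSales "
               "FROM demo_sales_data GROUP BY Company;",
    },
    {
        "natural language": "Show me the sales for Xla Core.",
        "sql": "SELECT SUM(SalesAmount) FROM demo_sales_data WHERE Product = 'Xla Core';",
    },
]

# Print each pair so the question-to-SQL pattern is easy to read.
for example in few_shot_examples:
    print(f"Q: {example['natural language']}\nSQL: {example['sql']}\n")
```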

## Why Validate Few-Shot Examples?  
- **Ensure quality:** Confirm that examples are clear, correct, and representative of real user questions.  
- **Boost performance:** High-quality examples enable the Data Agent to produce better SQL and answers.  
- **Get actionable feedback:** The validator reviews your examples and highlights issues to help you refine them.  
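Before spending LLM calls on validation, cheap local checks can catch purely structural problems. The helper below, `precheck_example`, is hypothetical and is not part of `fabric_data_agent_sdk`. It only looks at shape (empty fields, statement type, statement count, presence of `FROM`). It cannot tell whether the SQL answers the question, and that is exactly what the SDK validator is for.

```python
import re


def precheck_example(example: dict) -> list[str]:
    """Hypothetical local checks to run before LLM-based validation.

    Returns a list of human-readable issues; an empty list means no
    structural problems were found (not that the example is correct).
    """
    issues = []
    question = example.get("natural language", "").strip()
    sql = example.get("sql", "").strip()
    if not question:
        issues.append("empty question")
    if not sql:
        issues.append("empty SQL")
        return issues
    if not re.match(r"(?i)(with|select)\b", sql):
        issues.append("SQL does not start with SELECT or WITH")
    # Naive split on ';' (ignores semicolons inside string literals).
    statements = [s for s in sql.split(";") if s.strip()]
    if len(statements) > 1:
        issues.append("SQL contains multiple statements")
    if not re.search(r"(?i)\bfrom\b", sql):
        issues.append("SQL has no FROM clause")
    return issues


# Passes the structural checks even though it ignores "in Q1" in the question.
print(precheck_example({
    "natural language": "Show me the sales for Xla Core in Q1.",
    "sql": "SELECT SUM(SalesAmount) FROM demo_sales_data WHERE Product = 'Xla Core';",
}))  # []
```

The example above is taken from the benchmark file, where it is labeled as poor quality. It passes every local check, which shows why semantic validation is still needed.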

## How This Notebook Helps  
This notebook guides you through using the Data Agent SDK to **validate, analyze, and improve** your few-shot examples. You’ll load sample data and example sets, run validations, review feedback, and iteratively refine your examples—helping you maximize accuracy, consistency, and reliability in the Data Agent’s results.  


In [1]:
%pip install -q fabric_data_agent_sdk 
# %pip install -q /lakehouse/default/Files/fabric_data_agent_sdk-0.1.12a0-py3-none-any.whl


## Step 2: Load Synthetic Data and Few-Shot Examples

In this step, you’ll load a **synthetic dataset** along with its **few-shot examples** to explore the validation and feedback workflow.  

- **Synthetic Dataset:** A sample sales data table loaded from a CSV file or an existing Lakehouse table.  
- **Few-Shot Examples:** A JSON file containing natural language questions paired with SQL queries.  

Public HTTPS sources used in this notebook:
- CSV: https://synapseaisolutionsa.z13.web.core.windows.net/data/DataAgent/few_shots/demo_sales_data.csv
- JSON: https://synapseaisolutionsa.z13.web.core.windows.net/data/DataAgent/few_shots/benchmark_fewshot_examples.json

You can load the data using either **Spark** or **Pandas**, depending on your environment. Double-check that the file paths or URLs you provide are correct.

```python
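# Example: Load a CSV file from a Lakehouse path into a Spark DataFrame.
# (Spark may not read HTTPS URLs directly; the next cell uses pandas for the public URL.)
# df = spark.read.format("csv").option("header","true").load("<path-to-demo_sales_data.csv>")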
# df.write.mode("overwrite").format("delta").saveAsTable("sales_data")
```
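The few-shot JSON can also be loaded with Pandas alone. This is a minimal sketch: `pd.read_json` accepts a path, URL, or file-like object. The records are assumed to carry `question` and `sql` fields, as in the benchmark file used here. The hypothetical `load_examples` helper also strips whitespace and drops duplicate questions so that one pattern is not over-weighted. For an offline demonstration, an inline `StringIO` stands in for the URL.

```python
import io

import pandas as pd


def load_examples(source) -> list[dict]:
    """Load few-shot records with pandas and convert them to validator input."""
    records = pd.read_json(source)
    records["question"] = records["question"].str.strip()
    records["sql"] = records["sql"].str.strip()
    # Keep the first occurrence of each question after whitespace cleanup.
    records = records.drop_duplicates(subset="question")
    return [
        {"natural language": q, "sql": s}
        for q, s in zip(records["question"], records["sql"])
    ]


# Inline stand-in for the benchmark JSON URL: two records that differ only by whitespace.
sample_json = io.StringIO("""[
  {"question": " Show me the sales for Xla Core in Q1. ",
   "sql": "SELECT SUM(SalesAmount) FROM demo_sales_data WHERE Product = 'Xla Core';",
   "ground_truth_quality": "no"},
  {"question": "Show me the sales for Xla Core in Q1.",
   "sql": "SELECT SUM(SalesAmount) FROM demo_sales_data WHERE Product = 'Xla Core';",
   "ground_truth_quality": "no"}
]""")

loaded = load_examples(sample_json)
print(f"Loaded {len(loaded)} unique examples.")
```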



In [2]:
# Read demo_sales_data.csv from the public HTTPS URL using pandas, then convert to a Spark DataFrame
import pandas as pd
csv_url = "https://synapseaisolutionsa.z13.web.core.windows.net/data/DataAgent/few_shots/demo_sales_data.csv"
pdf = pd.read_csv(csv_url)
# Create a Spark DataFrame from the pandas DataFrame
df = spark.createDataFrame(pdf)
# Persist as a Delta table named 'sales_data' (overwrite if exists)
df.write.mode("overwrite").format("delta").saveAsTable("sales_data")


In [3]:
# Read from the table we created above and show a sample
df = spark.sql("SELECT * FROM sales_data LIMIT 1000")
display(df.head(10))


In [4]:
import requests  # For fetching JSON over HTTPS

# Load the JSON file containing your few-shot examples from the public HTTPS URL
json_url = "https://synapseaisolutionsa.z13.web.core.windows.net/data/DataAgent/few_shots/benchmark_fewshot_examples.json"
resp = requests.get(json_url, timeout=30)  # Fail fast instead of hanging on network issues
resp.raise_for_status()
data = resp.json()

# Quick preview and transform into the expected examples list
print(type(data))
print(data[:2])  # Preview the first two items to inspect the format

examples = [
    {"natural language": item["question"].strip(),
     "sql": item["sql"].strip()} for item in data
]

print(f"Loaded {len(examples)} examples.")


<class 'list'>
[{'question': 'Show me interesting comments for CloudEdge.', 'sql': "SELECT * FROM demo_sales_data WHERE Company = 'CloudEdge' AND LENGTH(Feature) > 10;", 'ground_truth_quality': 'yes'}, {'question': 'Show me the sales for Xla Core in Q1.', 'sql': "SELECT SUM(SalesAmount) FROM demo_sales_data WHERE Product = 'Xla Core';", 'ground_truth_quality': 'no'}]
Loaded 17 examples.
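The preview shows the example SQL querying a table named `demo_sales_data`, while the data in this notebook was saved with `saveAsTable("sales_data")`. Whether this matters depends on how the validator evaluates examples (it may not execute the SQL at all). Still, a quick check can surface such mismatches. The sketch below is a rough regex heuristic, not a SQL parser. It collects the identifiers that follow `FROM` or `JOIN`, so subqueries and keywords inside string literals are not handled.

```python
import re

# Matches the identifier after FROM or JOIN, allowing schema-qualified names.
# Subqueries such as "FROM (SELECT ..." are skipped because "(" is not matched.
TABLE_PATTERN = re.compile(r"(?i)\b(?:from|join)\s+([A-Za-z_][\w.]*)")


def referenced_tables(examples: list[dict]) -> set[str]:
    """Return the lower-cased table names referenced by the examples' SQL."""
    tables = set()
    for example in examples:
        tables.update(name.lower() for name in TABLE_PATTERN.findall(example["sql"]))
    return tables


# Two SQL strings copied from the preview above; in the notebook, pass `examples`.
sample_examples = [
    {"natural language": "Show me interesting comments for CloudEdge.",
     "sql": "SELECT * FROM demo_sales_data WHERE Company = 'CloudEdge' AND LENGTH(Feature) > 10;"},
    {"natural language": "Show me the sales for Xla Core in Q1.",
     "sql": "SELECT SUM(SalesAmount) FROM demo_sales_data WHERE Product = 'Xla Core';"},
]

saved_tables = {"sales_data"}  # The name passed to saveAsTable in Step 2
print(sorted(referenced_tables(sample_examples) - saved_tables))  # ['demo_sales_data']
```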


### Prepare evaluation client and libraries

Prereqs for this cell:
- SDK installed (see Step 1).
- Using Fabric built-in LLM endpoint: ensure Fabric authentication is configured for your environment (managed identity, cluster credentials, or Fabric auth). No OpenAI API key is required when using Fabric's built-in LLM.

The cell below imports the evaluation helpers and configures the LLM client to use the Fabric endpoint.

In [5]:
# Import the evaluation module from the Fabric Data Agent SDK.
# This module contains tools to validate and score your few-shot examples.
import fabric.dataagent.evaluation as fewshot_evaluation  

# Import required libraries
import openai  # OpenAI client library, passed to the evaluation functions below
import pandas as pd  # For working with tabular data and organizing results

# Import specific helper functions from the evaluation module:
# - evaluate_few_shot_examples: runs validation & evaluation of few-shot examples
# - cases_to_dataframe: converts evaluation cases into a Pandas DataFrame
from fabric.dataagent.evaluation import (
    evaluate_few_shot_examples, 
    cases_to_dataframe
)

# Specify the model used for scoring the examples.
model_name = 'gpt-4o'

# Set the API version for the OpenAI client.
openai.api_version = "2023-05-15"

# Pass the openai module itself (not a client instance) to the evaluation functions.
llm_client = openai



## Step 3: Validate and Get Feedback on Your Few-Shot Examples  

Now that your data and examples are loaded, it’s time to **validate them with the Data Agent SDK**. Running the validator helps you understand how well your examples guide the Data Agent and where improvements are needed.  

When you run the validator, it will:  
- **Check accuracy:** Assess whether each SQL query correctly answers its natural-language question.  
- **Flag issues:** Show which examples succeed and which require refinement.  
- **Measure quality:** Calculate a success rate so you can easily track improvements over time.  

```python
result = evaluate_few_shot_examples(
    examples,
    llm_client=llm_client,
    model_name=model_name,
    batch_size=20,
    use_fabric_llm=True
)
print(f"Success rate: {result.success_rate:.2f}% ({result.success_count}/{result.total})")
```


In [6]:
# Evaluate few-shot examples using the Data Agent SDK.
# This runs validation on your natural-language/SQL pairs and returns a summary of results.
result = evaluate_few_shot_examples(
    examples,              # The list of few-shot examples you loaded and formatted earlier
    llm_client=llm_client, # The OpenAI/Fabric LLM client you configured
    model_name=model_name, # Model to use for evaluation, e.g., 'gpt-4o'
    batch_size=20,         # Number of examples to evaluate per batch
    use_fabric_llm=True    # Whether to use the Fabric LLM wrapper for evaluation
)

# Print out the overall success rate of your examples.
# This shows how many examples passed validation vs. the total tested.
print(f"Success rate: {result.success_rate:.2f}% ({result.success_count}/{result.total})")



Success rate: 70.59% (12/17)
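The printed figure is consistent with `success_count / total` expressed as a percentage: 12 / 17 is about 0.7059, or 70.59%. To track improvements across edits of your example set, you can recompute the same figure per run and compare. The helpers below are hypothetical and are not SDK functions. They only reproduce the arithmetic.

```python
def success_rate(success_count: int, total: int) -> float:
    """Percentage of examples that passed, in the same units the notebook prints."""
    if total == 0:
        return 0.0
    return 100.0 * success_count / total


def rate_change(before: tuple[int, int], after: tuple[int, int]) -> float:
    """Change in success rate, in percentage points, between two validation runs."""
    return success_rate(*after) - success_rate(*before)


print(f"{success_rate(12, 17):.2f}%")  # 70.59%
```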


## Step 4: Interpret and Improve Your Validation Results  

Once the validator runs, you’ll see which examples **passed** and which **failed**. This breakdown helps you pinpoint where your few-shot examples need refinement.  

- **Success Cases:** These question/SQL pairs passed validation and make good reference patterns for future examples.  
- **Failure Cases:** These examples failed validation and need review: the SQL may not answer the question, or the question/query pair may be unclear or invalid.  

Use the feedback to **iterate and improve** your few-shot examples. Strengthening weak examples helps the Data Agent generate more accurate SQL and answers over time.  
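The benchmark JSON also carries a `ground_truth_quality` label (`yes`/`no`) for each item, so you can check how often the validator agrees with it. The sketch below cross-tabulates the labels against validator outcomes with `pd.crosstab`. How you extract the passing questions from `success_df` depends on the columns that `cases_to_dataframe` returns, which this sketch does not assume. The outcome used here is hypothetical and only illustrates the mechanics.

```python
import pandas as pd


def agreement_table(records: list[dict], passed_questions: set[str]) -> pd.DataFrame:
    """Cross-tabulate ground-truth quality labels against validator outcomes."""
    frame = pd.DataFrame(records)
    passed = frame["question"].str.strip().isin(passed_questions)
    frame["validator"] = passed.map({True: "passed", False: "failed"})
    return pd.crosstab(frame["ground_truth_quality"], frame["validator"])


# Raw records in the shape of the JSON preview in Step 2 (in the notebook: `data`).
records = [
    {"question": "Show me interesting comments for CloudEdge.", "ground_truth_quality": "yes"},
    {"question": "Show me the sales for Xla Core in Q1.", "ground_truth_quality": "no"},
]
# Hypothetical validator outcome, for illustration only.
passed_questions = {"Show me interesting comments for CloudEdge."}

table = agreement_table(records, passed_questions)
print(table)
```

Off-diagonal counts (for example, a `no` label that passed) point at examples worth a manual look.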

```python
# Convert success and failure cases to Pandas DataFrames for easy inspection
success_df = cases_to_dataframe(result.success_cases)
failure_df = cases_to_dataframe(result.failure_cases)

# Display the two DataFrames one after the other for analysis
display(success_df)
display(failure_df)
```


In [7]:
# Convert the validator's success and failure cases into Pandas DataFrames
# This makes it easy to inspect the results, filter, and sort
success_df = cases_to_dataframe(result.success_cases)
failure_df = cases_to_dataframe(result.failure_cases)

# Print a label and display the successful examples
print("Success Cases:")
display(success_df)  # Shows examples where the SQL matched the expected answer

# Print a label and display the failed examples
print("Failure Cases:")
display(failure_df)  # Shows examples that need review or improvement



Success Cases:



Failure Cases:

