##### **Evaluation Setup**:
* Install the Fabric Data Agent SDK with pip.
* Load a **DataFrame** with "question" and "expected_answer" columns.
  * You can edit the in-cell DataFrame directly.
  * Or upload a CSV file in "question,expected_answer" format to the lakehouse.
    * Copy the file path and load the data into a DataFrame using pandas.read_csv("<lakehouse_filepath>")
* Invoke the evaluate_data_agent API with data_frame, **data_agent_name**, workspace_name (Optional), table_name (Optional), data_agent_stage (Optional).
  * data_agent_name : Name of the Data Agent.
  * workspace_name (Optional) : Workspace name, if the Data Agent is in a different workspace. Default value is None.
  * table_name (Optional) : Name of the output table that stores the evaluation result. Default table name is 'evaluation_output'.
    * After evaluation there will be two tables: <table_name> with the evaluation output, and <table_name>_steps with the detailed steps.
  * data_agent_stage (Optional) : Data Agent stage, either sandbox or production. Default value is production.
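
The optional parameters above follow two simple conventions: the stage is one of two values, and the steps table name is derived from table_name. The sketch below makes those conventions explicit so a typo can be caught before a run starts. The helper names `evaluation_table_names` and `check_stage` are hypothetical and not part of the SDK; the SDK may validate these inputs differently.

```python
# Hypothetical helpers (not part of the Fabric Data Agent SDK) that mirror
# the conventions described in the parameter list above.
VALID_STAGES = {"sandbox", "production"}


def evaluation_table_names(table_name="evaluation_output"):
    """Return (output_table, steps_table) for the given table_name."""
    return table_name, f"{table_name}_steps"


def check_stage(data_agent_stage="production"):
    """Raise early if the stage is not one of the two documented values."""
    if data_agent_stage not in VALID_STAGES:
        raise ValueError(
            f"data_agent_stage must be one of {sorted(VALID_STAGES)}, "
            f"got {data_agent_stage!r}"
        )
    return data_agent_stage


output_table, steps_table = evaluation_table_names("demo_evaluation_output_report")
print(output_table)
print(steps_table)
print(check_stage("sandbox"))
```

Running a check like this before the evaluation call is optional; it only avoids waiting on a run that was pointed at a mistyped stage or table.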


##### Install Fabric Data Agent SDK

In [1]:
%pip install -U fabric-data-agent-sdk

##### Load the DataFrame using in-cell initialization or an input CSV file

In [2]:
import pandas as pd

# Create a DataFrame with "question" and "expected_answer" columns. Update the questions and expected answers as required.
df = pd.DataFrame(columns=["question", "expected_answer"],
                  data=[
                    ["show total sales for Canadian Dollar for January 2013", "46,117.30."],
                    ["what is the product with the highest total sales for Canadian Dollar in 2013", "Mountain-200 Black, 42"],
                    ["Total sales outside of US", "19968887.95"],
                    ["which product category had the highest total sales for Canadian Dollar in 2013", "Bikes (Total Sales: 938654.76)"]
                ])

# Load from input CSV file with data in format "question,expected_answer"
# input_file_path = "abfss://AgentEvaluation@dxt-onelake.dfs.fabric.microsoft.com/KaggleDataSetsLH.Lakehouse/Files/datasets/GeoNuclearData.csv"
# df = pd.read_csv(input_file_path)
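
Expected answers such as "46,117.30." contain commas, so a CSV file in "question,expected_answer" format needs those fields quoted or they will split into extra columns. The following is a minimal sketch, using only the Python standard library, of how such a file could be written and checked before uploading it to the lakehouse. The function name `to_csv_text` is an assumption for illustration; pandas.read_csv reads standard quoted CSV like this by default.

```python
import csv
import io

# Sample rows taken from the in-cell DataFrame above.
rows = [
    ("show total sales for Canadian Dollar for January 2013", "46,117.30."),
    ("Total sales outside of US", "19968887.95"),
]


def to_csv_text(rows):
    """Serialize (question, expected_answer) pairs as CSV text with a header."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL quotes only the fields that contain a comma, quote, or newline.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(["question", "expected_answer"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)

# Read the text back to confirm each row still has exactly two columns.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
for row in parsed:
    print(row["question"], "->", row["expected_answer"])
```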


##### Invoke Evaluation API with input parameters

In [3]:
from fabric.dataagent.evaluation import evaluate_data_agent

# Data Agent name
data_agent_name = "AgentEvaluation"

# Workspace Name (Optional) if Data Agent is in different workspace
workspace_name = None

# Table name (Optional) to store the evaluation result. Default value is 'evaluation_output'.
# After evaluation there will be two tables: <table_name> with the evaluation output, and <table_name>_steps with the detailed steps.
table_name = "demo_evaluation_output_report"

# Data Agent stage, i.e., sandbox or production. Defaults to production.
data_agent_stage = "production"

# Evaluate the Data Agent. Returns the unique id for the evaluation run
evaluation_id = evaluate_data_agent(df, data_agent_name, workspace_name=workspace_name, table_name=table_name, data_agent_stage=data_agent_stage)

print(f"Unique Id for the current evaluation run: {evaluation_id}")

##### Overall summary of an evaluation stored in the input table.
Returns a DataFrame with the summary details.

Input Parameters:
* table_name (Optional) : Table name which contains the evaluation result. Default value is 'evaluation_output'
* verbose (Optional) : Flag to display the summary. Default is False.

In [4]:
from fabric.dataagent.evaluation import get_evaluation_summary

# Table name (Optional) that contains the evaluation result. Default value is 'evaluation_output'.
# An evaluation writes two tables: <table_name> with the evaluation output, and <table_name>_steps with the detailed steps.
table_name = "demo_evaluation_output_report"

get_evaluation_summary(table_name)

##### Evaluation details of a single run
Returns the DataFrame with evaluation details.

Input Parameters:
* evaluation_id : Unique Id for an evaluation run.
* table_name (Optional) : Table name which contains the evaluation result. Default value is 'evaluation_output'.
* get_all_rows (Optional) : Flag to get all the rows for an evaluation. Default value is False, which returns only failed evaluation rows.
* verbose (Optional) : Flag to display the summary. Default is False.

**Note**: The thread URL in the evaluation details is accessible only to the person who ran the evaluation.

In [6]:
from fabric.dataagent.evaluation import get_evaluation_details

# Unique Id for an evaluation run
evaluation_id = 'd1621f67-7948-4b24-8d6b-79aa8ebf4464'
# Evaluation output table name
table_name = "demo_evaluation_output_report"
# Flag to get all the rows for an evaluation. Default value is False, which returns only failed evaluation rows.
get_all_rows = False
# Flag to display the summary. Default is False.
verbose = True

eval_details = get_evaluation_details(evaluation_id, table_name, get_all_rows=get_all_rows, verbose=verbose)

### Advanced Options

##### Use customized prompt for evaluation
* critic_prompt (Optional): Prompt used to evaluate the actual answer from the Data Agent.
  * Use the variables **query, expected_answer and actual_answer** as placeholders.
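
A custom prompt that omits one of the three placeholders would give the critic incomplete context. The sketch below checks a prompt for the required placeholders and renders a preview with sample values. It assumes the placeholders are filled in the style of Python's str.format, which the brace syntax suggests but which is not confirmed here; `missing_placeholders` is a hypothetical helper, not an SDK function.

```python
from string import Formatter

REQUIRED_PLACEHOLDERS = {"query", "expected_answer", "actual_answer"}


def missing_placeholders(prompt):
    """Return the required placeholder names that do not appear in the prompt."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    found = {field for _, field, _, _ in Formatter().parse(prompt) if field}
    return REQUIRED_PLACEHOLDERS - found


critic_prompt = """
Given the following query, expected answer, and actual answer, please determine
if the actual answer is equivalent to expected answer.

Query: {query}
Expected Answer: {expected_answer}
Actual Answer: {actual_answer}
"""

missing = missing_placeholders(critic_prompt)
print("Missing placeholders:", sorted(missing))

# Preview with sample values (assumes str.format-style substitution).
preview = critic_prompt.format(
    query="Total sales outside of US",
    expected_answer="19968887.95",
    actual_answer="Total sales outside the US were 19,968,887.95.",
)
print(preview)
```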

In [8]:
from fabric.dataagent.evaluation import evaluate_data_agent

# Prompt (Optional) to evaluate the actual response. Use the variables query, expected_answer and actual_answer as placeholders.
critic_prompt = """
        Given the following query, expected answer, and actual answer, please determine if the actual answer is equivalent to expected answer. If they are equivalent, respond with 'yes'.

        Query: {query}

        Expected Answer:
        {expected_answer}

        Actual Answer:
        {actual_answer}

        Is the actual answer equivalent to the expected answer?
        """

# Data Agent name
data_agent_name = "AgentEvaluation"

# Evaluate the Data Agent. Returns the unique id for the evaluation run
evaluation_id = evaluate_data_agent(df, data_agent_name, critic_prompt=critic_prompt)

In [11]:
from fabric.dataagent.evaluation import get_evaluation_details

# Unique Id for an evaluation run
evaluation_id = '4e725e05-5b72-493f-b849-d8787decc188'
# Evaluation output table name
table_name = "evaluation_output"
# Flag to get all the rows for an evaluation. Default value is False, which returns only failed evaluation rows.
get_all_rows = False
# Flag to display the summary. Default is False.
verbose = True

eval_details = get_evaluation_details(evaluation_id, table_name, get_all_rows=get_all_rows, verbose=verbose)