# Evaluate an application using a manual dataset

## Objective

This lab provides a step-by-step guide to evaluating deployed application endpoints using a manual dataset.

Documentation for the evaluation SDK: [azure-ai-evaluation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk)

## Before you begin

### Installation

Install the following packages required to execute this notebook. 

In [19]:
%pip install azure-ai-evaluation
%pip install promptflow-azure
%pip install azure-identity
%pip install --upgrade openai
%pip install marshmallow
%pip install python-dotenv
%pip install azure-ai-evaluation[remote]


### Parameters and imports

We start by loading the configuration from the .env file created in the previous step. We also print the config values for validation.
For simplicity, we use key-based authentication; however, the Azure AI SDK also supports managed identity.
If you haven't created the file yet, please check the [README](README.md).

**Note:** This notebook uses the shared `.credentials.env` file from the parent directory. 

For Lab4, you'll also need to add these variables to `.credentials.env`:
```
APPLICATION_ENDPOINT="<your-application-endpoint>"
APPLICATION_KEY="<your-application-key>"
```

These will be provided by your instructor or team channel.

In [20]:
from dotenv import load_dotenv
from pprint import pprint
import pandas as pd
from azure.identity import DefaultAzureCredential
import os

# Load from parent directory's .credentials.env file
load_dotenv("../.credentials.env")

True

In [21]:
print("Environment variables loaded successfully.")
# These use the same variable names as the other labs
print(f"API Version: {os.environ.get('AZURE_OPENAI_API_VERSION', 'Not set')}")
print(f"Deployment: {os.environ.get('AZURE_OPENAI_DEPLOYMENT', 'Not set')}")
print(f"Endpoint: {os.environ.get('AZURE_OPENAI_ENDPOINT', 'Not set')}")
print(f"Has API Key: {bool(os.environ.get('AZURE_OPENAI_KEY'))}")
print(f"Resource Group: {os.environ.get('AZURE_RESOURCE_GROUP', 'Not set')}")
print(f"Project Name: {os.environ.get('AZURE_PROJECT_NAME', 'Not set')}")
print(f"Application Endpoint: {os.environ.get('APPLICATION_ENDPOINT', 'Not set - you may need to add this')}")
print(f"Has Application Key: {bool(os.environ.get('APPLICATION_KEY'))}")
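
The printout above relies on visual inspection. As a minimal sketch, a small helper could list the variables that are still missing before any later cell fails on them. The helper name `find_missing_vars` is hypothetical, and the list of required names is an assumption drawn from the variables this notebook reads.

```python
import os

# Variable names taken from the cells in this notebook; adjust the list
# if your .credentials.env uses different names.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_PROJECT_NAME",
]


def find_missing_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = find_missing_vars(REQUIRED_VARS)
if missing:
    print("Missing variables:", ", ".join(missing))
else:
    print("All required variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.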

## Target Application

We will use the Evaluate API provided by the Azure AI Evaluation SDK. It requires a target endpoint or Python function that handles the call to the application endpoint or to an LLM inference endpoint.
In this lab we use [application_endpoint.py](application_endpoint.py) to call the application API.
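
To make the idea of a target concrete, here is a hypothetical stand-in written for illustration. It is not the contents of `application_endpoint.py`. It assumes the SDK calls the target with dataset columns as keyword arguments and that the returned dictionary key (`response`) is what the evaluation cell later refers to as `${target.response}`.

```python
class EchoEndpoint:
    """Hypothetical target used only to illustrate the expected shape."""

    def __call__(self, *, query: str) -> dict:
        # A real target would send `query` to the application API here
        # and return the answer it receives.
        return {"response": f"You asked: {query}"}


target = EchoEndpoint()
output = target(query="What is Responsible AI?")
print(output)
```

A callable class is used here because the evaluation cell instantiates its target as `ApplicationEndpoint()`.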



## Data

The following code reads the JSON Lines file "manual_data.jsonl", which contains the inputs to the application endpoint function. Each line provides a question, its context, and the ground truth.

In [22]:
df = pd.read_json("manual_data.jsonl", lines=True)
print(df.head())
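
Before running an evaluation, it can help to confirm that every record carries the fields the evaluators need. The sketch below is one way to do that with the standard library. The key names `query`, `context` and `ground_truth` are assumed from the column mappings used later in this notebook, and `find_incomplete_lines` is a hypothetical helper.

```python
import json

# Assumed from the column mappings in the evaluation cell.
EXPECTED_KEYS = {"query", "context", "ground_truth"}


def find_incomplete_lines(lines, expected=EXPECTED_KEYS):
    """Return (line_number, missing_keys) for each JSONL record lacking a key."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        missing = sorted(expected - set(record))
        if missing:
            problems.append((number, missing))
    return problems


# Inline sample data for illustration only.
sample = [
    '{"query": "q1", "context": "c1", "ground_truth": "g1"}',
    '{"query": "q2", "context": "c2"}',
]
print(find_incomplete_lines(sample))
```

To check the real file, pass an open file handle instead of the sample list, for example `find_incomplete_lines(open("manual_data.jsonl", encoding="utf-8"))`.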

## Configuration
To use the AI-assisted evaluators, we need to provide the details of an LLM that acts as the judge. These details are passed as a model config.

In [23]:
import os

model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
    # Use AZURE_OPENAI_API_KEY from .credentials.env (Lab4 README calls it AZURE_OPENAI_KEY)
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY") or os.environ.get("AZURE_OPENAI_KEY"),
}

To visualise the output, we need to provide the Azure AI Project details so that traces and evaluation results are pushed to the project in Azure AI Studio. NOTE: This is not required to use the Azure AI Evaluation SDK. The SDK also returns the evaluation result directly, so it can be used in a CI/CD pipeline like a traditional unit test.

In [24]:
azure_ai_project = {
    "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
    "resource_group_name": os.environ["AZURE_RESOURCE_GROUP"],  # Using same variable as other labs
    "project_name": os.environ["AZURE_PROJECT_NAME"],  # Using same variable as other labs
}
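
The note above mentions using evaluation output in a CI/CD pipeline. A minimal sketch of such a gate follows. It assumes the result dictionary exposes aggregate scores under `results["metrics"]`; the metric names, values and thresholds shown are hypothetical, and `failing_metrics` is an illustrative helper rather than part of the SDK.

```python
def failing_metrics(metrics, thresholds):
    """Return {name: (actual, minimum)} for metrics below their minimum."""
    failures = {}
    for name, minimum in thresholds.items():
        actual = metrics.get(name)
        # A missing metric is treated as a failure so gaps are not ignored.
        if actual is None or actual < minimum:
            failures[name] = (actual, minimum)
    return failures


# Hypothetical metric names and values, for illustration only.
example_metrics = {"groundedness.groundedness": 4.2, "fluency.fluency": 2.5}
example_thresholds = {"groundedness.groundedness": 3.0, "fluency.fluency": 3.0}

failures = failing_metrics(example_metrics, example_thresholds)
print(failures)
```

In a pipeline, a non-empty `failures` dictionary could be turned into a failed step, for example with `assert not failures, failures`.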

## Run the evaluation

The following code runs the Evaluate API and uses Content Safety together with other metrics, such as Groundedness, to evaluate the responses returned by the target.

The Evaluate API requires the following parameters.

+   Data file: The data file 'manual_data.jsonl' in JSON Lines format. Each line contains the question, context and ground truth for the evaluators.

+   Application Target: The Python class that routes calls to the specific application endpoint.

+   Evaluators: The list of evaluators that assess the given prompts (questions) as input and the answers from the LLM as output.

NOTE: If you get an error about accessing the storage account, enable key access for your storage account.
<details>
    <img width="500px" height="500px" src="storageconfiguration.png" alt="Storage Account Configuration" />
</details>
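
The evaluation cell below wires evaluator inputs to dataset columns and target outputs using references such as `${data.query}` and `${target.response}`. The sketch below illustrates the idea behind those references. It is not the SDK's implementation, and `resolve_mapping` is a hypothetical helper written only for this explanation.

```python
import re

# Matches references of the form ${data.<column>} or ${target.<column>}.
_REFERENCE = re.compile(r"^\$\{(data|target)\.(\w+)\}$")


def resolve_mapping(column_mapping, data_row, target_output):
    """Illustrative only: look up each ${data.x} / ${target.x} reference."""
    sources = {"data": data_row, "target": target_output}
    resolved = {}
    for evaluator_input, reference in column_mapping.items():
        match = _REFERENCE.match(reference)
        if match is None:
            raise ValueError(f"Unrecognized reference: {reference!r}")
        source, column = match.groups()
        resolved[evaluator_input] = sources[source][column]
    return resolved


# Made-up row and target output, for illustration only.
mapping = {"query": "${data.query}", "response": "${target.response}"}
row = {"query": "What is Responsible AI?", "context": "some context"}
answer = {"response": "An example answer."}
print(resolve_mapping(mapping, row, answer))
```

Read this way, `data` refers to a line of the input file and `target` to whatever the target callable returned for that line.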

In [25]:
os.environ['PF_LOGGING_LEVEL'] = 'DEBUG'

In [26]:
# Check which endpoint mode is being used
# Look at the evaluation cell to see which import is active
print("📋 Endpoint Configuration Check:")
print("=" * 60)

# Check if real endpoint is configured
has_real_endpoint = os.environ.get("APPLICATION_ENDPOINT") and os.environ.get("APPLICATION_KEY")

if has_real_endpoint:
    print("✅ Real Application Endpoint Configured")
    print(f"   Endpoint: {os.environ['APPLICATION_ENDPOINT']}")
    print("\n💡 Make sure to use: from application_endpoint import ApplicationEndpoint")
else:
    print("⚠️  Real endpoint not configured - using MOCK endpoint")
    print("   This is fine for learning and testing!")
    print("\n💡 Make sure to use: from application_endpoint_mock import ApplicationEndpoint")
    print("\n📝 To use a real endpoint later, add to .credentials.env:")
    print("   APPLICATION_ENDPOINT=\"https://your-app.azurewebsites.net/score\"")
    print("   APPLICATION_KEY=\"your-api-key\"")

print("\n" + "=" * 60)

### ⚠️ Application Endpoint Options

You have three options for the application endpoint:

#### Option 1: Use Mock Endpoint (Recommended for Learning)
If you don't have access to the shared endpoint, use the mock version:
- Change the import below from `application_endpoint` to `application_endpoint_mock`
- No need for APPLICATION_ENDPOINT or APPLICATION_KEY in `.credentials.env`
- The mock simulates responses about Microsoft Responsible AI

#### Option 2: Use Shared Endpoint (If Available)
Add to `.credentials.env`:
```bash
APPLICATION_ENDPOINT="https://your-app-endpoint.azurewebsites.net/score"
APPLICATION_KEY="your-application-api-key"
```

#### Option 3: Deploy Your Own RAG Application
See `DEPLOYMENT_GUIDE.md` for instructions on deploying your own RAG endpoint using Azure AI Foundry.

**Current configuration:** Check the import statement in the evaluation cell below.
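
Instead of editing the import by hand, the choice between the real and mock modules could be derived from the environment. The following is a minimal sketch under that assumption. The module names come from this notebook, while `pick_endpoint_module` is a hypothetical helper.

```python
import os


def pick_endpoint_module(env=None):
    """Return the module name to import, based on the configured variables."""
    env = os.environ if env is None else env
    if env.get("APPLICATION_ENDPOINT") and env.get("APPLICATION_KEY"):
        return "application_endpoint"
    return "application_endpoint_mock"


print(pick_endpoint_module())
```

The returned name could then be loaded with `importlib.import_module(pick_endpoint_module()).ApplicationEndpoint`, provided both modules expose a class with that name as the imports in the evaluation cell suggest.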

In [None]:
import pathlib

from azure.ai.evaluation import evaluate
from azure.ai.evaluation import (
    ContentSafetyEvaluator,
    RelevanceEvaluator,
    CoherenceEvaluator,
    GroundednessEvaluator,
    FluencyEvaluator,
    SimilarityEvaluator,
    GroundednessProEvaluator,
    IndirectAttackEvaluator,
)

# OPTION 1 (MOCK): Use this if you don't have a real endpoint
from application_endpoint_mock import ApplicationEndpoint

# OPTION 2 (REAL): Comment out the line above and uncomment this if you have a real endpoint
# from application_endpoint import ApplicationEndpoint

from datetime import datetime


content_safety_evaluator = ContentSafetyEvaluator(
    azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()
)
relevance_evaluator = RelevanceEvaluator(model_config)
coherence_evaluator = CoherenceEvaluator(model_config)
groundedness_evaluator = GroundednessEvaluator(model_config)
groundedness_pro_eval = GroundednessProEvaluator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())

fluency_evaluator = FluencyEvaluator(model_config)
similarity_evaluator = SimilarityEvaluator(model_config)
indirect_attack_evaluator = IndirectAttackEvaluator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())

path = str(pathlib.Path.cwd() / "manual_data.jsonl")

current_date = datetime.now().strftime("%Y-%m-%d")
evaluation_name = f"Manual-Data-Eval-Run-{current_date}"

results = evaluate(
    evaluation_name=evaluation_name,
    data=path,
    target=ApplicationEndpoint(),
    evaluators={
        "content_safety": content_safety_evaluator,
        "coherence": coherence_evaluator,
        "relevance": relevance_evaluator,
        "groundedness": groundedness_evaluator,
        "fluency": fluency_evaluator,
        "similarity": similarity_evaluator,
        "groundedness_pro": groundedness_pro_eval,
        "indirect_attack": indirect_attack_evaluator,
    },
    # azure_ai_project=azure_ai_project,
    # Add credential for storage access
    credential=DefaultAzureCredential(),
    evaluator_config={
        "content_safety": {"column_mapping": {"query": "${data.query}", "response": "${target.response}"}},
        "coherence": {"column_mapping": {"query": "${data.query}", "response": "${target.response}"}},
        "relevance": {
            "column_mapping": {"query": "${data.query}", "response": "${target.response}", "context": "${data.context}"}
        },
        "groundedness": {
            "column_mapping": {
                "query": "${data.query}",
                "response": "${target.response}",
                "context": "${data.context}",
            }
        },
        "groundedness_pro": {
            "column_mapping": {
                "query": "${data.query}",
                "response": "${target.response}",
                "context": "${data.context}",
            }
        },
        "indirect_attack": {
            "column_mapping": {
                "query": "${data.query}",
                "response": "${target.response}",
            }
        },
        "fluency": {
            "column_mapping": {"query": "${data.query}", "response": "${target.response}"}
        },
        "similarity": {
            "column_mapping": {"response": "${target.response}", "ground_truth": "${data.ground_truth}"}
        },
    },
)

EvaluationException: (UserError) Failed to upload evaluation run to the cloud due to insufficient permission to access the storage. Please ensure that the necessary access rights are granted.
Visit https://aka.ms/azsdk/python/evaluation/remotetracking/troubleshoot to troubleshoot this issue.

View the row-level results returned by the Evaluate API:

In [None]:
pd.DataFrame(results["rows"])
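
When reviewing row-level results, it is often useful to pull out the rows that scored poorly. The sketch below filters a list of row dictionaries by a score column. The column name `outputs.groundedness.groundedness` and the sample values are assumptions made for illustration, so check the actual column names in the DataFrame above before reusing it.

```python
def rows_below(rows, column, minimum):
    """Return the rows whose numeric value in `column` is below `minimum`."""
    return [
        row
        for row in rows
        if isinstance(row.get(column), (int, float)) and row[column] < minimum
    ]


# Made-up rows with an assumed column name, for illustration only.
example_rows = [
    {"inputs.query": "q1", "outputs.groundedness.groundedness": 5},
    {"inputs.query": "q2", "outputs.groundedness.groundedness": 2},
    {"inputs.query": "q3", "outputs.groundedness.groundedness": None},
]

low = rows_below(example_rows, "outputs.groundedness.groundedness", 3)
print([row["inputs.query"] for row in low])
```

Rows with a missing or non-numeric score are skipped rather than counted as low, which keeps failed evaluator calls separate from genuinely low scores.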

You can view the detailed results in the AI Project, for both quality metrics and risk and safety metrics.
<img width="60%" height="60%" src="aifoundryevaltab.png" alt="Ai Foundry SDK" />

<img width="60%" height="60%" src="evalrundetail.png" alt="Ai Foundry SDK" />
