# Evaluate an application using a manual data set

## Objective

This lab provides a step-by-step guide on how to evaluate deployed application endpoints using a manual data set.

Documentation about evaluation SDK - [azure-ai-evaluation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk)

## Before you begin

### Installation

Install the following packages required to execute this notebook. 

In [None]:
%pip install azure-ai-evaluation
%pip install promptflow-azure
%pip install azure-identity
%pip install --upgrade openai
%pip install marshmallow
%pip install python-dotenv
%pip install azure-ai-evaluation[remote]


### Parameters and imports

We start by loading the configuration from the .env file created in the previous step. We also print the config values for validation.
For simplicity, we use key-based authentication; however, the Azure AI SDK also supports managed identity.
If you haven't created a .env file yet, please check the [README](README.md).

In [None]:
from dotenv import load_dotenv
from pprint import pprint
import pandas as pd
from azure.identity import DefaultAzureCredential
import os
load_dotenv()

In [None]:
print("Environment variables loaded successfully.")
print(f"{os.environ['AZURE_OPENAI_API_VERSION']}")
print(f"{os.environ['AZURE_OPENAI_DEPLOYMENT']}")
print(f"{os.environ['AZURE_OPENAI_ENDPOINT']}")
print(f"{os.environ['AZURE_OPENAI_KEY']}")
print(f"{os.environ['AZURE_AI_FOUNDRY_RESOURCE_GROUP']}")
print(f"{os.environ['APPLICATION_ENDPOINT']}")
print(f"{os.environ['APPLICATION_KEY']}")
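Printing keys in full leaves them in the notebook output. As a hedged alternative, the sketch below prints the same variables but masks the secret ones. `mask_secret` and `describe_config` are hypothetical helpers written for this illustration; they are not part of any Azure SDK.

```python
import os


def mask_secret(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def describe_config(names, secret_names, env=os.environ):
    """Build one 'NAME=value' line per variable, masking the secret ones."""
    lines = []
    for name in names:
        value = env.get(name)
        if value is None:
            # Report a missing variable instead of raising KeyError.
            lines.append(f"{name}=<missing>")
        elif name in secret_names:
            lines.append(f"{name}={mask_secret(value)}")
        else:
            lines.append(f"{name}={value}")
    return lines


# Variable names taken from the cell above; the two keys are treated as secrets.
for line in describe_config(
    ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY", "APPLICATION_ENDPOINT", "APPLICATION_KEY"],
    {"AZURE_OPENAI_KEY", "APPLICATION_KEY"},
):
    print(line)
```

A missing variable shows up as `<missing>` rather than stopping the cell, which makes it easier to see every gap in the .env file at once.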


## Target Application

We will use the Evaluate API provided by the Azure AI Evaluation SDK. It requires a target endpoint or Python function that handles the call to the application endpoint or to an LLM inference endpoint.
In this lab we use [application_endpoint.py](application_endpoint.py) to call an application API.
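The contents of application_endpoint.py are not reproduced in this notebook. As a rough sketch of what such a target can look like, the class below is a callable that takes a `query` and returns a dictionary with a `response` key, matching the `${target.response}` references used later. The class name `SketchEndpoint`, the request payload shape, the `answer` reply field, and the bearer-token header are all assumptions made for illustration.

```python
import json
import urllib.request


def _post_json(url: str, key: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON reply (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.loads(reply.read().decode("utf-8"))


class SketchEndpoint:
    """Hypothetical target: forwards one query and returns the answer."""

    def __init__(self, url: str, key: str, send=_post_json):
        self._url = url
        self._key = key
        self._send = send  # injectable so the class can be tried without a network

    def __call__(self, *, query: str, **kwargs) -> dict:
        # Extra columns from the data row (for example context) are accepted and ignored.
        reply = self._send(self._url, self._key, {"query": query})
        # The "answer" field name is an assumption about the application's reply.
        return {"response": reply.get("answer", "")}
```

Keeping the HTTP call behind the `send` parameter lets you exercise the target with a stub function before pointing it at the real endpoint.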



## Data

The following code reads the JSON Lines file "manual_data.jsonl", which contains the inputs to the application endpoint function. Each line provides a question, context, and ground truth.

In [None]:
df = pd.read_json("manual_data.jsonl", lines=True)
print(df.head())
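Before running an evaluation it can help to confirm that every line of the file is a usable record. The sketch below is one way to do that with the standard library. The key names `query` and `context` come from the column mappings used later in this lab; `ground_truth` is an assumed name for the ground-truth field, and `find_invalid_lines` is a hypothetical helper.

```python
import json

# "ground_truth" is an assumed key name; adjust it to match your file.
REQUIRED_KEYS = ("query", "context", "ground_truth")


def find_invalid_lines(text: str, required=REQUIRED_KEYS):
    """Return (line_number, problem) pairs for lines that are not usable records."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [key for key in required if key not in record]
        if missing:
            problems.append((number, "missing: " + ", ".join(missing)))
    return problems


# Made-up sample content, only to show the expected shape of each line.
sample = "\n".join(
    [
        '{"query": "q1", "context": "c1", "ground_truth": "g1"}',
        '{"query": "q2", "context": "c2"}',
        '{"query": "q3", ',
    ]
)
print(find_invalid_lines(sample))
```

To check the real file, pass `pathlib.Path("manual_data.jsonl").read_text(encoding="utf-8")` instead of `sample`.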

## Configuration
To use the AI-assisted evaluators, we need the details of an LLM that acts as the judge. These details are passed to the evaluators as a model config.

In [None]:
import os

model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
    "api_key": os.environ.get("AZURE_OPENAI_KEY"),
}

To visualise the output, we provide the Azure AI project details so that traces and evaluation results are pushed to the project in Azure AI Studio. NOTE: This is not required in order to use the Azure AI Evaluation SDK. The SDK returns the evaluation result directly, so it can be used in a CI/CD pipeline like a traditional unit test.

In [None]:
azure_ai_project = {
    "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
    "resource_group_name": os.environ["AZURE_AI_FOUNDRY_RESOURCE_GROUP"],
    "project_name": os.environ["AZURE_AI_FOUNDRY_PROJECT_NAME"],
}

## Run the evaluation

The following code runs the Evaluate API and uses Content Safety and other metrics, such as Groundedness, to evaluate the responses returned by the application endpoint.

The Evaluate API requires the following parameters.

+   Data file: The data file 'manual_data.jsonl' in JSON Lines format. Each line contains a question, context, and ground truth for the evaluators.

+   Application Target: The name of the Python class that routes calls to the specific application endpoints.

+   Evaluators: The list of evaluators used to evaluate the given prompts (questions) as input and the output (answers) from the LLM models.

NOTE: If you get an error about accessing the storage account, please enable key access for your storage account.
<details>
    <img width="500px" height="500px" src="storageconfiguration.png" alt="Storage Account Configuration" />
</details>

In [None]:
os.environ['PF_LOGGING_LEVEL'] = 'DEBUG'

In [None]:
import pathlib

from azure.ai.evaluation import evaluate
from azure.ai.evaluation import (
    ContentSafetyEvaluator,
    RelevanceEvaluator,
    CoherenceEvaluator,
    GroundednessEvaluator,
    FluencyEvaluator,
    SimilarityEvaluator,
    GroundednessProEvaluator,
    IndirectAttackEvaluator,
)
from application_endpoint import ApplicationEndpoint
from datetime import datetime


content_safety_evaluator = ContentSafetyEvaluator(
    azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()
)
relevance_evaluator = RelevanceEvaluator(model_config)
coherence_evaluator = CoherenceEvaluator(model_config)
groundedness_evaluator = GroundednessEvaluator(model_config)
groundedness_pro_eval = GroundednessProEvaluator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())

fluency_evaluator = FluencyEvaluator(model_config)
similarity_evaluator = SimilarityEvaluator(model_config)
indirect_attack_evaluator = IndirectAttackEvaluator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential())

path = str(pathlib.Path.cwd() / "manual_data.jsonl")

current_date = datetime.now().strftime("%Y-%m-%d")
evaluation_name = f"Manual-Data-Eval-Run-{current_date}"

results = evaluate(
    evaluation_name=evaluation_name,
    data=path,
    target=ApplicationEndpoint(),
    evaluators={
        "content_safety": content_safety_evaluator,
        "coherence": coherence_evaluator,
        "relevance": relevance_evaluator,
        "groundedness": groundedness_evaluator,
        "fluency": fluency_evaluator,
        "similarity": similarity_evaluator,
        "groundedness_pro": groundedness_pro_eval,
        "indirect_attack": indirect_attack_evaluator,
    },
    azure_ai_project=azure_ai_project,
    evaluator_config={
        "content_safety": {"column_mapping": {"query": "${data.query}", "response": "${target.response}"}},
        "coherence": {"column_mapping": {"response": "${target.response}", "query": "${data.query}"}},
        "relevance": {
            "column_mapping": {"response": "${target.response}", "context": "${data.context}", "query": "${data.query}"}
        },
        "groundedness": {
            "column_mapping": {
                "response": "${target.response}",
                "context": "${data.context}",
                "query": "${data.query}",
            }
        },
        "groundedness_pro": {
            "column_mapping": {
                "response": "${target.response}",
                "context": "${data.context}",
                "query": "${data.query}",
            }
        },
        "indirect_attack": {
            "column_mapping": {
                "response": "${target.response}",
                "query": "${data.query}",
            }
        },
        "fluency": {
            "column_mapping": {"response": "${target.response}", "context": "${data.context}", "query": "${data.query}"}
        },
        "similarity": {
            "column_mapping": {"response": "${target.response}", "context": "${data.context}", "query": "${data.query}"}
        },
    },
)
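The `column_mapping` entries in the cell above use two kinds of references: `${data.<column>}` points at a column of the data file, and `${target.<key>}` points at a key of the dictionary returned by the target. The toy resolver below illustrates that reading for a single row. It is a simplified illustration only, not the SDK's implementation, and `resolve_mapping` is a hypothetical name.

```python
import re

# Matches references such as "${data.query}" or "${target.response}".
_REFERENCE = re.compile(r"^\$\{(data|target)\.([^}]+)\}$")


def resolve_mapping(column_mapping: dict, data_row: dict, target_output: dict) -> dict:
    """Build the keyword arguments one evaluator would receive for one row."""
    sources = {"data": data_row, "target": target_output}
    resolved = {}
    for argument, reference in column_mapping.items():
        match = _REFERENCE.match(reference)
        if match is None:
            raise ValueError(f"Unsupported reference: {reference!r}")
        source, column = match.groups()
        resolved[argument] = sources[source][column]
    return resolved


# Made-up row and target output, only to show the lookup.
row = {"query": "What is the refund window?", "context": "Refunds within 30 days."}
output = {"response": "You can get a refund within 30 days."}
mapping = {
    "response": "${target.response}",
    "context": "${data.context}",
    "query": "${data.query}",
}
print(resolve_mapping(mapping, row, output))
```

Reading the mappings this way makes it easier to spot an evaluator whose inputs point at a column that the data file or the target does not provide.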

View the results returned by the Evaluate API.

In [None]:
pd.DataFrame(results["rows"])
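Because the evaluation result is returned to the caller, it can also act as a pass/fail gate in a CI/CD pipeline, much like a unit test. The sketch below compares aggregate scores against minimum thresholds. It assumes the aggregate scores are available as a flat dictionary (for example under `results["metrics"]`); the metric names and threshold values shown are hypothetical and should be replaced with the keys your run actually reports.

```python
def find_failures(metrics: dict, minimums: dict) -> list:
    """Return a message for every metric that is missing or below its minimum."""
    failures = []
    for name, minimum in minimums.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric not reported")
        elif value < minimum:
            failures.append(f"{name}: {value} is below {minimum}")
    return failures


# Hypothetical aggregate scores and thresholds, for illustration only.
example_metrics = {"groundedness.groundedness": 4.2, "relevance.relevance": 2.5}
example_minimums = {
    "groundedness.groundedness": 3.0,
    "relevance.relevance": 3.0,
    "fluency.fluency": 3.0,
}
for message in find_failures(example_metrics, example_minimums):
    print(message)
```

In a pipeline, a non-empty list would typically be turned into a failing exit code, for example with `sys.exit(1)`.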

You can view the detailed results in the AI project, for both quality evaluations and risk and safety evaluations.
<img width="60%" height="60%" src="aifoundryevaltab.png" alt="Ai Foundry SDK" />

<img width="60%" height="60%" src="evalrundetail.png" alt="Ai Foundry SDK" />
