# Local Development -- evaluators and testing evaluation

Inspiration: https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/evaluate-sdk

## Documentation

Azure AI Evaluation client library for Python<br>
https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/evaluation/azure-ai-evaluation

Evaluate your Generative AI application locally with the Azure AI Evaluation SDK<br>
https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/evaluate-sdk


### Environment setup

Run the setup script: `python 00_setup.py`

## Setup

### Common packages

In [None]:
import os
import dotenv
from pathlib import Path

### Global settings

In [None]:
# Global variables
PRIVATE = False
DATA_DIR = Path("data")
TMP_DIR = Path("tmp")

### Load environment variables

In [None]:
# Load environment variables from the .env file,
# or from private.env if PRIVATE is True.
# override=True replaces variables that are already set in the environment.
dotenv.load_dotenv('.env' if not PRIVATE else 'private.env', override=True)

### Config dictionaries used by Azure AI SDK

In [None]:
# Configuration for Azure AI Foundry project
azure_ai_project = {
    "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID"),
    "resource_group_name": os.environ.get("AZURE_RESOURCE_GROUP_AI"),
    "project_name": os.environ.get("AZURE_AI_FOUNDRY_PROJECT_NAME"),
}

# Configuration for Azure OpenAI model
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
    "api_version": os.environ.get("AZURE_OPENAI_API_VERSION"),
    "type": "azure_openai"
}
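
Because `os.environ.get` returns `None` for an unset variable, a missing entry in the `.env` file only surfaces later as an authentication or request error. The following is a minimal sketch of an early check, not part of the original notebook; the helper name `find_missing_settings` and the example values are hypothetical.

```python
def find_missing_settings(config: dict) -> list:
    """Return the sorted keys whose values are None or empty strings."""
    return sorted(
        key for key, value in config.items()
        if value is None or value == ""
    )


# Hypothetical example values, standing in for a config built from os.environ.get
example_config = {
    "azure_endpoint": "https://example.invalid",
    "api_key": None,           # simulates an unset environment variable
    "azure_deployment": "",    # simulates a variable set to an empty string
    "type": "azure_openai",
}

missing = find_missing_settings(example_config)
if missing:
    print("Missing settings:", ", ".join(missing))
```

In the notebook, the same helper could be called on `azure_ai_project` and `model_config` before any evaluator is created.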

### Azure Credentials

In [None]:
# https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()

## Built-in Evaluators

In [None]:
from azure.ai.evaluation import GroundednessEvaluator

# https://learn.microsoft.com/en-us/python/api/azure-ai-evaluation/azure.ai.evaluation.groundednessevaluator
groundedness_eval = GroundednessEvaluator(model_config)

### Local testing of built-in evaluators

In [None]:
_query_response_data = dict(
    query="Which tent is the most waterproof?",
    context="The Alpine Explorer Tent is the second most water-proof of all tents available.",
    response="The Alpine Explorer Tent is the most waterproof."
)

In [None]:
# Running Groundedness Evaluator on a query and response pair
groundedness_score = groundedness_eval(
    **_query_response_data
)
print(groundedness_score)

## Custom evaluators

### Simple deterministic custom evaluator

In [None]:
from custom.answer_len.answer_length import AnswerLengthEvaluator

answer_length_evaluator = AnswerLengthEvaluator()
answer_length = answer_length_evaluator(answer="What is the speed of light?")

print(answer_length)
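
The source of `AnswerLengthEvaluator` is not shown in this notebook. A deterministic custom evaluator is usually just a callable class that returns a dictionary of metrics, so the sketch below shows what such a class could look like. The class name `SketchAnswerLengthEvaluator` and the output key `answer_length` are assumptions; the imported evaluator may differ.

```python
class SketchAnswerLengthEvaluator:
    """Hypothetical stand-in for a deterministic, code-based evaluator."""

    def __call__(self, *, answer: str, **kwargs) -> dict:
        # Return metrics as a dictionary so each key can become a result column
        return {"answer_length": len(answer)}


sketch_evaluator = SketchAnswerLengthEvaluator()
sketch_result = sketch_evaluator(answer="What is the speed of light?")
print(sketch_result)
```

Keyword-only arguments keep the call signature explicit, which matters because `evaluate` passes data columns to an evaluator by argument name.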

### LLM based custom evaluator

In [None]:
# Import your prompt-based custom evaluator
from custom.friendliness.friend import FriendlinessEvaluator

friendliness_evaluator = FriendlinessEvaluator(model_config=model_config)
friendliness_score = friendliness_evaluator(
    response="I will not apologize for my behavior!"
)
friendliness_score

## Use local compute for evaluation and Azure AI Foundry for tracking results 

In [None]:
from azure.ai.evaluation import evaluate

# https://learn.microsoft.com/en-us/python/api/azure-ai-evaluation/azure.ai.evaluation?view=azure-python#functions
result = evaluate(
    data=DATA_DIR / "science-trivia__context_response_feedback_v12.jsonl",

    # Specific evaluators to use
    evaluators={
        "Groundedness": groundedness_eval,
        "Answer_length": answer_length_evaluator,
        "Friendliness": friendliness_evaluator
    },
    
    # Column mapping for each evaluator
    # The column mapping maps the columns in your data to the argument names expected by each evaluator.
    # Skip it if your data already uses the default column names expected by the evaluators.
    # For example, the default column names for the GroundednessEvaluator are "query", "context", and "response"
    evaluator_config={
        "Groundedness": {
            "column_mapping": {
                "query": "${data.query}",
                "context": "${data.context}",
                "response": "${data.response}"
            }, 
        },
        "Answer_length": {
            "column_mapping": {
                "answer": "${data.response}"
            } 
        },
        "Friendliness": {
            "column_mapping": {
                "response": "${data.response}"
            } 
        }

    },
    
    # Provide your Azure AI project information to track the evaluation results in Azure AI Foundry
    azure_ai_project=azure_ai_project,

    # Optionally provide an output path to write a JSON file containing the metric summary,
    # the row-level data, and the Azure AI project URL
    # output_path=TMP_DIR / "local-eval-result.json"
)
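
To make the `column_mapping` entries above more concrete, here is a simplified illustration of what a `${data.<column>}` reference expresses: take the named column from a data row and pass it to the evaluator under the mapped argument name. This is a conceptual sketch only, not the SDK's implementation; the helper `resolve_column_mapping` and the sample row are hypothetical.

```python
import re

# Matches references of the form ${data.<column>}
_DATA_REF = re.compile(r"^\$\{data\.([^}]+)\}$")


def resolve_column_mapping(row: dict, column_mapping: dict) -> dict:
    """Build evaluator keyword arguments from one data row."""
    resolved = {}
    for argument, reference in column_mapping.items():
        match = _DATA_REF.match(reference)
        if match is None:
            raise ValueError(f"Unsupported reference: {reference!r}")
        resolved[argument] = row[match.group(1)]
    return resolved


# Hypothetical row with the column names used in the mapping above
sample_row = {
    "query": "Which tent is the most waterproof?",
    "context": "The Alpine Explorer Tent is the second most waterproof tent.",
    "response": "The Alpine Explorer Tent is the most waterproof.",
}

answer_length_inputs = resolve_column_mapping(
    sample_row, {"answer": "${data.response}"}
)
print(answer_length_inputs)
```

Here the `response` column ends up as the `answer` argument, which is what the `Answer_length` mapping in the `evaluate` call asks for.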

## View the evaluation in Azure AI Foundry

In [None]:
print("----------------------------------------------------------------")
print("AI project URI: ", result["studio_url"])
print("----------------------------------------------------------------")


## Write evaluated data to file

In [None]:
import json

_file = TMP_DIR / 'science-trivia__context_response_feedback_v12_locally_evaluated.jsonl'

# Make sure the output directory exists
TMP_DIR.mkdir(parents=True, exist_ok=True)

# Write mode overwrites any previous output, so no explicit delete is needed
with open(_file, 'w') as file:
    for item in result['rows']:
        file.write(json.dumps(item) + '\n')
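
To confirm that the file is valid JSON Lines, it can be read back one object per line. The sketch below demonstrates the round trip on a temporary file with made-up rows, so it runs without an evaluation result. The helper names and the row keys (`inputs.response`, `outputs.Answer_length.answer_length`) are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, rows: list) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")


def read_jsonl(path: Path) -> list:
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Made-up rows; the key names are assumed, not taken from a real result
sample_rows = [
    {"inputs.response": "About 299,792 km/s.", "outputs.Answer_length.answer_length": 19},
    {"inputs.response": "Eight.", "outputs.Answer_length.answer_length": 6},
]

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.jsonl"
    write_jsonl(sample_path, sample_rows)
    round_tripped = read_jsonl(sample_path)

print(round_tripped == sample_rows)
```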