# Cloud Evaluation -- define and execute evaluations 
For continuous evaluation during development and production 

## Documentation

Evaluate your Generative AI application on the cloud with Azure AI Projects SDK (preview)<br>
https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/cloud-evaluation

Some hints from https://carlos.mendible.com/2025/02/27/custom-evaluators-with-ai-foundry/

### Environment setup
Run the setup script before executing the notebook: `python 00_setup.py`

## Setup

### Common packages

In [15]:
import os
import dotenv
from pathlib import Path

### Global settings

In [16]:
# Global variables
PRIVATE = False
DATA_DIR = Path("data")
TMP_DIR = Path("tmp")

### Load environment variables

In [17]:
# Load environment variables from the .env file,
# or from private.env if PRIVATE is True.
# override=True lets values from the file replace variables that are already set.
dotenv.load_dotenv('.env' if not PRIVATE else 'private.env', override=True)

True
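
Because `os.environ.get` returns `None` for unset variables, a missing entry in the `.env` file tends to surface only later as an SDK error. A minimal sketch of an early check follows. The variable names are the ones this notebook reads; the helper `missing_env_vars` is hypothetical and not part of any SDK.

```python
import os
from typing import Iterable, Mapping

# Variable names taken from this notebook; adjust to your own .env file.
REQUIRED_ENV_VARS = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP_AI",
    "AZURE_AI_FOUNDRY_PROJECT_NAME",
    "AZURE_AI_FOUNDRY_PROJECT_CONNECTION_STRING",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_VERSION",
)


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or empty in env (hypothetical helper)."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing `env` explicitly keeps the helper easy to test with a plain dictionary instead of the real process environment.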

### Config dictionaries used by Azure AI SDK

In [18]:
# Configuration for Azure AI Foundry project
azure_ai_project = {
    "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID"),
    "resource_group_name": os.environ.get("AZURE_RESOURCE_GROUP_AI"),
    "project_name": os.environ.get("AZURE_AI_FOUNDRY_PROJECT_NAME"),
}

# Configuration for Azure OpenAI model
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
    "api_version": os.environ.get("AZURE_OPENAI_API_VERSION"),
    "type": "azure_openai"
}

### Azure credentials

In [19]:
# https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()

### Get AI Foundry project client

In [20]:
from azure.ai.projects import AIProjectClient

# Create the project client from the project connection string,
# which is available on the Azure AI project Overview page.
# https://learn.microsoft.com/en-us/python/api/azure-ai-projects/azure.ai.projects.aiprojectclient?view=azure-python-preview
project_client = AIProjectClient.from_connection_string(
    os.environ["AZURE_AI_FOUNDRY_PROJECT_CONNECTION_STRING"], credential=credential)

## Upload evaluation data

In [27]:
# TODO: upload only if a dataset with this name and version does not exist yet

# https://learn.microsoft.com/en-us/python/api/azure-ai-projects/azure.ai.projects.operations.datasetsoperations?view=azure-python-preview#azure-ai-projects-operations-datasetsoperations-upload-file
# Upload a file to the Azure AI Foundry project. This method requires azure-ai-ml to be installed.
# Returns: a tuple containing the asset id and asset URI of the uploaded file.
data_id, data_uri = project_client.upload_file(DATA_DIR / "science-trivia__context_response_feedback_v12.jsonl")
print(f"Uploaded data asset id: {data_id}")
print(f"Uploaded data asset uri: {data_uri}")

Overriding of current TracerProvider is not allowed
Overriding of current MeterProvider is not allowed
Attempting to instrument while already instrumented


Uploaded data asset id: /subscriptions/c11caebe-ea81-4036-9e58-ccf406d87ead/resourceGroups/mdwsrh/providers/Microsoft.MachineLearningServices/workspaces/rohoff-0016/data/85e263dd-535a-43a5-8dee-bac742a8bbe0/versions/1
Uploaded data asset uri: azureml://subscriptions/c11caebe-ea81-4036-9e58-ccf406d87ead/resourcegroups/mdwsrh/workspaces/rohoff-0016/datastores/workspaceblobstore/paths/LocalUpload/5957c49f55771d53eecbcd9ff484475b/science-trivia__context_response_feedback_v12.jsonl
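
A malformed line or a missing column in the JSONL file would presumably only surface once the cloud job is running, so a local check before the upload can save a round trip. The sketch below is one way to do that. The helper `validate_jsonl` is hypothetical, and the required column `response` is an assumption based on the `${data.response}` mappings used later in this notebook.

```python
import json
from pathlib import Path


def validate_jsonl(path: Path, required_keys: set[str]) -> list[str]:
    """Return a list of problems found in a JSONL file (hypothetical helper)."""
    problems = []
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(row, dict):
                problems.append(f"line {line_no}: expected a JSON object")
                continue
            missing = required_keys - set(row)
            if missing:
                problems.append(f"line {line_no}: missing keys {sorted(missing)}")
    return problems
```

In this notebook it could be called as `validate_jsonl(DATA_DIR / "science-trivia__context_response_feedback_v12.jsonl", {"response"})`; an empty list means no problems were detected.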


## Get built-in evaluator ids

In [28]:
from azure.ai.evaluation import F1ScoreEvaluator, GroundednessEvaluator, GroundednessProEvaluator, ViolenceEvaluator
print(f'e.g. {F1ScoreEvaluator.id}')

e.g. azureml://registries/azureml/models/F1Score-Evaluator/versions/3


## Get custom evaluator library ids from Azure AI Foundry
Note: these ids can also be looked up in the AI Foundry portal (Evaluation Library).

### Connect to Azure AI Foundry project

In [29]:
from azure.ai.ml import MLClient

# Define ml_client to look up the registered custom evaluators
# https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.mlclient?view=azure-python
ml_client = MLClient(
    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
    resource_group_name=os.environ["AZURE_RESOURCE_GROUP_AI"],
    workspace_name=os.environ["AZURE_AI_FOUNDRY_PROJECT_NAME"],
    credential=credential,
)

Overriding of current TracerProvider is not allowed
Overriding of current MeterProvider is not allowed
Attempting to instrument while already instrumented


### Helper to build evaluator library id

In [30]:
from azure.ai.ml.entities import Model

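def get_evaluator_library_id(_evaluator: Model) -> str:
    """Build the library id of a custom evaluator registered in the current workspace."""
    # Note: _workspace_id is a private attribute and may change between SDK versions.
    _ws = ml_client.workspaces.get(ml_client.workspace_name)
    _id = f"azureml://locations/{_ws.location}/workspaces/{_ws._workspace_id}/models/{_evaluator.name}/versions/{_evaluator.version}"
    print(f"{_evaluator.name} library id: {_id}")
    return _id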
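
The helper above combines a workspace lookup with string formatting. The sketch below isolates the formatting step so the id format can be checked without a live workspace. The function name `build_evaluator_library_id` is hypothetical; the format string is copied from the helper above, and the example values are the ones that appear in this notebook's output.

```python
def build_evaluator_library_id(location: str, workspace_id: str, name: str, version: str) -> str:
    """Format an evaluator library id from its parts (hypothetical helper)."""
    return (
        f"azureml://locations/{location}/workspaces/{workspace_id}"
        f"/models/{name}/versions/{version}"
    )


# Example values taken from this notebook's output.
print(
    build_evaluator_library_id(
        location="swedencentral",
        workspace_id="69465c44-88ca-4321-9101-8d4ea38db457",
        name="AnswerLenEvaluator",
        version="2",
    )
)
```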

### Get library ids

In [31]:
_evaluator = ml_client.evaluators.get("AnswerLenEvaluator", label="latest")
answerLenEvaluator_libId = get_evaluator_library_id(_evaluator)

AnswerLenEvaluator library id: azureml://locations/swedencentral/workspaces/69465c44-88ca-4321-9101-8d4ea38db457/models/AnswerLenEvaluator/versions/2


In [32]:
_evaluator = ml_client.evaluators.get("FriendlinessEvaluator", label="latest")
friendlinessEvaluator_libId = get_evaluator_library_id(_evaluator)

FriendlinessEvaluator library id: azureml://locations/swedencentral/workspaces/69465c44-88ca-4321-9101-8d4ea38db457/models/FriendlinessEvaluator/versions/2
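
The registered evaluators themselves are not shown in this notebook. For orientation, a code-based custom evaluator is typically a plain Python class whose `__call__` method returns a dictionary of metrics. The class below is a hypothetical stand-in for `AnswerLenEvaluator`, not the registered implementation: the `answer` keyword is assumed because the evaluation later maps `answer` to `${data.response}`, and the `answer_length` result key is invented for illustration.

```python
class AnswerLenEvaluator:
    """Hypothetical code-based evaluator: reports the character length of an answer."""

    def __init__(self):
        # No configuration is needed for this simple metric.
        pass

    def __call__(self, *, answer: str, **kwargs) -> dict:
        # Keyword-only input so that a data mapping can bind columns by name.
        return {"answer_length": len(answer)}


evaluator = AnswerLenEvaluator()
print(evaluator(answer="Water boils at 100 degrees Celsius at sea level."))
```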


## Start evaluation in the cloud

In [33]:
# https://learn.microsoft.com/en-us/python/api/azure-ai-projects/azure.ai.projects.models.evaluation?view=azure-python-preview
from azure.ai.projects.models import Evaluation

# https://learn.microsoft.com/en-us/python/api/azure-ai-projects/azure.ai.projects.models.evaluatorconfiguration?view=azure-python-preview
from azure.ai.projects.models import EvaluatorConfiguration

# https://learn.microsoft.com/en-us/python/api/azure-ai-projects/azure.ai.projects.models.dataset?view=azure-python-preview
from azure.ai.projects.models import Dataset

# Define the evaluation (nothing is submitted yet)
evaluation = Evaluation(
    display_name="Cloud evaluation",
    description="Evaluation of dataset",
    data=Dataset(id=data_id),
    
    # Note: each evaluator configuration key must follow a naming convention.
    # The string must start with a letter and contain only alphanumeric characters
    # and underscores. For example, "f1_score", "f1score", and "f1_evaluator"
    # are acceptable, but "f1-score-eval" and "1score" result in errors.
    evaluators={
        "f1_score": EvaluatorConfiguration(
            id=F1ScoreEvaluator.id,
        ),

        "groundedness": EvaluatorConfiguration(
            id=GroundednessEvaluator.id,
            init_params={
                "model_config": model_config
            },
        ),

        "groundedness_pro": EvaluatorConfiguration(
            id=GroundednessProEvaluator.id,
            init_params={
                "azure_ai_project": project_client.scope
            },
        ),

        "violence": EvaluatorConfiguration(
            id=ViolenceEvaluator.id,
            init_params={
                "azure_ai_project": project_client.scope
            },
        ),
        
        "answer_length": EvaluatorConfiguration(
            id=answerLenEvaluator_libId,
            data_mapping={
                "answer": "${data.response}"
            },
        ),
        
        "friendliness": EvaluatorConfiguration(
            id=friendlinessEvaluator_libId,
            init_params={
                "model_config": model_config
            },
            
            data_mapping={
                "response": "${data.response}"
            },
        )
    },
)

# Submit the evaluation to the cloud
evaluation_response = project_client.evaluations.create(
    evaluation=evaluation,
)

# Fetch the evaluation again to read its current status
get_evaluation_response = project_client.evaluations.get(evaluation_response.id)

print("----------------------------------------------------------------")
print("Created evaluation, evaluation ID: ", get_evaluation_response.id)
print("Evaluation status: ", get_evaluation_response.status)
print("AI project URI: ", get_evaluation_response.properties["AiStudioEvaluationUri"])
print("----------------------------------------------------------------")

----------------------------------------------------------------
Created evaluation, evaluation ID:  824b9363-ae2e-4f49-865f-89c69a845dfd
Evaluation status:  Starting
AI project URI:  https://ai.azure.com/build/evaluation/824b9363-ae2e-4f49-865f-89c69a845dfd?wsid=/subscriptions/c11caebe-ea81-4036-9e58-ccf406d87ead/resourceGroups/mdwsrh/providers/Microsoft.MachineLearningServices/workspaces/rohoff-0016&tid=7383a4c2-32e9-4920-8a91-36a6ac3ce4d2
----------------------------------------------------------------
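
The naming convention for evaluator configuration keys described in the code comment above can be checked locally before the evaluation is submitted. The sketch below encodes that rule as a regular expression. The helper `is_valid_evaluator_key` is hypothetical, and restricting the pattern to ASCII letters and digits is an assumption.

```python
import re

# Rule from the notebook comment: start with a letter, then only
# letters, digits, and underscores (ASCII-only is an assumption).
_EVALUATOR_KEY_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_evaluator_key(key: str) -> bool:
    """Return True if key satisfies the naming rule (hypothetical helper)."""
    return _EVALUATOR_KEY_RE.fullmatch(key) is not None


for key in ("f1_score", "f1score", "f1_evaluator", "f1-score-eval", "1score"):
    print(f"{key!r}: {is_valid_evaluator_key(key)}")
```

The first three keys are reported as valid and the last two as invalid, matching the examples given in the comment.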


In [34]:
import time

# Poll until the evaluation leaves its in-progress states
_in_progress = {'Starting', 'Running', 'Finalizing', 'Preparing', 'Queued'}

while True:
    _s = project_client.evaluations.get(evaluation_response.id).status
    if _s not in _in_progress:
        print(f"Evaluation status: {_s}")
        break

    print(f"Waiting for evaluation to complete... {_s}")
    time.sleep(20)

Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Waiting for evaluation to complete...
Evaluation status: Completed
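
The polling loop above runs until the service reports a terminal state, so it never returns if a job stays in progress. A minimal sketch of the same loop with a timeout is shown below. The function `wait_for_terminal_status` and its parameters are hypothetical; the set of in-progress states is the one used in the loop above and is not guaranteed to be complete.

```python
import time
from typing import Callable

# In-progress states as listed in the notebook's polling loop.
IN_PROGRESS = {"Starting", "Running", "Finalizing", "Preparing", "Queued"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_seconds: float = 20.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it returns a state outside IN_PROGRESS or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status not in IN_PROGRESS:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Evaluation still '{status}' after {timeout_seconds} s")
        print(f"Waiting for evaluation to complete... {status}")
        sleep(poll_seconds)
```

In this notebook the status callable could be `lambda: project_client.evaluations.get(evaluation_response.id).status`. Injecting `sleep` keeps the function testable without real delays.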


## Download evaluation artifacts

In [23]:
# https://learn.microsoft.com/fr-fr/python/api/azure-ai-ml/azure.ai.ml.operations.joboperations?view=azure-python#azure-ai-ml-operations-joboperations-download
ml_client.jobs.download(
    name=evaluation_response.id,
    download_path=TMP_DIR / "evaluation_results"
)


Downloading artifact azureml://datastores/workspaceartifactstore/ExperimentRun/dcid.ae4e3a66-650a-4f7a-a0af-5b0b00aab987 to tmp\evaluation_results\artifacts
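
To see what the download produced without assuming any particular file names, the target directory can be listed recursively. The sketch below assumes the artifacts were written below `tmp/evaluation_results`, as in the call above; the helper `list_artifacts` is hypothetical.

```python
from pathlib import Path


def list_artifacts(root: Path) -> list[str]:
    """Return the relative paths of all files below root, sorted (hypothetical helper)."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Assumes the download above wrote into tmp/evaluation_results.
results_dir = Path("tmp") / "evaluation_results"
if results_dir.exists():
    for rel_path in list_artifacts(results_dir):
        print(rel_path)
```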
