## Track Evaluation Results in Azure AI Foundry

Contoso Gameworks is developing an AI-powered dialogue generator for video game characters, customizing interactions based on game scenarios. The generator should be evaluated for relevance (fitting dialogue to the game's plot and character traits), fluency (smooth, immersive conversations), and risk and safety (ensuring no violent, offensive, or unfair language is introduced).

In this exercise, you will run an evaluation on a dataset of character dialogue produced by the generator, then push the results to your Azure AI project so that you can track them in Azure AI Foundry.

## Add environment variables to the .env file

In the root of the **Evaluation and Data Generation Workshop** folder is a `.env` file. Fill in the values for the environment variables in that file. You can find each value in the following locations of the [Azure AI Foundry](https://ai.azure.com) portal:

- `AZURE_SUBSCRIPTION_ID` - On the **Overview** page of your project within **Project details**.
- `AZURE_AI_PROJECT_NAME` - At the top of the **Overview** page for your project.
- `AZURE_OPENAI_RESOURCE_GROUP` - On the **Overview** page of the **Management Center** within **Project properties**.
- `AZURE_OPENAI_SERVICE` - On the **Overview** page of your project in the **Included capabilities** tab for **Azure OpenAI Service**.
- `AZURE_OPENAI_API_VERSION` - On the [API version lifecycle](https://learn.microsoft.com/azure/ai-services/openai/api-version-deprecation#latest-ga-api-release) webpage within the **Latest GA API release** section.
- `AZURE_OPENAI_ENDPOINT` - On the **Details** tab of your model deployment within **Endpoint** (that is, the **Target URI**).
- `AZURE_OPENAI_DEPLOYMENT_NAME` - On the **Details** tab of your model deployment within **Deployment info**.
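
Once the `.env` file is loaded (the notebook does this later with `load_dotenv()`), it can help to confirm that every variable listed above has a value. The following is a minimal sketch: the helper name `missing_env_vars` is hypothetical and not part of any Azure SDK, and it checks only that each value is present and non-empty, not that it is correct.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = [
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_AI_PROJECT_NAME",
    "AZURE_OPENAI_RESOURCE_GROUP",
    "AZURE_OPENAI_SERVICE",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or blank in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not (env.get(name) or "").strip()]


# Hypothetical usage: report anything that still needs a value.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing values for:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.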

## Sign in to Azure

As a security best practice, we'll use [keyless authentication](https://learn.microsoft.com/azure/developer/ai/keyless-connections?tabs=csharp%2Cazure-cli) to authenticate to Azure OpenAI with Microsoft Entra ID. Before you can do so, install the **Azure CLI** per the [installation instructions](https://learn.microsoft.com/cli/azure/install-azure-cli) for your operating system.

To track the evaluation results in Azure AI Foundry, you'll first need to sign in with the Azure account that you used to provision the Azure resources.

Open a new terminal, enter the following command, and follow the instructions in the terminal:

`az login --use-device-code`

The `--use-device-code` flag is useful when a browser can't be opened from the terminal; otherwise, `az login` alone also works.

Once you've logged in, select your subscription in the terminal.

## Retrieve your values for assigning the Storage Blob Data Contributor role

You'll need the following information to later assign yourself the **Storage Blob Data Contributor** role, which provides access to the Azure AI project storage account. This permission is necessary for pushing the results to your Azure AI project.

**Resource Group and Subscription ID**

Each value can be found within the **Management center** for your project, located in [Azure AI Foundry](https://ai.azure.com). The **Management center** page is accessible via the left navigation menu of your project (at the very bottom of the navigation menu).

**User ID**

Enter the following command into the terminal:

`az ad signed-in-user show --query id --output tsv`

## Assign yourself the Storage Blob Data Contributor role

In the terminal, enter the following command, replacing the placeholder text with your **subscription ID**, **resource group**, and **user ID**.

`az role assignment create --role "Storage Blob Data Contributor" --scope /subscriptions/<mySubscriptionID>/resourceGroups/<myResourceGroupName> --assignee-principal-type User --assignee-object-id "<user-id>"`
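
The `--scope` value follows the Azure resource ID pattern for a resource group. As a small illustration, this sketch assembles the scope and the full command from the three values gathered above. Both helper names are hypothetical, and the values in the usage line are the same placeholders used in the command above.

```python
import shlex


def build_scope(subscription_id, resource_group):
    """Build the resource-group scope used by `az role assignment create`."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def build_role_assignment_command(subscription_id, resource_group, user_id):
    """Return the CLI command from the step above as one shell-quoted string."""
    args = [
        "az", "role", "assignment", "create",
        "--role", "Storage Blob Data Contributor",
        "--scope", build_scope(subscription_id, resource_group),
        "--assignee-principal-type", "User",
        "--assignee-object-id", user_id,
    ]
    return shlex.join(args)


# Placeholder values for illustration only; substitute your own.
print(build_role_assignment_command("<mySubscriptionID>", "<myResourceGroupName>", "<user-id>"))
```

Using `shlex.join` keeps the role name quoted as a single argument when the command is pasted into a POSIX shell.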


## Install the package

The evaluator classes and the `evaluate` function are available in the Azure AI Evaluation SDK. In addition, we'll need the `promptflow-azure` package to track the results in our Azure AI project. We'll begin by installing the required packages.

In [1]:
%pip install promptflow-azure azure-ai-evaluation

Note: you may need to restart the kernel to use updated packages.


## Access the environment variables

We'll import `os` and `load_dotenv` so that you can access the environment variables.

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

True

## Import packages

We'll import `json` to later dump the results into a new file. We'll also need `Path` to access the dataset. Finally, we'll import `evaluate` and the evaluators that we'll later use for evaluation.

In [3]:
import json
from pathlib import Path
from azure.ai.evaluation import evaluate, RelevanceEvaluator, FluencyEvaluator, ViolenceEvaluator, HateUnfairnessEvaluator, SexualEvaluator, SelfHarmEvaluator

[INFO] Could not import AIAgentConverter. Please install the dependency with `pip install azure-ai-projects`.
[INFO] Could not import SKAgentConverter. Please install the dependency with `pip install semantic-kernel`.


## Set up keyless authentication

Rather than hardcode your **key**, we'll use a keyless connection with Azure OpenAI.

In [4]:
import azure.identity

credential = azure.identity.DefaultAzureCredential()
token_provider = azure.identity.get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")

token = token_provider()
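
Keep in mind that a bearer token obtained this way expires after a limited time. As a hedged sketch: `credential.get_token(...)` in `azure.identity` returns an `AccessToken` whose `expires_on` field is a Unix timestamp, so the remaining lifetime can be computed as shown below. The helper `seconds_until_expiry` is hypothetical, and the commented-out lines assume the `credential` object created in the cell above.

```python
import time


def seconds_until_expiry(expires_on, now=None):
    """Return how many seconds remain before a token expires (never negative).

    expires_on is a Unix timestamp, as found on an AccessToken.
    """
    now = time.time() if now is None else now
    return max(0, int(expires_on - now))


# Assumed usage with the credential created above (not executed here):
# access_token = credential.get_token("https://cognitiveservices.azure.com/.default")
# print(seconds_until_expiry(access_token.expires_on))

# Fixed timestamps so the arithmetic is easy to follow.
print(seconds_until_expiry(expires_on=1_000_600, now=1_000_000))  # 600
```

If the remaining time is close to zero, re-running the authentication cell fetches a fresh token.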

## Configure the model_config

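The `model_config` is a required parameter when creating an instance of the AI-assisted evaluator classes (here, `RelevanceEvaluator` and `FluencyEvaluator`). Let's configure the `model_config` with the following:

- Azure OpenAI endpoint
- Microsoft Entra ID token from the previous step (passed in the `api_key` field in place of a key)
- Azure OpenAI deployment name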

In [5]:
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": token,
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT_NAME")
}

## Configure the Azure AI project

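The `azure_ai_project` is later passed to each risk and safety evaluator (`ViolenceEvaluator`, `HateUnfairnessEvaluator`, `SexualEvaluator`, and `SelfHarmEvaluator`) when creating an instance of the evaluator class, and to `evaluate` so that the results are tracked in your project. Let's configure the `azure_ai_project`.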

In [6]:
azure_ai_project = {
    "project_name": os.environ.get("AZURE_AI_PROJECT_NAME"),
    "resource_group_name": os.environ.get("AZURE_OPENAI_RESOURCE_GROUP"),
    "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID"),
    "api_key": token,
    "api_version": os.environ.get("AZURE_OPENAI_API_VERSION")
}

## Set the path of the dataset

We'll now set the path of the dataset that we'll use for the evaluation. The dataset is in the `gameworks-dialog.jsonl` file and consists of the following values for each row of data:

- query
- response
- context

In [7]:
path = "gameworks-dialog.jsonl"
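
Before running an evaluation, it can be worth checking that every row of the JSONL file carries the three fields listed above. The following is a minimal sketch using only the standard library. The helper name `find_incomplete_rows` is hypothetical, and the sample rows are invented for illustration rather than taken from `gameworks-dialog.jsonl`.

```python
import json

REQUIRED_KEYS = ("query", "response", "context")


def find_incomplete_rows(lines, required=REQUIRED_KEYS):
    """Return (line_number, missing_keys) for each JSONL row lacking a required key."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        missing = [key for key in required if key not in row]
        if missing:
            problems.append((number, missing))
    return problems


# Illustrative rows only; the second one omits "context" on purpose.
sample = [
    '{"query": "Who guards the gate?", "response": "I do.", "context": "A castle at dusk."}',
    '{"query": "Where is the key?", "response": "Under the stone."}',
]
print(find_incomplete_rows(sample))  # [(2, ['context'])]

# Assumed usage against the real file:
# with open("gameworks-dialog.jsonl", encoding="utf-8") as f:
#     print(find_incomplete_rows(f))
```

An empty list means every non-blank row has all of the required fields.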

## Create an instance of the evaluators

Let's now create instances of the **Relevance** and **Fluency** evaluators, along with the individual content safety evaluators. The composite **Content Safety** evaluator combines the following evaluators; here we instantiate each one directly:

- `ViolenceEvaluator`
- `SexualEvaluator`
- `SelfHarmEvaluator`
- `HateUnfairnessEvaluator`

In [8]:
relevance_eval = RelevanceEvaluator(model_config)
fluency_eval = FluencyEvaluator(model_config)
violence_eval = ViolenceEvaluator(azure_ai_project=azure_ai_project, credential=credential)
hateunfairness_eval = HateUnfairnessEvaluator(azure_ai_project=azure_ai_project, credential=credential)
sexual_eval = SexualEvaluator(azure_ai_project=azure_ai_project, credential=credential)
selfharm_eval = SelfHarmEvaluator(azure_ai_project=azure_ai_project, credential=credential)

Class ViolenceEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class HateUnfairnessEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class SexualEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class SelfHarmEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


## Create the call to evaluate the dataset

We can evaluate a dataset with the `evaluate` function, passing our evaluators as a dictionary keyed by name. We must also ensure that the `evaluator_config` maps the `query`, `response`, `context` and `ground_truth` inputs to the appropriate dataset columns.
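
To make the `${data.<column>}` notation concrete, here is a purely illustrative sketch of how such a reference can be resolved against one dataset row. This is not the SDK's implementation: `resolve_mapping` is a hypothetical helper, and the row is invented for the example.

```python
import re

# Matches references of the form ${data.<column>}.
_DATA_REF = re.compile(r"^\$\{data\.([^}]+)\}$")


def resolve_mapping(mapping, row):
    """Resolve each ${data.<column>} reference in mapping against one dataset row."""
    resolved = {}
    for param, reference in mapping.items():
        match = _DATA_REF.match(reference)
        if match is None:
            raise ValueError(f"Unsupported reference: {reference!r}")
        column = match.group(1)
        if column not in row:
            raise KeyError(f"Column {column!r} not found in row")
        resolved[param] = row[column]
    return resolved


# Illustrative row; not taken from the workshop dataset.
row = {"query": "Who guards the gate?", "response": "I do.", "context": "A castle at dusk."}
mapping = {"query": "${data.query}", "response": "${data.response}"}
print(resolve_mapping(mapping, row))
```

In this sketch, a reference to a column that the row lacks raises a `KeyError`, which shows why the mapped names need to match the dataset's columns.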

Since we want to track the evaluation results in our Azure AI project, we'll need to include the Azure AI Foundry project information.

In [None]:
result = evaluate(
    data=path,
    evaluators={
        "relevance": relevance_eval,
        "fluency": fluency_eval,
        "violence": violence_eval,
        "hate_unfairness": hateunfairness_eval,
        "sexual": sexual_eval,
        "self_harm": selfharm_eval
    },
    # column mapping
    evaluator_config={
        "default": {
            "query": "${data.query}",
            "response": "${data.response}",
            "context": "${data.context}",
            "ground_truth": "${data.ground_truth}"
        }
    },
    azure_ai_project=azure_ai_project
)

[2025-07-03 16:17:14 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_violence_20250703_161714_128951, log path: /home/vscode/.promptflow/.runs/azure_ai_evaluation_evaluators_violence_20250703_161714_128951/logs.txt
[2025-07-03 16:17:14 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_relevance_20250703_161714_128317, log path: /home/vscode/.promptflow/.runs/azure_ai_evaluation_evaluators_relevance_20250703_161714_128317/logs.txt
[2025-07-03 16:17:14 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_self_harm_20250703_161714_132366, log path: /home/vscode/.promptflow/.runs/azure_ai_evaluation_evaluators_self_harm_20250703_161714_132366/logs.txt
[2025-07-03 16:17:14 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_hate_unfairness_20250703_161714_129232, log path: 

2025-07-03 16:17:14 +0000   14353 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-07-03 16:17:16 +0000   14353 execution.bulk     INFO     Finished 1 / 3 lines.
2025-07-03 16:17:16 +0000   14353 execution.bulk     INFO     Average execution time for completed lines: 2.51 seconds. Estimated time for incomplete lines: 5.02 seconds.
2025-07-03 16:17:17 +0000   14353 execution.bulk     INFO     Finished 2 / 3 lines.
2025-07-03 16:17:17 +0000   14353 execution.bulk     INFO     Average execution time for completed lines: 1.34 seconds. Estimated time for incomplete lines: 1.34 seconds.
2025-07-03 16:17:17 +0000   14353 execution.bulk     INFO     Finished 3 / 3 lines.
2025-07-03 16:17:17 +0000   14353 execution.bulk     INFO     Average execution time for completed lines: 0.9 seconds. Estimated time for incomplete lines: 0.0 seconds.

Run name: "azure_ai_evaluation_evaluators_relevance_20250703_161714_128317"
Run status: "C

https://ai.azure.com/build/evaluation/db9a1994-de7c-43c6-b997-ec3f02063b31?wsid=/subscriptions/973bd745-2c33-42c1-9ec6-b11a0a7f4aba/resourceGroups/rg-franciscoruizrivas-8637_ai/providers/Microsoft.MachineLearningServices/workspaces/franciscoruizrivas-4283

## View the results in Azure AI Foundry

Now that the evaluation is complete, you can navigate to the **Evaluation** section of the Azure AI Foundry portal to view the results. Alternatively, you can output a link to the evaluation location using the `studio_url` returned from running `evaluate`.

In [10]:
print(result['studio_url'])
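
The `json` import from earlier can be used to keep a local copy of the outcome. The sketch below assumes the dictionary returned by `evaluate` exposes a `metrics` key alongside the `studio_url` key used above. The helper `save_results`, the output file name, and the stand-in result are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_results(result, output_path):
    """Write the metrics and studio URL from an evaluation result to a JSON file."""
    summary = {
        "metrics": result.get("metrics", {}),
        "studio_url": result.get("studio_url"),
    }
    path = Path(output_path)
    path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return path


# Stand-in result for illustration; the metric name and value are made up.
example_result = {
    "metrics": {"example.metric": 4.0},
    "rows": [],
    "studio_url": None,
}
output_file = Path(tempfile.gettempdir()) / "evaluation-summary.json"
saved = save_results(example_result, output_file)
print(saved.read_text(encoding="utf-8"))
```

In the notebook, you would pass the real `result` and a path of your choice instead of the stand-in values.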

## Delete resources

If you've finished exploring Azure AI Services, delete the Azure resources that you created during the workshop.

**Note**: You may be prompted to delete your deployed model(s) before deleting the resource group.