# Evaluate on a Target

The Ask Wiki app uses the Wikipedia API to answer questions with information from Wikipedia articles.

In this exercise, you will assess the relevance of the chatbot's responses given a query.

## Add environment variables to the .env file

In the root of the **Evaluation and Data Generation Workshop** folder is a `.env` file. Fill in the values for the environment variables in that file. You can find each value in the following locations of the [Azure AI Foundry](https://ai.azure.com) portal:

- `AZURE_SUBSCRIPTION_ID` - On the **Overview** page of your project within **Project details**.
- `AZURE_AI_PROJECT_NAME` - At the top of the **Overview** page for your project.
- `AZURE_OPENAI_RESOURCE_GROUP` - On the **Overview** page of the **Management Center** within **Project properties**.
- `AZURE_OPENAI_SERVICE` - On the **Overview** page of your project in the **Included capabilities** tab for **Azure OpenAI Service**.
- `AZURE_OPENAI_API_VERSION` - On the [API version lifecycle](https://learn.microsoft.com/azure/ai-services/openai/api-version-deprecation#latest-ga-api-release) webpage within the **Latest GA API release** section.
- `AZURE_OPENAI_ENDPOINT` - On the **Details** tab of your model deployment within **Endpoint** (that is, the **Target URI**).
- `AZURE_OPENAI_DEPLOYMENT_NAME` - On the **Details** tab of your model deployment within **Deployment info**.

## Sign in to Azure

As a security best practice, we'll use [keyless authentication](https://learn.microsoft.com/azure/developer/ai/keyless-connections?tabs=csharp%2Cazure-cli) to authenticate to Azure OpenAI with Microsoft Entra ID. Before you can do so, you'll first need to install the **Azure CLI** per the [installation instructions](https://learn.microsoft.com/cli/azure/install-azure-cli) for your operating system.

Next, open a terminal and run `az login` to sign in to your Azure account.

## Import and Test Ask Wiki

Let's test a query with Ask Wiki to validate that your environment variables are properly configured. We'll begin by importing the `ask_wiki` function from `askwiki`. The `ask_wiki` function generates a response from the app. Once imported, we'll pass in a query to view the response and context generated.
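As a rough illustration of the shape of that result, the sketch below uses a hypothetical stand-in named `fake_ask_wiki`. It is not the workshop's implementation and calls neither Wikipedia nor Azure OpenAI; it only assumes that the app takes a `query` keyword argument and returns a dictionary with `response` and `context` keys.

```python
def fake_ask_wiki(query: str) -> dict:
    """Hypothetical stand-in for ask_wiki: returns canned text, makes no network calls."""
    return {
        # The answer the app would give to the query.
        "response": f"(canned answer to: {query})",
        # The article text the answer would be grounded in.
        "context": "Content: (canned context)",
    }


result = fake_ask_wiki(query="What is the capital of India?")
print(sorted(result))  # the keys of the returned dictionary
```

A stand-in like this can be handy for checking the rest of a pipeline offline before pointing it at the real app.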

In [1]:
%pip install bs4

from askwiki import ask_wiki

ask_wiki(query="What is the capital of India?")

Defaulting to user installation because normal site-packages is not writeable
Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Installing collected packages: bs4
Successfully installed bs4-0.0.2

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


{'response': 'The capital of India is New Delhi.',
 'context': 'Content: Delhi,[b] officially the National Capital Territory (NCT) of Delhi, is a city and a union territory of India containing New Delhi, the capital of India. Straddling the Yamuna river, but spread chiefly to the west, or beyond its right bank, Delhi shares borders with the state of Uttar Pradesh in the east and with the state of Haryana in the remaining directions. Delhi became a union territory on 1 November 1956 and the NCT in 1995.[22] The NCT covers an area of 1,484 square kilometres (573\xa0sq\xa0mi).[4] According to the 2011 census, Delhi\'s city proper population was over 11\xa0million,[7][23] while the NCT\'s population was about 16.8\xa0million.[9]. The topography of the medieval fort Purana Qila on the banks of the river Yamuna matches the literary description of the citadel Indraprastha in the Sanskrit epic Mahabharata; however, excavations in the area have revealed no signs of an ancient built environment.

## Install the package

The `evaluate` function for evaluating on a target and the evaluator class for assessing relevance are both in the Azure AI Evaluation SDK. We'll begin by installing the package.

In [2]:
%pip install azure-ai-evaluation

Defaulting to user installation because normal site-packages is not writeable

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## Access the environment variables

We'll import `os` and `load_dotenv` so that you can access the environment variables.

In [3]:
import os
from dotenv import load_dotenv

load_dotenv()

True
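`load_dotenv()` returning `True` only means a `.env` file was found and loaded, not that every value was filled in. A minimal sketch of a follow-up check is shown here; `find_missing_vars` is a hypothetical helper, and the variable names are the ones listed in the `.env` section of this notebook.

```python
import os

# Names taken from the list of variables in the .env section of this notebook.
REQUIRED_VARS = [
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_AI_PROJECT_NAME",
    "AZURE_OPENAI_RESOURCE_GROUP",
    "AZURE_OPENAI_SERVICE",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]


def find_missing_vars(env=None):
    """Return the required names that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not (env.get(name) or "").strip()]


missing = find_missing_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching your real environment.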

## Import packages

We'll now import the `evaluate` function and `RelevanceEvaluator` class. We'll also import some additional libraries to help with accessing our data and formatting the results.


In [4]:
from azure.ai.evaluation import evaluate, RelevanceEvaluator
import pandas as pd
from pprint import pprint

[INFO] Could not import AIAgentConverter. Please install the dependency with `pip install azure-ai-projects`.
[INFO] Could not import SKAgentConverter. Please install the dependency with `pip install semantic-kernel`.


## Set up keyless authentication

Rather than hardcode your **key**, we'll use a keyless connection with Azure OpenAI.

In [5]:
import azure.identity

credential = azure.identity.DefaultAzureCredential()
token_provider = azure.identity.get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")

token = token_provider()

## Configure the model_config

The `model_config` is a required parameter when creating an instance of the evaluator class. Let's configure the `model_config` with the following:

- Azure OpenAI deployment name
- Azure OpenAI endpoint
- Azure OpenAI API version

Because we're using keyless authentication, the `model_config` doesn't include an API key.

In [6]:
model_config = {
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT_NAME"),
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_version": os.environ.get("AZURE_OPENAI_API_VERSION"),
}

## Create an instance of the evaluator

Let's now create an instance of the `RelevanceEvaluator`.

In [7]:
relevance_eval = RelevanceEvaluator(model_config)

## Create the call to evaluate on a target

We can run an evaluation on a target with the `evaluate` function, passing our evaluator in the `evaluators` dictionary. Let's assign the result of this function call to the `results` variable. We'll later use this variable to format and print the results.
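The contents of `data.jsonl` aren't shown in this notebook. Judging by the `inputs.query` and `inputs.response` columns in the results, each line is presumably a JSON object with `query` and `response` fields, where `query` lines up with the `query` parameter of `ask_wiki`. The sketch below writes and reads a file of that assumed shape; the file name `sample_data.jsonl` is hypothetical and is used so that the workshop's own data file is left untouched.

```python
import json
import os
import tempfile

# Rows copied from the `inputs.*` columns shown in this notebook's results.
rows = [
    {"query": "When was United Stated found ?", "response": "1776"},
    {"query": "What is the capital of France?", "response": "Paris"},
    {"query": "Who is the best tennis player of all time ?", "response": "Roger Federer"},
]

# Write to a temporary directory so the real data.jsonl is not overwritten.
path = os.path.join(tempfile.mkdtemp(), "sample_data.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")  # JSON Lines: one JSON object per line

# Read it back by parsing each non-empty line independently.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded), "rows; columns:", sorted(loaded[0]))
```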

In [8]:
results = evaluate(
    data="data.jsonl",
    target=ask_wiki,
    evaluators={
        "relevance": relevance_eval,
    }
)

[2025-07-03 16:21:03 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_ask_wiki_20250703_162103_342533, log path: /home/vscode/.promptflow/.runs/azure_ai_evaluation_evaluators_ask_wiki_20250703_162103_342533/logs.txt


2025-07-03 16:21:13 +0000   15677 execution.bulk     INFO     Process 15717 terminated.


[2025-07-03 16:21:14 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_relevance_20250703_162114_559175, log path: /home/vscode/.promptflow/.runs/azure_ai_evaluation_evaluators_relevance_20250703_162114_559175/logs.txt


2025-07-03 16:21:03 +0000   15449 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-07-03 16:21:03 +0000   15449 execution.bulk     INFO     Set process count to 3 by taking the minimum value among the factors of {'default_worker_count': 4, 'row_count': 3}.
2025-07-03 16:21:05 +0000   15449 execution.bulk     INFO     Process name(ForkProcess-4:3)-Process id(15728)-Line number(0) start execution.
2025-07-03 16:21:05 +0000   15449 execution.bulk     INFO     Process name(ForkProcess-4:1)-Process id(15717)-Line number(1) start execution.
2025-07-03 16:21:05 +0000   15449 execution.bulk     INFO     Process name(ForkProcess-4:2)-Process id(15722)-Line number(2) start execution.
2025-07-03 16:21:10 +0000   15449 execution.bulk     INFO     Process name(ForkProcess-4:1)-Process id(15717)-Line number(1) completed.
2025-07-03 16:21:10 +0000   15449 execution.bulk     INFO     Process name(ForkProcess-4:3)-Process id(15728)-Lin

{'metrics': {'relevance.binary_aggregate': 1.0,
             'relevance.gpt_relevance': 4.333333333333333,
             'relevance.relevance': 4.333333333333333,
             'relevance.relevance_threshold': 3.0},
 'rows': [{'inputs.query': 'When was United Stated found ?',
           'inputs.response': '1776',
           'outputs.context': 'Content: The Founding Fathers of the United '
                              'States, often simply referred to as the '
                              'Founding Fathers or the Founders, were a group '
                              'of late-18th-century American revolutionary '
                              'leaders who united the Thirteen Colonies, '
                              'oversaw the War of Independence from Great '
                              'Britain, established the United States of '
                              'America, and crafted a framework of government '
                              'for the new nation.. The Founding Fathers '

## Print the results with Pretty Print

Now that we've run the evaluation, let's print the results using `pprint` (pretty print), which formats nested data structures so that they're easier to read.

In [9]:
pprint(results)

## Print the results as a table

We can also display the results as a table using pandas.

In [10]:
pd.DataFrame(results["rows"])

| | outputs.response | outputs.context | inputs.query | inputs.response | outputs.relevance.relevance | outputs.relevance.gpt_relevance | outputs.relevance.relevance_reason | outputs.relevance.relevance_result | outputs.relevance.relevance_threshold |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The United States was officially founded on \*\*... | Content: The Founding Fathers of the United St... | When was United Stated found ? | 1776 | 4 | 4 | The RESPONSE fully answers the QUERY with accu... | pass | 3 |
| 1 | The capital of France is Paris. | Content: A closed-ended question is any questi... | What is the capital of France? | Paris | 4 | 4 | The RESPONSE fully addresses the QUERY with ac... | pass | 3 |
| 2 | Determining the "best tennis player of all tim... | Content: This article covers the period from 1... | Who is the best tennis player of all time ? | Roger Federer | 5 | 5 | The RESPONSE fully addresses the QUERY with ac... | pass | 3 |
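The per-row scores in the table above can be related back to the aggregate values under `metrics`. The sketch below recomputes them in plain Python from the three rows shown. It assumes that `relevance.relevance` is the mean of the row scores and that `relevance.binary_aggregate` is the fraction of rows marked `pass`; both assumptions are consistent with the output in this notebook but are not confirmed by it.

```python
from statistics import mean

# Per-row values copied from the table above; only the columns needed here.
rows = [
    {"outputs.relevance.relevance": 4, "outputs.relevance.relevance_result": "pass"},
    {"outputs.relevance.relevance": 4, "outputs.relevance.relevance_result": "pass"},
    {"outputs.relevance.relevance": 5, "outputs.relevance.relevance_result": "pass"},
]

# Mean of the per-row relevance scores.
mean_score = mean(r["outputs.relevance.relevance"] for r in rows)

# Fraction of rows whose result is "pass" (assumed meaning of binary_aggregate).
pass_rate = mean(
    1.0 if r["outputs.relevance.relevance_result"] == "pass" else 0.0 for r in rows
)

print(f"mean relevance: {mean_score:.3f}, pass rate: {pass_rate:.1f}")
```

With the real `results` object, the same loop could presumably run over `results["rows"]` instead of the hand-copied list.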


## Delete resources

If you've finished exploring Azure AI Services, delete the Azure resource that you created during the workshop.

**Note**: You may be prompted to delete your deployed model(s) before deleting the resource group.