# **Evaluate AI Agent on Azure**

In this notebook, we focus on **evaluating the AI agent using Azure services**. This involves importing the required libraries, loading the necessary configurations, performing evaluations using Azure AI services, and analyzing the results.

### Objectives:
- **Import Libraries:** Import the necessary libraries for evaluation.
- **Load Configurations:** Load the necessary configurations from the environment file.
- **Perform Evaluation:** Use your LLM judge to evaluate the AI agent.
- **Analyze Results:** Analyze the evaluation results to gain insights into the AI agent's performance.

### Key Steps:
1. **Import Libraries:** Import the necessary libraries for evaluation.
2. **Load Configurations:** Load the necessary configurations from the environment file.
3. **Perform Evaluation:** Use your LLM judge to evaluate the AI agent.
4. **Analyze Results:** Analyze the evaluation results to gain insights into the AI agent's performance.

In this notebook, your LLM judge scores the AI agent's responses for groundedness against the context they were generated from, and the results show where the agent performs well and where it needs improvement.

In [None]:
# Standard library
import datetime
import os

# Third-party
import dotenv
import pandas as pd
from azure.ai.evaluation import evaluate
from azure.identity import DefaultAzureCredential

# Load environment variables (endpoints, keys, deployment names, project IDs) from .env
dotenv.load_dotenv(".env")

In [None]:
# Azure OpenAI connection settings, read from .env
aoai_endpoint = os.environ["AZURE_OPENAI_API_BASE"]
aoai_api_key = os.environ["AZURE_OPENAI_API_KEY"]
aoai_chat_model_mini = os.environ["AZURE_OPENAI_MODEL_MINI"]
llm_judge = os.environ["LLM_JUDGE"]  # deployment name of the model used as the LLM judge
aoai_api_version = os.environ["AZURE_OPENAI_API_VERSION"]
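The cell above reads each setting with `os.environ[...]`, which stops at the first missing name with a `KeyError`. A minimal sketch of a pre-flight check, assuming you prefer to see every missing variable at once; `check_required_env` and `REQUIRED_VARS` are hypothetical names, and the list simply collects the variables this notebook reads (including `SUB_ID`, `RG_NAME`, and `AZURE_PROJECT_NAME`, which `load_config` uses later).

```python
import os
from typing import Iterable, List, Mapping, Optional


def check_required_env(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are missing or empty in `env` (defaults to os.environ).

    Hypothetical helper: not part of any Azure SDK.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Every environment variable this notebook reads
REQUIRED_VARS = [
    "AZURE_OPENAI_API_BASE",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_MODEL_MINI",
    "LLM_JUDGE",
    "AZURE_OPENAI_API_VERSION",
    "SUB_ID",
    "RG_NAME",
    "AZURE_PROJECT_NAME",
]

missing = check_required_env(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running it right after `dotenv.load_dotenv` reports a misconfigured `.env` in one message. It only checks that each value is present and non-empty, not that it is valid.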

**Transform data to the correct format**
- *For demonstration purposes, we take the test data, transform it into the JSON Lines (JSONL) format the evaluator expects, and run the evaluation flow.*
- *In practice, run it on the query, context, and response data produced by your RAG system.*

In [None]:
# NOTE:
# This notebook is designed to evaluate data from your RAG solution and pass it through an evaluator.
# The results can be viewed locally, or uploaded to Azure AI Foundry for better visibility.
#
# For demonstration purposes, we are using the test dataset to simulate this workflow.
# You should replace the test dataset with your actual input/output data.

# Load the CSV file into a DataFrame (pandas is imported in the first cell)
fpath = 'data/ft-judge/single/test.csv'
df = pd.read_csv(fpath)

In [None]:
# Evaluation data structure:
# The evaluation data is stored in a JSON Lines (JSONL) file, where each line is a separate JSON object
# with the following keys:
# - "query":    The query provided to the AI agent.
# - "context":  Additional context, such as instructions, background details, or retrieved text,
#               that helps the agent generate a correct response.
# - "response": The actual response generated by the AI agent for the given query.

df_subset = df[['synthetic_question', 'chunk_data', 'synthetic_response']].rename(
    columns={
        'synthetic_question': 'query',
        'chunk_data': 'context',
        'synthetic_response': 'response'
    }
)

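# Export the DataFrame to a JSONL file (one JSON object per line).
eval_data_path = 'data/agent-output/testtest.jsonl'
# Create the output directory if needed; to_json fails if it does not exist
os.makedirs(os.path.dirname(eval_data_path), exist_ok=True)
df_subset.to_json(eval_data_path, orient='records', lines=True)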
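Before calling the evaluator, it can help to confirm the exported file matches the structure described in the comments above: one JSON object per line with non-empty `query`, `context`, and `response` strings. Empty CSV cells are written as `null` by `to_json`, and the evaluator would then receive missing input. The sketch below is a hypothetical local check (`validate_records` and `validate_jsonl` are illustrative names), not a feature of `azure-ai-evaluation`.

```python
import json
from typing import Iterable, List, Sequence

REQUIRED_KEYS = ("query", "context", "response")


def validate_records(lines: Iterable[str], required_keys: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return a list of problems found in JSONL lines (an empty list means OK)."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        for key in required_keys:
            value = record.get(key)
            # null, non-string, or whitespace-only values are all reported
            if not isinstance(value, str) or not value.strip():
                problems.append(f"line {lineno}: missing or empty '{key}'")
    return problems


def validate_jsonl(path: str, required_keys: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Validate a JSONL file on disk line by line."""
    with open(path, encoding="utf-8") as f:
        return validate_records(f, required_keys)
```

For example, `validate_jsonl(eval_data_path)` returns an empty list when every line is well formed; otherwise each entry names the offending line number.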

**Run evaluation and upload results to Azure AI Foundry**
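`evaluate()` also accepts evaluators that are plain Python callables: each row's mapped columns are passed as keyword arguments, and the dictionary the callable returns is recorded as that row's output. The `CustomGroundednessEvaluator` used later in this notebook is a project-local LLM-judge class whose source is not shown here. As a minimal illustration of the calling convention only, the sketch below defines a hypothetical `TokenOverlapEvaluator` that scores the fraction of response tokens that also appear in the context. It is a crude lexical heuristic, not an LLM judge and not a real groundedness metric.

```python
import re
from typing import Dict, Set


class TokenOverlapEvaluator:
    """Hypothetical code-based evaluator: share of response tokens present in the context."""

    _token_re = re.compile(r"\w+")

    def _tokens(self, text: str) -> Set[str]:
        # Lowercased word tokens; punctuation is ignored
        return set(self._token_re.findall(text.lower()))

    def __call__(self, *, query: str, context: str, response: str) -> Dict[str, float]:
        # `query` is accepted so the signature matches the data columns; it is unused here.
        response_tokens = self._tokens(response)
        if not response_tokens:
            return {"token_overlap": 0.0}
        overlap = response_tokens & self._tokens(context)
        return {"token_overlap": len(overlap) / len(response_tokens)}
```

An instance could be passed alongside the LLM judge, for example `evaluators={"token_overlap": TokenOverlapEvaluator(), ...}`, as a cheap sanity check. Keyword-only parameters make a column-name mismatch fail loudly instead of silently binding the wrong field.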

In [None]:
def get_model_config(eval_model=llm_judge):
    return {
        "azure_endpoint": aoai_endpoint,
        "api_key": aoai_api_key,
        "azure_deployment": eval_model,
        "api_version": aoai_api_version
    }

In [None]:
def load_config(eval_model=llm_judge):
    credential = DefaultAzureCredential()

    model_config = get_model_config(eval_model)

    # Identify the Azure AI project (where results are uploaded) from environment variables
    azure_ai_project = {
        "subscription_id": os.environ["SUB_ID"],
        "resource_group_name": os.environ["RG_NAME"],
        "project_name": os.environ["AZURE_PROJECT_NAME"],
    }
    return azure_ai_project, model_config, credential

In [None]:
def run_eval_on_azure(azure_ai_project, custom_groundedness, model_name, path):
    """Run the custom groundedness evaluator on `path` and log the run to the Azure AI project."""
    now = datetime.datetime.now()
    evaluator_name = "custom_groundedness_0_1"
    result = evaluate(
        evaluation_name=f"custom-groundedness-{model_name}-{now.strftime('%Y-%m-%d-%H-%M-%S')}",
        data=path,
        evaluators={
            evaluator_name: custom_groundedness,
        },
        # Column mapping: the evaluator_config key must match the evaluator name above
        evaluator_config={
            evaluator_name: {
                "column_mapping": {
                    "query": "${data.query}",
                    "context": "${data.context}",
                    "response": "${data.response}"
                }
            }
        },
        azure_ai_project=azure_ai_project,
        # output_path="./myevalresults.json",  # uncomment to also save results locally
    )
    return result
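`evaluator_config` is keyed by the same names used in `evaluators` (the SDK also documents a `default` key that applies to all evaluators), and each `column_mapping` value such as `${data.query}` points at a column in the input file. A configuration entry whose key names no evaluator will not be picked up. The sketch below is a local pre-flight check under those assumptions: `find_unmatched_config_keys` reports config keys with no matching evaluator, and `resolve_mapping` mimics how a `${data.<column>}` reference selects a field from one row. Both helpers are hypothetical illustrations, not part of `azure-ai-evaluation`, and `resolve_mapping` deliberately handles only `${data.*}` references.

```python
import re
from typing import Dict, Iterable, List, Mapping

# Matches references of the form ${data.<column>}
_DATA_REF = re.compile(r"^\$\{data\.([A-Za-z_][A-Za-z0-9_]*)\}$")


def find_unmatched_config_keys(
    evaluator_names: Iterable[str], evaluator_config: Mapping[str, dict]
) -> List[str]:
    """Return evaluator_config keys that name no evaluator ("default" is allowed)."""
    names = set(evaluator_names) | {"default"}
    return [key for key in evaluator_config if key not in names]


def resolve_mapping(row: Mapping[str, str], column_mapping: Mapping[str, str]) -> Dict[str, str]:
    """Build the keyword arguments an evaluator would receive for one data row."""
    kwargs = {}
    for param, ref in column_mapping.items():
        match = _DATA_REF.match(ref)
        if match is None:
            raise ValueError(f"unsupported reference for '{param}': {ref}")
        kwargs[param] = row[match.group(1)]
    return kwargs
```

Calling `find_unmatched_config_keys(evaluators.keys(), evaluator_config)` on the dictionaries passed to `evaluate()` would have caught a config keyed `custom_groundedness` next to an evaluator named `custom_groundedness_0_1`.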


In [None]:
from evaluators.aoai.custom_groundedness import CustomGroundednessEvaluator

# Load Azure AI project and model configuration
azure_ai_project, model_config, credential = load_config(eval_model=llm_judge)

# Custom evaluator for groundedness
custom_groundedness = CustomGroundednessEvaluator(model_config)

# Run evaluation and keep the returned results for analysis
model_name = model_config["azure_deployment"]
result = run_eval_on_azure(azure_ai_project, custom_groundedness, model_name, eval_data_path)
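The notebook's last objective is analyzing the results. When the return value of `evaluate()` is kept, it is a dictionary whose `metrics` entry holds aggregate scores and whose `rows` entry holds one dictionary per input line, with evaluator outputs under keys prefixed `outputs.<evaluator name>.`. The exact metric key depends on what `CustomGroundednessEvaluator` returns, which is not shown here, so the sketch below assumes a numeric output and takes the full column name as a parameter. `summarize_metric` is a hypothetical helper that reports how many rows were scored, how many were skipped (missing, non-numeric, or NaN values, for example when a judge call failed), the mean score, and optionally the share of rows at or above a threshold.

```python
import math
from statistics import mean
from typing import Any, Dict, List, Mapping, Optional


def summarize_metric(
    rows: List[Mapping[str, Any]], column: str, threshold: Optional[float] = None
) -> Dict[str, Any]:
    """Summarize one numeric output column across evaluation rows."""
    scores: List[float] = []
    for row in rows:
        value = row.get(column)
        # Skip missing, non-numeric, and boolean values (bool is a subclass of int)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if math.isnan(value):
            continue  # NaN usually marks a row whose evaluation failed
        scores.append(float(value))
    summary: Dict[str, Any] = {
        "count": len(scores),
        "skipped": len(rows) - len(scores),
        "mean": mean(scores) if scores else None,
    }
    if threshold is not None:
        summary["pass_rate"] = (
            sum(score >= threshold for score in scores) / len(scores) if scores else None
        )
    return summary
```

For example, `summarize_metric(result["rows"], "outputs.custom_groundedness_0_1.<metric>", threshold=1)`, with `<metric>` replaced by the key your evaluator returns. The `skipped` count is worth checking first: a high value points at failed judge calls rather than poorly grounded responses.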