# Create Agent

### Getting Started

This sample demonstrates how to evaluate an Azure AI agent.
Before running the sample, install the required packages:
```bash
pip install azure-ai-projects azure-ai-agents azure-identity azure-ai-evaluation python-dotenv pandas
```
Set these environment variables with your own values (the code loads them from a `.env` file through `load_dotenv`):
1) **AZURE_OPENAI_DEPLOYMENT** - The deployment name of the AI model, as found under the "Name" column in the "Models + endpoints" tab in your Azure AI Foundry project.
2) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint to be used for evaluation.
3) **AZURE_OPENAI_API_KEY** - The Azure OpenAI key to be used for evaluation.
4) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version to be used for evaluation.
5) **SUBSCRIPTION_ID** - The Azure subscription ID of the Azure AI project.
6) **PROJECT_NAME** - The Azure AI project name.
7) **RG_NAME** - The resource group name of the Azure AI project.

The project endpoint is assigned directly to `project_endpoint` in the initialization cell; replace it with the endpoint of your own Azure AI Foundry project, as found on the project's overview page.
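
Before running the cells, it can help to confirm that the variables are actually set. The following is a minimal sketch, assuming the variable names that the code cells in this notebook read through `os.environ`; the helper `missing_env_vars` is hypothetical and not part of any Azure SDK.

```python
import os

# Names read via os.environ in this notebook's code cells.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT",
    "SUBSCRIPTION_ID",
    "PROJECT_NAME",
    "RG_NAME",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this check right after `load_dotenv` surfaces a missing value as a readable message rather than a `KeyError` in a later cell.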


### Initializing Project Client

In [8]:
import os, json
import pandas as pd
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.agents.models import FunctionTool, ToolSet
from user_functions import user_functions
from dotenv import load_dotenv

load_dotenv("../../.env")

project_endpoint = "https://projetagent-resource.services.ai.azure.com/api/projects/projetagent"

# Initialize the AIProjectClient
project_client = AIProjectClient(
    endpoint=project_endpoint,
    credential=DefaultAzureCredential()
)

AGENT_NAME = "Seattle Tourist Assistant PrP"

# Register the user-defined functions as tools the agent can call
functions = FunctionTool(user_functions)

toolset = ToolSet()
toolset.add(functions)
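
The `user_functions` module imported above is not shown in this notebook. As a rough sketch of what such a module might contain, the example below defines two mock tools and collects them in a set. The name `fetch_weather` and its `location` parameter appear in the converted run data later in this notebook; `send_email`, its parameters, the mock weather values, and the JSON return shapes are assumptions made for illustration.

```python
import json
from typing import Any, Callable, Set


def fetch_weather(location: str) -> str:
    """
    Fetches the weather information for the specified location.

    :param location: The location to fetch weather for.
    :return: Weather information as a JSON string.
    """
    # Mock data for illustration only; a real tool would call a weather service.
    mock_weather = {"Seattle": "Rainy, 14°C"}
    weather = mock_weather.get(location, "Weather data not available for this location.")
    return json.dumps({"weather": weather})


def send_email(recipient: str, subject: str, body: str) -> str:
    """
    Pretends to send an email by printing it (hypothetical helper).

    :param recipient: Email address of the recipient.
    :param subject: Subject line of the email.
    :param body: Body text of the email.
    :return: A confirmation message as a JSON string.
    """
    print(f"Sending email to {recipient}...")
    print(f"Subject: {subject}")
    print(f"Body:\n{body}")
    return json.dumps({"message": f"Email successfully sent to {recipient}."})


# The collection of callables that would be passed to FunctionTool.
user_functions: Set[Callable[..., Any]] = {fetch_weather, send_email}
```

Type hints and docstrings are kept on each function because a tool description presumably has to be derived from them; treat that as an assumption and check the SDK documentation for the exact requirements.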

### Create Agent

In [9]:
agent = project_client.agents.create_agent(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    name=AGENT_NAME,
    instructions="You are a helpful assistant",
    toolset=toolset,
)

print(f"Created agent, ID: {agent.id}")

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

### Create Thread

In [5]:
# In azure-ai-agents 1.x, thread operations are grouped under `agents.threads`
thread = project_client.agents.threads.create()
print(f"Created thread, ID: {thread.id}")

## Conversation with Agent
Use the cells below to hold a conversation with the agent:
- `Create Message[1]`
- `Execute[2]`

### Create Message[1]

In [4]:
# Create message to thread

MESSAGE = "Can you send me an email (john@doe.com) with weather information for Seattle?"

message = project_client.agents.messages.create(
    thread_id=thread.id,
    role="user",
    content=MESSAGE,
)
print(f"Created message, ID: {message.id}")

Created message, ID: msg_biAFEdi00Ry3U9vhx7Fmj5FE


### Execute[2]

In [5]:
run = project_client.agents.runs.create_and_process(thread_id=thread.id, agent_id=agent.id)

print(f"Run finished with status: {run.status}")

if run.status == "failed":
    print(f"Run failed: {run.last_error}")

print(f"Run ID: {run.id}")

Sending email to john@doe.com...
Subject: Seattle Weather Update
Body:
Hello,

Here is the current weather information for Seattle:

Weather: Rainy
Temperature: 14°C

Stay dry!

Best regards.
Run finished with status: RunStatus.COMPLETED
Run ID: run_DllOK6TL2bCZl8LfX0EstJHc


### List Messages

In [6]:
for message in project_client.agents.messages.list(thread_id=thread.id, order="asc"):
    print(f"Role: {message.role}")
    print(f"Content: {message.content[0].text.value}")
    print("-" * 40)

Role: MessageRole.USER
Content: Can you send me an email (john@doe.com) with weather information for Seattle?
----------------------------------------
Role: MessageRole.AGENT
Content: The current weather in Seattle is rainy with a temperature of 14°C. I have sent this information to your email address (john@doe.com). If you need updates for any other location, let me know!
----------------------------------------


# Evaluate

### Get data from agent

In [7]:
import json
from azure.ai.evaluation import AIAgentConverter

# Initialize the converter that will be backed by the project.
converter = AIAgentConverter(project_client)

thread_id = thread.id
run_id = run.id

converted_data = converter.convert(thread_id, run_id)
print(json.dumps(converted_data, indent=4))



{
    "query": [
        {
            "role": "system",
            "content": "You are a helpful assistant"
        },
        {
            "createdAt": "2025-05-21T08:51:21Z",
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Can you send me an email (john@doe.com) with weather information for Seattle?"
                }
            ]
        }
    ],
    "response": [
        {
            "createdAt": "2025-05-21T08:51:23Z",
            "run_id": "run_DllOK6TL2bCZl8LfX0EstJHc",
            "role": "assistant",
            "content": [
                {
                    "type": "tool_call",
                    "tool_call_id": "call_c6noWcNGjQXCmhtC3UvvT6gj",
                    "name": "fetch_weather",
                    "arguments": {
                        "location": "Seattle"
                    }
                }
            ]
        },
        {
            "createdAt": "2025-05-21T0

In [8]:
# Save the converted data to a JSONL file

file_name = "evaluation_data.jsonl"
evaluation_data = converter.prepare_evaluation_data(thread_ids=thread.id, filename=file_name)
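
JSONL stores one JSON object per line, so a saved file can be inspected with the standard library alone. The sketch below is self-contained: it writes one record shaped like the converter output shown above to a temporary file, then lists the tool calls found in each row's `response`. The record is abbreviated, and the file name and both helper functions are hypothetical.

```python
import json
import os
import tempfile

# Abbreviated record modeled on the converter output printed earlier.
sample_row = {
    "query": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": [{"type": "text", "text": "How is the weather in Seattle?"}]},
    ],
    "response": [
        {
            "role": "assistant",
            "content": [
                {
                    "type": "tool_call",
                    "tool_call_id": "call_c6noWcNGjQXCmhtC3UvvT6gj",
                    "name": "fetch_weather",
                    "arguments": {"location": "Seattle"},
                }
            ],
        }
    ],
}


def read_jsonl(path):
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                rows.append(json.loads(line))
    return rows


def tool_calls_in(row):
    """Collect (name, arguments) pairs for every tool call in a row's response."""
    calls = []
    for message in row.get("response", []):
        content = message.get("content")
        if isinstance(content, list):
            for item in content:
                if isinstance(item, dict) and item.get("type") == "tool_call":
                    calls.append((item["name"], item["arguments"]))
    return calls


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_evaluation_data.jsonl")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write(json.dumps(sample_row) + "\n")
    rows = read_jsonl(sample_path)

for index, row in enumerate(rows):
    print(index, tool_calls_in(row))
```

Pointing `read_jsonl` at the file written by the cell above should work the same way, provided its rows follow the structure printed earlier.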


### Setting up evaluator

In [9]:
from azure.ai.evaluation import (
    AzureOpenAIModelConfiguration,
    IntentResolutionEvaluator,
    TaskAdherenceEvaluator,
    ToolCallAccuracyEvaluator,
    ViolenceEvaluator,
)
from pprint import pprint

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
)
# Needed to use content safety evaluators
azure_ai_project={
    "subscription_id": os.environ["SUBSCRIPTION_ID"],
    "project_name": os.environ["PROJECT_NAME"],
    "resource_group_name": os.environ["RG_NAME"],
}

tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)
intent_resolution = IntentResolutionEvaluator(model_config=model_config)
task_adherence = TaskAdherenceEvaluator(model_config=model_config)



In [10]:
# Evaluating query and response as strings
# A positive example. Intent is identified and understood and the response correctly resolves user intent
result = intent_resolution(
    query="What are the opening hours of the Eiffel Tower?",
    response="Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.",
)
result

{'intent_resolution': 4.0,
 'intent_resolution_result': 'pass',
 'intent_resolution_threshold': 3,
 'intent_resolution_reason': "The response correctly identifies the user's intent to find out the opening hours of the Eiffel Tower and provides a specific answer: 9:00 AM to 11:00 PM. However, the Eiffel Tower's opening hours can vary by season, special events, or maintenance, and the response does not mention any such variability or suggest checking for up-to-date information. While the answer is likely accurate for much of the year, the omission of these details means the response is not fully comprehensive or precise.",
 'additional_details': {'conversation_has_intent': True,
  'agent_perceived_intent': 'provide the opening hours of the Eiffel Tower',
  'actual_user_intent': 'find out the opening hours of the Eiffel Tower',
  'correct_intent_detected': True,
  'intent_resolved': False}}

In [11]:
from azure.ai.evaluation import ResponseCompletenessEvaluator

response_completeness = ResponseCompletenessEvaluator(model_config=model_config)
# A negative example. Only half of the statements in the response were complete according to the ground truth
result = response_completeness(
    response="Itinerary: Day 1 take a train to visit Disneyland outside of the city; Day 2 rests in hotel.",
    ground_truth="Itinerary: Day 1 take a train to visit the downtown area for city sightseeing; Day 2 rests in hotel."
)
result



{'response_completeness': 2,
 'response_completeness_result': 'fail',
 'response_completeness_threshold': 3,
 'response_completeness_reason': 'The response only matches the ground truth on Day 2 (resting in hotel) and completely misses the main activity for Day 1, so it is barely complete.'}
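
Both result dictionaries above follow the same key pattern: `<metric>`, `<metric>_result`, `<metric>_threshold`, and `<metric>_reason`. The sketch below relies only on that observed pattern to pull out the headline numbers; `summarize` is a hypothetical helper, and the sample dictionaries are copied from the outputs above with the reason text omitted.

```python
def summarize(result):
    """Return (metric, score, threshold, verdict) for a result dict shaped like the outputs above."""
    for key, value in result.items():
        if key.endswith("_result"):
            metric = key[: -len("_result")]
            return metric, result.get(metric), result.get(f"{metric}_threshold"), value
    raise ValueError("no '<metric>_result' key found")


intent_result = {
    "intent_resolution": 4.0,
    "intent_resolution_result": "pass",
    "intent_resolution_threshold": 3,
}
completeness_result = {
    "response_completeness": 2,
    "response_completeness_result": "fail",
    "response_completeness_threshold": 3,
}

for sample in (intent_result, completeness_result):
    metric, score, threshold, verdict = summarize(sample)
    print(f"{metric}: score={score}, threshold={threshold}, verdict={verdict}")
```

In both recorded outputs the verdict is `pass` when the score is at or above the threshold and `fail` otherwise; that is an observation from these two runs rather than a documented rule.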

In [12]:
query = "How is the weather in Seattle?"

# Two calls to the same tool: one for the requested city, one for an unrelated city
tool_calls = [
    {
        "type": "tool_call",
        "tool_call_id": "call_CUdbkBfvVBla2YP3p24uhElJ",
        "name": "fetch_weather",
        "arguments": {"location": "Seattle"},
    },
    {
        "type": "tool_call",
        "tool_call_id": "call_CUdbkBfvVBla2YP3p24uhElJ",
        "name": "fetch_weather",
        "arguments": {"location": "London"},
    },
]

tool_definition = {
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location to fetch weather for.",
            }
        },
    },
}

response = tool_call_accuracy(query=query, tool_calls=tool_calls, tool_definitions=tool_definition)
response

{'tool_call_accuracy': 0.5,
 'tool_call_accuracy_result': 'fail',
 'tool_call_accuracy_threshold': 0.8,
 'per_tool_call_details': [{'tool_call_accurate': True,
   'tool_call_accurate_reason': "The TOOL CALL is directly relevant, uses the correct parameter and value from the conversation, and will help address the user's need.",
   'tool_call_id': 'call_CUdbkBfvVBla2YP3p24uhElJ'},
  {'tool_call_accurate': False,
   'tool_call_accurate_reason': "The TOOL CALL is not relevant to the user's request because it fetches weather for London instead of Seattle, which is the location asked about in the conversation.",
   'tool_call_id': 'call_CUdbkBfvVBla2YP3p24uhElJ'}]}
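
The score of 0.5 above matches the share of tool calls marked accurate in `per_tool_call_details` (one of two). The sketch below recomputes it under that assumption, which is inferred from this single output and not from the evaluator's documentation; `accuracy_from_details` is a hypothetical helper and the details are copied from the output with the reason text omitted.

```python
details = [
    {"tool_call_accurate": True, "tool_call_id": "call_CUdbkBfvVBla2YP3p24uhElJ"},
    {"tool_call_accurate": False, "tool_call_id": "call_CUdbkBfvVBla2YP3p24uhElJ"},
]


def accuracy_from_details(per_call):
    """Return the fraction of tool calls marked accurate, or None when there are no calls."""
    if not per_call:
        return None
    accurate = sum(1 for detail in per_call if detail["tool_call_accurate"])
    return accurate / len(per_call)


score = accuracy_from_details(details)
threshold = 0.8  # value reported as 'tool_call_accuracy_threshold' above
verdict = "pass" if score >= threshold else "fail"
print(score, verdict)
```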

### Run Evaluator

In [None]:
from azure.ai.evaluation import evaluate

response = evaluate(
    data=file_name,
    evaluators={
        "tool_call_accuracy": tool_call_accuracy,
        "intent_resolution": intent_resolution,
        "task_adherence": task_adherence,
    },
    azure_ai_project={
        "subscription_id": os.environ["SUBSCRIPTION_ID"],
        "project_name": os.environ["PROJECT_NAME"],
        "resource_group_name": os.environ["RG_NAME"],
    }
)
print(f'AI Foundry URL: {response.get("studio_url")}')

[2025-05-21 09:51:55 +0100][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_task_adherence_20250521_095154_783846, log path: C:\Users\alevret\.promptflow\.runs\azure_ai_evaluation_evaluators_task_adherence_20250521_095154_783846\logs.txt
[2025-05-21 09:51:55 +0100][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_intent_resolution_20250521_095154_782801, log path: C:\Users\alevret\.promptflow\.runs\azure_ai_evaluation_evaluators_intent_resolution_20250521_095154_782801\logs.txt
[2025-05-21 09:51:55 +0100][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_tool_call_accuracy_20250521_095154_764316, log path: C:\Users\alevret\.promptflow\.runs\azure_ai_evaluation_evaluators_tool_call_accuracy_20250521_095154_764316\logs.txt


2025-05-21 09:51:55 +0100   27172 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-05-21 09:51:58 +0100   27172 execution.bulk     INFO     Finished 1 / 1 lines.
2025-05-21 09:51:58 +0100   27172 execution.bulk     INFO     Average execution time for completed lines: 2.86 seconds. Estimated time for incomplete lines: 0.0 seconds.

Run name: "azure_ai_evaluation_evaluators_intent_resolution_20250521_095154_782801"
Run status: "Completed"
Start time: "2025-05-21 09:51:54.805817+01:00"
Duration: "0:00:03.436241"
Output path: "C:\Users\alevret\.promptflow\.runs\azure_ai_evaluation_evaluators_intent_resolution_20250521_095154_782801"

2025-05-21 09:51:59 +0100   27172 execution.bulk     INFO     Finished 1 / 1 lines.
2025-05-21 09:51:59 +0100   27172 execution.bulk     INFO     Average execution time for completed lines: 3.85 seconds. Estimated time for incomplete lines: 0.0 seconds.
2025-05-21 09:51:55 +0100   27172 execut

2025-05-21 09:52:17 +0100   27172 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-05-21 09:52:19 +0100   27172 execution.bulk     INFO     Finished 90 / 90 lines.
2025-05-21 09:52:19 +0100   27172 execution.bulk     INFO     Average execution time for completed lines: 0.02 seconds. Estimated time for incomplete lines: 0.0 seconds.
2025-05-21 09:52:19 +0100   27172 execution          ERROR    90/90 flow run failed, indexes: [24,85,43,26,86,44,87,27,45,28,88,46,29,89,47,30,48,31,49,32,50,33,51,52,53,54,55,56,57,58,59,0,60,61,62,3,63,1,64,2,65,19,66,67,68,25,69,70,71,4,72,5,73,17,6,74,7,75,8,18,76,34,9,77,35,10,78,36,11,79,37,12,80,38,13,22,81,39,14,21,82,40,15,20,83,41,16,23,84,42], exception of index 24: (UserError) ResponseCompletenessEvaluator: Either 'conversation' or individual inputs must be provided.

Run name: "azure_ai_evaluation_evaluators_response_completeness_20250521_095217_497240"
Run status: "Completed"
S

In [14]:
response = evaluate(
    data="./sample_synthetic_conversations.jsonl",
    evaluators={
        "tool_call_accuracy": tool_call_accuracy,
        "intent_resolution": intent_resolution,
        "task_adherence": task_adherence,
        "response_completeness": response_completeness,
    },
    azure_ai_project={
        "subscription_id": os.environ["SUBSCRIPTION_ID"],
        "project_name": os.environ["PROJECT_NAME"],
        "resource_group_name": os.environ["RG_NAME"],
    },
)
response.get("studio_url")


[2025-05-21 09:52:17 +0100][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_response_completeness_20250521_095217_497240, log path: C:\Users\alevret\.promptflow\.runs\azure_ai_evaluation_evaluators_response_completeness_20250521_095217_497240\logs.txt
[2025-05-21 09:52:17 +0100][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_intent_resolution_20250521_095217_489912, log path: C:\Users\alevret\.promptflow\.runs\azure_ai_evaluation_evaluators_intent_resolution_20250521_095217_489912\logs.txt
[2025-05-21 09:52:17 +0100][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_task_adherence_20250521_095217_492231, log path: C:\Users\alevret\.promptflow\.runs\azure_ai_evaluation_evaluators_task_adherence_20250521_095217_492231\logs.txt
[2025-05-21 09:52:17 +0100][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluati

'https://ai.azure.com/build/evaluation/6163a135-8d52-456a-9fe1-feb9a9cd9f18?wsid=/subscriptions/65a513ce-bb5d-4ed5-92b1-fa601d510a15/resourceGroups/agentai/providers/Microsoft.MachineLearningServices/workspaces/genaiops-demo'

In [15]:
pprint(response)
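
The log above reports that `ResponseCompletenessEvaluator` failed on every row with "Either 'conversation' or individual inputs must be provided." One plausible cause is that the rows lack the fields the evaluator was given in the single-call example (`response` and `ground_truth`). The sketch below checks a JSONL source for those fields before a batch run; `rows_missing_fields` is a hypothetical helper, and the sample rows are invented for illustration.

```python
import io
import json

# Inputs passed to response_completeness(...) in the single-call example.
REQUIRED_FIELDS = ("response", "ground_truth")


def rows_missing_fields(lines, required=REQUIRED_FIELDS):
    """Map line index -> list of required fields that are absent or empty on that line."""
    problems = {}
    for index, line in enumerate(lines):
        if not line.strip():
            continue  # ignore blank lines
        row = json.loads(line)
        missing = [field for field in required if row.get(field) in (None, "", [])]
        if missing:
            problems[index] = missing
    return problems


sample = io.StringIO(
    '{"query": "q1", "response": "r1", "ground_truth": "g1"}\n'
    '{"query": "q2", "response": "r2"}\n'
)
print(rows_missing_fields(sample))
```

Because the helper accepts any iterable of lines, an open file handle for the data file can be passed in place of the `io.StringIO` sample.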