# Create Agent

### Getting Started

This sample demonstrates how to evaluate an Azure AI agent.
Before running the sample, install the required packages:
```bash
pip install azure-ai-projects azure-identity azure-ai-evaluation python-dotenv pandas
```
Set these environment variables with your own values:
1) **PROJECT_CONNECTION_STRING** - The project connection string, as found on the overview page of your Azure AI Foundry project.
2) **AZURE_OPENAI_DEPLOYMENT** - The deployment name of the AI model, as found under the "Name" column in the "Models + endpoints" tab of your Azure AI Foundry project. The code below uses it for both the agent and the evaluators.
3) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint used for evaluation.
4) **AZURE_OPENAI_API_KEY** - The Azure OpenAI API key used for evaluation.
5) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version used for evaluation.
6) **SUBSCRIPTION_ID** - The Azure subscription ID of the Azure AI project.
7) **PROJECT_NAME** - The Azure AI project name.
8) **RG_NAME** - The resource group name of the Azure AI project.
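
A missing variable only surfaces as a `KeyError` when the cell that reads it runs, so it can be convenient to check everything up front. The following is a minimal sketch, not part of the original sample: `missing_env_vars` and `REQUIRED_ENV_VARS` are hypothetical names, and the variable names listed are the ones the code cells in this notebook read through `os.environ[...]`.

```python
import os
from typing import Iterable, List, Mapping

# Names read via os.environ[...] in the code cells of this notebook.
REQUIRED_ENV_VARS = (
    "PROJECT_CONNECTION_STRING",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "SUBSCRIPTION_ID",
    "PROJECT_NAME",
    "RG_NAME",
)


def missing_env_vars(names: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Return the names that are unset or empty in `env` (hypothetical helper)."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS, os.environ)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Run the check after `load_dotenv(...)` so that values loaded from the `.env` file are taken into account.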


### Initializing Project Client

In [8]:
import os
import json
from typing import Set, Callable, Any
import pandas as pd
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import FunctionTool, ToolSet
# from user_functions import user_functions
from dotenv import load_dotenv

load_dotenv("../../.env")

project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=os.environ["PROJECT_CONNECTION_STRING"],
)

# Define a custom Python function.
def fetch_weather(location: str) -> str:
    """
    Fetches the weather information for the specified location.

    :param location (str): The location to fetch weather for.
    :return: Weather information as a JSON string.
    :rtype: str
    """
    # In a real-world scenario, you'd integrate with a weather API.
    # In the following code snippet, we mock the response.
    mock_weather_data = {"Seattle": "Sunny, 25°C", "London": "Cloudy, 18°C", "Tokyo": "Rainy, 22°C"}
    weather = mock_weather_data.get(location, "Weather data not available for this location.")
    weather_json = json.dumps({"weather": weather})
    return weather_json

user_functions: Set[Callable[..., Any]] = {
    fetch_weather,
}

# Register the functions as tools the agent can call.
functions = FunctionTool(user_functions)

toolset = ToolSet()
toolset.add(functions)

AGENT_NAME = "Seattle Tourist Assistant PrP"
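
Before handing `fetch_weather` to the agent, it can help to call it directly and confirm what it returns. The sketch below repeats the mocked function from the cell above so that it runs on its own; it needs no Azure resources. Note that the function returns a JSON string rather than a dictionary, so the result is decoded before it is inspected.

```python
import json


def fetch_weather(location: str) -> str:
    """Same mocked lookup as the notebook cell, repeated so this block runs alone."""
    mock_weather_data = {"Seattle": "Sunny, 25°C", "London": "Cloudy, 18°C", "Tokyo": "Rainy, 22°C"}
    weather = mock_weather_data.get(location, "Weather data not available for this location.")
    return json.dumps({"weather": weather})


# The function returns a JSON string, so decode it before inspecting the payload.
known = json.loads(fetch_weather("Seattle"))
unknown = json.loads(fetch_weather("Paris"))

print(known["weather"])    # Sunny, 25°C
print(unknown["weather"])  # Weather data not available for this location.
```

Locations outside the mock table do not raise an error; they return the fallback message, which the agent then receives as the tool output.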

### Create Agent

In [9]:
agent = project_client.agents.create_agent(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    name=AGENT_NAME,
    instructions="You are a helpful assistant",
    toolset=toolset,
)

print(f"Created agent, ID: {agent.id}")

Created agent, ID: asst_QBo4xyqwcl5v97d1Un6dyicU


### Create Thread

In [10]:
thread = project_client.agents.create_thread()
print(f"Created thread, ID: {thread.id}")

Created thread, ID: thread_aQMpaAPS904c8ucbWOeBnV5W


## Conversation with Agent
Use the cells below to have a conversation with the agent:
- `Create Message[1]`
- `Execute[2]`

### Create Message[1]

In [11]:
# Add a user message to the thread

MESSAGE = "Can you send me an email (john@doe.com) with weather information for Seattle?"

message = project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content=MESSAGE,
)
print(f"Created message, ID: {message.id}")

Created message, ID: msg_wslbfD0xAoj87OrHjCfS9j95


### Execute[2]

In [12]:
run = project_client.agents.create_and_process_run(thread_id=thread.id, agent_id=agent.id)

print(f"Run finished with status: {run.status}")

if run.status == "failed":
    print(f"Run failed: {run.last_error}")

print(f"Run ID: {run.id}")

ERROR:root:Error executing function 'fetch_weather': Function 'fetch_weather' not found.


Run finished with status: RunStatus.COMPLETED
Run ID: run_AoKtc3888s9a7g10r08odqav
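
The log line above reports that `fetch_weather` could not be resolved when the run tried to execute the tool call, even though the run itself completed. The sketch below is not the SDK's implementation; it only illustrates name-based tool dispatch in general, where a call succeeds only if the registry consulted at run time contains a function under the requested name. `dispatch_tool_call` and `registry` are hypothetical names, and the stand-in `fetch_weather` is simplified.

```python
import json
from typing import Any, Callable, Dict


def fetch_weather(location: str) -> str:
    """Simplified stand-in for the notebook's tool function."""
    return json.dumps({"weather": f"mock weather for {location}"})


def dispatch_tool_call(
    registry: Dict[str, Callable[..., Any]], name: str, arguments: Dict[str, Any]
) -> str:
    """Hypothetical dispatcher: look the tool up by name, then call it with keyword arguments."""
    func = registry.get(name)
    if func is None:
        # Report the failure as JSON so the caller still receives a tool output.
        return json.dumps({"error": f"Function '{name}' not found."})
    return func(**arguments)


registry = {fetch_weather.__name__: fetch_weather}

found = dispatch_tool_call(registry, "fetch_weather", {"location": "Seattle"})
not_found = dispatch_tool_call({}, "fetch_weather", {"location": "Seattle"})

print(found)
print(not_found)
```

In a pattern like this, the same request yields a usable result or an error payload depending only on what the registry holds when the call is dispatched.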


### List Messages

In [13]:
for message in project_client.agents.list_messages(thread.id, order="asc").data:
    print(f"Role: {message.role}")
    print(f"Content: {message.content[0].text.value}")
    print("-" * 40)

Role: MessageRole.USER
Content: Can you send me an email (john@doe.com) with weather information for Seattle?
----------------------------------------
Role: MessageRole.AGENT
Content: It seems I'm unable to retrieve the weather information right now. However, I can assist you in other ways or suggest alternatives for checking Seattle's weather—such as using a reliable weather website like Weather.com or AccuWeather.
----------------------------------------


# Evaluate

### Get data from agent

In [25]:
import json
from azure.ai.evaluation import AIAgentConverter

# Initialize the converter that will be backed by the project.
converter = AIAgentConverter(project_client)

thread_id = thread.id
run_id = run.id

converted_data = converter.convert(thread_id, run_id)
print(json.dumps(converted_data, indent=4))

In [26]:
# Save the converted data to a JSONL file

file_name = "evaluation_data.jsonl"
evaluation_data = converter.prepare_evaluation_data(thread_ids=thread.id, filename=file_name)
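
A JSONL file holds one JSON object per line, so it can be inspected with the standard library before it is passed to an evaluator. The following is a minimal sketch: `read_jsonl` is a hypothetical helper, and the records written to the temporary file are made up for the demonstration and are not real converter output. The keys in an actual `evaluation_data.jsonl` depend on the converter.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo on a throwaway file with made-up records (not real converter output).
sample_rows = [
    {"query": "How is the weather in Seattle?", "response": "Sunny, 25°C"},
    {"query": "How is the weather in London?", "response": "Cloudy, 18°C"},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as handle:
        for row in sample_rows:
            handle.write(json.dumps(row) + "\n")
    loaded = read_jsonl(demo_path)

print(len(loaded), sorted(loaded[0].keys()))
```

Calling `read_jsonl("evaluation_data.jsonl")` after the cell above would show how many rows were written and which fields each row carries.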


### Setting up evaluator

In [27]:
from azure.ai.evaluation import (
    AzureOpenAIModelConfiguration,
    IntentResolutionEvaluator,
    TaskAdherenceEvaluator,
    ToolCallAccuracyEvaluator,
    ViolenceEvaluator,
)
from pprint import pprint

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
)
# Required by the content safety evaluators.
azure_ai_project = {
    "subscription_id": os.environ["SUBSCRIPTION_ID"],
    "project_name": os.environ["PROJECT_NAME"],
    "resource_group_name": os.environ["RG_NAME"],
}

tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)
intent_resolution = IntentResolutionEvaluator(model_config=model_config)
task_adherence = TaskAdherenceEvaluator(model_config=model_config)

In [28]:
# Evaluate a query and a response given as plain strings.
# A positive example: the intent is identified and understood, and the response resolves it correctly.
result = intent_resolution(
    query="What are the opening hours of the Eiffel Tower?",
    response="Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.",
)
result

{'intent_resolution': 5.0,
 'intent_resolution_result': 'pass',
 'intent_resolution_threshold': 3,
 'intent_resolution_reason': "The response provides the opening hours of the Eiffel Tower, which directly addresses the user's query with accurate and complete information. No additional details or tools were required to resolve the query.",
 'additional_details': {'conversation_has_intent': True,
  'agent_perceived_intent': 'provide the opening hours of the Eiffel Tower',
  'actual_user_intent': 'find out the opening hours of the Eiffel Tower',
  'correct_intent_detected': True,
  'intent_resolved': True}}

In [29]:
from azure.ai.evaluation import ResponseCompletenessEvaluator

response_completeness = ResponseCompletenessEvaluator(model_config=model_config)
# A negative example: only half of the response agrees with the ground truth.
result = response_completeness(
    response="Itinerary: Day 1 take a train to visit Disneyland outside of the city; Day 2 rests in hotel.",
    ground_truth="Itinerary: Day 1 take a train to visit the downtown area for city sightseeing; Day 2 rests in hotel."
)
result

{'response_completeness': 2,
 'response_completeness_result': 'fail',
 'response_completeness_threshold': 3,
 'response_completeness_reason': 'The response is barely complete because it only matches the Day 2 activity and completely diverges from the ground truth for Day 1.'}
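
Each result above reports a numeric score, a threshold, and a pass/fail label. The sketch below shows one rule that is consistent with both outputs, assuming a score at or above the threshold counts as a pass. `score_to_result` is a hypothetical helper, not a function from `azure.ai.evaluation`, and the library's actual rule may differ for other evaluators.

```python
def score_to_result(score: float, threshold: float) -> str:
    """Assumed rule: a score at or above the threshold passes."""
    return "pass" if score >= threshold else "fail"


# Values taken from the two results shown above.
print(score_to_result(5.0, 3))  # pass (intent resolution)
print(score_to_result(2, 3))    # fail (response completeness)
```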

In [30]:
query = "How is the weather in Seattle?"
tool_calls = [
    {
        "type": "tool_call",
        "tool_call_id": "call_CUdbkBfvVBla2YP3p24uhElJ",
        "name": "fetch_weather",
        "arguments": {"location": "Seattle"},
    },
    {
        "type": "tool_call",
        "tool_call_id": "call_CUdbkBfvVBla2YP3p24uhElJ",
        "name": "fetch_weather",
        "arguments": {"location": "London"},
    },
]

tool_definition = {
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location to fetch weather for.",
            }
        },
    },
}
response = tool_call_accuracy(query=query, tool_calls=tool_calls, tool_definitions=tool_definition)
response

{'tool_call_accuracy': 0.5,
 'tool_call_accuracy_result': 'fail',
 'tool_call_accuracy_threshold': 0.8,
 'per_tool_call_details': [{'tool_call_accurate': True,
   'tool_call_accurate_reason': "The TOOL CALL is relevant, uses appropriate parameters, and extracts correct values from the conversation, making it highly likely to resolve the user's need.",
   'tool_call_id': 'call_CUdbkBfvVBla2YP3p24uhElJ'},
  {'tool_call_accurate': False,
   'tool_call_accurate_reason': 'The TOOL CALL is not relevant because the parameter value ("London") does not match the user\'s query about "Seattle," and thus it will not help resolve the user\'s need.',
   'tool_call_id': 'call_CUdbkBfvVBla2YP3p24uhElJ'}]}
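
The overall score of 0.5 above matches the fraction of tool calls judged accurate: one of two. The sketch below reproduces that arithmetic under the assumption that the aggregate is a simple ratio; `accuracy_from_details` is a hypothetical helper, and the evaluator may compute its score differently in other versions. The input keeps only the fields needed from `per_tool_call_details`.

```python
from typing import Dict, List


def accuracy_from_details(details: List[Dict[str, object]]) -> float:
    """Fraction of tool calls flagged as accurate (assumed aggregation rule)."""
    if not details:
        raise ValueError("at least one tool call is required")
    accurate = sum(1 for detail in details if detail["tool_call_accurate"])
    return accurate / len(details)


# Flags copied from the per_tool_call_details output above.
per_tool_call_details = [
    {"tool_call_accurate": True, "tool_call_id": "call_CUdbkBfvVBla2YP3p24uhElJ"},
    {"tool_call_accurate": False, "tool_call_id": "call_CUdbkBfvVBla2YP3p24uhElJ"},
]

score = accuracy_from_details(per_tool_call_details)
print(score)                               # 0.5
print("pass" if score >= 0.8 else "fail")  # fail
```

With a threshold of 0.8, a single irrelevant call out of two is enough to fail the evaluation.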

### Run Evaluator

In [31]:
from azure.ai.evaluation import evaluate

# Regenerate the evaluation data file to ensure it is valid
file_name = "evaluation_data.jsonl"
evaluation_data = converter.prepare_evaluation_data(thread_ids=thread.id, filename=file_name)

response = evaluate(
    data=file_name,
    evaluators={
        "tool_call_accuracy": tool_call_accuracy,
        "intent_resolution": intent_resolution,
        "task_adherence": task_adherence,
    },
    azure_ai_project={
        "subscription_id": os.environ["SUBSCRIPTION_ID"],
        "project_name": os.environ["PROJECT_NAME"],
        "resource_group_name": os.environ["RG_NAME"],
    }
)
pprint(f'AI Foundry URL: {response.get("studio_url")}')

[2025-07-28 11:07:18 +0200][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_kdpkq223_20250728_110718_082694, log path: C:\Users\ruplisso\.promptflow\.runs\azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_kdpkq223_20250728_110718_082694\logs.txt
[2025-07-28 11:07:18 +0200][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_4afnhe7v_20250728_110718_100631, log path: C:\Users\ruplisso\.promptflow\.runs\azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_4afnhe7v_20250728_110718_100631\logs.txt
[2025-07-28 11:07:18 +0200][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_yhwzlm_5_20250728_110718_100631, log path: C:\Users\ruplisso\.promptflow\.runs\azure_ai_evaluation_evaluators_common_base_eval_asyncevalu

In [32]:
response = evaluate(
    data="./sample_synthetic_conversations.jsonl",
    evaluators={
        "tool_call_accuracy": tool_call_accuracy,
        "intent_resolution": intent_resolution,
        "task_adherence": task_adherence,
        "response_completeness": response_completeness,
    },
    azure_ai_project={
        "subscription_id": os.environ["SUBSCRIPTION_ID"],
        "project_name": os.environ["PROJECT_NAME"],
        "resource_group_name": os.environ["RG_NAME"],
    },
)
response.get("studio_url")


[2025-07-28 11:07:52 +0200][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_zc8uiq8z_20250728_110751_969460, log path: C:\Users\ruplisso\.promptflow\.runs\azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_zc8uiq8z_20250728_110751_969460\logs.txt
[2025-07-28 11:07:52 +0200][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_b9o5c8_s_20250728_110751_961797, log path: C:\Users\ruplisso\.promptflow\.runs\azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_b9o5c8_s_20250728_110751_961797\logs.txt
[2025-07-28 11:07:52 +0200][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_common_base_eval_asyncevaluatorbase_bfu_giht_20250728_110751_956640, log path: C:\Users\ruplisso\.promptflow\.runs\azure_ai_evaluation_evaluators_common_base_eval_asyncevalu

'https://ai.azure.com/build/evaluation/42ec0a3e-9891-400f-b169-135e9327ecb4?wsid=/subscriptions/8babb7f9-50f7-498f-9e0a-8bef4389331d/resourceGroups/rg-ProjectFoundryHub03072025_2/providers/Microsoft.MachineLearningServices/workspaces/ProjectFoundryHub03072025_2'

In [15]:
pprint(response)
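
Printing the whole result can be long, so a compact summary of the aggregate scores is sometimes easier to read. The sketch below assumes the result is a dictionary with a `"metrics"` mapping alongside `"studio_url"`. `summarize_metrics` is a hypothetical helper, and `example_result` is a made-up stand-in with invented key names and values, not output from this notebook.

```python
from typing import Any, Dict, List


def summarize_metrics(result: Dict[str, Any]) -> List[str]:
    """Format each aggregate metric as 'name: value', sorted by name."""
    metrics = result.get("metrics") or {}
    return [f"{name}: {value}" for name, value in sorted(metrics.items())]


# Made-up stand-in for an evaluation result; keys and values are illustrative only.
example_result = {
    "metrics": {
        "task_adherence.task_adherence": 3.5,
        "intent_resolution.intent_resolution": 4.0,
    },
    "rows": [],
    "studio_url": None,
}

for line in summarize_metrics(example_result):
    print(line)
```

Passing the notebook's own `response` to `summarize_metrics` would list whatever aggregate metrics that run produced, if the assumed structure holds.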