# [Azure OpenAI Responses API](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses?tabs=python-secure#generate-a-text-response) (with Entra ID authentication)

In [None]:
# !az login

In [1]:
import os, json
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from dotenv import load_dotenv # requires python-dotenv

load_dotenv("./../../config/credentials_my.env")

# Fail fast with a KeyError if either variable is missing from the environment
azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_version = os.environ["OPENAI_API_VERSION"] # at least "2025-03-01-preview"

# Obtain bearer tokens for the Cognitive Services scope from the Entra ID credential
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_endpoint=azure_endpoint,
    azure_ad_token_provider=token_provider,
    api_version=api_version)

# Read the system message and the JSON list of metrics

In [33]:
# Open the system_message file in read mode
with open("system_message.txt", "r", encoding="utf-8") as file:
    system_message = file.read()

# Open the metrics file in read mode; newlines are stripped before parsing
with open("metrics.json", "r", encoding="utf-8") as file:
    metrics_list = json.loads(file.read().replace("\n", ""))

# Print a 1-based menu of the available metrics
for i, m in enumerate(metrics_list, start=1):
    print(f'{i}: {m["metric_name"]}')

1: Intent Resolution
2: Tool Call Accuracy
3: Task Adherence
4: Response Completeness
5: Groundedness (prompt-based)
6: Groundedness Pro
7: Retrieval
8: Relevance
9: Coherence
10: Fluency
11: Similarity
12: F1 Score
13: BLEU Score
14: ROUGE Score
15: METEOR Score


In [35]:
from IPython.display import Markdown, display

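index = int(input("Which metric would you like to analyze?"))
# Reject out-of-range choices; 0 or a negative number would otherwise index from the end of the list
if not 1 <= index <= len(metrics_list):
    raise ValueError(f"Please enter a number between 1 and {len(metrics_list)}.")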

print("\n\n")

display(Markdown(f"**GREAT CHOICE!** Here are a couple examples for the metric `<{metrics_list[index-1]['metric_name']}>`\n"))

messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": json.dumps(metrics_list[index-1])},
]

response = client.responses.create(
    model=os.environ['AZURE_OPENAI_CHAT_DEPLOYMENT_NAME'],
    input=messages,
)

# Two trailing spaces before each newline force a Markdown hard line break
display(Markdown(response.output_text.replace("\n", "  \n")))

Which metric would you like to analyze? 12

**GREAT CHOICE!** Here are a couple examples for the metric `<F1 Score>`


### Examples for the F1 Score  
  
#### Example 1  
- **QUESTION:** What is the capital of Italy?  
- **ANSWER:** Rome  
- **SCORE:** 1.0  
- **EXPLANATION:** Complete and precise answer; all words match the ground truth.  
  
#### Example 2  
- **QUESTION:** What is the capital of Italy?  
- **ANSWER:** Rome, known for the Colosseum.  
- **SCORE:** 0.8  
- **EXPLANATION:** Contains the correct answer but adds superfluous information that was not requested.  
  
#### Example 3  
- **QUESTION:** What is the capital of Italy?  
- **ANSWER:** Rome, famous for its food.  
- **SCORE:** 0.5  
- **EXPLANATION:** Includes the correct answer with additional details that do not relate to the question.  
  
#### Example 4  
- **QUESTION:** What is the capital of Italy?  
- **ANSWER:** Milan  
- **SCORE:** 0.2  
- **EXPLANATION:** Incorrect answer; no words match the ground truth.
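
For comparison, F1 for question answering is commonly computed from the token overlap between the answer and the ground truth. The following is a minimal sketch, assuming lowercasing and plain whitespace tokenization; the `token_f1` helper is hypothetical and not part of any SDK. A real evaluator may normalize text differently, so its scores need not match this sketch or the model-generated examples above.

```python
from collections import Counter


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between a prediction and a ground-truth string.

    Assumption: tokens are lowercased, whitespace-separated words;
    punctuation is NOT stripped in this sketch.
    """
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()

    # If either side is empty, score 1.0 only when both are empty
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)   # shared tokens / predicted tokens
    recall = overlap / len(truth_tokens)     # shared tokens / ground-truth tokens
    return 2 * precision * recall / (precision + recall)


print(token_f1("Rome", "Rome"))    # 1.0
print(token_f1("Milan", "Rome"))   # 0.0
print(round(token_f1("Rome is known for the Colosseum", "Rome"), 2))  # 0.29
```

Under this definition an answer that shares no tokens with the ground truth scores exactly 0.0, and extra words lower precision while leaving recall unchanged.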