# [Azure OpenAI Responses API](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses?tabs=python-secure#generate-a-text-response) (with Entra ID authentication)

In [1]:
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()

In [2]:
import os, json
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from dotenv import load_dotenv # requires python-dotenv

load_dotenv("./../../config/credentials_my.env")

# AzureOpenAI reads AZURE_OPENAI_ENDPOINT from the environment; fail fast with a KeyError if it is missing
_ = os.environ["AZURE_OPENAI_ENDPOINT"]
api_version = os.environ["OPENAI_API_VERSION"] # at least "2025-03-01-preview"

# Token provider that requests Entra ID bearer tokens for the Cognitive Services scope
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
  azure_ad_token_provider=token_provider,
  api_version=api_version)
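
The client above depends on settings loaded from the `.env` file. As a minimal sketch, and not part of the original notebook, a helper such as the hypothetical `missing_env_vars` below could report every unset variable at once before the client is built. The three variable names are the ones this notebook reads; the helper name and its `env` parameter are assumptions made for illustration.

```python
import os

# Variable names read elsewhere in this notebook
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
)

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Hypothetical usage: list whatever is missing before creating the client
missing = missing_env_vars()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Checking all names in one pass avoids fixing one `KeyError` at a time when the `.env` file is incomplete.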

# Read the system message plus the JSON list of metrics

In [3]:
# Open the system_message file in read mode
with open("system_message.txt", "r", encoding="utf-8") as file:
    system_message = file.read()

# Open the metrics file in read mode
with open("metrics.json", "r", encoding="utf-8") as file:
    metrics_list = json.loads(file.read().replace("\n", ""))

# Print a 1-based menu of the available metric names
for i, m in enumerate(metrics_list, start=1):
    print(f'{i}: {m["metric_name"]}')

1: Intent Resolution
2: Tool Call Accuracy
3: Task Adherence
4: Response Completeness
5: Groundedness (prompt-based)
6: Groundedness Pro
7: Retrieval
8: Relevance
9: Coherence
10: Fluency
11: Similarity
12: F1 Score
13: BLEU Score
14: ROUGE Score
15: METEOR Score
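
The listing above only relies on each entry in `metrics.json` exposing a `metric_name` key. The sketch below shows one shape the file might take. The `description` field and the sample values are hypothetical, since the actual file contents are not shown in this notebook.

```python
import json

# Hypothetical excerpt of metrics.json; only "metric_name" is confirmed by the notebook
sample_json = """
[
  {"metric_name": "Fluency", "description": "placeholder text"},
  {"metric_name": "Coherence", "description": "placeholder text"}
]
"""

sample_metrics = json.loads(sample_json)

# Same numbering as the cell above: 1-based labels shown to the user
lines = [f'{i}: {m["metric_name"]}' for i, m in enumerate(sample_metrics, start=1)]
print("\n".join(lines))
```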


# You may run the next cell multiple times to explore samples for different metrics

In [4]:
from IPython.display import Markdown, display

index = int(input("Which metric would you like to analyze?"))

display(Markdown(f"\n\n**GREAT CHOICE!** Here are a couple examples for the metric `<{metrics_list[index-1]['metric_name']}>`\n"))

messages=[{"role": "system", "content": system_message},
          {"role": "user", "content": json.dumps(metrics_list[index-1])}]

response = client.responses.create(
    model=os.environ['AZURE_OPENAI_CHAT_DEPLOYMENT_NAME'],
    input=messages)

display(Markdown(response.output_text.replace("\n", "  \n")))

Which metric would you like to analyze? 10




**GREAT CHOICE!** Here are a couple examples for the metric `<Fluency>`


Sure, let's explore the Fluency metric with examples for each score from 5 to 1.  
  
### Fluency Examples  
  
#### Score 5 (Highest Quality)  
**QUESTION:** Summarize the company's annual report.  
**ANSWER:** The company experienced a 10% growth in revenue despite market challenges, investing heavily in R&D, while maintaining its commitment to sustainability.  
**SCORE:** 5  
**EXPLANATION:** The answer is grammatically accurate, uses varied vocabulary, and conveys information smoothly with clear sentence structure and readability.  
  
#### Score 4  
**QUESTION:** Describe the benefits of solar energy.  
**ANSWER:** Solar energy reduces electricity bills and is eco-friendly. It promotes sustainable living and energy independence.  
**SCORE:** 4  
**EXPLANATION:** Generally fluent, this response efficiently communicates ideas. Minor vocabulary and sentence complexity improvements could be made for higher fluency.  
  
#### Score 3  
**QUESTION:** What are the main features of the new smartphone?  
**ANSWER:** New smartphone has big screen, fast processor, and nice camera.  
**SCORE:** 3  
**EXPLANATION:** The meaning is clear but lacks sentence complexity and vocabulary variety, leading to a less engaging text.  
  
#### Score 2  
**QUESTION:** Explain the term 'machine learning.'  
**ANSWER:** Machine learning learns from data to make predictions.   
**SCORE:** 2  
**EXPLANATION:** The sentence structure is simplistic with grammatical awkwardness, reducing clarity and understanding.  
  
#### Score 1 (Lowest Quality)  
**QUESTION:** How does photosynthesis work?  
**ANSWER:** Photosynthesis is plants make food.   
**SCORE:** 1  
**EXPLANATION:** This sentence lacks grammatical accuracy, adequate vocabulary, coherence, and readability, making it difficult to understand.
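
Cell 4 converts the typed answer with `int(input(...))` and then indexes `metrics_list[index-1]`. With that pattern, an entry of `0` selects the last metric through Python's negative indexing, and an entry above the list length raises `IndexError`. A minimal sketch of a guard, using a hypothetical helper `parse_metric_choice` that is not part of the original notebook:

```python
def parse_metric_choice(raw, count):
    """Convert a 1-based menu choice (typed text) into a 0-based list index."""
    try:
        choice = int(raw.strip())
    except ValueError:
        raise ValueError(f"Expected a number between 1 and {count}, got {raw!r}") from None
    if not 1 <= choice <= count:
        raise ValueError(f"Expected a number between 1 and {count}, got {choice}")
    return choice - 1

# Example with the 15 metrics listed earlier: choice "10" maps to index 9
print(parse_metric_choice("10", 15))
```

In the notebook, the returned index could replace `index-1` when selecting the metric, so an out-of-range entry fails with a clear message before any request is sent.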