# Azure AI Foundry

<center><img src="../../../images/Azure-AI-Foundry_1600x900.jpg" alt="Azure AI Foundry" width="600"></center>

## Laboratory 1

In this laboratory we will connect to Azure OpenAI and work through several tasks: calling the API, reading the text of a response, inspecting the full response object, converting text to embeddings, sending images in API calls, and calling other LLMs.

The first step is to validate the configuration of environment variables in the `.env` file located at the repository root.

Fill in the variable values as requested.
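
As a quick sanity check, a minimal sketch like the one below can list which variables are still unset. The helper name `missing_env_vars` is an illustrative assumption, while the variable names are the ones read later in this laboratory. It inspects `os.environ`, so in the notebook it is only meaningful after `load_dotenv` has run.

```python
import os

# Variable names read later in this laboratory.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_EMBEDDING_MODEL",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Hypothetical helper: report what still needs to be filled in.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing variables:", ", ".join(missing))
else:
    print("All required variables are set.")
```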

### Exercise 1 - API Call

Let's import the necessary libraries for the laboratory.

In [None]:
#%pip install -r ../../requirements.txt

In [None]:
#%pip install openai python-dotenv

In [None]:
import json
import os
from openai import AzureOpenAI
from dotenv import load_dotenv

load_dotenv(dotenv_path="../../../.env")

Let's load the credentials into variables so they are easy to reuse throughout the laboratory.

In [None]:
# No trailing commas here: a trailing comma would turn each value into a one-element tuple
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
api_key = os.getenv("AZURE_OPENAI_API_KEY")
api_version = os.getenv("API_VERSION")
deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT")
embedding_model = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL")

Now let's initialize the client with the provided credentials.

In [None]:
client = AzureOpenAI(
    azure_endpoint=azure_endpoint,
    api_key=api_key,
    api_version=api_version,
)


After creating the client, we will make a simple call where we will pass:

1. A message for the "system" role defining the LLM's role
2. An initial user question
3. An assistant response demonstrating how it should respond (example)
4. A new question for it to answer based on the previously established context
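
The four-part structure above can be sketched as a small helper that assembles the message list. The function name `build_messages` is a hypothetical convenience and not part of the `openai` library. The resulting list has the same shape as the `messages` argument used in this laboratory.

```python
def build_messages(system_prompt, examples, question):
    """Build a chat message list: system role, example turns, then the new question."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        # Each example is one user question plus the assistant answer to imitate.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


messages = build_messages(
    "You are a helpful assistant.",
    [("Does Azure OpenAI support customer-managed keys?",
      "Yes, customer-managed keys are supported by Azure OpenAI.")],
    "Do other Azure services also support this?",
)
for message in messages:
    print(message["role"], "->", message["content"])
```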

In [None]:
response = client.chat.completions.create(
    model=deployment_name, 
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer-managed keys?"},
        {"role": "assistant", "content": "Yes, customer-managed keys are supported by Azure OpenAI."},
        {"role": "user", "content": "Do other Azure services also support this?"}
    ]
)

Now let's access the LLM response directly.

In [None]:
print(response.choices[0].message.content)

### Exercise 2 - Analyzing the Response

Now that we've made a call to Azure OpenAI, let's analyze the complete content of the response:

In [None]:
response

Now let's structure the response in a more readable format for better data visualization:

In [None]:
response_dict = {
    "id": response.id,
    "model": response.model,
    "created": response.created,
    "usage": {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens
    },
    "completion_tokens_details": {
        "accepted_prediction_tokens": response.usage.completion_tokens_details.accepted_prediction_tokens,
        "audio_tokens": response.usage.completion_tokens_details.audio_tokens,
        "reasoning_tokens": response.usage.completion_tokens_details.reasoning_tokens,
        "rejected_prediction_tokens": response.usage.completion_tokens_details.rejected_prediction_tokens
    },
    "choices": [{
        "index": choice.index,
        "message": {
            "role": choice.message.role,
            "content": choice.message.content
        },
        "finish_reason": choice.finish_reason,
        "content_filter_results": choice.content_filter_results
    } for choice in response.choices],
    "prompt_filter_results": response.prompt_filter_results
}

print(json.dumps(response_dict, indent=2, ensure_ascii=False))

The API doesn't respond only with the text generated by the LLM. We have much more information in this response, such as:
- Token counts for the prompt and for the response
- A breakdown of completion tokens, including audio and reasoning tokens (for models that support them)
- The reason generation stopped (`finish_reason`)
- Content filter results for the prompt and for each response choice

This information is essential for monitoring, cost tracking, and application quality control.
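
To illustrate the cost side, here is a minimal sketch that turns token counts into an estimate. The function `estimate_cost` and both price constants are assumptions for illustration: the prices are placeholders, not real Azure OpenAI prices, and should be replaced with the rates for your own model and region.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_price_per_1k, output_price_per_1k):
    """Estimate the cost of one call from its token usage.

    Prices are expressed per 1,000 tokens and must come from your own pricing sheet.
    """
    input_cost = prompt_tokens / 1000 * input_price_per_1k
    output_cost = completion_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Placeholder prices for illustration only, NOT real Azure OpenAI prices.
HYPOTHETICAL_INPUT_PRICE = 0.01
HYPOTHETICAL_OUTPUT_PRICE = 0.03

cost = estimate_cost(
    prompt_tokens=1000,
    completion_tokens=500,
    input_price_per_1k=HYPOTHETICAL_INPUT_PRICE,
    output_price_per_1k=HYPOTHETICAL_OUTPUT_PRICE,
)
print(f"Estimated cost: {cost:.4f}")
```

In the notebook, the token counts would come from `response.usage.prompt_tokens` and `response.usage.completion_tokens`.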

After making the call and exploring the response, test yourself by creating a custom prompt. Experiment with the following important parameters:

- **max_completion_tokens**: Maximum number of tokens that can be generated in the response
- **temperature**: Controls creativity (0.0 = more deterministic, 1.0 = more creative)
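- **top_p**: Restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches this value (nucleus sampling)
- **frequency_penalty**: Penalizes tokens in proportion to how often they have already appeared
- **presence_penalty**: Penalizes any token that has already appeared, regardless of how often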
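
To build intuition for `temperature` and `top_p`, the sketch below applies both ideas to a toy set of scores. It is a conceptual model of sampling, not the service's actual implementation. The scores in `toy_logits` and both function names are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities; temperature must be > 0 in this sketch."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {i: probs[i] / total for i in kept}


toy_logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four candidate tokens

for temperature in (0.2, 1.0):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(f"temperature={temperature}:", [round(p, 3) for p in probs])

probs = softmax_with_temperature(toy_logits, 1.0)
print("top_p=0.8 keeps token indices:", sorted(nucleus_filter(probs, 0.8)))
```

A lower temperature concentrates probability on the top-scoring token, while a lower `top_p` removes the unlikely tail before sampling.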

In [None]:
response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "I'm traveling to Paris, what should I see?",
        },
        {
            "role": "assistant",
            "content": "Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks and romantic atmosphere. Here are some of the main attractions to see in Paris:\n \n 1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers stunning views of the city.\n 2. The Louvre Museum: The Louvre is one of the largest and most famous museums in the world, housing an impressive collection of art and artifacts, including the Mona Lisa.\n 3. Notre-Dame Cathedral: This beautiful cathedral is one of Paris's most famous landmarks and is known for its Gothic architecture and stunning stained glass windows.\n \n These are just some of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the world's most popular tourist destinations.",
        },
        {
            "role": "user",
            "content": "What's so special about #1?",
        }
    ],
    max_completion_tokens=800,
    temperature=1.0,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    model=deployment_name
)

print(response.choices[0].message.content)

### Exercise 3 - Embeddings

Embeddings are numerical representations of text that capture the semantic meaning of words or phrases. In Azure OpenAI, you can use the embeddings model to convert text into numerical vectors that can be used for tasks like semantic search, classification, and similarity analysis.

For more information on how to work with embeddings in Azure OpenAI, consult the [official documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=python-new).

In [None]:
response = client.embeddings.create(
    input = "dog",
    model= embedding_model
)

print(response.model_dump_json(indent=2))

Here we generated the embedding of a single word, but we can do the same for larger text segments. The model maps each input, whether a word or a longer passage, to a single fixed-length numerical vector that captures its semantic meaning.

To store embeddings we can use a series of services available in Azure. Just choose the one that best fits your solution:

- [Azure AI Search](https://learn.microsoft.com/en-us/azure/search/vector-search-overview)
- [Azure Cosmos DB for MongoDB vCore](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search)
- [Azure SQL Database](https://learn.microsoft.com/en-us/azure/azure-sql/database/ai-artificial-intelligence-intelligent-applications?view=azuresql&preserve-view=true#vector-search)
- [Azure Cosmos DB for NoSQL](https://learn.microsoft.com/en-us/azure/cosmos-db/vector-search)
- [Azure Cosmos DB for PostgreSQL](https://learn.microsoft.com/en-us/azure/cosmos-db/postgresql/howto-use-pgvector)
- [Azure Database for PostgreSQL - Flexible Server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/how-to-use-pgvector)
- [Azure Cache for Redis](https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-tutorial-vector-similarity)
- [Use Eventhouse as a vector database - Real-Time Intelligence in Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/real-time-intelligence/vector-database)
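
Conceptually, these services store vectors and return the ones closest to a query vector. The sketch below is a tiny in-memory stand-in for that idea, assuming cosine similarity as the distance measure. The three-dimensional vectors are hand-made placeholders rather than real embeddings, and the names `cosine_similarity`, `top_k`, and `toy_store` are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the k stored items most similar to the query, best first."""
    scored = [(cosine_similarity(query_vector, vector), key) for key, vector in store.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Hand-made 3-dimensional vectors standing in for real embeddings.
toy_store = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

for key, score in top_k([1.0, 0.0, 0.0], toy_store):
    print(f"{key}: {score:.3f}")
```

A real vector store adds persistence and approximate indexing so that the search stays fast over many vectors. This brute-force loop is only practical for small collections.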

### Exercise 4 - Image Processing

In Azure AI Foundry we can work with models that generate images and with multimodal models that accept images as context. In this exercise we will learn how to use images as prompt context.

**Important first point**: we need to think about how to send an image along with the prompt. For this we have 2 main options:
1. Send the image along with the prompt via base64 (encoded)
2. Send the image as a link/URL

Let's see the 2 practical examples below.

First, let's leverage the client we already instantiated and send a URL of an image, asking the model to describe it:

In [None]:
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Itaim_Bibi_Business_District.jpg/250px-Itaim_Bibi_Business_District.jpg"

In [None]:
response = client.chat.completions.create(
    model=deployment_name,
    messages=[
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": [  
            { 
                "type": "text", 
                "text": "Describe this image:" 
            },
            { 
                "type": "image_url",
                "image_url": {
                    "url": image_url
                }
            }
        ] } 
    ],
    max_tokens=2000 
)
print(response.choices[0].message.content)

<center><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Itaim_Bibi_Business_District.jpg/250px-Itaim_Bibi_Business_District.jpg" alt="Itaim Bibi Business District" width="600"></center>

Now let's read a local image stored on our system and send it along with the message:

In [None]:
import base64
from mimetypes import guess_type

In [None]:
def local_image_to_data_url(image_path):
    # Guess the MIME type of the image based on the file extension
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'  # Default MIME type if none is found

    # Read and encode the image file
    with open(image_path, "rb") as image_file:
        base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')

    # Construct the data URL
    return f"data:{mime_type};base64,{base64_encoded_data}"

In [None]:
image_path = "../../../samples/234039841.jpg"
data_url = local_image_to_data_url(image_path)
# Print only the beginning: the full base64 string can be very long
print("Data URL:", data_url[:80] + "...")
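
One practical consequence of the base64 option is payload size: base64 emits 4 characters for every 3 bytes of input, so the request grows by roughly one third compared with the raw file. The sketch below estimates the length of a data URL and checks the formula against the real encoder on dummy bytes. The function name `data_url_length` and the default MIME type are assumptions for illustration.

```python
import base64


def data_url_length(raw_size, mime_type="image/jpeg"):
    """Length in characters of a base64 data URL for `raw_size` bytes of image data."""
    prefix = f"data:{mime_type};base64,"
    # base64 emits 4 characters per 3-byte group, padding the last group if needed
    encoded = 4 * ((raw_size + 2) // 3)
    return len(prefix) + encoded


# Check the formula against the real encoder on dummy bytes.
dummy = b"\x00" * 1000
actual = len(f"data:image/jpeg;base64,{base64.b64encode(dummy).decode('utf-8')}")
print(data_url_length(len(dummy)), actual)
```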

In [None]:
response = client.chat.completions.create(
    model=deployment_name,
    messages=[
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": [  
            { 
                "type": "text", 
                "text": "Describe this image:" 
            },
            { 
                "type": "image_url",
                "image_url": {
                    "url": data_url
                }
            }
        ] } 
    ],
    max_tokens=2000 
)
print(response.choices[0].message.content)

Using Azure OpenAI we have access to various types of functionalities beyond those we explored here. I recommend browsing and exploring the available options to understand what is the best approach for your specific application:

- [Responses API](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses)
- [Reasoning Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/reasoning)
- [Chat completions API](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt)
- [Computer Use](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/computer-use)
- [Model router concepts](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/model-router)
- [Function calling](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/function-calling)
- [Predicted outputs](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/predicted-outputs)
- [Prompt caching](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching)
- [Structured outputs](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/structured-outputs)
- [Vision-enabled chats](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/gpt-with-vision)
- [JSON Mode](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/json-mode)
- [Reproducible output](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/reproducible-output)

### Exercise 5 - Other models in Azure AI Foundry

Through Azure AI Foundry we can explore a series of models available in the [Model Catalog](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/foundry-models-overview). 

There we have access to models made available by Microsoft (OpenAI, Meta, Mistral AI, DeepSeek, xAI, Black Forest Labs) as well as models made available by partners and the community (Nixtla, AI21, NTT Data, Core42, NVIDIA NIM Microservices, Stability AI).

The documentation linked above explains the different ways in which models are made available and how to choose among them for your specific scenario.



Now let's continue with a practical example of how to call a model made available by Azure AI Foundry through a chat completion call using a different library from the previous one:

In [None]:
%pip install azure-ai-inference azure-core

In [None]:
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

In [None]:
endpoint = os.getenv("AZURE_PHI4_ENDPOINT")
api_key = os.getenv("AZURE_PHI4_API_KEY")
model_name = os.getenv("AZURE_PHI4_DEPLOYMENT")


In [None]:
clientPhi = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(api_key),
    api_version="2024-05-01-preview"
)

In [None]:
response = clientPhi.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="I'm traveling to Paris, what should I see?"),
    ],
    max_tokens=2048,
    temperature=0.8,
    top_p=0.1,
    presence_penalty=0.0,
    frequency_penalty=0.0,
    model=model_name
)

print(response.choices[0].message.content)

## 🎯 Practical Activities 

Now that you've explored the basic concepts of Azure AI Foundry, let's practice with some simple and targeted activities to consolidate your learning!

### 📝 Activity 1: Temperature Testing
**Objective**: Understand how temperature affects the creativity of responses.

Run the code below and observe how the same prompt produces different responses at different temperatures:

In [None]:
prompt = "Write a creative slogan for a technology company."

# Testing different temperatures
temperatures = [0.1, 0.5, 1.0]

for temp in temperatures:
    print(f"\n🌡️ TEMPERATURE: {temp}")
    print("-" * 40)
    
    response = client.chat.completions.create(
        model=deployment_name,
        messages=[
            {"role": "system", "content": "You are a creative marketing assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        max_completion_tokens=100
    )
    
    print(response.choices[0].message.content)
    print(f"Tokens used: {response.usage.total_tokens}")

### 🔍 Activity 2: Embeddings Comparison
**Objective**: See that words with similar meanings have embeddings that lie close together.

Let's generate embeddings for related words and check their dimensions:

In [None]:
%pip install numpy

In [None]:
import numpy as np

# Words to compare
words = ["cat", "feline", "dog", "canine", "automobile", "car"]

embeddings_dict = {}

print("Generating embeddings for the words...")
for word in words:
    response = client.embeddings.create(
        input=word,
        model=embedding_model
    )
    embedding = response.data[0].embedding
    embeddings_dict[word] = embedding
    print(f"✅ {word}: {len(embedding)} dimensions")

print("\nFirst 5 values of the embedding for the word 'cat':")
print(embeddings_dict["cat"][:5])

In [None]:
# Function to calculate cosine similarity
def calculate_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Comparing similarities
print("🔍 Comparing similarities:")
print("-" * 50)

# Cat vs Feline
sim_cat_feline = calculate_similarity(embeddings_dict["cat"], embeddings_dict["feline"])
print(f"Cat ↔ Feline: {sim_cat_feline:.3f}")

# Dog vs Canine
sim_dog_canine = calculate_similarity(embeddings_dict["dog"], embeddings_dict["canine"])
print(f"Dog ↔ Canine: {sim_dog_canine:.3f}")

# Automobile vs Car
sim_auto_car = calculate_similarity(embeddings_dict["automobile"], embeddings_dict["car"])
print(f"Automobile ↔ Car: {sim_auto_car:.3f}")

# Cat vs Car (should be lower than the related pairs above)
sim_cat_car = calculate_similarity(embeddings_dict["cat"], embeddings_dict["car"])
print(f"Cat ↔ Car: {sim_cat_car:.3f}")

print("\n💡 Words with similar meanings have higher similarity (closer to 1.0)!")

### 🖼️ Activity 3: Image Analysis with Different Prompts
**Objective**: Test how different prompts affect the analysis of the same image.

Let's use different types of questions for the same image:

In [None]:
# Using the same image with different prompts
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Itaim_Bibi_Business_District.jpg/250px-Itaim_Bibi_Business_District.jpg"

# Different types of analysis
prompts = [
    "Describe this image in one sentence:",
    "What type of location is this?",
    "What colors predominate in this image?",
    "What feeling does this image convey?",
    "Count the buildings you can see:"
]

for i, prompt_text in enumerate(prompts, 1):
    print(f"\n🔍 QUESTION {i}: {prompt_text}")
    print("-" * 60)
    
    response = client.chat.completions.create(
        model=deployment_name,
        messages=[
            {"role": "system", "content": "You are an assistant specialized in image analysis."},
            {"role": "user", "content": [
                {"type": "text", "text": prompt_text},
                {"type": "image_url", "image_url": {"url": image_url}}
            ]}
        ],
        max_tokens=150
    )
    
    print(response.choices[0].message.content)

### 🔢 Activity 4: Token Counter
**Objective**: Understand how prompt size affects token consumption.

Let's test prompts of different sizes and see the impact on tokens:

In [None]:
# Prompts of different sizes
test_prompts = [
    "Hello",
    "Explain what artificial intelligence is",
    "Explain in detail what artificial intelligence is, how it works, its practical applications, benefits and challenges for modern society"
]

print("📊 TOKEN CONSUMPTION ANALYSIS")
print("=" * 50)

for i, prompt in enumerate(test_prompts, 1):
    response = client.chat.completions.create(
        model=deployment_name,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        max_completion_tokens=100  # Limiting response to focus on prompt
    )
    
    print(f"\n🔍 TEST {i}:")
    print(f"Prompt: '{prompt[:50]}{'...' if len(prompt) > 50 else ''}'")
    print(f"Prompt tokens: {response.usage.prompt_tokens}")
    print(f"Response tokens: {response.usage.completion_tokens}")
    print(f"Total tokens: {response.usage.total_tokens}")
    print(f"Response: {response.choices[0].message.content[:100]}...")

print("\n💡 Larger prompts consume more input tokens!")

### 🎭 Activity 5: Persona Testing
**Objective**: See how different personas (system messages) affect responses.

Let's ask the same question to different "personalities" of the assistant:

In [None]:
# Different personas to test
personas = [
    {"name": "Professor", "system": "You are a university professor who explains concepts in a didactic and detailed manner."},
    {"name": "Friend", "system": "You are a close friend who speaks in a casual and relaxed way."},
    {"name": "Expert", "system": "You are a technical expert who gives precise and direct answers."},
    {"name": "Poet", "system": "You are a poet who always responds in a creative and artistic way."}
]

question = "What do you think about the future of technology?"

print("🎭 TESTING DIFFERENT PERSONAS")
print("=" * 50)

for persona in personas:
    print(f"\n👤 PERSONA: {persona['name']}")
    print("-" * 30)
    
    response = client.chat.completions.create(
        model=deployment_name,
        messages=[
            {"role": "system", "content": persona["system"]},
            {"role": "user", "content": question}
        ],
        max_completion_tokens=200,
        temperature=0.7
    )
    
    print(response.choices[0].message.content)

print("\n💡 The system message strongly shapes the assistant's tone and style!")