<div style="background: linear-gradient(135deg, #034694 0%, #1E8449 50%, #D4AC0D 100%); color: white; padding: 20px; border-radius: 10px; box-shadow: 0 4px 8px rgba(0,0,0,0.2);">
    <h1 style="color: #FFF; text-shadow: 1px 1px 3px rgba(0,0,0,0.5);">Step 1: Plan For Zava Scenario</h1>
    <p style="font-size: 16px; line-height: 1.6;">
    Zava is an enterprise retailer that sells home improvement goods to DIY enthusiasts.
    Cora is their AI customer support chatbot that helps customers find relevant products.
    The Zava AI Engineering team wants to make sure Cora is helpful, precise, and cost-effective to operate.
    In this set of demos, we'll see how they approach the model customization journey to make this happen.
    </p>
</div>


## 1. Deploy Models

This is "Act 2" of the breakout session, where we look at the core fine-tuning options in Azure AI Foundry. For these options, we need:

1. A base model to fine-tune, for SFT (supervised fine-tuning).
2. A teacher model and a student model to transfer behavioral knowledge, for distillation.

To support this, a few model "candidates" have been provisioned by default, using the setup process defined earlier.

You should see some subset of the following models among those provisioned:

- Reasoning Models → o3, o3-mini, o4-mini
- Chat Models → gpt-4o, gpt-4.1, gpt-4.1-nano
- Embedding Models → text-embedding-3-large

Your `.env` variables should already be set to reflect the desired Azure AI Foundry project environment. Let's go!
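
A quick check that the expected variables are present can save a failed first call. The sketch below is a minimal example, assuming the variable names read by the client setup in this notebook; the helper name `missing_env_vars` is hypothetical and not part of any SDK.

```python
import os

# Variable names read by the client setup in this notebook.
# AZURE_OPENAI_API_VERSION is left out because the setup gives it a default.
REQUIRED_VARS = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Run a check like this after `load_dotenv()` so that values defined in `.env` are visible to `os.environ`.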


<div style="height: 6px; margin: 30px 0; background: linear-gradient(90deg, #034694 0%, #1E8449 50%, #D4AC0D 100%); border-radius: 3px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);"></div>



In [1]:
# ........ Set up an Azure OpenAI client and test out different models
import os
import time
from openai import AzureOpenAI
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize Azure OpenAI client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION", "2024-05-01-preview"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

<div style="height: 6px; margin: 30px 0; background: linear-gradient(90deg, #034694 0%, #1E8449 50%, #D4AC0D 100%); border-radius: 3px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);"></div>

#### 1. Define Test Prompt

In [5]:
# ........ Define a test prompt that we'll use for all models
test_prompt = """
You are a home improvement assistant for Zava, a fictional hardware store.
Please give me a brief recommendation for a paint color for my living room.
Include one key feature of the paint and a price range.
"""

# List of models to test
models_to_test = [
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "o3",
    "o3-mini",
    "o4-mini"
]

# Function to call a model and measure performance
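def test_model(model_name, prompt):
    """Call one model deployment and return its response, latency, and token usage (or an error entry)."""
    # perf_counter is a monotonic clock, better suited to timing than time.time()
    start_time = time.perf_counter()
    
    try:
        params = {
            "model": model_name,
            "messages": [
                {"role": "system", "content": "You are Cora, a polite, factual and helpful assistant for Zava, a DIY hardware store"},
                {"role": "user", "content": prompt}
            ]
        }
        
        # Add the appropriate token limit parameter based on model type.
        # Reasoning models (o-series) take max_completion_tokens, which also
        # counts their hidden reasoning tokens; chat models take max_tokens.
        if model_name.startswith("o"):
            params["max_completion_tokens"] = 300
        else:
            params["max_tokens"] = 300
        
        response = client.chat.completions.create(**params)
        
        end_time = time.perf_counter()
        latency = end_time - start_time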
        
        # Extract response and token usage
        content = response.choices[0].message.content
        prompt_tokens = response.usage.prompt_tokens
        completion_tokens = response.usage.completion_tokens
        total_tokens = response.usage.total_tokens
        
        return {
            "model": model_name,
            "latency": latency,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": total_tokens,
            "response": content
        }
    
    except Exception as e:
        print(f"❌ Error with model {model_name}: {str(e)}")
        return {
            "model": model_name,
            "error": str(e)
        }
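
The token-limit branch in `test_model` can be factored out and checked without calling the service. This is a minimal sketch, assuming the same convention as the function above (deployment names starting with "o" are reasoning models that take `max_completion_tokens`, everything else takes `max_tokens`). `build_params` is a hypothetical helper name.

```python
def build_params(model_name, prompt, system_message, limit=300):
    """Build chat-completion keyword arguments for one deployment name."""
    params = {
        "model": model_name,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt},
        ],
    }
    # Same naming heuristic as test_model: "o3", "o4-mini", ... are treated
    # as reasoning models. "gpt-4o" starts with "g", so it is not matched.
    key = "max_completion_tokens" if model_name.startswith("o") else "max_tokens"
    params[key] = limit
    return params


for name in ["gpt-4o", "o3-mini"]:
    params = build_params(name, "Hello", "You are Cora.")
    print(name, "->", [k for k in params if k.startswith("max_")])
```

The prefix check depends on how the deployments are named; a deployment with a custom name would need an explicit mapping instead.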

<div style="height: 6px; margin: 30px 0; background: linear-gradient(90deg, #034694 0%, #1E8449 50%, #D4AC0D 100%); border-radius: 3px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);"></div>

#### 2. Run Model Tests

In [6]:
# Test each model with the same prompt
# Reset results to avoid duplicates when re-running this cell
import sys

print("🔄 Starting model tests...", flush=True)

results = []
detailed_outputs = {}

for model in models_to_test:
    try:
        print(f"..... Testing model: {model}")
        result = test_model(model, test_prompt)
        results.append(result)
    except Exception as e:
        print(f"Exception outside test_model for {model}: {str(e)}", file=sys.stderr)
        results.append({"model": model, "error": str(e)})

# Store the detailed output for later, but don't display all of it 
for result in results:
    if "error" not in result:
        detailed_outputs[result["model"]] = {
            "response": result["response"],
            "latency": result["latency"],
            "prompt_tokens": result["prompt_tokens"],
            "completion_tokens": result["completion_tokens"],
            "total_tokens": result["total_tokens"]
        }
    else:
        detailed_outputs[result["model"]] = {
            "response": f"ERROR: {result.get('error', 'Unknown error')}",
            "latency": None,
            "prompt_tokens": None,
            "completion_tokens": None,
            "total_tokens": None
        }

print(f"\n✅ Completed testing {len(detailed_outputs)} models", flush=True)

🔄 Starting model tests...
..... Testing model: gpt-4o
..... Testing model: gpt-4o-mini
..... Testing model: gpt-4.1-mini
..... Testing model: gpt-4.1-nano
..... Testing model: o3
..... Testing model: o3-mini
..... Testing model: o4-mini

✅ Completed testing 7 models


<div style="height: 6px; margin: 30px 0; background: linear-gradient(90deg, #034694 0%, #1E8449 50%, #D4AC0D 100%); border-radius: 3px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);"></div>

#### 3. Visualize Results

You may see something like this (taken from a previous run). Note how the same prompt yields different latency and token usage for different models. In this instance, gpt-4.1 has the lowest total token usage (but the highest latency), while the o3 reasoning model has the highest token usage (likely due to the reasoning tokens used). Note how _gpt-4.1-nano_ has the lowest latency (with slightly higher total token usage than gpt-4.1), while gpt-4o sits in the middle.

While these results are not conclusive, they offer some intuition into two metrics (token usage and latency) that are key optimization targets for our assistant. **Note** that these results are _not_ grounded in Zava data (and therefore not accurate). Orchestrating a RAG-based solution would incur added token costs (to capture context in the prompt) and latency (to retrieve and augment relevant results).

**MODEL PERF METRICS**

| Model | Latency (s) | Prompt Tokens | Completion Tokens | Total Tokens |
|:---|:---|:---|:---|:---|
| gpt-4o | 1.36 | 74 | 93 | 167 |
| gpt-4.1 | 2.95 | 74 | 62 | 136 |
| gpt-4.1-nano | 1.10 | 74 | 76 | 150 |
| o3 | 1.81 | 73 | 144 | 217 |
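
To compare models on one metric at a time, the rows of the table above can be ranked directly. The sketch below hard-codes the four rows from that earlier run (they are illustrative, not fresh measurements) and uses a hypothetical helper named `best_by`.

```python
# Rows copied from the table above (a previous run, not new measurements).
previous_run = [
    {"model": "gpt-4o", "latency": 1.36, "total_tokens": 167},
    {"model": "gpt-4.1", "latency": 2.95, "total_tokens": 136},
    {"model": "gpt-4.1-nano", "latency": 1.10, "total_tokens": 150},
    {"model": "o3", "latency": 1.81, "total_tokens": 217},
]


def best_by(rows, metric):
    """Return the model name with the smallest value for the given metric."""
    return min(rows, key=lambda row: row[metric])["model"]


print("Lowest latency:", best_by(previous_run, "latency"))
print("Fewest total tokens:", best_by(previous_run, "total_tokens"))
```

The two metrics pick different models here, so a single run like this is only a starting point for choosing a base model.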


In [7]:
import pprint

pprint.pprint(detailed_outputs)

{'gpt-4.1-mini': {'completion_tokens': 74,
                  'latency': 1.0304079055786133,
                  'prompt_tokens': 74,
                  'response': 'For your living room, I recommend a soft, warm '
                              'gray paint like "Cozy Pebble." A key feature of '
                              'this paint is its excellent light-reflecting '
                              'quality, which brightens the room while '
                              'maintaining a cozy atmosphere. The price '
                              'typically ranges from $25 to $40 per gallon. '
                              'This makes it a great balance of quality and '
                              'affordability for home use.',
                  'total_tokens': 148},
 'gpt-4.1-nano': {'completion_tokens': 69,
                  'latency': 0.8667013645172119,
                  'prompt_tokens': 74,
                  'response': 'Certainly! For your living room, I recommend a '
               

In [8]:
# ........ Now display the two clean tables
from IPython.display import display, HTML
import pandas as pd
import html

# First table: Model Responses with left-aligned text
response_data = []
for model, data in detailed_outputs.items():
    # Truncate long responses for cleaner display
    response = data["response"]
    if len(response) > 300:
        response = response[:297] + "..."
    response_data.append({"Model": model, "Response": response})

response_df = pd.DataFrame(response_data)
print("\n\n🤖 MODEL RESPONSES")
print("="*100)

# Display with proper escaping - using pandas built-in HTML rendering
html_output = response_df.to_html(index=False, escape=True)
html_output = html_output.replace('<td>ERROR', '<td style="color:red">ERROR')
html_output = html_output.replace('<table', '<table style="width:100%"')
html_output = html_output.replace('<th>', '<th style="text-align: left;">')
# Apply left alignment to all data cells
html_output = html_output.replace('<td>', '<td style="text-align: left; vertical-align: top; padding: 8px;">')
display(HTML(html_output))

# Second table: Performance Metrics
metrics_data = []
for model, data in detailed_outputs.items():
    metrics_data.append({
        "Model": model,
        "Latency (s)": f"{data['latency']:.2f}" if data['latency'] is not None else "N/A",
        "Prompt Tokens": data['prompt_tokens'] if data['prompt_tokens'] is not None else "N/A",
        "Completion Tokens": data['completion_tokens'] if data['completion_tokens'] is not None else "N/A",
        "Total Tokens": data['total_tokens'] if data['total_tokens'] is not None else "N/A"
    })

metrics_df = pd.DataFrame(metrics_data)
print("\n\n📊 MODEL PERFORMANCE METRICS")
print("="*100)
display(HTML(metrics_df.to_html(index=False)))



🤖 MODEL RESPONSES


Model,Response
gpt-4o,"Certainly! For a living room, I recommend using *Zava Classic Calm Beige*, a warm and neutral color that creates a welcoming and versatile atmosphere. This paint features excellent coverage, minimizing the number of coats needed. Prices range from **$25 to $40 per gallon**, depending on the finis..."
gpt-4o-mini,"I recommend using a soft, warm gray such as ""Repose Gray"" by Sherwin-Williams for your living room. This color creates a cozy and inviting atmosphere while complementing a variety of decor styles. A key feature of this paint is its excellent durability and washability, making it ideal for high-tr..."
gpt-4.1-mini,"For your living room, I recommend a soft, warm gray paint like ""Cozy Pebble."" A key feature of this paint is its excellent light-reflecting quality, which brightens the room while maintaining a cozy atmosphere. The price typically ranges from $25 to $40 per gallon. This makes it a great balance o..."
gpt-4.1-nano,"Certainly! For your living room, I recommend a soft, neutral beige paint. A key feature of this color is its versatility, allowing it to complement various furniture styles and color schemes. The price range for quality beige paint typically falls between $25 and $45 per gallon. Would you like su..."
o3,"For a warm, inviting living room, consider Zava’s “Calm Clay” interior latex paint—a soft, neutral beige-taupe that pairs easily with most décor styles. Key feature: it’s a low-VOC, washable finish, so everyday scuffs wipe right off. Price range: about $32–$38 per gallon, depending on sheen."
o3-mini,
o4-mini,




📊 MODEL PERFORMANCE METRICS


Model,Latency (s),Prompt Tokens,Completion Tokens,Total Tokens
gpt-4o,0.78,74,70,144
gpt-4o-mini,1.45,74,81,155
gpt-4.1-mini,1.03,74,74,148
gpt-4.1-nano,0.87,74,69,143
o3,1.97,73,93,166
o3-mini,2.28,73,300,373
o4-mini,2.08,73,300,373
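
In this run, o3-mini and o4-mini returned empty responses while reporting exactly 300 completion tokens, the same value as the `max_completion_tokens` limit set in `test_model`. A likely explanation (an assumption, not confirmed by this output) is that reasoning tokens used up the whole budget before any visible text was produced. The sketch below flags such rows; `flag_exhausted` is a hypothetical helper, and the rows are copied from the two tables above.

```python
TOKEN_LIMIT = 300  # the limit passed in test_model

# (model, completion_tokens, response_is_empty) copied from the output above.
run = [
    ("gpt-4o", 70, False),
    ("gpt-4o-mini", 81, False),
    ("gpt-4.1-mini", 74, False),
    ("gpt-4.1-nano", 69, False),
    ("o3", 93, False),
    ("o3-mini", 300, True),
    ("o4-mini", 300, True),
]


def flag_exhausted(rows, limit):
    """Return models that hit the token limit and produced no visible text."""
    return [model for model, used, empty in rows if used >= limit and empty]


print("Budget exhausted:", flag_exhausted(run, TOKEN_LIMIT))
```

If that explanation holds, raising the limit for reasoning models would be the first thing to try. Checking `response.choices[0].finish_reason` inside `test_model` would show whether generation stopped because of the length limit.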


<div style="background: linear-gradient(135deg, #034694 0%, #1E8449 100%); color: white; padding: 15px; border-radius: 8px; box-shadow: 0 4px 6px rgba(0,0,0,0.1); margin: 20px 0;">
    <h2 style="color: #FFF; text-shadow: 1px 1px 2px rgba(0,0,0,0.4); margin: 0;"> 2️⃣ | Understand The Requirements </h2>
</div>

Our goal is to make the Cora chatbot **polite, factual, and helpful** to Zava shoppers. But what does this actually mean?

1. **Polite & Helpful** - This is about changing the _tone_ and _style_ of responses from Cora to follow a desired template.
2. **Factual** - This is about ensuring that responses are _grounded_ in Zava product data, typically using a RAG-based approach.

**Desired Tone & Style**


<div style="height: 6px; margin: 30px 0; background: linear-gradient(90deg, #034694 0%, #1E8449 50%, #D4AC0D 100%); border-radius: 3px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);"></div>

## 3️⃣ | Explore The Data

<div style="height: 6px; margin: 30px 0; background: linear-gradient(90deg, #034694 0%, #1E8449 50%, #D4AC0D 100%); border-radius: 3px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);"></div>

## 4️⃣ | Try Prompt Engineering

<div style="height: 6px; margin: 30px 0; background: linear-gradient(90deg, #034694 0%, #1E8449 50%, #D4AC0D 100%); border-radius: 3px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);"></div>

## 5️⃣ | Try Retrieval Augmented Generation

<div style="height: 6px; margin: 30px 0; background: linear-gradient(90deg, #034694 0%, #1E8449 50%, #D4AC0D 100%); border-radius: 3px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);"></div>

## 6️⃣ | Time To Try Fine-Tuning!


<div style="display: flex; align-items: center; justify-content: left; padding: 5px; height: 40px; background: linear-gradient(90deg, #7873f5 0%, #ff6ec4 100%); border-radius: 8px; box-shadow: 0 2px 8px rgba(0,0,0,0.12); font-size: 1.5em; font-weight: bold; color: #fff;">
   Next: Be More Helpful With SFT
</div>