# 🛍️ | Cora-For-Zava: Deploy Fine-Tuned Model

Welcome! This notebook guides you through deploying a fine-tuned model to Azure AI using the Azure SDK.

## 🛒 Our Zava Scenario

**Cora** is a customer service chatbot for **Zava** - a fictitious retailer of home improvement goods for DIY enthusiasts. After fine-tuning a model to better understand Zava's products and customer needs, you need to deploy it to Azure AI Foundry for production use. This notebook walks you through listing completed fine-tuning jobs and deploying your custom model.

## 🎯 What You'll Build

By the end of this notebook, you'll have:
- ✅ Listed all successful fine-tuning jobs
- ✅ Selected a fine-tuned model for deployment
- ✅ Deployed the model to Azure AI Foundry
- ✅ Verified the deployment is ready for production use

## 💡 What You'll Learn

- How to list and manage fine-tuning jobs
- How to deploy fine-tuned models to Azure AI
- Best practices for model deployment
- How to verify deployment status

Ready to deploy your fine-tuned model? Let's get started! 🚀

---

### 1. Check Environment Variables

In [None]:
import os

# Check required environment variables
required_vars = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_AI_FOUNDRY_NAME"
]

missing_vars = [var for var in required_vars if not os.getenv(var)]

if missing_vars:
    print(f"❌ Missing environment variables: {', '.join(missing_vars)}")
    print("\nPlease set these variables before continuing.")
else:
    print("✅ All required environment variables are set!")
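
The check above only prints a warning, so later cells still run and fail with less obvious errors. A minimal fail-fast sketch is shown below; the helper name `require_env` is hypothetical and not part of any Azure SDK, and the variable list simply repeats the one used in this notebook.

```python
import os

# Same variable names as the check above
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_AI_FOUNDRY_NAME",
]


def require_env(names, env=None):
    """Return the requested variables as a dict, or raise if any is unset or empty.

    `require_env` is an illustrative helper, not an SDK function.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise EnvironmentError(
            f"Missing environment variables: {', '.join(missing)}"
        )
    return {name: env[name] for name in names}
```

Calling `require_env(REQUIRED_VARS)` at the top of the notebook would stop execution at the first cell instead of at the first failing API call.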


### 2. Create Azure OpenAI Client

In [None]:
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION", "2025-02-01-preview")
)

print("✅ Azure OpenAI client created successfully!")


### 3. List Available Fine-Tuned Models

In [None]:
# List all fine-tuning jobs
jobs_response = client.fine_tuning.jobs.list()

# Filter for succeeded jobs. Iterating the response pages through every job,
# whereas `jobs_response.data` holds only the first page of results.
succeeded_jobs = [job for job in jobs_response if job.status == "succeeded"]

if not succeeded_jobs:
    print("❌ No successful fine-tuning jobs found.")
    print("\nPlease complete a fine-tuning job first using 31-basic-finetuning.ipynb")
else:
    print(f"✅ Found {len(succeeded_jobs)} successful fine-tuning job(s):\n")
    for i, job in enumerate(succeeded_jobs, 1):
        print(f"{i}. Job ID: {job.id}")
        print(f"   Model: {job.fine_tuned_model}")
        print(f"   Created: {job.created_at}")
        print(f"   Status: {job.status}")
        print()
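
The `created_at` field printed above is a Unix timestamp in seconds, which is hard to read at a glance. The sketch below shows one way to format it and to order jobs newest first; `format_created_at` and `newest_first` are hypothetical helper names, and the only assumption about a job object is that it exposes a `created_at` attribute.

```python
from datetime import datetime, timezone


def format_created_at(created_at):
    """Convert a Unix timestamp in seconds to a readable UTC string."""
    moment = datetime.fromtimestamp(created_at, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


def newest_first(jobs):
    """Return the jobs sorted by creation time, most recent first."""
    return sorted(jobs, key=lambda job: job.created_at, reverse=True)
```

In the listing loop, `format_created_at(job.created_at)` could replace the raw `job.created_at` value.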


### 4. Select Model to Deploy

Copy a job ID from the list above into the cell below.

In [None]:
# Use a job ID (get the model name from the job)
job_id = "input your finetuning job ID here"  # Replace with your job ID from the list above

In [None]:


# Retrieve the job to get the fine-tuned model name
job = client.fine_tuning.jobs.retrieve(job_id)
fine_tuned_model = job.fine_tuned_model

print(f"✅ Selected model: {fine_tuned_model}")
print(f"   From job: {job_id}")
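
As an alternative to pasting a job ID by hand, the most recent succeeded job can be picked programmatically. This is a minimal sketch, assuming each job object exposes `id`, `status`, `created_at`, and `fine_tuned_model` as printed in the listing step; `pick_latest_model` is a hypothetical helper name.

```python
def pick_latest_model(jobs):
    """Return (job_id, model_name) for the newest succeeded job.

    Jobs without a fine-tuned model name are skipped, since there is
    nothing to deploy for them.
    """
    candidates = [
        job for job in jobs
        if job.status == "succeeded" and job.fine_tuned_model
    ]
    if not candidates:
        raise ValueError("No succeeded fine-tuning job with a model name was found.")
    latest = max(candidates, key=lambda job: job.created_at)
    return latest.id, latest.fine_tuned_model
```

For example, `job_id, fine_tuned_model = pick_latest_model(succeeded_jobs)` would select the newest model from the list built in step 3.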


### 5. Configure Deployment Settings

In [None]:
import time

# Generate unique deployment name
timestamp = int(time.time())
DEPLOYMENT_NAME = f"60-zava-finetuned-{timestamp}"

# Configure deployment settings
DEPLOYMENT = {
    "properties": {
        "model": { 
            "format": "OpenAI", 
            "name": fine_tuned_model, 
            "version": "1" 
        },
    },
    "sku": { 
        "capacity": 250,  # Adjust based on your needs (e.g., 250 for DeveloperTier)
        "name": "Standard"  # Options: "DeveloperTier", "Standard", "GlobalStandard"
    },
}

print(f"📋 Deployment Configuration:")
print(f"   Name: {DEPLOYMENT_NAME}")
print(f"   Model: {fine_tuned_model}")
print(f"   SKU: {DEPLOYMENT['sku']['name']}")
print(f"   Capacity: {DEPLOYMENT['sku']['capacity']}")
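
If you expect to deploy more than once, the configuration dictionary can be built by a small function that rejects obviously bad inputs before anything is sent to Azure. The sketch below is illustrative only: `build_deployment` is a hypothetical helper, and the allowed SKU names are just the three options listed in the comment above, not a complete list from the service.

```python
# SKU names taken from the options listed in this notebook (assumed, not exhaustive)
ALLOWED_SKUS = ("DeveloperTier", "Standard", "GlobalStandard")


def build_deployment(model_name, sku_name="Standard", capacity=250, model_version="1"):
    """Build the deployment payload used in this notebook, with basic validation."""
    if not model_name:
        raise ValueError("model_name must be a non-empty string")
    if sku_name not in ALLOWED_SKUS:
        raise ValueError(f"Unknown SKU {sku_name!r}; expected one of {ALLOWED_SKUS}")
    if capacity <= 0:
        raise ValueError("capacity must be a positive number")
    return {
        "properties": {
            "model": {
                "format": "OpenAI",
                "name": model_name,
                "version": model_version,
            },
        },
        "sku": {"name": sku_name, "capacity": capacity},
    }
```

`build_deployment(fine_tuned_model)` returns the same structure as the `DEPLOYMENT` dictionary defined above.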


### 6. Create Azure Management Client

In [None]:
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

# Create management client for Azure Cognitive Services
cogsvc_client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.environ.get("AZURE_SUBSCRIPTION_ID")
)

print("✅ Azure Management client created successfully!")


### 7. Deploy the Fine-Tuned Model

In [None]:
# Submit deployment request
print(f"🚀 Starting deployment of {DEPLOYMENT_NAME}...\n")

deployment = cogsvc_client.deployments.begin_create_or_update(
    resource_group_name=os.environ.get("AZURE_RESOURCE_GROUP"),
    account_name=os.environ.get("AZURE_AI_FOUNDRY_NAME"),
    deployment_name=DEPLOYMENT_NAME,
    deployment=DEPLOYMENT,
)

print(f"✅ Deployment request submitted!")
print(f"\n⏳ Deployment is now provisioning...")
print(f"   This typically takes 3-5 minutes for small models")


### 8. Wait for Deployment to Complete

In [None]:
from IPython.display import clear_output
import time

start_time = time.time()
status = deployment.status()

# Poll until the operation finishes. done() is true for any terminal state,
# not only "Succeeded" and "Failed", so the loop cannot spin forever.
while not deployment.done():
    deployment.wait(5)
    status = deployment.status()
    elapsed_min = int((time.time() - start_time) // 60)
    elapsed_sec = int((time.time() - start_time) % 60)
    
    clear_output(wait=True)
    print(f"🛳️  Provisioning {DEPLOYMENT_NAME}")
    print(f"📊 Status: {status}")
    print(f"⏱️  Elapsed time: {elapsed_min} minutes {elapsed_sec} seconds")

# Final status
status = deployment.status()
elapsed_min = int((time.time() - start_time) // 60)
elapsed_sec = int((time.time() - start_time) % 60)

if status == "Succeeded":
    print(f"\n🎉 Deployment completed successfully!")
    print(f"⏱️  Total time: {elapsed_min} minutes {elapsed_sec} seconds")
    print(f"\n📝 Deployment Details:")
    print(f"   Name: {DEPLOYMENT_NAME}")
    print(f"   Model: {fine_tuned_model}")
else:
    print(f"\n❌ Deployment failed with status: {status}")
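
The polling loop above has no upper time limit. A minimal sketch of a bounded wait is shown below; it assumes only that the poller offers `done()`, `wait(timeout)`, and `status()`, as the object returned by `begin_create_or_update` does. The helper name `wait_for_poller`, its default timeout, and the `on_tick` callback are illustrative choices, not part of the Azure SDK.

```python
import time


def wait_for_poller(poller, timeout_s=900, interval_s=5, on_tick=None):
    """Wait for a long-running operation, giving up after timeout_s seconds.

    Returns the final status string. `on_tick`, if given, is called with
    (status, elapsed_seconds) after every polling interval.
    """
    start = time.monotonic()
    while not poller.done():
        elapsed = time.monotonic() - start
        if elapsed >= timeout_s:
            raise TimeoutError(
                f"Operation still {poller.status()!r} after {int(elapsed)} seconds"
            )
        poller.wait(interval_s)
        if on_tick is not None:
            on_tick(poller.status(), time.monotonic() - start)
    return poller.status()
```

For example, `wait_for_poller(deployment, on_tick=lambda s, e: print(s, int(e)))` would print the status every five seconds and raise `TimeoutError` after 15 minutes.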


### 9. Test the Deployed Model

In [None]:
# Test the deployed model with multiple sample prompts
test_prompts = [
    "Can I use extension poles with your roller frames?",
    "Do you have natural bristle brushes?"
]

for i, test_prompt in enumerate(test_prompts, 1):
    print(f"Test {i}/{len(test_prompts)}: Testing deployed model with prompt:")
    print(f"   '{test_prompt}'\n")
    
    response = client.chat.completions.create(
        model=DEPLOYMENT_NAME,  # Use the deployment name
        messages=[
            {"role": "system", "content": "You are Cora, a polite, factual and helpful assistant for Zava, a DIY hardware store."},
            {"role": "user", "content": test_prompt}
        ],
        max_tokens=150
    )
    
    print(f"Response from {DEPLOYMENT_NAME}:")
    print(response.choices[0].message.content)
    print("\n" + "="*80 + "\n")

**Insights**

In both examples above, the responses follow the Zava guidelines for a "polite, factual and helpful" assistant:
- Every response starts with an emoji
- The first sentence is always an acknowledgement of the user ("polite")
- The next sentence is always an informative segment ("factual")
- The final sentence is always an offer to follow up ("helpful")

Note that we get the succinct responses we were looking for _without adding few-shot examples_, which keeps the prompts shorter and saves both token costs and processing latency.
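
The response pattern described above can be spot-checked in code rather than by eye. The sketch below is a rough heuristic, not a validated evaluator: it treats any leading character in the Unicode "Symbol, other" category as an emoji and splits sentences naively on end punctuation. The helper names are hypothetical.

```python
import re
import unicodedata


def starts_with_emoji(text):
    """Heuristic: the first character is in the Unicode 'So' (Symbol, other) category."""
    stripped = text.lstrip()
    return bool(stripped) and unicodedata.category(stripped[0]) == "So"


def split_sentences(text):
    """Naive split on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def check_cora_style(text):
    """Report whether a response matches the emoji-plus-three-sentences pattern."""
    sentences = split_sentences(text)
    return {
        "starts_with_emoji": starts_with_emoji(text),
        "sentence_count": len(sentences),
        "matches_pattern": starts_with_emoji(text) and len(sentences) == 3,
    }
```

Running `check_cora_style(response.choices[0].message.content)` on each test response gives a quick signal; it cannot judge whether a sentence is actually polite, factual, or helpful.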

---
### Teardown

Once you are done with this lab, don't forget to tear down the infrastructure. A developer tier deployment should be removed automatically (reportedly after 24 hours), but it is better to proactively delete the resource group and release all model quota.
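
If you only want to remove the model deployment and keep the rest of the resource group, the management client's `deployments.begin_delete` operation can be used. This is a minimal sketch, assuming a `CognitiveServicesManagementClient` like the one created in step 6; the wrapper name `delete_deployment` is hypothetical.

```python
def delete_deployment(cogsvc_client, resource_group, account_name, deployment_name):
    """Delete one model deployment and return the final operation status."""
    poller = cogsvc_client.deployments.begin_delete(
        resource_group_name=resource_group,
        account_name=account_name,
        deployment_name=deployment_name,
    )
    poller.result()  # Block until the delete operation finishes
    return poller.status()
```

With the names used in this notebook, the call would be `delete_deployment(cogsvc_client, os.environ["AZURE_RESOURCE_GROUP"], os.environ["AZURE_AI_FOUNDRY_NAME"], DEPLOYMENT_NAME)`. Deleting the whole resource group is still the more thorough cleanup.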