# Supervised Fine-Tuning - Improve Tone & Style

**Pre-Requisites**

1. You have an Azure AI Foundry project with a GPT-4.1 model deployed.
1. You have created the `.env` file and updated it with the relevant environment variables.
1. You have authenticated with `az login` from your GitHub Codespaces environment.

**Objectives**

In this notebook, you will learn to fine-tune a base model (gpt-4.1) with domain-specific data (Zava Q&A) to adapt model behavior to suit a desired business objective ("Improve Zava chatbot tone & style"). 

**Process**

The figure shows the basic workflow. Explore [Documentation](https://learn.microsoft.com/azure/ai-foundry/openai/tutorials/fine-tune?tabs=bash#create-a-sample-dataset) for more details - then scroll down to walk through these steps for our Zava use case.

![SFT](./../../../docs/slides/14.png)



---

## 1. Zava Customization 

Here is how these steps map to our Zava use case. We have used GPT-4.1 for our customization experiments so far, so let's keep going.

| Step | Description |
|:---|:---|
| 1. Decide Vision and Scope | Improve the tone & style of the Cora chatbot to reflect Zava requirements |
| 2. Choose Base Model | Start with GPT-4.1 (popular LLM, text-generation task, question-answering task) |
| 3. Choose Fine-Tuning Technique | Supervised Fine Tuning - training model with examples of desired tone & style|
| 4. Curate Fine-Tuning Dataset | Use ~50 samples for demo - real-world usage needs hundreds to thousands of samples |
| 5. Run Fine-Tuning Job | Load training data, set hyperparameters - execute job (and monitor progress) |
| 6. Evaluate Fine-Tuned Model | Discuss built-in evaluators (coherence, fluency) vs. custom ones (politeness) |
| 7. Deploy Fine-Tuned Model For Use | Deploy with Developer Tier for testing - delete when done (to save cost)|


---
## 2. Create The Dataset

**What does the data look like?**

We need a specially formatted JSONL training file where each line provides a JSON-formatted "example" for our training needs. Having this formatted correctly is critical! Note that each "line" has the _system_ message along with a _user_ question and the _assistant_ response.

```json
{"messages": [{"role": "system", "content": "Cora is a polite, factual and helpful assistant for Zava customers."}, {"role": "user", "content": "I'm painting over rust - what spray paint should I use?"}, {"role": "assistant", "content": "Perfect! Rust Prevention Spray at $10 applies directly over rust. Want prep tips?"}]}
```
...
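
Because a single malformed line can cause the whole file to be rejected, it can help to check each line before uploading. The following is a minimal sketch, not part of the original notebook: the function name `check_training_line` and the rule that every example must contain all three roles are assumptions based on the example above.

```python
import json

# Roles that appear in every example of the format shown above.
REQUIRED_ROLES = ("system", "user", "assistant")


def check_training_line(line: str) -> list[str]:
    """Return the problems found in one JSONL line (an empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    roles = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index} is not an object")
            continue
        roles.append(message.get("role"))
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {index} has no text content")

    # Report every required role that never appears in this example.
    for role in REQUIRED_ROLES:
        if role not in roles:
            problems.append(f"no '{role}' message")
    return problems
```

To check a whole file, call the function on each line and print the line number alongside any problems it returns.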

**How much data do we need?**

Real-world fine-tuning jobs using SFT will need thousands of training samples. For this demo we create only about 50 samples and split them into a training file and a validation file. Doubling the number of samples can result in a _linear increase in model quality_, **but** only if the samples are of high quality. Otherwise it is "garbage in, garbage out".

...

**How can we create this data?**

You can get the data from historical records if you already have the relevant service running (e.g., customer service logs). Alternatively, you can generate synthetic data that reflects the task- or domain-specific behavior, starting from a curated set of samples. We use the second option and create the synthetic data from the product data with GitHub Copilot Agent Mode.

1. Create 50+ sample question-answers in JSONL format (we created 52)
1. Divide them into `sft_training.jsonl` and `sft_validation.jsonl` files
1. We aimed for a roughly 80-20 split; the files checked in section 3.3 hold 40 training and 11 validation samples
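
The split in the steps above can be sketched as follows. This is an illustrative helper, not the method used to produce the files in this notebook: `split_samples`, `to_jsonl`, the shuffle and the seed value are all assumptions.

```python
import json
import random


def split_samples(samples, train_fraction=0.8, seed=101):
    """Shuffle a copy of the samples and split it into (training, validation)."""
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def to_jsonl(samples):
    """Serialize samples as JSONL text: one JSON object per line."""
    return "".join(json.dumps(s, ensure_ascii=False) + "\n" for s in samples)
```

Writing `to_jsonl(training)` and `to_jsonl(validation)` to the two file names listed above would produce files in the expected format.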

---

## 3. Prepare For Fine-Tuning

### 3.1 Check Libraries

Make sure the required Python libraries are installed. This happens automatically when you use GitHub Codespaces from this repository.

In [7]:
# STEP 1: Check that required Python packages are installed
try:
    import openai
    import requests
    import numpy
    import tiktoken
    print("All required packages are available!")
except ImportError as e:
    print(f"Missing required package: {e}")

All required packages are available!


### 3.2 Check Env Variables

We need the Azure OpenAI endpoint and key values to make calls against our deployed model. These were set earlier, when creating the Azure AI Foundry project. If you need to look them up again, visit Azure AI Foundry (https://ai.azure.com) and go to the Project Overview page for this project.

In [8]:
# STEP 2: Check that required environment variables are set
import os
from dotenv import load_dotenv
load_dotenv()

# Read the Environment Variables (set previously)
openai_key = os.getenv("AZURE_OPENAI_API_KEY")
openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
api_version = os.getenv("AZURE_OPENAI_API_VERSION")

# Set model name used for fine-tuning here
model_name = "gpt-4.1"
print(f"Fine-tuning model: {model_name}")

# If missing, print a warning
if not openai_key:
    print("⚠️ Warning: AZURE_OPENAI_API_KEY environment variable is not set.")
if not openai_endpoint:
    print("⚠️ Warning: AZURE_OPENAI_ENDPOINT environment variable is not set.")
if not api_version:
    print("⚠️ Warning: AZURE_OPENAI_API_VERSION environment variable is not set.")  


Fine-tuning model: gpt-4.1




### 3.3 Run Checks On Dataset

In [9]:
# Set File names once
training_file_name = 'sft_training.jsonl'
validation_file_name = 'sft_validation.jsonl'

In [10]:
# Run preliminary checks

import json

# Load the training set
with open(training_file_name, 'r', encoding='utf-8') as f:
    training_dataset = [json.loads(line) for line in f]

# Training dataset stats
print("Number of examples in training set:", len(training_dataset))
print("First example in training set:")
for message in training_dataset[0]["messages"]:
    print(message)

# Load the validation set
with open(validation_file_name, 'r', encoding='utf-8') as f:
    validation_dataset = [json.loads(line) for line in f]

# Validation dataset stats
print("\nNumber of examples in validation set:", len(validation_dataset))
print("First example in validation set:")
for message in validation_dataset[0]["messages"]:
    print(message)

Number of examples in training set: 40
First example in training set:
{'role': 'system', 'content': 'Cora is a polite, factual and helpful assistant for Zava customers.'}
{'role': 'user', 'content': 'Is water-based polyurethane better than oil-based?'}
{'role': 'assistant', 'content': '🤔 Both have benefits! Water-based at $43 dries faster, oil-based at $34 gives amber tone. Which matters most?'}

Number of examples in validation set: 11
First example in validation set:
{'role': 'system', 'content': 'Cora is a polite, factual and helpful assistant for Zava customers.'}
{'role': 'user', 'content': 'How much does the Paint Spray Gun cost?'}
{'role': 'assistant', 'content': '🔧 Great tool! The Paint Spray Gun is $89.99 with professional HVLP coverage. Compatible paints?'}


---

### 3.4 Validate Token Counts (with TikToken)

In [11]:
# Validate token counts

import json
import tiktoken
import numpy as np
from collections import defaultdict


encoding = tiktoken.get_encoding("o200k_base") # default encoding for gpt-4o models. This requires the latest version of tiktoken to be installed.

def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3
    return num_tokens

def num_assistant_tokens_from_messages(messages):
    num_tokens = 0
    for message in messages:
        if message["role"] == "assistant":
            num_tokens += len(encoding.encode(message["content"]))
    return num_tokens

def print_distribution(values, name):
    print(f"\n#### Distribution of {name}:")
    print(f"min / max: {min(values)}, {max(values)}")
    print(f"mean / median: {np.mean(values)}, {np.median(values)}")
    print(f"p10 / p90: {np.quantile(values, 0.1)}, {np.quantile(values, 0.9)}")  # 10th and 90th percentiles

files = [training_file_name, validation_file_name]

for file in files:
    print(f"Processing file: {file}")
    with open(file, 'r', encoding='utf-8') as f:
        dataset = [json.loads(line) for line in f]

    total_tokens = []
    assistant_tokens = []

    for ex in dataset:
        messages = ex.get("messages", [])  # default to an empty list of messages
        total_tokens.append(num_tokens_from_messages(messages))
        assistant_tokens.append(num_assistant_tokens_from_messages(messages))

    print_distribution(total_tokens, "total tokens")
    print_distribution(assistant_tokens, "assistant tokens")
    print('*' * 50)

Processing file: sft_training.jsonl

#### Distribution of total tokens:
min / max: 56, 72
mean / median: 62.125, 61.5
p10 / p90: 59.0, 65.1

#### Distribution of assistant tokens:
min / max: 19, 30
mean / median: 23.525, 23.0
p10 / p90: 20.9, 26.1
**************************************************
Processing file: sft_validation.jsonl

#### Distribution of total tokens:
min / max: 57, 70
mean / median: 62.27272727272727, 62.0
p10 / p90: 60.0, 64.0

#### Distribution of assistant tokens:
min / max: 20, 28
mean / median: 23.272727272727273, 23.0
p10 / p90: 22.0, 25.0
**************************************************


---

## 4. Upload Fine Tuning Files

In [12]:
# Upload fine-tuning files

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),
  api_version = os.getenv("AZURE_OPENAI_API_VERSION")
)

# Upload the training and validation dataset files to Azure OpenAI with the SDK.

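# Open each file in a `with` block so the handle is closed after the upload.
with open(training_file_name, "rb") as training_file:
    training_response = client.files.create(file=training_file, purpose="fine-tune")
training_file_id = training_response.id

with open(validation_file_name, "rb") as validation_file:
    validation_response = client.files.create(file=validation_file, purpose="fine-tune")
validation_file_id = validation_response.id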

print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)

Training file ID: file-f234ff928f344a998497193d27f19239
Validation file ID: file-017e3d8843104ac880e523b42624ee46


---

## 5. Begin Fine Tuning

In [13]:
# Submit fine-tuning training job

response = client.fine_tuning.jobs.create(
    training_file = training_file_id,
    validation_file = validation_file_id,
    model = model_name, # Base model name set in section 3.2 ("gpt-4.1"); the job output reports the dated version it resolves to.
    seed = 101 # The seed controls reproducibility of the fine-tuning job. If no seed is specified, one is generated automatically.
)

job_id = response.id

# You can use the job ID to monitor the status of the fine-tuning job.
# The fine-tuning job will take some time to start and complete.

print("Job ID:", response.id)
print("Status:", response.status)
print(response.model_dump_json(indent=2))

Job ID: ftjob-d04bf4eb1348467fa3678fdae1e63dc8
Status: pending
{
  "id": "ftjob-d04bf4eb1348467fa3678fdae1e63dc8",
  "created_at": 1756144701,
  "error": null,
  "fine_tuned_model": null,
  "finished_at": null,
  "hyperparameters": {
    "batch_size": -1,
    "learning_rate_multiplier": 2.0,
    "n_epochs": -1
  },
  "model": "gpt-4.1-2025-04-14",
  "object": "fine_tuning.job",
  "organization_id": null,
  "result_files": null,
  "seed": 101,
  "status": "pending",
  "trained_tokens": null,
  "training_file": "file-f234ff928f344a998497193d27f19239",
  "validation_file": "file-017e3d8843104ac880e523b42624ee46",
  "estimated_finish": 1756145781,
  "integrations": null,
  "metadata": null,
  "method": null
}


---

## 6. Track Training Job Status

In [14]:
# Track training status

from IPython.display import clear_output
import time

start_time = time.time()

# Get the status of our fine-tuning job.
response = client.fine_tuning.jobs.retrieve(job_id)

status = response.status

# If the job isn't done yet, poll it every 10 seconds until it reaches a terminal state.
while status not in ["succeeded", "failed", "cancelled"]:
    time.sleep(10)

    response = client.fine_tuning.jobs.retrieve(job_id)
    print(response.model_dump_json(indent=2))
    print("Elapsed time: {} minutes {} seconds".format(int((time.time() - start_time) // 60), int((time.time() - start_time) % 60)))
    status = response.status
    print(f'Status: {status}')
    clear_output(wait=True)

print(f'Fine-tuning job {job_id} finished with status: {status}')

# List all fine-tuning jobs for this resource.
print('Checking other fine-tune jobs for this resource.')
response = client.fine_tuning.jobs.list()
print(f'Found {len(response.data)} fine-tune jobs.')

Fine-tuning job ftjob-d04bf4eb1348467fa3678fdae1e63dc8 finished with status: succeeded
Checking other fine-tune jobs for this resource.
Found 2 fine-tune jobs.
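
The polling loop above runs until the job reaches a terminal state. A variant with an upper time limit can be sketched as below. This is a hedged illustration rather than part of the notebook: `wait_for_job`, its parameters, the three-hour default and the injectable `sleep` and `clock` arguments are assumptions introduced so the logic can be exercised without calling the service.

```python
import time

# Statuses after which the job will not change any more.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(get_status, poll_seconds=10, timeout_seconds=3 * 60 * 60,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal status is returned or the timeout expires."""
    start = clock()
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() - start >= timeout_seconds:
            raise TimeoutError(f"job still '{status}' after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

In this notebook it could be called as `wait_for_job(lambda: client.fine_tuning.jobs.retrieve(job_id).status)`.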


---

## 7. List Fine-Tuning Events

In [15]:
response = client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=10)
print(response.model_dump_json(indent=2))

{
  "data": [
    {
      "id": "ftevent-8d2a8afbb4e84136a5d86e5c531fcd90",
      "created_at": 1756149364,
      "level": "info",
      "message": "Training tokens billed: 8000",
      "object": "fine_tuning.job.event",
      "data": null,
      "type": "message"
    },
    {
      "id": "ftevent-9c82369fd83d4035bbbcd22a64db2cbf",
      "created_at": 1756149364,
      "level": "info",
      "message": "Model Evaluation Passed.",
      "object": "fine_tuning.job.event",
      "data": null,
      "type": "message"
    },
    {
      "id": "ftevent-efa8791468cd4f15b1885e437d9d445d",
      "created_at": 1756149364,
      "level": "info",
      "message": "Completed results file: file-c39c3cfde22648948c6fdd6cfc1279ce",
      "object": "fine_tuning.job.event",
      "data": null,
      "type": "message"
    },
    {
      "id": "ftevent-e616e3bc6ddf431984acfa770c40123d",
      "created_at": 1756149343,
      "level": "info",
      "message": "Job succeeded.",
      "object": "fine_tuning.jo

---

## 8. List Checkpoints

In [16]:
response = client.fine_tuning.jobs.checkpoints.list(job_id)
print(response.model_dump_json(indent=2))

{
  "data": [
    {
      "id": "ftchkpt-ed43e285d8174b8b8e7230fd0fff49ab",
      "created_at": 1756147966,
      "fine_tuned_model_checkpoint": "gpt-4.1-2025-04-14.ft-d04bf4eb1348467fa3678fdae1e63dc8",
      "fine_tuning_job_id": "ftjob-d04bf4eb1348467fa3678fdae1e63dc8",
      "metrics": {
        "full_valid_loss": 1.3124475890783955,
        "full_valid_mean_token_accuracy": 0.6474820143884892,
        "step": 120.0,
        "train_loss": 0.8280026912689209,
        "train_mean_token_accuracy": 0.75,
        "valid_loss": 2.4705605873694787,
        "valid_mean_token_accuracy": 0.5
      },
      "object": "fine_tuning.job.checkpoint",
      "step_number": 120
    },
    {
      "id": "ftchkpt-ccbedbc3f035472b9e01004f51247ce4",
      "created_at": 1756147775,
      "fine_tuned_model_checkpoint": "gpt-4.1-2025-04-14.ft-d04bf4eb1348467fa3678fdae1e63dc8:ckpt-step-80",
      "fine_tuning_job_id": "ftjob-d04bf4eb1348467fa3678fdae1e63dc8",
      "metrics": {
        "full_valid_loss": 1.3

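
Each checkpoint reports its own validation metrics, so one way to compare them is to pick the entry with the lowest `full_valid_loss`. The sketch below assumes the checkpoints are available as plain dictionaries shaped like the JSON above (for example via `json.loads(response.model_dump_json())["data"]`). The helper name `best_checkpoint` is hypothetical, and lower loss is only a proxy for the tone and style we actually care about.

```python
def best_checkpoint(checkpoints, metric="full_valid_loss"):
    """Return the checkpoint dict with the lowest value for the given metric."""
    # Skip checkpoints that do not report the metric at all.
    scored = [c for c in checkpoints if (c.get("metrics") or {}).get(metric) is not None]
    if not scored:
        raise ValueError(f"no checkpoint reports '{metric}'")
    return min(scored, key=lambda c: c["metrics"][metric])


# Hypothetical values for illustration only, not results from this run.
example = [
    {"id": "ckpt-a", "step_number": 40, "metrics": {"full_valid_loss": 1.9}},
    {"id": "ckpt-b", "step_number": 80, "metrics": {"full_valid_loss": 1.2}},
    {"id": "ckpt-c", "step_number": 120, "metrics": {"full_valid_loss": 1.3}},
]
print(best_checkpoint(example)["id"])
```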
---

## 9. Final Training Run Results

In [17]:
# Retrieve fine_tuned_model name

response = client.fine_tuning.jobs.retrieve(job_id)

print(response.model_dump_json(indent=2))
fine_tuned_model = response.fine_tuned_model

{
  "id": "ftjob-d04bf4eb1348467fa3678fdae1e63dc8",
  "created_at": 1756144701,
  "error": null,
  "fine_tuned_model": "gpt-4.1-2025-04-14.ft-d04bf4eb1348467fa3678fdae1e63dc8",
  "finished_at": 1756149364,
  "hyperparameters": {
    "batch_size": 1,
    "learning_rate_multiplier": 2.0,
    "n_epochs": 3
  },
  "model": "gpt-4.1-2025-04-14",
  "object": "fine_tuning.job",
  "organization_id": null,
  "result_files": [
    "file-c39c3cfde22648948c6fdd6cfc1279ce"
  ],
  "seed": 101,
  "status": "succeeded",
  "trained_tokens": 9900,
  "training_file": "file-f234ff928f344a998497193d27f19239",
  "validation_file": "file-017e3d8843104ac880e523b42624ee46",
  "estimated_finish": 1756146946,
  "integrations": null,
  "metadata": null,
  "method": null
}
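
The `created_at` and `finished_at` fields are Unix timestamps in seconds, so the total job duration can be read directly from the response. A small sketch using the two values printed above (the helper names `job_duration` and `to_utc` are illustrative):

```python
from datetime import datetime, timezone


def job_duration(created_at, finished_at):
    """Return the elapsed time between two Unix timestamps as (minutes, seconds)."""
    return divmod(finished_at - created_at, 60)


def to_utc(timestamp):
    """Format a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


# Values copied from the job output above.
minutes, seconds = job_duration(1756144701, 1756149364)
print(f"Job took {minutes} min {seconds} s")  # Job took 77 min 43 s
```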


---

## 10. Deploy Fine-Tuned Model

Follow the guidance [in this document](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tune-test?tabs=portal) to learn how to use the _developer tier_ for more cost-effective testing. For now, let's deploy using the [Portal option](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tune-test?tabs=portal).

--- 

In [None]:
# First get a valid access token using the Azure CLI
import subprocess

# Run the az command to get an access token
result = subprocess.run(
    ["az", "account", "get-access-token", "--query", "accessToken", "-o", "tsv"],
    capture_output=True, text=True, check=True
)
access_token = result.stdout.strip()
print("Access token acquired and stored in variable 'access_token'.")

Access token acquired and stored in variable 'access_token'.


In [None]:
# ........... COMMENTED OUT FOR NOW - DEPLOYMENT DONE MANUALLY VIA PORTAL ......

'''
# Deploy fine-tuned model

import json
import requests

token = os.getenv("TEMP_AUTH_TOKEN")
subscription = "<YOUR_SUBSCRIPTION_ID>"
resource_group = "<YOUR_RESOURCE_GROUP_NAME>"
resource_name = "<YOUR_AZURE_OPENAI_RESOURCE_NAME>"
model_deployment_name = "gpt-4o-mini-2024-07-18-ft" # Custom deployment name you chose for your fine-tuning model

deploy_params = {'api-version': "2024-10-01"} # Control plane API version
deploy_headers = {'Authorization': 'Bearer {}'.format(token), 'Content-Type': 'application/json'}

deploy_data = {
    "sku": {"name": "standard", "capacity": 1},
    "properties": {
        "model": {
            "format": "OpenAI",
            "name": "<YOUR_FINE_TUNED_MODEL>", #retrieve this value from the previous call, it will look like gpt-4o-mini-2024-07-18.ft-0e208cf33a6a466994aff31a08aba678
            "version": "1"
        }
    }
}
deploy_data = json.dumps(deploy_data)

request_url = f'https://management.azure.com/subscriptions/{subscription}/resourceGroups/{resource_group}/providers/Microsoft.CognitiveServices/accounts/{resource_name}/deployments/{model_deployment_name}'

print('Creating a new deployment...')

r = requests.put(request_url, params=deploy_params, headers=deploy_headers, data=deploy_data)

print(r)
print(r.reason)
print(r.json())
'''

In [None]:
# ........... COMMENTED OUT FOR NOW - DEPLOYMENT DONE MANUALLY VIA PORTAL ......
'''
# Deploying with Developer Tier
#
# To obtain the token, open the Cloud Shell in the Azure portal and run
# the `az account get-access-token` command. It generates the
# authorization token needed for the deployment request.

import json
import os
import requests

token = os.getenv("<TOKEN>")  # replace <TOKEN> with the name of the environment variable that holds the access token
subscription = "<YOUR_SUBSCRIPTION_ID>"  
resource_group = "<YOUR_RESOURCE_GROUP_NAME>"
resource_name = "<YOUR_AZURE_OPENAI_RESOURCE_NAME>"
model_deployment_name = "gpt41-mini-candidate-01" # custom deployment name that you will use to reference the model when making inference calls.

deploy_params = {'api-version': "2025-04-01-preview"} 
deploy_headers = {'Authorization': 'Bearer {}'.format(token), 'Content-Type': 'application/json'}

deploy_data = {
    "sku": {"name": "developertier", "capacity": 50},
    "properties": {
        "model": {
            "format": "OpenAI",
            "name": "<YOUR_FINE_TUNED_MODEL>", # retrieve this value from the previous call (the fine_tuned_model field); it will look like gpt-4.1-2025-04-14.ft-d04bf4eb1348467fa3678fdae1e63dc8
            "version": "1"
        }
    }
}
deploy_data = json.dumps(deploy_data)

request_url = f'https://management.azure.com/subscriptions/{subscription}/resourceGroups/{resource_group}/providers/Microsoft.CognitiveServices/accounts/{resource_name}/deployments/{model_deployment_name}'

print('Creating a new deployment...')

r = requests.put(request_url, params=deploy_params, headers=deploy_headers, data=deploy_data)

print(r)
print(r.reason)
print(r.json())

'''

---

## 11. Use Deployed Customized Model

In [None]:
# ........... COMMENTED OUT FOR NOW - TESTING DONE MANUALLY VIA PORTAL ......

'''

# Use the deployed customized model

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),
  api_version = "2024-10-21"
)

response = client.chat.completions.create(
    model = "gpt-4o-mini-2024-07-18-ft", # model = "Custom deployment name you chose for your fine-tuning model"
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
        {"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
        {"role": "user", "content": "Do other Azure services support this too?"}
    ]
)

print(response.choices[0].message.content)

'''

---

## 12. DELETE DEPLOYMENT‼️

> [Developer Deployments](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tune-test?tabs=portal#clean-up-your-deployment) are deleted automatically after the default 24-hour window. Delete all other deployments manually.


Unlike other types of Azure OpenAI models, fine-tuned/customized models have an hourly hosting cost once they're deployed. Once you're done with this tutorial and have tested a few chat completion calls against your fine-tuned model, we strongly recommend that you delete the model deployment.

Run this command from the CLI to delete the deployed model, replacing the placeholders with the values for your deployment.

```bash
curl -X DELETE "https://management.azure.com/subscriptions/<SUBSCRIPTION>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.CognitiveServices/accounts/<RESOURCE_NAME>/deployments/<MODEL_DEPLOYMENT_NAME>?api-version=2025-04-01-preview" \
  -H "Authorization: Bearer <TOKEN>"
```
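
If you prefer to stay in Python, the same request can be built with the standard library. This is a sketch under assumptions: `build_delete_request` is a hypothetical helper, the arguments mirror the placeholders in the command above, and the request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def build_delete_request(subscription, resource_group, resource_name,
                         deployment_name, token, api_version="2025-04-01-preview"):
    """Build (but do not send) the DELETE request for a model deployment."""
    segments = [
        "subscriptions", subscription,
        "resourceGroups", resource_group,
        "providers", "Microsoft.CognitiveServices",
        "accounts", resource_name,
        "deployments", deployment_name,
    ]
    # Percent-encode every path segment and separate the query string with "?".
    path = "/".join(urllib.parse.quote(segment, safe="") for segment in segments)
    query = urllib.parse.urlencode({"api-version": api_version})
    url = f"https://management.azure.com/{path}?{query}"
    return urllib.request.Request(
        url, method="DELETE", headers={"Authorization": f"Bearer {token}"}
    )
```

Passing the returned object to `urllib.request.urlopen(...)` would send the request; keeping that step separate makes it easy to inspect the URL first.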

---