# Supervised Fine Tuning with GPT-4.1

**See**: [Documentation](https://learn.microsoft.com/azure/ai-foundry/openai/tutorials/fine-tune?tabs=bash#create-a-sample-dataset)

This tutorial walks through the basic workflow for fine-tuning a base model (gpt-4.1) with supervised fine-tuning.
The workflow involves the following steps:

- Create sample fine-tuning datasets.
- Create environment variables for your resource endpoint and API key.
- Validate your sample training and validation datasets for fine-tuning.
- Upload your training file and validation file for fine-tuning.
- Create a fine-tuning job for gpt-4.1.
- Deploy a custom fine-tuned model.
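
The cells below assume the two JSONL files already exist. As a minimal sketch of the first step, the snippet below writes chat-format examples to a JSONL file, one JSON object per line. The `write_jsonl` helper and the temporary file name are illustrative and not part of the original notebook; the example content is taken from the training-set output shown in step 2, with the assistant reply shortened.

```python
import json
import os
import tempfile

def write_jsonl(path, examples):
    """Write one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")

examples = [
    {
        "messages": [
            {"role": "system", "content": "Cora is a polite and cheerful chatbot that offers helpful suggestions on Zava products for home improvement!"},
            {"role": "user", "content": "What brush could I use for oil-based paints?"},
            # Assistant reply shortened for illustration.
            {"role": "assistant", "content": "I'd recommend our Natural Bristle Brush Set (PFBR000017) for $23."},
        ]
    }
]

# Write to a temporary location so the real dataset files are not overwritten.
sample_path = os.path.join(tempfile.gettempdir(), "sample_training_set.jsonl")
write_jsonl(sample_path, examples)

# Read the file back the same way the validation cell does.
with open(sample_path, "r", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded), "example(s) written to", sample_path)
```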

---

## 1. Read Environment Variables

In [6]:
import os

openai_key = os.getenv("AZURE_OPENAI_API_KEY")
openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
model_name = os.getenv("DEMO_BASIC_MODEL")
api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2025-02-01-preview")

if not openai_key or not openai_endpoint:
    print("Error: Missing AZURE_OPENAI_API_KEY or AZURE_OPENAI_ENDPOINT environment variable.")

print("Using Model:", model_name)
print("Using API Version:", api_version)

Using Model: gpt-4.1
Using API Version: 2025-02-01-preview
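
The cell above prints an error but keeps running when a variable is missing. A minimal sketch of a stricter check is shown below; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and the list of required variables is an assumption based on the variables this notebook reads.

```python
import os

# Variables read by this notebook (assumed to be required for the later cells).
REQUIRED_VARS = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "DEMO_BASIC_MODEL"]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    # In a real run you may prefer to raise an exception here instead of printing.
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.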


---

## 2. Validate Datasets

In [7]:
# Set File names once
training_file_name = '01-training_set.jsonl'
validation_file_name = '01-validation_set.jsonl'

In [8]:
# Run preliminary checks

import json

# Load the training set
with open(training_file_name, 'r', encoding='utf-8') as f:
    training_dataset = [json.loads(line) for line in f]

# Training dataset stats
print("Number of examples in training set:", len(training_dataset))
print("First example in training set:")
for message in training_dataset[0]["messages"]:
    print(message)

# Load the validation set
with open(validation_file_name, 'r', encoding='utf-8') as f:
    validation_dataset = [json.loads(line) for line in f]

# Validation dataset stats
print("\nNumber of examples in validation set:", len(validation_dataset))
print("First example in validation set:")
for message in validation_dataset[0]["messages"]:
    print(message)

Number of examples in training set: 20
First example in training set:
{'role': 'system', 'content': 'Cora is a polite and cheerful chatbot that offers helpful suggestions on Zava products for home improvement!'}
{'role': 'user', 'content': 'What brush could I use for oil-based paints?'}
{'role': 'assistant', 'content': "Excellent question! I'd recommend our Natural Bristle Brush Set (PFBR000017) for $23. These traditional natural bristle brushes are specifically designed for oil-based paints and stains, providing superior flow and leveling for a professional finish. They also pair perfectly with our Universal Bonding Primer for all surfaces. Do you want to learn more, or can I help you with a different question?"}

Number of examples in validation set: 20
First example in validation set:
{'role': 'system', 'content': 'Cora is a polite and cheerful chatbot that offers helpful suggestions on Zava products for home improvement!'}
{'role': 'user', 'content': 'I need a paint tray for my rol
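
Printing the first example confirms the files load, but it does not check every line. The sketch below shows one way to scan a loaded dataset for structural problems. `find_format_errors` is a hypothetical helper, and `VALID_ROLES` is an assumption limited to the roles that appear in this tutorial's data; extend it if your data uses other roles.

```python
# Roles seen in this tutorial's datasets (an assumption, not an exhaustive list).
VALID_ROLES = {"system", "user", "assistant"}

def find_format_errors(dataset):
    """Return a list of (example_index, problem) tuples; empty if no problems are found."""
    errors = []
    for i, example in enumerate(dataset):
        messages = example.get("messages") if isinstance(example, dict) else None
        if not isinstance(messages, list) or not messages:
            errors.append((i, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict):
                errors.append((i, "message is not an object"))
            elif message.get("role") not in VALID_ROLES:
                errors.append((i, f"unrecognized role: {message.get('role')!r}"))
            elif not isinstance(message.get("content"), str):
                errors.append((i, "content is not a string"))
        # Supervised fine-tuning needs at least one assistant turn to learn from.
        if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            errors.append((i, "no assistant message"))
    return errors

# Small in-memory examples so the sketch runs without the dataset files.
good = [{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}]
bad = [{"messages": [{"role": "user", "content": "hi"}]}, {"prompt": "x"}]
print(find_format_errors(good))
print(find_format_errors(bad))
```

In the notebook you could call `find_format_errors(training_dataset)` and `find_format_errors(validation_dataset)` after loading the files.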

---

## 3. Validate Token Counts (with TikToken)

In [9]:
# Validate token counts

import json
import tiktoken
import numpy as np
from collections import defaultdict


encoding = tiktoken.get_encoding("o200k_base") # Encoding used by gpt-4o models, assumed here to apply to gpt-4.1 as well. Requires a recent version of tiktoken.

def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3
    return num_tokens

def num_assistant_tokens_from_messages(messages):
    num_tokens = 0
    for message in messages:
        if message["role"] == "assistant":
            num_tokens += len(encoding.encode(message["content"]))
    return num_tokens

def print_distribution(values, name):
    print(f"\n#### Distribution of {name}:")
    print(f"min / max: {min(values)}, {max(values)}")
    print(f"mean / median: {np.mean(values)}, {np.median(values)}")
    print(f"p10 / p90: {np.quantile(values, 0.1)}, {np.quantile(values, 0.9)}")

files = [training_file_name, validation_file_name]

for file in files:
    print(f"Processing file: {file}")
    with open(file, 'r', encoding='utf-8') as f:
        dataset = [json.loads(line) for line in f]

    total_tokens = []
    assistant_tokens = []

    for ex in dataset:
        messages = ex.get("messages", [])
        total_tokens.append(num_tokens_from_messages(messages))
        assistant_tokens.append(num_assistant_tokens_from_messages(messages))

    print_distribution(total_tokens, "total tokens")
    print_distribution(assistant_tokens, "assistant tokens")
    print('*' * 50)

Processing file: 01-training_set.jsonl

#### Distribution of total tokens:
min / max: 113, 138
mean / median: 125.65, 125.5
p10 / p90: 118.7, 132.3

#### Distribution of assistant tokens:
min / max: 71, 89
mean / median: 80.5, 81.0
p10 / p90: 74.8, 85.1
**************************************************
Processing file: 01-validation_set.jsonl

#### Distribution of total tokens:
min / max: 112, 127
mean / median: 121.0, 121.5
p10 / p90: 115.0, 125.1

#### Distribution of assistant tokens:
min / max: 69, 84
mean / median: 77.1, 77.0
p10 / p90: 71.9, 81.1
**************************************************
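
The cell above calls `np.quantile` with 0.1 and 0.9, that is, the 10th and 90th percentiles. With its default settings, `np.quantile` interpolates linearly between neighbouring sorted values, which is why a 20-example file can report fractional token counts. The sketch below reproduces that interpolation in plain Python; `linear_quantile` is a hypothetical helper and the sample values are made up for illustration.

```python
import math

def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)         # fractional index into the sorted list
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)  # clamp so q = 1.0 stays in range
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

sample = list(range(20))  # 20 hypothetical token counts: 0, 1, ..., 19
print(linear_quantile(sample, 0.1), linear_quantile(sample, 0.9))
```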


---

## 4. Upload Fine Tuning Files

In [10]:
# Upload fine-tuning files

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),
  api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2025-02-01-preview")  # Same default as in step 1
)

# Upload the training and validation dataset files to Azure OpenAI with the SDK.

# Open each file in a context manager so the handle is closed after the upload.
with open(training_file_name, "rb") as f:
    training_response = client.files.create(file=f, purpose="fine-tune")
training_file_id = training_response.id

with open(validation_file_name, "rb") as f:
    validation_response = client.files.create(file=f, purpose="fine-tune")
validation_file_id = validation_response.id

print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)

Training file ID: file-ec5c778cea254382b88d41680ca4d134
Validation file ID: file-c97252cb6cbc436cb9f431393f756012


---

## 5. Begin Fine Tuning

In [11]:
# Submit fine-tuning training job

response = client.fine_tuning.jobs.create(
    training_file = training_file_id,
    validation_file = validation_file_id,
    model = model_name, # Base model name read from DEMO_BASIC_MODEL (gpt-4.1 in this run).
    seed = 101 # The seed controls reproducibility of the fine-tuning job. If no seed is specified, one is generated automatically.
)

job_id = response.id

# You can use the job ID to monitor the status of the fine-tuning job.
# The fine-tuning job will take some time to start and complete.

print("Job ID:", response.id)
print("Status:", response.status)
print(response.model_dump_json(indent=2))

Job ID: ftjob-499afbe45d7444bb87740de97ea6f650
Status: pending
{
  "id": "ftjob-499afbe45d7444bb87740de97ea6f650",
  "created_at": 1754878180,
  "error": null,
  "fine_tuned_model": null,
  "finished_at": null,
  "hyperparameters": {
    "batch_size": -1,
    "learning_rate_multiplier": 2.0,
    "n_epochs": -1
  },
  "model": "gpt-4.1-2025-04-14",
  "object": "fine_tuning.job",
  "organization_id": null,
  "result_files": null,
  "seed": 101,
  "status": "pending",
  "trained_tokens": null,
  "training_file": "file-ec5c778cea254382b88d41680ca4d134",
  "validation_file": "file-c97252cb6cbc436cb9f431393f756012",
  "estimated_finish": 1754879260,
  "integrations": null,
  "metadata": null,
  "method": null
}
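
The response above reports `batch_size` and `n_epochs` as -1 while the job is pending, and the final job response in step 9 shows concrete values (1 and 5), which suggests the service chose them. If you would rather set them yourself, the sketch below assembles the keyword arguments for `client.fine_tuning.jobs.create`, assuming your SDK and API version accept the `hyperparameters` argument. `build_job_kwargs` is a hypothetical helper, and the file IDs and the epoch count are placeholder values.

```python
def build_job_kwargs(training_file, validation_file, model, seed=None, n_epochs=None,
                     batch_size=None, learning_rate_multiplier=None):
    """Assemble keyword arguments for client.fine_tuning.jobs.create, omitting unset values."""
    kwargs = {"training_file": training_file, "validation_file": validation_file, "model": model}
    if seed is not None:
        kwargs["seed"] = seed
    # Keep only the hyperparameters that were explicitly provided.
    hyperparameters = {
        name: value
        for name, value in {
            "n_epochs": n_epochs,
            "batch_size": batch_size,
            "learning_rate_multiplier": learning_rate_multiplier,
        }.items()
        if value is not None
    }
    if hyperparameters:
        kwargs["hyperparameters"] = hyperparameters
    return kwargs

# Illustrative values only; the file IDs are placeholders.
job_kwargs = build_job_kwargs("file-<TRAINING_ID>", "file-<VALIDATION_ID>", "gpt-4.1", seed=101, n_epochs=3)
print(job_kwargs)
# response = client.fine_tuning.jobs.create(**job_kwargs)  # requires a configured client
```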


---

## 6. Track Training Job Status

In [None]:
# Track training status

from IPython.display import clear_output
import time

start_time = time.time()

# Get the status of our fine-tuning job.
response = client.fine_tuning.jobs.retrieve(job_id)

status = response.status

# If the job isn't done yet, poll it every 10 seconds until it reaches a terminal state.
while status not in ["succeeded", "failed", "cancelled"]:
    time.sleep(10)

    response = client.fine_tuning.jobs.retrieve(job_id)
    print(response.model_dump_json(indent=2))
    print("Elapsed time: {} minutes {} seconds".format(int((time.time() - start_time) // 60), int((time.time() - start_time) % 60)))
    status = response.status
    print(f'Status: {status}')
    clear_output(wait=True)

print(f'Fine-tuning job {job_id} finished with status: {status}')

# List all fine-tuning jobs for this resource.
print('Checking other fine-tune jobs for this resource.')
response = client.fine_tuning.jobs.list()
print(f'Found {len(response.data)} fine-tune jobs.')
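
The polling loop above runs until the job reaches a terminal state, however long that takes. The sketch below adds an upper bound on the wait. `wait_for_terminal_status` is a hypothetical helper, the four-hour default timeout is an arbitrary assumption, and the status source is passed in as a function so the example runs without a real job.

```python
import time

TERMINAL_STATES = {"succeeded", "failed", "cancelled"}

def wait_for_terminal_status(get_status, poll_seconds=10, timeout_seconds=4 * 60 * 60,
                             sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or the timeout elapses."""
    deadline = clock() + timeout_seconds
    status = get_status()
    while status not in TERMINAL_STATES:
        if clock() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout_seconds} seconds")
        sleep(poll_seconds)
        status = get_status()
    return status

# Simulated status sequence so the sketch runs without a fine-tuning job.
fake_statuses = iter(["pending", "running", "running", "succeeded"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses), sleep=lambda s: None)
print("Final status:", final_status)
```

With the client from step 4, the status function could be `lambda: client.fine_tuning.jobs.retrieve(job_id).status`.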

---

## 7. List Fine-Tuning Events

In [16]:
response = client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=10)
print(response.model_dump_json(indent=2))

{
  "data": [
    {
      "id": "ftevent-808ddd8829f832a808ddd8829f832a80",
      "created_at": 1754880969,
      "level": "info",
      "message": "Step 100: training loss=0.8421152830123901",
      "object": "fine_tuning.job.event",
      "data": {
        "step": 100,
        "train_loss": 0.8421152830123901,
        "train_mean_token_accuracy": 0.7435897588729858,
        "valid_loss": 1.5004924339584158,
        "valid_mean_token_accuracy": 0.6582278481012658,
        "full_valid_loss": 1.154685450866159,
        "full_valid_mean_token_accuracy": 0.7035398230088495
      },
      "type": "metrics"
    },
    {
      "id": "ftevent-808ddd882998d49808ddd882998d4980",
      "created_at": 1754880959,
      "level": "info",
      "message": "Step 90: training loss=0.4213077425956726",
      "object": "fine_tuning.job.event",
      "data": {
        "step": 90,
        "train_loss": 0.4213077425956726,
        "train_mean_token_accuracy": 0.8500000238418579,
        "valid_loss": 1.1470

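The raw event JSON is hard to scan for trends. As a rough sketch, the helper below pulls the step and loss values out of the `metrics` events and orders them by step. `metrics_by_step` is a hypothetical name, and it works on plain dictionaries such as those returned by `response.model_dump()["data"]`.

```python
def metrics_by_step(events):
    """Return (step, train_loss, valid_loss) tuples from 'metrics' events, ordered by step."""
    rows = []
    for event in events:
        data = event.get("data") or {}
        if event.get("type") == "metrics" and "step" in data:
            rows.append((data["step"], data.get("train_loss"), data.get("valid_loss")))
    return sorted(rows, key=lambda row: row[0])

# Values copied from the step 100 and step 90 events shown above.
# The step 90 validation loss is truncated in the output, so it is left as None.
events = [
    {"type": "metrics", "data": {"step": 100, "train_loss": 0.8421152830123901, "valid_loss": 1.5004924339584158}},
    {"type": "metrics", "data": {"step": 90, "train_loss": 0.4213077425956726, "valid_loss": None}},
    {"type": "message", "data": None},  # hypothetical non-metrics event, skipped by the helper
]
for step, train_loss, valid_loss in metrics_by_step(events):
    print(step, train_loss, valid_loss)
```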
---

## 8. List Checkpoints

In [18]:
response = client.fine_tuning.jobs.checkpoints.list(job_id)
print(response.model_dump_json(indent=2))

{
  "data": [
    {
      "id": "ftchkpt-ce9351da53894ed493e709715a53ec19",
      "created_at": 1754883765,
      "fine_tuned_model_checkpoint": "gpt-4.1-2025-04-14.ft-499afbe45d7444bb87740de97ea6f650",
      "fine_tuning_job_id": "ftjob-499afbe45d7444bb87740de97ea6f650",
      "metrics": {
        "full_valid_loss": 1.154685450866159,
        "full_valid_mean_token_accuracy": 0.7035398230088495,
        "step": 100.0,
        "train_loss": 0.8421152830123901,
        "train_mean_token_accuracy": 0.7435897588729858,
        "valid_loss": 1.5004924339584158,
        "valid_mean_token_accuracy": 0.6582278481012658
      },
      "object": "fine_tuning.job.checkpoint",
      "step_number": 100
    },
    {
      "id": "ftchkpt-e4bca86caaee4fe1977ca41ce9f790f2",
      "created_at": 1754883204,
      "fine_tuned_model_checkpoint": "gpt-4.1-2025-04-14.ft-499afbe45d7444bb87740de97ea6f650:ckpt-step-80",
      "fine_tuning_job_id": "ftjob-499afbe45d7444bb87740de97ea6f650",
      "metrics": {
  

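Each checkpoint carries its own metrics, so one possible use of this list is to pick the checkpoint with the lowest validation loss. The sketch below does that on plain dictionaries. `best_checkpoint` is a hypothetical helper, and the step 80 loss is a made-up placeholder because that part of the output is truncated above.

```python
def best_checkpoint(checkpoints, metric="full_valid_loss"):
    """Return the checkpoint dict with the lowest value for the given metric."""
    scored = [c for c in checkpoints if (c.get("metrics") or {}).get(metric) is not None]
    if not scored:
        raise ValueError(f"No checkpoint reports {metric!r}")
    return min(scored, key=lambda c: c["metrics"][metric])

checkpoints = [
    # Step 100 values are copied from the output above.
    {"fine_tuned_model_checkpoint": "gpt-4.1-2025-04-14.ft-499afbe45d7444bb87740de97ea6f650",
     "step_number": 100, "metrics": {"full_valid_loss": 1.154685450866159}},
    # The step 80 metrics are truncated above; this loss value is a made-up placeholder.
    {"fine_tuned_model_checkpoint": "gpt-4.1-2025-04-14.ft-499afbe45d7444bb87740de97ea6f650:ckpt-step-80",
     "step_number": 80, "metrics": {"full_valid_loss": 1.25}},
]
print(best_checkpoint(checkpoints)["step_number"])
```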
---

## 9. Final Training Run Results

In [19]:
# Retrieve fine_tuned_model name

response = client.fine_tuning.jobs.retrieve(job_id)

print(response.model_dump_json(indent=2))
fine_tuned_model = response.fine_tuned_model

{
  "id": "ftjob-499afbe45d7444bb87740de97ea6f650",
  "created_at": 1754878180,
  "error": null,
  "fine_tuned_model": "gpt-4.1-2025-04-14.ft-499afbe45d7444bb87740de97ea6f650",
  "finished_at": 1754886012,
  "hyperparameters": {
    "batch_size": 1,
    "learning_rate_multiplier": 2.0,
    "n_epochs": 5
  },
  "model": "gpt-4.1-2025-04-14",
  "object": "fine_tuning.job",
  "organization_id": null,
  "result_files": [
    "file-f2bf0b22627c46aa8285142139f5d56c"
  ],
  "seed": 101,
  "status": "succeeded",
  "trained_tokens": 16665,
  "training_file": "file-ec5c778cea254382b88d41680ca4d134",
  "validation_file": "file-c97252cb6cbc436cb9f431393f756012",
  "estimated_finish": 1754880457,
  "integrations": null,
  "metadata": null,
  "method": null
}
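
The timestamps in the job response are easier to interpret as durations. The sketch below converts the values from the output above, assuming they are Unix epoch seconds. `format_duration` is a hypothetical helper.

```python
from datetime import datetime, timezone

created_at = 1754878180        # "created_at" in the job response above
estimated_finish = 1754880457  # "estimated_finish" in the job response above
finished_at = 1754886012       # "finished_at" in the job response above

def format_duration(seconds):
    """Format a number of seconds as 'H h M min S s'."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours} h {minutes} min {secs} s"

print("Created (UTC):", datetime.fromtimestamp(created_at, tz=timezone.utc).isoformat())
print("Estimated duration:", format_duration(estimated_finish - created_at))
print("Actual duration:", format_duration(finished_at - created_at))
```

For this run the job finished 7832 seconds after it was created, noticeably later than the estimate, so treat `estimated_finish` as a rough guide.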


---

## 10. Deploy Fine-Tuned Model

See [this guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tune-test?tabs=portal) to learn how to use the _developer tier_ for more cost-effective testing.

In [None]:
# ........... COMMENTED OUT FOR NOW - DEPLOYMENT DONE MANUALLY VIA PORTAL ......

'''
# Deploy fine-tuned model

import json
import requests

token = os.getenv("TEMP_AUTH_TOKEN")
subscription = "<YOUR_SUBSCRIPTION_ID>"
resource_group = "<YOUR_RESOURCE_GROUP_NAME>"
resource_name = "<YOUR_AZURE_OPENAI_RESOURCE_NAME>"
model_deployment_name = "gpt-4o-mini-2024-07-18-ft" # Custom deployment name you chose for your fine-tuning model

deploy_params = {'api-version': "2024-10-01"} # Control plane API version
deploy_headers = {'Authorization': 'Bearer {}'.format(token), 'Content-Type': 'application/json'}

deploy_data = {
    "sku": {"name": "standard", "capacity": 1},
    "properties": {
        "model": {
            "format": "OpenAI",
            "name": "<YOUR_FINE_TUNED_MODEL>", # Retrieve this value from the previous call; in this run it is gpt-4.1-2025-04-14.ft-499afbe45d7444bb87740de97ea6f650
            "version": "1"
        }
    }
}
deploy_data = json.dumps(deploy_data)

request_url = f'https://management.azure.com/subscriptions/{subscription}/resourceGroups/{resource_group}/providers/Microsoft.CognitiveServices/accounts/{resource_name}/deployments/{model_deployment_name}'

print('Creating a new deployment...')

r = requests.put(request_url, params=deploy_params, headers=deploy_headers, data=deploy_data)

print(r)
print(r.reason)
print(r.json())
'''

In [None]:
# ........... COMMENTED OUT FOR NOW - DEPLOYMENT DONE MANUALLY VIA PORTAL ......
'''
# Deploying with Developer Tier
#
# To obtain the token, open Cloud Shell in the Azure portal and run
# `az account get-access-token`. Store the returned access token in the
# TEMP_AUTH_TOKEN environment variable, which is read below.

import json
import os
import requests

token = os.getenv("TEMP_AUTH_TOKEN")
subscription = "<YOUR_SUBSCRIPTION_ID>"  
resource_group = "<YOUR_RESOURCE_GROUP_NAME>"
resource_name = "<YOUR_AZURE_OPENAI_RESOURCE_NAME>"
model_deployment_name = "gpt41-mini-candidate-01" # custom deployment name that you will use to reference the model when making inference calls.

deploy_params = {'api-version': "2025-04-01-preview"} 
deploy_headers = {'Authorization': 'Bearer {}'.format(token), 'Content-Type': 'application/json'}

deploy_data = {
    "sku": {"name": "developertier", "capacity": 50},
    "properties": {
        "model": {
            "format": "OpenAI",
            "name": fine_tuned_model, # Retrieved in step 9; in this run it is gpt-4.1-2025-04-14.ft-499afbe45d7444bb87740de97ea6f650
            "version": "1"
        }
    }
}
deploy_data = json.dumps(deploy_data)

request_url = f'https://management.azure.com/subscriptions/{subscription}/resourceGroups/{resource_group}/providers/Microsoft.CognitiveServices/accounts/{resource_name}/deployments/{model_deployment_name}'

print('Creating a new deployment...')

r = requests.put(request_url, params=deploy_params, headers=deploy_headers, data=deploy_data)

print(r)
print(r.reason)
print(r.json())

'''
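
Both deployment cells build the same control-plane request and differ only in the SKU, capacity, and API version. The sketch below factors that into one function that returns the URL, query parameters, and body without sending anything. `build_deployment_request` is a hypothetical helper, and the argument values are the placeholders and names used in the cells above.

```python
import json

def build_deployment_request(subscription, resource_group, resource_name, deployment_name,
                             fine_tuned_model, sku_name="standard", capacity=1,
                             api_version="2024-10-01"):
    """Return (url, params, body) for the control-plane PUT call used in the cells above."""
    url = (
        f"https://management.azure.com/subscriptions/{subscription}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{resource_name}"
        f"/deployments/{deployment_name}"
    )
    params = {"api-version": api_version}
    body = {
        "sku": {"name": sku_name, "capacity": capacity},
        "properties": {"model": {"format": "OpenAI", "name": fine_tuned_model, "version": "1"}},
    }
    return url, params, body

# Developer tier variant, using the placeholders from the cell above.
url, params, body = build_deployment_request(
    "<YOUR_SUBSCRIPTION_ID>", "<YOUR_RESOURCE_GROUP_NAME>", "<YOUR_AZURE_OPENAI_RESOURCE_NAME>",
    "gpt41-mini-candidate-01", "gpt-4.1-2025-04-14.ft-499afbe45d7444bb87740de97ea6f650",
    sku_name="developertier", capacity=50, api_version="2025-04-01-preview",
)
print(url)
print(json.dumps(body, indent=2))
```

The result could then be sent with `requests.put(url, params=params, headers=deploy_headers, json=body)`, as in the cells above.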

---

## 11. Use Deployed Customized Model

In [None]:
# ........... COMMENTED OUT FOR NOW - TESTING DONE MANUALLY VIA PORTAL ......

'''

# Use the deployed customized model

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),
  api_version = "2024-10-21"
)

response = client.chat.completions.create(
    model = "gpt-4o-mini-2024-07-18-ft", # model = "Custom deployment name you chose for your fine-tuning model"
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
        {"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
        {"role": "user", "content": "Do other Azure services support this too?"}
    ]
)

print(response.choices[0].message.content)

'''

---

## 12. DELETE DEPLOYMENT‼️

> [Developer Deployments](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tune-test?tabs=portal#clean-up-your-deployment) will delete on their own after the 24-hour default window. For all others, delete manually.


Unlike other types of Azure OpenAI models, fine-tuned/customized models have an hourly hosting cost once they're deployed. Once you're done with this tutorial and have tested a few chat completion calls against your fine-tuned model, it's strongly recommended that you delete the model deployment.

Run this command from a shell to delete the deployed model, replacing the placeholder values with those for your deployment.

```bash
curl -X DELETE "https://management.azure.com/subscriptions/<SUBSCRIPTION>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.CognitiveServices/accounts/<RESOURCE_NAME>/deployments/<MODEL_DEPLOYMENT_NAME>?api-version=2025-04-01-preview" \
  -H "Authorization: Bearer <TOKEN>"
```
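
If you prefer to stay in Python, the same request can be sketched with the standard library. The snippet below only builds the request and does not send it. `build_delete_request` is a hypothetical helper, and the argument values are made-up placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_delete_request(subscription, resource_group, resource_name, deployment_name,
                         token, api_version="2025-04-01-preview"):
    """Build (but do not send) the DELETE request for a model deployment."""
    url = (
        f"https://management.azure.com/subscriptions/{subscription}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{resource_name}"
        f"/deployments/{deployment_name}"
        # The query string must be separated from the path by "?".
        f"?{urlencode({'api-version': api_version})}"
    )
    return Request(url, method="DELETE", headers={"Authorization": f"Bearer {token}"})

# Placeholder values for illustration only.
request = build_delete_request(
    "example-subscription-id", "example-resource-group", "example-resource",
    "gpt41-mini-candidate-01", "example-token",
)
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen(request).
```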

---