# DPO fine tuning with AOAI GPT-4o models

Azure OpenAI lets developers customize OpenAI models with their own data and deploy the resulting custom model through an easy-to-use, affordable managed service.

While fine-tuning can be a complex process, Azure OpenAI abstracts away much of that complexity and makes fine-tuning accessible to any developer.

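Direct Preference Optimization (DPO) fine-tuning adjusts model weights based on human preferences, expressed as pairs of responses in which one is preferred over the other. DPO is computationally lighter and faster than RLHF because it does not train a separate reward model, and it is designed to be comparably effective at alignment.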

## 1. Pre-requisites

For this hands-on workshop, all you need is access to an Azure subscription and the ability to create Azure OpenAI resources and deployments. 

1. Install the required libraries (see the `%pip install` cell below).
2. Create an Azure OpenAI resource in a region where GPT-4o fine-tuning is supported.
3. Create a GPT-4o deployment.
4. Create a `.env` file based on the [example.env](./example.env) file in this repository to store your credentials and environment variables. Fill in your AOAI endpoints, keys, and deployment names, and save the file as `.env`.
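
Before running the remaining cells, it can help to confirm that the `.env` file actually provides the variables the notebook reads. The following is a minimal sketch, not part of the original workshop: the helper name `missing_env_vars` is hypothetical, and the variable list only covers the names used by the fine-tuning cells, so extend it with the deployment and evaluation variables as needed. Run it after `load_dotenv()`.

```python
import os

# Variable names read by the fine-tuning cells in this notebook.
REQUIRED_ENV_VARS = [
    "AOAI_FINETUNING_ENDPOINT",
    "AOAI_FINETUNING_API_KEY",
    "FINETUNED_OPENAI_DEPLOYMENT",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```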

In [15]:
%pip install -q openai python-dotenv azure-ai-evaluation ipywidgets

Note: you may need to restart the kernel to use updated packages.





## 2. Prepare Data

Each example in your dataset should contain:
- A prompt, such as a user message.
- A chosen output (an ideal assistant response).
- A rejected output (a suboptimal assistant response).

The data must be in JSONL format, with each line representing one example in the following structure:
```json
{  
  "input": {  
    "messages": [{"role": "system", "content": ...}],
    "tools": [...],  
    "parallel_tool_calls": true  
  },  
  "preferred_output": [{"role": "assistant", "content": ...}],  
  "non_preferred_output": [{"role": "assistant", "content": ...}]  
}  
```
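
Malformed lines are easier to catch locally than after an upload. The sketch below checks one JSONL line against the structure shown above. It is an illustration under assumptions: the function name `validate_preference_line` is hypothetical, and the rules (non-empty lists, assistant-only outputs) simply mirror the example structure rather than the service's full validation logic.

```python
import json


def validate_preference_line(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    input_part = record.get("input")
    messages = input_part.get("messages") if isinstance(input_part, dict) else None
    if not isinstance(messages, list) or not messages:
        problems.append("input.messages must be a non-empty list")

    for key in ("preferred_output", "non_preferred_output"):
        output = record.get(key)
        if not isinstance(output, list) or not output:
            problems.append(f"{key} must be a non-empty list")
        elif any(not isinstance(m, dict) or m.get("role") != "assistant" for m in output):
            problems.append(f"{key} must contain only assistant messages")
    return problems


good = (
    '{"input": {"messages": [{"role": "user", "content": "Hi"}]}, '
    '"preferred_output": [{"role": "assistant", "content": "Hello!"}], '
    '"non_preferred_output": [{"role": "assistant", "content": "What?"}]}'
)
# "messages" is an object instead of a list, and one output is missing.
bad = '{"input": {"messages": {"role": "user", "content": "Hi"}}, "preferred_output": []}'

print(validate_preference_line(good))
print(validate_preference_line(bad))
```

To check a whole file, call the function on each line and report the line numbers that return a non-empty list.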

In [None]:
from dotenv import load_dotenv
import os
import json
from openai import AzureOpenAI
from typing import Dict, List, Any, Union, Optional

# Prepare your .env file first; example.env in this repository is a template.
load_dotenv()
aoai_endpoint = os.getenv("AOAI_FINETUNING_ENDPOINT")
aoai_api_key = os.getenv("AOAI_FINETUNING_API_KEY")

training_file_path = './data/training_data/train.jsonl'
validation_file_path = './data/training_data/validation.jsonl'
test_file_path = './data/training_data/test.jsonl'

You can take a look at your prepared files.

In [2]:
def read_jsonl(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        messages = [line for line in f]
        for mes in messages:
            print(mes)

read_jsonl(validation_file_path)

{"input": {"messages": [{"role": "user", "content": "What are some effective strategies for learning chinese?"}]}, "preferred_output": [{"role": "assistant", "content": "To effectively learn a new language: 1) Set specific, achievable goals—whether that's conversational fluency or reading comprehension. 2) Practice consistently rather than cramming; even 20 minutes daily beats 2 hours once weekly. 3) Use spaced repetition systems (like Anki) to efficiently memorize vocabulary. 4) Focus on high-frequency words first; learning the 1,000 most common words typically covers 80% of everyday usage. 5) Immerse yourself through podcasts, YouTube videos, and shows in your target language—start with subtitles and gradually wean off them. 6) Find language exchange partners through apps like Tandem or HelloTalk for authentic conversation practice. 7) Make mistakes confidently; perfectionism slows progress. 8) Learn phrases rather than isolated words to understand natural speech patterns. 9) Connect

## 3. Fine-tuning with DPO

### 3.1 Uploading the training and validation data to Azure OpenAI

In [5]:
client = AzureOpenAI(
  azure_endpoint = aoai_endpoint,
  api_key = aoai_api_key,
  api_version = "2025-02-01-preview"  # This API version or later is required to access seed/events/checkpoint features
)

In [4]:
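# Open the files in a context manager so the handles are closed after upload.
with open(training_file_path, "rb") as training_file:
    training_response = client.files.create(file=training_file, purpose="fine-tune")
training_file_id = training_response.id

with open(validation_file_path, "rb") as validation_file:
    validation_response = client.files.create(file=validation_file, purpose="fine-tune")
validation_file_id = validation_response.id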

print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)

Training file ID: file-4e2ddb4a63f346b7949cda7957cf238c
Validation file ID: file-6da4001fdd3241ad8e73a26ae8c510b8


### 3.2 Creating the fine tuning job

For each fine tuning job, you can specify the following hyperparameters:
```json
"hyperparameters": {
    "beta": 0.1,
    "batch_size": "auto",
    "learning_rate_multiplier": "auto",
    "n_epochs": "auto"
}
```

- `beta`: "auto" or a number; available only for DPO. It is a floating-point number between 0 and 2 that controls how strictly the new model adheres to its previous behavior versus aligning with the provided preferences. A higher value is more conservative (favoring previous behavior); a lower value is more aggressive (favoring the newly provided preferences more often).
- `batch_size`: the number of examples in each batch.
- `learning_rate_multiplier`: the factor by which the model's original learning rate is multiplied to obtain the learning rate for the fine-tuning job. We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best results.
- `n_epochs`: the number of epochs to train the model for. An epoch is one full pass through the training dataset.

The general recommendation is to train initially without specifying any of these, letting Azure OpenAI pick defaults based on dataset size, and then adjust based on the results to find the best combination.
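
To build intuition for `beta`, the sketch below evaluates the per-pair DPO loss from the DPO paper, `-log(sigmoid(beta * ((log p(chosen) - log p_ref(chosen)) - (log p(rejected) - log p_ref(rejected)))))`. It is an illustration only: the function name `dpo_loss` and all log-probability values are made up, and it says nothing about how Azure OpenAI implements training internally.

```python
import math


def dpo_loss(beta, policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp):
    """DPO loss for a single preference pair, given sequence log-probabilities."""
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(0.1, -12.0, -15.0, -12.0, -15.0))

# Hypothetical log-probabilities after some training: the policy favors the
# chosen response more, and the rejected response less, than the reference.
for beta in (0.05, 0.1, 0.5):
    print(beta, dpo_loss(beta, -10.0, -18.0, -12.0, -15.0))
```

For the same shift away from the reference model, a larger `beta` produces a larger margin and therefore a smaller loss. One way to read this is that with a high `beta` the objective is satisfied by small deviations from the reference, which matches the "more conservative" behavior described above.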

In [None]:
# Submit fine-tuning training job
response = client.fine_tuning.jobs.create(
    training_file = training_file_id,
    validation_file = validation_file_id,
    method={
        "type": "dpo",
        "dpo": {
            "hyperparameters": {"beta": 0.1},
        },
    },
    model = "gpt-4o-2024-08-06", # Enter base model name. Note that in Azure OpenAI the model name contains dashes and cannot contain dot/period characters.
    seed = 105 # seed parameter controls reproducibility of the fine-tuning job. If no seed is specified one will be generated automatically.
)

In [89]:
job_id = response.id

# You can use the job ID to monitor the status of the fine-tuning job.
# The fine-tuning job will take some time to start and complete.

print("Job ID:", response.id)
print("Status:", response.status)
print(response.model_dump_json(indent=2))

Job ID: ftjob-78e4dae35cc34c719fe7e1350d546f3b
Status: pending
{
  "id": "ftjob-78e4dae35cc34c719fe7e1350d546f3b",
  "created_at": 1741712013,
  "error": null,
  "fine_tuned_model": null,
  "finished_at": null,
  "hyperparameters": null,
  "model": "gpt-4o-2024-08-06",
  "object": "fine_tuning.job",
  "organization_id": null,
  "result_files": null,
  "seed": 105,
  "status": "pending",
  "trained_tokens": null,
  "training_file": "file-3e178c34b0894cc79b2126fd854b44fb",
  "validation_file": "file-1e10645abfa1431d888c1642905ab9b4",
  "estimated_finish": 1741713850,
  "integrations": null,
  "metadata": null,
  "method": {
    "dpo": {
      "hyperparameters": {
        "batch_size": -1,
        "beta": 0.1,
        "learning_rate_multiplier": 1.0,
        "n_epochs": -1,
        "l2_multiplier": 0
      }
    },
    "supervised": null,
    "type": "dpo"
  }
}


### 3.3 Monitor the fine tuning job

You can monitor your fine-tuning job from this notebook or in the new Azure OpenAI studio.

In the studio, go to Tools > Fine-tuning and click on your job.

We can also monitor the job from this notebook:

In [90]:
from IPython.display import clear_output
import time

start_time = time.time()

# Get the status of our fine-tuning job.
response = client.fine_tuning.jobs.retrieve(job_id)

status = response.status

# If the job isn't done yet, poll it every 5 seconds.
while status not in ["succeeded", "failed", "cancelled"]:
    time.sleep(5)

    response = client.fine_tuning.jobs.retrieve(job_id)
    print(response.model_dump_json(indent=2))
    print("Elapsed time: {} minutes {} seconds".format(int((time.time() - start_time) // 60), int((time.time() - start_time) % 60)))
    status = response.status
    print(f'Status: {status}')
    clear_output(wait=True)

print(f'Fine-tuning job {job_id} finished with status: {status}')

# List all fine-tuning jobs for this resource.
print('Checking other fine-tune jobs for this resource.')
response = client.fine_tuning.jobs.list()
print(f'Found {len(response.data)} fine-tune jobs.')

Fine-tuning job ftjob-78e4dae35cc34c719fe7e1350d546f3b finished with status: succeeded
Checking other fine-tune jobs for this resource.
Found 11 fine-tune jobs.
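
A polling loop without an upper bound can keep a notebook cell running indefinitely if a job stalls. The sketch below wraps the same idea in a helper with a timeout. The name `wait_for_job` and its parameters are hypothetical, and the status lookup is passed in as a callable so the helper can be tried without a live job.

```python
import time

# Statuses after which the job will not change any more.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(get_status, poll_seconds=10, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state or the timeout elapses."""
    waited = 0
    status = get_status()
    while status not in TERMINAL_STATES:
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still '{status}' after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status


# Simulated sequence of statuses; in the notebook the callable would be
# lambda: client.fine_tuning.jobs.retrieve(job_id).status
simulated = iter(["pending", "running", "succeeded"])
print(wait_for_job(lambda: next(simulated), sleep=lambda seconds: None))
```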


In [92]:
# Retrieve fine_tuned_model name
response = client.fine_tuning.jobs.retrieve(job_id)
fine_tuned_model = response.fine_tuned_model

In [93]:
print("Job ID:", response.id)
print("Status:", response.status)
print("Trained Tokens:", response.trained_tokens)

Job ID: ftjob-78e4dae35cc34c719fe7e1350d546f3b
Status: succeeded
Trained Tokens: 26810


In [94]:
response = client.fine_tuning.jobs.list_events(job_id)

events = response.data
events.reverse()

for event in events:
    print(event.message)

Preprocessing completed for file validation file.
Job started.
Training started.
Created results file: file-a5c2a6d373ee4e8a90e47318b65db41a
Step 1: training loss=0.5958462953567505
Step 10: training loss=0.5675941109657288
Step 20: training loss=0.5440755486488342
Step 30: training loss=0.5091820955276489
Step 40: training loss=0.47487473487854004
Step 50: training loss=0.43927979469299316
Step 60: training loss=0.41603386402130127
Step 70: training loss=0.40014028549194336
Step 80: training loss=0.3931383490562439
Step 90: training loss=0.38585132360458374
Step 100: training loss=0.3750596046447754
Job succeeded.
Postprocessing started.
Completed results file: file-a5c2a6d373ee4e8a90e47318b65db41a
Model Evaluation Passed.
Training tokens billed: 20000
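
The event messages carry the training loss as plain text, so a small parser is enough to turn them into numbers for plotting or comparison. This is a sketch that assumes the `Step N: training loss=X` message format shown in the output above; the helper name `extract_training_loss` is hypothetical.

```python
import re

# Matches messages such as "Step 10: training loss=0.5675941109657288".
LOSS_PATTERN = re.compile(r"Step (\d+): training loss=([0-9.eE+-]+)")


def extract_training_loss(messages):
    """Return (step, loss) pairs parsed from fine-tuning event messages."""
    points = []
    for message in messages:
        match = LOSS_PATTERN.search(message)
        if match:
            points.append((int(match.group(1)), float(match.group(2))))
    return points


sample_messages = [
    "Training started.",
    "Step 1: training loss=0.5958462953567505",
    "Step 50: training loss=0.43927979469299316",
    "Step 100: training loss=0.3750596046447754",
    "Job succeeded.",
]
print(extract_training_loss(sample_messages))
```

In the notebook, the input would be `[event.message for event in events]`.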


In [95]:
response = client.fine_tuning.jobs.retrieve(job_id)
fine_tuned_model_id = response.fine_tuned_model

if fine_tuned_model_id is None:
    raise RuntimeError(
        "Fine-tuned model ID not found. Your job has likely not been completed yet."
    )

print("Fine-tuned model ID:", fine_tuned_model_id)

Fine-tuned model ID: gpt-4o-2024-08-06.ft-78e4dae35cc34c719fe7e1350d546f3b


## 4. Create a new deployment with the fine tuned model

When the fine-tuning job is done, it's time to deploy your customized model to make it available for use with completion calls. You can do it in the following two ways.

### 4.1 From the notebook
To create a new deployment from a notebook, you'll need an access token from Azure.
1. Open a terminal.
2. Log in with `az login`.
    - Add `-t <your Microsoft Entra tenant>` if you have multiple tenants.
    - If you are prompted to select a subscription, choose the one that contains your Azure OpenAI resource. You can also set it afterwards with `az account set --subscription <name or id>`.
3. Get an access token for the deployment with `az account get-access-token`.
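
As an alternative to copying the token by hand, the Azure CLI can be called from Python. This is a sketch under assumptions: it requires the Azure CLI on `PATH` and a completed `az login`, and the helper name `fetch_access_token` is hypothetical. `--query accessToken --output tsv` asks the CLI to print only the token string.

```python
import shutil
import subprocess

# Arguments that make the Azure CLI print only the token string.
TOKEN_ARGS = [
    "account", "get-access-token",
    "--query", "accessToken",
    "--output", "tsv",
]


def fetch_access_token(run=subprocess.run, az_path=None):
    """Run the Azure CLI and return the access token it prints."""
    # shutil.which resolves the executable, including az.cmd on Windows.
    az = az_path or shutil.which("az")
    if az is None:
        raise FileNotFoundError("Azure CLI ('az') was not found on PATH")
    completed = run([az, *TOKEN_ARGS], capture_output=True, text=True, check=True)
    return completed.stdout.strip()


# token = fetch_access_token()  # uncomment after running `az login`
```

The returned string can be assigned to `token` in the deployment cell. Access tokens expire, so fetch a fresh one if the deployment request is rejected as unauthorized.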

Paste the `accessToken` value into the cell below:

In [None]:
# Deploy fine-tuned model
import requests

token = "<Paste Your Token>"
subscription = os.getenv("AZURE_SUBSCRIPTION_ID")
resource_group = os.getenv("AZURE_RESOURCE_GROUP_NAME")
resource_name = os.getenv("AZURE_RESOURCE_NAME")

aoai_endpoint = os.getenv("AOAI_FINETUNING_ENDPOINT")
aoai_api_key = os.getenv("AOAI_FINETUNING_API_KEY")
model_deployment_name = os.getenv("FINETUNED_OPENAI_DEPLOYMENT")

deploy_params = {'api-version': "2024-10-01"}
deploy_headers = {'Authorization': 'Bearer {}'.format(token), 'Content-Type': 'application/json'}

deploy_data = {
    "sku": {"name": "standard", "capacity": 50},
    "properties": {
        "model": {
            "format": "OpenAI",
            "name": fine_tuned_model,  # retrieved from the fine-tuning job above; it looks like gpt-4o-2024-08-06.ft-78e4dae35cc34c719fe7e1350d546f3b
            "version": "1"
        }
    }
}

deploy_data = json.dumps(deploy_data)

print('Creating a new deployment...')
request_url = f'https://management.azure.com/subscriptions/{subscription}/resourceGroups/{resource_group}/providers/Microsoft.CognitiveServices/accounts/{resource_name}/deployments/{model_deployment_name}'
r = requests.put(request_url, params=deploy_params, headers=deploy_headers, data=deploy_data)

print(r)
print(r.reason)
print(r.json())

Deployment can take several minutes to complete.

### 4.2 From the studio

On the fine tuning job page, click 'Deploy'

!["AOAI Deploy model"](./static/dpo-model-deploy.png)

## 5. Try your fine-tuned model

In [40]:
model_deployment_name = os.getenv("FINETUNED_OPENAI_DEPLOYMENT")

test_messages = [{'content': 'You are a helpful recipe assistant. ',
  'role': 'system'},
 {'content': 'What are some effective strategies for learning chinese?',
  'role': 'user'}]

response = client.chat.completions.create(
    model=model_deployment_name, messages=test_messages, temperature=0, max_tokens=500
)
print(response.choices[0].message.content)

Learning Chinese can be a rewarding but challenging endeavor. Here are some effective strategies to help you along the way:

1. **Set Clear Goals**: Define why you want to learn Chinese and set specific, achievable goals. This could be anything from being able to hold a basic conversation to passing a proficiency exam.

2. **Learn Pinyin**: Start with Pinyin, the Romanization of Chinese sounds, to help you with pronunciation and reading. Understanding Pinyin is crucial for learning how to pronounce words correctly.

3. **Focus on Tones**: Chinese is a tonal language, so mastering the four tones (five if you include the neutral tone) is essential. Practice listening and repeating tones regularly.

4. **Build a Strong Vocabulary**: Start with the most common words and phrases. Use flashcards, apps, or spaced repetition systems like Anki to help memorize vocabulary.

5. **Practice Speaking**: Engage in conversation with native speakers as much as possible. Language exchange partners, tuto

## 6. Evaluate model
During evaluation, your model or application is tested against a given dataset. Use the Azure AI Evaluation SDK to assess the performance of your generative AI applications. Generated outputs are measured quantitatively with mathematical metrics as well as AI-assisted quality and safety metrics. In the SDK, these metrics are implemented as `evaluators`.

### 6.1 Load your test set

In [62]:
import pandas as pd

test_df = pd.read_json('./data/training_data/banking_test.jsonl', lines=True)
test_df.head(2)

| | input | preferred_output | non_preferred_output |
|---|---|---|---|
| 0 | {'messages': [{'role': 'user', 'content': 'Wri... | [{'role': 'assistant', 'content': 'To effectiv... | [{'role': 'assistant', 'content': 'Download Du... |


We will compare the fine-tuned GPT-4o model with the GPT-4o base model, but you can switch the baseline to any other model.

In [53]:
from dotenv import load_dotenv
import os
from openai import AzureOpenAI

# load_dotenv('my.env')
# run the base and finetuned models through the dataset
BASELINE_OPENAI_DEPLOYMENT = os.getenv("BASELINE_OPENAI_DEPLOYMENT")
BASELINE_OPENAI_ENDPOINT= os.getenv("BASELINE_OPENAI_ENDPOINT")
BASELINE_OPENAI_KEY=  os.getenv("BASELINE_OPENAI_KEY")

FINETUNED_OPENAI_DEPLOYMENT = os.getenv("FINETUNED_OPENAI_DEPLOYMENT")
FINETUNED_OPENAI_ENDPOINT = os.getenv("FINETUNED_OPENAI_ENDPOINT")
FINETUNED_OPENAI_KEY = os.getenv("FINETUNED_OPENAI_KEY")

baseline_client = AzureOpenAI(
    azure_endpoint=BASELINE_OPENAI_ENDPOINT, 
    api_key=BASELINE_OPENAI_KEY,
    api_version="2024-10-21"
    )

finetuned_client = AzureOpenAI(
    azure_endpoint=FINETUNED_OPENAI_ENDPOINT, 
    api_key=FINETUNED_OPENAI_KEY,
    api_version="2024-10-21"
    )

Define a function that generates a completion from a model deployment:

In [54]:
# get the predictions
def get_model_completions(client, input_messages, deployment):
    """
    Generate a model completion for a list of chat messages using the Azure OpenAI API.

    Parameters:
    client (AzureOpenAI): The AzureOpenAI client to use.
    input_messages (dict): The `input` field of a dataset record; its 'messages' key holds the chat messages to send.
    deployment (str): The name of the model deployment to use for the completion.

    Returns:
    str: The message content returned by the model. If an exception occurs, the exception is printed and None is returned.
    """

    print(input_messages['messages'])

    try:
        response = client.chat.completions.create(
            messages=input_messages['messages'],
            model=deployment,
            temperature=0,
        )
        return response.choices[0].message.content

    except Exception as e:
        print(e)
        return None

In [63]:
from tqdm.notebook import tqdm

tqdm.pandas()

test_df['baseline_model_response'] = test_df.progress_apply(lambda x: get_model_completions(baseline_client, x.input, BASELINE_OPENAI_DEPLOYMENT), axis=1)
test_df['finetuned_model_response'] = test_df.progress_apply(lambda x: get_model_completions(finetuned_client, x.input, FINETUNED_OPENAI_DEPLOYMENT), axis=1)

  0%|          | 0/1 [00:00<?, ?it/s]

[{'role': 'user', 'content': 'Write a Poem on BlockChain?'}]


  0%|          | 0/1 [00:00<?, ?it/s]

[{'role': 'user', 'content': 'Write a Poem on BlockChain?'}]


### 6.2 Define evaluation metrics

This time we will use the relevance evaluator, which scores "relevance" with the help of an evaluation model.

In [64]:
from azure.ai.evaluation import evaluate, RelevanceEvaluator

# AI assisted quality evaluator
model_config = {
    "azure_endpoint": BASELINE_OPENAI_ENDPOINT,
    "api_key": BASELINE_OPENAI_KEY,
    "azure_deployment": BASELINE_OPENAI_DEPLOYMENT,
}

relevance_evaluator = RelevanceEvaluator(model_config)
# The evaluator expects a single query string and a single response string.
result = relevance_evaluator(
    query=test_df['input'].iloc[0]['messages'][0]['content'],
    response=test_df['finetuned_model_response'].iloc[0]
)
result

{'relevance': 5.0,
 'gpt_relevance': 5.0,
 'relevance_reason': 'The RESPONSE is a comprehensive poem that fully addresses the QUERY by providing a creative piece on Blockchain, along with additional insights into its significance and challenges.'}

You can evaluate the base model's response in the same way.

In [65]:
# The evaluator expects a single query string and a single response string.
result = relevance_evaluator(
    query=test_df['input'].iloc[0]['messages'][0]['content'],
    response=test_df['baseline_model_response'].iloc[0]
)
result

{'relevance': 5.0,
 'gpt_relevance': 5.0,
 'relevance_reason': 'The RESPONSE is a comprehensive poem that fully addresses the QUERY by creatively exploring the concept of Blockchain, its features, and implications, providing additional insights.'}
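
A single example rarely separates two models, as the identical scores above show. The sketch below averages the `relevance` score over many rows so the baseline and fine-tuned models can be compared on a larger test set. The helper name `mean_relevance` is hypothetical, and `fake_evaluator` is a stand-in so the code runs without credentials; in the notebook you would pass `relevance_evaluator` instead.

```python
from statistics import mean


def mean_relevance(evaluator, queries, responses):
    """Average the 'relevance' score over paired queries and responses."""
    scores = []
    for query, response in zip(queries, responses):
        if response is None:  # get_model_completions returns None on failure
            continue
        scores.append(evaluator(query=query, response=response)["relevance"])
    return mean(scores) if scores else float("nan")


def fake_evaluator(*, query, response):
    """Stand-in scorer: counts words in the response, capped at 5."""
    return {"relevance": float(min(5, len(response.split())))}


print(mean_relevance(fake_evaluator, ["q1", "q2"], ["one two", None]))

# In the notebook (assuming the first message of each record is the user query):
# queries = [row["messages"][0]["content"] for row in test_df["input"]]
# mean_relevance(relevance_evaluator, queries, test_df["baseline_model_response"])
# mean_relevance(relevance_evaluator, queries, test_df["finetuned_model_response"])
```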

Azure AI evaluators support a comprehensive approach to evaluation that covers three key dimensions:

- Query and response: This scenario is designed for applications that involve sending in queries and generating responses, usually single-turn.
- Retrieval augmented generation: This scenario is suitable for applications where the model engages in generation using a retrieval-augmented approach to extract information from your provided documents and generate detailed responses, usually multi-turn.
- Custom evaluators: Tailored evaluation metrics can be designed to meet specific needs and goals, providing flexibility and precision in assessing unique aspects of AI-generated content. These custom evaluators allow for more detailed and specific analyses, addressing particular concerns or requirements that standard metrics might not cover.
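
As an illustration of the custom evaluator idea, the sketch below defines a simple code-based evaluator as a callable class that returns a dictionary of metrics. The class name `ResponseLengthEvaluator` and its metrics are hypothetical, and the assumption that the SDK accepts such a plain callable alongside its built-in evaluators should be checked against the Azure AI Evaluation documentation.

```python
class ResponseLengthEvaluator:
    """Hypothetical custom evaluator: reports the word count of a response."""

    def __init__(self, max_words=300):
        self.max_words = max_words

    def __call__(self, *, response, **kwargs):
        word_count = len(response.split())
        return {
            "word_count": word_count,
            "within_limit": word_count <= self.max_words,
        }


length_evaluator = ResponseLengthEvaluator(max_words=5)
print(length_evaluator(response="Blockchain is a shared, append-only ledger."))
```

A check like this complements the AI-assisted relevance score: DPO can shift response length and style, which a relevance score alone may not reveal.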


## 7. Summary

In this notebook, we've explored how to fine-tune GPT-4o using Direct Preference Optimization (DPO) on Azure OpenAI. DPO fine-tuning leverages preference-based data (chosen vs. rejected responses) to align model outputs more closely with user expectations. Azure OpenAI simplifies this process by abstracting away infrastructure complexities, providing developers with an accessible and cost-effective managed service for customizing and deploying advanced AI models.

## 8. References

- [Azure OpenAI Service Documentation](https://learn.microsoft.com/azure/ai-services/openai/)
- [Direct Preference Optimization (DPO) Paper](https://arxiv.org/abs/2305.18290)
- [Fine-tuning Azure OpenAI models](https://learn.microsoft.com/azure/ai-services/openai/how-to/fine-tuning)
- [SFT AOAI Repo](https://github.com/Azure-Samples/azure-openai-raft.git)