# DPO Fine-Tuning with Intel Orca Dataset on Azure AI

This notebook demonstrates how to fine-tune language models using **Direct Preference Optimization (DPO)** with the Intel Orca DPO Pairs dataset.
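
As background, DPO trains directly on pairs of preferred and non-preferred responses instead of fitting a separate reward model. The sketch below computes the per-pair loss from the original DPO formulation, under the assumption that you already have summed log-probabilities of each response under the model being trained and under a frozen reference model. The function name, the example numbers, and `beta=0.1` are illustrative; the managed service may use a different objective or defaults.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin))."""
    # How much more the policy favours the chosen response over the
    # rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

# Illustrative log-probabilities (not taken from any real model).
same_as_reference = dpo_loss(-10.0, -12.0, -10.0, -12.0)
prefers_chosen = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(f"policy == reference: {same_as_reference:.4f}")
print(f"policy favours chosen: {prefers_chosen:.4f}")
```

When the policy matches the reference, the margin is zero and the loss equals log 2. The loss falls as the policy assigns relatively more probability to the preferred response.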

## What You'll Learn
1. Understand DPO fine-tuning
2. Prepare and format DPO training data  
3. Upload datasets to Azure AI
4. Create and monitor a DPO fine-tuning job
5. Evaluate your fine-tuned model

## 1. Setup and Installation

Install all required packages from `requirements.txt`.

In [8]:
pip install -r requirements.txt

Collecting openai (from -r requirements.txt (line 4))
  Downloading openai-2.9.0-py3-none-any.whl.metadata (29 kB)
Collecting anyio<5,>=3.5.0 (from openai->-r requirements.txt (line 4))
  Downloading anyio-4.12.0-py3-none-any.whl.metadata (4.3 kB)
Collecting distro<2,>=1.7.0 (from openai->-r requirements.txt (line 4))
  Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting httpx<1,>=0.23.0 (from openai->-r requirements.txt (line 4))
  Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.10.0 (from openai->-r requirements.txt (line 4))
  Downloading jiter-0.12.0-cp311-cp311-win_amd64.whl.metadata (5.3 kB)
Collecting pydantic<3,>=1.9.0 (from openai->-r requirements.txt (line 4))
  Downloading pydantic-2.12.5-py3-none-any.whl.metadata (90 kB)


[notice] A new release of pip is available: 24.0 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


## 2. Import Libraries

In [9]:
import os
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

print(" All libraries imported successfully")

 All libraries imported successfully


## 3. Configure Azure Environment
Set your Azure AI Project endpoint and model name. This example uses **gpt-4o-mini**, but you can use other supported GPT models. The subscription, resource group, and account values are read later, when the fine-tuned model is deployed. Create a `.env` file with:

```
AZURE_AI_PROJECT_ENDPOINT=<your-endpoint> 
MODEL_NAME=gpt-4o-mini
AZURE_SUBSCRIPTION_ID=<your-subscription-id>
AZURE_RESOURCE_GROUP=<your-resource-group>
AZURE_AOAI_ACCOUNT=<your-foundry-account-name>
```
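
A missing variable only surfaces later as a `None` endpoint or a failed deployment, so it may be worth checking the values up front. This is a minimal sketch: the helper name `missing_settings` is hypothetical, and the list of names simply mirrors the `.env` file above.

```python
import os

# Names mirror the .env file above; adjust the list if your file differs.
REQUIRED_SETTINGS = [
    "AZURE_AI_PROJECT_ENDPOINT",
    "MODEL_NAME",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_AOAI_ACCOUNT",
]

def missing_settings(env=None):
    """Return the required names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]

missing = missing_settings()
if missing:
    print("Missing settings: " + ", ".join(missing))
else:
    print("All required settings are present")
```

Run it after `load_dotenv()` so that values from the `.env` file are visible in `os.environ`.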

In [10]:
# Load environment variables
load_dotenv()

endpoint = os.environ.get("AZURE_AI_PROJECT_ENDPOINT")
model_name = os.environ.get("MODEL_NAME")

# Define dataset file paths
training_file_path = "training.jsonl"
validation_file_path = "validation.jsonl"

print(f" Endpoint: {endpoint}")
print(f" Model: {model_name}")

 Endpoint: https://foundrysdk-eastus2-foundry-resou.services.ai.azure.com/api/projects/foundrysdk-eastus2-project
 Model: gpt-4o-mini


## 4. Connect to Azure AI Project

Connect to Azure AI Project using Azure credential authentication. This initializes the project client and OpenAI client needed for fine-tuning workflows. Ensure you have the **Azure AI User** role assigned to your account for the Azure AI Project resource.

In [11]:
credential = DefaultAzureCredential()
project_client = AIProjectClient(endpoint=endpoint, credential=credential)
openai_client = project_client.get_openai_client()

print(" Connected to Azure AI Project")

 Connected to Azure AI Project


## 5. Upload Training Files

Upload the training and validation JSONL files to Azure AI. Each file is assigned a unique ID that will be referenced when creating the fine-tuning job.
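
The upload assumes that `training.jsonl` and `validation.jsonl` already exist. As a rough sketch of how such files might be produced, the code below maps one row with the Intel Orca DPO Pairs columns (`system`, `question`, `chosen`, `rejected`) to a preference-format record with `input`, `preferred_output`, and `non_preferred_output` keys. The helper names and the example row are made up for illustration; check the record layout against the current service documentation before relying on it.

```python
import json

def to_preference_record(row):
    """Map one Orca-style row to a preference-format training example."""
    messages = []
    # Keep the system prompt only when the row actually has one.
    if row.get("system"):
        messages.append({"role": "system", "content": row["system"]})
    messages.append({"role": "user", "content": row["question"]})
    return {
        "input": {"messages": messages},
        "preferred_output": [{"role": "assistant", "content": row["chosen"]}],
        "non_preferred_output": [{"role": "assistant", "content": row["rejected"]}],
    }

def write_jsonl(rows, path):
    """Write one JSON object per line, as JSONL expects."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            record = to_preference_record(row)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Made-up example row, not taken from the dataset.
example = {
    "system": "You are a helpful assistant.",
    "question": "What is 2 + 2?",
    "chosen": "2 + 2 equals 4.",
    "rejected": "It is 5.",
}
print(json.dumps(to_preference_record(example), indent=2))
```

Calling `write_jsonl(rows, "training.jsonl")` on a list of such rows would produce a file in the shape described here.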

In [12]:
print("Uploading training file...")
with open(training_file_path, "rb") as f:
    train_file = openai_client.files.create(file=f, purpose="fine-tune")
print(f" Training file ID: {train_file.id}")

print("\nUploading validation file...")
with open(validation_file_path, "rb") as f:
    validation_file = openai_client.files.create(file=f, purpose="fine-tune")
print(f" Validation file ID: {validation_file.id}")

Uploading training file...
 Training file ID: file-b137633d81d54f0dbf19103c0a76214b

Uploading validation file...
 Validation file ID: file-577ad4366204477486a29751fbdfb93c


In [13]:
print("Waiting for files to be processed...")
openai_client.files.wait_for_processing(train_file.id)
openai_client.files.wait_for_processing(validation_file.id)
print(" Files ready!")

Waiting for files to be processed...
 Files ready!


## 8. Create DPO Fine-Tuning Job
Create a DPO fine-tuning job with your uploaded datasets. Configure the following hyperparameters to control the training process:

1. `n_epochs` (3): Number of complete passes through the training dataset. More epochs can improve performance but may lead to overfitting. Typical range: 1-10.
2. `batch_size` (1): Number of training examples processed together in each iteration. Smaller batches (1-2) are common for DPO to maintain training stability.
3. `learning_rate_multiplier` (1.0): Scales the default learning rate. Values < 1.0 make training more conservative, while values > 1.0 speed up learning but may cause instability. Typical range: 0.1-2.0.

Start with these defaults, then adjust them based on your dataset size and desired model behavior.
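
To get a feel for how `n_epochs` and `batch_size` interact, the sketch below estimates the number of update steps as `ceil(n_examples / batch_size) * n_epochs`. This is plain arithmetic under the assumption that each batch yields one update; the function name and the dataset size of 500 are hypothetical, and the service may schedule training differently.

```python
import math

def estimated_update_steps(n_examples, n_epochs=3, batch_size=1):
    """Estimate update steps as ceil(n_examples / batch_size) * n_epochs."""
    if n_examples < 0 or n_epochs < 1 or batch_size < 1:
        raise ValueError("n_examples must be >= 0; n_epochs and batch_size >= 1")
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * n_epochs

# Hypothetical dataset of 500 preference pairs.
for batch_size in (1, 2, 4):
    steps = estimated_update_steps(500, n_epochs=3, batch_size=batch_size)
    print(f"batch_size={batch_size}: about {steps} update steps")
```

Doubling the batch size halves the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.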

In [14]:
fine_tuning_job = openai_client.fine_tuning.jobs.create(
    training_file=train_file.id,
    validation_file=validation_file.id,
    model=model_name,
    method={
        "type": "dpo",
        "dpo": {
            "hyperparameters": {
                "n_epochs": 3,
                "batch_size": 1,
                "learning_rate_multiplier": 1.0
            }
        }
    },
    extra_body={"trainingType": "Standard"}
)

print(f" Job ID: {fine_tuning_job.id}")
print(f"Status: {fine_tuning_job.status}")

 Job ID: ftjob-6aa173fb0c9d4c44b2fb09d9389db5e7
Status: pending


## 9. Monitor Training Progress
Check the status of your fine-tuning job and track its progress. You can view the current status and recent training events. Training duration varies with dataset size, model, and hyperparameters, and typically ranges from minutes to several hours.

In [22]:
job_status = openai_client.fine_tuning.jobs.retrieve(fine_tuning_job.id)
print(f"Status: {job_status.status}")

Status: pending


In [23]:
# View recent events
events = list(openai_client.fine_tuning.jobs.list_events(fine_tuning_job.id, limit=10))
for event in events:
    print(event.message)

Jobs ahead in queue: 6
Job enqueued. Waiting for jobs ahead to complete.
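
Rather than re-running the status cell by hand, you could poll until the job reaches a terminal state. The helper below is a sketch: `wait_for_job` is a hypothetical name, the set of terminal statuses is an assumption (`succeeded` is the one this notebook checks for), and the polling interval and limit are arbitrary defaults.

```python
import time

# Assumed terminal states; extend this set if your service reports others.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(retrieve_job, job_id, poll_seconds=60, max_polls=240,
                 sleep=time.sleep):
    """Poll retrieve_job(job_id) until a terminal status or max_polls is hit."""
    job = retrieve_job(job_id)
    polls = 1
    while job.status not in TERMINAL_STATUSES and polls < max_polls:
        sleep(poll_seconds)
        job = retrieve_job(job_id)
        polls += 1
    # The caller should inspect job.status: it may still be non-terminal
    # if max_polls was reached first.
    return job

# Usage with the client from this notebook (commented out so the sketch
# runs without Azure access):
# final_job = wait_for_job(openai_client.fine_tuning.jobs.retrieve, fine_tuning_job.id)
# print(f"Status: {final_job.status}")
```

Passing the retrieve function in as an argument keeps the helper independent of any particular client, which also makes it easy to exercise with a stub.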


## 10. Retrieve Fine-Tuned Model
After the fine-tuning job succeeds, retrieve the fine-tuned model ID. This ID is required to deploy your customized model for inference.

In [24]:
completed_job = openai_client.fine_tuning.jobs.retrieve(fine_tuning_job.id)

if completed_job.status == "succeeded":
    fine_tuned_model_id = completed_job.fine_tuned_model
    print(f" Fine-tuned Model ID: {fine_tuned_model_id}")
else:
    print(f"Status: {completed_job.status}")

Status: pending


## 11. Deploy the Fine-Tuned Model

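Deploy the fine-tuned model to Azure OpenAI as a deployment endpoint. This step is required before making inference calls. The deployment uses the GlobalStandard SKU with a capacity of 50. Azure OpenAI capacity is generally expressed in units of 1,000 tokens per minute (TPM), so this corresponds to roughly 50K TPM.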

In [None]:
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import Deployment, DeploymentProperties, DeploymentModel, Sku
import time

subscription_id = os.environ.get("AZURE_SUBSCRIPTION_ID")
resource_group = os.environ.get("AZURE_RESOURCE_GROUP")
account_name = os.environ.get("AZURE_AOAI_ACCOUNT")

deployment_name = "gpt-4o-mini-dpo-finetuned"

with CognitiveServicesManagementClient(credential=credential, subscription_id=subscription_id) as cogsvc_client:
    deployment_model = DeploymentModel(format="OpenAI", name=fine_tuned_model_id, version="1")
    deployment_properties = DeploymentProperties(model=deployment_model)
    deployment_sku = Sku(name="GlobalStandard", capacity=50)
    deployment_config = Deployment(properties=deployment_properties, sku=deployment_sku)
    
    print(f"Deploying fine-tuned model: {fine_tuned_model_id}")
    deployment = cogsvc_client.deployments.begin_create_or_update(
        resource_group_name=resource_group,
        account_name=account_name,
        deployment_name=deployment_name,
        deployment=deployment_config,
    )
    
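    # Poll until the long-running operation reaches a terminal state.
    # "Canceled" is included so a cancelled deployment does not loop forever.
    while deployment.status() not in ["Succeeded", "Failed", "Canceled"]:
        time.sleep(30)
        print(f"Deployment status: {deployment.status()}")

if deployment.status() == "Succeeded":
    print(f" Model deployment completed: {deployment_name}")
else:
    print(f" Model deployment ended with status: {deployment.status()}")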

## 12. Test Your Fine-Tuned Model

Validate your fine-tuned model by running test inferences. This helps you assess whether DPO training aligned the model with the preferred response patterns in your training data.

In [None]:
print(f"Testing fine-tuned model via deployment: {deployment_name}")

response = openai_client.responses.create(
    model=deployment_name,
    input=[{"role": "user", "content": "Explain machine learning in simple terms."}]
)

print(f"Model response: {response.output_text}")

## 13. Next Steps

Congratulations! You've successfully fine-tuned a model with DPO.

### What's Next?
- Deploy your model to production
- Evaluate on more test cases
- Experiment with hyperparameters
- Try different datasets