# **Train LLM Judge FT**

This notebook **trains the LLM Judge through fine-tuning (FT)** on Azure OpenAI. It imports the required libraries, reads the training and validation data, uploads both files, submits the fine-tuning job, tracks the job status, deploys the fine-tuned model, and runs a test inference against the deployment.

### Objectives:
- **Import Libraries:** Import the necessary libraries for fine-tuning and evaluation.
- **Read Data:** Read the training and validation data from JSON Lines files.
- **Create Fine-Tuning Model:** Create the Azure OpenAI client and upload the training and validation files.
- **Submit Fine-Tuning Job:** Submit the fine-tuning job to Azure OpenAI.
- **Track Job Status:** Track the status of the fine-tuning job.
- **Deploy Model:** Deploy the fine-tuned model for inference.
- **Test Inference:** Test the deployed model to ensure it works as expected.

By the end of the notebook, the LLM Judge is fine-tuned and deployed, and the checkpoint metrics (training and validation loss, mean token accuracy) give a first view of its performance.

### **Import Required Libraries**

In [1]:
# !pip install azure-ai-evaluation

In [None]:
import json
import os
import datetime

import pandas as pd

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

import dotenv
dotenv.load_dotenv(".env")

True
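The later cells read their configuration from environment variables loaded from `.env`. A minimal sketch of a pre-flight check, assuming the variable names that appear elsewhere in this notebook; the helper name `missing_env_vars` is hypothetical and not part of any library:

```python
import os

# Variable names as they are read later in this notebook.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_BASE",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "SUB_ID",
    "RG_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this right after `load_dotenv` surfaces a misconfigured `.env` before any file is uploaded or any job is submitted.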

### **Read training and validation data**

In [3]:
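def read_jsonl(file_path):
    """Read a JSON Lines file and return its records as a list."""
    data = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at the end of the file
                data.append(json.loads(line))
    return data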

In [4]:
train_path = 'data/ft-judge/single/train.jsonl'
val_path = 'data/ft-judge/single/val.jsonl'

# Example usage:
train_data = read_jsonl(train_path)
val_data = read_jsonl(val_path)
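Before uploading, it can help to check that every record has the expected shape. The sketch below assumes the files use the chat fine-tuning format (a `messages` list of `role`/`content` dictionaries ending with the assistant's reference answer); the notebook does not show the file contents, so treat the format, the allowed roles, and the helper name `validate_chat_records` as assumptions:

```python
def validate_chat_records(records):
    """Return a list of (index, problem) tuples for records that look malformed."""
    allowed_roles = {"system", "user", "assistant"}  # assumption: no tool/function messages
    problems = []
    for i, record in enumerate(records):
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((i, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict) or message.get("role") not in allowed_roles:
                problems.append((i, "message with unexpected role"))
                break
            if not isinstance(message.get("content"), str):
                problems.append((i, "message without string content"))
                break
        else:
            # Only reached when every message passed the checks above.
            if messages[-1]["role"] != "assistant":
                problems.append((i, "last message is not from the assistant"))
    return problems


# Hypothetical records, invented for illustration only.
sample = [
    {"messages": [{"role": "user", "content": "question"},
                  {"role": "assistant", "content": "answer"}]},
    {"prompt": "not in chat format"},
]
print(validate_chat_records(sample))
```

In the notebook this could be called as `validate_chat_records(train_data)` and `validate_chat_records(val_data)`; an empty list means no problems were found by these particular checks.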

### **Create FT Model**

**Upload data to Data Files in Azure OpenAI Studio**

In [5]:
client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_API_BASE"),
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),
  api_version = os.getenv("AZURE_OPENAI_API_VERSION")  # 2024-08-01-preview API version or later is required to access seed/events/checkpoint features
)

In [6]:
# Upload the training and validation dataset files to Azure OpenAI with the SDK.
# The context managers close the file handles once each upload finishes.
with open(train_path, "rb") as train_file:
    training_response = client.files.create(file=train_file, purpose="fine-tune")
training_file_id = training_response.id

with open(val_path, "rb") as val_file:
    validation_response = client.files.create(file=val_file, purpose="fine-tune")
validation_file_id = validation_response.id

print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)

Training file ID: file-1092e1e3e80342b388ab41c4f12cb1d9
Validation file ID: file-be1e0d5755664ce58abff11152710118


**Submit fine-tuning job**

In [None]:
# Submit fine-tuning training job
response = client.fine_tuning.jobs.create(
    training_file = training_file_id,
    validation_file = validation_file_id,
    model = "gpt-4o-mini-2024-07-18", # Enter base model name. Note that in Azure OpenAI the model name contains dashes and cannot contain dot/period characters.
    seed = 23,
    suffix = "demo",
    hyperparameters = {
        "batch_size": 1,
        "learning_rate_multiplier": 1.0,
        "n_epochs": 1    
    }
)

job_id = response.id

# You can use the job ID to monitor the status of the fine-tuning job.
# The fine-tuning job will take some time to start and complete.

print("Job ID:", response.id)
print("Status:", response.status)
print(response.model_dump_json(indent=2))

Job ID: ftjob-b8bb0ab2fb6b432194e150f916d2905d
Status: pending
{
  "id": "ftjob-b8bb0ab2fb6b432194e150f916d2905d",
  "created_at": 1739038750,
  "error": null,
  "fine_tuned_model": null,
  "finished_at": null,
  "hyperparameters": {
    "batch_size": 1,
    "learning_rate_multiplier": 1.0,
    "n_epochs": 1
  },
  "model": "gpt-4o-mini-2024-07-18",
  "object": "fine_tuning.job",
  "organization_id": null,
  "result_files": null,
  "seed": 23,
  "status": "pending",
  "trained_tokens": null,
  "training_file": "file-1092e1e3e80342b388ab41c4f12cb1d9",
  "validation_file": "file-be1e0d5755664ce58abff11152710118",
  "estimated_finish": 1739040744,
  "integrations": null,
  "method": null,
  "suffix": "sdvk-demo"
}


**Track job**

In [10]:
# Track training status

from IPython.display import clear_output
import time

start_time = time.time()

# Get the status of our fine-tuning job.
response = client.fine_tuning.jobs.retrieve(job_id)

status = response.status

# If the job isn't done yet, poll it every 10 seconds.
# "cancelled" is included so that a cancelled job does not keep the loop running forever.
while status not in ["succeeded", "failed", "cancelled"]:
    time.sleep(10)

    response = client.fine_tuning.jobs.retrieve(job_id)
    print(response.model_dump_json(indent=2))
    minutes, seconds = divmod(int(time.time() - start_time), 60)
    print(f"Elapsed time: {minutes} minutes {seconds} seconds")
    status = response.status
    print(f'Status: {status}')
    clear_output(wait=True)

print(f'Fine-tuning job {job_id} finished with status: {status}')

# List all fine-tuning jobs for this resource.
print('Checking other fine-tune jobs for this resource.')
response = client.fine_tuning.jobs.list()
print(f'Found {len(response.data)} fine-tune jobs.')

Fine-tuning job ftjob-b8bb0ab2fb6b432194e150f916d2905d finished with status: succeeded
Checking other fine-tune jobs for this resource.
Found 6 fine-tune jobs.
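The polling loop above runs until the job finishes, however long that takes. A minimal sketch of the same idea wrapped in a reusable function with an overall timeout; the function name `wait_for_job`, its parameters, and the stub client are hypothetical, and only `client.fine_tuning.jobs.retrieve` comes from the notebook:

```python
import time
from types import SimpleNamespace

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=10, timeout_seconds=6 * 3600,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll a fine-tuning job until it reaches a terminal status or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(
                f"Job {job_id} still '{job.status}' after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)


# Stand-in client for a dry run without network access (illustration only).
_statuses = iter(["pending", "running", "succeeded"])
stub_client = SimpleNamespace(
    fine_tuning=SimpleNamespace(
        jobs=SimpleNamespace(
            retrieve=lambda job_id: SimpleNamespace(status=next(_statuses))
        )
    )
)

final_job = wait_for_job(stub_client, "ftjob-example", sleep=lambda seconds: None)
print(final_job.status)
```

With the real client from this notebook the call would be `wait_for_job(client, job_id)`. Passing `sleep` and `clock` as parameters keeps the function testable without waiting.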


**List fine-tuning events**

In [11]:
response = client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=10)
print(response.model_dump_json(indent=2))

{
  "data": [
    {
      "id": "ftevent-9f0742d7b5d947d698185405c8ae6aaa",
      "created_at": 1739043423,
      "level": "info",
      "message": "Training tokens billed: 1686000",
      "object": "fine_tuning.job.event",
      "data": null,
      "type": "message"
    },
    {
      "id": "ftevent-5e79f80953304262a1883733f6522bef",
      "created_at": 1739043423,
      "level": "info",
      "message": "Model Evaluation Passed.",
      "object": "fine_tuning.job.event",
      "data": null,
      "type": "message"
    },
    {
      "id": "ftevent-3040b6e4ab95450aa49b2edff5f660cb",
      "created_at": 1739043423,
      "level": "info",
      "message": "Completed results file: file-1723b618431d4226a7cd2f74fa59d5a8",
      "object": "fine_tuning.job.event",
      "data": null,
      "type": "message"
    },
    {
      "id": "ftevent-d07c34c9d2284c99bc8376f823d41f9b",
      "created_at": 1739043318,
      "level": "info",
      "message": "Job succeeded.",
      "object": "fine_tuning

**List checkpoints**

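Each training epoch produces a checkpoint. This job ran for one epoch (`n_epochs: 1`), so the list below contains a single checkpoint.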

In [12]:
response = client.fine_tuning.jobs.checkpoints.list(job_id)
print(response.model_dump_json(indent=2))

{
  "data": [
    {
      "id": "ftchkpt-487c4ece012c4c7fb5261cdea74b845b",
      "created_at": 1739042993,
      "fine_tuned_model_checkpoint": "gpt-4o-mini-2024-07-18.ft-b8bb0ab2fb6b432194e150f916d2905d-sdvk-demo",
      "fine_tuning_job_id": "ftjob-b8bb0ab2fb6b432194e150f916d2905d",
      "metrics": {
        "full_valid_loss": 0.4858631105015611,
        "full_valid_mean_token_accuracy": 0.6069046225863077,
        "step": 1756.0,
        "train_loss": 0.24323470890522003,
        "train_mean_token_accuracy": 0.8928571343421936,
        "valid_loss": 0.6316952359849128,
        "valid_mean_token_accuracy": 0.8260869565217391
      },
      "object": "fine_tuning.job.checkpoint",
      "step_number": 1756
    }
  ],
  "has_more": false,
  "object": "list"
}
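When a job runs for several epochs, the checkpoint list contains more than one entry, and one may want the checkpoint with the lowest validation loss rather than the last one. A minimal sketch, assuming the checkpoints are available as plain dictionaries shaped like the JSON above; the helper `best_checkpoint` is hypothetical, and the second checkpoint in the example is invented purely to show the comparison:

```python
def best_checkpoint(checkpoints, metric="full_valid_loss"):
    """Return the checkpoint dict with the lowest value of `metric`, or None if none report it."""
    scored = [c for c in checkpoints if (c.get("metrics") or {}).get(metric) is not None]
    return min(scored, key=lambda c: c["metrics"][metric]) if scored else None


checkpoints = [
    {   # Values copied from the checkpoint listed above.
        "id": "ftchkpt-487c4ece012c4c7fb5261cdea74b845b",
        "step_number": 1756,
        "metrics": {"full_valid_loss": 0.4858631105015611},
    },
    {   # Hypothetical second checkpoint, NOT from the real job.
        "id": "ftchkpt-hypothetical",
        "step_number": 3512,
        "metrics": {"full_valid_loss": 0.52},
    },
]

best = best_checkpoint(checkpoints)
print(best["id"], best["metrics"]["full_valid_loss"])
```

With the live response, the list of dictionaries could likely be obtained with `client.fine_tuning.jobs.checkpoints.list(job_id).model_dump()["data"]`, since the response object already supports `model_dump_json`.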


**Retrieve final training results**

In [13]:
# Retrieve fine_tuned_model name

response = client.fine_tuning.jobs.retrieve(job_id)

print(response.model_dump_json(indent=2))
fine_tuned_model = response.fine_tuned_model

{
  "id": "ftjob-b8bb0ab2fb6b432194e150f916d2905d",
  "created_at": 1739038750,
  "error": null,
  "fine_tuned_model": "gpt-4o-mini-2024-07-18.ft-b8bb0ab2fb6b432194e150f916d2905d-sdvk-demo",
  "finished_at": 1739043423,
  "hyperparameters": {
    "batch_size": 1,
    "learning_rate_multiplier": 1.0,
    "n_epochs": 1
  },
  "model": "gpt-4o-mini-2024-07-18",
  "object": "fine_tuning.job",
  "organization_id": null,
  "result_files": [
    "file-1723b618431d4226a7cd2f74fa59d5a8"
  ],
  "seed": 23,
  "status": "succeeded",
  "trained_tokens": 2144668,
  "training_file": "file-1092e1e3e80342b388ab41c4f12cb1d9",
  "validation_file": "file-be1e0d5755664ce58abff11152710118",
  "estimated_finish": 1739040744,
  "integrations": null,
  "method": null,
  "suffix": "sdvk-demo"
}
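The `created_at`, `estimated_finish`, and `finished_at` fields are Unix timestamps in seconds. A short sketch that converts the values from the record above into readable UTC times and compares the estimated duration with the actual one; the helper name `to_utc` is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Values copied from the job record above.
created_at = 1739038750
estimated_finish = 1739040744
finished_at = 1739043423


def to_utc(timestamp):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


estimated_duration = timedelta(seconds=estimated_finish - created_at)
actual_duration = timedelta(seconds=finished_at - created_at)

print("Created: ", to_utc(created_at).isoformat())
print("Finished:", to_utc(finished_at).isoformat())
print("Estimated duration:", estimated_duration)  # 0:33:14
print("Actual duration:   ", actual_duration)     # 1:17:53
```

For this run, the job took 1 hour 17 minutes 53 seconds from creation to completion, about 45 minutes longer than the initial estimate of 33 minutes 14 seconds.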


**Deploy model**

In [None]:
# Deploy fine-tuned model

import json
import requests

token_credential = DefaultAzureCredential()
token = token_credential.get_token('https://management.azure.com/.default')


# token = os.getenv("TEMP_AUTH_TOKEN")
subscription = os.getenv("SUB_ID")
resource_group = os.getenv("RG_NAME")
resource_name = "aoai-povel"
model_deployment_name = "gpt-4o-mini-judge" # the deployment name of your llm judge 

deploy_params = {'api-version': "2023-05-01"}
deploy_headers = {'Authorization': 'Bearer {}'.format(token.token), 'Content-Type': 'application/json'}

deploy_data = {
    "sku": {"name": "standard", "capacity": 1},
    "properties": {
        "model": {
            "format": "OpenAI",
            "name": fine_tuned_model, #retrieve this value from the previous call, it will look like gpt-4o-mini-2024-07-18.ft-0e208cf33a6a466994aff31a08aba678
            "version": "1"
        }
    }
}
deploy_data = json.dumps(deploy_data)

request_url = f'https://management.azure.com/subscriptions/{subscription}/resourceGroups/{resource_group}/providers/Microsoft.CognitiveServices/accounts/{resource_name}/deployments/{model_deployment_name}'

print('Creating a new deployment...')

r = requests.put(request_url, params=deploy_params, headers=deploy_headers, data=deploy_data)

print(r)
print(r.reason)
print(r.json())

Creating a new deployment...
<Response [201]>
Created
{'id': '/subscriptions/4ce7ad8d-95ed-4652-bd4a-5f2af19d29cb/resourceGroups/rg-povel/providers/Microsoft.CognitiveServices/accounts/aoai-povel/deployments/gpt-4o-mini-2024-07-18-sdvk-demo-3', 'type': 'Microsoft.CognitiveServices/accounts/deployments', 'name': 'gpt-4o-mini-2024-07-18-sdvk-demo-3', 'sku': {'name': 'standard', 'capacity': 1}, 'properties': {'model': {'format': 'OpenAI', 'name': 'gpt-4o-mini-2024-07-18.ft-b8bb0ab2fb6b432194e150f916d2905d-sdvk-demo', 'version': '1'}, 'versionUpgradeOption': 'NoAutoUpgrade', 'capabilities': {'area': 'EUR', 'chatCompletion': 'true', 'jsonObjectResponse': 'true', 'maxContextToken': '128000', 'maxOutputToken': '16384', 'assistants': 'true'}, 'provisioningState': 'Creating', 'rateLimits': [{'key': 'request', 'renewalPeriod': 10, 'count': 1}, {'key': 'token', 'renewalPeriod': 60, 'count': 1000}]}, 'systemData': {'createdBy': 'povelf@microsoft.com', 'createdByType': 'User', 'createdAt': '2025-02
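The response above reports `'provisioningState': 'Creating'`, so the deployment is not yet ready to serve requests when the `PUT` call returns. A minimal sketch of waiting for it, assuming that a `GET` on the same deployment URL returns the same JSON shape and that the state eventually becomes `Succeeded`, `Failed`, or `Canceled`; the function `wait_for_deployment` and the canned responses are hypothetical:

```python
import time

DONE_STATES = {"Succeeded", "Failed", "Canceled"}


def wait_for_deployment(fetch, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Call `fetch()` until the returned deployment JSON reports a terminal provisioning state."""
    state = None
    for _ in range(max_polls):
        state = fetch().get("properties", {}).get("provisioningState")
        if state in DONE_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"Deployment still in state {state!r} after {max_polls} polls")


# In the notebook, `fetch` could reuse the variables from the deployment cell
# (untested sketch):
#   fetch = lambda: requests.get(
#       request_url, params=deploy_params, headers=deploy_headers).json()

# Dry run with canned responses instead of real HTTP calls.
_responses = iter([
    {"properties": {"provisioningState": "Creating"}},
    {"properties": {"provisioningState": "Succeeded"}},
])
final_state = wait_for_deployment(lambda: next(_responses), sleep=lambda seconds: None)
print(final_state)
```

Taking `fetch` as a parameter keeps the HTTP details out of the polling logic, so the function can be exercised without an Azure subscription.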

**Test inference**

In [17]:
client = AzureOpenAI(
    azure_endpoint = os.getenv("AZURE_OPENAI_API_BASE"),
    api_key = os.getenv("AZURE_OPENAI_API_KEY"),
    api_version = os.getenv("AZURE_OPENAI_API_VERSION")
)

response = client.chat.completions.create(
    model = model_deployment_name,
    messages = [
        {"role": "system", "content": "You are an AI assistant helping a student with their homework."},
        {"role": "user", "content": "What is the capital of France?"}
    ] 
)

response.choices[0].message.content

'The capital of France is Paris.'
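The call above is a generic smoke test that confirms the deployment responds. To check the judge on its own task, one option is to replay validation records and compare the model's output with the reference answer. The sketch below assumes chat-format records whose last message is the assistant's reference answer; the helpers `split_prompt_and_label` and `agreement_rate`, the example records, and the stub predictor are all hypothetical:

```python
def split_prompt_and_label(record):
    """Split a chat-format record into (prompt messages, reference assistant answer)."""
    messages = record["messages"]
    if not messages or messages[-1]["role"] != "assistant":
        raise ValueError("expected the last message to be the assistant's reference answer")
    return messages[:-1], messages[-1]["content"]


def agreement_rate(records, predict):
    """Fraction of records where predict(prompt) exactly matches the reference answer."""
    if not records:
        return 0.0
    hits = 0
    for record in records:
        prompt, expected = split_prompt_and_label(record)
        if predict(prompt).strip() == expected.strip():
            hits += 1
    return hits / len(records)


# With the deployed model, `predict` could wrap the call shown above (untested sketch):
#   predict = lambda messages: client.chat.completions.create(
#       model=model_deployment_name, messages=messages).choices[0].message.content

# Dry run: invented records and a stub predictor that always answers "5".
records = [
    {"messages": [{"role": "user", "content": "Rate the answer: 2 + 2 = 4"},
                  {"role": "assistant", "content": "5"}]},
    {"messages": [{"role": "user", "content": "Rate the answer: 2 + 2 = 5"},
                  {"role": "assistant", "content": "1"}]},
]
stub_predict = lambda messages: "5"
print(agreement_rate(records, stub_predict))
```

Exact string match is the simplest possible comparison; if the judge emits free-form text or structured output, a parser for the score would be needed before comparing.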