In [1]:
# DSC670 - Week6 - Exercise  - Tuning and Adapting LLMs

Building a dataset can take quite a bit of time and that's not really what this class is about. For this assignment, follow the book's example (Chapter 9) but adapt it for OpenAI instead of Azure OpenAI using the "Fine-tuning - OpenAI API" example provided by OpenAI. You'll be adapting code from Azure OpenAI to just OpenAI and you'll need to use the REST endpoints to check the model training process using the Python requests package. The training job runs on OpenAI servers, so don't worry, your PC won't get bogged down during the (long) training process.



As an aside, it's worth noting that the base gpt-3.5-turbo model can already return emojis; this fine-tuning has it return "emojis" in a specific way, e.g., "(devil)" instead of the devil emoji.



When your model is trained, you'll get the model's name from the following endpoint (which is in the OpenAI article):

'https://api.openai.com/v1/fine_tuning/jobs/your_training_job_id'



In addition to training the model, you'll also want to test the model. Since that code doesn't seem to exist anywhere in either the book or the referenced web page, here it is (Week 6 Exercise Code.txt):

completion = client.chat.completions.create(
  model="YOUR MODEL NAME",
  messages=[
    {"role": "system", "content": "You're a chatbot that only responds with emojis!"},
    {"role": "user", "content": "What the hell is going on?"}
  ],
  max_tokens=50
)
print(completion.choices[0].message.content)

# Overview
Fine-tuning is a technique for improving a model's performance on a specific task.
Here we build an EmojiBot by fine-tuning GPT-3.5 Turbo. The bot understands what we ask but responds only with emojis.

In [4]:
# Install the required libraries; the requests library is used to call the API.
#pip install requests


In [5]:
# Step 1  Create an OpenAI client with API key

## Load required libraries
from openai import OpenAI
import requests
import json
import os
from dotenv import load_dotenv

#Load variables from .env file into environment
load_dotenv()

#Create an OpenAI client with API key stored in env file
client = OpenAI(
    # Load the api key securely from env file.
    api_key=os.getenv("OPENAI_API_KEY") 
)

api_key=os.getenv("OPENAI_API_KEY") 
headers = {
    "Authorization": f"Bearer {api_key}"
}

# List the fine-tuning jobs to see whether any job is currently running
response = requests.get("https://api.openai.com/v1/fine_tuning/jobs", headers=headers)
#print(json.dumps(response.json(), indent=4))

In [6]:
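# Cancel the job
# Convert API response to JSON
result = response.json()
# Note: the list endpoint returns a list object whose jobs sit under "data",
# so the top-level "id" is None here and the cancel request below cannot find a job.
print("Training file ID:", result.get("id"))
job_id = result.get("id")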

cancel_response = requests.post(
    f"https://api.openai.com/v1/fine_tuning/jobs/{job_id}/cancel",
    headers=headers
)
print(json.dumps(cancel_response.json(), indent=4))

Training file ID: None
{
    "error": {
        "message": "Could not find fine tune: None",
        "type": "invalid_request_error",
        "param": "fine_tune_id",
        "code": "fine_tune_not_found"
    }
}
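
The error above arises because the job-list response has no top-level `id`. Here is a minimal sketch of picking out the jobs that are still in progress. It assumes the response has the shape `{"object": "list", "data": [...]}` and that unfinished jobs report one of the statuses `validating_files`, `queued`, or `running`. The helper name `active_job_ids` and the sample data are hypothetical.

```python
# Statuses assumed to mean a job is still in progress.
ACTIVE_STATUSES = {"validating_files", "queued", "running"}


def active_job_ids(list_response):
    """Return the IDs of jobs in a list response that have not finished yet."""
    jobs = list_response.get("data", [])
    return [job["id"] for job in jobs if job.get("status") in ACTIVE_STATUSES]


# Hand-written example response (not real API output).
sample = {
    "object": "list",
    "data": [
        {"id": "ftjob-aaa", "status": "running"},
        {"id": "ftjob-bbb", "status": "succeeded"},
    ],
}
print(active_job_ids(sample))  # ['ftjob-aaa']
```

Each returned ID could then be passed to the cancel endpoint used above; an empty list means there is nothing to cancel.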


The following sections upload a custom dataset to OpenAI's servers to prepare it for fine-tuning a language model. This is a crucial first step in adapting a foundation model to a specific task, such as generating emojis from text.

## Data Curation
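In this example we already have a clean training file (stored as JSON Lines, one record per line, despite the .json extension), so the data curation step is already complete.
The code below reads the file to validate that every line parses as JSON and, optionally, to inspect the records.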

In [9]:
import json

# File name
TRAINING_FILENAME = "emoji_ft_train.json"

# Open and read the JSONL file
with open(TRAINING_FILENAME, "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        #print(json.dumps(record, indent=4))  # Pretty print each record
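
The loop above only confirms that each line parses as JSON. A slightly stricter check might also look at the structure of each record. The sketch below assumes the simple chat fine-tuning format, in which every line is an object with a `messages` list of `{"role", "content"}` entries and at least one `assistant` message. The function name `validate_line` and the allowed-role set are illustrative assumptions, not part of the assignment.

```python
import json

# Roles assumed for this simple chat-format dataset.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_line(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index} is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {index} has unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {index} has no text content")

    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to learn from")
    return problems


good = '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "(wave)"}]}'
bad = '{"messages": [{"role": "user", "content": "hi"}]}'
print(validate_line(good))  # []
print(validate_line(bad))   # ['no assistant message to learn from']
```

Calling `validate_line(line)` inside the file-reading loop and printing any non-empty result would flag malformed records before the upload.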


## LLM evaluation

We manually evaluate the base model before fine-tuning it.

In [11]:
import requests

data = {
    "model": "gpt-3.5-turbo-0125",  # I don't have an organization model yet, so I am using the base "gpt-3.5-turbo-0125"
     "messages": [
      {"role": "system", "content": "You're a chatbot that only responds with emojis!"},
        {"role": "user", "content": "What the hell is going on?"}
    ]
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=data)
## Print the response returned before fine-tuning the model
print(response.json()["choices"][0]["message"]["content"])


🤔🤷‍♂️🌀


## Fine Tuning
Fine-tuning takes two steps. First, upload the dataset; each uploaded file gets its own unique ID.
Second, create the fine-tuning job, passing that file ID as one of its parameters.

In [13]:
# Upload the training file and get its file ID
TRAINING_FILENAME = 'emoji_ft_train.json'

# Open the file in binary read mode ("rb") so it can be uploaded via HTTP;
# the with-block closes the file once the upload has finished.
with open(TRAINING_FILENAME, "rb") as training_file:
    files = {
        "purpose": (None, "fine-tune"),  # tells the API that this file is intended for fine-tuning a model
        "file": training_file
    }
    # Save the response from the HTTP POST request to the OpenAI file upload endpoint
    response = requests.post("https://api.openai.com/v1/files", headers=headers, files=files)

# Print the JSON response from the OpenAI API
print(response.json())

{'object': 'file', 'id': 'file-ETj91rQ6bc2fFHqarE2doF', 'purpose': 'fine-tune', 'filename': 'emoji_ft_train.json', 'bytes': 120319, 'created_at': 1760929926, 'expires_at': None, 'status': 'processed', 'status_details': None}


Check the status in the output above: a status of 'processed' means the file is ready for fine-tuning.

In [15]:
# Convert API response to JSON
result = response.json()

# Get the id from the response and use it as file id
file_id = result.get("id")

# file_id tells OpenAI which uploaded file to fine-tune on.
json_data = {
    "training_file": file_id,
    "model": "gpt-3.5-turbo-0125",
    "hyperparameters": {  # Controls training behavior, such as how many epochs to run
        "n_epochs": 3  # The model cycles through the data three times.
    },
    "suffix": "emoji"  # Custom suffix included in the fine-tuned model's name.
}

# Make the API call that creates the fine-tuning job
response = requests.post(
    "https://api.openai.com/v1/fine_tuning/jobs",
    headers={**headers, "Content-Type": "application/json"},
    json=json_data
)
# Convert API response to JSON
result = response.json()

print(response.json())

{'object': 'fine_tuning.job', 'id': 'ftjob-FXMNsQFDCDId1pKrcYCU4epO', 'model': 'gpt-3.5-turbo-0125', 'created_at': 1760929928, 'finished_at': None, 'fine_tuned_model': None, 'organization_id': 'org-blkiZcCSUXdrQCZhWzK3lS8X', 'result_files': [], 'status': 'validating_files', 'validation_file': None, 'training_file': 'file-ETj91rQ6bc2fFHqarE2doF', 'hyperparameters': {'n_epochs': 3, 'batch_size': 'auto', 'learning_rate_multiplier': 'auto'}, 'trained_tokens': None, 'error': {}, 'user_provided_suffix': 'emoji', 'seed': 2040918167, 'estimated_finish': None, 'integrations': [], 'metadata': None, 'usage_metrics': None, 'shared_with_openai': False, 'eval_id': None, 'method': {'type': 'supervised', 'supervised': {'hyperparameters': {'batch_size': 'auto', 'learning_rate_multiplier': 'auto', 'n_epochs': 3}}}}


In [16]:
# extracts the fine-tuning job ID returned from the training creation step.
job_id = result.get("id")

response = requests.get(
    f"https://api.openai.com/v1/fine_tuning/jobs/{job_id}",
    headers=headers
)
print(response.json())
# Convert API response to JSON
result = response.json()


# Pretty print the JSON result
print(json.dumps(response.json(), indent=4))


# "id" is the fine-tuning job ID and "model" is the base model being tuned
print("Job ID:", result.get("id"))
print("Base model:", result.get("model"))

{'object': 'fine_tuning.job', 'id': 'ftjob-FXMNsQFDCDId1pKrcYCU4epO', 'model': 'gpt-3.5-turbo-0125', 'created_at': 1760929928, 'finished_at': None, 'fine_tuned_model': None, 'organization_id': 'org-blkiZcCSUXdrQCZhWzK3lS8X', 'result_files': [], 'status': 'validating_files', 'validation_file': None, 'training_file': 'file-ETj91rQ6bc2fFHqarE2doF', 'hyperparameters': {'n_epochs': 3, 'batch_size': 'auto', 'learning_rate_multiplier': 'auto'}, 'trained_tokens': None, 'error': {}, 'user_provided_suffix': 'emoji', 'seed': 2040918167, 'estimated_finish': None, 'integrations': [], 'metadata': None, 'usage_metrics': None, 'shared_with_openai': False, 'eval_id': None, 'method': {'type': 'supervised', 'supervised': {'hyperparameters': {'n_epochs': 3, 'batch_size': 'auto', 'learning_rate_multiplier': 'auto'}}}}
{
    "object": "fine_tuning.job",
    "id": "ftjob-FXMNsQFDCDId1pKrcYCU4epO",
    "model": "gpt-3.5-turbo-0125",
    "created_at": 1760929928,
    "finished_at": null,
    "fine_tuned_model"

## Validating the Fine-tuned Model
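
The job output above still shows the status `validating_files`, and the fine-tuned model's name only becomes available once the job has finished. Here is a minimal polling sketch. It assumes the job object carries a `status` field that ends in `succeeded`, `failed`, or `cancelled`, and a `fine_tuned_model` field that is filled in on success. The helper `wait_for_job`, its parameters, and the fake responses are hypothetical; in the notebook, `fetch_job` would wrap the `requests.get` call on the job endpoint.

```python
import time

# Statuses assumed to mean the job will not change any further.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(fetch_job, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call fetch_job() repeatedly until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError("fine-tuning job did not finish within the polling budget")


# Stand-in for the real request, which might look like:
#   lambda: requests.get(f"https://api.openai.com/v1/fine_tuning/jobs/{job_id}",
#                        headers=headers).json()
fake_responses = iter([
    {"status": "validating_files", "fine_tuned_model": None},
    {"status": "running", "fine_tuned_model": None},
    {"status": "succeeded", "fine_tuned_model": "ft:example-model"},  # placeholder name
])

job = wait_for_job(lambda: next(fake_responses), poll_seconds=0, sleep=lambda s: None)
print(job["status"], job["fine_tuned_model"])
```

Passing the fetch function in as an argument keeps the loop testable without network access. The `max_polls` cap prevents the loop from running forever if a job stalls.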
### Result 1

In [18]:
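# "model" holds the base model name; the fine-tuned model's name is stored under
# "fine_tuned_model", which stays None until the job status is "succeeded".
# Fall back to the base model while the job is still in progress.
fine_tuned_model = result.get("fine_tuned_model") or result.get("model")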

json_data = {
    "model": fine_tuned_model,
    "messages": [
      {"role": "system", "content": "You're a chatbot that only responds with emojis!"},
      {"role": "user", "content": "What the hell is going on?"}
    ]
}

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={**headers, "Content-Type": "application/json"},
    json=json_data
)

# Pretty print the JSON result
print(json.dumps(response.json(), indent=4))



{
    "id": "chatcmpl-CSabak8keY4qSqnTLdbNedR7quLuN",
    "object": "chat.completion",
    "created": 1760929930,
    "model": "gpt-3.5-turbo-0125",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\ud83e\udd14\ud83e\uddd0\ud83e\udd37\u200d\u2642\ufe0f",
                "refusal": null,
                "annotations": []
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 29,
        "completion_tokens": 14,
        "total_tokens": 43,
        "prompt_tokens_details": {
            "cached_tokens": 0,
            "audio_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0,
            "audio_tokens": 0,
            "accepted_prediction_tokens": 0,
            "rejected_prediction_tokens": 0
        }
    },
    "service_tier": "default",
    "system_fingerprint": null
}


In [19]:
print(response.json()["choices"][0]["message"]["content"])

🤔🧐🤷‍♂️


### Result 2

In [21]:
json_data = {
    "model": fine_tuned_model,
    "messages": [
      {"role": "system", "content": "You're a chatbot that only responds with emojis!"},
      {"role": "user", "content": " The movie was scary and at the same time it was comedy"}
    ]
}

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={**headers, "Content-Type": "application/json"},
    json=json_data
)

# Pretty print the JSON result
#print(json.dumps(response.json(), indent=4))

In [22]:
print(response.json()["choices"][0]["message"]["content"])


😱😂


## Summary:
The "emoji" fine-tuned model has learned to respond in the style of our dataset: it now replies with emojis automatically because that is how it was trained.
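
The manual checks in Result 1 and Result 2 could be complemented by a rough automated test of whether a reply consists only of emojis. The sketch below is a heuristic under stated assumptions, not a full emoji parser. It treats characters in Unicode category `So` (other symbols) as emojis and allows the zero-width joiner, the emoji variation selector, and skin-tone modifiers. It will misclassify some inputs; keycap sequences, for example, are rejected. The function name `is_emoji_only` is hypothetical.

```python
import unicodedata

# Characters that glue emoji sequences together rather than being visible symbols.
JOINERS = {"\u200d", "\ufe0f"}  # zero-width joiner, variation selector-16


def is_emoji_only(text):
    """Heuristically report whether text contains only emoji-like symbols."""
    stripped = "".join(text.split())  # ignore whitespace between emojis
    seen_symbol = False
    for char in stripped:
        if char in JOINERS or 0x1F3FB <= ord(char) <= 0x1F3FF:  # joiner or skin tone
            continue
        if unicodedata.category(char) == "So":
            seen_symbol = True
            continue
        return False  # a letter, digit, or punctuation mark was found
    return seen_symbol


print(is_emoji_only("\U0001F631\U0001F602"))    # True
print(is_emoji_only("so scary \U0001F631"))     # False
```

Applied to `response.json()["choices"][0]["message"]["content"]`, such a check would allow the base and fine-tuned models to be compared over many prompts instead of one at a time.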