# OpenAI Fine-tuning API Flow Example
This notebook provides a baseline example of how to use the OpenAI Fine-tuning API to fine-tune a model on a custom dataset. It covers creating training and testing data in JSONL format from a CSV file, fine-tuning a model on that data, and using the resulting model.
## Setup
To use this notebook, you will need to have an OpenAI API key. You can generate one at [platform.openai.com](https://platform.openai.com/). Save the key in a file called `.env` in the same directory as this notebook under the variable name `OPENAI_API_KEY`.

## Documentation
- OpenAI Fine-tuning Guide: https://platform.openai.com/docs/guides/fine-tuning
- OpenAI Fine-tuning API reference: https://beta.openai.com/docs/api-reference/fine-tuning/

## Sources
This example borrows liberally from ["How to fine-tune chat models"](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_finetune_chat_models.ipynb) by OpenAI.

In [9]:
# Import necessary libraries
import os
import pandas as pd
from openai import OpenAI
from dotenv import load_dotenv

# Load the .env file
load_dotenv()

# OpenAI API key
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
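
The fallback string in the cell above is only a placeholder, so a missing key would surface later as an authentication error. A minimal sketch of failing early instead, assuming a hypothetical helper named `require_api_key` (not part of the OpenAI library):

```python
import os

def require_api_key(environ=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise if it is missing or blank."""
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return key
```

After `load_dotenv()` has run, the client could then be built with `OpenAI(api_key=require_api_key())`.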

In [10]:
# Data setup
data_folder = "data"
training_file_name = os.path.join(data_folder, "review_finetune_training.jsonl")
validation_file_name = os.path.join(data_folder, "review_finetune_validation.jsonl")
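
Each line of these JSONL files holds one chat-format example. The CSV that the files were built from is not shown here, so the following is only a sketch of how such a conversion might look. The column names `review`, `rating`, and `sentiment`, the system prompt, and the helper names are all assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical system prompt; whatever is used here should match the prompt used at inference time.
SYSTEM_PROMPT = "You are a restaurant review analyzer."

def row_to_example(row):
    """Turn one CSV row (a dict) into a chat-format training example."""
    # The column names "review", "rating", and "sentiment" are assumptions.
    answer = json.dumps({"rating": int(row["rating"]), "sentiment": row["sentiment"]})
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Review: {row['review']}"},
            {"role": "assistant", "content": answer},
        ]
    }

def csv_to_jsonl(csv_file, jsonl_file):
    """Read rows from an open CSV file and write one JSON object per line."""
    count = 0
    for row in csv.DictReader(csv_file):
        jsonl_file.write(json.dumps(row_to_example(row)) + "\n")
        count += 1
    return count

# Small in-memory demonstration instead of real files.
sample_csv = io.StringIO("review,rating,sentiment\nGreat pasta,5,positive\n")
output = io.StringIO()
written = csv_to_jsonl(sample_csv, output)
```

With real data, the two `io.StringIO` objects would be replaced by files opened with `open(...)`, for example writing to `training_file_name`.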

## 1. Upload files

Use the `client.files.create()` method to upload the training and validation data files to the OpenAI API. For each uploaded file, the endpoint will return a file ID that you can use to reference the file in subsequent API calls.

__NOTE:__ The validation file is optional.

API documentation: https://platform.openai.com/docs/api-reference/files/create
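
Because a malformed file is only rejected after the job starts validating, a quick local check before uploading can save a round trip. The sketch below is illustrative and is not the API's own validation logic; the helper name `find_jsonl_problems` and the specific checks are assumptions.

```python
import json

def find_jsonl_problems(lines):
    """Return (line_number, problem) tuples for lines that do not look like chat examples."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing 'messages' list"))
            continue
        roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
        if None in roles:
            problems.append((number, "message without a role"))
        elif "assistant" not in roles:
            # Without an assistant message there is nothing for the model to learn to produce.
            problems.append((number, "no assistant message"))
    return problems
```

It could be run as `find_jsonl_problems(open(training_file_name))`, with an empty list meaning that no problems were found by these particular checks.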

In [11]:
with open(training_file_name, "rb") as training_fd:
    training_response = client.files.create(
        file=training_fd, purpose="fine-tune"
    )

training_file_id = training_response.id

with open(validation_file_name, "rb") as validation_fd:
    validation_response = client.files.create(
        file=validation_fd, purpose="fine-tune"
    )
validation_file_id = validation_response.id

print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)

Training file ID: file-I207HvFD4R6CEhjhURkIQn8H
Validation file ID: file-cmpTcTZnqapeDwWQRt7fNNxv


## 2. Create fine-tuning job
Use the `client.fine_tuning.jobs.create()` method to create a fine-tuning job. The method accepts the following parameters:
- `model`: The model ID of the model you want to fine-tune.
- `training_file`: The ID of the training file you uploaded in the previous step.
- `validation_file`: The ID of the validation file you uploaded in the previous step (optional).
- `suffix`: A suffix to append to the fine-tuned model name (optional).
- `seed`: A seed to use for reproducibility (optional).

You can also set hyperparameters for the fine-tuning job. At present they are:
- `batch_size`: The batch size to use during fine-tuning.
- `learning_rate_multiplier`: A multiplier to apply to the learning rate of the model.
- `n_epochs`: The number of epochs to train the model for.

Finally, you can enable integrations. At present they are limited to Weights and Biases. See the API documentation for more information.

__NOTE:__ The fine-tuning job takes time, often 10 minutes or more.

API documentation: https://platform.openai.com/docs/api-reference/fine-tuning/create

In [12]:
response = client.fine_tuning.jobs.create(
    training_file=training_file_id,
    validation_file=validation_file_id,
    model="gpt-3.5-turbo",
    suffix="via-api",
)

job_id = response.id

print("Job ID:", response.id)
print("Status:", response.status)

Job ID: ftjob-9AunUqVfu5en9M7cIJeEAhEI
Status: validating_files
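
The job above uses default hyperparameters. As a sketch of how the optional settings listed earlier might be passed, the helper below collects only the options that were explicitly set. It assumes the client accepts them as a `hyperparameters` dictionary alongside `seed`, as the API reference describes; check the reference for your SDK version. The helper name, the file ID `file-abc123`, and the values are placeholders, not recommendations.

```python
def build_job_arguments(training_file_id, validation_file_id=None, model="gpt-3.5-turbo",
                        suffix=None, seed=None, n_epochs=None, batch_size=None,
                        learning_rate_multiplier=None):
    """Collect only the options that were explicitly set into a kwargs dict."""
    arguments = {"model": model, "training_file": training_file_id}
    optional = {"validation_file": validation_file_id, "suffix": suffix, "seed": seed}
    arguments.update({k: v for k, v in optional.items() if v is not None})

    hyperparameters = {
        "n_epochs": n_epochs,
        "batch_size": batch_size,
        "learning_rate_multiplier": learning_rate_multiplier,
    }
    hyperparameters = {k: v for k, v in hyperparameters.items() if v is not None}
    if hyperparameters:
        arguments["hyperparameters"] = hyperparameters
    return arguments

# Placeholder file ID and illustrative values.
job_arguments = build_job_arguments("file-abc123", n_epochs=3, seed=42)

# With a configured client, the job would then be created with:
# response = client.fine_tuning.jobs.create(**job_arguments)
```

Leaving unset options out of the call lets the API apply its own defaults.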


## 3. Check job status
The fine-tuning job will take some time to complete. The duration depends on factors outside your control that the API does not expose.

You can check the status of the job using the `client.fine_tuning.jobs.retrieve()` method. The method accepts the job ID as a parameter and returns the job object, including its current status.

API documentation: https://platform.openai.com/docs/api-reference/fine-tuning/retrieve

In [13]:
response = client.fine_tuning.jobs.retrieve(job_id)

print("Job ID:", response.id)
print("Status:", response.status)
print("Trained Tokens:", response.trained_tokens)

Job ID: ftjob-9AunUqVfu5en9M7cIJeEAhEI
Status: validating_files
Trained Tokens: None
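
Rather than re-running the cell above by hand, the status check can be wrapped in a polling loop. The following is a minimal sketch, assuming that `succeeded`, `failed`, and `cancelled` are the terminal statuses; the helper name `wait_for_job` and the polling interval are arbitrary choices.

```python
import time

# Assumed terminal statuses; any other status is treated as "still in progress".
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(client, job_id, poll_seconds=30, max_polls=120):
    """Poll the job until it reaches a terminal status or max_polls is used up."""
    for _ in range(max_polls):
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")
```

Calling `wait_for_job(client, job_id)` blocks until the job finishes. The cap on polls keeps the notebook from hanging indefinitely if the job stalls.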


You can also list the events of a job using the `client.fine_tuning.jobs.list_events()` method. The method accepts the job ID as a parameter and returns a list of events associated with the job.

The method has the following parameters:
- `fine_tuning_job_id`: The ID of the fine-tuning job to list events for.
- `limit`: The maximum number of events to retrieve (optional).
- `after`: Identifier for the last event from the previous pagination request (optional).

API documentation: https://platform.openai.com/docs/api-reference/fine-tuning/list-events

In [14]:
response = client.fine_tuning.jobs.list_events(job_id)

events = response.data
events.reverse()

for event in events:
    print(event.message)

Created fine-tuning job: ftjob-9AunUqVfu5en9M7cIJeEAhEI
Validating training file: file-I207HvFD4R6CEhjhURkIQn8H and validation file: file-cmpTcTZnqapeDwWQRt7fNNxv
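
A long-running job can produce more events than a single call returns. The `limit` and `after` parameters described above can be combined to walk through all pages, as in this sketch. It assumes each event exposes an `id` attribute to use as the cursor, that a page shorter than `limit` marks the end, and that events arrive newest first (which is why the cell above reverses them). The helper name `fetch_all_events` is hypothetical.

```python
def fetch_all_events(client, job_id, page_size=20):
    """Collect every event for a job by following the `after` cursor page by page."""
    events = []
    after = None
    while True:
        kwargs = {"limit": page_size}
        if after is not None:
            kwargs["after"] = after
        page = client.fine_tuning.jobs.list_events(job_id, **kwargs)
        events.extend(page.data)
        # Assumption: a page shorter than page_size means nothing is left to fetch.
        if len(page.data) < page_size:
            break
        after = page.data[-1].id
    # Flip from newest-first to chronological order.
    events.reverse()
    return events
```

The result can be printed the same way as before, with `for event in fetch_all_events(client, job_id): print(event.message)`.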


## 4. Retrieve the fine-tuned model name
When the fine-tuning job is complete, you can retrieve the name of the fine-tuned model using the `client.fine_tuning.jobs.retrieve()` method. The method accepts the job ID as a parameter and returns the job object, whose `fine_tuned_model` field holds the name of the fine-tuned model. The field is `None` until the job has finished.

API documentation: https://platform.openai.com/docs/api-reference/fine-tuning/retrieve

In [17]:
response = client.fine_tuning.jobs.retrieve(job_id)
fine_tuned_model_id = response.fine_tuned_model

if fine_tuned_model_id is None: 
    raise RuntimeError("Fine-tuned model ID not found. Your job has likely not been completed yet.")

print("Fine-tuned model ID:", fine_tuned_model_id)

Fine-tuned model ID: ft:gpt-3.5-turbo-0125:personal:via-api:9LFdbHOm


## 5. Test the model
Once the fine-tuning job is complete, you can test the model using the regular `client.chat.completions.create()` method and setting the `model` to the fine-tuned model ID retrieved in the previous step.

In [18]:
test_messages = [
    {"role": "system", "content": "You are a restaurant review analyzer. Analyze the sentiment in the review provided. Output the rating value and the sentiment as JSON ."},
    {"role": "user", "content": "Review: Decent food, slow service, beatutiful decor. Would return, maybe."}
]

response = client.chat.completions.create(
    model=fine_tuned_model_id, 
    messages=test_messages, 
    temperature=0, 
    max_tokens=500
)
print(response.choices[0].message.content)

{"rating": 4, "Sentiment": "Delightful Dining"}
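
Since the system prompt asks for JSON, downstream code will usually want a dictionary rather than a string. The sketch below parses the reply defensively; the helper name `parse_review_output` is hypothetical, and lowercasing the keys is a design choice made here because the sample output mixes `rating` and `Sentiment`.

```python
import json

def parse_review_output(text):
    """Parse the model's reply as a JSON object; return None if that is not possible."""
    try:
        parsed = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers a reply whose content is None.
        return None
    if not isinstance(parsed, dict):
        return None
    # Normalize key casing so callers can rely on lowercase keys.
    return {key.lower(): value for key, value in parsed.items()}

result = parse_review_output('{"rating": 4, "Sentiment": "Delightful Dining"}')
```

In the notebook, the input would be `response.choices[0].message.content`. A `None` result signals that the model did not return a JSON object and that the reply needs separate handling.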



## X. Cancel fine-tuning job
If you need to cancel a fine-tuning job, you can use the `client.fine_tuning.jobs.cancel()` method. The method accepts the job ID as a parameter and cancels the job.
(No active code example is provided for this because it could result in accidentally cancelling a job.)

API documentation: https://platform.openai.com/docs/api-reference/fine-tuning/cancel