### OpenAI Fine-Tuning API
A step-by-step walkthrough of how I use the OpenAI fine-tuning API to fine-tune a model for Wally.

In [5]:
# Import necessary libraries
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load the environment variables
load_dotenv()

# Create instance of OpenAI
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

In [34]:
# Data setup
data_folder = "../data/processed/empathetic-conversational-model"
training_file_name = os.path.join(data_folder, "train.jsonl")
# validation_file_name = os.path.join(data_folder, "validation.jsonl")
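
Before uploading, it can help to sanity-check the training file locally. The sketch below assumes `train.jsonl` uses the chat format expected for chat models (one JSON object per line with a `messages` list of `role`/`content` entries). The helper name `find_format_errors` and the set of allowed roles are my own illustration, not part of the OpenAI SDK, and the checks are far from exhaustive.

```python
import json

# Roles accepted by this check; an assumption about what the dataset contains.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def find_format_errors(path):
    """Return a list of (line_number, problem) tuples for a chat-format JSONL file."""
    errors = []
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                errors.append((line_number, "blank line"))
                continue
            try:
                example = json.loads(line)
            except json.JSONDecodeError:
                errors.append((line_number, "invalid JSON"))
                continue
            messages = example.get("messages") if isinstance(example, dict) else None
            if not isinstance(messages, list) or not messages:
                errors.append((line_number, "missing or empty 'messages' list"))
                continue
            for message in messages:
                if not isinstance(message, dict) or message.get("role") not in ALLOWED_ROLES:
                    errors.append((line_number, "message with missing or unknown role"))
                    break
            else:
                # Only reached when every message had a known role.
                if not any(m["role"] == "assistant" for m in messages):
                    errors.append((line_number, "no assistant message"))
    return errors
```

Calling `find_format_errors(training_file_name)` should return an empty list for a well-formed file; any tuples it returns point at the lines worth inspecting before the upload.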

## Upload files
Use the `client.files.create()` method from the OpenAI Files API to upload the training file (training only for now, no validation file). Then store the ID of the returned File object so the fine-tuning job can reference it.

In [43]:
# Open the file in a context manager so the handle is closed after the upload
with open(training_file_name, "rb") as f:
    training_file = client.files.create(
        file=f,
        purpose="fine-tune",
    )

training_file_id = training_file.id
print(f"Training file ID: {training_file_id}")

Training file ID: file-HsvVQZSpcSQE6TrRyVaVwB


## Create fine-tuning job
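Use the `client.fine_tuning.jobs.create()` method to create a fine-tuning job from the uploaded training file, the base model, and a suffix that is added to the fine-tuned model's name.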

In [44]:
job = client.fine_tuning.jobs.create(
    training_file=training_file_id,
    model="gpt-4o-mini-2024-07-18",
    suffix="wally",
)

job_id = job.id

print(f"Job ID: {job_id}")
print(f"Job status: {job.status}")

Job ID: ftjob-QpkaFQN3Q5OP7YZ0Ir1YTL6k
Job status: validating_files


## Check job status
Check the job's status with the `client.fine_tuning.jobs.retrieve()` method, which takes the job ID.

In [49]:
retrieve_response = client.fine_tuning.jobs.retrieve(job_id)

print(f"Job ID: {retrieve_response.id}")
print(f"Job status: {retrieve_response.status}")
print(f"Model: {retrieve_response.model}")
print(f"Trained Tokens: {retrieve_response.trained_tokens}")

Job ID: ftjob-QpkaFQN3Q5OP7YZ0Ir1YTL6k
Job status: running
Model: gpt-4o-mini-2024-07-18
Trained Tokens: None
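
Rather than re-running the cell above by hand, the status check can be wrapped in a polling loop. This is a minimal sketch: `wait_for_job`, `poll_seconds`, and `max_polls` are hypothetical names of my own, and it assumes `succeeded`, `failed`, and `cancelled` are the statuses at which a job stops changing. The `retrieve` callable is passed in so the loop can be tried without touching the API.

```python
import time

# Statuses after which the job is assumed not to change any more.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll retrieve(job_id) until the job reaches a terminal status.

    Raises TimeoutError if the job is still unfinished after max_polls checks.
    """
    for _ in range(max_polls):
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} still not finished after {max_polls} polls")
```

With the client from this notebook, the call would look like `finished = wait_for_job(client.fine_tuning.jobs.retrieve, job_id)`. Once the status is `succeeded`, `finished.fine_tuned_model` should hold the name of the new model.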


List the job's events with the `client.fine_tuning.jobs.list_events()` method, which returns the events associated with the given job ID.

In [50]:
response = client.fine_tuning.jobs.list_events(job_id)

# Events come back newest first, so reverse them to print in chronological order
events = response.data
events.reverse()

for event in events:
    print(event.message)

Created fine-tuning job: ftjob-QpkaFQN3Q5OP7YZ0Ir1YTL6k
Validating training file: file-HsvVQZSpcSQE6TrRyVaVwB
Files validated, moving job to queued state
Fine-tuning job started
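
The event messages alone do not show when each step happened. As a small sketch, the helper below (the name `format_events` is my own) adds a UTC timestamp and the event level to each line, relying on the `created_at` (Unix seconds), `level`, and `message` fields of each event object. It sorts by `created_at`, so no manual reversing is needed.

```python
from datetime import datetime, timezone


def format_events(events):
    """Return one line per event, oldest first, as 'timestamp [level] message'."""
    lines = []
    for event in sorted(events, key=lambda e: e.created_at):
        # created_at is treated as Unix time in seconds and rendered in UTC.
        stamp = datetime.fromtimestamp(event.created_at, tz=timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S"
        )
        lines.append(f"{stamp} [{event.level}] {event.message}")
    return lines
```

In this notebook it could be used as `print("\n".join(format_events(client.fine_tuning.jobs.list_events(job_id).data)))`.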
