<a href="https://colab.research.google.com/github/nathalierocelle/guided-projects-generative-ai/blob/main/Finetune_OpenAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Step 1: Install the required library

Install the `openai` library if it is not already installed. The command below pins the versions used in this notebook:

In [1]:
!pip install openai==1.55.3 httpx==0.27.2



In [2]:
# Import necessary libraries
import os
import pandas as pd
import json
from openai import OpenAI

The dataset is available at this [link](https://themlco-my.sharepoint.com/:f:/p/chiragchauhan/Eg7biBxP1hhEta6JLchrfWgBvLawQsfxSDJO6K1BQl_J9g?e=FhiWT3).

## Step 2: Set up your API key
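Store your OpenAI API key as a Colab secret named `OPENAI_API_KEY` (key icon in the left pane of the notebook), then load it and expose it as an environment variable.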

In [3]:
from google.colab import userdata  # Colab secrets: see the key-shaped icon in the left pane of the notebook
OpenAI_API = userdata.get('OPENAI_API_KEY')

In [8]:
os.environ["OPENAI_API_KEY"] = OpenAI_API
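Outside Colab, `google.colab.userdata` cannot be imported. The sketch below shows one possible fallback order: Colab secret, then the `OPENAI_API_KEY` environment variable, then an interactive prompt. `resolve_api_key` is a hypothetical helper name introduced here for illustration; it is not part of the `openai` library.

```python
import os
from getpass import getpass


def resolve_api_key(env=None, prompt_fn=getpass):
    """Return an OpenAI API key from Colab secrets, the environment, or a prompt.

    Hypothetical helper for this notebook; not part of the openai package.
    """
    env = os.environ if env is None else env
    try:
        # This import only succeeds inside Google Colab.
        from google.colab import userdata
        key = userdata.get("OPENAI_API_KEY")
        if key:
            return key
    except Exception:
        pass  # not running in Colab, or the secret is missing / not shared
    key = env.get("OPENAI_API_KEY")
    if key:
        return key
    # Last resort: ask the user without echoing the key to the screen.
    return prompt_fn("Enter your OpenAI API key: ")
```

Calling `os.environ["OPENAI_API_KEY"] = resolve_api_key()` would then stand in for the two cells above.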

## Step 3: Prepare the dataset

Fine-tuning data must be in JSONL format, so convert a JSON or CSV dataset to JSONL first.

The cells below load a CSV dataset and convert it to JSONL. Alternatively, you can directly use the `training.jsonl` file that was used when fine-tuning via the playground.

In [22]:
# Load the FAQ dataset (CSV) from GitHub

raw_url = "https://raw.githubusercontent.com/nathalierocelle/guided-projects-generative-ai/refs/heads/main/GenAI_Program_FAQ_dataset.csv"
df = pd.read_csv(raw_url)
df.head()

| Unnamed: 0 | prompt | completion |
|---|---|---|
| 0 | How the sessions will be organized. Will the r... | This program contains both pre-recorded and li... |
| 1 | Is there any pre-requisite required before or ... | Intermediate level python programming is requi... |
| 2 | What will be the duration of the program? | This is a 6 weeks long program. |
| 3 | What materials will be provided during the cou... | Videos, documents, coding assignments will be ... |
| 4 | What will be the timings of the sessions? | Live sessions will happen 6 to 7:30 PM IST on ... |


In [23]:
# Convert the dataset to JSONL format
output_file = 'data.jsonl'
with open(output_file, 'w') as f:
    for _, row in df.iterrows():
        # Create JSON lines for chat model fine-tuning
        json_line = json.dumps({
            "messages": [
                {"role": "system", "content": "You are a helpful assistant which acts as FAQ Support Assistant for the TMLC Guided Projects in Generative AI Program and answer to user queries."},
                {"role": "user", "content": row['prompt']},
                {"role": "assistant", "content": row['completion']}
            ]
        })
        f.write(json_line + '\n')

print(f"Dataset converted and saved to {output_file}")


Dataset converted and saved to data.jsonl
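Before uploading, it can help to sanity-check the generated file, since a CSV row with a missing value would end up as a non-string `content`. The following is a minimal sketch, not an official validator: `validate_chat_lines` is a hypothetical helper, and the set of allowed roles is an assumption limited to the three roles used in this notebook.

```python
import json


def validate_chat_lines(lines):
    """Return a list of (line_number, problem) tuples; an empty list means no issues found."""
    allowed_roles = {"system", "user", "assistant"}  # assumption: only roles used here
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((n, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as e:
            problems.append((n, f"invalid JSON: {e.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((n, "missing or empty 'messages' list"))
            continue
        for m in messages:
            if not isinstance(m, dict) or m.get("role") not in allowed_roles:
                problems.append((n, "message with unexpected role"))
                break
            if not isinstance(m.get("content"), str) or not m["content"].strip():
                problems.append((n, f"empty or non-string content for role {m['role']!r}"))
                break
        else:
            # Every message looked fine; each example should end with the assistant reply.
            if messages[-1]["role"] != "assistant":
                problems.append((n, "last message is not from the assistant"))
    return problems
```

A file object can be passed directly, for example `with open("data.jsonl", encoding="utf-8") as f: print(validate_chat_lines(f))`.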


## Step 4: Upload the file for fine-tuning

Use the OpenAI API to upload the JSONL file created in Step 3 (`data.jsonl`, held in `output_file`) with the purpose `fine-tune`. Use your own file name if it differs.

In [24]:
client = OpenAI()

In [25]:
# Open the file in a context manager so the handle is closed after the upload
with open(output_file, "rb") as f:
    uploaded_file = client.files.create(
        file=f,
        purpose="fine-tune"
    )
print(f"File uploaded successfully. File ID: {uploaded_file.id}")

File uploaded successfully. File ID: file-3qnwMKk1rw9PEhnAhTsLc4


## Step 5: Fine-tune the model

Start the fine-tuning job using the ID of the uploaded file. Adjust `model` and `suffix` as required.

In [26]:
fine_tune_job = client.fine_tuning.jobs.create(
    training_file=uploaded_file.id,
    suffix="custom-fine-tuned-model",
    model="gpt-4o-mini-2024-07-18"  # Adjust the model as required
)
print(f"Fine-tuning job started. Job ID: {fine_tune_job.id}")

Fine-tuning job started. Job ID: ftjob-UtNug0vjeyDzIvOsrdxaHjJM
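The job above lets the API choose the hyperparameters (the job details in Step 6 report `n_epochs`, `batch_size`, and `learning_rate_multiplier`). If you want to set some of them yourself, the sketch below assembles the keyword arguments for `client.fine_tuning.jobs.create`. `build_job_kwargs` is a hypothetical helper, and passing a top-level `hyperparameters` argument is an assumption tied to the pinned `openai==1.55.3`; other SDK versions may expect a different shape.

```python
def build_job_kwargs(training_file_id, model, suffix, n_epochs=None,
                     learning_rate_multiplier=None):
    """Assemble keyword arguments for client.fine_tuning.jobs.create().

    Hyperparameters left as None are omitted so the API picks its own defaults.
    """
    kwargs = {"training_file": training_file_id, "model": model, "suffix": suffix}
    hyperparameters = {}
    if n_epochs is not None:
        hyperparameters["n_epochs"] = n_epochs
    if learning_rate_multiplier is not None:
        hyperparameters["learning_rate_multiplier"] = learning_rate_multiplier
    if hyperparameters:
        kwargs["hyperparameters"] = hyperparameters
    return kwargs


# Usage (starts a billable job, so it is left commented out):
# fine_tune_job = client.fine_tuning.jobs.create(
#     **build_job_kwargs(uploaded_file.id, "gpt-4o-mini-2024-07-18",
#                        "custom-fine-tuned-model", n_epochs=3)
# )
```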


## Step 6: Monitor the fine-tuning job

List recent fine-tuning jobs, then retrieve the details of a specific job.

In [27]:
# List fine-tuning jobs
jobs = client.fine_tuning.jobs.list(limit=10)
print("Recent fine-tuning jobs:", jobs)

Recent fine-tuning jobs: SyncCursorPage[FineTuningJob](data=[FineTuningJob(id='ftjob-UtNug0vjeyDzIvOsrdxaHjJM', created_at=1735697140, error=Error(code=None, message=None, param=None), fine_tuned_model='ft:gpt-4o-2024-08-06:personal:custom-fine-tuned-model:AkiUjxl8', finished_at=1735697488, hyperparameters=Hyperparameters(n_epochs=3, batch_size=1, learning_rate_multiplier=2), model='gpt-4o-2024-08-06', object='fine_tuning.job', organization_id='org-YcB7xnG9MjnCVW9Sk2t2QtNQ', result_files=['file-MYJHEt65t9PZHLx86u61kZ'], seed=855574480, status='succeeded', trained_tokens=6078, training_file='file-3qnwMKk1rw9PEhnAhTsLc4', validation_file=None, estimated_finish=None, integrations=[], user_provided_suffix='custom-fine-tuned-model', method={'type': 'supervised', 'supervised': {'hyperparameters': {'n_epochs': 3, 'batch_size': 1, 'learning_rate_multiplier': 2.0}}}), FineTuningJob(id='ftjob-JMNgRsBtRqWuPH01FpIGjkbA', created_at=1735696184, error=Error(code=None, message=None, param=None), fine

In [14]:
# Retrieve job details
job_details = client.fine_tuning.jobs.retrieve(fine_tune_job.id)
print("Fine-tuning job details:", job_details)

Fine-tuning job details: FineTuningJob(id='ftjob-JMNgRsBtRqWuPH01FpIGjkbA', created_at=1735696184, error=Error(code=None, message=None, param=None), fine_tuned_model='ft:gpt-4o-mini-2024-07-18:personal:custom-fine-tuned-model:AkiEWy7F', finished_at=1735696482, hyperparameters=Hyperparameters(n_epochs=3, batch_size=1, learning_rate_multiplier=1.8), model='gpt-4o-mini-2024-07-18', object='fine_tuning.job', organization_id='org-YcB7xnG9MjnCVW9Sk2t2QtNQ', result_files=['file-8WGehf7jLEVYqS5L7tP8Kz'], seed=1170080283, status='succeeded', trained_tokens=4146, training_file='file-XQ9fxJa3D9vAAHdpL79KEt', validation_file=None, estimated_finish=None, integrations=[], user_provided_suffix='custom-fine-tuned-model', method={'type': 'supervised', 'supervised': {'hyperparameters': {'n_epochs': 3, 'batch_size': 1, 'learning_rate_multiplier': 1.8}}})


In [15]:
# Retrieve the job again once training has finished; `fine_tuned_model` holds the model name
job_details = client.fine_tuning.jobs.retrieve(fine_tune_job.id)
print("Fine-tuning job details:", job_details)

Fine-tuning job details: FineTuningJob(id='ftjob-JMNgRsBtRqWuPH01FpIGjkbA', created_at=1735696184, error=Error(code=None, message=None, param=None), fine_tuned_model='ft:gpt-4o-mini-2024-07-18:personal:custom-fine-tuned-model:AkiEWy7F', finished_at=1735696482, hyperparameters=Hyperparameters(n_epochs=3, batch_size=1, learning_rate_multiplier=1.8), model='gpt-4o-mini-2024-07-18', object='fine_tuning.job', organization_id='org-YcB7xnG9MjnCVW9Sk2t2QtNQ', result_files=['file-8WGehf7jLEVYqS5L7tP8Kz'], seed=1170080283, status='succeeded', trained_tokens=4146, training_file='file-XQ9fxJa3D9vAAHdpL79KEt', validation_file=None, estimated_finish=None, integrations=[], user_provided_suffix='custom-fine-tuned-model', method={'type': 'supervised', 'supervised': {'hyperparameters': {'n_epochs': 3, 'batch_size': 1, 'learning_rate_multiplier': 1.8}}})
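Rather than re-running the retrieve cell by hand, the job can be polled until it finishes. This is a minimal sketch: `wait_for_job` is a hypothetical helper, and the set of terminal statuses and the polling interval are assumptions. It relies only on `client.fine_tuning.jobs.retrieve`, which is already used above.

```python
import time

# Assumption: these are the statuses after which a job no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll a fine-tuning job until it reaches a terminal status or the polls run out."""
    job = client.fine_tuning.jobs.retrieve(job_id)
    for _ in range(max_polls):
        if job.status in TERMINAL_STATUSES:
            break
        sleep(poll_seconds)
        job = client.fine_tuning.jobs.retrieve(job_id)
    return job


# Usage (requires a real client and job, so it is left commented out):
# job = wait_for_job(client, fine_tune_job.id)
# print(job.status, job.fine_tuned_model)
```

The function returns the last job object it saw, so the caller should still check `job.status` before using `job.fine_tuned_model`.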


## Step 7: Use the fine-tuned model

In [20]:
# Example call to the fine-tuned model
completion = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:personal:custom-fine-tuned-model:AkiEWy7F",
    # Replace with the `fine_tuned_model` name retrieved in the cell above
    messages=[
        {"role": "system", "content": "You are a helpful assistant which acts as FAQ Support Assistant for the TMLC Guided Projects in Generative AI Program and answer to user queries."},
        {"role": "user", "content": "When does the latest cohort/program starts?"}
    ]
)
print("Fine-tuned model response:", completion.choices[0].message.content)

Fine-tuned model response: The latest cohort starts on 23rd November 2024 
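For repeated questions, the call above can be wrapped in a small function that reuses the same system prompt as the training data in Step 3. This is a sketch only: `ask_faq` and `SYSTEM_PROMPT` are hypothetical names introduced here, and the client and model name are passed in rather than hard-coded.

```python
# Same system prompt as in the training examples, kept in one place.
SYSTEM_PROMPT = (
    "You are a helpful assistant which acts as FAQ Support Assistant for the "
    "TMLC Guided Projects in Generative AI Program and answer to user queries."
)


def ask_faq(client, model_name, question, system_prompt=SYSTEM_PROMPT):
    """Send one question to the fine-tuned model and return the reply text."""
    completion = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content


# Usage (requires a real client and fine-tuned model, so it is left commented out):
# print(ask_faq(client, job_details.fine_tuned_model, "What will be the duration of the program?"))
```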
