# Part 1 - Fine Tuning

### Overview

This notebook demonstrates fine-tuning GPT 3.5 for text classification on a dataset of SMS text messages.

The following steps are covered:

* Loading and enriching SMS dataset
* Downsampling the dataset for fine tuning
* Training three fine-tuned models with sizes: 50, 100, 200
* Experimenting with the fine-tuned models

### Requirements

* Python 3 environment
    * python3 -m venv venv
    * Select venv kernel in VS Code
        * Upper-right corner of notebook in editor
* OpenAI Account
    * Need a valid API key: https://platform.openai.com/account/api-keys
* OpenAI Python Module
    * https://github.com/openai/openai-python
    * pip install --pre openai
    * Configure with API Key: 
        * Create .env file with `OPENAI_API_KEY=sk_XXXX_...`

In [41]:
# Install dependencies if needed
# %pip install pandas
# %pip install python-dotenv
# %pip install --pre openai

### Resources

* https://platform.openai.com/docs/guides/fine-tuning
* https://platform.openai.com/docs/api-reference/fine-tuning


In [12]:
import pandas as pd
import openai
import os
import json
from datetime import datetime
import time
from IPython.display import clear_output

%reload_ext autoreload
%autoreload 2
from src.util import getTrainTestSplit, makeJobsDataframe

In [2]:
from dotenv import load_dotenv; load_dotenv()
openai.api_key = os.environ['OPENAI_API_KEY']

# Load SMS Dataset

In [3]:
sms_spam_all = pd.read_csv('../data/kaggle_sms_spam.csv', encoding='latin-1')[['label', 'prompt']]
sms_spam_all['spam_flag'] = sms_spam_all['label'].apply(lambda x: True if x == 'spam' else False)
sms_spam = sms_spam_all.drop_duplicates(subset=['prompt'])
print("Loaded sms data file with {} rows, kept {}".format(len(sms_spam_all), len(sms_spam)))
sms_spam.head()


Loaded sms data file with 5572 rows, kept 5169


Unnamed: 0,label,prompt,spam_flag
0,ham,"Go until jurong point, crazy.. Available only ...",False
1,ham,Ok lar... Joking wif u oni...,False
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,True
3,ham,U dun say so early hor... U c already then say...,False
4,ham,"Nah I don't think he goes to usf, he lives aro...",False


# Set up System Prompt

In [4]:
systemPrompt = "You are a system for categorizing SMS text messages as being unwanted spam or normal messages."

# Create downsampled datasets at various sizes

We want to see how the dataset size affects model training time

In [6]:
sample_sizes = [50,100,200]

for sample_size in sample_sizes:
    
    train_data, test_data = getTrainTestSplit(sms_spam, 'spam_flag', sample_size, 200)

    model_path = f"../data/temp/model_{sample_size}"
    os.makedirs(model_path, exist_ok=True)

    with open(f"{model_path}/training.jsonl", 'w') as f:
        for index, row in train_data.iterrows():
            f.write(json.dumps({
                "messages": [
                    {"role": "system", "content": systemPrompt},
                    {"role": "user", "content": row['prompt']},
                    {"role": "assistant", "content": "spam" if row['spam_flag'] else "ham"}
                ]
            }) + "\n")

    with open(f"{model_path}/validation.jsonl", 'w') as f:
        for index, row in train_data.iterrows():
            f.write(json.dumps({
                "messages": [
                    {"role": "system", "content": systemPrompt},
                    {"role": "user", "content": row['prompt']},
                    {"role": "assistant", "content": "spam" if row['spam_flag'] else "ham"}
                ]
            }) + "\n")

# Do Fine Tuning

In [15]:
def runFineTuning(training_data_path, validation_data_path):
    training_file = openai.File.create(
        file=open(training_data_path, "rb"),
        purpose='fine-tune'
    )
    validation_file = openai.File.create(
        file=open(validation_data_path, "rb"),
        purpose='fine-tune'
    )
    job = openai.FineTuningJob.create(training_file=training_file.id, validation_file=validation_file.id, model="gpt-3.5-turbo")
    print("Submitted job {} for file {}".format(job.id, training_data_path))
    return job

### Submit a training job for each sample size we are testing

In [16]:
submitted_jobs = []
for sample_size in sample_sizes:
    training_data_path = f"../data/temp/model_{sample_size}/training.jsonl"
    validation_data_path = f"../data/temp/model_{sample_size}/validation.jsonl"
    job = runFineTuning(training_data_path, validation_data_path)
    with open(f"../data/temp/model_{sample_size}/job_start.json", 'w') as f:
        json.dump(job, f, indent=4)



Submitted job ftjob-v2MTWHznXSCoi9zIhV4xeGfv for file ../data/temp/model_50/training.jsonl
Submitted job ftjob-BM8V9hMp1nHn0v2Sn9YRagSg for file ../data/temp/model_100/training.jsonl
Submitted job ftjob-Jlr2b3eW1qbSIJOjgrUbBk2y for file ../data/temp/model_200/training.jsonl


### Monitor the jobs

In [40]:
while True:
    current_jobs = openai.FineTuningJob.list(limit=10)
    df = makeJobsDataframe(current_jobs.data)
    clear_output(wait=True)
    print(f"Updated at {datetime.now()}")
    display(df)
    time.sleep(10)

Updated at 2023-11-07 10:11:54.985532


Unnamed: 0,ID,Training File,Status,Duration,TrainedTokens,TokensPerMinute,FT ID
0,ftjob-Jlr2b3eW1qbSIJOjgrUbBk2y,file-B1y2Kgy61uxBA6KNOqHpvset,succeeded,103.066667,32352,313.89392,ft:gpt-3.5-turbo-1106:aa-engineering::8IAIy8LD
1,ftjob-BM8V9hMp1nHn0v2Sn9YRagSg,file-LpT2K57tcfuXsO0fmtXLFibN,succeeded,78.5,16455,209.617834,ft:gpt-3.5-turbo-1106:aa-engineering::8I9vALSP
2,ftjob-v2MTWHznXSCoi9zIhV4xeGfv,file-fxIq8wsI48lNp8khtYfj4Flo,succeeded,63.033333,8658,137.355896,ft:gpt-3.5-turbo-1106:aa-engineering::8I9g9RO0
3,ftjob-KUl5tSNid5Rq08EK9JFiySDt,file-tmOKf3KsbaOQLcIDMvx7lMbq,succeeded,16.916667,31632,1869.871921,ft:gpt-3.5-turbo-0613:aa-engineering::8HnGqN6a
4,ftjob-hjCv26zXKV13we6T1GoZSU3I,file-MTHHGTfNOed6X1YWWF1zX8Ud,succeeded,18.55,23943,1290.727763,ft:gpt-3.5-turbo-0613:aa-engineering::8HnINqwQ
5,ftjob-tB505C0UG0oE8A0tO8DiWVsv,file-PW2XKUivTmO7VWBQ8OY7ZCPa,succeeded,13.966667,16536,1183.961814,ft:gpt-3.5-turbo-0613:aa-engineering::8HnDu1VV
6,ftjob-GWOixlqvNI2QqJza3U8dXwdw,file-MO8Iqtp7D7mU4YkaqNfuYVmg,succeeded,9.816667,12156,1238.302207,ft:gpt-3.5-turbo-0613:aa-engineering::8HmzZPc9
7,ftjob-bnNcTibuWGgZ8vmzpr3PEuk9,file-zbeYBgkkQDZzvMnZY4IPc76K,succeeded,6.133333,8118,1323.586957,ft:gpt-3.5-turbo-0613:aa-engineering::8HmvyETn
8,ftjob-r3GPkAgJzHT04XrwvJlhQAEO,file-dP2rnjZABe7xg0GjddnL5yzI,succeeded,6.2,5744,926.451613,ft:gpt-3.5-turbo-0613:aa-engineering::8Hmw1RNC
9,ftjob-3xQGCgLB44R1C5jG0hrvcIbt,file-A9NBysw0N5tsSopAdpkcOF6S,succeeded,49.483333,81474,1646.493769,ft:gpt-3.5-turbo-0613:aa-engineering::8FbVkEom


KeyboardInterrupt: 

In [21]:
# Other useful commands
#openai.FineTuningJob.list(limit=10)
#openai.FineTuningJob.list_events(id=job.id, limit=10)
#openai.FineTuningJob.cancel(job.id)
#openai.FineTuningJob.retrieve(id='ftjob-KUl5tSNid5Rq08EK9JFiySDt')

# Try the models

In [27]:
# Sample Size to Model ID
completed_models = {
    50: 'ft:gpt-3.5-turbo-0613:aa-engineering::8I9g9RO0',
    100: 'ft:gpt-3.5-turbo-0613:aa-engineering::8I9vALSP',
    200: 'ft:gpt-3.5-turbo-0613:aa-engineering::8IAIy8LD'
}

In [43]:
async def getSpamClassification_FineTune(fineTunedModelId, prompt):
  completion = await openai.ChatCompletion.acreate(
    model=fineTunedModelId,
    messages=[
      {"role": "system", "content": systemPrompt},
      {"role": "user", "content": prompt}
    ]
  )
  result = completion.choices[0].message.content.lower() == 'spam'
  # print(prompt, "=>", result)
  return result


In [44]:
await getSpamClassification_FineTune(completed_models[50], "Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's")

True

In [39]:
# Test on the validation data
rows = []
for sample_size in completed_models.keys():
    fineTunedModelId = completed_models[sample_size]
    validation_data_path = f"../data/temp/model_{sample_size}/validation.jsonl"
    with open(validation_data_path, 'r') as f:
        for line in f.readlines():
            data = json.loads(line)
            prompt = data['messages'][1]['content']
            expected = data['messages'][2]['content']
            result = getFineTuneCompletion(fineTunedModelId, prompt)
            rows.append([sample_size, prompt, expected, result])
validation_df = pd.DataFrame(rows, columns=['sample_size', 'prompt', 'expected', 'result'])
validation_df

Romantic Paris. 2 nights, 2 flights from ï¿½79 Book now 4 next year. Call 08704439680Ts&Cs apply. => spam
URGENT! Your mobile No *********** WON a ï¿½2,000 Bonus Caller Prize on 02/06/03! This is the 2nd attempt to reach YOU! Call 09066362220 ASAP! BOX97N7QP, 150ppm => spam
Free 1st week entry 2 TEXTPOD 4 a chance 2 win 40GB iPod or ï¿½250 cash every wk. Txt POD to 84128 Ts&Cs www.textpod.net custcare 08712405020. => spam
Wan2 win a Meet+Greet with Westlife 4 U or a m8? They are currently on what tour? 1)Unbreakable, 2)Untamed, 3)Unkempt. Text 1,2 or 3 to 83049. Cost 50p +std text => spam
You can stop further club tones by replying \STOP MIX\" See my-tone.com/enjoy. html for terms. Club tones cost GBP4.50/week. MFL => spam
Claim a 200 shopping spree, just call 08717895698 now! Have you won! MobStoreQuiz10ppm => spam
Come to me, slave. Your doing it again ... Going into your shell and unconsciously avoiding me ... You are making me unhappy :-( => ham
U meet other fren dun wan meet me ah

Unnamed: 0,sample_size,prompt,expected,result
0,50,"Romantic Paris. 2 nights, 2 flights from ï¿½79...",spam,spam
1,50,"URGENT! Your mobile No *********** WON a ï¿½2,...",spam,spam
2,50,Free 1st week entry 2 TEXTPOD 4 a chance 2 win...,spam,spam
3,50,Wan2 win a Meet+Greet with Westlife 4 U or a m...,spam,spam
4,50,You can stop further club tones by replying \S...,spam,spam
...,...,...,...,...
345,200,Aah! A cuddle would be lush! I'd need lots of ...,ham,ham
346,200,"No, I was trying it all weekend ;V",ham,ham
347,200,I am taking you for italian food. How about a ...,ham,ham
348,200,K da:)how many page you want?,ham,ham
