### OpenAI's guide to fine-tuning: https://platform.openai.com/docs/guides/fine-tuning

In [1]:
import os
import openai


In [95]:
openai.api_key = os.getenv("OPENAI_API_KEY")

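def test_prompt(prompt, suppress=False, model='text-davinci-003', max_tokens=256, **kwargs):
    """Send `prompt` to a completion model and print the prompt followed by the completion.

    Extra keyword arguments (e.g. temperature) are passed through to openai.Completion.create.
    """
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
        **kwargs
    )
    if not suppress:
        print(f'PROMPT:\n------\n{prompt}\n------\nRESPONSE\n------\n{prompt}{response.choices[0].text}')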


In [7]:
examples = [
    # three labeled examples (the few-shot part)
    'Review: This movie sucks\nSubjective: Yes',
    'Review: This tv show was about the ocean\nSubjective: No',
    'Review: This book had a lot of flaws\nSubjective: Yes',

    # the unlabeled example we want the model to complete
    'Review: The book was about WWII\nSubjective:',
]

test_prompt('\n###\n'.join(examples))  # ### is a common few-shot separator

PROMPT:
------
Review: This movie sucks
Subjective: Yes
###
Review: This tv show was about the ocean
Subjective: No
###
Review: This book had a lot of flaws
Subjective: Yes
###
Review: The book was about WWII
Subjective:
------
RESPONSE
------
Review: This movie sucks
Subjective: Yes
###
Review: This tv show was about the ocean
Subjective: No
###
Review: This book had a lot of flaws
Subjective: Yes
###
Review: The book was about WWII
Subjective: No
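
A minimal sketch of how the few-shot prompt above could be assembled from (review, label) pairs instead of hand-written strings. The helper name `build_few_shot_prompt` and its parameters are made up for illustration; they are not part of the OpenAI library.

```python
def build_few_shot_prompt(labeled, query, separator='\n###\n'):
    """Build a few-shot prompt.

    labeled:   list of (review, label) pairs used as worked examples
    query:     the unlabeled review we want the model to classify
    separator: string placed between examples (### as in the cell above)
    """
    blocks = [f'Review: {review}\nSubjective: {label}' for review, label in labeled]
    # The final block ends right after "Subjective:" so the model fills in the label.
    blocks.append(f'Review: {query}\nSubjective:')
    return separator.join(blocks)


few_shot_prompt = build_few_shot_prompt(
    [
        ('This movie sucks', 'Yes'),
        ('This tv show was about the ocean', 'No'),
        ('This book had a lot of flaws', 'Yes'),
    ],
    'The book was about WWII',
)
print(few_shot_prompt)
```

The resulting string is the same prompt that was passed to `test_prompt` above, so it can be used as a drop-in replacement.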


In [61]:
prompts = [example.split('Subjective: ')[0]+'Subjective:' for example in examples[:-1]]
labels = [example.split('Subjective: ')[1] for example in examples[:-1]]

list(zip(prompts, labels))

[('Review: This movie sucks\nSubjective:', 'Yes'),
 ('Review: This tv show was about the ocean\nSubjective:', 'No'),
 ('Review: This book had a lot of flaws\nSubjective:', 'Yes')]

In [113]:
rows = []

for prompt, label in zip(prompts, labels):
    # Don't forget the space before the label. It's not strictly required, but GPT is
    # used to predicting a leading space, so including it makes training easier.
    rows.append({"prompt": prompt, "completion": f' {label}'})


In [114]:
rows

[{'prompt': 'Review: This movie sucks\nSubjective:', 'completion': ' Yes'},
 {'prompt': 'Review: This tv show was about the ocean\nSubjective:',
  'completion': ' No'},
 {'prompt': 'Review: This book had a lot of flaws\nSubjective:',
  'completion': ' Yes'}]

### More info on Fine-tuning: https://help.openai.com/en/articles/6811186-how-do-i-format-my-fine-tuning-data

In [126]:
import json

with open('../data/openai-fine-tuning-10.jsonl', 'w') as outfile:
    for entry in rows:
        json.dump(entry, outfile)
        outfile.write('\n')
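
Before uploading, it can be worth sanity-checking the JSONL lines. The sketch below is only an illustration of the checks this notebook cares about (valid JSON, exactly a `prompt` and a `completion` key, completion starting with a space); `check_jsonl_rows` is a made-up helper, and the second sample row is invented to show a failure.

```python
import json


def check_jsonl_rows(lines):
    """Return a list of (line_number, problem) tuples for prompt/completion rows."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, 'not valid JSON'))
            continue
        if not isinstance(row, dict) or set(row) != {'prompt', 'completion'}:
            problems.append((number, 'expected exactly the keys "prompt" and "completion"'))
            continue
        if not all(isinstance(value, str) for value in row.values()):
            problems.append((number, 'prompt and completion must both be strings'))
            continue
        if not row['completion'].startswith(' '):
            problems.append((number, 'completion does not start with a space'))
    return problems


sample_lines = [
    json.dumps({'prompt': 'Review: This movie sucks\nSubjective:', 'completion': ' Yes'}),
    # Invented row with the leading space missing, to show what a problem looks like
    json.dumps({'prompt': 'Review: The plot dragged\nSubjective:', 'completion': 'Yes'}),
]
print(check_jsonl_rows(sample_lines))
# [(2, 'completion does not start with a space')]
```

To check the file written above, pass the open file object (or `f.read().splitlines()`) as `lines`.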

In [127]:
!openai api fine_tunes.create -t "../data/openai-fine-tuning-10.jsonl" --model davinci --n_epochs 10


Upload progress: 100%|█████████████████████████| 249/249 [00:00<00:00, 126kit/s]
Uploaded file from ../data/openai-fine-tuning-10.jsonl: file-zdP7l4c5Bw1Iz406A5zrnuEJ
Created fine-tune: ft-c7QutbCMSGuDOxIEDmxUy8Aq
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-02-13 13:27:40] Created fine-tune: ft-c7QutbCMSGuDOxIEDmxUy8Aq

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-c7QutbCMSGuDOxIEDmxUy8Aq



### GET READY TO WAIT IN A QUEUE

In [128]:
!openai api fine_tunes.follow -i ft-c7QutbCMSGuDOxIEDmxUy8Aq

[2023-02-13 13:27:40] Created fine-tune: ft-c7QutbCMSGuDOxIEDmxUy8Aq
[2023-02-13 13:40:34] Fine-tune costs $0.01
[2023-02-13 13:40:34] Fine-tune enqueued. Queue number: 0
[2023-02-13 13:40:34] Fine-tune is in the queue. Queue number: 0
[2023-02-13 13:40:35] Fine-tune started
[2023-02-13 13:42:58] Completed epoch 1/10
[2023-02-13 13:42:59] Completed epoch 2/10
[2023-02-13 13:43:00] Completed epoch 3/10
[2023-02-13 13:43:01] Completed epoch 4/10
[2023-02-13 13:43:02] Completed epoch 5/10
[2023-02-13 13:43:03] Completed epoch 6/10
[2023-02-13 13:43:04] Completed epoch 7/10
[2023-02-13 13:43:05] Completed epoch 8/10
[2023-02-13 13:43:06] Completed epoch 9/10
[2023-02-13 13:43:07] Completed epoch 10/10
[2023-02-13 13:43:39] Uploaded model: davinci:ft-personal-2023-02-13-21-43-39
[2023-02-13 13:43:40] Uploaded result file: file-Hwd2xXE7dCxU5lT9fpcbNnet
[2023-02-13 13:43:40] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m

In [129]:
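# Base GPT-3.5 (text-davinci-003), zero-shot: without the few-shot examples
# it doesn't know we want a Yes/No label and just continues the text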
test_prompt(
    'Review: The book was about WWII\nSubjective:'
)

PROMPT:
------
Review: The book was about WWII
Subjective:
------
RESPONSE
------
Review: The book was about WWII
Subjective: I really enjoyed reading this book. It was very historical and interesting.


In [123]:
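# The name comes from the "Uploaded model" line of a fine-tune run
# (this one is from an earlier run of this notebook)
FINE_TUNED_MODEL = 'davinci:ft-personal-2023-02-13-21-24-55'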

In [124]:
# Our fine-tuned GPT-3 model

# Don't forget to format the prompt exactly as it was formatted in the fine-tuning data.
#  In our case, the format is unchanged.

# The answer below ("Yes") is wrong, which is expected with only three training examples;
#  you want at least a hundred.
test_prompt(
    'Review: The book was about WWII\nSubjective:',
    model=FINE_TUNED_MODEL,
    max_tokens=1,
)

PROMPT:
------
Review: The book was about WWII
Subjective:
------
RESPONSE
------
Review: The book was about WWII
Subjective: Yes


# PROS
### - No need for few-shot
### - Shorter prompts (theoretically no need for "Question: ")
### - Don't need that many examples
### - Saves space in the token window
### - More aligned with what WE want from GPT, not OpenAI
### - Optional classification metrics can be calculated given a testing set
### - Can rely on smaller/faster/cheaper models IF YOU HAVE ENOUGH (at least ~100) EXAMPLES
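
A rough sketch of the "classification metrics given a testing set" point above. Everything here is illustrative: `evaluate` is a made-up helper, `always_yes` is a stand-in for a real call to the fine-tuned model (so the cell runs without an API key), and the held-out rows are invented.

```python
def evaluate(predict, test_rows):
    """Score a classifier on held-out prompt/completion rows.

    predict:   callable mapping a prompt string to the model's completion text
    test_rows: list of {"prompt": ..., "completion": ...} dicts
    Returns (accuracy, confusion), where confusion counts (expected, predicted) pairs.
    """
    correct = 0
    confusion = {}
    for row in test_rows:
        # Strip the leading space used in the fine-tuning completions
        expected = row['completion'].strip()
        predicted = predict(row['prompt']).strip()
        pair = (expected, predicted)
        confusion[pair] = confusion.get(pair, 0) + 1
        correct += expected == predicted
    return correct / len(test_rows), confusion


def always_yes(prompt):
    """Stand-in for the fine-tuned model: always answers ' Yes'."""
    return ' Yes'


held_out = [
    {'prompt': 'Review: The film was boring\nSubjective:', 'completion': ' Yes'},
    {'prompt': 'Review: The book was about WWII\nSubjective:', 'completion': ' No'},
]

accuracy, confusion = evaluate(always_yes, held_out)
print(accuracy)   # 0.5
print(confusion)  # {('Yes', 'Yes'): 1, ('No', 'Yes'): 1}
```

In practice `predict` would wrap a completion request to the fine-tuned model with `max_tokens=1`, as in the cell above.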

# CONS
### - Waiting in queue
### - No built-in performance metrics for tasks other than simple classification (translation, summarization, Q/A, etc.)
### - Can only fine-tune language modeling (predicting next token), can't fine-tune RLHF (for now)
### - Very little control over traditional deep learning fine-tuning parameters (like learning rate, weight decay, etc)