# __First contact__

In [3]:
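# this notebook uses the pre-1.0 SDK interface (openai.Engine, openai.FineTune, openai.Completion)
!pip install --upgrade "openai<1.0" python-dotenv

import os
import openai
from dotenv import load_dotenv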

In [4]:
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

In [7]:
openai.Engine.list()

<OpenAIObject list at 0x7f7af007cb80> JSON: {
  "data": [
    {
      "created": null,
      "id": "ada",
      "max_replicas": null,
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true,
      "ready_replicas": null,
      "replicas": null
    },
    {
      "created": null,
      "id": "babbage",
      "max_replicas": null,
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true,
      "ready_replicas": null,
      "replicas": null
    },
    {
      "created": null,
      "id": "curie",
      "max_replicas": null,
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true,
      "ready_replicas": null,
      "replicas": null
    },
    {
      "created": null,
      "id": "curie-instruct-beta",
      "max_replicas": null,
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": false,
      "ready_replicas": null,
      "re

# __Fine Tuning__

Your data must be a JSONL document, where each line is a prompt-completion pair corresponding to a training example.</br>
__Use the data_cleaning notebook__ to get this format: it prepends a space to every completion and appends the stop sequence ` END`, as recommended in OpenAI's fine-tuning guide.

In [21]:
{"prompt": "Something", "completion": " <ideal generated text> END"}
{"prompt": "", "completion": " <ideal generated text>. END"}
{"prompt": "Something ->", "completion": " <ideal generated text>. END"}
...

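The formatting that the data_cleaning notebook applies can be sketched in a few lines of Python. This is a minimal sketch, not that notebook's code: `to_record` and `write_jsonl` are hypothetical helper names, and the stop sequence ` END` is assumed to match the `stop=[" END"]` used when sampling from the models later on.

```python
import json

# Stop sequence appended to every completion (assumed to match stop=[" END"]).
STOP = " END"


def to_record(prompt: str, completion: str) -> dict:
    """Build one training example: completion starts with a space and ends with STOP."""
    text = completion.strip()
    if text.endswith(STOP):  # avoid doubling the stop sequence
        text = text[: -len(STOP)].rstrip()
    return {"prompt": prompt, "completion": " " + text + STOP}


def write_jsonl(pairs, path: str) -> None:
    """Write (prompt, completion) pairs as JSONL: one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            f.write(json.dumps(to_record(prompt, completion), ensure_ascii=False) + "\n")
```

For example, `write_jsonl([("Something ->", "a short poem")], "example.jsonl")` writes the line `{"prompt": "Something ->", "completion": " a short poem END"}`.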

## __CLI data preparation tool__

In [1]:
# tool which validates, gives suggestions and reformats your data:
!openai tools fine_tunes.prepare_data -f poem.json

Analyzing...

- Your file contains 13747 prompt-completion pairs

No remediations found.

You can use your file for fine-tuning:
> openai api fine_tunes.create -t "poem.json" --batch_size 2

After you’ve fine-tuned a model, remember that your prompt has to end with the indicator string `` for the model to start generating completions, rather than continuing with the prompt. Make sure to include `stop=[" END"]` so that the generated texts ends at the expected place.
Once your model starts training, it'll approximately take 3.19 hours to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you.
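A rough Python approximation of a few of these checks can be handy before uploading. This is a hypothetical helper (`check_jsonl`), not the tool's actual logic; it only checks the keys, the leading space and the ` END` suffix used in this notebook.

```python
import json


def check_jsonl(path: str, stop: str = " END"):
    """Count records and collect (line number, problem) pairs for a prompt-completion JSONL file."""
    problems = []
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            count += 1
            if set(record) != {"prompt", "completion"}:
                problems.append((lineno, "keys must be exactly 'prompt' and 'completion'"))
                continue
            completion = record["completion"]
            if not completion.startswith(" "):
                problems.append((lineno, "completion should start with a space"))
            if not completion.endswith(stop):
                problems.append((lineno, f"completion should end with {stop!r}"))
    return count, problems
```

Usage: `n, issues = check_jsonl("poem.json")`, then print `issues` and fix the reported lines before running the CLI tool.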


## __Actual fine-tuning from the console__

In [None]:
!openai api fine_tunes.create -t poem_train.json -v poem_test.json -m ada --learning_rate_multiplier 0.05 --batch_size 2

__Info about a running fine-tune job__

In [None]:
!openai api fine_tunes.get -i ft-QlE0A0CEfPng2dKSBGwgXdBT

__Resume following the event stream if it is interrupted__ (the job itself keeps running)

In [None]:
!openai api fine_tunes.follow -i ft-P0FXEjChVnAtZhL8ur24xTNa

[2021-12-01 00:31:53] Created fine-tune: ft-MaJ0hq12vBdziVnUX5D8oTUZ
[2021-12-01 00:32:01] Fine-tune enqueued. Queue number: 0
[2021-12-01 00:32:04] Fine-tune started
[2021-12-01 00:37:26] Completed epoch 1/4
[2021-12-01 00:42:15] Completed epoch 2/4
[2021-12-01 00:47:03] Completed epoch 3/4
[2021-12-01 00:51:53] Completed epoch 4/4
[2021-12-01 00:52:42] Uploaded model: babbage:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-11-30-23-52-40
[2021-12-01 00:52:45] Uploaded result file: file-ARlRSeqk73V3hlJwySUhS6gq
[2021-12-01 00:52:45] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m babbage:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-11-30-23-52-40 -p <YOUR_PROMPT>
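From Python, the `follow` loop can be approximated by polling `openai.FineTune.retrieve` (the pre-1.0 SDK call used further down) until the job reaches a terminal state. A minimal sketch: `wait_for_fine_tune` is a hypothetical helper, the retrieve function is passed in so the loop can be tried without the API, and treating `cancelled` as terminal is an assumption beyond the statuses listed in this notebook.

```python
import time
from typing import Callable, Mapping

# pending/running/succeeded/failed come from the notes below; "cancelled" is assumed.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_fine_tune(job_id: str,
                       retrieve: Callable[..., Mapping],
                       poll_seconds: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> Mapping:
    """Poll a fine-tune job until it reaches a terminal state; return the final job object."""
    while True:
        job = retrieve(id=job_id)
        status = job["status"]
        print(f"{job_id}: {status}")
        if status in TERMINAL_STATES:
            return job
        sleep(poll_seconds)
```

Usage: `job = wait_for_fine_tune("ft-MaJ0hq12vBdziVnUX5D8oTUZ", openai.FineTune.retrieve)`, then `job["fine_tuned_model"]` gives the model name to pass to `openai.Completion.create`.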


## __Tuning Hyperparameters__

The only required parameter is the training file.</br>
Tweaking the hyperparameters for fine-tuning can lead to higher-quality output.

__model:__ name of the base model to fine-tune. You can select one of "ada", "babbage", or "curie".</br>

__n_epochs__ - default 4. An epoch refers to one full cycle through the training dataset.</br>

__batch_size__ - defaults to ~0.2% of the number of examples in the training set, capped at 8. The batch size is the number of training examples used to train a single forward and backward pass. When use_packing is true, the batch size becomes the number of 2048-token contexts instead of the number of raw examples. In general, we've found that larger batch sizes tend to work better for larger datasets.</br>
    
__learning_rate_multiplier__ - defaults to 0.05. The fine-tuning learning rate is the original learning rate used for pretraining multiplied by this multiplier. We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best results. Empirically, we've found that larger learning rates often perform better with larger batch sizes.</br>
    
__use_packing / no_packing:__ defaults to use_packing for datasets with at least 500k tokens. On classification tasks and small datasets, we recommend setting no_packing, else use_packing. When using packing, we pack as many prompt-completion pairs as possible into each training example. This greatly increases the speed of a fine-tuning job.</br>

__compute_classification_metrics__ - defaults to False. If True and you are fine-tuning for a classification task, computes classification-specific metrics (accuracy, F-1 score, etc.) on the validation set at the end of every epoch.</br>
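These hyperparameters map onto CLI flags already used in this notebook (`-t`, `-v`, `-m`, `--n_epochs`, `--batch_size`, `--learning_rate_multiplier`, `--no_packing`). A minimal sketch that assembles the command from Python; `build_fine_tune_command` is a hypothetical helper, options left as `None` are simply omitted so the API defaults apply, and the range check only echoes the 0.02 to 0.2 suggestion above.

```python
def build_fine_tune_command(training_file, validation_file=None, model="curie",
                            n_epochs=None, batch_size=None,
                            learning_rate_multiplier=None, no_packing=False):
    """Assemble an `openai api fine_tunes.create` argument list."""
    args = ["openai", "api", "fine_tunes.create", "-t", training_file, "-m", model]
    if validation_file:
        args += ["-v", validation_file]
    if n_epochs is not None:
        args += ["--n_epochs", str(n_epochs)]
    if batch_size is not None:
        args += ["--batch_size", str(batch_size)]
    if learning_rate_multiplier is not None:
        if not 0.02 <= learning_rate_multiplier <= 0.2:
            print(f"note: {learning_rate_multiplier} is outside the suggested 0.02-0.2 range")
        args += ["--learning_rate_multiplier", str(learning_rate_multiplier)]
    if no_packing:
        args.append("--no_packing")
    return args
```

The list can be run with `subprocess.run(cmd, check=True)` or printed as a shell line with `shlex.join(cmd)`; returning a list instead of a string avoids quoting problems with file names.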


## __In the console__
__Every fine-tuning job starts from a base model, which defaults to curie. The choice of model influences both the performance of the model and the cost of running your fine-tuned model. Your model can be one of: ada, babbage, or curie.__</br>

***just for console***

The CLI reads the key from the environment, so __add it to your ~/.zshrc:__ </br>
<font color='green'>export OPENAI_API_KEY="YOUR_API_KEY"</font>

If the event stream is __interrupted for any reason, you can resume__ it by running:</br>
<font color='green'>openai api fine_tunes.follow -i YOUR_FINE_TUNE_JOB_ID</font>

__Feed the prepared file to a base model__ (here babbage); this takes several minutes.</br>
<font color='green'>openai api fine_tunes.create -t babyfood_new.json -m babbage</font>

__In addition to creating a fine-tune job, you can also list existing jobs, retrieve the status of a job, or cancel a job.__</br>
List all created fine-tunes </br>
<font color='green'>openai api fine_tunes.list</font>


__Retrieve the state of a fine-tune.__</br>
The resulting object includes the job status (one of pending, running, succeeded, or failed) and other information.</br>
<font color='green'>openai api fine_tunes.get -i YOUR_FINE_TUNE_JOB_ID</font>

__Cancel a job__</br>
<font color='green'>openai api fine_tunes.cancel -i YOUR_FINE_TUNE_JOB_ID</font>

__Analyzing your fine-tuned model__</br>
A results file is attached to each job once it has completed. Its ID is listed when you retrieve a fine-tune and in the fine-tune's events.</br>
<font color='green'>openai api fine_tunes.results -i YOUR_FINE_TUNE_JOB_ID</font>
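The results file is a CSV, so redirecting the command's output (e.g. `openai api fine_tunes.results -i YOUR_FINE_TUNE_JOB_ID > results.csv`) should give a file you can summarize. A minimal sketch; the column names `training_loss` and `validation_loss` are assumptions about the file's layout, and `summarize_results` is a hypothetical helper.

```python
import csv
import io


def summarize_results(csv_text: str) -> dict:
    """Summarize a fine-tune results CSV (assumed columns: training_loss, validation_loss)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def column(name):
        # Skip rows where the column is missing or empty (validation is not logged every step).
        return [float(r[name]) for r in rows if r.get(name) not in (None, "")]

    train = column("training_loss")
    valid = column("validation_loss")
    return {
        "steps": len(rows),
        "final_training_loss": train[-1] if train else None,
        "best_validation_loss": min(valid) if valid else None,
    }
```

Usage: `summarize_results(open("results.csv").read())`.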

## __Fine Tune details__

__Details of trained models__

In [30]:
# list all fine-tune jobs
print(openai.FineTune.list())

# details of one job
print(openai.FineTune.retrieve(id="ft-rM0VjMyIWkrDDqe61ltw9oLY"))

# immediately cancel a fine-tune job (uncomment to run)
# openai.FineTune.cancel(id="ft-IVKLoqPihI5FwLjTSfFXExjp")

# delete a fine-tuned model (uncomment and set the model name to run)
# FINE_TUNED_MODEL = "YOUR_FINE_TUNED_MODEL"
# openai.Model.delete(FINE_TUNED_MODEL)

# in console
# openai api models.delete -i <FINE_TUNED_MODEL>

# fine-grained status updates for a fine-tune job
print(openai.FineTune.list_events(id="ft-MaJ0hq12vBdziVnUX5D8oTUZ"))
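The `fine_tuned_model` names collected under "My trained models" can be pulled straight out of the `openai.FineTune.list()` response. A minimal sketch, assuming each job record in `data` carries `id`, `status` and `fine_tuned_model` fields; `trained_models` is a hypothetical helper.

```python
from typing import Any, Mapping


def trained_models(listing: Mapping[str, Any]) -> dict:
    """Map job ID -> fine-tuned model name for every succeeded job in a FineTune.list() response."""
    return {
        job["id"]: job["fine_tuned_model"]
        for job in listing["data"]
        if job.get("status") == "succeeded" and job.get("fine_tuned_model")
    }
```

Usage: `trained_models(openai.FineTune.list())`.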

## __Upload a file__ 
...that contains document(s) to be used across various endpoints/features. All files uploaded by one organization can total up to 1 GB.</br>
For the search endpoint, the ID of an uploaded file can be passed instead of inline documents; specify either documents or a file, but not both.</br>
If __purpose__ is set to "fine-tune", each line must be a JSON record with "prompt" and "completion" fields.</br>

In [21]:
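# open in binary mode and close the file once the upload is done
with open("poem2.json", "rb") as f:
    uploaded = openai.File.create(file=f, purpose="fine-tune")
print(uploaded["id"])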

## __File handling__

In [21]:
# list uploaded files
openai.File.list()

# delete a file (class method; calling .delete() on an instance fails)
openai.File.delete("file-oKSSO8yDnWJxI1b1mfAXjQly")

# equivalent shell call, reading the key from the environment
# !curl https://api.openai.com/v1/files/file-SCYKCs0T31IEupjlCEotC6CP -X DELETE -H "Authorization: Bearer $OPENAI_API_KEY"

# retrieve file
openai.File.retrieve("file-XjGxS3KTG0uNmNOK362iJua3")

# retrieve file content
content = openai.File.download("file-XjGxS3KTG0uNmNOK362iJua3")

# fine-tune with an uploaded file
openai.FineTune.create(training_file="file-XGinujblHPwGLSztz8cPS8XY")
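File IDs are easy to lose track of. Since `openai.File.list()` returns the uploaded files under `data`, a small filter can look up the ID for a filename before passing it to `openai.FineTune.create`. A minimal sketch: `file_ids_by_name` is a hypothetical helper, and it assumes each record carries `id`, `filename`, `purpose` and `created_at` fields.

```python
from typing import Any, Mapping


def file_ids_by_name(listing: Mapping[str, Any], filename: str,
                     purpose: str = "fine-tune") -> list:
    """Return IDs of uploaded files with this filename and purpose, newest first."""
    matches = [f for f in listing["data"]
               if f["filename"] == filename and f.get("purpose") == purpose]
    matches.sort(key=lambda f: f.get("created_at", 0), reverse=True)
    return [f["id"] for f in matches]
```

Usage: `ids = file_ids_by_name(openai.File.list(), "poem2.json")`, then `openai.FineTune.create(training_file=ids[0])` fine-tunes on the most recent upload.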

# __Talk to the fine tuned models__

## __CURIE the Rapper__
trained on selected rap snippets

In [143]:
CURIE = "curie:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-11-30-19-59-03"

In [141]:
prompt_curie = '''The following is a rap song written about the future. 
It is written in a pompous style with a lot of rhymes.'''

In [144]:
response_curie = openai.Completion.create(
  model=CURIE,
  prompt=prompt_curie,
  temperature=0.8,
  max_tokens=128,
  frequency_penalty=0.25,
  presence_penalty=0.75,
  stop=[" END"])

In [146]:
print(response_curie["choices"][0]["text"])


The rhymes are extremely clever, intense and lethal.
The concept of the lyrics is very futuristic in nature but
I think the world will never see it because I represent
the underground!
The true original rap for the heads who came to kick it.
The street has a name and he don't play games like that!
I'm on a mission from God to destroy these rhymes and this record is hard as hell to top.


## __Lifestyle ADA__ 
trained on lifestyle tweets

In [7]:
ADA = "ada:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-12-01-09-03-54"

In [44]:
prompt_ada = '''
The following is a poem about cyberwar.
It is written in the style of Charles Bukowski.
Follow the rhyming pattern. Be creative.
'''

In [48]:
response_ada = openai.Completion.create(
        model=ADA,
        prompt=prompt_ada,
        max_tokens=128,
        temperature=0.9,
        top_p=1,
        frequency_penalty=0.85,
        presence_penalty=0.75,
        n=1,
        stop=[" END"])

In [49]:
print(response_ada["choices"][0]["text"])

dr katie rossman  more friday stories from your voice.


## __BABBAGE the musician__
trained on tweets that apparently belong to a musician

In [157]:
BABBAGE = "babbage:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-11-30-23-52-40"

In [181]:
prompt_babbage = 'Poem: ->'

In [182]:
response_babbage = openai.Completion.create(
        model=BABBAGE,
        prompt=prompt_babbage,
        temperature=0.8,
        max_tokens=128,
        frequency_penalty=0.25,
        presence_penalty=0.75,
        n=1,
        stop=[" END"])

In [183]:
print(response_babbage["choices"][0]["text"])

 i recommend every single album but especially the one i wrote on my best friend in justify your love.


## __ADA the POET__
trained on 8100 poems from Kaggle

In [8]:
ADA_POET = "ada:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-12-01-19-53-04"

In [21]:
poet_ada_prompt = 'the following is a poem about the cyberspace '

In [22]:
response_ada_poet = openai.Completion.create(
        model=ADA_POET,
        prompt=poet_ada_prompt,
        temperature=0.8,
        max_tokens=128,
        frequency_penalty=0.85,
        presence_penalty=0.75,
        n=1,
        stop=[" END"])

In [23]:
print(response_ada_poet["choices"][0]["text"])

, big and small, that I inhabit so that others may live as they would have lived; I am a poet because of this. And what is created is produced by everyone who uses my body to form it. The other side exists only in the space between us when we are intimate. How many times has the turning point arrived for me? Two million dollars, which is not enough for anyone to live on alone and to be comfortable where they are without having to leave their home town or engage with those around them in ways that may make their lives less adequate.
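The four sampling cells above repeat the same call with slightly different settings. A minimal sketch of a shared helper; `generate` and `DEFAULTS` are hypothetical names, the defaults copy the values used for CURIE and BABBAGE, and the completion function is passed in (e.g. `openai.Completion.create`) so the helper can be tried without the API.

```python
from typing import Callable, Mapping

# Sampling settings shared by the cells above; each call can override them.
DEFAULTS = dict(temperature=0.8, max_tokens=128, frequency_penalty=0.25,
                presence_penalty=0.75, n=1, stop=[" END"])


def generate(create: Callable[..., Mapping], model: str, prompt: str, **overrides) -> str:
    """Call a completion function with shared defaults and return the first choice's text."""
    params = {**DEFAULTS, **overrides}
    response = create(model=model, prompt=prompt, **params)
    return response["choices"][0]["text"].strip()
```

Usage: `print(generate(openai.Completion.create, ADA_POET, poet_ada_prompt, frequency_penalty=0.85))` reproduces the ADA the POET call.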


# __My trained models__
__Curie can Rap__
</br>
"fine_tuned_model": "curie:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-11-30-19-59-03"</br>
"id": "ft-rM0VjMyIWkrDDqe61ltw9oLY"</br>
"filename": "raps.json"</br>
"id": "file-IzLMmK7TxJd8tEUe0wdbwnQZ"</br>
params</br>
--batch_size  1 --learning_rate_multiplier 0.04 --n_epochs 4 --prompt_loss_weight 0.1 --no_packing 

__Babbage the musician__</br>
"fine_tuned_model": "babbage:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-11-30-23-52-40"</br>
"id": "ft-MaJ0hq12vBdziVnUX5D8oTUZ"</br>
"filename": "musician.json"</br>
"id": "file-XmyaJgqSXoxCx8iGwa5hpM8h"</br>
params</br>
--learning_rate_multiplier 0.04 --batch_size 4


__ada's Lifestyle__</br>
"fine_tuned_model": "ada:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-12-01-09-03-54"</br>
"id": "ft-P0FXEjChVnAtZhL8ur24xTNa"</br>
"filename": "lifestyle.json"</br>
"id": "file-wBtY1KEgkaM646pRCxniyPEI"</br>
params</br>
-m ada --learning_rate_multiplier 0.03 --batch_size 5 --n_epochs 5

__ada the poet__</br>
Trained on 8110 modern poems from Kaggle, with a 6000-poem set for validation</br>
"fine_tuned_model": "ada:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-12-01-19-53-04"</br>
"id": "ft-QlE0A0CEfPng2dKSBGwgXdBT"</br>
params</br>
-t poem_train.json -v poem_test.json -m ada --learning_rate_multiplier 0.05 --batch_size 2</br>