# __First contact__

In [2]:
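# this notebook uses the pre-1.0 openai client (openai.Engine, openai.FineTune, openai.File, ...),
# which was removed in openai>=1.0, so pin the version and restart the kernel after installing
!pip install "openai<1.0" python-dotenv

import os
import openai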

In [3]:
from dotenv import load_dotenv

In [4]:
load_dotenv()

True

In [8]:
openai.api_key = os.getenv("OPENAI_API_KEY")

In [23]:
openai.Engine.list()

<OpenAIObject list at 0x7f97fc75d770> JSON: {
  "data": [
    {
      "created": null,
      "id": "ada",
      "max_replicas": null,
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true,
      "ready_replicas": null,
      "replicas": null
    },
    {
      "created": null,
      "id": "babbage",
      "max_replicas": null,
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": false,
      "ready_replicas": null,
      "replicas": null
    },
    {
      "created": null,
      "id": "curie",
      "max_replicas": null,
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true,
      "ready_replicas": null,
      "replicas": null
    },
    {
      "created": null,
      "id": "curie-instruct-beta",
      "max_replicas": null,
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": false,
      "ready_replicas": null,
      "r

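The listing is a JSON object whose `data` array holds one entry per engine, each with an `id` and a `ready` flag. Below is a minimal sketch that keeps only the engines reported as ready. `ready_engine_ids` is a hypothetical helper, shown here on a trimmed copy of the output above. Because the pre-1.0 `OpenAIObject` subclasses `dict`, passing the result of `openai.Engine.list()` directly should also work, but treat that as an assumption for your installed version.

```python
from typing import Any, Dict, List


def ready_engine_ids(listing: Dict[str, Any]) -> List[str]:
    """Return the IDs of engines whose "ready" flag is true."""
    return [engine["id"] for engine in listing.get("data", []) if engine.get("ready")]


# Trimmed copy of the listing above (only the fields used here)
example = {
    "data": [
        {"id": "ada", "ready": True},
        {"id": "babbage", "ready": False},
        {"id": "curie", "ready": True},
        {"id": "curie-instruct-beta", "ready": False},
    ]
}
print(ready_engine_ids(example))  # ['ada', 'curie']
```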
# __Fine Tuning__

Your data must be a JSONL document, where each line is a prompt-completion pair corresponding to a training example. 
Use the data_cleaning notebook to produce this format: it prepends a whitespace to every completion and appends the stop sequence ` END`, as OpenAI's fine-tuning guide asks for.

In [21]:
{"prompt": "<prompt text>", "completion": " <ideal generated text> END"}
{"prompt": "<prompt text>", "completion": " <ideal generated text> END"}
{"prompt": "<prompt text>", "completion": " <ideal generated text> END"}
...
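If you want to reproduce the formatting step outside the data_cleaning notebook (whose code is not shown here), here is a minimal sketch. Each completion gets a single leading space and the ` END` stop sequence, and every record is written as one JSON line. `to_record` and `write_jsonl` are hypothetical helper names, not part of the openai package.

```python
import json
from typing import Iterable, Tuple

STOP = " END"  # stop sequence appended to every completion, as in the format above


def to_record(prompt: str, completion: str) -> dict:
    """Build one prompt-completion record: strip the completion, then add a
    leading space and the stop sequence."""
    return {"prompt": prompt, "completion": " " + completion.strip() + STOP}


def write_jsonl(pairs: Iterable[Tuple[str, str]], path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            f.write(json.dumps(to_record(prompt, completion), ensure_ascii=False) + "\n")
            count += 1
    return count
```

Usage: `write_jsonl(pairs, "my_training_data.jsonl")`, where `pairs` is any iterable of `(prompt, completion)` tuples and the output path is hypothetical.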

## __CLI data preparation tool__

In [74]:
# CLI tool that validates the data, suggests improvements and reformats it to JSONL if needed:
!openai tools fine_tunes.prepare_data -f babyfood_new.json

Analyzing...

- Your file contains 58047 prompt-completion pairs

No remediations found.

You can use your file for fine-tuning:
> openai api fine_tunes.create -t "babyfood_new.json" --batch_size 1

After you’ve fine-tuned a model, remember that your prompt has to end with the indicator string `` for the model to start generating completions, rather than continuing with the prompt. Make sure to include `stop=[" ###"]` so that the generated texts ends at the expected place.
Once your model starts training, it'll approximately take 13.33 hours to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you.
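`prepare_data` already runs these checks. For a quick local sanity check before uploading, here is a minimal sketch that flags lines that are not valid JSON, records missing a `prompt` or `completion` field, completions without the leading space, and completions that do not end with the chosen stop sequence. The stop string is a parameter because it differs between datasets: the tool suggests ` ###` for this file, while the data_cleaning notebook uses ` END`. `check_jsonl` is a hypothetical helper.

```python
import json
from typing import List


def check_jsonl(path: str, stop: str = " END") -> List[str]:
    """Return human-readable problems found in a prompt-completion JSONL file."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append(f"line {lineno}: invalid JSON ({err.msg})")
                continue
            if not isinstance(record, dict) or not {"prompt", "completion"} <= set(record):
                problems.append(f"line {lineno}: missing 'prompt' or 'completion'")
                continue
            completion = record["completion"]
            if not isinstance(completion, str):
                problems.append(f"line {lineno}: completion is not a string")
                continue
            if not completion.startswith(" "):
                problems.append(f"line {lineno}: completion should start with a space")
            if not completion.endswith(stop):
                problems.append(f"line {lineno}: completion should end with {stop!r}")
    return problems
```

For example, `check_jsonl("babyfood_new.json", stop=" ###")` should come back empty if the file matches what the tool reported.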


## __Actual fine-tuning from the console__

__Every fine-tuning job starts from a base model, which defaults to curie. The choice of model influences both the performance of the model and the cost of running your fine-tuned model. Your model can be one of: ada, babbage, or curie.__</br>

***just for console***

The CLI reads the key from the environment, so __add it to your ~/.zshrc:__ </br>
<font color='green'>export OPENAI_API_KEY="YOUR_API_KEY"</font>

__Start a fine-tune job on the prepared file__ (here with babbage as the base model); this can take some minutes.</br>
<font color='green'>openai api fine_tunes.create -t babyfood_new.json -m babbage</font>

If the event stream is __interrupted for any reason, you can resume__ it by running:</br>
<font color='green'>openai api fine_tunes.follow -i YOUR_FINE_TUNE_JOB_ID</font>

__In addition to creating a fine-tune job, you can also list existing jobs, retrieve the status of a job, or cancel a job.__</br>
List all created fine-tunes </br>
<font color='green'>openai api fine_tunes.list</font>


__Retrieve the state of a fine-tune.__</br>
The resulting object includes: job status (which can be one of pending, running, succeeded, or failed) and other information</br>
<font color='green'>openai api fine_tunes.get -i YOUR_FINE_TUNE_JOB_ID</font>

__Cancel a job__</br>
<font color='green'>openai api fine_tunes.cancel -i YOUR_FINE_TUNE_JOB_ID</font>

__Analyzing your fine-tuned model__</br>
Once a job has completed, a results file is attached to it. Its file ID is listed when you retrieve the fine-tune and in the fine-tune's events. Download the results as CSV with:</br>
<font color='green'>openai api fine_tunes.results -i YOUR_FINE_TUNE_JOB_ID > results.csv</font>
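The results file is a CSV with one row per training step. Below is a minimal sketch that summarizes one metric column with the standard `csv` module. The column names used here (`step`, `training_loss`) are assumptions about what the legacy results files contain, so check the header of your own `compiled_results.csv` first. `summarize_metric` is a hypothetical helper.

```python
import csv
from typing import Dict


def summarize_metric(path: str, column: str = "training_loss") -> Dict[str, float]:
    """First, last and minimum value of one metric column in a results CSV."""
    values = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            value = row.get(column)
            if value:  # skip rows where this metric was not logged
                values.append(float(value))
    if not values:
        raise ValueError(f"no values found for column {column!r}")
    return {"first": values[0], "last": values[-1], "min": min(values), "steps": len(values)}
```

Usage: `summarize_metric("results.csv")` after downloading the file with the command above.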

## __Hyperparameters__

The only required parameter is the training file.</br>
Tweaking the hyperparameters used for fine-tuning can lead to higher-quality output.

__model:__ name of the base model to fine-tune. You can select one of "ada", "babbage", or "curie".</br>

__n_epochs__ - defaults to 4. An epoch refers to one full cycle through the training dataset.</br>

__batch_size__ - defaults to ~0.2% of the number of examples in the training set, capped at 8. The batch size is the number of training examples used to train a single forward and backward pass. When use_packing is true, the batch size becomes the number of 2048-token contexts instead of the number of raw examples. In general, we've found that larger batch sizes tend to work better for larger datasets.</br>
    
__learning_rate_multiplier__ - defaults to 0.05. The fine-tuning learning rate is the original learning rate used for pretraining multiplied by this multiplier. We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best results. Empirically, we've found that larger learning rates often perform better with larger batch sizes.</br>
    
__use_packing / no_packing:__ defaults to use_packing for datasets with at least 500k tokens. On classification tasks and small datasets, we recommend setting no_packing, else use_packing. When using packing, we pack as many prompt-completion pairs as possible into each training example. This greatly increases the speed of a fine-tuning job.</br>

__compute_classification_metrics__ - defaults to False. If True and you are fine-tuning for a classification task, computes classification-specific metrics (accuracy, F1 score, etc.) on the validation set at the end of every epoch.</br>
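As a rough worked example of how `n_epochs` and `batch_size` interact, each epoch of an unpacked run takes ceil(examples / batch_size) steps. This ignores packing, which changes what counts as one example. For the 58047 pairs reported by `prepare_data` with `--batch_size 1` and the default 4 epochs, that is 4 × 58047 = 232188 steps. `training_steps` is a hypothetical helper, not how the API itself reports progress.

```python
import math


def training_steps(n_examples: int, batch_size: int, n_epochs: int = 4) -> int:
    """Steps for a plain (unpacked) run: n_epochs * ceil(n_examples / batch_size)."""
    if n_examples < 1 or batch_size < 1 or n_epochs < 1:
        raise ValueError("all arguments must be positive")
    return n_epochs * math.ceil(n_examples / batch_size)


print(training_steps(58047, 1))  # 232188
print(training_steps(58047, 8))  # 29024
```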


In [None]:
%%bash
# shell command: the CLI reads OPENAI_API_KEY from the environment
openai api fine_tunes.create \
    -t file-JD89ePi5KMsB3Tayeli5ovfW \
    -m ada \
    --use_packing \
    --n_epochs 1
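The same job can be started from Python with the pre-1.0 client, whose `openai.FineTune.create` forwards keyword arguments to the API. Below is a minimal sketch, assuming the endpoint still accepts the hyperparameter names listed above (`use_packing` in particular belongs to the 2021 version of the endpoint). `build_fine_tune_params` is a hypothetical helper that drops unset values so the API defaults apply.

```python
from typing import Any, Dict, Optional

BASE_MODELS = {"ada", "babbage", "curie"}


def build_fine_tune_params(
    training_file: str,
    model: str = "curie",
    n_epochs: Optional[int] = None,
    batch_size: Optional[int] = None,
    learning_rate_multiplier: Optional[float] = None,
    use_packing: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: keep only the hyperparameters that were set explicitly."""
    if model not in BASE_MODELS:
        raise ValueError(f"model must be one of {sorted(BASE_MODELS)}")
    params = {
        "training_file": training_file,
        "model": model,
        "n_epochs": n_epochs,
        "batch_size": batch_size,
        "learning_rate_multiplier": learning_rate_multiplier,
        "use_packing": use_packing,
    }
    return {key: value for key, value in params.items() if value is not None}


params = build_fine_tune_params(
    "file-JD89ePi5KMsB3Tayeli5ovfW", model="ada", n_epochs=1, use_packing=True
)
# With the pre-1.0 client configured as at the top of this notebook:
# openai.FineTune.create(**params)
```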

## __Fine-tune details__

In [75]:
# list all fine-tune jobs
openai.FineTune.list()

<OpenAIObject list at 0x7f47e25e5e50> JSON: {
  "data": [
    {
      "created_at": 1638189561,
      "fine_tuned_model": "babbage:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-11-29-13-04-27",
      "hyperparams": {
        "batch_size": 1,
        "learning_rate_multiplier": 0.05,
        "n_epochs": 4,
        "prompt_loss_weight": 0.1,
        "use_packing": true
      },
      "id": "ft-IVKLoqPihI5FwLjTSfFXExjp",
      "model": "babbage",
      "object": "fine-tune",
      "organization_id": "org-9ghUN8S9cCwE0yzpHXHDbEGa",
      "result_files": [
        {
          "bytes": 154961,
          "created_at": 1638191070,
          "filename": "compiled_results.csv",
          "id": "file-SCYKCs0T31IEupjlCEotC6CP",
          "object": "file",
          "purpose": "fine-tune-results",
          "status": "processed",
          "status_details": null
        }
      ],
      "status": "succeeded",
      "training_files": [
        {
          "bytes": 8353221,
          "created_at": 1638189560

In [None]:
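# legacy fine-tune job IDs start with "ft-", e.g. "ft-IVKLoqPihI5FwLjTSfFXExjp" listed above
job_id = "YOUR_FINE_TUNE_JOB_ID"

# retrieve a job's details (status, hyperparameters, result files, ...)
openai.FineTune.retrieve(id=job_id)

# immediately cancel a fine-tune job (only meaningful while it is pending or running)
openai.FineTune.cancel(id=job_id)

# get fine-grained status updates (events) for a fine-tune job
openai.FineTune.list_events(id=job_id)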
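To wait for a job without re-running `retrieve` by hand, here is a minimal polling sketch. `wait_for_fine_tune` is a hypothetical helper. It takes the retrieve function as a parameter, which defaults to the pre-1.0 `openai.FineTune.retrieve`, so it can be tried without network access. It stops on the terminal statuses listed earlier (succeeded, failed) and on `cancelled`. That last status name is an assumption about what a cancelled job reports.

```python
import time
from typing import Any, Callable, Mapping, Optional

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}  # "cancelled" is assumed


def wait_for_fine_tune(
    job_id: str,
    retrieve: Optional[Callable[..., Mapping[str, Any]]] = None,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 24 * 3600,
) -> Mapping[str, Any]:
    """Poll a fine-tune job until it reaches a terminal status, then return it."""
    if retrieve is None:
        import openai  # pre-1.0 client, configured as at the top of this notebook

        retrieve = openai.FineTune.retrieve
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = retrieve(id=job_id)
        status = job["status"]
        if status in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_id} still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

Usage: `job = wait_for_fine_tune(job_id)`, then read the model name from `job["fine_tuned_model"]`.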

## __Try the fine-tuned model__

In [10]:
# name of the fine-tuned model (the "fine_tuned_model" field of the job), not the job ID
FINE_TUNED_MODEL = "babbage:ft-user-r4iwcdpoxftbglfzc8c2mfn7-2021-11-29-13-04-27"

Note that no engine is specified on these requests: the fine-tuned model is passed through the `model` parameter instead.

## __Create a prompt and get a completion__

In [20]:
prompt4 = '''
Rewrite the following rap verse about technology. Follow the rhyming pattern. ->:

Hold the cold one like he hold a old gun\n
Like he hold the microphone and stole the show for fun\n
Or a foe for ransom, flows is handsome\n
O's in tandem, anthem, random, tantrum\n
Phantom of the Grand Ole Opry ask the dumb hottie\n
Masked pump shotty, somebody stop me\n
Hardly come sloppy on a retarded hard copy\n
After rockin' parties he departed in a jalopy\n END

Rewrite the following rap verse about technology. Follow the rhyming pattern. ->:'''

In [21]:
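fine_response4 = openai.Completion.create(
    model=FINE_TUNED_MODEL,  # fine-tuned model name instead of an engine
    prompt=prompt4,
    max_tokens=50,
    temperature=0.84,
    presence_penalty=1,
    frequency_penalty=0,
    n=1,
    stop=[" END"])  # prepare_data suggested stop=[" ###"] for the babyfood training data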

In [22]:
print(fine_response4["choices"][0]["text"])

 that thing that makes the world go round is technology ### i think was trying to make a broader point about how much things have changed since the th ### opening statement looks like it will be an interesting yearhope you can stay with me ### watch president
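The completion above runs past its first sentence. This model was fine-tuned on `babyfood_new.json`, and it appears to end its outputs with ` ###`, the stop sequence `prepare_data` suggested for that file. The request, however, only stops on ` END`. Passing `stop=[" ###"]` is the direct fix. Below is a minimal sketch of the same cut applied client-side to text that was already generated. `cut_at_stop` is a hypothetical helper.

```python
from typing import Iterable


def cut_at_stop(text: str, stops: Iterable[str]) -> str:
    """Truncate text at the earliest occurrence of any stop sequence,
    then trim surrounding whitespace."""
    cut = len(text)
    for stop in stops:
        if not stop:
            continue  # an empty stop string would cut everything
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].strip()


text = " that thing that makes the world go round is technology ### i think was trying"
print(cut_at_stop(text, [" END", " ###"]))  # that thing that makes the world go round is technology
```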


## __Upload a file__ 
...that contains document(s) to be used across various endpoints/features.</br>
All files uploaded by one organization can together take up to 1 GB.</br>
Endpoints that search over documents take either inline documents or the ID of an uploaded file, but not both.</br>
If __purpose__ is set to "fine-tune", each line must be a JSON record with "prompt" and "completion" fields.</br>

In [68]:
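# open the file in binary mode and close it once the upload is done
with open("rap_doc6.json", "rb") as f:
    uploaded = openai.File.create(file=f, purpose="fine-tune")
uploaded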

<File file id=file-34VVXDjNCCuFVML3Rut3qKbw at 0x7f47e221f3b0> JSON: {
  "bytes": 3012,
  "created_at": 1638228335,
  "filename": "rap_doc6.json",
  "id": "file-34VVXDjNCCuFVML3Rut3qKbw",
  "object": "file",
  "purpose": "fine-tune",
  "status": "uploaded",
  "status_details": null
}

## __File handling__

In [76]:
# list uploaded files
openai.File.list()

<OpenAIObject list at 0x7f47e23f2b80> JSON: {
  "data": [
    {
      "bytes": 8353221,
      "created_at": 1638189560,
      "filename": "babyfood_new.json",
      "id": "file-oKSSO8yDnWJxI1b1mfAXjQly",
      "object": "file",
      "purpose": "fine-tune",
      "status": "processed",
      "status_details": null
    },
    {
      "bytes": 154961,
      "created_at": 1638191070,
      "filename": "compiled_results.csv",
      "id": "file-SCYKCs0T31IEupjlCEotC6CP",
      "object": "file",
      "purpose": "fine-tune-results",
      "status": "processed",
      "status_details": null
    },
    {
      "bytes": 3012,
      "created_at": 1638228335,
      "filename": "rap_doc6.json",
      "id": "file-34VVXDjNCCuFVML3Rut3qKbw",
      "object": "file",
      "purpose": "fine-tune",
      "status": "processed",
      "status_details": null
    }
  ],
  "object": "list"
}
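Before deleting files, it can help to pick them out of the listing programmatically. Below is a minimal sketch that filters a listing shaped like the output above by `purpose`, and optionally by filename. It only reads the dict. The `OpenAIObject` returned by the pre-1.0 `openai.File.list()` subclasses `dict`, so it should be accepted too, but check this against your client version. `file_ids` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def file_ids(listing: Dict[str, Any], purpose: str, filename: Optional[str] = None) -> List[str]:
    """IDs of uploaded files with the given purpose (and filename, if given)."""
    return [
        item["id"]
        for item in listing.get("data", [])
        if item.get("purpose") == purpose
        and (filename is None or item.get("filename") == filename)
    ]


# Trimmed copy of the listing above
listing = {"data": [
    {"id": "file-oKSSO8yDnWJxI1b1mfAXjQly", "filename": "babyfood_new.json", "purpose": "fine-tune"},
    {"id": "file-SCYKCs0T31IEupjlCEotC6CP", "filename": "compiled_results.csv", "purpose": "fine-tune-results"},
    {"id": "file-34VVXDjNCCuFVML3Rut3qKbw", "filename": "rap_doc6.json", "purpose": "fine-tune"},
]}
print(file_ids(listing, "fine-tune"))  # ['file-oKSSO8yDnWJxI1b1mfAXjQly', 'file-34VVXDjNCCuFVML3Rut3qKbw']
```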

In [None]:
# delete a file
openai.File("file-XjGxS3KTG0uNmNOK362iJua3").delete()

# retrieve file
openai.File.retrieve("file-XjGxS3KTG0uNmNOK362iJua3")

# retrieve file content
content = openai.File.download("file-XjGxS3KTG0uNmNOK362iJua3")

# fine tune with a uploaded file
openai.FineTune.create(training_file="file-XGinujblHPwGLSztz8cPS8XY")