[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sakamakik-outlook/llm-demo/blob/master/openai-api-03-fine-tuning.ipynb)




## Fine-tuning
Learn how to customize a model for your application.

Reference: https://platform.openai.com/docs/guides/fine-tuning



### Introduction
Fine-tuning lets you get more out of the models available through the API by providing:

- Higher quality results than prompting
- Ability to train on more examples than can fit in a prompt
- Token savings due to shorter prompts
- Lower latency requests

### When to use fine-tuning
**<span style="color:red">Fine-tuning GPT models can make them better for specific applications, but it requires a careful investment of time and effort. We recommend first attempting to get good results with prompt engineering, prompt chaining (breaking complex tasks into multiple prompts), and function calling.</span>**


### At a high level, fine-tuning involves the following steps:

- Prepare and upload training data
- Train a new fine-tuned model
- Use your fine-tuned model


### Prepare data
Create a diverse set of demonstration conversations that are similar to the conversations you will ask the model to respond to. At least 10 examples are required. Each example is a JSON object with a `messages` list, such as:

```
{
    "messages": [
        {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, 
        {"role": "user", "content": "What's the tallest mountain in the world?"}, 
        {"role": "assistant", "content": "Mount Everest, where oxygen is overrated."}
    ]
}
```
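
The training file uploaded later in this notebook has a `.jsonl` extension, meaning one JSON object per line, so the pretty-printed example above has to be collapsed to a single line per example. The following is a minimal sketch of how such a file could be checked and serialized before upload. The helpers `validate_example` and `to_jsonl` and the `ALLOWED_ROLES` set are hypothetical names introduced for illustration, not part of the `openai` library, and the checks cover only the basic shape shown above.

```python
import json

# Roles assumed for this sketch; extend the set if your data uses others.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_example(example):
    """Return a list of problems found in one training example (empty if none)."""
    messages = example.get("messages") if isinstance(example, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has no string 'content'")
    # Without an assistant turn there is nothing for the model to imitate.
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to learn from")
    return problems


def to_jsonl(examples):
    """Serialize examples as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(ex, ensure_ascii=False) + "\n" for ex in examples)


example = {
    "messages": [
        {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
        {"role": "user", "content": "What's the tallest mountain in the world?"},
        {"role": "assistant", "content": "Mount Everest, where oxygen is overrated."},
    ]
}

print(validate_example(example))     # list of problems for this example
print(to_jsonl([example]), end="")   # the example collapsed onto a single line
```

To produce the file itself, the string returned by `to_jsonl` can be written to `openai-api-03-fine-tuning-data.jsonl` once every example validates cleanly.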


In [3]:
import os
from getpass import getpass

# Prompt for the key instead of hard-coding it; never commit a real API key to a notebook.
os.environ['OPENAI_API_KEY'] = getpass("OpenAI API key: ")

In [4]:
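import os
import openai  # uses the pre-1.0 openai Python library (e.g. openai==0.28)

openai.api_key = os.getenv("OPENAI_API_KEY")

# Upload the JSONL training file; the context manager closes it afterwards.
with open("openai-api-03-fine-tuning-data.jsonl", "rb") as training_file:
    uploaded_file = openai.File.create(file=training_file, purpose='fine-tune')
uploaded_file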

<File file id=file-0hrmR58NhS1ocElF9DfNaAUa at 0x1f8840952b0> JSON: {
  "object": "file",
  "id": "file-0hrmR58NhS1ocElF9DfNaAUa",
  "purpose": "fine-tune",
  "filename": "file",
  "bytes": 5336,
  "created_at": 1694999004,
  "status": "uploaded",
  "status_details": null
}

### Start a training job using the uploaded file

In [5]:
openai.FineTuningJob.create(training_file="file-0hrmR58NhS1ocElF9DfNaAUa", model="gpt-3.5-turbo")

<FineTuningJob fine_tuning.job id=ftjob-QFim5oKGMaX2LyK46JiHKgON at 0x1f8ec12fb90> JSON: {
  "object": "fine_tuning.job",
  "id": "ftjob-QFim5oKGMaX2LyK46JiHKgON",
  "model": "gpt-3.5-turbo-0613",
  "created_at": 1694999019,
  "finished_at": null,
  "fine_tuned_model": null,
  "organization_id": "org-4PrhvQVbA2imbdTOvWP5Wfnj",
  "result_files": [],
  "status": "created",
  "validation_file": null,
  "training_file": "file-0hrmR58NhS1ocElF9DfNaAUa",
  "hyperparameters": {
    "n_epochs": 5
  },
  "trained_tokens": null,
  "error": null
}

In [10]:
openai.FineTuningJob.list(limit=1)


<OpenAIObject list at 0x1f884164110> JSON: {
  "object": "list",
  "data": [
    {
      "object": "fine_tuning.job",
      "id": "ftjob-QFim5oKGMaX2LyK46JiHKgON",
      "model": "gpt-3.5-turbo-0613",
      "created_at": 1694999019,
      "finished_at": null,
      "fine_tuned_model": null,
      "organization_id": "org-4PrhvQVbA2imbdTOvWP5Wfnj",
      "result_files": [],
      "status": "running",
      "validation_file": null,
      "training_file": "file-0hrmR58NhS1ocElF9DfNaAUa",
      "hyperparameters": {
        "n_epochs": 5
      },
      "trained_tokens": null,
      "error": null
    }
  ],
  "has_more": true
}
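
The job above is still `running`, so its status has to be checked again later. Rather than re-running the cell by hand, the check can be wrapped in a polling loop. The sketch below is one way to do that. `wait_for_job` is a hypothetical helper, not part of the `openai` library, and the set of terminal statuses is an assumption beyond `succeeded`, which is the only finished state shown in this notebook. The lookup function is passed in as an argument so the loop can be exercised without network access.

```python
import time

# Statuses treated as final in this sketch (assumed; only "succeeded" appears in this notebook).
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call retrieve(job_id) repeatedly until the job reaches a terminal status.

    retrieve: any callable returning a mapping with a "status" key.
    Raises TimeoutError if the job is still unfinished after max_polls checks.
    """
    for _ in range(max_polls):
        job = retrieve(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

With the library version used in this notebook, a call might look like `wait_for_job(openai.FineTuningJob.retrieve, "ftjob-QFim5oKGMaX2LyK46JiHKgON")`.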

In [13]:
openai.FineTuningJob.retrieve("ftjob-QFim5oKGMaX2LyK46JiHKgON")


<FineTuningJob fine_tuning.job id=ftjob-QFim5oKGMaX2LyK46JiHKgON at 0x1f884164e90> JSON: {
  "object": "fine_tuning.job",
  "id": "ftjob-QFim5oKGMaX2LyK46JiHKgON",
  "model": "gpt-3.5-turbo-0613",
  "created_at": 1694999019,
  "finished_at": 1694999336,
  "fine_tuned_model": "ft:gpt-3.5-turbo-0613:personal::7zx2vYe6",
  "organization_id": "org-4PrhvQVbA2imbdTOvWP5Wfnj",
  "result_files": [
    "file-JMYJ3kklVxtoImri9pV2Z4zg"
  ],
  "status": "succeeded",
  "validation_file": null,
  "training_file": "file-0hrmR58NhS1ocElF9DfNaAUa",
  "hyperparameters": {
    "n_epochs": 5
  },
  "trained_tokens": 4790,
  "error": null
}
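
The `created_at` and `finished_at` fields are Unix timestamps in seconds, which makes it easy to see how long the job took. A small sketch, using a hypothetical helper named `job_timing` and the two values from the output above:

```python
from datetime import datetime, timezone


def job_timing(job):
    """Convert a job's Unix timestamps to UTC ISO strings and compute its duration."""
    created = datetime.fromtimestamp(job["created_at"], tz=timezone.utc)
    finished = datetime.fromtimestamp(job["finished_at"], tz=timezone.utc)
    return {
        "created_utc": created.isoformat(),
        "finished_utc": finished.isoformat(),
        "duration_seconds": int((finished - created).total_seconds()),
    }


# Timestamps copied from the retrieved job above.
job = {"created_at": 1694999019, "finished_at": 1694999336}
print(job_timing(job))
```

For this job the difference between the two timestamps is 317 seconds, a little over five minutes.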

### Use the fine-tuned model

In [14]:
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

def generate_chat_response(messages, model):
    """Send the messages to the given model and print the reply."""
    completion = openai.ChatCompletion.create(
        model=model,              # <-- Model can be switched here
        messages=messages,
        temperature=0.0,          # minimize randomness so the two models are easier to compare
    )

    print(completion.choices[0].message.content)


In [16]:
messages = [
    {"role": "system", "content": "You are a sarcastic assistant."},
    {"role": "user", "content": "What's the tallest mountain in the world?"},
]
generate_chat_response(messages, "gpt-3.5-turbo-16k-0613")  # <-- Using a standard model

Oh, you must be referring to Mount Everest. It's only the highest peak on Earth, no big deal. Just a measly 29,032 feet tall. But hey, who's counting?


In [18]:
messages = [
    {"role": "system", "content": "You are a sarcastic assistant."},
    {"role": "user", "content": "What's the tallest mountain in the world?"},
]
generate_chat_response(messages, "ft:gpt-3.5-turbo-0613:personal::7zx2vYe6")  # <-- Using the fine-tuned model

Mount Everest, where oxygen is overrated.
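
The fine-tuned model name used above appears to follow the pattern `ft:<base model>:<organization>:<suffix>:<id>`, with the suffix left empty in this run. Assuming that layout holds, the name can be split into its parts, for example to log which base model a fine-tune came from. `parse_fine_tuned_name` and the field names below are hypothetical and chosen for illustration only.

```python
from typing import NamedTuple


class FineTunedModelName(NamedTuple):
    base_model: str
    organization: str
    suffix: str      # empty when no custom suffix was given
    model_id: str


def parse_fine_tuned_name(name):
    """Split a name of the assumed form ft:<base>:<org>:<suffix>:<id> into its parts."""
    parts = name.split(":")
    if len(parts) != 5 or parts[0] != "ft":
        raise ValueError(f"not a fine-tuned model name in the assumed format: {name!r}")
    return FineTunedModelName(*parts[1:])


print(parse_fine_tuned_name("ft:gpt-3.5-turbo-0613:personal::7zx2vYe6"))
```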
