# 🫒 Olive Tutorial: Finetuning

## 🤗 Login to Hugging Face

In [None]:
!huggingface-cli login --token TOKEN

## 🧪 Fine-tune with Olive

In this tutorial we'll fine-tune the Phi-3.5-mini-instruct model for phrase classification: given a phrase, the model classifies it as one of joy, surprise, fear, or sadness. The dataset, which is available on Hugging Face, has the following format:

```json
{"phrase": "I'm thrilled to start my new job!", "tone": "joy"}
{"phrase": "I can't believe I lost my keys again.", "tone": "surprise"}
{"phrase": "This haunted house is terrifying!", "tone": "fear"}
{"phrase": "Winning the lottery is a dream come true.", "tone": "joy"}
{"phrase": "Missing the concert is really disappointing.", "tone": "sadness"}
```
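
As a quick sanity check on the format above, here is a minimal sketch that parses the sample records as JSON Lines (one JSON object per line) and counts the labels. The variable names (`sample_jsonl`, `records`, `tone_counts`) are illustrative only, and the records are the five samples shown above rather than the full dataset.

```python
import json
from collections import Counter

# The five sample records shown above, one JSON object per line.
sample_jsonl = """\
{"phrase": "I'm thrilled to start my new job!", "tone": "joy"}
{"phrase": "I can't believe I lost my keys again.", "tone": "surprise"}
{"phrase": "This haunted house is terrifying!", "tone": "fear"}
{"phrase": "Winning the lottery is a dream come true.", "tone": "joy"}
{"phrase": "Missing the concert is really disappointing.", "tone": "sadness"}
"""

# Parse each non-empty line into a dict with "phrase" and "tone" keys.
records = [json.loads(line) for line in sample_jsonl.splitlines() if line.strip()]

# Count how often each label appears in the sample.
tone_counts = Counter(record["tone"] for record in records)
print(tone_counts)
```

Each record carries exactly the two fields, `phrase` and `tone`, that the text template refers to during fine-tuning.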

To fine-tune, you only need to pass a few arguments to the `olive finetune` command:

- `--method`: the method used for fine-tuning. `lora` and `qlora` are supported.
- `--data_name`: the Hugging Face dataset name.
- `--text_template`: the template used to generate the text field, with placeholders that match the dataset's column names. E.g. `### Question: {prompt} \n### Answer: {response}`. For Phi-3, the chat format is `<|user|>\n{prompt}<|end|>\n<|assistant|>\n{response}<|end|>`.
- `--model_name_or_path`: the model checkpoint used for weights initialization. This can be a Hugging Face model repo, a local path, or an Azure AI Model registry.

More details on available options can be found [here](https://microsoft.github.io/Olive/features/cli.html#finetune).

### 🧠 Supported models

Olive can fine-tune any PyTorch model through a user-provided `io_config` (type, shape, etc.). However, the most popular models are supported out of the box, such as:

- Phi
- Llama
- Mistral
- Gemma
- Qwen

For more details on supported *architectures*, read [Hugging Face Optimum Overview](https://huggingface.co/docs/optimum/en/exporters/onnx/overview).

In [None]:
# Fine-tuning can take around 20-30 minutes to complete.
!olive finetune \
    --method qlora \
    --model_name_or_path microsoft/Phi-3.5-mini-instruct \
    --trust_remote_code \
    --use_ort_genai \
    --data_name xxyyzzz/phrase_classification \
    --text_template "<|user|>\n{phrase}<|end|>\n<|assistant|>\n{tone}<|end|>" \
    --max_steps 5

## ✨ Test the model using ONNX Runtime

In [None]:
import onnxruntime_genai as og
import numpy as np
import os

model_folder = "optimized-model"

model = og.Model(model_folder)
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

weights_file = os.path.join(model_folder, "adapter_weights.npz") 
adapter_weights = np.load(weights_file)
 
# Set the max length to something sensible by default,
# since otherwise it will be set to the entire context length
search_options = {}
search_options['max_length'] = 200
search_options['past_present_share_buffer'] = False

chat_template = "<|user|>\n{input}<|end|>\n<|assistant|>"

text = input("Input: ")
if not text:
    # A bare `exit` is a no-op, so raise to actually stop the cell.
    raise ValueError("Input cannot be empty")

prompt = f'{chat_template.format(input=text)}'

input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
for key in adapter_weights.keys():
    params.set_model_input(key, adapter_weights[key])
params.set_search_options(**search_options)
params.input_ids = input_tokens
generator = og.Generator(model, params)


print("Output: ", end='', flush=True)

try:
    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()

        new_token = generator.get_next_tokens()[0]
        print(tokenizer_stream.decode(new_token), end='', flush=True)
except KeyboardInterrupt:
    print("  --control+c pressed, aborting generation--")

print()
# free up resources
del generator
del model
del tokenizer
del tokenizer_stream