# 🐋 Fine-tune and Optimize DeepSeek-R1-Distill-Qwen-1.5B with Olive

In this notebook, you will:

1. Fine-tune [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) to classify English phrases into Surprise/Joy/Fear/Sadness.
1. Optimize the fine-tuned model for the ONNX Runtime.


## 🐍 Install Python dependencies

The following cells create a pip requirements file and then install the libraries.

In [None]:
%%writefile requirements.txt

olive-ai[finetune, auto-opt]
onnxruntime-genai
transformers==4.44.2

In [None]:
%%capture

%pip install -r requirements.txt

## 🏃 Train the model

Fine-tuning helps when you need a language model to produce very specific outputs. In this example, you'll fine-tune 🐋 DeepSeek-R1-Distill-Qwen-1.5B to respond to an English phrase with a single word that classifies the phrase as surprise, fear, joy, or sadness. Here is a sample of the data used for fine-tuning:

```jsonl
{"phrase": "The sudden thunderstorm caught me off guard.", "tone": "surprise"}
{"phrase": "The creaking door at night is quite spooky.", "tone": "fear"}
{"phrase": "Celebrating my birthday with friends is always fun.", "tone": "joy"}
{"phrase": "Saying goodbye to my pet was heart-wrenching.", "tone": "sadness"}
```

In the following `olive finetune` command, the `--data_name` argument points to the Hugging Face dataset [xxyyzzz/phrase_classification](https://huggingface.co/datasets/xxyyzzz/phrase_classification). You can also provide your own data from local disk using the `--data_files` argument.
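
If you want to try `--data_files` with your own examples, the file needs the `phrase` and `tone` fields that the `--text_template` below refers to. Here is a minimal sketch for writing such a file. The file name `my_phrases.jsonl`, the helper `write_jsonl`, and the example rows are hypothetical; check `olive finetune --help` for the exact flags to pass alongside `--data_files`.

```python
import json
from pathlib import Path

# Hypothetical examples; replace them with your own labelled phrases.
rows = [
    {"phrase": "I did not expect the package to arrive today.", "tone": "surprise"},
    {"phrase": "Walking home alone in the dark makes me nervous.", "tone": "fear"},
]


def write_jsonl(rows, path):
    """Write one JSON object per line, matching the sample layout above."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


out_file = write_jsonl(rows, "my_phrases.jsonl")  # hypothetical file name
print(f"Wrote {len(rows)} rows to {out_file}")
```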

> ⏳ **Fine-tuning takes about 15 minutes to complete.**

In [None]:
!olive finetune \
    --method lora \
    --model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --trust_remote_code \
    --data_name xxyyzzz/phrase_classification \
    --text_template "<|begin▁of▁sentence|><|User|>{phrase}<|Assistant|>{tone}<|end▁of▁sentence|>" \
    --max_steps 300 \
    --output_path models/deepseek/ft \
    --log_level 1
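
To see what a training example looks like once the `--text_template` placeholders are filled in, you can render the template by hand. The sketch below uses Python's `str.format` purely as an illustration; it assumes Olive substitutes the `{phrase}` and `{tone}` fields in a comparable way, which is not confirmed here.

```python
# The same template string that is passed to `olive finetune` above.
TEXT_TEMPLATE = "<|begin▁of▁sentence|><|User|>{phrase}<|Assistant|>{tone}<|end▁of▁sentence|>"

# One row from the sample data shown earlier.
row = {"phrase": "The creaking door at night is quite spooky.", "tone": "fear"}

# Illustration only: fill the placeholders with the row's fields.
rendered = TEXT_TEMPLATE.format(**row)
print(rendered)
```

The prompt used in the inference section is this same string cut off after `<|Assistant|>`, so at inference time the model is asked to complete the `tone` part.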

## 🪄 Automatic model optimization with Olive

Next, you'll execute Olive's automatic optimizer using the `auto-opt` CLI command, which will:

1. Capture the fine-tuned model into an ONNX graph and convert the weights into the ONNX format.
1. Optimize the ONNX graph (for example, by fusing nodes and reshaping).
1. Extract the fine-tuned LoRA weights and place them into a separate file.

> ⏳ **The automatic optimization takes about 2 minutes to complete.**

In [None]:
!olive auto-opt \
    --model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --adapter_path models/deepseek/ft/adapter \
    --device cpu \
    --provider CPUExecutionProvider \
    --use_ort_genai \
    --precision int4 \
    --output_path models/deepseek/onnx-ao \
    --log_level 1
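
Before moving on, it can help to check what `auto-opt` wrote to disk, because the inference code below expects a model folder and an adapter file under `models/deepseek/onnx-ao`. The following sketch simply lists whatever files exist. The helper name `list_model_files` is hypothetical, and no particular file layout is assumed.

```python
from pathlib import Path


def list_model_files(root):
    """Return the relative paths of all files under `root`, sorted."""
    root = Path(root)
    if not root.is_dir():
        # Nothing to list if the optimization has not been run yet.
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


for name in list_model_files("models/deepseek/onnx-ao"):
    print(name)
```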

## 🧠 Inference

The code below creates a test app that consumes the model in a simple console chat interface. You will be prompted to enter an English phrase (for example: "Cricket is a wonderful game") and the app will output a chat completion.

While the inference code uses the Python API for ONNX Runtime, bindings for other languages are available, including [Java, C#, and C++](https://github.com/microsoft/onnxruntime-genai/tree/main/examples).

To exit the chat interface, enter `exit` or press `Ctrl+C`.

In [None]:
import onnxruntime_genai as og

model_path = "models/deepseek/onnx-ao/model"

model = og.Model(model_path)
adapters = og.Adapters(model)
adapters.load(f'{model_path}/adapter_weights.onnx_adapter', "classifier")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

# Keep asking for input prompts until the user enters `exit`
while True:
    phrase = input("Phrase: ")
    if phrase.strip().lower() == "exit":
        break
    prompt = f"<|begin▁of▁sentence|><|User|>{phrase}<|Assistant|>"
    input_tokens = tokenizer.encode(prompt)
    
    # create a fresh generator for each phrase
    params = og.GeneratorParams(model)
    params.set_search_options(past_present_share_buffer=False, temperature=0.5)
    generator = og.Generator(model, params)
    # set the adapter to active for this response
    generator.set_active_adapter(adapters, "classifier")

    generator.append_tokens(input_tokens)

    print("Output: ", end='', flush=True)

    while not generator.is_done():
        generator.generate_next_token()
        new_token = generator.get_next_tokens()[0]
        print(tokenizer_stream.decode(new_token), end='', flush=True)
    print()
    print()
    del generator
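
Because the fine-tuned model is meant to answer with a single tone word, an application might want to map the raw completion onto one of the four labels. Here is a minimal sketch, assuming the completion is plain text that may carry extra whitespace, capitalization, or punctuation. `normalize_tone` is a hypothetical helper, not part of ONNX Runtime GenAI.

```python
import string

TONES = ("surprise", "joy", "fear", "sadness")


def normalize_tone(completion):
    """Return the first known tone word found in `completion`, or None."""
    for word in completion.lower().split():
        # Drop surrounding punctuation such as a trailing full stop.
        word = word.strip(string.punctuation)
        if word in TONES:
            return word
    return None


print(normalize_tone("  Joy.\n"))
print(normalize_tone("I am not sure"))
```

Returning `None` for an unrecognized completion leaves it to the caller to decide whether to retry or report the phrase as unclassified.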