# 🐋 Finetune and Optimize DeepSeek R1 with Olive

DeepSeek introduced two first-generation reasoning models: DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero is trained using large-scale reinforcement learning (RL) without any supervised fine-tuning (SFT) as a preliminary step. This model has shown remarkable performance in reasoning tasks, naturally developing powerful and interesting reasoning behaviors through RL. However, it faces challenges such as endless repetition, poor readability, and language mixing.

To address these issues and further enhance reasoning performance, DeepSeek developed DeepSeek-R1. This model incorporates cold-start data before applying RL, which helps mitigate the challenges faced by DeepSeek-R1-Zero. As a result, DeepSeek-R1 achieves performance comparable to OpenAI-o1 across various tasks, including math, code, and reasoning.

DeepSeek also released six dense models distilled from DeepSeek-R1. These dense models are based on the Llama and Qwen architectures. In this tutorial, you'll fine-tune DeepSeek-R1-Distill-Qwen-1.5B.

In this notebook, you will:

1. Fine-tune [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) for advanced medical reasoning.
1. Optimize the fine-tuned model for the ONNX Runtime.
1. Run inference on the fine-tuned model using the Generate API for the ONNX Runtime.

## Motivation for Fine-tuning DeepSeek R1

Medical tasks frequently require intricate reasoning. In real-world scenarios, doctors often engage in careful deliberation when making diagnoses or decisions. To illustrate how a language model (LM) can assist in this process, consider a scenario where the LM receives a question about a patient. The model then employs chain-of-thought (CoT) reasoning to analyze the information and arrive at a diagnosis.

Given a medical question, the base DeepSeek-R1-Distill-Qwen-1.5B model can provide an answer, but its answers tend to be verbose rather than concise. For example, given the following question:

*A 54-year-old construction worker with a long history of smoking presents with swelling in his upper extremity and face, along with dilated veins in this region. After conducting a CT scan and venogram of the neck, what is the most likely diagnosis for the cause of these symptoms?*

The model responds with:

*\<think>*

*Okay, so I'm trying to figure out what's going on with this 54-year-old construction worker who's been smoking and has these swelling and veins in his upper extremity and face. Let me break this down step by step. First, the age is 54, which is pretty old, but construction workers can be at higher risk for various health issues. The fact that he's been smoking for a long time suggests he might have a smoking-related condition, maybe something like lung cancer, but I'm not sure yet. He has swelling in his upper extremity and face, which is a common complaint in many patients, especially older ones. Swelling can be due to a variety of reasons, like infections, inflammation, or even something like a chronic condition. But given that he's been smoking, I'm thinking it's more likely something related to smoking. The CT scan and venogram results are important. CT scans can help visualize the areas, so the upper extremity and face would be checked. Venograms, or venous imaging, look at blood flow in the neck, which could indicate areas with high blood flow, possibly related to the lungs or other organs. Looking at the symptoms, the presence of blood vessels in that area suggests that there's an issue with blood flow, maybe an aneurysm or a blockage. An aneurysm is a bulge in a blood vessel, which can cause swelling and pain. But I'm not sure if that's the case here. Another possibility is atherosclerosis, where plaque builds up in the blood vessels, leading to blockage and swelling. This is common in older patients, especially those who smoke, because smoking increases the risk of lung cancer, which can lead to atherosclerosis. I should also consider other conditions, like a chronic disease like diabetes, but the question doesn't mention that. The main focus seems to be on smoking-related issues. Putting it all together, the CT scan and venogram likely showed areas with high blood flow, which points to a blockage in the blood vessels in the upper extremity and face. 
This could be an aneurysm or atherosclerosis. Given that he's been smoking, atherosclerosis is a more likely culprit, but aneurysm could also be a possibility. I should also think about the patient's smoking history. If he's been smoking for a long time, there's a higher risk of lung cancer, which can lead to atherosclerosis. Smoking also increases the risk of other conditions like chronic obstructive pulmonary disease (COPD), which can cause blockages.So, the most likely diagnosis would be a blockage in the blood vessels, either an aneurysm or atherosclerosis. But without more specific imaging, it's hard to tell exactly. However, given the smoking history, atherosclerosis is a strong candidate. I should also consider other possibilities, like a tumor, but that's less likely given the symptoms. The main suspects are atherosclerosis or aneurysm due to smoking-related conditions. In conclusion, the most likely diagnosis is a blockage in the blood vessels in the upper extremity and face, likely due to smoking-related atherosclerosis or a nearby aneurysm.*

*\</think>*

*The most likely diagnosis for the described symptoms is a blockage in the blood vessels in the upper extremity and face, likely due to smoking-related atherosclerosis or a nearby aneurysm. This conclusion is based on the patient's smoking history, which increases the risk of lung cancer and atherosclerosis, both of which can lead to blood vessel blockages.*

The reasoning process is outlined within the \<think>\</think> tags. We'd prefer more concise reasoning. To this end, we'll fine-tune the DeepSeek-R1-Distill-Qwen-1.5B model using the [FreedomIntelligence/medical-o1-reasoning-SFT](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) dataset, which was constructed using GPT-4o to search for solutions to [verifiable medical problems](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-verifiable-problem), with the solutions validated through a medical verifier. The dataset comprises three fields:

1. **Question**: A medical question describing the patient's symptoms and background.
2. **Complex_CoT**: The Chain of Thought (CoT) the LM should go through given the question.
3. **Response**: The formal response in the form of a diagnosis of the condition with a brief rationale.

The following is a single sample taken from the dataset:

| Question | Complex_CoT | Response |
|----------| ----------- | ---------|
|A 17-year-old girl presents with sudden onset left-sided facial droop and is unable to fully close her left eye. Given her symptoms of unilateral facial paralysis that include an inability to move the left side of her mouth and close her left eye, but with intact extraocular movements and no other neurologic deficits, what treatment is most likely to address and improve her condition?| Alright, so we've got a 17-year-old girl here who suddenly can't move the left side of her face and can't fully close her left eye. That's pretty concerning. When I hear about facial droop and eye issues like that, I'm thinking about facial nerve stuff, like Bell's palsy. Okay, let's break this down. She's young and it's sudden onset, which fits the typical presentation of Bell's palsy. Plus, she's only got the one-sided facial paralysis, and no other neurological red flags—so things like her eye movements are normal. That's important because it helps to make sure this isn't something like a stroke or another cranial nerve problem. Hmm, definitely sounds like we're on the right track with Bell's palsy. Now, what do we do for Bell's palsy again? Oh right, corticosteroids! Those are the go-to treatment. Prednisone is what normally comes up because it helps reduce inflammation around the facial nerve. And you want to start that early, like ideally within the first three days or so of the symptoms showing up. So timing is key here. Also, we've got to think about her eye. If she can't close it properly, she might run into issues with dryness or injury. So we need to make sure she protects that eye. Keeping it moist with artificial tears or some kind of eye lubricant would be necessary. Putting it all together, for her condition, starting her on corticosteroids seems like the right move. Plus, taking care of her eye is crucial to avoid any complications there. This combo should set her on the path to recovery. Everything we've looked at lines up, and this plan makes sense based on her symptoms and the expected treatment for Bell's palsy.|Based on the symptoms you're describing, the 17-year-old girl is most likely experiencing Bell's palsy, which is characterized by the sudden onset of unilateral facial paralysis. The primary treatment for Bell's palsy is corticosteroids, with prednisone being the commonly used medication. Administering steroids can help reduce inflammation and swelling of the facial nerve, promoting recovery of nerve function. It is most effective when started early, ideally within the first three days of symptom onset. Additionally, it's crucial to manage her eye care, as she cannot fully close her left eye. Using artificial tears or eye lubricants will help keep her eye moist and protected from potential injury or dryness due to incomplete closure. In summary, the treatment that is most likely to address and improve her condition involves starting corticosteroids, such as prednisone, and ensuring protective care for her affected eye.|

## 🐍 Install Python dependencies

The following cells create a pip requirements file and then install the libraries.

In [None]:
%%writefile requirements.txt

olive-ai
onnxruntime-genai==0.6.0rc1
transformers==4.44.2
optimum==1.23.1
bitsandbytes==0.45.1
peft==0.13.2
accelerate>=0.30.0
scipy==1.14.1
torchvision==0.18.1
tabulate==0.9.0

In [None]:
%%capture

%pip install -r requirements.txt
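
Once the install cell finishes, it can help to confirm which versions pip actually resolved, since `olive-ai` and `accelerate` are not pinned to exact versions in the requirements file. The following is a minimal sketch that uses only the standard library; `installed_versions` is a hypothetical helper name, not part of Olive.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from requirements.txt above.
required = ["olive-ai", "onnxruntime-genai", "transformers", "peft", "accelerate"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` suggests the install cell failed silently, which is easy to miss because `%%capture` hides its output.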

## ❔💭💬 Pre-process the data

We need to reshape the data so that each training example is a single prompt for DeepSeek containing the instruction, the medical question, the Chain of Thought (CoT), and the desired response.

In [None]:
from datasets import load_dataset

prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

def formatting_prompts_func(examples):
    # `examples` is a batch, so each field holds a list of values.
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]
    texts = []
    for question, cot, response in zip(questions, cots, responses):
        # Append the end-of-sentence token so the model learns where to stop.
        text = prompt_template.format(question, cot, response) + "<｜end▁of▁sentence｜>"
        texts.append(text)
    return {
        "text": texts,
    }

# Create the English dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train", trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched=True, remove_columns=["Question", "Complex_CoT", "Response"])
dataset.to_json("en_dataset.jsonl")
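
Before training, it may be worth spot-checking that each formatted example has the shape the prompt template intends: one `<think>...</think>` pair and the end-of-sentence token at the very end. The sketch below is a heuristic check under those assumptions; `check_record` and `check_jsonl` are hypothetical helpers, and a reasoning trace that happens to mention the tags literally would be flagged as a false positive.

```python
import json

# Same marker that formatting_prompts_func appends to every example.
EOS_TOKEN = "<｜end▁of▁sentence｜>"


def check_record(text):
    """Return a list of problems found in one formatted training example."""
    problems = []
    if text.count("<think>") != 1 or text.count("</think>") != 1:
        problems.append("expected exactly one <think>...</think> pair")
    elif text.index("<think>") > text.index("</think>"):
        problems.append("</think> appears before <think>")
    if not text.endswith(EOS_TOKEN):
        problems.append("missing end-of-sentence token")
    return problems


def check_jsonl(path, limit=100):
    """Check the first `limit` lines of a JSONL file; return {line_number: problems}."""
    bad = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if number > limit:
                break
            problems = check_record(json.loads(line)["text"])
            if problems:
                bad[number] = problems
    return bad
```

Calling `check_jsonl("en_dataset.jsonl")` returns a dictionary keyed by line number; an empty dictionary means the checked lines passed.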

## 🏃 Train the model

In [None]:
!olive finetune \
    --method lora \
    --model_name_or_path "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" \
    --trust_remote_code \
    --data_name json \
    --data_files ./en_dataset.jsonl \
    --train_split "train[:20000]" \
    --eval_split "train[20000:25400]" \
    --text_field "text" \
    --max_steps 100 \
    --logging_steps 10 \
    --target_modules "q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj" \
    --output_path models/deepseek/en_ft \
    --log_level 1
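
The `--train_split` and `--eval_split` values use slice notation over the single `train` split of the JSONL file: the first 20,000 rows are used for training and the next 5,400 for evaluation. The sketch below works out those ranges in plain Python. `split_bounds` is a hypothetical helper that handles only the absolute `name[start:stop]` form used in the command, not the full slicing syntax.

```python
import re

_SPLIT_PATTERN = re.compile(r"^(?P<name>\w+)\[(?P<start>\d*):(?P<stop>\d*)\]$")


def split_bounds(spec, total_rows):
    """Return (start, stop) row indices for a spec such as 'train[:20000]'."""
    match = _SPLIT_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"unsupported split spec: {spec!r}")
    start = int(match["start"]) if match["start"] else 0
    stop = int(match["stop"]) if match["stop"] else total_rows
    # Clamp to the dataset size, as Python slicing would.
    return min(start, total_rows), min(stop, total_rows)


total = 25_400  # assumption: at least this many rows, as the eval slice implies
train = split_bounds("train[:20000]", total)
evaluation = split_bounds("train[20000:25400]", total)

print("train rows:", train[1] - train[0])
print("eval rows:", evaluation[1] - evaluation[0])
# The two ranges must not overlap, otherwise evaluation would reuse training rows.
assert train[1] <= evaluation[0]
```

Keeping the evaluation slice strictly after the training slice is what makes the reported evaluation loss a measure of generalization rather than memorization.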

## 🪄 Automatic model optimization with Olive

Next, you'll execute Olive's automatic optimizer using the `auto-opt` CLI command, which will:

1. Capture the fine-tuned model into an ONNX graph and convert the weights into the ONNX format.
1. Optimize the ONNX graph (for example, by fusing nodes and reshaping).
1. Extract the fine-tuned LoRA weights and place them into a separate file.

> ⏳**The automatic optimization takes about 2 minutes to complete.**

In [None]:
!olive auto-opt \
    --model_name_or_path "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" \
    --adapter_path models/deepseek/en_ft/adapter \
    --device cpu \
    --provider CPUExecutionProvider \
    --use_ort_genai \
    --precision int4 \
    --output_path models/deepseek/en-onnx-ao \
    --log_level 1
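
Before moving on to inference, it may be worth confirming that the optimizer wrote the files the next cell expects. The inference code loads the adapter from a file named `adapter_weights.onnx_adapter` inside the `model` folder of the output path, so the sketch below lists any matching files. `find_adapter_files` is a hypothetical helper, and the exact output layout may differ between Olive versions.

```python
from pathlib import Path


def find_adapter_files(root):
    """Return all *.onnx_adapter files under `root`, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.onnx_adapter"))


output_dir = "models/deepseek/en-onnx-ao"  # --output_path used in the command above
adapters_found = find_adapter_files(output_dir)
if adapters_found:
    for path in adapters_found:
        print(f"{path} ({path.stat().st_size / 1e6:.1f} MB)")
else:
    print(f"No adapter files found under {output_dir}; check the auto-opt logs.")
```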

## 🧠 Inference

The code below creates a test app that consumes the fine-tuned model in a simple console chat interface. While the inference code uses the Python API for the ONNX Runtime, bindings for other languages are available, including [Java, C#, and C++](https://github.com/microsoft/onnxruntime-genai/tree/main/examples).

By default, the code uses the fine-tuned adapter to deliver more concise responses. You can see the output of the original model (no fine-tuning) by setting `USE_ADAPTER = False`.

In [None]:
import onnxruntime_genai as og

USE_ADAPTER = True

model_path = "models/deepseek/en-onnx-ao/model"

model = og.Model(model_path)
adapters = og.Adapters(model)
adapters.load(f'{model_path}/adapter_weights.onnx_adapter', "en_medical_reasoning")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

prompt_template = """
Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
"""

question = """
        A 54-year-old construction worker with a long history of smoking presents with swelling in his upper extremity and face, along with 
        dilated veins in this region. After conducting a CT scan and venogram of the neck, what is the most likely diagnosis for the cause of these symptoms?
"""
# The inference template has a single placeholder: the question.
prompt = prompt_template.format(question)
input_tokens = tokenizer.encode(prompt)

# Configure generation; the adapter is applied below only if USE_ADAPTER is True
params = og.GeneratorParams(model)
params.set_search_options(past_present_share_buffer=False, temperature=0.6, max_length=1200)
generator = og.Generator(model, params)

# Activate the fine-tuned adapter for this response
if USE_ADAPTER:
    generator.set_active_adapter(adapters, "en_medical_reasoning")

generator.append_tokens(input_tokens)

print("Output: ", end='', flush=True)

# Stream tokens to the console as they are generated
while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end='', flush=True)
print()
print()
del generator
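
Because the prompt ends with an opening `<think>` tag, the generated text is expected to contain the reasoning, then `</think>`, then the final answer. To compare how concise the adapter and base model are, one option is to collect the decoded tokens into a string and split it at the closing tag. The sketch below assumes that output shape; `split_reasoning` is a hypothetical helper, and the sample string is a hand-written placeholder rather than real model output.

```python
def split_reasoning(generated):
    """Split generated text into (reasoning, answer) at the closing </think> tag."""
    reasoning, separator, answer = generated.partition("</think>")
    if not separator:
        # Generation stopped (for example, at max_length) before the reasoning closed.
        return generated.strip(), ""
    return reasoning.strip(), answer.strip()


# Placeholder text standing in for the accumulated decoder output.
sample = "step one, then step two\n</think>\nfinal answer"
reasoning, answer = split_reasoning(sample)
print(f"reasoning words: {len(reasoning.split())}")
print(f"answer: {answer}")
```

Running the generation loop once with `USE_ADAPTER = True` and once with `USE_ADAPTER = False`, then comparing the reasoning word counts, gives a rough measure of how much the fine-tuning shortened the chain of thought.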