<a href="https://colab.research.google.com/github/nefro313/deepseek-r1-model-finetuning/blob/main/Deepseek_R1_model_Finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [21]:
%%capture
# Install the Unsloth library.
# Before running, switch the Colab runtime from CPU to a T4 GPU.
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [22]:
# Install the remaining packages
!pip install -q datasets trl transformers

In [24]:
# Confirm that all packages are installed; a missing one causes failures when the model is pulled from Hugging Face
#pip show -q  datasets trl transformers unsloth

In [41]:
# Log in to Hugging Face with the token stored in Colab secrets
from google.colab import userdata
from huggingface_hub import login
login(userdata.get('HF_TOKEN'))

In [42]:
# Download DeepSeek-R1-Distill-Llama-8B from Hugging Face using Unsloth for faster fine-tuning

In [5]:
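from unsloth import FastLanguageModel

max_seq_length = 2048  # maximum number of tokens per training sequence
dtype = None           # None lets Unsloth choose the dtype for the detected GPU
load_in_4bit = True    # load 4-bit quantized weights to reduce memory use

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = 'deepseek-ai/DeepSeek-R1-Distill-Llama-8B',
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = userdata.get('HF_TOKEN'),  # `userdata` was imported in the login cell
)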

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.1.8: Fast Llama patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/236 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/52.9k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]
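As a rough check on why 4-bit loading matters on a T4, the sketch below estimates the memory needed for the weights alone. The parameter count of about 8.03 billion is an assumption (it is not printed anywhere in this notebook), and the estimate ignores activations, optimizer state, and quantization metadata.

```python
# Rough weight-memory estimate. N_PARAMS is an assumed figure for an
# 8B Llama-style model; it is not reported by this notebook.
N_PARAMS = 8.03e9
T4_MEMORY_GB = 14.741  # "Max memory" from the Unsloth banner above


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(N_PARAMS, 16)
int4_gb = weight_memory_gb(N_PARAMS, 4)

print(f"fp16 weights : {fp16_gb:.1f} GB (fits on T4: {fp16_gb < T4_MEMORY_GB})")
print(f"4-bit weights: {int4_gb:.1f} GB (fits on T4: {int4_gb < T4_MEMORY_GB})")
```

The 5.96 GB download shown above is larger than the 4-bit estimate, plausibly because some tensors and the quantization metadata are not stored at 4 bits.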

In [43]:
# Configure LoRA (low-rank adaptation)

In [25]:
# Configure LoRA (low-rank adaptation) adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # rank of the low-rank update matrices
    target_modules = [  # attention and MLP projections that receive adapters
        'q_proj',
        'k_proj',
        'v_proj',
        'o_proj',
        'gate_proj',
        'up_proj',
        'down_proj',
    ],
    lora_alpha = 16,  # LoRA scaling factor
    lora_dropout = 0,
    bias = 'none',
    use_gradient_checkpointing = 'unsloth',
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth: Already have LoRA adapters! We shall skip this step.
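The training banner further down reports 41,943,040 trainable parameters. The sketch below shows where a number like that can come from: each adapted linear layer gains `r * (in_features + out_features)` LoRA weights. The layer dimensions used here are assumptions based on the usual Llama 8B configuration; they are not printed anywhere in this notebook.

```python
# Assumed Llama-8B dimensions (not stated in this notebook).
HIDDEN = 4096         # model hidden size
INTERMEDIATE = 14336  # MLP intermediate size
KV_DIM = 1024         # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32
RANK = 16             # r from the get_peft_model call

# (in_features, out_features) for each targeted projection in one layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,} | total: {total:,}")
```

Under these assumed dimensions the total matches the count in the training banner.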


In [27]:
# Prompt template used for training
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
{}

### Response:
<think>
{}
</think>
{}"""

In [28]:
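# Format each record into a single training string
EOS_token = tokenizer.eos_token  # appended so the model learns where a response ends

def formatting_propmts_func(examples):
  questions = examples['Question']
  cots = examples['Complex_CoT']
  responses = examples['Response']
  texts = []
  # Avoid shadowing the built-in `input` and the column lists inside the loop
  for question, cot, response in zip(questions, cots, responses):
    prompt = train_prompt_style.format(question, cot, response) + EOS_token
    texts.append(prompt)
  return {'text': texts}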

In [37]:
# Load the dataset from Hugging Face
# Only the first 100 records are used
from datasets import load_dataset
data = load_dataset('FreedomIntelligence/medical-o1-reasoning-SFT','en',split='train[:100]', trust_remote_code=True)
dataset = data.map(formatting_propmts_func,batched = True)
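To see what a batched formatting function does without downloading anything, here is a minimal sketch on a toy batch. The names `demo_template`, `DEMO_EOS`, `format_batch`, and `toy_batch` are hypothetical stand-ins: the template is a shortened version of the training prompt, and the EOS string is a placeholder for `tokenizer.eos_token`.

```python
# Shortened, hypothetical stand-in for the training template.
demo_template = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"
DEMO_EOS = "<eos>"  # placeholder; the notebook uses tokenizer.eos_token


def format_batch(examples):
    """Turn a batch of columns into one training string per record."""
    texts = [
        demo_template.format(question, cot, response) + DEMO_EOS
        for question, cot, response in zip(
            examples["Question"], examples["Complex_CoT"], examples["Response"]
        )
    ]
    return {"text": texts}


# With batched=True, a mapped function receives a dict of column lists like this.
toy_batch = {
    "Question": ["What is 2 + 2?", "What is the capital of France?"],
    "Complex_CoT": ["Add the two numbers.", "Recall basic geography."],
    "Response": ["4", "Paris"],
}

formatted = format_batch(toy_batch)
print(formatted["text"][0])
```

Each output string places the reasoning between the `<think>` tags and ends with the EOS marker, which is the layout the training cell relies on.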

In [30]:
# Inspect the formatted data
#dataset['text']

In [31]:
# Run supervised fine-tuning with TRL and configure the training parameters
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported


trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    args=TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        # For a full training run, use num_train_epochs = 1 and warmup_ratio
        # instead of the two step-based settings below.
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to = "none", # Use this for WandB etc
    )
  )

Map (num_proc=2):   0%|          | 0/100 [00:00<?, ? examples/s]
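The arguments above combine `warmup_steps = 5`, `max_steps = 60`, `learning_rate = 2e-4`, and a linear scheduler. The sketch below is a plain-Python approximation of that schedule, assuming the usual shape of linear warmup followed by linear decay to zero. `linear_warmup_lr` is a hypothetical helper, not a library function.

```python
def linear_warmup_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Approximate learning rate at a given optimizer step.

    Ramps linearly from 0 to base_lr over the warmup steps, then decays
    linearly to 0 at total_steps. Assumed shape, not the trainer's code.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


for step in (0, 1, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_warmup_lr(step):.2e}")
```

The peak is reached at step 5, so most of the 60 steps run on the decaying part of the schedule.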

In [38]:
# Train the model on the data. This takes a while, around 20 minutes.
# Running training again usually lowers the loss further.

In [12]:

training_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 100 | Num Epochs = 5
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
10,1.9115
20,1.4575
30,1.3249
40,1.2556
50,1.1782
60,1.1325
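The banner values "Total batch size = 8" and "Num Epochs = 5" follow from the training arguments. The arithmetic below is a rough reconstruction; the trainer's own epoch calculation may differ in detail, and rounding up is an assumption that happens to match the banner.

```python
import math

# Values taken from the training arguments and the banner above.
per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
max_steps = 60
num_examples = 100

# One optimizer step consumes this many examples.
effective_batch = per_device_batch * grad_accum_steps * num_gpus

# Total examples processed across the run, and the passes over the data.
samples_seen = effective_batch * max_steps
epochs = math.ceil(samples_seen / num_examples)

print(f"effective batch size: {effective_batch}")
print(f"examples processed  : {samples_seen}")
print(f"approximate epochs  : {epochs}")
```

With only 100 examples, 60 steps mean the model sees each record several times, which is worth keeping in mind when reading the falling training loss.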


In [39]:
# Inference prompt in the same style as the training data
# Try other questions related to the dataset's domain
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
{}

### Response:
<think>
{}
</think>"""
#sample question
question = "A 40-year-old female presents with fever, fatigue, and diffuse painful swelling in the midline of the neck. Fine needle aspiration cytology (FNAC) reveals epithelioid cells and giant cells. Based on these clinical and cytological findings, what is the most likely diagnosis?"

In [40]:
# Run inference to see how the fine-tuned model responds (takes up to about 1 minute)
FastLanguageModel.for_inference(model)
# Named `inputs` to avoid shadowing the built-in `input`
inputs = tokenizer([prompt_style.format(question, "")], return_tensors='pt').to('cuda')
outputs = model.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    eos_token_id = tokenizer.eos_token_id,
    use_cache = True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
# Print only the text that follows the response marker
print(response[0].split("### Response:")[1])


<think>

</think>
Alright, let's think about this. We have a 40-year-old woman who's got a fever, feeling tired, and there's this painful swelling right in the middle of her neck. That's not something you see every day, and it's making me think there's something serious going on here. 

Now, let's not forget about the fine needle aspiration cytology (FNAC) results. They show epithelioid cells and giant cells. Hmm, those are key clues. Epithelioid cells are pretty common in granulomas, and giant cells are often a big hint towards a condition like sarcoidosis. 

Sarcoidosis is a systemic condition that can show up in various tissues, including the neck. It's known to cause these granulomas, which are groups of inflammatory cells that can include epithelioid and giant cells. And we know it's a chronic condition, which matches the symptoms she's having, like the fever and fatigue.

But wait, there's a catch here. Sarcoidosis can look similar to other conditions, like tuberculosis. Both ca