In [1]:
#Adding API keys weights and biases, hugging face

from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()

hugging_face_token = user_secrets.get_secret("hugging_face_token")
weights_and_biases_token = user_secrets.get_secret("weights_and_biases")

In [2]:
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://unslothai/unsloth.git

In [3]:
# Fine tuning modules
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer # Trainer for supervised fine tuning
from unsloth import is_bfloat16_supported

# Hugging face modules
from huggingface_hub import login
from transformers import TrainingArguments # Defines training hyperparameters
from datasets import load_dataset

#Weights and biases
import wandb

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [4]:
#login to huggingface
login(hugging_face_token)

#login to wandb
wandb.login(key=weights_and_biases_token)
run = wandb.init(
    project="Fine-Tune-DeepSeek-Model-R1 On Medical Dataset",
    job_type="training",
    anonymous="allow"
)

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mcontact-mohamednaffeti[0m ([33mcontact-mohamednaffeti-isimm[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [5]:
#Hyper parameters
max_seq_length = 2084 # max sequence length the model can handle (how many tokens can be processed at once)
dtype = None # Set to default
load_in_4bit = True # Enables 4 bit quantization for memory saving optimization

In [6]:
#Loading the model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=hugging_face_token
)

==((====))==  Unsloth 2025.1.8: Fast Llama patching. Transformers: 4.48.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/236 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/52.9k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

In [9]:
# Defining a system prompt
prompt_style = """Below is an instruction that describes a task, paired with and input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

In [13]:
# Defining a test question
question = """A 61-year-old woman with a long histroy of involuntary urine lossduring activities like coughing or sneezing, but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, and what would cystometry looks likely reveal about her residual volume and detrusor contractions?"""

# Enabling optimized inference mode for unsloth models, for improved speed and efficiency
FastLanguageModel.for_inference(model)

# Formatting the question using prompt_style and tokenizing it
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

#Generating a response
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True
)

# Decode the response
response = tokenizer.batch_decode(outputs)

#extracting and printing the relevant response part
print(response[0].split("### Response")[1])

:
<think>
Okay, so I need to figure out what cystometry would likely reveal about this woman's residual volume and detrusor contractions. Let me start by going through the information given.

First, the patient is a 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing, but she doesn't leak at night. That makes me think about possible causes. She undergoes a gynecological exam and Q-tip test. I'm not entirely sure what the Q-tip test entails, but I think it's a test to check for urethral obstruction or urethritis. So, if the Q-tip test is positive, it might suggest that her urethra isn't opening properly, causing the obstruction.

Now, the question is about what cystometry would show. Cystometry is a test where they fill the bladder and measure how much it holds (residual volume) and how the detrusor muscle contracts during filling. 

If she has involuntary urine loss, especially with coughing, that's often related to overactive bla

In [20]:
# Training prompt style by adding </think> tag
train_prompt_style = """Below is an instruction that describes a task, paired with and input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

In [14]:
#Loading the dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[0:500]", trust_remote_code=True)
dataset

README.md:   0%|          | 0.00/1.25k [00:00<?, ?B/s]

medical_o1_sft.json:   0%|          | 0.00/74.1M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25371 [00:00<?, ? examples/s]

Dataset({
    features: ['Question', 'Complex_CoT', 'Response'],
    num_rows: 500
})

In [15]:
dataset[1]

{'Question': 'A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis?',
 'Complex_CoT': "Alright, let’s break this down. We have a 45-year-old man here, who suddenly starts showing some pretty specific symptoms: dysarthria, shuffling gait, and those intention tremors. This suggests something's going wrong with motor control, probably involving the cerebellum or its connections.\n\nNow, what's intriguing is that he's had a history of alcohol use, but he's been off it for the past 10 years. Alcohol can do a number on the cerebellum, leading to degeneration, and apparently, the effects can hang around or even appear long after one stops drinking.\n\nAt first glance, these symptoms look like they could be some kind of chronic degeneration, maybe something like alcoholic cerebellar degeneration, 

In [17]:
# Formatting the dataset to fit the prompt
EOS_TOKEN = tokenizer.eos_token # End of Sequence token
EOS_TOKEN

'<｜end▁of▁sentence｜>'

In [18]:
# Formatting function
def format_prompt_fn(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]

    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts
    }

In [23]:
# update dataset formatting
dataset_finetune = dataset.map(format_prompt_fn, batched=True)
dataset_finetune["text"][0]

"Below is an instruction that describes a task, paired with and input that provides further context.\nWrite a response that appropriately completes the request.\nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.\nPlease answer the following medical question.\n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdom

In [24]:
#Setting LoRA => adds small, trainable adapters to specific layers
model_lora = FastLanguageModel.get_peft_model(
    model, 
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj"
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None
)

Unsloth 2025.1.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [27]:
#Initialize the finetuning trainer
trainer = SFTTrainer(
    model = model_lora,
    tokenizer=tokenizer,
    train_dataset=dataset_finetune,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    
    #Training arguments
    args = TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs"
    )
)

Map (num_proc=2):   0%|          | 0/500 [00:00<?, ? examples/s]

In [28]:
#Start the finetuning process
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 500 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
10,1.9221
20,1.4704
30,1.4086
40,1.315
50,1.3509
60,1.3213


In [29]:
#save the finetuned model
wandb.finish()

0,1
train/epoch,▁▂▄▅▇██
train/global_step,▁▂▄▅▇██
train/grad_norm,█▂▂▁▂▂
train/learning_rate,█▇▅▄▂▁
train/loss,█▃▂▁▁▁

0,1
total_flos,1.7930995434061824e+16
train/epoch,0.96
train/global_step,60.0
train/grad_norm,0.26116
train/learning_rate,0.0
train/loss,1.3213
train_loss,1.46471
train_runtime,1510.3194
train_samples_per_second,0.318
train_steps_per_second,0.04


In [30]:
# Defining a test question for the finetuned model
question = """A 61-year-old woman with a long histroy of involuntary urine lossduring activities like coughing or sneezing, but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, and what would cystometry looks likely reveal about her residual volume and detrusor contractions?"""

# Enabling optimized inference mode for unsloth models, for improved speed and efficiency
FastLanguageModel.for_inference(model_lora)

# Formatting the question using prompt_style and tokenizing it
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

#Generating a response
outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True
)

# Decode the response
response = tokenizer.batch_decode(outputs)

#extracting and printing the relevant response part
print(response[0].split("### Response")[1])

:
<think>
Okay, so let's think about this. We have a 61-year-old woman who's been dealing with involuntary urine loss during things like coughing or sneezing, but she's not leaking at night. That suggests she might have some kind of problem with her pelvic floor muscles or maybe her bladder.

Now, she's got a gynecological exam, and they're doing a Q-tip test. I know the Q-tip test is usually used to check for urethral obstruction, especially in women. If the test shows a high resistance, it might mean there's something blocking the urethra, like a urethral stricture or a narrowed urethra.

But we're also thinking about what this could mean for her bladder. If there's a urethral obstruction, the bladder could be having trouble emptying itself, which might lead to a full bladder. That could mean she's retaining more urine than usual.

Now, let's consider what cystometry might show. Cystometry is like a test to see how the bladder behaves under different conditions. If the bladder is rea

In [33]:
import os

# Ensure the directory exists
os.makedirs("Mouhib007/DeepSeek-r1-Medical-Mini", exist_ok=True)

# Push to Hugging Face Hub
model_lora.push_to_hub_merged("Mouhib007/DeepSeek-r1-Medical-Mini/model", tokenizer, save_method="merged_16bit", token=hugging_face_token)

Unsloth: You are pushing to hub in Kaggle environment.
To save memory, we shall move Mouhib007/DeepSeek-r1-Medical-Mini/model to /tmp/model


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 16.76 out of 31.35 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 32/32 [00:51<00:00,  1.61s/it]


Unsloth: Saving tokenizer...

  0%|          | 0/1 [00:00<?, ?it/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

 Done.
Unsloth: Saving /tmp/model/pytorch_model-00001-of-00004.bin...
Unsloth: Saving /tmp/model/pytorch_model-00002-of-00004.bin...
Unsloth: Saving /tmp/model/pytorch_model-00003-of-00004.bin...
Unsloth: Saving /tmp/model/pytorch_model-00004-of-00004.bin...


  0%|          | 0/4 [00:00<?, ?it/s]

pytorch_model-00002-of-00004.bin:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

pytorch_model-00001-of-00004.bin:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

pytorch_model-00003-of-00004.bin:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

pytorch_model-00004-of-00004.bin:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

Done.
Saved merged model to https://huggingface.co/Mouhib007/model


In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="Mouhib007/DeepSeek-r1-Medical-Mini")
pipe(messages)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]