In [None]:
%%capture
# Install Unsloth (the %%capture cell magic must be the first line of the cell)
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [None]:
!pip -q install -U huggingface_hub wandb

from getpass import getpass
from huggingface_hub import login
import os, wandb

# Hugging Face
print("Paste your Hugging Face token (starts with hf_...).")
HF_TOKEN = getpass()
login(token=HF_TOKEN)                 # Authenticates this notebook session
os.environ["HF_TOKEN"] = HF_TOKEN     # So other libs can reuse it

# Weights & Biases
use_wandb = True   # set to False if you don't want logging
if use_wandb:
    print("Paste your Weights & Biases API Key (press Enter to skip).")
    try:
        WB_TOKEN = getpass()
        if WB_TOKEN.strip():
            wandb.login(key=WB_TOKEN)
            os.environ["WANDB_PROJECT"] = "med-qlora-unsloth"
            print("W&B logging enabled.")
        else:
            print("Skipped W&B login.")
            use_wandb = False
    except Exception as e:
        print("W&B login skipped:", e)
        use_wandb = False


Paste your Hugging Face token (starts with hf_...).
··········
Paste your Weights & Biases API Key (press Enter to skip).
··········


wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: ramshaimran-work (ramshaimran-codes) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


W&B logging enabled.


In [None]:
from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = None
load_in_4bit = True


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = HF_TOKEN,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


==((====))==  Unsloth 2025.8.9: Fast Llama patching. Transformers: 4.55.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/236 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

chat_template.jinja: 0.00B [00:00, ?B/s]

In [None]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""
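
The template above has two positional slots: the question and the chain of thought. At inference time the second slot is left empty, so the prompt ends at the opening `<think>` tag and the model continues from there. A minimal sketch of that behaviour, using a shortened stand-in template (`demo_template` and `build_inference_prompt` are hypothetical names, not part of the notebook):

```python
# Hypothetical, shortened stand-in for the notebook's prompt_style template.
demo_template = """### Question:
{}

### Response:
<think>{}"""


def build_inference_prompt(question: str) -> str:
    # Leave the reasoning slot empty so generation starts right after <think>.
    return demo_template.format(question, "")


demo_prompt = build_inference_prompt("What does a Q-tip test assess?")
print(demo_prompt)
```

Note that `str.format` silently ignores surplus positional arguments, so passing three values to a two-slot template drops the third without raising an error.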

In [None]:
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>
Okay, so I'm trying to figure out what cystometry would show for this 61-year-old woman. Let me start by breaking down the information given.

She's been having involuntary urine loss, especially when she coughs or sneezes. That makes me think of stress urinary incontinence. But she doesn't leak at night, which might mean her bladder doesn't have problems when it's relaxed. So, it's probably not related to nighttime bladder capacity issues.

She undergoes a gynecological exam and Q-tip test. I remember that the Q-tip test is used to check for urethral obstruction, especially in women. The Q-tip is a catheter that's passed through the urethra, and it measures the pressure at the urethral meatus. If the pressure is higher than normal, it suggests that the urethra might be obstructed, which could lead to difficulty in emptying the bladder.

If the Q-tip test shows an elevated pressure, that might indicate that her urethra is partially blocked. If the pressure is higher than norma
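
The decoded text above contains the reasoning and, when generation finishes, the final answer after a closing `</think>` tag. The sketch below shows one way to separate the two. It assumes the model closes its reasoning with `</think>`; `split_reasoning` is a hypothetical helper, and it does not strip special tokens such as the EOS marker.

```python
def split_reasoning(decoded: str, marker: str = "### Response:"):
    """Return (reasoning, answer) from a decoded generation."""
    # Everything before the last marker is the prompt; keep only the completion.
    completion = decoded.split(marker)[-1]
    # Drop the opening tag that the prompt template already supplied.
    completion = completion.replace("<think>", "", 1)
    reasoning, closing_tag, answer = completion.partition("</think>")
    if not closing_tag:
        # No closing tag: the output was truncated or the model never closed it.
        return reasoning.strip(), ""
    return reasoning.strip(), answer.strip()


sample = "### Question:\nQ\n\n### Response:\n<think>\nstep one\n</think>\nFinal answer."
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```

An empty answer string is a cheap signal that `max_new_tokens` cut the generation off before the model finished reasoning.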

In [None]:
EOS_TOKEN = tokenizer.eos_token


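# prompt_style has only two slots (question, reasoning); str.format would silently
# drop a third argument, so extend the template with a slot for the final answer.
train_prompt_style = prompt_style + "\n</think>\n{}"


def formatting_prompts_func(examples):
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]
    texts = []
    for question, cot, response in zip(questions, cots, responses):
        # EOS marks the end of each example so the model learns where to stop.
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}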

In [None]:
from datasets import load_dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[0:500]", trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nGiven the symptoms of sudden weakness in the left arm and leg, recent long-distance travel, and the presence of swollen and tender right lower leg, what specific cardiac abnormality is most likely to be found upon further evaluation that could explain these findings?\n\n### Response:\n<think>Okay, let's see what's going on here. We've got sudden weakness in the person's left arm and leg - and that screams something neuro-related, maybe a stroke?\n\nBut wait, there's more. The right lower leg is 

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.8.9 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
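
As a sanity check on the LoRA configuration, the number of trainable parameters can be estimated by hand. Each adapted projection adds two low-rank matrices, `A` (rank x in_features) and `B` (out_features x rank). The sketch below assumes the standard Llama 8B layer shapes (hidden size 4096, key/value width 1024, MLP width 14336, 32 layers); these shapes are not stated in the notebook, and `lora_param_count` is a hypothetical helper.

```python
# Assumed Llama-8B-style shapes (not stated in the notebook).
HIDDEN = 4096
KV_DIM = 1024  # 8 key/value heads x 128 head dimension
INTERMEDIATE = 14336
NUM_LAYERS = 32
RANK = 16

# (in_features, out_features) for each targeted projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, projections, num_layers):
    # Each adapter contributes rank * in_features + out_features * rank weights.
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in projections.values())
    return per_layer * num_layers


total_lora_params = lora_param_count(RANK, PROJECTIONS, NUM_LAYERS)
print(f"{total_lora_params:,}")
```

Under these assumptions the result is 41,943,040, which is the trainable-parameter figure that the training banner reports.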


In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

Unsloth: Tokenizing ["text"]:   0%|          | 0/500 [00:00<?, ? examples/s]
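
With `warmup_steps=5`, `max_steps=60`, and `lr_scheduler_type="linear"`, the learning rate ramps up linearly to `2e-4` and then decays linearly to zero. The function below is a plain-Python sketch of that shape, not the trainer's own scheduler; `linear_warmup_decay_lr` is a hypothetical name, and the values the trainer logs may be offset by a step.

```python
def linear_warmup_decay_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: scale from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 5, 30, 60):
    print(step, linear_warmup_decay_lr(step))
```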

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040 of 8,072,204,288 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
10,2.1503
20,1.5434
30,1.5134
40,1.4712
50,1.4952
60,1.4731
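
The banner above reports a total batch size of 8 and 60 steps over 500 examples. A short calculation shows how much of the dataset that covers. This is an approximation with a hypothetical helper (`training_coverage`); the trainer's own per-epoch step accounting may round differently.

```python
import math


def training_coverage(num_examples, per_device_batch, grad_accum, num_devices, max_steps):
    """Return (effective batch size, steps per full epoch, fraction of the dataset seen)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    examples_seen = effective_batch * max_steps
    return effective_batch, steps_per_epoch, examples_seen / num_examples


effective_batch, steps_per_epoch, fraction_seen = training_coverage(500, 2, 4, 1, 60)
print(effective_batch, steps_per_epoch, fraction_seen)
```

With these settings the run processes 480 of the 500 examples, slightly less than one full pass, so raising `max_steps` or switching to `num_train_epochs` would be needed to see every example.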


In [None]:
# Same question as before fine-tuning, for a before/after comparison.
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])



<think>Okay, let's see. This woman is 61 and she's been dealing with involuntary urine loss, especially when she coughs or sneezes. Hmm, that sounds a lot like a common issue with a certain type of bladder condition. I'm thinking it's probably something related to stress incontinence. 

Alright, let's consider her symptoms. She doesn't leak at night, which is interesting because usually, that's more common in people with urge incontinence. But this is more about stress, so it must be something like urethral sphincter dysfunction or maybe a urethral insufficiency.

Now, she's getting a gynecological exam and a Q-tip test. The Q-tip test is usually used to figure out if there's any urethral obstruction. If the test shows some obstruction, it suggests that the urethral sphincter isn't working well.

So, if the Q-tip test shows urethral obstruction, it's likely that her problem is with the sphincter, not something like a weak detrusor muscle. That makes sense because with stress incontine

In [None]:
# First 3 questions of the dataset
for i in range(3):
    print(f"\nQuestion {i+1}:")
    print(dataset[i]["Question"])
    print("-" * 50)


Question 1:
Given the symptoms of sudden weakness in the left arm and leg, recent long-distance travel, and the presence of swollen and tender right lower leg, what specific cardiac abnormality is most likely to be found upon further evaluation that could explain these findings?
--------------------------------------------------

Question 2:
A 33-year-old woman is brought to the emergency department 15 minutes after being stabbed in the chest with a screwdriver. Given her vital signs of pulse 110/min, respirations 22/min, and blood pressure 90/65 mm Hg, along with the presence of a 5-cm deep stab wound at the upper border of the 8th rib in the left midaxillary line, which anatomical structure in her chest is most likely to be injured?
--------------------------------------------------

Question 3:
A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on 

In [None]:
# Note: question1 holds Question 2 from the dataset listing above.
question1 = "A 33-year-old woman is brought to the emergency department 15 minutes after being stabbed in the chest with a screwdriver. Given her vital signs of pulse 110/min, respirations 22/min, and blood pressure 90/65 mm Hg, along with the presence of a 5-cm deep stab wound at the upper border of the 8th rib in the left midaxillary line, which anatomical structure in her chest is most likely to be injured?"


FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question1, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>Okay, let's see what we're dealing with here. We have a 33-year-old woman who's been stabbed in the chest with a screwdriver. That's pretty serious, right? She's got a stab wound that's 5 centimeters deep. And it's in the upper border of the 8th rib, specifically on the left side, mid-axillary line. Hmm, that's an interesting spot. So, what does that tell us about what might be happening inside her chest?

First off, I need to think about the anatomy of her chest and what's at risk when you have a stab wound like this. The 8th rib is a bit lower than the 9th rib, which is closer to the clavicle and more at risk for injury when you're dealing with something like a stab wound. But wait, the 8th rib is pretty close to the heart. Oh, right, the heart is located in the chest cavity, right?

And the heart is a pretty critical organ. So, if there's a stab wound near it, there's a higher chance of hitting something important. Now, I'm thinking about the structures that are just below t

In [None]:
# Note: question2 holds Question 1 from the dataset listing above.
question2 = "Given the symptoms of sudden weakness in the left arm and leg, recent long-distance travel, and the presence of swollen and tender right lower leg, what specific cardiac abnormality is most likely to be found upon further evaluation that could explain these findings?"


FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question2, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>Alright, so there's this patient who's suddenly having weakness in their left arm and leg. Hmm, that sounds like a neurological issue. And then there's the recent long-distance travel... that's interesting. Also, the right lower leg is swollen and tender. That's definitely a clue that something's going on with the blood flow there. Let's think about what this could mean.

Now, when we're talking about sudden weakness, especially in just one side of the body, it often points to something like a stroke or a transient ischemic attack. But we should also consider other possibilities, like a spinal cord lesion. That's something that can cause weakness too, and it's not directly related to blood flow issues. So, we need to figure out if it's a vascular or a neurological problem.

Now, looking at the right leg being swollen and tender... that definitely sounds like it's related to something going wrong with the blood flow. Maybe there's a clot or something blocking a blood vessel. Cou