<a href="https://colab.research.google.com/github/aiguru-pro/ai-engg/blob/main/week3/deepseek_fine_tune.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [2]:
from huggingface_hub import login
import wandb
from getpass import getpass
import os

# Prompt for tokens securely (input will be hidden)
print("Please enter your access tokens:")
hf_token = getpass('Enter your Hugging Face token: ')
wb_token = getpass('Enter your Weights & Biases token: ')

# Store them as environment variables for the session
os.environ['HF_TOKEN'] = hf_token  # HF_TOKEN is the variable huggingface_hub reads
os.environ['WANDB_API_KEY'] = wb_token

# Login to both platforms
login(hf_token)  # Hugging Face login
wandb.login(key=wb_token)  # W&B login

# Initialize W&B run
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset',
    job_type="training",
    anonymous="allow"
)

Please enter your access tokens:
Enter your Hugging Face token: ··········
Enter your Weights & Biases token: ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33maocvee[0m ([33maocvee2[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


In [3]:
from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = None
load_in_4bit = True


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.1.7: Fast Llama patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/231 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/52.9k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

In [4]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

In [6]:
import time

question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"
FastLanguageModel.for_inference(model)

# Tokenize input
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Record start time
start_time = time.time()

# Generate
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

# Calculate time taken
end_time = time.time()
time_taken = end_time - start_time

# Calculate number of new tokens generated
input_length = inputs.input_ids.shape[1]
output_length = outputs.shape[1]
new_tokens = output_length - input_length

# Calculate tokens per second
tokens_per_sec = new_tokens / time_taken

print(f"Generation Speed: {tokens_per_sec:.2f} tokens/sec")
print(f"Total new tokens: {new_tokens}")
print(f"Time taken: {time_taken:.2f} seconds")

# Print the response
response = tokenizer.batch_decode(outputs)
print("\nResponse:")
print(response[0].split("### Response:")[1])

Generation Speed: 14.74 tokens/sec
Total new tokens: 460
Time taken: 31.21 seconds

Response:

<think>
Okay, so I'm trying to figure out what the cystometry would show for this 61-year-old woman. Let me break this down step by step.

First, the patient has a history of involuntary urine loss when she coughs or sneezes. That makes me think about possible causes. Usually, when someone loses urine involuntarily during these activities, it's related to a weak pelvic floor muscle or urethral issues. But since she doesn't leak at night, that suggests it's not a nighttime issue like sleep apnea or something related to supine position, which often affects the lower urinary tract.

She undergoes a gynecological exam and Q-tip test. From what I remember, the Q-tip test is used to assess urethral function. The provider inserts a Q-tip catheter into the urethra and measures the pressure. If the pressure is high, it might indicate an obstruction, which could lead to retention or difficulty emptying

In [7]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

In [8]:
# Appended to every training example so the model learns where a response ends
EOS_TOKEN = tokenizer.eos_token


def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    # Loop names avoid shadowing the built-in input()
    for question, cot, answer in zip(inputs, cots, outputs):
        text = train_prompt_style.format(question, cot, answer) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
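
The batched formatting step can be tried without a tokenizer or dataset. The sketch below is illustrative only: `DEMO_EOS` and `DEMO_TEMPLATE` are hypothetical stand-ins for `tokenizer.eos_token` and the full `train_prompt_style`, and `format_batch` mirrors `formatting_prompts_func` under that assumption.

```python
# Stand-in values for illustration only; the notebook itself uses
# tokenizer.eos_token and the full train_prompt_style template.
DEMO_EOS = "<EOS>"
DEMO_TEMPLATE = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"


def format_batch(examples, template=DEMO_TEMPLATE, eos=DEMO_EOS):
    """Turn a column-oriented batch (dict of lists) into training strings."""
    texts = [
        template.format(question, cot, answer) + eos
        for question, cot, answer in zip(
            examples["Question"], examples["Complex_CoT"], examples["Response"]
        )
    ]
    return {"text": texts}


# A one-row batch shaped like what datasets.map(..., batched=True) passes in
demo_batch = {
    "Question": ["What is 2 + 2?"],
    "Complex_CoT": ["Add the two numbers."],
    "Response": ["4"],
}
demo_out = format_batch(demo_batch)
print(demo_out["text"][0])
```

Each output string ends with the end-of-sequence marker, which is what lets the fine-tuned model stop after its final answer.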

In [9]:
from datasets import load_dataset
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT",
    "en",
    split="train[0:500]",
    trust_remote_code=True,
)
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]

README.md:   0%|          | 0.00/1.25k [00:00<?, ?B/s]

medical_o1_sft.json:   0%|          | 0.00/74.1M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25371 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her ab

In [14]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth: Already have LoRA adapters! We shall skip this step.
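
As a rough check on what `r=16` over these seven projections costs, the number of LoRA parameters can be worked out by hand. The sketch below assumes the standard Llama-3.1-8B shapes (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128); those values are not stated in this notebook, so treat them as assumptions.

```python
# Assumed Llama-3.1-8B shapes (not stated in the notebook)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32
RANK = 16

# (in_features, out_features) for each targeted projection
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, shapes, num_layers):
    """Each adapter adds A (rank x in) and B (out x rank) per projection."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


total = lora_params(RANK, SHAPES, NUM_LAYERS)
print(f"{total:,}")  # 41,943,040
```

Under these assumptions the result agrees with the "Number of trainable parameters" line that the trainer prints when training starts.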


In [15]:
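from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(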
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=4,  # Increased from 2 to 4 for faster data processing
    args=TrainingArguments(
        per_device_train_batch_size=4,  # Increased from 2 to 4
        gradient_accumulation_steps=2,  # Reduced from 4 to 2
        warmup_steps=2,  # Reduced from 5
        max_steps=30,  # Reduced from 60
        learning_rate=5e-4,  # Increased from 2e-4 for faster convergence
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=5,  # Reduced from 10
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)


Map (num_proc=4):   0%|          | 0/500 [00:00<?, ? examples/s]

In [16]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 500 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 2
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
5,1.9431
10,1.4899
15,1.419
20,1.3733
25,1.3729
30,1.3622
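
The banner above reports a total batch size of 8 and 30 steps. A small back-of-the-envelope sketch, assuming one training example per batch slot (no sequence packing), shows how much of the 500-example subset that covers:

```python
# Values copied from the training arguments and banner above
per_device_batch = 4
grad_accum = 2
num_gpus = 1
max_steps = 30
num_examples = 500

# Examples consumed per optimizer step
effective_batch = per_device_batch * grad_accum * num_gpus
# Examples consumed over the whole run
examples_seen = effective_batch * max_steps
fraction_of_epoch = examples_seen / num_examples

print(effective_batch, examples_seen, fraction_of_epoch)  # 8 240 0.48
```

So although the banner rounds to one epoch, `max_steps=30` stops the run after roughly half a pass over the subset.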


In [17]:
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])



<think>
Okay, let's see. This woman is 61, and she's been dealing with this issue of losing urine involuntarily for a while now. That sounds like a urinary incontinence problem. I know that incontinence can be due to several things, like weak muscles in the pelvic floor or nerve problems. 

When they mentioned the coughing and sneezing, that's classic for stress incontinence. Stress incontinence usually happens when the bladder muscle isn't strong enough to hold back the urge, and it gets overwhelmed when there's pressure like from coughing or sneezing. So, it's likely that her bladder isn't holding up under pressure.

Now, about her night time. She said she doesn't leak at night. That’s interesting. Usually, with stress incontinence, people tend to leak at night as well because of the increased pressure during sleep. But she doesn't leak at night, which suggests that she doesn't have a problem with her bladder muscle not closing properly, like in genuine incontinence.

Now, the Q-tip
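
Because the decoded text mixes the prompt, the `<think>` reasoning, and the final answer, it can help to separate them before comparing runs. The helper below is a hypothetical sketch, not part of Unsloth or the tokenizer; it assumes the output follows the `### Response:` / `<think>` layout used in this notebook and does not strip any end-of-sequence token.

```python
def split_reasoning(decoded, marker="### Response:",
                    open_tag="<think>", close_tag="</think>"):
    """Return (reasoning, answer) from one decoded generation."""
    # Drop the prompt: keep only what follows the response marker
    body = decoded.split(marker, 1)[-1]
    if open_tag in body:
        body = body.split(open_tag, 1)[1]
    if close_tag in body:
        reasoning, answer = body.split(close_tag, 1)
    else:
        # Generation stopped before the reasoning block was closed
        reasoning, answer = body, ""
    return reasoning.strip(), answer.strip()


demo = "### Question:\nQ\n\n### Response:\n<think>\nstep one\n</think>\nfinal answer"
print(split_reasoning(demo))  # ('step one', 'final answer')
```

An empty answer indicates the reasoning block was never closed, for example when generation ran into the `max_new_tokens` limit.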

In [18]:
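new_model_local = "DeepSeek-R1-Medical-COT-RV"
# Save the LoRA adapter weights and the tokenizer
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)

# Merge the adapter into the base weights and save a standalone 16-bit model
model.save_pretrained_merged(new_model_local, tokenizer, save_method="merged_16bit")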

Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 6.0G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 3.0 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 38%|███▊      | 12/32 [00:01<00:01, 10.01it/s]
We will save to Disk and not RAM now.
100%|██████████| 32/32 [02:19<00:00,  4.37s/it]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving DeepSeek-R1-Medical-COT-RV/pytorch_model-00001-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Medical-COT-RV/pytorch_model-00002-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Medical-COT-RV/pytorch_model-00003-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Medical-COT-RV/pytorch_model-00004-of-00004.bin...
Done.


In [19]:
new_model_online = "goaiguru/DeepSeek-R1-Medical-COT-RV"
# Upload the LoRA adapter weights and the tokenizer
model.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)

# Upload the merged 16-bit model to the same repository
model.push_to_hub_merged(new_model_online, tokenizer, save_method="merged_16bit")

README.md:   0%|          | 0.00/627 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.56k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

Saved model to https://huggingface.co/goaiguru/DeepSeek-R1-Medical-COT-RV


tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

Unsloth: You are pushing to hub, but you passed your HF username = goaiguru.
We shall truncate goaiguru/DeepSeek-R1-Medical-COT-RV to DeepSeek-R1-Medical-COT-RV
Unsloth: Will remove a cached repo with size 1.6K


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 2.94 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 32/32 [02:58<00:00,  5.57s/it]


Unsloth: Saving tokenizer...

No files have been modified since last commit. Skipping to prevent empty commit.


 Done.
Unsloth: Saving DeepSeek-R1-Medical-COT-RV/pytorch_model-00001-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Medical-COT-RV/pytorch_model-00002-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Medical-COT-RV/pytorch_model-00003-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Medical-COT-RV/pytorch_model-00004-of-00004.bin...


pytorch_model-00002-of-00004.bin:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

pytorch_model-00004-of-00004.bin:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

pytorch_model-00001-of-00004.bin:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

pytorch_model-00003-of-00004.bin:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

Upload 4 LFS files:   0%|          | 0/4 [00:00<?, ?it/s]

Done.
Saved merged model to https://huggingface.co/goaiguru/DeepSeek-R1-Medical-COT-RV
