## Credits

This notebook and its code are a fork of Abid Ali Awan's tutorial on [Fine-Tuning DeepSeek R1](https://www.datacamp.com/tutorial/fine-tuning-deepseek-r1-reasoning-model).


## Which tools & packages will we be using?

These are the packages we will use throughout this project:

- `unsloth`: efficient fine-tuning and inference for LLMs. Specifically, we will use:
    - the `FastLanguageModel` module to optimize inference and fine-tuning
    - `get_peft_model` to enable LoRA (Low-Rank Adaptation) fine-tuning
- `peft`: supports LoRA-based fine-tuning for large models.
- Several Hugging Face libraries:
    - `transformers` to load models and tokenizers and to configure training (`TrainingArguments`)
    - `trl` (Transformer Reinforcement Learning), which supports supervised fine-tuning of the model; we will use its `SFTTrainer` wrapper
    - `datasets` to fetch reasoning datasets from the Hugging Face Hub
- `torch`: the deep learning framework used for training
- `wandb`: the Weights & Biases client for tracking our fine-tuning experiment

## Before we get started: how to access the Hugging Face and Weights & Biases APIs

### Set GPU accelerator
We are using Kaggle Notebooks because they provide free GPU access. To enable a GPU, go to Settings > Accelerator > GPU T4 x2.

### How to access the Hugging Face API

1. Register with Hugging Face if you have not already.
2. Go to [Hugging Face Tokens](https://huggingface.co/settings/tokens).
3. Click **"New Token"**.
4. Select **read/write** permissions if needed.
5. Copy your **API key**.

### Weights & Biases API key
1. Sign up at [Weights & Biases](https://wandb.ai/site).
2. Go to [W&B Settings](https://wandb.ai/settings).
3. Copy your **API key** from the "API Keys" section.

### Add the API keys to Kaggle Notebooks
1. Go to Add-ons > Secrets.
2. Add the API keys under the labels `Hugging_Face_Token` and `wnb`, respectively.

You can now use this code to retrieve your API keys:

```py
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
hugging_face_token = user_secrets.get_secret("Hugging_Face_Token")
wnb_token = user_secrets.get_secret("wnb")
```

## Install relevant packages

In [1]:
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

## Import packages

In [2]:
# Modules for fine-tuning
from unsloth import FastLanguageModel
from unsloth import is_bfloat16_supported
from trl import SFTTrainer
import torch

# Hugging Face modules
from huggingface_hub import login
from transformers import TrainingArguments
from datasets import load_dataset

# Import weights and biases
import wandb

# Import secrets
from kaggle_secrets import UserSecretsClient

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## Get the API keys and log in to both Hugging Face and Weights & Biases

In [3]:
# Initialize Hugging Face and W&B tokens
user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("Hugging_Face_Token")  # Must match the secret label created under Add-ons > Secrets
wnb_token = user_secrets.get_secret("wnb")

# Login to HuggingFace
login(hf_token)

# Login to W&B
wandb.login(key=wnb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset',
    job_type='training',
    anonymous='allow'
)

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33muozbek[0m ([33muozbek-havelsan[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


## Loading DeepSeek R1 and the Tokenizer
#### What are we trying to do here?
In this step, we **load the model and its tokenizer** using `FastLanguageModel.from_pretrained()` and configure a few key parameters for fine-tuning. Rather than the full DeepSeek R1, we use `DeepSeek-R1-Distill-Llama-8B`, an 8B Llama-based model distilled from R1, which is much cheaper to fine-tune and run.

#### What are the key parameters?
```python
max_seq_length = 2048  # Define the maximum sequence length a model can handle (i.e. number of tokens per input)
dtype = None  # Default data type (usually auto-detected)
load_in_4bit = True  # Enables 4-bit quantization. This is a memory-saving optimization
```

#### Why 4-bit quantization?
Imagine compressing a **high-resolution image** to a smaller size: it takes up less space but still looks decent. Similarly, 4-bit quantization reduces the precision of the model weights, making the model **much smaller while keeping most of its accuracy**. Instead of storing **32-bit or 16-bit numbers**, we compress them into **4-bit values**. This lets LLMs **run on consumer GPUs** with limited memory.
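To put rough numbers on this, here is a back-of-the-envelope estimate of the memory needed just to store the weights of an 8-billion-parameter model at different precisions. It is a sketch under simplifying assumptions: a nominal 8e9 parameters, and no overhead for quantization constants, layers kept in higher precision, activations, or the KV cache, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9


NUM_PARAMS = 8e9  # Nominal size of an "8B" model (assumption; the exact count differs slightly)

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```

At 16-bit precision the weights alone (about 16 GB) would not fit on a single 16 GB T4 once other memory needs are counted, while the 4-bit version (about 4 GB) leaves room for activations and training state.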

In [4]:
# Set parameters
max_seq_length = 2048 # Define the maximum sequence length a model can handle (i.e. number of tokens per input)
dtype = None # Default data type (usually auto-detected)
load_in_4bit = True # Enables 4-bit quantization. This is a memory-saving optimization

# Load the DeepSeek R1 model and tokenizer using unsloth - imported using: from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name='unsloth/DeepSeek-R1-Distill-Llama-8B', # Load DeepSeek R1 distilled into an 8B-parameter Llama model
    max_seq_length=max_seq_length, # Ensure the model can process up to 2048 tokens at once
    dtype=dtype, # Use the default data type (e.g., FP16 or BF16 depending on hardware support)
    load_in_4bit=load_in_4bit, # Load the model in 4-bit quantization to save memory
    token=hf_token # Use the HuggingFace token
)

==((====))==  Unsloth 2025.2.4: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Testing DeepSeek R1 on a medical use-case before proceeding with fine-tuning
#### Defining a prompt template
To create a prompt style for the model, we define a prompt template with two `{}` placeholders: one for the question and one for the start of the response. The template ends with an opening `<think>` tag, which nudges the model to reason step by step before giving a logical response.

In [5]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, carefully analyze the question and construct a clear, step-by-step chain of reasoning to ensure a logical, accurate, and well-supported response.

### Instruction:
You are a highly knowledgeable medical expert with advanced expertise in clinical reasoning, diagnostics, and treatment planning. 
Provide a precise and well-structured answer to the following medical question, ensuring clarity and evidence-based reasoning.

### Question:
{}

### Response:
<think>{}"""
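As a quick illustration of how the template is filled, the sketch below uses a shortened stand-in for `prompt_style`; the names `demo_prompt_style` and `demo_question` and their wording are illustrative, not the full template above. Passing an empty string as the second argument leaves the prompt ending right after the opening `<think>` tag, so generation starts inside the reasoning block.

```python
# Shortened stand-in for prompt_style (illustrative only)
demo_prompt_style = """### Question:
{}

### Response:
<think>{}"""

demo_question = "What does esophageal manometry typically show in achalasia?"

# At inference time the second placeholder is filled with an empty string
prompt = demo_prompt_style.format(demo_question, "")
print(prompt)
```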

## Running inference on the model
In this step, we test the R1 model by providing a medical question and generating a response. The process involves the following steps:

1. Define a test question related to a medical case.
2. Format the question using the structured prompt (`prompt_style`) to ensure the model follows a logical reasoning process.
3. Tokenize the input and move it to the GPU (`cuda`) for faster inference.
4. Generate a response using the model, specifying key parameters like `max_new_tokens=1200` (limits response length).
5. Decode the output tokens back into text to obtain the final readable answer.

In [6]:
# Creating a test medical question for inference
question = """A 45-year-old man presents with progressive difficulty swallowing both solids and liquids over the past 
              six months. He reports occasional regurgitation of undigested food and a sensation of food sticking in his 
              chest. A barium swallow study reveals a dilated esophagus with a bird’s beak appearance at the lower 
              esophageal sphincter. What would esophageal manometry most likely demonstrate regarding lower esophageal 
              sphincter pressure and peristalsis?"""


# Enable optimized inference mode for Unsloth models (improves speed and efficiency)
FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!

# Format the question using the structured prompt (`prompt_style`) and tokenize it
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")  # Convert input to PyTorch tensor & move to GPU

# Generate a response using the model
outputs = model.generate(
    input_ids=inputs.input_ids, # Tokenized input question
    attention_mask=inputs.attention_mask, # Attention mask to handle padding
    max_new_tokens=1200, # Limit response length to 1200 tokens (to prevent excessive output)
    use_cache=True, # Enable caching for faster inference
)

# Decode the generated output tokens into human-readable text
response = tokenizer.batch_decode(outputs)

# Extract and print only the relevant response part (after "### Response:")
print(response[0].split("### Response:")[1])  


<think>
Okay, so I'm trying to figure out what esophageal manometry would show in this case. Let me start by going through the information given.

The patient is a 45-year-old man with progressive difficulty swallowing solids and liquids for six months. He also reports occasional regurgitation of undigested food and a sensation of food sticking in his chest. The barium swallow study showed a dilated esophagus with a bird’s beak appearance at the lower esophageal sphincter.

Hmm, dilated esophagus with a bird’s beak at the lower sphincter. I remember that the lower esophageal sphincter (LES) is the gatekeeper between the esophagus and stomach. If there's a bird’s beak appearance, that probably means there's some kind of narrowing or structure there. Maybe it's a strictured area or a functional obstruction.

The symptoms include dysphagia (difficulty swallowing) and regurgitation of undigested food, which suggests that things aren't moving smoothly from the esophagus into the stomach. T

## Before moving on, why would we need fine-tuning at all?
Even without fine-tuning, R1 produces a structured chain of thought and reasons before arriving at its final answer. This reasoning is enclosed within the `<think>` and `</think>` tags. So why is fine-tuning still necessary? While the model's reasoning is thorough, it tends to be verbose rather than concise. We also want the final response to follow a consistent style.

## Fine-tuning step-by-step
### Step 1 - Update the system prompt
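We slightly modify the prompt style for the training data by adding a third placeholder. The chain of thought from the `Complex_CoT` column goes between the `<think>` and `</think>` tags, and the final `Response` goes after the closing `</think>` tag.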

In [7]:
# Updated training prompt style to add </think> tag 
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides additional context. 
Generate a response that fully addresses the request with logical reasoning and medical expertise. 
Before answering, carefully analyze the question and construct a step-by-step chain of thoughts to ensure clarity, accuracy, and coherence.

### Instruction:
You are a highly skilled medical expert with deep expertise in clinical reasoning, diagnostics, and treatment planning. 
Provide a well-structured, evidence-based response to the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""
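Because the training template now has three positional `{}` placeholders (question, chain of thought, response), a cheap sanity check is to count them before formatting many rows. This sketch uses the standard library's `string.Formatter().parse`, which yields one tuple per literal chunk, with a `None` field name only for trailing text that has no placeholder after it. The template is an abbreviated, hypothetical stand-in (`demo_train_prompt_style`), not the full `train_prompt_style`.

```python
from string import Formatter


def count_placeholders(template: str) -> int:
    """Count replacement fields such as {} in a str.format template."""
    # parse() yields (literal_text, field_name, format_spec, conversion);
    # field_name is None only for trailing literal text with no field after it
    return sum(1 for _, field_name, _, _ in Formatter().parse(template) if field_name is not None)


demo_train_prompt_style = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

print(count_placeholders(demo_train_prompt_style))  # 3
```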

### Step 2 - Download the fine-tuning dataset and format it for fine-tuning
We will use the [medical-o1-reasoning-SFT](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) dataset from Hugging Face. According to its authors, this dataset is used to fine-tune HuatuoGPT-o1, a medical LLM designed for advanced medical reasoning. It was constructed using GPT-4o, which searches for solutions to verifiable medical problems and validates them through a medical verifier.

In [8]:
# Download the dataset from the Hugging Face Hub (load_dataset is imported from the datasets package)
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[0:500]", trust_remote_code=True)  # Keep only the first 500 rows
dataset

Dataset({
    features: ['Question', 'Complex_CoT', 'Response'],
    num_rows: 500
})

In [10]:
# Show an entry from the dataset
dataset[1]

{'Question': 'A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis?',
 'Complex_CoT': "Alright, let’s break this down. We have a 45-year-old man here, who suddenly starts showing some pretty specific symptoms: dysarthria, shuffling gait, and those intention tremors. This suggests something's going wrong with motor control, probably involving the cerebellum or its connections.\n\nNow, what's intriguing is that he's had a history of alcohol use, but he's been off it for the past 10 years. Alcohol can do a number on the cerebellum, leading to degeneration, and apparently, the effects can hang around or even appear long after one stops drinking.\n\nAt first glance, these symptoms look like they could be some kind of chronic degeneration, maybe something like alcoholic cerebellar degeneration, 

#### Next, structure the fine-tuning dataset according to the training prompt style. Why?

- Each question is paired with its chain-of-thought reasoning and final response.
- Every training example follows a consistent pattern.
- Appending the EOS token teaches the model to stop after the expected response instead of continuing to generate.

In [11]:
# We need to format the dataset to fit our prompt training style 
EOS_TOKEN = tokenizer.eos_token  # End-of-sequence token; appending it teaches the model when to stop generating
EOS_TOKEN

'<｜end▁of▁sentence｜>'

In [15]:
# Define formatting prompt function
def formatting_prompts_func(examples):
    questions = examples["Question"]  # Medical questions
    cots = examples["Complex_CoT"]    # Chain-of-thought reasoning (step-by-step explanation)
    responses = examples["Response"]  # Final reference answers from the dataset

    texts = []

    # Iterate over the batch, formatting each question, reasoning trace and response
    for question, cot, response in zip(questions, cots, responses):
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN  # Fill the template and append the EOS token
        texts.append(text)

    return {
        "text": texts # Return the newly formatted dataset with a "text" column containing structured prompts
    }
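With `batched=True`, `Dataset.map` passes the function a dictionary mapping each column name to a *list* of values for a batch of rows, and the lists in the returned dictionary become new columns. The sketch below mimics that with a plain dict so the formatting logic can be checked without downloading anything. The toy rows, the abbreviated template, and the names `toy_batch`, `format_batch`, and `DEMO_EOS_TOKEN` are made up for illustration; the EOS string is the one this tokenizer reports.

```python
DEMO_EOS_TOKEN = "<｜end▁of▁sentence｜>"  # EOS string reported by tokenizer.eos_token

# Abbreviated stand-in for train_prompt_style (illustrative)
demo_train_prompt_style = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"


def format_batch(examples: dict) -> dict:
    """Same logic as formatting_prompts_func: one formatted string per row."""
    texts = [
        demo_train_prompt_style.format(q, cot, resp) + DEMO_EOS_TOKEN
        for q, cot, resp in zip(examples["Question"], examples["Complex_CoT"], examples["Response"])
    ]
    return {"text": texts}


# A batch looks like {column_name: [value_row0, value_row1, ...]}
toy_batch = {
    "Question": ["Q1?", "Q2?"],
    "Complex_CoT": ["Reasoning 1.", "Reasoning 2."],
    "Response": ["Answer 1.", "Answer 2."],
}

result = format_batch(toy_batch)
print(len(result["text"]))  # 2
print(result["text"][0])
```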

In [16]:
# Update dataset formatting
dataset_finetune = dataset.map(formatting_prompts_func, batched=True)
dataset_finetune["text"][0]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

"Below is an instruction that describes a task, paired with an input that provides additional context. \nGenerate a response that fully addresses the request with logical reasoning and medical expertise. \nBefore answering, carefully analyze the question and construct a step-by-step chain of thoughts to ensure clarity, accuracy, and coherence.\n\n### Instruction:\nYou are a highly skilled medical expert with deep expertise in clinical reasoning, diagnostics, and treatment planning. \nProvide a well-structured, evidence-based response to the following medical question.\n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who'

### Step 3 - Setting up the model using LoRA
#### An intuitive example of LoRA
LLMs have **millions or even billions of weights** that determine how they process and generate text. Full fine-tuning updates all of these weights, which requires **massive amounts of computational resources**.

LoRA (**Low-Rank Adaptation**) makes fine-tuning efficient:

- Instead of modifying all weights, **LoRA adds small, trainable adapters** to specific layers.
- These adapters **capture task-specific knowledge** while the original model weights stay frozen.
- This reduces the number of trainable parameters **by far more than 90%** (in this notebook, about 42 million out of roughly 8 billion are trained), making fine-tuning **much faster and more memory-efficient**.
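In equation form, the standard LoRA formulation keeps a pretrained weight matrix $W_0$ frozen and learns a low-rank update $BA$, scaled by $\alpha / r$; here $r$ and $\alpha$ correspond to the `r` and `lora_alpha` arguments of `get_peft_model`. $B$ is initialized to zero, so training starts from the unchanged base model. Variants differ in details; rank-stabilized LoRA, for example, scales by $\alpha/\sqrt{r}$ instead.

$$
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}},\;
B \in \mathbb{R}^{d_{\text{out}} \times r},\;
A \in \mathbb{R}^{r \times d_{\text{in}}},\;
r \ll \min(d_{\text{in}}, d_{\text{out}})
$$

Each adapted matrix therefore trains $r(d_{\text{in}} + d_{\text{out}})$ parameters instead of $d_{\text{in}} d_{\text{out}}$. For a $4096 \times 4096$ projection with $r = 16$, that is $16 \cdot 8192 = 131{,}072$ parameters instead of $16{,}777{,}216$, about $0.78\%$.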

Think of an LLM as a **complex factory**. Instead of rebuilding the entire factory to produce a new product, LoRA adds small, specialized tools to existing machines. This allows the factory to adapt quickly without disrupting its core structure.

For a more technical explanation, check out this tutorial by [Sebastian Raschka](https://www.youtube.com/watch?v=rgmJep4Sb4&t).

Below, we use the `get_peft_model()` function (PEFT stands for Parameter-Efficient Fine-Tuning). It wraps the base model (`model`) with LoRA adapters so that only the adapter parameters are trained while the original weights stay frozen.

In [17]:
# Apply LoRA (Low-Rank Adaptation) fine-tuning to the model 
model_lora = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: Determines the size of the trainable adapters (higher = more parameters, lower = more efficiency)
    target_modules=[  # List of transformer layers where LoRA adapters will be applied
        "q_proj",   # Query projection in the self-attention mechanism
        "k_proj",   # Key projection in the self-attention mechanism
        "v_proj",   # Value projection in the self-attention mechanism
        "o_proj",   # Output projection from the attention layer
        "gate_proj",  # Used in feed-forward layers (MLP)
        "up_proj",    # Part of the transformer’s feed-forward network (FFN)
        "down_proj",  # Another part of the transformer’s FFN
    ],
    lora_alpha=16,  # Scaling factor for LoRA updates (higher values allow more influence from LoRA layers)
    lora_dropout=0,  # Dropout rate for LoRA layers (0 means no dropout, full retention of information)
    bias="none",  # Specifies whether LoRA layers should learn bias terms (setting to "none" saves memory)
    use_gradient_checkpointing="unsloth",  # Saves memory by recomputing activations instead of storing them (recommended for long-context fine-tuning)
    random_state=4056,  # Sets a seed for reproducibility, ensuring the same fine-tuning behavior across runs
    use_rslora=False,  # Rank-Stabilized LoRA scales updates by alpha/sqrt(r) instead of alpha/r; disabled, so standard scaling is used
    loftq_config=None,  # LoftQ (LoRA-Fine-Tuning-aware Quantization) initialization is disabled
)
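As a cross-check of the trainable-parameter count that Unsloth reports when training starts, the sketch below applies the per-matrix formula, rank × (input dim + output dim), to each target module. The layer shapes are assumptions based on the published Llama 3.1 8B configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128), which this distilled model is built on; `LLAMA_8B_SHAPES` is a hypothetical lookup table, not something read from the loaded model.

```python
# Assumed (in_features, out_features) of each LoRA target module in one Llama 3.1 8B decoder layer
LLAMA_8B_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),  # 8 key/value heads x 128 dims (grouped-query attention)
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32


def lora_trainable_params(rank: int, shapes: dict, num_layers: int) -> int:
    """Each adapted matrix gets A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


print(f"{lora_trainable_params(16, LLAMA_8B_SHAPES, NUM_LAYERS):,}")  # 41,943,040
```

This agrees with the `Number of trainable parameters = 41,943,040` line in the training log, well under 1% of the model's roughly 8 billion parameters. Because the count is linear in the rank, doubling `r` to 32 would double it.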

Now we initialize `SFTTrainer`, the supervised fine-tuning trainer from `trl` (Transformer Reinforcement Learning), to fine-tune our model on the formatted dataset.

In [19]:
# Initialize the fine-tuning trainer — Imported using from trl import SFTTrainer
trainer = SFTTrainer(
    model=model_lora,  # The model to be fine-tuned
    tokenizer=tokenizer,  # Tokenizer to process text inputs
    train_dataset=dataset_finetune,  # Dataset used for training
    dataset_text_field="text",  # Specifies which field in the dataset contains training text
    max_seq_length=max_seq_length,  # Defines the maximum sequence length for inputs
    dataset_num_proc=2,  # Uses 2 worker processes to speed up data preprocessing

    # Define training arguments
    args=TrainingArguments(
        per_device_train_batch_size=2,  # Number of examples processed per device (GPU) at a time
        gradient_accumulation_steps=4,  # Accumulate gradients over 4 steps before updating weights
        num_train_epochs=1,  # One pass over the data (ignored here because max_steps takes precedence)
        warmup_steps=5,  # Gradually increases learning rate for the first 5 steps
        max_steps=60,  # Limits training to 60 steps (useful for debugging; increase for full fine-tuning)
        learning_rate=2e-4,  # Learning rate for weight updates (tuned for LoRA fine-tuning)
        fp16=not is_bfloat16_supported(),  # Use FP16 (if BF16 is not supported) to speed up training
        bf16=is_bfloat16_supported(),  # Use BF16 if supported (better numerical stability on newer GPUs)
        logging_steps=10,  # Logs training progress every 10 steps
        optim="adamw_8bit",  # Uses memory-efficient AdamW optimizer in 8-bit mode
        weight_decay=0.01,  # Regularization to prevent overfitting
        lr_scheduler_type="linear",  # Uses a linear learning rate schedule
        seed=4056,  # Sets a fixed seed for reproducibility
        output_dir="outputs"  # Directory where fine-tuned model checkpoints will be saved
    )
)
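Several quantities in the training log follow directly from these arguments. The sketch below recomputes the effective batch size, the fraction of an epoch covered by `max_steps`, and the learning rate under a linear schedule with warmup. The `linear_lr` function is our own re-implementation of the usual linear warmup-then-decay rule (the same shape as Transformers' `get_linear_schedule_with_warmup`), written for illustration only.

```python
per_device_batch_size = 2
grad_accum_steps = 4
num_gpus = 1  # Unsloth's training log reports a single GPU for this run
max_steps = 60
warmup_steps = 5
base_lr = 2e-4
num_examples = 500

effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_examples


def linear_lr(step: int) -> float:
    """Linear warmup to base_lr over warmup_steps, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))


print(effective_batch, examples_seen, epoch_fraction)  # 8 480 0.96
print([f"{linear_lr(s):.2e}" for s in (0, 5, 10, 60)])
```

The 0.96 epoch matches `train/epoch` in the W&B summary: because `max_steps` is set, training stops after 480 of the 500 examples, and the learning rate has decayed to 0 by the final step.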

### Step 4 - Model training
This should take roughly 25 to 40 minutes on a T4 (this run's `train_runtime` was about 1,531 seconds, or roughly 25 minutes). We can then inspect the training results in Weights & Biases.

In [20]:
# Start the fine-tuning process
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 500 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


| Step | Training Loss |
|---|---|
| 10 | 1.9166 |
| 20 | 1.3967 |
| 30 | 1.4024 |
| 40 | 1.3371 |
| 50 | 1.3043 |
| 60 | 1.2998 |


In [21]:
# Finish the W&B run so the final metrics are uploaded (this does not save the model)
wandb.finish()

| Metric | History |
|---|---|
| train/epoch | ▁▂▄▅▇██ |
| train/global_step | ▁▂▄▅▇██ |
| train/grad_norm | █▄▂▃▂▁ |
| train/learning_rate | █▇▅▄▂▁ |
| train/loss | █▂▂▁▁▁ |

| Metric | Final value |
|---|---|
| total_flos | 1.824470363111424e+16 |
| train/epoch | 0.96 |
| train/global_step | 60.0 |
| train/grad_norm | 0.23267 |
| train/learning_rate | 0.0 |
| train/loss | 1.2998 |
| train_loss | 1.44281 |
| train_runtime | 1531.4618 |
| train_samples_per_second | 0.313 |
| train_steps_per_second | 0.039 |


### Step 5 - Run model inference after fine-tuning

In [22]:
question = """A 45-year-old man presents with progressive difficulty swallowing both solids and liquids over the past 
              six months. He reports occasional regurgitation of undigested food and a sensation of food sticking in his 
              chest. A barium swallow study reveals a dilated esophagus with a bird’s beak appearance at the lower 
              esophageal sphincter. What would esophageal manometry most likely demonstrate regarding lower esophageal 
              sphincter pressure and peristalsis?"""

# Switch the LoRA model to Unsloth's optimized inference mode
FastLanguageModel.for_inference(model_lora)  # Unsloth has 2x faster inference!

# Tokenize the input question with a specific prompt format and move it to the GPU
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate a response using LoRA fine-tuned model with specific parameters
outputs = model_lora.generate(
    input_ids=inputs.input_ids,          # Tokenized input IDs
    attention_mask=inputs.attention_mask, # Attention mask for padding handling
    max_new_tokens=1200,                  # Maximum length for generated response
    use_cache=True,                        # Enable cache for efficient generation
)

# Decode the generated response from tokenized format to readable text
response = tokenizer.batch_decode(outputs)

# Extract and print only the model's response part after "### Response:"
print(response[0].split("### Response:")[1])


<think>
Alright, let's think about this. This guy is 45 and has been struggling with swallowing stuff for six months now. Sounds like he's having trouble with both solids and liquids, which is a bit unusual. He's also complaining about regurgitating undigested food, which means it's not getting through the digestive system properly. 

When they looked at his esophagus with a barium swallow study, they saw a dilated esophagus and something called a bird’s beak appearance at the lower esophageal sphincter. That bird’s beak thing sounds familiar. It’s often linked to a condition called achalasia, where the sphincter doesn’t open properly and the esophagus doesn’t move properly due to a lack of cholecystokinin, a hormone needed for sphincter function and esophageal peristalsis.

Now, thinking about manometry, it’s like checking the pressure and movement of the sphincter. In achalasia, the sphincter is usually not moving properly. So, if we look at the pressure readings, they might show lo

In [23]:
question = """A 67-year-old woman with a history of chronic kidney disease presents with progressive bone pain and muscle weakness. 
              Laboratory studies reveal hypocalcemia, hyperphosphatemia, and elevated parathyroid hormone levels. 
              Radiographic imaging shows subperiosteal bone resorption, particularly in the phalanges. 
              What is the underlying pathophysiological mechanism responsible for this patient's condition?"""

# Switch the LoRA model to Unsloth's optimized inference mode
FastLanguageModel.for_inference(model_lora)  # Unsloth has 2x faster inference!

# Tokenize the input question with a specific prompt format and move it to the GPU
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate a response using LoRA fine-tuned model with specific parameters
outputs = model_lora.generate(
    input_ids=inputs.input_ids,          # Tokenized input IDs
    attention_mask=inputs.attention_mask, # Attention mask for padding handling
    max_new_tokens=1200,                  # Maximum length for generated response
    use_cache=True,                        # Enable cache for efficient generation
)

# Decode the generated response from tokenized format to readable text
response = tokenizer.batch_decode(outputs)

# Extract and print only the model's response part after "### Response:"
print(response[0].split("### Response:")[1])


<think>
Okay, let's think about this. We have a 67-year-old woman with chronic kidney disease, and she's showing symptoms like bone pain and muscle weakness. These symptoms seem to be progressing, which is worrying. Let's look at her labs. She's hypocalcemic, meaning she's not getting enough calcium. Her phosphorus levels are high, which is also a bit concerning. And her parathyroid hormone levels are up, which means her kidneys are working hard to try to maintain calcium levels. That's a bit of a puzzle.

Now, looking at the imaging, she's showing subperiosteal bone resorption, particularly in her phalanges. This sounds like something like osteitis or maybe some kind of metabolic bone disease. Since her parathyroid hormone is high, it's pointing towards something related to parathyroid activity.

I remember that chronic kidney disease can lead to secondary hyperparathyroidism because the kidneys are not working well and can't activate vitamin D properly. This leads to a loop where th

## Conclusion
Fine-tuning DeepSeek R1 on a 500-example slice of the medical reasoning dataset changed how it answers: in the two examples above, the fine-tuned model's reasoning is more conversational and names a relevant diagnosis (achalasia, secondary hyperparathyroidism) early on. By leveraging Unsloth, Hugging Face Transformers, and Kaggle's GPU resources, we kept training efficient on a single free GPU. The results suggest fine-tuning's potential for medical Q&A, decision support, and automated documentation, but these are only two qualitative samples after 60 training steps, and they still contain medical errors (for example, attributing achalasia to a lack of cholecystokinin), so accuracy would need a proper evaluation before any such use. Future work could involve larger datasets, longer training, reinforcement learning, or real-world medical cases to further improve accuracy and adaptability.