## Setting Up

In [None]:
%%capture
%pip install -U transformers==4.52.1
%pip install -U datasets
%pip install -U accelerate
%pip install -U peft
%pip install -U trl
%pip install -U bitsandbytes

In [None]:
from huggingface_hub import login
import os

hf_token = os.environ.get("HF_TOKEN")
login(hf_token)

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


## Loading the model and tokenizer

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load tokenizer & model

model_dir = "unsloth/Magistral-Small-2506-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

model.config.use_cache = False
model.config.pretraining_tp = 1


In [None]:
!nvidia-smi

Thu Jun 12 09:08:30 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-80GB          On  |   00000000:4A:00.0 Off |                    0 |
| N/A   28C    P0             82W /  400W |   22943MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                

## Loading and processing the dataset

In [None]:
train_prompt_style = """
Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.
### Question:
{}

### Response:
{}"""

In [None]:
EOS_TOKEN = tokenizer.eos_token  # Appended to every response so the model learns where to stop

def formatting_prompts_func(examples):
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    for question, response in zip(inputs, outputs):
        # Strip the "Q:" marker from the question
        question = question.replace("Q:", "")

        # Append the EOS token to the response if it's not already there
        if not response.endswith(EOS_TOKEN):
            response += EOS_TOKEN

        text = train_prompt_style.format(question, response)
        texts.append(text)
    return {"text": texts}
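
To make the formatting step concrete, here is a minimal stand-alone sketch of the same logic applied to a single made-up example. The names `DEMO_EOS`, `demo_template`, and `format_example` are hypothetical, the question is invented for illustration, and the `"</s>"` string is only a placeholder for whatever `tokenizer.eos_token` actually returns.

```python
# Stand-alone illustration of the prompt formatting; no tokenizer required.
# DEMO_EOS is a placeholder (assumption), not the value read from the real tokenizer.
DEMO_EOS = "</s>"

demo_template = """
Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.
### Question:
{}

### Response:
{}"""


def format_example(question: str, response: str, eos: str = DEMO_EOS) -> str:
    """Fill the template for one example and make sure it ends with exactly one EOS."""
    question = question.replace("Q:", "")
    if not response.endswith(eos):
        response += eos
    return demo_template.format(question, response)


# Invented toy record; the real records come from the dataset's "input"/"output" columns.
demo_text = format_example(
    "Q:Which vitamin deficiency causes scurvy? {'A': 'Vitamin A', 'B': 'Vitamin C'}",
    "<analysis>Scurvy results from a lack of vitamin C.</analysis>\n<answer>B: Vitamin C</answer>",
)
print(demo_text)
```

Braces inside the question (the options dictionary) are safe here because `str.format` only interprets placeholders in the template, not in the arguments.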

In [None]:
from datasets import load_dataset

dataset = load_dataset(
    "mamachang/medical-reasoning",
    split="train",
    trust_remote_code=True,
)
dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
)
print(dataset["text"][10])


Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.
### Question:
A research group wants to assess the relationship between childhood diet and cardiovascular disease in adulthood. A prospective cohort study of 500 children between 10 to 15 years of age is conducted in which the participants' diets are recorded for 1 year and then the patients are assessed 20 years later for the presence of cardiovascular disease. A statistically significant association is found between childhood consumption of vegetables and decreased risk of hyperlipidemia and exercise tolerance. When these findings are submitted to a scientific journal, a peer reviewer comments that the researchers did not discuss the study's validity. Which of the following additional analyses would most likely address the concerns about this study's design?? 
{'A': 'Blinding', 'B': 'Crossover', 'C': 'Matching', 'D': 'Stratification',

In [None]:
from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

## Model inference before fine-tuning

In [None]:
inference_prompt_style = """
Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.

### Question:
{}

### Response:
<analysis>
"""

In [None]:
question = dataset[10]['input']
question = question.replace("Q:", "")

inputs = tokenizer(
    [inference_prompt_style.format(question) + tokenizer.eos_token],
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=512,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])


<analysis>
 to 5 years old, and their diets are recorded for 1 year. The participants are then assessed 20 years later for cardiovascular disease. A statistically significant association is found between childhood consumption of vegetables and decreased risk of hyperlipidemia and exercise tolerance. A peer reviewer comments that the researchers did not discuss the study's validity.

The key issue here is the validity of the study. Validity refers to whether the study measures what it claims to measure and whether the findings are generalizable. In a prospective cohort study, validity is particularly important because it relies on self-reported data (diet) and long-term follow-up (20 years).

The most critical aspect of validity in this context is ensuring that the observed association is not due to confounding factors (e.g., other dietary habits, physical activity levels, or socioeconomic status). To address this, the researchers should discuss how they controlled for potential confou

## Training the model


In [None]:
from peft import LoraConfig, get_peft_model

# LoRA config
peft_config = LoraConfig(
    lora_alpha=16,                           # Scaling factor for LoRA
    lora_dropout=0.05,                       # Add slight dropout for regularization
    r=64,                                    # Rank of the LoRA update matrices
    bias="none",                             # No bias reparameterization
    task_type="CAUSAL_LM",                   # Task type: Causal Language Modeling
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],  # Target modules for LoRA
)

model = get_peft_model(model, peft_config)
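
As a rough sanity check on what this configuration adds, the sketch below estimates the LoRA scaling factor and the number of trainable adapter parameters. The layer shapes and layer count are assumptions about the base model's architecture (they are not stated in this notebook), so verify them against `model.config` before relying on the numbers. Each adapted linear layer with shape `in -> out` gains `r * (in + out)` parameters.

```python
# Back-of-the-envelope LoRA sizing. All shapes below are ASSUMED, not read from the model.
r = 64
lora_alpha = 16

# Standard LoRA scales the low-rank update by alpha / r.
scaling = lora_alpha / r

hidden = 5120          # assumed hidden size
q_out = 4096           # assumed: 32 heads * 128 head dim
kv_out = 1024          # assumed: 8 KV heads * 128 head dim
intermediate = 32768   # assumed MLP width
num_layers = 40        # assumed number of decoder layers

# (in_features, out_features) for each targeted module in one decoder layer
shapes = {
    "q_proj": (hidden, q_out),
    "k_proj": (hidden, kv_out),
    "v_proj": (hidden, kv_out),
    "o_proj": (q_out, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
total_params = per_layer * num_layers
size_gb_fp32 = total_params * 4 / 1e9  # 4 bytes per float32 weight

print(f"scaling factor: {scaling}")
print(f"trainable LoRA parameters: {total_params:,}")
print(f"approx. adapter size in float32: {size_gb_fp32:.2f} GB")
```

Under these assumed shapes the estimate comes to about 1.48 GB in float32. Calling `model.print_trainable_parameters()` on the PEFT model reports the actual count.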

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments


# Training Arguments
training_arguments = TrainingArguments(
    output_dir="Magistral-Medical-Reasoning",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    num_train_epochs=1,
    logging_steps=0.2,
    warmup_steps=10,
    logging_strategy="steps",
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    group_by_length=True,
    report_to="none"
)

# Initialize the Trainer
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=dataset,
    peft_config=peft_config,
    data_collator=data_collator,
)

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Let's train the model! To resume an interrupted run from the latest checkpoint in `output_dir`, call `trainer.train(resume_from_checkpoint=True)`.

In [None]:
import gc, torch
gc.collect()
torch.cuda.empty_cache()
model.config.use_cache = False
trainer.train()

Step,Training Loss
371,0.9482
742,0.9087
1113,0.8809
1484,0.8547


TrainOutput(global_step=1851, training_loss=0.8885096586568494, metrics={'train_runtime': 4168.7115, 'train_samples_per_second': 0.888, 'train_steps_per_second': 0.444, 'total_flos': 2.26971228206592e+17, 'train_loss': 0.8885096586568494})
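
The step counts in this log follow from the training arguments. The sketch below reproduces them, assuming a single GPU, the 3,702 training examples of this dataset, and (as I understand the `Trainer` behaviour) that a fractional `logging_steps` is treated as a ratio of total steps, rounded up.

```python
import math

# Values taken from the notebook; single-GPU training is an assumption.
num_examples = 3702
per_device_train_batch_size = 1
gradient_accumulation_steps = 2
num_train_epochs = 1
logging_ratio = 0.2
train_runtime = 4168.7115  # seconds, from TrainOutput

# One optimizer step consumes batch_size * accumulation examples.
examples_per_step = per_device_train_batch_size * gradient_accumulation_steps
total_steps = (num_examples // examples_per_step) * num_train_epochs

# A logging_steps value below 1 is interpreted as a fraction of total steps.
log_every = math.ceil(total_steps * logging_ratio)
logged_steps = list(range(log_every, total_steps + 1, log_every))

print(total_steps)    # optimizer steps in the run
print(log_every)      # interval between logged losses
print(logged_steps)   # steps at which a loss row appears
print(round(total_steps / train_runtime, 3))   # steps per second
print(round(num_examples / train_runtime, 3))  # samples per second
```

This gives 1,851 optimizer steps with a loss logged every 371 steps, which matches the four rows in the table above; a fifth row would fall at step 1,855, past the end of training.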

## Model inference after fine-tuning

In [None]:
question = dataset[10]['input']
question = question.replace("Q:", "")

inputs = tokenizer(
    [inference_prompt_style.format(question) + tokenizer.eos_token],
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=512,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])


<analysis>
Analysis:

This is a prospective cohort study looking at the relationship between childhood diet and cardiovascular disease in adulthood. The key issue with the validity of this study is confounding. The researchers did not account for other factors that could influence cardiovascular disease risk in adulthood besides childhood diet. 

To address confounding, the researchers should have stratified the analysis by potential confounders like family history, physical activity levels, and smoking status. This would allow them to see if the relationship between diet and disease persists even when accounting for these other variables. 

Blinding, crossover, matching, and randomization do not address the main validity concern in this study design.
</analysis>
<answer>
D: Stratification
</answer>


In [None]:
print(dataset[10]['output'])

<analysis>

This is a question about assessing the validity of a prospective cohort study. The study found an association between childhood diet and cardiovascular disease in adulthood. The peer reviewer is concerned that the researchers did not discuss the validity of the study design. 

To address concerns about validity in a prospective cohort study, we need to consider potential confounding factors that could influence the results. The additional analysis suggested should help control for confounding.
</analysis>
<answer>
D: Stratification
</answer>


In [None]:
question = dataset[100]['input']
question = question.replace("Q:", "")

inputs = tokenizer(
    [inference_prompt_style.format(question) + tokenizer.eos_token],
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=512,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])


<analysis>
 analysis>

This is a clinical vignette describing a 55-year-old man with burning and shooting pain in his feet and lower legs that worsens at night. He has a history of type 2 diabetes mellitus and hypertension. 

The key findings are:
- Burning and shooting pain in feet and lower legs 
- Pain worsens at night
- History of type 2 diabetes mellitus

This presentation is most consistent with diabetic peripheral neuropathy. The pain distribution, timing, and history of diabetes point towards a distal symmetric sensorimotor polyneuropathy as the etiology. The other options can be ruled out based on the clinical presentation.
</analysis>
<answer>
D: Distal symmetric sensorimotor polyneuropathy
</answer>


In [None]:
print(dataset[100]['output'])

<analysis>

This patient has a history of type 2 diabetes mellitus and is experiencing burning and shooting pains in his feet and lower legs that are worse at night and have progressed over the past 6 months. This presentation is most consistent with distal symmetric sensorimotor polyneuropathy, a type of diabetic neuropathy that affects the distal extremities in a length-dependent pattern. Autonomic neuropathy, cranial nerve neuropathy, and radiculopathy would not explain the symmetric distal distribution. Isolated peripheral neuropathy would not be expected in the setting of longstanding diabetes.
</analysis>
<answer>
D: Distal symmetric sensorimotor polyneuropathy
</answer>


## Saving the fine-tuned model

In [None]:
new_model_name = "kingabzpro/Magistral-Small-Medical-QA"

trainer.model.push_to_hub(new_model_name)
trainer.processing_class.push_to_hub(new_model_name)

CommitInfo(commit_url='https://huggingface.co/kingabzpro/Magistral-Small-Medical-QA/commit/093367ec7f9fbb733572d96dbde6a1430f0312a0', commit_message='Upload tokenizer', commit_description='', oid='093367ec7f9fbb733572d96dbde6a1430f0312a0', pr_url=None, repo_url=RepoUrl('https://huggingface.co/kingabzpro/Magistral-Small-Medical-QA', endpoint='https://huggingface.co', repo_type='model', repo_id='kingabzpro/Magistral-Small-Medical-QA'), pr_revision=None, pr_num=None)

## Loading the adapter and testing the model

In [None]:
del model
del trainer
torch.cuda.empty_cache()

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Base model
base_model_id = "unsloth/Magistral-Small-2506-bnb-4bit"

# Your fine-tuned LoRA adapter repository
lora_adapter_id = "kingabzpro/Magistral-Small-Medical-QA"

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    lora_adapter_id,
    device_map="auto",
    trust_remote_code=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)

In [None]:
# Inference example
prompt = """
Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.

### Question:
A research group wants to assess the relationship between childhood diet and cardiovascular disease in adulthood.
A prospective cohort study of 500 children between 10 to 15 years of age is conducted in which the participants' diets are recorded for 1 year and then the patients are assessed 20 years later for the presence of cardiovascular disease.
A statistically significant association is found between childhood consumption of vegetables and decreased risk of hyperlipidemia and exercise tolerance.
When these findings are submitted to a scientific journal, a peer reviewer comments that the researchers did not discuss the study's validity.
Which of the following additional analyses would most likely address the concerns about this study's design?
{'A': 'Blinding', 'B': 'Crossover', 'C': 'Matching', 'D': 'Stratification', 'E': 'Randomization'},
### Response:
<analysis>

"""

inputs = tokenizer(
    [prompt + tokenizer.eos_token],
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])


<analysis>

Analysis:

This is a prospective cohort study looking at the relationship between childhood diet and cardiovascular disease in adulthood. The peer reviewer is concerned about the validity of the study's findings. To address concerns about validity in a prospective cohort study, we need to consider potential confounding factors and selection bias. 

Choice A, blinding, is not relevant since this is an observational study, not a clinical trial. 

Choice B, crossover, is also not applicable since this is a cohort study.

Choice C, matching, could help control for confounding if patients were matched on relevant factors. However, the question does not indicate matching was done.

Choice D, stratification, could help control for confounding by stratifying by key variables. This is a reasonable option.

Choice E, randomization, is the best option. Randomizing patients to different diets would help control for confounding and selection bias. Randomization is the gold standard for