# Fine-tune Qwen 2.5 0.5B on FineTome

**Goal:** Create a baseline Qwen model for comparison.

This notebook fine-tunes Qwen 2.5 0.5B on the **same FineTome dataset** used for the Llama 1B baseline.

**Hyperparameters:** Standard LoRA configuration (r=16, alpha=16, lr=2e-4), with no tuning.

**Justification:** The Llama 1B grid search showed only a minimal improvement (~0.4%) between configurations on the small sample, so repeating the search for Qwen is not worth the time.

In [None]:
%%capture
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps trl peft accelerate bitsandbytes

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import os
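# Deprecated in recent transformers versions (it triggers the warning in the output below);
# report_to="none" in TrainingArguments already disables W&B logging.
os.environ["WANDB_DISABLED"] = "true"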

Mounted at /content/drive


## Load Qwen 2.5 0.5B

In [None]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-0.5B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print("✓ Qwen 2.5 0.5B loaded")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


🦥 Unsloth Zoo will now patch everything to make training faster!



==((====))==  Unsloth 2025.11.6: Fast Qwen2 patching. Transformers: 4.57.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


✓ Qwen 2.5 0.5B loaded


## Add LoRA - Standard Hyperparameters

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

print("✓ LoRA added (r=16, alpha=16)")

Unsloth 2025.11.6 patched 24 layers with 24 QKV layers, 24 O layers and 24 MLP layers.


✓ LoRA added (r=16, alpha=16)
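
As a sanity check on the adapter size, the sketch below counts the LoRA parameters implied by `r=16` and the seven target modules. The layer dimensions are assumptions taken from the published Qwen2.5-0.5B configuration (hidden size 896, MLP size 4864, 2 key/value heads of dimension 64); they are not printed anywhere in this notebook. Under those assumptions the count matches the `Trainable parameters = 8,798,208` line in the training log further down.

```python
# Assumed Qwen2.5-0.5B dimensions (from the published model config, not from this notebook).
HIDDEN = 896          # hidden_size
INTERMEDIATE = 4864   # MLP intermediate_size
KV_DIM = 128          # num_key_value_heads (2) * head_dim (64)
NUM_LAYERS = 24       # consistent with the "patched 24 layers" log line
R = 16                # LoRA rank used in get_peft_model

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Each adapted layer adds two matrices: A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(TARGET_SHAPES, R, NUM_LAYERS)
print(f"LoRA trainable parameters: {trainable:,}")
```

With `lora_alpha=16` and `r=16`, the LoRA scaling factor `alpha / r` is 1, so the adapter update is applied unscaled.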


## Load FineTome Dataset

Same dataset as the Llama 1B baseline.

In [None]:
from datasets import load_dataset

print("Loading FineTome-100k...")
dataset = load_dataset("mlabonne/FineTome-100k", split="train")

print(f"Dataset size: {len(dataset)}")

Loading FineTome-100k...


Dataset size: 100000


## Format for Qwen (ChatML)

In [None]:
from unsloth.chat_templates import get_chat_template, standardize_sharegpt

tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False) for convo in convos]
    return {"text": texts}

dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched=True)

print("✓ Dataset formatted for Qwen")

Unsloth: Will map <|im_end|> to EOS = <|endoftext|>.


✓ Dataset formatted for Qwen
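
To make the formatting step concrete, here is a minimal pure-Python sketch of what the two calls do to one conversation. It assumes the raw turns use ShareGPT-style `from`/`value` keys with `human`/`gpt` speakers. The helpers `to_role_content` and `render_chatml` are hypothetical stand-ins for `standardize_sharegpt` and the tokenizer's ChatML template, not the library code itself, so the real output may differ in details such as system-prompt handling.

```python
def to_role_content(turn):
    """Map a ShareGPT-style turn ({"from", "value"}) to {"role", "content"}."""
    role_map = {"human": "user", "gpt": "assistant", "system": "system"}
    return {"role": role_map[turn["from"]], "content": turn["value"]}


def render_chatml(messages, add_generation_prompt=False):
    """Wrap each message in <|im_start|>role ... <|im_end|> markers."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Only used at inference time, to cue the model to answer.
        text += "<|im_start|>assistant\n"
    return text


# Illustrative conversation, not taken from the dataset.
sharegpt_convo = [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "4"},
]
messages = [to_role_content(turn) for turn in sharegpt_convo]
rendered = render_chatml(messages)
print(rendered)
```

Training uses `add_generation_prompt=False` because every example already contains the assistant turn; the trailing `<|im_start|>assistant` cue is only needed when generating.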


## Split Dataset

In [None]:
train_test = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset = train_test['train']
val_dataset = train_test['test']

print(f"Train: {len(train_dataset)}")
print(f"Val:   {len(val_dataset)}")

Train: 90000
Val:   10000


## Training Configuration

In [None]:
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

output_dir = "/content/drive/MyDrive/lab2_models/qwen_finetome_model"

training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    max_steps=1000,
    learning_rate=2e-4,
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=10,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    output_dir=output_dir,
    save_strategy="steps",
    save_steps=200,
    save_total_limit=2,
    eval_strategy="steps",
    eval_steps=200,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    report_to="none",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
    dataset_num_proc=2,
    packing=False,
    args=training_args,
)

print("✓ Trainer configured")

✓ Trainer configured
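
The arithmetic behind this configuration can be sketched as follows. The constants mirror the values in `TrainingArguments` above and the 90,000-example train split. `linear_lr` is a hypothetical helper that reproduces the usual linear-warmup, linear-decay rule; it is an approximation of what the `"linear"` scheduler computes, not the library implementation.

```python
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
NUM_GPUS = 1
MAX_STEPS = 1000
WARMUP_STEPS = 5
BASE_LR = 2e-4
TRAIN_EXAMPLES = 90_000

# Examples consumed per optimizer step, and over the whole run.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
examples_seen = effective_batch * MAX_STEPS
epoch_fraction = examples_seen / TRAIN_EXAMPLES


def linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, MAX_STEPS - step)
    return BASE_LR * remaining / max(1, MAX_STEPS - WARMUP_STEPS)


print(f"Effective batch size: {effective_batch}")
print(f"Examples seen: {examples_seen} ({epoch_fraction:.1%} of the train split)")
for step in (0, 5, 500, 1000):
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

Because `max_steps=1000` caps the run, the model sees only a fraction of the training split rather than a full epoch, which is worth keeping in mind when comparing against other baselines.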


## Train

In [None]:
print("="*80)
print("TRAINING QWEN ON FINETOME")
print("="*80)
print("\nThis creates the baseline Qwen model for comparison")
print("Expected time: ~20 minutes\n")

trainer_stats = trainer.train()

print("\n✓ Training complete")

The model is already on multiple devices. Skipping the move to device specified in `args`.


TRAINING QWEN ON FINETOME

This creates the baseline Qwen model for comparison
Expected time: ~20 minutes



==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 90,000 | Num Epochs = 1 | Total steps = 1,000
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 8,798,208 of 502,830,976 (1.75% trained)


Unsloth: Will smartly offload gradients to save VRAM!



Unsloth: Not an error, but Qwen2ForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 200  | 0.9736        | 0.973145        |
| 400  | 0.945         | 0.961468        |
| 600  | 0.9132        | 0.954356        |
| 800  | 0.9681        | 0.949581        |
| 1000 | 0.8696        | 0.947371        |
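
One way to read the validation losses logged above is to convert them to perplexity. The sketch assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this notebook.

```python
import math

# Validation loss per evaluation step, copied from the training log above.
eval_loss = {
    200: 0.973145,
    400: 0.961468,
    600: 0.954356,
    800: 0.949581,
    1000: 0.947371,
}

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats.
perplexity = {step: math.exp(loss) for step, loss in eval_loss.items()}

first, last = eval_loss[200], eval_loss[1000]
relative_drop = (first - last) / first

for step in sorted(perplexity):
    print(f"step {step:>4}: loss = {eval_loss[step]:.6f}, perplexity = {perplexity[step]:.3f}")
print(f"Relative drop in validation loss, step 200 to 1000: {relative_drop:.2%}")
```

The validation loss falls at every checkpoint, so with `load_best_model_at_end=True` and `metric_for_best_model="eval_loss"` the step-1000 checkpoint is the one that would be selected.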





✓ Training complete


## Save Model

In [None]:
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"✓ Qwen FineTome model saved to: {output_dir}")

✓ Qwen FineTome model saved to: /content/drive/MyDrive/lab2_models/qwen_finetome_model
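
For the later comparison, the saved adapter can be reloaded roughly as sketched below. This is an untested outline that assumes the same Unsloth environment and a GPU runtime. `load_for_inference` and `generate_reply` are hypothetical helper names, and the 128-token generation limit is an arbitrary choice.

```python
def load_for_inference(adapter_dir, max_seq_length=2048):
    """Reload the saved LoRA adapter (and its base model) in 4-bit for generation."""
    # Imported lazily so that defining this helper does not require a GPU runtime.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=adapter_dir,
        max_seq_length=max_seq_length,
        dtype=None,
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(model)  # switch Unsloth to its inference mode
    return model, tokenizer


def generate_reply(model, tokenizer, user_message, max_new_tokens=128):
    """Format one user turn with the saved chat template and decode the reply."""
    messages = [{"role": "user", "content": user_message}]
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,  # append the assistant cue for generation
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the newly generated text is returned.
    return tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
```

Typical usage would be `model, tokenizer = load_for_inference(output_dir)` followed by `generate_reply(model, tokenizer, "...")`, with `output_dir` pointing at the Drive path used above.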


## Summary

✅ Qwen 2.5 0.5B fine-tuned on FineTome-100k

✅ Standard hyperparameters used (no tuning)

**Next:** Train Qwen on the code documentation dataset, then compare all models.