# 🧠 COLAB SESSION 1: Training a Lightweight LLM with Unsloth.ai  

**Base Model:** `unsloth/smollm2-135m`  
**Objective:** Fine-tune the model for **Mathematical Reasoning** tasks  

---

## 📘 Overview  
This Colab notebook demonstrates how to fine-tune a compact open-source language model using **Unsloth.ai**.  
The goal is to improve the model’s performance on problem-solving and reasoning-based mathematical prompts while maintaining high efficiency on limited GPU resources.  

Unsloth provides optimized utilities for model loading, dataset processing, and both parameter-efficient and full fine-tuning, which makes it well suited to quick experiments with smaller models.  

---

## ⚙️ Setup  
We begin by setting up the Colab environment and installing the required dependencies, including:  

- [`unsloth`](https://github.com/unslothai/unsloth) – lightweight LLM fine-tuning toolkit  
- `datasets` – to load and preprocess training data  
- `transformers` – Hugging Face model utilities  
- `accelerate` – optimized training and device management  
- `bitsandbytes` – quantization support for memory efficiency  
- `wandb` – experiment tracking and logging  
- `huggingface_hub` – model versioning and deployment  

After installation, we configure authentication for both **Weights & Biases** and **Hugging Face Hub** to enable training metrics and model uploads.  

---

## 🧩 Goal  
By the end of this session, you’ll have:  
- A **fine-tuned SmolLM2-135M** model specialized in mathematical reasoning  
- A **complete training pipeline** you can reuse for other tasks or datasets  
- The ability to **publish and share** your model on Hugging Face Hub  

---

> 💡 *Tip:* For faster iterations, reduce the dataset size or the number of epochs. A 135M-parameter model fits comfortably on a single Colab GPU, so no extra memory configuration is needed for this session.


## ⚙️ Environment Setup & Library Installation  

Before fine-tuning, we’ll set up our Colab environment with all the required dependencies.  
Unsloth works seamlessly with the Hugging Face ecosystem and a few additional optimization libraries for efficient model training.


In [1]:
!pip install unsloth datasets transformers accelerate bitsandbytes wandb huggingface_hub

Collecting unsloth
  Downloading unsloth-2025.11.3-py3-none-any.whl.metadata (61 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.48.2-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting unsloth_zoo>=2025.11.4 (from unsloth)
  Downloading unsloth_zoo-2025.11.4-py3-none-any.whl.metadata (32 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.9.35-py3-none-any.whl.metadata (12 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.33.post1-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.2 kB)
Collecting datasets
  Downloading datasets-4.3.0-py3-none-any.whl.metadata (18 kB)
Collecting trl!=0.19.0,<=0.23.0,>=0.18.2 (from unsloth)
  Downloading trl-0.23.0-py3-none-any.whl.metadata (11 kB)
Collecting pyarrow>=21.0

## 🧩 Importing Libraries & Initializing the Environment  

With the required packages installed, we can now import the main libraries that will power our fine-tuning workflow.  
Each library serves a distinct role in training, tracking, and managing the model lifecycle.

In [None]:
import unsloth
from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from huggingface_hub import login
import wandb

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


## 🔐 Authentication & Token Setup  

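Before fine-tuning, we log in to both **Hugging Face Hub** (for model management) and **Weights & Biases (W&B)** (for experiment tracking).  
In Google Colab, access tokens can be stored in the Secrets panel and read at runtime with `google.colab.userdata`, which keeps them out of the notebook source. The secret names used below (`HGFaceApi`, `wb_token`) are specific to this notebook; use the names you saved your own tokens under.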

In [None]:
from google.colab import userdata
from huggingface_hub import login
import wandb

# Retrieve stored tokens
hf_token = userdata.get("HGFaceApi")
wb_token = userdata.get("wb_token")

# Authenticate with both platforms
login(hf_token)
wandb.login(key=wb_token)

# Initialize a W&B tracking session
run = wandb.init(
    project="Full-Finetuning-SmolLM2-135M",
    job_type="training",
    anonymous="allow"
)
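
The token lookup above only works inside Colab. The following is a minimal sketch of how the same lookup could fall back to environment variables elsewhere. `get_secret` and `require_secret` are hypothetical helper names, and reading an environment variable with the same name as the Colab secret is an assumption, not something the notebook does.

```python
import os


def get_secret(name):
    """Return a secret from Colab's userdata store, else from the environment."""
    try:
        # This module is only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Outside Colab: assume the token lives in an environment variable.
        return os.environ.get(name)
    try:
        return userdata.get(name)
    except Exception:
        # Secret not defined or notebook access not granted: same fallback.
        return os.environ.get(name)


def require_secret(name):
    """Like get_secret, but fail early with a clear message if it is missing."""
    value = get_secret(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set")
    return value
```

With helpers like these, `hf_token = require_secret("HGFaceApi")` would stop the notebook before any login call is made with an empty token.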

## ⚡ Version Compatibility Setup  

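To reduce the risk of runtime errors, it helps to pin library versions that are known to work together.  
Unsloth is actively developed, so aligning it with matching versions of **Transformers**, **PyTorch**, and **Accelerate** makes full fine-tuning runs more reproducible. Pinned versions only take effect after the runtime is restarted; the loader banner in the next section reports Unsloth 2025.11.3 and Transformers 4.57.1, which suggests this session was still using the versions from the first install.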

In [5]:
!pip install --upgrade --force-reinstall --no-cache-dir \
  "unsloth==2025.10.1" \
  "transformers==4.46.3" \
  "torch==2.5.1" \
  "accelerate>=1.0.1"


Collecting unsloth==2025.10.1
  Downloading unsloth-2025.10.1-py3-none-any.whl.metadata (53 kB)
Collecting transformers==4.46.3
  Downloading transformers-4.46.3-py3-none-any.whl.metadata (44 kB)
Collecting torch==2.5.1
  Downloading torch-2.5.1-cp312-cp312-manylinux1_x86_64.whl.metadata (28 kB)
Collecting accelerate>=1.0.1
  Downloading accelerate-1.11.0-py3-none-any.whl.metadata (19 kB)
Collecting unsloth_zoo>=2025.10.1 (from unsloth==2025.10.1)
  Downloading unsloth_zoo-2025.11.4-py3-none-any.whl.metadata (32 kB)
Collecting xformers>=0.0.27.post2 (from unsloth==2025.10.1)
  Downloading xformers-0.0.33.post1-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.2 kB)
Collecting bitsandbytes (from unsloth==2025.10.1)
  Downloading bitsandbyte
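
To confirm which versions are actually installed after a pinned reinstall, a small check such as the following can help. It is a sketch using only the standard library; `installed_versions` and `mismatches` are hypothetical helper names, and the pins are copied from the install cell above. Note that it reads package metadata on disk, so it does not show what an already-running session has imported.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(pins, found):
    """Return {name: (pinned, installed)} for packages that differ from the pin."""
    report = {}
    for name, wanted in pins.items():
        have = found.get(name)
        # Ignore local build suffixes such as "+cu121" when comparing.
        base = have.split("+")[0] if have else None
        if base != wanted:
            report[name] = (wanted, have)
    return report


PINS = {"unsloth": "2025.10.1", "transformers": "4.46.3", "torch": "2.5.1"}

for name, (wanted, have) in mismatches(PINS, installed_versions(PINS)).items():
    print(f"{name}: pinned {wanted}, installed {have}")
```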

## 🧠 Model Loading & Configuration  

With the environment ready and dependencies aligned, the next step is to **load the base model** and tokenizer that we’ll fine-tune.  
We’ll be using **SmolLM2-135M**, a compact and efficient open-source language model designed for lightweight reasoning tasks.

In [6]:
max_seq_length = 2048
dtype = None  # Uses the default data type for your device (e.g., float16 or bfloat16)

# Load model and tokenizer with full fine-tuning capability
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/smollm2-135m",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=False,        # Disable quantization → full precision training
    full_finetuning=True,      # Enable training of all layers
    token=hf_token,
)

print("✅ Model is ready for full fine-tuning!")


==((====))==  Unsloth 2025.11.3: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Float16 full finetuning uses more memory since we upcast weights to float32.


model.safetensors:   0%|          | 0.00/269M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/158 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

added_tokens.json:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/742 [00:00<?, ?B/s]

✅ Model is ready for full fine-tuning!
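
The loader notes that float16 full fine-tuning upcasts weights to float32. A rough back-of-the-envelope estimate of the resulting memory footprint can be sketched as below. The byte counts are assumptions (float32 weights and gradients plus an AdamW-style optimizer with two float32 states per parameter), the function name is hypothetical, and activations and framework overhead are deliberately left out. The parameter count is the one reported in this notebook's training banner.

```python
def estimate_training_bytes(n_params, weight_bytes=4, grad_bytes=4, optimizer_bytes=8):
    """Estimate bytes for weights, gradients, and optimizer state only."""
    return n_params * (weight_bytes + grad_bytes + optimizer_bytes)


N_PARAMS = 134_515_584  # trainable parameters reported by the training banner

total_bytes = estimate_training_bytes(N_PARAMS)
print(f"Weights + gradients + optimizer state: {total_bytes / 1e9:.2f} GB")
```

Under these assumptions the estimate is roughly 2.15 GB, which leaves plenty of headroom on a T4 before activations are counted.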


## 📚 Dataset Preparation & Prompt Formatting  

In this step, we’ll prepare the training dataset for fine-tuning.  
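We’ll use a **subset of the Alpaca dataset**, a widely used general-purpose instruction-following dataset, to train the model on structured instruction and response generation. Alpaca is not math-specific, so to target mathematical reasoning more directly you would swap in a math-focused dataset and keep the same formatting step.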

### 🔹 Load the Dataset  

To keep the training process efficient for Google Colab environments, we’ll load only **500 examples** from the Alpaca dataset:

In [7]:
from datasets import load_dataset

# Load a small portion of the Alpaca dataset for quick experimentation
dataset = load_dataset("tatsu-lab/alpaca", split="train[:500]")

# Template used to organize each example into instruction, input, and response
prompt_template = """Below is an instruction describing a task, along with an input that adds context.
Write an appropriate response to complete the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token

def prepare_prompt_examples(examples):
    """Formats the Alpaca samples into the defined prompt template."""
    texts = []
    for instruction, input_text, output_text in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        if input_text.strip() == "":
            formatted_text = prompt_template.format(instruction, "N/A", output_text) + EOS_TOKEN
        else:
            formatted_text = prompt_template.format(instruction, input_text, output_text) + EOS_TOKEN
        texts.append(formatted_text)
    return {"text": texts}

# Apply formatting to the dataset
dataset = dataset.map(prepare_prompt_examples, batched=True)

print("✅ Example of formatted prompt:\n", dataset["text"][0][:400])

README.md: 0.00B [00:00, ?B/s]

data/train-00000-of-00001-a09b74b3ef9c3b(…):   0%|          | 0.00/24.2M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/52002 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

✅ Example of formatted prompt:
 Below is an instruction describing a task, along with an input that adds context.
Write an appropriate response to complete the request.

### Instruction:
Give three tips for staying healthy.

### Input:
N/A

### Response:
1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. 
2. Exercise regularly to keep your body active and strong. 
3. Get enough sleep and maintain a c
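
The formatting step can be tried without downloading the dataset or loading a tokenizer. The sketch below applies the same template to a hand-written toy batch. The `EOS_TOKEN` value is a placeholder assumption (the notebook reads it from `tokenizer.eos_token`), and `format_batch` and the toy examples are made up for illustration.

```python
PROMPT_TEMPLATE = """Below is an instruction describing a task, along with an input that adds context.
Write an appropriate response to complete the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|endoftext|>"  # placeholder; use tokenizer.eos_token in the notebook


def format_batch(batch, eos_token=EOS_TOKEN):
    """Format a batch (dict of lists) into prompt strings ending with EOS."""
    texts = []
    for instruction, input_text, output_text in zip(
        batch["instruction"], batch["input"], batch["output"]
    ):
        # Empty inputs are replaced by "N/A", as in the notebook's function.
        context = input_text if input_text.strip() else "N/A"
        texts.append(PROMPT_TEMPLATE.format(instruction, context, output_text) + eos_token)
    return {"text": texts}


toy_batch = {
    "instruction": ["Name a prime number.", "Add the numbers."],
    "input": ["", "2, 4, 6"],
    "output": ["7", "12"],
}

formatted = format_batch(toy_batch)
print(formatted["text"][1])
```

Appending the EOS token matters because it teaches the model where a response ends; without it, generation tends to run until the token limit.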


## 🏋️ Model Training Configuration & Fine-Tuning Setup  

With the dataset prepared, we can now configure the **training pipeline** using the **Unsloth** and **TRL (Transformer Reinforcement Learning)** libraries.  
We’ll use `SFTTrainer`, TRL’s wrapper for supervised fine-tuning on instruction data.

In [8]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 1,
        num_train_epochs = 3,
        warmup_steps = 5,
        learning_rate = 1e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 5,
        output_dir = "outputs",
        save_total_limit = 1,
        report_to = "wandb"
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=6):   0%|          | 0/500 [00:00<?, ? examples/s]
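
The batch size, accumulation, and epoch settings determine how many optimizer steps the run takes. The arithmetic can be sketched as follows, assuming the last partial batch of each epoch is kept and a trailing partial accumulation group still counts as a step. `total_optimizer_steps` is a hypothetical helper, not a TRL or Unsloth function.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps,
                          num_epochs, num_devices=1):
    """Estimate the number of optimizer updates for a training run."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return steps_per_epoch * num_epochs


steps = total_optimizer_steps(
    num_examples=500, per_device_batch_size=8, grad_accum_steps=1, num_epochs=3
)
print(f"Total optimizer steps: {steps}")
print(f"Warmup share with warmup_steps=5: {5 / steps:.1%}")
```

For this notebook's settings the result is 63 steps per epoch and 189 in total, which matches the `Total steps = 189` line in the training banner.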

## 🚀 Model Training & Performance Monitoring  

Now that the trainer is configured, we can begin fine-tuning the model.  
This step performs forward and backward passes through the dataset, gradually adjusting the model weights to improve its ability to follow mathematical and reasoning-based instructions.

### 🔹 Start Training  

Before launching the training process, let’s confirm that a GPU is visible and print its total memory:

In [9]:
gpu = torch.cuda.get_device_properties(0)
print(f"Using GPU: {gpu.name} ({round(gpu.total_memory/1e9, 2)} GB VRAM)")

trainer_stats = trainer.train()

The model is already on multiple devices. Skipping the move to device specified in `args`.


Using GPU: Tesla T4 (15.83 GB VRAM)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 3 | Total steps = 189
O^O/ \_/ \    Batch size per device = 8 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (8 x 1 x 1) = 8
 "-____-"     Trainable parameters = 134,515,584 of 134,515,584 (100.00% trained)


Step,Training Loss
5,2.51
10,1.4536
15,1.1554
20,1.0787
25,1.2468
30,1.0734
35,1.2959
40,1.2381
45,1.0855
50,1.2075


Unsloth: Will smartly offload gradients to save VRAM!


## ✅ Training Completion & Resource Summary  

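After training, we can print a short summary with key runtime statistics such as training duration and peak GPU memory. Note that the cell below calls `trainer.train()` a second time, so it continues training from the already fine-tuned weights; this is why its logged loss starts near 0.66 instead of 2.51. To print the summary without retraining, reuse the `trainer_stats` returned in the previous cell.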

In [10]:
trainer_stats = trainer.train()

print("✅ Training completed successfully!")
used_mem = round(torch.cuda.max_memory_reserved() / 1e9, 3)
print(f"⏱ Runtime: {round(trainer_stats.metrics['train_runtime']/60, 2)} minutes")
print(f"💾 Peak reserved GPU memory: {used_mem} GB")

The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 3 | Total steps = 189
O^O/ \_/ \    Batch size per device = 8 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (8 x 1 x 1) = 8
 "-____-"     Trainable parameters = 134,515,584 of 134,515,584 (100.00% trained)


Step,Training Loss
5,0.6632
10,0.4603
15,0.4384
20,0.4567
25,0.5314
30,0.4621
35,0.5938
40,0.6108
45,0.5172
50,0.5624


✅ Training completed successfully!
⏱ Runtime: 1.28 minutes
💾 Peak reserved GPU memory: 3.02 GB


## 🔮 Inference & Model Testing

With training complete, let’s switch the model to **inference mode** and generate an answer for a math prompt.  
We’ll also standardize the compute **dtype** (bfloat16 or float16) for stable, fast decoding.

In [11]:
# Ensure consistent dtype for inference
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch
from IPython.display import Markdown

# 1) Put the model in inference mode (applies Unsloth speedups)
FastLanguageModel.for_inference(model)

# 2) Pick a single dtype and move the model to it
inference_dtype = torch.bfloat16 if is_bfloat16_supported() else torch.float16
model = model.to(dtype=inference_dtype)

# 3) Build the prompt with the training template; the empty input slot uses
#    "N/A" so the prompt matches how empty inputs were formatted during training
test_prompt = prompt_template.format(
    "If the system of equations 3x + y = a and 2x + 5y = 2a has a solution when x = 2, compute a.",
    "N/A",
    "",
)

# 4) Tokenize and move tensors to the same device as the model
inputs = tokenizer([test_prompt], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# 5) Generate under autocast with the chosen dtype
#    (torch.autocast replaces the deprecated torch.cuda.amp.autocast)
with torch.autocast(device_type="cuda", dtype=inference_dtype):
    outputs = model.generate(**inputs, max_new_tokens=150)  # keep use_cache=True (default) for speed

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
response = decoded.split("### Response:")[1].strip() if "### Response:" in decoded else decoded
Markdown(response)




If the system of equations 3x + y = a and 2x + 5y = 2a has a solution when x = 2, compute a.
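
To judge the model's output it helps to know the expected answer to the test prompt. The following sketch works it out with exact fractions. From 3x + y = a we get y = a - 3x; substituting into 2x + 5y = 2a gives 3a = 13x, so a = 13x/3. `solve_for_a` is a hypothetical helper written for this one problem.

```python
from fractions import Fraction


def solve_for_a(x):
    """Solve 3x + y = a and 2x + 5y = 2a for a and y, given x."""
    x = Fraction(x)
    a = Fraction(13) * x / 3   # from 2x + 5(a - 3x) = 2a
    y = a - 3 * x              # from 3x + y = a
    # Check both original equations before returning.
    assert 3 * x + y == a
    assert 2 * x + 5 * y == 2 * a
    return a, y


a, y = solve_for_a(2)
print(f"a = {a}, y = {y}")
```

For x = 2 this gives a = 26/3 (with y = 8/3), so a response that only repeats the question, as in the output above, has not solved the problem.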

## 🚀 Publish to Hugging Face Hub

After training and testing, you can publish your model so it’s easy to load and share.  
This cell logs into your Hugging Face account, creates (or reuses) a repository under **your username**, saves the model/tokenizer locally, pushes them to the Hub, and writes a minimal model card.

In [14]:
# 🚀 Push your fine-tuned model and tokenizer to your own Hugging Face account
!pip -q install -U huggingface_hub

from huggingface_hub import login, HfApi, whoami
import os, shutil, peft

# 1️⃣  Log in interactively (recommended)
#     It will open a prompt in Colab; paste your HF token there.
#     (Get it from https://huggingface.co/settings/tokens)
login()

# 2️⃣  Confirm your username
me = whoami()
hf_username = me.get("name") or me.get("username")
print("Logged in as:", hf_username)

# 3️⃣  Set model names — change base_name if you want a different repo name
base_name = "SmolLM2-135M-Math"
local_dir = base_name
repo_id = f"{hf_username}/{base_name}"   # ✅ uploads to your own profile
visibility_private = False               # True → private repo

# 4️⃣  Detect if model is LoRA (optional)
is_lora = isinstance(model, peft.PeftModel)

# 5️⃣  Create repo if it doesn’t exist
api = HfApi()
api.create_repo(repo_id, private=visibility_private, exist_ok=True)

# 6️⃣  Save locally
if os.path.isdir(local_dir):
    shutil.rmtree(local_dir)
os.makedirs(local_dir, exist_ok=True)

# some tokenizers lack pad_token
if getattr(tokenizer, "pad_token", None) is None and getattr(tokenizer, "eos_token", None):
    tokenizer.pad_token = tokenizer.eos_token

model.save_pretrained(local_dir, safe_serialization=True)
tokenizer.save_pretrained(local_dir)
print(f"✅ Saved to local folder: {local_dir}")

# 7️⃣  Push to the Hub
print(f"📤 Uploading to {repo_id} ... (private={visibility_private})")
model.push_to_hub(repo_id, private=visibility_private, safe_serialization=True)
tokenizer.push_to_hub(repo_id, private=visibility_private)
print("✅ Upload complete!")

# 8️⃣  Optional: add a simple model card
card = f"""---
license: apache-2.0
language:
- en
tags:
- unsloth
- smollm2
- {"lora" if is_lora else "full-finetune"}
---

# {base_name}

Finetuned with **Unsloth** on Google Colab
Base: `unsloth/smollm2-135m`
"""

with open("README.md", "w", encoding="utf-8") as f:
    f.write(card)

# Upload the card with keyword arguments and repo_type="model"
api.upload_file(
    path_or_fileobj="README.md",
    path_in_repo="README.md",
    repo_id=repo_id,
    repo_type="model",
    commit_message="Add model card",
)
print("📝 Model card uploaded.")



Logged in as: Yugm1312
✅ Saved to local folder: SmolLM2-135M-Math
📤 Uploading to Yugm1312/SmolLM2-135M-Math ... (private=False)


Processing Files (0 / 0)      : |          |  0.00B /  0.00B            

New Data Upload               : |          |  0.00B /  0.00B            

  ...5M-Math/model.safetensors:  12%|#2        | 33.5MB /  269MB            

No files have been modified since last commit. Skipping to prevent empty commit.


Saved model to https://huggingface.co/Yugm1312/SmolLM2-135M-Math


No files have been modified since last commit. Skipping to prevent empty commit.


✅ Upload complete!
📝 Model card uploaded.
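
The model card text can be built and inspected before anything is uploaded. The sketch below wraps the card construction from the publishing cell in a function so that it can be checked offline. `build_model_card` is a hypothetical helper, and the field values mirror the ones used in this notebook.

```python
def build_model_card(base_name, base_model, is_lora=False, license_id="apache-2.0"):
    """Return README.md content with YAML front matter for the Hub."""
    method_tag = "lora" if is_lora else "full-finetune"
    front_matter = "\n".join([
        "---",
        f"license: {license_id}",
        "language:",
        "- en",
        "tags:",
        "- unsloth",
        "- smollm2",
        f"- {method_tag}",
        "---",
    ])
    body = (
        f"# {base_name}\n\n"
        "Fine-tuned with **Unsloth** on Google Colab.\n\n"
        f"Base: `{base_model}`\n"
    )
    return front_matter + "\n\n" + body


card = build_model_card("SmolLM2-135M-Math", "unsloth/smollm2-135m")
print(card)
```

Printing the card first makes it easy to spot front-matter mistakes, such as a missing closing `---`, before the file is pushed to the repository.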
