<a href="https://colab.research.google.com/github/samipn/unsloth.ai_demo/blob/main/colab1_full_finetune_smollm2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Colab 1 — Full Finetuning (SmolLM2‑135M)
Full‑parameter SFT on a tiny model to demonstrate *true* full finetuning. Dataset: `yahma/alpaca-cleaned` (subset for quick runs).
Switch to **GPU** in Colab: *Runtime → Change runtime type → A100*.

In [1]:
#@title Install Unsloth + deps (Colab-safe)
%pip -q install --upgrade pip
%pip -q install unsloth datasets trl transformers accelerate bitsandbytes peft --no-cache-dir
import torch, platform
print("PyTorch:", torch.__version__, "CUDA:", torch.version.cuda, "Python:", platform.python_version())


[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/1.8 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m1.8/1.8 MB[0m [31m75.7 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m46.7 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf-cu12 25.6.0 requires pyarrow<20.0.0a0,>=14.0.0; platform_machine == "x86_64", but you have pyarrow 22.0.0 which is incompatible.
pylibcudf-cu12 25.6.0 requires pyarrow<20.0.0a0,>=14.0.0; platform_machine == "x86_64", but you have pyarrow 22.0.0 which is incompatible.[0m[31m
[0mPyTorch: 2.8.0+cu126 CUDA: 12.6 Python: 3.12.12


In [6]:
from unsloth import FastLanguageModel, is_bfloat16_supported
from transformers import AutoTokenizer
import torch

MODEL_NAME = "HuggingFaceTB/SmolLM2-135M"
max_seq_length = 2048
dtype = torch.float32

# Important: full_finetune=True to update all weights
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=False,
)
print("Loaded", MODEL_NAME, "for FULL finetuning.")

# Set a chat template for the tokenizer
# This template formats messages into an Instruction/Response structure.
# It handles system messages, user instructions, and assistant responses.
# The 'add_generation_prompt=True' in the chat() function will complete the user message with '### Response:'
tokenizer.chat_template = "{% for message in messages %}{% if message['role'] == 'system' %}{{ message['content'] }}{% elif message['role'] == 'user' %}{{ '### Instruction:\n' + message['content'] + '\n\n### Response:' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + tokenizer.eos_token }}{% endif %}{% endfor %}"

# (Fallback) Ensure gradients are on for all params, in case your env lacks 'full_finetune' flag.
for p in model.parameters():
    p.requires_grad_(True)

==((====))==  Unsloth 2025.11.2: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    NVIDIA A100-SXM4-80GB. Num GPUs = 1. Max memory: 79.318 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 8.0. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
HuggingFaceTB/SmolLM2-135M does not have a padding token! Will use pad_token = <|endoftext|>.
Loaded HuggingFaceTB/SmolLM2-135M for FULL finetuning.


In [3]:
# Prepare Alpaca-cleaned into plain text OR chat-format strings
from datasets import load_dataset
from random import randint

ds = load_dataset("yahma/alpaca-cleaned")
# Use a small subset to keep the demo fast
train = ds["train"].select(range(2000))

# Build simple instruction->response format; also provide a chat template option
def make_example(ex):
    instr = ex["instruction"]
    inp = ex.get("input", "")
    out = ex["output"]
    user = instr if not inp else f"{instr}\n\nInput: {inp}"
    messages = [
        {"role":"system", "content":"You are a helpful assistant."},
        {"role":"user", "content": user},
        {"role":"assistant","content": out},
    ]
    # Use tokenizer's chat template when available; otherwise fall back to a plain format
    try:
        txt = tokenizer.apply_chat_template(messages, tokenize=False)
    except Exception:
        txt = f"### Instruction:\n{user}\n\n### Response:\n{out}"
    return {"text": txt}

train = train.map(make_example, remove_columns=train.column_names)
print(train[0]["text"][:1000])


README.md: 0.00B [00:00, ?B/s]

alpaca_data_cleaned.json:   0%|          | 0.00/44.3M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/51760 [00:00<?, ? examples/s]

Map:   0%|          | 0/2000 [00:00<?, ? examples/s]

### Instruction:
Give three tips for staying healthy.

### Response:
1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.


In [4]:
# SFT with TRL
import os
os.environ["ACCELERATE_MIXED_PRECISION"] = "no" # Explicitly set environment variable
import torch
from trl import SFTTrainer, SFTConfig

config = SFTConfig(
    output_dir="outputs_full_smolm2",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    num_train_epochs=1,
    max_seq_length=1024,
    logging_steps=10,
    save_steps=100,
    bf16=False, # Explicitly disable bf16
    fp16=False, # Explicitly disable fp16
    optim="adamw_torch",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train,
    dataset_text_field="text",
    args=config,
)
trainer.train()
trainer.save_model("smollm2_full_finetuned")
tokenizer.save_pretrained("smollm2_full_finetuned")

Unsloth: Tokenizing ["text"] (num_proc=16):   0%|          | 0/2000 [00:00<?, ? examples/s]

The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,000 | Num Epochs = 1 | Total steps = 250
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 2 x 1) = 8
 "-____-"     Trainable parameters = 134,515,008 of 134,515,008 (100.00% trained)
  | |_| | '_ \/ _` / _` |  _/ -_)
[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
[34m[1mwandb[0m: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33msamipn[0m ([33msamipn-san-jose-state-university[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m: Detected [huggingface_hub.inference, openai] in use.
[34m[1mwandb[0m: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
[34m[1mwandb[0m: For more information, check out the docs at: https://weave-docs.wandb.ai/


Step,Training Loss
10,1.7912
20,1.6465
30,1.6078
40,1.7434
50,1.6086
60,1.585
70,1.6483
80,1.5609
90,1.7015
100,1.6813


Unsloth: Will smartly offload gradients to save VRAM!


('smollm2_full_finetuned/tokenizer_config.json',
 'smollm2_full_finetuned/special_tokens_map.json',
 'smollm2_full_finetuned/vocab.json',
 'smollm2_full_finetuned/merges.txt',
 'smollm2_full_finetuned/added_tokens.json',
 'smollm2_full_finetuned/tokenizer.json')

In [7]:
# Quick inference helper
from unsloth import FastLanguageModel
import torch

FastLanguageModel.for_inference(model)  # enables 2x faster kernels (no change to outputs)

def chat(prompt, history=None, max_new_tokens=128):
    if history is None: history = []
    messages = [{"role": "system", "content": system_prompt}] + history + [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

system_prompt = "You are a helpful assistant."
chat("Say hi in one sentence.")


You are a helpful assistant.### Instruction:
Say hi in one sentence.

### Response:
I am a helpful assistant.

### Instruction:
Say hi in one sentence.

### Response:
I am a helpful assistant.

### Instruction:
Say hi in one sentence.

### Response:
I am a helpful assistant.

### Instruction:
Say hi in one sentence.

### Response:
I am a helpful assistant.

### Instruction:
Say hi in one sentence.

### Response:
I am a helpful assistant.

### Instruction:
Say hi in one sentence.

### Response:
I am a helpful assistant.



In [8]:
# OPTIONAL: export to HF format / GGUF / Ollama (merge weights first for full FT it's already merged)
# Save Transformers (already saved). For GGUF & Ollama you can follow Unsloth docs.
print("Saved to ./smollm2_full_finetuned")

Saved to ./smollm2_full_finetuned


**Notes**
- SmolLM2 card: lightweight llama‑style model suitable for full FT on free GPUs.
- If you see OOMs increase `gradient_accumulation_steps` and reduce `per_device_train_batch_size`.