<a href="https://colab.research.google.com/github/kat-le/cmpe255-unsloth.ai/blob/main/full_finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Full Finetuning (SmolLM2-135M)


- *Full finetune* a small model, **SmolLM2-135M (Instruct)**, by setting `full_finetuning=True`.  
- Use the **CodeAlpaca-20k** dataset (coding task).  
- Test your model with a quick **chat UI**.



In [None]:
!pip -q install -U unsloth
!pip -q install -U trl accelerate peft datasets bitsandbytes transformers


In [None]:

import unsloth  # import first so Unsloth can patch the libraries loaded below
import os, math, random, json, gc
from dataclasses import dataclass
import torch
from datasets import load_dataset
from transformers import TrainingArguments, TextStreamer
from trl import SFTTrainer
from unsloth import FastLanguageModel, is_bfloat16_supported
from unsloth.chat_templates import get_chat_template

print("Torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected. Colab > Runtime > Change runtime type > GPU is recommended.")



Please restructure your imports with 'import unsloth' at the top of your file.
  import os, math, random, json, gc, unsloth


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Torch: 2.8.0+cu126
CUDA available: True
GPU: Tesla T4



## Task & Dataset

We'll fine-tune on **coding instructions** using **`sahil2801/CodeAlpaca-20k`** (20k synthetic instruction-response pairs for code).  
Each record has the fields: `instruction`, `input`, `output`. We convert them into chat **messages** and then into a **prompt** string using a *chat template*.


In [None]:

from datasets import DatasetDict

dataset = load_dataset("sahil2801/CodeAlpaca-20k")
train_size = 2000
test_size  = 200

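# Shuffle once and take disjoint slices so no test example also appears in the training set.
shuffled = dataset["train"].shuffle(seed=42)
train_size = min(train_size, len(shuffled))
small_train = shuffled.select(range(train_size))
small_test  = shuffled.select(range(train_size, min(train_size + test_size, len(shuffled))))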

print(small_train[0])
print("\nTrain size:", len(small_train), " Test size:", len(small_test))


Generating train split:   0%|          | 0/20022 [00:00<?, ? examples/s]

{'output': 'class Person:\n    """\n    Class to represent a person\n    """\n    def __init__(self, name, age, address):\n        self.name = name\n        self.age = age\n        self.address = address\n    \n    def birthday(self):\n        """\n        Increments the age of the person\n        """\n        self.age += 1', 'instruction': 'Design a class for representing a person in Python.', 'input': ''}

Train size: 2000  Test size: 200





# Build Messages and Apply Template

**Chat templates** are formatting functions that turn a list of chat messages into the raw training text.  
We'll build `messages = [{"role": "user", "content": ...}, {"role": "assistant", "content": ...}]`, and then apply a template.
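
To make the idea concrete, here is a minimal plain-Python sketch of what an Alpaca-style template does to a message list. The helper `render_alpaca` and the `demo` conversation are hypothetical, written only for illustration; the header and the `### Instruction:` / `### Response:` markers follow the rendered sample this notebook prints. The real `tokenizer.apply_chat_template` also handles special tokens such as BOS/EOS, which this sketch leaves out.

```python
# Hypothetical stand-in for tokenizer.apply_chat_template (illustration only).
ALPACA_HEADER = (
    "Below are some instructions that describe some tasks. "
    "Write responses that appropriately complete each request."
)

def render_alpaca(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts as one Alpaca-style string."""
    parts = [ALPACA_HEADER]
    for msg in messages:
        if msg["role"] == "user":
            parts.append("### Instruction:\n" + msg["content"])
        elif msg["role"] == "assistant":
            parts.append("### Response:\n" + msg["content"])
        else:
            raise ValueError(f"unsupported role: {msg['role']}")
    text = "\n\n".join(parts)
    if add_generation_prompt:
        # Leave the response slot open so the model writes the answer.
        text += "\n\n### Response:\n"
    return text

demo = [
    {"role": "user", "content": "Reverse a string in Python."},
    {"role": "assistant", "content": "s[::-1]"},
]
print(render_alpaca(demo))
```

During training the full conversation is rendered (`add_generation_prompt=False`); at inference time only the user turn is rendered and the response slot is left open.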



In [None]:

def build_messages(batch):
    # Convert CodeAlpaca item -> messages list
    msgs = []
    user = batch["instruction"]
    if batch.get("input"):
        if isinstance(batch["input"], str) and batch["input"].strip():
            user += "\n\n" + batch["input"]
    msgs.append({"role": "user", "content": user})
    msgs.append({"role": "assistant", "content": batch["output"]})
    return {"messages": msgs}

def apply_template(example, tokenizer, add_generation_prompt=False):
    text = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=add_generation_prompt,
    )
    return {"text": text}

def preview_with_template(tokenizer, example):
    print("🔎 Example with current template:\n")
    print(tokenizer.apply_chat_template(example["messages"], tokenize=False, add_generation_prompt=False)[:800])

# Example single record messages:
example_messages = build_messages(small_train[0])
example_messages


{'messages': [{'role': 'user',
   'content': 'Design a class for representing a person in Python.'},
  {'role': 'assistant',
   'content': 'class Person:\n    """\n    Class to represent a person\n    """\n    def __init__(self, name, age, address):\n        self.name = name\n        self.age = age\n        self.address = address\n    \n    def birthday(self):\n        """\n        Increments the age of the person\n        """\n        self.age += 1'}]}

In [None]:
chosen_template = "alpaca"
print("Selected chat template key:", chosen_template)


Selected chat template key: alpaca



---
## **Full Finetuning** a *small model* (**SmolLM2-135M Instruct**)

We load the small model with `full_finetuning=True`, **not** in 4-bit (so we can update *all* weights).  
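
As a rough back-of-the-envelope check on why a 135M-parameter model is a comfortable fit for full finetuning on a T4, the sketch below estimates the memory needed for weights, gradients, and AdamW state. The helper `full_finetune_memory_gib` is hypothetical, and the byte counts are assumptions: float32 weights and gradients (Unsloth reports upcasting weights to float32 when bfloat16 is unavailable) plus two float32 AdamW moments per parameter. Activations and framework overhead are ignored, so treat the result as a lower bound.

```python
def full_finetune_memory_gib(n_params, weight_bytes=4, grad_bytes=4, optim_bytes=8):
    """Lower-bound estimate: weights + gradients + AdamW moments, no activations."""
    total_bytes = n_params * (weight_bytes + grad_bytes + optim_bytes)
    return total_bytes / 1024**3

# Parameter count as reported by the trainer banner for SmolLM2-135M.
n_params = 134_515_584
print(f"~{full_finetune_memory_gib(n_params):.2f} GiB before activations")
```

Under these assumptions the estimate comes to about 2 GiB, well below the roughly 14.7 GB the T4 exposes.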


In [None]:
max_seq_length = 2048
dtype = torch.bfloat16 if is_bfloat16_supported() else torch.float16

smollm2_id = "unsloth/SmolLM2-135M-Instruct"

smol_model, smol_tokenizer = FastLanguageModel.from_pretrained(
    model_name       = smollm2_id,
    max_seq_length   = max_seq_length,
    load_in_4bit     = False,
    full_finetuning  = True,
    dtype            = dtype,
)

# Attach the chosen chat template
smol_tokenizer = get_chat_template(smol_tokenizer, chat_template=chosen_template)

# Prepare dataset text with the chosen template
train_msgs = small_train.map(build_messages)
test_msgs  = small_test.map(build_messages)

train_text = train_msgs.map(lambda ex: apply_template(ex, smol_tokenizer), remove_columns=train_msgs.column_names)
test_text  = test_msgs.map(lambda ex: apply_template(ex, smol_tokenizer), remove_columns=test_msgs.column_names)

print(train_text[0]["text"][:500])


==((====))==  Unsloth 2025.11.1: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Float16 full finetuning uses more memory since we upcast weights to float32.


<|im_start|>Below are some instructions that describe some tasks. Write responses that appropriately complete each request.

### Instruction:
Design a class for representing a person in Python.

### Response:
class Person:
    """
    Class to represent a person
    """
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address
    
    def birthday(self):
        """
        Increments the age of the person
        """
        self


# Training Args

In [None]:

out_dir = "outputs/smollm2_full"

train_args = TrainingArguments(
    output_dir=out_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    logging_steps=20,
    save_strategy="epoch",
    bf16= (dtype==torch.bfloat16),
    fp16= (dtype==torch.float16),
    optim="adamw_torch",
    gradient_checkpointing=True,
    report_to=[],
)

smol_trainer = SFTTrainer(
    model=smol_model,
    tokenizer=smol_tokenizer,
    args=train_args,
    train_dataset=train_text,
    eval_dataset=test_text,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    packing=False,
)



# Train

In [None]:
train_result = smol_trainer.train()
train_result


The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,000 | Num Epochs = 1 | Total steps = 250
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 134,515,584 of 134,515,584 (100.00% trained)


| Step | Training Loss |
|------|---------------|
| 20   | 1.4608 |
| 40   | 0.8928 |
| 60   | 0.8678 |
| 80   | 0.7654 |
| 100  | 0.784  |
| 120  | 0.736  |
| 140  | 0.7906 |
| 160  | 0.75   |
| 180  | 0.7639 |
| 200  | 0.7838 |


TrainOutput(global_step=250, training_loss=0.8386750335693359, metrics={'train_runtime': 234.762, 'train_samples_per_second': 8.519, 'train_steps_per_second': 1.065, 'total_flos': 194728284714240.0, 'train_loss': 0.8386750335693359, 'epoch': 1.0})
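
The step count in the log follows directly from the training arguments: 2,000 examples at an effective batch size of 2 x 4 x 1 = 8 gives 250 optimizer steps for one epoch. The sketch below reproduces that arithmetic and the general shape of a cosine schedule with linear warmup. Both helpers (`total_optimizer_steps`, `lr_at_step`) are hypothetical names written for illustration, and the schedule is an approximation of what the trainer computes rather than its actual code.

```python
import math

def total_optimizer_steps(num_examples, per_device_batch, grad_accum, num_gpus=1, epochs=1):
    """Return (optimizer steps, effective batch size) for a simple training run."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps = math.ceil(num_examples / effective_batch) * epochs
    return steps, effective_batch

def lr_at_step(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay towards zero (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

steps, effective = total_optimizer_steps(2000, per_device_batch=2, grad_accum=4)
print(steps, effective)  # 250 8
for s in (0, 25, 125, 250):
    print(s, f"{lr_at_step(s, steps):.2e}")
```

With `warmup_ratio=0.1`, roughly the first tenth of the steps ramp the learning rate up, and the rest decay it smoothly.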

# Save Locally

In [None]:

smol_trainer.save_model(out_dir)
smol_tokenizer.save_pretrained(out_dir)
print("Saved to:", out_dir)

Saved to: outputs/smollm2_full


# Inference Setup and Sample Chat

In [None]:
from transformers import TextStreamer
import torch

FastLanguageModel.for_inference(smol_model)
smol_model.eval()
smol_model.config.use_cache = False

def chat_once(prompt, temperature=0.7, top_p=0.9, max_new_tokens=256):
    msgs = [{"role":"user","content":prompt}]
    prompt_text = smol_tokenizer.apply_chat_template(
        msgs, tokenize=False, add_generation_prompt=True
    )
    inputs = smol_tokenizer([prompt_text], return_tensors="pt").to(smol_model.device)
    streamer = TextStreamer(smol_tokenizer, skip_prompt=True, skip_special_tokens=True)

    with torch.inference_mode():
        _ = smol_model.generate(
            **inputs,
            streamer=streamer,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            use_cache=False,
        )

print("\nTry asking a coding question:")
chat_once("Write a Python function two_sum(nums, target) that returns indices of the two numbers adding to target. Add a short docstring and 2 examples.")



Try asking a coding question:
def two_sum(nums, target):
    """
    Two_Sum

    Args:
        nums (list): List of numbers
        target (int): Target number to search
    Returns:
        indices of two numbers that add to target
    """
    # Create a dictionary of numbers with their indices
    num_dict = {}
    for i in range(len(nums)):
        if nums[i] == target:
            num_dict[nums[i]] = i
    # Find the pair of numbers that add to target
    pair_nums = []
    for i in range(len(nums)):
        if nums[i] == target:
            pair_nums.append(i)
    # Sort the pair of numbers
    nums.sort()
    for i in pair_nums:
        if nums[i] == target:
            return (i, i)
    # No pairs found
    return None
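
Fluent output is not the same as correct output, so it is worth running a generated function against a few known cases. The sketch below is one way to do that: `two_sum_reference` and `passes_two_sum_checks` are hypothetical helpers written for this illustration (not part of the notebook or of any library), and `generated_two_sum` is the sample response above copied verbatim apart from the shortened docstring.

```python
def two_sum_reference(nums, target):
    """Return indices (i, j), i < j, with nums[i] + nums[j] == target, else None."""
    seen = {}  # value -> index of its first occurrence
    for j, value in enumerate(nums):
        i = seen.get(target - value)
        if i is not None:
            return (i, j)
        seen.setdefault(value, j)
    return None

def generated_two_sum(nums, target):
    """Body copied from the model's sample response."""
    num_dict = {}
    for i in range(len(nums)):
        if nums[i] == target:
            num_dict[nums[i]] = i
    pair_nums = []
    for i in range(len(nums)):
        if nums[i] == target:
            pair_nums.append(i)
    nums.sort()
    for i in pair_nums:
        if nums[i] == target:
            return (i, i)
    return None

def passes_two_sum_checks(candidate):
    """Run a candidate on a few hand-written cases and validate its answers."""
    cases = [([2, 7, 11, 15], 9), ([3, 2, 4], 6), ([3, 3], 6), ([1, 2], 10)]
    for nums, target in cases:
        got = candidate(list(nums), target)  # copy: a candidate may mutate its input
        solvable = two_sum_reference(nums, target) is not None
        if not solvable:
            if got is not None:
                return False
            continue
        if got is None:
            return False
        i, j = got
        if i == j or nums[i] + nums[j] != target:
            return False
    return True

print(passes_two_sum_checks(two_sum_reference))  # True
print(passes_two_sum_checks(generated_two_sum))  # False
```

In this run the sample response fails the checks: it only looks for elements equal to `target` rather than pairs that sum to it. That is a useful reminder that one epoch on 2,000 examples with a 135M-parameter model yields well-formatted but not necessarily correct code.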
