# Colab 1: Full Fine-tuning with SmolLM2-135M

## Overview
This notebook demonstrates **full fine-tuning** (not LoRA) using Unsloth with the SmolLM2-135M model.

### What is Full Fine-tuning?
- Updates **ALL** model parameters (not just adapters)
- More memory intensive than LoRA
- Practical for small models like SmolLM2-135M (135 million parameters), where the extra memory cost of training every weight stays modest
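To see why full fine-tuning costs more memory, here is a rough per-parameter tally. It is a sketch under stated assumptions, not a measurement: trainable parameters keep an fp32 weight (4 bytes), an fp32 gradient (4 bytes) and two fp32 AdamW moments (8 bytes); frozen parameters keep only an fp16 weight (2 bytes). Activations and framework overhead are ignored, and the 2M LoRA adapter size is a hypothetical figure.

```python
def training_state_gb(n_trainable, n_frozen=0, trainable_bytes=4 + 4 + 8, frozen_bytes=2):
    """Rough memory for weights + grads + optimizer state, ignoring activations.

    Assumptions (illustrative, not measured):
      trainable params: fp32 weight (4 B) + fp32 grad (4 B) + two fp32 AdamW moments (8 B)
      frozen params:    fp16 weight only (2 B), no grad, no optimizer state
    """
    total_bytes = n_trainable * trainable_bytes + n_frozen * frozen_bytes
    return total_bytes / 1e9

N = 135_000_000             # approximate size of SmolLM2-135M
LORA_TRAINABLE = 2_000_000  # hypothetical adapter size

full_ft = training_state_gb(n_trainable=N)
lora = training_state_gb(n_trainable=LORA_TRAINABLE, n_frozen=N)
print(f"full fine-tuning ~ {full_ft:.2f} GB, LoRA ~ {lora:.2f} GB (before activations)")
```

With `optim="adamw_8bit"` (used later in this notebook) the optimizer term shrinks to roughly 2 bytes per parameter, which you can model with `trainable_bytes=4 + 4 + 2`.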

### Model: SmolLM2-135M
- Smallest model in the SmolLM2 family
- Perfect for learning and quick experiments
- Fast training on limited hardware

### Task: Chat/Instruction Following
We'll use a small subset of instruction-following data to teach the model conversational abilities.

In [1]:
%%capture
# Install Unsloth (%%capture must be the first line of the cell to hide pip output)
!pip install unsloth
# Also get the latest Unsloth from source
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

In [2]:
# Import required libraries
from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## Step 1: Load the SmolLM2-135M Model

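Key parameters:
- `max_seq_length`: Maximum sequence length in tokens (2048 here)
- `dtype`: `None` auto-detects the precision: bfloat16 on GPUs that support it, otherwise float16 (the Tesla T4 below reports `Bfloat16 = FALSE`, so float16 is used)
- `load_in_4bit`: Whether to load 4-bit quantized weights to save memory; kept `False` here because full fine-tuning updates the full-precision weights directly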
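The `dtype=None` rule and the memory trade-off behind `load_in_4bit` can be sketched in a few lines. `resolve_dtype` and `weight_megabytes` are hypothetical helpers that illustrate the behavior described above (prefer bfloat16, else float16); they are not Unsloth's own code, and dtypes are plain strings here rather than `torch.dtype` objects. The 4-bit figure ignores the small overhead of quantization scales.

```python
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "float16": 2.0, "4bit": 0.5}

def resolve_dtype(requested, bf16_supported):
    """Mimic dtype=None auto-detection: prefer bfloat16, fall back to float16."""
    if requested is not None:
        return requested
    return "bfloat16" if bf16_supported else "float16"

def weight_megabytes(n_params, dtype):
    """Storage for the weights only (no gradients, optimizer state, or activations)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6

# The Tesla T4 in this run reports "Bfloat16 = FALSE", so auto-detect picks float16.
dtype_on_t4 = resolve_dtype(None, bf16_supported=False)
for name in ("float32", dtype_on_t4, "4bit"):
    print(f"{name:>8}: {weight_megabytes(135_000_000, name):.1f} MB")
```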

In [10]:
# Model configuration
max_seq_length = 2048
dtype = None  # Auto-detect
load_in_4bit = False  # Set to False for FULL fine-tuning

# Load SmolLM2-135M model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/SmolLM2-135M-Instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print(f"Model loaded: {model.__class__.__name__}")
print(f"Tokenizer vocab size: {len(tokenizer)}")

==((====))==  Unsloth 2025.11.2: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Model loaded: LlamaForCausalLM
Tokenizer vocab size: 49153


## Step 2: Configure for Full Fine-tuning

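**IMPORTANT**: Do NOT call `get_peft_model()`: it would add LoRA adapters and freeze the base weights. Since `get_peft_model()` is where Unsloth's `use_gradient_checkpointing="unsloth"` option is normally passed, this notebook instead enables gradient checkpointing directly with Hugging Face's `model.gradient_checkpointing_enable()`.

For full fine-tuning, we skip LoRA and train the model's own weights directly.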

In [11]:
# For full fine-tuning, we don't use LoRA, so the model is already ready.
# We ensure gradient checkpointing is enabled for memory efficiency.
model.config.use_cache = False # Disable cache for gradient checkpointing
model.gradient_checkpointing_enable() # Enable gradient checkpointing

print("Model prepared for FULL fine-tuning (not LoRA)")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")

Model prepared for FULL fine-tuning (not LoRA)
Trainable parameters: 106,203,456
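The printed count (106,203,456) is smaller than the "135M" in the model name. A tally by layer, assuming the published SmolLM2-135M config values below (hidden size 576, 30 layers, 9 query heads and 3 key/value heads of dimension 64, MLP size 1536, vocabulary 49,152, tied input/output embeddings), shows it equals every weight except the token-embedding matrix. That suggests the embeddings were not marked trainable in this run.

```python
# Assumed SmolLM2-135M config values (check model.config to confirm).
hidden, layers, intermediate, vocab = 576, 30, 1536, 49_152
n_q_heads, n_kv_heads, head_dim = 9, 3, 64

def per_layer_params():
    q = hidden * (n_q_heads * head_dim)        # q_proj
    kv = 2 * hidden * (n_kv_heads * head_dim)  # k_proj + v_proj (grouped-query attention)
    o = (n_q_heads * head_dim) * hidden        # o_proj
    mlp = 3 * hidden * intermediate            # gate_proj, up_proj, down_proj
    norms = 2 * hidden                         # two RMSNorm weight vectors, no biases
    return q + kv + o + mlp + norms

non_embedding = layers * per_layer_params() + hidden  # + final RMSNorm
embedding = vocab * hidden                            # shared with lm_head (tied)
print(f"non-embedding: {non_embedding:,}")
print(f"embedding:     {embedding:,}")
print(f"total:         {non_embedding + embedding:,}")
```

If you also want the embeddings trained, one option to try is `model.get_input_embeddings().weight.requires_grad_(True)` before building the trainer, then re-running the count; check how your Unsloth version handles this before relying on it.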


## Step 3: Prepare Dataset

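We'll use a small instruction dataset. Each Alpaca example has an `instruction`, an optional `input`, and an `output`; we render it into SmolLM2's ChatML-style format: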
```
<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```

In [7]:
# Load a small instruction dataset (using only 100 samples for quick training)
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")

print(f"Dataset size: {len(dataset)} examples")
print("\nSample data:")
print(dataset[0])


Dataset size: 100 examples

Sample data:
{'output': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.', 'input': '', 'instruction': 'Give three tips for staying healthy.'}


In [8]:
# Chat template for SmolLM2
chat_template = """<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
{}<|im_end|>"""

EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    """Format dataset examples into chat format"""
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []

    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Combine instruction and input
        user_message = instruction
        if input_text:
            user_message += f"\n\n{input_text}"

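        # Format as chat; only append EOS if the template does not already end with it
        # (for SmolLM2-Instruct the EOS token is typically <|im_end|>)
        text = chat_template.format(user_message, output)
        if not text.endswith(EOS_TOKEN):
            text += EOS_TOKEN
        texts.append(text)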

    return {"text": texts}

# Apply formatting
dataset = dataset.map(formatting_prompts_func, batched=True)

print("\nFormatted example:")
print(dataset[0]["text"][:500])


Formatted example:
<|im_start|>user
Give three tips for staying healthy.<|im_end|>
<|im_start|>assistant
1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim 
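The formatting logic can be checked on the CPU without loading the model. This stand-alone sketch reproduces the template for a toy batch (not taken from the dataset). It assumes the tokenizer's EOS token is `<|im_end|>`, as for SmolLM2-Instruct (print `tokenizer.eos_token` to confirm). In that case, appending `EOS_TOKEN` unconditionally would end every example with `<|im_end|><|im_end|>`, and the `endswith` guard avoids it.

```python
CHAT_TEMPLATE = "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n{}<|im_end|>"
EOS = "<|im_end|>"  # assumption: value of tokenizer.eos_token for SmolLM2-Instruct

def format_batch(examples, eos=EOS):
    """Batched formatter in the shape that datasets.map(batched=True) expects."""
    texts = []
    for instruction, input_text, output in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        user_message = instruction + (f"\n\n{input_text}" if input_text else "")
        text = CHAT_TEMPLATE.format(user_message, output)
        if not text.endswith(eos):  # avoid a duplicated end-of-turn marker
            text += eos
        texts.append(text)
    return {"text": texts}

# Toy batch (not from the dataset): one example without input, one with input.
batch = {
    "instruction": ["Name a primary color.", "Translate to French."],
    "input": ["", "Good morning"],
    "output": ["Red.", "Bonjour"],
}
for t in format_batch(batch)["text"]:
    print(t, end="\n---\n")
```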


## Step 4: Configure Training

Using minimal steps for quick demonstration:
- 10 max steps (increase for real training)
- Batch size of 2
- Gradient accumulation: 4 (effective batch size = 8)

In [12]:
# Training arguments optimized for speed
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=10,  # Small number for quick demo
        learning_rate=2e-4,  # common LoRA value; full fine-tuning often uses a lower rate (e.g. 1e-5 to 5e-5)
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

print("Trainer configured for FULL fine-tuning")

Unsloth: We found double BOS tokens - we shall remove one automatically.
Trainer configured for FULL fine-tuning
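With `warmup_steps=5`, `max_steps=10` and `lr_scheduler_type="linear"`, half of the run is warmup. This sketch reimplements the linear-with-warmup multiplier the way Hugging Face's `get_linear_schedule_with_warmup` defines it (ramp from 0 to 1 over the warmup steps, then decay linearly to 0 at the last step). Treat it as an illustration, not the library's code.

```python
def linear_warmup_multiplier(step, warmup_steps, total_steps):
    """LR multiplier for a linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

base_lr = 2e-4
for step in range(10):
    m = linear_warmup_multiplier(step, warmup_steps=5, total_steps=10)
    print(f"step {step}: multiplier {m:.1f}, lr {base_lr * m:.1e}")
```

Because only five steps follow the warmup, the loss from such a short run says little about convergence; longer runs typically devote a much smaller fraction of `max_steps` to warmup.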


## Step 5: Train the Model

This will perform full fine-tuning, updating every parameter that has `requires_grad=True` (the count printed in Step 2).

In [None]:
# Start training
trainer_stats = trainer.train()

print("\n" + "="*50)
print("Training completed!")
print("="*50)
print(f"Training loss: {trainer_stats.training_loss:.4f}")
print(f"Training steps: {trainer_stats.global_step}")

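`trainer_stats.training_loss` is a single average over the run. `trainer.state.log_history`, a list of dicts that the Hugging Face `Trainer` appends to every `logging_steps`, gives the per-step curve. The hypothetical helper below pulls out the training-loss entries. The sample list only mimics the shape of real entries and is not output from this run.

```python
def loss_curve(log_history):
    """Return (step, loss) pairs from Trainer.state.log_history-style dicts."""
    # Per-step training logs carry a "loss" key; the end-of-training summary
    # uses "train_loss" instead, so it is filtered out here.
    return [(entry["step"], entry["loss"]) for entry in log_history if "loss" in entry]

# Hypothetical entries that only mimic the shape of real log_history dicts.
sample_history = [
    {"loss": 2.5, "epoch": 0.08, "step": 1},
    {"loss": 2.1, "epoch": 0.16, "step": 2},
    {"train_loss": 2.3, "epoch": 0.16, "step": 2},
]
for step, loss in loss_curve(sample_history):
    print(f"step {step}: loss {loss:.3f}")

# In the notebook, after training: loss_curve(trainer.state.log_history)
```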
## Step 6: Test the Fine-tuned Model

Let's test our fine-tuned model with some sample questions.

In [14]:
# Enable inference mode
FastLanguageModel.for_inference(model)

# Test prompts
test_prompts = [
    "What is the capital of France?",
    "Write a haiku about coding.",
    "Explain what machine learning is in simple terms."
]

print("Testing fine-tuned model:\n")
print("="*60)

for prompt in test_prompts:
    # Format with chat template
    formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

    # Tokenize
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to("cuda")

    # Generate
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

    # Decode
    response = tokenizer.decode(outputs[0], skip_special_tokens=False)

    print(f"\nPrompt: {prompt}")
    print(f"Response: {response.split('<|im_start|>assistant')[-1].split('<|im_end|>')[0].strip()}")
    print("-"*60)

Testing fine-tuned model:


Prompt: What is the capital of France?
Response: The capital of France is Paris. It is located in the northern part of the country and is known for its cultural, historical, and artistic significance. Paris is famous for its iconic landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. It is also the seat of the French government and is the economic and financial center of France.
------------------------------------------------------------

Prompt: Write a haiku about coding.
Response: In the heart of a coding world, a thread is spun,
A spark ignites, and the flame burns bright,
The code is made, the mind is shaped, the soul is found,
Yet, the challenge remains, a mystery, waiting to be unraveled.
------------------------------------------------------------

Prompt: Explain what machine learning is in simple terms.
Response: Imagine you have a big box full of toys, and you want to find a toy that fits all your needs. You've got toy
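The `split` chain in the test loop keeps the text between the last `<|im_start|>assistant` marker and the next `<|im_end|>`. The third response stops mid-sentence because generation hit `max_new_tokens=100` before the model emitted `<|im_end|>`; in that case the split simply returns everything generated so far. Below is a stand-alone version of the same extraction, with a hypothetical `truncated` flag added to make that case visible.

```python
ASSISTANT_TAG = "<|im_start|>assistant"
END_TAG = "<|im_end|>"

def extract_reply(decoded):
    """Return (reply_text, truncated) from a decoded chat transcript."""
    tail = decoded.split(ASSISTANT_TAG)[-1]  # text after the last assistant marker
    truncated = END_TAG not in tail          # no end-of-turn marker => cut off
    return tail.split(END_TAG)[0].strip(), truncated

complete = "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello!<|im_end|>"
cut_off = "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello, I am"
print(extract_reply(complete))
print(extract_reply(cut_off))
```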

## Step 7: Save the Model

Save the fully fine-tuned model for later use.

In [15]:
# Save model locally
model.save_pretrained("smollm2_135m_finetuned")
tokenizer.save_pretrained("smollm2_135m_finetuned")

print("Model saved to: smollm2_135m_finetuned/")

# Optional: Save to Hugging Face Hub
# model.push_to_hub("your_username/smollm2-135m-finetuned", token="your_token")
# tokenizer.push_to_hub("your_username/smollm2-135m-finetuned", token="your_token")

Model saved to: smollm2_135m_finetuned/


## Summary

### What we did:
1. ✅ Loaded SmolLM2-135M (smallest model, 135M parameters)
2. ✅ Configured for **FULL fine-tuning** (not LoRA)
3. ✅ Prepared instruction-following dataset with chat template
4. ✅ Trained the model (all parameters with `requires_grad=True` updated)
5. ✅ Tested inference
6. ✅ Saved the model

### Key Differences from LoRA:
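- **Full fine-tuning**: Updates every trainable weight of the base model; in this run that was 106,203,456 parameters, since the count printed in Step 2 excludes the ~28.3M-parameter token-embedding matrix
- **LoRA**: Would update only small adapter matrices, roughly 1-5M parameters depending on the rank and which layers are adapted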
- **Memory**: Full fine-tuning uses more memory
- **Quality**: Full fine-tuning can be better for small models
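How many parameters LoRA trains depends on the rank `r` and on which projections get adapters: each adapted weight with `in` inputs and `out` outputs adds `r × (in + out)` parameters (matrices A of shape `r × in` and B of shape `out × r`). A sketch assuming SmolLM2-135M shapes (hidden size 576, key/value width 192, MLP size 1536, 30 layers) and adapters on all seven attention and MLP projections:

```python
hidden, kv_width, intermediate, layers = 576, 192, 1536, 30  # assumed SmolLM2-135M shapes

# (in_features, out_features) for each projection in one decoder layer
PROJECTIONS = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_width),
    "v_proj": (hidden, kv_width),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

def lora_params(rank, targets=PROJECTIONS):
    """LoRA adds A (rank x in) and B (out x rank) per adapted weight."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in targets.values())
    return layers * per_layer

for r in (4, 8, 16):
    print(f"r={r:>2}: {lora_params(r):,} trainable adapter parameters")
```

The count scales linearly with `r`: at small ranks it lands around 1-2M, while `r=16` on all seven projections gives roughly 4.9M. Adapting only the attention projections shrinks it further.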

### Chat Template Used:
```
<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```

### Dataset:
- Alpaca cleaned dataset
- 100 samples for quick demo
- Format: instruction → input → output

### Next Steps:
- Increase `max_steps` for better results (e.g., 100-500 steps)
- Use full dataset instead of 100 samples
- Experiment with different learning rates
- Export to GGUF or Ollama for deployment
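Before raising `max_steps`, it helps to translate steps into epochs, since many passes over a 100-example subset can make the model memorize it. A sketch assuming the effective batch size of 8 from Step 4; 51,760 is the size of the full `yahma/alpaca-cleaned` train split. The step count is approximate: the Trainer's rounding of a final partial accumulation can differ by one step.

```python
import math

def steps_per_epoch(dataset_size, effective_batch=8):
    """Approximate optimizer steps for one full pass (a final partial batch counts as a step)."""
    return math.ceil(dataset_size / effective_batch)

def epochs_for(max_steps, dataset_size, effective_batch=8):
    """How many passes over the data a given max_steps implies."""
    return max_steps * effective_batch / dataset_size

print(steps_per_epoch(100))     # 100-example subset
print(steps_per_epoch(51_760))  # full alpaca-cleaned train split
print(epochs_for(500, 100))     # 500 steps on the subset
```

Alternatively, `TrainingArguments` accepts `num_train_epochs`; leave `max_steps` unset in that case, because a positive `max_steps` overrides it.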