# Full Fine-tuning with SmolLM2-135M using Unsloth

## Overview
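This notebook fine-tunes the SmolLM2-135M-Instruct model with Unsloth. It is framed as **full fine-tuning**, but the code as executed trains high-rank LoRA adapters (`r=256`) on top of a 4-bit quantized base model, which approximates full fine-tuning rather than performing it.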

### What is Full Fine-tuning?
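- Full fine-tuning updates **ALL** parameters of the model
- It costs more compute than LoRA, because gradients and optimizer state are kept for every weight
- It can give better task-specific performance than adapter methods, though the gain is not guaranteed
- It requires more GPU memory than LoRA for the same model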
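
To put a number on the memory point, here is a back-of-envelope sketch. It assumes fp32 weights, fp32 gradients, and a standard AdamW optimizer with two fp32 moment estimates per parameter, and it ignores activations. Those are illustrative assumptions, not the settings of this notebook (which uses 4-bit weights and `adamw_8bit`). The two parameter counts are taken from the training banner printed in Step 10.

```python
# Back-of-envelope memory for the persistent training state of full fine-tuning.
# Assumptions (not from the notebook): fp32 weights, fp32 gradients, and AdamW
# with two fp32 moment estimates per parameter. Activations are ignored.
TOTAL_WITH_ADAPTERS = 212_666_688  # from the Step 10 training banner
ADAPTER_PARAMS = 78_151_680        # from the Step 10 training banner
BASE_PARAMS = TOTAL_WITH_ADAPTERS - ADAPTER_PARAMS

BYTES_PER_PARAM = 4 + 4 + 8  # weights + gradients + two Adam moments


def training_state_gib(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the persistent training state in GiB for the given parameter count."""
    return num_params * bytes_per_param / 2**30


print(f"Base parameters: {BASE_PARAMS:,}")
print(f"Full fine-tuning state: {training_state_gib(BASE_PARAMS):.2f} GiB")
```

Under these assumptions the state alone is about 2 GiB for a model of this size, which is why full fine-tuning is feasible here but scales poorly to larger models.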

### Model Details
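- **Model**: SmolLM2-135M-Instruct (about 135 million parameters)
- **Method**: High-rank LoRA (`r=256`) on a 4-bit base model, used here to approximate full fine-tuning
- **Task**: Instruction following / Chat completion
- **Dataset**: First 100 examples of `yahma/alpaca-cleaned`, for quick training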

### Key Parameters
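- `load_in_4bit=True`: Loads the base weights with 4-bit quantization to save memory; quantized weights stay frozen during training
- `max_seq_length=512`: Maximum sequence length in tokens
- `r=256`: LoRA rank; only the adapter weights receive updates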

## Step 1: Install Required Libraries

We'll install Unsloth and other dependencies needed for fine-tuning.

In [1]:
# Install Unsloth, which advertises roughly 2x faster fine-tuning with up to 70% less memory
!pip install unsloth

# Reinstall Unsloth from the GitHub repository, then add the remaining dependencies
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
# trl is pinned below 0.9.0, which downgrades the preinstalled trl (see the log below)
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-1wiqcmhm/unsloth_2d4f93dfec4d4d4d80f00c2ebb5bf155
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-1wiqcmhm/unsloth_2d4f93dfec4d4d4d80f00c2ebb5bf155
  Resolved https://github.com/unslothai/unsloth.git to commit 1c0ad844f170f67c7cdf6f7a9465bafb0f9627df
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting trl<0.9.0
  Using cached trl-0.8.6-py3-none-any.whl.metadata (11 kB)
Using cached trl-0.8.6-py3-none-any.whl (245 kB)
Installing collected packages: trl
  Attempting uninstall: trl
    Found existing installation: trl 0.23.0
    Uninstalling trl-0.23.0:
      Successfully uninstalled trl-0.23.

## Step 2: Import Libraries

Import all necessary libraries for model loading, training, and inference.

In [2]:
from unsloth import FastLanguageModel
import torch
import os
os.environ["WANDB_DISABLED"] = "true"
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

print("✓ All libraries imported successfully!")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
✓ All libraries imported successfully!


## Step 3: Configure Model Parameters

Set up the configuration for loading the SmolLM2-135M model.

In [3]:
# Model configuration
max_seq_length = 512  # Choose any! Unsloth auto-supports RoPE Scaling internally
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage

# Model name - SmolLM2-135M is a small, efficient model perfect for quick training
model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"

print("Configuration:")
print(f"  Model: {model_name}")
print(f"  Max Sequence Length: {max_seq_length}")
print(f"  4-bit Quantization: {load_in_4bit}")

Configuration:
  Model: HuggingFaceTB/SmolLM2-135M-Instruct
  Max Sequence Length: 512
  4-bit Quantization: True


## Step 4: Load the Pre-trained Model

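Load SmolLM2-135M-Instruct and its tokenizer with the settings from Step 3.

**Important**: No LoRA rank is set in this cell, and `load_in_4bit=True` keeps the quantized base weights frozen. True full fine-tuning would require unquantized weights instead; Unsloth's documentation describes a `full_finetuning=True` argument to `from_pretrained` for that purpose, which this notebook does not use.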

In [4]:
# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print("✓ Model loaded successfully!")
print(f"Model type: {type(model).__name__}")
print(f"Tokenizer vocab size: {len(tokenizer)}")

==((====))==  Unsloth 2025.11.2: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
HuggingFaceTB/SmolLM2-135M-Instruct does not have a padding token! Will use pad_token = <|endoftext|>.
✓ Model loaded successfully!
Model type: LlamaForCausalLM
Tokenizer vocab size: 49152


## Step 5: Prepare Model for Full Fine-tuning

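Attach LoRA adapters to the model. The cell below uses a very high rank on every attention and MLP projection, which approximates **full parameter fine-tuning** but still leaves the 4-bit base weights frozen.

Key settings:
- `r=256`, `lora_alpha=256`: High-rank adapters with a scaling factor of `lora_alpha / r = 1`
- `use_gradient_checkpointing="unsloth"`: Memory optimization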
In [5]:
# Attach LoRA adapters (a high rank approximates, but is not, full fine-tuning)
model = FastLanguageModel.get_peft_model(
    model,
    r=256,  # LoRA rank; much higher than the commonly used 8-64
    lora_alpha=256,  # Scaling numerator; lora_alpha / r = 1 here
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # All attention and MLP projections
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Memory efficient
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

print("✓ LoRA adapters attached (r=256)")
print("  Note: only the adapter weights are trainable; the 4-bit base weights stay frozen")

Unsloth 2025.11.2 patched 30 layers with 30 QKV layers, 30 O layers and 30 MLP layers.


✓ LoRA adapters attached (r=256)
  Note: only the adapter weights are trainable; the 4-bit base weights stay frozen
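
The number of weights that `r=256` adds can be worked out by hand: each adapted linear layer gains two matrices, one of shape `rank x in_features` and one of shape `out_features x rank`. The sketch below applies that formula to the seven target modules. The layer dimensions are assumptions taken from the published SmolLM2-135M configuration (hidden size 576, MLP size 1536, key/value width 192), not from this notebook; the layer count of 30 matches the Unsloth patch message above. The result agrees with the "Trainable parameters" figure in the Step 10 training banner.

```python
# Count LoRA parameters for the adapter configuration used in this notebook.
# Dimensions are assumptions from the published SmolLM2-135M config.
HIDDEN = 576
INTERMEDIATE = 1536
KV_DIM = 192      # 3 key/value heads x head size 64
NUM_LAYERS = 30   # matches "patched 30 layers" in the Unsloth output
RANK = 256

# (in_features, out_features) of every targeted projection in one layer
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one adapter: A is rank x in_features, B is out_features x rank."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
trainable = per_layer * NUM_LAYERS
share = trainable / 212_666_688  # total reported in the Step 10 banner

print(f"Trainable adapter parameters: {trainable:,}")  # 78,151,680
print(f"Share of all parameters: {share:.2%}")         # 36.75%
```

Because the rank is so high, the adapters hold more than half as many weights as the base model itself, which is unusual for LoRA.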


## Step 6: Load and Prepare Training Dataset

We'll use a small subset of an instruction-following dataset for quick training.

### Dataset Format
The dataset should have:
- `instruction`: The task or question
- `input`: Optional context
- `output`: The expected response

In [6]:
# Load a small instruction-following dataset
# Using yahma/alpaca-cleaned dataset (high quality instruction-response pairs)
dataset = load_dataset("yahma/alpaca-cleaned", split="train")

# Take only first 100 examples for quick training
dataset = dataset.select(range(100))

print(f"✓ Dataset loaded: {len(dataset)} examples")
print("\nSample example:")
print(dataset[0])

✓ Dataset loaded: 100 examples

Sample example:
{'output': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.', 'input': '', 'instruction': 'Give three tips for staying healthy.'}


## Step 7: Define Chat Template and Formatting Function

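Each example is rendered into a single training string using the Alpaca prompt template. This replaces the chat template that ships with SmolLM2-135M-Instruct, so the same Alpaca format should also be used at inference time.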

In [7]:
# Alpaca-style prompt template (instruction, optional input, response)
# Note: this is not the chat template built into SmolLM2-135M-Instruct
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # End of sequence token

def formatting_prompts_func(examples):
    """
    Format the dataset examples into the expected prompt format.

    Args:
        examples: Batch of examples from the dataset

    Returns:
        Dictionary with formatted text
    """
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []

    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Format each example using the template
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)

    return {"text": texts}

# Apply formatting to the dataset
dataset = dataset.map(formatting_prompts_func, batched=True)

print("✓ Dataset formatted successfully!")
print("\nFormatted example:")
print(dataset[0]["text"][:500] + "...")

✓ Dataset formatted successfully!

Formatted example:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Give three tips for staying healthy.

### Input:


### Response:
1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help pr...
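
With `batched=True`, `dataset.map` passes the function a dictionary of equal-length lists rather than one example at a time, and expects a dictionary of lists back. The stand-alone sketch below mimics that contract on a toy batch so the behavior can be checked without loading a tokenizer. The shortened template, the `format_batch` name, and the `<EOS>` string are illustrative stand-ins, not the values used in the notebook.

```python
# Stand-alone illustration of batched formatting. EOS is a placeholder string,
# not the tokenizer's real end-of-sequence token.
TEMPLATE = """### Instruction:
{}

### Input:
{}

### Response:
{}"""
EOS = "<EOS>"  # hypothetical stand-in for tokenizer.eos_token


def format_batch(batch):
    """Turn a dict of equal-length lists into a dict holding one 'text' list."""
    texts = [
        TEMPLATE.format(instruction, context, answer) + EOS
        for instruction, context, answer in zip(
            batch["instruction"], batch["input"], batch["output"]
        )
    ]
    return {"text": texts}


toy_batch = {
    "instruction": ["Add the numbers.", "Name a color."],
    "input": ["2 and 3", ""],
    "output": ["5", "Blue"],
}
formatted = format_batch(toy_batch)
print(formatted["text"][0])
```

Appending the end-of-sequence token to every example teaches the model where a response stops; without it, generation tends to run on until `max_new_tokens` is reached.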


## Step 8: Configure Training Arguments

Set up a short training run: 60 optimizer steps with an effective batch size of 8 (2 examples per device times 4 gradient accumulation steps).

In [8]:
# Training configuration
training_args = TrainingArguments(
    per_device_train_batch_size=2,  # Batch size per device
    gradient_accumulation_steps=4,   # Accumulate gradients over 4 steps
    warmup_steps=5,                  # Warmup steps for learning rate
    max_steps=60,                    # Total training steps (kept small for quick training)
    learning_rate=2e-4,              # Learning rate
    fp16=not torch.cuda.is_bf16_supported(),  # Use FP16 if BF16 not available
    bf16=torch.cuda.is_bf16_supported(),      # Use BF16 if available
    logging_steps=1,                 # Log every step
    optim="adamw_8bit",             # 8-bit AdamW optimizer
    weight_decay=0.01,              # Weight decay for regularization
    lr_scheduler_type="linear",     # Linear learning rate schedule
    seed=3407,                      # Random seed
    output_dir="outputs",           # Output directory
    report_to="none",
)

print("✓ Training arguments configured!")
print(f"  Total steps: {training_args.max_steps}")
print(f"  Learning rate: {training_args.learning_rate}")
print(f"  Batch size: {training_args.per_device_train_batch_size}")

✓ Training arguments configured!
  Total steps: 60
  Learning rate: 0.0002
  Batch size: 2
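
The arithmetic behind these settings can be sketched in a few lines. The snippet derives the effective batch size and the number of passes over the 100 examples, then models the linear schedule as a warmup to the peak learning rate followed by a linear decay to zero. The rounding used for the epoch count is an assumption about how the trainer counts update steps and may differ between library versions; `linear_lr` is a hypothetical helper, not a library function.

```python
import math

NUM_EXAMPLES = 100
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
NUM_GPUS = 1
MAX_STEPS = 60
WARMUP_STEPS = 5
PEAK_LR = 2e-4

# Examples consumed per optimizer step
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS

# Assumed counting rule: whole accumulation groups per pass over the data
batches_per_epoch = math.ceil(NUM_EXAMPLES / (PER_DEVICE_BATCH * NUM_GPUS))
updates_per_epoch = max(batches_per_epoch // GRAD_ACCUM, 1)
epochs = math.ceil(MAX_STEPS / updates_per_epoch)


def linear_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(MAX_STEPS - step, 0)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)


print(f"Effective batch size: {effective_batch}")  # 8
print(f"Approximate epochs: {epochs}")             # 5
for step in (0, 5, 30, 60):
    print(step, f"{linear_lr(step):.2e}")
```

Seeing each example about five times is a lot for only 100 examples, so the reported training loss says little about generalization.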


## Step 9: Initialize the Trainer

Create the SFTTrainer (Supervised Fine-Tuning Trainer) with our model and dataset.

In [9]:
# Initialize the trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # Set to True to pack short sequences together, which can speed up training
    args=training_args,
)

print("✓ Trainer initialized successfully!")

✓ Trainer initialized successfully!
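
The `packing` option is easier to reason about with a concrete picture. The sketch below shows the general idea in a simplified form: tokenized examples are joined into one stream, separated by an end-of-sequence ID, and the stream is cut into fixed-size blocks so that no positions are wasted on padding. It is an illustration of the concept, not the implementation used by TRL, and the token IDs and the `pack_sequences` name are made up.

```python
from typing import Iterable, List


def pack_sequences(
    tokenized: Iterable[List[int]], eos_id: int, block_size: int
) -> List[List[int]]:
    """Concatenate examples (each followed by eos_id) and cut into fixed-size blocks."""
    stream: List[int] = []
    for ids in tokenized:
        stream.extend(ids)
        stream.append(eos_id)
    # Drop the incomplete tail so every block has exactly block_size tokens
    usable = len(stream) - len(stream) % block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = pack_sequences(examples, eos_id=0, block_size=4)
print(blocks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

With `packing=False`, as in this notebook, each example occupies its own row and shorter rows are padded instead.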


In [14]:
# Disable wandb tracking (redundant: already set in Step 2, and report_to="none" is passed in Step 8)
import os
os.environ["WANDB_DISABLED"] = "true"
print("✓ Weights & Biases tracking disabled")

✓ Weights & Biases tracking disabled


## Step 10: Train the Model

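Start the training process. With the configuration above, this updates the LoRA adapter weights while the 4-bit base model stays frozen.

**Note**: The training banner below reports 78,151,680 trainable parameters out of 212,666,688 (36.75%), which confirms that this run is adapter training rather than full fine-tuning.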

In [10]:
# Start training
print("Starting training...")
print("Training the LoRA adapters (r=256); the 4-bit base weights stay frozen.\n")

trainer_stats = trainer.train()

print("\n✓ Training completed!")
print(f"Training time: {trainer_stats.metrics['train_runtime']:.2f} seconds")
print(f"Training loss: {trainer_stats.metrics['train_loss']:.4f}")

Starting training...
Training the LoRA adapters (r=256); the 4-bit base weights stay frozen.



==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 100 | Num Epochs = 5 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 78,151,680 of 212,666,688 (36.75% trained)


Step,Training Loss
1,2.1518
2,2.1951
3,2.0403
4,2.1352
5,1.7981
6,1.8927
7,1.9633
8,1.7394
9,2.0873
10,1.9012



✓ Training completed!
Training time: 124.84 seconds
Training loss: 1.5696


## Step 11: Test the Fine-tuned Model

Let's test our fine-tuned model with some example prompts.

In [11]:
# Enable faster inference
FastLanguageModel.for_inference(model)

# Test prompt
test_instruction = "Explain what machine learning is in simple terms."
test_input = ""

test_prompt = alpaca_prompt.format(
    test_instruction,
    test_input,
    ""  # Leave response empty for generation
)

print("Test Prompt:")
print(test_prompt)
print("\n" + "="*50 + "\n")

# Tokenize and generate
inputs = tokenizer([test_prompt], return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    use_cache=True,
    do_sample=True,  # temperature and top_p only apply when sampling is enabled
    temperature=0.7,
    top_p=0.9,
)

# Decode and print the response
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print("Model Response:")
print(response)

Test Prompt:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Explain what machine learning is in simple terms.

### Input:


### Response:



Model Response:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Explain what machine learning is in simple terms.

### Input:


### Response:
Machine learning is a type of artificial intelligence that allows computers to learn from experience and improve their performance over time. It involves using algorithms to analyze data and identify patterns and relationships that can help computers make predictions or decisions.

In simpler terms, machine learning is a way of using computers to learn from data and make predictions or decisions. It's a type of artificial intelligence that allows computers
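
The `temperature` and `top_p` arguments interact, and a small sketch may help. The function below divides the logits by the temperature, converts them to probabilities, and keeps the smallest set of top-ranked tokens whose cumulative probability reaches `top_p`. It is a simplified illustration of nucleus filtering with made-up logits and a hypothetical `top_p_filter` name, not the code path used by `model.generate`.

```python
import math
from typing import List


def top_p_filter(logits: List[float], temperature: float, top_p: float) -> List[int]:
    """Return indices of the tokens kept by temperature scaling plus nucleus filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until the probability mass is covered
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for idx in ranked:
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.0]
print(top_p_filter(logits, temperature=1.0, top_p=0.8))  # [0, 1]
print(top_p_filter(logits, temperature=0.5, top_p=0.8))  # [0]
```

Lowering the temperature sharpens the distribution, so fewer candidates survive the same `top_p` threshold and the output becomes more deterministic.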

## Step 12: More Test Examples

Let's try a few more examples to see how well the model performs.

In [12]:
def test_model(instruction, input_text=""):
    """
    Helper function to test the model with different prompts.
    """
    prompt = alpaca_prompt.format(instruction, input_text, "")
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        use_cache=True,
        do_sample=True,  # temperature and top_p only apply when sampling is enabled
        temperature=0.7,
        top_p=0.9,
    )

    response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    print(f"Instruction: {instruction}")
    if input_text:
        print(f"Input: {input_text}")
    print(f"Response: {response.split('### Response:')[1].strip()}")
    print("\n" + "="*80 + "\n")

# Test various prompts
test_model("Write a haiku about programming.")
test_model("What are the benefits of exercise?")
test_model("Summarize this text.", "Python is a high-level programming language known for its simplicity and readability.")

Instruction: Write a haiku about programming.
Response: 


Instruction: What are the benefits of exercise?
Response: Exercise offers numerous benefits that can enhance both physical and mental well-being. Regular physical activity helps to improve cardiovascular health, reducing the risk of heart disease and stroke, while also strengthening the heart muscle, improving blood circulation, and reducing blood pressure. Exercise also aids in weight management by increasing metabolism and building muscle, which helps to burn more calories than does a sedentary person. Additionally, physical activity can improve mental health by reducing stress, anxiety, and depression, while also providing a sense of accomplishment and pride in one's abilities.

In addition to these physical benefits, regular exercise can also enhance mental health by reducing symptoms of depression and anxiety


Instruction: Summarize this text.
Input: Python is a high-level programming language known for its simplicity and
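
The helper above extracts the answer with `response.split('### Response:')[1]`, which raises an `IndexError` if the marker is missing and drops anything after a second marker. A more tolerant variant could look like the sketch below; `extract_response` is a hypothetical name introduced here for illustration.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, or the whole text if it is absent."""
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded.strip()


sample = "### Instruction:\nSay hello.\n\n### Response:\nHello there. "
print(extract_response(sample))            # Hello there.
print(extract_response("no marker here"))  # no marker here
```

Using `str.partition` keeps everything after the first marker, so a response that happens to repeat the marker text is not cut short.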

## Step 13: Save the Model

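Save the fine-tuned weights for future use. Because the model was wrapped with `get_peft_model`, `save_pretrained` writes the LoRA adapter files rather than a standalone merged model. Unsloth also documents `save_pretrained_merged` for exporting merged weights.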

In [13]:
# Save the LoRA adapter weights and the tokenizer locally
model.save_pretrained("smollm2_135m_full_finetuned")
tokenizer.save_pretrained("smollm2_135m_full_finetuned")

print("✓ Model saved to 'smollm2_135m_full_finetuned' directory")

# Optional: Save to Hugging Face Hub (uncomment if you want to upload)
# model.push_to_hub("your_username/smollm2-135m-full-finetuned", token="your_token")
# tokenizer.push_to_hub("your_username/smollm2-135m-full-finetuned", token="your_token")

✓ Model saved to 'smollm2_135m_full_finetuned' directory


## Step 14: Export to GGUF Format (Optional)

Export the model to GGUF format for use with Ollama or llama.cpp.

In [14]:
# Save to GGUF format for Ollama
# model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")

print("To export to GGUF, uncomment the code above.")
print("This allows you to run the model with Ollama!")

To export to GGUF, uncomment the code above.
This allows you to run the model with Ollama!


## Summary

### What We Did:
1. ✅ Loaded SmolLM2-135M-Instruct (about 135 million parameters) in 4-bit precision
2. ✅ Attached high-rank LoRA adapters (`r=256`) to approximate full fine-tuning; 78,151,680 of 212,666,688 parameters (36.75%) were trainable
3. ✅ Prepared a small instruction-following dataset (100 examples)
4. ✅ Fine-tuned the model for 60 steps
5. ✅ Tested the model with various prompts
6. ✅ Saved the fine-tuned model

### Key Differences from LoRA:
- **Full Fine-tuning**: Updates all of the roughly 135M base parameters
- **LoRA**: Updates only the adapter parameters; in this run that was about 78M because of the high rank
- **Memory**: Full fine-tuning uses more memory for gradients and optimizer state
- **Performance**: Full fine-tuning can achieve better task-specific performance, though not always

### Next Steps:
1. Record a YouTube video walkthrough of this notebook
2. Explain each step, the input format, and outputs
3. Upload the video and this successfully run notebook
4. Move to Colab 2 for LoRA fine-tuning comparison

### Resources:
- Unsloth Documentation: https://docs.unsloth.ai/
- SmolLM2 Model: https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
- Dataset: https://huggingface.co/datasets/yahma/alpaca-cleaned