# LoRA Fine-Tuning with SmolLM2-360M

**Base Model:** HuggingFaceTB/SmolLM2-360M-Instruct  
**Dataset:** shawhin/imdb-truncated (1000 train, 1000 validation samples)

In [1]:
from google.colab import files
uploaded = files.upload()


Saving lora_finetuning.py to lora_finetuning (1).py


## 1. Setup and Installation

In [2]:
# Install required packages
!pip install -q transformers datasets peft accelerate bitsandbytes trl torch

In [3]:
# Import from our lora_finetuning module
from lora_finetuning import (
    setup_lora_model,
    prepare_dataset,
    train_lora_model,
    evaluate_model,
    generate_response,
    load_finetuned_model
)

import torch
import numpy as np
from datasets import load_dataset

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

## 2. Load Dataset and Inspect

In [4]:
# Load the IMDB dataset
dataset = load_dataset('shawhin/imdb-truncated')
print(dataset)
print("\nSample from training set:")
print(dataset['train'][0])

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 1000
    })
    validation: Dataset({
        features: ['label', 'text'],
        num_rows: 1000
    })
})

Sample from training set:
{'label': 1, 'text': '. . . or type on a computer keyboard, they\'d probably give this eponymous film a rating of "10." After all, no elephants are shown being killed during the movie; it is not even implied that any are hurt. To the contrary, the master of ELEPHANT WALK, John Wiley (Peter Finch), complains that he cannot shoot any of the pachyderms--no matter how menacing--without a permit from the government (and his tone suggests such permits are not within the realm of probability). Furthermore, the elements conspire--in the form of an unusual drought and a human cholera epidemic--to leave the Wiley plantation house vulnerable to total destruction by the Elephant People (as the natives dub them) to close the story. If you happen to see the current release EARTH

## 3. Load Base Model and Tokenizer

In [5]:
# Model configuration
model_name = "HuggingFaceTB/SmolLM2-360M-Instruct"

# Import necessary classes for base model loading
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Set padding token if not present
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

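# Load base model (without LoRA for initial testing)
# Note: recent transformers releases deprecate `torch_dtype` in favor of `dtype`;
# the older keyword still works here but emits a deprecation warning.
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)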

print(f"Model loaded: {model_name}")
print(f"Model parameters: {base_model.num_parameters():,}")

`torch_dtype` is deprecated! Use `dtype` instead!


Model loaded: HuggingFaceTB/SmolLM2-360M-Instruct
Model parameters: 361,821,120


## 4. Test Base Model (Before Fine-Tuning)

Evaluate the base model on 100 validation samples to establish a baseline before fine-tuning.

In [6]:
# Using imported functions from lora_finetuning module
# - generate_response()
# - evaluate_model()

print("=" * 80)
print("BASE MODEL EVALUATION (Before Fine-Tuning)")
print("=" * 80)
print()

base_acc, base_correct, base_total = evaluate_model(base_model, tokenizer, dataset, num_samples=100, debug=True)

print("\n" + "=" * 80)
print("BASE MODEL RESULTS:")
print(f"Accuracy: {base_acc:.2%} ({base_correct}/{base_total} correct)")
print("=" * 80)

BASE MODEL EVALUATION (Before Fine-Tuning)

Evaluating on 100 validation samples...

Example 1:
  Review: Disgused as an Asian Horror, "A Tale Of Two Sisters" is actually a complex character driven psycholo...
  True sentiment: positive
  Raw output: positive

What is the sentiment of the review
  Predicted: positive
  Correct: ✓

Example 2:
  Review: I am from Texas and my family vacationed a couple of years ago to Sante Fe with my brother. He sugge...
  True sentiment: positive
  Raw output: positive

Now classify this one:
Review
  Predicted: positive
  Correct: ✓

Example 3:
  Review: Robert Altman's "Quintet" is a dreary, gloomy, hard to follow thriller where you finally give up aft...
  True sentiment: negative
  Raw output: negative

Now classify this one:
Review
  Predicted: negative
  Correct: ✓

Example 4:
  Review: ** HERE BE SPOILERS ** <br /><br />Recap: Macleane (Miller) witnesses a robbery by Plunkett (Carlyle...
  True sentiment: positive
  Raw output: positive

What is
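
The raw outputs above show the base model answering with a label and then continuing to generate ("What is the sentiment of the review", "Now classify this one:"), so a scorer has to read only the start of the completion. The internals of `evaluate_model()` are not shown in this notebook. The sketch below is one hypothetical way to map a raw completion to a label; `parse_sentiment` is an illustrative name, not a function from `lora_finetuning`.

```python
from typing import Optional

LABELS = ("positive", "negative")

def parse_sentiment(raw_output: str) -> Optional[str]:
    """Return the label found earliest in the first non-empty line, else None."""
    for line in raw_output.splitlines():
        line = line.strip().lower()
        if not line:
            continue
        # Record where each label occurs and keep the earliest one
        hits = [(line.find(label), label) for label in LABELS if label in line]
        return min(hits)[1] if hits else None
    return None

# Raw completions copied from the evaluation log above
print(parse_sentiment("positive\n\nWhat is the sentiment of the review"))
print(parse_sentiment("negative\n\nNow classify this one:\nReview"))
```

Returning `None` for an unparseable completion keeps such cases visible as errors instead of silently defaulting to one class.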

## 5. Prepare Dataset for Fine-Tuning

Format the IMDB dataset for sentiment analysis training.

In [7]:
# Using the imported prepare_dataset() function
# This function handles:
# - Creating prompts with create_prompt()
# - Tokenizing with tokenize_function()
# - Returning train, validation, and raw datasets

print("Preparing dataset...")
tokenized_train, tokenized_val, dataset = prepare_dataset(
    dataset_name='shawhin/imdb-truncated',
    tokenizer=tokenizer,
    max_length=256
)

print(f"\nTokenized training samples: {len(tokenized_train)}")
print(f"Tokenized validation samples: {len(tokenized_val)}")

print("\nSample from formatted dataset:")
print("(The raw text has been converted to instruction-formatted prompts)")

Preparing dataset...



Tokenized training samples: 1000
Tokenized validation samples: 1000

Sample from formatted dataset:
(The raw text has been converted to instruction-formatted prompts)


In [8]:
# Dataset preparation is now complete!
# The prepare_dataset() function has already:
# 1. Loaded the dataset
# 2. Created instruction-formatted prompts
# 3. Tokenized the text with max_length=256
# 4. Added labels for training

print("✓ Dataset ready for training")
print(f"  Training samples: {len(tokenized_train)}")
print(f"  Validation samples: {len(tokenized_val)}")

✓ Dataset ready for training
  Training samples: 1000
  Validation samples: 1000


## 6. Configure LoRA and PEFT

Set up Low-Rank Adaptation (LoRA) configuration for efficient fine-tuning.
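
As background, LoRA keeps a pretrained weight matrix `W0` frozen and learns a low-rank correction, so a layer computes `h = W0 x + (alpha / r) * B (A x)`, where `A` has shape `r x d_in` and `B` has shape `d_out x r`. With the values used in this notebook (`r=16`, `alpha=32`) the scale factor is 2. The toy sketch below uses plain Python lists and made-up dimensions to illustrate the idea; it is not PEFT's implementation. It also shows why a zero-initialised `B` (PEFT's default initialisation) leaves the model's outputs unchanged before training.

```python
import random

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W0, A, B, x, r, alpha):
    """Frozen base projection plus the scaled low-rank update."""
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))  # B @ (A @ x)
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4  # toy sizes chosen for illustration only
W0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the update starts at zero
x = [random.gauss(0, 1) for _ in range(d_in)]

before_training = lora_forward(W0, A, B, x, r, alpha)
print(before_training == matvec(W0, x))  # identical to the frozen layer

# Trainable values live only in A and B, not in W0
full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(f"full matrix: {full_params} values, LoRA factors: {lora_params} values")
```

The saving grows with layer size: for the toy shapes the factors hold 20 values against 24, while for real projection matrices with hundreds of rows and columns the ratio is far smaller.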

In [9]:
# Using setup_lora_model() to configure and apply LoRA
# This function handles:
# - Loading the base model
# - Creating LoRA configuration
# - Applying PEFT/LoRA to the model
# - Enabling gradient checkpointing

print("Setting up model with LoRA configuration...")

model, tokenizer, lora_config = setup_lora_model(
    model_name=model_name,
    lora_r=16,                 # Rank of the low-rank matrices
    lora_alpha=32,             # Scaling factor
    lora_dropout=0.05,         # Dropout probability
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"]  # Modules to apply LoRA
)

print("\n✓ Model ready for fine-tuning with LoRA")

Setting up model with LoRA configuration...
LoRA Model Configuration:
LoraConfig(task_type=<TaskType.CAUSAL_LM: 'CAUSAL_LM'>, peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path='HuggingFaceTB/SmolLM2-360M-Instruct', revision=None, inference_mode=False, r=16, target_modules={'q_proj', 'v_proj', 'o_proj', 'k_proj'}, exclude_modules=None, lora_alpha=32, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', trainable_token_indices=None, loftq_config={}, eva_config=None, corda_config=None, use_dora=False, use_qalora=False, qalora_group_size=16, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False), lora_bias=False, target_parameters=None)

Trainable Parameters:
trainable params: 3,276,800 || all params: 365,097,920 || trainable%: 0.8975
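
The trainable-parameter count can be checked by hand: each adapted linear layer adds `r * (in_features + out_features)` weights. The sketch below assumes the architecture values I recall from the SmolLM2-360M config (hidden size 960, 32 layers, 15 attention heads, 5 key/value heads); these are assumptions that can be verified against `base_model.config`. Under those assumptions the arithmetic reproduces the 3,276,800 reported above.

```python
# Assumed architecture values (not printed anywhere in this notebook)
hidden_size = 960
num_layers = 32
num_attention_heads = 15
num_kv_heads = 5

head_dim = hidden_size // num_attention_heads  # 64
kv_dim = num_kv_heads * head_dim               # 320 (grouped-query attention)
r = 16                                         # LoRA rank from the call above

# (in_features, out_features) of each targeted projection
shapes = {
    "q_proj": (hidden_size, hidden_size),
    "k_proj": (hidden_size, kv_dim),
    "v_proj": (hidden_size, kv_dim),
    "o_proj": (hidden_size, hidden_size),
}

per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
trainable = per_layer * num_layers
base_params = 361_821_120  # reported when the base model was loaded

print(f"LoRA weights per layer: {per_layer:,}")
print(f"LoRA weights in total:  {trainable:,}")
print(f"Share of all weights:   {trainable / (base_params + trainable):.4%}")
```

The total also matches the difference between the two parameter counts in this notebook: 365,097,920 with adapters minus 361,821,120 for the base model.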


In [10]:
# The setup_lora_model() function has completed all setup steps:
# - Loaded base model with float16 precision
# - Configured LoRA (r=16, alpha=32, dropout=0.05)
# - Applied LoRA to attention projection layers
# - Enabled gradient checkpointing for memory efficiency
# - Prepared model for k-bit training

trainable_params, all_params = model.get_nb_trainable_parameters()

print("Model configuration complete!")
print(f"Total parameters: {all_params:,}")
print(f"Trainable with LoRA: {trainable_params:,} ({trainable_params / all_params:.2%})")

Model configuration complete!
Total parameters: 365,097,920
Trainable with LoRA: 3,276,800 (0.90%)


## 7. Configure Training Arguments and Trainer

In [11]:
# Using train_lora_model() to train the model
# This function handles:
# - Setting up training arguments
# - Creating data collator
# - Initializing Trainer
# - Running the training loop

output_dir = "./lora_finetuned_smollm2"

trainer = train_lora_model(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    output_dir=output_dir,
    num_epochs=3,
    batch_size=4,
    learning_rate=2e-4,
    gradient_accumulation_steps=4,
    warmup_steps=100,
    logging_steps=50
)

print("\n✓ Training complete")

The model is already on multiple devices. Skipping the move to device specified in `args`.


Starting training...
Start time: 2025-11-11 15:06:56


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1 | 2.8519 | 2.345087 |
| 2 | 2.3262 | 2.240114 |
| 3 | 2.2323 | 2.234172 |



Training completed!
End time: 2025-11-11 15:12:55

✓ Training complete
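
To put the hyperparameters in context, the sketch below estimates the effective batch size and the number of optimizer updates. It assumes a single device and that `train_lora_model()` passes these values through to the Hugging Face `Trainer` unchanged; neither is confirmed by the notebook. How the final partial accumulation window is counted can differ between library versions, so both bounds are shown.

```python
import math

# Values copied from the train_lora_model() call above
num_train_samples = 1000
batch_size = 4
gradient_accumulation_steps = 4
num_epochs = 3
warmup_steps = 100

effective_batch_size = batch_size * gradient_accumulation_steps
micro_batches_per_epoch = math.ceil(num_train_samples / batch_size)

# Lower bound drops the last partial accumulation window, upper bound counts it
updates_low = (micro_batches_per_epoch // gradient_accumulation_steps) * num_epochs
updates_high = math.ceil(micro_batches_per_epoch / gradient_accumulation_steps) * num_epochs

print(f"Effective batch size: {effective_batch_size}")
print(f"Optimizer updates:    {updates_low} to {updates_high}")
print(f"Warmup share:         {warmup_steps / updates_high:.0%} to {warmup_steps / updates_low:.0%}")
```

If these assumptions hold, the 100 warmup steps cover more than half of the run, so the learning rate reaches its 2e-4 peak only partway through the second epoch. A shorter warmup may be worth trying on a dataset this small.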


## 8. Train the Model

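Training already ran inside `train_lora_model()` in the previous cell and took about six minutes in this run (15:06:56 to 15:12:55); the time will vary with your hardware. The cell below only prints a summary.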

In [12]:
# Training has been completed by train_lora_model()!
# The function already called trainer.train() and displayed progress

print("=" * 80)
print("Training Summary:")
print("=" * 80)
print("✓ Model trained for 3 epochs")
print(f"✓ Best model saved to: {output_dir}")
print("✓ Training logs available in trainer.state")
print("=" * 80)

Training Summary:
✓ Model trained for 3 epochs
✓ Best model saved to: ./lora_finetuned_smollm2
✓ Training logs available in trainer.state


## 9. Test Fine-Tuned Model and Compare

Save the LoRA adapter, reload it on top of the base model, and evaluate it on the same validation samples used for the baseline.

In [16]:
lora_adapter_path = "./lora_adapter_smollm2_sentiment"
model.save_pretrained(lora_adapter_path)
tokenizer.save_pretrained(lora_adapter_path)

print("Loading fine-tuned model...")

finetuned_model = load_finetuned_model(
    base_model_name=model_name,
    adapter_path=lora_adapter_path
)

print("Fine-tuned model loaded successfully!")

Loading fine-tuned model...
Fine-tuned model loaded successfully!
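
The body of `load_finetuned_model()` is not shown in this notebook. A common pattern for reloading a saved adapter, sketched here as a hypothetical stand-in, is to load the base checkpoint again and wrap it with `PeftModel.from_pretrained`. The function name `load_adapter_sketch` is illustrative only.

```python
def load_adapter_sketch(base_model_name: str, adapter_path: str):
    """Hypothetical stand-in for load_finetuned_model(); not the module's actual code."""
    # Imports sit inside the function so the definition itself has no dependencies
    import torch
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Reload the frozen base weights with the same precision used earlier
    base = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        torch_dtype=torch.float16,  # newer releases prefer the `dtype` keyword
        device_map="auto",
    )
    # Attach the saved LoRA weights on top of the base model
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()  # disable dropout (including LoRA dropout) for inference
    return model
```

For deployment without a `peft` dependency at inference time, the adapter can be folded into the base weights with `model.merge_and_unload()`.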


In [17]:
# Using evaluate_model() to test the fine-tuned model
print("=" * 80)
print("FINE-TUNED MODEL EVALUATION")
print("=" * 80)
print()

ft_acc, ft_correct, ft_total = evaluate_model(finetuned_model, tokenizer, dataset, num_samples=100)

print("\n" + "=" * 80)
print("FINE-TUNED MODEL RESULTS:")
print(f"Accuracy: {ft_acc:.2%} ({ft_correct}/{ft_total} correct)")
print("=" * 80)

FINE-TUNED MODEL EVALUATION

Evaluating on 100 validation samples...

Example 1:
  Review: Disgused as an Asian Horror, "A Tale Of Two Sisters" is actually a complex character driven psycholo...
  True sentiment: positive
  Predicted: positive
  Correct: ✓

Example 2:
  Review: I am from Texas and my family vacationed a couple of years ago to Sante Fe with my brother. He sugge...
  True sentiment: positive
  Predicted: positive
  Correct: ✓

Example 3:
  Review: Robert Altman's "Quintet" is a dreary, gloomy, hard to follow thriller where you finally give up aft...
  True sentiment: negative
  Predicted: negative
  Correct: ✓

Example 4:
  Review: ** HERE BE SPOILERS ** <br /><br />Recap: Macleane (Miller) witnesses a robbery by Plunkett (Carlyle...
  True sentiment: positive
  Predicted: positive
  Correct: ✓

Example 5:
  Review: I first saw this movie in the theater. I was 10. I just watched it a second time and I must say it w...
  True sentiment: positive
  Predicted: positive
  Co

## 10. Compare Results

In [18]:
print("\n" + "=" * 80)
print("MODEL COMPARISON: Base vs Fine-Tuned")
print("=" * 80)
print(f"\nBase Model Accuracy:       {base_acc:.2%} ({base_correct}/{base_total})")
print(f"Fine-Tuned Model Accuracy: {ft_acc:.2%} ({ft_correct}/{ft_total})")
print(f"\nAbsolute Improvement:      {(ft_acc - base_acc) * 100:.2f} percentage points")
print(f"Relative Improvement:      {((ft_acc - base_acc) / base_acc * 100):.1f}%")
print("=" * 80)



MODEL COMPARISON: Base vs Fine-Tuned

Base Model Accuracy:       64.00% (64/100)
Fine-Tuned Model Accuracy: 81.00% (81/100)

Absolute Improvement:      17.00 percentage points
Relative Improvement:      26.6%
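
With only 100 evaluation samples, each accuracy figure carries noticeable sampling uncertainty. The sketch below computes a 95% Wilson score interval for both results, using only the counts reported above. The choice of the Wilson interval is mine, not something the notebook prescribes.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width

results = {"base": (64, 100), "fine-tuned": (81, 100)}
for name, (correct, total) in results.items():
    low, high = wilson_interval(correct, total)
    print(f"{name:>10}: {correct / total:.0%} (95% interval {low:.1%} to {high:.1%})")
```

The intervals come out at roughly 54% to 73% for the base model and 72% to 87% for the fine-tuned model, so the improvement looks real but its size is uncertain. Because both models were scored on the same samples, a paired comparison such as McNemar's test would be more precise, but it needs the per-example predictions, which this notebook does not keep.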
