# LoRA Fine-Tuning with SmolLM2-360M

**Base Model:** HuggingFaceTB/SmolLM2-360M-Instruct  
**Dataset:** shawhin/imdb-truncated (1000 train, 1000 validation samples)

## 1. Setup and Installation

In [21]:
# Install required packages
!pip install -q transformers datasets peft accelerate bitsandbytes trl torch

In [22]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType
import numpy as np
from datetime import datetime
import json

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

## 2. Load Dataset and Inspect

In [23]:
# Load the IMDB dataset
dataset = load_dataset('shawhin/imdb-truncated')
print(dataset)
print("\nSample from training set:")
print(dataset['train'][0])

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 1000
    })
    validation: Dataset({
        features: ['label', 'text'],
        num_rows: 1000
    })
})

Sample from training set:
{'label': 1, 'text': '. . . or type on a computer keyboard, they\'d probably give this eponymous film a rating of "10." After all, no elephants are shown being killed during the movie; it is not even implied that any are hurt. To the contrary, the master of ELEPHANT WALK, John Wiley (Peter Finch), complains that he cannot shoot any of the pachyderms--no matter how menacing--without a permit from the government (and his tone suggests such permits are not within the realm of probability). Furthermore, the elements conspire--in the form of an unusual drought and a human cholera epidemic--to leave the Wiley plantation house vulnerable to total destruction by the Elephant People (as the natives dub them) to close the story. If you happen to see the current release EARTH

## 3. Load Base Model and Tokenizer

In [24]:
# Model configuration
model_name = "HuggingFaceTB/SmolLM2-360M-Instruct"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Set padding token if not present
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

print(f"Model loaded: {model_name}")
print(f"Model parameters: {base_model.num_parameters():,}")

Model loaded: HuggingFaceTB/SmolLM2-360M-Instruct
Model parameters: 361,821,120


## 4. Test Base Model (Before Fine-Tuning)

Let's evaluate the base model on the first 100 validation samples to establish a baseline.

In [26]:
def generate_response(model, prompt, max_new_tokens=20):
    """Generate response from model"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

def evaluate_model(model, dataset, num_samples=100, debug=False):
    """Evaluate model accuracy on sentiment classification"""
    correct = 0
    total = 0

    samples_to_test = min(num_samples, len(dataset['validation']))

    print(f"Evaluating on {samples_to_test} validation samples...\n")

    for i in range(samples_to_test):
        sample = dataset['validation'][i]
        review_text = sample['text']
        true_label = sample['label']
        true_sentiment = "positive" if true_label == 1 else "negative"

        prompt = """You are a sentiment classifier.
Read each review and respond only with 'positive' or 'negative'.

Example 1:
Review: This movie was boring and too long.
Answer: negative

Example 2:
Review: I loved the characters and story.
Answer: positive

Now classify this one:
Review: """ + review_text + "\nAnswer:"

        # Generate prediction
        output = generate_response(model, prompt, max_new_tokens=10)

        # Extract only the generated part (after the prompt)
        generated_text = output[len(prompt):].strip()

        # Extract predicted sentiment
        generated_lower = generated_text.lower()
        if "positive" in generated_lower and "negative" not in generated_lower:
            predicted_sentiment = "positive"
        elif "negative" in generated_lower and "positive" not in generated_lower:
            predicted_sentiment = "negative"
        else:
            predicted_sentiment = None

        if predicted_sentiment == true_sentiment:
            correct += 1

        # Show first 5 examples with debug info
        if i < 5:
            print(f"Example {i+1}:")
            print(f"  Review: {review_text[:100]}...")
            print(f"  True sentiment: {true_sentiment}")
            if debug:
                print(f"  Raw output: {generated_text}")
            print(f"  Predicted: {predicted_sentiment}")
            print(f"  Correct: {'✓' if predicted_sentiment == true_sentiment else '✗'}")
            print()

        total += 1

        if (i + 1) % 20 == 0:
            print(f"Progress: {i+1}/{samples_to_test} samples processed...")

    accuracy = correct / total if total > 0 else 0
    return accuracy, correct, total

print("=" * 80)
print("BASE MODEL EVALUATION (Before Fine-Tuning)")
print("=" * 80)
print()

base_acc, base_correct, base_total = evaluate_model(base_model, dataset, num_samples=100, debug=True)

print("\n" + "=" * 80)
print("BASE MODEL RESULTS:")
print(f"Accuracy: {base_acc:.2%} ({base_correct}/{base_total} correct)")
print("=" * 80)

BASE MODEL EVALUATION (Before Fine-Tuning)

Evaluating on 100 validation samples...

Example 1:
  Review: Disgused as an Asian Horror, "A Tale Of Two Sisters" is actually a complex character driven psycholo...
  True sentiment: positive
  Raw output: positive

What is the sentiment of the review
  Predicted: positive
  Correct: ✓

Example 2:
  Review: I am from Texas and my family vacationed a couple of years ago to Sante Fe with my brother. He sugge...
  True sentiment: positive
  Raw output: positive

Now classify this one:
Review
  Predicted: positive
  Correct: ✓

Example 3:
  Review: Robert Altman's "Quintet" is a dreary, gloomy, hard to follow thriller where you finally give up aft...
  True sentiment: negative
  Raw output: negative

Now classify this one:
Review
  Predicted: negative
  Correct: ✓

Example 4:
  Review: ** HERE BE SPOILERS ** <br /><br />Recap: Macleane (Miller) witnesses a robbery by Plunkett (Carlyle...
  True sentiment: positive
  Raw output: positive

What is
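
The raw outputs above show the model carrying on after its answer ("What is the sentiment of the review", "Now classify this one:"). The evaluation cell checks for the label words anywhere in the generated text, so a continuation that happens to contain the opposite word turns a correct answer into `None`. One possible alternative is to read only the first non-empty generated line. This is a minimal sketch; `parse_sentiment` is a hypothetical helper and is not part of the notebook.

```python
def parse_sentiment(generated_text):
    """Return 'positive', 'negative', or None based on the first non-empty line."""
    for line in generated_text.splitlines():
        line = line.strip().lower()
        if not line:
            continue  # skip leading blank lines
        has_pos = "positive" in line
        has_neg = "negative" in line
        if has_pos and not has_neg:
            return "positive"
        if has_neg and not has_pos:
            return "negative"
        return None  # first line is ambiguous or off-topic
    return None  # nothing was generated


# The second case contains both words overall, but only the first line is read
print(parse_sentiment("positive\n\nWhat is the sentiment of the review"))
print(parse_sentiment(" negative\n\nNow classify this one:\nReview: a positive one"))
```

Whether this is preferable depends on how often the model answers on the first line; it is stricter about where the answer appears and more tolerant of what follows.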

## 5. Prepare Dataset for Fine-Tuning

Wrap each IMDB review in the same few-shot prompt used during evaluation, so that training and evaluation share one format.

In [28]:
def create_prompt(example):
    """Create instruction-formatted prompt for sentiment analysis"""
    sentiment = "positive" if example['label'] == 1 else "negative"

    # Instruction format
    prompt = """You are a sentiment classifier.
Read each review and respond only with 'positive' or 'negative'.

Example 1:
Review: This movie was boring and too long.
Answer: negative

Example 2:
Review: I loved the characters and story.
Answer: positive

Now classify this one:
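Review: """ + example['text'] + "\nAnswer:"

    # Append the gold label so the model learns what should follow "Answer:"
    # (the original computed `sentiment` but never used it)
    return {"text": prompt + " " + sentiment}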

# Format datasets
formatted_train = dataset['train'].map(create_prompt, remove_columns=['label'])
formatted_val = dataset['validation'].map(create_prompt, remove_columns=['label'])

print("Sample formatted training example:")
print(formatted_train[0]['text'][:400] + "...")


Sample formatted training example:
You are a sentiment classifier. 
Read each review and respond only with 'positive' or 'negative'.

Example 1:
Review: This movie was boring and too long.
Answer: negative

Example 2:
Review: I loved the characters and story.
Answer: positive

Now classify this one:
Review: . . . or type on a computer keyboard, they'd probably give this eponymous film a rating of "10." After all, no elephants are s...


In [29]:
def tokenize_function(examples):
    """Tokenize the text data"""
    tokenized = tokenizer(
        examples['text'],
        truncation=True,
        max_length=256,
        padding="max_length",
        return_tensors=None
    )
    tokenized["labels"] = tokenized["input_ids"].copy()
    return tokenized

# Tokenize datasets
tokenized_train = formatted_train.map(
    tokenize_function,
    batched=True,
    remove_columns=['text']
)

tokenized_val = formatted_val.map(
    tokenize_function,
    batched=True,
    remove_columns=['text']
)

print(f"\nTokenized training samples: {len(tokenized_train)}")
print(f"Tokenized validation samples: {len(tokenized_val)}")



Tokenized training samples: 1000
Tokenized validation samples: 1000
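
With `max_length=256` and truncation from the right (assumed here to be the tokenizer's truncation side), any example longer than 256 tokens loses its tail, and the tail is where `Answer:` sits. Many IMDB reviews are long enough for this to happen. One way to keep the ending intact is to trim only the review tokens. The sketch below works on plain lists of token IDs; `fit_to_budget` and the three-part split are assumptions for illustration, not part of the notebook.

```python
def fit_to_budget(prefix_ids, review_ids, suffix_ids, max_length=256):
    """Concatenate three token-ID lists, trimming only the review to fit max_length."""
    room = max_length - len(prefix_ids) - len(suffix_ids)
    if room < 0:
        raise ValueError("prefix and suffix alone exceed max_length")
    return prefix_ids + review_ids[:room] + suffix_ids


# Toy IDs standing in for real tokenizer output
prefix = list(range(60))           # few-shot instructions
review = list(range(1000, 1400))   # a 400-token review
suffix = [7, 8]                    # stands in for "\nAnswer:" (plus the label when training)

ids = fit_to_budget(prefix, review, suffix)
print(len(ids), ids[-2:])
```

With a real tokenizer, the three lists could be produced by calls such as `tokenizer(text, add_special_tokens=False)["input_ids"]` on each part separately.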


## 6. Configure LoRA and PEFT

Set up Low-Rank Adaptation (LoRA) configuration for efficient fine-tuning.

In [30]:
# LoRA configuration
lora_config = LoraConfig(
    r=16,                          # Rank of the low-rank matrices
    lora_alpha=32,                 # Scaling factor
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # Modules to apply LoRA
    lora_dropout=0.05,             # Dropout probability
    bias="none",                   # Bias training strategy
    task_type=TaskType.CAUSAL_LM   # Task type
)

print("LoRA Configuration:")
print(lora_config)

LoRA Configuration:
LoraConfig(task_type=<TaskType.CAUSAL_LM: 'CAUSAL_LM'>, peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, inference_mode=False, r=16, target_modules={'v_proj', 'q_proj', 'o_proj', 'k_proj'}, exclude_modules=None, lora_alpha=32, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', trainable_token_indices=None, loftq_config={}, eva_config=None, corda_config=None, use_dora=False, use_qalora=False, qalora_group_size=16, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False), lora_bias=False, target_parameters=None)


In [31]:
# Create a fresh model for training
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Enable gradient checkpointing for memory efficiency
model.gradient_checkpointing_enable()

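# Freeze the base weights and prepare the model for training.
# Note: this helper is designed for quantized (k-bit) models; no quantization is used here.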
model = prepare_model_for_kbit_training(model)

# Apply LoRA
model = get_peft_model(model, lora_config)

# Print trainable parameters
model.print_trainable_parameters()

trainable params: 3,276,800 || all params: 365,097,920 || trainable%: 0.8975
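
The trainable-parameter count can be checked by hand. For each targeted linear layer, LoRA adds a matrix A of shape (r, d_in) and a matrix B of shape (d_out, r), so r × (d_in + d_out) new parameters. The dimensions below are assumptions taken from the SmolLM2-360M architecture (hidden size 960, 32 layers, 5 key/value heads of dimension 64); they are not stated in the notebook.

```python
r = 16              # LoRA rank from lora_config
hidden = 960        # assumed hidden size
num_layers = 32     # assumed number of transformer layers
kv_dim = 5 * 64     # assumed key/value projection width (5 KV heads x head dim 64)

# (d_in, d_out) for each targeted projection
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
}


def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A is (rank, d_in), B is (d_out, rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
trainable = per_layer * num_layers
base_params = 361_821_120            # count printed when the base model was loaded
total = base_params + trainable
percent = 100 * trainable / total

print(f"trainable: {trainable:,} | total: {total:,} | trainable%: {percent:.4f}")
```

Under these assumptions the arithmetic reproduces the figures printed by `print_trainable_parameters()`.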


## 7. Configure Training Arguments and Trainer

In [34]:
# Output directory for checkpoints
output_dir = "./lora_finetuned_smollm2"

# Training arguments
training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    warmup_steps=100,
    logging_steps=50,
    save_strategy="epoch",
    eval_strategy="epoch",
    fp16=True,
    push_to_hub=False,
    report_to="none",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    data_collator=data_collator,
)

print("Trainer initialized successfully!")

The model is already on multiple devices. Skipping the move to device specified in `args`.


Trainer initialized successfully!
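
A note on labels: with `mlm=False`, `DataCollatorForLanguageModeling` builds the `labels` tensor itself by copying `input_ids` and setting every position equal to the pad token ID to -100, which the loss ignores. The `labels` column created in `tokenize_function` is therefore replaced at batch time. A pure-Python sketch of that masking step follows; `mask_padding` is a hypothetical name and the pad ID of 0 is made up for the example.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_padding(input_ids, pad_token_id):
    """Copy input_ids into labels, replacing pad positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if token == pad_token_id else token for token in input_ids]


labels = mask_padding([5, 6, 7, 0, 0], pad_token_id=0)
print(labels)
```

One consequence: if the pad token is the same as the EOS token (the fallback used when the tokenizer has no pad token), genuine EOS positions are masked as well.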


## 8. Train the Model

Training took about six minutes in the run recorded below; expect this to vary with your hardware.
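
The training arguments imply a fairly small number of optimizer steps, which puts `warmup_steps=100` in context. The estimate below assumes a single GPU and counts the final partial accumulation group as a step; the exact count can differ by one per epoch depending on the `transformers` version.

```python
import math

num_samples = 1000        # training set size
per_device_batch = 4      # per_device_train_batch_size
grad_accum = 4            # gradient_accumulation_steps
epochs = 3                # num_train_epochs
warmup_steps = 100

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: about {steps_per_epoch} per epoch, {total_steps} in total")
print(f"warmup covers about {warmup_fraction:.0%} of training")
```

Under these assumptions roughly half of all updates happen during warmup, so a smaller `warmup_steps` may be worth trying.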

In [35]:
print("Starting training...")
print(f"Start time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

# Train the model
trainer.train()

print("\nTraining completed!")
print(f"End time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Starting training...
Start time: 2025-11-11 13:05:59


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 2.8447        | 2.335153        |
| 2     | 2.3141        | 2.227597        |
| 3     | 2.2193        | 2.222339        |



Training completed!
End time: 2025-11-11 13:12:01
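
The next section loads the adapter from `lora_adapter_path`, and the summary refers to `./lora_adapter_smollm2_sentiment`, but the cell that saves the adapter is not shown. A minimal sketch of what such a step could look like is given below. `save_lora_adapter` is a hypothetical helper; it relies only on the `save_pretrained` method that PEFT models and tokenizers provide.

```python
import os


def save_lora_adapter(peft_model, tokenizer, adapter_dir):
    """Write the LoRA adapter weights and the tokenizer files to adapter_dir."""
    os.makedirs(adapter_dir, exist_ok=True)
    peft_model.save_pretrained(adapter_dir)   # adapter weights and config only
    tokenizer.save_pretrained(adapter_dir)    # lets the tokenizer be reloaded from the same path
    return adapter_dir


# In the notebook this could be called as (assumed usage, not shown in the original):
# lora_adapter_path = save_lora_adapter(
#     trainer.model, tokenizer, "./lora_adapter_smollm2_sentiment"
# )
```

Saving the tokenizer alongside the adapter is what allows the tokenizer to be loaded from the adapter directory later.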


## 9. Test Fine-Tuned Model and Compare

Load the fine-tuned model and compare its performance with the base model on the same validation samples.

In [37]:
from peft import PeftModel

# Load base model again (fresh)
base_model_test = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

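# Directory holding the saved LoRA adapter (the same path appears in the summary section)
lora_adapter_path = "./lora_adapter_smollm2_sentiment"

# Load fine-tuned model (base + LoRA adapter)
finetuned_model = PeftModel.from_pretrained(
    base_model_test,
    lora_adapter_path
)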

print("Fine-tuned model loaded successfully!")

Fine-tuned model loaded successfully!


In [38]:
print("=" * 80)
print("FINE-TUNED MODEL EVALUATION")
print("=" * 80)
print()

ft_acc, ft_correct, ft_total = evaluate_model(finetuned_model, dataset, num_samples=100)

print("\n" + "=" * 80)
print("FINE-TUNED MODEL RESULTS:")
print(f"Accuracy: {ft_acc:.2%} ({ft_correct}/{ft_total} correct)")
print("=" * 80)

FINE-TUNED MODEL EVALUATION

Evaluating on 100 validation samples...

Example 1:
  Review: Disgused as an Asian Horror, "A Tale Of Two Sisters" is actually a complex character driven psycholo...
  True sentiment: positive
  Predicted: positive
  Correct: ✓

Example 2:
  Review: I am from Texas and my family vacationed a couple of years ago to Sante Fe with my brother. He sugge...
  True sentiment: positive
  Predicted: positive
  Correct: ✓

Example 3:
  Review: Robert Altman's "Quintet" is a dreary, gloomy, hard to follow thriller where you finally give up aft...
  True sentiment: negative
  Predicted: negative
  Correct: ✓

Example 4:
  Review: ** HERE BE SPOILERS ** <br /><br />Recap: Macleane (Miller) witnesses a robbery by Plunkett (Carlyle...
  True sentiment: positive
  Predicted: positive
  Correct: ✓

Example 5:
  Review: I first saw this movie in the theater. I was 10. I just watched it a second time and I must say it w...
  True sentiment: positive
  Predicted: positive
  Co

## 10. Compare Results

In [40]:
print("\n" + "=" * 80)
print("MODEL COMPARISON: Base vs Fine-Tuned")
print("=" * 80)
print(f"\nBase Model Accuracy:       {base_acc:.2%} ({base_correct}/{base_total})")
print(f"Fine-Tuned Model Accuracy: {ft_acc:.2%} ({ft_correct}/{ft_total})")
print(f"\nAbsolute Improvement:      {(ft_acc - base_acc):.2%}")
print(f"Relative Improvement:      {((ft_acc - base_acc) / base_acc * 100):.1f}%")
print("=" * 80)



MODEL COMPARISON: Base vs Fine-Tuned

Base Model Accuracy:       66.00% (66/100)
Fine-Tuned Model Accuracy: 82.00% (82/100)

Absolute Improvement:      16.00%
Relative Improvement:      24.2%
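
With only 100 evaluation samples, each accuracy figure carries noticeable sampling noise. A rough way to gauge it is the binomial standard error, sqrt(p(1 - p) / n). The sketch below treats the samples as independent and ignores the randomness introduced by sampling during generation; `accuracy_standard_error` is a hypothetical helper.

```python
import math


def accuracy_standard_error(correct, total):
    """Binomial standard error of an accuracy estimate."""
    p = correct / total
    return math.sqrt(p * (1 - p) / total)


for name, correct in [("base", 66), ("fine-tuned", 82)]:
    se = accuracy_standard_error(correct, 100)
    # 1.96 standard errors gives an approximate 95% interval
    print(f"{name}: {correct / 100:.2%} +/- {1.96 * se:.1%}")
```

Evaluating on all 1000 validation reviews instead of 100 would shrink these intervals by roughly a factor of three.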


## 12. Summary and Next Steps

### What we accomplished:
1. ✅ Evaluated the base SmolLM2-360M-Instruct model on 100 validation samples
2. ✅ Fine-tuned using LoRA/PEFT on 1000 IMDB training reviews
3. ✅ Evaluated the fine-tuned model on the same 100 validation samples
4. ✅ Compared base vs fine-tuned model performance quantitatively
5. ✅ Saved the LoRA adapter for future use

### Key Takeaways:
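- LoRA trained only 3,276,800 parameters, about 0.9% of the model's roughly 365M total
- Accuracy on the 100 evaluated validation samples rose from 66% to 82%
- Training and evaluation used separate splits, so the model was never trained on the reviews it was scored on
- Validation loss was still used to select the best checkpoint (`load_best_model_at_end=True`), so a separate held-out test set would give a cleaner estimate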

### To use the fine-tuned model later:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M-Instruct")
model = PeftModel.from_pretrained(base_model, "./lora_adapter_smollm2_sentiment")
tokenizer = AutoTokenizer.from_pretrained("./lora_adapter_smollm2_sentiment")
```