**A comprehensive Python notebook for training a LoRA adapter on a Vision Transformer (ViT) model using PEFT.**

Here's what the notebook covers.

Key Features:

1. <b>LoRA Configuration:</b> Applies LoRA adapters to the attention query and value projections (rank 4 for the first adapter, rank 16 for the second), so only a small fraction of the parameters is trained

2. <b>Dataset:</b> Uses Food101 as an example (you can easily swap this with your own dataset)

3. <b>Data Augmentation:</b> Includes random cropping and horizontal flipping for training

4. <b>Training:</b> Configured with 3 epochs, evaluation during training, and automatic best model selection

5. <b>Metrics:</b> Computes accuracy and F1 score

6. <b>Inference Example:</b> Shows how to load and use the trained LoRA adapter

Key Parameters to Adjust:

- *r*: LoRA rank (higher = more trainable parameters but potentially better performance)
- *lora_alpha*: Scaling factor; the low-rank update is scaled by lora_alpha / r (this notebook uses 16 with r=4 for the first adapter and with r=16 for the second)
- *target_modules*: Which layers to apply LoRA to
- *learning_rate*: This notebook uses 5e-3 for the first adapter and 5e-4 for the second; adjust as needed
- *Dataset*: food101 is currently in use
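
To make the roles of *r* and *lora_alpha* concrete, here is a minimal NumPy sketch of a LoRA-adapted linear layer. It assumes the standard LoRA formulation (frozen weight plus a low-rank update scaled by `lora_alpha / r`, with the up-projection starting at zero); the weights are random stand-ins, not the ViT weights, and `lora_forward` is a hypothetical helper name.

```python
import numpy as np

def lora_forward(x, W, A, B, r, lora_alpha):
    """Frozen linear layer plus a scaled low-rank update (biases omitted)."""
    scaling = lora_alpha / r              # standard LoRA scaling factor
    return x @ W.T + scaling * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d, r, lora_alpha = 768, 4, 16            # ViT-Base hidden size, rank, alpha
W = rng.normal(size=(d, d))              # frozen weight (random stand-in)
A = rng.normal(size=(r, d)) * 0.01       # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, starts at zero
x = rng.normal(size=(2, d))

# With B at zero the adapter is a no-op, so training starts from the base model.
unchanged = np.allclose(lora_forward(x, W, A, B, r, lora_alpha), x @ W.T)
print(unchanged, lora_alpha / r)         # True 4.0
```

With r=4 and lora_alpha=16 the update is scaled by 4; with r=16 and lora_alpha=16 the scale is 1.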

The notebook prints the number of trainable parameters, which shows the efficiency of LoRA (typically only 0.1-1% of total parameters need to be trained; here 0.26% for the first adapter and 0.77% for the second). The saved adapter contains only the LoRA weights and the classification head, which are very small compared to the full model.
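
The trainable-parameter counts can be checked by hand. The sketch below assumes the ViT-Base layout (12 encoder layers, hidden size 768), LoRA on the query and value projections, and a fully trained 101-class head via `modules_to_save`; `lora_trainable_params` is a hypothetical helper, not part of PEFT.

```python
def lora_trainable_params(r, hidden=768, layers=12, targets_per_layer=2,
                          num_labels=101):
    """Count parameters trained by LoRA plus a fully trained classifier head."""
    # Each adapted Linear(hidden, hidden) gains A (r x hidden) and B (hidden x r).
    adapter = layers * targets_per_layer * r * (hidden + hidden)
    # The classification head is trained in full: weight matrix plus bias.
    head = hidden * num_labels + num_labels
    return adapter + head

print(lora_trainable_params(r=4))    # 225125
print(lora_trainable_params(r=16))   # 667493
```

These values agree with the counts that `print_trainable_parameters()` reports in this notebook for the rank-4 and rank-16 adapters.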

In [1]:
# LoRA Fine-tuning Vision Transformer (ViT) with PEFT
# This notebook demonstrates parameter-efficient fine-tuning of ViT for image classification

# Install required packages
# !pip install transformers datasets peft pillow torch torchvision accelerate

import torch
from transformers import ViTImageProcessor, ViTForImageClassification, TrainingArguments, Trainer
from peft import LoraConfig, PeftModel, get_peft_model  # PeftModel is needed for the adapter-loading option in section 12
from datasets import load_dataset
from torchvision import transforms 

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")



Using device: cpu


In [None]:
# ============================================================================
# 1. Load and Prepare Dataset
# ============================================================================
# Using a sample dataset (food101 subset for demo - you can replace with your own)
print("Loading dataset...")
dataset = load_dataset("food101", split="train[:5%]")  # Using 5% for demo (matches the 3,788 rows shown below)
dataset = dataset.train_test_split(test_size=0.2, seed=42)

# Get label information
labels = dataset["train"].features["label"].names
num_labels = len(labels)
print(f"Number of classes: {num_labels}")
# print(labels[:10])


Loading dataset...
Number of classes: 101


In [3]:
# ============================================================================
# 2. Initialize Model and Processor
# ============================================================================
print("\nInitializing model and processor...")
model_name = "google/vit-base-patch16-224-in21k"
processor = ViTImageProcessor.from_pretrained(model_name)

# Load pre-trained ViT model
model = ViTForImageClassification.from_pretrained(
    model_name,
    num_labels=num_labels,
    id2label={str(i): label for i, label in enumerate(labels)},
    label2id={label: str(i) for i, label in enumerate(labels)},
    ignore_mismatched_sizes=True
)


Initializing model and processor...


Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
# ============================================================================
# 3. Configure LoRA
# ============================================================================
print("\nConfiguring LoRA...")
lora_config = LoraConfig(
    r=4,  # Rank of the low-rank matrices
    lora_alpha=16,  # Scaling factor
    target_modules=["query", "value"],  # Apply LoRA to attention layers
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],  # Train the classification head fully
)

# Apply LoRA to the model
# model.unload() # to be used when re-running
model = get_peft_model(model=model, peft_config=lora_config)
model.print_trainable_parameters()



Configuring LoRA...
trainable params: 225,125 || all params: 86,101,450 || trainable%: 0.2615


In [5]:
dataset

DatasetDict({
    train: Dataset({
        features: ['image', 'label'],
        num_rows: 3030
    })
    test: Dataset({
        features: ['image', 'label'],
        num_rows: 758
    })
})

In [6]:
# ============================================================================
# 4. Data Preprocessing
# ============================================================================
print("\nPreparing data transforms...")

# Define transforms for training
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(processor.size["height"]),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=processor.image_mean, std=processor.image_std),
])

# Define transforms for validation
val_transforms = transforms.Compose([
    lambda img: img.resize((processor.size["height"], processor.size["width"])),
    transforms.ToTensor(),
    transforms.Normalize(mean=processor.image_mean, std=processor.image_std),
])

def preprocess_train(examples):
    """Preprocess training images"""
    examples["pixel_values"] = [
        train_transforms(img.convert("RGB")) for img in examples["image"]
    ]
    return examples

def preprocess_val(examples):
    """Preprocess validation images"""
    examples["pixel_values"] = [
        val_transforms(img.convert("RGB")) for img in examples["image"]
    ]
    return examples

# Apply preprocessing
train_dataset = dataset["train"].with_transform(preprocess_train)
val_dataset = dataset["test"].with_transform(preprocess_val)


Preparing data transforms...


In [7]:
# ============================================================================
# 5. Define Metrics
# ============================================================================
def compute_metrics(eval_pred):
    """Compute accuracy and F1 score"""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    
    accuracy = accuracy_score(labels, predictions)
    f1 = f1_score(labels, predictions, average="weighted")
    
    return {
        "accuracy": accuracy,
        "f1": f1
    }
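
As a sanity check on what `compute_metrics` reports, here is a dependency-free sketch of accuracy and support-weighted F1 on a toy example. The helper names and the toy labels are illustrative assumptions; the notebook itself relies on scikit-learn.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        total += (2 * tp / denom if denom else 0.0) * n
    return total / len(y_true)

y_true = [0, 0, 1, 1, 2, 2]   # toy labels
y_pred = [0, 0, 1, 2, 2, 2]   # one class-1 image misclassified as class 2
print(round(accuracy(y_true, y_pred), 4))      # 0.8333
print(round(weighted_f1(y_true, y_pred), 4))   # 0.8222
```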

In [8]:
# ============================================================================
# 6. Define Collator
# ============================================================================
def collate_fn(batch):
    """Custom collator to stack images and labels"""
    return {
        "pixel_values": torch.stack([x["pixel_values"] for x in batch]),
        "labels": torch.tensor([x["label"] for x in batch])
    }

In [9]:
# ============================================================================
# 7. Training Configuration
# ============================================================================
print("\nSetting up training configuration...")
training_args = TrainingArguments(
    output_dir="./vit_lora_finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-3,
    warmup_ratio=0.1,
    logging_steps=10,
    eval_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    remove_unused_columns=False,
    push_to_hub=False,
    report_to="none",  # Change to "wandb" or "tensorboard" if you want logging
    fp16=torch.cuda.is_available(),  # Use mixed precision if GPU available
)


Setting up training configuration...
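
The `warmup_ratio=0.1` setting combines with the Trainer's default linear scheduler to ramp the learning rate up and then decay it to zero. The sketch below reproduces that shape in plain Python; the function name is hypothetical, and the step count is an assumption derived from 3,030 training images, batch size 16, 3 epochs, and a single device.

```python
import math

def linear_warmup_decay(step, total_steps, base_lr=5e-3, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumed step count: ceil(3030 / 16) optimizer steps per epoch, 3 epochs.
total_steps = math.ceil(3030 / 16) * 3
warmup_steps = math.ceil(total_steps * 0.1)

# Print the rate at the start, at the end of warmup, mid-training, and at the end.
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{linear_warmup_decay(step, total_steps):.2e}")
```

The peak rate is reached at the end of warmup, so a high `learning_rate` such as 5e-3 only applies briefly before the decay begins.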


In [10]:
# ============================================================================
# 8. Initialize Trainer
# ============================================================================
print("\nInitializing trainer...")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    processing_class=processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
)


Initializing trainer...


In [11]:
# ============================================================================
# 9. Train the Model
# ============================================================================
print("\nStarting training...")
train_results = trainer.train()

# ============================================================================
# 10. Evaluate the Model
# ============================================================================
print("\nEvaluating model...")
eval_results = trainer.evaluate()
print(f"\nEvaluation Results:")
for key, value in eval_results.items():
    print(f"  {key}: {value:.4f}")


Starting training...




Step,Training Loss,Validation Loss,Accuracy,F1
50,0.2838,0.145584,0.961741,0.958685
100,0.4763,0.29006,0.902375,0.902764
150,0.3578,0.196565,0.935356,0.935058
200,0.262,0.233288,0.922164,0.921699
250,0.3274,0.146911,0.953826,0.953892
300,0.2835,0.204269,0.932718,0.9323
350,0.2302,0.118851,0.969657,0.969292
400,0.2231,0.20628,0.930079,0.928777
450,0.1568,0.134148,0.957784,0.957782
500,0.1776,0.12438,0.959103,0.958899





Evaluating model...





Evaluation Results:
  eval_loss: 0.1189
  eval_accuracy: 0.9697
  eval_f1: 0.9693
  eval_runtime: 68.6757
  eval_samples_per_second: 11.0370
  eval_steps_per_second: 0.6990
  epoch: 3.0000
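
Because `load_best_model_at_end=True` is combined with `metric_for_best_model="accuracy"`, the final evaluation reflects the best checkpoint rather than the last one. A small sketch that picks the best step from the accuracy column of the log above (values copied from the table):

```python
# (step, evaluation accuracy) pairs copied from the training log above
eval_log = [
    (50, 0.961741), (100, 0.902375), (150, 0.935356), (200, 0.922164),
    (250, 0.953826), (300, 0.932718), (350, 0.969657), (400, 0.930079),
    (450, 0.957784), (500, 0.959103),
]

# Select the checkpoint with the highest evaluation accuracy.
best_step, best_acc = max(eval_log, key=lambda row: row[1])
print(best_step, round(best_acc, 4))   # 350 0.9697
```

The step-350 accuracy matches the `eval_accuracy` of 0.9697 reported by the final evaluation.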


The trained model is saved in two forms:

1. **LoRA Adapter only** (`./vit_lora_adapter`) - Small file size, requires the base model for inference
2. **Merged Model** (`./vit_lora_merged`) - Standalone model with the LoRA weights merged into the base model; larger file size but easier to deploy

The inference section covers both approaches:

- Loading the merged model directly (no base model needed); this is the option that runs
- Loading the LoRA adapter (requires loading the base model first); this option is left commented out

The merged model is particularly useful for deployment scenarios where you want a single, self-contained model without managing separate base model and adapter files.
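
Merging is exact rather than approximate: folding the scaled low-rank product into the weight gives the same outputs as running the adapter alongside the frozen layer. A minimal NumPy sketch with random stand-in matrices (small sizes chosen for illustration, biases omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lora_alpha = 8, 4, 16
scaling = lora_alpha / r

W = rng.normal(size=(d, d))   # frozen base weight
A = rng.normal(size=(r, d))   # trained down-projection
B = rng.normal(size=(d, r))   # trained up-projection
x = rng.normal(size=(3, d))

# Adapter form: base output plus the scaled low-rank path.
adapter_out = x @ W.T + scaling * (x @ A.T @ B.T)

# Merged form: fold the update into a single weight matrix.
W_merged = W + scaling * (B @ A)
merged_out = x @ W_merged.T

print(np.allclose(adapter_out, merged_out))   # True
```

The merged weight has the same shape as the original, which is why the merged checkpoint is as large as the base model but needs no PEFT code at inference time.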

In [12]:
# ============================================================================
# 11. Save the Model
# ============================================================================
print("\nSaving LoRA adapter...")
model.save_pretrained("./vit_lora_adapter")
processor.save_pretrained("./vit_lora_adapter")
print("LoRA adapter saved successfully!")

# Merge LoRA weights with base model and save the composed model
print("\nMerging LoRA weights with base model...")
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./vit_lora_merged")
processor.save_pretrained("./vit_lora_merged")
print("Merged model saved successfully!")


Saving LoRA adapter...
LoRA adapter saved successfully!

Merging LoRA weights with base model...
Merged model saved successfully!


In [13]:
# ============================================================================
# 12. Inference Example
# ============================================================================
print("\n" + "="*50)
print("INFERENCE EXAMPLE")
print("="*50)

# Get a sample image from validation set
sample_image = val_dataset[0]["image"]

# Preprocess
inputs = processor(images=sample_image, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}
#--------------------------------------------------------------------------------------------------

# Option 1: Load merged model (no need for base model)
print("\n--- Using Merged Model ---")
merged_model = ViTForImageClassification.from_pretrained("./vit_lora_merged")
merged_model.eval()
merged_model.to(device)

# Predict with merged model
with torch.no_grad():
    outputs = merged_model(**inputs)
    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()

print(f"Predicted class: {labels[predicted_class_idx]}")
print(f"Actual class: {labels[val_dataset[0]['label']]}")

print("\n✅ First training complete!")
print("   - LoRA adapter saved to: ./vit_lora_adapter")
print("   - Merged model saved to: ./vit_lora_merged")
#--------------------------------------------------------------------------------------------------

# # Option 2: Load LoRA adapter (smaller file size)
# print("\n--- Using LoRA Adapter ---")
# base_model = ViTForImageClassification.from_pretrained( #load base model
#     model_name,
#     num_labels=num_labels,
#     ignore_mismatched_sizes=True
# )
# inference_model = PeftModel.from_pretrained(base_model, "./vit_lora_adapter") #load lora adapter
# inference_model.eval()
# inference_model.to(device)

# # Predict
# with torch.no_grad():
#     outputs = inference_model(**inputs)
#     logits = outputs.logits
#     predicted_class_idx = logits.argmax(-1).item()

# print(f"Predicted class: {labels[predicted_class_idx]}")
# print(f"Actual class: {labels[val_dataset[0]['label']]}")

# print("\n✅ Training complete! LoRA adapter has been trained and saved.")


INFERENCE EXAMPLE

--- Using Merged Model ---
Predicted class: hamburger
Actual class: hamburger

✅ First training complete!
   - LoRA adapter saved to: ./vit_lora_adapter
   - Merged model saved to: ./vit_lora_merged
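
The inference cell reports only the argmax class. If a confidence estimate is also wanted, the logits can be passed through a softmax. The sketch below is plain Python with made-up logits and three Food101 class names; `softmax` and `top_k` are hypothetical helpers, not part of the notebook.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, class_names, k=3):
    """Return the k most probable (name, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

toy_logits = [2.0, 0.5, -1.0]                  # made-up values
toy_names = ["hamburger", "hot_dog", "pizza"]  # three Food101 classes
for name, prob in top_k(toy_logits, toy_names, k=2):
    print(f"{name}: {prob:.3f}")
```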


**Section 13 - Add Noise and Test:**
- Adds Gaussian noise to the validation data (standard deviation 0.3 on pixel values scaled to [0, 1], then clamped back to that range)
- Tests the first composed model on noisy data to see how it performs
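
The noise step can be sketched in NumPy on a random stand-in image. This mirrors the add-then-clamp logic used in the notebook's transform, before normalization; `add_gaussian_noise` is a hypothetical name.

```python
import numpy as np

def add_gaussian_noise(pixels, noise_level=0.3, rng=None):
    """Add zero-mean Gaussian noise to [0, 1] pixels and clamp the result."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = pixels + rng.normal(0.0, noise_level, size=pixels.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(3, 224, 224))   # random stand-in image
noisy = add_gaussian_noise(image, 0.3, rng)

# Clamping pulls out-of-range values back, so the realized error is below 0.3.
rmse = float(np.sqrt(np.mean((noisy - image) ** 2)))
print(noisy.min() >= 0.0, noisy.max() <= 1.0, rmse < 0.3)   # True True True
```

Because the noise is drawn on every access, each evaluation pass sees a different noisy version of the same images.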


**Section 14 - Train Second LoRA:**
- Creates a noisy training dataset
- Loads the first composed model as the base
- Applies a second LoRA adapter on top of it
- Trains on the noisy data to adapt to noise


**Section 15 - Save Second Composed Model:**

- Saves the second LoRA adapter
- Merges and saves the second composed model (LoRA 1 + LoRA 2)


**Section 16 - Final Comparison:**

- Compares both models on clean and noisy data
- Shows performance summary to see if the second LoRA improved robustness


This demonstrates iterative LoRA training where you can stack multiple LoRA adapters by training each one on top of the previous composed model. The second LoRA learns to handle noisy data while preserving the knowledge from the first training phase.
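
In weight space, stacking works because each merge adds its own low-rank update to whatever weight it starts from. The NumPy sketch below uses random stand-in matrices with the two ranks from this notebook (4, then 16); `lora_delta` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32
W0 = rng.normal(size=(d, d))   # stand-in for a pretrained weight

def lora_delta(r, lora_alpha, rng):
    """Random stand-in for a trained, scaled LoRA update of rank r."""
    A = rng.normal(size=(r, d))
    B = rng.normal(size=(d, r))
    return (lora_alpha / r) * (B @ A)

delta1 = lora_delta(r=4, lora_alpha=16, rng=rng)    # first adapter
delta2 = lora_delta(r=16, lora_alpha=16, rng=rng)   # second adapter

W1 = W0 + delta1   # first merged model
W2 = W1 + delta2   # second merged model, built on top of the first

print(np.allclose(W2, W0 + delta1 + delta2))   # True
print(np.linalg.matrix_rank(delta1))           # 4
print(np.linalg.matrix_rank(delta2))           # 16
print(np.linalg.matrix_rank(delta1 + delta2))  # at most 4 + 16 = 20
```

The combined update therefore has more capacity than either adapter alone, while each training phase still optimizes only its own small set of parameters.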

In [16]:
# ============================================================================
# 13. Add Noise to Data and Test Composed Model
# ============================================================================
print("\n" + "="*50)
print("TESTING WITH NOISY DATA")
print("="*50)


def add_noise_transform(noise_level=0.3):
    """Create transform that adds noise to images"""
    return transforms.Compose([
        lambda img: img.resize((processor.size["height"], processor.size["width"])),
        transforms.ToTensor(),
        lambda x: x + torch.randn_like(x) * noise_level,  # Add Gaussian noise
        lambda x: torch.clamp(x, 0, 1),  # Clamp values to valid range
        transforms.Normalize(mean=processor.image_mean, std=processor.image_std),
    ])

def preprocess_noisy(examples, noise_level=0.3):
    """Preprocess images with added noise"""
    transform = add_noise_transform(noise_level)
    examples["pixel_values"] = [
        transform(img.convert("RGB")) for img in examples["image"]
    ]
    return examples

# Create noisy validation dataset
print("\nCreating noisy validation dataset...")
noisy_val_dataset = dataset["test"].with_transform(
    lambda ex: preprocess_noisy(ex, noise_level=0.3)
)

# Test merged model on noisy data
print("\nEvaluating merged model on noisy data...")
noisy_trainer = Trainer(
    model=merged_model,
    args=TrainingArguments(
        output_dir="./temp",
        per_device_eval_batch_size=16,
        remove_unused_columns=False,
        report_to="none"
    ),
    eval_dataset=noisy_val_dataset,
    processing_class=processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
)

noisy_eval_results = noisy_trainer.evaluate()
print(f"\nNoisy Data Evaluation Results:")
for key, value in noisy_eval_results.items():
    print(f"  {key}: {value:.4f}")



TESTING WITH NOISY DATA

Creating noisy validation dataset...

Evaluating merged model on noisy data...





Noisy Data Evaluation Results:
  eval_loss: 1.5316
  eval_model_preparation_time: 0.0024
  eval_accuracy: 0.4749
  eval_f1: 0.4483
  eval_runtime: 64.2474
  eval_samples_per_second: 11.7980
  eval_steps_per_second: 0.7470


In [18]:
# ============================================================================
# 14. Train Second LoRA on Composed Model
# ============================================================================
print("\n" + "="*50)
print("TRAINING SECOND LORA ON COMPOSED MODEL")
print("="*50)

# Prepare noisy training dataset
print("\nPreparing noisy training dataset...")
noisy_train_dataset = dataset["train"].with_transform(
    lambda ex: preprocess_noisy(ex, noise_level=0.3)
)

# Load the merged model (first composed model)
print("\nLoading first composed model...")
second_base_model = ViTForImageClassification.from_pretrained("./vit_lora_merged")

# Configure second LoRA
print("\nConfiguring second LoRA...")
second_lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],
)

# Apply second LoRA to the composed model
second_lora_model = get_peft_model(second_base_model, second_lora_config)
print("\nSecond LoRA trainable parameters:")
second_lora_model.print_trainable_parameters()

# Training configuration for second LoRA
print("\nSetting up second training configuration...")
second_training_args = TrainingArguments(
    output_dir="./vit_lora2_finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-4,
    warmup_ratio=0.1,
    logging_steps=10,
    eval_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    remove_unused_columns=False,
    push_to_hub=False,
    report_to="none",
    fp16=torch.cuda.is_available(),
)

# Initialize second trainer
print("\nInitializing second trainer...")
second_trainer = Trainer(
    model=second_lora_model,
    args=second_training_args,
    train_dataset=noisy_train_dataset,
    eval_dataset=noisy_val_dataset,
    processing_class=processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
)

# Train second LoRA
print("\nStarting second LoRA training...")
second_train_results = second_trainer.train()

# Evaluate second LoRA
print("\nEvaluating second composed model on noisy data...")
second_eval_results = second_trainer.evaluate()
print(f"\nSecond Model Evaluation Results:")
for key, value in second_eval_results.items():
    print(f"  {key}: {value:.4f}")



TRAINING SECOND LORA ON COMPOSED MODEL

Preparing noisy training dataset...

Loading first composed model...

Configuring second LoRA...

Second LoRA trainable parameters:
trainable params: 667,493 || all params: 86,543,818 || trainable%: 0.7713

Setting up second training configuration...

Initializing second trainer...

Starting second LoRA training...




Step,Training Loss,Validation Loss,Accuracy,F1
50,0.833,0.689426,0.773087,0.771061
100,0.5455,0.521339,0.827177,0.823966
150,0.3041,0.458816,0.837731,0.83683
200,0.4553,0.438597,0.840369,0.840236
250,0.3027,0.384517,0.873351,0.872433
300,0.4343,0.392466,0.873351,0.873192
350,0.2651,0.38952,0.873351,0.872572
400,0.346,0.337733,0.885224,0.885176
450,0.2525,0.375041,0.873351,0.87305
500,0.3602,0.340367,0.879947,0.879693





Evaluating second composed model on noisy data...





Second Model Evaluation Results:
  eval_loss: 0.3469
  eval_accuracy: 0.8852
  eval_f1: 0.8848
  eval_runtime: 69.0327
  eval_samples_per_second: 10.9800
  eval_steps_per_second: 0.6950
  epoch: 3.0000


In [19]:
# ============================================================================
# 15. Save Second Composed Model
# ============================================================================
print("\nSaving second LoRA adapter...")
second_lora_model.save_pretrained("./vit_lora2_adapter")
processor.save_pretrained("./vit_lora2_adapter")
print("Second LoRA adapter saved successfully!")

# Merge second LoRA weights and save
print("\nMerging second LoRA weights with model...")
second_merged_model = second_lora_model.merge_and_unload()
second_merged_model.save_pretrained("./vit_lora2_merged")
processor.save_pretrained("./vit_lora2_merged")
print("Second merged model saved successfully!")


Saving second LoRA adapter...
Second LoRA adapter saved successfully!

Merging second LoRA weights with model...
Second merged model saved successfully!


In [20]:
# ============================================================================
# 16. Final Comparison
# ============================================================================
print("\n" + "="*50)
print("FINAL COMPARISON")
print("="*50)

# Test on clean data
clean_eval_first = Trainer(
    model=merged_model,
    args=TrainingArguments(
        output_dir="./temp",
        per_device_eval_batch_size=16,
        remove_unused_columns=False,
        report_to="none"
    ),
    eval_dataset=val_dataset,
    processing_class=processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
).evaluate()

clean_eval_second = Trainer(
    model=second_merged_model,
    args=TrainingArguments(
        output_dir="./temp",
        per_device_eval_batch_size=16,
        remove_unused_columns=False,
        report_to="none"
    ),
    eval_dataset=val_dataset,
    processing_class=processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
).evaluate()

print("\n📊 Performance Summary:")
print("\nFirst Composed Model (LoRA 1):")
print(f"  Clean data accuracy: {clean_eval_first['eval_accuracy']:.4f}")
print(f"  Noisy data accuracy: {noisy_eval_results['eval_accuracy']:.4f}")

print("\nSecond Composed Model (LoRA 1 + LoRA 2):")
print(f"  Clean data accuracy: {clean_eval_second['eval_accuracy']:.4f}")
print(f"  Noisy data accuracy: {second_eval_results['eval_accuracy']:.4f}")

print("\n✅ All training complete!")
print("   - First LoRA adapter: ./vit_lora_adapter")
print("   - First merged model: ./vit_lora_merged")
print("   - Second LoRA adapter: ./vit_lora2_adapter")
print("   - Second merged model: ./vit_lora2_merged")


FINAL COMPARISON







📊 Performance Summary:

First Composed Model (LoRA 1):
  Clean data accuracy: 0.9697
  Noisy data accuracy: 0.4749

Second Composed Model (LoRA 1 + LoRA 2):
  Clean data accuracy: 0.9723
  Noisy data accuracy: 0.8852

✅ All training complete!
   - First LoRA adapter: ./vit_lora_adapter
   - First merged model: ./vit_lora_merged
   - Second LoRA adapter: ./vit_lora2_adapter
   - Second merged model: ./vit_lora2_merged
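
The summary can be condensed into per-condition accuracy gains. The numbers below are copied from the output above; the dictionary layout is only an illustration.

```python
# Accuracies copied from the performance summary above
results = {
    "first":  {"clean": 0.9697, "noisy": 0.4749},
    "second": {"clean": 0.9723, "noisy": 0.8852},
}

# Gain of the second composed model over the first, per condition.
gains = {
    split: results["second"][split] - results["first"][split]
    for split in ("clean", "noisy")
}
for split, gain in gains.items():
    print(f"{split}: {gain:+.4f}")
# clean: +0.0026
# noisy: +0.4103
```

In this run the second adapter recovers most of the accuracy lost to noise while leaving clean-data accuracy essentially unchanged.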
