# SEG-PEFT - Targeted Ablation on 2D Segmentation

This notebook performs a comprehensive ablation study comparing Full Fine-Tuning (FFT) against LoRA with varying rank and alpha configurations on the Kvasir-SEG dataset.

## Objective
Evaluate how LoRA hyperparameters (rank r and scaling factor α) affect segmentation performance compared to FFT baseline.

## Experimental Setup
- **Dataset**: Kvasir-SEG (gastrointestinal polyp segmentation)
- **Model**: SegFormer-B0 pretrained on ADE20K
- **Training**: up to 50 epochs (early stopping with patience 10), learning rate 5e-4, LoRA dropout 0.0
- **Evaluation**: Epoch-based metrics (Mean Dice, Mean IoU, Mean Accuracy)
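
As a reminder of what the Dice and IoU scores measure, here is a minimal pure-Python sketch for a single pair of binary masks. It is an illustration only: the notebook's `compute_metrics_fn` is not shown here and may differ in how it averages over classes and images, and the name `dice_and_iou` and the empty-mask convention are assumptions of this example.

```python
def dice_and_iou(pred, target):
    """Return (dice, iou) for two equally sized binary masks given as nested lists."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]

    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    pred_area = sum(1 for p in flat_pred if p)
    target_area = sum(1 for t in flat_target if t)
    union = pred_area + target_area - intersection

    # Assumed convention: two empty masks count as a perfect match.
    total_area = pred_area + target_area
    dice = 2 * intersection / total_area if total_area else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


# Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
print(f"dice={dice:.4f}, iou={iou:.4f}")
```

For a single mask pair the two scores are linked by Dice = 2·IoU / (1 + IoU), so they rank predictions identically even though their values differ.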

## Experiments
1. **FFT Baseline**: Full model fine-tuning
2. **LoRA Ablation**: 15 configurations with rank r ∈ {4, 8, 16, 32, 64} and α/r ratios ∈ {1, 2, 4}
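
The 15 LoRA configurations can be enumerated programmatically. The sketch below builds the grid from the ranks and α/r ratios listed above; the dictionary keys are chosen for this example, and the `save_dir` pattern mirrors the folder names used later in the notebook.

```python
from itertools import product

ranks = [4, 8, 16, 32, 64]
alpha_ratios = [1, 2, 4]  # alpha / r

# One entry per (rank, ratio) pair; alpha is derived from the ratio.
configs = [
    {"r": r, "lora_alpha": r * ratio, "save_dir": f"lora_r{r}_alpha{r * ratio}"}
    for r, ratio in product(ranks, alpha_ratios)
]

for cfg in configs:
    print(cfg)
```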

In [1]:
!git clone https://github.com/rossoc/SEG-PEFT
%cd SEG-PEFT
!pip install evaluate

Cloning into 'SEG-PEFT'...
remote: Enumerating objects: 162, done.
remote: Counting objects: 100% (162/162), done.
remote: Compressing objects: 100% (89/89), done.
remote: Total 162 (delta 75), reused 128 (delta 45), pack-reused 0 (from 0)
Receiving objects: 100% (162/162), 211.49 KiB | 15.11 MiB/s, done.
Resolving deltas: 100% (75/75), done.
/content/SEG-PEFT
Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.1/84.1 kB 7.7 MB/s eta 0:00:00
Installing collected packages: evaluate
Successfully installed evaluate-0.4.6


In [2]:
import torch
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback
from src.segpeft import kvasir_dataset, compute_metrics_fn, segformer, set_seed, Metrics
import time
import yaml
import pandas as pd
import os
import zipfile
from peft import get_peft_model, LoraConfig

set_seed(42)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading builder script: 0.00B [00:00, ?B/s]

In [None]:
# Colab utility: Download experiment results
# After each experiment, results are zipped and downloaded automatically
import shutil
from google.colab import files


def download_results(save_dir):
    """Zip and download experiment results folder"""
    output_path = f"./outputs/{save_dir}"
    zip_name = f"{save_dir}_results"

    # Create zip file
    shutil.make_archive(zip_name, "zip", output_path)

    # Download the zip file
    files.download(f"{zip_name}.zip")
    print(f"Downloaded {zip_name}.zip")
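
Outside Colab, `google.colab` is not importable, so `download_results` cannot run as written. A possible fallback, sketched here under that assumption, only creates the archive and returns its path. The function name `archive_results` and its `outputs_root` and `dest_dir` parameters are hypothetical and not part of the repository.

```python
import shutil
import tempfile
from pathlib import Path


def archive_results(save_dir, outputs_root="./outputs", dest_dir="."):
    """Zip outputs_root/save_dir and return the archive path (no browser download)."""
    source = Path(outputs_root) / save_dir
    if not source.is_dir():
        raise FileNotFoundError(f"No results folder at {source}")
    base_name = str(Path(dest_dir) / f"{save_dir}_results")
    # make_archive appends the ".zip" suffix and returns the final file name.
    return shutil.make_archive(base_name, "zip", root_dir=str(source))


# Self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "outputs" / "demo_run"
    run_dir.mkdir(parents=True)
    (run_dir / "training_history.csv").write_text("epoch,loss\n1,0.5\n")

    archive_path = archive_results(
        "demo_run", outputs_root=Path(tmp) / "outputs", dest_dir=tmp
    )
    archive_exists = Path(archive_path).is_file()
    print(f"Created {archive_path}")
```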

## Dataset Setup

Download and prepare the Kvasir-SEG dataset containing 1000 polyp images with corresponding segmentation masks. The dataset is split into 80% training and 20% validation.

Dataset: [Kvasir-SEG](https://datasets.simula.no/kvasir-seg/)
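
To make the 80/20 split concrete, here is a minimal sketch of a seeded index split. The internals of `kvasir_dataset` are not shown in this notebook, so this is an illustration of the idea and not the repository's implementation; `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_items, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed and split them into (train, validation)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_items * test_size))
    return indices[n_test:], indices[:n_test]


train_idx, val_idx = split_indices(1000)
print(len(train_idx), len(val_idx))
```

With 1000 images and `test_size=0.2`, this yields 800 training and 200 validation indices, and the fixed seed keeps the split identical across runs.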

In [3]:
dataset_dir = "data"
os.makedirs(dataset_dir, exist_ok=True)
!wget --no-check-certificate https://datasets.simula.no/downloads/kvasir-seg.zip -O kvasir-seg.zip

with zipfile.ZipFile("kvasir-seg.zip", "r") as zip_ref:
    zip_ref.extractall(dataset_dir)

--2025-11-05 19:26:48--  https://datasets.simula.no/downloads/kvasir-seg.zip
Resolving datasets.simula.no (datasets.simula.no)... 128.39.36.14
Connecting to datasets.simula.no (datasets.simula.no)|128.39.36.14|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 46227172 (44M) [application/zip]
Saving to: ‘kvasir-seg.zip’


2025-11-05 19:26:53 (12.4 MB/s) - ‘kvasir-seg.zip’ saved [46227172/46227172]



## Baseline: Full Fine-Tuning (FFT)

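Train the complete SegFormer model with all parameters trainable. This serves as the reference baseline for the LoRA experiments; it is the natural target for LoRA to approach, although it is not guaranteed to be a strict upper bound on performance.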

**Key characteristics:**
- All 3.7M parameters are trainable
- Higher computational cost and memory requirements
- Evaluation and logging performed every epoch

In [None]:
batch_size = 64
gradient_accumulation_steps = 4
use_bf16 = True
dataloader_num_workers = 8


def train_segformer_fft(epochs, lr, save_dir):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    test_size = 0.2
    model, model_name, _ = segformer()
    train_dataset, test_dataset = kvasir_dataset(model_name, test_size)
    N = len(train_dataset)

    training_args = TrainingArguments(
        output_dir="./outputs/" + save_dir,
        num_train_epochs=epochs,
        # A100 Optimization: Larger batch sizes
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size * 2,  # Can use larger batch for eval
        # A100 Optimization: Gradient accumulation for effective larger batches
        gradient_accumulation_steps=gradient_accumulation_steps,
        # A100 Optimization: Mixed precision with bfloat16 (A100's specialty)
        bf16=use_bf16 and torch.cuda.is_available(),
        bf16_full_eval=use_bf16 and torch.cuda.is_available(),
        # A100 Optimization: Efficient data loading
        dataloader_num_workers=dataloader_num_workers,
        dataloader_pin_memory=True,
        dataloader_prefetch_factor=2,
        # Training settings - EPOCH-BASED
        learning_rate=lr,
        save_total_limit=2,
        prediction_loss_only=False,
        remove_unused_columns=True,
        push_to_hub=False,
        report_to="none",
        eval_strategy="epoch",
        save_strategy="epoch",
        logging_strategy="epoch",  # Log every epoch
        load_best_model_at_end=True,
        logging_dir=f"./outputs/{save_dir}/logs",
        # Performance optimization
        optim="adamw_torch_fused" if torch.cuda.is_available() else "adamw_torch",
        # Better learning rate schedule
        warmup_ratio=0.1,
        lr_scheduler_type="cosine",
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        compute_metrics=compute_metrics_fn(model_name),  # type: ignore
        callbacks=[EarlyStoppingCallback(early_stopping_patience=10)],
    )

    print("Starting training...")
    start_time = time.time()
    trainer.train()
    end_time = time.time() - start_time

    all_metrics = {
        "training_history": trainer.state.log_history,
        "final_evaluation": trainer.evaluate(),
        "training_time": end_time,
    }

    # The content is written as YAML, so the file gets a .yaml extension.
    with open(f"./outputs/{save_dir}/all_metrics.yaml", "w") as f:
        yaml.dump(all_metrics, f, indent=2)

    df = pd.DataFrame(trainer.state.log_history)
    df.to_csv(f"./outputs/{save_dir}/training_history.csv", index=False)
    trainer.save_model(f"./outputs/{save_dir}/final")

    log = trainer.state.log_history.copy()
    final_train_metrics = trainer.evaluate(eval_dataset=train_dataset)
    log.append({"epoch": epochs, "loss": final_train_metrics["eval_loss"]})
    metrics = Metrics(f"./outputs/{save_dir}/")
    metrics.plot_curves(log)
    return trainer
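
The `warmup_ratio=0.1` and `lr_scheduler_type="cosine"` settings above describe a linear warmup followed by a cosine decay. The following sketch reproduces that shape in plain Python so the schedule can be inspected without a GPU. It is a simplified approximation, not the library's code, and `total_steps=100` is a made-up value; the real number of optimizer steps depends on dataset size, batch size, and gradient accumulation.

```python
import math


def lr_at_step(step, total_steps, base_lr=5e-4, warmup_ratio=0.1):
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 100  # hypothetical, for illustration only
for step in (0, 5, 10, 55, 100):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```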

In [None]:
# FFT Configuration
epochs = 50
learning_rate = 5e-4
save_dir = "segformer_fft_baseline"

In [None]:
# Train FFT baseline
fft_trainer = train_segformer_fft(epochs, learning_rate, save_dir)

In [None]:
# Download FFT results
download_results(save_dir)

## Train SegFormer with LoRA

This section fine-tunes [SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer) with LoRA, using the [PEFT](https://github.com/huggingface/peft) library to implement the adapters.
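
For intuition, LoRA keeps a weight matrix W frozen and adds a trainable low-rank update scaled by α/r, so a linear layer computes W·x + (α/r)·B·A·x. The toy sketch below spells this out with plain lists. It is a didactic example, not how PEFT implements the layer internally, and all matrices are made-up values.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x), where r is the number of rows of A."""
    r = len(A)
    scale = alpha / r
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight
A = [[1.0, 1.0]]              # rank r=1, shape (1, 2)
B_zero = [[0.0], [0.0]]       # B starts at zero, so the adapter is a no-op at first
B = [[0.5], [-0.5]]           # B after some hypothetical training
x = [2.0, 3.0]

out_init = lora_forward(W, A, B_zero, x, alpha=2)
out = lora_forward(W, A, B, x, alpha=2)
print(out_init, out)
```

Because B is initialized to zero, the adapted model starts out identical to the pretrained one, and α/r controls how strongly the learned update is applied.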

In [None]:
def train_segformer_lora(epochs, lr, r, lora_alpha, lora_dropout, save_dir):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    test_size = 0.2
    model, model_name, modules = segformer()

    peft_config = LoraConfig(
        r=r,
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        target_modules=modules,
    )

    model = get_peft_model(model, peft_config)

    model.print_trainable_parameters()

    train_dataset, test_dataset = kvasir_dataset(model_name, test_size)
    N = len(train_dataset)

    training_args = TrainingArguments(
        output_dir="./outputs/" + save_dir,
        num_train_epochs=epochs,
        # A100 Optimization: Larger batch sizes
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size * 2,  # Can use larger batch for eval
        # A100 Optimization: Gradient accumulation for effective larger batches
        gradient_accumulation_steps=gradient_accumulation_steps,
        # A100 Optimization: Mixed precision with bfloat16 (A100's specialty)
        bf16=use_bf16 and torch.cuda.is_available(),
        bf16_full_eval=use_bf16 and torch.cuda.is_available(),
        # A100 Optimization: Efficient data loading
        dataloader_num_workers=dataloader_num_workers,
        dataloader_pin_memory=True,
        dataloader_prefetch_factor=2,
        # Training settings - EPOCH-BASED
        learning_rate=lr,
        save_total_limit=2,
        prediction_loss_only=False,
        remove_unused_columns=True,
        push_to_hub=False,
        report_to="none",
        eval_strategy="epoch",
        save_strategy="epoch",
        logging_strategy="epoch",  # Log every epoch
        load_best_model_at_end=True,
        logging_dir=f"./outputs/{save_dir}/logs",
        # Performance optimization
        optim="adamw_torch_fused" if torch.cuda.is_available() else "adamw_torch",
        # Better learning rate schedule
        warmup_ratio=0.1,
        lr_scheduler_type="cosine",
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        compute_metrics=compute_metrics_fn(model_name),  # type: ignore
        callbacks=[EarlyStoppingCallback(early_stopping_patience=10)],
    )

    print("Starting training...")
    start_time = time.time()
    trainer.train()
    end_time = time.time() - start_time

    all_metrics = {
        "training_history": trainer.state.log_history,
        "final_evaluation": trainer.evaluate(),
        "training_time": end_time,
    }

    # The content is written as YAML, so the file gets a .yaml extension.
    with open(f"./outputs/{save_dir}/all_metrics.yaml", "w") as f:
        yaml.dump(all_metrics, f, indent=2)

    df = pd.DataFrame(trainer.state.log_history)
    df.to_csv(f"./outputs/{save_dir}/training_history.csv", index=False)
    trainer.save_model(f"./outputs/{save_dir}/final")

    log = trainer.state.log_history.copy()
    final_train_metrics = trainer.evaluate(eval_dataset=train_dataset)
    log.append({"epoch": epochs, "loss": final_train_metrics["eval_loss"]})
    metrics = Metrics(f"./outputs/{save_dir}/")
    metrics.plot_curves(log)

    return trainer
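
Both training functions pass `EarlyStoppingCallback(early_stopping_patience=10)`, so a run can end before the configured number of epochs. The sketch below is a simplified model of that patience logic applied to a list of validation losses. It is not the callback's source code, and `stopping_epoch` is a hypothetical helper.

```python
def stopping_epoch(eval_losses, patience=10, threshold=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = None
    bad_epochs = 0
    for epoch, loss in enumerate(eval_losses, start=1):
        if best is None or loss < best - threshold:
            # New best validation loss: reset the patience counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Toy loss curve: the best value is reached at epoch 2, then it stagnates.
losses = [0.50, 0.40, 0.41, 0.42, 0.43]
print(stopping_epoch(losses, patience=3))
```

Since `load_best_model_at_end=True` is also set, the weights evaluated after training come from the best checkpoint that is still on disk, not necessarily from the last epoch.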

## LoRA Ablation Study

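Parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA). Only the low-rank adapter matrices are trained while the base model remains frozen, which drastically reduces the number of trainable parameters (~1.7% of the full model; the exact share grows with the rank and is printed by `model.print_trainable_parameters()` for each run).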

**Fixed hyperparameters across all experiments:**
- Epochs: up to 50 (early stopping with patience 10)
- Learning rate: 5e-4
- Dropout: 0.0
- Evaluation: Every epoch

**Variable hyperparameters:**
- Rank (r): Controls capacity of low-rank adaptation matrices
- Alpha (α): Scaling factor for LoRA updates

The experiments are ordered by increasing rank: a higher rank adds trainable parameters and may, but is not guaranteed to, improve performance.
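
To see how the rank drives the parameter count, note that adapting one linear layer of shape d_out × d_in adds r · (d_in + d_out) trainable weights. The sketch below tabulates this for the ranks in the study. The 256 × 256 layer size is a hypothetical example and is not read from the model, so the percentages are per layer and do not reproduce the whole-model figures quoted in this notebook.

```python
def lora_extra_params(d_in, d_out, r):
    """Trainable weights added by LoRA to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


d_in = d_out = 256  # hypothetical layer size, for illustration only
dense_params = d_in * d_out

counts = {}
for r in (4, 8, 16, 32, 64):
    counts[r] = lora_extra_params(d_in, d_out, r)
    share = 100 * counts[r] / dense_params
    print(f"r={r:>2}: {counts[r]:>6} adapter weights ({share:.1f}% of the dense layer)")
```

The count grows linearly with r, so each experiment group doubles the adapter size of the previous one.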

---

### Experiment 1: r=4, α ∈ {4, 8, 16}

Minimal parameter overhead (~0.17% trainable). Tests different α/r scaling ratios (1, 2, 4) with the smallest rank.
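
The three runs of a rank group differ only in `lora_alpha`, so they could also be driven by a small loop instead of separate cells. The sketch below takes the training function as an argument so it can be tried with a stub. `run_rank_sweep` and `fake_train` are hypothetical names; in the notebook one would pass `train_segformer_lora`, whose positional signature is assumed to match the definition in the LoRA training cell.

```python
def run_rank_sweep(train_fn, rank, ratios=(1, 2, 4), epochs=50, lr=5e-4, lora_dropout=0.0):
    """Call train_fn once per alpha/r ratio and collect the results by save_dir."""
    results = {}
    for ratio in ratios:
        lora_alpha = rank * ratio
        save_dir = f"lora_r{rank}_alpha{lora_alpha}"
        print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
        results[save_dir] = train_fn(epochs, lr, rank, lora_alpha, lora_dropout, save_dir)
    return results


def fake_train(epochs, lr, r, lora_alpha, lora_dropout, save_dir):
    """Stand-in for the real training function; it only records the LoRA scale."""
    return {"r": r, "lora_alpha": lora_alpha, "scale": lora_alpha / r}


results = run_rank_sweep(fake_train, rank=4)
print(results)
```

Keeping the cells separate, as the notebook does, has the advantage that a single failed or interrupted run can be restarted on its own.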

In [None]:
# r=4, alpha=4
epochs = 50
learning_rate = 5e-4
rank = 4
lora_alpha = 4
lora_dropout = 0.0
save_dir = "lora_r4_alpha4"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r4_a4 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r4_alpha4")

In [None]:
# r=4, alpha=8
epochs = 50
learning_rate = 5e-4
rank = 4
lora_alpha = 8
lora_dropout = 0.0
save_dir = "lora_r4_alpha8"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r4_a8 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r4_alpha8")

In [None]:
# r=4, alpha=16
epochs = 50
learning_rate = 5e-4
rank = 4
lora_alpha = 16
lora_dropout = 0.0
save_dir = "lora_r4_alpha16"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r4_a16 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r4_alpha16")

### Experiment 2: r=8, α ∈ {8, 16, 32}

Doubles the rank of Experiment 1 and evaluates whether the added capacity improves segmentation quality at the same α/r ratios.

In [None]:
# r=8, alpha=8
epochs = 50
learning_rate = 5e-4
rank = 8
lora_alpha = 8
lora_dropout = 0.0
save_dir = "lora_r8_alpha8"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r8_a8 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r8_alpha8")

In [None]:
# r=8, alpha=16
epochs = 50
learning_rate = 5e-4
rank = 8
lora_alpha = 16
lora_dropout = 0.0
save_dir = "lora_r8_alpha16"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r8_a16 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r8_alpha16")

In [None]:
# r=8, alpha=32
epochs = 50
learning_rate = 5e-4
rank = 8
lora_alpha = 32
lora_dropout = 0.0
save_dir = "lora_r8_alpha32"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r8_a32 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r8_alpha32")

### Experiment 3: r=16, α ∈ {16, 32, 64}

Medium rank configuration. Explores the performance-parameter tradeoff in the mid-range.

In [None]:
# r=16, alpha=16
epochs = 50
learning_rate = 5e-4
rank = 16
lora_alpha = 16
lora_dropout = 0.0
save_dir = "lora_r16_alpha16"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r16_a16 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r16_alpha16")

In [None]:
# r=16, alpha=32
epochs = 50
learning_rate = 5e-4
rank = 16
lora_alpha = 32
lora_dropout = 0.0
save_dir = "lora_r16_alpha32"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r16_a32 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r16_alpha32")

In [None]:
# r=16, alpha=64
epochs = 50
learning_rate = 5e-4
rank = 16
lora_alpha = 64
lora_dropout = 0.0
save_dir = "lora_r16_alpha64"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r16_a64 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r16_alpha64")

### Experiment 4: r=32, α ∈ {32, 64, 128}

High rank configuration with increased model capacity. Tests if higher rank approaches FFT performance.

In [None]:
# r=32, alpha=32
epochs = 50
learning_rate = 5e-4
rank = 32
lora_alpha = 32
lora_dropout = 0.0
save_dir = "lora_r32_alpha32"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r32_a32 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r32_alpha32")

In [None]:
# r=32, alpha=64
epochs = 50
learning_rate = 5e-4
rank = 32
lora_alpha = 64
lora_dropout = 0.0
save_dir = "lora_r32_alpha64"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r32_a64 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r32_alpha64")

In [None]:
# r=32, alpha=128
epochs = 50
learning_rate = 5e-4
rank = 32
lora_alpha = 128
lora_dropout = 0.0
save_dir = "lora_r32_alpha128"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r32_a128 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r32_alpha128")

### Experiment 5: r=64, α ∈ {64, 128, 256}

Maximum rank configuration with highest parameter count among LoRA experiments. Evaluates the upper limit of LoRA's expressiveness.

In [None]:
# r=64, alpha=64
epochs = 50
learning_rate = 5e-4
rank = 64
lora_alpha = 64
lora_dropout = 0.0
save_dir = "lora_r64_alpha64"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r64_a64 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r64_alpha64")

In [None]:
# r=64, alpha=128
epochs = 50
learning_rate = 5e-4
rank = 64
lora_alpha = 128
lora_dropout = 0.0
save_dir = "lora_r64_alpha128"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r64_a128 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r64_alpha128")

In [None]:
# r=64, alpha=256
epochs = 50
learning_rate = 5e-4
rank = 64
lora_alpha = 256
lora_dropout = 0.0
save_dir = "lora_r64_alpha256"

print(f"Training LoRA with r={rank}, alpha={lora_alpha}")
trainer_r64_a256 = train_segformer_lora(
    epochs, learning_rate, rank, lora_alpha, lora_dropout, save_dir
)

In [None]:
download_results("lora_r64_alpha256")