<a href="https://colab.research.google.com/github/japonkei/jk-colab/blob/main/peftLoraImgClassf.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image classification using LoRA

### Install dependencies

In [1]:
!pip install transformers accelerate evaluate datasets peft -q


In [2]:
import transformers
import accelerate
import peft

print(f"Transformers version: {transformers.__version__}")
print(f"Accelerate version: {accelerate.__version__}")
print(f"PEFT version: {peft.__version__}")

Transformers version: 4.35.2
Accelerate version: 0.25.0
PEFT version: 0.7.1

# Select a model checkpoint to fine-tune

In [3]:
model_checkpoint = "google/vit-base-patch16-224-in21k"

In [4]:
# load a dataset
from datasets import load_dataset

dataset = load_dataset("food101", split="train[:5000]")

# Prepare dataset

In [5]:
labels = dataset.features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = i
    id2label[i] = label

id2label[2]

'baklava'

In [6]:
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained(model_checkpoint)

In [7]:
from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
train_transforms = Compose(
    [
        RandomResizedCrop(image_processor.size["height"]),
        RandomHorizontalFlip(),
        ToTensor(),
        normalize,
    ]
)

val_transforms = Compose(
    [
        Resize(image_processor.size["height"]),
        CenterCrop(image_processor.size["height"]),
        ToTensor(),
        normalize,
    ]
)


def preprocess_train(example_batch):
    """Apply train_transforms across a batch."""
    example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch


def preprocess_val(example_batch):
    """Apply val_transforms across a batch."""
    example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch

In [8]:
splits = dataset.train_test_split(test_size=0.1)
train_ds = splits["train"]
val_ds = splits["test"]

In [9]:
train_ds.set_transform(preprocess_train)
val_ds.set_transform(preprocess_val)

# Load and prepare a model

In [10]:
def print_trainable_parameters(model):
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
    )

In [11]:
from transformers import AutoModelForImageClassification, TrainingArguments, Trainer

model = AutoModelForImageClassification.from_pretrained(
    model_checkpoint,
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,  # provide this in case you're planning to fine-tune an already fine-tuned checkpoint
)

Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [12]:
print_trainable_parameters(model)

trainable params: 85876325 || all params: 85876325 || trainable%: 100.00

In [13]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],
)
lora_model = get_peft_model(model, config)
print_trainable_parameters(lora_model)

trainable params: 667493 || all params: 86543818 || trainable%: 0.77

Let’s unpack what’s going on here. To use LoRA, you specify the target modules in `LoraConfig` so that `get_peft_model()` knows which modules inside the model should receive LoRA matrices. In this example, we only target the query and value projections in the attention blocks of the base model. The corresponding modules are named `query` and `value`, so we pass those names in the `target_modules` argument of `LoraConfig`.

We also specify modules_to_save. After wrapping the base model with get_peft_model() along with the config, we get a new model where only the LoRA parameters are trainable (so-called “update matrices”) while the pre-trained parameters are kept frozen. However, we want the classifier parameters to be trained too when fine-tuning the base model on our custom dataset. To ensure that the classifier parameters are also trained, we specify modules_to_save. This also ensures that these modules are serialized alongside the LoRA trainable parameters when using utilities like save_pretrained() and push_to_hub().

Here’s what the other parameters mean:

* `r`: The rank (inner dimension) of the LoRA update matrices.
* `lora_alpha`: Scaling factor applied to the LoRA update.
* `lora_dropout`: Dropout probability applied inside the LoRA layers.
* `bias`: Specifies which bias parameters should be trained. `"none"` means that no bias parameters are trained.

`r` controls the number of trainable parameters that LoRA adds, while `lora_alpha` scales the size of the update without changing the parameter count. Together they let you balance end performance against compute efficiency.
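
To make the roles of the rank and the scaling factor concrete, here is a minimal NumPy sketch of a linear layer with a LoRA update. It is an illustration under assumptions, not PEFT's implementation: the helper `lora_forward`, the matrix names `A` and `B`, and the `lora_alpha / r` scaling follow the usual LoRA formulation, and the 768-wide projection is assumed to match ViT-Base.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 768, 768      # assumed size of one attention projection in ViT-Base
r, lora_alpha = 16, 16      # same values as the LoraConfig in this notebook

W = rng.normal(size=(d_out, d_in))       # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init so the update starts at zero
scaling = lora_alpha / r                 # assumed scaling rule from the usual LoRA formulation


def lora_forward(x, W, A, B, scaling):
    """Hypothetical helper: frozen projection plus the scaled low-rank update."""
    return x @ W.T + scaling * (x @ A.T @ B.T)


x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B, scaling)

# With B initialised to zero, the adapted layer reproduces the frozen layer exactly.
print(np.allclose(y, x @ W.T))
# Only A and B would be trained: r * (d_in + d_out) values instead of d_in * d_out.
print(A.size + B.size, "trainable vs", W.size, "frozen")
```

Changing `lora_alpha` in this sketch rescales the update but leaves `A.size + B.size` untouched, whereas changing `r` changes the number of trainable values.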

By looking at the number of trainable parameters, you can see how many parameters we’re actually training. Since the goal is to achieve parameter-efficient fine-tuning, you should expect to see fewer trainable parameters in the lora_model in comparison to the original model, which is indeed the case here.
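
The trainable-parameter count printed above can be reproduced by hand. The sketch below assumes the standard ViT-Base dimensions (hidden size 768, 12 encoder layers) and the 101 Food-101 classes; none of these constants are printed in this notebook, so treat them as assumptions.

```python
# Assumed architecture constants for ViT-Base and Food-101.
hidden_size = 768       # width of each attention projection
num_layers = 12         # transformer encoder blocks
num_labels = 101        # Food-101 classes
r = 16                  # LoRA rank from the LoraConfig above
targets_per_layer = 2   # "query" and "value"

# Each adapted linear layer gains A (r x in) and B (out x r).
lora_per_module = r * hidden_size + hidden_size * r
lora_total = lora_per_module * targets_per_layer * num_layers

# The classifier head (weight + bias) is trained in full via modules_to_save.
classifier = hidden_size * num_labels + num_labels

trainable = lora_total + classifier
print(f"LoRA params:       {lora_total}")   # 589824
print(f"classifier params: {classifier}")   # 77669
print(f"trainable params:  {trainable}")    # 667493

# Base model size as printed before wrapping the model with LoRA.
base_params = 85876325
print(f"base + LoRA + one extra classifier: {base_params + lora_total + classifier}")  # 86543818
```

The last figure matches the reported total of 86543818, which is consistent with PEFT keeping a frozen copy of the original classifier next to the trainable one. That reading is an inference from the arithmetic, not something the notebook states.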

# Define training arguments

In [14]:
from transformers import TrainingArguments, Trainer


model_name = model_checkpoint.split("/")[-1]
batch_size = 128

args = TrainingArguments(
    f"{model_name}-finetuned-lora-food101",
    remove_unused_columns=False,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-3,
    per_device_train_batch_size=batch_size,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    num_train_epochs=5,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=False,  # set to True to upload checkpoints to the Hub
    label_names=["labels"],
)
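
A quick back-of-the-envelope check shows what these settings imply for the number of optimizer steps. This is a sketch under assumptions: a single device, 4500 training examples (5000 minus the 10% validation split), and floor division when converting batches to update steps, which is how `Trainer` is understood to count them here.

```python
import math

num_train_examples = 4500   # 5000 examples minus the 10% validation split
per_device_batch = 128      # per_device_train_batch_size
grad_accum = 4              # gradient_accumulation_steps
num_epochs = 5              # num_train_epochs
logging_steps = 10

# Examples consumed per optimizer step (single device assumed).
effective_batch = per_device_batch * grad_accum

# Batches per epoch; the last one is smaller than 128.
batches_per_epoch = math.ceil(num_train_examples / per_device_batch)

# Assumption: update steps per epoch use floor division, with a minimum of one.
steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_steps = steps_per_epoch * num_epochs

# Epoch in which the first training-loss log would appear.
first_log_epoch = math.ceil(logging_steps / steps_per_epoch)

print(f"effective batch size: {effective_batch}")   # 512
print(f"steps per epoch:      {steps_per_epoch}")   # 9
print(f"total steps:          {total_steps}")       # 45
print(f"first log in epoch:   {first_log_epoch}")   # 2
```

With only 9 steps per epoch and `logging_steps=10`, no training loss is logged before the first evaluation, which would explain the "No log" entry for epoch 1 in the training results.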

# Prepare evaluation metric

In [15]:
!pip install evaluate -q  # already installed in the first cell; harmless to repeat



In [16]:
import numpy as np

import evaluate # first install using pip if missing

metric = evaluate.load("accuracy")


def compute_metrics(eval_pred):
    """Computes accuracy on a batch of predictions"""
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return metric.compute(predictions=predictions, references=eval_pred.label_ids)


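To see what `compute_metrics` does without running a full evaluation, the following sketch repeats the same argmax-then-compare logic on made-up logits. The arrays are invented for illustration, and plain NumPy stands in for the `evaluate` accuracy metric.

```python
import numpy as np

# Toy logits for 4 examples and 3 classes (invented values).
logits = np.array(
    [
        [2.0, 0.1, -1.0],
        [0.2, 0.3, 0.1],
        [-0.5, 0.0, 1.5],
        [1.0, 0.9, 0.8],
    ]
)
label_ids = np.array([0, 1, 2, 1])

# Same first step as compute_metrics: pick the highest-scoring class per row.
predictions = np.argmax(logits, axis=1)

# Accuracy is the fraction of predictions that match the reference labels.
accuracy = float((predictions == label_ids).mean())
print({"accuracy": accuracy})  # {'accuracy': 0.75}
```

The last example is predicted as class 0 while its label is 1, so three of four predictions are correct.
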
# Define collation function

In [17]:
import torch


def collate_fn(examples):
    pixel_values = torch.stack([example["pixel_values"] for example in examples])
    labels = torch.tensor([example["label"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}

# Train and evaluate

In [18]:
from transformers import Trainer

In [19]:
trainer = Trainer(
    lora_model,
    args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
)
train_results = trainer.train()

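| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 0.536706        | 0.892    |
| 2     | 2.151600      | 0.213323        | 0.94     |
| 3     | 0.346900      | 0.175512        | 0.952    |
| 4     | 0.198600      | 0.156985        | 0.95     |
| 5     | 0.166500      | 0.156821        | 0.95     |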


In [21]:
trainer.evaluate(val_ds)

{'eval_loss': 0.17551171779632568,
 'eval_accuracy': 0.952,
 'eval_runtime': 6.1591,
 'eval_samples_per_second': 81.181,
 'eval_steps_per_second': 0.649,
 'epoch': 5.0}
