[PyTorch] Code for the paper 'Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting' (CVPR Workshops, eLVM 2024).
Includes full fine-tuning, linear probing, and parameter-efficient strategies such as Block Expansion and LoRA for fine-tuning Vision Transformers (ViTs) on image classification.
- Python 3.8+
pip install -r requirements.txt
Dataset | --data.dataset |
---|---|
CIFAR-10 | cifar10 |
CIFAR-100 | cifar100 |
Oxford-IIIT Pet Dataset | pets37 |
Oxford Flowers-102 | flowers102 |
Food-101 | food101 |
Describable Textures Dataset | dtd |
Image Folder (custom dataset) | custom |
The configs/ directory contains example configuration files, which can be run with:
python main.py fit --config path/to/config
You can either edit an existing config with your own choice of hyperparameters or override them from the command line, for example:
python main.py fit --trainer.accelerator gpu --trainer.devices 1 --trainer.precision 16-mixed \
    --trainer.max_steps 5000 --model.warmup_steps 500 --model.lr 0.01 \
    --trainer.val_check_interval 500 --data.batch_size 128 --data.dataset cifar100
- To fully fine-tune a ViT-B/16 model on Food-101, run:
python main.py fit --config configs/full/food101.yaml
- To train linear probes on top of a ViT-B/16 model on Food-101, run:
python main.py fit --config configs/linear/food101.yaml
- To fine-tune a ViT-B/16 model using LoRA on Food-101 (see the LoRA sketch after this list), run:
python main.py fit --config configs/lora/food101.yaml
- To fine-tune a ViT-B/16 model using block expansion on Food-101 (see the block-expansion sketch after this list), run:
python main.py fit --config configs/block/food101.yaml
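For intuition, here is a minimal LoRA sketch; this is a simplified illustration, not the repo's actual implementation. The pretrained weight is frozen and only a low-rank update B·A is trained, with B zero-initialised so training starts from the pretrained model:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update (illustrative only)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path + scaled low-rank update; lora_b starts at zero, so the
        # wrapped layer is initially identical to the pretrained layer
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```

In a ViT such adapters are typically applied to the attention projections; the r=4/8/16 rows in the results below refer to this rank.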
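Block expansion instead inserts copies of existing transformer blocks whose residual-writing projections are zero-initialised, so the copies start out as identity mappings; only the inserted blocks are trained. A rough sketch, assuming a timm-style ViT block with `attn.proj` and `mlp.fc2` submodules (the repo's actual implementation may differ):

```python
import copy
import torch.nn as nn

def expand_blocks(blocks: nn.ModuleList, p: int) -> nn.ModuleList:
    """Insert p copied, identity-initialised blocks into a ViT (illustrative only)."""
    group = len(blocks) // p
    expanded, new_blocks = [], []
    for i, blk in enumerate(blocks):
        expanded.append(blk)
        if (i + 1) % group == 0 and len(new_blocks) < p:
            new_blk = copy.deepcopy(blk)
            # zero the projections that write into the residual stream so the
            # copied block initially behaves as an identity function
            nn.init.zeros_(new_blk.attn.proj.weight)
            nn.init.zeros_(new_blk.attn.proj.bias)
            nn.init.zeros_(new_blk.mlp.fc2.weight)
            nn.init.zeros_(new_blk.mlp.fc2.bias)
            expanded.append(new_blk)
            new_blocks.append(new_blk)
    for blk in blocks:                       # original blocks stay frozen
        blk.requires_grad_(False)
    return nn.ModuleList(expanded)
```

The p=1..4 rows in the results below correspond to the number of inserted blocks.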
To train on a custom dataset, first organize the images into Image Folder format. Then set `--data.dataset custom`, `--data.root path/to/custom/dataset`, and `--data.num_classes <num-dataset-classes>`.
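As a quick sanity check (hypothetical snippet, not part of the repo), you can load the folder with torchvision's `ImageFolder` to confirm the layout and get the class count to pass to `--data.num_classes`:

```python
from torchvision.datasets import ImageFolder

# Expects root/<class_name>/<image>, i.e. one sub-folder per class
dataset = ImageFolder("path/to/custom/dataset")
print(f"{len(dataset.classes)} classes, {len(dataset)} images")
```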
To evaluate a trained model on its test set, find the path of the saved config file for the checkpoint (e.g., `output/cifar10/version_0/config.yaml`) and run:
python main.py test --ckpt_path path/to/checkpoint --config path/to/config
- Note: Make sure the `--trainer.precision` argument is set to the same value used during training.
All results are from fine-tuned ViT-B/16 models pretrained on ImageNet-21k (`--model.model_name vit-b16-224-in21k`). CIFAR-100 is the fine-tuning accuracy, IN-1k is the retained ImageNet-1k accuracy, and MEAN is the average of the two.
Full and partial fine-tuning:

Model | # Params | CIFAR-100 | IN-1k | MEAN | Config |
---|---|---|---|---|---|
All | 85.9 M | 88.13 | 25.24 | 56.69 | Link |
Top-3 | 21.3 M | 84.56 | 74.15 | 79.36 | Link |
Linear | 76.9 K | 80.57 | 76.11 | 78.34 | Link |
LoRA (rank r):

Model | # Params | CIFAR-100 | IN-1k | MEAN | Config |
---|---|---|---|---|---|
r=4 | 301 K | 87.91 | 66.82 | 77.37 | Link |
r=8 | 448 K | 88.27 | 65.99 | 77.13 | Link |
r=16 | 743 K | 87.84 | 65.06 | 76.45 | Link |
Block expansion (p added blocks):

Model | # Params | CIFAR-100 | IN-1k | MEAN | Config |
---|---|---|---|---|---|
p=1 | 7.2 M | 82.72 | 75.75 | 79.24 | Link |
p=2 | 14.3 M | 86.70 | 75.54 | 81.12 | Link |
p=3 | 21.3 M | 88.58 | 74.61 | 81.60 | Link |
p=4 | 28.4 M | 89.09 | 72.28 | 80.69 | Link |
You can cite us using the following:
@InProceedings{Bafghi_2024_CVPR,
author = {Bafghi, Reza Akbarian and Harilal, Nidhin and Monteleoni, Claire and Raissi, Maziar},
title = {Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2024},
pages = {3679-3684}
}