RelViT


If you use the code in this repo and find the project useful,
please consider giving it a star ⭐!

This repository hosts the official code related to the paper "Where are my Neighbors? Exploiting Patches Relations in Self-Supervised Vision Transformer", Guglielmo Camporese, Elena Izzo, Lamberto Ballan - BMVC 2022. [arXiv] [video]


BibTex Citation

@inproceedings{Camporese2022WhereAM,
  title     = {Where are my Neighbors? Exploiting Patches Relations in Self-Supervised Vision Transformer},
  author    = {Guglielmo Camporese and Elena Izzo and Lamberto Ballan},
  booktitle = {British Machine Vision Conference (BMVC)},
  year      = {2022}
}

Updates

  • [22/10/13] Our paper has been accepted to BMVC 2022 (oral spotlight)!
  • [22/06/02] Our paper is on arXiv! Here you can find the link.
  • [22/05/24] Our paper has been selected for a spotlight oral presentation at the CVPR 2022 "T4V: Transformers for Vision" workshop!
  • [22/05/23] Our paper just got accepted at the CVPR 2022 "T4V: Transformers for Vision" workshop!

Install

# clone the repo
git clone https://github.com/guglielmocamporese/relvit.git

# install and activate the conda env
cd relvit
conda env create -f env.yml
conda activate relvit
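
To sanity-check the installation, you can try importing the deep learning stack from the freshly created env. This is just a quick sketch; it assumes env.yml installs PyTorch, which the training scripts rely on.

# quick sanity check (assumes env.yml ships PyTorch)
conda env list | grep relvit
python -c "import torch; print(torch.__version__)"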

Training

All the commands are based on the training scripts in the scripts folder.

Self-Supervised Pre-Training + Supervised Finetuning

Here you can find the commands for:

  1. Running the self-supervised pre-training
# SSL upstream pre-training
bash scripts/upstream.sh \
    --exp_id upstream_cifar10 --backbone vit \
    --model_size small --num_gpus 1 --epochs 100 --dataset cifar10  \
    --weight_decay 0.1 --drop_path_rate 0.1 --dropout 0.0
  2. Running the supervised finetuning using the checkpoint obtained in the previous step.

After running the upstream pre-training, the directory tmp/relvit will contain the checkpoint file checkpoints/best.ckpt, which has to be passed to the finetuning script via the --model_checkpoint argument (a concrete example follows the command below).

# supervised downstream
bash scripts/downstream.sh \
    --exp_id downstream_cifar10 --backbone vit --num_gpus 1 \
    --epochs 100 --dataset cifar10  --weight_decay 0.1 --drop_path_rate 0.1 \
    --model_size small --dropout 0.0 --model_checkpoint checkpoint_path
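
For instance, assuming the default output location mentioned above, checkpoint_path would resolve to tmp/relvit/checkpoints/best.ckpt:

# hypothetical concrete call with the default upstream checkpoint location
bash scripts/downstream.sh \
    --exp_id downstream_cifar10 --backbone vit --num_gpus 1 \
    --epochs 100 --dataset cifar10 --weight_decay 0.1 --drop_path_rate 0.1 \
    --model_size small --dropout 0.0 \
    --model_checkpoint tmp/relvit/checkpoints/best.ckpt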
Downstream-Only Experiment

Here you can find the commands for training the ViT, Swin, and T2T models on the downstream-only supervised task.

# ViT downstream-only
bash scripts/downstream-only.sh \
    --seed 2022 --exp_id downstream-only_vit_cifar10 \
    --backbone vit --dataset cifar10 --weight_decay 0.1 --drop_path_rate 0.1 \
    --model_size small --dropout 0.0 --patch_trans colJitter:0.8-grayScale:0.2

# Swin downstream-only
bash scripts/downstream-only.sh \
    --seed 2022 --exp_id downstream-only_swin_cifar10 \
    --backbone swin  --dataset cifar10_224 --batch_size 64 --weight_decay 0.1 \
    --drop_path_rate 0.1 --model_size tiny --dropout 0.0 \
    --patch_trans colJitter:0.8-grayScale:0.2 

# T2T downstream-only
bash scripts/downstream-only.sh \
    --seed 2022 --exp_id downstream-only_t2t_cifar10 \
    --backbone t2t_vit --dataset cifar10_224 --batch_size 64 --weight_decay 0.1 \
    --drop_path_rate 0.1 --model_size 14 --dropout 0.0
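
The --patch_trans value above appears to encode patch-level augmentations as name:probability pairs joined by dashes (our reading of colJitter:0.8-grayScale:0.2; no other transform names are documented here). A hypothetical variant that only changes the grayscale probability:

# hypothetical: same two transforms with a different grayscale probability
# (the name:prob-name:prob format is inferred, not documented)
bash scripts/downstream-only.sh \
    --seed 2022 --exp_id downstream-only_vit_cifar10_gray05 \
    --backbone vit --dataset cifar10 --weight_decay 0.1 --drop_path_rate 0.1 \
    --model_size small --dropout 0.0 --patch_trans colJitter:0.8-grayScale:0.5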
Mega-Patches Ablation

Here you can find the experiments that use the mega-patches described in the paper. As above, the commands cover both the SSL upstream pre-training with mega-patches and the subsequent supervised finetuning.

# SSL upstream pre-training with 6x6 megapatches 
bash scripts/upstream_MEGApatch.sh \
    --exp_id upstream_megapatch_imagenet100 \
    --backbone vit --model_size small --dataset imagenet100 \
    --batch_size 256 --weight_decay 0.1 --drop_path_rate 0.1 \
    --dropout 0.0 --side_megapatches 6

After running the upstream pre-training, the directory tmp/relvit will contain the checkpoint file checkpoints/best.ckpt, which has to be passed to the finetuning script via the --model_checkpoint argument.

# downstream finetuning
bash scripts/downstream.sh \
    --exp_id downstream_imagenet100 --backbone vit \
    --dataset imagenet100 --weight_decay 0.1 --drop_path_rate 0.1 \
    --model_size small --dropout 0.0 --model_checkpoint checkpoint_path
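
To reproduce the ablation over different mega-patch grid sizes, a simple sweep over --side_megapatches can be scripted as below. The side values are illustrative; the paper's exact grid sizes may differ.

# illustrative sweep over the mega-patch grid side (values are assumptions)
for side in 2 4 6; do
    bash scripts/upstream_MEGApatch.sh \
        --exp_id upstream_megapatch_imagenet100_side${side} \
        --backbone vit --model_size small --dataset imagenet100 \
        --batch_size 256 --weight_decay 0.1 --drop_path_rate 0.1 \
        --dropout 0.0 --side_megapatches ${side}
done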
Supported Datasets

Here you can find the list of all the datasets supported in the repo; each one can be selected through the --dataset argument used in the previous commands (see the example after the list).

  • CIFAR10
  • CIFAR100
  • Flower102
  • SVHN
  • Tiny ImageNet
  • ImageNet100
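
For example, to pre-train on a dataset other than CIFAR10, only the --dataset value changes. The identifier cifar100 below is an assumption, extrapolated from the cifar10 and imagenet100 identifiers used above:

# assumed dataset identifier "cifar100" (follows the cifar10/imagenet100 naming)
bash scripts/upstream.sh \
    --exp_id upstream_cifar100 --backbone vit \
    --model_size small --num_gpus 1 --epochs 100 --dataset cifar100 \
    --weight_decay 0.1 --drop_path_rate 0.1 --dropout 0.0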

Validate
# validation on the upstream task
bash scripts/upstream.sh --dataset cifar10 --mode validation

# validation on the downstream task
bash scripts/downstream.sh --dataset cifar10 --mode validation
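
Presumably a specific checkpoint can also be validated by reusing the --model_checkpoint argument from the finetuning commands; a sketch, assuming the flag is honored in validation mode:

# hypothetical: validate a specific finetuned checkpoint (path is illustrative)
bash scripts/downstream.sh --dataset cifar10 --mode validation \
    --model_checkpoint tmp/relvit/checkpoints/best.ckpt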
