IbM2

This repo is the official implementation of our CVPR 2024 paper "Instance-based Max-margin for Practical Few-shot Recognition" [arXiv][paper][appendix][poster][video].

TL;DR

This paper proposes:

  • A practical FSL (pFSL) setting, which builds on unsupervised pretrained models and recognizes many novel classes simultaneously.
  • IbM2, an instance-based max-margin method based on the Gaussian Annulus Theorem. IbM2 converts random noise applied to the instances into a mechanism for achieving a maximum margin (see the sketch below).
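To make the mechanism concrete, here is a minimal sketch of the idea in PyTorch (the function name and training loop are our illustration, not the repository's exact code): by the Gaussian Annulus Theorem, isotropic noise N(0, σ²I) in dimension d concentrates near radius σ√d, so training the classifier to fit the noisy copies of each instance pushes the decision boundary at least roughly that far away.

```python
import torch
import torch.nn.functional as F

def ibm2_step(classifier, feats, labels, sigma, optimizer):
    """One hypothetical training step of the instance-based max-margin idea.

    Gaussian noise N(0, sigma^2 I) in dimension d concentrates on an annulus
    of radius ~ sigma * sqrt(d); correctly classifying the perturbed copies
    therefore enforces a margin of roughly that radius around each instance.
    """
    noise = sigma * torch.randn_like(feats)   # isotropic Gaussian perturbation
    logits = classifier(feats + noise)        # classify the noisy instances
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```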


Environment

  • python 3.8
  • pytorch >= 1.7
  • torchvision >= 0.8
  • timm 0.4.9

Data Preparation

1. Datasets

ImageNet

The ImageNet dataset is a large-scale image dataset widely used for various computer vision tasks.

Download
  1. Register and download the dataset from the official ImageNet website.
  2. Follow the instructions to download the ILSVRC2012 dataset, which includes training and validation images.
Structure

After downloading, organize the dataset into the following directory structure:

```
/path/to/imagenet/
    train/
        n01440764/
            n01440764_18.JPEG
            ...
        ...
    val/
        n01440764/
            ILSVRC2012_val_00000293.JPEG
            ...
        ...
```
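This is the standard class-per-subfolder layout, so you can sanity-check it with torchvision's ImageFolder (paths below are placeholders):

```python
from torchvision import datasets, transforms

# Each wnid subfolder (e.g. n01440764) becomes one class label.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=transform)
print(len(train_set.classes))  # expect 1000 classes for ImageNet-1K
```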

CUB-200-2011

The CUB-200-2011 (Caltech-UCSD Birds-200-2011) dataset is a widely used dataset for fine-grained visual categorization tasks, specifically bird species classification.

Download
  1. Download the dataset from the official CUB-200-2011 website.
  2. Extract the downloaded tar file.
Structure

After extracting, organize the dataset into the following directory structure:

```
/path/to/cub_200_2011/
    images/
        001.Black_footed_Albatross/
            Black_Footed_Albatross_0001_796111.jpg
            ...
        ...
    train_test_split.txt
```

Use the script `datasets/cub_preprocess.py` to organize the `images` folder into train and test folders. After running the script, you will have two additional folders named `train` and `test` in the dataset root (a rough sketch of this step follows).
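For reference, the split can also be reproduced manually; here is a sketch under the assumption that the standard CUB metadata files `images.txt` and `train_test_split.txt` are present (the provided script may differ in details):

```python
import os
import shutil

root = "/path/to/cub_200_2011"

# images.txt maps "<image_id> <relative_path>";
# train_test_split.txt maps "<image_id> <1 if train else 0>".
with open(os.path.join(root, "images.txt")) as f:
    id_to_path = dict(line.split() for line in f)
with open(os.path.join(root, "train_test_split.txt")) as f:
    id_to_split = dict(line.split() for line in f)

for image_id, rel_path in id_to_path.items():
    split = "train" if id_to_split[image_id] == "1" else "test"
    dst_dir = os.path.join(root, split, os.path.dirname(rel_path))
    os.makedirs(dst_dir, exist_ok=True)
    shutil.copy(os.path.join(root, "images", rel_path), dst_dir)
```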

2. Feature Extraction

IbM2 operates directly on the features extracted by a backbone. To speed up evaluation, you can precompute and store the features with the provided scripts extract_features.py and extract_features_imagenet_1pt.py.

```bash
python extract_features.py
```

Parameters:

  • dataset - choices: Imagenet or CUB.
  • shot - choices: 1, 2, 3, 4, 5, 8, 16.
  • save_test - whether to save the features of the test set.
  • arch - model architecture; choices: deit_small_p16, deit_large_p7, deit_base_p4, resnet50.
  • batch_size - batch size used to extract the features.
  • pretrain_method - unsupervised pretraining method; choices: DINO, MSN, MoCov3, SimCLR, BYOL.

or

```bash
python extract_features_imagenet_1pt.py
```
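Conceptually, either script runs the frozen backbone once over the dataset and caches the pooled features. A minimal sketch of that loop (the model name, checkpoint path, and output file are illustrative, not the scripts' exact interface):

```python
import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical setup: a frozen ViT-S/16 backbone from timm; num_classes=0
# makes the model return pooled features instead of classification logits.
model = timm.create_model("vit_small_patch16_224", num_classes=0)
state = torch.load("/path/to/dino_vits16_backbone.pth", map_location="cpu")
model.load_state_dict(state, strict=False)
model.eval().cuda()

transform = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor()])
dataset = datasets.ImageFolder("/path/to/imagenet/train", transform=transform)
loader = DataLoader(dataset, batch_size=256, num_workers=8)

feats, labels = [], []
with torch.no_grad():
    for images, targets in loader:
        feats.append(model(images.cuda()).cpu())   # cache features once
        labels.append(targets)
torch.save({"features": torch.cat(feats), "labels": torch.cat(labels)},
           "imagenet_train_features.pt")
```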

Note: before running either script, you should:

  1. download the few-shot annotations (the few_shot_split folder) (link);
  2. download the backbone checkpoints (refer to the table below);
  3. set the paths correctly in config.py.

The template of config.py looks like:

  • IMAGENET_PATH: root path of the ImageNet dataset
  • CUB_PATH: root path of the CUB dataset
  • IMAGENET_1PT_PATH: root path of the 1%-ImageNet dataset
  • SPLIT_PATH: path of the few-shot annotation files (/path/to/few_shot_split)
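Under that template, config.py can be as simple as the following (all paths are placeholders for your local copies):

```python
# config.py -- illustrative values only; point these at your local data.
IMAGENET_PATH = "/path/to/imagenet"            # root of ImageNet-1K
CUB_PATH = "/path/to/cub_200_2011"             # root of CUB-200-2011
IMAGENET_1PT_PATH = "/path/to/imagenet_1pt"    # root of 1%-ImageNet
SPLIT_PATH = "/path/to/few_shot_split"         # few-shot annotation files
```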

We experiment with IbM2 using various unsupervised pretraining methods pretrained on ImageNet-1K. We provide the backbone checkpoints and the corresponding extracted features to reproduce our results:

| Method | Architecture | Checkpoint | ImageNet Features | CUB Features | 1-pct Features |
| ------ | ------------ | ---------- | ----------------- | ------------ | -------------- |
| DINO | ViT-S/16 | backbone | imagenet features | CUB features | 1%-imagenet features |
| MoCov3 | ViT-S/16 | backbone | imagenet features | - | 1%-imagenet features |
| MSN | ViT-S/16 | backbone | imagenet features | CUB features | 1%-imagenet features |
| MSN | ViT-B/4 | backbone | imagenet features | - | 1%-imagenet features |
| MSN | ViT-L/7 | backbone | imagenet features | CUB features | 1%-imagenet features |
| SimCLR | ResNet50 | backbone | imagenet features | - | 1%-imagenet features |
| BYOL | ResNet50 | backbone | imagenet features | - | 1%-imagenet features |

Usage

Experiments on ImageNet-1K & CUB datasets

```bash
bash scripts/bsearch_finetune_search_continue_channel_wise.sh $cuda_id $shot $dataset $arch $pretrain_method
```

Parameters:

  • cuda_id - int: CUDA device index to run the code on.
  • shot - int: number of training shots; choices: 1, 2, 3, 4, 5, 8, 16.
  • dataset - string: choices: Imagenet or CUB.
  • arch - model architecture; choices: deit_small_p16, deit_large_p7, deit_base_p4, resnet50.
  • pretrain_method - unsupervised pretraining method; choices: DINO, MSN, MoCov3, SimCLR, BYOL.

An example to run the code is:

```bash
bash scripts/bsearch_finetune_search_continue_channel_wise.sh 0 1 Imagenet deit_small_p16 DINO
```

This configuration runs the ImageNet 1-shot experiment using features from DINO (ViT-S/16).

Experiments on 1%-ImageNet semi-supervised learning

The script to run the code is similar in this case:

```bash
bash scripts/bsearch_finetune_search_continue_channel_wise_imagenet_subsets.sh $cuda_id $arch $pretrain_method
```

Citation

If you find this project helpful, please consider citing our paper:

```bibtex
@inproceedings{fu2024ibm2,
    title={Instance-based Max-margin for Practical Few-shot Recognition},
    author={Fu, Minghao and Zhu, Ke},
    booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024}
}
```

Acknowledgement

The code is built upon timm.