This repo is the official implementation of our CVPR 2024 paper "Instance-based Max-margin for Practical Few-shot Recognition" [arXiv][paper][appendix][poster][video].
This paper proposes:
- A practical FSL (pFSL) setting, which builds on unsupervised pretrained models and recognizes many novel classes simultaneously.
- IbM2, an instance-based max-margin method based on the Gaussian Annulus Theorem. IbM2 converts random noise applied to the instances into a mechanism for achieving maximum margin.
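As a rough, hypothetical sketch of the idea (not the repo's implementation; the linear head, noise scale, and training loop here are all assumptions), perturbing each stored feature with Gaussian noise while fitting a classifier forces the decision boundary to stay clear of every training instance:

```python
import torch
import torch.nn as nn

def train_noisy_classifier(feats, labels, num_classes, sigma=0.1, steps=100, lr=1e-3):
    """Fit a linear head on Gaussian-perturbed features (illustrative sketch).

    By the Gaussian Annulus Theorem, x + N(0, sigma^2 I) concentrates on a
    thin annulus of radius ~ sigma * sqrt(d) around x in high dimension d,
    so classifying the noisy copies correctly enforces a margin of roughly
    that radius around each training instance.
    """
    head = nn.Linear(feats.shape[1], num_classes)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(steps):
        noisy = feats + sigma * torch.randn_like(feats)  # resample noise each step
        loss = nn.functional.cross_entropy(head(noisy), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head
```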
Dependencies:
- python 3.8
- pytorch >= 1.7
- torchvision >= 0.8
- timm 0.4.9
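For example, the dependencies can be installed with pip (exact pins depend on your environment):

pip install "torch>=1.7" "torchvision>=0.8" timm==0.4.9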
The ImageNet dataset is a large-scale image dataset widely used for various computer vision tasks.
- Register and download the dataset from the official ImageNet website.
- Follow the instructions to download the ILSVRC2012 dataset, which includes training and validation images.
After downloading, organize the dataset into the following directory structure:
```
/path/to/imagenet/
    train/
        n01440764/
            n01440764_18.JPEG
            ...
        ...
    val/
        n01440764/
            ILSVRC2012_val_00000293.JPEG
            ...
        ...
```
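As a quick sanity check of the layout (the path is a placeholder), both folders should load with torchvision's ImageFolder:

```python
from torchvision import datasets

train_set = datasets.ImageFolder("/path/to/imagenet/train")
val_set = datasets.ImageFolder("/path/to/imagenet/val")
print(len(train_set.classes), len(train_set))  # expect 1000 classes, ~1.28M images
print(len(val_set.classes), len(val_set))      # expect 1000 classes, 50000 images
```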
- For the 1%-ImageNet variant, please refer to imagenet_subsets and build_imagenet_subsets for more details. For convenience, you can also directly download the training set from this link.
The CUB-200-2011 (Caltech-UCSD Birds-200-2011) dataset is a widely used dataset for fine-grained visual categorization tasks, specifically bird species classification.
- Download the dataset from the official CUB-200-2011 website.
- Extract the downloaded tar file.
After extracting, organize the dataset into the following directory structure:
```
/path/to/cub_200_2011/
    images/
        001.Black_footed_Albatross/
            Black_Footed_Albatross_0001_796111.jpg
            ...
        ...
    train_test_split.txt
```
Use the script datasets/cub_preprocess.py to organize the images folder into train and test folders. After running it, two additional folders named train and test will appear in the dataset root directory.
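For reference, a minimal sketch of what such a split does, assuming the standard CUB metadata files images.txt and train_test_split.txt (the repo's script may differ in details):

```python
import os
import shutil

root = "/path/to/cub_200_2011"  # placeholder

# images.txt: "<image_id> <relative_path>"; train_test_split.txt: "<image_id> <is_training>"
with open(os.path.join(root, "images.txt")) as f:
    id_to_path = dict(line.split() for line in f)
with open(os.path.join(root, "train_test_split.txt")) as f:
    id_to_split = dict(line.split() for line in f)

for img_id, rel_path in id_to_path.items():
    split = "train" if id_to_split[img_id] == "1" else "test"
    dst = os.path.join(root, split, os.path.dirname(rel_path))
    os.makedirs(dst, exist_ok=True)
    shutil.copy(os.path.join(root, "images", rel_path), dst)
```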
IbM2 operates directly on features extracted by backbones. To facilitate evaluation, you can precompute and store the features with the provided scripts extract_features.py and extract_features_imagenet_1pt.py.
python extract_features.py
Parameters:
- dataset - choices: Imagenet or CUB.
- shot - choices: 1, 2, 3, 4, 5, 8, 16.
- save_test: whether to save the features for testing.
- arch: model architecture - choices: deit_small_p16, deit_large_p7, deit_base_p4, resnet50.
- batch_size: batch size used for feature extraction.
- pretrain_method: unsupervised pretraining method - choices: DINO, MSN, MoCov3, SimCLR, BYOL.
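A hypothetical invocation (flag names are inferred from the parameter list above; check the script's argument parser for the exact names):

python extract_features.py --dataset Imagenet --shot 1 --arch deit_small_p16 --pretrain_method DINO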
Alternatively, for 1%-ImageNet:
python extract_features_imagenet_1pt.py
Note: Before extracting features, you should
- download the few-shot annotations (few_shot_split folder) (link),
- download the backbone checkpoints (refer to the table below),
- correctly set the paths in config.py.
The template of config.py looks like:
- IMAGENET_PATH: root path of imagenet dataset
- CUB_PATH: root path of CUB dataset
- IMAGENET_1PT_PATH: root path of 1%-imagenet dataset
- SPLIT_PATH: path of few-shot annotations files (/path/to/few_shot_split)
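For instance, a filled-in config.py might look like this (variable names come from the list above; plain module-level constants and all paths are assumptions/placeholders):

```python
# config.py -- set these to your local paths
IMAGENET_PATH = "/path/to/imagenet"          # root path of the ImageNet dataset
CUB_PATH = "/path/to/cub_200_2011"           # root path of the CUB dataset
IMAGENET_1PT_PATH = "/path/to/imagenet_1pt"  # root path of the 1%-ImageNet dataset
SPLIT_PATH = "/path/to/few_shot_split"       # few-shot annotation files
```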
We experiment with IbM2 using various unsupervised pretraining methods pretrained on ImageNet-1K. We provide the backbone checkpoints and the corresponding extracted features to reproduce our results:
Method | Architecture | Checkpoint | ImageNet Features | CUB Features | 1%-ImageNet Features |
---|---|---|---|---|---|
DINO | ViT-S/16 | backbone | imagenet features | CUB features | 1%-imagenet features |
MoCov3 | ViT-S/16 | backbone | imagenet features | - | 1%-imagenet features |
MSN | ViT-S/16 | backbone | imagenet features | CUB features | 1%-imagenet features |
MSN | ViT-B/4 | backbone | imagenet features | - | 1%-imagenet features |
MSN | ViT-L/7 | backbone | imagenet features | CUB features | 1%-imagenet features |
SimCLR | ResNet50 | backbone | imagenet features | - | 1%-imagenet features |
BYOL | ResNet50 | backbone | imagenet features | - | 1%-imagenet features |
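If a checkpoint link is unavailable, some backbones can also be fetched from their official releases; for example, the DINO ViT-S/16 backbone is available through torch.hub:

```python
import torch

# Official DINO release; downloads the pretrained ViT-S/16 weights.
backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
backbone.eval()
```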
To train and evaluate IbM2, run:
bash scripts/bsearch_finetune_search_continue_channel_wise.sh $cuda_id $shot $dataset $arch $pretrain_method
Parameters:
- cuda_id - int: CUDA device index on which to run the code.
- shot - int: training shot, choices - 1, 2, 3, 4, 5, 8, 16.
- dataset - string: choices - Imagenet or CUB.
- arch: model architecture - choices: deit_small_p16, deit_large_p7, deit_base_p4, resnet50.
- pretrain_method: unsupervised pretraining method - choices: DINO, MSN, MoCov3, SimCLR, BYOL.
An example to run the code is:
bash scripts/bsearch_finetune_search_continue_channel_wise.sh 0 1 Imagenet deit_small_p16 DINO
which runs the ImageNet 1-shot experiment using features from DINO (ViT-S/16).
For 1%-ImageNet, the script is similar:
bash scripts/bsearch_finetune_search_continue_channel_wise_imagenet_subsets.sh $cuda_id $arch $pretrain_method
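For example:

bash scripts/bsearch_finetune_search_continue_channel_wise_imagenet_subsets.sh 0 deit_small_p16 DINO

which runs the 1%-ImageNet experiment on GPU 0 using features from DINO (ViT-S/16).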
If this project is helpful to you, please cite our paper:
@inproceedings{fu2024ibm2,
    title={Instance-based Max-margin for Practical Few-shot Recognition},
    author={Fu, Minghao and Zhu, Ke},
    booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024},
}
The code is built upon timm.