The official implementation of A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models.
Julio Silva-Rodriguez,
Sina Hajimiri,
Ismail Ben Ayed,
Jose Dolz
ÉTS Montreal
| Project | Paper | Code |
When adapting CLIP with only a few shots, it is unrealistic to assume access to a validation subset to empirically fix a set of hyperparameters per task, i.e., model selection. We propose two solutions that require no hyperparameter tuning and thus adapt the model strictly from the support samples (a minimal sketch of both follows the list below):
- A revisited zero-shot initialized Linear Probe (ZS-LP), tailored for CLIP-alike vision-language models.
- A constraint formulation to retain prior knowledge of the robust zero-shot prototypes per class, CLass adaptive Linear Probing (CLAP).
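Conceptually, ZS-LP initializes the linear classifier with the CLIP zero-shot text prototypes and then trains it on the support samples, while CLAP adds a class-wise l2 penalty that keeps each learned prototype close to its zero-shot counterpart. The PyTorch sketch below is purely illustrative and is not the repository code: the function names, the fixed temperature, and the pre-computed class-adaptive weights `lambdas` are simplifying assumptions (in the paper these weights are derived class-adaptively from the support set).

```python
# Illustrative PyTorch sketch of ZS-LP and a CLAP-style objective (not the repository code).
import torch
import torch.nn.functional as F


def zs_init_linear_probe(text_prototypes):
    """ZS-LP: the class weights of the linear probe start from the
    L2-normalized CLIP text embeddings (zero-shot prototypes)."""
    # text_prototypes: (num_classes, dim)
    return torch.nn.Parameter(F.normalize(text_prototypes, dim=-1).clone())


def clap_objective(image_features, labels, weights, text_prototypes,
                   lambdas, temperature=0.07):
    """Cross-entropy on the support samples plus a class-wise l2 penalty
    that keeps each learned prototype close to its zero-shot counterpart.
    `lambdas` (num_classes,) are the class-adaptive penalty weights,
    assumed pre-computed here."""
    logits = (F.normalize(image_features, dim=-1)
              @ F.normalize(weights, dim=-1).t()) / temperature
    ce = F.cross_entropy(logits, labels)
    penalty = (lambdas * (weights - text_prototypes).pow(2).sum(dim=-1)).mean()
    return ce + penalty
```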
This repository requires installing the environment and the datasets:
- follow here to install Dassl.pytorch and PyTorch.
- run pip install -r requirements.txt under CLAP/ to install a few more packages required by CLIP (this should be done when dassl is activated).
- follow DATASETS.md to install the datasets.
PS: You can also follow CoOp to perform the installation.
We present the basic usage here.
(a) Zero-shot initialized Linear Probe (ZS-LP):
bash scripts/adapt.sh 0 imagenet SGD_lr1e-1_B256_ep300 1 ZS none RN50
(b) CLass adaptive Linear Probing (CLAP):
bash scripts/adapt.sh 0 imagenet SGD_lr1e-1_B256_ep300 1 ZS l2 RN50
(c) Test domain generalization:
bash scripts/eval.sh 0 imagenet imagenetv2 SGD_lr1e-1_B256_ep300 1 ZS l2 RN50
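In the commands above, the ZS argument selects the zero-shot initialization, and none / l2 select the constraint toward the zero-shot prototypes (none for ZS-LP, l2 for CLAP). For context, the sketch below shows one common way to build such zero-shot prototypes with the OpenAI clip package; the prompt template and class names are illustrative placeholders, not the repository's prompt engineering.

```python
# Illustrative sketch: building zero-shot class prototypes with the OpenAI
# clip package (prompt template and class names are placeholders).
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("RN50", device=device)  # same backbone as in the commands above

class_names = ["goldfish", "tiger shark", "hammerhead"]  # hypothetical classes
tokens = clip.tokenize([f"a photo of a {c}." for c in class_names]).to(device)

with torch.no_grad():
    text_features = model.encode_text(tokens)
    prototypes = text_features / text_features.norm(dim=-1, keepdim=True)

# `prototypes` (num_classes, dim) can then initialize the linear probe (ZS-LP)
# and serve as the anchors of the l2 constraint (CLAP).
```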
This repository is mainly based on the CoOp and TaskRes code bases. We sincerely thank the authors of these prior works for their awesome code.
If you find this repository useful, please consider citing this paper:
@inproceedings{clap24,
title={A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models},
author={Julio Silva-Rodr\'iguez and Sina Hajimiri and Ismail Ben Ayed and Jose Dolz},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}