Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts

This repo contains the code and auxiliary data for our ICCV-W paper 'Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts'.

The GPT-4-generated descriptions are available in the gpt4_data folder as {dataset_name}.pt files.
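To take a quick look at the descriptions, you can load a file with torch.load. The snippet below is a minimal sketch that assumes each .pt file is a torch-saved Python dictionary keyed by class name (the exact structure may vary by dataset, so inspect the output):

python -c "import torch; d = torch.load('gpt4_data/cub.pt'); print(type(d)); print(list(d)[:3])"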

How to Install

This code is built on top of the Dassl.pytorch toolbox, so you need to install the dassl environment first. Follow the instructions in the Dassl.pytorch repository to install dassl as well as PyTorch. After that, run pip install -r requirements.txt under CoOp/ to install a few more packages required by CLIP (do this with the dassl environment activated). Then you are ready to go.
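For reference, an end-to-end setup could look like the following. This is a minimal sketch that assumes the standard Dassl.pytorch workflow (conda environment plus setup.py develop) and that this repository is cloned next to Dassl.pytorch; treat the Dassl.pytorch instructions as authoritative.

# install Dassl.pytorch (sketch; see the Dassl.pytorch repo for the exact steps)
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch
conda create -y -n dassl python=3.8   # Python version per the Dassl instructions
conda activate dassl
pip install -r requirements.txt
python setup.py develop
cd ..

# install the extra packages required by CLIP (with dassl activated)
cd VDT-Adapter/CoOp
pip install -r requirements.txt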

Follow DATASETS.md to install the datasets.

How to Run

For the zero-shot (ZS) experiments, use the following script:

bash scripts/clip/main_gpt.sh cub vit_b16_c16_ep10_batch1 all zs_gpt_v

The arguments are dataset_name, encoder_config, class-sampling (base, new, or all classes), and exp_name.

For the few-shot experiments, use the following script:

bash scripts/clip_adapter/main_gpt.sh cub vit_b16_c16_ep10_batch1 end 16 16 False base self_attn 0.2 self0.2_b2n_3-5

The arguments are dataset_name, encoder_config, coop1, coop2, n_shot, coop3, class-sampling (base, new, or all classes), adapter-type, residual-ratio, and exp_name. The coop1-coop3 arguments are left over from older code and are not used in clip_adapter_gpt.py.

main.sh is the script for running the default CLIP-Adapter.

Please refer to b2n_adapters.sh for the scripts for all shots and all datasets (with tuned residual ratios) for CLIP-A-self in the base-to-new setting.

The residual ratio has to be tuned for each dataset/shot setting.
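For example, a simple sweep over candidate ratios for a single dataset/shot setting could look like this (a sketch; the argument order follows the few-shot command above, and the exp_name values are placeholders):

for ratio in 0.2 0.4 0.6 0.8; do
  bash scripts/clip_adapter/main_gpt.sh cub vit_b16_c16_ep10_batch1 end 16 16 False base self_attn ${ratio} sweep_ratio_${ratio}
done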

Citation

If you use this code in your research, please cite our paper:

@article{maniparambil2023enhancing,
  title={Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts},
  author={Maniparambil, Mayug and Vorster, Chris and Molloy, Derek and Murphy, Noel and McGuinness, Kevin and O'Connor, Noel E},
  journal={arXiv preprint arXiv:2307.11661},
  year={2023}
}

This codebase is built on top of CoOp and CLIP-Adapter.

Contact me at mayugmaniparambil@gmail.com for any issues, discussions or collaborations.
