GIST: Generating Image Specific Text

Code for the arXiv paper "GIST: Generating Image-Specific Text for Fine-grained Object Classification".

Setting up Environment

You can set up a conda environment with required packages for GIST as follows:

conda create -n gist python=3.9.16
conda activate gist
conda install --file requirements.txt

Setting up Datasets

All datasets are located in the datasets/ directory. Please download images for each dataset into an images/ directory in each dataset folder. The download links are below:

aircraft: https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz
CUB: https://data.caltech.edu/records/20098
flower: https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz
fitzpatrick: https://github.com/mattgroh/fitzpatrick17k
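
Before running any of the scripts, it can help to confirm that every dataset folder has its images in place. The following is a minimal sketch, not part of the repository: the helper name `datasets_missing_images` is hypothetical, and it assumes only the layout described above (one folder per dataset under datasets/, each with an images/ subdirectory).

```python
from pathlib import Path


def datasets_missing_images(root="datasets"):
    """Return names of dataset folders under `root` with no populated images/ directory.

    Hypothetical helper, assuming the datasets/<name>/images/ layout described above.
    """
    missing = []
    for dataset_dir in sorted(Path(root).iterdir()):
        if not dataset_dir.is_dir():
            continue  # skip stray files placed directly in the root
        images_dir = dataset_dir / "images"
        # A dataset counts as ready only if images/ exists and holds at least one entry.
        if not images_dir.is_dir() or not any(images_dir.iterdir()):
            missing.append(dataset_dir.name)
    return missing
```

Calling `datasets_missing_images()` from the repository root returns the names of the dataset folders that still need a download; an empty list means every folder has images.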

We also provide example 5-, 3-, and 1-shot metadata files for each dataset. The Fitzpatrick40 metadata file (datasets/fitzpatrick40/metadata.csv) can be used to create our cleaned Fitzpatrick40 dataset from the original Fitzpatrick17k dataset.

Caption Matching

We provide JSON files of our top-5 matched captions for the training images of the aircraft, CUB, flower, and fitzpatrick datasets. We also provide an example script, run_caption_matching_example.sh, that shows how to match captions for a novel dataset.
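
To make the idea of "top-5 matched captions" concrete, here is a minimal sketch of top-k matching by cosine similarity. It is an illustration under assumptions, not the code in caption_matching.py: it assumes image and caption embeddings have already been computed (for example with CLIP) and stored as NumPy arrays, and the function name `top_k_captions` is hypothetical.

```python
import numpy as np


def top_k_captions(image_embeddings, caption_embeddings, k=5):
    """Return, for each image, the indices of the k most similar captions, best first.

    Hypothetical helper; assumes precomputed embeddings of shape
    (num_images, dim) and (num_captions, dim).
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    img = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    cap = caption_embeddings / np.linalg.norm(caption_embeddings, axis=1, keepdims=True)
    similarity = img @ cap.T  # shape: (num_images, num_captions)
    # Sort each row by descending similarity and keep the first k columns.
    return np.argsort(-similarity, axis=1)[:, :k]
```

In practice the candidate captions for an image would likely be restricted to a relevant pool (for example, captions written for that image's class) before ranking; that filtering is left out of this sketch.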

Fine-Grained Classification

We provide scripts to run all-, 5-, 3-, and 1-shot fine-grained classification on each of the aircraft, CUB, flower, and fitzpatrick datasets. For example, to run 5-shot fine-grained classification on the aircraft dataset, first run contrastive fine-tuning:

CUDA_VISIBLE_DEVICES=0 python contrastive_training_clip.py --metadata datasets/aircraft/metadata_5shot.csv --captions_file datasets/aircraft/captions_top5.json --num_captions 4 --image_folder datasets/aircraft/images/ --output_file aircraft_5shot
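
For readers unfamiliar with contrastive fine-tuning, the sketch below shows a symmetric image-text contrastive loss of the kind used to train CLIP, written in NumPy for clarity. It is an assumption that contrastive_training_clip.py optimizes an objective of this general form; the function names, the temperature value, and the one-caption-per-image pairing are illustrative choices, and how the script uses the `--num_captions` captions per image is not shown here.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Mean of image-to-text and text-to-image cross-entropy for a batch of matched pairs.

    Illustrative only: row i of each array is assumed to be a matched image/caption pair.
    """
    img = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    txt = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # row i, column j: image i vs. caption j
    diag = np.arange(len(logits))
    # Matched pairs sit on the diagonal; every other entry acts as a negative.
    loss_image_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_image_to_text + loss_text_to_image) / 2
```

The loss is small when each image is most similar to its own caption and grows when captions are mismatched, which is what pushes the image encoder toward features that separate fine-grained classes.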

Once contrastive fine-tuning has finished, create CLIP embeddings for the images using the fine-tuned network:

CUDA_VISIBLE_DEVICES=0 python create_clip_embeddings.py --metadata datasets/aircraft/metadata_5shot.csv --clip_weights aircraft_5shot_epoch_39.pt --image_folder datasets/aircraft/images/ --output_file datasets/aircraft/aircraft_5shot_embeddings.pkl

Finally, train a linear probe on the image embeddings from the fine-tuned CLIP network:

CUDA_VISIBLE_DEVICES=0 python linear_probe.py --image_embedding_file datasets/aircraft/aircraft_5shot_embeddings.pkl --metadata datasets/aircraft/metadata_5shot.csv
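
The three commands above share a naming pattern, so they can be generated for other few-shot settings. The sketch below is a hypothetical helper, not part of the repository. It assumes that other datasets and shot counts follow the same file names as the aircraft 5-shot example (metadata_{N}shot.csv, captions_top5.json) and that the final checkpoint is always saved with the `_epoch_39.pt` suffix; check both against your own files before relying on it.

```python
import shlex


def build_pipeline_commands(dataset, shots, gpu=0, num_captions=4, final_epoch=39):
    """Return the three shell commands for one few-shot run, in execution order.

    Hypothetical helper; file names are extrapolated from the aircraft 5-shot example.
    """
    root = f"datasets/{dataset}"
    run = f"{dataset}_{shots}shot"
    metadata = f"{root}/metadata_{shots}shot.csv"
    images = f"{root}/images/"
    embeddings = f"{root}/{run}_embeddings.pkl"
    steps = [
        # 1. Contrastive fine-tuning of CLIP on image/caption pairs.
        ["contrastive_training_clip.py", "--metadata", metadata,
         "--captions_file", f"{root}/captions_top5.json",
         "--num_captions", str(num_captions),
         "--image_folder", images, "--output_file", run],
        # 2. Embed the images with the fine-tuned weights (assumed checkpoint name).
        ["create_clip_embeddings.py", "--metadata", metadata,
         "--clip_weights", f"{run}_epoch_{final_epoch}.pt",
         "--image_folder", images, "--output_file", embeddings],
        # 3. Train the linear probe on the saved embeddings.
        ["linear_probe.py", "--image_embedding_file", embeddings,
         "--metadata", metadata],
    ]
    prefix = f"CUDA_VISIBLE_DEVICES={gpu} python"
    return [f"{prefix} {shlex.join(step)}" for step in steps]
```

For instance, `print("\n".join(build_pipeline_commands("aircraft", 5)))` reproduces the three commands shown above, and changing `shots` to 3 or 1 swaps in the corresponding metadata file.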

Citation

If you find this repository helpful, please cite our paper:

@article{LewisMu2023,
  title={GIST: Generating Image-Specific Text for Fine-grained Object Classification},
  author={Kathleen M Lewis* and Emily Mu* and Adrian V Dalca and John Guttag},
  journal={arXiv preprint arXiv:2307.11315},
  year={2023}
}

