emu1729/GIST

GIST: Generating Image Specific Text

Code for the arXiv paper "GIST: Generating Image-Specific Text for Fine-grained Object Classification"

Setting up Environment

You can set up a conda environment with required packages for GIST as follows:

conda create -n gist python=3.9.16
conda activate gist
conda install --file requirements.txt

Setting up Datasets

All datasets are located in the datasets/ directory. Please download images for each dataset into an images/ directory in each dataset folder. The download links are below:

We also provide example 5-shot, 3-shot, and 1-shot metadata files for each dataset. The Fitzpatrick40 metadata file (datasets/fitzpatrick40/metadata.csv) can be used to create our cleaned Fitzpatrick40 dataset from the original Fitzpatrick17k dataset.
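Before training, it can help to confirm that each dataset folder actually contains a populated images/ directory. The helper below is a minimal sketch and is not part of the repository: the function name `missing_image_dirs` is hypothetical, and the dataset folder names are passed in by the caller because only datasets/aircraft/ and datasets/fitzpatrick40/ are named explicitly in this README.

```python
from pathlib import Path


def missing_image_dirs(datasets_root, names):
    """Return the dataset names whose images/ directory is absent or empty.

    datasets_root: path to the datasets/ directory.
    names: dataset folder names to check (supplied by the caller).
    """
    missing = []
    for name in names:
        images = Path(datasets_root) / name / "images"
        # A dataset is flagged when images/ does not exist or has no entries.
        if not images.is_dir() or not any(images.iterdir()):
            missing.append(name)
    return missing
```

For example, `missing_image_dirs("datasets", ["aircraft", "fitzpatrick40"])` returns the subset of those two folders that still needs images downloaded.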

Caption Matching

We provide JSON files of our top-5 matched captions for the training images of the aircraft, CUB, flower, and fitzpatrick datasets. We also provide an example script, run_caption_matching_example.sh, that shows how to match captions in case you want to try this on a novel dataset.
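To illustrate what selecting the "top-5 matched captions" for an image can look like, here is a minimal sketch that ranks candidate captions by cosine similarity to an image embedding. It is an assumption-laden illustration, not the logic of run_caption_matching_example.sh: it assumes the image and caption embeddings have already been computed by some shared image-text encoder, and the names `cosine` and `top_k_captions` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_captions(image_vec, caption_vecs, captions, k=5):
    """Return the k captions whose embeddings are most similar to image_vec.

    Assumes caption_vecs[i] is the precomputed embedding of captions[i].
    """
    scored = sorted(
        zip(captions, caption_vecs),
        key=lambda pair: cosine(image_vec, pair[1]),
        reverse=True,
    )
    return [caption for caption, _ in scored[:k]]
```

The provided JSON files already contain matched captions for the four datasets, so a step like this would only matter when preparing captions for a new dataset.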

Fine-Grained Classification

We provide scripts to run fine-grained classification in the full-data (all), 5-shot, 3-shot, and 1-shot settings on each of the aircraft, CUB, flower, and fitzpatrick datasets. For example, to run 5-shot fine-grained classification on the aircraft dataset, first run contrastive fine-tuning:

CUDA_VISIBLE_DEVICES=0 python contrastive_training_clip.py --metadata datasets/aircraft/metadata_5shot.csv --captions_file datasets/aircraft/captions_top5.json --num_captions 4 --image_folder datasets/aircraft/images/ --output_file aircraft_5shot

Once contrastive fine-tuning has finished, you can create CLIP embeddings for the images using the fine-tuned network as follows:

CUDA_VISIBLE_DEVICES=0 python create_clip_embeddings.py --metadata datasets/aircraft/metadata_5shot.csv --clip_weights aircraft_5shot_epoch_39.pt --image_folder datasets/aircraft/images/ --output_file datasets/aircraft/aircraft_5shot_embeddings.pkl

Finally, you can train a linear probe on the image embeddings from the fine-tuned CLIP network as follows:

CUDA_VISIBLE_DEVICES=0 python linear_probe.py --image_embedding_file datasets/aircraft/aircraft_5shot_embeddings.pkl --metadata datasets/aircraft/metadata_5shot.csv
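The three commands above share a naming pattern, so they can be generated for other few-shot settings with a small helper. The sketch below is hypothetical and not part of the repository: `build_pipeline` is an invented name, and it assumes that the file naming shown for the aircraft 5-shot example (metadata_5shot.csv, captions_top5.json, and a checkpoint named `<output_file>_epoch_39.pt`) carries over to other datasets and shot counts. It only builds the command lists; it does not cover the full-data (all) setting, whose metadata file name is not shown here.

```python
import shlex


def build_pipeline(dataset, shot, epoch=39, num_captions=4):
    """Build the three pipeline commands as argument lists, in run order.

    Assumes the naming pattern of the aircraft 5-shot example generalizes.
    """
    root = f"datasets/{dataset}"
    run = f"{dataset}_{shot}shot"
    metadata = f"{root}/metadata_{shot}shot.csv"
    images = f"{root}/images/"
    embeddings = f"{root}/{run}_embeddings.pkl"
    return [
        # Step 1: contrastive fine-tuning of CLIP on images and matched captions.
        ["python", "contrastive_training_clip.py",
         "--metadata", metadata,
         "--captions_file", f"{root}/captions_top5.json",
         "--num_captions", str(num_captions),
         "--image_folder", images,
         "--output_file", run],
        # Step 2: embed images with the fine-tuned weights (assumed checkpoint name).
        ["python", "create_clip_embeddings.py",
         "--metadata", metadata,
         "--clip_weights", f"{run}_epoch_{epoch}.pt",
         "--image_folder", images,
         "--output_file", embeddings],
        # Step 3: fit the linear probe on the saved embeddings.
        ["python", "linear_probe.py",
         "--image_embedding_file", embeddings,
         "--metadata", metadata],
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for cmd in build_pipeline("aircraft", 5):
        print(shlex.join(cmd))
```

To execute the steps, each list can be passed to `subprocess.run(cmd, check=True)` with `CUDA_VISIBLE_DEVICES` set in the environment, so that a failed step stops the sequence.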

Citation

If you find this repository helpful, please cite our paper:

@article{LewisMu2023,
  title={GIST: Generating Image-Specific Text for Fine-grained Object Classification},
  author={Kathleen M Lewis* and Emily Mu* and Adrian V Dalca and John Guttag},
  journal={arXiv preprint arXiv:2307.11315},
  year={2023}
}
