This repository provides the code for our WACV 2024 paper, "CLID: Controlled-Length Image Descriptions with Limited Data".
```bash
conda create --name clid python=3.7
conda activate clid
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.1 -c pytorch
pip install h5py tqdm transformers==2.1.1 tensorboardX yacs
pip install git+https://github.com/salaniz/pycocoevalcap
```
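To verify that the pinned versions resolved correctly, you can run a quick sanity check (a minimal snippet, not part of the repository):

```python
# Verify the pinned dependency versions before proceeding.
import torch
import torchvision
import transformers

print(torch.__version__)          # expected: 1.7.0
print(torchvision.__version__)    # expected: 0.8.0
print(transformers.__version__)   # expected: 2.1.1
print(torch.cuda.is_available())  # True if cudatoolkit 10.1 matches your driver
```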
- Prepare MS-COCO data following this link.
- Prepare the MS-COCO trusted and noisy annotation files, as in these files (link).
- Download the pretrained BERT model (link).
First, update config_denoise.py with the correct data and pretrained-model paths.
The parent folder of `_C.data_dir` should contain two folders: region_feat_gvd_wo_bgd (downloaded in the previous section), containing the visual features, and the annotation folder itself (whose name is defined by `_C.data_dir`), containing the annotations:
```
|── Data parent folder
|   |── region_feat_gvd_wo_bgd (containing feature files)
|   |── _C.data_dir (containing annotations)
```
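For illustration, a hypothetical excerpt of config_denoise.py after editing; only `data_dir` is referenced in this README, so the other attribute name below is a placeholder:

```python
# Hypothetical sketch of the path settings in config_denoise.py.
# Only `data_dir` appears in this README; `bert_model_path` is a placeholder name.
from yacs.config import CfgNode as CN

_C = CN()
_C.data_dir = "/path/to/data_parent/annotations"  # the annotation folder shown above
_C.bert_model_path = "/path/to/pretrained_bert"   # the downloaded BERT model
```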
Note that we use a yacs configuration file, so you may either modify the config file or pass command-line arguments; a command-line argument overrides the corresponding value in the config file.
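For context, this is roughly how yacs merges command-line overrides (a minimal sketch, not the repository's actual entry point):

```python
# Minimal sketch of the yacs override mechanism: KEY VALUE pairs from the
# command line replace the config file's defaults.
import sys
from yacs.config import CfgNode as CN

_C = CN()
_C.save_dir = "checkpoints"
_C.samples_per_gpu = 16

cfg = _C.clone()
# e.g. `python train_denoise.py samples_per_gpu 64` passes ["samples_per_gpu", "64"]
cfg.merge_from_list(sys.argv[1:])
print(cfg.samples_per_gpu)
```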
Train
```bash
python train_denoise.py save_dir checkpoints_noisy samples_per_gpu 64
```
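Since logging goes through tensorboardX, training curves can presumably be inspected with TensorBoard (assuming event files are written under the save directory and that the `tensorboard` package is installed; adjust the path if not):

```bash
# Assumes tensorboardX writes event files under the save directory.
tensorboard --logdir checkpoints_noisy
```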
Continue from a checkpoint
```bash
python train_denoise.py model_path <path_to_.pth> save_dir checkpoints_noisy samples_per_gpu 64
```
Inference & evaluation
Set gt_caption to the path of the test captions: a file named id2captions_test.json, which is created in the annotation folder. Hence, set it to `_C.data_dir`/id2captions_test.json.
Set model_path to the checkpoint you want to run inference with.
```bash
python infer_and_eval.py \
    --gt_caption <path_to_'id2captions_test.json'> \
    --pd_caption output/caption_results.json \
    --save_dir output \
    model_path <path_to_.pth> \
    save_dir output \
    samples_per_gpu 64
```
Please consider citing our paper if the project helps your research.
```bibtex
@inproceedings{hirsch2024clid,
  title={CLID: Controlled-Length Image Descriptions with Limited Data},
  author={Hirsch, Elad and Tal, Ayellet},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={5531--5541},
  year={2024}
}
```
We thank the authors of LaBERT, both for their research and for sharing their code. Our repository is built on top of their codebase.