This repository provides the code for our WACV 2024 paper, "CLID: Controlled-Length Image Descriptions with Limited Data".
```bash
conda create --name clid python=3.7
conda activate clid
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.1 -c pytorch
pip install h5py tqdm transformers==2.1.1 tensorboardX yacs
pip install git+https://github.com/salaniz/pycocoevalcap
```
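To verify that the pinned versions resolved correctly, you can run a quick sanity check (a minimal snippet, not part of the repository):

```python
# Verify the pinned dependency versions before proceeding.
import torch
import torchvision
import transformers

print(torch.__version__)          # expected: 1.7.0
print(torchvision.__version__)    # expected: 0.8.0
print(transformers.__version__)   # expected: 2.1.1
print(torch.cuda.is_available())  # True if cudatoolkit 10.1 matches your driver
```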
- Prepare MS-COCO data following this link.
- Prepare the MS-COCO trusted and noisy annotation files, as in these files (link).
- Download the pretrained BERT model (link).
First, update config_denoise.py with the correct data and pretrained-model paths.
The parent folder of `_C.data_dir` should contain two folders: region_feat_gvd_wo_bgd (downloaded in the previous section), containing the visual features, and the annotation folder itself (whose name is defined by `_C.data_dir`), containing the annotations:
```
|── Data parent folder
|   |── region_feat_gvd_wo_bgd (containing feature files)
|   |── _C.data_dir (containing annotations)
```
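For illustration, a hypothetical excerpt of config_denoise.py after editing; only `data_dir` is referenced in this README, so the other attribute name below is a placeholder:

```python
# Hypothetical sketch of the path settings in config_denoise.py.
# Only `data_dir` appears in this README; `bert_model_path` is a placeholder name.
from yacs.config import CfgNode as CN

_C = CN()
_C.data_dir = "/path/to/data_parent/annotations"  # the annotation folder shown above
_C.bert_model_path = "/path/to/pretrained_bert"   # the downloaded BERT model
```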
Note that we use a yacs configuration file, so you may either modify the config file or pass command-line arguments; a command-line argument overrides the corresponding value in the config file.
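For context, this is roughly how yacs merges command-line overrides (a minimal sketch, not the repository's actual entry point):

```python
# Minimal sketch of the yacs override mechanism: KEY VALUE pairs from the
# command line replace the config file's defaults.
import sys
from yacs.config import CfgNode as CN

_C = CN()
_C.save_dir = "checkpoints"
_C.samples_per_gpu = 16

cfg = _C.clone()
# e.g. `python train_denoise.py samples_per_gpu 64` passes ["samples_per_gpu", "64"]
cfg.merge_from_list(sys.argv[1:])
print(cfg.samples_per_gpu)
```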
Train
```bash
python train_denoise.py save_dir checkpoints_noisy samples_per_gpu 64
```
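Since logging goes through tensorboardX, training curves can presumably be inspected with TensorBoard (assuming event files are written under the save directory and that the `tensorboard` package is installed; adjust the path if not):

```bash
# Assumes tensorboardX writes event files under the save directory.
tensorboard --logdir checkpoints_noisy
```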
Continue from a checkpoint
```bash
python train_denoise.py model_path <path_to_.pth> save_dir checkpoints_noisy samples_per_gpu 64
```
Inference & evaluation
Set gt_caption to the path of the test captions: a file named id2captions_test.json, which is created in the annotation folder. Hence, set it to `_C.data_dir`/id2captions_test.json.
Set model_path to the checkpoint you want to run inference with.
```bash
python infer_and_eval.py \
    --gt_caption <path_to_'id2captions_test.json'> \
    --pd_caption output/caption_results.json \
    --save_dir output \
    model_path <path_to_.pth> \
    save_dir output \
    samples_per_gpu 64
```
Please consider citing our paper if the project helps your research.
```bibtex
@inproceedings{hirsch2024clid,
  title={CLID: Controlled-Length Image Descriptions with Limited Data},
  author={Hirsch, Elad and Tal, Ayellet},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={5531--5541},
  year={2024}
}
```
We thank the authors of LaBERT, both for their research and for sharing their code. Our repository is built on top of their codebase.