PyTorch code (lite version) of Finetuning Pretrained Vision-Language Models with Correlation Information Bottleneck for Robust Visual Question Answering. [Slide]
This implementation is based on LXMERT; thanks to its authors for their pioneering work.
pip install -r requirements.txt
Please see data/README.md for instructions on generating the required annotation files, or download them from here. The expected layout is:
├── data
│   ├── cv_vqa
│   │   ├── edited_targets.json
│   │   └── original_targets.json
│   ├── iv_vqa
│   │   ├── edited_targets.json
│   │   └── original_targets.json
│   ├── lxmert
│   │   └── all_ans.json
│   ├── vqa_ce
│   │   ├── all_targets.json
│   │   ├── counterexample_targets.json
│   │   ├── easy_targets.json
│   │   └── hard_targets.json
│   ├── vqa_p2
│   │   ├── vqa_p2_original_targets.json
│   │   └── vqa_p2_targets.json
│   ├── vqa_rep
│   │   └── val2014_humans_vqa-rephrasings_targets.json
│   └── vqav2
│       ├── img_id_wh
│       │   ├── test.json
│       │   ├── train.json
│       │   ├── trainval.json
│       │   └── val.json
│       ├── minival.json
│       ├── nominival.json
│       ├── test.json
│       ├── train.json
│       ├── trainval_ans2label.json
│       └── trainval_label2ans.json
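Before training, it can help to confirm that all annotation files are in place. The following is a minimal sanity-check sketch (not part of this repo; the file list simply mirrors the tree above):

```python
# check_data.py -- hypothetical helper: verifies the annotation files listed above exist.
import os

EXPECTED = [
    "data/cv_vqa/edited_targets.json",
    "data/cv_vqa/original_targets.json",
    "data/iv_vqa/edited_targets.json",
    "data/iv_vqa/original_targets.json",
    "data/lxmert/all_ans.json",
    "data/vqa_ce/all_targets.json",
    "data/vqa_ce/counterexample_targets.json",
    "data/vqa_ce/easy_targets.json",
    "data/vqa_ce/hard_targets.json",
    "data/vqa_p2/vqa_p2_original_targets.json",
    "data/vqa_p2/vqa_p2_targets.json",
    "data/vqa_rep/val2014_humans_vqa-rephrasings_targets.json",
    "data/vqav2/img_id_wh/test.json",
    "data/vqav2/img_id_wh/train.json",
    "data/vqav2/img_id_wh/trainval.json",
    "data/vqav2/img_id_wh/val.json",
    "data/vqav2/minival.json",
    "data/vqav2/nominival.json",
    "data/vqav2/test.json",
    "data/vqav2/train.json",
    "data/vqav2/trainval_ans2label.json",
    "data/vqav2/trainval_label2ans.json",
]

missing = [p for p in EXPECTED if not os.path.isfile(p)]
if missing:
    print("Missing files:")
    for p in missing:
        print("  " + p)
else:
    print("All expected annotation files found.")
```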
bash train.sh
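After finetuning, predictions over the answer vocabulary can be mapped back to answer strings with trainval_ans2label.json and trainval_label2ans.json listed above. A minimal illustrative sketch (the loading code below is an assumption for demonstration, not the repo's own utilities):

```python
import json
import torch

# Illustrative only: load the answer vocabulary shipped with the data.
with open("data/vqav2/trainval_ans2label.json") as f:
    ans2label = json.load(f)   # answer string -> label index
with open("data/vqav2/trainval_label2ans.json") as f:
    label2ans = json.load(f)   # label index -> answer string

num_answers = len(ans2label)

# Suppose `logits` are a finetuned model's scores over the answer
# vocabulary for a batch of questions (random tensor used here as a stand-in).
logits = torch.randn(4, num_answers)
pred_idx = logits.argmax(dim=-1)
predictions = [label2ans[i] for i in pred_idx.tolist()]
print(predictions)
```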
If you find our work useful in your research, please consider citing:
@article{jiang2022finetuning,
  title={Finetuning Pretrained Vision-Language Models with Correlation Information Bottleneck for Robust Visual Question Answering},
  author={Jiang, Jingjing and Liu, Ziyi and Zheng, Nanning},
  journal={arXiv preprint arXiv:2209.06954},
  year={2022}
}