This repository contains code and data for our SIGIR 2021 paper "Passage Retrieval for Outside-Knowledge Visual Question Answering".
Our data is based on the OK-VQA dataset.
- First download all OK-VQA files
- Then download the collection file (all_blocks.txt)
- Finally, download the other files here:
- passage_id_to_line_id.json: maps passage IDs to line IDs in all_blocks.txt (see the lookup sketch after this list)
- data: train/val/test split and a small validation collection
- okvqa.datasets: image features pre-extracted with this script
- (Optional) checkpoint: our model checkpoint. No need to download this if you want to train your own model.
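As a minimal sketch of how the mapping is meant to be used (assuming line IDs are 0-based integers and all_blocks.txt stores one passage per line; `get_passage` is an illustrative helper, not part of the repo):

```python
import json

# Load the passage-ID -> line-ID mapping for all_blocks.txt.
with open("passage_id_to_line_id.json") as f:
    passage_id_to_line_id = json.load(f)

def get_passage(passage_id, blocks_path="all_blocks.txt"):
    """Return the line of all_blocks.txt that stores the given passage."""
    target = passage_id_to_line_id[passage_id]  # assumed 0-based line index
    with open(blocks_path) as f:
        for line_id, line in enumerate(f):
            if line_id == target:
                return line.rstrip("\n")
    raise KeyError(f"line {target} not found for passage {passage_id}")
```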
Train the retriever with the following command; replace DATA_DIR, OUTPUT_DIR, and HUGGINGFACE_CACHE_DIR with your own paths (--cache_dir is optional, and the {} placeholder in --input_file is filled in with the data split by the script):
python -u -m torch.distributed.launch --nproc_per_node 4 train_retriever.py \
--input_file=DATA_DIR/data/{}_pairs_cap_combine_sum.txt \
--image_features_path=DATA_DIR/okvqa.datasets \
--output_dir=OUTPUT_DIR \
--ann_file=DATA_DIR/mscoco_val2014_annotations.json \
--ques_file=DATA_DIR/OpenEnded_mscoco_val2014_questions.json \
--passage_id_to_line_id_file=DATA_DIR/passage_id_to_line_id.json \
--all_blocks_file=DATA_DIR/all_blocks.txt \
--gen_passage_rep=False \
--gen_passage_rep_input=DATA_DIR/val2014_blocks_cap_combine_sum.txt \
--cache_dir=HUGGINGFACE_CACHE_DIR \
--gen_passage_rep_output="" \
--retrieve_checkpoint="" \
--collection_reps_path="" \
--val_data_sub_type=val2014 \
--do_train=True \
--do_eval=True \
--do_eval_pairs=True \
--per_gpu_train_batch_size=4 \
--per_gpu_eval_batch_size=10 \
--learning_rate=1e-5 \
--num_train_epochs=2.0 \
--logging_steps=5 \
--save_steps=5000 \
--eval_all_checkpoints=True \
--overwrite_output_dir=False \
--fp16=True \
--load_small=False \
--num_workers=4 \
--query_encoder_type=lxmert \
--query_model_name_or_path="unc-nlp/lxmert-base-uncased" \
--lxmert_rep_type="{\"pooled_output\":\"none\"}" \
--proj_size=768 \
--neg_type=other_pos+all_neg
If you experience OOM during evaluation, restart the evaluation with the same command but set --do_train=False and --overwrite_output_dir=True.
To generate passage representations with your best checkpoint, run the command below. --gen_passage_rep_input can be all_blocks.txt or a split of it (see the sharding sketch after the command), --cache_dir is optional, and --val_data_sub_type is not used in this mode:
python -u train_retriever.py \
--input_file=DATA_DIR/data/{}_pairs_cap_combine_sum.txt \
--image_features_path=DATA_DIR/okvqa.datasets \
--output_dir=OUTPUT_DIR \
--ann_file=DATA_DIR/mscoco_val2014_annotations.json \
--ques_file=DATA_DIR/OpenEnded_mscoco_val2014_questions.json \
--passage_id_to_line_id_file=DATA_DIR/passage_id_to_line_id.json \
--all_blocks_file=DATA_DIR/all_blocks.txt \
--gen_passage_rep=True \
--gen_passage_rep_input=DATA_DIR/all_blocks.txt \
--cache_dir=HUGGINGFACE_CACHE_DIR \
--gen_passage_rep_output=OUTPUT_DIR_OF_PASSAGE_REPS \
--retrieve_checkpoint=DIR_TO_YOUR_BEST_CHECKPOINT \
--collection_reps_path="" \
--val_data_sub_type=test2014 \
--do_train=False \
--do_eval=False \
--do_eval_pairs=False \
--per_gpu_train_batch_size=4 \
--per_gpu_eval_batch_size=300 \
--learning_rate=1e-5 \
--num_train_epochs=2.0 \
--logging_steps=5 \
--save_steps=5000 \
--eval_all_checkpoints=True \
--overwrite_output_dir=True \
--fp16=True \
--load_small=False \
--num_workers=8 \
--query_encoder_type=lxmert \
--query_model_name_or_path=unc-nlp/lxmert-base-uncased \
--lxmert_rep_type="{\"pooled_output\":\"none\"}" \
--proj_size=768 \
--neg_type=other_pos+all_neg
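If a single pass over all_blocks.txt is too slow, one option is to shard the collection and run the command above once per shard, passing each shard as --gen_passage_rep_input. A minimal sketch (the shard count and file names are illustrative, and it assumes representations are keyed by passage ID, so shard order does not matter):

```python
# Split all_blocks.txt into shards so each can be encoded by a separate job
# via --gen_passage_rep_input. NUM_SHARDS and output names are illustrative.
NUM_SHARDS = 8

shards = [open(f"all_blocks.shard{i}.txt", "w") for i in range(NUM_SHARDS)]
with open("all_blocks.txt") as f:
    for i, line in enumerate(f):
        shards[i % NUM_SHARDS].write(line)  # round-robin assignment
for shard in shards:
    shard.close()
```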
To evaluate retrieval on the test split with the pre-computed passage representations (--cache_dir is again optional), run:
python -u train_retriever.py \
--input_file=DATA_DIR/data/{}_pairs_cap_combine_sum.txt \
--image_features_path=DATA_DIR/okvqa.datasets \
--output_dir=OUTPUT_DIR \
--ann_file=DATA_DIR/mscoco_val2014_annotations.json \
--ques_file=DATA_DIR/OpenEnded_mscoco_val2014_questions.json \
--passage_id_to_line_id_file=DATA_DIR/passage_id_to_line_id.json \
--all_blocks_file=DATA_DIR/all_blocks.txt \
--gen_passage_rep=False \
--gen_passage_rep_input=DATA_DIR/val2014_blocks_cap_combine_sum.txt \
--cache_dir=HUGGINGFACE_CACHE_DIR \
--gen_passage_rep_output="" \
--retrieve_checkpoint=DIR_TO_YOUR_BEST_CHECKPOINT \
--collection_reps_path=DIR_OF_PASSAGE_REPS \
--val_data_sub_type=test2014 \
--do_train=False \
--do_eval=True \
--do_eval_pairs=False \
--per_gpu_train_batch_size=4 \
--per_gpu_eval_batch_size=10 \
--learning_rate=1e-5 \
--num_train_epochs=2.0 \
--logging_steps=5 \
--save_steps=5000 \
--eval_all_checkpoints=True \
--overwrite_output_dir=True \
--fp16=True \
--load_small=False \
--num_workers=1 \
--query_encoder_type=lxmert \
--query_model_name_or_path="unc-nlp/lxmert-base-uncased" \
--lxmert_rep_type="{\"pooled_output\":\"none\"}" \
--proj_size=768 \
--neg_type=other_pos+all_neg
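Conceptually, this evaluation step scores each question against the collection by the inner product between query and passage representations, which is the kind of search faiss (installed in the environment setup below) provides. A toy illustration with random vectors (the array names and shapes are ours; the actual representation file format is whatever train_retriever.py writes):

```python
import numpy as np
import faiss

dim = 768  # matches --proj_size
passage_reps = np.random.rand(10000, dim).astype("float32")  # stand-in reps
query_reps = np.random.rand(16, dim).astype("float32")       # stand-in queries

index = faiss.IndexFlatIP(dim)  # exact inner-product search
index.add(passage_reps)
scores, passage_ids = index.search(query_reps, 5)  # top-5 passages per query
```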
Version information is available in env.yml. To set up the environment:
conda create --name okvqa python=3.8
conda install faiss-gpu cudatoolkit=10.1 -c pytorch
conda install pytorch torchvision torchaudio cudatoolkit=10.1 -c pytorch
git clone https://github.com/NVIDIA/apex
cd apex
export TORCH_CUDA_ARCH_LIST="5.2;6.1;7.5"  # optional
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
conda install -c conda-forge tensorboard
pip install datasets
pip install pytrec_eval
conda install scikit-image
To generate image features, we need opencv, which doesn't work with Python 3.8 at the moment, so image features are generated in a separate Python 3.7 environment (no need to do this if you downloaded the pre-extracted features linked above). Use conda install -c conda-forge opencv to install opencv.
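pytrec_eval (installed above) is a Python wrapper around trec_eval for computing retrieval metrics. A minimal, self-contained usage sketch with made-up judgments and scores:

```python
import pytrec_eval

# Toy relevance judgments and a toy run; all IDs and scores are illustrative.
qrels = {"q1": {"p1": 1, "p2": 0, "p3": 1}}
run = {"q1": {"p1": 12.5, "p2": 7.3, "p3": 4.1}}

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "recip_rank"})
print(evaluator.evaluate(run))  # {'q1': {'map': ..., 'recip_rank': ...}}
```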
- Our training data is based on OK-VQA. We thank the OK-VQA authors for creating and releasing this useful resource.
- vqa_tools.py is built on VQA. We thank the VQA authors for releasing their code.
- coco_tools.py is built on cocoapi. We thank the cocoapi authors for releasing their code.
See copyright information in LICENSE.
@inproceedings{prokvqa,
  title={{Passage Retrieval for Outside-Knowledge Visual Question Answering}},
  author={Chen Qu and Hamed Zamani and Liu Yang and W. Bruce Croft and Erik Learned-Miller},
  booktitle={SIGIR},
  year={2021}
}