Official implementation of the MM'22 paper *A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA*.
There are two config files in cfgs, one for the OK-VQA dataset and one for the FVQA dataset. Note that we mainly test our method on OK-VQA.
- python==3.7
- pytorch==1.10.0
First of all, make sure all the data are placed in the right locations according to the config file settings.
- Please download the OK-VQA dataset from the link given in the original paper.
- The image features can be found at the LXMERT repository (if you only need the ViLT model, skip these features and download only the MSCOCO images).
The last preprocessing step below applies to LXMERT and VisualBERT only.
- Process answers (see the answer-parsing sketch below):
python tools/answer_parse_okvqa.py
- Extract the knowledge base with RoBERTa (see the encoding sketch below):
python tools/kb_parse.py
- Convert image features to h5 (optional; see the HDF5 sketch below):
python tools/detection_features_converter.py
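
For orientation, here is a rough, hypothetical sketch of what answer processing for OK-VQA typically involves: normalizing the ten human answers per question, picking the most frequent one as the target, and building an answer vocabulary. The annotation filename is assumed to follow the official OK-VQA release; the output path and the vocabulary size of 2000 are assumptions, not taken from tools/answer_parse_okvqa.py.

```python
# Hypothetical sketch (NOT the repo's actual script): normalize answers,
# keep the most frequent human answer per question, build an answer vocabulary.
import json
from collections import Counter

with open('data/okvqa/mscoco_train2014_annotations.json') as f:  # assumed path
    annotations = json.load(f)['annotations']

answer_counter = Counter()
targets = {}
for ann in annotations:
    answers = [a['answer'].strip().lower() for a in ann['answers']]
    answer_counter.update(answers)
    # most common of the ten human answers becomes the training target
    targets[ann['question_id']] = Counter(answers).most_common(1)[0][0]

# fixed answer vocabulary from the most frequent training answers (size assumed)
vocab = [a for a, _ in answer_counter.most_common(2000)]
with open('data/okvqa/answers_parsed.json', 'w') as f:            # assumed path
    json.dump({'targets': targets, 'vocab': vocab}, f)
```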
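
The knowledge-base step encodes facts with RoBERTa; below is a minimal sketch of such an encoding pass using HuggingFace transformers. The fact file, output path, batch size, and the roberta-base checkpoint are assumptions and may differ from what tools/kb_parse.py actually does.

```python
# Minimal sketch: encode one textual fact per line into RoBERTa embeddings.
import torch
from transformers import RobertaTokenizer, RobertaModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base').to(device).eval()

with open('data/kb/facts.txt') as f:                  # assumed input path
    facts = [line.strip() for line in f if line.strip()]

embeddings = []
with torch.no_grad():
    for i in range(0, len(facts), 64):                # batched encoding
        batch = tokenizer(facts[i:i + 64], padding=True, truncation=True,
                          max_length=64, return_tensors='pt').to(device)
        out = model(**batch).last_hidden_state[:, 0]  # <s> (CLS-like) embedding
        embeddings.append(out.cpu())

torch.save(torch.cat(embeddings), 'data/kb/kb_roberta.pt')  # [num_facts, 768]
```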
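
Likewise, a minimal sketch of packing per-image detection features into a single HDF5 file with h5py, as the optional conversion step produces. The paths, the 36-box count, and the 2048-d feature size are assumptions (typical bottom-up-attention settings), not read from tools/detection_features_converter.py.

```python
# Minimal sketch: collect per-image .npy feature arrays into one HDF5 file.
import glob
import os
import h5py
import numpy as np

files = sorted(glob.glob('data/features/*.npy'))  # one [36, 2048] array per image (assumed)
with h5py.File('data/features/okvqa_features.h5', 'w') as h5:
    feats = h5.create_dataset('features', (len(files), 36, 2048), dtype='float32')
    image_ids = []
    for idx, path in enumerate(files):
        feats[idx] = np.load(path)
        # filenames are assumed to be numeric image ids, e.g. 123456.npy
        image_ids.append(int(os.path.splitext(os.path.basename(path))[0]))
    h5.create_dataset('image_ids', data=np.array(image_ids, dtype='int64'))
```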
Train the model:
python main.py --name unifer --gpu 0

Run evaluation only:
python main.py --name unifer --test-only
If you find this repo helpful, please consider citing the following paper 👍:
@inproceedings{unifer,
  author    = {Yangyang Guo and Liqiang Nie and Yongkang Wong and Yibing Liu and Zhiyong Cheng and Mohan S. Kankanhalli},
  title     = {A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA},
  booktitle = {ACM Multimedia Conference},
  publisher = {ACM},
  year      = {2022}
}