A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA

Official implementation for the MM'22 paper.

[Figure: model structure]

There are two config files in cfgs, one for the OK-VQA dataset and one for the FVQA dataset. Note that we mainly test our method on the OK-VQA dataset.

Prerequisites

  • python==3.7
  • pytorch==1.10.0
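
A minimal environment setup sketch, assuming conda (the exact PyTorch build, e.g. the CUDA variant, depends on your machine):

    conda create -n unifer python=3.7
    conda activate unifer
    pip install torch==1.10.0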

Dataset

First of all, make sure all the data is in the right position according to the config file settings.

  • Please download the OK-VQA dataset from the link in the original paper.
  • The image features can be found at the LXMERT repo (if you only need the ViLT model, skip these features and download only the MSCOCO images).
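
Before pre-processing, it can help to verify that the configured paths resolve. Below is a minimal sketch, assuming the files in cfgs are YAML; the file name and key names are hypothetical, so check the actual config for the real schema:

    # Hypothetical sanity check: confirm the dataset paths named in a config exist.
    # The config file name and key names below are assumptions, not the repo's schema.
    import os
    import yaml  # pip install pyyaml

    with open("cfgs/okvqa.yml") as f:  # file name is a guess; see cfgs/ for the real one
        cfg = yaml.safe_load(f)

    for key in ("image_dir", "annotation_path", "feature_path"):
        path = str(cfg.get(key, ""))
        status = "ok" if os.path.exists(path) else "MISSING"
        print(f"{status:7s} {key} -> {path}")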

Pre-processing

The last step is optional; it applies to LXMERT and VisualBERT only.

  1. Process answers:

    python tools/answer_parse_okvqa.py 
  2. Extract the knowledge base with RoBERTa (a sketch of this step follows the list):

    python tools/kb_parse.py
  3. Convert image features to h5 (optional; an inspection sketch follows the list):

    python tools/detection_features_converter.py 
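
For intuition, here is a minimal sketch of the kind of encoding step 2 performs: embedding knowledge-base passages with RoBERTa via Hugging Face transformers. The model name, pooling choice, and toy passages are assumptions; tools/kb_parse.py is the authoritative implementation.

    # Hedged sketch: encode KB passages with RoBERTa; not the repo's actual code.
    import torch
    from transformers import RobertaModel, RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
    model = RobertaModel.from_pretrained("roberta-base").eval()

    passages = ["Zebras eat grass.", "A violin is a string instrument."]  # toy KB entries
    batch = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    embeddings = out.last_hidden_state[:, 0]  # first-token (<s>) pooling, an assumption
    print(embeddings.shape)  # torch.Size([2, 768])

For step 3, a quick way to inspect the converted feature file; the output path and layout are assumptions, so check tools/detection_features_converter.py for the actual schema:

    # Hedged sketch: list the datasets stored in the converted h5 file.
    import h5py

    with h5py.File("data/detection_features.h5", "r") as f:  # path is a guess
        f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))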

Model Training

python main.py --name unifer --gpu 0

Model Evaluation

python main.py --name unifer --test-only

Citation

If you find this repo helpful, please consider citing the following paper 👍:

@inproceedings{unifer,
  author    = {Yangyang Guo and Liqiang Nie and Yongkang Wong and Yibing Liu and Zhiyong Cheng and Mohan S. Kankanhalli},
  title     = {A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA},
  booktitle = {ACM Multimedia Conference},
  publisher = {ACM},
  year      = {2022}}
