Can multimodal transformers leverage explicit knowledge in their reasoning?
Existing, primarily unimodal, methods have explored approaches under the paradigm of knowledge retrieval followed by answer prediction, but leave open questions about the quality and relevance of the retrieved knowledge used, and how the reasoning processes over implicit and explicit knowledge should be integrated.
To address these challenges, we propose the Knowledge Augmented Transformer (KAT), which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA. Our approach integrates implicit and explicit knowledge in an encoder-decoder architecture and jointly reasons over both knowledge sources during answer generation. In our analysis, integrating explicit knowledge also improves the interpretability of model predictions.
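At a high level, KAT pairs the question with each piece of retrieved knowledge, encodes every pair separately, and lets the decoder attend over all of the encodings jointly while generating the answer. The snippet below is a minimal, illustrative sketch of that fusion pattern using a Hugging Face T5 backbone; it is not the repository's implementation, and the question, knowledge strings, and prompt format are invented for the example.

```python
# Minimal fusion-in-decoder style sketch (illustrative only, not the KAT codebase):
# encode (question, knowledge) pairs independently, then let the decoder attend
# over the concatenation of all encodings while generating the answer.
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

question = "What sport is this disc used for?"
knowledge = [
    "explicit: frisbee, a flying disc thrown and caught for recreation",     # e.g. a Wikidata entity description
    "implicit: the disc in the image is most likely used to play frisbee",   # e.g. a GPT-3 generated statement
]

# Encode each (question, knowledge) pair on its own.
inputs = tokenizer(
    [f"question: {question} context: {k}" for k in knowledge],
    return_tensors="pt", padding=True, truncation=True, max_length=64,
)
encoded = model.encoder(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask)

# Fuse: flatten the per-passage encodings into one long sequence so the decoder
# jointly reasons over every knowledge source during answer generation.
n_passages, seq_len, d_model = encoded.last_hidden_state.shape
fused = BaseModelOutput(
    last_hidden_state=encoded.last_hidden_state.reshape(1, n_passages * seq_len, d_model)
)
fused_mask = inputs.attention_mask.reshape(1, n_passages * seq_len)

answer_ids = model.generate(encoder_outputs=fused, attention_mask=fused_mask, max_length=8)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```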
```bash
pip install -r requirements.txt
pip install -e .
```
We provide the pre-processed data and pre-extracted explicit/implicit knowledge here. We build an entity database based on Wikidata; here is a tutorial on how to write Wikidata queries.
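The repository's preprocessing is not reproduced here, but as a small, self-contained example of querying Wikidata programmatically, the snippet below sends a SPARQL query to the public endpoint and prints English labels and descriptions for two example entity IDs (chosen only for illustration).

```python
# Illustrative only: fetch English labels/descriptions for a couple of Wikidata
# entities via the public SPARQL endpoint. The entity IDs are arbitrary examples,
# and this is not the preprocessing used to build the entity database.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
query = """
SELECT ?item ?itemLabel ?itemDescription WHERE {
  VALUES ?item { wd:Q1420 wd:Q11442 }  # motor car, bicycle
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "kat-wikidata-demo/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    label = row["itemLabel"]["value"]
    desc = row.get("itemDescription", {}).get("value", "")
    print(f"{label}: {desc}")
```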
Model | Description | Accuracy | Download |
---|---|---|---|
base_both_knowledge | base size, both implicit and explicit knowledge | 50.58 | base_both_knowledge.zip |
large_explicit_only | large size, explicit only | 44.25 | large_explicit_only.zip |
large_both_knowledge | large size, both implicit and explicit knowledge | 53.09 | large_both_knowledge.zip |
You can specify `--model_size` as `large` or `base`. The `--use_gpt` flag controls whether implicit knowledge is used in addition to explicit knowledge.
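For reference, here is a hypothetical argparse sketch of what these two flags express; the actual option definitions live in the repository's option-parsing code and may differ.

```python
# Hypothetical sketch of the two flags described above; not the repository's
# actual option definitions.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model_size", choices=["base", "large"], default="base",
                    help="size of the encoder-decoder backbone")
parser.add_argument("--use_gpt", action="store_true",
                    help="if set, use implicit knowledge in addition to explicit knowledge")

args = parser.parse_args(["--model_size", "large", "--use_gpt"])
print(args.model_size, args.use_gpt)  # large True
```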
```bash
python -m torch.distributed.launch --nproc_per_node=16 train_KAT.py \
    --train_data /mnt/root/knowledge_reasoning/okvqa/train2014 \
    --eval_data /mnt/root/knowledge_reasoning/okvqa/val2014 \
    --model_size large \
    --lr 0.00003 \
    --optim adamw \
    --scheduler linear \
    --weight_decay 0.01 \
    --text_maxlength 64 \
    --per_gpu_batch_size 1 \
    --n_context 40 \
    --total_step 8000 \
    --warmup_step 1000 \
    --name check_kat \
    --checkpoint_dir /mnt/root/checkpoint \
    --accumulation_steps 1 \
    --use_gpt
```
```bash
python -m torch.distributed.launch --nproc_per_node=1 evaluate_KAT.py \
    --train_data /mnt/root/knowledge_reasoning/okvqa/train2014 \
    --eval_data /mnt/root/knowledge_reasoning/okvqa/val2014 \
    --model_size base \
    --text_maxlength 64 \
    --per_gpu_batch_size 8 \
    --n_context 40 \
    --model_path /mnt/root/okvqa_best_models/base_w_gpt3_best_5058 \
    --use_gpt
```
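The accuracy numbers above follow the soft VQA-style metric used for OK-VQA, where a predicted answer is credited according to how many annotators gave it. As a rough illustration of that metric (not the repository's evaluation script):

```python
# Illustrative soft VQA accuracy (not the repo's evaluation code): an answer
# scores min(#annotators who gave it / 3, 1); scores are averaged over questions.
from collections import Counter

def vqa_soft_accuracy(prediction, gt_answers):
    counts = Counter(a.strip().lower() for a in gt_answers)
    return min(counts[prediction.strip().lower()] / 3.0, 1.0)

print(vqa_soft_accuracy("frisbee", ["frisbee", "frisbee", "frisbee", "disc", "frisbee"]))  # 1.0
print(vqa_soft_accuracy("disc", ["frisbee", "frisbee", "frisbee", "disc", "frisbee"]))     # ~0.33
```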
If you find this work useful, please cite our paper, KAT: A Knowledge Augmented Transformer for Vision-and-Language:
```bibtex
@inproceedings{gui2021kat,
  title     = {KAT: A Knowledge Augmented Transformer for Vision-and-Language},
  author    = {Gui, Liangke and Wang, Borui and Huang, Qiuyuan and Hauptmann, Alex and Bisk, Yonatan and Gao, Jianfeng},
  booktitle = {NAACL},
  year      = {2022}
}
```