SGLKT-VisDial

Pytorch Implementation for the paper:

Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer
Gi-Cheon Kang, Junseok Park, Hwaran Lee, Byoung-Tak Zhang^*, and Jin-Hwa Kim^* (* corresponding authors)
In EMNLP 2021 Findings

Setup and Dependencies

This code is implemented using PyTorch v1.7, and provides out of the box support with CUDA 11 and CuDNN 8. Anaconda/Miniconda is the recommended to set up this codebase:

Install Anaconda or Miniconda distribution based on Python3+ from their downloads' site.
Clone this repository and create an environment:

git clone https://www.github.com/gicheonkang/sglkt-visdial
conda create -n sglkt python=3.8

# activate the environment and install all dependencies
conda activate sglkt
cd sglkt-visdial/
pip install -r requirements.txt

# install this codebase as a package in development version
python setup.py develop

Download Data

We used the Faster-RCNN pre-trained with Visual Genome as image features. Download the image features below, and put each feature under $PROJECT_ROOT/data/{SPLIT_NAME}_feature directory. We need image_id to RCNN bounding box index file ({SPLIT_NAME}_imgid2idx.pkl) because the number of bounding box per image is not fixed (ranging from 10 to 100).

train_btmup_f.hdf5: Bottom-up features of 10 to 100 proposals from images of train split (32GB).
val_btmup_f.hdf5: Bottom-up features of 10 to 100 proposals from images of validation split (0.5GB).
test_btmup_f.hdf5: Bottom-up features of 10 to 100 proposals from images of test split (2GB).

Download the pre-trained, pre-processed word vectors from here (glove840b_init_300d.npy), and keep them under $PROJECT_ROOT/data/ directory. You can manually extract the vectors by executing data/init_glove.py.
Download visual dialog dataset from here (visdial_1.0_train.json, visdial_1.0_val.json, visdial_1.0_test.json, and visdial_1.0_val_dense_annotations.json) under $PROJECT_ROOT/data/ directory.
Download the additional data for Sparse Graph Learning and Knowledge Transfer under $PROJECT_ROOT/data/ directory.

visdial_1.0_train_coref_structure.json: structural supervision for train split.
visdial_1.0_val_coref_structure.json: structural supervision for val split.
visdial_1.0_test_coref_structure.json: structural supervision for test split.
visdial_1.0_train_dense_labels.json: pseudo labels for knowledge transfer.
visdial_1.0_word_counts_train.json: word counts for train split.

Training

Train the model provided in this repository as:

python train.py --gpu-ids 0 1 # provide more ids for multi-GPU execution other args...

Saving model checkpoints

This script will save model checkpoints at every epoch as per path specified by --save-dirpath. Default path is $PROJECT_ROOT/checkpoints.

Evaluation

Evaluation of a trained model checkpoint can be done as follows:

python evaluate.py --load-pthpath /path/to/checkpoint.pth --split val --gpu-ids 0 1

Validation scores can be checked in offline setting. But if you want to check the test split score, you have to submit a json file to EvalAI online evaluation server. You can make json format with --save_ranks True option.

Pre-trained model & Results

We provide the pre-trained models for SGL+KT and SGL.
To reproduce the results reported in the paper, please run the command below.

python evaluate.py --load-pthpath SGL+KT.pth --split test --gpu-ids 0 1 --save-ranks True

Performance on v1.0 test-std (trained on v1.0 train):

Model	Overall	NDCG	MRR	R@1	R@5	R@10	Mean
SGL+KT	65.31	72.60	58.01	46.20	71.01	83.20	5.85

Citation

If you use this code in your published research, please consider citing:

@article{kang2021reasoning,
  title={Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer},
  author={Kang, Gi-Cheon and Park, Junseok and Lee, Hwaran and Zhang, Byoung-Tak and Kim, Jin-Hwa},
  journal={arXiv preprint arXiv:2004.06698},
  year={2021}
}

License

MIT License

Acknowledgements

We use Visual Dialog Challenge Starter Code and MCAN-VQA as reference code.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
configs		configs
data		data
visdialch		visdialch
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
evaluate.py		evaluate.py
requirements.txt		requirements.txt
setup.py		setup.py
sgl_overview.png		sgl_overview.png
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

configs

configs

data

data

visdialch

visdialch

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

evaluate.py

evaluate.py

requirements.txt

requirements.txt

setup.py

setup.py

sgl_overview.png

sgl_overview.png

train.py

train.py

Repository files navigation

SGLKT-VisDial

Setup and Dependencies

Download Data

Training

Saving model checkpoints

Evaluation

Pre-trained model & Results

Citation

License

Acknowledgements

About

Releases

Packages

Languages

License

gicheonkang/sglkt-visdial

Folders and files

Latest commit

History

Repository files navigation

SGLKT-VisDial

Setup and Dependencies

Download Data

Training

Saving model checkpoints

Evaluation

Pre-trained model & Results

Citation

License

Acknowledgements

About

Topics

Resources

License

Stars

Watchers

Forks

Languages