PyTorch Implementation for EMNLP'19 "Dual Attention Networks for Visual Reference Resolution in Visual Dialog"

DAN-VisDial

PyTorch implementation for the paper:

Dual Attention Networks for Visual Reference Resolution in Visual Dialog
Gi-Cheon Kang, Jaeseo Lim, and Byoung-Tak Zhang
In EMNLP 2019

If you use this code in your published research, please consider citing:

@inproceedings{kang2019dual,
  title={Dual Attention Networks for Visual Reference Resolution in Visual Dialog},
  author={Kang, Gi-Cheon and Lim, Jaeseo and Zhang, Byoung-Tak},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  pages = {2024--2033},
  year={2019}
}

Setup and Dependencies

This code is implemented using PyTorch v0.3.1 with CUDA 8 and cuDNN 7.
It is recommended to set up this source code using Anaconda or Miniconda.

  1. Install the Anaconda or Miniconda distribution based on Python 3.6+ from its download site.
  2. Clone this repository and create an environment:
git clone https://github.com/gicheonkang/DAN-VisDial
conda create -n dan_visdial python=3.6

# activate the environment and install all dependencies
conda activate dan_visdial
cd DAN-VisDial/
pip install -r requirements.txt

Download Features

  1. We use Faster R-CNN features pre-trained on Visual Genome as image features. Download the image features below, and put each feature file under the $PROJECT_ROOT/data/{SPLIT_NAME}_feature directory. An index file that maps each image_id to its R-CNN bounding box indices ({SPLIT_NAME}_imgid2idx.pkl) is also needed, because the number of bounding boxes per image is not fixed (it ranges from 10 to 100).
  2. Download the GloVe pretrained word vectors from here, and keep glove.6B.300d.txt under the $PROJECT_ROOT/data/glove directory.
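
To illustrate why an image_id-to-box index is useful when the number of boxes varies per image, here is a minimal sketch. The layout it uses (all box features stored back to back, plus a dictionary from image_id to a start/end offset pair) is an assumption made for illustration; the actual contents of {SPLIT_NAME}_imgid2idx.pkl may differ, and the names `flat_features` and `features_for` are hypothetical.

```python
import pickle

# Hypothetical toy data: three images with a different number of boxes each.
# Each "feature" is a short list standing in for a real Faster R-CNN vector.
boxes_per_image = {101: 2, 205: 3, 309: 1}

flat_features = []  # all box features from all images, stored back to back
imgid2idx = {}      # assumed layout: image_id -> (start, end) offsets

for image_id, num_boxes in boxes_per_image.items():
    start = len(flat_features)
    for box in range(num_boxes):
        flat_features.append([float(image_id), float(box)])
    imgid2idx[image_id] = (start, len(flat_features))

# Round-trip through pickle, the way a .pkl index file is stored and loaded.
imgid2idx = pickle.loads(pickle.dumps(imgid2idx))


def features_for(image_id):
    """Return the variable-length list of box features for one image."""
    start, end = imgid2idx[image_id]
    return flat_features[start:end]
```

With this layout, `features_for(205)` returns three box features and `features_for(309)` returns one, without padding every image to the maximum box count.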

Data preprocessing & Word embedding initialization

# data preprocessing
cd DAN-VisDial/data/
python prepro.py

# Word embedding vector initialization (GloVe)
cd ../utils
python utils.py
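
As a rough picture of what GloVe-based embedding initialization involves, the sketch below parses the GloVe text format (one word per line followed by its vector components) and builds one embedding row per vocabulary word. It is not the code in utils.py: the function names, the random fallback range for out-of-vocabulary words, and the toy 3-dimensional vectors are all assumptions for illustration.

```python
import io
import random


def load_glove(lines):
    """Parse GloVe text lines of the form 'word v1 v2 ... vd' into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def init_embeddings(vocab, vectors, dim, seed=0):
    """Build one row per word; words missing from GloVe get small random values."""
    rng = random.Random(seed)
    rows = []
    for word in vocab:
        if word in vectors:
            rows.append(list(vectors[word]))
        else:
            # Assumed fallback: uniform noise in [-0.1, 0.1].
            rows.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return rows


# Toy 3-dimensional stand-in for glove.6B.300d.txt (the real file has 300 dims).
sample = io.StringIO("dog 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
glove = load_glove(sample)
embeddings = init_embeddings(["dog", "zebra", "cat"], glove, dim=3)
```

Here "dog" and "cat" receive their pretrained vectors, while "zebra" is absent from the toy file and gets a random row of the same dimension.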

Training

Simple run

python train.py 

Saving model checkpoints

By default, a model checkpoint is saved at every epoch. You can change this with the -save_step option.
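
A minimal sketch of how such an option could control checkpoint frequency follows. It assumes that -save_step is an integer interval measured in epochs with a default of 1; that reading is an assumption, and the helper `epochs_to_save` is hypothetical rather than part of train.py.

```python
import argparse

parser = argparse.ArgumentParser()
# Assumed semantics: save a checkpoint every `save_step` epochs (default: every epoch).
parser.add_argument("-save_step", type=int, default=1)


def epochs_to_save(num_epochs, save_step):
    """List the 1-based epochs at which a checkpoint would be written."""
    return [epoch for epoch in range(1, num_epochs + 1) if epoch % save_step == 0]


args = parser.parse_args(["-save_step", "4"])
saved = epochs_to_save(12, args.save_step)  # [4, 8, 12]
```

Under this assumption, a 12-epoch run with -save_step 4 would keep three checkpoints instead of twelve.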

Logging

The log file checkpoints/start/time/log.txt records the epoch, loss, and learning rate.

Evaluation

A trained model checkpoint can be evaluated as follows:

python evaluate.py -load_path /path/to/.pth -split val

Validation scores can be computed offline. To obtain scores on the test split, you have to submit a JSON file to the online evaluation server. You can generate this JSON file with the -save_ranks=True option.
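
For orientation, the sketch below shows one way to turn per-candidate scores into a ranks file. The entry fields (`image_id`, `round_id`, `ranks`) are an assumption about the submission format and should be checked against the evaluation server's instructions; the helper names and the example values are hypothetical, and this is not the code that evaluate.py runs.

```python
import json


def scores_to_ranks(scores):
    """Give rank 1 to the highest-scoring candidate; return one rank per candidate."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, option_index in enumerate(order, start=1):
        ranks[option_index] = position
    return ranks


def make_entry(image_id, round_id, scores):
    """Build one entry; the field names are assumed, not taken from this repository."""
    return {
        "image_id": image_id,
        "round_id": round_id,
        "ranks": scores_to_ranks(scores),
    }


# Toy example with 3 candidates; VisDial questions come with 100 candidate answers.
entries = [make_entry(1234, 1, [0.1, 0.7, 0.2])]
payload = json.dumps(entries)
```

In the toy example the second candidate has the highest score, so the resulting ranks are `[3, 1, 2]`.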

Pre-trained model & Results

We provide the pre-trained model reported as the best single model in the paper.
To reproduce the results reported in the paper, run the command below and submit the resulting JSON file to the online evaluation server.

python evaluate.py -load_path /path/to/dan_disc_epoch_12.pth -split test -use_gt False -save_ranks True

Performance on v1.0 test-std (trained on v1.0 train):

| Model | NDCG | MRR | R@1 | R@5 | R@10 | Mean |
|-------|------|-----|-----|-----|------|------|
| DAN | 0.5759 | 0.6320 | 49.63 | 79.75 | 89.35 | 4.30 |
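
The retrieval columns above (MRR, R@k, and Mean) are all functions of the rank that the model assigns to the ground-truth answer. The sketch below computes them from a list of such ranks using the standard definitions; the function name and the toy ranks are hypothetical, and NDCG is omitted because it additionally needs per-candidate relevance scores.

```python
def retrieval_metrics(gt_ranks):
    """Compute MRR, R@1/5/10 (in percent), and mean rank from 1-based ranks."""
    n = len(gt_ranks)
    return {
        "mrr": sum(1.0 / r for r in gt_ranks) / n,
        "r@1": 100.0 * sum(r <= 1 for r in gt_ranks) / n,
        "r@5": 100.0 * sum(r <= 5 for r in gt_ranks) / n,
        "r@10": 100.0 * sum(r <= 10 for r in gt_ranks) / n,
        "mean": sum(gt_ranks) / n,
    }


# Toy ranks of the ground-truth answer for four questions.
metrics = retrieval_metrics([1, 2, 4, 10])
```

For these four ranks the MRR is 0.4625, R@1 is 25.0, R@5 is 75.0, R@10 is 100.0, and the mean rank is 4.25. Higher is better for MRR and R@k, while lower is better for the mean rank.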

License

MIT License
