DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog

PyTorch implementation of the paper:

DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog
Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, and Jie Zhou
In AAAI 2020

Setup and Dependencies

This code is implemented using PyTorch v0.3.0 with CUDA 8 and CuDNN 7.
It is recommended to set up this source code using Anaconda or Miniconda.

  1. Install the Anaconda or Miniconda distribution based on Python 3.6+ from their download sites.
  2. Clone this repository and create an environment:
git clone https://github.com/phellonchen/DMRM.git
conda create -n dmrm_visdial python=3.6

# activate the environment and install all dependencies
conda activate dmrm_visdial
cd $PROJECT_ROOT/
pip install -r requirements.txt

Download Features

  1. Download the VisDial dialog json files from here and keep them under the $PROJECT_ROOT/data directory so that the default arguments work.

  2. The image features come from a Faster-RCNN pre-trained on Visual Genome. Download the image features below, and put each feature file under the $PROJECT_ROOT/data directory.

  3. Download the GloVe pretrained word vectors from here, and keep glove.6B.300d.txt under the $PROJECT_ROOT/data directory.
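Before running the preprocessing scripts, it can help to confirm that the downloads landed in the right place. The following is a minimal sketch, not part of the repository: the helper name `missing_data_files` is hypothetical, and only `glove.6B.300d.txt` is listed by default because it is the only file name this README states. Add the VisDial json and image feature file names you actually downloaded.

```python
import os


def missing_data_files(project_root, expected=("glove.6B.300d.txt",)):
    """Return the names in `expected` that are absent from <project_root>/data.

    `expected` is an assumption: extend it with the VisDial json and
    image feature files you downloaded.
    """
    data_dir = os.path.join(project_root, "data")
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(data_dir, name))
    ]
```

Calling `missing_data_files(os.environ.get("PROJECT_ROOT", "."))` returns an empty list when every expected file is present.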

Data preprocessing & Word embedding initialization

# data preprocessing
cd $PROJECT_ROOT/script/
python prepro.py

# Word embedding vector initialization (GloVe)
cd $PROJECT_ROOT/script/
python create_glove.py
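For readers unfamiliar with GloVe initialization, the sketch below illustrates the general idea: read the text file (one word per line followed by space-separated floats), copy the vectors for words in the vocabulary, and randomly initialize the rest. It is an illustration under assumptions, not the contents of create_glove.py. The function names, the `word2idx` mapping, and the uniform range for unseen words are all hypothetical.

```python
import random


def load_glove(path, vocab):
    """Read GloVe text vectors, keeping only the words found in `vocab`."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word = parts[0]
            if word in vocab:
                vectors[word] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word2idx, vectors, dim=300, seed=0):
    """Build a list-of-lists embedding table indexed by word id.

    Words missing from GloVe get small random values (an assumption;
    the repository may handle them differently).
    """
    rng = random.Random(seed)
    size = max(word2idx.values()) + 1 if word2idx else 0
    matrix = [[0.0] * dim for _ in range(size)]
    for word, idx in word2idx.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
        else:
            matrix[idx] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return matrix
```

The resulting table could then be converted to a tensor and copied into an embedding layer's weights.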

Training

Simple run

python main_v0.9.py   # train on VisDial v0.9
# or
python main_v1.0.py   # train on VisDial v1.0

Saving model checkpoints

The training script saves a model checkpoint at every epoch and updates the best one. You can change this behavior by editing train.py.
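The "save every epoch, keep the best" pattern can be sketched as follows. This is not the code in train.py: the function name, the file names `epoch_N.pth` and `best.pth`, and the assumption that a higher score is better are all hypothetical. The `save_fn` parameter stands in for a serializer such as `torch.save(obj, path)`; the sketch falls back to `pickle` so it runs without PyTorch.

```python
import os
import pickle
import shutil


def save_checkpoint(state, epoch, score, best_score, save_dir, save_fn=None):
    """Write this epoch's checkpoint and refresh best.pth if `score` improved.

    Returns the updated best score. Assumes higher scores are better.
    """
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, "epoch_%d.pth" % epoch)
    if save_fn is None:
        # Fallback serializer so the sketch runs without PyTorch.
        with open(path, "wb") as f:
            pickle.dump(state, f)
    else:
        save_fn(state, path)  # e.g. torch.save
    if best_score is None or score > best_score:
        shutil.copyfile(path, os.path.join(save_dir, "best.pth"))
        return score
    return best_score
```

In a training loop, start with `best = None` and reassign `best = save_checkpoint(...)` after each epoch.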

Logging

The log file $PROJECT_ROOT/save_models/time/log.txt records the epoch, loss, and learning rate.

Evaluation

A trained model checkpoint can be evaluated as follows:

python eval_v0.9.py   # evaluate on VisDial v0.9
# or
python eval_v1.0.py   # evaluate on VisDial v1.0

Results

Performance on v0.9 val-std (trained on v0.9 train):

Model   MRR     R@1     R@5     R@10    Mean
DMRM    55.96   46.20   66.02   72.43   13.15

Performance on v1.0 val-std (trained on v1.0 train):

Model   MRR     R@1     R@5     R@10    Mean
DMRM    50.16   40.15   60.02   67.21   15.19
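The columns above are standard retrieval metrics computed from the rank the model assigns to the ground-truth answer among the candidate answers: MRR is the mean reciprocal rank, R@k is the percentage of questions whose ground-truth answer is ranked in the top k, and Mean is the mean rank (lower is better). The sketch below shows how these can be computed from a list of 1-based ranks. It is an illustration, not the repository's evaluation code, and the function name is hypothetical. MRR is scaled by 100 to match the tables.

```python
def retrieval_metrics(gt_ranks):
    """Compute MRR, R@1/5/10 (as percentages) and mean rank.

    gt_ranks: 1-based rank of the ground-truth answer for each question.
    """
    n = len(gt_ranks)
    if n == 0:
        raise ValueError("gt_ranks must not be empty")

    def recall_at(k):
        # Share of questions whose ground-truth answer is in the top k.
        return 100.0 * sum(1 for r in gt_ranks if r <= k) / n

    return {
        "MRR": 100.0 * sum(1.0 / r for r in gt_ranks) / n,
        "R@1": recall_at(1),
        "R@5": recall_at(5),
        "R@10": recall_at(10),
        "Mean": sum(gt_ranks) / n,
    }
```

For example, the ranks `[1, 2, 5, 20]` give an MRR of about 43.75, R@1 of 25.0, R@5 and R@10 of 75.0, and a mean rank of 7.0.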

If you find this repository useful, please consider citing our work:

@inproceedings{chen2020dmrm,
 title={DMRM: A dual-channel multi-hop reasoning model for visual dialog},
 author={Chen, Feilong and Meng, Fandong and Xu, Jiaming and Li, Peng and Xu, Bo and Zhou, Jie},
 booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
 volume={34},
 number={05},
 pages={7504--7511},
 year={2020}
}

License

MIT License
