Hadamard Product for Low-rank Bilinear Pooling

Multimodal Low-rank Bilinear Attention Networks (MLB) use low-rank bilinear pooling to build an efficient attention mechanism for visual question-answering tasks. MLB achieves a new state-of-the-art performance while being more parsimonious than previous methods.
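As a brief sketch of the idea behind the title, following the formulation in the paper cited under References: the symbols below (question vector x, image vector y, projections U, V, P, bias b, nonlinearity σ, and the dimensions N, M, d, c) are not defined elsewhere in this README and are introduced here only for illustration. A full bilinear model needs one N×M weight matrix per output; constraining each matrix to low rank turns the bilinear form into a Hadamard (element-wise) product of two linear projections.

```latex
% x in R^N (question), y in R^M (image), rank d, c outputs (illustrative notation)
\begin{align*}
f_i &= \mathbf{x}^\top \mathbf{W}_i \mathbf{y} + b_i
     && \text{full bilinear model, } \mathbf{W}_i \in \mathbb{R}^{N \times M} \\
    &= \mathbf{x}^\top \mathbf{U}_i \mathbf{V}_i^\top \mathbf{y} + b_i
     && \text{low-rank factorization, } \mathbf{W}_i = \mathbf{U}_i \mathbf{V}_i^\top \\
    &= \mathbb{1}^\top \left( \mathbf{U}_i^\top \mathbf{x} \circ \mathbf{V}_i^\top \mathbf{y} \right) + b_i
     && \circ \text{ is the Hadamard product} \\[4pt]
\mathbf{f} &= \mathbf{P}^\top \left( \sigma(\mathbf{U}^\top \mathbf{x}) \circ \sigma(\mathbf{V}^\top \mathbf{y}) \right) + \mathbf{b}
     && \text{shared } \mathbf{U} \in \mathbb{R}^{N \times d},\ \mathbf{V} \in \mathbb{R}^{M \times d},\ \mathbf{P} \in \mathbb{R}^{d \times c}
\end{align*}
```

Under these assumptions the full bilinear model has N·M·c weights, while the shared low-rank form has d·(N + M + c), which is where the parsimony comes from when d is much smaller than N and M.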

The current code reaches 65.07 on Open-Ended and 68.89 on Multiple-Choice on the test-standard split of the VQA dataset. An ensemble model reaches 66.89 and 70.29, respectively.

Dependencies

You can install the dependencies:

$ luarocks install rnn

Training

Please follow the instructions in VQA_LSTM_CNN for preprocessing. The --split 2 option uses the train+val set for training and the test-dev or test-standard set for evaluation. Set --num_ans to 2000 to reproduce the result.

For question features, you need to use this:

For image features, run:

$ th prepro_res.lua -input_json data_train-val_test-dev_2k/data_prepro.json -image_root path_to_image_root -cnn_model path_to_cnn_model

The pretrained ResNet-152 model and related scripts can be found in fb.resnet.torch.

$ th train.lua

With the default parameters, training takes around 2.6 days on a single NVIDIA Titan X GPU and writes the model under model/. To reproduce the results of the paper, use the -seconds option for the answer sampling described in Section 5 of the paper. The seconds.json file can be obtained using prepro_seconds.lua or from here (updated as default).

Evaluation

$ th eval.lua

References

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@inproceedings{Kim2017,
author = {Kim, Jin-Hwa and On, Kyoung Woon and Lim, Woosang and Kim, Jeonghee and Ha, Jung-Woo and Zhang, Byoung-Tak},
booktitle = {The 5th International Conference on Learning Representations},
title = {{Hadamard Product for Low-rank Bilinear Pooling}},
year = {2017}
}

This code uses the Torch7 rnn package and its TrimZero module for question embeddings. Please also note the following papers:

@article{Leonard2015a,
author = {L{\'{e}}onard, Nicholas and Waghmare, Sagar and Wang, Yang and Kim, Jin-Hwa},
journal = {arXiv preprint arXiv:1511.07889},
title = {{rnn : Recurrent Library for Torch}},
year = {2015}
}
@inproceedings{Kim2016a,
author = {Kim, Jin-Hwa and Kim, Jeonghee and Ha, Jung-Woo and Zhang, Byoung-Tak},
booktitle = {Proceedings of KIIS Spring Conference},
isbn = {2093-4025},
number = {1},
pages = {165--166},
title = {{TrimZero: A Torch Recurrent Module for Efficient Natural Language Processing}},
volume = {26},
year = {2016}
}

License

BSD 3-Clause License

Patent (Pending)

METHOD AND SYSTEM FOR PROCESSING DATA USING ELEMENT-WISE MULTIPLICATION AND MULTIMODAL RESIDUAL LEARNING FOR VISUAL QUESTION-ANSWERING
