PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

2017 VQA Challenge Winner (CVPR'17 Workshop)

PyTorch implementation of Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge by Teney et al.

Model architecture




  • For questions and answers, go to the data/ folder and execute directly.
  • You'll need to install the Stanford Tokenizer; follow the instructions on their page.
  • The tokenizing step may take up to 36 hrs to process the training questions (and that is on a Xeon E5 CPU). Writing pure Java code to tokenize them should be a lot faster, since Python's nltk calls the Java binding and Python is slow.
  • For image features, slightly modify this code to convert the tsv into an npy file, coco_features.npy, that contains a list of dictionaries with the image id as key and the feature (shape: 36, 2048) as value.
  • Download and extract GloVe to the data/ folder as well.
  • Now we should be able to train. Make sure the data/ folder contains at least:
    - glove.6B.300d.txt
    - vqa_train_final.json
    - coco_features.npy
    - train_q_dict.p
    - train_a_dict.p
  • (Update) For convenience, here is the link to the tokenized questions, vqa_train_toked.json and vqa_val_toked.json. Make sure you run data/ to generate vqa_train_final.json, train_q_dict.p, etc.
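
As a rough illustration of the tsv-to-npy step above, here is a minimal sketch. It assumes the tsv follows the usual bottom-up-attention layout (six tab-separated columns, with the features column holding base64-encoded float32 values); that layout, the function name tsv_to_npy, and the feat_dim parameter are assumptions for illustration, not taken from this repo.

```python
import base64

import numpy as np

# Assumed column order of the feature tsv (not stated in this README).
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def tsv_to_npy(tsv_path, npy_path, feat_dim=2048):
    """Write a .npy file holding a list of {image_id: feature} dictionaries."""
    records = []
    with open(tsv_path) as f:
        for line in f:
            # Split manually instead of using csv, since the feature column is huge.
            row = dict(zip(FIELDNAMES, line.rstrip("\n").split("\t")))
            num_boxes = int(row["num_boxes"])
            raw = base64.b64decode(row["features"])
            # Decode the flat float32 buffer into (num_boxes, feat_dim), e.g. (36, 2048).
            feat = np.frombuffer(raw, dtype=np.float32).reshape(num_boxes, feat_dim)
            records.append({int(row["image_id"]): feat})
    # A list of dicts has to be stored as a pickled object array.
    np.save(npy_path, np.array(records, dtype=object), allow_pickle=True)
    return len(records)
```

The result can be read back with np.load(path, allow_pickle=True). Note that this sketch keeps every feature in memory before saving, which matters for a file of this size.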


Use default parameters:

python --train

Train from a previous checkpoint:

python --train --modelpath=/path/to/saved.pth.tar

Check out tunable parameters:



python --eval

This will generate result.json (validation set only); the format follows the VQA evaluation format.
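
For reference, the VQA evaluation tools expect a JSON list of objects, each with a question_id and an answer. The sketch below shows that shape; the helper name write_results and its arguments are hypothetical, and the exact code this repo uses to write result.json may differ.

```python
import json


def write_results(predictions, path="result.json"):
    """Write (question_id, answer) pairs as a VQA-style results list."""
    results = [
        {"question_id": int(qid), "answer": str(ans)}
        for qid, ans in predictions
    ]
    with open(path, "w") as f:
        json.dump(results, f)
    return results
```

A call such as write_results([(1, "yes"), (2, "2")]) writes a two-entry list to result.json in the current directory.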


  • The default classifier is a softmax classifier. A sigmoid multi-label classifier is also implemented, but I couldn't get it to train.
  • Training for 50 epochs reaches around 64.42% training accuracy.
  • For the output classifier, I did not use the pretrained weights since they are hard to retrieve, so I followed eq. 5 in the paper.
  • To prepare validation data you need to uncomment some lines of code in data/
  • coco_features.npy is a really fat file (34GB including train+val image features); you can split it and modify the data loading mechanisms in
  • This code is tested with train = train and eval = val; no test data is included.
  • Issues are welcome!
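
One possible way to split coco_features.npy, as suggested in the notes above, is to write fixed-size shards. This is only a sketch: the function name split_features, the shard file naming, and the default shard_size are made up for illustration, and the data loader would still need to be adapted to read the shards.

```python
import os

import numpy as np


def split_features(npy_path, out_dir, shard_size=10000):
    """Split a .npy list of {image_id: feature} dicts into smaller .npy shards."""
    # Loading a pickled object array reads the whole file into memory once.
    records = np.load(npy_path, allow_pickle=True)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for start in range(0, len(records), shard_size):
        shard = records[start:start + shard_size]
        name = "coco_features_%05d.npy" % (start // shard_size)
        path = os.path.join(out_dir, name)
        np.save(path, shard, allow_pickle=True)
        paths.append(path)
    return paths
```

Each shard can then be loaded on demand with np.load(path, allow_pickle=True), so only one shard needs to be in memory during training.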