Training the VQA model (Torch)

Installation and setup

The code requires Torch. Install Torch and the required packages:

bash
$ curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
$ git clone https://github.com/torch/distro.git ~/torch --recursive
$ cd ~/torch
$ ./install.sh      # and enter "yes" at the end to modify your bashrc
$ source ~/.bashrc
$ luarocks install nn
$ luarocks install nngraph 
$ luarocks install image 
$ luarocks install cutorch
$ luarocks install cunn

Also install fblualib; see the fblualib installation instructions.

This repo overrides some functions of the nn.LinearNB package, which may throw errors like this.

Files

  • Trained model

Download the trained iBOWIMG-2x model and update the appropriate paths.

  • Preprocessed text data

Download the processed data files for the various combinations of the target question and unanswered questions. Folders are named after the unanswered questions they contain, because the randomly chosen target questions are the same in every folder.

Unanswered Questions

As described in the paper, there are 2^3 = 8 possible combinations of target question and unanswered questions. Consider an image x with a question q, a corresponding answer a, and two additional unanswered questions q_1 and q_2. For iBOWIMG, the single training example for this image is (x, q, a). For iBOWIMG-2x there are eight training examples, one for each element of E = {null, q, q_1, q_2, [q,q_1], [q,q_2], [q_1,q_2], [q,q_1,q_2]}, which makes use of the extra information available about this image during training in the form of questions that were asked but not answered.
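The eight elements of E are simply all subsets of {q, q_1, q_2}. The following Python sketch enumerates them in the same order as listed above; it is only an illustration and not part of the repo's Lua code, and the function name `unanswered_sets` is hypothetical.

```python
from itertools import combinations


def unanswered_sets(questions):
    """Return every subset of `questions`, smallest subsets first.

    Hypothetical helper for illustration; the repo itself ships the
    combinations as pre-generated data files.
    """
    subsets = []
    for size in range(len(questions) + 1):
        for combo in combinations(questions, size):
            subsets.append(list(combo))
    return subsets


# The empty subset plays the role of the "null" unanswered question.
E = unanswered_sets(["q", "q_1", "q_2"])
for e in E:
    print(e if e else "null")
```

With three questions this yields 2^3 = 8 subsets, one per row of the data file table.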

| Training example (unanswered question) | Datafile link |
| --- | --- |
| null | For the null unanswered question, use the null tensor in Torch. |
| q | Unanswered question is the same as the target question (same link as 1 randomly chosen target question) |
| q_1 | 1 randomly chosen unanswered question from the remaining 2 questions (hence, not the target question) |
| q_2 | 1 randomly chosen unanswered question from the remaining 2 questions (hence, not the target question)-2 |
| [q,q_1] | Target question and 1 randomly chosen from the remaining 2 questions |
| [q,q_2] | Target question and 1 randomly chosen from the remaining 2 questions-2 |
| [q_1,q_2] | The 2 unanswered questions |
| [q,q_1,q_2] | All 3 questions (includes the target question) |

Target Question

1 Randomly chosen target question: Download here

All training files

Download all the training files here. These are needed to generate the vocabulary.

  • Image Features

Download the image features from AlexNet or GoogLeNet or ResNet in binary format and update the path. The zip files below provide the image features and the image list which maps the filename of the image to its corresponding features. Change the corresponding argument in the code to accept the image features.

| Link | Features from the _ model |
| --- | --- |
| coco_val2014_googlenetFCdense_feat.dat | GoogLeNet model |
| coco_val2014_alexnet_feat.dat | AlexNet model |
| coco_val2014_resnet_feat.dat | ResNet model |
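The pairing between the image list and the feature file can be sketched as below. This is a rough Python illustration under loudly stated assumptions, not the repo's loader: it assumes the .dat file holds raw float32 values in native byte order, one fixed-length row per image, and that row i belongs to line i of the image list. The actual file layout is not documented here, so check it before relying on this. The function name `load_features`, the file names, and the dimension are all hypothetical.

```python
import array
import os
import tempfile


def load_features(feat_path, list_path, dim):
    """Map each image filename to its feature row.

    ASSUMPTIONS (not confirmed by the repo): raw float32 values, native
    byte order, `dim` values per image, rows in image-list order.
    """
    with open(list_path) as f:
        names = [line.strip() for line in f if line.strip()]
    values = array.array("f")
    with open(feat_path, "rb") as f:
        # Raises EOFError if the file holds fewer values than expected.
        values.fromfile(f, len(names) * dim)
    return {
        name: values[i * dim:(i + 1) * dim].tolist()
        for i, name in enumerate(names)
    }


# Tiny self-contained demo with made-up data (2 images, 2 values each).
tmp_dir = tempfile.mkdtemp()
feat_path = os.path.join(tmp_dir, "demo_feat.dat")
list_path = os.path.join(tmp_dir, "demo_list.txt")
with open(feat_path, "wb") as f:
    array.array("f", [1.0, 2.0, 3.0, 4.0]).tofile(f)
with open(list_path, "w") as f:
    f.write("a.jpg\nb.jpg\n")

features = load_features(feat_path, list_path, dim=2)
print(features["b.jpg"])
```

Keeping the image list and the feature rows in the same order is what makes a plain index lookup sufficient; if the real files use a different layout, only the reading step needs to change.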

After installing all the necessary components, run th main.lua and call the training components.

Testing the VQA model

After installing all the necessary components, run th main.lua and call the testing components.