Dynamically predicted neural network structures for multi-domain question answering


Neural module networks

UPDATE 22 Jun 2017: Code for our end-to-end module network framework is available at https://github.com/ronghanghu/n2nmn. The n2nmn code works better and is easier to set up. Use it!

This library provides code for training and evaluating neural module networks (NMNs). An NMN is a neural network that is assembled dynamically by composing shallow network fragments called modules into a deeper structure. These modules are jointly trained to be freely composable. For a general overview of the framework, refer to:

Neural module networks. Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Dan Klein. CVPR 2016.

Learning to compose neural networks for question answering. Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Dan Klein. NAACL 2016.

At present the code supports predicting network layouts from natural-language strings, with end-to-end training of modules. Various extensions should be straightforward to implement, such as alternative layout predictors or supervised training of specific modules.
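To make "assembling a network from a layout" concrete, here is a toy sketch in plain Python. It is not how this library implements modules (those are ApolloCaffe networks with learned parameters). The module behaviours, the `features` scores, and the names `make_find`, `and_module`, `is_module` and `assemble` are all assumptions made up for illustration.

```python
# Toy illustration only: these "modules" are hand-written Python functions,
# not the trained, Caffe-backed modules in this repository.

def make_find(word, features):
    """Leaf module (hypothetical): look up a per-cell score for `word`."""
    return lambda: [cell.get(word, 0.0) for cell in features]

def and_module(a, b):
    """Combine two attention maps with an elementwise minimum."""
    return [min(x, y) for x, y in zip(a, b)]

def is_module(attention):
    """Turn an attention map into a yes/no answer (threshold is arbitrary)."""
    return "yes" if max(attention) > 0.5 else "no"

def assemble(layout, features):
    """Recursively turn a nested-tuple layout into a callable network."""
    if isinstance(layout, str):
        return make_find(layout, features)
    head, *args = layout
    children = [assemble(arg, features) for arg in args]
    if head == "and":
        return lambda: and_module(*(child() for child in children))
    if head == "is":
        return lambda: is_module(children[0]())
    raise ValueError("unknown module: %r" % (head,))

# Two image cells with made-up scores for each word.
features = [{"train": 0.9, "modern": 0.8}, {"train": 0.1, "modern": 0.7}]
network = assemble(("is", ("and", "modern", "train")), features)
answer = network()
```

The point of the sketch is only that the same small set of modules can be wired into a different structure for each question, with the structure chosen by the layout.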

Please cite the CVPR paper for the general NMN framework, and the NAACL paper for dynamic structure selection. Feel free to email me at jda@cs.berkeley.edu if you have questions. This code is released under the Apache 2 license, provided in LICENSE.txt.

Installing dependencies

You will need to build my fork of the excellent ApolloCaffe library. This fork may be found at jacobandreas/apollocaffe, and provides support for a few Caffe layers that haven't made it into the main Apollo repository. Ordinary Caffe users: note that you will have to install the runcython Python module in addition to the usual Caffe dependencies.

Once this is done, update APOLLO_ROOT at the top of run.sh to point to your ApolloCaffe installation.

You will also need to install the following packages:

colorlogs, sexpdata

Downloading data

All experiment data should be placed in the data directory.


In data, create a subdirectory named vqa. Follow the VQA setup instructions to install the data into this directory. (It should have children Annotations, Images, etc.)

We have modified the structure of the VQA Images directory slightly. Images should have two subdirectories, raw and conv. raw contains the original VQA images, while conv contains the result of preprocessing these images with a 16-layer VGGNet as described in the paper. Every file in the conv directory should be of the form COCO_{SETNAME}_{IMAGEID}.jpg.npz, and contain a 512x14x14 image map in zipped numpy format. Here's a gist with the code I use for doing the extraction.
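As a sanity check on the conv file format described above, the following sketch builds a file name of the stated form and verifies that a file holds a 512x14x14 array. It is not the extraction code from the gist. The array key `arr_0` (NumPy's default for an unnamed array), the example image ID, and the helper names `conv_filename` and `check_conv_file` are assumptions.

```python
import os
import tempfile

import numpy as np

EXPECTED_SHAPE = (512, 14, 14)  # channels x height x width, as stated above

def conv_filename(set_name, image_id):
    """Build a name of the form COCO_{SETNAME}_{IMAGEID}.jpg.npz.

    image_id is used exactly as given; any zero-padding must match the
    corresponding file name in the raw directory.
    """
    return "COCO_%s_%s.jpg.npz" % (set_name, image_id)

def check_conv_file(path, key="arr_0"):
    """Return True if the archive at `path` holds an array of the expected shape.

    The key "arr_0" is an assumption: it is what NumPy uses when an array
    is saved without a name.
    """
    with np.load(path) as archive:
        features = archive[key]
    return features.shape == EXPECTED_SHAPE

# Demonstration on a dummy feature map written to a scratch directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, conv_filename("train2014", "000000000009"))
np.savez_compressed(path, np.zeros(EXPECTED_SHAPE, dtype=np.float32))
ok = check_conv_file(path)
```

Running a check like this over the whole conv directory before training can catch truncated or mis-shaped files early.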


Download the GeoQA dataset from the LSP website, and unpack it into data/geo.

Parsing questions

Every dataset fold should contain a file of parsed questions, one per line, formatted as S-expressions. If multiple parses are provided, they should be semicolon-delimited. As an example, for the question "is the train modern" we might have:

(is modern);(is train);(is (and modern train))

For VQA, these files should be named Questions/{train2014,val2014,...}.sps2. For GeoQA, they should be named environments/{fl,ga,...}/training.sps. Parses used in our papers are provided in extra and should be installed in the appropriate location. The VQA parser script is also located under extra/vqa; instructions for running are provided in the body of the script.
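The parse-file format described above (one question per line, semicolon-delimited S-expressions) can be read with a few lines of Python. The sketch below uses a small hand-written parser so that it has no dependencies; the sexpdata package listed under the dependencies could be used instead. The function names `parse_sexp` and `parse_line` are hypothetical and are not taken from this repository.

```python
def parse_sexp(text):
    """Parse one S-expression into nested tuples of strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return tuple(items)
        if token == ")":
            raise ValueError("unexpected closing parenthesis")
        return token

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result

def parse_line(line):
    """Split one line of a parse file into its alternative parses."""
    return [parse_sexp(part) for part in line.strip().split(";") if part.strip()]

parses = parse_line("(is modern);(is train);(is (and modern train))")
```

For the example line, `parses` holds three alternatives, the last of which nests an `and` expression inside `is`.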

Running experiments

You will first need to create the directories logs and vis, which store run logs and visualization code respectively.
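A small pre-flight helper can create these directories and report which dataset directories are still missing. This is a sketch, not part of the repository: the function name `prepare_workspace` is hypothetical, and the checked paths `data/vqa` and `data/geo` are taken from the data instructions above.

```python
import os
import tempfile

def prepare_workspace(root="."):
    """Create vis/ and logs/ under `root`; return the dataset dirs still missing."""
    for name in ("vis", "logs"):
        os.makedirs(os.path.join(root, name), exist_ok=True)
    expected = [
        os.path.join(root, "data", "vqa"),
        os.path.join(root, "data", "geo"),
    ]
    return [path for path in expected if not os.path.isdir(path)]

# Demonstration in a scratch directory, so the sketch has no side effects
# on a real checkout.
scratch = tempfile.mkdtemp()
missing = prepare_workspace(scratch)
```

In a real checkout you would call `prepare_workspace()` from the repository root and install whichever datasets it reports as missing.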

Different experiments can be run by providing an appropriate configuration file on the command line (see the last line of run.sh). Examples for VQA and GeoQA are provided in the config directory.

Looking for SHAPES? I haven't finished integrating it with the rest of the codebase, but check out the shapes branch of this repository for data and code.


  • Configurable data location
  • Model checkpointing