Training code for "Pixels to Graphs by Associative Embedding"
Switch branches/tags
Nothing to show
Clone or download
Latest commit 3632df2 Mar 26, 2018
Permalink
Failed to load latest commit information.
data/genome - Mar 26, 2018
models - Dec 22, 2017
task - Mar 26, 2018
util - Mar 26, 2018
.gitignore - Dec 22, 2017
LICENSE - Dec 22, 2017
README.md - Mar 26, 2018
__init__.py - Dec 22, 2017
eval.py - Mar 26, 2018
main.py - Mar 26, 2018
opts.py - Mar 26, 2018

README.md

Pixels to Graphs by Associative Embedding

Tensorflow training pipeline for:

Pixels to Graphs by Associative Embedding. Alejandro Newell and Jia Deng. Neural Information Processing Systems (NIPS), 2017. arXiv:1706.07365

Update (March 25, 2018): Changed default optimization to deal with diverging models, so code should no longer freeze in the middle of training. Also, finally added pretrained models!

Getting started

Python 3 package requirements:

(code tested on Python 3.5 and 3.6)

  • tensorflow_gpu (1.3.0)
  • numpy
  • h5py
  • scipy
  • scikit-image
  • tqdm
  • easydict
  • graphviz
  • simplejson (for Visual Genome driver)

Make sure to add the parent directory of px2graph to your PYTHONPATH.

Set up munkres-tensorflow:

  • Clone repo from here
  • Follow build instructions (make sure to use g++ 4.X)
  • For newer versions of TensorFlow it was a bit difficult to get this working, here is the command that finally worked for me:
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')

g++-4.9 -std=c++11 -shared hungarian.cc -o hungarian.so -fPIC -I$TF_INC -I$TF_INC/external/nsync/public -L$TF_LIB -ltensorflow_framework -O2 -D_GLIBCXX_USE_CXX11_ABI=0
  • Place 'hungarian.so' in util/

Download data

  • Download images from Visual Genome (parts 1 and 2)
  • Place all images into data/genome/VG_100K/
  • Download VG-SGG.h5 and proposals.h5 from here and place them in data/genome/ (credit to Xu et al. 'Scene Graph Generation by Iterative Message Passing' for preprocessing the Visual Genome annotations)

Using the code

To train a network, call:

python main.py -e [experiment name] --sg_task [PR|CL|SG]

The different task settings are defined as follows:

  • PR: Object boxes and classes provided, predict relationships
  • CL: Object boxes provided, classify objects and their relationships
  • SG: Full task, nothing but image required as input (an additional argument --use_rpn [0|1] determines whether or not to include box proposals from a RPN)

To train with multiple GPUs use a command like:

python main.py --batchsize 32 --gpu_choice '0,1,2,3'

To continue an experiment or branch off to a new directory (to test different hyperparameters for example), call:

python main.py -e test_exp --continue 1 or python main.py --branch test_exp -e test_exp_2 --learning_rate 1e-5

In general, any binary options you wish to change via the command line must be set explicitly with either a 0 or 1. Use --help to get a list of available options to set.

Important: Regarding optimization, some feature of the code can lead to spontaneous divergence in the midst of training

Evaluating a model

The recall logged during training should give a good sense of the performance of the model. After training, to generate a proper set of predictions and evaluate them, call:

python main.py --branch test_exp -e my_results --predict valid
python eval.py my_results/valid_preds

The --predict argument accepts train|valid|test as options, and will generate predictions on the first opt.valid_iters * opt.batchsize samples of the specified subset.

Pretrained models

Task Setting R @ 50 R @ 100
SGGen (SG) (no RPN) 15.5 18.8
SGCls (CL) 35.7 38.4
PredCls (PR) 82.0 86.4

Note, performance is higher than previously reported in our paper. This is not due to any changes in the algorithm, but instead is the result of training for longer and tuning some postprocessing parameters.

Another important detail regarding our evaluation is that our numbers are reported with unconstrained graphs meaning that when proposing the top 50 or 100 predicates, multiple edges may be proposed between any two nodes. This leads to a notable performance difference when comparing to numbers in a constrained setting particularly on the SGCls and PredCls tasks where nodes are provided ahead of time.

To get final test set predictions from the pretrained models, put them in the exp/ directory and call:

python main.py --branch pretrained_sg -e sg_results --predict test