Iterative Visual Reasoning Beyond Convolutions

By Xinlei Chen, Li-Jia Li, Li Fei-Fei and Abhinav Gupta.


  • This is the authors' implementation of the system described in the paper, not an official Google product.
  • Right now:
    • The available reasoning module is based on convolutions and spatial memory.
    • For simplicity, the released code uses TensorFlow's default crop_and_resize operation rather than the customized one reported in the paper (I find the default one is actually better by ~1%).
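For intuition, crop_and_resize bilinearly samples each box into a fixed-size output grid. Below is a minimal NumPy sketch of the operation's effect for a single image and box; it is illustrative only (the border clamping is my simplification, and crop_size > 1 is assumed), not the TensorFlow kernel:

```python
import numpy as np

def crop_and_resize(image, box, crop_size):
    """Bilinearly sample `box` (y1, x1, y2, x2 in normalized [0, 1]
    coordinates, following the tf.image.crop_and_resize convention)
    from `image` (H x W x C) into a (crop_size, crop_size, C) patch.
    Illustrative sketch only; assumes crop_size > 1."""
    h, w, c = image.shape
    y1, x1, y2, x2 = box
    out = np.zeros((crop_size, crop_size, c))
    for i in range(crop_size):
        for j in range(crop_size):
            # Map the output grid position to continuous image coordinates.
            y = (y1 + (y2 - y1) * i / (crop_size - 1)) * (h - 1)
            x = (x1 + (x2 - x1) * j / (crop_size - 1)) * (w - 1)
            # Clamp so the 2x2 interpolation window stays inside the image.
            y0 = min(max(int(np.floor(y)), 0), h - 2)
            x0 = min(max(int(np.floor(x)), 0), w - 2)
            dy, dx = y - y0, x - x0
            # Bilinear interpolation over the four neighboring pixels.
            out[i, j] = ((1 - dy) * (1 - dx) * image[y0, x0]
                         + (1 - dy) * dx * image[y0, x0 + 1]
                         + dy * (1 - dx) * image[y0 + 1, x0]
                         + dy * dx * image[y0 + 1, x0 + 1])
    return out
```

With the full-image box (0, 0, 1, 1) and crop_size equal to the image size, the sketch reproduces the input exactly, which is a quick sanity check for bilinear sampling code.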


Prerequisites

  1. TensorFlow, tested with version 1.6 on Ubuntu 16.04, installed with:
pip install --ignore-installed --upgrade
  2. Other needed packages can be installed with pip:
pip install Cython easydict matplotlib opencv-python Pillow pyyaml scipy
  3. For running COCO, the API can be installed globally:
# any path is okay
mkdir ~/install && cd ~/install
git clone cocoapi
cd cocoapi/PythonAPI
python setup.py install --user

Setup and Running

  1. Clone the repository.
git clone
cd iter-reason
  2. Set up the data; here we use ADE20K as an example.
mkdir -p data/ADE
cd data/ADE
wget -v
tar -xzvf
mv ADE20K_2016_07_26/* ./
rmdir ADE20K_2016_07_26
# then get the train/val/test split
wget -v
tar -xzvf ADE_split.tar.gz
rm -vf ADE_split.tar.gz
cd ../..
  3. Set up pre-trained ImageNet models. This is done similarly to tf-faster-rcnn. By default we use ResNet-50 as the backbone:
mkdir -p data/imagenet_weights
cd data/imagenet_weights
wget -v
tar -xzvf resnet_v1_50_2016_08_28.tar.gz
mv resnet_v1_50.ckpt res50.ckpt
cd ../..
  4. Compile the library (for computing bounding box overlaps).
cd lib
make
cd ..
  5. Now you are ready to run! For example, to train and test the baseline:
./experiments/scripts/ [GPU_ID] [DATASET] [NET] [STEPS] [ITER]
# GPU_ID is the GPU you want to test on
# DATASET in {ade, coco, vg} is the dataset to train/test on, defined in the script
# NET in {res50, res101} is the backbone network to choose from
# STEPS (x10K) is the number of iterations before the learning rate is reduced; multiple steps can be given, separated by the character 'a'
# ITER (x10K) is the total number of iterations to run
# Examples:
# train on ADE20K for 320K iterations, reducing the learning rate at 280K:
./experiments/scripts/ 0 ade res50 28 32
# train on COCO for 720K iterations, reducing at 320K and 560K:
./experiments/scripts/ 1 coco res101 32a56 72
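The STEPS and ITER arguments are both given in units of 10K iterations, with multiple step values joined by the character 'a'. A hypothetical helper (not part of the repository) showing how the two arguments decode into a schedule:

```python
def parse_schedule(steps, iters, unit=10000):
    """Decode the STEPS/ITER script arguments (both in units of 10K
    iterations). STEPS may contain several values joined by 'a'; each
    marks an iteration at which the learning rate is reduced.
    Illustrative helper, not code from the repository."""
    lr_steps = [int(s) * unit for s in steps.split('a')]
    total = int(iters) * unit
    return lr_steps, total

# "32a56 72" -> reduce the learning rate at 320K and 560K, run 720K total.
print(parse_schedule('32a56', '72'))
```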
  6. To train and test the reasoning modules (based on ResNet-50):
./experiments/scripts/ [GPU_ID] [DATASET] [MEM] [STEPS] [ITER]
# MEM in {local} is the type of reasoning module to use
# Examples:
# train on ADE20K with the local spatial memory:
./experiments/scripts/ 0 ade local 28 32
  7. Once the training is done, you can test the models separately with the corresponding test scripts; we also provide a separate set of scripts for testing on larger image inputs.

  8. You can use TensorBoard to visualize and track the training progress, for example:

tensorboard --logdir=tensorboard/res50/ade_train_5/ --port=7002 &


If you find this repository useful in your research, please consider citing:

@inproceedings{chen2018iterative,
    author = {Xinlei Chen and Li-Jia Li and Li Fei-Fei and Abhinav Gupta},
    title = {Iterative Visual Reasoning Beyond Convolutions},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year = {2018}
}

The idea of spatial memory was developed in:

@inproceedings{chen2017spatial,
    author = {Xinlei Chen and Abhinav Gupta},
    title = {Spatial Memory for Context Reasoning in Object Detection},
    booktitle = {Proceedings of the International Conference on Computer Vision},
    year = {2017}
}
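For intuition only: a spatial memory keeps a 2-D grid of feature cells aligned with the image, and detected regions write their features back into the cells their boxes cover, so later reasoning steps can read surrounding context. A loose NumPy sketch under my own simplifications (plain averaging in place of the learned update used in the papers):

```python
import numpy as np

def write_region(memory, box, feature):
    """Loose sketch of a spatial-memory write (illustrative only, not
    the papers' implementation): blend an incoming region feature into
    every memory cell the box covers.

    memory:  (H, W, C) spatial memory grid
    box:     (y1, x1, y2, x2) in integer cell coordinates
    feature: (C,) feature vector for the detected region
    """
    y1, x1, y2, x2 = box
    # Average old cell content with the new feature inside the region;
    # a real system would use a learned, gated update instead.
    memory[y1:y2, x1:x2] = 0.5 * memory[y1:y2, x1:x2] + 0.5 * feature
    return memory
```

Cells outside the written box are untouched, which is what lets several regions accumulate context into one shared map.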


Code for Iterative Reasoning Paper (CVPR 2018)




