Mask R-CNN
Latest commit 45e2818 Feb 19, 2018


A PyTorch implementation of the Mask R-CNN detection framework based on

This project supports single-GPU training of a ResNet101-based Mask R-CNN (without FPN). Its purpose is to support the experiments in MAttNet, whose REFER datasets are drawn from the training portion of COCO. To keep the REFER validation and test images out of training, our pre-trained model is trained on the COCO_2014_train_minus_refer_valtest + COCO_2014_valminusminival images.


  • Python 2.7
  • PyTorch 0.2 or higher
  • CUDA 8.0 or higher
  • The packages listed in requirements.txt
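A quick way to check an environment against the versions listed above is a small version comparison. The sketch below is hypothetical (`version_tuple` and `meets_minimum` are not part of this repository); it assumes version strings of the usual `major.minor[.patch][suffix]` form and only reads `torch.__version__` and, if present, `torch.version.cuda`.

```python
from __future__ import print_function

import re
import sys


def version_tuple(version):
    """Convert '0.3.1.post2' -> (0, 3, 1) or '8.0' -> (8, 0).

    Only leading numeric components are kept; parsing stops at the
    first non-numeric character.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break
    return tuple(parts)


def meets_minimum(version, minimum):
    """True if `version` is at least `minimum` (element-wise tuple comparison)."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    print("Python 2.7:", sys.version_info[:2] == (2, 7))
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print("PyTorch >= 0.2:", meets_minimum(torch.__version__, "0.2"))
        # torch.version.cuda is None on CPU-only builds; the getattr calls
        # guard against builds that do not define it at all.
        cuda = getattr(getattr(torch, "version", None), "cuda", None)
        print("CUDA >= 8.0:", cuda is not None and meets_minimum(cuda, "8.0"))
```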


  1. First, clone the code together with the REFER API (included as a git submodule):
git clone --recursive
  2. Prepare the data:
  • COCO: Following the convention inherited from the original code, COCO's API is cloned into a folder named coco. Download the annotations and images into data/coco. Note that the valminusminival and minival annotations can be downloaded here.
git clone data/coco
  • REFER: Follow the instructions in REFER to prepare the annotations for RefCOCO, RefCOCO+ and RefCOCOg.
git clone data/refer
  • ImageNet Weights: Find the resnet101-caffe download link in this repository and save the file as data/imagenet_weights/res101.pth.

  • coco_minus_refer: Build the coco_minus_refer annotation file, saved as data/coco/annotations/instances_train_minus_refer_valtest2014.json, by running:

python tools/
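Conceptually, this step removes every COCO train2014 image that appears in a REFER validation or test split, together with its annotations. The sketch below is not the repository's tools script (its name is truncated above); it assumes the standard COCO instances JSON layout (`images` entries with an `id`, `annotations` entries with an `image_id`) and that the REFER val/test image ids have already been collected. `remove_images` and `write_minus_refer` are hypothetical names.

```python
from __future__ import print_function

import json


def remove_images(coco, excluded_ids):
    """Return a copy of a COCO-format dict without the excluded image ids.

    Keys other than 'images' and 'annotations' (e.g. 'categories', 'info')
    are carried over unchanged via a shallow copy.
    """
    excluded = set(excluded_ids)
    filtered = dict(coco)
    filtered["images"] = [img for img in coco["images"]
                          if img["id"] not in excluded]
    filtered["annotations"] = [ann for ann in coco["annotations"]
                               if ann["image_id"] not in excluded]
    return filtered


def write_minus_refer(train_json, refer_val_test_ids, out_json):
    """Filter a COCO annotation file, save it, and return #images removed."""
    with open(train_json) as f:
        coco = json.load(f)
    filtered = remove_images(coco, refer_val_test_ids)
    with open(out_json, "w") as f:
        json.dump(filtered, f)
    return len(coco["images"]) - len(filtered["images"])


if __name__ == "__main__":
    toy = {
        "images": [{"id": 1}, {"id": 2}, {"id": 3}],
        "annotations": [{"id": 10, "image_id": 1},
                        {"id": 11, "image_id": 2},
                        {"id": 12, "image_id": 2}],
        "categories": [{"id": 1, "name": "person"}],
    }
    out = remove_images(toy, [2])
    print([img["id"] for img in out["images"]])        # [1, 3]
    print([ann["id"] for ann in out["annotations"]])   # [10]
```

On real data this would be called as, for example, `write_minus_refer("data/coco/annotations/instances_train2014.json", ids, "data/coco/annotations/instances_train_minus_refer_valtest2014.json")`, where `ids` is the (assumed) set of REFER val/test image ids.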


As pointed out by ruotianluo/pytorch-faster-rcnn, choose the right -arch flag for your GPU when compiling the CUDA code:

GPU model Architecture
TitanX (Maxwell/Pascal) sm_52
GTX 960M sm_50
GTX 1080 (Ti) sm_61
Grid K520 (AWS g2.2xlarge) sm_30
Tesla K80 (AWS p2.xlarge) sm_37
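The table amounts to a lookup from GPU model to an nvcc `-arch` value. As a small sketch (the dictionary and helper names are hypothetical, and where the flag is set in this repository's build files is not shown here), the mapping and the general rule that `sm_XY` corresponds to compute capability X.Y can be written as:

```python
from __future__ import print_function

# GPU model -> nvcc architecture, copied from the table above.
GPU_ARCH = {
    "TitanX (Maxwell/Pascal)": "sm_52",
    "GTX 960M": "sm_50",
    "GTX 1080 (Ti)": "sm_61",
    "Grid K520 (AWS g2.2xlarge)": "sm_30",
    "Tesla K80 (AWS p2.xlarge)": "sm_37",
}


def nvcc_arch_flag(gpu_model):
    """Return the nvcc flag for a GPU listed in the table, e.g. '-arch=sm_61'."""
    return "-arch=" + GPU_ARCH[gpu_model]


def arch_from_capability(capability):
    """Map a compute capability (major, minor) such as (6, 1) to 'sm_61'."""
    major, minor = capability
    return "sm_%d%d" % (major, minor)


if __name__ == "__main__":
    print(nvcc_arch_flag("GTX 1080 (Ti)"))   # -arch=sm_61
    print(arch_from_capability((3, 7)))      # sm_37
```

For a GPU not in the table, recent PyTorch releases expose `torch.cuda.get_device_capability()`, whose (major, minor) result can be passed to `arch_from_capability`.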

Compile the CUDA-based nms and roi_pooling modules with the following commands:

cd lib


Train by running the following (with notime as the extra tag):

./experiments/scripts/ 0 refcoco res101 notime
  • Trains on COCO 2014 trainval35k minus refer_valtest and tests on minival (800k/1250k), reaching 35.8 AP on detection and 30.7 AP on segmentation (with NMS).
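The pair (800k/1250k) is not explained here. Assuming it follows the convention of the Faster R-CNN code this project builds on (a learning-rate step at 800k iterations out of 1250k total), the schedule is a plain step decay, sketched below. The base learning rate 0.001 and decay factor 0.1 are assumed defaults rather than values confirmed by this README, and `step_lr` is a hypothetical helper.

```python
from __future__ import print_function


def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsizes=(800000,)):
    """Step learning-rate schedule (defaults are illustrative assumptions).

    The rate is multiplied by `gamma` once for every step size that
    `iteration` has reached.
    """
    passed = sum(1 for step in stepsizes if iteration >= step)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    total_iters = 1250000  # assumed total from "(800k/1250k)"
    for it in (0, 799999, 800000, total_iters - 1):
        print(it, step_lr(it))
```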

Monitor the training process with TensorBoard, and view it at

tensorboard --logdir tensorboard/res101 --port=port_number


Evaluate by running the following (with notime as the extra tag):

./experiments/scripts/ 0 refcoco res101 notime

Detection Comparison:

  • Compared with Faster R-CNN trained/evaluated on the same images.
Detection AP AP50 AP75
Faster R-CNN 34.1 53.7 36.8
Our Mask R-CNN 35.8 55.3 38.6
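The per-metric differences between the two rows are simple subtractions. A short check, with the values copied from the table (`ap_gains` is a hypothetical helper), gives +1.7 AP, +1.6 AP50 and +1.8 AP75:

```python
from __future__ import print_function

METRICS = ("AP", "AP50", "AP75")

# Detection results copied from the table above.
faster_rcnn = {"AP": 34.1, "AP50": 53.7, "AP75": 36.8}
our_mask_rcnn = {"AP": 35.8, "AP50": 55.3, "AP75": 38.6}


def ap_gains(baseline, model, metrics=METRICS):
    """Return model - baseline for each metric, rounded to one decimal."""
    return dict((m, round(model[m] - baseline[m], 1)) for m in metrics)


if __name__ == "__main__":
    gains = ap_gains(faster_rcnn, our_mask_rcnn)
    for m in METRICS:
        print(m, gains[m])  # AP 1.7, AP50 1.6, AP75 1.8
```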

Segmentation Comparison:

  • We compare with the original Mask R-CNN implementation. Note that this comparison is slightly unfavorable to ours, because:
  • We train on fewer images (about 6,500 fewer).
  • Our training uses a single GPU.
  • Our model resizes the shorter side of each image to 600px instead of 800px.
Segmentation AP AP50 AP75
Original Mask R-CNN 32.7 54.2 34.0
Our Mask R-CNN 30.7 52.3 32.4
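The 600px versus 800px difference refers to how images are rescaled before entering the network. A minimal sketch of the usual Faster R-CNN style rule: scale the shorter side to a target length, unless that pushes the longer side past a cap. The cap values (1000 for the 600px setting, 1333 for the 800px setting) are assumptions borrowed from common configurations, not settings stated in this README, and `resize_scale` is a hypothetical helper.

```python
from __future__ import print_function


def resize_scale(height, width, target_short=600, max_long=1000):
    """Scale factor that resizes the shorter side to `target_short` pixels,
    reduced if the longer side would then exceed `max_long` pixels."""
    short_side = min(height, width)
    long_side = max(height, width)
    scale = float(target_short) / short_side
    if round(scale * long_side) > max_long:
        scale = float(max_long) / long_side
    return scale


if __name__ == "__main__":
    h, w = 480, 640  # a typical COCO image size
    s600 = resize_scale(h, w)  # 1.25
    s800 = resize_scale(h, w, target_short=800, max_long=1333)
    print("600px setting:", int(round(h * s600)), "x", int(round(w * s600)))
    print("800px setting:", int(round(h * s800)), "x", int(round(w * s800)))
```

Under these assumptions a 480x640 image enters the network at roughly 600x800 in our setting versus 800x1067 in the 800px setting, so each image carries fewer pixels for small objects.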

Pretrained Model

We provide the model we used in MAttNet for mask comprehension.

Download the model and put the .pth and .pkl files into output/res101/coco_2014_train_minus_refer_valtest+coco_2014_valminusminival/notime
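The directory name joins the two training image sets with "+", and the model files are expected directly inside it. A small sanity check before opening the notebook could look like the following; the helper names are hypothetical, and since the actual .pth/.pkl file names are not given here, it simply globs by extension.

```python
from __future__ import print_function

import glob
import os

MODEL_DIR = os.path.join(
    "output", "res101",
    "coco_2014_train_minus_refer_valtest+coco_2014_valminusminival",
    "notime")


def training_sets(model_dir):
    """Split the image-set component of the path (the third one) on '+'."""
    parts = os.path.normpath(model_dir).split(os.sep)
    return parts[2].split("+")


def find_model_files(model_dir):
    """Return sorted lists of the .pth and .pkl files inside `model_dir`."""
    pth = sorted(glob.glob(os.path.join(model_dir, "*.pth")))
    pkl = sorted(glob.glob(os.path.join(model_dir, "*.pkl")))
    return pth, pkl


if __name__ == "__main__":
    print("training sets:", training_sets(MODEL_DIR))
    pth, pkl = find_model_files(MODEL_DIR)
    if not pth or not pkl:
        print("missing .pth or .pkl files in", MODEL_DIR)
```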


  • Follow the IPython notebook test/check_model.ipynb to test our model.