Keras implementation of MaskRCNN object detection.

Keras MaskRCNN

Keras implementation of Mask R-CNN instance-aware segmentation, as described in Mask R-CNN by Kaiming He, Georgia Gkioxari, Piotr Dollár and Ross Girshick, using RetinaNet as the base.


This repository doesn't strictly implement MaskRCNN as described in the paper. The paper uses an RPN to propose ROIs and then uses those ROIs to perform bounding box regression, classification and mask estimation simultaneously. This repository instead uses RetinaNet for bounding box regression and classification, and builds a mask estimation head on top of those predictions.

In theory RetinaNet can be configured to act as an RPN, which would make the model identical to MaskRCNN, but doing so would require more layers and complexity than is actually necessary. Less is more :)


  1. Clone this repository.
  2. Install keras-retinanet (pip install keras-retinanet --user). Make sure tensorflow v1.13.1 is installed and is using the GPU.
  3. Optionally, install pycocotools if you want to train / test on the MS COCO dataset by running pip install --user git+
  4. Run pip install keras-maskrcnn --user to install the latest release, or run pip install . --user in the repository to install the checked-out version.
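
The installation steps above can be sanity-checked from Python. The sketch below is only an illustration: it looks up installed distributions with the standard-library importlib.metadata module (Python 3.8+), using the package names from the pip commands above. It does not verify that the GPU is used, and a GPU build of TensorFlow may be registered under a different distribution name (for example tensorflow-gpu), so treat the names in REQUIRED as assumptions to adjust for your setup.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the pip commands in the installation steps.
REQUIRED = ["tensorflow", "keras-retinanet", "keras-maskrcnn"]
# Only needed for training / testing on MS COCO.
OPTIONAL = ["pycocotools"]

report = {name: installed_version(name) for name in REQUIRED + OPTIONAL}
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

A missing optional package is reported in the same way as a missing required one; the script only prints what it finds and leaves the decision to you.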


An example of testing the network can be seen in this Notebook. In general, inference of the network works as follows:

outputs = model.predict_on_batch(inputs)
boxes  = outputs[-4]
scores = outputs[-3]
labels = outputs[-2]
masks  = outputs[-1]

Here boxes is shaped (None, None, 4) (for (x1, y1, x2, y2)), scores is shaped (None, None) (classification score), labels is shaped (None, None) (label corresponding to the score) and masks is shaped (None, None, 28, 28). In all four outputs, the first dimension is the batch dimension and the second dimension indexes the list of detections.
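
As a rough illustration of how these outputs might be consumed, the sketch below keeps only the detections of the first image whose score reaches a threshold. The helper name select_detections, the 0.5 threshold and the -1 padding values in the dummy data are assumptions made for this example, not part of the library. Nested lists stand in for the model outputs so the snippet runs on its own; the same indexing works on NumPy arrays.

```python
def select_detections(boxes, scores, labels, masks, score_threshold=0.5):
    """Return (box, score, label, mask) tuples for the first image in the batch."""
    detections = []
    # Index 0 selects the first image; the zip walks over its detections.
    for box, score, label, mask in zip(boxes[0], scores[0], labels[0], masks[0]):
        if score < score_threshold:
            continue  # drop low-confidence (and, by assumption, padded) entries
        detections.append((box, score, label, mask))
    return detections


# Dummy data shaped like the outputs described above: one image, three detections.
empty_mask = [[0.0] * 28 for _ in range(28)]
boxes = [[[10, 20, 110, 220], [5, 5, 50, 50], [-1, -1, -1, -1]]]
scores = [[0.9, 0.3, -1.0]]
labels = [[17, 2, -1]]
masks = [[empty_mask, empty_mask, empty_mask]]

detections = select_detections(boxes, scores, labels, masks)
for box, score, label, mask in detections:
    print(label, score, box, f"{len(mask)}x{len(mask[0])} mask")
```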

Loading models can be done in the following manner:

from keras_maskrcnn.models import load_model
model = load_model('/path/to/model.h5', backbone_name='resnet50')

Execution time on an NVIDIA Pascal Titan X is roughly 175 ms for an image of shape 1000x800x3.
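
To get a comparable number on your own hardware, a timing loop along the following lines could be used. This is a minimal sketch: mean_latency_ms and fake_predict are hypothetical names, and fake_predict is only a stand-in for a call such as model.predict_on_batch so that the snippet runs without a model. Warm-up calls are excluded because the first calls to a model are often slower than steady-state inference.

```python
import time


def mean_latency_ms(predict, inputs, runs=10, warmup=2):
    """Average wall-clock time of predict(inputs) in milliseconds."""
    for _ in range(warmup):
        predict(inputs)  # warm-up calls are excluded from the measurement
    start = time.perf_counter()
    for _ in range(runs):
        predict(inputs)
    return (time.perf_counter() - start) * 1000.0 / runs


def fake_predict(inputs):
    # Stand-in for model.predict_on_batch; sums the input to simulate work.
    return sum(inputs)


latency = mean_latency_ms(fake_predict, list(range(1000)))
print(f"{latency:.3f} ms per call")
```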

Example output images using keras-maskrcnn are shown below.

[Images: three example results of MaskRCNN on MS COCO]


keras-maskrcnn can be trained using this script. Note that the train script uses relative imports since it is inside the keras_maskrcnn package. If you want to adjust the script for your own use outside of this repository, you will need to switch it to use absolute imports.


For training on MS COCO, run:

# Running directly from the repository:
./keras_maskrcnn/bin/ coco /path/to/MS/COCO

# Using the installed script:
maskrcnn-train coco /path/to/MS/COCO

The pretrained MS COCO model can be downloaded here. Results using the cocoapi are shown below (note: the most similar architecture in the MaskRCNN paper achieves an mAP of 0.336).

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.278
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.488
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.286
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.127
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.312
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.392
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.251
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.386
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.405
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.219
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.452
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.565

For training on a custom dataset, a CSV file can be used to pass the data. See below for more details on the format of these CSV files. To train using your CSV, run:

# Running directly from the repository:
./keras_maskrcnn/bin/ csv /path/to/csv/file/containing/annotations /path/to/csv/file/containing/classes

# Using the installed script:
maskrcnn-train csv /path/to/csv/file/containing/annotations /path/to/csv/file/containing/classes

CSV datasets

The CSVGenerator provides an easy way to define your own datasets. It uses two CSV files: one file containing annotations and one file containing a class name to ID mapping.

Annotations format

The CSV file with annotations should contain one annotation per line. Images with multiple bounding boxes should use one row per bounding box. Note that indexing for pixel values starts at 0. The expected format of each line is:


Some images may not contain any labeled objects. To add these images to the dataset as negative examples, add an annotation where x1, y1, x2, y2, class_name and mask are all empty:


A full example:


This defines a dataset with 3 images. img_001.jpg contains a cow. img_002.jpg contains a cat and a bird. img_003.jpg contains no interesting objects/animals.
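
A dataset like the one just described could be written and read back roughly as follows. This is a sketch under stated assumptions: the column order (path, x1, y1, x2, y2, class_name, mask) is inferred from the field names mentioned above, the coordinates and mask file names are placeholders, and read_annotations is a hypothetical helper rather than the CSVGenerator itself. The check that x2 > x1 and y2 > y1 is likewise an assumed sanity check.

```python
import csv
import io

# Assumed column order, inferred from the field names in the text.
FIELDS = ["path", "x1", "y1", "x2", "y2", "class_name", "mask"]

# Hypothetical rows: coordinates and mask file names are placeholders.
ROWS = [
    ["img_001.jpg", 837, 346, 981, 456, "cow", "mask_001.png"],
    ["img_002.jpg", 215, 312, 279, 391, "cat", "mask_002.png"],
    ["img_002.jpg", 22, 5, 89, 84, "bird", "mask_003.png"],
    ["img_003.jpg", "", "", "", "", "", ""],  # negative example: all fields empty
]


def read_annotations(handle):
    """Group annotations by image path; images without objects map to an empty list."""
    annotations = {}
    for line_number, row in enumerate(csv.reader(handle), start=1):
        if len(row) != len(FIELDS):
            raise ValueError(f"line {line_number}: expected {len(FIELDS)} fields, got {len(row)}")
        path, x1, y1, x2, y2, class_name, mask = row
        objects = annotations.setdefault(path, [])
        if (x1, y1, x2, y2, class_name, mask) == ("", "", "", "", "", ""):
            continue  # negative example: register the image, add no object
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"line {line_number}: box must satisfy x2 > x1 and y2 > y1")
        objects.append({"box": (x1, y1, x2, y2), "class_name": class_name, "mask": mask})
    return annotations


# Write the rows to an in-memory file, then parse them again.
buffer = io.StringIO()
csv.writer(buffer).writerows(ROWS)
buffer.seek(0)

annotations = read_annotations(buffer)
for path, objects in annotations.items():
    print(path, [obj["class_name"] for obj in objects])
```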

Class mapping format

The class name to ID mapping file should contain one mapping per line. Each line should use the following format:


Indexing for classes starts at 0. Do not include a background class as it is implicit.

For example:
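
A mapping for the three classes used in the annotations example might be parsed as sketched below. The line format class_name,id is an assumption based on the description of a class name to ID mapping, and read_classes is a hypothetical helper, not the library's own parser. It checks that IDs are unique and that the lowest ID is 0, with no background entry.

```python
import csv
import io

# Assumed line format: class_name,id (one mapping per line, no background class).
CLASS_CSV = "cow,0\ncat,1\nbird,2\n"


def read_classes(handle):
    """Parse a class mapping and check that IDs are unique and start at 0."""
    classes = {}
    for line_number, row in enumerate(csv.reader(handle), start=1):
        if len(row) != 2:
            raise ValueError(f"line {line_number}: expected 'class_name,id'")
        class_name, class_id = row[0], int(row[1])
        if class_name in classes:
            raise ValueError(f"line {line_number}: duplicate class name '{class_name}'")
        classes[class_name] = class_id
    ids = list(classes.values())
    if len(set(ids)) != len(ids):
        raise ValueError("class IDs must be unique")
    if ids and min(ids) != 0:
        raise ValueError("class IDs must start at 0")
    return classes


classes = read_classes(io.StringIO(CLASS_CSV))
print(classes)
```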



Feel free to join the #keras-maskrcnn Keras Slack channel for discussions and questions.
