Torch implementation of DeepMask and SharpMask


This repository contains a Torch implementation for both the DeepMask and SharpMask object proposal algorithms.


DeepMask is trained with two objectives: given an image patch, one branch of the model outputs a class-agnostic segmentation mask, while the other branch outputs how likely the patch is to contain an object. At test time, DeepMask is applied densely to an image and generates a set of object masks, each with a corresponding objectness score. These masks densely cover the objects in an image and can be used as a first step for object detection and other tasks in computer vision.

SharpMask is an extension of DeepMask which generates higher-fidelity masks using an additional top-down refinement step. The idea is to first generate a coarse mask encoding in a feedforward pass, then refine this mask encoding in a top-down pass using features at successively lower layers. This results in masks that better adhere to object boundaries.

If you use DeepMask/SharpMask in your research, please cite the relevant papers:

    @inproceedings{DeepMask,
       title = {Learning to Segment Object Candidates},
       author = {Pedro O. Pinheiro and Ronan Collobert and Piotr Dollár},
       booktitle = {NIPS},
       year = {2015}
    }

    @inproceedings{SharpMask,
       title = {Learning to Refine Object Segments},
       author = {Pedro O. Pinheiro and Tsung-Yi Lin and Ronan Collobert and Piotr Dollár},
       booktitle = {ECCV},
       year = {2016}
    }

Note: the version of DeepMask implemented here is the updated version reported in the SharpMask paper. On average, DeepMask takes 0.5 s per COCO image and SharpMask takes 0.8 s. Runtime roughly doubles for the "zoom" versions of the models.
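
As a rough illustration of what these per-image timings imply for a full run, the sketch below simply multiplies them out. The function name `estimate_minutes` and the 5,000-image workload are assumptions chosen for illustration. The estimate ignores model loading and disk I/O, and the factor of 2 for the "zoom" models reflects only the "roughly doubles" note above.

```shell
# estimate_minutes N_IMAGES SECONDS_PER_IMAGE [FACTOR]
# Hypothetical helper: prints estimated minutes, rounded to one decimal place.
# FACTOR defaults to 1; pass 2 to approximate the "zoom" models.
estimate_minutes() {
  awk -v n="$1" -v s="$2" -v f="${3:-1}" \
    'BEGIN { printf "%.1f\n", n * s * f / 60 }'
}

estimate_minutes 5000 0.5      # DeepMask:          41.7
estimate_minutes 5000 0.8      # SharpMask:         66.7
estimate_minutes 5000 0.8 2    # SharpMask "zoom": 133.3
```

These figures are back-of-envelope numbers only; actual throughput depends on the GPU and image sizes.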

Requirements and Dependencies

Quick Start

To run pretrained DeepMask/SharpMask models to generate object proposals, follow these steps:

  1. Clone this repository into $DEEPMASK:

    DEEPMASK=/desired/absolute/path/to/deepmask/ # set absolute path as desired
    git clone $DEEPMASK
  2. Download pre-trained DeepMask and SharpMask models:

    mkdir -p $DEEPMASK/pretrained/deepmask; cd $DEEPMASK/pretrained/deepmask
    mkdir -p $DEEPMASK/pretrained/sharpmask; cd $DEEPMASK/pretrained/sharpmask
  3. Run computeProposals.lua with a given model and optional target image (specified via the -img option):

    # apply to a default sample image (data/testImage.jpg)
    cd $DEEPMASK
    th computeProposals.lua $DEEPMASK/pretrained/deepmask # run DeepMask
    th computeProposals.lua $DEEPMASK/pretrained/sharpmask # run SharpMask
    # apply to a specific image, selected via -img
    th computeProposals.lua $DEEPMASK/pretrained/sharpmask -img /path/to/image.jpg
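
To process several images, the single-image command from step 3 can be wrapped in a loop. The following is a minimal sketch, not part of the repository: the helper name `run_on_dir`, the `TH` override variable, and the `.jpg`-only glob are assumptions made for illustration. It must be run from `$DEEPMASK` so that `computeProposals.lua` resolves, and it does not manage output files, so successive runs may overwrite each other's results.

```shell
# Hypothetical helper: run a pretrained model on every .jpg in a directory.
# Usage: run_on_dir MODEL_DIR IMAGE_DIR
# TH can be overridden (e.g. TH=echo) to preview the commands without Torch.
run_on_dir() {
  model_dir="$1"
  image_dir="$2"
  for img in "$image_dir"/*.jpg; do
    [ -e "$img" ] || continue   # skip when the glob matched nothing
    "${TH:-th}" computeProposals.lua "$model_dir" -img "$img"
  done
}

# Dry run (prints the arguments instead of invoking Torch):
#   TH=echo run_on_dir "$DEEPMASK/pretrained/sharpmask" /path/to/images
```

Each iteration starts a fresh `th` process and reloads the model, so this is convenient rather than fast.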

Training Your Own Model

To train your own DeepMask/SharpMask models, follow these steps:


  1. If you have not done so already, clone this repository into $DEEPMASK:

    DEEPMASK=/desired/absolute/path/to/deepmask/ # set absolute path as desired
    git clone $DEEPMASK
  2. Download the Torch ResNet-50 model pretrained on ImageNet:

    mkdir -p $DEEPMASK/pretrained; cd $DEEPMASK/pretrained
  3. Download and extract the COCO images and annotations:

    mkdir -p $DEEPMASK/data; cd $DEEPMASK/data


To train, launch the train.lua script. It accepts several options; to list them, pass the --help flag.

  1. To train DeepMask:

    th train.lua
  2. To train SharpMask (requires pre-trained DeepMask model):

    th train.lua -dm /path/to/trained/deepmask/
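
Because SharpMask training depends on an already trained DeepMask model, a small guard can catch a wrong path before the job starts. This is a sketch under assumptions: the wrapper name `train_sharpmask` and the `TH` override variable are hypothetical, and only the `-dm` option shown above is used.

```shell
# Hypothetical wrapper: start SharpMask training only if the DeepMask
# model directory exists. Usage: train_sharpmask /path/to/trained/deepmask/
# TH can be overridden (e.g. TH=echo) for a dry run without Torch.
train_sharpmask() {
  dm_dir="$1"
  if [ ! -d "$dm_dir" ]; then
    echo "DeepMask model directory not found: $dm_dir" >&2
    return 1
  fi
  "${TH:-th}" train.lua -dm "$dm_dir"
}
```

The check only confirms that the directory exists; it does not verify that it contains a usable model.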


There are two ways to evaluate a model on the COCO dataset.

  1. evalPerPatch.lua evaluates only the mask generation step. The per-patch evaluation only uses image patches that contain roughly centered objects. Its usage is as follows:

    th evalPerPatch.lua /path/to/trained/deepmask-or-sharpmask/
  2. evalPerImage.lua evaluates the full model on COCO images, as reported in the papers. By default, it evaluates performance on the first 5K COCO validation images (run th evalPerImage.lua --help to see the options):

    th evalPerImage.lua /path/to/trained/deepmask-or-sharpmask/

Precomputed Proposals

You can download pre-computed proposals (1000 per image) on the COCO and PASCAL VOC datasets, for both segmentation and bounding box proposals. We use the COCO JSON format for the proposals. The proposals are divided into chunks of 500 images each (that is, each JSON file contains 1000 proposals per image for 500 images). All proposals correspond to the "zoom" setting in the paper (DeepMaskZoom and SharpMaskZoom), which tends to be most effective for object detection.
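
The chunk layout implies some simple arithmetic that is useful when planning downloads or memory use. The sketch below is illustrative only: the helper names `chunks_for` and `proposals_in_chunk` are hypothetical, the image counts are example inputs, and it assumes that a final chunk may hold fewer than 500 images, which the description above does not state.

```shell
# chunks_for N_IMAGES: number of 500-image JSON chunks needed to cover
# N images (integer ceiling division).
chunks_for() {
  echo $(( ($1 + 499) / 500 ))
}

# proposals_in_chunk N_IMAGES_IN_CHUNK: proposals stored at 1000 per image.
proposals_in_chunk() {
  echo $(( $1 * 1000 ))
}

chunks_for 5000          # 10
chunks_for 5001          # 11
proposals_in_chunk 500   # 500000
```

A full chunk therefore holds 500,000 proposals, which is worth keeping in mind before loading an entire JSON file into memory.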