# Segmentation and Object Detection with pre-trained Mask RCNN model

Mask RCNN networks extend Faster RCNN networks with a segmentation mask branch.
Accordingly, `gluoncv.model_zoo.MaskRCNN` inherits from
`gluoncv.model_zoo.FasterRCNN`.

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

from gluoncv import model_zoo, data, utils
from mxnet import image, gpu, nd
from mxnet.gluon.data.vision import transforms
import numpy as np

context = gpu(0)

## Load a pretrained model


Let's get a Mask RCNN model trained on the COCO dataset with a ResNet-50 backbone.
By specifying ``pretrained=True``, the weights are downloaded automatically
from the model zoo if they are not already available locally.

The returned model is a HybridBlock `gluoncv.model_zoo.MaskRCNN`.

In [None]:
net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', ctx=context, pretrained=True)
print(net.classes)

## Pre-process an image

The pre-processing step is identical to Faster RCNN.

Next we download an image and pre-process it with the preset data transforms.
By default, the short edge of the image is resized to 600px,
but you can feed an image of any size.
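To make the resizing rule concrete, here is a minimal sketch of how the output size could be computed. The helper `resized_shape` is hypothetical and not part of GluonCV, and the `max_size=1000` cap on the long edge is an assumption about the preset's defaults rather than something stated in this tutorial.

```python
def resized_shape(width, height, short=600, max_size=1000):
    """Return (new_width, new_height) after scaling the short edge to `short`.

    The long edge is capped at `max_size`. Both defaults are assumptions
    made for illustration.
    """
    scale = short / min(width, height)
    # If the long edge would exceed the cap, scale by the long edge instead.
    if round(scale * max(width, height)) > max_size:
        scale = max_size / max(width, height)
    return round(width * scale), round(height * scale)


print(resized_shape(775, 515))   # a 775x515 image, short edge scaled to 600
print(resized_shape(2000, 500))  # a very wide image hits the long-edge cap
```

The real preset may also round sizes differently, so treat the exact numbers as approximate.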

You can provide a list of image file names, such as ``[im_fname1, im_fname2,
...]``, to `gluoncv.data.transforms.presets.rcnn.load_test` if you
want to load multiple images together.

This function returns two results. The first is an NDArray with shape
`(batch_size, RGB_channels, height, width)` that can be fed into the
model directly. The second contains the images in numpy format, which
makes them easy to plot. Since we only loaded a single image, the first dimension
of `x` is 1.
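The relationship between the two layouts can be sketched with plain NumPy. The helper `to_batch_tensor` is hypothetical, and the mean and standard deviation values are the commonly used ImageNet statistics, assumed here for illustration; this is not the preset's actual implementation.

```python
import numpy as np

# Assumed ImageNet channel statistics; illustrative values only.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_batch_tensor(img_hwc):
    """Convert one (H, W, 3) uint8 image to a (1, 3, H, W) float32 batch."""
    scaled = img_hwc.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    normalized = (scaled - MEAN) / STD           # broadcast over the channel axis
    chw = normalized.transpose(2, 0, 1)          # HWC -> CHW
    return chw[np.newaxis, ...]                  # add the batch dimension


dummy = np.zeros((600, 903, 3), dtype=np.uint8)  # stand-in for a resized image
batch = to_batch_tensor(dummy)
print(batch.shape)
```

The plotting-friendly array keeps the `(height, width, channels)` layout, while the model input moves channels first and adds a leading batch axis.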

Note that `orig_img` has been resized so that its short edge is 600px; it is not the image at its original resolution.



In [None]:
url = 'https://s3-us-west-1.amazonaws.com/s3.kllcfm.radio.com/styles/delta__775x515/s3/cutch.jpg?itok=peJK9kV-&c=bde29864317af570bf02f494b8a414c9'
filename = 'example.jpg'
utils.download(url, filename, overwrite=True)

In [None]:
img = image.imread(filename)
plt.imshow(img.asnumpy())

In [None]:
x, orig_img = data.transforms.presets.rcnn.load_test(filename)
x = x.as_in_context(context)
print(x.shape)

## Inference and display

The Mask RCNN model returns predicted class IDs, confidence scores,
bounding box coordinates and segmentation masks.
Their shapes are (batch_size, num_bboxes, 1), (batch_size, num_bboxes, 1),
(batch_size, num_bboxes, 4), and (batch_size, num_bboxes, mask_size, mask_size),
respectively. For the model used in this tutorial, mask_size is 14.
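As a rough illustration of how these outputs fit together, the sketch below filters one image's detections by score using synthetic arrays with the shapes described above. The helper `keep_confident` is hypothetical, and it assumes that unused slots are padded with a score of -1, as the filtering code later in this tutorial suggests.

```python
import numpy as np


def keep_confident(ids, scores, bboxes, thresh=0.5):
    """Drop padded rows (score of -1) and rows below `thresh` for one image."""
    flat_scores = scores.reshape(-1)
    keep = flat_scores >= thresh  # padded rows score -1, so they fail this test too
    return ids.reshape(-1)[keep].astype(int), flat_scores[keep], bboxes[keep]


# Synthetic stand-ins for one image: shapes (num_bboxes, 1) and (num_bboxes, 4).
ids = np.array([[0.0], [2.0], [-1.0]])
scores = np.array([[0.98], [0.30], [-1.0]])
bboxes = np.array([[10.0, 20.0, 110.0, 220.0],
                   [5.0, 5.0, 50.0, 60.0],
                   [-1.0, -1.0, -1.0, -1.0]])

kept_ids, kept_scores, kept_boxes = keep_confident(ids, scores, bboxes, thresh=0.68)
print(kept_ids, kept_scores, kept_boxes.shape)
```

Only the first row survives here: the second falls below the threshold and the third is padding.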

In [None]:
ids, scores, bboxes, masks = net(x)

In [None]:
ids    = ids[0].asnumpy()
scores = scores[0].asnumpy()
bboxes = bboxes[0].asnumpy()
masks  = masks[0].asnumpy()

In [None]:
# Flatten scores to a 1-D array
score_list = scores.reshape(-1)
# Drop placeholder entries (score == -1) and print the remaining scores
score_list = score_list[score_list != -1]
print(score_list)

In [None]:
# Top 10 scores
print(score_list[:10])

### Segmentation

`gluoncv.utils.viz.expand_mask` resizes each segmentation mask to the size
of its bounding box and places it at the box's location in the original image.
`gluoncv.utils.viz.plot_mask` overlays the segmentation masks
on an image.
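The idea behind expanding a mask can be sketched in a few lines of NumPy. This is a simplified illustration, not GluonCV's implementation: the helper `paste_mask` is hypothetical, it uses nearest-neighbour sampling for brevity, and it clips the box to the image before stretching the mask.

```python
import numpy as np


def paste_mask(mask, bbox, im_size, thresh=0.5):
    """Stretch a small square mask over `bbox` and paste it into a full-size canvas.

    mask: (m, m) array of probabilities; bbox: (x1, y1, x2, y2);
    im_size: (width, height). Returns a (height, width) uint8 array of 0s and 1s.
    """
    width, height = im_size
    x1, y1, x2, y2 = (int(round(float(v))) for v in bbox)
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, width), min(y2, height)
    canvas = np.zeros((height, width), dtype=np.uint8)
    if x2 <= x1 or y2 <= y1:
        return canvas  # empty or fully out-of-image box
    m = mask.shape[0]
    # Map every target pixel inside the box back to a source mask cell.
    rows = np.arange(y2 - y1) * m // (y2 - y1)
    cols = np.arange(x2 - x1) * m // (x2 - x1)
    canvas[y1:y2, x1:x2] = mask[np.ix_(rows, cols)] > thresh
    return canvas


small = np.zeros((14, 14))
small[:, :7] = 1.0  # left half of the mask is "object"
full = paste_mask(small, (10, 20, 38, 48), im_size=(100, 80))
print(full.shape, int(full.sum()))
```

A production version would typically interpolate smoothly before thresholding, which gives cleaner mask boundaries than nearest-neighbour sampling.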

In [None]:
width, height = orig_img.shape[1], orig_img.shape[0]
masks = utils.viz.expand_mask(masks, bboxes, (width, height), scores, thresh=0.68)
orig_img = utils.viz.plot_mask(orig_img, masks)

### Object Detection

We can use `gluoncv.utils.viz.plot_bbox` to visualize the
results. The results for the first image were already sliced out above, so we pass them directly to `plot_bbox`:
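If you want a text summary rather than a plot, the same inputs can be turned into readable labels. The helper `format_labels` and the three-entry `class_names` list below are hypothetical; in the notebook you would pass `net.classes` and the flattened `ids` and `scores` arrays instead.

```python
def format_labels(ids, scores, class_names, thresh=0.5):
    """Build 'class score' strings for detections scoring at least `thresh`."""
    labels = []
    for class_id, score in zip(ids, scores):
        # Skip padded slots (negative ID) and low-confidence detections.
        if class_id < 0 or score < thresh:
            continue
        labels.append('{} {:.3f}'.format(class_names[int(class_id)], score))
    return labels


class_names = ['person', 'bicycle', 'car']  # hypothetical subset of class names
labels = format_labels([0, 2, -1], [0.98, 0.30, -1], class_names, thresh=0.68)
print(labels)
```

Using the same threshold as the plot keeps the text summary consistent with the boxes that are drawn.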

In [None]:
# Identical to Faster RCNN object detection
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(1, 1, 1)
ax = utils.viz.plot_bbox(orig_img, bboxes, scores, ids, class_names=net.classes, ax=ax, thresh=0.68)
plt.show()