# Ungraded Lab: Mask R-CNN Image Segmentation Demo

In this lab, you will see how to use a [Mask R-CNN](https://arxiv.org/abs/1703.06870) model from TensorFlow Hub for object detection and instance segmentation. This means that, aside from the bounding boxes, the model can also predict a segmentation mask for each instance of a class in the image. You have already encountered most of the commands here when you worked with the Object Detection API, and you will now see how to use that API with instance segmentation models. Let's begin!

In [None]:
!git clone --depth 1 https://github.com/tensorflow/models

In [None]:
%%bash
sudo apt install -y protobuf-compiler
cd models/research
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .

In [None]:
import tensorflow as tf
import tensorflow_hub as hub

import matplotlib
import matplotlib.pyplot as plt

import numpy as np
from six import BytesIO
from PIL import Image
from six.moves.urllib.request import urlopen

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.utils import ops as utils_ops

tf.get_logger().setLevel('ERROR')

%matplotlib inline

In [None]:
def load_image_into_numpy_array(path):
    """Load an image from a URL or a file path into a uint8 array of shape (1, height, width, 3)."""
    image = None
    if(path.startswith('http')):
        response = urlopen(path)
        image_data = response.read()
        image_data = BytesIO(image_data)
        image = Image.open(image_data)
    else:
        image_data = tf.io.gfile.GFile(path, 'rb').read()
        image = Image.open(BytesIO(image_data))
        
    (im_width, im_height) = (image.size)
    return np.array(image.getdata()).reshape(
        (1, im_height, im_width, 3)
    ).astype(np.uint8)

# dictionary with image tags as keys, and image paths as values
TEST_IMAGES = {
  'Beach' : 'models/research/object_detection/test_images/image2.jpg',
  'Dogs' : 'models/research/object_detection/test_images/image1.jpg',
  # By Américo Toledano, Source: https://commons.wikimedia.org/wiki/File:Biblioteca_Maim%C3%B3nides,_Campus_Universitario_de_Rabanales_007.jpg
  'Phones' : 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Biblioteca_Maim%C3%B3nides%2C_Campus_Universitario_de_Rabanales_007.jpg/1024px-Biblioteca_Maim%C3%B3nides%2C_Campus_Universitario_de_Rabanales_007.jpg',
  # By 663highland, Source: https://commons.wikimedia.org/wiki/File:Kitano_Street_Kobe01s5s4110.jpg
  'Street' : 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/08/Kitano_Street_Kobe01s5s4110.jpg/2560px-Kitano_Street_Kobe01s5s4110.jpg'
}

## Load the Model

TensorFlow Hub provides a Mask R-CNN model that is built with the Object Detection API. You can read about the details [here](https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1). Let's first load the model; you will see how to use it for inference in the next section.

In [None]:
model_display_name = 'Mask R-CNN Inception ResNet V2 1024x1024'
model_handle = 'https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1'

print('Selected model: ' + model_display_name)
print('Model Handle at TensorFlow Hub: {}'.format(model_handle))

In [None]:
# This will take 10 to 15 minutes to finish
print('loading model...')
hub_model = hub.load(model_handle)
print('model loaded!')

In [None]:
# Choose one and use as key for TEST_IMAGES below: 
# ['Beach', 'Street', 'Dogs','Phones']

image_path = TEST_IMAGES['Street']

image_np = load_image_into_numpy_array(image_path)

plt.figure(figsize=(24, 32))
plt.imshow(image_np[0])
plt.show()

You can run inference by simply passing the numpy array of a single image to the model. Take note that this model does not support batching. As you have seen in the notebooks in week 2, this will output a dictionary containing the results. These are described in the Outputs section of the documentation.
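
The input and output layout described above can be sketched without loading the model. The snippet below is only an illustration: `check_single_image_batch` is a hypothetical helper, and `fake_result` is a hand-made stand-in whose array sizes are assumptions chosen for the example, not values produced by the model. It shows that the input carries a leading batch axis of size 1 and that indexing with `[0]` removes that axis from each output.

```python
import numpy as np

# Stand-in for a loaded image: a batch of one, 4 pixels high, 6 wide, RGB, uint8.
image_np = np.zeros((1, 4, 6, 3), dtype=np.uint8)

def check_single_image_batch(batch):
    """Return True if `batch` looks like one uint8 RGB image with a batch axis."""
    return (
        batch.ndim == 4
        and batch.shape[0] == 1      # exactly one image, since batching is unsupported
        and batch.shape[3] == 3      # three color channels
        and batch.dtype == np.uint8
    )

# Hand-made arrays mimicking batched outputs (sizes are illustrative assumptions).
fake_result = {
    'detection_boxes': np.zeros((1, 100, 4), dtype=np.float32),
    'detection_scores': np.zeros((1, 100), dtype=np.float32),
}

# Indexing with [0] drops the batch axis from every entry.
unbatched = {key: value[0] for key, value in fake_result.items()}

print(check_single_image_batch(image_np))
print(unbatched['detection_boxes'].shape)
```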

In [None]:
# Run inference
results = hub_model(image_np)

# Output values are tensors; only their NumPy values
# are needed when visualizing the results
result = {key: value.numpy() for key, value in results.items()}

for key in results.keys():
    print(key)

## Visualizing the Results
You can now plot the results on the original image. First, you need to create the category_index dictionary that will contain the class IDs and names. The model was trained on the COCO2017 dataset and the API package has the labels saved in a different format. You can use the create_category_index_from_labelmap internal utility function to convert this to the required dictionary format.
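
To make the target format concrete, here is a minimal sketch of the kind of dictionary the visualization code expects: class IDs as keys, each mapping to a small dictionary with the ID and the display name. The helper `build_category_index` and the hand-written `label_entries` list are hypothetical and exist only for illustration; the lab itself gets the dictionary from the label map file.

```python
# Hand-written stand-in for a few COCO label map entries (class ID, display name).
label_entries = [(1, 'person'), (2, 'bicycle'), (3, 'car')]

def build_category_index(entries):
    """Map each class ID to a dictionary holding that ID and its name."""
    return {class_id: {'id': class_id, 'name': name} for class_id, name in entries}

category_index_sketch = build_category_index(label_entries)

print(category_index_sketch[1])  # {'id': 1, 'name': 'person'}
```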

In [None]:
PATH_TO_LABELS = './models/research/object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(
    PATH_TO_LABELS,
    use_display_name=True
)

# sample output
print(category_index[1])
print(category_index[2])
print(category_index[4])

Next, you will preprocess the masks and then plot the results:
- The result dictionary contains a detection_masks key with a segmentation mask for each box. Each mask is defined relative to its box, so it is first converted to a mask that overlays the full image.
- You will also keep only the mask pixels whose values are above a certain threshold. We picked a value of 0.6, but feel free to modify this and see what results you get. If you pick something lower, you will most likely notice mask pixels that are outside the object.
- As you have seen before, you can use visualize_boxes_and_labels_on_image_array() to plot the results on the image. The difference this time is the instance_masks parameter: you will pass in the reframed detection masks to see the segmentation masks on the image.
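
The reframing and thresholding steps can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not the Object Detection API's implementation: `paste_box_mask` is a hypothetical helper that uses nearest-neighbor lookup to stretch a box-relative mask over its box, and the tiny 2x2 mask and 8x8 image size are made up for the example.

```python
import numpy as np

def paste_box_mask(box_mask, box, image_height, image_width):
    """Place a box-relative mask onto a full-size canvas (nearest-neighbor)."""
    ymin, xmin, ymax, xmax = box  # normalized coordinates in [0, 1]
    top = int(round(ymin * image_height))
    left = int(round(xmin * image_width))
    bottom = int(round(ymax * image_height))
    right = int(round(xmax * image_width))

    canvas = np.zeros((image_height, image_width), dtype=np.float32)
    box_h, box_w = bottom - top, right - left
    if box_h <= 0 or box_w <= 0:
        return canvas  # degenerate box: nothing to paste

    mask_h, mask_w = box_mask.shape
    # Map every pixel inside the box back to the mask cell that covers it.
    rows = np.minimum((np.arange(box_h) * mask_h) // box_h, mask_h - 1)
    cols = np.minimum((np.arange(box_w) * mask_w) // box_w, mask_w - 1)
    canvas[top:bottom, left:right] = box_mask[np.ix_(rows, cols)]
    return canvas

# Made-up 2x2 mask of per-pixel scores for a box covering the image center.
box_mask = np.array([[0.9, 0.2], [0.7, 0.65]], dtype=np.float32)
box = (0.25, 0.25, 0.75, 0.75)

full_mask = paste_box_mask(box_mask, box, image_height=8, image_width=8)

# Keep only the pixels whose score is above the 0.6 threshold.
binary_mask = (full_mask > 0.6).astype(np.uint8)
print(binary_mask)
```

Lowering the threshold in this sketch turns on the low-scoring quadrant as well, which mirrors the effect described above of mask pixels spilling outside the object.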



In [None]:
# Handle models with masks
label_id_offset = 0
image_np_with_mask = image_np.copy()

if 'detection_masks' in result:
    # Convert the NumPy arrays to tensors
    detection_masks = tf.convert_to_tensor(result['detection_masks'][0])
    detection_boxes = tf.convert_to_tensor(result['detection_boxes'][0])
    
    # Reframe the bounding box mask to the image size
    detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
        detection_masks,
        detection_boxes,
        image_np.shape[1], image_np.shape[2]
    )
    
    # Keep only the mask pixels whose values are above the specified threshold
    detection_masks_reframed = tf.cast(detection_masks_reframed > 0.6, tf.uint8)
    
    # Get the numpy array
    result['detection_masks_reframed'] = detection_masks_reframed.numpy()
    
# Overlay labeled boxes and segmentation masks on the image
viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np_with_mask[0],
    result['detection_boxes'][0],
    (result['detection_classes'][0] + label_id_offset).astype(int),
    result['detection_scores'][0],
    category_index,
    use_normalized_coordinates=True,
    max_boxes_to_draw=100,
    min_score_thresh=0.7,
    agnostic_mode=False,
    instance_masks=result.get('detection_masks_reframed', None),
    line_thickness=8
)

plt.figure(figsize=(24, 32))
plt.imshow(image_np_with_mask[0])
plt.show()