# Instance Segmentation with Mask R-CNN with TPU support

*by Georgios K. Ouzounis, June 22nd, 2021*

In this notebook we will experiment with **instance segmentation** in still images using the **Mask R-CNN** model, trained on Cloud TPU. 

This is a slightly altered version of the [original notebook](https://colab.research.google.com/github/tensorflow/tpu/blob/master/models/official/mask_rcnn/mask_rcnn_demo.ipynb#scrollTo=t_iHs_wm2Mhh) posted by Google Research.

For each test image, the output set of predictions includes bounding boxes, labels and instance masks that are overlaid on the input.


## Instructions
<h3><a href="https://cloud.google.com/tpu/"><img valign="middle" src="https://raw.githubusercontent.com/GoogleCloudPlatform/tensorflow-without-a-phd/master/tensorflow-rl-pong/images/tpu-hexagon.png" width="50"></a>  &nbsp;&nbsp;Use a free Cloud TPU</h3>
 
On the main menu, click Runtime and select **Change runtime type**. Set "TPU" as the hardware accelerator.

## Download the source code
Download the source code of the Mask R-CNN model from the **tensorflow/tpu/** GitHub repo.

In [None]:
!git clone https://github.com/tensorflow/tpu/

## Import libraries

In [None]:
import numpy as np
import cv2
%tensorflow_version 1.x
import tensorflow as tf
import sys
sys.path.insert(0, 'tpu/models/official')
sys.path.insert(0, 'tpu/models/official/mask_rcnn')
import coco_metric
from mask_rcnn.object_detection import visualization_utils

## Load the COCO index mapping
This Colab uses a pretrained checkpoint of the Mask R-CNN model trained on the COCO dataset. Below is the mapping from the class indices that the model predicts to their text labels.

In [None]:
ID_MAPPING = {
    1: 'person',
    2: 'bicycle',
    3: 'car',
    4: 'motorcycle',
    5: 'airplane',
    6: 'bus',
    7: 'train',
    8: 'truck',
    9: 'boat',
    10: 'traffic light',
    11: 'fire hydrant',
    13: 'stop sign',
    14: 'parking meter',
    15: 'bench',
    16: 'bird',
    17: 'cat',
    18: 'dog',
    19: 'horse',
    20: 'sheep',
    21: 'cow',
    22: 'elephant',
    23: 'bear',
    24: 'zebra',
    25: 'giraffe',
    27: 'backpack',
    28: 'umbrella',
    31: 'handbag',
    32: 'tie',
    33: 'suitcase',
    34: 'frisbee',
    35: 'skis',
    36: 'snowboard',
    37: 'sports ball',
    38: 'kite',
    39: 'baseball bat',
    40: 'baseball glove',
    41: 'skateboard',
    42: 'surfboard',
    43: 'tennis racket',
    44: 'bottle',
    46: 'wine glass',
    47: 'cup',
    48: 'fork',
    49: 'knife',
    50: 'spoon',
    51: 'bowl',
    52: 'banana',
    53: 'apple',
    54: 'sandwich',
    55: 'orange',
    56: 'broccoli',
    57: 'carrot',
    58: 'hot dog',
    59: 'pizza',
    60: 'donut',
    61: 'cake',
    62: 'chair',
    63: 'couch',
    64: 'potted plant',
    65: 'bed',
    67: 'dining table',
    70: 'toilet',
    72: 'tv',
    73: 'laptop',
    74: 'mouse',
    75: 'remote',
    76: 'keyboard',
    77: 'cell phone',
    78: 'microwave',
    79: 'oven',
    80: 'toaster',
    81: 'sink',
    82: 'refrigerator',
    84: 'book',
    85: 'clock',
    86: 'vase',
    87: 'scissors',
    88: 'teddy bear',
    89: 'hair drier',
    90: 'toothbrush',
}

# create a dictionary mapping class IDs to the COCO labels
category_index = {k: {'id': k, 'name': ID_MAPPING[k]} for k in ID_MAPPING}
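
To illustrate how this index is consulted later, the sketch below builds the same structure from a three-entry subset of the mapping and resolves class IDs to labels. The subset `sample_mapping` and the helper `label_for` are hypothetical names introduced for illustration. Note that the COCO IDs are not contiguous (ID 12, for instance, is absent from the mapping above), so the helper falls back to `'unknown'` for unmapped IDs.

```python
# Hypothetical three-entry subset of ID_MAPPING, for illustration only.
sample_mapping = {1: 'person', 3: 'car', 13: 'stop sign'}

# Same structure as category_index: {id: {'id': id, 'name': label}}
sample_index = {k: {'id': k, 'name': v} for k, v in sample_mapping.items()}


def label_for(class_id, index):
    """Return the label for class_id, or 'unknown' if the ID is not mapped."""
    return index.get(class_id, {}).get('name', 'unknown')


print(label_for(3, sample_index))
print(label_for(12, sample_index))
```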

## Get a sample image

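Use the **wget** command to download an image of your choice, or mount your Google Drive and copy one to the local runtime. Run only one of the two download cells below: both save to `test.jpg`, so the one run last overwrites the other.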

In [None]:
# sample image used in the original Colab Notebook
!wget https://upload.wikimedia.org/wikipedia/commons/thumb/0/08/Kitano_Street_Kobe01s5s4110.jpg/2560px-Kitano_Street_Kobe01s5s4110.jpg -O test.jpg
image_path = 'test.jpg'

In [None]:
# sample image from the author's github repo
!wget https://github.com/georgiosouzounis/instance-segmentation-mask-rcnn/raw/main/data/newyork.jpg -O test.jpg
image_path = 'test.jpg'

In [None]:
# read the image both as 3D numpy array (openCV) and as a serialized string 
# for model compatibility
image = cv2.imread(image_path)
# convert the BGR order to RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# get the image width and height
width, height = image.shape[1], image.shape[0]
 
# serialization
with open(image_path, 'rb') as f:
  np_image_string = np.array([f.read()])
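
The shape and channel conventions used in this cell can be checked on a tiny synthetic array. OpenCV returns images as `(height, width, channels)` arrays in BGR order, and for a 3-channel image reversing the last axis gives the same result as `cv2.COLOR_BGR2RGB`. The 2x3 test image below is made up for illustration and needs only NumPy.

```python
import numpy as np

# synthetic 2x3 image in BGR order: every pixel is pure blue
bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[..., 0] = 255

# reversing the channel axis swaps BGR to RGB
rgb = bgr[..., ::-1]

# rows come first, so shape[0] is the height and shape[1] is the width
height, width = bgr.shape[0], bgr.shape[1]

print(height, width)
print(rgb[0, 0])
```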

In [None]:
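# view the selected image
# cv2_imshow expects BGR input, so convert the RGB array back before display
from google.colab.patches import cv2_imshow
cv2_imshow(cv2.cvtColor(image, cv2.COLOR_RGB2BGR))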

## Create a TensorFlow session

Create a TensorFlow session to run the inference. You can connect to either a TPU or a regular CPU backend.

In [None]:
use_tpu = False #@param {type:"boolean"}
# if using the TPU runtime:
if use_tpu:
  import os
  import pprint

  # assert the TPU address
  assert 'COLAB_TPU_ADDR' in os.environ, 'ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!'
  TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  print('TPU address is', TPU_ADDRESS)

  # initialize a session
  session = tf.Session(TPU_ADDRESS, graph=tf.Graph())
  print('TPU devices:')
  pprint.pprint(session.list_devices())
# else if using the CPU runtime:
else:
  # initialize a session
  session = tf.Session(graph=tf.Graph())

## Load the pretrained model
Load the COCO-pretrained model from the public GCS bucket. The deprecation warnings can be ignored, as there is no immediate fix for using TensorFlow 2 together with Mask R-CNN.

In [None]:
# set the model directory here or on the line to the right
saved_model_dir = 'gs://cloud-tpu-checkpoints/mask-rcnn/1555659850' #@param {type:"string"}

# load the model
# the underscore _ is a conventional name for a throwaway variable in Python
_ = tf.saved_model.loader.load(session, ['serve'], saved_model_dir)

## Compute instance segmentation

Run the inference and process the predictions returned by the model.

In [None]:
# get the predictions by running the session created earlier
num_detections, detection_boxes, detection_classes, detection_scores, detection_masks, image_info = session.run(
    ['NumDetections:0', 'DetectionBoxes:0', 'DetectionClasses:0', 'DetectionScores:0', 'DetectionMasks:0', 'ImageInfo:0'],
    feed_dict={'Placeholder:0': np_image_string})

# remove axes of length 1 in each of the numpy arrays returned at the end of the session.
num_detections = np.squeeze(num_detections.astype(np.int32), axis=(0,))
detection_boxes = np.squeeze(detection_boxes * image_info[0, 2], axis=(0,))[0:num_detections]
detection_scores = np.squeeze(detection_scores, axis=(0,))[0:num_detections]
detection_classes = np.squeeze(detection_classes.astype(np.int32), axis=(0,))[0:num_detections]
instance_masks = np.squeeze(detection_masks, axis=(0,))[0:num_detections]

# extract the bounding box endpoints from the detection_boxes array
ymin, xmin, ymax, xmax = np.split(detection_boxes, 4, axis=-1)
# convert each bbox endpoint array to the desired format [x_start, y_start, width, height]
processed_boxes = np.concatenate([xmin, ymin, xmax - xmin, ymax - ymin], axis=-1)

# generates the segmentation result from an instance mask and its bbox for each detection
segmentations = coco_metric.generate_segmentation_from_masks(instance_masks, processed_boxes, height, width)
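
The array post-processing in this cell can be tried on synthetic data without running the model. The sketch below assumes, as the cell does, that each output carries a leading batch axis of length 1, that the arrays are padded to a fixed number of slots, and that boxes are ordered `[ymin, xmin, ymax, xmax]`. The array values and the padded size of 5 are made up for illustration, and the rescaling by `image_info[0, 2]` is left out.

```python
import numpy as np

# synthetic model outputs: batch of 1, padded to 5 slots, 2 valid detections
raw_num = np.array([2.0], dtype=np.float32)
raw_boxes = np.zeros((1, 5, 4), dtype=np.float32)
raw_boxes[0, 0] = [10, 20, 50, 80]   # ymin, xmin, ymax, xmax
raw_boxes[0, 1] = [0, 5, 30, 25]

# drop the batch axis and keep only the valid detections
n = int(np.squeeze(raw_num.astype(np.int32), axis=(0,)))
boxes = np.squeeze(raw_boxes, axis=(0,))[:n]

# convert each box to [x_start, y_start, width, height]
ymin, xmin, ymax, xmax = np.split(boxes, 4, axis=-1)
xywh = np.concatenate([xmin, ymin, xmax - xmin, ymax - ymin], axis=-1)

print(xywh)
```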

## Visualize the detection results


In [None]:
# set the max number of boxes to draw and the detection confidence threshold
max_boxes_to_draw = 50   #@param {type:"integer"}
min_score_thresh = 0.5    #@param {type:"slider", min:0, max:1, step:0.01}

# create the output image with bboxes, labels and segments imprinted
image_with_detections = visualization_utils.visualize_boxes_and_labels_on_image_array(
    image,
    detection_boxes,
    detection_classes,
    detection_scores,
    category_index,
    instance_masks=segmentations,
    use_normalized_coordinates=False,
    max_boxes_to_draw=max_boxes_to_draw,
    min_score_thresh=min_score_thresh)
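
The effect of the two parameters can be previewed on the scores alone. The sketch below is a simplified approximation, not the code in `visualization_utils`: it considers only the first `max_boxes` detections and keeps those whose score exceeds the threshold. The helper name `select_detections` and the sample scores are hypothetical.

```python
import numpy as np


def select_detections(scores, max_boxes, min_score):
    """Return the indices of the detections to draw (hypothetical helper)."""
    scores = np.asarray(scores)
    # only the first max_boxes entries are candidates
    candidates = np.arange(min(max_boxes, scores.shape[0]))
    # keep candidates whose score is above the threshold
    return candidates[scores[candidates] > min_score]


sample_scores = [0.97, 0.81, 0.55, 0.42, 0.10]   # made-up values
print(select_detections(sample_scores, max_boxes=50, min_score=0.5))
print(select_detections(sample_scores, max_boxes=2, min_score=0.5))
```

Raising `min_score_thresh` removes low-confidence detections from the output image, while lowering `max_boxes_to_draw` caps how many are drawn regardless of score.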


In [None]:
# display the resulting image (cv2_imshow expects BGR, so convert from RGB)
cv2_imshow(cv2.cvtColor(image_with_detections, cv2.COLOR_RGB2BGR))

## Training Mask R-CNN on Cloud TPU

To train Mask R-CNN on custom data on Cloud TPU, you may wish to consult [this tutorial](https://cloud.google.com/tpu/docs/tutorials/mask-rcnn) from Google Research. Note that the tutorial uses billable components of Google Cloud, including:
- Compute Engine
- Cloud TPU
- Cloud Storage
