# YOLO, so Save Water – Mask R-CNN edition

This notebook assesses how well Mask R-CNN detects taps in pictures, distinguishing between taps with and without running water.

This notebook is heavily based on the example notebook [`train_shapes.ipynb` from the Mask R-CNN GitHub repository](https://github.com/matterport/Mask_RCNN/blob/master/samples/shapes/train_shapes.ipynb). 

Before running the notebook, upload the dataset file `YOLO, so Save Water.zip` into `/content/` (the main directory of the file browser).

## Step 0. Setup

First, let's clone the Mask R-CNN repository to get its implementation and install its dependencies. Mask R-CNN requires quite old versions of its dependencies, so we must explicitly downgrade them :(

In [None]:
![ ! -d /content/Mask_RCNN ] && git clone https://github.com/matterport/Mask_RCNN.git
%cd /content/Mask_RCNN
%pip install -r requirements.txt
%pip install 'tensorflow < 2.0.0' 'keras == 2.1.5' 'scikit-image == 0.16.2'

Next, we import the libraries we need and download weights pretrained on MS COCO to speed up learning.

In [None]:
import os
import sys
import random
import math
import re
import time
import numpy as np
import cv2
import matplotlib
import matplotlib.pyplot as plt


mask_rcnn_dir = '/content/Mask_RCNN'
sys.path.append(mask_rcnn_dir)

from mrcnn.config import Config
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
from mrcnn.model import log

%matplotlib inline 

# Directory to save logs and trained model
MODEL_DIR = os.path.join(mask_rcnn_dir, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(mask_rcnn_dir, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

And define a helper function for later:

In [None]:
def get_ax(rows=1, cols=1, size=8):
    """Return a Matplotlib Axes array to be used in
    all visualizations in the notebook. Provide a
    central point to control graph sizes.
    
    Change the default size attribute to control the size
    of rendered images
    """
    _, ax = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
    return ax

## Step 1. Configuration & Dataset

We will now configure Mask R-CNN for our dataset and hardware. Compared to the tutorial, we reduce the number of images per GPU, change the number of classes and the image dimensions, and reduce the number of training ROIs per image so that we don't exceed Colab's limits.

In [None]:
class TapsConfig(Config):
    # Give the configuration a recognizable name
    NAME = "taps"

    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 2  # background + 2 kinds of taps

    # Use small images for faster training. Set the limits of the small side
    # and the large side; together they determine the image shape.
    IMAGE_MIN_DIM = 448
    IMAGE_MAX_DIM = 448

    TRAIN_ROIS_PER_IMAGE = 8
    
config = TapsConfig()
config.display()
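
Two of the values above interact with the rest of the pipeline. As far as we understand the matterport implementation, the batch size is derived as `GPU_COUNT * IMAGES_PER_GPU`, and `MaskRCNN` rejects image sizes that cannot be halved six times, i.e. sizes that are not multiples of 64 (with the default square resize mode, every image ends up padded to `IMAGE_MAX_DIM` × `IMAGE_MAX_DIM`). The sketch below re-derives both checks outside Mask R-CNN; `batch_size` and `is_valid_image_dim` are hypothetical helpers, not part of `mrcnn`.

```python
# A standalone sketch, assuming matterport's Config behaviour: BATCH_SIZE is
# IMAGES_PER_GPU * GPU_COUNT, and image sides must be divisible by 2**6 = 64.
def batch_size(gpu_count, images_per_gpu):
    """Images processed per training step."""
    return gpu_count * images_per_gpu


def is_valid_image_dim(dim, downsamplings=6):
    """True if `dim` survives `downsamplings` halvings without a remainder."""
    return dim % (2 ** downsamplings) == 0


print(batch_size(1, 1))         # 1 image per step on Colab's single GPU
print(is_valid_image_dim(448))  # True, since 448 = 7 * 64
print(is_valid_image_dim(400))  # False, 400 is not a multiple of 64
```

This is why 448 is a convenient choice: it is small enough for Colab yet still a multiple of 64.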

Now it's time to unpack and load the dataset. The archive is extracted into the current working directory, `/content/Mask_RCNN`. A custom implementation of `utils.Dataset` is provided below to account for the peculiarities of how the images and their corresponding masks are stored.

In [None]:
!unzip -qo '/content/YOLO, so Save Water.zip'

In [None]:
import json
from itertools import islice
from shutil import copy
from os import makedirs, scandir, path
from random import shuffle

import skimage.draw  # importing the submodule explicitly makes skimage.draw available


# Train/validation/test ratios; they should sum to at most 1
def split_dataset(dataset_dir, ratios=(0.8, 0.15, 0.05)):
    images = list(scandir(path.join(dataset_dir, "img")))
    shuffle(images)
    image_iterator = iter(images)
    directories = ("train", "val", "test")

    for directory, ratio in zip(directories, ratios):
        target = path.join(dataset_dir, "./split", directory)
        makedirs(target)
        makedirs(path.join(target, 'ann'))
        for image in islice(image_iterator, int(len(images) * ratio)):
            copy(path.join(dataset_dir, "img", image.name), target)
            copy(path.join(dataset_dir, "ann", f"{image.name}.json"), path.join(target, "ann"))


class TapDataset(utils.Dataset):
    name = "taps"

    @staticmethod
    def split_coords(points):
        xs: list[int] = []
        ys: list[int] = []

        for point in points:
            xs.append(point[0])
            ys.append(point[1])

        return xs, ys

    def get_class_id(self, class_name):
        for class_info in self.class_info:
            if class_info['name'] == class_name:
                return class_info['id']

    def load(self, dataset_dir, subset):
        meta = json.load(open(path.join(dataset_dir, "meta.json")))
        for index, img_class in enumerate(meta["classes"], start=1):
            self.add_class(TapDataset.name, index, img_class["title"])

        subset_dir = path.join(dataset_dir, "split", subset)
        for image in scandir(subset_dir):
            if image.is_dir():
                continue

            annotation = json.load(open(path.join(subset_dir, "ann", f"{image.name}.json")))
            self.add_image(
                TapDataset.name,
                image_id=image.name,
                path=image.path,
                width=annotation["size"]["width"],
                height=annotation["size"]["height"],
                points=annotation['objects'][0]['points'],
                class_name=annotation['objects'][0]['classTitle'],
            )

    def load_mask(self, image_idx):
        image_info = self.image_info[image_idx]
        # Delegate images from other sources to the parent class
        if image_info["source"] != TapDataset.name:
            return super().load_mask(image_idx)
        class_id = self.get_class_id(image_info['class_name'])

        height, width = image_info["height"], image_info["width"]
        mask = np.zeros([height, width, 1], dtype=np.uint8)
        # skimage.draw.polygon takes (rows, cols), i.e. (y, x); `shape` clips to the image
        ext_x_coords, ext_y_coords = TapDataset.split_coords(image_info['points']['exterior'])
        rr, cc = skimage.draw.polygon(ext_y_coords, ext_x_coords, shape=(height, width))
        mask[rr, cc, 0] = 1
        # Interior polygons are holes in the object, so clear them from the mask
        for polygon in image_info['points']['interior']:
            int_x_coords, int_y_coords = TapDataset.split_coords(polygon)
            rr, cc = skimage.draw.polygon(int_y_coords, int_x_coords, shape=(height, width))
            mask[rr, cc, 0] = 0

        return mask.astype(bool), np.array([class_id], dtype=np.int32)

    def image_reference(self, image_idx):
        info = self.image_info[image_idx]
        # Return the file path for tap images; defer to the parent class otherwise
        if info["source"] == TapDataset.name:
            return info["path"]
        return super().image_reference(image_idx)
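
In the Supervisely-style annotations used here, each object has an `exterior` ring and a list of `interior` rings, which we assume describe holes in the object. To make that rule concrete without `skimage`, here is a small NumPy-only sketch that fills the exterior and carves out each interior ring using the even-odd (ray-casting) test at integer pixel coordinates. `polygon_mask` and `supervisely_mask` are hypothetical helpers; pixels lying exactly on an edge may be classified differently than by `skimage.draw.polygon`.

```python
import numpy as np


def polygon_mask(points, height, width):
    """Boolean (height, width) mask of one polygon given as [[x, y], ...],
    using the even-odd ray-casting rule at integer pixel coordinates."""
    ys, xs = np.mgrid[0:height, 0:width]
    inside = np.zeros((height, width), dtype=bool)
    pts = np.asarray(points, dtype=float)
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        if y1 == y2:
            continue  # horizontal edges never cross a horizontal ray
        # Pixels whose row lies between the edge's endpoints
        crosses = (y1 > ys) != (y2 > ys)
        # x-coordinate where the edge meets each pixel's row
        x_cross = x1 + (ys - y1) * (x2 - x1) / (y2 - y1)
        # Toggle pixels that lie to the left of the crossing
        inside ^= crosses & (xs < x_cross)
    return inside


def supervisely_mask(exterior, interiors, height, width):
    """Fill the exterior ring, then remove every interior ring (hole)."""
    mask = polygon_mask(exterior, height, width)
    for hole in interiors:
        mask &= ~polygon_mask(hole, height, width)
    return mask


square = [[1, 1], [8, 1], [8, 8], [1, 8]]
hole = [[3, 3], [6, 3], [6, 6], [3, 6]]
m = supervisely_mask(square, [hole], 10, 10)
print(m[2, 2], m[4, 4], m[0, 0])  # True False False
```

Note the `[row, col]` (i.e. `[y, x]`) indexing of the result, which is the same order Mask R-CNN expects for its `(height, width, instances)` masks.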

In [None]:
!rm -rf '/content/Mask_RCNN/YOLO, so Save Water/split'
split_dataset('/content/Mask_RCNN/YOLO, so Save Water')

# Training dataset
dataset_train = TapDataset()
dataset_train.load('/content/Mask_RCNN/YOLO, so Save Water', 'train')
dataset_train.prepare()

# Validation dataset
dataset_val = TapDataset()
dataset_val.load('/content/Mask_RCNN/YOLO, so Save Water', 'val')
dataset_val.prepare()
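
`split_dataset` sizes each subset as `int(len(images) * ratio)` and draws from one shuffled iterator, so truncation can leave a few images in no subset at all, and a ratio that is too large for the last subset simply yields whatever remains. The hypothetical `split_counts` helper below mirrors that arithmetic, assuming the intended ratios are 0.8/0.15/0.05.

```python
def split_counts(n_images, ratios=(0.8, 0.15, 0.05)):
    """Subset sizes produced by taking int(n_images * ratio) items per subset
    from a single iterator, plus the number of images left unused."""
    counts = []
    remaining = n_images
    for ratio in ratios:
        take = min(int(n_images * ratio), remaining)  # islice stops early if exhausted
        counts.append(take)
        remaining -= take
    return counts, remaining


print(split_counts(100))  # ([80, 15, 5], 0)
print(split_counts(37))   # ([29, 5, 1], 2): two images end up in no subset
```

If dropping a handful of images matters, giving the last subset "everything that is left" instead of a ratio avoids the loss.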

In [None]:
# Load and display random samples
image_ids = np.random.choice(dataset_train.image_ids, 4)
for image_id in image_ids:
    image = dataset_train.load_image(image_id)
    mask, class_ids = dataset_train.load_mask(image_id)
    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)

## Step 2. Model Preloading & Training

In [None]:
# Create model in training mode
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)

In [None]:
# Which weights to start with?
# Use "coco" on the first run; "last" only works once a checkpoint exists in MODEL_DIR
init_with = "coco"  # imagenet, coco, or last

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", 
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last(), by_name=True)
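
`model.find_last()` fails when `MODEL_DIR` holds no checkpoint yet, so `init_with = "last"` only works after a previous training run. The hypothetical helper below picks `"last"` when a checkpoint is present and falls back to `"coco"` otherwise. It assumes checkpoints are laid out the way we understand matterport does it: one `<config name><timestamp>` directory per run inside `MODEL_DIR`, holding `mask_rcnn_*.h5` files, with only the most recent run directory inspected.

```python
import os


def pick_init_with(model_dir, config_name="taps"):
    """Hypothetical helper: "last" if the newest run directory under model_dir
    contains a mask_rcnn_*.h5 checkpoint, otherwise "coco"."""
    if not os.path.isdir(model_dir):
        return "coco"
    key = config_name.lower()
    # Run directories are assumed to be named <config name><timestamp>, so
    # sorting them lexicographically puts the newest run last
    run_dirs = sorted(
        d for d in os.listdir(model_dir)
        if d.startswith(key) and os.path.isdir(os.path.join(model_dir, d))
    )
    if not run_dirs:
        return "coco"
    latest_run = os.path.join(model_dir, run_dirs[-1])
    checkpoints = [f for f in os.listdir(latest_run)
                   if f.startswith("mask_rcnn") and f.endswith(".h5")]
    return "last" if checkpoints else "coco"
```

In the notebook it would be used as `init_with = pick_init_with(MODEL_DIR, config.NAME)`.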

## Training

Train in two stages:
1. Only the heads. Here we freeze all the backbone layers and train only the randomly initialized layers (i.e. the ones for which we did not load pre-trained MS COCO weights). To train only the head layers, pass `layers='heads'` to the `train()` function.

2. Fine-tune all layers. For this simple example it's not strictly necessary, but we include it to show the process. Simply pass `layers="all"` to train all layers.
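
As we read `mrcnn/model.py`, the `layers` argument is looked up in a small table of regular expressions (anything not in the table is used as a regex directly), and a layer is trainable when its whole name matches. The sketch below copies only the `"heads"` and `"all"` patterns as we understand them; `trainable_layers` and the example layer names are illustrative, so check `mrcnn/model.py` before relying on the exact patterns.

```python
import re

# Assumed copies of the "heads" and "all" entries of matterport's layer table
LAYER_REGEX = {
    "heads": r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    "all": r".*",
}


def trainable_layers(layer_names, layers="heads"):
    """Names that would be unfrozen for a given `layers` argument."""
    pattern = LAYER_REGEX.get(layers, layers)  # unknown keys are treated as a raw regex
    return [name for name in layer_names if re.fullmatch(pattern, name)]


names = ["conv1", "res5a_branch2a", "fpn_c5p5", "rpn_conv_shared", "mrcnn_mask"]
print(trainable_layers(names))             # ['fpn_c5p5', 'rpn_conv_shared', 'mrcnn_mask']
print(trainable_layers(names, "all"))      # all five names
print(trainable_layers(names, r"res5.*"))  # ['res5a_branch2a']
```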

In [None]:
# Train the head branches
# Passing layers="heads" freezes all layers except the head
# layers. You can also pass a regular expression to select
# which layers to train by name pattern.
model.train(dataset_train, dataset_val, 
            learning_rate=config.LEARNING_RATE, 
            epochs=1, 
            layers='heads')

In [None]:
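# Fine tune all layers
# Passing layers="all" trains all layers. You can also 
# pass a regular expression to select which layers to
# train by name pattern.
# Note: `epochs` is the total epoch count to reach, so epochs=2
# after the heads stage trains for one more epoch.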
model.train(dataset_train, dataset_val, 
            learning_rate=config.LEARNING_RATE / 10,
            epochs=2, 
            layers="all")
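
As far as we can tell from the matterport code, the `epochs` argument of `model.train()` is the epoch number to stop at rather than a count of additional epochs, and the starting epoch comes from the model's state, including the epoch encoded in a checkpoint loaded with `init_with = "last"`. The hypothetical `epochs_per_stage` helper below shows what that means for the two-stage schedule used here.

```python
def epochs_per_stage(schedule, start_epoch=0):
    """Epochs actually run by each model.train() call.

    schedule: (layers, epochs) pairs in call order, where `epochs` is the
    cumulative epoch to stop at; start_epoch is the epoch already reached.
    """
    done = start_epoch
    result = []
    for layers, target in schedule:
        result.append((layers, max(target - done, 0)))
        done = max(done, target)
    return result


schedule = [("heads", 1), ("all", 2)]
print(epochs_per_stage(schedule))                 # [('heads', 1), ('all', 1)]
print(epochs_per_stage(schedule, start_epoch=2))  # [('heads', 0), ('all', 0)]
```

The second line suggests that resuming from an epoch-2 checkpoint without raising `epochs` would train nothing at all.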

## Step 3. Detection

We will recreate the model in inference mode and preload it with the last saved weights.

In [None]:
inference_config = TapsConfig()

# Recreate the model in inference mode
model = modellib.MaskRCNN(mode="inference", 
                          config=inference_config,
                          model_dir=MODEL_DIR)

# Get path to saved weights
# Either set a specific path or find last trained weights
# model_path = os.path.join(mask_rcnn_dir, ".h5 file name here")
model_path = model.find_last()

# Load trained weights
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)

Below is a ground-truth example from the validation set, annotated manually. This is what we want the model to produce.

In [None]:
# Test on a random image
image_id = random.choice(dataset_val.image_ids)
original_image, image_meta, gt_class_id, gt_bbox, gt_mask =\
    modellib.load_image_gt(dataset_val, inference_config, 
                           image_id, use_mini_mask=False)

log("original_image", original_image)
log("image_meta", image_meta)
log("gt_class_id", gt_class_id)
log("gt_bbox", gt_bbox)
log("gt_mask", gt_mask)

visualize.display_instances(original_image, gt_bbox, gt_mask, gt_class_id, 
                            dataset_val.class_names, figsize=(8, 8))
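
The `gt_bbox` logged above is not stored in the annotations: as far as we know, `modellib.load_image_gt` derives it from the (resized) masks with `utils.extract_bboxes`. The NumPy sketch below re-implements that idea for illustration; boxes are `[y1, x1, y2, x2]` with `y2` and `x2` exclusive, which is our understanding of the matterport convention.

```python
import numpy as np


def extract_bboxes(masks):
    """Bounding boxes [y1, x1, y2, x2] (y2, x2 exclusive) around each mask.

    masks: bool array of shape (height, width, num_instances).
    """
    boxes = np.zeros((masks.shape[-1], 4), dtype=np.int32)
    for i in range(masks.shape[-1]):
        rows = np.where(np.any(masks[:, :, i], axis=1))[0]  # rows containing the object
        cols = np.where(np.any(masks[:, :, i], axis=0))[0]  # columns containing the object
        if rows.size:  # an empty mask keeps an all-zero box
            boxes[i] = [rows[0], cols[0], rows[-1] + 1, cols[-1] + 1]
    return boxes


m = np.zeros((10, 10, 1), dtype=bool)
m[2:5, 3:7, 0] = True
print(extract_bboxes(m))  # [[2 3 5 7]]
```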

And here is what the model actually produces for that image:

In [None]:
results = model.detect([original_image], verbose=1)

r = results[0]
visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'], 
                            dataset_val.class_names, r['scores'], ax=get_ax())

The result leaves a lot to be desired, but the model nonetheless seems to perform reasonably well, at least when generating masks. It also seems to do better on images of taps with running water.

In [None]:
# Compute VOC-style mAP @ IoU=0.5
# Running on 10 images (sampled without replacement). Increase for better accuracy.
image_ids = np.random.choice(dataset_val.image_ids,
                             min(10, len(dataset_val.image_ids)), replace=False)
APs = []
for image_id in image_ids:
    # Load image and ground truth data
    image, image_meta, gt_class_id, gt_bbox, gt_mask =\
        modellib.load_image_gt(dataset_val, inference_config,
                               image_id, use_mini_mask=False)
    # Run object detection
    results = model.detect([image], verbose=0)
    r = results[0]
    # Compute AP
    AP, precisions, recalls, overlaps =\
        utils.compute_ap(gt_bbox, gt_class_id, gt_mask,
                         r["rois"], r["class_ids"], r["scores"], r['masks'])
    APs.append(AP)
    
print("mAP: ", np.mean(APs))
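
`utils.compute_ap` matches predictions to ground-truth instances (here by mask overlap, with an IoU threshold of 0.5) and then computes the area under the precision/recall curve. The two building blocks are sketched below in plain NumPy: `mask_iou` and an all-point interpolated `average_precision`, which is how we understand `compute_ap` to integrate the curve. Both are illustrative helpers, not the `mrcnn` functions themselves.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two boolean masks of the same shape."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def average_precision(precisions, recalls):
    """All-point interpolated AP from precision/recall values of predictions
    sorted by descending confidence score."""
    p = np.concatenate([[0.0], precisions, [0.0]])
    r = np.concatenate([[0.0], recalls, [1.0]])
    # Make precision non-increasing when read from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum precision times the recall gained at every step where recall grows
    idx = np.where(r[1:] != r[:-1])[0] + 1
    return float(np.sum((r[idx] - r[idx - 1]) * p[idx]))


a = np.zeros((4, 4), dtype=bool); a[:2, :2] = True
b = np.zeros((4, 4), dtype=bool); b[:2, :] = True
print(mask_iou(a, b))  # 0.5
print(average_precision([1.0, 0.5, 2 / 3], [0.5, 0.5, 1.0]))  # about 0.8333 (5/6)
```

With two ground-truth taps and three ranked predictions (hit, miss, hit), precision is 1, 1/2, 2/3 at recall 1/2, 1/2, 1, which gives AP = 1/2 · 1 + 1/2 · 2/3 = 5/6. The notebook's mAP is then the mean of such per-image APs.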