This tutorial walks you through training Mask R-CNN to detect a new kind of object, using scooters as the example.




---
## Gather and annotate training data

The first step in training a new model is to create a training data set. For Mask R-CNN, we need images of scooters in which each scooter has been traced out from the background, for example:

![scooter1.jpg](https://drive.google.com/uc?export=view&id=1ezxI6viuhMpTZS0_iPtXBdm5g2w29CPw)  
This means creating training data for an image segmentation model is a lot more time-consuming than creating training data for an image classification model. For image segmentation, we need to literally trace out the outline of each scooter. 

The easiest way to trace out each of our images is to use an annotation program specifically built for this purpose. There are many to choose from, e.g. makesense.io.  

*	The VGG dataset provides an open source annotation tool that works on any operating system. 
*	The COCO dataset also provides an annotation tool. 
*	In addition, there are paid services like Labelbox that offer cloud-based annotation tools designed to help coordinate between multiple people annotating the same dataset. 

Whichever tool you use, you will want to spend a little time learning all the hotkeys for functions like drawing a new line, saving your drawings, and moving to the next image. Saving a few seconds on each image can really add up when you are tracing a lot of images. 

To collect images of scooters, you can walk around your neighbourhood and take snapshots with your cell phone. To make sure the final model works as well as possible, it’s important to photograph scooters in front of many different kinds of backgrounds. Since the model is learning how to tell the object apart from the background, the more kinds of backgrounds you train it on, the better it can learn. 

Tracing images is boring, but it’s important that you do as good a job as you can. If you trace the objects poorly, the final model won’t be very accurate. 

But we don’t necessarily need thousands of training images. To reduce the amount of training data we need, we can use the same transfer learning ideas we’ve used before! We can start with the pre-trained COCO model that we used in the last project and tweak it to detect scooters with only a small number of training images. Remember that Mask R-CNN is built using a complex configuration of CNNs, so everything we’ve learned about CNNs still applies. 

I’ve provided all the annotations I created so you don’t have to create any of your own to try out this project. But if you want to re-use this code to create your own model, you will have to annotate images yourself. 



---
### Mount Google Drive
Mount your Google Drive to access the images and model files.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

dataset_location = "/content/drive/My Drive/Crafting/ADLCV/dataset/"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
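
Because every later path is built from `dataset_location`, a quick check that the expected folders are present can save a confusing error further down. The following is a minimal sketch, assuming the folder layout used later in this notebook (`MaskRCNN/scooter_training/` with `training_set` and `validation_set` subfolders, each holding an `annotations` folder). The helper name `missing_dataset_dirs` is hypothetical.

```python
from pathlib import Path


def missing_dataset_dirs(dataset_root):
    """Return the expected dataset folders that do not exist under dataset_root."""
    base = Path(dataset_root) / "MaskRCNN" / "scooter_training"
    expected = [
        base / subset / "annotations"
        for subset in ("training_set", "validation_set")
    ]
    # Keep only the folders that are missing, as plain strings for easy printing.
    return [str(path) for path in expected if not path.is_dir()]
```

Calling `missing_dataset_dirs(dataset_location)` should return an empty list when the data set is in place.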


## Install Matterport Mask-RCNN in Google Colab

In [2]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [3]:
%cd /content
!git clone https://github.com/matterport/Mask_RCNN
%cd Mask_RCNN
!pip3 install -r requirements.txt
!python3 setup.py install

/content
fatal: destination path 'Mask_RCNN' already exists and is not an empty directory.
/content/Mask_RCNN
running install
running bdist_egg
running egg_info
writing mask_rcnn.egg-info/PKG-INFO
writing dependency_links to mask_rcnn.egg-info/dependency_links.txt
writing top-level names to mask_rcnn.egg-info/top_level.txt
reading manifest template 'MANIFEST.in'
writing manifest file 'mask_rcnn.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/mrcnn
copying build/lib/mrcnn/visualize.py -> build/bdist.linux-x86_64/egg/mrcnn
copying build/lib/mrcnn/parallel_model.py -> build/bdist.linux-x86_64/egg/mrcnn
copying build/lib/mrcnn/model.py -> build/bdist.linux-x86_64/egg/mrcnn
copying build/lib/mrcnn/utils.py -> build/bdist.linux-x86_64/egg/mrcnn
copying build/lib/mrcnn/__init__.py -> build/bdist.linux-x86_64/egg/mrcnn
copying build/lib/mrcnn/config.py 

In [4]:
!pip3 install xmltodict
!pip3 install keras==2.1.0



In [5]:
import os

if os.getcwd() != "/content/Mask_RCNN":
  os.chdir("/content/Mask_RCNN")

## Import libraries and define directories

In [6]:
import os
import sys
import random
import math
import warnings
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.path.abspath("")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
from mrcnn.config import Config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco

%matplotlib inline 

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Directory of images to run detection on
#IMAGE_DIR = os.path.join(ROOT_DIR, "images")

# Directory of the training images
TRAINING_DATASET_PATH = os.path.join(dataset_location, "MaskRCNN/scooter_training")

# Start training from the pre-trained COCO model. 
WEIGHTS_TO_START_FROM = COCO_MODEL_PATH
# You can change this path if you want to pick up training from a prior
# checkpoint file in your ./logs folder.
# WEIGHTS_TO_START_FROM = os.path.join(MODEL_DIR, "<h5_file_in_log>.h5")

Using TensorFlow backend.


## Configurations

Instead of using the configuration of the COCO dataset, we need to define the configuration for our project manually. 

In [7]:
# Despite its name, this configuration is the one passed to the model in training mode below.
class InferenceConfig(Config):
    NAME = "custom_object"
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 1 + 1  # Background + your custom object
    STEPS_PER_EPOCH = 100

    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9

config = InferenceConfig()
config.display()


Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.9
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                14
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE         

In [8]:
class_names = ['BG', 'scooter']
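
To see how these settings translate into training effort, here is a sketch of the arithmetic. It assumes, as the printed configuration suggests, that `BATCH_SIZE` is `IMAGES_PER_GPU * GPU_COUNT` and that each step processes one batch. The `epochs` value of 30 is the one passed to `model.train` later in this notebook, and the function name `training_budget` is hypothetical.

```python
def training_budget(images_per_gpu, gpu_count, steps_per_epoch, epochs):
    """Return (batch_size, total_steps, images_processed) for a training run."""
    batch_size = images_per_gpu * gpu_count
    total_steps = steps_per_epoch * epochs
    images_processed = total_steps * batch_size
    return batch_size, total_steps, images_processed


batch_size, total_steps, images_processed = training_budget(
    images_per_gpu=1, gpu_count=1, steps_per_epoch=100, epochs=30
)
print(batch_size, total_steps, images_processed)  # 1 3000 3000
```

With only a small number of training images, each image is therefore seen many times over the run.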

---
## Parsing the annotation files

Now we have to deal with one of the annoying issues with building this kind of model: every annotation program has its own preferred format for storing annotation data. The annotation tool that I used, RectLabel, creates data in the same format as the ***Pascal VOC dataset***, but the COCO dataset uses a different data format. There’s no common standard that every program or model uses. 

The Matterport Mask R-CNN implementation handles this by making you write a little bit of custom code to parse your annotation files. You need to subclass a Dataset class and re-implement the methods that read the annotation files, parse the contents, and generate masks from the polygons. 
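
To make the parsing step concrete, the sketch below shows the kind of nested dictionary an XML-to-dict parser such as `xmltodict` produces, and one way to normalise it. Such a parser returns a single dict when an element appears once and a list when it repeats, so the result needs normalising before you loop over it. The sample annotations are hypothetical and hand-written for illustration (real RectLabel files may contain more fields), and the helper `as_object_list` is not part of any library.

```python
def as_object_list(annotation):
    """Return the annotated objects as a list, whatever shape the parser produced."""
    # A missing key is treated here as an image with no annotated objects.
    objects = annotation.get("object", [])
    if not isinstance(objects, list):
        objects = [objects]
    return objects


# Hypothetical parsed annotation with a single traced scooter.
one_scooter = {
    "filename": "example_1.jpg",
    "size": {"width": "640", "height": "480"},
    "object": {
        "name": "scooter",
        "polygon": {"x1": "10", "y1": "20", "x2": "30", "y2": "40", "x3": "50", "y3": "20"},
    },
}

# Hypothetical parsed annotation with two traced scooters.
two_scooters = {
    "filename": "example_2.jpg",
    "size": {"width": "640", "height": "480"},
    "object": [
        {"name": "scooter", "polygon": {"x1": "1", "y1": "2", "x2": "3", "y2": "4", "x3": "5", "y3": "2"}},
        {"name": "scooter", "polygon": {"x1": "7", "y1": "8", "x2": "9", "y2": "10", "x3": "11", "y3": "8"}},
    ],
}

print(len(as_object_list(one_scooter)), len(as_object_list(two_scooters)))  # 1 2
```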

Here’s the code to read RectLabel annotation files. First, we’ll define our custom dataset class and configure the folders we’ll pull our training data from. 


In [9]:
from pathlib import Path
import skimage.draw  # load_mask below calls skimage.draw.polygon
import xmltodict

class RectLabelDataset(utils.Dataset):

    def load_training_images(self, dataset_dir, subset):
        dataset_dir = os.path.join(dataset_dir,subset)
        annotation_dir = os.path.join(dataset_dir, "annotations")

        # Add classes. We have only one class to add since this model only detects one kind of object.
        self.add_class("custom_object", 1, "custom_object")
        # Load each image by finding all the RectLabel annotation files and working backwards to the image.
        # This is a lot faster than having to load each image into memory.
        count = 0
        annotation_dir_path = Path(annotation_dir)
        for annotation_file in annotation_dir_path.glob("*.xml"):
            print(f"Parsing annotation: {annotation_file}")
            xml_text = annotation_file.read_text()
            annotation = xmltodict.parse(xml_text)['annotation']
            objects = annotation['object']
            image_filename = annotation['filename']
            if not isinstance(objects, list):
                objects = [objects]
            # Add the image to the data set
            self.add_image(
                source="custom_object",
                image_id=count,
                path=os.path.join(dataset_dir, image_filename),
                objects=objects,
                width=int(annotation["size"]['width']),
                height=int(annotation["size"]['height']),
            )
            count += 1

    def load_mask(self, image_id):
        # We have to generate our own bitmap masks from the RectLabel polygons.

        # Look up the current image id
        info = self.image_info[image_id]
        # Create a blank mask the same size as the image with as many depth channels as there are
        # annotations for this image.
        mask = np.zeros([info["height"], info["width"], len(info["objects"])], dtype=np.uint8)
        # Loop over each annotation for this image. Each annotation will get its own channel in the mask image.
        for i, o in enumerate(info["objects"]):
            # RectLabel uses Pascal VOC format which is kind of wacky.
            # We need to parse out the x/y coordinates of each point that makes up the current polygon
            ys = []
            xs = []
            for label, number in o["polygon"].items():
                number = int(number)
                if label.startswith("x"):
                    xs.append(number)
                else:
                    ys.append(number)

            # Draw the filled polygon on top of the mask image in the correct channel
            rr, cc = skimage.draw.polygon(ys, xs)
            mask[rr, cc, i] = 1
        # Return mask and array of class IDs of each instance. Since we have
        # one class ID only, we return an array of 1s.
        # Use the built-in bool: the np.bool alias is deprecated and removed in newer NumPy releases.
        return mask.astype(bool), np.ones([mask.shape[-1]], dtype=np.int32)

    def image_reference(self, image_id):
        # Get the path for the image
        info = self.image_info[image_id]
        return info["path"]
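
The coordinate loop in `load_mask` relies on the polygon keys arriving in order and on every key that does not start with `x` being a `y` value. As an alternative sketch, assuming the keys follow an `x1`, `y1`, `x2`, `y2`, ... naming pattern (which is what the `startswith("x")` test implies), the points can be paired by their index instead. The function name `polygon_to_xy` is hypothetical.

```python
import re


def polygon_to_xy(polygon):
    """Split a {'x1': '10', 'y1': '20', ...} mapping into ordered xs and ys."""
    points = {}
    for label, value in polygon.items():
        match = re.fullmatch(r"([xy])(\d+)", label)
        if match is None:
            continue  # ignore keys that are not coordinates
        axis, index = match.group(1), int(match.group(2))
        # int(float(...)) also accepts values written with a decimal point.
        points.setdefault(index, {})[axis] = int(float(value))
    ordered = sorted(points)
    xs = [points[i]["x"] for i in ordered]
    ys = [points[i]["y"] for i in ordered]
    return xs, ys


xs, ys = polygon_to_xy({"x1": "10", "y1": "20", "x2": "30", "y2": "40", "x3": "50", "y3": "60"})
print(xs, ys)  # [10, 30, 50] [20, 40, 60]
```

The returned lists can be passed to `skimage.draw.polygon(ys, xs)` in the same way as in `load_mask`.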

## Create Model and Load Trained Weights

In [10]:
# Create the model object in training mode.
model = modellib.MaskRCNN(mode="training", model_dir=MODEL_DIR, config=config)

# Load the weights we are going to start with (by default, the ones trained on MS-COCO).
# Note: If you are picking up from a training checkpoint instead of the COCO weights, remove the exclude argument.
model.load_weights(WEIGHTS_TO_START_FROM, by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])






Instructions for updating:
keep_dims is deprecated, use keepdims instead
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Instructions for updating:
box_ind is deprecated, use box_indices instead
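
The excluded layers are the ones whose shapes depend on the number of classes, which is why the COCO weights for them cannot be reused here. The sketch below illustrates the mismatch. It assumes that the class-score layer has one output per class, the box layer four per class, and the mask layer one channel per class, and that the COCO weights were trained with 81 classes (80 objects plus background). These sizes reflect my reading of the Matterport implementation rather than anything printed in this notebook, and the function name `head_output_sizes` is hypothetical.

```python
def head_output_sizes(num_classes):
    """Return the class-dependent output sizes of the three head layers."""
    return {
        "class_scores": num_classes,    # one score per class
        "box_deltas": num_classes * 4,  # four box refinements per class
        "mask_channels": num_classes,   # one mask per class
    }


coco_heads = head_output_sizes(1 + 80)    # assumed COCO setup: background + 80 objects
scooter_heads = head_output_sizes(1 + 1)  # this tutorial: background + scooter
for name in coco_heads:
    print(f"{name}: COCO {coco_heads[name]} vs scooter {scooter_heads[name]}")
```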








---
## Run the training


In [11]:
# Load the training data set
dataset_train = RectLabelDataset()
dataset_train.load_training_images(TRAINING_DATASET_PATH, "training_set")
dataset_train.prepare()

# Load the validation data set
dataset_val = RectLabelDataset()
dataset_val.load_training_images(TRAINING_DATASET_PATH, "validation_set")
dataset_val.prepare()

with warnings.catch_warnings():
    # Suppress annoying skimage warning due to code inside Mask R-CNN.
    # Not needed, but makes the output easier to read until Mask R-CNN is updated.
    warnings.simplefilter("ignore")

    # Re-train the model on a small data set. If you are training from 
    # scratch with a huge data set,
    # you'd want to train longer and customize these settings.
    model.train(
        dataset_train,
        dataset_val,
        learning_rate=config.LEARNING_RATE,
        epochs=30,
        layers='heads'
    )


Parsing annotation: /content/drive/My Drive/Crafting/ADLCV/dataset/MaskRCNN/scooter_training/training_set/annotations/20180816_125220.xml
Parsing annotation: /content/drive/My Drive/Crafting/ADLCV/dataset/MaskRCNN/scooter_training/training_set/annotations/20180816_125259.xml
Parsing annotation: /content/drive/My Drive/Crafting/ADLCV/dataset/MaskRCNN/scooter_training/training_set/annotations/20180816_125453.xml
Parsing annotation: /content/drive/My Drive/Crafting/ADLCV/dataset/MaskRCNN/scooter_training/training_set/annotations/20180816_133547.xml
Parsing annotation: /content/drive/My Drive/Crafting/ADLCV/dataset/MaskRCNN/scooter_training/training_set/annotations/20180816_133629.xml
Parsing annotation: /content/drive/My Drive/Crafting/ADLCV/dataset/MaskRCNN/scooter_training/training_set/annotations/20180816_133602.xml
Parsing annotation: /content/drive/My Drive/Crafting/ADLCV/dataset/MaskRCNN/scooter_training/training_set/annotations/20180816_125443.xml
Parsing annotation: /content/drive
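
Passing `layers='heads'` trains only the network heads and leaves the backbone frozen. As a rough illustration, the sketch below selects trainable layers by name prefix. Both the pattern and the sample layer names are assumptions based on my understanding of the Matterport code, where the head, RPN, and FPN layers carry the prefixes `mrcnn_`, `rpn_`, and `fpn_`. Check `mrcnn/model.py` in your checkout for the exact rule.

```python
import re

# Assumed pattern: head layers are those whose names start with these prefixes.
HEADS_PATTERN = r"(mrcnn_.*)|(rpn_.*)|(fpn_.*)"


def trainable_layers(layer_names, pattern=HEADS_PATTERN):
    """Return the layer names that match the pattern in full."""
    return [name for name in layer_names if re.fullmatch(pattern, name)]


# Illustrative layer names only; a real model has many more.
sample_layers = [
    "conv1",
    "res2a_branch2a",
    "fpn_c5p5",
    "rpn_model",
    "mrcnn_class_logits",
    "mrcnn_mask",
]
print(trainable_layers(sample_layers))
```

Under these assumptions, the backbone layers (`conv1`, `res2a_branch2a`) are left out of the trainable set.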

---
## Copy the trained model to Google Drive

In [12]:
import shutil
# Use the Files explorer of Google Colaboratory 
# to obtain the path of the customized h5 file.
H5_PATH = "<Paste the path here>"
shutil.copy(H5_PATH, 
            os.path.join(TRAINING_DATASET_PATH,'customized_model.h5'))

'/content/drive/My Drive/Crafting/ADLCV/dataset/MaskRCNN/scooter_training/customized_model.h5'
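
Instead of pasting the path by hand, the newest checkpoint can also be located programmatically. This is a minimal sketch that assumes checkpoints are written as `.h5` files somewhere below the `logs` folder (`MODEL_DIR` above). The helper name `latest_checkpoint` is hypothetical.

```python
from pathlib import Path


def latest_checkpoint(model_dir):
    """Return the most recently modified .h5 file under model_dir, or None."""
    checkpoints = list(Path(model_dir).rglob("*.h5"))
    if not checkpoints:
        return None
    # Pick the file with the newest modification time.
    return max(checkpoints, key=lambda path: path.stat().st_mtime)
```

For example, `H5_PATH = str(latest_checkpoint(MODEL_DIR))` could replace the pasted path, after checking that the result is not `None`.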