Hello fellow Kagglers,

The Sartorius cell instance segmentation competition introduced me to Mask-RCNN. Like many of you, I used the TensorFlow 2.0 compatible [Mask-RCNN](https://github.com/leekunhee/Mask_RCNN/tree/tensorflow2.0) by leekunhee. Although easy to use, this library is CPU heavy, resulting in extremely long training times on a Kaggle GPU notebook with just 2 CPU cores. Over the past weeks I worked on improving the training efficiency, which resulted in a massive reduction in training time, accomplished by 2 key improvements:

1. Caching computed training samples
2. Replacing ResNet with the modern [EfficientNetV2](https://github.com/google/automl/tree/master/efficientnetv2) models

This notebook demonstrates the training process of a Mask-RCNN model with an EfficientNetV2 model as backbone and a caching mechanism. Many improvements are possible, such as increasing the number of epochs or training/finetuning with a larger batch size. The fast training time should make it easy to experiment with different configurations.

The EfficientNetV2 models are efficient in terms of parameters, FLOPS and training time. The Mask-RCNN configuration used in this notebook has less than a third of the parameters of the conventional ResNet50 configuration. This allows for faster training with larger batch sizes and helps reduce overfitting, a key problem with just 660 training samples.

The caching mechanism will cache each sample using the lightning fast [LZ4](https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)) compression algorithm. After each epoch the number of cache hits increases, improving training time. After around 10 epochs most samples are cached and a training epoch with the complete training dataset takes less than 3 minutes!
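
As a rough illustration of the idea (not the code shipped in the linked dataset), a compressed sample cache could look like the sketch below. `SampleCache` and `build_sample` are hypothetical names, and the standard-library `zlib` stands in for LZ4 so the snippet runs without extra installs. With the `lz4` package installed, `lz4.frame.compress` and `lz4.frame.decompress` would take its place.

```python
import pickle
import zlib


class SampleCache:
    """In-memory cache that stores each training sample as compressed bytes."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, build_sample):
        """Return the cached sample for `key`, building and caching it on a miss."""
        blob = self._store.get(key)
        if blob is not None:
            self.hits += 1
            return pickle.loads(zlib.decompress(blob))
        self.misses += 1
        sample = build_sample(key)
        # Compression level 1 favours speed over ratio (zlib is only a stand-in for LZ4)
        self._store[key] = zlib.compress(pickle.dumps(sample), 1)
        return sample


def build_sample(key):
    # Hypothetical stand-in for the expensive image/mask preprocessing
    return {"id": key, "pixels": [key % 7] * 1000}


cache = SampleCache()
first = cache.get(3, build_sample)   # miss: computed, compressed and stored
second = cache.get(3, build_sample)  # hit: decompressed from the cache
print(cache.hits, cache.misses)
```

Compressing the cached bytes trades a little CPU time per hit for a much smaller memory footprint, which matters when the whole training set has to fit in notebook RAM.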

Possible drawbacks are the restriction to just horizontal/vertical flips as data augmentation and not being able to load pretrained COCO weights. The EfficientNetV2 models could of course be pretrained on COCO; pretrained weights might be added in the future.

The inference notebook can be found [here](https://www.kaggle.com/markwijkhuizen/sartorius-mask-rcnn-efficientnetv2-inference).

The dataset containing the Mask-RCNN EfficientNetV2 model with caching can be found [here](https://www.kaggle.com/markwijkhuizen/maskrcnn-tf-2-efficientnetv2-caching).

**V2 Updates**

- Increased prediction mask size from 28x28 to 56x56. Added functionality that automatically adds N (Conv2D, BatchNorm, ReLU and Conv2DTranspose) layers for a mask of size 28Nx28N in the configuration.
- Increased FPN_CLASSIF_FC_LAYERS_SIZE from 128 to the original 1024 as this improves performance.
- Decreased DETECTION_MIN_CONFIDENCE from 0.70 to 0.50.
- Multi class prediction, meaning each instance is classified as "astro", "cort" or "shsy5y" and not just "cell". This is particularly important for improving the inference process, but more on this in the [inference notebook](https://www.kaggle.com/markwijkhuizen/sartorius-mask-rcnn-efficientnetv2-inference).
- Loading EfficientNetV2 pretrained weights on ImageNet21K

In [None]:
# Library to silence Tensorflow Logs
!pip install -q silence-tensorflow
import silence_tensorflow.auto

In [None]:
# Add Folders to path, required to import them
import sys
sys.path.append('../input/maskrcnn-tf-2-efficientnetv2-caching/Instance_Segmentation/efficientnetv2')
sys.path.append('../input/maskrcnn-tf-2-efficientnetv2-caching/Instance_Segmentation/Mask_RCNN')

In [None]:
# Install LZ4 Compression/Decompression Library
!pip install -q ../input/maskrcnn-tf-2-efficientnetv2-caching/lz4-3.1.3-cp37-cp37m-manylinux1_x86_64.whl

In [None]:
import matplotlib.pyplot as plt
import mrcnn.utils as utils
import mrcnn.model as modellib
import numpy as np
import pandas as pd
import tensorflow as tf

import os
import sys
import json
import time
import skimage
import imageio
import glob
import imgaug
import multiprocessing

from PIL import Image, ImageDraw, ImageEnhance
from tqdm.notebook import tqdm
from sklearn.model_selection import KFold
from mrcnn.config import Config
from mrcnn import visualize

# ignore warnings to make outputs clearer
import warnings
warnings.filterwarnings('ignore')

print(f'Python Version: {sys.version}')
print(f'Tensorflow Version: {tf.__version__}')
print(f'Tensorflow Keras Version: {tf.keras.__version__}')

In [None]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

In [None]:
train = pd.read_csv('../input/sartorius-cell-instance-segmentation/train.csv')

# Unique Image IDs
id_unique = train['id'].unique()

# Original Image File Path
def get_file_path(image_id):
    return f'/kaggle/input/sartorius-cell-instance-segmentation/train/{image_id}.png'

train['file_path'] = train['id'].apply(get_file_path)

# Unique Cell Names
CELL_NAMES = np.sort(train['cell_type'].unique())
print(f'CELL_NAMES: {CELL_NAMES}')

# Cell Type to Label Dictionary
CELL_NAMES_DICT = dict([(v, k) for k, v in enumerate(CELL_NAMES)])
print(f'CELL_NAMES_DICT: {CELL_NAMES_DICT}')

# Add Cell Type Label to train, " + 1" because label 0 is reserved for background
train['cell_type_label'] = train['cell_type'].apply(CELL_NAMES_DICT.get) + 1

# Image Id to Cell Type Label Dictionary
ID2CELL_LABEL = dict(
    [(k, v) for k, v in train[['id', 'cell_type_label']].itertuples(name=None, index=False)]
)
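
To make the label mapping concrete, here is a small standalone sketch of the same steps on a toy list. The three cell type names come from the competition; the variable names are illustrative only.

```python
cell_types = ["shsy5y", "astro", "cort", "astro"]

# Sorted unique names give a stable name -> index mapping
cell_names = sorted(set(cell_types))
name_to_index = {name: idx for idx, name in enumerate(cell_names)}

# Shift by one because label 0 is reserved for the background class
labels = [name_to_index[name] + 1 for name in cell_types]

print(name_to_index)
print(labels)
```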

In [None]:
# path to COCO-dataset weights
COCO_MODEL_PATH = '../input/maskrcnn-tf-2-efficientnetv2-caching/mask_rcnn_coco.h5'

# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Training Configuration

In [None]:
# Original Image Dimensions
HEIGHT = 520
WIDTH = 704
SHAPE = (HEIGHT, WIDTH)

# Target Image Dimensions which are divisible by 64 as required by the Mask-RCNN model
HEIGHT_TARGET = 576
WIDTH_TARGET = 704
SHAPE_TARGET = (HEIGHT_TARGET, WIDTH_TARGET)

BATCH_SIZE = 1
N_SAMPLES = train['id'].nunique()

# Debug mode for fast experimenting with 50 samples
DEBUG = False
DEBUG_SIZE = 50
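
The target dimensions follow from rounding each side up to the next multiple of 64. A minimal sketch of that arithmetic, with hypothetical helper names `round_up_to_multiple` and `symmetric_padding`:

```python
def round_up_to_multiple(value, base=64):
    """Smallest multiple of `base` that is greater than or equal to `value`."""
    return -(-value // base) * base


def symmetric_padding(size, target):
    """Split the required padding into (before, after) amounts."""
    total = target - size
    before = total // 2
    return before, total - before


height, width = 520, 704
height_target = round_up_to_multiple(height)
width_target = round_up_to_multiple(width)

pad_h = symmetric_padding(height, height_target)
pad_w = symmetric_padding(width, width_target)
print(height_target, width_target, pad_h, pad_w)
```

Only the height needs padding here, since 704 is already a multiple of 64.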

In [None]:
EPOCHS_ALL = 10 if DEBUG else 20

# Model Configuration

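1. The class classification size (FPN_CLASSIF_FC_LAYERS_SIZE) was reduced from 1024 to 128 in the first version to shrink the model and limit overfitting: the COCO dataset contains 80 object classes, whereas that version only predicted a single "cell" class, requiring fewer parameters. As of V2 it is back at the original 1024, as this improves performance, and the model predicts the three cell types.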
2. Disable multithreading for data generators by setting workers to 0. Surprisingly, multithreading is slower than running the data generator on a single core.

In [None]:
class CellConfig(Config):
    """Configuration for training on the Sartorius cell dataset.
    Derives from the base Config class and overrides values specific
    to the Sartorius cell dataset.
    """
    
    NAME = "cell"

    # Batch size equals GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = BATCH_SIZE
    STEPS_PER_EPOCH = DEBUG_SIZE // BATCH_SIZE if DEBUG else N_SAMPLES // BATCH_SIZE
    
    # Number of Classes
    NUM_CLASSES = 1 + len(CELL_NAMES)

    # Image Dimensions
    IMAGE_MIN_DIM = HEIGHT_TARGET
    IMAGE_MAX_DIM = WIDTH_TARGET
    IMAGE_SHAPE = [HEIGHT_TARGET, WIDTH_TARGET, 3]
    IMAGE_RESIZE_MODE = 'none'
    
    BACKBONE = 'efficientnetv2-b0'

    # Training Structure
    FPN_CLASSIF_FC_LAYERS_SIZE = 1024
    RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
    # Regions of Interest
    PRE_NMS_LIMIT = 6000
    # Non Max Suppression
    POST_NMS_ROIS_TRAINING = 2000
    POST_NMS_ROIS_INFERENCE = 2000
    # Instances
    MAX_GT_INSTANCES = 790
    TRAIN_ROIS_PER_IMAGE = 200
    DETECTION_MAX_INSTANCES = 200
    
    # Thresholds
    RPN_NMS_THRESHOLD = 0.70        # NMS IoU threshold used to filter RPN proposals
    DETECTION_MIN_CONFIDENCE = 0.50 # Non-Background Confidence Threshold
    DETECTION_NMS_THRESHOLD = 0.30  # NMS IoU threshold used to filter final detections
    ROI_POSITIVE_RATIO = 0.33
    
    # Prediction Mask Shape
    MASK_SHAPE = (56, 56)
    # Size of mask groundtruth
    USE_MINI_MASK = True
    MINI_MASK_SHAPE = (112, 112)
    
    # DO NOT train Batch Normalization because of small batch size
    # There are too few samples to correctly train the normalization
    TRAIN_BN = False
    
    # Learning Rate
    LEARNING_RATE = 0.004
    WEIGHT_DECAY = 0.0
    N_WARMUP_STEPS = 2
    LR_SCHEDULE = True
    
    # Dataloader Queue Size (was set to 100 but resulted in OOM error)
    MAX_QUEUE_SIZE = 10
    
    # Cache Items
    CACHE = True
    
    # Debug mode will disable model checkpoints
    DEBUG = False
    
    # Do not use multithreading as this slows down the dataloader!
    WORKERS = 0
    
    # Losses
    LOSS_WEIGHTS = {
        'rpn_class_loss': 1.0,    # RPN anchor classifier loss (foreground / background)
        'rpn_bbox_loss': 1.0,     # RPN bounding box regression loss (box of a generic object)
        'mrcnn_class_loss': 1.0,  # Mask R-CNN classifier head loss (background / specific class)
        'mrcnn_bbox_loss': 1.0,   # Mask R-CNN bounding box refinement loss
        'mrcnn_mask_loss': 1.0,   # Mask head binary cross-entropy loss (is each pixel assigned correctly?)
    }
    
config = CellConfig()
config.display()
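
The two NMS thresholds in the configuration control how much overlap is tolerated between boxes before the lower-scoring one is discarded. The following is a plain-Python sketch of greedy non-maximum suppression, written for illustration only. The Mask-RCNN library has its own implementation, and the helper names `iou` and `nms` are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold):
    """Return indices of boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

kept_strict = nms(boxes, scores, threshold=0.30)  # like DETECTION_NMS_THRESHOLD
kept_loose = nms(boxes, scores, threshold=0.70)   # like RPN_NMS_THRESHOLD
print(kept_strict, kept_loose)
```

A lower threshold removes more overlapping boxes, which is why the stricter value is reserved for the final detections while the RPN stage keeps more candidates.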

# RLE Decode

In [None]:
# ref: https://www.kaggle.com/paulorzp/run-length-encode-and-decode
def rle_decode_by_image_id(image_id):
    rows = train.loc[train['id'] == image_id]
    
    # Image Shape
    mask = np.full(shape=[len(rows), np.prod(SHAPE)], fill_value=0, dtype=np.uint8)
    
    for idx, (_, row) in enumerate(rows.iterrows()):
        s = row['annotation'].split()
        starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
        starts -= 1
        ends = starts + lengths
        for lo, hi in zip(starts, ends):
            mask[idx, lo:hi] = True
    
    mask = mask.reshape([len(rows), *SHAPE])
    mask = np.moveaxis(mask, 0, 2)
    
    return mask
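
For readers new to run-length encoding, here is a tiny pure-Python version of the same decoding on a 3x4 toy mask. It is a simplified sketch for a single annotation string, with a hypothetical `rle_decode` helper, and assumes the same 1-indexed, row-major convention as the function above.

```python
def rle_decode(rle, height, width):
    """Decode 1-indexed 'start length' pairs into a 2D mask of 0s and 1s."""
    flat = [0] * (height * width)
    tokens = [int(t) for t in rle.split()]
    for start, length in zip(tokens[0::2], tokens[1::2]):
        lo = start - 1  # RLE positions are 1-indexed
        for pos in range(lo, lo + length):
            flat[pos] = 1
    # Row-major reshape: consecutive positions run along a row
    return [flat[r * width:(r + 1) * width] for r in range(height)]


mask = rle_decode("2 3 9 2", height=3, width=4)
for row in mask:
    print(row)
```

The string "2 3 9 2" reads as "3 pixels starting at position 2, then 2 pixels starting at position 9".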

# Create Training Dataset

In [None]:
# Function to pad images and masks
def pad_image(image, constant_values):
    pad_h = (HEIGHT_TARGET - HEIGHT) // 2
    pad_w = (WIDTH_TARGET - WIDTH) // 2
    
    if len(image.shape) == 3:
        return np.pad(image, ((pad_h, pad_h), (pad_w, pad_w), (0,0)), constant_values=constant_values)
    else:
        return np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), constant_values=constant_values)

In [None]:
!rm -rf ./train ./test
!mkdir ./train ./test

In [None]:
id_unique = train['id'].unique()
if DEBUG:
    id_unique = id_unique[:DEBUG_SIZE]

image_id2file_path = train.groupby('id')[['id', 'file_path']].head(1)
image_id2file_path = image_id2file_path.set_index('id').squeeze().to_dict()

# Create padded training samples
for image_id in tqdm(id_unique):
    # Read Original Image
    image = imageio.imread(image_id2file_path[image_id])
    # Pad Image
    image = pad_image(image, 128)
    
    # Save image in working directory, required for Mask-RCNN
    imageio.imwrite(f'./train/{image_id}.png', image)

# Dataset

In [None]:
class CellDataset(utils.Dataset):

    def load_data(self, image_ids, form, image_group):
   
        for i, name in enumerate(CELL_NAMES):
            self.add_class('cell', 1 + i, name)
       
        # Add the image using the base method from utils.Dataset
        for vertical_flip in [True, False]:
            for horizontal_flip in [True, False]:
                for image in tqdm(image_ids):
                    self.add_image(
                        'cell',
                        image_id=image,
                        path=f'./{image_group}/{image}.png',
                        label=ID2CELL_LABEL[image],
                        # Dimensions of the padded image
                        height=HEIGHT_TARGET, width=WIDTH_TARGET,
                        vertical_flip=vertical_flip, horizontal_flip=horizontal_flip,
                    )
            
            
    def load_mask(self, image_id):
        """ Load instance masks for the given image.
        MaskRCNN expects masks in the form of a bitmap [height, width, instances].
        Args:
            image_id: The id of the image to load masks for
        Returns:
            masks: A bool array of shape [height, width, instance count] with
                one mask per instance.
            class_ids: a 1D array of class IDs of the instance masks.
        """
    
        info = self.image_info[image_id]
        image_id = info['id']
    
        # Get masks by image_id
        masks = rle_decode_by_image_id(image_id)
        masks = pad_image(masks, 0)

        # Get label
        _, _, size = masks.shape
        label = info['label']
        class_ids = np.full(size, label, dtype=np.int32)
        
        return masks, class_ids
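
Because `load_data` registers every image once per flip combination, the dataset holds four entries per image. Below is a standalone sketch of that enumeration with made-up IDs. The `apply_flips` helper is hypothetical and only shows what the two flags mean on a nested-list image; how the library applies them internally is not shown in this notebook.

```python
from itertools import product

image_ids = ["img_a", "img_b"]  # made-up IDs

# Same loop order as load_data: vertical flip, horizontal flip, then image
entries = [
    {"image_id": image_id, "vertical_flip": v, "horizontal_flip": h}
    for v, h in product([True, False], repeat=2)
    for image_id in image_ids
]


def apply_flips(image, vertical_flip, horizontal_flip):
    """Flip a 2D nested-list image according to the two flags."""
    if vertical_flip:
        image = image[::-1]                   # reverse the row order
    if horizontal_flip:
        image = [row[::-1] for row in image]  # reverse each row
    return image


image = [[1, 2], [3, 4]]
flipped = apply_flips(image, vertical_flip=True, horizontal_flip=True)
print(len(entries), flipped)
```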

In [None]:
# Create Training Dataset
dataset_train = CellDataset()
dataset_train.load_data(id_unique, 'png', 'train')
dataset_train.prepare()

In [None]:
# Plot Training Samples with Target Masks, note the instance classes!
dataset = dataset_train
image_ids = np.random.choice(dataset.image_ids, 5)
for image_id in image_ids:
    image = dataset.load_image(image_id)
    mask, class_ids = dataset.load_mask(image_id)
    visualize.display_top_masks(image, mask, class_ids, dataset.class_names)

# Model

In [None]:
# Create model in training mode
!mkdir -p model_checkpoints
model = modellib.MaskRCNN(mode="training", config=config, model_dir='model_checkpoints')

In [None]:
init_with = "coco"
exclude = ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]
if 'efficientnetv2-b' in config.BACKBONE:
    exclude += [
        "fpn_c5p5", "fpn_c4p4", "fpn_c3p3", "fpn_c2p2",
    ]
    
# Exclude FC layer if it is not the original size
if config.FPN_CLASSIF_FC_LAYERS_SIZE != 1024:
    print('Excluding FC layer')
    exclude += [
        "mrcnn_class_conv1", "mrcnn_class_bn1", "mrcnn_class_conv2", "mrcnn_class_bn2",
    ]
    
# using coco weights
model.load_weights(
    COCO_MODEL_PATH,
    by_name=True,
    exclude=exclude,
)

In [None]:
# Load EfficientNetV2 Weights Pretrained on Imagenet21K 
model.keras_model.layers[1].load_weights('/kaggle/input/maskrcnn-tf-2-efficientnetv2-caching/Instance_Segmentation/efficientnetv2_model_checkpoints/efficientnetv2-b0-imagenet21k.h5')

# Model Summary

In [None]:
# Added functionality, plot model summary
model.show_summary()

In [None]:
# Show Mask-RCNN Architecture
plt.figure(figsize=(25, 10))
plt.title('Mask-RCNN Model Architecture')
plt.imshow(imageio.imread('./model.png'))
plt.axis(False)
plt.show()

# Learning Rate Scheduler

In [None]:
# Added functionality, show learning rate schedule
model.plot_lr_schedule(EPOCHS_ALL)

# Training Whole Model

Here the actual training happens. Training gets faster with every epoch as more training samples are cached and cache hits increase.

In [None]:
start_train = time.time()
history = model.train(
    dataset_train, None, 
    learning_rate=config.LEARNING_RATE,
    epochs=EPOCHS_ALL, 
    layers="all",
    augmentation=None,
)

end_train = time.time()
minutes = round((end_train - start_train) / 60, 2)
print(f'Training took {minutes} minutes')

# Training History

In [None]:
# Plots a metric
def plot_history_metric(metric, f_best=np.argmax):
    values = history.history[metric]
    plt.figure(figsize=(15, 8))
    N_EPOCHS = len(values)
    # Epoch Ticks
    if N_EPOCHS <= 20:
        x = np.arange(1, N_EPOCHS + 1)
    else:
        x = [1, 5] + [10 + 5 * idx for idx in range((N_EPOCHS - 10) // 5 + 1)]
    x_ticks = np.arange(1, N_EPOCHS+1)
        
    # Plot the metric and mark its best epoch
    plt.plot(x_ticks, values, label='train')
    best_idx = f_best(values)
    plt.scatter(best_idx + 1, values[best_idx], color='red', s=75, marker='o', label='train_best')
    
    plt.title(f'Model {metric}', fontsize=24, pad=10)
    plt.ylabel(metric, fontsize=20, labelpad=10)
    plt.xlabel('epoch', fontsize=20, labelpad=10)
    plt.tick_params(axis='x', labelsize=8)
    plt.xticks(x, fontsize=16) # set tick step to 1 and let x axis start at 1
    plt.yticks(fontsize=16)
    plt.legend(prop={'size': 18})
    plt.grid()
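
The tick logic in `plot_history_metric` thins out the x-axis labels for longer runs. Isolated into a small function (the name `epoch_ticks` is made up for this sketch), it behaves as follows:

```python
def epoch_ticks(n_epochs):
    """Tick positions: every epoch up to 20 epochs, otherwise 1, 5, 10, 15, ..."""
    if n_epochs <= 20:
        return list(range(1, n_epochs + 1))
    return [1, 5] + [10 + 5 * idx for idx in range((n_epochs - 10) // 5 + 1)]


short_run = epoch_ticks(4)
long_run = epoch_ticks(30)
print(short_run)
print(long_run)
```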

In [None]:
# Mean loss
plot_history_metric('loss', f_best=np.argmin)

In [None]:
# Region Proposal Network Foreground / Background Classifier
plot_history_metric('rpn_class_loss', f_best=np.argmin)

In [None]:
# Region Proposal Network Bounding Box Loss
plot_history_metric('rpn_bbox_loss', f_best=np.argmin)

In [None]:
# Mask RCNN Head Class Classifier Background / specific class
plot_history_metric('mrcnn_class_loss', f_best=np.argmin)

In [None]:
# Mask RCNN Head Bounding Box Loss
plot_history_metric('mrcnn_bbox_loss', f_best=np.argmin)

In [None]:
# Mask RCNN Head Object Mask Binary Cross Entropy Loss
plot_history_metric('mrcnn_mask_loss', f_best=np.argmin)

# Inference

In [None]:
class InferenceConfig(CellConfig):
    IMAGES_PER_GPU = 1
    DETECTION_MAX_INSTANCES = 200
    DETECTION_MIN_CONFIDENCE = 0.70
    USE_MINI_MASK = False
    

inference_config = InferenceConfig()
inference_config.display()

In [None]:
# Recreate the model in inference mode
model = modellib.MaskRCNN(mode="inference", config=inference_config, model_dir='model_checkpoints')

In [None]:
# Set EfficientNetV2 head untrainable
if 'efficientnetv2-b' in  inference_config.BACKBONE:
    model.keras_model.layers[1].layers[-1].trainable = False

In [None]:
# Find last epoch
model_path = model.find_last()

# Load trained weights (fill in path to trained weights here)
print("Loading weights from ", model_path)
model.load_weights(model_path, by_name=True)

# Visualize Train Predictions

In [None]:
for file_path in glob.glob('./train/*.png')[:25]:
    img = skimage.io.imread(file_path)
    img = np.expand_dims(img, axis=2)
    img = np.concatenate((img, img, img), axis=2)
    results = model.detect([img], verbose=1)
    r = results[0]
    
    # Image Id
    image_id = file_path.split('/')[-1].split('.')[0]
    print(f'image_id: {image_id}')
    
    mask = rle_decode_by_image_id(image_id)
    mask = np.sum(mask, axis=2)
    plt.figure(figsize=(16,16))
    plt.imshow(mask)
    plt.show()
    
    visualize.display_instances(
        img,
        r['rois'],
        r['masks'],
        r['class_ids'], 
        ['BG'] + CELL_NAMES.tolist(),
        r['scores'],
        figsize=(16,16)
    )