
# Unit 28 - Project: Object Detection using RetinaNet

**Project**: Object Detection, Classification and Labeling using RetinaNet


## Course: Fall 2018, Deep Learning
Professor: **Dr. James Shanahan**

Students: **Gelesh Omathil and Murali Cheruvu**

University: **Indiana University**


**RetinaNet**: Introduction: https://arxiv.org/pdf/1708.02002.pdf

**Dataset**: 
-	Use **COCO Dataset** (http://cocodataset.org/#home) (~100k images) for training, validation and test datasets 
-	About 1MM bounding boxes; some of the images have about 10 classes in them
-	We have 80 classes in this database
-	Our focus is only 7 classes - **car, truck, person, bus, bicycle and traffic sign**

**Cloud**: Cloud Provider Server with Linux/Ubuntu Box with **GPU**s

**Project**: **Train RetinaNet Dataset - Object Detectors**

-	Use this notebook as a base: https://github.com/fizyr/keras-retinanet
-	Use Transfer Learning (load the weights from pre-trained models)
-	Base the model on the pre-trained RetinaNet model but focus on only 7 classes (all other classes are treated as background)
-	Retrain part of the network (about 6 key layers) from the transfer-learned state using ResNet50/ResNet101 as a backbone, with focus on the 7 classes, so that the model is recalibrated
-	Predict bounding boxes, predict classes - 8 classes (7 + background class), 8X5 outputs
-	Try out - output layers of different resolutions - ex: 56X56, 28x28, 14x14 (feature pyramid)
-	For each feature pyramid, we will have output layer with loss function
-	Try out a small number of epochs on CPU first, then run full training on GPUs



## Python Libraries

- Python 3.6 
- Keras 2.2.4+
- TensorFlow (CPU and GPU)

## Introduction

Recognizing an object in an image has always been a very challenging task. Detecting multiple objects in the same image is even more difficult. The purpose of computer vision is to solve such complex tasks. With the emergence of neural-network-driven machine learning algorithms, there are better ways to tackle them.

Object detection architecture is categorized into two types: two-stage and single-stage. 

Two-stage object detectors first separate the image into two parts, foreground and background. Then, all the foreground objects are classified into more fine-grained classes: car, truck, person, bus, bicycle, etc.

Convolutional Neural Networks (CNNs), a deep learning architecture, are well suited to these challenges.

We present three techniques here - (1) Region-based CNN (R-CNN), (2) Fast R-CNN and (3) Region Proposal Network (RPN) (Ref: @Guide-DL).

1. **Region-based CNN**

![image.png](img/r-cnn.png)(Ref: @Guide-DL)


2. **Fast R-CNN**

![image.png](img/faster-r-cnn.png) (Ref: @Guide-DL)


3. **Region Proposal Network**

![](img/reg_prop_1.png) (Ref: @Guide-DL)

![image.png](img/reg_prop_2.png) (Ref: @Guide-DL)

## RetinaNet - One-Stage Detector

Most popular object detectors are based on R-CNN and use two-stage detection, which gives the highest accuracy.
However, two-stage detectors are slower because they first propose candidate regions and then classify each one.

To improve speed, recent work has turned to one-stage detectors. **OverFeat** and **YOLO** (You Only Look Once) achieve faster detection, with accuracy within 10%-40% of two-stage detectors.

RetinaNet, introduced in the paper Focal Loss for Dense Object Detection by the Facebook AI Research team, is a one-stage detector that borrows ideas from two-stage detectors such as Feature Pyramid Network (FPN) and Mask R-CNN to achieve accuracy comparable with two-stage detectors. RetinaNet offers the best of both single-stage and two-stage detectors.

The following figure compares various object detection algorithms, including RetinaNet:

![image.png](img/retinanet-compare.png) (Ref: @ObjDetect)

Some of the key aspects are listed as follows:

- One-stage detectors evaluate a dense set of candidate locations, most of which are easy background; this class imbalance makes training inefficient and has traditionally been addressed with techniques such as bootstrapping and hard example mining
- RetinaNet proposes a new loss function, **Focal Loss**, a dynamically scaled cross-entropy loss that deals with class imbalance by using a scaling factor to automatically down-weight the contribution of easy samples while focusing on the hard samples


(Ref: @RetinaNet-Intro)

## RetinaNet Components

"**RetinaNet is a single, unified network composed of a Feature Pyramid (backbone) network and two task-specific sub-networks**" (Ref: @RetinaNet-Intro)

![RetinaNet](img/retinanet.png "Title") (Ref: @RetinaNet-Intro)




## ResNet, CNN Network as Backbone

ResNet-50 is a popular convolutional neural network for images. It passes the image through a series of convolutional filters (kernels) to produce feature maps. Successive stages downsample the image (through strided convolutions and pooling), so deeper feature maps are smaller but capture higher-level features.

## Feature Pyramid Network

RetinaNet adds a Feature Pyramid Network (FPN) in place of the typical classifier head. The FPN collects feature maps from several stages of the ResNet and combines them to provide rich features at different scales. It is called a pyramid network because each level of the pyramid detects objects at a different scale.
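To make the multi-scale idea concrete, the sketch below computes the spatial size of each pyramid level for a given input size. It assumes five levels, P3 to P7, where level l has a stride of 2^l pixels (as in the `__create_pyramid_features` code later in this notebook); the function name `pyramid_shapes` is illustrative and not part of any library.

```python
def pyramid_shapes(height, width, levels=(3, 4, 5, 6, 7)):
    """Return {level: (rows, cols)}, assuming level l has stride 2**l."""
    shapes = {}
    for level in levels:
        stride = 2 ** level
        # ceiling division: a partially covered cell still produces an output
        shapes[level] = ((height + stride - 1) // stride,
                         (width + stride - 1) // stride)
    return shapes


# an 800 x 1333 input, the default size used by resize_image in this notebook
for level, (rows, cols) in sorted(pyramid_shapes(800, 1333).items()):
    print("P{}: {} x {}".format(level, rows, cols))
```

Lower levels (P3) keep a fine grid and are suited to small objects, while higher levels (P7) have a coarse grid suited to large objects.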

## Anchor Boxes

An anchor is a rectangular box with a predefined size and aspect ratio. At each FPN level, anchors of several sizes and ratios are placed at every feature-map location so that each potential object is covered by some anchor.

Each FPN level is passed through two fully convolutional networks (FCN): the first performs regression and predicts the anchor box boundaries (x1, y1, x2, y2), and the second performs multi-label classification over the N classes.
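As an illustration of how anchors of different sizes and ratios can be built, here is a minimal NumPy sketch. The three aspect ratios (0.5, 1, 2), the three scales (2^0, 2^(1/3), 2^(2/3)) and the base size of 32 pixels are assumptions taken from the RetinaNet paper's description; `generate_anchors` here is a simplified stand-in, not the library's own function.

```python
import numpy as np


def generate_anchors(base_size=32, ratios=(0.5, 1.0, 2.0),
                     scales=(2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0))):
    """Return (len(ratios) * len(scales), 4) anchors as (x1, y1, x2, y2), centred on the origin."""
    anchors = []
    for ratio in ratios:            # ratio = height / width
        for scale in scales:
            side = base_size * scale
            area = side * side      # the area depends only on the scale
            width = np.sqrt(area / ratio)
            height = width * ratio
            anchors.append([-width / 2, -height / 2, width / 2, height / 2])
    return np.array(anchors)


anchors = generate_anchors()
print(anchors.shape)
print(np.round(anchors, 1))
```

Each feature-map location receives a shifted copy of these base anchors, so the number of predictions per level is rows x cols x number of base anchors.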

## Focal Loss

The real improvement in RetinaNet's accuracy comes from a new loss function called Focal Loss.

Focal Loss is designed to address the class imbalance between foreground and background during training. It assigns low weights to well-classified examples, such as easy background.

Focal Loss builds on the Cross Entropy (CE) loss for binary classification:


\begin{equation*}
        \mathrm{CE}(p, y)=\begin{cases}
                -\log(p) & \text{if } y = 1\,,  \\
                -\log(1 - p) & \text{otherwise}\,.
        \end{cases}
\end{equation*}

In the above, y ∈ {±1} denotes the ground-truth class and p ∈ [0, 1] is the model's estimated probability for the class with label y = 1. We define p_t as:

\begin{equation*}
        p_{t}=\begin{cases}
                p & \text{if } y = 1\,,  \\
                1 - p & \text{otherwise}\,.
        \end{cases}
\end{equation*}

\begin{equation*}
        \mathrm{FL}(p_{t})= -(1 - p_{t})^{\gamma} \log(p_{t})
\end{equation*}

The focal loss defined above scales the cross-entropy loss -log(p_t) by the modulating factor (1 - p_t)^γ, where γ is a tunable focusing parameter between 0 and 5 (γ = 0 recovers plain cross entropy). Well-classified examples, such as easy background, have p_t close to 1, so their modulating factor is close to 0 and their loss is down-weighted. This is the key aspect that compels the model to learn from the hard examples, including the specific foreground classes.
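The effect of the modulating factor can be checked numerically. The following is a minimal NumPy sketch of the binary focal loss, written with the focusing exponent as a `gamma` argument and without the optional class-balancing weight from the paper; the function name and sample values are illustrative only.

```python
import numpy as np


def focal_loss(p, y, gamma=2.0):
    """Binary focal loss without the alpha balancing term.

    p: predicted probability for the class y = 1.
    y: ground-truth labels in {+1, -1}.
    """
    p = np.asarray(p, dtype=np.float64)
    y = np.asarray(y)
    # p_t is the probability the model assigns to the true class
    p_t = np.where(y == 1, p, 1.0 - p)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)


# an easy background example (p_t = 0.95) and a hard foreground example (p_t = 0.1)
p = np.array([0.05, 0.1])
y = np.array([-1, 1])
print(focal_loss(p, y, gamma=0.0))  # gamma = 0 reduces to cross entropy
print(focal_loss(p, y, gamma=2.0))  # the easy example is down-weighted far more
```

With `gamma=2.0` the easy example's loss is multiplied by (1 - 0.95)^2 = 0.0025, while the hard example's loss is multiplied by (1 - 0.1)^2 = 0.81.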



![RetinaNet](img/focal-loss.png "Focal Loss") (Ref: @ObjDetect)

Note: For complete details of the Focal Loss single-stage object detection algorithm, please refer to: https://arxiv.org/pdf/1708.02002.pdf

## Dataset

- Prepare the dataset in the CSV format (with a training and cross-validation split)
- Check the correctness of the dataset using retinanet-debug
- Train RetinaNet starting from the pre-trained COCO weights, which gives a head start in both accuracy and training time
- Convert the trained model to an inference model
- Evaluate the updated model on the cross-validation and test datasets
- Install pycocotools to test on the MS COCO dataset by running: pip install git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
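The CSV preparation step can be sketched as follows. The sketch assumes the annotation layout described in the keras-retinanet README, one row per box in the form `image_path,x1,y1,x2,y2,class_name` plus a separate `class_name,id` mapping file. The class list, sample rows and function names are hypothetical placeholders, not the project's actual data.

```python
import csv
import io

# Illustrative subset of the classes of interest; ids start at 0.
CLASSES = ["car", "truck", "person", "bus", "bicycle"]


def write_annotations(boxes, stream):
    """Write rows image_path,x1,y1,x2,y2,class_name and return the number of rows kept."""
    writer = csv.writer(stream, lineterminator="\n")
    kept = 0
    for image_path, x1, y1, x2, y2, name in boxes:
        if name not in CLASSES:
            continue  # every other class is treated as background
        writer.writerow([image_path, x1, y1, x2, y2, name])
        kept += 1
    return kept


def write_class_mapping(stream):
    """Write rows class_name,id for the classes of interest."""
    writer = csv.writer(stream, lineterminator="\n")
    for index, name in enumerate(CLASSES):
        writer.writerow([name, index])


# hypothetical sample annotations
sample = [
    ("img/0001.jpg", 10, 20, 110, 220, "person"),
    ("img/0001.jpg", 50, 60, 300, 200, "car"),
    ("img/0002.jpg", 5, 5, 40, 40, "dog"),
]
annotations = io.StringIO()
print(write_annotations(sample, annotations))
print(annotations.getvalue())
```

Writing to a stream keeps the example self-contained; in practice the same functions could be pointed at files opened with `open(path, "w", newline="")`.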


## Training

- RetinaNet can be trained on the COCO dataset using the Python code listed in the training folder
- The default backbone is ResNet50; it can be changed to a different backbone by passing the backbone name in the --backbone argument
- Various backbone models to try are: ResNet models (ResNet50, ResNet101), MobileNet models (MobileNet128_1.0, MobileNet128_0.75) and VGG models

The trained model needs to be converted into an inference model before proceeding to testing.



## Usage

### Running directly from the repository:
keras_retinanet/bin/train.py coco /path/to/MS/COCO

### Using the installed script:
retinanet-train coco /path/to/MS/COCO

## Testing

###  Load Python Libraries

In [1]:
# show images inline
%matplotlib inline

# automatically reload modules when they have changed
%load_ext autoreload
%autoreload 2

# import keras
import keras

# import miscellaneous modules
import matplotlib.pyplot as plt
import cv2
import os
import numpy as np
import time
import math

# set tf backend to allow memory to grow, instead of claiming everything
import tensorflow as tf

def get_session():
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    return tf.Session(config=config)

# use this environment flag to change which GPU to use
#os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# set the modified tf session as backend in keras
keras.backend.tensorflow_backend.set_session(get_session())

Using TensorFlow backend.


### Load Utility Methods

In [2]:
import numpy as np
import cv2
from PIL import Image

def read_image_bgr(path):
    """ Read an image in BGR format.

    Args
        path: Path to the image.
    """
    image = np.asarray(Image.open(path).convert('RGB'))
    return image[:, :, ::-1].copy()

def preprocess_image(x, mode='caffe'):
    """ Preprocess an image by subtracting the ImageNet mean.

    Args
        x: np.array of shape (None, None, 3) or (3, None, None).
        mode: One of "caffe" or "tf".
            - caffe: will zero-center each color channel with
                respect to the ImageNet dataset, without scaling.
            - tf: will scale pixels between -1 and 1, sample-wise.

    Returns
        The input with the ImageNet mean subtracted.
    """
    # mostly identical to "https://github.com/keras-team/keras-applications/blob/master/keras_applications/imagenet_utils.py"
    # except for converting RGB -> BGR since we assume BGR already

    # always convert to float32 to keep compatibility with opencv
    x = x.astype(np.float32)

    if mode == 'tf':
        x /= 127.5
        x -= 1.
    elif mode == 'caffe':
        x[..., 0] -= 103.939
        x[..., 1] -= 116.779
        x[..., 2] -= 123.68

    return x

def resize_image(img, min_side=800, max_side=1333):
    """ Resize an image such that the size is constrained to min_side and max_side.

    Args
        min_side: The image's min side will be equal to min_side after resizing.
        max_side: If after resizing the image's max side is above max_side, resize until the max side is equal to max_side.

    Returns
        A resized image.
    """
    # compute scale to resize the image
    scale = compute_resize_scale(img.shape, min_side=min_side, max_side=max_side)

    # resize the image with the computed scale
    img = cv2.resize(img, None, fx=scale, fy=scale)

    return img, scale
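`resize_image` above calls `compute_resize_scale`, which is not defined in this notebook. The following is a minimal sketch of such a helper, written to match the behaviour described in the `resize_image` docstring; it is an assumption about what the helper does, not a copy of the library's code.

```python
def compute_resize_scale(image_shape, min_side=800, max_side=1333):
    """Return the factor that scales the shorter side to min_side
    without letting the longer side exceed max_side."""
    rows, cols = image_shape[0], image_shape[1]

    # scale so that the shorter side becomes min_side
    scale = float(min_side) / min(rows, cols)

    # if the longer side would then exceed max_side, scale to max_side instead
    if max(rows, cols) * scale > max_side:
        scale = float(max_side) / max(rows, cols)

    return scale


print(compute_resize_scale((480, 640, 3)))   # limited by the shorter side
print(compute_resize_scale((400, 1000, 3)))  # limited by the longer side
```

The returned scale is also needed after inference, because the predicted boxes must be divided by it to map them back to the original image coordinates.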

In [3]:
import warnings

def label_color(label):
    """ Return a color from a set of predefined colors. Contains 80 colors in total.

    Args
        label: The label to get the color for.

    Returns
        A list of three values representing an RGB color.

        If no color is defined for a certain label, the color green is returned and a warning is printed.
    """
    if label < len(colors):
        return colors[label]
    else:
        warnings.warn('Label {} has no color, returning default.'.format(label))
        return (0, 255, 0)


"""
Generated using:

```
colors = [list((matplotlib.colors.hsv_to_rgb([x, 1.0, 1.0]) * 255).astype(int)) for x in np.arange(0, 1, 1.0 / 80)]
shuffle(colors)
pprint(colors)
```
"""
colors = [
    [31  , 0   , 255] ,
    [0   , 159 , 255] ,
    [255 , 95  , 0]   ,
    [255 , 19  , 0]   ,
    [255 , 0   , 0]   ,
    [255 , 38  , 0]   ,
    [0   , 255 , 25]  ,
    [255 , 0   , 133] ,
    [255 , 172 , 0]   ,
    [108 , 0   , 255] ,
    [0   , 82  , 255] ,
    [0   , 255 , 6]   ,
    [255 , 0   , 152] ,
    [223 , 0   , 255] ,
    [12  , 0   , 255] ,
    [0   , 255 , 178] ,
    [108 , 255 , 0]   ,
    [184 , 0   , 255] ,
    [255 , 0   , 76]  ,
    [146 , 255 , 0]   ,
    [51  , 0   , 255] ,
    [0   , 197 , 255] ,
    [255 , 248 , 0]   ,
    [255 , 0   , 19]  ,
    [255 , 0   , 38]  ,
    [89  , 255 , 0]   ,
    [127 , 255 , 0]   ,
    [255 , 153 , 0]   ,
    [0   , 255 , 255] ,
    [0   , 255 , 216] ,
    [0   , 255 , 121] ,
    [255 , 0   , 248] ,
    [70  , 0   , 255] ,
    [0   , 255 , 159] ,
    [0   , 216 , 255] ,
    [0   , 6   , 255] ,
    [0   , 63  , 255] ,
    [31  , 255 , 0]   ,
    [255 , 57  , 0]   ,
    [255 , 0   , 210] ,
    [0   , 255 , 102] ,
    [242 , 255 , 0]   ,
    [255 , 191 , 0]   ,
    [0   , 255 , 63]  ,
    [255 , 0   , 95]  ,
    [146 , 0   , 255] ,
    [184 , 255 , 0]   ,
    [255 , 114 , 0]   ,
    [0   , 255 , 235] ,
    [255 , 229 , 0]   ,
    [0   , 178 , 255] ,
    [255 , 0   , 114] ,
    [255 , 0   , 57]  ,
    [0   , 140 , 255] ,
    [0   , 121 , 255] ,
    [12  , 255 , 0]   ,
    [255 , 210 , 0]   ,
    [0   , 255 , 44]  ,
    [165 , 255 , 0]   ,
    [0   , 25  , 255] ,
    [0   , 255 , 140] ,
    [0   , 101 , 255] ,
    [0   , 255 , 82]  ,
    [223 , 255 , 0]   ,
    [242 , 0   , 255] ,
    [89  , 0   , 255] ,
    [165 , 0   , 255] ,
    [70  , 255 , 0]   ,
    [255 , 0   , 172] ,
    [255 , 76  , 0]   ,
    [203 , 255 , 0]   ,
    [204 , 0   , 255] ,
    [255 , 0   , 229] ,
    [255 , 133 , 0]   ,
    [127 , 0   , 255] ,
    [0   , 235 , 255] ,
    [0   , 255 , 197] ,
    [255 , 0   , 191] ,
    [0   , 44  , 255] ,
    [50  , 255 , 0]
]


In [4]:
import cv2
import numpy as np

def draw_box(image, box, color, thickness=2):
    """ Draws a box on an image with a given color.

    # Arguments
        image     : The image to draw on.
        box       : A list of 4 elements (x1, y1, x2, y2).
        color     : The color of the box.
        thickness : The thickness of the lines to draw a box with.
    """
    b = np.array(box).astype(int)
    cv2.rectangle(image, (b[0], b[1]), (b[2], b[3]), color, thickness, cv2.LINE_AA)


def draw_caption(image, box, caption):
    """ Draws a caption above the box in an image.

    # Arguments
        image   : The image to draw on.
        box     : A list of 4 elements (x1, y1, x2, y2).
        caption : String containing the text to draw.
    """
    b = np.array(box).astype(int)
    cv2.putText(image, caption, (b[0], b[1] - 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 0), 2)
    cv2.putText(image, caption, (b[0], b[1] - 10), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1)


In [5]:
class PriorProbability(keras.initializers.Initializer):
    """ Apply a prior probability to the weights.
    """

    def __init__(self, probability=0.01):
        self.probability = probability

    def get_config(self):
        return {
            'probability': self.probability
        }

    def __call__(self, shape, dtype=None):
        # set bias to -log((1 - p)/p) for foreground
        result = np.ones(shape, dtype=dtype) * -math.log((1 - self.probability) / self.probability)

        return result
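The reason for this particular bias value can be verified with a few lines of arithmetic: with near-zero weights, the sigmoid output of the classification head is approximately sigmoid(bias), and the bias below makes that equal to the prior probability. This is a standalone sketch; `prior_bias` and `sigmoid` are illustrative helper names.

```python
import math


def prior_bias(probability=0.01):
    """Bias b such that sigmoid(b) equals the given prior probability."""
    return -math.log((1.0 - probability) / probability)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


b = prior_bias(0.01)
print(b)           # a strongly negative bias
print(sigmoid(b))  # recovers the prior probability
```

Starting every anchor at a low foreground probability keeps the very large number of background anchors from dominating the loss in the first training iterations.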

### Define RetinaNet

In [6]:
%run filter_detections.py 
%run tensorflow_backend.py 
%run anchors.py

SyntaxError: invalid syntax (compute_overlap.py, line 8)

In [9]:
"""
Copyright 2017-2018 Fizyr (https://fizyr.com)

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

import keras

def default_classification_model(
    num_classes,
    num_anchors,
    pyramid_feature_size=256,
    prior_probability=0.01,
    classification_feature_size=256,
    name='classification_submodel'
):
    """ Creates the default classification submodel.

    Args
        num_classes                 : Number of classes to predict a score for at each feature level.
        num_anchors                 : Number of anchors to predict classification scores for at each feature level.
        pyramid_feature_size        : The number of filters to expect from the feature pyramid levels.
        classification_feature_size : The number of filters to use in the layers in the classification submodel.
        name                        : The name of the submodel.

    Returns
        A keras.models.Model that predicts classes for each anchor.
    """
    options = {
        'kernel_size' : 3,
        'strides'     : 1,
        'padding'     : 'same',
    }

    if keras.backend.image_data_format() == 'channels_first':
        inputs  = keras.layers.Input(shape=(pyramid_feature_size, None, None))
    else:
        inputs  = keras.layers.Input(shape=(None, None, pyramid_feature_size))
    outputs = inputs
    for i in range(4):
        outputs = keras.layers.Conv2D(
            filters=classification_feature_size,
            activation='relu',
            name='pyramid_classification_{}'.format(i),
            kernel_initializer=keras.initializers.normal(mean=0.0, stddev=0.01, seed=None),
            bias_initializer='zeros',
            **options
        )(outputs)

    outputs = keras.layers.Conv2D(
        filters=num_classes * num_anchors,
        kernel_initializer=keras.initializers.normal(mean=0.0, stddev=0.01, seed=None),
        bias_initializer=PriorProbability(probability=prior_probability),  # class defined in an earlier cell
        name='pyramid_classification',
        **options
    )(outputs)

    # reshape output and apply sigmoid
    if keras.backend.image_data_format() == 'channels_first':
        outputs = keras.layers.Permute((2, 3, 1), name='pyramid_classification_permute')(outputs)
    outputs = keras.layers.Reshape((-1, num_classes), name='pyramid_classification_reshape')(outputs)
    outputs = keras.layers.Activation('sigmoid', name='pyramid_classification_sigmoid')(outputs)

    return keras.models.Model(inputs=inputs, outputs=outputs, name=name)


def default_regression_model(num_values, num_anchors, pyramid_feature_size=256, regression_feature_size=256, name='regression_submodel'):
    """ Creates the default regression submodel.

    Args
        num_values              : Number of values to regress.
        num_anchors             : Number of anchors to regress for each feature level.
        pyramid_feature_size    : The number of filters to expect from the feature pyramid levels.
        regression_feature_size : The number of filters to use in the layers in the regression submodel.
        name                    : The name of the submodel.

    Returns
        A keras.models.Model that predicts regression values for each anchor.
    """
    # All new conv layers except the final one in the
    # RetinaNet (classification) subnets are initialized
    # with bias b = 0 and a Gaussian weight fill with stddev = 0.01.
    options = {
        'kernel_size'        : 3,
        'strides'            : 1,
        'padding'            : 'same',
        'kernel_initializer' : keras.initializers.normal(mean=0.0, stddev=0.01, seed=None),
        'bias_initializer'   : 'zeros'
    }

    if keras.backend.image_data_format() == 'channels_first':
        inputs  = keras.layers.Input(shape=(pyramid_feature_size, None, None))
    else:
        inputs  = keras.layers.Input(shape=(None, None, pyramid_feature_size))
    outputs = inputs
    for i in range(4):
        outputs = keras.layers.Conv2D(
            filters=regression_feature_size,
            activation='relu',
            name='pyramid_regression_{}'.format(i),
            **options
        )(outputs)

    outputs = keras.layers.Conv2D(num_anchors * num_values, name='pyramid_regression', **options)(outputs)
    if keras.backend.image_data_format() == 'channels_first':
        outputs = keras.layers.Permute((2, 3, 1), name='pyramid_regression_permute')(outputs)
    outputs = keras.layers.Reshape((-1, num_values), name='pyramid_regression_reshape')(outputs)

    return keras.models.Model(inputs=inputs, outputs=outputs, name=name)


def __create_pyramid_features(C3, C4, C5, feature_size=256):
    """ Creates the FPN layers on top of the backbone features.

    Args
        C3           : Feature stage C3 from the backbone.
        C4           : Feature stage C4 from the backbone.
        C5           : Feature stage C5 from the backbone.
        feature_size : The feature size to use for the resulting feature levels.

    Returns
        A list of feature levels [P3, P4, P5, P6, P7].
    """
    # upsample C5 to get P5 from the FPN paper
    P5           = keras.layers.Conv2D(feature_size, kernel_size=1, strides=1, padding='same', name='C5_reduced')(C5)
    P5_upsampled = layers.UpsampleLike(name='P5_upsampled')([P5, C4])
    P5           = keras.layers.Conv2D(feature_size, kernel_size=3, strides=1, padding='same', name='P5')(P5)

    # add P5 elementwise to C4
    P4           = keras.layers.Conv2D(feature_size, kernel_size=1, strides=1, padding='same', name='C4_reduced')(C4)
    P4           = keras.layers.Add(name='P4_merged')([P5_upsampled, P4])
    P4_upsampled = layers.UpsampleLike(name='P4_upsampled')([P4, C3])
    P4           = keras.layers.Conv2D(feature_size, kernel_size=3, strides=1, padding='same', name='P4')(P4)

    # add P4 elementwise to C3
    P3 = keras.layers.Conv2D(feature_size, kernel_size=1, strides=1, padding='same', name='C3_reduced')(C3)
    P3 = keras.layers.Add(name='P3_merged')([P4_upsampled, P3])
    P3 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=1, padding='same', name='P3')(P3)

    # "P6 is obtained via a 3x3 stride-2 conv on C5"
    P6 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=2, padding='same', name='P6')(C5)

    # "P7 is computed by applying ReLU followed by a 3x3 stride-2 conv on P6"
    P7 = keras.layers.Activation('relu', name='C6_relu')(P6)
    P7 = keras.layers.Conv2D(feature_size, kernel_size=3, strides=2, padding='same', name='P7')(P7)

    return [P3, P4, P5, P6, P7]


def default_submodels(num_classes, num_anchors):
    """ Create a list of default submodels used for object detection.

    The default submodels contain a regression submodel and a classification submodel.

    Args
        num_classes : Number of classes to use.
        num_anchors : Number of base anchors.

    Returns
        A list of tuples, where the first element is the name of the submodel and the second element is the submodel itself.
    """
    return [
        ('regression', default_regression_model(4, num_anchors)),
        ('classification', default_classification_model(num_classes, num_anchors))
    ]


def __build_model_pyramid(name, model, features):
    """ Applies a single submodel to each FPN level.

    Args
        name     : Name of the submodel.
        model    : The submodel to evaluate.
        features : The FPN features.

    Returns
        A tensor containing the response from the submodel on the FPN features.
    """
    return keras.layers.Concatenate(axis=1, name=name)([model(f) for f in features])


def __build_pyramid(models, features):
    """ Applies all submodels to each FPN level.

    Args
        models   : List of submodels to run on each pyramid level (by default only regression, classification).
        features : The FPN features.

    Returns
        A list of tensors, one for each submodel.
    """
    return [__build_model_pyramid(n, m, features) for n, m in models]


def __build_anchors(anchor_parameters, features):
    """ Builds anchors for the shape of the features from FPN.

    Args
        anchor_parameters : Parameters that determine how anchors are generated.
        features          : The FPN features.

    Returns
        A tensor containing the anchors for the FPN features.

        The shape is:
        ```
        (batch_size, num_anchors, 4)
        ```
    """
    anchors = [
        layers.Anchors(
            size=anchor_parameters.sizes[i],
            stride=anchor_parameters.strides[i],
            ratios=anchor_parameters.ratios,
            scales=anchor_parameters.scales,
            name='anchors_{}'.format(i)
        )(f) for i, f in enumerate(features)
    ]

    return keras.layers.Concatenate(axis=1, name='anchors')(anchors)


def retinanet(
    inputs,
    backbone_layers,
    num_classes,
    num_anchors             = None,
    create_pyramid_features = __create_pyramid_features,
    submodels               = None,
    name                    = 'retinanet'
):
    """ Construct a RetinaNet model on top of a backbone.

    This model is the minimum model necessary for training (with the unfortunate exception of anchors as output).

    Args
        inputs                  : keras.layers.Input (or list of) for the input to the model.
        backbone_layers         : The feature stages (C3, C4, C5) taken from the backbone.
        num_classes             : Number of classes to classify.
        num_anchors             : Number of base anchors.
        create_pyramid_features : Functor for creating pyramid features given the features C3, C4, C5 from the backbone.
        submodels               : Submodels to run on each feature map (default is regression and classification submodels).
        name                    : Name of the model.

    Returns
        A keras.models.Model which takes an image as input and outputs generated anchors and the result from each submodel on every pyramid level.

        The order of the outputs is as defined in submodels:
        ```
        [
            regression, classification, other[0], other[1], ...
        ]
        ```
    """

    if num_anchors is None:
        num_anchors = AnchorParameters.default.num_anchors()

    if submodels is None:
        submodels = default_submodels(num_classes, num_anchors)

    C3, C4, C5 = backbone_layers

    # compute pyramid features as per https://arxiv.org/abs/1708.02002
    features = create_pyramid_features(C3, C4, C5)

    # for all pyramid levels, run available submodels
    pyramids = __build_pyramid(submodels, features)

    return keras.models.Model(inputs=inputs, outputs=pyramids, name=name)


def retinanet_bbox(
    model                 = None,
    nms                   = True,
    class_specific_filter = True,
    name                  = 'retinanet-bbox',
    anchor_params         = None,
    **kwargs
):
    """ Construct a RetinaNet model on top of a backbone and adds convenience functions to output boxes directly.

    This model uses the minimum retinanet model and appends a few layers to compute boxes within the graph.
    These layers include applying the regression values to the anchors and performing NMS.

    Args
        model                 : RetinaNet model to append bbox layers to. If None, it will create a RetinaNet model using **kwargs.
        nms                   : Whether to use non-maximum suppression for the filtering step.
        class_specific_filter : Whether to use class specific filtering or filter for the best scoring class only.
        name                  : Name of the model.
        anchor_params         : Struct containing anchor parameters. If None, default values are used.
        **kwargs              : Additional kwargs to pass to the minimal retinanet model.

    Returns
        A keras.models.Model which takes an image as input and outputs the detections on the image.

        The order is defined as follows:
        ```
        [
            boxes, scores, labels, other[0], other[1], ...
        ]
        ```
    """

    # if no anchor parameters are passed, use default values
    if anchor_params is None:
        anchor_params = AnchorParameters.default

    # create RetinaNet model
    if model is None:
        model = retinanet(num_anchors=anchor_params.num_anchors(), **kwargs)
    else:
        assert_training_model(model)

    # compute the anchors
    features = [model.get_layer(p_name).output for p_name in ['P3', 'P4', 'P5', 'P6', 'P7']]
    anchors  = __build_anchors(anchor_params, features)

    # we expect the regression and classification values as the first two outputs
    regression     = model.outputs[0]
    classification = model.outputs[1]

    # "other" can be any additional output from custom submodels, by default this will be []
    other = model.outputs[2:]

    # apply predicted regression to anchors
    boxes = layers.RegressBoxes(name='boxes')([anchors, regression])
    boxes = layers.ClipBoxes(name='clipped_boxes')([model.inputs[0], boxes])

    # filter detections (apply NMS / score threshold / select top-k)
    detections = layers.FilterDetections(
        nms                   = nms,
        class_specific_filter = class_specific_filter,
        name                  = 'filtered_detections'
    )([boxes, classification] + other)

    # construct the model
    return keras.models.Model(inputs=model.inputs, outputs=detections, name=name)


SyntaxError: invalid syntax (<ipython-input-9-f08b6b4bd9f7>, line 18)
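The filtering step in `retinanet_bbox` relies on non-maximum suppression (NMS) to remove overlapping detections of the same object. The sketch below shows a plain greedy NMS in NumPy to illustrate the idea; it is not the `FilterDetections` layer itself, and the IoU threshold of 0.5 and the sample boxes are assumptions chosen for the example.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes: (N, 4) as (x1, y1, x2, y2). Returns kept indices, best score first."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    order = np.argsort(-scores)  # indices sorted by descending score
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # intersection of the best box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # drop boxes that overlap the best box too much
        order = rest[iou <= iou_threshold]
    return keep


boxes = [[0, 0, 100, 100], [5, 5, 105, 105], [200, 200, 300, 300]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

In the example, the second box overlaps the first with an IoU of about 0.82 and is suppressed, while the third box does not overlap and is kept.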

### Define Generic Backbone

In [8]:
from __future__ import print_function
import sys


class Backbone(object):
    """ This class stores additional information on backbones.
    """
    def __init__(self, backbone):
        # a dictionary mapping custom layer names to the correct classes
        from .. import layers
        from .. import losses
        from .. import initializers
        self.custom_objects = {
            'UpsampleLike'     : layers.UpsampleLike,
            'PriorProbability' : initializers.PriorProbability,
            'RegressBoxes'     : layers.RegressBoxes,
            'FilterDetections' : layers.FilterDetections,
            'Anchors'          : layers.Anchors,
            'ClipBoxes'        : layers.ClipBoxes,
            '_smooth_l1'       : losses.smooth_l1(),
            '_focal'           : losses.focal(),
        }

        self.backbone = backbone
        self.validate()

    def retinanet(self, *args, **kwargs):
        """ Returns a retinanet model using the correct backbone.
        """
        raise NotImplementedError('retinanet method not implemented.')

    def download_imagenet(self):
        """ Downloads ImageNet weights and returns path to weights file.
        """
        raise NotImplementedError('download_imagenet method not implemented.')

    def validate(self):
        """ Checks whether the backbone string is correct.
        """
        raise NotImplementedError('validate method not implemented.')

    def preprocess_image(self, inputs):
        """ Takes as input an image and prepares it for being passed through the network.
        Having this function in Backbone allows other backbones to define a specific preprocessing step.
        """
        raise NotImplementedError('preprocess_image method not implemented.')


def backbone(backbone_name):
    """ Returns a backbone object for the given backbone.
    """
    if 'resnet' in backbone_name:
        from .resnet import ResNetBackbone as b
    elif 'mobilenet' in backbone_name:
        from .mobilenet import MobileNetBackbone as b
    elif 'vgg' in backbone_name:
        from .vgg import VGGBackbone as b
    elif 'densenet' in backbone_name:
        from .densenet import DenseNetBackbone as b
    else:
        raise NotImplementedError('Backbone class for  \'{}\' not implemented.'.format(backbone_name))

    return b(backbone_name)


def load_model(filepath, backbone_name='resnet50'):
    """ Loads a retinanet model using the correct custom objects.

    Args
        filepath: one of the following:
            - string, path to the saved model, or
            - h5py.File object from which to load the model
        backbone_name         : Backbone with which the model was trained.

    Returns
        A keras.models.Model object.

    Raises
        ImportError: if h5py is not available.
        ValueError: In case of an invalid savefile.
    """
    import keras.models
    return keras.models.load_model(filepath, custom_objects=backbone(backbone_name).custom_objects)


def convert_model(model, nms=True, class_specific_filter=True, anchor_params=None):
    """ Converts a training model to an inference model.

    Args
        model                 : A retinanet training model.
        nms                   : Boolean, whether to add NMS filtering to the converted model.
        class_specific_filter : Whether to use class specific filtering or filter for the best scoring class only.
        anchor_params         : Anchor parameters object. If omitted, default values are used.

    Returns
        A keras.models.Model object.

    """
    from .retinanet import retinanet_bbox
    return retinanet_bbox(model=model, nms=nms, class_specific_filter=class_specific_filter, anchor_params=anchor_params)


def assert_training_model(model):
    """ Assert that the model is a training model.
    """
    assert(all(output in model.output_names for output in ['regression', 'classification'])), \
        "Input is not a training model (no 'regression' and 'classification' outputs were found, outputs are: {}).".format(model.output_names)


def check_training_model(model):
    """ Check that model is a training model and exit otherwise.
    """
    try:
        assert_training_model(model)
    except AssertionError as e:
        print(e, file=sys.stderr)
        sys.exit(1)


### Load RetinaNet Model

In [7]:
# adjust this to point to your downloaded/trained model
# models can be downloaded here: https://github.com/fizyr/keras-retinanet/releases
model_path = os.path.join('..', 'snapshots', 'resnet50_coco_best_v2.1.0.h5')

# load retinanet model
model = models.load_model(model_path, backbone_name='resnet50')

# if the model is not converted to an inference model, use the line below
# see: https://github.com/fizyr/keras-retinanet#converting-a-training-model-to-inference-model
#model = models.convert_model(model)

#print(model.summary())

# load label to names mapping for visualization purposes
labels_to_names = {0: 'person',  1: 'bicycle',  2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 
                   6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 
                   11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 
                   16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 
                   21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 
                   26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 
                   31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 
                   36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 
                   41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 
                   46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 
                   51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 
                   56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 
                   61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 
                   66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 
                   71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 
                   76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}

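Note: the run above failed with `ModuleNotFoundError: No module named 'keras_resnet'`. The ResNet backbone depends on the separate `keras-resnet` package, so install it (for example, `pip install keras-resnet`) before running this cell.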

### Try with an example

In [None]:
def classify_img_retinaNet(img_name):
    # load image
    image = read_image_bgr(img_name)

    # copy to draw on
    draw = image.copy()
    draw = cv2.cvtColor(draw, cv2.COLOR_BGR2RGB)

    # preprocess image for network
    image = preprocess_image(image)
    image, scale = resize_image(image)

    # process image
    start = time.time()
    boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
    print("processing time: ", time.time() - start)

    # correct for image scale
    boxes /= scale

    # visualize detections
    for box, score, label in zip(boxes[0], scores[0], labels[0]):
        # scores are sorted so we can break
        if score < 0.5:
            break

        color = label_color(label)

        b = box.astype(int)
        draw_box(draw, b, color=color)

        caption = "{} {:.3f}".format(labels_to_names[label], score)
        draw_caption(draw, b, caption)

    plt.figure(figsize=(15, 15))
    plt.axis('off')
    plt.imshow(draw)
    plt.show()
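
The `boxes /= scale` step above is needed because the network sees a resized image, so predicted boxes are in resized coordinates. The sketch below shows how such a scale factor can be computed and undone. It assumes the shorter side is scaled to 800 pixels unless that would push the longer side past 1333 pixels (the defaults we believe `resize_image` used at the time). `compute_resize_scale` and `boxes_to_original` here are illustrative re-implementations, not the library's own code.

```python
def compute_resize_scale(image_shape, min_side=800, max_side=1333):
    """Scale factor that brings the shorter side to min_side,
    capped so the longer side does not exceed max_side (assumed defaults)."""
    rows, cols = image_shape[:2]
    scale = min_side / min(rows, cols)
    if max(rows, cols) * scale > max_side:
        scale = max_side / max(rows, cols)
    return scale


def boxes_to_original(boxes, scale):
    """Map (x1, y1, x2, y2) boxes from resized to original image coordinates."""
    return [[coord / scale for coord in box] for box in boxes]


scale = compute_resize_scale((480, 640, 3))
# A box predicted on the resized image, mapped back to the 480x640 original.
print(scale, boxes_to_original([[160.0, 80.0, 320.0, 400.0]], scale))
```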

In [None]:
# car, truck, person, bus, bicycle and traffic sign
classify_img_retinaNet('img_with_car.jpg')
classify_img_retinaNet('img_with_truck.jpg')
classify_img_retinaNet('img_with_person.jpg')
classify_img_retinaNet('img_with_bus.jpg')
classify_img_retinaNet('img_with_bycle.jpg')
classify_img_retinaNet('img_with_traffic_sign.jpg')
classify_img_retinaNet('img_with_multiple.jpg')
classify_img_retinaNet('img_with_all.jpg')
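
Since the project focuses on a handful of classes, the detections returned by the model can be filtered before drawing. The sketch below keeps only detections whose label is in a target set and whose score passes a threshold. `filter_detections` and `TARGET_LABELS` are hypothetical names. The label ids come from `labels_to_names` above. COCO has no "traffic sign" class, so using "traffic light" (9) and "stop sign" (11) as stand-ins is an assumption.

```python
import numpy as np

# person, bicycle, car, bus, truck, plus traffic light and stop sign
# (the last two are assumed stand-ins for "traffic sign").
TARGET_LABELS = {0, 1, 2, 5, 7, 9, 11}


def filter_detections(boxes, scores, labels,
                      target_labels=TARGET_LABELS, min_score=0.5):
    """Keep detections of one image with a target label and score >= min_score."""
    boxes = np.asarray(boxes)
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Boolean mask: entries with low or negative scores are dropped as well.
    keep = (scores >= min_score) & np.isin(labels, list(target_labels))
    return boxes[keep], scores[keep], labels[keep]


# Synthetic detections for one image: car, dog, low-score person, empty slot.
demo_boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 2, 2], [0, 0, 0, 0]], dtype=float)
demo_scores = np.array([0.9, 0.8, 0.3, -1.0])
demo_labels = np.array([2, 16, 0, -1])
print(filter_detections(demo_boxes, demo_scores, demo_labels))
```

Inside `classify_img_retinaNet`, this could be applied as `filter_detections(boxes[0], scores[0], labels[0])` before the drawing loop, assuming the model outputs keep the shapes used above.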

## Next Steps - Projects

- NATO Innovation Challenge. The winning team of the NATO Innovation Challenge used keras-retinanet to detect cars in aerial images (COWC dataset).

- Microsoft Research for Horovod on Azure. A research project by Microsoft, using keras-retinanet to distribute training over multiple GPUs using Horovod on Azure.

- 4k video example. This demo shows the use of keras-retinanet on a 4k input video.

# References: 

We would like to thank our professor, **Dr. James Shanahan**, for his great guidance, continual help and support during the **Deep Learning course.**

We would also like to thank the developers and authors of the Deep Learning (CNN) related material, including the references listed below.

**Books**

- Ref: Book_DL
- Book Title: **Deep Learning**
- Authors: Ian Goodfellow, Yoshua Bengio and Aaron Courville


- Ref: Guide-DL
- Book/Guide: **A Guide to Convolutional Neural Networks for Computer Vision**
- Link: https://www.dropbox.com/s/789qiaq0svh4270/A%20Guide%20to%20Convolutional%20Neural%20Networks%20for%20Computer%20Vision.pdf?dl=0
- Editors: Gérard Medioni, University of Southern California and Sven Dickinson, University of Toronto

**Videos**

- Title: **Coursera CNN course - Object Detection and Localization**
- Link: https://www.coursera.org/lecture/convolutional-neural-networks/object-detection-VgyWR
- Professor: Andrew Ng


**Web Articles**

- Title: **Back-Propagation is very simple. Who made it complicated?**
- Link: https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c
- Author: Prakash Jay
- Date: 20-Apr-2017


- Title: **An intuitive guide to Convolutional Neural Networks**
- Link: https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050
- Author: Daphne Cornelisse
- Date: 24-Apr-2018


- Title: **Understanding of Convolutional Neural Network (CNN) - Deep Learning**
- Link: https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
- Author: Prabhu
- Date: 04-Mar-2018


- Title: **Implementation of Training Convolutional Neural Networks**
- Link: https://arxiv.org/ftp/arxiv/papers/1506/1506.01195.pdf
- Authors: Tianyi Liu, Shuangsang Fang, Yuehui Zhao, Peng Wang, Jun Zhang
- University of Chinese Academy of Sciences, Beijing, China


- Title: **A Beginner's Guide to Neural Networks and Deep Learning**
- Link: https://skymind.ai/wiki/neural-network
- Author: AI Wiki


- Title: **LeNet5 - A Classic CNN Architecture**
- Link: https://engmrk.com/lenet-5-a-classic-cnn-architecture/
- Author: Muhammad Rizwan
- Date: 30-Sep-2018


- Ref: @RetinaNet-Intro
- Title: **RetinaNet Introduction**
- Link: https://arxiv.org/pdf/1708.02002.pdf
- Authors: Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár
- Facebook AI Research (FAIR)

- Title: **COCO (Common Objects in Context) Image Dataset**
- Link: http://cocodataset.org/#home


- Ref: @ObjDetect
- Title: **Object Detection with Deep Learning on Aerial Imagery**
- Link: https://medium.com/data-from-the-trenches/object-detection-with-deep-learning-on-aerial-imagery-2465078db8a9
- Author: Arthur Douillard
- Date: 22-Jun-2018


**GitHub Links:**

- Title: **Convolutional Neural Network**
- Link: https://github.com/mbadry1/DeepLearning.ai-Summary/tree/master/4-%20Convolutional%20Neural%20Networks
- Author: Mahmoud Badry


- Title: **Keras RetinaNet**
- Link: https://github.com/fizyr/keras-retinanet
- Author: Fizyr
