
# Unit 28 - Project: Object Detection using RetinaNet

**Project**: Object Detection, Classification and Labeling using RetinaNet


## Course: Fall 2018, Deep Learning
Professor: **Dr. James Shanahan**

Students: **Gelesh Omathil and Murali Cheruvu**

University: **Indiana University**


**RetinaNet**: Introduction: https://arxiv.org/pdf/1708.02002.pdf

**Dataset**: 
-	Use **COCO Dataset** (http://cocodataset.org/#home) (~100k images) for training, validation and test datasets 
-	About 1MM bounding boxes; some of the images have about 10 classes in them
-	We have 80 classes in this database
-	Our focus is only 7 classes - **car, truck, person, bus, bicycle and traffic sign**

**Cloud**: Cloud Provider Server with Linux/Ubuntu Box with **GPU**s

**Project**: **Train RetinaNet Dataset - Object Detectors**

-	Use this notebook as a base: https://github.com/fizyr/keras-retinanet
-	Use Transfer Learning (load the weights from pre-trained models)
-	Base the model on the pre-trained RetinaNet model but focus on only 7 classes (all other classes are treated as background)
-	Retrain part of the network (about 6 key layers) from the transfer-learned state, using ResNet50/ResNet101 as the backbone and focusing on the 7 classes, so that the model is recalibrated
-	Predict bounding boxes and classes - 8 classes (7 + background class), 8x5 outputs
-	Try out output layers of different resolutions, e.g. 56x56, 28x28, 14x14 (feature pyramid)
-	For each feature pyramid level, we will have an output layer with a loss function
-	Try out with fewer epochs on CPU, then run the full training on GPUs
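To make the feature pyramid resolutions above concrete, here is a minimal sketch of how the output grid sizes follow from the input size. It relies on the RetinaNet paper's setup (pyramid levels P3 to P7, where level l has stride 2^l, and 9 anchors per location). The 448x448 input size is an assumption, chosen only because it reproduces the 56x56, 28x28 and 14x14 grids mentioned in the list; the function names are hypothetical.

```python
import math

def pyramid_shapes(image_size, levels=(3, 4, 5, 6, 7)):
    """Return {level: feature-map side length} for a square input.

    Level l has stride 2**l, so the side length is ceil(image_size / 2**l).
    """
    return {level: math.ceil(image_size / 2 ** level) for level in levels}

def total_anchors(image_size, anchors_per_location=9, levels=(3, 4, 5, 6, 7)):
    """Count the anchors over all pyramid levels for a square input."""
    shapes = pyramid_shapes(image_size, levels)
    return anchors_per_location * sum(side * side for side in shapes.values())

# Assumed input size of 448x448 pixels (not stated in the project notes).
print(pyramid_shapes(448))   # {3: 56, 4: 28, 5: 14, 6: 7, 7: 4}
print(total_anchors(448))    # 37629
```

Each of these anchors is classified and regressed by the output layers, which is why almost all of them end up being background and why class imbalance matters so much for one-stage detectors.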



## Python Libraries

- Python 3.6 
- Keras 2.2.4+
- TensorFlow (CPU and GPU)

## Introduction

Recognizing an object in an image has always been a very challenging task. Detecting multiple objects in the same image is even more difficult. The purpose of Computer Vision is to solve such complex tasks. With the emergence of Machine Learning algorithms driven by Neural Networks, there are better ways to tackle these tasks.

The Convolutional Neural Network (CNN), a Deep Learning architecture, is an advanced neural network concept well suited to these challenges.

We present three techniques here - (1) Region-based CNN (R-CNN), (2) Fast R-CNN and (3) Region Proposal Network (RPN) (Ref: @Guide-DL).

1. **Region-based CNN**

![image.png](img/r-cnn.png)(Ref: @Guide-DL)


2. **Fast R-CNN**

![image.png](img/faster-r-cnn.png) (Ref: @Guide-DL)


3. **Region Proposal Network**

![](img/reg_prop_1.png) (Ref: @Guide-DL)

![image.png](img/reg_prop_2.png) (Ref: @Guide-DL)

## RetinaNet - One-Stage Detector

Most of the popular object detection algorithms are based on R-CNN with two-stage detection, and they give the highest accuracy.
However, two-stage detection algorithms are slower because they process the image in an iterative manner. In recent work to improve speed, one-stage detectors have become popular. **OverFeat** and **YOLO** (You Only Look Once) achieve faster detection, with accuracy within 10%-40% of two-stage detectors.

Focal Loss for Dense Object Detection, work by the Facebook AI Research team, proposes a one-stage detector that borrows approaches from two-stage detectors, such as Feature Pyramid Network (FPN) and Mask R-CNN, to achieve accuracy comparable with two-stage detectors.

Some of the key aspects are listed as follows:

- Class imbalance and inefficiency in first-pass detection are typically addressed using techniques such as bootstrapping and hard example mining
- A new loss function, **Focal Loss**, is proposed: a dynamically scaled cross entropy loss that deals with class imbalance by using an intuitive scaling factor to automatically down-weight the contribution of easy samples while focusing on the hard samples


(Ref: @RetinaNet-Intro)

## Focal Loss

Focal Loss is designed to address the class imbalance between foreground and background classes during training on the image dataset.

Focal Loss builds on the Cross Entropy (CE) loss for binary classification:


\begin{equation*}
\mathrm{CE}(p, y) = \begin{cases}
    -\log(p) & \text{if } y = 1, \\
    -\log(1 - p) & \text{otherwise.}
\end{cases}
\end{equation*}

In the above, y ∈ {+1, -1} denotes the ground-truth class and p ∈ [0, 1] is the model's estimated probability for the class with label y = 1. We define p_t as:

\begin{equation*}
p_{t} = \begin{cases}
    p & \text{if } y = 1, \\
    1 - p & \text{otherwise.}
\end{cases}
\end{equation*}
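With this notation, cross entropy can be written as CE(p_t) = -log(p_t), and the RetinaNet paper defines the focal loss as FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where gamma >= 0 is the focusing parameter and alpha_t is a class-balancing weight (alpha for y = 1, 1 - alpha otherwise). The following is a minimal NumPy sketch of that formula, not the implementation used by keras-retinanet. The function names and the `eps` clipping are assumptions added for illustration; gamma = 2 and alpha = 0.25 are the defaults reported in the paper.

```python
import numpy as np

def binary_cross_entropy(p, y):
    """Per-example cross entropy, CE(p_t) = -log(p_t)."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)
    return -np.log(p_t)

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Per-example focal loss, FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the positive class (y = 1).
    y: ground-truth labels; 1 is positive, anything else is negative.
    eps: clipping value to avoid log(0) (an illustrative safeguard).
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# One easy positive (p_t = 0.9) and one hard positive (p_t = 0.1).
p = np.array([0.9, 0.1])
y = np.array([1, 1])
print("CE:", binary_cross_entropy(p, y))
print("FL:", binary_focal_loss(p, y))
```

The factor (1 - p_t)^gamma shrinks the loss of the easy example by a factor of 100 (0.1 squared) but that of the hard example by only a factor of about 1.2, which is how the many easy background anchors are prevented from dominating training.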


"**RetinaNet is a single, unified network composed of a Feature Pyramid (backbone) network and two task-specific sub-networks**" (Ref: @RetinaNet-Intro)



![RetinaNet](img/retinanet.png "Title") (Ref: @RetinaNet-Intro)

Note: For complete details of the Focal Loss Object Detection - Single-Stage Detector algorithm, please refer to the link:  https://arxiv.org/pdf/1708.02002.pdf

## Dataset

- Prepare the dataset in the CSV format (with a training and cross-validation split)
- Check the correctness of the dataset using `retinanet-debug`
- Train RetinaNet starting from the pre-trained COCO weights (a head start that gives better accuracy and better performance)
- Convert the training model to an inference model
- Evaluate the updated model on the cross-validation and test datasets
- Install pycocotools to test on the MS COCO dataset by running `pip install git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI`
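As a sketch of the first step above, the CSV format described in the keras-retinanet README uses two files without header rows: an annotations file with one `path/to/image.jpg,x1,y1,x2,y2,class_name` row per bounding box, and a class-mapping file with one `class_name,id` row per class (ids starting at 0). The helper names, image paths and box coordinates below are hypothetical and serve only as an illustration.

```python
import csv
import io

def write_annotations(rows, out):
    """Write one 'image_path,x1,y1,x2,y2,class_name' row per bounding box.

    rows: iterable of (image_path, x1, y1, x2, y2, class_name) tuples.
    out:  a writable text file object.
    """
    writer = csv.writer(out, lineterminator="\n")
    for image_path, x1, y1, x2, y2, class_name in rows:
        # Sanity check: boxes must have positive width and height.
        if x2 <= x1 or y2 <= y1:
            raise ValueError("invalid box for {}: {}".format(
                image_path, (x1, y1, x2, y2)))
        writer.writerow([image_path, x1, y1, x2, y2, class_name])

def write_class_mapping(class_names, out):
    """Write one 'class_name,id' row per class, with ids starting at 0."""
    writer = csv.writer(out, lineterminator="\n")
    for class_id, name in enumerate(class_names):
        writer.writerow([name, class_id])

# Made-up example data; in-memory buffers stand in for real files.
annotations = io.StringIO()
write_annotations([
    ("images/0001.jpg", 10, 20, 110, 220, "car"),
    ("images/0001.jpg", 150, 40, 300, 260, "person"),
], annotations)

classes = io.StringIO()
write_class_mapping(["car", "person"], classes)

print(annotations.getvalue())
print(classes.getvalue())
```

In practice the two buffers would be replaced by files opened with `open(path, "w", newline="")`, and the resulting paths passed to the `csv` mode of the training and debug scripts instead of `coco`.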


## Training

- RetinaNet can be trained on the COCO dataset using the Python code listed in the training folder
- The default backbone is ResNet50; it can be changed to a different backbone by passing the backbone name in the --backbone argument
- Various backbone models to try are: ResNet models (ResNet50, ResNet101), MobileNet models (MobileNet128_1.0, MobileNet128_0.75) and VGG models

The trained model needs to be converted into an inference model before proceeding to testing.



## Usage

### Running directly from the repository:
keras_retinanet/bin/train.py coco /path/to/MS/COCO

### Using the installed script:
retinanet-train coco /path/to/MS/COCO

## Testing

###  Load Python Libraries

In [None]:
# show images inline
%matplotlib inline

# automatically reload modules when they have changed
%load_ext autoreload
%autoreload 2

# import keras
import keras

# import keras_retinanet
from keras_retinanet import models
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image
from keras_retinanet.utils.visualization import draw_box, draw_caption
from keras_retinanet.utils.colors import label_color

# import miscellaneous modules
import matplotlib.pyplot as plt
import cv2
import os
import numpy as np
import time

# set tf backend to allow memory to grow, instead of claiming everything
import tensorflow as tf

def get_session():
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    return tf.Session(config=config)

# use this environment flag to change which GPU to use
#os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# set the modified tf session as backend in keras
keras.backend.tensorflow_backend.set_session(get_session())

### Load RetinaNet Model

In [None]:
# adjust this to point to your downloaded/trained model
# models can be downloaded here: https://github.com/fizyr/keras-retinanet/releases
model_path = os.path.join('..', 'snapshots', 'resnet50_coco_best_v2.1.0.h5')

# load retinanet model
model = models.load_model(model_path, backbone_name='resnet50')

# if the model is not converted to an inference model, use the line below
# see: https://github.com/fizyr/keras-retinanet#converting-a-training-model-to-inference-model
#model = models.convert_model(model)

#print(model.summary())

# load label to names mapping for visualization purposes
labels_to_names = {0: 'person',  1: 'bicycle',  2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 
                   6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 
                   11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 
                   16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 
                   21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 
                   26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 
                   31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 
                   36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 
                   41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 
                   46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 
                   51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 
                   56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 
                   61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 
                   66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 
                   71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 
                   76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}

### Try with an example

In [None]:
def classify_img_retinaNet(img_name):
    # load image
    image = read_image_bgr(img_name)

    # copy to draw on
    draw = image.copy()
    draw = cv2.cvtColor(draw, cv2.COLOR_BGR2RGB)

    # preprocess image for network
    image = preprocess_image(image)
    image, scale = resize_image(image)

    # process image
    start = time.time()
    boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
    print("processing time: ", time.time() - start)

    # correct for image scale
    boxes /= scale

    # visualize detections
    for box, score, label in zip(boxes[0], scores[0], labels[0]):
        # scores are sorted so we can break
        if score < 0.5:
            break

        color = label_color(label)

        b = box.astype(int)
        draw_box(draw, b, color=color)

        caption = "{} {:.3f}".format(labels_to_names[label], score)
        draw_caption(draw, b, caption)

    plt.figure(figsize=(15, 15))
    plt.axis('off')
    plt.imshow(draw)
    plt.show()

In [None]:
# car, truck, person, bus, bicycle and traffic sign
classify_img_retinaNet('img_with_car.jpg')
classify_img_retinaNet('img_with_truck.jpg')
classify_img_retinaNet('img_with_person.jpg')
classify_img_retinaNet('img_with_bus.jpg')
classify_img_retinaNet('img_with_bycle.jpg')
classify_img_retinaNet('img_with_traffic_sign.jpg')
classify_img_retinaNet('img_with_multiple.jpg')
classify_img_retinaNet('img_with_all.jpg')
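The pre-trained COCO model returns detections for all 80 classes, while this project focuses on only a few of them. A minimal sketch of filtering detections down to the focus classes is given below. The label ids are taken from the `labels_to_names` mapping above; treating "traffic sign" as COCO's 'traffic light' and 'stop sign' is an assumption, `filter_detections` is a hypothetical helper, and the arrays are made-up values rather than real model output.

```python
import numpy as np

# Label ids copied from the labels_to_names mapping used in this notebook.
# Mapping "traffic sign" to 'traffic light' and 'stop sign' is an assumption.
FOCUS_LABELS = {0: 'person', 1: 'bicycle', 2: 'car', 5: 'bus', 7: 'truck',
                9: 'traffic light', 11: 'stop sign'}

def filter_detections(boxes, scores, labels, keep_labels=FOCUS_LABELS,
                      score_threshold=0.5):
    """Keep detections of the selected classes with a high enough score.

    boxes:  array of shape (N, 4) with x1, y1, x2, y2 per detection.
    scores: array of shape (N,) with confidence scores.
    labels: array of shape (N,) with integer class ids.
    """
    boxes = np.asarray(boxes)
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    keep = (scores >= score_threshold) & np.isin(labels, list(keep_labels))
    return boxes[keep], scores[keep], labels[keep]

# Made-up detections for a single image: a car, a dog, a low-score person,
# and a padding entry with label -1.
boxes = np.array([[10, 20, 110, 220], [30, 40, 90, 100],
                  [5, 5, 50, 50], [0, 0, 1, 1]], dtype=float)
scores = np.array([0.92, 0.81, 0.40, -1.0])
labels = np.array([2, 16, 0, -1])

kept_boxes, kept_scores, kept_labels = filter_detections(boxes, scores, labels)
for box, score, label in zip(kept_boxes, kept_scores, kept_labels):
    print(FOCUS_LABELS[int(label)], round(float(score), 3),
          box.astype(int).tolist())
```

Inside `classify_img_retinaNet`, such a helper could be applied to `boxes[0]`, `scores[0]` and `labels[0]` before the drawing loop, so that only the focus classes are visualized.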

## Next Steps - Projects

- NATO Innovation Challenge. The winning team of the NATO Innovation Challenge used keras-retinanet to detect cars in aerial images (COWC dataset).

- Microsoft Research for Horovod on Azure. A research project by Microsoft, using keras-retinanet to distribute training over multiple GPUs using Horovod on Azure.

- 4k video example. This demo shows the use of keras-retinanet on a 4k input video.

# References: 

We would like to thank our professor, **Dr. James Shanahan**, for his great guidance, continual help and support during the **Deep Learning course.**

We would also like to thank the various developers and authors of Deep Learning (CNN) related material, including the references given in the following links.

**Books**

- Ref: Book_DL
- Book Title: **Deep Learning**
- Authors: Ian Goodfellow, Yoshua Bengio and Aaron Courville


- Ref: Guide-DL
- Book/Guide: **A Guide to Convolutional Neural Networks for Computer Vision**
- Link: https://www.dropbox.com/s/789qiaq0svh4270/A%20Guide%20to%20Convolutional%20Neural%20Networks%20for%20Computer%20Vision.pdf?dl=0
- Editors: Gérard Medioni, University of Southern California and Sven Dickinson, University of Toronto

**Videos**

- Title: **Coursera CNN course - Object Detection and Localization**
- Link: https://www.coursera.org/lecture/convolutional-neural-networks/object-detection-VgyWR
- Professor: Andrew Ng


**Web Articles**

- Title: **Back-Propagation is very simple. Who made it complicated?**
- Link: https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c
- Author: Prakash Jay
- Date: 20-Apr-2017


- Title: **An intuitive guide to Convolutional Neural Networks**
- Link: https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050
- Author: Daphne Cornelisse
- Date: 24-Apr-2018


- Title: **Understanding of Convolutional Neural Network (CNN) - Deep Learning**
- Link: https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
- Author: Prabhu
- Date: 04-Mar-2018


- Title: **Implementation of Training Convolutional Neural Networks**
- Link: https://arxiv.org/ftp/arxiv/papers/1506/1506.01195.pdf
- Authors: Tianyi Liu, Shuangsang Fang, Yuehui Zhao, Peng Wang, Jun Zhang
- University of Chinese Academy of Sciences, Beijing, China


- Title: **A Beginner's Guide to Neural Networks and Deep Learning**
- Link: https://skymind.ai/wiki/neural-network
- Author: AI Wiki


- Title: **LeNet5 - A Classic CNN Architecture**
- Link: https://engmrk.com/lenet-5-a-classic-cnn-architecture/
- Author: Muhammad Rizwan
- Date: 30-Sept-2018

- Ref: @RetinaNet-Intro
- Title: **RetinaNet Introduction**
- Link: https://arxiv.org/pdf/1708.02002.pdf
- Authors: Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollar
- Facebook AI Research (FAIR)

- Title: **COCO (Common Objects in Context) Image Dataset**
- Link: http://cocodataset.org/#home


**GitHub Links:**

- Title: **Convolutional Neural Network**
- Link: https://github.com/mbadry1/DeepLearning.ai-Summary/tree/master/4-%20Convolutional%20Neural%20Networks
- Author: Mahmoud Badry


- Title: **Keras RetinaNet**
- Link: https://github.com/fizyr/keras-retinanet
- Author: Fizyr
