
# Unit 28 - Project: Object Detection using RetinaNet

**Project**: Object Detection, Classification and Labeling using RetinaNet


## Course: Fall 2018, Deep Learning
Professor: **Dr. James Shanahan**

Students: **Gelesh Omathil and Murali Cheruvu**

University: **Indiana University**


**RetinaNet**: Introduction: https://arxiv.org/pdf/1708.02002.pdf

**Dataset**: 
-	Use **COCO Dataset** (http://cocodataset.org/#home) (~100k images) for training, validation and test datasets 
-	About 1MM bounding boxes; some of the images have about 10 classes in them
-	We have 80 classes in this database
-	Our focus is only 7 classes - **car, truck, person, bus, bicycle and traffic sign**

**Cloud**: Cloud Provider Server with Linux/Ubuntu Box with **GPU**s

**Project**: **Train RetinaNet Dataset - Object Detectors**

-	Use this notebook as a base: https://github.com/fizyr/keras-retinanet
-	Use Transfer Learning (load the weights from pre-trained models)
-	Start from the pre-trained RetinaNet model but focus on only 7 classes (all other classes are treated as background)
-	Retrain part of the network (about 6 key layers) from the transferred state, using ResNet50/ResNet101 as the backbone and focusing on the 7 classes, to recalibrate the model
-	Predict bounding boxes and classes - 8 classes (7 + background class), 8X5 outputs
-	Try output layers of different resolutions - e.g. 56x56, 28x28, 14x14 (feature pyramid)
-	Each feature pyramid level has its own output layer with a loss function
-	Try a small number of epochs on CPU first, then run full training on GPUs



## Python Libraries

- Python 3.6 
- Keras 2.2.4+
- TensorFlow (CPU and GPU)

## Introduction

Recognizing an object in an image has always been a challenging task. Detecting multiple objects in the same image is even more difficult. Computer vision aims to solve such tasks, and neural-network-driven machine learning algorithms have made them more tractable.

Object detection architecture is categorized into two types: two-stage and single-stage. 

Two-stage object detectors organize the image into two parts: foreground and background. Then, all the foreground objects are classified into more fine grained classes: car, truck, person, bus, bicycle, etc. 

Convolutional Neural Networks (CNNs), a class of deep learning models, are well suited to these challenges.

We present three techniques here - (1) Region-based CNN (R-CNN), (2) Fast R-CNN and (3) Region Proposal Network (RPN) (Ref: @Guide-DL).

1. **Region-based CNN**

![image.png](img/r-cnn.png)(Ref: @Guide-DL)


2. **Fast R-CNN**

![image.png](img/faster-r-cnn.png) (Ref: @Guide-DL)


3. **Region Proposal Network**

![](img/reg_prop_1.png) (Ref: @Guide-DL)

![image.png](img/reg_prop_2.png) (Ref: @Guide-DL)

## RetinaNet - One-Stage Detector

Most popular object detection algorithms are based on R-CNN and use two-stage detection, which gives the highest accuracy.
However, two-stage detectors are slower because they process the image in an iterative manner.

Recent work on improving speed has made one-stage detectors popular. **OverFeat** and **YOLO** (You Only Look Once) achieve faster detection, with accuracy within 10%-40% of two-stage detectors.

RetinaNet, introduced in the paper "Focal Loss for Dense Object Detection" by the Facebook AI Research team, is a one-stage detector that borrows ideas associated with two-stage detectors, such as Feature Pyramid Network (FPN) and Mask R-CNN, to achieve accuracy comparable to two-stage detectors. RetinaNet thus combines the speed of single-stage detectors with the accuracy of two-stage detectors.

The following figure compares various object detection algorithms, including RetinaNet:

![image.png](img/retinanet-compare.png) (Ref: @ObjDetect)

Some of the key aspects are as follows:

- In one-stage detection, the class imbalance between foreground and background, and the inefficiency it causes, has typically been addressed with techniques such as bootstrapping and hard example mining
- RetinaNet proposes a new loss function, **Focal Loss**, a dynamically scaled cross-entropy loss that deals with class imbalance by using a scaling factor to automatically down-weight the contribution of easy samples and focus on the hard samples


(Ref: @RetinaNet-Intro)

## RetinaNet Components

"**RetinaNet is a single, unified network composed of a Feature Pyramid (backbone) network and two task-specific sub-networks**" (Ref: @RetinaNet-Intro)

![RetinaNet](img/retinanet.png "RetinaNet Architecture") (Ref: @RetinaNet-Intro)




## ResNet, CNN Network as Backbone

ResNet-50 is a popular convolutional neural network for images. It passes an image through a series of convolutional filters (kernels) to create feature maps, and progressively reduces their spatial size with strided convolutions and pooling, so that deeper layers hold smaller feature maps that capture higher-level features.

## Feature Pyramid Network

RetinaNet adds a Feature Pyramid Network (FPN) in place of the typical classifier. It collects feature maps from several layers of the ResNet and combines them to provide rich features at different scales. It is called a pyramid network because each level of the pyramid detects objects at a different scale.

![RetinaNet](img/feature-pyramid-network.png "Feature Pyramid Network") (Ref: @RetinaNet-Int)
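
As a rough illustration of how the pyramid is assembled, the sketch below merges a coarse pyramid level into the next finer one using NumPy. The helper names (`conv1x1`, `upsample2x`, `top_down_merge`), the random weights and the 224x224 input size are assumptions made for this example. The channel counts (1024 and 2048 for the last two ResNet-50 stages, 256 for the pyramid) follow the ResNet and FPN papers, and the 3x3 smoothing convolution that FPN applies after each merge is omitted for brevity.

```python
import numpy as np


def conv1x1(feature_map, weights):
    """1x1 convolution: a per-pixel linear map over channels.

    feature_map: (H, W, C_in), weights: (C_in, C_out) -> (H, W, C_out)
    """
    return feature_map @ weights


def upsample2x(feature_map):
    """Nearest-neighbour upsampling by a factor of 2 in height and width."""
    return feature_map.repeat(2, axis=0).repeat(2, axis=1)


def top_down_merge(coarser_pyramid_map, backbone_map, lateral_weights):
    """Add the upsampled coarser level to the 1x1-projected backbone map."""
    lateral = conv1x1(backbone_map, lateral_weights)
    return lateral + upsample2x(coarser_pyramid_map)


# Stand-ins for ResNet-50 outputs on a 224x224 image (strides 16 and 32).
rng = np.random.default_rng(0)
c4 = rng.normal(size=(14, 14, 1024))
c5 = rng.normal(size=(7, 7, 2048))

# Random projection weights; in a real network these are learned.
w4 = rng.normal(size=(1024, 256)) * 0.01
w5 = rng.normal(size=(2048, 256)) * 0.01

p5 = conv1x1(c5, w5)              # coarsest pyramid level
p4 = top_down_merge(p5, c4, w4)   # finer level enriched with p5 context
print(p5.shape, p4.shape)         # (7, 7, 256) (14, 14, 256)
```

Every pyramid level ends up with the same number of channels, which is what allows the same detection sub-networks to be applied at each scale.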



## Anchor Boxes

An anchor is a rectangular box of a given size and aspect ratio. At each FPN level, anchors of several sizes and ratios are placed at every feature-map location, so that together they cover each potential object.

Each FPN level feeds two fully convolutional networks (FCNs): the first performs regression, predicting the box boundaries (x1, y1, x2, y2) for each anchor, and the second performs multi-label classification over the N classes.


![RetinaNet](img/anchor-boxes.png "Anchor Boxes") (Ref: @RetinaNet-Int)
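
The following is a minimal NumPy sketch of anchor generation, not the keras-retinanet implementation. The function names and the tiny 4x4 feature map are assumptions for illustration. The default ratios (1:2, 1:1, 2:1) and scales (2^0, 2^(1/3), 2^(2/3)) are the ones described in the RetinaNet paper, giving 9 anchors per location.

```python
import numpy as np


def generate_anchors(base_size, ratios=(0.5, 1.0, 2.0),
                     scales=(2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))):
    """Return (len(ratios) * len(scales), 4) boxes as (x1, y1, x2, y2),
    all centred on the origin."""
    anchors = []
    for ratio in ratios:              # ratio is height / width
        for scale in scales:
            area = (base_size * scale) ** 2
            width = np.sqrt(area / ratio)
            height = width * ratio
            anchors.append([-width / 2, -height / 2, width / 2, height / 2])
    return np.array(anchors)


def shift_anchors(anchors, feature_shape, stride):
    """Place a copy of the anchors at the centre of every feature-map cell."""
    rows, cols = feature_shape
    centre_x = (np.arange(cols) + 0.5) * stride
    centre_y = (np.arange(rows) + 0.5) * stride
    centre_x, centre_y = np.meshgrid(centre_x, centre_y)
    shifts = np.stack([centre_x.ravel(), centre_y.ravel(),
                       centre_x.ravel(), centre_y.ravel()], axis=1)
    # (cells, 1, 4) + (1, anchors, 4) -> (cells * anchors, 4)
    return (shifts[:, None, :] + anchors[None, :, :]).reshape(-1, 4)


base = generate_anchors(base_size=32)
all_anchors = shift_anchors(base, feature_shape=(4, 4), stride=8)
print(base.shape, all_anchors.shape)   # (9, 4) (144, 4)
```

The regression sub-network predicts four offsets for each of these anchors, and the classification sub-network predicts one score per class for each of them.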

## Focal Loss

The real improvement in RetinaNet's accuracy comes from a new loss function called Focal Loss.

Focal Loss is designed to address the class imbalance between foreground and background examples during training. It assigns low weights to well-classified (easy) background examples.

Focal Loss for binary classification builds on the Cross Entropy (CE) loss:


\begin{equation*}
        CE(p, y)=\begin{cases}
                -\log(p) & \text{if } y = 1\,,  \\
                -\log(1 - p) & \text{otherwise}\,.
        \end{cases}
\end{equation*}

In the above, $y \in \{\pm 1\}$ denotes the ground-truth class and $p \in [0,1]$ is the model's estimated probability for the class with label $y = 1$. We define $p_t$ as:

\begin{equation*}
        p_{t}=\begin{cases}
                p & \text{if } y = 1\,,  \\
                1 - p & \text{otherwise}\,.
        \end{cases}
\end{equation*}

so that $CE(p, y) = -\log(p_t)$. Focal Loss is then defined as:

\begin{equation*}
        FL(p_{t})= -(1 - p_{t})^{\gamma} \log(p_{t})
\end{equation*}

In the focal loss defined above, the cross-entropy loss $-\log(p_t)$ is scaled by a factor of $(1 - p_t)^{\gamma}$. Here $\gamma$ is the focusing parameter, with values between 0 and 5, that controls this modulating factor. Well-classified examples, such as easy backgrounds, have a high $p_t$, so their modulating factor and their loss are close to zero. This is the key aspect that compels the model to learn from the hard examples, including the foreground classes.



![RetinaNet](img/focal-loss.png "Focal Loss") (Ref: @ObjDetect)
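
To make the effect of the modulating factor concrete, here is a small NumPy sketch of the binary focal loss. The function name `focal_loss`, the clipping constant `eps` and the default focusing parameter of 2 (written as gamma in the RetinaNet paper) are choices made for this illustration, and the alpha-balancing term that the paper also discusses is left out.

```python
import numpy as np


def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Binary focal loss per example.

    p:     predicted probability of the positive class, in [0, 1]
    y:     ground-truth label, +1 (foreground) or -1 (background)
    gamma: focusing parameter; gamma = 0 reduces to plain cross entropy
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)   # probability of the true class
    return -((1.0 - p_t) ** gamma) * np.log(p_t)


easy_example = focal_loss(0.9, 1)     # confident and correct
hard_example = focal_loss(0.1, 1)     # confident and wrong
plain_ce = focal_loss(0.9, 1, gamma=0.0)
print(easy_example, hard_example, plain_ce)
```

With the focusing parameter set to 2, an example with a true-class probability of 0.9 has its cross-entropy loss multiplied by (1 - 0.9)^2 = 0.01, while a hard example with probability 0.1 keeps most of its loss.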

Note: For complete details of the Focal Loss single-stage object detection algorithm, please refer to: https://arxiv.org/pdf/1708.02002.pdf

## Dataset

- Prepare the dataset in CSV format (with a training and cross-validation split)
- Check the correctness of the dataset using retinanet-debug
- Train RetinaNet using pre-trained COCO weights (for a head start with better accuracy and performance)
- Convert the training model to an inference model
- Evaluate the updated model on the cross-validation and test datasets
- Install pycocotools to test on the MS COCO dataset by running `pip install git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI`
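
The CSV preparation step can be sketched as follows. The row layouts (`path,x1,y1,x2,y2,class_name` for annotations and `class_name,id` for the class mapping) are those described in the keras-retinanet README. The function name `write_retinanet_csv`, the sample data and the `TARGET_CLASSES` subset are hypothetical, and the input boxes are assumed to be in COCO's `(x, y, width, height)` form.

```python
import csv
import io

# Hypothetical subset of the classes this project focuses on.
TARGET_CLASSES = ("car", "truck", "person", "bus", "bicycle")


def write_retinanet_csv(annotations, annotations_file, classes_file,
                        target_classes=TARGET_CLASSES):
    """Write annotation and class-mapping CSV files.

    annotations: iterable of (image_path, (x, y, width, height), class_name)
    Boxes of classes outside target_classes are dropped, so those objects
    are treated as background.
    """
    keep = set(target_classes)
    writer = csv.writer(annotations_file)
    for image_path, (x, y, width, height), class_name in annotations:
        if class_name not in keep:
            continue
        # Convert (x, y, width, height) to corner form (x1, y1, x2, y2).
        x1, y1 = int(x), int(y)
        x2, y2 = int(x + width), int(y + height)
        writer.writerow([image_path, x1, y1, x2, y2, class_name])

    class_writer = csv.writer(classes_file)
    for class_id, class_name in enumerate(target_classes):
        class_writer.writerow([class_name, class_id])


sample = [
    ("images/0001.jpg", (10, 20, 100, 50), "car"),
    ("images/0001.jpg", (5, 5, 30, 30), "dog"),   # not a target class
]
annotations_csv, classes_csv = io.StringIO(), io.StringIO()
write_retinanet_csv(sample, annotations_csv, classes_csv)
print(annotations_csv.getvalue())
print(classes_csv.getvalue())
```

In practice the two `StringIO` buffers would be replaced by files opened with `open(..., "w", newline="")`, and the resulting files can then be checked with retinanet-debug as described above.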


## Training

- RetinaNet can be trained on the COCO dataset using the Python code listed in the training folder
- The default backbone is ResNet50; it can be changed to a different backbone by passing the backbone name in the --backbone argument
- Backbone models to try include ResNet models (ResNet50, ResNet101), MobileNet models (MobileNet128_1.0, MobileNet128_0.75) and VGG models
- The classification sub-network is initialized so that every anchor box starts with a foreground probability of 0.01
- In the RetinaNet paper, training uses a weight decay of 0.0001, a momentum of 0.9 and an initial learning rate of 0.01; the learning rate is divided by 10 at 60k iterations and again at 80k iterations
- The RetinaNet paper reports mAP = 40.8 on the MS COCO dataset using a ResNeXt-101-FPN backbone

The trained model needs to be converted into an inference model before proceeding to testing.
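
Two of the settings above can be written down in a few lines. This is a plain-Python sketch under stated assumptions, not code from keras-retinanet: `prior_bias` shows the bias value that makes a sigmoid output equal to the 0.01 prior, and `learning_rate` follows the step schedule from the RetinaNet paper (initial rate 0.01, divided by 10 at 60k iterations and again at 80k). Both function names are made up for the example.

```python
import math


def prior_bias(prior=0.01):
    """Bias b for the final classification layer such that sigmoid(b) == prior."""
    return -math.log((1.0 - prior) / prior)


def learning_rate(iteration, base_lr=0.01, steps=(60_000, 80_000), factor=0.1):
    """Step schedule: multiply the rate by `factor` at each step boundary."""
    drops = sum(iteration >= step for step in steps)
    return base_lr * factor ** drops


bias = prior_bias(0.01)
print(bias)                                  # about -4.6
print(1.0 / (1.0 + math.exp(-bias)))         # recovers the 0.01 prior
for iteration in (0, 59_999, 60_000, 80_000):
    print(iteration, learning_rate(iteration))
```

Starting every anchor at a low foreground probability keeps the very large number of background anchors from dominating the loss in the first iterations.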



## Usage

### Running directly from the repository:
`keras_retinanet/bin/train.py coco /path/to/MS/COCO`

### Using the installed script:
`retinanet-train coco /path/to/MS/COCO`

## Testing

## Next Steps - Projects

- NATO Innovation Challenge. The winning team of the NATO Innovation Challenge used keras-retinanet to detect cars in aerial images (COWC dataset).

- Microsoft Research for Horovod on Azure. A research project by Microsoft, using keras-retinanet to distribute training over multiple GPUs using Horovod on Azure.

- 4k video example. This demo shows the use of keras-retinanet on a 4k input video.

# References: 

We would like to thank our professor, **Dr. James Shanahan**, for his great guidance, continual help and support during the **Deep Learning course.**

We would also like to thank the developers and authors of the Deep Learning (CNN) related materials, including the references listed below.

**Books**

- Ref: Book_DL
- Book Title: **Deep Learning**
- Authors: Ian Goodfellow, Yoshua Bengio and Aaron Courville


- Ref: Guide-DL
- Book/Guide: **A Guide to Convolutional Neural Networks for Computer Vision**
- Link: https://www.dropbox.com/s/789qiaq0svh4270/A%20Guide%20to%20Convolutional%20Neural%20Networks%20for%20Computer%20Vision.pdf?dl=0
- Editors: Gérard Medioni, University of Southern California and Sven Dickinson, University of Toronto

**Videos**

- Title: **Coursera CNN course - Object Detection and Localization**
- Link: https://www.coursera.org/lecture/convolutional-neural-networks/object-detection-VgyWR
- Professor: Andrew Ng


**Web Articles**

- Title: **Back-Propagation is very simple. Who made it complicated?**
- Link: https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c
- Author: Prakash Jay
- Date: 20-Apr-2017


- Title: **An intuitive guide to Convolutional Neural Networks**
- Link: https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050
- Author: Daphne Cornelisse
- Date: 24-Apr-2018


- Title: **Understanding of Convolutional Neural Network (CNN) - Deep Learning**
- Link: https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
- Author: Prabhu
- Date: 04-Mar-2018


- Title: **Implementation of Training Convolutional Neural Networks**
- Link: https://arxiv.org/ftp/arxiv/papers/1506/1506.01195.pdf
- Authors: Tianyi Liu, Shuangsang Fang, Yuehui Zhao, Peng Wang, Jun Zhang
- University of Chinese Academy of Sciences, Beijing, China


- Title: **A Beginner's Guide to Neural Networks and Deep Learning**
- Link: https://skymind.ai/wiki/neural-network
- Author: AI Wiki


- Title: **LeNet5 - A Classic CNN Architecture**
- Link: https://engmrk.com/lenet-5-a-classic-cnn-architecture/
- Author: Muhammad Rizwan
- Date: 30-Sept-2018


- Ref: @RetinaNet-Intro
- Title: **RetinaNet Introduction**
- Link: https://arxiv.org/pdf/1708.02002.pdf
- Authors: Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollar
- Facebook AI Research (FAIR)

- Title: **COCO (Common Objects in Context) Image Dataset**
- Link: http://cocodataset.org/#home


- Ref: @ObjDetect
- Title: Object Detection with Deep Learning on Aerial Imagery
- Link: https://medium.com/data-from-the-trenches/object-detection-with-deep-learning-on-aerial-imagery-2465078db8a9
- Author: Arthur Douillard
- Date: 22-Jun-2018

- Ref: @RetinaNet-Int
- Title: The intuition behind RetinaNet
- Link: https://medium.com/@14prakash/the-intuition-behind-retinanet-eb636755607d
- Author: Prakash Jay
- Date: 23-Mar-2018

**GitHub Links:**

- Title: **Convolutional Neural Network**
- Link: https://github.com/mbadry1/DeepLearning.ai-Summary/tree/master/4-%20Convolutional%20Neural%20Networks
- Author: Mahmoud Badry


- Title: **Keras RetinaNet**
- Link: https://github.com/fizyr/keras-retinanet
- Author: Fizyr
