# Overview

This notebook describes the initial stage of experiments comparing the base model provided with the Udacity starter repository against the two models that I trained. I kept each `pipeline.config` as close as possible to the ones provided in the TensorFlow source code examples for the respective models, making only the changes required to get them running. These changes were updates to the following fields:

1. `model.{architecture}.num_classes`
2. `train_config.fine_tune_checkpoint`
3. `train_config.fine_tune_checkpoint_type`
4. `train_config.batch_size`
5. `train_config.num_steps`
6. `train_input_reader`
7. `eval_input_reader`
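The scalar fields in the list above can be updated in the text-format `pipeline.config` without editing it by hand. Below is a minimal sketch using a hypothetical regex helper, `set_scalar_field`. It assumes each field sits on its own line as `field: value`, and it replaces *every* occurrence of the field, so it cannot handle the nested `input_path` entries inside `train_input_reader` and `eval_input_reader`. The TF OD API's own `object_detection.utils.config_util` module parses the config as a protobuf and is likely the more robust option. The field values used here are illustrative only.

```python
import re


def set_scalar_field(config_text: str, field: str, value) -> str:
    """Set every one-line `field: value` entry in a text-format pipeline.config.

    Hypothetical helper: it only handles scalar fields written on a single
    line and replaces *every* occurrence of `field`, so it cannot tell
    train_input_reader.input_path apart from eval_input_reader.input_path.
    """
    # Strings are quoted in protobuf text format; numbers are written bare.
    rendered = f'"{value}"' if isinstance(value, str) else str(value)
    # Keep the original indentation and "field: " prefix, swap only the value.
    pattern = re.compile(rf"^([ \t]*{re.escape(field)}:[ \t]*).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + rendered, config_text)
    if count == 0:
        raise KeyError(f"field {field!r} not found in config")
    return new_text


# A trimmed, illustrative config; the real files are much longer.
sample = """model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 64
  num_steps: 50000
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
}
"""

patched = sample
for field, value in [
    ("num_classes", 3),
    ("batch_size", 2),
    ("num_steps", 2000),
    ("fine_tune_checkpoint_type", "detection"),
]:
    patched = set_scalar_field(patched, field, value)
print(patched)
```

Passing the value through a replacement function instead of a replacement string avoids any backslashes in checkpoint paths being interpreted as regex escapes.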

The goal was to get a basic version up and running on the AWS infrastructure while gaining a deeper understanding of the TF OD API. The `Model-Improvements` folder of the write-up describes my further experiments with **augmentations** and **hyperparameters** and their results.

In the next cells, I describe the model architectures I chose and the reasoning behind those choices. After that, I compare the `mAP`, `recall` and `losses` of the chosen models and discuss what I take away from them.


# Models

## Base Model: `SSD with EfficientNet-b1 + BiFPN feature extractor`
This model was provided by default in the starter code repository.

### Description
**Architecture**: Single-Shot Detector (1-Stage)

**Backbone**: EfficientNet-b1 + BiFPN

**Expected Speed (Model Zoo)**: 54ms

**Expected mAP (Model Zoo)**: 38.4

**Input Shape**: 640x640

---

## Model 1: `Faster R-CNN ResNet152 V1 640x640`

### Description
**Architecture**: Faster R-CNN (2-Stage)

**Backbone**: ResNet152

**Expected Speed (Model Zoo)**: 64ms

**Expected mAP (Model Zoo)**: 32.4

**Input Shape**: 640x640

### Reason for choosing this Model
* I wanted to experiment with an architecture other than SSD, since an SSD model was already provided. Faster R-CNN is a stable, well-regarded architecture in terms of mAP, so it became my first choice.
* It uses the same 640x640 input size, so no changes to the given tfrecords data source were needed.
* A ResNet152 backbone should extract more complex features accurately, and since this is for an autonomous vehicle, we want detections to be as accurate as possible.

### Related files

* Pipeline configuration: link
* Model training code: link

### Outcomes
* The loss graphs for this model (in the *Validation Loss vs Training Loss* section) show less stable losses than those of the other models. A smaller learning rate might fix this.
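To see what "a smaller learning rate" changes, here is a plain-Python sketch of a cosine-decay schedule with linear warmup. This is an assumption about the schedule in use: as far as I know, many TF2 Model Zoo configs use the TF OD API's `cosine_decay_learning_rate` option (`learning_rate_base`, `total_steps`, `warmup_learning_rate`, `warmup_steps`), but this notebook does not state the values. The function name and all numbers below are illustrative, not taken from this project's config.

```python
import math


def cosine_decay_with_warmup(step, base_lr, total_steps, warmup_lr=0.0, warmup_steps=0):
    """Learning rate at `step`: linear warmup to base_lr, then cosine decay to 0."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must be larger than warmup_steps")
    if step > total_steps:
        return 0.0
    if step < warmup_steps:
        # Linear ramp from warmup_lr (step 0) towards base_lr (step warmup_steps).
        return warmup_lr + (base_lr - warmup_lr) * step / warmup_steps
    # Cosine decay from base_lr at the end of warmup down to 0 at total_steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Illustrative values only: two base rates, one 4x smaller than the other.
for base in (0.04, 0.01):
    schedule = [
        cosine_decay_with_warmup(s, base, 2000, warmup_lr=base / 4, warmup_steps=200)
        for s in (0, 200, 1100, 2000)
    ]
    print(base, [round(lr, 5) for lr in schedule])
```

Lowering the base rate scales the whole post-warmup curve down proportionally, which shrinks each optimizer step and is the usual first lever against oscillating losses.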

---

## Model 2: `SSD MobileNet V2 FPNLite 640x640`

### Description
**Architecture**: Single-Shot Detector (1-Stage)

**Backbone**: MobileNet V2 + FPNLite

**Expected Speed (Model Zoo)**: 39ms

**Expected mAP (Model Zoo)**: 28.2

**Input Shape**: 640x640

### Reason for choosing this Model
* Autonomous vehicles need near real-time object detection. This was the fastest model available at the 640x640 input size that still had an mAP of roughly 30, which is why I chose it.
* I also wanted to explore how a Feature Pyramid Network works, with a backbone other than ResNet. This model satisfied both requirements.
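The selection rule in the first bullet (fastest 640x640 model with an mAP of roughly 30) can be written out over the Model Zoo numbers quoted in this notebook. This sketch covers only the three models described here, not the full Model Zoo table, and `ZooEntry` and `fastest_above` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ZooEntry:
    name: str
    speed_ms: float   # Model Zoo "Speed (ms)" value
    coco_map: float   # Model Zoo "COCO mAP" value
    input_size: int   # square input resolution in pixels

    @property
    def fps(self) -> float:
        # Frames per second implied by the per-image latency.
        return 1000.0 / self.speed_ms


# The three models described in this notebook, with their Model Zoo numbers.
CANDIDATES = [
    ZooEntry("SSD EfficientNet-b1 + BiFPN", 54, 38.4, 640),
    ZooEntry("Faster R-CNN ResNet152 V1 640x640", 64, 32.4, 640),
    ZooEntry("SSD MobileNet V2 FPNLite 640x640", 39, 28.2, 640),
]


def fastest_above(entries, min_map: float, input_size: int = 640) -> Optional[ZooEntry]:
    """Return the lowest-latency entry that meets the mAP and input-size constraints."""
    eligible = [e for e in entries if e.input_size == input_size and e.coco_map >= min_map]
    if not eligible:
        return None
    return min(eligible, key=lambda e: e.speed_ms)


choice = fastest_above(CANDIDATES, min_map=28.0)
print(choice.name, f"{choice.fps:.1f} FPS")
```

With a strict 30.0 cutoff, the SSD MobileNet model (28.2) drops out and the EfficientNet-b1 model is selected instead, so the "~30" above only works as an approximate threshold.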

### Related files
* Pipeline configuration: link
* Model training code: link


# Mean Average Precision (mAP) Comparison

To make the mAP and other comparisons easier to read, I manually restructured and renamed the TensorBoard log folders downloaded from the S3 bucket.
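The manual renaming step could also be scripted. TensorBoard labels each run by its folder path relative to `--logdir`, so copying each downloaded run into a folder with a readable name gives readable legend entries. This is a sketch only: the `organise_runs` helper and all folder names are hypothetical, and the demo builds a throwaway directory tree instead of using the real S3 download.

```python
import shutil
import tempfile
from pathlib import Path


def organise_runs(downloaded: Path, target: Path, name_map: dict) -> list:
    """Copy each downloaded run folder to `target/<label>`.

    name_map maps the folder name as downloaded to the label that should
    appear in TensorBoard. The original download is left untouched.
    """
    created = []
    for original, label in name_map.items():
        dst = target / label
        # copytree creates dst (and any missing parents) and copies everything.
        shutil.copytree(downloaded / original, dst)
        created.append(dst)
    return created


# Demo on a throwaway directory tree (folder names are hypothetical).
root = Path(tempfile.mkdtemp())
for run in ("experiment_a", "experiment_b"):
    (root / "s3_download" / run / "train").mkdir(parents=True)
    (root / "s3_download" / run / "train" / "events.out.tfevents.demo").touch()

created = organise_runs(
    root / "s3_download",
    root / "logs",
    {"experiment_a": "model_1_faster_rcnn", "experiment_b": "model_2_ssd_mobilenet"},
)
print(sorted(p.name for p in created))
```

Running `tensorboard --logdir` on the resulting `logs` folder would then show one run per model under the new names.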

As shown in the legend in the bottom-left box of the graph, the colored dots correspond to these models:

Default: <span style="color: #0c62ac">Given EfficientDet</span>

Model 1: <span style="color: #2dade9">Faster R-CNN</span>

Model 2: <span style="color: #118a76">SSD MobileNet</span>



![mAP Comparison](initial_precision_comparision.png "mAP")

## Observations

1. Both of the models I chose clearly show much better precision than the default model across all of the COCO metrics. The reason is still unclear to me, as the default model is supposed to have a higher mAP according to the Model Zoo table. **(With more time and resources, I would retrain the base model to figure out why it behaves this way.)**

2. All models were trained for 2000 steps, and as the graph shows, my two trained models perform very similarly on every metric except **mAP (large)**.

3. **Best model for this problem**: If the number of training steps and the available resources are fixed, the **SSD MobileNet V2 FPNLite** seems like the best model for this problem. First, it achieves mAP similar to the larger and heavier Faster R-CNN model. Second, it runs inference much faster, which is a key requirement for applications like self-driving vehicles.

# Recall Comparison

![Recall Comparison](initial_recall_comparision.png "Recall")

## Observations

1. The Average Recall @ (1, 10, 100) shows a trend similar to the mAP, but the **SSD MobileNet** model has a higher recall than the *Faster R-CNN* model for AR@100 and AR@100 (small).

2. This suggests that Faster R-CNN struggles in cases where:
   1. Many objects are packed densely into a small region
   2. Objects are small, with `area < 32x32` pixels
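The `area < 32x32` threshold comes from the COCO evaluation protocol, which also defines the *small*, *medium* and *large* variants of mAP and AR in these graphs: small objects have an area below 32² pixels, medium objects fall between 32² and 96², and large objects are above 96². The sketch below assumes the area is the box's width × height. pycocotools treats each range as inclusive at both ends, so a box of exactly 32² counts as both small and medium there, whereas this sketch puts every box in exactly one bucket. The boxes are made-up examples.

```python
from collections import Counter

SMALL_MAX_AREA = 32 ** 2   # 1024 px^2
MEDIUM_MAX_AREA = 96 ** 2  # 9216 px^2


def coco_size_bucket(width: float, height: float) -> str:
    """Classify a box as 'small', 'medium' or 'large' by its pixel area."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# Hypothetical (width, height) boxes in pixels on a 640x640 image.
boxes = [(12, 20), (30, 30), (40, 60), (150, 120)]
counts = Counter(coco_size_bucket(w, h) for w, h in boxes)
print(counts)
```

A 30x30 pedestrian far from the camera is already "small" under this rule, which is why AR@100 (small) is sensitive to how well a model handles distant objects.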

# Validation Loss vs Training Loss

## Model 1 [Faster R-CNN]
![Faster R-CNN Losses 1](loss_1_faster_rcnn.png "Faster R-CNN Losses 1")
![Faster R-CNN Losses 2](loss_2_faster_rcnn.png "Faster R-CNN Losses 2")

### Observations

Since this is a Faster R-CNN model, the losses are split into Region Proposal Network (RPN) losses and Box Classifier losses. Throughout training, I observed that the losses were unstable, rising and falling across steps, which is not ideal. I explored this further in the `Model-Improvements` section of the write-up by experimenting with the hyperparameters.

However, the total training loss went from **1.635** to **0.7083**, a positive sign that the model kept improving over the 2000 training steps.

Final Loss Values:

1. **Loss/BoxClassifierLoss/classification_loss**: Training: 0.1798 | Eval: 0.16
2. **Loss/BoxClassifierLoss/localization_loss**: Training: 0.2017 | Eval: 0.2185
3. **Loss/RPNLoss/localization_loss**: Training: 0.2971 | Eval: 0.5155
4. **Loss/RPNLoss/objectness_loss**: Training: 0.029 | Eval: 0.044
5. **Loss/classification_loss**: Not Applicable (Since this is a 2-stage detector)
6. **Loss/localization_loss**: Not Applicable (Since this is a 2-stage detector)
7. **Loss/regularization_loss**: Not Applicable (Since the regularizer is set to 0 in the config file)
8. **Loss/total_loss**: Training: 0.7083 | Eval: 0.9386
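Because the regularization loss is 0 for this config, the four listed components should add up to `Loss/total_loss`, assuming the total is simply their sum. The sketch below checks this using the values above and computes which component differs most between eval and training. The small mismatch with the reported totals (on the order of 0.0007) is presumably due to rounding of the displayed values.

```python
# Final Faster R-CNN loss values from the list above: (training, eval).
components = {
    "BoxClassifierLoss/classification_loss": (0.1798, 0.16),
    "BoxClassifierLoss/localization_loss": (0.2017, 0.2185),
    "RPNLoss/localization_loss": (0.2971, 0.5155),
    "RPNLoss/objectness_loss": (0.029, 0.044),
}
reported_total = {"train": 0.7083, "eval": 0.9386}

train_sum = sum(train for train, _ in components.values())
eval_sum = sum(ev for _, ev in components.values())
# Positive gap = the component is higher on the eval split than on training.
gaps = {name: ev - train for name, (train, ev) in components.items()}
largest = max(gaps, key=gaps.get)

print(f"train: sum of components {train_sum:.4f}, reported {reported_total['train']}")
print(f"eval:  sum of components {eval_sum:.4f}, reported {reported_total['eval']}")
print(f"largest eval-train gap: {largest} ({gaps[largest]:+.4f})")
```

The same few lines work for the SSD MobileNet values further down by swapping in its components.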

### Assignment Questions

**How does the validation loss compare to the training loss?**

Since the evaluation loss is calculated only once, after the 2000 steps, we don't know the trend it followed. However, its value of **0.9386** compared to the **0.7083** training loss hints that the model might be overfitting a little. Most of this gap comes from the RPN localization loss (Eval: 0.5155 vs Training: 0.2971).

**Did you expect such behavior from the losses/metrics?**

The gap between the evaluation and training losses was expected, but the unstable losses were not.

**What can you do to improve the performance of the tested models further?**

Image augmentation and hyperparameter tuning are the next steps to improve performance further. Both are explored, along with their results, in the `Model-Improvements` section of the write-up.

---

## Model 2 [SSD MobileNet]
![SSD MobileNet Losses 1](loss_1_ssd_mobilenet.png "SSD MobileNet Losses 1")
![SSD MobileNet Losses 2](loss_2_ssd_mobilenet.png "SSD MobileNet Losses 2")

### Observations
The loss graphs for the SSD MobileNet model were much more stable than those of the Faster R-CNN model, but, like Faster R-CNN, this model also shows signs of overfitting.

The total training loss went from **0.8646** to **0.5298**, a positive sign that the model kept improving over the 2000 training steps.

Final Loss Values:

1. **Loss/BoxClassifierLoss/classification_loss**: Not Applicable (Since this is a 1-stage detector)
2. **Loss/BoxClassifierLoss/localization_loss**: Not Applicable (Since this is a 1-stage detector)
3. **Loss/RPNLoss/localization_loss**: Not Applicable (Since this is a 1-stage detector)
4. **Loss/RPNLoss/objectness_loss**: Not Applicable (Since this is a 1-stage detector)
5. **Loss/classification_loss**: Training: 0.1571 | Eval: 0.2351
6. **Loss/localization_loss**: Training: 0.2236 | Eval: 0.3907
7. **Loss/regularization_loss**: Training: 0.1491 | Eval: 0.1491
8. **Loss/total_loss**: Training: 0.5298 | Eval: 0.7749
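`Loss/regularization_loss` is identical for training and eval (0.1491). This is expected, because a weight-decay penalty is computed from the model's weights alone, not from the images, so both splits evaluated with the same weights report the same value. The toy sketch below uses a generic L2 penalty λ·Σw². The exact scaling the TF OD API applies may differ, and the weights and per-split losses are made-up numbers.

```python
def l2_penalty(weights, weight_decay):
    # Depends only on the model parameters, never on the input images.
    return weight_decay * sum(w * w for layer in weights for w in layer)


def total_loss(data_losses, weights, weight_decay):
    # data_losses: e.g. classification + localization losses on one split.
    return sum(data_losses) + l2_penalty(weights, weight_decay)


checkpoint_weights = [[0.5, -1.0], [2.0]]  # hypothetical toy weights
reg = l2_penalty(checkpoint_weights, weight_decay=0.01)
train_total = total_loss([0.1, 0.2], checkpoint_weights, 0.01)  # made-up train losses
eval_total = total_loss([0.3, 0.4], checkpoint_weights, 0.01)   # made-up eval losses
print(reg, train_total, eval_total)
```

Only the data-dependent terms (classification and localization) can differ between the two splits, which is why the train/eval comparison above focuses on them.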

### Assignment Questions

**How does the validation loss compare to the training loss?**

Since the evaluation loss is calculated only once, after the 2000 steps, we don't know the trend it followed. However, its value of **0.7749** compared to the **0.5298** training loss hints that the model might be overfitting a little. Most of this gap comes from the localization loss (Eval: 0.3907 vs Training: 0.2236).

**Did you expect such behavior from the losses/metrics?**

The gap between the evaluation and training losses was expected.

**What can you do to improve the performance of the tested models further?**

Image augmentation and hyperparameter tuning are the next steps to improve performance further. Both are explored, along with their results, in the `Model-Improvements` section of the write-up.