# Assignment 1 Report

This is an outline for your report, intended to reduce the work required to create it. Jupyter Notebook supports markdown; if you are not familiar with markdown, I recommend checking out this [cheat sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).

Before delivery, **remember to convert this file to PDF**. You can do it in two ways:
1. Print the webpage (ctrl+P or cmd+P)
2. Export with LaTeX. This is somewhat more difficult, but you'll get a somewhat "prettier" PDF. Go to File -> Download as -> PDF via LaTeX. You might have to install nbconvert and pandoc through conda: `conda install nbconvert pandoc`.

# Task 1

## Task 1a)
Intersection-over-Union (IoU) is an evaluation metric that measures how well a predicted bounding box matches the ground truth bounding box: it is the ratio of the area of their intersection to the area of their union.
![](task1a.png)
In the illustration above, the ground truth bounding box is the green outline and the predicted bounding box is the red outline. Grey is the intersection and Grey+Blue is the union.

$\text{IoU} = \frac{\text{Intersection}}{\text{Union}} = \frac{\text{Grey}}{\text{Grey}+\text{Blue}}$
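
The ratio above can be sketched in a few lines of Python. This is a minimal illustration, not the assignment's implementation: the function name `calculate_iou` and the box format `(xmin, ymin, xmax, ymax)` are assumptions made for the example.

```python
def calculate_iou(prediction_box, gt_box):
    """Return the IoU of two axis-aligned boxes.

    Assumed box format (illustrative): (xmin, ymin, xmax, ymax).
    """
    # Corners of the intersection rectangle
    ix_min = max(prediction_box[0], gt_box[0])
    iy_min = max(prediction_box[1], gt_box[1])
    ix_max = min(prediction_box[2], gt_box[2])
    iy_max = min(prediction_box[3], gt_box[3])

    # Width and height are clamped at zero when the boxes do not overlap
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_pred = (prediction_box[2] - prediction_box[0]) * (prediction_box[3] - prediction_box[1])
    area_gt = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])

    # Union = sum of both areas minus the part counted twice
    union = area_pred + area_gt - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7
print(calculate_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Returning 0 for a zero-area union is a convention chosen here to avoid division by zero.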


## Task 1b)
A true positive (TP) is a prediction of the positive class for an item that actually belongs to the positive class. A false positive (FP) is a prediction of the positive class for an item that does not belong to it. A false negative (FN) is a positive item in the ground truth that the model failed to predict as positive.

Precision is the fraction of positive predictions that are correct.

$\text{Precision} = \frac{TP}{TP+FP}$

Recall is the fraction of all positives in the ground truth that were correctly found.

$\text{Recall} = \frac{TP}{TP+FN}$
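
As a small worked example of the two formulas, the sketch below computes both values from raw counts. The function name `precision_recall` and the return values chosen for empty denominators are assumptions made for illustration.

```python
def precision_recall(num_tp, num_fp, num_fn):
    """Compute (precision, recall) from true/false positive and false negative counts."""
    # Assumed convention: with no predictions at all, precision is defined as 1
    precision = num_tp / (num_tp + num_fp) if (num_tp + num_fp) > 0 else 1.0
    # Assumed convention: with no ground truth positives, recall is defined as 0
    recall = num_tp / (num_tp + num_fn) if (num_tp + num_fn) > 0 else 0.0
    return precision, recall


# 8 correct detections, 2 wrong detections, 8 missed objects
print(precision_recall(8, 2, 8))  # (0.8, 0.5)
```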

## Task 1c)


```python
import numpy as np

precision_1 = [1.0, 1.0, 1.0, 0.5, 0.20]
recall_1 = [0.05, 0.1, 0.4, 0.7, 1.0]
precision_2 = [1.0, 0.80, 0.60, 0.5, 0.20]
recall_2 = [0.3, 0.4, 0.5, 0.7, 1.0]

# By hand (interpolated precision at recall levels 0.0, 0.1, ..., 1.0, averaged over 11 levels):
# Class 1: (1 + 1 + 1 + 1 + 1 + .5 + .5 + .5 + .2 + .2 + .2) / 11 = .6455
# Class 2: (1 + 1 + 1 + 1 + .8 + .6 + .5 + .5 + .2 + .2 + .2) / 11 = .6364
# mAP = (.6455 + .6364) / 2 = .6409

# Implementation used in 2e. Each recall level is rounded to 1 decimal because
# np.linspace returns values with floating-point error (e.g. 0.30000000000000004),
# which would break the `r >= recall` comparison.
avg_precision_1 = 0
avg_precision_2 = 0
for recall in np.linspace(0, 1.0, 11):
    # Interpolated precision: the highest precision at any recall >= the current level
    precisions_to_right_1 = [p for p, r in zip(precision_1, recall_1) if r >= round(recall, 1)]
    precisions_to_right_2 = [p for p, r in zip(precision_2, recall_2) if r >= round(recall, 1)]
    avg_precision_1 += max(precisions_to_right_1)
    avg_precision_2 += max(precisions_to_right_2)

# Average over the 11 recall levels to get the AP of each class
avg1 = avg_precision_1 / 11
avg2 = avg_precision_2 / 11

print("AP Class 1: ", avg1)
print("AP Class 2: ", avg2)
print("mAP: ", (avg1 + avg2) / 2)
```

```
AP Class 1:  0.6454545454545455
AP Class 2:  0.6363636363636364
mAP:  0.6409090909090909
```


# Task 2

### Task 2f)
![](task2/precision_recall_curve.png)

# Task 3

### Task 3a)
The process of filtering out a set of overlapping boxes is called non-maximum suppression.
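
A minimal sketch of greedy non-maximum suppression is shown below. It is an illustration under stated assumptions, not the SSD implementation used in this assignment: the function names, the box format `(xmin, ymin, xmax, ymax)`, and the threshold of 0.5 are all hypothetical. In a detector, this filtering is typically applied separately for each class.

```python
def iou(a, b):
    """IoU of two boxes in the assumed format (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    # Visit boxes from the most to the least confident
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
# The second box overlaps the first with IoU = 81/119, so it is suppressed
print(non_max_suppression(boxes, scores))  # [0, 2]
```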

### Task 3b)
False. The first layers, not the deeper ones, are better at detecting small objects: their feature maps have a higher resolution, which gives better conditions for small-object detection.

### Task 3c)
Different objects have different shapes, so we use bounding boxes with several aspect ratios. Each kind of object has a typical ratio; e.g., humans typically have a ratio of about 0.41, which corresponds to a tall, narrow rectangle. With a variety of default box shapes, the boxes cover objects of various sizes and shapes, and at least one default box is likely to roughly match each object already. This makes the task of predicting bounding boxes easier for the network.
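
To make the effect of the aspect ratio concrete, the sketch below turns a scale and a ratio into a box width and height. It assumes the ratio means width divided by height and keeps the box area fixed at `scale**2`, the parameterization commonly used for SSD default boxes; the function name, the scale of 30 pixels, and the list of ratios are illustrative assumptions.

```python
import math


def default_box_shape(scale, aspect_ratio):
    """Width and height of a box with area scale**2 and width / height = aspect_ratio."""
    width = scale * math.sqrt(aspect_ratio)
    height = scale / math.sqrt(aspect_ratio)
    return width, height


# Hypothetical scale and ratios; 0.41 gives the tall, narrow "human" shape
for ratio in [0.41, 1.0, 2.0]:
    w, h = default_box_shape(30, ratio)
    print(f"ratio {ratio}: {w:.1f} x {h:.1f}")
```

All three boxes have the same area, so changing the ratio only changes the shape, not the size, of the default box.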

### Task 3d)
The main difference is that YOLOv1/v2 predicts from a single-scale feature map, while SSD predicts from multiple feature maps at different scales.

### Task 3e)
With a 38x38 feature map and K = 6 boxes per location, we have WxHxK = 38x38x6 = 8664.

### Task 3f)
For each resolution:

38x38: 38x38x6 = 8664

19x19: 19x19x6 = 2166

10x10: 10x10x6 = 600

5x5: 5x5x6 = 150

3x3: 3x3x6 = 54

1x1: 1x1x6 = 6

Total: 8664 + 2166 + 600 + 150 + 54 + 6 = 11640
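
The arithmetic above can be double-checked with a short script. The variable names are illustrative; the feature map sizes and the 6 boxes per location are the values used in the calculation above.

```python
# Feature map resolutions (height, width) from the calculation above
feature_map_sizes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
boxes_per_location = 6

# One set of boxes per spatial location in each feature map
boxes_per_map = [h * w * boxes_per_location for h, w in feature_map_sizes]
total_boxes = sum(boxes_per_map)

print(boxes_per_map)  # [8664, 2166, 600, 150, 54, 6]
print(total_boxes)    # 11640
```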

# Task 4

## Task 4b)
Final mAP was 80.11% after 10 000 iterations. 

![](SSD/notebooks/basic_plot.png)

## Task 4c/d)
To improve our model we first added batch normalization before all the activation functions, which raised the mAP by about 3 percentage points to 83.03%. Second, we changed the activation functions to LeakyReLU with a = 0.005. We also tried a equal to 0.01, 0.1, and 0.2, but these gave a lower mAP; with a = 0.005 the mAP was 82.26%. We then changed the optimizer to Adam, which gave a small improvement and an mAP of 83.38%. Next, we changed the minimum box sizes, because some of the digits in the images are quite small. We changed each value in MODEL.PRIORS.MIN by 20 pixels and achieved an mAP of 84%.
Our model was quite unstable during training, so we decreased the learning rate to 2e-4, which gave a significant improvement, with an mAP of around 89%. Because a lower constant learning rate also makes the model learn more slowly, another option would be to start with a higher rate and decrease it during training. The mAP was still slightly unstable and usually dropped a little towards the end of training, which suggests overfitting. We therefore increased the weight decay, i.e. applied stronger L2 regularization. This did not change the final mAP significantly, but it made training more stable and avoided a large drop at the end of training.
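
Two of the ideas above can be sketched in plain Python: the LeakyReLU activation with the slope a = 0.005, and a learning rate that is decreased during training. This is only an illustration and not the training code used for the results. The step schedule, its milestones (6000 and 8000 iterations), and the decay factor of 0.1 are hypothetical values that were not tested in this report.

```python
def leaky_relu(x, negative_slope=0.005):
    """LeakyReLU: identity for non-negative inputs, a small slope for negative ones."""
    return x if x >= 0 else negative_slope * x


def step_decay_lr(iteration, base_lr=2e-4, decay_factor=0.1, milestones=(6000, 8000)):
    """Learning rate multiplied by decay_factor at each milestone that has been passed.

    The milestones and decay_factor are hypothetical, chosen only for illustration.
    """
    lr = base_lr
    for milestone in milestones:
        if iteration >= milestone:
            lr *= decay_factor
    return lr


print(leaky_relu(3.0), leaky_relu(-2.0))
for it in (0, 6000, 8000):
    print(it, step_decay_lr(it))
```

A schedule like this keeps the larger learning rate while the loss is still falling quickly and only reduces it later, when smaller steps help stabilize training.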

Final mAP was 89.44% after 10 000 iterations. 

A plot of the total loss is included below, with loss on the y-axis and iteration on the x-axis.

![](SSD/notebooks/basicimproved_plot.png)


## Task 4e)
The model failed to detect several digits. From the images we can see that most of the undetected digits are small, which could be caused by not having small enough bounding boxes.


![](SSD/demo/mnist/result/0.png)
![](SSD/demo/mnist/result/1.png)
![](SSD/demo/mnist/result/2.png)
![](SSD/demo/mnist/result/3.png)
![](SSD/demo/mnist/result/4.png)
![](SSD/demo/mnist/result/5.png)
![](SSD/demo/mnist/result/6.png)
![](SSD/demo/mnist/result/7.png)
![](SSD/demo/mnist/result/8.png)
![](SSD/demo/mnist/result/9.png)


## Task 4f)
Final mAP was 21.72% after 5000 iterations. 

![](SSD/notebooks/vgg_plot.png)
![](SSD/demo/voc/result/000342.png)
![](SSD/demo/voc/result/000542.png)
![](SSD/demo/voc/result/003123.png)
![](SSD/demo/voc/result/004101.png)
![](SSD/demo/voc/result/008591.png)