# Assignment 4 - Martin Kvisvik Larsen

## Task 1 - Object Detection Metrics

### a - Intersection over union

The intersection over union (IoU) is a ratio that measures how well a predicted bounding box overlaps a ground truth bounding box. It is used to evaluate object detection algorithms that output bounding boxes. The intersection over union is defined as

\begin{equation}
a_0 = \frac{\text{Area}(\text{B}_{\text{p}} \cap \text{B}_{\text{gt}})}{\text{Area}(\text{B}_{\text{p}} \cup \text{B}_{\text{gt}})}
\end{equation}

where $\text{B}_{\text{p}}$ is the bounding box predicted by the algorithm and $\text{B}_{\text{gt}}$ is the ground truth bounding box. The intersection over union can be illustrated by the following figures:

Bounding boxes | Intersection | Union
-|-|-
![bounding_boxes.png](attachment:bounding_boxes.png) | ![intersect.png](attachment:intersect.png) | ![union.png](attachment:union.png)

Here the intersection over union would be:

\begin{equation}
a_0 = \frac{\text{Purple area}}{\text{Orange area}}
\end{equation}
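
The definition can be sketched in Python as follows. This is a minimal illustration, not necessarily the implementation used in Task 2. The function name `calculate_iou` and the box format `[xmin, ymin, xmax, ymax]` are assumptions made for this example.

```python
def calculate_iou(prediction_box, gt_box):
    """Return the IoU of two boxes given as [xmin, ymin, xmax, ymax] (assumed format)."""
    # Corners of the intersection rectangle
    x_left = max(prediction_box[0], gt_box[0])
    y_top = max(prediction_box[1], gt_box[1])
    x_right = min(prediction_box[2], gt_box[2])
    y_bottom = min(prediction_box[3], gt_box[3])

    # Width and height are clamped to zero when the boxes do not overlap
    intersection = max(0.0, x_right - x_left) * max(0.0, y_bottom - y_top)

    area_p = (prediction_box[2] - prediction_box[0]) * (prediction_box[3] - prediction_box[1])
    area_gt = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])

    # Union = sum of both areas minus the part that was counted twice
    union = area_p + area_gt - intersection
    if union == 0:
        return 0.0  # both boxes are degenerate (zero area)
    return intersection / union


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7
print(calculate_iou([0, 0, 2, 2], [1, 1, 3, 3]))
```

Identical boxes give an IoU of 1, and boxes that do not overlap give an IoU of 0.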

### b - True positive, false positive

A true positive is a positive prediction that is correct. In object detection this means that a predicted bounding box matches a ground truth object, which is typically decided by requiring the intersection over union between the two boxes to exceed a threshold. A false positive is a positive prediction that is incorrect. In object detection this means that the algorithm predicts a bounding box where there is no matching ground truth object, for instance because the intersection over union with every ground truth box is below the threshold.
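
For bounding boxes, a common way to make these definitions concrete is to match each prediction to at most one ground truth box using an IoU threshold. The sketch below assumes this convention. The function `count_detections`, the precomputed `iou_matrix` (rows are predictions, columns are ground truth boxes) and the threshold of 0.5 are hypothetical choices for illustration.

```python
def count_detections(iou_matrix, num_ground_truths, iou_threshold=0.5):
    """Greedy one-to-one matching. Returns (true positives, false positives, false negatives)."""
    # Collect all prediction/ground-truth pairs that overlap enough
    candidates = []
    for pred_idx, row in enumerate(iou_matrix):
        for gt_idx, overlap in enumerate(row):
            if overlap >= iou_threshold:
                candidates.append((overlap, pred_idx, gt_idx))

    # Consider the best overlapping pairs first
    candidates.sort(reverse=True)

    matched_preds, matched_gts = set(), set()
    for overlap, pred_idx, gt_idx in candidates:
        # Each prediction and each ground truth box may be matched only once
        if pred_idx not in matched_preds and gt_idx not in matched_gts:
            matched_preds.add(pred_idx)
            matched_gts.add(gt_idx)

    true_positives = len(matched_preds)
    false_positives = len(iou_matrix) - true_positives    # unmatched predictions
    false_negatives = num_ground_truths - true_positives  # missed ground truth boxes
    return true_positives, false_positives, false_negatives


# Three predictions and three ground truth boxes; the third prediction overlaps nothing
iou_matrix = [
    [0.81, 0.00, 0.00],
    [0.00, 0.68, 0.00],
    [0.00, 0.00, 0.00],
]
print(count_detections(iou_matrix, num_ground_truths=3))  # (2, 1, 1)
```

With one-to-one matching, a second prediction on an already matched object counts as a false positive.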

### c - Equation of precision and recall

Equation for precision:

\begin{equation}
    \text{p} = \frac{\text{True positives}}{\text{True positives} + \text{False positives}}
\end{equation}

Equation for recall:

\begin{equation}
    \text{r} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}}
\end{equation}
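
Both equations translate directly into code. The sketch below is illustrative only. The function name and the return values for an empty denominator (precision 1 and recall 0) are assumed conventions, not something given by the equations.

```python
def precision_and_recall(num_tp, num_fp, num_fn):
    """Compute precision and recall from counts of TP, FP and FN."""
    # Assumed convention: no predictions at all gives precision 1.0
    precision = num_tp / (num_tp + num_fp) if (num_tp + num_fp) > 0 else 1.0
    # Assumed convention: no ground truth objects at all gives recall 0.0
    recall = num_tp / (num_tp + num_fn) if (num_tp + num_fn) > 0 else 0.0
    return precision, recall


# 8 correct detections, 2 wrong detections, 4 missed objects
p, r = precision_and_recall(num_tp=8, num_fp=2, num_fn=4)
print(p, r)  # precision = 8/10 = 0.8, recall = 8/12 ≈ 0.667
```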

### d - Calculating mean average precision

| Precision and recall curve for class 1 | | | | | |
|-|-|-|-|-|-|
| Precision | 1.0 | 1.0 | 1.0 | 0.5 | 0.2 |
| Recall | 0.05 | 0.1 | 0.4 | 0.7 | 1.0 |

| Precision and recall curve for class 2 | | | | | |
|-|-|-|-|-|-|
| Precision | 1.0 | 0.8 | 0.6 | 0.5 | 0.2 |
| Recall | 0.3 | 0.4 | 0.5 | 0.7 | 1.0 |

The equation for average precision is

\begin{equation}
    \text{AP} = \frac{1}{11} \sum_{r \in \{ 0, 0.1, ... , 1 \}} p_{interp} (r)
\end{equation}

where the interpolated precision is defined as

\begin{equation}
    p_{interp} (r) = \text{max}_{{\widetilde{r}}, {\widetilde{r}} \geq {r}} \hspace{2mm} p({\widetilde{r}})
\end{equation}

The equation for the mean average precision for K different classes is

\begin{align}
    \text{mAP} = \frac{1}{K} \sum_{i=1}^{K} \text{AP}_{i}
\end{align}

AP calculations for class 1:

\begin{align}
    \sum_{r \in \{ 0, 0.1, ... , 1 \}} p_{interp,1} (r) 
    = &\quad p_{interp,1}(0) + p_{interp,1}(0.1) + p_{interp,1}(0.2) + p_{interp,1}(0.3) \\
    & + p_{interp,1}(0.4) + p_{interp,1}(0.5) + p_{interp,1}(0.6) + p_{interp,1}(0.7) \\
    & + p_{interp,1}(0.8) + p_{interp,1}(0.9) + p_{interp,1}(1) \\
    = &\quad 1.0 + 1.0 + 1.0 + 1.0 \\
    & + 1.0 + 0.5 + 0.5 + 0.5 \\
    & + 0.2 + 0.2 + 0.2 \\
    = &\quad 7.1 \\
    \text{AP}_{1} 
    = &\quad \frac{1}{11} \sum_{r \in \{ 0, 0.1, ... , 1 \}} p_{interp,1} (r) \\
    = &\quad \frac{1}{11} \cdot 7.1 \\
    \approx &\quad 0.645
\end{align}


AP calculations for class 2:
\begin{align}
    \sum_{r \in \{ 0, 0.1, ... , 1 \}} p_{interp,2} (r) 
    = &\quad p_{interp,2}(0) + p_{interp,2}(0.1) + p_{interp,2}(0.2) + p_{interp,2}(0.3) \\
    & + p_{interp,2}(0.4) + p_{interp,2}(0.5) + p_{interp,2}(0.6) + p_{interp,2}(0.7) \\
    & + p_{interp,2}(0.8) + p_{interp,2}(0.9) + p_{interp,2}(1) \\
    = &\quad 1.0 + 1.0 + 1.0 + 1.0 \\
    & + 0.8 + 0.6 + 0.5 + 0.5 \\
    & + 0.2 + 0.2 + 0.2 \\
    = &\quad 7 \\
    \text{AP}_{2} 
    = &\quad \frac{1}{11} \sum_{r \in \{ 0, 0.1, ... , 1 \}} p_{interp,2} (r) \\
    = &\quad \frac{1}{11} \cdot 7 \\
    \approx &\quad 0.636
\end{align}

The mean average precision is then:

\begin{align}
    \text{mAP} 
    = &\quad \frac{1}{2} \big( \text{AP}_{1} + \text{AP}_{2} \big)\\
    \approx &\quad \frac{1}{2} \big( 0.645 + 0.636 \big)\\
    \approx &\quad 0.641
\end{align}
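
The hand calculation can be checked numerically with a short script that applies the 11-point interpolation to the two tables. The function names are hypothetical. Returning 0 when no recall value lies at or above the requested level is an assumed convention that is not needed for these two curves.

```python
def interpolated_precision(precisions, recalls, recall_level):
    """Maximum precision over all points whose recall is at least recall_level."""
    candidates = [p for p, r in zip(precisions, recalls) if r >= recall_level]
    return max(candidates) if candidates else 0.0  # assumed convention for an empty set


def average_precision(precisions, recalls):
    """11-point interpolated average precision."""
    # i / 10 yields the same floating point values as the literals 0.0, 0.1, ..., 1.0
    recall_levels = [i / 10 for i in range(11)]
    return sum(interpolated_precision(precisions, recalls, r) for r in recall_levels) / 11


ap_1 = average_precision([1.0, 1.0, 1.0, 0.5, 0.2], [0.05, 0.1, 0.4, 0.7, 1.0])
ap_2 = average_precision([1.0, 0.8, 0.6, 0.5, 0.2], [0.3, 0.4, 0.5, 0.7, 1.0])
mean_ap = (ap_1 + ap_2) / 2

print(round(ap_1, 3), round(ap_2, 3), round(mean_ap, 3))  # 0.645 0.636 0.641
```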

## Task 2 - Implementing Mean Average Precision

### f - Final precision-recall curve

![precision_recall_curve.png](attachment:precision_recall_curve.png)

## Task 3 - You Only Look Once (YOLO)

### a - YOLO limitations

- YOLO imposes strong spatial constraints on bounding box predictions, since each cell of the image grid predicts only two bounding boxes and a single class. This limits the number of nearby objects the algorithm can detect, so it is not well suited for finding multiple small objects that are close together.
- YOLO struggles to generalize to objects with aspect ratios or configurations that it has not seen before, since it learns to predict bounding boxes from data.
- Due to multiple downsampling layers, YOLO uses relatively coarse features to predict bounding boxes.
- The loss function used in YOLO, which only approximates detection performance, treats errors in large and small bounding boxes equally. A small error in a large box has little effect on the intersection over union, whereas the same error in a small box has a much greater effect, so the loss does not reflect the intersection over union well.
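
The last point can be illustrated with a small numerical sketch: the same localization error changes the intersection over union far more for a small box than for a large one. The example is simplified and assumes square boxes, a purely horizontal shift and raw pixel coordinates. It ignores the coordinate normalization and the square root on width and height that YOLO uses. The function `iou_after_shift` is hypothetical.

```python
def iou_after_shift(side, shift):
    """IoU between a square box of the given side length and a copy shifted horizontally."""
    # The overlap shrinks by the size of the shift, but never below zero
    overlap_width = max(0.0, side - abs(shift))
    intersection = overlap_width * side
    union = 2 * side * side - intersection
    return intersection / union


# The same 5 pixel error applied to a small and a large box
small_box_iou = iou_after_shift(side=10, shift=5)
large_box_iou = iou_after_shift(side=100, shift=5)

print(round(small_box_iou, 3))  # 0.333
print(round(large_box_iou, 3))  # 0.905
```

A squared error on the box position would be identical in both cases, while the IoU drops to about 0.33 for the small box and only to about 0.90 for the large box.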

### b - Sliding window for object detection

False. YOLO does not use a sliding window; it sees the entire image during both training and testing.

### c - Differences between Fast YOLO and YOLO

Fast YOLO uses 9 convolutional layers in its architecture, while YOLO uses 24 convolutional layers. In addition, the convolutional layers of Fast YOLO have fewer filters than the convolutional layers of YOLO.

### d - Comparison of YOLOv1 and Faster R-CNN

| | YOLO | Faster R-CNN |
|-|-|-|
|Strong point| Few background errors | Few localization errors |
|Weak point | Many localization errors | Many background errors |