# Assignment 1 Report

This is an outline for your report, intended to reduce the work required to write it. Jupyter Notebook supports Markdown; if you are not familiar with Markdown, I recommend this [cheat sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).

Before delivery, **remember to convert this file to PDF**. You can do it in two ways:
1. Print the webpage (ctrl+P or cmd+P)
2. Export with LaTeX. This is somewhat more difficult, but you'll get a somewhat "prettier" PDF. Go to File -> Download as -> PDF via LaTeX. You might have to install nbconvert and pandoc through conda: `conda install nbconvert pandoc`.

# Task 1

## Task 1a)

### Intersection over Union (IoU)

Intersection over Union (IoU) is a metric used in object detection tasks to evaluate the overlap between two bounding boxes. It is calculated using the following formula:

$$
IoU = \frac{{\text{Intersection Area}}}{{\text{Union Area}}}
$$

where:
- Intersection Area (IA) is the area where the two bounding boxes overlap.
- Union Area (UA) is the total area covered by both bounding boxes.

IoU ranges from 0 to 1, with higher values indicating greater overlap between the bounding boxes.
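
The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming axis-aligned boxes given as `(xmin, ymin, xmax, ymax)`; the function name `iou` and the example boxes are hypothetical and not part of the assignment code.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes give an empty intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Assumption: two degenerate (zero-area) boxes are given an IoU of 0
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The union is computed as the sum of both areas minus the intersection, so the overlapping region is not counted twice.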

![](images/IoU_drawing1.jpg)
![](images/IoU_drawing2.jpeg)


## Task 1b)

**Precision**: Ratio of true positive predictions to total positive predictions made by the model. It measures the accuracy of positive predictions:

$$
\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}
$$

**Recall**: Ratio of true positive predictions to all actual positive instances in the dataset. It indicates the model's ability to identify positive instances:

$$
\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
$$

- **True Positive (TP)**: Represents the instances that are correctly predicted as positive by the model. For example, in a medical diagnosis scenario, a true positive would occur when the model correctly identifies a patient with a disease as having the disease.

- **False Positive (FP)**: Denotes the instances that are incorrectly predicted as positive by the model. In other words, these are instances where the model predicts a positive outcome when it should have predicted a negative outcome. For instance, in the medical diagnosis example, a false positive would happen if the model incorrectly labels a healthy patient as having the disease.
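
As a small worked example of the two formulas, the sketch below computes precision and recall from raw counts. The counts are made up for illustration, and returning 0 when a denominator is empty is an assumption (other conventions exist).

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) computed from raw counts."""
    # Assumption: report 0.0 when there are no predictions / no positives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts: 8 correct detections, 2 spurious detections, 4 missed objects
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(precision, recall)  # precision = 8/10, recall = 8/12
```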

## Task 1c)

1. Calculate Average Precision (AP) for each class using the precision and recall values provided, using the trapezoidal rule:

$$
\text{AP}_i = \int_{0}^{1} \text{precision}_i(r) \, \text{d}r
$$

where $\text{precision}_i(r)$ is the precision for class $i$ at recall level $r$.

2. Compute Mean Average Precision (mAP) by taking the average of AP values for all classes:

$$
\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} \text{AP}_i
$$

where $N$ is the number of classes.



In [6]:
import numpy as np

def calculate_average_precision(precision, recall):
    """Approximate AP as the area under the precision-recall curve."""
    # Trapezoidal rule over the given recall points.
    # NumPy 2.0+ also provides this function as np.trapezoid.
    return np.trapz(precision, recall)

# Precision and recall curve for class 1
precision1 = [1.0, 1.0, 1.0, 0.5, 0.20]
recall1 = [0.05, 0.1, 0.4, 0.7, 1.0]

# Precision and recall curve for class 2
precision2 = [1.0, 0.80, 0.60, 0.5, 0.20]
recall2 = [0.3, 0.4, 0.5, 0.7, 1.0]

# Calculate average precision (AP) for each class
ap_class1 = calculate_average_precision(precision1, recall1)
ap_class2 = calculate_average_precision(precision2, recall2)

# Calculate mean average precision (mAP) across both classes
mAP = np.mean([ap_class1, ap_class2])

print("Average Precision (AP) for Class 1:", ap_class1)
print("Average Precision (AP) for Class 2:", ap_class2)
print("Mean Average Precision (mAP) across both classes:", mAP)

Average Precision (AP) for Class 1: 0.6799999999999999
Average Precision (AP) for Class 2: 0.375
Mean Average Precision (mAP) across both classes: 0.5275


# Task 2
### Understanding the Precision-Recall Curve

This plot shows how well our model identifies true positives (correct predictions) while avoiding false positives (incorrect positive predictions).

- **Early Performance:** At low recall the model does exceptionally well: it finds positive cases while making very few mistakes. This is why precision stays high even as recall, the fraction of actual positives the model finds, increases.

- **Decline in Precision:** As the recall approaches 1 (meaning the model tries to identify all true positives), there's a sharp drop in precision. This indicates that to find all positives, the model starts to mislabel more negative cases as positive, leading to more errors.

- **Analyzing Results:** The model is highly effective up to a recall of about 0.9, maintaining high precision while catching most positives. Past this point, trying to catch every single positive results in a significant increase in false positives. In practice, this suggests setting a threshold just before the sharp decline, so that both false positives and false negatives stay at acceptable levels for the specific application.

![Precision recall curve](precision_recall_curve.png)
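
One way to choose such an operating point is sketched below: among all curve points that meet a minimum precision, pick the one with the highest recall. The function name `pick_operating_point`, the `min_precision` floor, and the curve values are all hypothetical; the numbers are not read from the plot above.

```python
import numpy as np


def pick_operating_point(precisions, recalls, thresholds, min_precision=0.9):
    """Return (threshold, precision, recall) with the highest recall among
    points whose precision is at least min_precision, or None if none qualify."""
    precisions = np.asarray(precisions, dtype=float)
    recalls = np.asarray(recalls, dtype=float)

    ok = precisions >= min_precision
    if not ok.any():
        return None

    # Mask out points below the precision floor, then take the best recall
    best = int(np.argmax(np.where(ok, recalls, -np.inf)))
    return thresholds[best], float(precisions[best]), float(recalls[best])


# Hypothetical curve: precision collapses once recall passes about 0.9
thresholds = [0.9, 0.7, 0.5, 0.3, 0.1]
precisions = [1.00, 0.98, 0.95, 0.80, 0.40]
recalls = [0.30, 0.60, 0.90, 0.95, 1.00]
print(pick_operating_point(precisions, recalls, thresholds))
```

A precision floor is only one possible criterion; which trade-off is acceptable depends on the application.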

# Task 3

### Task 3a)
The filtering operation used to remove overlapping bounding boxes in SSD during inference is called **Non-Maximum Suppression (NMS)**.
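
A minimal sketch of greedy NMS is shown below to illustrate the idea. It assumes boxes with positive area in `(xmin, ymin, xmax, ymax)` format and ignores classes, whereas detectors commonly apply NMS per class; the function name, the default threshold of 0.5, and the example boxes are illustrative assumptions, not the SSD reference implementation.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes: (N, 4) as (xmin, ymin, xmax, ymax); scores: (N,).
    Returns the indices of the kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]

        # IoU between the best box and every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)

        # Drop boxes that overlap the best box too much; keep the others
        order = rest[iou <= iou_threshold]
    return keep


# The second box overlaps the first heavily (IoU = 81/119) and is suppressed
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```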

### Task 3b)
**False.** In the SSD architecture, predictions from the deeper layers are responsible for detecting larger objects. It's the predictions from the earlier layers that are responsible for detecting smaller objects.

### Task 3c)
SSD uses default boxes with different aspect ratios at the same location so that it can match objects of different shapes, such as tall trees or wide cars. For each default box, SSD predicts class scores and four offsets that adjust the box to frame the object, which improves its ability to recognize and locate diverse objects within the scene.
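
To make the effect of the aspect ratio concrete, the sketch below computes box widths and heights for one location using the parameterization $w = s\sqrt{a}$, $h = s/\sqrt{a}$, which keeps the box area fixed at $s^2$ while changing its shape. The scale `0.2` and the ratios are illustrative values, and the function name is hypothetical.

```python
import math


def default_box_shapes(scale, aspect_ratios):
    """Return (width, height) pairs of area scale**2, one per aspect ratio."""
    shapes = []
    for ar in aspect_ratios:
        # width / height == ar while width * height == scale**2
        shapes.append((scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return shapes


# Hypothetical scale (relative to image size) and aspect ratios
for w, h in default_box_shapes(0.2, [1.0, 2.0, 0.5]):
    print(f"w={w:.3f}, h={h:.3f}")
```

Ratios above 1 give wide boxes and ratios below 1 give tall boxes, all centered on the same location.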

### Task 3d)
The main difference between SSD and YOLOv1/v2 is that SSD detects objects across multiple scales using different-sized feature maps, while YOLOv1/v2 typically uses a single-scale feature map. This makes SSD more effective at identifying objects of various sizes, especially smaller ones.

**SSD:**

Pros:
- Better at detecting small objects due to its use of multiple feature maps.
- Generally provides high accuracy.

Cons:
- More complex and can be slightly slower than YOLO due to its detailed multi-scale approach.

**YOLOv1/v2:**

Pros:
- Extremely fast, making it ideal for real-time applications.
- Simpler architecture due to its single-scale, single-shot approach.

Cons:
- Less effective at detecting small objects compared to SSD.
- Lower localization accuracy, leading to less precise object detection.



### Task 3e)
To calculate the total number of anchor boxes for the given SSD framework with a feature map resolution of $38 \times 38$ and 6 different aspect ratios per anchor location, we use:

$$
\text{Total number of anchors} = \text{Number of anchor locations} \times \text{Number of aspect ratios}
$$


$$
\text{Number of anchor locations} = \text{Height} \times \text{Width}
$$

For a $38 \times 38$ feature map:
$$
\text{Number of anchor locations} = 38 \times 38 = 1444
$$

Now, since we have 6 different aspect ratios for each anchor location, the total number of anchor boxes will be:

$$
\text{Total number of anchors} = 1444 \times 6 = 8664
$$



### Task 3f)
The total number of anchor boxes for the entire network can be calculated by summing up the anchor boxes for each feature map: 

Each feature map has **6** aspect ratios.

$38 \times 38 \times 6 = 8664$ anchor boxes

$19 \times 19 \times 6 = 2166$ anchor boxes

$10 \times 10 \times 6 = 600$ anchor boxes

$5 \times 5 \times 6 = 150$ anchor boxes

$3 \times 3 \times 6 = 54$ anchor boxes

$1 \times 1 \times 6 = 6$ anchor boxes

$\text{Sum} = 8664 + 2166 + 600 + 150 + 54 + 6 = 11640$ **anchor boxes**


In [8]:
# Define resolutions of feature maps
resolutions = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
aspect_ratios = 6

# Calculate total number of anchor boxes
total_anchors = 0
for resolution in resolutions:
    height, width = resolution
    total_anchors += height * width * aspect_ratios

total_anchors

11640

# Task 4

## Task 4b)

FILL IN ANSWER. 

## Task 4c)
FILL IN ANSWER. 


## Task 4d)
FILL IN ANSWER. 


## Task 4e)
FILL IN ANSWER. 


## Task 4f)
FILL IN ANSWER. 