# Exercise 1.7.1 — Scene Understanding
#### By Jonathan L. Moran (jonathan.moran107@gmail.com)
From the Self-Driving Car Engineer Nanodegree programme offered at Udacity.

## Objectives

* Compute the Mean Intersection over Union (IoU) of the multi-class segmentation label predictions.

## 1. Introduction

```python
### Importing required modules
import numpy as np
import os
import tensorflow as tf
from typing import List, Union, Tuple
```

```python
tf.__version__
```

```
'2.9.2'
```

```python
# Returns the name of a GPU device, or an empty string if none is available
tf.test.gpu_device_name()
```

```
''
```

```python
### Setting the environment variables
ENV_COLAB = True                # True if running in Google Colab instance
```

```python
# Root directory
DIR_BASE = '' if not ENV_COLAB else '/content/'
```

```python
# Subdirectory to save output files
DIR_OUT = os.path.join(DIR_BASE, 'out/')
# Subdirectory pointing to input data
DIR_SRC = os.path.join(DIR_BASE, 'data/')
```

```python
### Creating subdirectories (if they do not exist)
os.makedirs(DIR_OUT, exist_ok=True)
```

### 1.1. Scene Understanding 

#### Background

TODO.

#### Metrics — Intersection over Union (IoU)

In the [very first exercise](https://github.com/jonathanloganmoran/ND0013-Self-Driving-Car-Engineer/blob/main/1-Computer-Vision/Exercises/1-1-1-Choosing-Metrics/2022-07-25-Choosing-Metrics-IoU.ipynb) of this course, we covered the Intersection over Union (IoU) metric and its application to the bounding box prediction task. Now, we use the IoU metric again but this time for semantic segmentation and scene understanding.

We start with the same general formula for the IoU score given by:

$$
\begin{align}
\mathrm{IoU} &= \frac{\textrm{Area of Intersection}}{\textrm{Area of Union}},
\end{align}
$$

but now we compute it separately for each class from pixel-wise classification counts. Treating the class of interest as "positive" and every other class as "negative", the intersection of the predicted and true regions is the set of true positives, and their union is the set of true positives, false positives and false negatives:

$$
\begin{align}
\mathrm{IoU} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}.
\end{align}
$$

With this form of the IoU equation, all we need to do is count the true positives ($\mathrm{TP}$), false positives ($\mathrm{FP}$) and false negatives ($\mathrm{FN}$) for each class. True negatives ($\mathrm{TN}$), i.e., pixels that belong to neither the predicted nor the true region, lie outside the union and therefore do not appear in the score. For the image segmentation task, these counts come straight from the pixel-wise classification predictions. Thankfully, the algorithms we designed in [Sect. 2.1](https://github.com/jonathanloganmoran/ND0013-Self-Driving-Car-Engineer/blob/main/1-Computer-Vision/Exercises/1-1-1-Choosing-Metrics/2022-07-25-Choosing-Metrics-IoU.ipynb) of Exercise 1.1.1 still hold; all we need to do is compute the pixel-wise counts for each class using the same tabular approach as before. With these counts, we evaluate the $\mathrm{IoU}$ formula and obtain a score indicating the amount of "overlap" between the predicted region and the true region of each segmented object.

Let's illustrate this with a simple example:

```python
ground_truth_labels = [
    [0, 0, 0, 0], 
    [1, 1, 1, 1],
    [2, 2, 2, 2], 
    [3, 3, 3, 3],
]
predicted_labels = [
    [1, 0, 0, 0],
    [1, 3, 0, 1],
    [2, 2, 2, 3],
    [3, 1, 0, 0],
]
```

Above we define a set of _ground-truth_ and _predicted_ labels for a $4 \times 4$ image. In this toy example, each row of the `ground_truth_labels` ("$\mathrm{A}$") matrix contains a single class: the first row tells us that class `0` should appear at all four pixel locations. Looking at the first row of `predicted_labels` (the "$\mathrm{B}$" matrix), we see instead that only three of those four pixel locations were correctly predicted as class `0`. In other words, for class `0` we have $\mathrm{TP} = 3$.

Next, we count the false positives ($\mathrm{FP}$) for class `0`. To do this, we examine the _other_ pixel locations (i.e., the other rows of the `predicted_labels` matrix) and add up every occurrence of class label `0` where the corresponding entry in `ground_truth_labels` is a different class. Since class label `0` was predicted _incorrectly_ at pixel locations $\mathrm{B}_{2, 3}$, $\mathrm{B}_{4, 3}$ and $\mathrm{B}_{4, 4}$, we have $\mathrm{FP} = 3$.

Lastly, the false negatives ($\mathrm{FN}$) are the pixels whose true class is `0` but which were assigned some other label. Looking at the first row of the `predicted_labels` matrix (i.e., the pixels that should be class `0`), we count the predictions that are _not_ equal to `0`. With _one_ such prediction at the first index, $\mathrm{B}_{1, 1} = $ `1`, we have $\mathrm{FN} = 1$. The remaining $16 - 3 - 3 - 1 = 9$ pixels are true negatives for class `0`: they are neither labelled nor predicted as class `0`, so they fall outside the union and are not used.

With these three counts, we can obtain the $\mathrm{IoU}$ score for class `0` as follows:

$$
\begin{align}
\mathrm{IoU}_{0} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}} = \frac{3}{3 + 3 + 1} = \frac{3}{7} \approx 0.4286.
\end{align}
$$

Now that we have the $\mathrm{IoU}$ score for class `0`, we repeat the process for the other three classes to obtain each class's $\mathrm{IoU}$ score. Once all four per-class scores have been computed, we average them to obtain $\mathrm{IoU}_{\textrm{mean}}$:

$$
\begin{align}
\mathrm{IoU}_{\textrm{mean}} &= \frac{1}{n}\sum_{i=0}^{n-1} \mathrm{IoU}_{i},
\end{align}
$$

which is nothing but the sum of the per-class $\mathrm{IoU}$ scores divided by the total number of classes.
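The per-class counting described above can be sketched in NumPy as follows. This is a minimal illustration, assuming integer class labels in $0, \ldots, n-1$; the helper name `per_class_iou` is hypothetical and not part of any library. A class that appears in neither the ground truth nor the predictions has an empty union, so this sketch marks it `nan` and leaves it out of the mean via `np.nanmean`.

```python
import numpy as np

def per_class_iou(y_true, y_pred, n_classes):
    """Hypothetical helper: per-class IoU = TP / (TP + FP + FN) from pixel labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    ious = np.full(n_classes, np.nan)
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))  # correctly predicted as class c
        fp = np.sum((y_true != c) & (y_pred == c))  # predicted as c, but true class differs
        fn = np.sum((y_true == c) & (y_pred != c))  # true class c, predicted as another class
        union = tp + fp + fn                        # true negatives are not part of the union
        if union > 0:
            ious[c] = tp / union
    return ious

ground_truth_labels = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
]
predicted_labels = [
    [1, 0, 0, 0],
    [1, 3, 0, 1],
    [2, 2, 2, 3],
    [3, 1, 0, 0],
]

ious = per_class_iou(ground_truth_labels, predicted_labels, n_classes=4)
mean_iou = np.nanmean(ious)  # average over classes with a non-empty union
print(ious)      # values: 3/7, 1/3, 3/4, 1/6
print(mean_iou)  # 47/112, roughly 0.4196
```

Skipping empty-union classes is a design choice; as far as we are aware, `tf.keras.metrics.MeanIoU` likewise averages only over classes whose denominator is non-zero.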

### 1.2. FCN-8

#### Network Architecture — Encoder and Decoder 

TODO.

#### Segmentation — Classification and Loss

TODO.

## 2. Programming Task

### 2.1. Intersection over Union (IoU)

Here we use the TensorFlow [`tf.keras.metrics.MeanIoU`](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/MeanIoU) metric class to compute the mean Intersection over Union (IoU) across all classes $i=0,\ldots, n-1$.

In order to use the metric as a standalone function, we first initialise an instance of the respective [`tf.keras.metrics.Metric`](https://www.tensorflow.org/versions/r2.9/api_docs/python/tf/keras/metrics/Metric) subclass (i.e., `MeanIoU`), then perform a "state update" using the [`update_state()`](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/MeanIoU#update_state) class method, and finally read off the score with `result()`. As arguments to `update_state()`, we pass in the `y_true` and `y_pred` tensors that we wish to evaluate. Optionally, we can provide a `sample_weight`, which is either a scalar or a tensor of the same rank as `y_true` that is broadcastable to it. Since the metric is stateful, repeated calls to `update_state()` accumulate results across batches until the state is cleared with `reset_state()`.
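The TensorFlow documentation describes `MeanIoU` as accumulating predictions in a confusion matrix, weighted by `sample_weight`, from which the metric is then calculated. To make that `update_state()`/`result()` split concrete, here is a rough NumPy re-implementation of the idea. It is an illustrative sketch, not TensorFlow's source: the class name `StreamingMeanIoU` is hypothetical, and it assumes integer labels in $0, \ldots, n-1$.

```python
import numpy as np

class StreamingMeanIoU:
    """Hypothetical NumPy sketch of a stateful mean-IoU metric."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.reset_state()

    def reset_state(self):
        # Rows index the true class, columns the predicted class
        self.total_cm = np.zeros((self.num_classes, self.num_classes), dtype=np.float64)

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        if sample_weight is None:
            sample_weight = 1.0
        # Broadcast a scalar (or same-rank) weight to every pixel
        weights = np.broadcast_to(np.asarray(sample_weight, dtype=np.float64), y_true.shape)
        # Add each pixel's weight at position (true class, predicted class)
        np.add.at(self.total_cm, (y_true.ravel(), y_pred.ravel()), weights.ravel())

    def result(self):
        tp = np.diag(self.total_cm)
        # Union per class = row sum (true) + column sum (predicted) - intersection
        denom = self.total_cm.sum(axis=1) + self.total_cm.sum(axis=0) - tp
        valid = denom > 0
        # Average only over classes with a non-empty union; 0.0 if there are none
        return float(np.mean(tp[valid] / denom[valid])) if valid.any() else 0.0

# Feed the 4x4 example from Sect. 1.1 in two "batches" of two rows each
metric = StreamingMeanIoU(num_classes=4)
metric.update_state([[0, 0, 0, 0], [1, 1, 1, 1]], [[1, 0, 0, 0], [1, 3, 0, 1]])
metric.update_state([[2, 2, 2, 2], [3, 3, 3, 3]], [[2, 2, 2, 3], [3, 1, 0, 0]])
print(metric.result())  # 47/112, same as a single update on the full 4x4 example
```

Because the confusion matrix is additive, splitting the image across two calls gives the same score as one call on the whole image; scaling every pixel by the same scalar weight also leaves the ratios unchanged.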

```python
### Defining the number of distinct class labels (i.e., classes)
N_CLASSES = 4
```

```python
### Initialising the `tf.keras.metrics.Metric` instance
iou_mean = tf.keras.metrics.MeanIoU(
    num_classes=N_CLASSES,
    name='Mean IoU for multi-class object segmentation data',
    dtype=tf.dtypes.float32,
    ### Additional arguments for TF2.10+ API:
    #ignore_class=None,
    #sparse_y_true=True,    # `True` if labels are integer class IDs, `False` if one-hot / probability vectors (argmax taken along `axis`)
    #sparse_y_pred=True,
    #axis=-1
)
```

#### Testing the `MeanIoU` metric

TODO.
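One possible sanity check is to feed the toy example from Sect. 1.1 to `MeanIoU` and compare the result with the mean of the per-class scores $3/7$, $1/3$, $3/4$ and $1/6$, i.e., $47/112 \approx 0.4196$. The sketch below is self-contained, so it creates its own metric instance instead of reusing `iou_mean`; it assumes TF 2.9 as used in this notebook, where `reset_state()` clears the accumulated confusion matrix.

```python
import numpy as np
import tensorflow as tf

# Toy 4x4 example from Sect. 1.1 (integer class labels)
ground_truth_labels = np.array([
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
])
predicted_labels = np.array([
    [1, 0, 0, 0],
    [1, 3, 0, 1],
    [2, 2, 2, 3],
    [3, 1, 0, 0],
])

metric = tf.keras.metrics.MeanIoU(num_classes=4)
# A single state update accumulates the confusion matrix for this image
metric.update_state(ground_truth_labels, predicted_labels)
score = float(metric.result().numpy())
print(score)

# Compare against the mean of the hand-computed per-class IoUs
expected = np.mean([3 / 7, 1 / 3, 3 / 4, 1 / 6])
assert abs(score - expected) < 1e-6

# Clear the accumulated state before evaluating another set of labels
metric.reset_state()
```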

### 2.2. Separable Depthwise Convolution

### 2.3. SSD Feature Maps

### 2.4. Filtering Bounding Boxes

### 2.5. Object Detection Inference

### 2.6. Timing Detection

### 2.7. Object Detection Pipeline

## 3. Closing Remarks

##### Alternatives
* TODO.
##### Extensions of task
* TODO.

## 4. Future Work

- ⬜️ TODO.

## Credits

This assignment was prepared by Kelvin Lwin, Andrew Bauman, Dominique Luna et al., 2021 (link [here](https://github.com/udacity/CarND-Object-Detection-Lab)).

References
* [] TODO.

Helpful resources:
* [`CarND-Object-Detection-Lab` by @udacity | GitHub](https://github.com/udacity/CarND-Object-Detection-Lab);