# Exercise 1.5.1 - Non-Maximum Suppression
#### By Jonathan L. Moran (jonathan.moran107@gmail.com)
From the Self-Driving Car Engineer Nanodegree programme offered by Udacity.

## Objectives

* Implement the Non-Maximum Suppression ([NMS](https://learnopencv.com/non-maximum-suppression-theory-and-implementation-in-pytorch/)) algorithm;
* Use the Intersection over Union ([IoU](https://en.wikipedia.org/wiki/Jaccard_index)) metric with a threshold value of $0.7$; 
* Apply the NMS algorithm to the provided frame from the [Waymo Open Dataset](https://waymo.com/open/);
* (Optional) Use the [Soft-NMS](https://arxiv.org/abs/1704.04503) algorithm to re-score the bounding box predictions.

## 1. Introduction

In [None]:
### Importing the required modules

In [None]:
import json
import numpy as np
import os
import tensorflow as tf
from typing import List

In [None]:
tf.__version__

In [None]:
tf.test.gpu_device_name()

In [None]:
### Setting the environment variables

In [None]:
ENV_COLAB = True                # True if running in Google Colab instance

In [None]:
# Root directory
DIR_BASE = '' if not ENV_COLAB else '/content/'

In [None]:
# Subdirectory to save output files
DIR_OUT = os.path.join(DIR_BASE, 'out/')
# Subdirectory pointing to input data
DIR_SRC = os.path.join(DIR_BASE, 'data/')

In [None]:
### Creating subdirectories (if they do not already exist)
os.makedirs(DIR_OUT, exist_ok=True)

### 1.1. Non-Maximum Suppression

#### Background
* Why is it used?
* How is it implemented?
* What are the drawbacks?

[Non-maximum suppression](https://paperswithcode.com/method/non-maximum-suppression) (NMS) [1] is a popular technique used in object detection pipelines for handling duplicate or redundant bounding box predictions. Since object detection algorithms tend to produce more than one candidate bounding box for a single object, NMS is used to preserve only the _best_ bounding box per object, based on the confidence scores of the candidates and an overlap score between them.

#### History

Traditional object detection algorithms used an _exhaustive search_ method: by sliding a window across the entire image space, any and all objects could in theory be located. While this sliding-window approach could detect objects at unpredictable locations, it was extremely expensive to run, since exploring and eliminating tens of thousands of candidate regions per image was far from efficient. Exhaustive search and other sliding-window algorithms were therefore gradually replaced with more efficient object detection algorithms.

Modern deep learning detection algorithms use a _region-based_ "approximation" algorithm to obtain object locations. Rather than scanning the entire image space iteratively, the image is split into sub-regions, which are analysed more efficiently using convolutional feature maps to determine whether or not an object is present. The [region-based convolutional neural networks](https://en.wikipedia.org/wiki/Region_Based_Convolutional_Neural_Networks) family, known as R-CNNs, revolutionised object detection and improved efficiency enough to move towards the real-time detection needed in today's self-driving car applications.

These new methods brought a new set of challenges: R-CNN architectures often produced many candidate bounding boxes for each detected object. To eliminate the redundant boxes and preserve only one candidate bounding box per object, a "post-processing" step was needed. In 2009, Felzenszwalb et al. [1] described _non-maxima suppression_ (NMS), a procedure that combines the [Intersection over Union](https://en.wikipedia.org/wiki/Jaccard_index) (IoU) and the predicted _confidence score_: boxes are ranked by score, and a box that overlaps a higher-scoring box beyond a chosen threshold is discarded. NMS effectively discards all but one bounding box for each object, eliminating the redundant bounding box problem.

### 1.2. Soft-NMS

Researchers began to notice that non-maximum suppression wasn't perfect: NMS tended to discard otherwise valid objects of interest. These objects in particular were _occluded_ (obstructed or "blocked" by another object). Such objects are bound to have lower-than-average confidence scores, so a single fixed threshold is a poor fit for them.

Simply decreasing the confidence threshold to account for this could lead to a drop in average precision by increasing the number of _false positives_ ("other" bounding boxes belonging to the same object). Conversely, when boxes are suppressed more aggressively (that is, when the overlap threshold is lowered so that less overlap is enough to discard a box), valid bounding boxes can be unintentionally suppressed (discarded), leaving some objects without a corresponding bounding box. This is especially relevant in the driving environment: in high-density traffic conditions, two separate cars might share a high degree of overlap. As a result, only one of the two overlapping bounding boxes would be preserved (the one with the greater confidence score). In other words, both cars would be incorrectly represented by a single bounding box.

Soft-NMS, proposed by Bodla et al. in their cleverly-titled paper "Improving Object Detection With One Line of Code" [2], seeks to address this problem. Rather than suppressing boxes with a high degree of overlap, we can instead decrease ("decay") their classification score. Re-running those candidates through a thresholding function has one of two effects: either the candidates no longer meet the threshold and are dropped, or they remain above the threshold and are kept. While the separate-cars scenario benefits from this, as both boxes would be preserved, in the more common case the neighbouring redundant candidates could be incorrectly kept as well. To handle this, Bodla et al. proposed a decay function that lowers the confidence score linearly in proportion to the amount of overlap. Candidate boxes with a very high degree of overlap therefore experience a larger decay (greater penalty) than neighbouring boxes with less overlap. A Gaussian penalty function was also introduced, which decays the score smoothly for any amount of overlap rather than only beyond a threshold. By applying the update rule in each iteration, the boxes with the highest degree of overlap receive the largest penalty and are the most likely to fall below the score threshold, which reduces the number of false positives.
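
To make the two penalty functions concrete, here is a minimal sketch of the linear and Gaussian decay rules from [2]. The function names and the default values (`iou_threshold=0.3`, `sigma=0.5`) are illustrative assumptions, not part of the exercise code.

```python
import math


def linear_decay(score: float, iou: float, iou_threshold: float = 0.3) -> float:
    """Linear penalty: scale the score by (1 - IoU) once the overlap
    reaches the threshold; below the threshold the score is unchanged."""
    return score * (1.0 - iou) if iou >= iou_threshold else score


def gaussian_decay(score: float, iou: float, sigma: float = 0.5) -> float:
    """Gaussian penalty: scale the score by exp(-IoU^2 / sigma), which
    applies a smooth penalty for any amount of overlap."""
    return score * math.exp(-(iou ** 2) / sigma)


# Compare both penalties for a box with score 0.9 at increasing overlap
for iou in (0.1, 0.5, 0.9):
    print(iou, linear_decay(0.9, iou), gaussian_decay(0.9, iou))
```

In both cases a larger overlap with the selected box gives a lower updated score. The linear rule jumps at the threshold, while the Gaussian rule changes continuously.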

## 2. Programming Task

### 2.1. Non-Maximum Suppression

You are given a JSON file of predictions containing `boxes` and `scores`.

You will leverage the `calculate_iou` function to calculate the Intersection Over Union (IoU) of these different predictions and implement the NMS algorithm.

In [None]:
### From Udacity's `utils.py`

In [None]:
def calculate_iou(gt_bbox: List[int], pred_bbox: List[int]):
    """Calculates the IoU score between two bounding boxes.
    
    :param gt_bbox: the 1x4 ground truth bounding box coordinates.
    :param pred_bbox: the 1x4 predicted bounding box coordinates.
    :returns: iou, the intersection over union (IoU) score between
        the two bounding boxes.
    """
    
    xmin = np.max([gt_bbox[0], pred_bbox[0]])
    ymin = np.max([gt_bbox[1], pred_bbox[1]])
    xmax = np.min([gt_bbox[2], pred_bbox[2]])
    ymax = np.min([gt_bbox[3], pred_bbox[3]])
    
    intersection = max(0, xmax - xmin) * max(0, ymax - ymin)
    gt_area = (gt_bbox[2] - gt_bbox[0]) * (gt_bbox[3] - gt_bbox[1])
    pred_area = (pred_bbox[2] - pred_bbox[0]) * (pred_bbox[3] - pred_bbox[1])
    
    union = gt_area + pred_area - intersection
    return intersection / union
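
As a quick sanity check of the arithmetic, the sketch below restates the same computation without NumPy and works through one pair of boxes by hand. It assumes the `[xmin, ymin, xmax, ymax]` layout implied by the indexing in `calculate_iou`; the helper name `iou_xyxy` is hypothetical.

```python
from typing import List


def iou_xyxy(box_a: List[float], box_b: List[float]) -> float:
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    # Width and height of the overlapping region (zero if the boxes are disjoint)
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    # Guard against degenerate (zero-area) boxes
    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes offset by 5 pixels in each direction:
# intersection = 5 * 5 = 25, union = 100 + 100 - 25 = 175
box_a = [0, 0, 10, 10]
box_b = [5, 5, 15, 15]
print(iou_xyxy(box_a, box_b))  # 25 / 175, roughly 0.143
```

An IoU of about 0.143 is well below the $0.7$ threshold used in this exercise, so these two boxes would be treated as separate detections.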

To do so, you will need to:
* compare each bounding box with all the other bounding boxes in the set
* for each pair of bounding boxes, calculate the IoU and compare the scores
* if the IoU is above the threshold, keep the box with the highest score
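
The steps above can be sketched as follows. This is one possible pairwise reading of the steps, not the reference solution: it assumes `predictions` is a dict with parallel `boxes` and `scores` lists, returns `[box, score]` pairs, and uses a hypothetical `iou_xyxy` helper in place of `calculate_iou` so that the block is self-contained. The output format expected by `check_results` may differ.

```python
from typing import Dict, List


def iou_xyxy(box_a: List[float], box_b: List[float]) -> float:
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def nms_sketch(predictions: Dict[str, list], iou_threshold: float = 0.7) -> List[list]:
    """Keeps a box unless a higher-scoring box overlaps it above the threshold."""
    boxes = predictions['boxes']      # assumed key, see lead-in
    scores = predictions['scores']    # assumed key, see lead-in
    filtered = []
    for i, (box_i, score_i) in enumerate(zip(boxes, scores)):
        keep = True
        for j, (box_j, score_j) in enumerate(zip(boxes, scores)):
            if i == j:
                continue
            # Discard box i if a better-scoring box covers the same region
            if iou_xyxy(box_i, box_j) > iou_threshold and score_j > score_i:
                keep = False
                break
        if keep:
            filtered.append([box_i, score_i])
    return filtered


# Made-up example: the first two boxes overlap heavily (IoU = 90 / 110)
example = {
    'boxes': [[0, 0, 10, 10], [1, 0, 11, 10], [50, 50, 60, 60]],
    'scores': [0.9, 0.8, 0.7],
}
print(nms_sketch(example))
```

Note that this pairwise version keeps both boxes when two overlapping boxes have exactly equal scores. The more common greedy variant instead sorts the boxes by score and only lets boxes that were kept suppress others.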

In [None]:
### From Udacity's `nms.py`

In [None]:
def nms(predictions: dict):
    """Performs non-maximum suppression as in Felzenszwalb et al., 2008.
    
    :param predictions: the dict instance containing the predicted
        bounding box coordinates (`boxes`) and confidence scores (`scores`).
    :returns filtered: the list of thresholded bounding boxes and their
        computed non-maximum suppression scores.
    """
    filtered = []
    # IMPLEMENT THIS FUNCTION
    return filtered

You can run `python nms.py` to check your implementation.

In [None]:
### From Udacity's `nms.py`

In [None]:
with open(os.path.join(DIR_SRC, 'predictions_nms.json'), 'r') as f:
    predictions = json.load(f)

In [None]:
filtered = nms(predictions)

In [None]:
### From Udacity's `utils.py`

In [None]:
def check_results(output):
    truth = np.load(os.path.join(DIR_SRC, 'nms.npy'), allow_pickle=True)
    assert np.array_equal(truth, np.array(output, dtype="object")), 'The NMS implementation is wrong'
    print('The NMS implementation is correct!')

In [None]:
check_results(filtered)

### 2.2. Soft-NMS

In [None]:
def soft_nms(predictions: dict):
    """Soft-NMS algorithm as in Bodla et al., 2017.
    
    :param predictions: the dict instance containing the predicted
        bounding box coordinates (`boxes`) and confidence scores (`scores`).
    :returns filtered: the list of thresholded bounding boxes and their
        computed Soft-NMS scores.
    """
    
    pass
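
One possible way to approach the stub above is sketched below, using the Gaussian penalty from [2]. It is not a reference solution: the `boxes`/`scores` keys, the `[box, score]` output format, the hypothetical `iou_xyxy` helper, and the default values of `sigma` and `score_threshold` are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def iou_xyxy(box_a: List[float], box_b: List[float]) -> float:
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def soft_nms_sketch(predictions: Dict[str, list], sigma: float = 0.5,
                    score_threshold: float = 0.001) -> List[list]:
    """Gaussian Soft-NMS: decays the scores of overlapping boxes."""
    boxes = [list(b) for b in predictions['boxes']]      # work on copies
    scores = [float(s) for s in predictions['scores']]
    filtered = []
    while boxes:
        # Select the remaining box with the highest (possibly decayed) score
        best = max(range(len(scores)), key=scores.__getitem__)
        best_box, best_score = boxes.pop(best), scores.pop(best)
        if best_score < score_threshold:
            break  # every remaining score is at most this low
        filtered.append([best_box, best_score])
        # Decay the remaining scores according to their overlap with the selection
        scores = [s * math.exp(-(iou_xyxy(best_box, b) ** 2) / sigma)
                  for b, s in zip(boxes, scores)]
    return filtered


# Made-up example: the second box overlaps the first heavily (IoU = 90 / 110)
example = {
    'boxes': [[0, 0, 10, 10], [1, 0, 11, 10], [50, 50, 60, 60]],
    'scores': [0.9, 0.8, 0.7],
}
for box, score in soft_nms_sketch(example):
    print(box, round(score, 3))
```

Unlike hard NMS, the heavily overlapping second box is not discarded here. Its score is reduced, so it is ranked below the non-overlapping third box and can still be filtered later by a stricter score threshold.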

## 3. Closing Remarks

##### Alternatives
* Use the [Soft-NMS](https://arxiv.org/abs/1704.04503) algorithm to handle occluded objects

##### Extensions of task
* Apply NMS to an object detection pipeline.

## 4. Future Work

- [ ] Compare NMS and Soft-NMS on images with occluded objects;
- [ ] Add NMS/Soft-NMS to an object detection pipeline.

## Credits

This assignment was prepared by Thomas Hossler et al., Winter 2021 (link [here](https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd0013)).


References

[1] Felzenszwalb, P. F., et al. Object Detection with Discriminatively Trained Part-Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence. 32(9):1627-1645. 2010. [doi:10.1109/TPAMI.2009.167](https://ieeexplore.ieee.org/document/5255236).

[2] Bodla, N. et al. Soft-NMS — Improving Object Detection With One Line of Code. arXiv. 2017. [doi:10.48550/ARXIV.1704.04503](https://arxiv.org/abs/1704.04503).



Further reading:
* Uijlings, J.R.R., et al. Selective Search for Object Recognition. International Journal of Computer Vision, 104:154–171. 2013. [doi:10.1007/s11263-013-0620-5](https://doi.org/10.1007/s11263-013-0620-5).

* Ren, S., et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Advances in Neural Information Processing Systems, vol. 28. 2015. [doi:10.48550/ARXIV.1506.01497](https://arxiv.org/abs/1506.01497).



Helpful resources:
* [Selective Search for Object Recognition by S. Smith | CS231B at Stanford University](http://vision.stanford.edu/teaching/cs231b_spring1415/slides/ssearch_schuyler.pdf)