# Autonomous Driving - Car Detection System

We will build a car detection system using a pre-trained YOLO model.

<a name='0'></a>
## Packages

In [None]:
import argparse
import os
import scipy.io
import scipy.misc

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow

import PIL
from PIL import ImageFont, ImageDraw, Image

import tensorflow as tf
from tensorflow.keras.models import load_model

from yad2k.models.keras_yolo import yolo_head
from yad2k.utils.utils import draw_boxes, get_colors_for_classes, scale_boxes, read_classes, read_anchors, preprocess_image

%matplotlib inline

<a name='1'></a>
## 1 - Model Details

<a name='1-1'></a>
### 1.1 - Dataset
- The model was trained using road pictures collected by a camera mounted to the hood of a car.
- All the images were labelled with bounding boxes around every object found. Here's an example:

<center> <img src="images/box_label.png" width="45%" height="45%"> </center>
<caption> <center> <b>Figure 1</b>: Definition of a box </center></caption>
<br>  
 
<a name='1-2'></a>
### 1.2 - Inputs and Outputs
- The model was trained on images of shape (608, 608, 3).

- The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 elements $(p_c, b_x, b_y, b_h, b_w, c)$, as shown in **Figure 1**. Since the model can recognize 80 different classes, $c$ is an 80-dimensional vector, so each bounding box is represented by 85 numbers in total.
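
To make the 85-number layout concrete, the sketch below splits one box vector into its parts. The ordering follows the tuple above; the values are made up for illustration, and `box_vector` is a hypothetical name rather than something the model exposes.

```python
# Hypothetical 85-number encoding of one box: [p_c, b_x, b_y, b_h, b_w, c_1 ... c_80]
NUM_CLASSES = 80

class_probs = [0.0] * NUM_CLASSES
class_probs[2] = 0.9   # made-up values: most of the probability on class index 2
class_probs[7] = 0.1

box_vector = [0.8, 0.5, 0.5, 0.3, 0.2] + class_probs

p_c = box_vector[0]                    # probability that the box contains an object
b_x, b_y, b_h, b_w = box_vector[1:5]   # box center and size
c = box_vector[5:]                     # 80 class probabilities

print(len(box_vector))  # 85
print(len(c))           # 80
```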

<a name='1-3'></a>
### 1.3 - Anchor boxes
- Anchor boxes were chosen by exploring the training data to select reasonable height/width ratios that represent the different classes. Five anchor boxes were chosen and stored in the file './model_data/yolo_anchors.txt'.
    
- If the center/midpoint of an object falls into a grid cell, that grid cell is responsible for detecting that object.
    
<center> <img src="images/architecture.png" width="60%" height="60%"> </center>
<caption> <center> <b> Figure 2 </b>: Encoding architecture for YOLO </center> </caption>

- For each box (of each cell), the probability that it contains a certain class ($score_{c,i}$) is the probability that there is an object ($p_{c}$) times the probability that the object belongs to that class ($c_{i}$), that is, $score_{c,i} = p_{c} \times c_{i}$.

<center> <img src="images/probability_extraction.png" width="50%" height="50%"> </center>
<caption> <center> <b> Figure 3 </b>: Finding the class detected by each box </center></caption>
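
As a small worked example of the score computation, the sketch below multiplies one box's objectness by its class probabilities and picks the best class. It uses only 3 made-up classes for brevity (the model has 80), and all values are illustrative.

```python
p_c = 0.5                         # probability that the box contains an object
class_probs = [0.25, 0.5, 0.25]   # c_i for 3 hypothetical classes

# score_{c,i} = p_c * c_i for every class i
scores = [p_c * c_i for c_i in class_probs]

# The class detected by the box is the one with the highest score
best_class = max(range(len(scores)), key=scores.__getitem__)
best_score = scores[best_class]

print(scores)      # [0.125, 0.25, 0.125]
print(best_class)  # 1
print(best_score)  # 0.25
```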

<a name='2'></a>
## 2 - Building the System

<a name='2-1'></a>
### 2.1 - Non-max Suppression
Although each object should ideally be detected by a single anchor box, more than one box may end up detecting the same object in the image. To reduce the model's output to a much smaller number of detected objects, we'll use non-max suppression. The steps are:

- Get rid of boxes with a low score by applying a threshold.
- Select only one box from overlapping boxes detecting the same object. The overlap is measured with a metric called Intersection over Union, or IoU.

<center> <img src="images/iou.png" width="50%" height="50%"> </center>
<caption> <center> <b> Figure 4 </b>: Definition of "Intersection over Union".</center> </caption>

The function below filters the bounding boxes based on their calculated scores.

In [None]:
def yolo_filter_boxes(boxes, box_confidence, box_class_probs, threshold=0.6):
    """
    Filters YOLO boxes by thresholding on object and class confidence.
    
    Arguments:
        boxes -- tensor of shape (19, 19, 5, 4), boxes corners coordinates
        box_confidence -- tensor of shape (19, 19, 5, 1), probability of containing an object for each box
        box_class_probs -- tensor of shape (19, 19, 5, 80), classes probabilities for each box
        threshold -- real value, score threshold

    Returns:
        scores -- tensor of shape (None,), containing the class probability score for selected boxes
        boxes -- tensor of shape (None, 4), containing the corner coordinates of selected boxes
        classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold. 
    """

    # Calculating the score for each box
    box_scores = box_confidence*box_class_probs
    
    # Selecting the most probable class for each box  
    box_classes = tf.math.argmax(box_scores, axis=-1)
    
    # Selecting the biggest score for each box 
    box_class_scores = tf.math.reduce_max(box_scores, axis=-1)
    
    # Creating mask for filtering 
    filtering_mask = box_class_scores >= threshold
    
    scores = tf.boolean_mask(box_class_scores, filtering_mask) # filtering scores
    boxes = tf.boolean_mask(boxes, filtering_mask) # filtering boxes
    classes = tf.boolean_mask(box_classes, filtering_mask) # filtering classes
    
    return scores, boxes, classes
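
To illustrate how the mask collapses the grid dimensions, here is a NumPy analogue of the same steps on random stand-in arrays. It is only a shape walkthrough under the shapes documented above: the random values are not real model outputs, and the class probabilities are not normalized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins with the shapes documented in yolo_filter_boxes
boxes = rng.random((19, 19, 5, 4))
box_confidence = rng.random((19, 19, 5, 1))
box_class_probs = rng.random((19, 19, 5, 80))

box_scores = box_confidence * box_class_probs   # broadcasts to (19, 19, 5, 80)
box_classes = box_scores.argmax(axis=-1)        # (19, 19, 5)
box_class_scores = box_scores.max(axis=-1)      # (19, 19, 5)

mask = box_class_scores >= 0.6                  # boolean, (19, 19, 5)

# Boolean indexing flattens the three masked dimensions into one
scores = box_class_scores[mask]                 # (k,)
kept_boxes = boxes[mask]                        # (k, 4)
classes = box_classes[mask]                     # (k,)

print(box_scores.shape, mask.shape)
print(scores.shape, kept_boxes.shape, classes.shape)
```

The number of kept boxes `k` depends on the threshold, which is why the docstring above reports that dimension as `None`.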

In [None]:
def iou(box1, box2):
    """
    Implement the intersection over union (IoU) between box1 and box2
    
    Arguments:
    box1 -- first box, list object with coordinates (box1_x1, box1_y1, box1_x2, box1_y2)
    box2 -- second box, list object with coordinates (box2_x1, box2_y1, box2_x2, box2_y2)
    """

    (box1_x1, box1_y1, box1_x2, box1_y2) = box1
    (box2_x1, box2_y1, box2_x2, box2_y2) = box2

    # Calculating union area 
    # Union(A,B) = A + B - Inter(A,B)
    
    # Inter(A,B)
    xi1 = max(box1_x1, box2_x1)
    yi1 = max(box1_y1, box2_y1)
    xi2 = min(box1_x2, box2_x2)
    yi2 = min(box1_y2, box2_y2)
    
    inter_width = max((xi2 - xi1),0)
    inter_height =  max((yi2 - yi1),0)
    
    inter_area = inter_width*inter_height
    
    # Areas of A and B
    box1_area = (box1_x2 - box1_x1)*(box1_y2 - box1_y1)
    box2_area = (box2_x2 - box2_x1)*(box2_y2 - box2_y1)
    
    # Union(A,B)
    union_area = box1_area + box2_area - inter_area
    
    # Calculating the IoU (local name chosen so it does not shadow the function name)
    iou_value = inter_area/union_area
    
    return iou_value
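
As a worked example of the arithmetic, the sketch below computes the IoU step by step for two made-up 2 x 2 boxes that overlap in a 1 x 1 square. The coordinates are illustrative only.

```python
# Two made-up boxes in (x1, y1, x2, y2) corner format
box1 = (2, 1, 4, 3)
box2 = (1, 2, 3, 4)

# Intersection rectangle: clamp at 0 so disjoint boxes give zero area
inter_width = max(min(box1[2], box2[2]) - max(box1[0], box2[0]), 0)    # 3 - 2 = 1
inter_height = max(min(box1[3], box2[3]) - max(box1[1], box2[1]), 0)   # 3 - 2 = 1
inter_area = inter_width * inter_height                                # 1

# Each box is 2 x 2, so both areas are 4
box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])

union_area = box1_area + box2_area - inter_area                        # 4 + 4 - 1 = 7

print(inter_area / union_area)  # 1/7, roughly 0.143
```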

TensorFlow provides a built-in function, `tf.image.non_max_suppression`, which we use to implement non-max suppression.

In [None]:
def non_max_suppression(boxes, scores, classes, max_boxes=10, iou_threshold=0.5):
    """
    Applies non-max suppression to set of boxes
    
    Arguments:
    boxes --  tensor of shape (num_boxes, 4), boxes corners coordinates 
    scores -- tensor of shape (num_boxes), boxes scores
    classes -- tensor of shape (num_boxes), boxes classes
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, intersection over union threshold
    
    
    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None, ), predicted class for each box
    
    Note: The "None" dimension of the output tensors is at most max_boxes.
    """
    
    # A constant is enough here; the value is never updated
    max_boxes_tensor = tf.constant(max_boxes, dtype='int32')
    
    nms_indices = tf.image.non_max_suppression(boxes, 
                                               scores, 
                                               max_boxes_tensor, 
                                               iou_threshold)
    
    scores = tf.gather(scores, nms_indices)
    boxes = tf.gather(boxes, nms_indices)
    classes = tf.gather(classes, nms_indices)
    
    return scores, boxes, classes
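
To show what the suppression step does conceptually, here is a minimal pure-Python sketch of greedy non-max suppression. It is not TensorFlow's implementation, and details such as tie handling may differ; `box_iou` and `greedy_nms` are hypothetical helper names, and the boxes and scores are made up.

```python
def box_iou(a, b):
    """IoU of two boxes in (y1, x1, y2, x2) corner format."""
    inter_h = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_w = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, max_boxes=10, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) == max_boxes:
            break
        # Keep box i only if its overlap with every kept box is within the threshold
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [
    (0.0, 0.0, 2.0, 2.0),   # box 0
    (0.0, 0.1, 2.0, 2.1),   # box 1: almost the same region as box 0
    (5.0, 5.0, 7.0, 7.0),   # box 2: far away from the others
]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(boxes, scores))  # [0, 2]
```

Box 1 is dropped because it overlaps box 0 with an IoU of about 0.9, while box 2 survives because it does not overlap any kept box.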

<a name='2-2'></a>
### 2.2 - YOLO Evaluation  
We'll post-process the model's output and pass it to the non-max suppression function to get the final classification and localization for each object in the image.

In [None]:
def yolo_boxes_to_corners(box_xy, box_wh):
    """Convert YOLO box predictions to bounding box corners."""
    
    box_mins = box_xy - (box_wh / 2.)
    box_maxes = box_xy + (box_wh / 2.)
    
    # Reorder to (y1, x1, y2, x2), the corner order expected by tf.image.non_max_suppression
    boxes = tf.concat([box_mins[...,1:2],    # y1
                       box_mins[...,0:1],    # x1
                       box_maxes[...,1:2],   # y2
                       box_maxes[...,0:1]],  # x2
                      axis=-1)

    return boxes
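
As a worked example of the conversion, the sketch below applies the same center/size arithmetic to a single made-up box using plain Python numbers instead of tensors.

```python
# One made-up box: center (x, y) and size (w, h), relative to the image
box_xy = (0.5, 0.25)
box_wh = (0.5, 0.25)

x_min = box_xy[0] - box_wh[0] / 2   # 0.25
y_min = box_xy[1] - box_wh[1] / 2   # 0.125
x_max = box_xy[0] + box_wh[0] / 2   # 0.75
y_max = box_xy[1] + box_wh[1] / 2   # 0.375

# Same (y1, x1, y2, x2) ordering as yolo_boxes_to_corners
corners = (y_min, x_min, y_max, x_max)
print(corners)  # (0.125, 0.25, 0.375, 0.75)
```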

In [None]:
def yolo_eval(yolo_outputs, image_shape, max_boxes=10, score_threshold=0.6, iou_threshold=0.5):
    """
    Converts the output of YOLO encoding to your predicted boxes along with their scores, box coordinates and classes.
    
    Arguments:
    yolo_outputs -- output of the encoding model, contains 4 tensors:
                    box_xy: tensor of shape (None, 19, 19, 5, 2), center coordinates (x,y) of each box
                    box_wh: tensor of shape (None, 19, 19, 5, 2), width and height (w,h) of each box
                    box_confidence: tensor of shape (None, 19, 19, 5, 1), probability of containing an object for each box
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80), classes probabilities for each box
    image_shape -- tensor of shape (2,) containing the input shape (must be float32)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, score threshold
    iou_threshold -- real value, intersection over union threshold
    
    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """
    
    box_xy, box_wh, box_confidence, box_class_probs = yolo_outputs
    
    # Converting boxes box_xy and box_wh to corner coordinates
    boxes = yolo_boxes_to_corners(box_xy, box_wh)
    
    # Filtering boxes based on score
    scores, boxes, classes = yolo_filter_boxes(boxes, box_confidence, box_class_probs, score_threshold)
    
    # Scaling boxes back to original image shape
    boxes = scale_boxes(boxes, image_shape)
    
    # Applying non-max suppression
    scores, boxes, classes = non_max_suppression(boxes, 
                                                 scores, 
                                                 classes, 
                                                 max_boxes, 
                                                 iou_threshold)
    
    return scores, boxes, classes
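
The scaling step can be pictured with the sketch below, which assumes that the boxes are normalized (y1, x1, y2, x2) corners and that scaling means multiplying by the image height and width. This is an assumption about what `scale_boxes` does, not its actual source; `scale_corners` is a hypothetical helper and the image size is made up.

```python
def scale_corners(box, image_shape):
    """Scale one normalized (y1, x1, y2, x2) box to pixel coordinates.

    Hypothetical helper for illustration; not the yad2k implementation.
    """
    height, width = image_shape
    y1, x1, y2, x2 = box
    return (y1 * height, x1 * width, y2 * height, x2 * width)


# A made-up box on a 720 x 1280 (height x width) image
print(scale_corners((0.125, 0.25, 0.375, 0.75), (720, 1280)))
# (90.0, 320.0, 270.0, 960.0)
```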

<a name='2-3'></a>
### 2.3 - Defining Classes, Anchors and Image Shape

- The information on the 80 classes and 5 anchor boxes is stored in two files: "coco_classes.txt" and "yolo_anchors.txt".
- The YOLO model was trained on input images of size 608 x 608.

In [None]:
class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
model_image_size = (608, 608)

<a name='3-2'></a>
### 3.2 - Loading the Pre-trained Model

We'll load an existing pre-trained YOLO model developed using the **yad2k** implementation from Allan Zelener's GitHub repository.

In [None]:
yolo_model = load_model("model_data/", compile=False)

In [None]:
yolo_model.summary()

<a name='3-3'></a>
### 3.3 - Output Treatment

The output of `yolo_model` is a (m, 19, 19, 5, 85) tensor. We'll use the function `yolo_head` from **yad2k** to convert the model's encoding into 4 tensors that can be used as input to our `yolo_eval` function.
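
A quick count, using only the dimensions stated above, shows how many candidate boxes a single image produces and why score filtering and non-max suppression are needed. The constant names are chosen here for illustration.

```python
GRID_H, GRID_W = 19, 19            # grid cells
NUM_ANCHORS = 5                    # anchor boxes per cell
NUM_CLASSES = 80
VALUES_PER_BOX = 5 + NUM_CLASSES   # p_c, 4 box numbers, 80 class probabilities

boxes_per_image = GRID_H * GRID_W * NUM_ANCHORS
values_per_image = boxes_per_image * VALUES_PER_BOX

print(VALUES_PER_BOX)    # 85
print(boxes_per_image)   # 1805
print(values_per_image)  # 153425
```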

<a name='3-4'></a>
### 3.4 - Model Prediction

We'll implement the `predict` function, which runs the graph to test YOLO on an image to compute `out_scores`, `out_boxes`, `out_classes`.

The code below also uses the following function from **yad2k**:

    image, image_data = preprocess_image("samples/" + image_file, model_image_size = (608, 608))
    
which opens the image file and scales, reshapes and normalizes the image. It returns the outputs:
- image: a Python (PIL) representation of your image, used for drawing boxes.
- image_data: a NumPy array representing the image. This will be the input to the CNN.

In [None]:
def predict(image_file):
    """
    Runs the graph to predict boxes for "image_file". Prints and plots the predictions.
    
    Arguments:
    image_file -- name of an image stored in the "samples" folder.
    
    Returns:
    out_scores -- tensor of shape (None, ), scores of the predicted boxes
    out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
    out_classes -- tensor of shape (None, ), class index of the predicted boxes
    
    Note: "None" represents the number of predicted boxes, which varies between 0 and max_boxes. 
    """

    # Preprocessing the image
    image, image_data = preprocess_image("samples/" + image_file, model_image_size = (608, 608))
    
    # Predicting the outputs
    yolo_model_outputs = yolo_model(image_data)
    
    # Converting the model's output into the four tensors expected by yolo_eval
    yolo_outputs = yolo_head(yolo_model_outputs, anchors, len(class_names))
    
    # Filtering boxes by score and applying non-max suppression
    out_scores, out_boxes, out_classes = yolo_eval(yolo_outputs, [image.size[1], image.size[0]], 10, 0.3, 0.5)

    # Printing predictions info
    print('Found {} boxes for {}'.format(len(out_boxes), "samples/" + image_file))
    
    # Generating colors for drawing bounding boxes
    colors = get_colors_for_classes(len(class_names))
    
    # Drawing bounding boxes on the image file
    draw_boxes(image, out_boxes, out_classes, class_names, out_scores)
    
    # Saving the predicted bounding box on the image
    image.save(os.path.join("outputs", image_file), quality=100)
    
    # Displaying the results in the notebook
    output_image = Image.open(os.path.join("outputs", image_file))
    imshow(output_image)

    return out_scores, out_boxes, out_classes