
# Introduction

In this lab we will use YOLO to detect objects in photos!

This lab is adapted from the following sources:

 * https://github.com/experiencor/keras-yolo3
 * https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/

The `keras-yolo3` repository provides utility functions to facilitate working with the YOLO model in Keras.  I basically took that code and deleted important parts of it for you to fill in :)

# Package imports

In [0]:
%tensorflow_version 1.x

import os
from google.colab import drive

import numpy as np
from keras.layers import Conv2D, Input, BatchNormalization, LeakyReLU, ZeroPadding2D, UpSampling2D
from keras.layers.merge import add, concatenate
from keras.models import Model
from keras.models import load_model
import struct
import cv2

# Set up google drive

Since the YOLO model takes a long time to fit, we will use pre-estimated model weights and see how the process of using the model to generate predictions works.

As usual, I have shared the necessary files with you in a google drive folder.  To get the data into colab, do these steps:

1. Sign into drive.google.com
2. Click on "Shared with me" on the left side of the screen
3. Right click on the stat344ne_yolo folder and select "Add Shortcut to Drive"
4. Run the code cell below and click on the link that is displayed.  It will pop up a new browser tab where you have to authorize Colab to access your google drive.  Then, copy the sequence of numbers and letters that is displayed and paste it in the space that shows up in the code cell below.


In [3]:
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
# exist_ok=True avoids an error if this cell is run more than once
os.makedirs("/content/stat344ne_yolo/", exist_ok=True)
#!unzip -uq "/content/drive/My Drive/stat344ne_yolo/cats_and_dogs_small.zip" -d "/content/stat344ne_yolo/"

# Load saved YOLO version 3 model fit

This is saved in the file `yolov3_keras.h5` in the google drive folder.  The model weights were downloaded from the official YOLO website and loaded into a Keras model using code provided in the https://github.com/experiencor/keras-yolo3 repository.

In [0]:
yolov3 = load_model('/content/drive/My Drive/stat344ne_yolo/yolov3_keras.h5')

# Utility functions

### Utility functions to generate predictions -- run as is

The following code is from https://github.com/experiencor/keras-yolo3.  I'm not asking you to make any changes to the code in the next cell; you can run it as is.  You might just read the function documentation to see what each function does.

In [0]:
class BoundBox:
    '''
    A class to represent bounding boxes.  The boxes are represented by:
     - (xmin, ymin): the coordinates of the top left corner
     - (xmax, ymax): the coordinates of the lower right corner
     - objness: optionally an estimated probability that the box contains an
       object
     - classes: optionally an estimated probability of class
    '''
    def __init__(self, xmin, ymin, xmax, ymax, objness = None, classes = None):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        
        self.objness = objness
        self.classes = classes

        self.label = -1
        self.score = -1

    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.classes)
        
        return self.label
    
    def get_score(self):
        if self.score == -1:
            self.score = self.classes[self.get_label()]
            
        return self.score


def _sigmoid(x):
    return 1. / (1. + np.exp(-x))


def preprocess_input(image, net_h, net_w):
    '''
    resize an input image to the dimensions required by the input layer for the
    yolo model

    Arguments:
     - image: a numpy array with an input image
     - net_h: input height required for neural network
     - net_w: input width required for neural network
    '''
    new_h, new_w, _ = image.shape

    # determine the new size of the image
    if (float(net_w)/new_w) < (float(net_h)/new_h):
        new_h = (new_h * net_w)/new_w
        new_w = net_w
    else:
        new_w = (new_w * net_h)/new_h
        new_h = net_h

    # resize the image to the new size
    resized = cv2.resize(image[:,:,::-1]/255., (int(new_w), int(new_h)))

    # embed the image into the standard letter box
    new_image = np.ones((net_h, net_w, 3)) * 0.5
    new_image[int((net_h-new_h)//2):int((net_h+new_h)//2), int((net_w-new_w)//2):int((net_w+new_w)//2), :] = resized
    new_image = np.expand_dims(new_image, 0)

    return new_image



def correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):
    '''
    rescale yolo output boxes from image size used by network to original input
    image size

    Arguments:
     - boxes: list of boxes output by yolo model
     - image_h: height of input image
     - image_w: width of input image
     - net_h: input height required for neural network
     - net_w: input width required for neural network
    
    No return; input boxes list is modified to contain adjusted boxes
    '''
    if (float(net_w)/image_w) < (float(net_h)/image_h):
        new_w = net_w
        new_h = (image_h*net_w)/image_w
    else:
        new_h = net_h
        new_w = (image_w*net_h)/image_h
        
    for i in range(len(boxes)):
        x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
        y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
        
        boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
        boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
        boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
        boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)


def draw_boxes(image, boxes, labels, obj_thresh):
    '''
    Draw boxes output by yolo on input image

    Arguments:
     - image: a numpy array with an input image
     - boxes: list of boxes output by yolo model
     - labels: object type label associated with each box
     - obj_thresh: minimum threshold for object class probability to include box
    
    Return:
     - updated numpy array for image augmented with boxes and class labels
    '''
    for box in boxes:
        label_str = ''
        label = -1
        
        for i in range(len(labels)):
            if box.classes[i] > obj_thresh:
                label_str += labels[i]
                label = i
                print(labels[i] + ': ' + str(box.classes[i]*100) + '%')
                
        if label >= 0:
            cv2.rectangle(image, (box.xmin,box.ymin), (box.xmax,box.ymax), (0,255,0), 3)
            cv2.putText(image, 
                        label_str + ' ' + str(box.get_score()), 
                        (box.xmin, box.ymin - 13), 
                        cv2.FONT_HERSHEY_SIMPLEX, 
                        1e-3 * image.shape[0], 
                        (0,255,0), 2)
        
    return image

# Generate Predictions

The following code reads in the picture of Dino on the couch and generates predictions.  You can run this code as is:

In [0]:
image_path = "/content/drive/My Drive/stat344ne_yolo/Dino.png"

# set some configuration parameters
# input dimensions for network
net_h, net_w = 416, 416

# specifications of anchor box widths and heights; more on this next
anchors = [[116,90,  156,198,  373,326],  [30,61, 62,45,  59,119], [10,13,  16,30,  33,23]]

# Specifications of labels for MSCOCO data set.
# Not sure why they chose these classes?
# Note that our version of YOLO will only be able to classify into these 80
# classes.
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", \
          "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", \
          "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", \
          "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", \
          "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", \
          "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", \
          "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", \
          "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse", \
          "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", \
          "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]

# Read in the image and preprocess using utility functions defined above
# This resizes the image to expected network input dimensions
image = cv2.imread(image_path)
image_h, image_w, _ = image.shape
new_image = preprocess_input(image, net_h, net_w)

# Generate the prediction
yolos = yolov3.predict(new_image)

You may recall I mentioned briefly that the YOLO model makes predictions at multiple scales.  The model predictions contain one set of predictions based on a $13 \times 13$ grid of cells, a second set of predictions based on a $26 \times 26$ grid of cells, and a third based on a $52 \times 52$ grid of cells.  This allows the method to detect objects of different sizes in the image.  Each of these grids has its own set of 3 anchor boxes with different widths and heights.  This is described briefly in Section 2.3 of the paper about version 3 of YOLO at https://pjreddie.com/media/files/papers/YOLOv3.pdf.
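
As a quick sanity check on those grid sizes, here is a small sketch.  It assumes the three scales correspond to downsampling the $416 \times 416$ input by factors of 32, 16, and 8; these strides are inferred from the grid sizes (for example $416 / 13 = 32$) rather than read from the model, and the names `strides` and `boxes_per_cell` are only for illustration.

```python
# Assumed strides for the three detection scales (inferred: 416 / 13 = 32, etc.)
net_h, net_w = 416, 416
strides = [32, 16, 8]
boxes_per_cell = 3  # 3 anchor boxes per grid cell at each scale

# grid size at each scale is the input size divided by the stride
grid_shapes = [(net_h // s, net_w // s) for s in strides]

# total number of candidate boxes across all three scales
total_boxes = sum(h * w * boxes_per_cell for h, w in grid_shapes)

print(grid_shapes)   # [(13, 13), (26, 26), (52, 52)]
print(total_boxes)   # 10647
```

Under these assumptions the model proposes over ten thousand candidate boxes per image, which is why the thresholding and non-max suppression steps later in the lab matter.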

The code below prints out the shapes of each of these separate arrays of anchor box predictions:

In [26]:
print("first shape  = " + str(yolos[0].shape))
print("second shape = " + str(yolos[1].shape))
print("third shape  = " + str(yolos[2].shape))

first shape  = (1, 13, 13, 255)
second shape = (1, 26, 26, 255)
third shape  = (1, 52, 52, 255)


#### 1) Why is there a leading 1 on the shapes of each of the arrays above?

We made a prediction for a single image.  In Keras, the first dimension of an input or output array indexes the observations (here, images) in the batch, so a batch of one image gives a leading 1.
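
A minimal sketch of where that leading dimension comes from, using a blank placeholder array in place of a real image (the `preprocess_input` function above adds the batch dimension in the same way with `np.expand_dims`):

```python
import numpy as np

# placeholder for one preprocessed image: height x width x colour channels
single_image = np.zeros((416, 416, 3))

# add a batch dimension in position 0, giving a batch containing one image
batch = np.expand_dims(single_image, 0)

print(single_image.shape)
print(batch.shape)
```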

#### 2) Why is the last dimension of the array 255?

The code below is a hint:

In [28]:
print("255/3 = " + str(255/3))
print("len(labels) = " + str(len(labels)))

255/3 = 85.0
len(labels) = 80


The length of the last dimension is 3*(1 + 4 + 80) because there are 3 anchor boxes, 1 entry per anchor box for the probability there is an object associated with that anchor box, 4 entries per anchor box for the location of the bounding box, and 80 entries per anchor box for class probabilities.
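
To make that layout concrete, here is a small sketch using a made-up array in place of real network output.  It assumes the 255 values are stored as three consecutive blocks of 85, one block per anchor box, which is the layout implied by the reshape used in `decode_netout` below; the name `fake_netout` is only for illustration.

```python
import numpy as np

grid_h, grid_w, nb_box, nb_class = 13, 13, 3, 80

# made-up stand-in for one scale of network output: shape (13, 13, 255)
fake_netout = np.arange(grid_h * grid_w * nb_box * (5 + nb_class), dtype=float)
fake_netout = fake_netout.reshape(grid_h, grid_w, -1)

# split the last dimension into (anchor box, 85 values per anchor box)
per_box = fake_netout.reshape(grid_h, grid_w, nb_box, -1)

box_coords = per_box[..., :4]    # 4 entries for box location
objectness = per_box[..., 4]     # 1 entry for P(object)
class_scores = per_box[..., 5:]  # 80 entries for class probabilities

print(per_box.shape, box_coords.shape, objectness.shape, class_scores.shape)
```

The first value for the second anchor box, `per_box[0, 0, 1, 0]`, is the same as `fake_netout[0, 0, 85]`, which confirms that each anchor box owns a contiguous run of 85 entries.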

# Utility functions to decode network outputs -- modifications required

The following function takes the output from the network and turns it into predictions of bounding box location, the probability that there is an object, and object class probabilities.  Note that the output from the network above has not yet had any transformations or activation functions applied to it.  You will need to apply sigmoid activations to all entries for classification and the appropriate transformations to the other entries.
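
Before looking at the full function, here is a rough worked example of those transformations for a single anchor box in a single grid cell.  The raw output values (all zeros) and the choice of cell are hypothetical, picked so that the arithmetic is easy to check by hand; the anchor size is the first one listed in `anchors` above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# hypothetical raw network outputs for one anchor box in one cell
t_x, t_y, t_w, t_h = 0.0, 0.0, 0.0, 0.0

row, col = 6, 6                 # middle cell of the 13 x 13 grid
grid_h, grid_w = 13, 13
anchor_w, anchor_h = 116, 90    # first anchor box for the 13 x 13 scale
net_h, net_w = 416, 416

# centre: sigmoid gives an offset in (0, 1) within the cell; dividing by the
# grid size expresses the centre as a proportion of image width / height
center_x = (col + sigmoid(t_x)) / grid_w
center_y = (row + sigmoid(t_y)) / grid_h

# size: exp gives a positive multiplier of the anchor size; dividing by the
# network input size expresses the result as a proportion of the image
width = anchor_w * np.exp(t_w) / net_w
height = anchor_h * np.exp(t_h) / net_h

print(center_x, center_y, width, height)
```

With raw outputs of zero, the sigmoid returns 0.5 and the exponential returns 1, so the box is centred in the middle of its cell and has exactly the anchor box's width and height.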

In [0]:
def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
    '''
    Decode output from the yolo network for a single image.

    Arguments:
     - netout: numpy array of shape (grid_h, grid_w, 255) with predictions for
       one image
     - anchors: list of length 6 with width and height of each of 3 anchor boxes
       in order [box1_w, box1_h, box2_w, box2_h, box3_w, box3_h]
     - obj_thresh: minimum probability threshold for an object to keep it in the
       list of boxes
     - net_h, net_w: height and width of input to yolo network
    
    Returns:
     - a list of bounding boxes for objects in the image.
    '''
    # extract number of grid cells
    grid_h, grid_w = netout.shape[:2]

    # number of anchor boxes
    nb_box = 3

    # reshape output.  After this, the network output array has shape
    # (grid_h, grid_w, 3, 85).  This separates the predicted values for each
    # anchor box to make indexing for each anchor box easier.
    netout = netout.reshape((grid_h, grid_w, nb_box, -1))

    # determine the number of classes.
    # (This is a very involved way of calculating 80 = 85 - 5)
    nb_class = netout.shape[-1] - 5

    # Initialize an empty list of boxes.  This will be populated with boxes.
    boxes = []

    # Apply sigmoid transformation to the predictions of offset for the center
    # coordinates of the object.  Notes:
    #  * These are in positions 0 and 1 of the last dimension of the netout array
    #  * You can simultaneously apply the sigmoid across all grid cells and
    #    anchor boxes with a single function call.
    #  * Note netout[:, :, :, :2] or netout[..., :2] accesses all entries in the
    #    first three dimensions of the array and up to but not including entry 2
    #    in the last dimension
    #  * In this notebook, the function is defined as _sigmoid
    # 
    # Replace None with appropriate indexing and a call to _sigmoid
    netout[..., :2] = _sigmoid(netout[..., :2])
    
    # Apply exponential transformation to the predictions of multipliers for box
    # width and height, in positions 2 and 3 of the last dimension of the netout
    # array.  Use np.exp() and indexing similar to above.
    netout[..., 2:4] = np.exp(netout[..., 2:4])

    # Apply sigmoid activation to the prediction of the probability that this
    # cell is associated with an object, in position 4 of the last dimension of
    # the netout array.
    netout[..., 4] = _sigmoid(netout[..., 4])

    # Apply sigmoid activation to the predictions of the probability that this
    # cell has an object of each possible class, given that it has an object.
    # These are in positions 5 and later in the netout array.
    netout[..., 5:] = _sigmoid(netout[..., 5:])

    # For each possible object type, compute the probability that each
    # combination of cell and anchor box contains an object of that type,
    # P(object and specified type) = P(object) * P(object of specified type | object)
    netout[..., 5:] = netout[..., 4:5] * netout[..., 5:]

    # Thresholding on object probabilities.  Our goal is to only keep objects
    # that have at least probability obj_thresh of being an object of a
    # specified type.  We'll do this in two steps:
    #  1) compute mask, which is 1 if P(object and specified type) > obj_thresh
    #     and 0 otherwise
    #  2) update the class probabilities to be their current values * the mask
    #     the result is 0 if the mask is 0, and no change to the class
    #     probabilities if the mask is 1.
    # Effectively, this keeps class probabilities unchanged if they are larger
    # than the obj_thresh, and sets them to 0 (to ignore) if they are less than
    # or equal to the obj_thresh
    mask = netout[..., 5:] > obj_thresh
    netout[..., 5:] *= mask

    # We now iterate through all grid cells and anchor boxes, and extract the
    # information for each of them
    for row in range(grid_h):
        for col in range(grid_w):
            for b in range(nb_box):
                # first 4 elements are b_x, b_y, b_w, and b_h
                b_x, b_y, b_w, b_h = netout[row, col, b, :4]

                # element in position 4 is objectness score
                objectness = netout[row, col, b, 4]
                if(objectness == 0.0): continue
                
                # last elements are class probabilities
                classes = netout[row, col, b, 5:]

                # anchor width and height
                anchor_w = anchors[2 * b + 0]
                anchor_h = anchors[2 * b + 1]

                # Compute width and height of bounding box based on anchor_w,
                # anchor_h, b_w, and b_h
                bbox_w = anchor_w * b_w
                bbox_h = anchor_h * b_h

                # Compute center of bounding box as proportion of image width
                # and height
                bbox_x = (col + b_x) / grid_w
                bbox_y = (row + b_y) / grid_h

                # Rescale width and height of bounding box to proportion of
                # image width and height
                bbox_w = bbox_w / net_w
                bbox_h = bbox_h / net_h
                
                # Create box object with relative coordinates of upper left and
                # lower right bounding box coordinates, P(object), and
                # P(object and class) values
                bbox = BoundBox(
                    bbox_x-bbox_w/2, bbox_y-bbox_h/2,
                    bbox_x+bbox_w/2, bbox_y+bbox_h/2,
                    objectness,
                    classes)

                boxes.append(bbox)

    return boxes

The code below runs the image processing 3 times with different values of the object probability threshold `obj_thresh`.  It then saves the resulting files in google drive with your name at the beginning.  Take a look at the files in google drive, and answer the question below.

In [51]:
# Please enter your name here.  For example I might put my_name = 'evan'
my_name = 'evan'

obj_thresh_vals = [0.01, 0.5, 0.9]

for obj_thresh in obj_thresh_vals:
    print("\n obj_thresh = " + str(obj_thresh))
    # Regenerate the prediction (we already did this above, but we need to do
    # it again here since the decode_netout function modifies the prediction
    # array in place)
    image = cv2.imread(image_path)
    image_h, image_w, _ = image.shape
    new_image = preprocess_input(image, net_h, net_w)
    yolos = yolov3.predict(new_image)

    boxes = []
    for i in range(len(yolos)):
        # decode the output of the network
        boxes += decode_netout(yolos[i][0], anchors[i], obj_thresh, net_h, net_w)

    # correct the sizes of the bounding boxes
    correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w)

    # draw bounding boxes on the image using labels
    draw_boxes(image, boxes, labels, obj_thresh) 

    # write the image with bounding boxes to file
    file_name = image_path[:-4] + '_' + my_name + '_' + str(obj_thresh) + '_detected' + image_path[-4:]
    cv2.imwrite(file_name, (image).astype('uint8'))


 obj_thresh = 0.01
sink: 1.3248303905129433%
sofa: 6.179864704608917%
cat: 1.893511414527893%
dog: 87.05536127090454%
teddy bear: 1.8391726538538933%
chair: 1.3590497896075249%
sofa: 86.49919033050537%
dog: 2.234449051320553%
chair: 3.7241321057081223%
sofa: 82.71430730819702%
sofa: 2.158830128610134%
chair: 1.0314123705029488%
sofa: 98.93141984939575%
sofa: 92.07996129989624%
chair: 1.2433595024049282%
sofa: 76.88549757003784%
sofa: 2.9111532494425774%
sofa: 98.55548739433289%
sofa: 83.05723667144775%
sofa: 9.404147416353226%
tvmonitor: 1.2800831347703934%
clock: 9.829282760620117%
dog: 4.911653324961662%
teddy bear: 3.031836450099945%
dog: 4.655338451266289%
teddy bear: 16.001227498054504%

 obj_thresh = 0.5
dog: 87.05536127090454%
sofa: 86.49919033050537%
sofa: 82.71430730819702%
sofa: 98.93141984939575%
sofa: 92.07996129989624%
sofa: 76.88549757003784%
sofa: 98.55548739433289%
sofa: 83.05723667144775%

 obj_thresh = 0.9
sofa: 98.93141984939575%
sofa: 92.07996129989624%
sofa: 98.55548739433289%

#### 3) Take a look at the images that were created and saved in the google drive with boxes on them.  Comment in a sentence or two on how the object detection threshold affects the output.

When a smaller object detection threshold is used, more boxes are kept.  Some of them have labels that are clearly incorrect, such as 'clock' for the lamp, although in that case the box does mark a real object (the lamp).
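
The effect of the threshold can also be seen in a small numerical sketch.  The probabilities below are rounded from a few of the values printed above (dog, sofa, cat, and three more sofa boxes); the mask-and-multiply step mirrors the one in `decode_netout`.

```python
import numpy as np

# rounded P(object and class) values taken from the printed output above
probs = np.array([0.8706, 0.0618, 0.0189, 0.9893, 0.9208, 0.7689])

kept_counts = []
for obj_thresh in [0.01, 0.5, 0.9]:
    # mask is True where the probability exceeds the threshold
    mask = probs > obj_thresh
    # probabilities at or below the threshold are zeroed out
    thresholded = probs * mask
    kept_counts.append(int(mask.sum()))
    print(obj_thresh, thresholded)

print(kept_counts)
```

Raising the threshold can only remove boxes, never add them, so the number kept shrinks as `obj_thresh` grows.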

# Intersection over Union and non-max suppression

To eliminate duplicate boxes for the same detected object, we must implement non-max suppression based on an evaluation of the intersection over union metric.
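
As a warm-up, here is a small worked example of intersection over union on two made-up boxes, each given as an `(xmin, ymin, xmax, ymax)` tuple.  This sketch computes the overlap with a `max`/`min` shortcut rather than the case analysis used in `_interval_overlap` below; the two approaches should agree, but this one is only an illustration and the function name `iou` is not part of the lab code.

```python
def iou(a, b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax) tuples."""
    # overlap along each axis; 0 if the boxes do not overlap on that axis
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    # |A union B| = |A| + |B| - |A intersection B|
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# two 2 x 2 boxes that overlap in a 1 x 1 square:
# intersection = 1, union = 4 + 4 - 1 = 7, so IOU = 1/7
overlapping = iou((0, 0, 2, 2), (1, 1, 3, 3))
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))
identical = iou((0, 0, 2, 2), (0, 0, 2, 2))

print(overlapping, disjoint, identical)
```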

In [0]:
def _interval_overlap(interval_a, interval_b):
    '''
    Calculate the length of the overlap between two intervals a and b

    Arguments:
     - interval_a: a list of floats [x1, x2] with x1 < x2
     - interval_b: a list of floats [x3, x4] with x3 < x4
    
    Returns:
     - The length of the overlap between the intervals.  Take cases:
        * If the left and right endpoints of interval_b are both less than x1,
          overlap is 0
        * If the left endpoint of interval_b is less than x1 but the right
          endpoint of interval_b is greater than x1, there is some overlap:
          the smaller of the right endpoints minus x1
        * The same logic also applies with the intervals in the other order
    '''
    x1, x2 = interval_a
    x3, x4 = interval_b

    if x3 < x1:
        if x4 < x1:
            return 0
        else:
            return min(x2,x4) - x1
    else:
        if x2 < x3:
            return 0
        else:
            return min(x2,x4) - x3


def bbox_iou(box1, box2):
    '''
    Calculate the intersection over union for two boxes

    Arguments:
     - box1, box2: objects of class BoundBox
    
    Return:
     - The intersection over union of box1 and box2
    '''
    # Calculate the intersection area.  Do this in three steps:
    #  1. Find the interval overlap of the boxes along the horizontal axis
    #     using the _interval_overlap function defined above
    #  2. Find the interval overlap of the boxes along the vertical axis
    #     using the _interval_overlap function defined above
    #  3. Find the product of the interval overlap along the horizontal and
    #     vertical axes.
    
    # assemble lists of box coordinates suitable for use as arguments to
    # _interval_overlap
    box1_horiz_coords = [box1.xmin, box1.xmax]
    box1_vert_coords = [box1.ymin, box1.ymax]
    box2_horiz_coords = [box2.xmin, box2.xmax]
    box2_vert_coords = [box2.ymin, box2.ymax]
    
    # call _interval_overlap to find the horizontal overlap of the boxes
    intersect_horiz = _interval_overlap(box1_horiz_coords, box2_horiz_coords)

    # call _interval_overlap to find the vertical overlap of the boxes
    intersect_vert = _interval_overlap(box1_vert_coords, box2_vert_coords)
    
    # find the area of the intersection
    intersect = intersect_horiz * intersect_vert

    # assemble box widths and heights
    w1 = box1.xmax-box1.xmin
    h1 = box1.ymax-box1.ymin
    w2 = box2.xmax-box2.xmin
    h2 = box2.ymax-box2.ymin

    # find the area of the union.  Recall that
    # |A union B| = |A| + |B| - |A intersection B|
    union = w1*h1 + w2*h2 - intersect
    
    # find the intersection over union
    iou = intersect / union

    return iou



def do_nms(boxes, nms_thresh):
    '''
    Do non-max suppression

    Arguments:
     - boxes: list of boxes identified by predict method
       Note that boxes[i].classes[c] is the predicted probability of class c for
       box number i
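     - nms_thresh: IOU threshold for non-max suppression; a box is suppressed
       for a class if its IOU with a higher-probability box is >= nms_thresh
    
    No return; the input boxes list is modified in place, with the class
    probabilities of suppressed boxes set to 0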
    '''
    # If any boxes were found, extract the number of classes (80)
    # otherwise, return because there's nothing to do.
    if len(boxes) > 0:
        nb_class = len(boxes[0].classes)
    else:
        return
    
    # For each class, do non-max suppression.
    # Recall the overall procedure for one class:
    # 1. Put the boxes in order from highest probability to lowest for the class
    #    currently under consideration
    # 2. Repeat until there are no boxes remaining:
    #     a. Choose the box i with highest probability for the class currently
    #        under consideration
    #     b. For each remaining box j, calculate the IOU between boxes i and j.
    #        If the IOU is >= nms_thresh, eliminate box j from consideration for
    #        this class (set its predicted probability for this class to 0)

    # loop over classes being predicted (loop runs from c = 0 to c = 79)
    for c in range(nb_class):
        # First, we need to determine how the boxes should be ordered from
        # highest probability for class c to lowest.  Sort functions typically
        # sort in increasing order, though.  A trick to get around this is to
        # sort in increasing order of negative class probability:
        # if p1 > p2, then -p1 < -p2, so when sorted p1 will be placed first

        # Use a list comprehension to create a list containing -1 * the
        # probability of class c for each box in boxes.
        # You will need to access box.classes[c] if box is one of the boxes
        neg_box_probs = [-box.classes[c] for box in boxes]

        # the numpy argsort creates a vector of indices that would sort its
        # argument.  For instance, after the call below,
        # neg_box_probs[sorted_indices[0]] < neg_box_probs[sorted_indices[1]]
        # You may be interested in checking out the documentation at
        # https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html
        sorted_indices = np.argsort(neg_box_probs)

        # Loop over the sorted indices
        for i in range(len(sorted_indices)):
            # index_i is the index of the box that assigned i'th highest
            # probability to class c
            index_i = sorted_indices[i]

            # if this box assigns probability 0 to class c, skip it
            if boxes[index_i].classes[c] == 0: continue

            # otherwise, loop through all boxes that initially assigned lower
            # probability to class c than box i
            for j in range(i+1, len(sorted_indices)):
                # get index for box j
                index_j = sorted_indices[j]

                # calculate the IOU for boxes at indices index_i and index_j
                # using the bbox_iou function defined above
                iou = bbox_iou(boxes[index_i], boxes[index_j])

                # if the IOU is >= nms_thresh, set the probability of class c
                # for the box at index_j to 0
                if iou >= nms_thresh:
                    boxes[index_j].classes[c] = 0

Now, run the code below.  This is the same code as you ran above, but with one extra line to do non-max suppression with `nms_thresh = 0.45`.

In [57]:
# Regenerate the prediction
image = cv2.imread(image_path)
image_h, image_w, _ = image.shape
new_image = preprocess_input(image, net_h, net_w)
yolos = yolov3.predict(new_image)

obj_thresh = 0.5
nms_thresh = 0.45

boxes = []
for i in range(len(yolos)):
    # decode the output of the network
    boxes += decode_netout(yolos[i][0], anchors[i], obj_thresh, net_h, net_w)

# correct the sizes of the bounding boxes
correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w)

# suppress non-maximal boxes
do_nms(boxes, nms_thresh)     

# draw bounding boxes on the image using labels
draw_boxes(image, boxes, labels, obj_thresh) 

# write the image with bounding boxes to file
cv2.imwrite(image_path[:-4] + '_' + my_name + '_detected_final' + image_path[-4:], (image).astype('uint8'))

dog: 87.05536127090454%
sofa: 98.93141984939575%


True

#### 4) Take a look at your output image.  Comment in a sentence or two on how non-max suppression affects the output of the method, and how it differs from the object detection threshold.

We now have only one detected box for each object.  Non-max suppression eliminates duplicate boxes for the same object type, while the object detection threshold eliminates all boxes for low-probability objects.
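
The contrast between the two filters can be sketched on a toy example with three made-up boxes for a single class.  The helper names `iou` and `greedy_nms` and the threshold of 0.75 are illustrative only; `greedy_nms` is a simplified single-class version of the idea, not the `do_nms` function above.

```python
def iou(a, b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax) tuples."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, nms_thresh):
    """Return indices of boxes kept by greedy non-max suppression."""
    # visit boxes from highest score to lowest
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # keep box i only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[k]) < nms_thresh for k in kept):
            kept.append(i)
    return kept


# boxes 0 and 1 are near-duplicates; box 2 is a separate, lower-scoring object
toy_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
toy_scores = [0.9, 0.8, 0.7]

# score threshold alone: keeps both duplicates, drops the separate object
above_thresh = [i for i, s in enumerate(toy_scores) if s > 0.75]

# non-max suppression alone: drops the duplicate, keeps the separate object
kept = greedy_nms(toy_boxes, toy_scores, nms_thresh=0.45)

print(above_thresh)  # [0, 1]
print(kept)          # [0, 2]
```

The score threshold looks at each box in isolation, while non-max suppression compares boxes with each other, which is why the two steps remove different boxes.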

You might want to run the method on another photo, like the one of Benedict in the google drive folder :)

In [59]:
image_path = "/content/drive/My Drive/stat344ne_yolo/benedict.jpg"
image = cv2.imread(image_path)
image_h, image_w, _ = image.shape
new_image = preprocess_input(image, net_h, net_w)
yolos = yolov3.predict(new_image)

obj_thresh = 0.5
nms_thresh = 0.45

boxes = []
for i in range(len(yolos)):
    # decode the output of the network
    boxes += decode_netout(yolos[i][0], anchors[i], obj_thresh, net_h, net_w)

# correct the sizes of the bounding boxes
correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w)

# suppress non-maximal boxes
do_nms(boxes, nms_thresh)     

# draw bounding boxes on the image using labels
draw_boxes(image, boxes, labels, obj_thresh) 

# write the image with bounding boxes to file
cv2.imwrite(image_path[:-4] + '_' + my_name + '_detected_final' + image_path[-4:], (image).astype('uint8'))

cat: 99.58126544952393%
sofa: 93.96252036094666%


True