# Welcome to the YOLO Workshop!

In this workshop, we will learn how to use the YOLO algorithm for object detection.

## System Setup

Since we will work with external files, you will need to **link this Google Colab Notebook file to your Google Drive Account**.

This is done with the following piece of code.
If you're running Jupyter Notebook on your own computer, you can skip this step.

In [None]:
import os
from google.colab import drive
drive.mount('/content/drive', force_remount=False)   #link to google drive

os.chdir("/content/drive/My Drive/perception projects/object tracking")  #change directory to directory with needed files for YOLO tracking
!ls      #print out files in directory

Mounted at /content/drive
images	movie.mp4  yolo_Starter.ipynb  yolov3


In [None]:
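# The GitHub "blob" pages return HTML, so download from the raw file URLs instead.
# ! wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
# ! wget https://raw.githubusercontent.com/Jeremy26/tracking_course/master/Detection/yolo_workshop/coco.names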

**ABOUT GPU**

**YOLO** is an object detection algorithm.
Like most deep learning detectors, it runs much faster on a GPU.
A GPU performs many operations in parallel instead of one after another: rather than working through a vector of operations sequentially, it processes a whole matrix of operations at once.

![CPUvsGPU…](https://www.nvidia.fr/docs/IO/144175/cpu-and-gpu.jpg)

Depending on the hardware, this can be the difference between about 1 frame per second and 50 or 60 frames per second.

The implementation we'll use is the one provided by OpenCV.
OpenCV has a DNN (Deep Neural Networks) module that can run the popular object detection algorithms we saw in the course.
We'll follow an approach similar to [this post](https://www.pyimagesearch.com/2017/08/21/deep-learning-with-opencv/).


*   If you want to work on CPU, skip this section

*   If you want to use GPU, you will need OpenCV >= 4.2.0 and a compatible CUDA/cuDNN installation

As of today (March 2020), the preinstalled versions of OpenCV and CUDA in Google Colab are insufficient. We are working with very recent libraries.
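
As a quick way to check the OpenCV version requirement above, a minimal sketch could compare a version string against 4.2.0. The helper name `meets_min_version` is hypothetical and not part of OpenCV; it only looks at the leading numeric part of each version component.

```python
import re

def meets_min_version(version, minimum=(4, 2, 0)):
    """Return True if a dotted version string is at least `minimum`."""
    parts = []
    for piece in version.split(".")[:len(minimum)]:
        # Keep only the leading integer, e.g. "4-dev" -> 4
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as "4.2" with zeros
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= tuple(minimum)

print(meets_min_version("4.1.2"))  # False
print(meets_min_version("4.2.0"))  # True
```

In the notebook you would call `meets_min_version(cv2.__version__)`. A recent version number alone is not enough: the OpenCV build must also have been compiled with CUDA support. With such a build, the GPU path would use `cv2.dnn.DNN_BACKEND_CUDA` and `cv2.dnn.DNN_TARGET_CUDA` in place of the CPU constants.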

**I will not cover the installation of CUDA/cuDNN in this post, so we'll run on the CPU.
If you want to use a GPU, I recommend you leave Colab and run on your own machine or on AWS.**


## Import the necessary libraries
We will need OpenCV, Matplotlib, and NumPy.

In [None]:
# your code here
import cv2
import numpy as np
import matplotlib.pyplot as plt
import random
from IPython.display import set_matplotlib_formats

## Define the class YOLO and the init() function

In [None]:
class YOLO():
    def __init__(self):
        """
        - YOLO takes an image as input. We should set the dimensions of the input to a fixed size.
        - The default choice is often 416x416; this solution uses 608x608.
        - YOLO applies confidence thresholding and non-maximum suppression; define a value for both.
        - Load the classes (.names), model configuration (cfg file) and pretrained weights (weights file) into variables.
        - The cfg file and the weights file must belong to the same model.
        - Load the network with the cv2.dnn.readNetFromDarknet function.
        """
        # TODO
        self.confThreshold = 0.5     #confidence threshold
        self.nmsThreshold = 0.4      #non maxima suppression threshold
        self.inpWidth = 608          # input image width
        self.inpHeight = 608         #input image height
        classesFile = "yolov3/coco.names"      #path for names file used for labels
        self.classes = None
        self.detected_classes = []
        self.color_list = []
        with open(classesFile,'rt') as f:     #read classes file and put into list
            self.classes = f.read().rstrip('\n').split('\n')

        modelConfiguration = "yolov3/yolov3.cfg"    #configuration file path
        modelWeights = "yolov3/yolov3.weights"
        self.net = cv2.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
        self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
        self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
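
To see what the class-loading step in `__init__` produces, here is a small sketch that writes a three-line stand-in for `coco.names` to a temporary file and reads it back. The file content and the helper name `load_class_names` are made up for illustration; the real file lists one class label per line in the same way.

```python
import os
import tempfile

def load_class_names(path):
    """Read a .names file: one class label per line."""
    with open(path, "rt") as f:
        # splitlines() drops the line endings; blank lines are skipped
        return [line.strip() for line in f.read().splitlines() if line.strip()]

# Hypothetical stand-in for yolov3/coco.names
with tempfile.NamedTemporaryFile("wt", suffix=".names", delete=False) as tmp:
    tmp.write("person\nbicycle\ncar\n")
    tmp_path = tmp.name

classes = load_class_names(tmp_path)
os.remove(tmp_path)

print(classes)     # ['person', 'bicycle', 'car']
print(classes[2])  # car
```

The position of a label in the list is the class id that the network reports for it.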

In [None]:
def getOutputsNames(self):
    '''
    Get the names of the output layers
    '''
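    # Get the names of all the layers in the network
    layersNames = self.net.getLayerNames()
    # Get the names of the output layers, i.e. the layers with unconnected outputs.
    # getUnconnectedOutLayers() returns 1-based indices, as an Nx1 array in older
    # OpenCV releases and as a flat array in newer ones, so flatten before indexing.
    return [layersNames[i - 1] for i in np.array(self.net.getUnconnectedOutLayers()).flatten()]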

YOLO.getOutputsNames = getOutputsNames
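
The index arithmetic in `getOutputsNames` is easy to get wrong, so here is a standalone sketch of it in plain Python. The layer names and indices are made up for illustration, and `pick_output_names` is a hypothetical helper. It accepts both shapes OpenCV has used for the indices: a flat list, or a list of one-element lists.

```python
def pick_output_names(layer_names, out_indices):
    """Map 1-based output-layer indices to layer names."""
    names = []
    for idx in out_indices:
        # Older OpenCV wraps each index in a one-element array; unwrap it if so
        if isinstance(idx, (list, tuple)):
            idx = idx[0]
        names.append(layer_names[idx - 1])  # indices are 1-based
    return names

# Made-up layer names for illustration
layers = ["conv_0", "conv_1", "yolo_2", "conv_3", "yolo_4"]

print(pick_output_names(layers, [3, 5]))      # ['yolo_2', 'yolo_4']
print(pick_output_names(layers, [[3], [5]]))  # ['yolo_2', 'yolo_4']
```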

In [None]:
def drawPred(self, frame, classId, conf, left, top, right, bottom):
    '''
    Draw a bounding box around a detected object given the box coordinates
    Later, we could repurpose that to display an ID
    '''
    # Draw a bounding box.
    # your code here

    if classId not in self.detected_classes:   # first time this class is detected
      self.detected_classes.append(classId)    # remember the class
      color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
      while color in self.color_list:          # if the color is already taken, draw another one
        color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
      self.color_list.append(color)            # store the color at the same index as the class

    else:
      idx = self.detected_classes.index(classId)     # find the index of classId in the list
      color = self.color_list[idx]                   # reuse the color assigned to this class


    cv2.rectangle(frame,(left,top),(right,bottom),(color),3)
    
    label = '%.2f' % conf    # format the confidence with two decimals

    # Get the label for the class name and its confidence
    # your code here
    if self.classes:
      assert(classId < len(self.classes))      # the class id must be a valid index into the class list
      label = '%s:%s' % (self.classes[classId], label)   # combine class name and confidence into one string

    
    FONT = cv2.FONT_HERSHEY_SIMPLEX
    FONT_SCALE = 0.5
    FONT_THICKNESS = 1
    bg_color = (255, 255, 255)  #background color
    label_color = (0, 0, 0)

    # Display the label at the top of the bounding box
    # your code here
    labelsize, baseline = cv2.getTextSize(label, FONT, FONT_SCALE, FONT_THICKNESS)

    # Keep the label inside the frame when the box touches the top edge
    top = max(top, labelsize[1])
    # Filled background rectangle behind the text, then the text itself
    cv2.rectangle(frame, (left, top - labelsize[1]), (left + labelsize[0], top + baseline), bg_color, cv2.FILLED)
    cv2.putText(frame, label, (left, top), FONT, FONT_SCALE, label_color, FONT_THICKNESS)

    return frame
    
YOLO.drawPred = drawPred
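
`drawPred` keeps classes and colors in two parallel lists. An alternative, sketched below, is a single dictionary keyed by class id. This is a design suggestion rather than part of the workshop code, and `ClassColors` is a hypothetical helper.

```python
import random

class ClassColors:
    """Assign each class id a random color and reuse it on later calls."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._colors = {}  # class id -> tuple of three 0-255 channel values

    def get(self, class_id):
        if class_id not in self._colors:
            color = tuple(self._rng.randint(0, 255) for _ in range(3))
            # Redraw until the color is not already used by another class
            while color in self._colors.values():
                color = tuple(self._rng.randint(0, 255) for _ in range(3))
            self._colors[class_id] = color
        return self._colors[class_id]

palette = ClassColors(seed=0)
first = palette.get(0)
second = palette.get(1)

print(palette.get(0) == first)  # True: same class, same color
print(first != second)          # True: different classes, different colors
```

The dictionary lookup replaces the `index()` search and keeps each class and its color in one place.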

In [None]:
def postprocess(self,frame, outs):
    """
    Postprocessing step. Take the output of the neural network and interpret it.
    We use that output to apply confidence thresholding and non-maximum suppression (NMS),
    then draw the remaining bounding boxes with the drawPred function.
    """
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]
    classIds = []
    confidences = []
    boxes = []
    # Scan through all the bounding boxes output from the network and keep only the
    # ones with high confidence scores. Assign the box's class label as the class with the highest score.
    # your code here
    
    for out in outs:
      for detection in out:
        scores = detection[5:]
        classId = np.argmax(scores)
        confidence = scores[classId]
        if confidence > self.confThreshold:
          center_x = int(detection[0] * frameWidth)
          center_y = int(detection[1] * frameHeight)
          width = int(detection[2] * frameWidth)
          height = int(detection[3] * frameHeight)
          left = int(center_x - width / 2)
          top = int(center_y - height / 2)
          classIds.append(classId)
          confidences.append(float(confidence))
          boxes.append([left, top, width, height])


    # Perform non maximum suppression to eliminate redundant overlapping boxes with
    # lower confidences.
    # your code here
    indices = cv2.dnn.NMSBoxes(boxes, confidences, self.confThreshold, self.nmsThreshold)

    # NMSBoxes returns an Nx1 array in older OpenCV releases and a flat array in
    # newer ones, so flatten before iterating.
    for i in np.array(indices).flatten():
      box = boxes[i]
      left = box[0]
      top = box[1]
      width = box[2]
      height = box[3]
      # Draw the bounding boxes on the image
      # your code here
      frame = self.drawPred(frame, classIds[i], confidences[i], left, top, left + width, top + height)

    return frame

YOLO.postprocess = postprocess
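
To make the two steps in `postprocess` concrete, the following sketch reimplements them in plain Python: converting a normalized detection (center x, center y, width, height) into a pixel box, and a greedy non-maximum suppression based on intersection over union (IoU). It is a simplified stand-in for illustration, not the code behind `cv2.dnn.NMSBoxes`; the function names and the sample boxes are made up.

```python
def to_pixel_box(detection, frame_width, frame_height):
    """Convert normalized (cx, cy, w, h) to [left, top, width, height] in pixels."""
    center_x = int(detection[0] * frame_width)
    center_y = int(detection[1] * frame_height)
    width = int(detection[2] * frame_width)
    height = int(detection[3] * frame_height)
    return [int(center_x - width / 2), int(center_y - height / 2), width, height]

def iou(a, b):
    """Intersection over union of two [left, top, width, height] boxes."""
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, score_threshold, nms_threshold):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep

print(to_pixel_box([0.5, 0.5, 0.25, 0.5], 400, 200))  # [150, 50, 100, 100]

boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, 0.5, 0.4))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, which is above the 0.4 threshold, so only the higher-scoring one survives. The third box does not overlap anything and is kept.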

In [None]:
def inference(self,image):
    """
    Main loop.
    Input: Image
    Output: Frame with the drawn bounding boxes
    """
    # Create a 4D blob from the frame: scale pixels to [0, 1] and resize to the network input size
    blob = cv2.dnn.blobFromImage(image, 1/255, (self.inpWidth, self.inpHeight), [0, 0, 0], swapRB=True, crop=False)

    # Sets the input to the network
    self.net.setInput(blob)

    # Runs the forward pass to get output of the output layers
    outs = self.net.forward(self.getOutputsNames())

    # Remove the bounding boxes with low confidence and draw the remaining ones
    final_frame = self.postprocess(image,outs)
    return final_frame

YOLO.inference = inference
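
`cv2.dnn.blobFromImage` does several things at once. The sketch below mimics only two of them on a tiny made-up image, using nested lists: scaling pixel values to the range 0 to 1, and reordering from height x width x channels to the 4D batch x channels x height x width layout. Resizing, mean subtraction, and channel swapping are left out, and `to_blob` is a hypothetical helper, not an OpenCV function.

```python
def to_blob(image, divisor=255.0):
    """Turn an HxWxC nested list into a 1xCxHxW nested list of scaled values."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [[
        [[image[y][x][c] / divisor for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]]

# Made-up image: 1 row, 2 pixels, 3 channels per pixel
image = [[[255, 0, 0], [0, 255, 0]]]
blob = to_blob(image)

shape = (len(blob), len(blob[0]), len(blob[0][0]), len(blob[0][0][0]))
print(shape)          # (1, 3, 1, 2)
print(blob[0][0][0])  # [1.0, 0.0]  first channel of both pixels
```

The leading dimension of size 1 is the batch: the network always receives a batch of images, even when there is only one.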

In [None]:
os.chdir("/content/drive/My Drive/perception projects/object tracking")
img = plt.imread("/content/drive/My Drive/perception projects/object tracking/images/man_and_dog.jpg")
img = cv2.resize(img,(1080,1200))
# !pwd
yolo = YOLO()
oi = yolo.inference(img)
plt.figure(figsize=(20,20))
plt.imshow(oi)
plt.show()
plt.imsave("images/man_dog_output.jpg",oi)


In [None]:
# yolo = YOLO()
# cap = cv2.VideoCapture("images/day_drive.mp4")

# output_file = "images/day_drive_output.avi"
# fps = 30
# video_size = (round(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
# fourcc = cv2.VideoWriter_fourcc(*'MJPG')
# save_output = cv2.VideoWriter(output_file, fourcc, fps, video_size) 

# while cap.isOpened():
#   ret,frame = cap.read()
#   if not ret:      # stop at the end of the video, otherwise the loop never exits
#     break
#   oi = yolo.inference(frame)
#   save_output.write(oi)

# cap.release()
# save_output.release()
# cv2.destroyAllWindows()
# print("done")

### Detection on a video!

A video is just a sequence of frames, so we will call the inference function on each frame of the video and save the result.
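
The idea can be sketched without any video library: treat the video as an iterable of frames and apply a function to each one. The frames and the `fake_inference` function below are placeholders for real images and for `yolo.inference`.

```python
def process_video(frames, per_frame):
    """Apply `per_frame` to every frame and collect the results in order."""
    return [per_frame(frame) for frame in frames]

def fake_inference(frame):
    # Placeholder: the real function would return the frame with boxes drawn on it
    return frame + "+boxes"

frames = ["frame0", "frame1", "frame2"]  # placeholders for image arrays
processed = process_video(frames, fake_inference)
print(processed)  # ['frame0+boxes', 'frame1+boxes', 'frame2+boxes']
```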


In [None]:
from moviepy.editor import VideoFileClip
!pwd
video_file = "/content/drive/My Drive/perception projects/object tracking/images/day_drive_3mins.mp4"
clip = VideoFileClip(video_file).subclip(0,10)
white_clip = clip.fl_image(yolo.inference)
%time white_clip.write_videofile("movie.mp4",audio=False)

In [None]:
import io
import base64
from IPython.display import HTML

with io.open('movie.mp4', 'rb') as f:   # read the rendered video as raw bytes
    video = f.read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls width="720" height="480">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))) 