## YOLO (You Only Look Once) algorithm

YOLO is an efficient algorithm for detecting multiple objects in a single image. It divides the image into regions and, for each region, predicts bounding boxes and the probability that the region belongs to each class.
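
To make this more concrete, here is a minimal sketch of how one row of the network output can be read. It assumes the row layout that the helper code later in this notebook relies on: box center and size relative to the image, one objectness value, then one score per class. The function name `decode_detection` and the sample row are made up for illustration.

```python
import numpy as np

def decode_detection(detection, image_width, image_height):
    """Turn one output row into (class_id, confidence, [x, y, w, h]) in pixels."""
    # Assumed layout: [center_x, center_y, width, height, objectness, class scores...]
    scores = detection[5:]
    class_id = int(np.argmax(scores))
    confidence = float(scores[class_id])

    # Coordinates are relative to the image size, so scale them to pixels.
    w = int(detection[2] * image_width)
    h = int(detection[3] * image_height)
    # Convert from box center to top-left corner.
    x = int(detection[0] * image_width - w / 2)
    y = int(detection[1] * image_height - h / 2)
    return class_id, confidence, [x, y, w, h]

# Made-up row with three classes; the second class has the highest score.
row = np.array([0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.1])
print(decode_detection(row, 640, 480))  # (1, 0.8, [240, 120, 160, 240])
```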

Here we use this algorithm with a webcam to label the objects in view.

First, we need to install some dependencies, mainly the OpenCV library, which is commonly used in computer vision applications.

In [None]:
#install required libraries
!conda install python=3.5 -y
!conda install -c menpo opencv3 -y
!pip3 install opencv-contrib-python


Next, we need to download the model weights for the network. Several weight and configuration files are available, and the choice affects the speed and precision of the detection, so pick the one that suits your target application. Check the [YOLO website](https://pjreddie.com/darknet/yolo/) for more details about the available pre-trained models.

In [None]:
#weights can be downloaded from here
#https://pjreddie.com/darknet/yolo/

#!curl -O https://pjreddie.com/media/files/yolov3-tiny.weights
#!curl -O https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3-tiny.cfg
    
!curl -O https://pjreddie.com/media/files/yolov3.weights
!curl -O https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
    

!curl -O https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names
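
Before loading the network, it can help to confirm that the downloads actually landed in the working directory. The following is a small sketch, not part of the original notebook; the helper name `missing_model_files` is hypothetical, and it assumes the file naming used above (`<name>.weights`, `<name>.cfg`, and `coco.names`).

```python
import os

def missing_model_files(name="yolov3", directory="."):
    """Return the expected model files that are not present in `directory`."""
    expected = [name + ".weights", name + ".cfg", "coco.names"]
    return [f for f in expected if not os.path.isfile(os.path.join(directory, f))]

missing = missing_model_files("yolov3")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All model files found.")
```

If you downloaded the tiny model instead, pass `"yolov3-tiny"` as the name.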

### Helper Functions

Here are some helper functions to load a network, detect objects, and draw the predictions.

In [45]:
import cv2
import numpy as np

def LoadNetwork(name="yolov3"):
    #load labels
    with open("coco.names", 'r') as f:
        classes = [line.strip() for line in f.readlines()]

    #create the network
    net = cv2.dnn.readNet(name+".weights", name+".cfg")
    return net,classes

def DetectObjects(net,image):
    Width = image.shape[1]
    Height = image.shape[0]

    # scale pixel values from 0..255 down to 0..1 before feeding the network
    scale = 1/255.0
    blob = cv2.dnn.blobFromImage(image, scale, (416,416), (0,0,0), True, crop=False)

    net.setInput(blob)
    
    # get the output layer names directly; this avoids depending on the return
    # shape of getUnconnectedOutLayers(), which differs between OpenCV versions
    output_layers = net.getUnconnectedOutLayersNames()

    outs = net.forward(output_layers)
    class_ids = []
    confidences = []
    boxes = []
    conf_threshold = 0.5
    nms_threshold = 0.3


    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > conf_threshold:
                center_x = int(detection[0] * Width)
                center_y = int(detection[1] * Height)
                w = int(detection[2] * Width)
                h = int(detection[3] * Height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                class_ids.append(class_id)
                confidences.append(float(confidence))
                boxes.append([x, y, w, h])

    indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)

    
    # NMSBoxes returns a nested or a flat array depending on the OpenCV version
    indices = np.array(indices).flatten()
    
    result=[]
    
    for i in indices:
        result.append([class_ids[i],round(100*confidences[i]),boxes[i]])
    
    return result



def DrawPrediction(img,classes, class_id, confidence, x, y, x_plus_w, y_plus_h,COLORS):
    label = "{0}: {1}%".format(classes[class_id],confidence)
    color = COLORS[class_id]
    cv2.rectangle(img, (x,y), (x_plus_w,y_plus_h), color, 2)
    cv2.putText(img, label, (x-10,y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

def DrawResults(image,classes,results,COLORS):
    for r in results:
        box=r[2]
        x=round(box[0])
        y=round(box[1])
        w=round(box[2])
        h=round(box[3])
        DrawPrediction(image,classes,r[0],r[1],x,y,x+w,y+h,COLORS)
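
`DetectObjects` relies on `cv2.dnn.NMSBoxes` to drop overlapping boxes that describe the same object. The sketch below shows the general idea of greedy non-maximum suppression in plain Python. It is a simplified illustration, not OpenCV's implementation, and the names `iou` and `simple_nms` as well as the sample boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax2, ay2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx2, by2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    inter_w = max(0, min(ax2, bx2) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(ay2, by2) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def simple_nms(boxes, confidences, conf_threshold=0.5, nms_threshold=0.3):
    """Greedy non-maximum suppression; returns the indices of the boxes to keep."""
    # Drop weak detections, then visit the rest from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    order = [i for i in order if confidences[i] > conf_threshold]
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a box already kept.
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep

# The first two boxes overlap heavily; the third one is far away.
boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
confidences = [0.9, 0.8, 0.7]
print(simple_nms(boxes, confidences))  # [0, 2]
```

A lower `nms_threshold` removes more overlapping boxes, while a higher one keeps boxes that overlap more.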


### Main Functions

Here is an example of using the helper functions above. First, we open the built-in webcam (if you have a USB camera, change the index in cv2.VideoCapture).

Next, we create the network and specify the model name (depending on which model we downloaded) without the extension.

In [38]:
import cv2

#start webcamera
cam = cv2.VideoCapture(0)

#create network
net,classes=LoadNetwork("yolov3")

COLORS = np.random.uniform(0, 255, size=(len(classes), 3))

Here is the main loop. It captures an image from the camera, detects objects, and then draws the results.

Depending on your application, you can use "results" to check which objects were detected.

"results" is a list of detected objects, and each result (R) has three elements:
* R\[0\]: Class index of the detected object. You can look up its name with "classes\[R\[0\]\]".
* R\[1\]: Probability of the detected object as a percentage.
* R\[2\]: Bounding box of the object (x,y,w,h)
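
As an illustration of how "results" might be used, the sketch below picks out the bounding boxes of one class. The helper `find_objects` is hypothetical, and the sample classes and results are made-up values in the same shape that `DetectObjects` returns.

```python
def find_objects(results, classes, wanted_name, min_confidence=0):
    """Return the bounding boxes of detections whose class name is `wanted_name`."""
    found = []
    for class_id, confidence, box in results:
        if classes[class_id] == wanted_name and confidence >= min_confidence:
            found.append(box)
    return found

# Made-up values: [class index, confidence in percent, [x, y, w, h]]
example_classes = ["person", "bicycle", "car"]
example_results = [
    [0, 87, [40, 60, 120, 300]],
    [2, 55, [200, 180, 160, 90]],
    [0, 52, [400, 70, 110, 280]],
]
print(find_objects(example_results, example_classes, "person", min_confidence=60))
# [[40, 60, 120, 300]]
```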

In [None]:
import time

while True:
    ret, image = cam.read()
    if not ret:
        # stop if the camera did not return a frame
        break
    
    start_time = time.time()
    results=DetectObjects(net,image)
    elapsed_time = time.time() - start_time
    
    DrawResults(image,classes,results,COLORS)
    cv2.putText(image, "Detection time:{0}ms".format(int(elapsed_time*1000)), (20,20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,0), 2)
    
    cv2.imshow("objects", image)
    # press "q" in the image window to leave the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

In [36]:
cam.release()
cv2.destroyAllWindows()