# Real-time object detection with OpenCV

We will use OpenCV's [Deep Neural Network module](https://docs.opencv.org/master/d6/d0f/group__dnn.html) to load a pre-trained MobileNet-SSD network and detect objects in a live webcam stream.

In [None]:
# Code by Adrian Rosebrock
# Modified by Franziska Mack for Parsons Summer Python Class 2020
# https://www.pyimagesearch.com/2017/09/18/real-time-object-detection-with-deep-learning-and-opencv/

# MobileNet-SSD detection network (caffe implementation)
# https://github.com/chuanqi305/MobileNet-SSD

from imutils.video import VideoStream, FPS
import numpy as np
import imutils
import time
import cv2

In [None]:
# paths to the trained model weights (.caffemodel) and the text description of its network architecture (.prototxt file)
# note: these are absolute paths on the author's machine; point them to your own copies of the two files
prototxt = "/Users/franziskamack/Documents/GitHub/python/Week_09/real-time-object-detection/MobileNetSSD_deploy.prototxt.txt"
model = "/Users/franziskamack/Documents/GitHub/python/Week_09/real-time-object-detection/MobileNetSSD_deploy.caffemodel"

# use OpenCV's Deep Neural Network module to read the Caffe model in
net = cv2.dnn.readNetFromCaffe(prototxt, model)

In [None]:
# initialize the list of class labels MobileNet SSD was trained to detect
# (the 20 PASCAL VOC classes plus "background"; a label's index is the class id the network outputs)
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]

# generate a set of bounding box colors for each class
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

In [None]:
# initialize the video stream from the default webcam (src=0)
vs = VideoStream(src=0).start()
# give the camera sensor a moment to warm up
time.sleep(2.0)

Before we can pass our video stream to the network, we need to pre-process each frame.
Let's have a closer look at OpenCV’s __blobFromImage__ function, which subtracts a mean value, scales the pixel values and packs the image into a 4-dimensional blob of shape (batch, channels, height, width).

In [None]:
# grab the frame from the video stream and resize it
frame = vs.read()
frame = imutils.resize(frame, width=400)

# convert the frame to a blob
blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)

# let's look at the blob
print(type(blob))
print(blob.shape)
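With the arguments used above, `blobFromImage` subtracts the mean 127.5 from each pixel and multiplies by the scale factor 0.007843 (≈ 1/127.5), so pixel values land roughly in [-1, 1], and it reorders the image from height × width × channels into a (1, 3, 300, 300) batch. The NumPy sketch below approximates that transformation for an image that is already 300 × 300. `blob_from_image_sketch` is a hypothetical name, and the sketch assumes the mean is applied to all three channels (passing `(127.5, 127.5, 127.5)` to OpenCV makes that explicit) as well as the `swapRB=False`, `crop=False` defaults of recent OpenCV releases.

```python
import numpy as np

def blob_from_image_sketch(image, scalefactor=0.007843, mean=127.5):
    """Hypothetical approximation of cv2.dnn.blobFromImage(image, scalefactor, size, mean)
    for an image that is already at the network's input size.
    Assumes: mean applied to every channel, no channel swap, no cropping."""
    # subtract the mean, then scale: (p - 127.5) * 0.007843 maps 0..255 to about -1..1
    x = (image.astype(np.float32) - mean) * scalefactor
    # reorder height x width x channels into channels x height x width
    x = x.transpose(2, 0, 1)
    # add a leading batch dimension -> (1, channels, height, width)
    return x[np.newaxis, ...]
```

Comparing `blob_from_image_sketch(cv2.resize(frame, (300, 300)))` against `blob` with `np.allclose` is one way to check these assumptions on your own OpenCV version.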

In [None]:
# pass the blob through the network and obtain the detections and predictions
net.setInput(blob)
netOutput = net.forward()

print(type(netOutput))
print(netOutput.shape)

The output of the model is a 4-dimensional array of shape (1, 1, N, 7), for example (1, 1, 100, 7). We are interested in the 2-dimensional slice `netOutput[0,0,:,:]`: each of its N rows is one detected bounding box, and its 7 values are the index of the image within the batch, the class id, the confidence score, and the four bounding-box coordinates (x_min, y_min, x_max, y_max), given as fractions of the frame width and height.
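The row layout described above can be decoded into pixel boxes with a few lines of NumPy. The sketch below mirrors what the detection loop further down does with each row; `decode_detections` is a hypothetical helper, and clipping the coordinates to [0, 1] is an extra safeguard added here (SSD boxes can extend slightly past the frame edges), not something the original loop does.

```python
import numpy as np

def decode_detections(net_output, width, height, min_confidence=0.8):
    """Hypothetical helper: turn the (1, 1, N, 7) SSD output into a list of
    (class_id, confidence, (left, top, right, bottom)) tuples in pixel units."""
    results = []
    for row in net_output[0, 0]:
        # row layout: [image_id, class_id, confidence, x_min, y_min, x_max, y_max]
        confidence = float(row[2])
        if confidence <= min_confidence:
            continue  # skip weak detections, like the 80% filter in the loop
        class_id = int(row[1])
        # coordinates are fractions of the frame size; clip them to [0, 1] first
        x_min, y_min, x_max, y_max = np.clip(row[3:7], 0.0, 1.0)
        box = (int(x_min * width), int(y_min * height),
               int(x_max * width), int(y_max * height))
        results.append((class_id, confidence, box))
    return results
```

Called as `decode_detections(netOutput, w, h)`, it would return the same boxes the loop below draws, apart from the clipping.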

In [None]:
print(netOutput[0,0,:,:].shape)
netOutput[0,0,:,:]

Let's integrate this into our video stream:

In [None]:
# loop over the frames from the video stream
while True:
    # grab the frame from the video stream and resize it
    frame = vs.read()
    frame = imutils.resize(frame, width=400)

    # grab the frame dimensions
    h = frame.shape[0]
    w = frame.shape[1]
    # convert the frame to a blob
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)

    # pass the blob through the network and obtain the detections and predictions
    net.setInput(blob)
    netOutput = net.forward()
    
    # loop over the detections
    for detection in netOutput[0,0,:,:]:
        # extract the confidence (i.e., probability) associated with
        # the prediction
        confidence = float(detection[2])

        # filter out weak detections by ensuring the 'confidence' is greater than 80%
        if confidence > 0.8:
            # extract the index of the class label from the 'detection'
            idx = int(detection[1])
            
            # then compute the (x, y)-coordinates of the bounding box for the object
            left = int(detection[3] * w)
            top = int(detection[4] * h)
            right = int(detection[5] * w)
            bottom = int(detection[6] * h)
 
            #draw a rectangle around detected objects
            cv2.rectangle(frame, (left, top), (right, bottom), COLORS[idx], thickness=2)

            # draw the prediction on the frame
            label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
            y = top - 15 if top - 15 > 15 else top + 15
            cv2.putText(frame, label, (left, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

    # show the output frame
    cv2.imshow("Frame", frame)

    # if the 'q' key was pressed, break from the loop
    # (masking with 0xFF keeps only the low byte of the key code)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
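The first cell imports `FPS` from `imutils.video` but never uses it, even though frames per second is the natural way to check how "real-time" the loop actually is. The standard-library sketch below shows the bookkeeping involved: count frames between a start and a stop timestamp and divide. `FrameRateCounter` is a hypothetical class written for illustration, not imutils' source, and its injectable `clock` parameter is an assumption added so it can be exercised without a webcam.

```python
import time

class FrameRateCounter:
    """Hypothetical frame-rate counter: frames processed / elapsed seconds."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # any zero-argument function returning seconds
        self._start = None
        self._end = None
        self.frames = 0

    def start(self):
        # record the start time and reset the frame count
        self._start = self._clock()
        self._end = None
        self.frames = 0
        return self

    def update(self):
        # call once per processed frame
        self.frames += 1

    def stop(self):
        # record the end time
        self._end = self._clock()

    def elapsed(self):
        # seconds between start() and stop() (or "now" if still running)
        end = self._end if self._end is not None else self._clock()
        return end - self._start

    def fps(self):
        elapsed = self.elapsed()
        return self.frames / elapsed if elapsed > 0 else 0.0
```

With imutils' own class, the same pattern would be `fps = FPS().start()` before the `while` loop, `fps.update()` once per iteration, and `fps.stop()` followed by `print(fps.fps())` after the cleanup.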