## Trying OpenCV
OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library.<br/>
Its permissive license (BSD for releases up to 4.4, Apache 2.0 from 4.5.0 onward) makes it easy for businesses to use and modify the code.

The library has more than 2,500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high-resolution image of an entire scene, find similar images in an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, and more.


### Install OpenCV for Python

In [None]:
!pip install opencv-python

In [None]:
import cv2
print(cv2.__version__)

### Read in an image file

In [None]:
# download the image file we'll be using
import numpy as np  # used later when processing the detections
import urllib.request

img_name = 'dog.jpg'

url = 'https://github.com/jacquesroy/byte-size-data-science/raw/master/data/' + img_name
# filename = url.rsplit('/', 1)[-1]
urllib.request.urlretrieve(url, img_name)

%ls -l

### Display the image

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 10))
img = plt.imread(img_name)
plt.axis('off')
plt.title('image')
plt.imshow(img)

### Get the model weights
This particular model (YOLOv3) is trained on the COCO dataset (Common Objects in Context) from Microsoft.<br/>
It can detect 80 common object classes:

airplane, apple, backpack, banana, baseball bat, baseball glove, bear, bed, bench, bicycle, bird, boat, book, bottle, bowl, broccoli, bus, <br/>
cake, car, carrot, cat, cell phone, chair, clock, couch, cow, cup, dining table, dog, donut, elephant, fire hydrant, fork, frisbee, giraffe, <br/>
hair drier, handbag, horse, hot dog, keyboard, kite, knife, laptop, microwave, motorcycle, mouse, orange, oven, parking meter, person, <br/>
pizza, potted plant, refrigerator, remote, sandwich, scissors, sheep, sink, skateboard, skis, snowboard, spoon, sports ball, stop sign, <br/>
suitcase, surfboard, teddy bear, tennis racket, tie, toaster, toilet, toothbrush, traffic light, train, truck, tv, umbrella, vase, wine glass, <br/>
zebra

In [None]:
!rm -f yolov3.weights yolov3.cfg
!wget https://pjreddie.com/media/files/yolov3.weights
!wget https://github.com/pjreddie/darknet/raw/master/cfg/yolov3.cfg
!ls -l

## yolov3.weights: 248MB
Each weight is stored as a 4-byte (32-bit) float, so this file holds about 62M weights (248 MB / 4 bytes).
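
As a rough check of that figure, the arithmetic can be sketched in a few lines. This assumes every weight is a 4-byte float32 and ignores any file header; `estimate_weight_count` is a hypothetical helper name, not part of OpenCV.

```python
BYTES_PER_FLOAT32 = 4  # assumption: all weights are stored as 32-bit floats


def estimate_weight_count(file_size_bytes, bytes_per_weight=BYTES_PER_FLOAT32):
    """Estimate how many weights fit in a file of the given size."""
    return file_size_bytes // bytes_per_weight


# 248 MB, taken here as 248 * 10^6 bytes
approx = estimate_weight_count(248 * 1000 * 1000)
print("about {:.0f}M weights".format(approx / 1e6))
```

The layer-by-layer count computed later in the notebook should land close to this estimate.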

In [None]:
classes = ["person", "bicycle", "car", "motorcycle", "airplane", "bus",
"train", "truck", "boat", "traffic light", "fire hydrant",
"stop sign", "parking meter", "bench", "bird", "cat", "dog",
"horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
"backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
"skis", "snowboard", "sports ball", "kite", "baseball bat", 
"baseball glove", "skateboard", "surfboard", "tennis racket",
"bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl",
"banana", "apple", "sandwich", "orange", "broccoli", "carrot",
"hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant",
"bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote",
"keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
"refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
"hair drier", "toothbrush"]

In [None]:
# Read the color image
# options: IMREAD_COLOR (1), IMREAD_GRAYSCALE (0), IMREAD_UNCHANGED (-1)
img = cv2.imread(img_name, cv2.IMREAD_COLOR) # Returns None on bad file
if img is None:
    raise FileNotFoundError("Could not read image: " + img_name)
dims = img.shape
print("Image width: {}, height: {}, depth: {}".format(dims[1], dims[0], dims[2]))

## Loading the model
We instantiate the model from the files downloaded earlier: `yolov3.weights` and `yolov3.cfg`.

Then we take a look at some attributes of the model.

In [None]:
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')

In [None]:
# How many layers? How many weights overall?
names = net.getLayerNames()
print("Number of layers: " + str(len(names)))
cnt = 0
# Layer ids start at 1 (id 0 is the input layer), so include the last id
for layer_id in range(1, len(names) + 1):
    layer = net.getLayer(layer_id)
    # layer.blobs is a list of numpy arrays
    for blob in layer.blobs:
        # blob.size is the product of the dimensions, i.e. the number of weights
        cnt += blob.size

print("number of weights: " + str(cnt))

## Process the image
We convert the `img` numpy array of uint8 and shape (576, 768, 3) to another numpy array of float32 and shape (1, 3, 576, 768).<br/>
We then set that `blob` numpy array as the input to our model.

The `scale` value is a multiplier for the values in the array. The value is **1 / 255** (roughly 0.00392), so every pixel value ends up between 0 and 1. Neural networks generally work better with small, normalized input values.
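
To make the shape change concrete, here is a minimal NumPy sketch of the conversion described above. It is not OpenCV's implementation: `image_to_blob` is a hypothetical helper, and it assumes no resizing and no mean subtraction, which matches the arguments used in this notebook (target size equal to the image size, mean of zero, red and blue channels swapped).

```python
import numpy as np


def image_to_blob(image_bgr, scale=1.0 / 255.0):
    """Turn an H x W x 3 uint8 BGR image into a 1 x 3 x H x W float32 RGB array."""
    rgb = image_bgr[:, :, ::-1]                  # swap channel order: BGR -> RGB
    chw = np.transpose(rgb, (2, 0, 1))           # height, width, channel -> channel, height, width
    scaled = chw.astype(np.float32) * np.float32(scale)  # map 0..255 to 0..1
    return scaled[np.newaxis, ...]               # add the leading batch dimension


# Stand-in image with the same shape as the one used in this notebook
demo_img = np.zeros((576, 768, 3), dtype=np.uint8)
demo_img[0, 0] = (0, 0, 255)                     # one pure red pixel in BGR order
demo_blob = image_to_blob(demo_img)
print(demo_blob.shape, demo_blob.dtype)
```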

In [None]:
# scale=0.00392
scale = 1./255.
dims = img.shape
# blobFromImage(image, scale, (Width,Height), (0,0,0), True, crop=False)
blob = cv2.dnn.blobFromImage(img, scale, (dims[1], dims[0]), (0,0,0), True, crop=False)

# Set the input to the model
net.setInput(blob)

### Helper functions
The `get_output_layers` function gets the names of the output layers.

The `draw_bounding_box` function draws a bounding box around a detected object. It also displays the class of the object.<br/>
Since there are 80 classes, we could have used the `COLORS` variable to assign a random color to each class. I decided to hardcode the colors instead. No single color was perfect for displaying all the class names found, so I left the different colors that I tried as comments. Note that OpenCV expects colors in BGR order.

In [None]:
# COLORS = np.random.uniform(0, 255, size=(len(classes), 3))

def get_output_layers(net):
    layer_names = net.getLayerNames()
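    # getUnconnectedOutLayers() returns nested or flat ids depending on the OpenCV version
    output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]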
    return output_layers

# function to draw bounding box on the detected object with class name
def draw_bounding_box(img, class_id, confidence, x, y, x_plus_w, y_plus_h):
    label = str(classes[class_id])
    # color = COLORS[class_id]
    color1 = np.array([0.0,0.0,255.]) # red
    #color2 = np.array([0.0,211.0,255.0]) # Yellow
    #color2 = np.array([255.0,128.0,0.0]) # Blue
    #color2 = np.array([0.0,0.0,0.0]) # Black
    color2 = np.array([0.0,255.0,255.0])# Other yellow
    # color2 = np.array([0.0,0.0,128.0]) # Maroon
    #color2 = np.array([255.0,255.0,0.0]) # Cyan
    #color2 = np.array([255.0,0.0,255.0]) # Magenta
    #color2 = np.array([128.0,0.0,128.0]) # Purple
    # cv2.rectangle(img, (x,y), (x_plus_w,y_plus_h), color, 2)
    cv2.rectangle(img, (x,y), (x_plus_w,y_plus_h), color1, 2)
    # cv2.putText(img, label, (x-10,y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    cv2.putText(img, label, (x-10,y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, color2, 2)

### Find the objects and bounding boxes
The `forward()` method computes the output of the output layers.<br/>
For each detection in these outputs, we find the most likely class. If its confidence level is higher than 50%, we keep it.

Each detection includes the position of the object's center and its width and height, expressed as fractions of the image width and height.
This way we can easily create a bounding box for the object.

We collect all the information into lists.
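
The decoding of a single detection can be sketched without OpenCV. This is a minimal illustration, assuming each detection row is laid out as `[center_x, center_y, width, height, objectness, class scores...]`; `decode_detection` is a hypothetical helper and the sample row is made up.

```python
def decode_detection(detection, image_width, image_height):
    """Return (class_id, confidence, [x, y, w, h]) for one detection row."""
    scores = detection[5:]                       # per-class scores follow the first 5 values
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[class_id]
    # Box values are fractions of the image size, so scale them to pixels
    center_x = int(detection[0] * image_width)
    center_y = int(detection[1] * image_height)
    w = int(detection[2] * image_width)
    h = int(detection[3] * image_height)
    # Convert from center coordinates to the top-left corner
    x = center_x - w / 2
    y = center_y - h / 2
    return class_id, confidence, [x, y, w, h]


# Made-up row with only 3 class scores, for a 768 x 576 image
sample_row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
sample_class, sample_conf, sample_box = decode_detection(sample_row, 768, 576)
print(sample_class, sample_conf, sample_box)
```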

In [None]:
# run inference through the network and gather predictions from output layers
outs = net.forward(get_output_layers(net))
# input image shape (dims=img.shape)
Width=dims[1]
Height=dims[0]

# initialization
class_ids = []
confidences = []
boxes = []
conf_threshold = 0.5
nms_threshold = 0.4

# for each detection from each output layer, get the confidence, class id, bounding box params
# and ignore weak detections (confidence < 0.5)
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > conf_threshold:
            center_x = int(detection[0] * Width)
            center_y = int(detection[1] * Height)
            w = int(detection[2] * Width)
            h = int(detection[3] * Height)
            x = center_x - w / 2
            y = center_y - h / 2
            class_ids.append(class_id)
            confidences.append(float(confidence))
            boxes.append([x, y, w, h])


In [None]:
print("Classes found: " + str(class_ids))
print("Classes names: " + str([classes[i] for i in class_ids]) )
print("Classes confidence: " + str(confidences))


### Drawing bounding boxes
The `NMSBoxes()` method applies non-maximum suppression: among boxes that overlap heavily, it keeps the one with the highest confidence and discards the rest. The remaining boxes are then drawn on the image.

The function `draw_bounding_box()` uses the `rectangle()` and the `putText()` methods.
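
To show the idea behind non-maximum suppression, here is a minimal pure-Python sketch of a greedy version based on intersection over union (IoU). It is an illustration only and may differ in detail from what `cv2.dnn.NMSBoxes()` does internally; `iou` and `greedy_nms` are hypothetical helper names, and the sample boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold, iou_threshold):
    """Return indices of the boxes kept, highest score first."""
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a better one
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


sample_boxes = [[100, 100, 200, 200], [110, 110, 200, 200], [400, 400, 50, 50]]
sample_scores = [0.9, 0.8, 0.7]
kept_indices = greedy_nms(sample_boxes, sample_scores, 0.5, 0.4)
print(kept_indices)
```

In this sample, the first two boxes overlap almost entirely, so only the higher-scoring one survives along with the separate third box.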

In [None]:
# apply non-max suppression
indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)

# go through the detections remaining
# after nms and draw bounding box
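# np.array(...).flatten() handles both the nested and flat index formats
# returned by different OpenCV versions
for i in np.array(indices).flatten():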
    box = boxes[i]
    x = box[0]
    y = box[1]
    w = box[2]
    h = box[3]
    
    draw_bounding_box(img, class_ids[i], confidences[i], round(x), round(y), round(x+w), round(y+h))

# save output image to disk
cv2.imwrite("object-detection.jpg", img)


### Display result

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 10))
img = plt.imread('object-detection.jpg')
plt.axis('off')
plt.title('image')
plt.imshow(img)