# DEEP LEARNING FOR COMPUTER VISION

This project focuses on object recognition. It has **two parts**: the first applies object detection to images, and the second applies it to videos.

## Business impact

**Part 1. Object detection using images:** Many companies need to "tag" certain material (PDF, JPG, or other formats) to enable digitalization, data collection, and analytics. Usually a person does this, tagging each object one by one. With object detection, a company could detect the objects automatically, store them in a dataframe, and build the database it needs. Below is a prototype that would need to be tailored to each company's specific needs.

*Possible extension idea:* coupled with an image-to-text script, this could become a more powerful digitalization tool, because it would pick up the objects, their descriptions, and all other text.

**Part 2. Object detection using videos:** The biggest impact of object detection in videos is on privacy, security, and the removal of the human-error factor. A company could feed all its videos to the script and tag which objects appear and for how long, then analyse the collected data in the form of a relational database. This can be used to build an automated monitoring and notification system, which also removes the need to spend time watching the videos; for example, filters could raise an alert when a certain object appears. Such a system would offer more privacy, because no person would constantly monitor the videos. It would also be more secure, because people can make mistakes or miss something important, while the object detection script would catch it (with high confidence) and feed it to a database.

*Possible extension idea:* the idea could be extended to real-time monitoring systems. Take, for example, the entry/exit system of a building: we could train the model to catch dangerous objects such as weapons and raise an alert when one appears.

## Description

* Used the YOLO (You Only Look Once) algorithm. YOLO is a deep learning object detection algorithm that is popular because it is very fast: it detects objects in a single pass of the network. There are three frameworks for using it: Darknet (written in C), Darkflow (Python, uses TensorFlow), and OpenCV.
* Used OpenCV because its `dnn` module can load Darknet models directly.
* Used the pre-trained YOLOv3-320 model, which detects 80 different classes. __[Download link.](https://pjreddie.com/darknet/yolo/)__
* The model was trained on the COCO (Common Objects in Context) dataset; the names of the 80 classes it predicts are listed in `coco.names`. __[Download link.](https://github.com/pjreddie/darknet/blob/master/data/coco.names)__



## Examples

Photos for the examples were obtained from morgueFile. __[MorgueFile](https://morguefile.com/)__ is a website database of free high-resolution digital stock photography for either corporate or public use.

Videos for the examples were obtained from Pixabay. __[Pixabay](https://pixabay.com/)__ offers images and videos, all released under the __[Creative Commons Zero (CC0) License](https://creativecommons.org/publicdomain/zero/1.0/)__.


### Inputs

| Pictures | Videos   |
|------|------|
|   __[car_crash_morgueFile.jpg](https://morguefile.com/p/1130766)__  | __[Bangkok.mp4](https://pixabay.com/videos/bangkok-thailand-city-asia-road-30949/)__|
|   __[traffic_morgueFile.jpg](https://morguefile.com/p/1128775)__  | __[New_York_City.mp4](https://pixabay.com/videos/new-york-city-manhattan-people-cars-1044/)__|


### Outputs

| Pictures | Videos   |
|------|------|
|   car_crash_morgueFile_objects_detected.jpg  | Bangkok_objects_detected.avi|
|   traffic_morgueFile_objects_detected.jpg  | New_York_City_objects_detected.avi|



## Part 1. Object detection using images

```python
# Import packages
import cv2  # OpenCV 3.4.2 or newer
import numpy as np
import pandas as pd
```

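We need the following files to run the algorithm:

* Weights file (`yolov3.weights`): the trained model
* Cfg file (`yolov3.cfg`): the network configuration
* Names file (`coco.names`): the class labels
* Image file: the image on which we will perform object detection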

```python
# Load the YOLO deep neural network with the weights and configuration we downloaded
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Put all class names in a list (one name per line in coco.names)
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

# Get the names of the output layers.
# getUnconnectedOutLayers() returns 1-based indices as an (N, 1) array in older
# OpenCV versions and as a flat (N,) array in newer ones; flatten() handles both.
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]

# Create rectangles of a different color for each class
np.random.seed(92)
colors = np.random.randint(0, 255, size=(len(classes), 3), dtype="uint8")
```

```python
# Load the image on which we want to perform object detection
img = cv2.imread("traffic_morgueFile.jpg")
if img is None:
    raise FileNotFoundError("traffic_morgueFile.jpg could not be read")

# Scale the picture to 30% of its original size and get its height and width
img = cv2.resize(img, None, fx=0.3, fy=0.3)
height, width, channels = img.shape
```

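A blob is the preprocessed 4-D input the network expects: `cv2.dnn.blobFromImage` resizes the image to the network's input size, scales pixel values to the range 0 to 1 (`1 / 255.0`), swaps the blue and red channels (`swapRB=True`, because OpenCV loads images as BGR), and arranges the result as (batch, channels, height, width). YOLOv3 is commonly run at three input sizes:

* smallest, 320×320: lower accuracy, higher speed
* middle, 416×416
* largest, 608×608: higher accuracy, lower speed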
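To make the preprocessing concrete, here is a rough NumPy-only sketch of what `cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), (0, 0, 0), swapRB=True, crop=False)` does. The function name `blob_from_image_sketch` is hypothetical, and it assumes nearest-neighbour resizing for simplicity, whereas OpenCV interpolates bilinearly by default, so individual pixel values will differ slightly; the output shape and channel order should match.

```python
import numpy as np

def blob_from_image_sketch(img_bgr, size=416):
    """Approximate blobFromImage with scale 1/255, no mean subtraction,
    swapRB=True and crop=False (assumption: nearest-neighbour resize)."""
    h, w = img_bgr.shape[:2]
    # Resize to size x size without keeping the aspect ratio (crop=False behaviour)
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = img_bgr[rows[:, None], cols[None, :]]
    # Scale 0..255 pixel values to 0..1
    scaled = resized.astype(np.float32) / 255.0
    # swapRB=True: reverse the channel axis, BGR -> RGB
    rgb = scaled[:, :, ::-1]
    # HWC -> CHW, then add the batch dimension: (1, 3, size, size)
    return np.ascontiguousarray(rgb.transpose(2, 0, 1))[np.newaxis, ...]

demo = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
print(blob_from_image_sketch(demo).shape)  # (1, 3, 416, 416)
```

A smaller `size` means fewer pixels for the network to process, which is where the speed/accuracy trade-off in the list above comes from.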

```python
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), (0, 0, 0), swapRB=True, crop=False)

net.setInput(blob)  # feed the blob into the network
outs = net.forward(output_layers)  # one array per output layer with all candidate detections
```
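`outs` holds one array per output layer. Assuming the standard `yolov3.cfg` (three detection scales with strides 32, 16 and 8, and three anchor boxes per grid cell), each array has one row per candidate box and 85 columns: 4 box values, 1 objectness score, and 80 COCO class scores. The arithmetic below, using a hypothetical helper `yolov3_output_rows`, shows why the 320×320 input is faster: it produces far fewer candidate boxes to score and filter.

```python
def yolov3_output_rows(input_size, anchors_per_scale=3):
    """Candidate boxes per YOLOv3 output layer for a square input (assumes the
    standard three-scale yolov3.cfg with strides 32, 16 and 8)."""
    if input_size % 32 != 0:
        raise ValueError("YOLOv3 input size must be a multiple of 32")
    # Each scale has a (input_size / stride) x (input_size / stride) grid
    return [(input_size // stride) ** 2 * anchors_per_scale for stride in (32, 16, 8)]

for size in (320, 416, 608):
    rows = yolov3_output_rows(size)
    print(size, rows, sum(rows))
# 320 [300, 1200, 4800] 6300
# 416 [507, 2028, 8112] 10647
# 608 [1083, 4332, 17328] 22743
```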

```python
# Set the minimum confidence for a detection to be kept, and the NMS threshold for noise reduction
CONFIDENCE_LEVEL = 0.5
NMS_THRESHOLD = 0.4

class_ids = []
confidence_levels_list = []
boxes = []

# For every output layer
for output in outs:
    # For every candidate detection
    for detection in output:
        scores = detection[5:]  # class scores follow the 4 box values and the objectness score
        class_id = np.argmax(scores)  # most likely class
        confidence = scores[class_id]  # score of that class

        if confidence > CONFIDENCE_LEVEL:
            # Object detected.
            # Box values are relative to the image size, so scale them by width and height
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)

            # Rectangle coordinates; (x, y) is the top-left corner (image y grows downwards)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)

            # boxes: rectangle coordinates of detected objects
            # class_ids: class IDs of detected objects
            # confidence_levels_list: confidence levels of detected objects
            boxes.append([x, y, w, h])
            confidence_levels_list.append(float(confidence))
            class_ids.append(class_id)
```
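The decoding step inside the loop can also be written as a small, testable function. This is a sketch, not part of the original notebook: `decode_detection` is a hypothetical helper, and it assumes the row layout the loop above relies on (`[center_x, center_y, w, h, objectness, class scores...]`, with box values relative to the image size).

```python
import numpy as np

def decode_detection(detection, img_width, img_height, conf_threshold=0.5):
    """Turn one YOLO output row into (x, y, w, h, class_id, confidence), or None
    if the best class score does not exceed conf_threshold."""
    scores = detection[5:]
    class_id = int(np.argmax(scores))
    confidence = float(scores[class_id])
    if confidence <= conf_threshold:
        return None
    # Scale relative values to pixels
    center_x = detection[0] * img_width
    center_y = detection[1] * img_height
    w = detection[2] * img_width
    h = detection[3] * img_height
    # (x, y) is the top-left corner of the box
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return x, y, int(w), int(h), class_id, confidence

row = np.array([0.5, 0.5, 0.2, 0.4, 0.9, 0.1, 0.8, 0.05])  # toy row with 3 classes
print(decode_detection(row, 1000, 500))  # (400, 150, 200, 200, 1, 0.8)
```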

```python
# Build a dataframe with every detected object and its confidence (before NMS, so duplicates remain)
class_name = [classes[class_id] for class_id in class_ids]
df = pd.DataFrame({'object': class_name, 'confidence': confidence_levels_list})
df
```

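|     | object | confidence |
|-----|--------|------------|
| 0   | car    | 0.889183   |
| 1   | car    | 0.887948   |
| 2   | car    | 0.565972   |
| 3   | car    | 0.677397   |
| 4   | car    | 0.804338   |
| ... | ...    | ...        |
| 90  | car    | 0.993208   |
| 91  | car    | 0.983978   |
| 92  | car    | 0.992442   |
| 93  | car    | 0.973989   |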


```python
# Use non-maximum suppression (NMS) to remove noise (several boxes for the same object)
indexes = cv2.dnn.NMSBoxes(boxes, confidence_levels_list, CONFIDENCE_LEVEL, NMS_THRESHOLD)

# NMSBoxes may return an (N, 1) array, a flat array, or an empty tuple depending on
# the OpenCV version; np.array(...).flatten() handles all three
for i in np.array(indexes).flatten():
    x, y, w, h = boxes[i]
    text = str(classes[class_ids[i]]) + ': ' + str(round(confidence_levels_list[i], 3))
    color = [int(c) for c in colors[class_ids[i]]]
    cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
    cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

# Write the image to disk
cv2.imwrite("traffic_morgueFile_objects_detected.jpg", img)

# Or, to just view the image instead
# cv2.imshow("Image", img)
# cv2.waitKey(0)
# cv2.destroyAllWindows()
```

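`cv2.dnn.NMSBoxes` performs greedy non-maximum suppression. The sketch below shows the idea in plain NumPy: sort boxes by score, keep the best one, drop every remaining box whose intersection over union (IoU) with it exceeds the NMS threshold, and repeat. The helper names `iou` and `nms_sketch` are hypothetical, and the results are not guaranteed to match OpenCV's in every edge case. Like the `NMSBoxes` call above, it ignores class labels, so overlapping boxes of different classes also suppress each other.

```python
import numpy as np

def iou(box, others):
    """IoU between one [x, y, w, h] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / np.maximum(union, 1e-9)

def nms_sketch(boxes, scores, score_threshold, nms_threshold):
    """Return indices of the boxes kept by greedy, class-agnostic NMS."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    # Candidates above the score threshold, highest score first
    order = [i for i in np.argsort(-scores) if scores[i] > score_threshold]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(int(best))
        if not order:
            break
        overlaps = iou(boxes[best], boxes[order])
        # Drop the boxes that overlap the kept box too much
        order = [i for i, o in zip(order, overlaps) if o <= nms_threshold]
    return keep

demo_boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 50, 50]]
print(nms_sketch(demo_boxes, [0.9, 0.8, 0.7], 0.5, 0.4))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.92, so it is removed as a duplicate; the third box does not overlap and survives.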

## Part 2. Object detection using videos

```python
# Import packages (so that Part 2 can also run on its own)
import cv2
import numpy as np
```

```python
# Load the YOLO deep neural network with the weights and configuration we downloaded
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Put all class names in a list (one name per line in coco.names)
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

# Get the names of the output layers (flatten() handles both the (N, 1) and (N,)
# return shapes of getUnconnectedOutLayers() across OpenCV versions)
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]

# Create rectangles of a different color for each class
np.random.seed(92)
colors = np.random.randint(0, 255, size=(len(classes), 3), dtype="uint8")
```

```python
# Set the minimum confidence for a detection to be kept, and the NMS threshold for noise reduction
CONFIDENCE_LEVEL = 0.5
NMS_THRESHOLD = 0.4

# Open the input video; frame dimensions are read from the first frame
vs = cv2.VideoCapture("Bangkok.mp4")
fps = vs.get(cv2.CAP_PROP_FPS) or 30  # fall back to 30 fps if the file does not report it
writer = None
(W, H) = (None, None)

# For every frame of the video
while True:
    # Read the next frame
    (grabbed, frame) = vs.read()

    # grabbed is False once we reach the end of the video
    if not grabbed:
        break

    # If the frame dimensions are unknown, grab them
    if W is None or H is None:
        (H, W) = frame.shape[:2]

    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), (0, 0, 0), swapRB=True, crop=False)

    net.setInput(blob)  # feed the blob into the network
    outs = net.forward(output_layers)  # one array per output layer with all candidate detections

    boxes = []
    confidence_levels_list = []
    class_ids = []

    # For every output layer
    for output in outs:
        # For every candidate detection
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]

            if confidence > CONFIDENCE_LEVEL:
                # Scale the relative box values to pixel coordinates
                box = detection[0:4] * np.array([W, H, W, H])
                (center_x, center_y, width, height) = box.astype("int")

                # Rectangle coordinates; (x, y) is the top-left corner
                x = int(center_x - (width / 2))
                y = int(center_y - (height / 2))

                boxes.append([x, y, int(width), int(height)])
                confidence_levels_list.append(float(confidence))
                class_ids.append(class_id)

    # Use non-maximum suppression (NMS) to remove noise (several boxes for the same object)
    indexes = cv2.dnn.NMSBoxes(boxes, confidence_levels_list, CONFIDENCE_LEVEL, NMS_THRESHOLD)

    # For every detection kept by NMS (the loop is skipped when there are none)
    for i in np.array(indexes).flatten():
        x, y, w, h = boxes[i]

        # Draw the rectangle and the label on top of the frame
        text = str(classes[class_ids[i]]) + ': ' + str(round(confidence_levels_list[i], 3))
        color = [int(c) for c in colors[class_ids[i]]]
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        cv2.putText(frame, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    if writer is None:
        # Initialize the video writer with the input's frame rate and frame size
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter("Bangkok_objects_detected.avi", fourcc, fps, (W, H), True)

    # Write the output frame to disk
    writer.write(frame)

# Release the files (writer stays None if no frame could be read)
if writer is not None:
    writer.release()
vs.release()
```
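The loop above writes annotated frames but does not store the detections. The business section mentions tagging which objects appear and for how long; a minimal sketch of that step, assuming you collect a hypothetical `frame_labels` list inside the loop (for example by appending `[classes[class_ids[i]] for i in np.array(indexes).flatten()]` once per frame), converts it into seconds on screen using the video's frame rate. The helper `seconds_on_screen` is hypothetical and counts frames, not continuous intervals.

```python
from collections import defaultdict

def seconds_on_screen(frame_labels, fps):
    """Given one list of detected class names per frame, return how many seconds
    each class was visible (a frame counts once per class, however many boxes)."""
    frames_seen = defaultdict(int)
    for labels in frame_labels:
        # sorted() makes the dictionary order deterministic
        for name in sorted(set(labels)):
            frames_seen[name] += 1
    return {name: count / fps for name, count in frames_seen.items()}

frame_labels = [["car", "car", "person"], ["car"], [], ["person", "bus"]]
print(seconds_on_screen(frame_labels, fps=2))  # {'car': 1.0, 'person': 1.0, 'bus': 0.5}
```

The resulting dictionary maps naturally onto a table (object, seconds) that could be loaded into a relational database or a pandas DataFrame.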

## References

* __[YOLO website](https://pjreddie.com/darknet/yolo/)__
* __[Common Objects in Context](http://cocodataset.org/#overview)__
* __[YOLO object detection with OpenCV](https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv/)__, Author: Adrian Rosebrock
* __[Real-time Object Detection with YOLO, YOLOv2 and now YOLOv3](https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088)__, Author: Jonathan Hui
* __[Deep Learning based Object Detection using YOLOv3 with OpenCV ( Python / C++ )](https://www.learnopencv.com/deep-learning-based-object-detection-using-yolov3-with-opencv-python-c/)__, Author: Sunita Nayak
* __[YOLOv3: An Incremental Improvement](https://arxiv.org/abs/1804.02767)__, Authors: Joseph Redmon and Ali Farhadi