# 8 Deep Learning with Keras

## 8.5 YOLO: Object Detection

YOLO stands for You Only Look Once. This section analyzes version 3 (v3); v1 was published in 2016 by Joseph Redmon and co-authors.

The model receives an image and outputs axis-aligned bounding boxes (AABBs) around the objects it detects, each labeled with one of the classes it was trained on.

Prior object detection models applied a classifier to many regions and scales of the same image. YOLO applies a single network to the whole image once (hence, You Only Look Once). The network divides the image into regions and predicts bounding boxes and class probabilities for each region.

As a result, YOLO is much faster than those alternatives: the project page reports it to be more than 1000x faster than R-CNN and 100x faster than Fast R-CNN.
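
To make the single-pass grid prediction described above concrete, the following sketch counts how many boxes YOLO v3 predicts for one 416x416 input. It assumes the standard v3 configuration (output strides of 32, 16 and 8, three anchor boxes per grid cell, and the 80 COCO classes); the function name is illustrative and not part of the course code.

```python
def yolo_v3_output_summary(input_size=416, strides=(32, 16, 8),
                           anchors_per_cell=3, num_classes=80):
    """Count the boxes predicted in one forward pass (assumed v3 defaults)."""
    # Each box carries 4 coordinates, 1 objectness score and one score per class.
    values_per_box = 4 + 1 + num_classes
    # One square grid per output scale.
    grids = [input_size // s for s in strides]
    total_boxes = sum(g * g * anchors_per_cell for g in grids)
    return grids, values_per_box, total_boxes


grids, values_per_box, total_boxes = yolo_v3_output_summary()
print(grids)           # [13, 26, 52]
print(values_per_box)  # 85
print(total_boxes)     # 10647
```

All of these candidate boxes come out of a single network evaluation; the thresholds applied afterwards reduce them to the few detections that are drawn.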

The code can be downloaded from
https://pjreddie.com/darknet/yolo/

### COCO Dataset and YOLO Pre-trained Weights

- COCO Dataset from Microsoft: Common Objects in Context
    - https://cocodataset.org/
    - 80 different object categories
    - 1.5M object instances
    - The dataset can be explored online
- We use the pretrained YOLO model, because training it from scratch would take many hours
    - Download the YOLO v3 weights used in the course:
    https://drive.google.com/file/d/1yT2-zmNFymMgY42Z72LIuqMaiWvYEUQR/view?usp=sharing
    - The model is large: the weights file is about 200 MB

Code Source:
https://github.com/xiaochus/YOLOv3

YOLO is implemented in the Darknet framework by Joseph Redmon (the author of YOLO), but the demo code converts the model to Keras/TensorFlow. The folder structure of the repository must be preserved if the demo files are used.

In [2]:
import os
import time
import cv2
import numpy as np
from model.yolo_model import YOLO

In [3]:
# Function to prepare any image
def process_image(img):
    """Resize, normalize and add a batch dimension to the image.

    # Argument:
        img: original image.

    # Returns
        image: ndarray(1, 416, 416, 3), processed image.
    """
    image = cv2.resize(img, (416, 416),
                       interpolation=cv2.INTER_CUBIC)
    image = np.array(image, dtype='float32')
    image /= 255.
    image = np.expand_dims(image, axis=0)

    return image

In [4]:
# Get class names from file
def get_classes(file):
    """Get classes name.

    # Argument:
        file: classes name for database.

    # Returns
        class_names: List, classes name.

    """
    with open(file) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]

    return class_names
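
As a quick check of the expected file format (one class name per line), the sketch below runs the same reading logic on a temporary file. The three-line excerpt is a hypothetical stand-in for `data/coco_classes.txt`, which lists all 80 names.

```python
import os
import tempfile


def get_classes(file):
    """Read one class name per line (same logic as the function above)."""
    with open(file) as f:
        return [line.strip() for line in f.readlines()]


# Hypothetical three-line excerpt of a class file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("person\nbicycle\ncar\n")

class_names = get_classes(tmp.name)
os.remove(tmp.name)

print(class_names)     # ['person', 'bicycle', 'car']
print(class_names[1])  # bicycle
```

The position of a name in the list is what matters: the model reports integer class indices, and `draw` looks up the label with `all_classes[cl]`.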

In [5]:
# Function to draw boxes
def draw(image, boxes, scores, classes, all_classes):
    """Draw the boxes on the image.

    # Argument:
        image: original image.
        boxes: ndarray, boxes of objects.
        classes: ndarray, classes of objects.
        scores: ndarray, scores of objects.
        all_classes: all classes name.
    """
    for box, score, cl in zip(boxes, scores, classes):
        x, y, w, h = box

        # x is the horizontal (column) coordinate, y the vertical (row) one;
        # floor(v + 0.5) rounds to the nearest pixel.
        left = max(0, np.floor(x + 0.5).astype(int))
        top = max(0, np.floor(y + 0.5).astype(int))
        right = min(image.shape[1], np.floor(x + w + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y + h + 0.5).astype(int))

        cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(all_classes[cl], score),
                    (left, top - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 1,
                    cv2.LINE_AA)

        print('class: {0}, score: {1:.2f}'.format(all_classes[cl], score))
        print('box coordinate x,y,w,h: {0}'.format(box))

    print()
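
The corner arithmetic in `draw` can be isolated and checked without OpenCV. The sketch below is a pure-Python equivalent under the assumption that `x` is the horizontal coordinate and `y` the vertical one; `box_to_corners` and the image size are hypothetical and only serve the illustration.

```python
import math


def box_to_corners(box, image_width, image_height):
    """Convert (x, y, w, h) to clamped integer corners (left, top, right, bottom)."""
    x, y, w, h = box
    # floor(v + 0.5) rounds to the nearest integer, as in draw().
    left = max(0, math.floor(x + 0.5))
    top = max(0, math.floor(y + 0.5))
    # Clamp the far corner so the rectangle never leaves the image.
    right = min(image_width, math.floor(x + w + 0.5))
    bottom = min(image_height, math.floor(y + h + 0.5))
    return left, top, right, bottom


# Hypothetical 640x480 image; the box spills past the right and bottom edges.
print(box_to_corners((600.4, 450.6, 100.0, 50.0), 640, 480))
# (600, 451, 640, 480)
```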

In [6]:
# Inference: prepare image, call yolo.predict, draw results
def detect_image(image, yolo, all_classes):
    """Use yolo v3 to detect images.

    # Argument:
        image: original image.
        yolo: YOLO, yolo model.
        all_classes: all classes name.

    # Returns:
        image: processed image.
    """
    pimage = process_image(image)

    start = time.time()
    boxes, classes, scores = yolo.predict(pimage, image.shape)
    end = time.time()

    print('time: {0:.2f}s'.format(end - start))

    if boxes is not None:
        draw(image, boxes, scores, classes, all_classes)

    return image

In [7]:
# Detect objects frame-by-frame... it's slow
def detect_video(video, yolo, all_classes):
    """Use yolo v3 to detect video.

    # Argument:
        video: video file.
        yolo: YOLO, yolo model.
        all_classes: all classes name.
    """
    video_path = os.path.join("videos", "test", video)
    camera = cv2.VideoCapture(video_path)
    cv2.namedWindow("detection", cv2.WINDOW_AUTOSIZE)

    # Prepare for saving the detected video
    sz = (int(camera.get(cv2.CAP_PROP_FRAME_WIDTH)),
          int(camera.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fourcc = cv2.VideoWriter_fourcc(*'mpeg')

    vout = cv2.VideoWriter()
    vout.open(os.path.join("videos", "res", video), fourcc, 20, sz, True)

    while True:
        res, frame = camera.read()

        if not res:
            break

        image = detect_image(frame, yolo, all_classes)
        cv2.imshow("detection", image)

        # Save the video frame by frame
        vout.write(image)

        # Wait up to 110 ms for a key press; stop when Esc (key code 27) is pressed
        if cv2.waitKey(110) & 0xff == 27:
            break

    vout.release()
    camera.release()
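
The exit test in the loop relies on Python's operator precedence: `&` binds more tightly than `==`, so `key & 0xff == 27` is evaluated as `(key & 0xff) == 27`. The sketch below spells this out without OpenCV; `is_escape` is a hypothetical helper, and the integer arguments stand in for values returned by `cv2.waitKey`.

```python
ESC_KEY = 27  # ASCII code of the Escape key


def is_escape(key_code):
    """Return True if a waitKey-style return value encodes Escape."""
    # The mask keeps only the low 8 bits; parentheses are for readability only.
    return (key_code & 0xFF) == ESC_KEY


print(is_escape(27))        # True
print(is_escape(-1))        # False: -1 means no key was pressed
print(is_escape(0x10001B))  # True: bits above the low byte are discarded
```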

In [8]:
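# Two thresholds are passed to the model:
# 1) object threshold: minimum confidence needed to keep a detection
# 2) NMS threshold (assumed from the repository's constructor): maximum
#    overlap allowed between two kept boxes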
yolo = YOLO(0.6, 0.5)
file = 'data/coco_classes.txt'
all_classes = get_classes(file)
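
To illustrate what the two thresholds do, here is a minimal pure-Python sketch of score filtering followed by greedy non-maximum suppression (NMS). It assumes the second constructor argument is an IoU threshold for NMS; it is a generic version of the technique that ignores class labels, not the repository's implementation, and the function names and sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, obj_threshold=0.6, nms_threshold=0.5):
    """Drop low-score boxes, then greedily suppress overlapping ones."""
    # Step 1: keep candidates whose score reaches the object threshold,
    # ordered from highest to lowest score.
    order = sorted((i for i, s in enumerate(scores) if s >= obj_threshold),
                   key=lambda i: scores[i], reverse=True)
    kept = []
    # Step 2: accept a box only if it does not overlap an already kept box
    # by more than the NMS threshold.
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(10, 10, 100, 100), (15, 12, 100, 100), (300, 300, 50, 50), (0, 0, 20, 20)]
scores = [0.90, 0.75, 0.80, 0.30]
print(filter_detections(boxes, scores))  # [0, 2]
```

Box 3 is removed by the object threshold (0.30 < 0.6) and box 1 by NMS, because it overlaps the higher-scoring box 0 with an IoU of about 0.87.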

### Detecting Images

In [9]:
# It is quite slow, not realtime (90 seconds on my Mac with a large test image)
f = 'jingxiang-gao-489454-unsplash.jpg'
path = 'images/test/'+f
image = cv2.imread(path)
image = detect_image(image, yolo, all_classes)
cv2.imwrite('images/res/' + f, image)

time: 4.82s
class: person, score: 0.64
box coordinate x,y,w,h: [2523.89198542 1482.56196737  619.40865219 1302.60187244]
class: bicycle, score: 0.84
box coordinate x,y,w,h: [2877.69538164 2008.9804821  1303.69913578  717.74824047]



True

In [11]:
# # detect videos one at a time in videos/test folder    
# video = 'library1.mp4'
# detect_video(video, yolo, all_classes)