# Tutorial 08 - Object Detection with YOLO

## Setup DevCube GPU

Before you can start, you have to find a GPU on the system that is not heavily used by other users. Otherwise you cannot initialize your neural network.


**Hint:** the command is **nvidia-smi**, just in case it is displayed above in two lines because of a line break.

As a result, you get a summary of the GPUs available in the system, their current memory usage (in MiB, i.e. mebibytes), and their current utilization (in %). There should be six or eight GPUs listed, numbered 0 to n-1 (n being the number of GPUs). The GPU number (id) appears at the start of each GPU section, and the numbers increase by 1 from top to bottom.

Find a GPU where the memory usage is low. For this purpose look at the memory usage, which looks something like '365MiB / 16125MiB'. The first value is the already used up memory and the second value is the total memory of the GPU. Look for a GPU where there is a large difference between the first and the second value.
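If you prefer not to compare the values by eye, the same comparison can be scripted. The following is a minimal sketch and not part of the original notebook: it assumes that `nvidia-smi` is called with its `--query-gpu` option to obtain a machine-readable list, and it picks the GPU with the most unused memory. The helper names `parse_gpu_memory`, `pick_free_gpu`, and `query_nvidia_smi` are made up for this illustration, and the sample values are invented.

```python
import subprocess

def parse_gpu_memory(csv_text):
    """Parse lines of 'index, used MiB, total MiB' into (id, used, total) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(','))
        gpus.append((index, used, total))
    return gpus

def pick_free_gpu(gpus):
    """Return the id of the GPU with the largest amount of unused memory."""
    return max(gpus, key=lambda g: g[2] - g[1])[0]

def query_nvidia_smi():
    """Run nvidia-smi and return its CSV output (requires an NVIDIA driver)."""
    cmd = ['nvidia-smi',
           '--query-gpu=index,memory.used,memory.total',
           '--format=csv,noheader,nounits']
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Made-up values in the same format that the query above would print
sample = "0, 365, 16125\n1, 15800, 16125\n2, 1, 16125\n"
print(pick_free_gpu(parse_gpu_memory(sample)))  # prints 2
```

On the server itself, `pick_free_gpu(parse_gpu_memory(query_nvidia_smi()))` would give a candidate id. It is still worth checking the utilization column by hand, since this sketch only looks at memory.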

**Remember the GPU id and write it in the next line instead of the character X.**

In [2]:
# Change X to the GPU number you want to use,
# otherwise you will get a Python error
# e.g. USE_GPU = 4
USE_GPU = 7

<font color=red>**YOLO needs a lot of GPU memory. If the notebook does not work on a certain GPU, then try a GPU with 16GB of memory.**</font>

In [4]:
!nvidia-smi

Tue Nov 30 12:25:49 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Quadro RTX 5000     On   | 00000000:01:00.0 Off |                  Off |
| 33%   30C    P8    17W / 230W |      1MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     On   | 00000000:24:00.0 Off |                  Off |
| 33%   28C    P8     7W / 230W |      1MiB / 16125MiB |      0%      Default |
|       

### Choose one GPU

**The following code is very important and must always be executed before using TensorFlow in the exercises, so that only one GPU is used and that it is set in a way that not all its memory is used at once. Otherwise, the other students will not be able to work with this GPU.**

The following program code imports the TensorFlow library for Deep Learning and outputs the version of the library.

Then, TensorFlow is configured to only see the one GPU whose number you wrote in the above cell (USE_GPU = X) instead of the X.

Finally, the GPU is set so that it does not immediately reserve all memory, but only uses more memory when needed. 

(The comments within the code cell explain a bit of what is happening if you are interested in understanding it better. See also the TensorFlow documentation for an explanation of the methods used.)

In [5]:
# Import TensorFlow 
import tensorflow as tf

# Print the installed TensorFlow version
print(f'TensorFlow version: {tf.__version__}\n')

# Get all GPU devices on this server
gpu_devices = tf.config.list_physical_devices('GPU')

# Print the name and the type of all GPU devices
print('Available GPU Devices:')
for gpu in gpu_devices:
    print(' ', gpu.name, gpu.device_type)
    
# Set only the GPU specified as USE_GPU to be visible
tf.config.set_visible_devices(gpu_devices[USE_GPU], 'GPU')

# Get all visible GPU  devices on this server
visible_devices = tf.config.get_visible_devices('GPU')

# Print the name and the type of all visible GPU devices
print('\nVisible GPU Devices:')
for gpu in visible_devices:
    print(' ', gpu.name, gpu.device_type)
    
# Set the visible device(s) to not allocate all available memory at once,
# but rather let the memory grow whenever needed
for gpu in visible_devices:
    tf.config.experimental.set_memory_growth(gpu, True)

TensorFlow version: 2.3.0

Available GPU Devices:
  /physical_device:GPU:0 GPU
  /physical_device:GPU:1 GPU
  /physical_device:GPU:2 GPU
  /physical_device:GPU:3 GPU
  /physical_device:GPU:4 GPU
  /physical_device:GPU:5 GPU
  /physical_device:GPU:6 GPU
  /physical_device:GPU:7 GPU

Visible GPU Devices:
  /physical_device:GPU:7 GPU


## Tutorial object detection with YOLO

This tutorial will introduce you to object detection using the very powerful YOLO model. Many of the ideas in this notebook are described in the two YOLO papers: [Redmon et al., 2016](https://arxiv.org/abs/1506.02640) and [Redmon and Farhadi, 2016](https://arxiv.org/abs/1612.08242). 

### LEARNING OBJECTIVES:

- Use object detection on a car detection dataset

- Recognizing Multiple Images with YOLO Darknet

## 01 - Task Description

You are working on a self-driving car. As a critical component of this project, you'd like to first build a car detection system. To collect data, you've mounted a camera to the hood (meaning the front) of the car, which takes pictures of the road ahead every few seconds while you drive around. You've gathered all these images into a folder and have labelled them by drawing bounding boxes around every car you found. Here's an example of what your bounding boxes look like.

<center>
<img src="images/box_label.png" style="width:500px;height:250;">
</center>
<caption><center>  **Figure 1** : **Definition of a box**<br> </center></caption>

If you have 80 classes that you want the object detector to recognize, you can represent the class label $c$ either as an integer from 1 to 80, or as an 80-dimensional vector (with 80 numbers) one component of which is 1 and the rest of which are 0.  
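The two representations of the class label can be converted into each other. The following is a small sketch in plain Python; the function name `one_hot` is chosen for this illustration only, and class numbers start at 1 as in the text above.

```python
def one_hot(class_id, num_classes=80):
    """Return a list with a 1 at the position of class_id (1-based) and 0 elsewhere."""
    if not 1 <= class_id <= num_classes:
        raise ValueError(f'class_id must be between 1 and {num_classes}')
    vector = [0] * num_classes
    vector[class_id - 1] = 1
    return vector

c = one_hot(3)
print(len(c), sum(c), c[:5])  # prints: 80 1 [0, 0, 1, 0, 0]
```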

In this exercise, you will learn how "You Only Look Once" (YOLO) performs object detection, and then apply it to car detection. Because the YOLO model is very computationally expensive to train, we will load pre-trained weights for you to use. 

## 02 - Intro YOLO

"You Only Look Once" (YOLO) is a popular algorithm because it achieves high accuracy while also being able to run in real-time. This algorithm "only looks once" at the image in the sense that it requires only one forward propagation pass through the network to make predictions. After non-max suppression, it then outputs recognized objects together with the bounding boxes.

### Model details

#### Inputs and outputs
- The **input** is a batch of $m$ images with the shape (m, 608, 608, 3); each individual image has the shape (608, 608, 3).
- The **output** is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers $(p_c, b_x, b_y, b_h, b_w, c)$ as explained above. If you expand $c$ into an 80-dimensional vector, each bounding box is then represented by 85 numbers. 
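The non-max suppression step mentioned in the introduction reduces the many overlapping candidate boxes to the final output list. The following is a simplified, greedy sketch of the idea in plain Python, not the implementation used by any particular YOLO library. It assumes boxes given as corner coordinates (x1, y1, x2, y2); the function names and both threshold values are illustrative choices.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Return the indices of the boxes that survive, best score first."""
    # Discard low-confidence boxes, then visit the rest from best to worst
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a better one
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept

boxes = [(0.10, 0.10, 0.50, 0.50),   # strong detection
         (0.12, 0.11, 0.52, 0.50),   # near-duplicate of the first box
         (0.60, 0.60, 0.90, 0.90),   # separate object
         (0.00, 0.00, 0.20, 0.20)]   # score too low
scores = [0.90, 0.80, 0.75, 0.30]
print(non_max_suppression(boxes, scores))  # prints [0, 2]
```

The near-duplicate box is dropped because it overlaps the best box almost completely, and the last box is dropped because its score is below the score threshold.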

#### Anchor Boxes
* Anchor boxes are chosen by exploring the training data to choose reasonable height/width ratios that represent the different classes.  For the example below, 5 anchor boxes were chosen  (to cover the 80 classes), and stored in the file './model_data/yolo_anchors.txt'
* The dimension for anchor boxes is the second to last dimension in the encoding: $(m, n_H,n_W,anchors,classes)$.
* The YOLO architecture is: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).  


#### Encoding
Let's look in greater detail at what this encoding represents. 
<center> 
<img src="images/architecture.png" style="width:800px;height:500;">
     </center>
<caption><center> <u> **Figure 2** </u>: **Encoding architecture for YOLO**<br> </center></caption>

If the center/midpoint of an object falls into a grid cell, that grid cell is responsible for detecting that object.

Since we are using 5 anchor boxes, each of the 19x19 cells thus encodes information about 5 boxes. Anchor boxes are defined only by their width and height.

For simplicity, we will flatten the last two dimensions of the shape (19, 19, 5, 85) encoding. So the output of the Deep CNN is (19, 19, 425).

<center>
<img src="images/flatten.png" style="width:700px;height:400;">
</center>
<caption><center> <u> **Figure 3** </u>: **Flattening the last two dimensions**<br> </center></caption>
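To make the flattening concrete, here is a minimal sketch in plain Python. It assumes the usual row-major layout of a reshape, where the 85 values of the first anchor box come first, followed by the 85 values of the second anchor box, and so on. It also assumes that $p_c$ is the first of the 85 values, following the order given above. The names `flat_index` and `unflatten` are invented for this illustration.

```python
NUM_ANCHORS = 5
VALUES_PER_BOX = 85  # p_c, four box coordinates, 80 class probabilities

def flat_index(anchor, component):
    """Position of one value of one anchor box in the flattened 425-vector."""
    return anchor * VALUES_PER_BOX + component

def unflatten(cell_vector):
    """Split a flat list of 425 values back into 5 lists of 85 values."""
    return [cell_vector[a * VALUES_PER_BOX:(a + 1) * VALUES_PER_BOX]
            for a in range(NUM_ANCHORS)]

cell = list(range(NUM_ANCHORS * VALUES_PER_BOX))  # stand-in for one grid cell
print(len(cell))              # 425
print(flat_index(2, 0))       # p_c of the third anchor box: 170
print(unflatten(cell)[2][0])  # the same value recovered: 170
```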

#### Class score

Now, for each box (of each cell) we will compute the following element-wise product and extract a probability that the box contains a certain class.  
The class score is $score_{c,i} = p_{c} \times c_{i}$: the probability that there is an object $p_{c}$ times the probability that the object is a certain class $c_{i}$.

<center>
<img src="images/probability_extraction.png" style="width:700px;height:400;">
</center>
<caption><center> <u> **Figure 4** </u>: **Find the class detected by each box**<br> </center></caption>

##### Example of figure 4
* In figure 4, let's say for box 1 (cell 1), the probability that an object exists is $p_{1}=0.60$.  So there's a 60% chance that an object exists in box 1 (cell 1).  
* The probability that the object is the class "category 3 (a car)" is $c_{3}=0.73$.  
* The score for box 1 and for category "3" is $score_{1,3}=0.60 \times 0.73 = 0.44$.  
* Let's say we calculate the score for all 80 classes in box 1, and find that the score for the car class (class 3) is the maximum.  So we'll assign the score 0.44 and class "3" to this box "1".
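The example above can be reproduced with a few lines of plain Python. This is only a sketch of the arithmetic: the function name `best_class` is made up, and apart from the values 0.60 and 0.73 taken from the example, the class probabilities are invented.

```python
def best_class(p_c, class_probs):
    """Return (class_number, score) with the highest score p_c * c_i.

    Class numbers start at 1, as in the text above.
    """
    scores = [p_c * c_i for c_i in class_probs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]

# 80 class probabilities; only class 3 (car) gets a high value in this example
class_probs = [0.27 / 79] * 80
class_probs[2] = 0.73

cls, score = best_class(0.60, class_probs)
print(cls, round(score, 2))  # prints: 3 0.44
```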

#### Visualizing classes
Here's one way to visualize what YOLO is predicting on an image:
- For each of the 19x19 grid cells, find the maximum of the probability scores (taking a max across the 80 classes, one maximum for each of the 5 anchor boxes).
- Color that grid cell according to what object that grid cell considers the most likely.

Doing this results in this picture: 

<center> 
<img src="images/proba_map.png" style="width:300px;height:300;"> 
</center>
<caption><center><u> **Figure 5** </u>: Each one of the 19x19 grid cells is colored according to which class has the largest predicted probability in that cell.<br></center></caption>  
  
  
Note that this visualization isn't a core part of the YOLO algorithm itself for making predictions; it's just a nice way of visualizing an intermediate result of the algorithm. 

#### Visualizing bounding boxes
Another way to visualize YOLO's output is to plot the bounding boxes that it outputs. Doing that results in a visualization like this:  

<center><img src="images/anchor_map.png" style="width:200px;height:200;"></center>
<caption><center>  **Figure 6** : Each cell gives you 5 boxes. In total, the model predicts: 19x19x5 = 1805 boxes just by looking once at the image (one forward pass through the network)! Different colors denote different classes. <br> </center></caption>

In [6]:
# Helper function to format an elapsed time in seconds as h:mm:ss.ss
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return f"{h}:{m:>02}:{s:>05.2f}"

## 03 - Performing Car Detection with YOLO v3



### Using YOLO in Python

To make use of YOLO in Python, you have several options:

* **[DarkNet](https://pjreddie.com/darknet/yolo/)** - The original implementation of YOLO, written in C.
* **[yolov3-tf2](https://github.com/zzh8829/yolov3-tf2)** - An unofficial Python package that implements YOLO in Python, using TensorFlow 2.0.



### Installing YoloV3-TF2

YoloV3-TF2 is not available directly through either PIP or CONDA. Therefore, you need to go through several steps to install it. This section describes the process of installing YoloV3-TF2. For a local install, you must perform these steps only once for your virtual Python environment. If you are installing locally, make sure to install to the same virtual environment that you created for this course. The following command installs YoloV3-TF2 directly from its GitHub repository.

In [7]:
import sys

!{sys.executable} -m pip install git+https://github.com/zzh8829/yolov3-tf2.git@master

Collecting git+https://github.com/zzh8829/yolov3-tf2.git@master
  Cloning https://github.com/zzh8829/yolov3-tf2.git (to revision master) to /tmp/pip-req-build-q7zuqijk
  Running command git clone -q https://github.com/zzh8829/yolov3-tf2.git /tmp/pip-req-build-q7zuqijk
Building wheels for collected packages: yolov3-tf2
  Building wheel for yolov3-tf2 (setup.py) ... done
  Created wheel for yolov3-tf2: filename=yolov3_tf2-0.1-py3-none-any.whl size=9187 sha256=e4149f679ab1acc30be78816b952c5ea97fcb49a8cd9ef7634b60067b6564fa4
  Stored in directory: /tmp/pip-ephem-wheel-cache-6bolfhu7/wheels/dc/40/57/f6ce9c0aa58da78f10d29a11476132dbf0a616bb92826be28f
Successfully built yolov3-tf2
Installing collected packages: yolov3-tf2
Successfully installed yolov3-tf2-0.1
You should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.


To use YoloV3-TF2, we need the following files:

* **yolov3.weights** - These are the pre-trained weights provided by the author of YOLO.
* **coco.names** - The names of the 80 items that the **yolov3.weights** neural network was trained to recognize.
* **yolov3.tf** - The YOLO weights converted to a format that TensorFlow can use directly.

These are located at the usual location **'coursematerial/GIS/YOLO'**. (You do **not** need to copy them into your working directory.)

Researchers have trained YOLO on a variety of different computer image datasets.  The version of YOLO weights used in this course is from the dataset Common Objects in Context (COCO). [[Cite: lin2014microsoft]](https://arxiv.org/abs/1405.0312) This dataset contains images labeled into 80 different classes. 

YOLO was also adapted for mobile devices by creating the YOLO Tiny pre-trained weights that use a much smaller convolutional neural network and still achieve acceptable levels of quality.  Though YoloV3-TF2 can work with either YOLO Tiny or regular YOLO we are not using the tiny weights for this tutorial.

In [8]:
from pathlib import Path
ROOT = str(Path.home()) + r'/coursematerial/GIS/YOLO/'

### Transferring Weights

In this tutorial, we use pre-trained weights for our YOLO network. It can take considerable time to train a YOLO network from scratch. If you would like to train a YOLO network to recognize objects other than the 80 COCO classes, you may need to train your own YOLO weights. If training from scratch is something you need to do, there is further information on this at the YoloV3-TF2 GitHub repository.

The weights provided by the original authors of YOLO are not directly compatible with TensorFlow. Because of this, the provided YOLO weights have been converted into a TensorFlow-compatible format.

Once the YOLO weights have been converted to a TensorFlow format, the conversion script is no longer needed. Because the converted weights are already provided with the course material, the following cell only defines the paths to the class names and the converted weights.

In [9]:
import os
filename_classes =  os.path.join(ROOT,'coco.names')
filename_converted_weights = os.path.join(ROOT,'yolov3.tf')

Now that we have all of the files needed for YOLO, we are ready to use it to recognize components of an image.

### Running YOLO (YoloV3-TF2)

The YoloV3-TF2 library can easily integrate with Python applications. The initialization of the library consists of three steps. First, it is essential to import all of the needed packages for the library. Next, the Python program must define all of the YOLO configurations through the Abseil flags system (the `absl` package imported below). The flag system primarily works from the command line; however, it also allows configuration programmatically in an application. For this example, we configure the package programmatically. Finally, the application must scan the available devices so that it takes advantage of any GPUs. In this notebook, that step was already done in the GPU setup section at the beginning, so the following code performs the first two steps.

In [10]:
import time
from absl import app, flags, logging
from absl.flags import FLAGS
import cv2
import numpy as np
import tensorflow as tf
from yolov3_tf2.models import (YoloV3, YoloV3Tiny)
from yolov3_tf2.dataset import transform_images, load_tfrecord_dataset
from yolov3_tf2.utils import draw_outputs
import sys
from PIL import Image, ImageFile
import requests

# Flags are used to define several options for YOLO.
flags.DEFINE_string('classes', filename_classes, 'path to classes file')
flags.DEFINE_string('weights', filename_converted_weights, 'path to weights file')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_string('tfrecord', None, 'tfrecord instead of image')
flags.DEFINE_integer('num_classes', 80, 'number of classes in the model')
FLAGS([sys.argv[0]])

['/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py']

It is important to understand that absl flags can only be defined once. If you are going to classify more than one image, make sure that you do not define the flags a second time (for example, by re-running the cell above).

The following code initializes a YoloV3-TF2 classification object.  The weights are loaded, and the object is ready for use as the **yolo** variable.  It is not necessary to reload the weights and obtain a new **yolo** variable for each classification.  

In [None]:
# This example does not use the "Tiny version"
if FLAGS.tiny:
    yolo = YoloV3Tiny(classes=FLAGS.num_classes)
else:
    yolo = YoloV3(classes=FLAGS.num_classes)

# Load weights and classes
yolo.load_weights(FLAGS.weights).expect_partial()
print('weights loaded')

# Read one class name per line and close the file afterwards
with open(FLAGS.classes) as f:
    class_names = [c.strip() for c in f]
print('classes loaded')

Next, we obtain an image to classify. For this example, the program loads an image from the coursematerial folder. YoloV3-TF2 expects the image as a raw array of pixel values rather than as an encoded file. An image file, such as JPEG or PNG, is converted into this raw format by calling the TensorFlow **decode_image** function, which returns a tensor that can be converted to a NumPy array. YoloV3-TF2 can obtain images from other sources, so long as the program first decodes them to this raw format. The following code obtains the image in this format.

Images are provided in the **'coursematerial/GIS/YOLO/images'** folder that you can directly access.

In [None]:
image =  os.path.join(ROOT, r'images/0035.jpg')
content = tf.io.read_file(image)
img_raw = tf.image.decode_image(content, channels=3)

At this point, we can classify the image that was just loaded.  The program should preprocess the image so that it is the size expected by YoloV3-TF2.  Your program also sets the confidence threshold at this point.  Any sub-image recognized with confidence below this value is not returned by YOLO.

In [None]:
# Preprocess image
img = tf.expand_dims(img_raw, 0)
img = transform_images(img, FLAGS.size)

# Desired threshold (any sub-image below this confidence 
# level will be ignored.)
FLAGS.yolo_score_threshold = 0.5

# Recognize and report results
t1 = time.time()
boxes, scores, classes, nums = yolo(img)
t2 = time.time()
print(f"Prediction time: {hms_string(t2 - t1)}")

Note that the prediction for your first image takes the longest. The next images take much less time.

It is important to note that the **yolo** model instantiated here is a callable object, which means that it can fill the role of both an object and a function. Acting as a function, **yolo** returns three arrays named **boxes**, **scores**, and **classes** that are of the same length. The function returns all sub-images found with a score above the minimum threshold. Additionally, the **yolo** function returns an array named **nums**. The first element of the **nums** array specifies how many sub-images YOLO found to be above the score threshold.

* **boxes** - The bounding boxes for each of the sub-images detected in the image sent to YOLO.
* **scores** - The confidence for each of the sub-images detected.
* **classes** - An array index to the string class names for each of the items. These are COCO names such as "person" or "dog." 
* **nums** - The number of sub-images detected above the threshold.

Your program should use these values to perform whatever actions you wish as a result of the input image.  The following code simply displays the images detected above the threshold.

In [None]:
print('detections:')
for i in range(nums[0]):
    cls = class_names[int(classes[0][i])]
    score = np.array(scores[0][i])
    box = np.array(boxes[0][i])
    print(f"\t{cls}, {score}, {box}")
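For the car detection task, you may want to keep only the detections of one class. The following is a minimal sketch in plain Python; the helper name `select_class` is hypothetical, and the detections below are made-up stand-ins for the YOLO output of a single image. It assumes that the batch index has already been removed and that the values have been converted to plain Python lists.

```python
def select_class(boxes, scores, classes, num, class_names, wanted='car'):
    """Keep only the detections of one class for a single image.

    boxes, scores and classes are the per-image lists (batch index removed);
    num is the number of valid detections at the front of these lists.
    """
    selected = []
    for i in range(num):
        name = class_names[int(classes[i])]
        if name == wanted:
            selected.append((name, scores[i], boxes[i]))
    return selected

# Made-up detections standing in for the YOLO output of one image
class_names = ['person', 'bicycle', 'car']
boxes = [[0.1, 0.5, 0.3, 0.7], [0.4, 0.5, 0.6, 0.8], [0.0, 0.0, 0.0, 0.0]]
scores = [0.92, 0.81, 0.0]
classes = [2, 0, 0]

for name, score, box in select_class(boxes, scores, classes, 2, class_names):
    print(name, score, box)  # prints: car 0.92 [0.1, 0.5, 0.3, 0.7]
```

With the real outputs, the per-image lists could be obtained with calls such as `scores[0].numpy().tolist()`, and the count with `int(nums[0])`.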

YoloV3-TF2 includes a function named **draw_outputs** that allows the sub-image detections to be visualized. The following image shows the output of the draw_outputs function. You might have first seen YOLO demonstrated as an image with boxes and labels around the sub-images. A program can produce this output with the arrays returned by the **yolo** function.

In [None]:
# Display image using YOLO library's built in function
img = img_raw.numpy()
img = draw_outputs(img, (boxes, scores, classes, nums), class_names)
#cv2.imwrite(FLAGS.output, img) # Save the image
display(Image.fromarray(img, 'RGB')) # Display the image
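If you want to draw or crop the detections yourself, the box coordinates have to be converted to pixels. The sketch below assumes that each box is given as normalized corner coordinates (x1, y1, x2, y2) between 0 and 1, relative to the image width and height. This is an assumption about the output format that you should verify against the printed boxes above. The function name `to_pixel_box` and the example values are made up.

```python
def to_pixel_box(box, image_width, image_height):
    """Scale a normalized (x1, y1, x2, y2) box to integer pixel coordinates."""
    x1, y1, x2, y2 = box
    return (int(x1 * image_width), int(y1 * image_height),
            int(x2 * image_width), int(y2 * image_height))

# Made-up box on a 1280 x 720 image
print(to_pixel_box((0.25, 0.5, 0.75, 1.0), 1280, 720))  # prints (320, 360, 960, 720)
```

For the image loaded above, the height and width would come from `img_raw.shape[:2]`.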

Note that the **yolo** model returns everything as TensorFlow tensors whose first dimension (in our example) is always 1, as there is one image. That is the reason you have to use, e.g., classes[0] to get to your data.

In [None]:
print(f'classes: {classes.shape}')
print(f'boxes:   {boxes.shape}')
print(f'scores:  {scores.shape}')
print(f'nums:    {nums.shape}')

In [None]:
print(f'Classes:\n {classes[0]}')

In [None]:
print(f'Scores:\n {scores[0]}')

### Some ideas to play around with YOLO and practice:

1. Try a different image (maybe one with different objects).
2. Modify the code above (or write new code in the cells below) in such a way that a random image from the images location is selected and run the YOLO prediction.
3. Use a different score threshold. What happens?
4. Try out your own street image and detect the represented objects with YOLO.