<a href="https://colab.research.google.com/github/rahiakela/building-computer-vision-applications-using-artificial-neural-networks/blob/master/6-deep-learning-in-object-detection/3_training_YOLOv3_model_for_object_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training a YOLOv3 Model for Object Detection

At the time of writing, YOLOv3 is the most recent of the object detection algorithms, and it has not yet made it into the TensorFlow Object Detection API. YOLOv3 uses the Darknet-53 architecture as its backbone network.

We will use the official API and the weights of a pre-trained model to train our YOLOv3 model by transfer learning on the same Oxford-IIIT Pet dataset that we used for the previous SSD model. We will run the training on Google Colab with a GPU hardware accelerator.

## Installing the Darknet Framework

Darknet is an open source neural network framework written in C and CUDA that runs on both CPUs and GPUs. First, clone the Darknet GitHub repository and then build the source.

In [1]:
%%shell

git clone https://github.com/ansarisam/darknet.git
# Official repository
#git clone https://github.com/pjreddie/darknet.git

Cloning into 'darknet'...
remote: Enumerating objects: 5912, done.
remote: Total 5912 (delta 0), reused 0 (delta 0), pack-reused 5912
Receiving objects: 100% (5912/5912), 6.34 MiB | 29.66 MiB/s, done.
Resolving deltas: 100% (3922/3922), done.




After the repository is cloned, expand the file browser, navigate to the darknet
directory, and download the Makefile to your local computer. Edit the Makefile
and set GPU=1 and OPENCV=1, as shown here:

```c
GPU=1
CUDNN=0
OPENCV=1
OPENMP=0
DEBUG=0
```

Make sure no other change is made to the Makefile, or you may have trouble
building your Darknet code.
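
If you would rather not download and re-upload the Makefile, the same two flags can be changed in place from a notebook cell. The following is a minimal sketch and not part of the original workflow: the helper name `set_makefile_flags` is hypothetical, and it assumes each flag sits at the start of a line in the form `NAME=value`, as in the listing above.

```python
import re

def set_makefile_flags(makefile_text, flags):
    """Return makefile_text with each `NAME=value` line updated from flags.

    Hypothetical helper: only lines that start with `NAME=` are rewritten;
    every other line is left untouched.
    """
    for name, value in flags.items():
        pattern = re.compile(r"^%s=.*$" % re.escape(name), flags=re.MULTILINE)
        makefile_text = pattern.sub("%s=%s" % (name, value), makefile_text)
    return makefile_text

# Example on an in-memory copy of the first lines of the Makefile.
sample = "GPU=0\nCUDNN=0\nOPENCV=0\nOPENMP=0\nDEBUG=0\n"
updated = set_makefile_flags(sample, {"GPU": 1, "OPENCV": 1})
print(updated)
```

To apply it to the real file, read `darknet/Makefile`, pass its text through the function, and write the result back before running `make`.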

Now we are ready to build the Darknet framework.

In [None]:
%%shell
# Running the make Command to Build Darknet
cd darknet/
make

After the build process completes successfully, run the following command to test your installation.

It should print `usage: ./darknet <function>` if the installation is successful.

In [3]:
%%shell
# Testing the Darknet Installation
cd darknet
./darknet

usage: ./darknet <function>




## Downloading Pre-trained Convolutional Weights

Let's download the convolutional weights of the Darknet-53 network, which
were pre-trained on the ImageNet dataset.

In [4]:
%%shell
# Downloading Pre-trained Darknet-53 Weights
mkdir pretrained
cd pretrained
wget https://pjreddie.com/media/files/darknet53.conv.74

--2020-12-01 08:59:58--  https://pjreddie.com/media/files/darknet53.conv.74
Resolving pjreddie.com (pjreddie.com)... 128.208.4.108
Connecting to pjreddie.com (pjreddie.com)|128.208.4.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 162482580 (155M) [application/octet-stream]
Saving to: ‘darknet53.conv.74’


2020-12-01 09:15:51 (167 KB/s) - ‘darknet53.conv.74’ saved [162482580/162482580]





## Downloading an Annotated Oxford-IIIT Pet Dataset

Let's download the pet dataset with both the images and annotations.

In [None]:
%%shell
# Downloading the Pet Dataset Images and Annotations

mkdir petdata
cd petdata

wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz

tar -xvf images.tar.gz
tar -xvf annotations.tar.gz

The images directory contains a few files with the extension .mat, which
cause the training to break, so we need to remove these .mat files.

In [6]:
%%shell
# Deleting the Invalid File Extension .mat

cd /content/petdata/images
rm *.mat



## Preparing the Dataset

The YOLOv3 training API expects the dataset to have a certain format and directory structure. The pet data that we downloaded has two subdirectories:

- **images** and 
- **annotations**. 

The images directory contains all the labeled images that we will use for
training and testing. The annotations directory contains annotation files in XML format, one XML file per image.

YOLOv3 expects the following files:

- **train.txt**: This file contains the absolute path of images—one image path
per line—that will be used for training.
- **test.txt**: This file contains the absolute path of images—one image path
per line—that will be used for testing.
- **class.data**: This file contains a list of names of the object classes—one
name per line.
- **labels**: This directory is in the same location where train.txt and test.txt are located. This labels directory contains annotation files, one file per image. The file name in this directory must be the same as the image file name, except that it has the extension .txt.

For example, if the image file name is `Abyssinian_1.jpg`, the annotation file name in the labels directory must be `Abyssinian_1.txt`. Each annotation text file must contain the annotated bounding box and object class in one single line in the following format:

```python
<object-class> <x_center> <y_center> <width> <height>
```

where
- `<object-class>` is the integer class index of the object, from 0 to (num_class-1).
- `<x_center>` and `<y_center>` are float values representing the center of
the bounding box, relative to the image width and height, respectively.
- `<width>` and `<height>` are the width and height of the bounding box, relative
to the image width and height, respectively.

Note that the entries in this file are separated by blank spaces and not
by commas or any other delimiters.

An example entry of the annotation text file is as follows:

```python
10 0.63 0.28500000000000003 0.28500000000000003 0.215
```
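
To make the format concrete, the sketch below converts a pixel bounding box into a label line and back again. The function names and the 500×400 image are illustrative assumptions, and the sketch leaves out the one-pixel offset that the conversion code in this notebook subtracts from the box center.

```python
def box_to_yolo(class_index, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel box to (class, x_center, y_center, width, height), all relative."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return (class_index, x_center, y_center, width, height)

def yolo_to_box(label_line, img_w, img_h):
    """Parse one space-separated label line back into a pixel box."""
    fields = label_line.split()
    class_index = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:5])
    xmin = (x_center - width / 2.0) * img_w
    ymin = (y_center - height / 2.0) * img_h
    xmax = (x_center + width / 2.0) * img_w
    ymax = (y_center + height / 2.0) * img_h
    return class_index, xmin, ymin, xmax, ymax

# Hypothetical 500x400 image with a box from (100, 50) to (300, 250).
label = box_to_yolo(10, 100, 50, 300, 250, img_w=500, img_h=400)
line = " ".join(str(v) for v in label)
print(line)
print(yolo_to_box(line, 500, 400))
```

Converting a few label files back to pixel boxes and comparing them with the XML annotations is a quick way to catch swapped or mis-normalized fields.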

In [8]:
import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

In [10]:
# Converting Image Annotations from XML to TXT
def xml_to_csv(path, img_path, label_path):
  if not os.path.exists(label_path):
    os.makedirs(label_path)

  class_list = []
  for xml_file in glob.glob(path + "/*.xml"):
    xml_list = []
    tree = ET.parse(xml_file)
    root = tree.getroot()
    for member in root.findall("object"):
      imagename = str(root.find("filename").text)
      print("image ", imagename)
      index = int(imagename.rfind("_"))
      print("index: ", index)
      classname = imagename[0:index]

      class_index = 0
      if (class_list.count(classname) > 0):
        class_index = class_list.index(classname)
      else:
        class_list.append(classname)
        class_index = class_list.index(classname)

      print("width: ", root.find("size").find("width").text)
      print("height: ", root.find("size").find("height").text)
      print("minx: ", member[4][0].text)
      print("ymin:", member[4][1].text)
      print("maxx: ", member[4][2].text)
      print("maxy: ", member[4][3].text)
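      # Image size, used to normalize the box coordinates to the range 0 to 1.
      w = float(root.find("size").find("width").text)
      h = float(root.find("size").find("height").text)
      dw = 1.0 / w
      dh = 1.0 / h
      # Box center in pixels; member[4] is the bndbox element (xmin, ymin, xmax, ymax).
      x = (float(member[4][0].text) + float(member[4][2].text)) / 2.0 - 1
      y = (float(member[4][1].text) + float(member[4][3].text)) / 2.0 - 1
      # Box width and height in pixels.
      w = float(member[4][2].text) - float(member[4][0].text)
      h = float(member[4][3].text) - float(member[4][1].text)
      # Normalize by the image width and height.
      x = x * dw
      w = w * dw
      y = y * dh
      h = h * dh

      # Label order is <object-class> <x_center> <y_center> <width> <height>.
      value = (class_index, x, y, w, h)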
      print("The line value is: ", value)
      print("csv file name: ", os.path.join(label_path, imagename.rsplit('.', 1)[0] + '.txt'))
      xml_list.append(value)
      
      df = pd.DataFrame(xml_list)
      df.to_csv(os.path.join(label_path, imagename.rsplit(".", 1)[0] + ".txt"), index=None, header=False, sep=" ")

  class_df = pd.DataFrame(class_list)

  return class_df

In [11]:
def create_training_and_test(image_dir, label_dir):
  file_list = []
  for img in glob.glob(image_dir + "/*"):
    print(os.path.abspath(img))

    imagefile = os.path.basename(img)
    textfile = imagefile.rsplit(".", 1)[0] + ".txt"

    if not os.path.isfile(label_dir + "/" + textfile):
      print("delete image file ", img)
      os.remove(img)
      continue
    file_list.append(os.path.abspath(img))

  file_df = pd.DataFrame(file_list)
  train = file_df.sample(frac=0.7, random_state=10)
  test = file_df.drop(train.index)
  train.to_csv("petdata/train.txt", index=None, header=False)
  test.to_csv("petdata/test.txt", index=None, header=False)

In [None]:
img_dir = "petdata/images"
label_dir = "petdata/labels"

xml_path = os.path.join(os.getcwd(), "petdata/annotations/xmls")
img_path = os.path.join(os.getcwd(), img_dir)
label_path = os.path.join(os.getcwd(), label_dir)

class_df = xml_to_csv(xml_path, img_path, label_path)
class_df.to_csv("petdata/class.data", index=None, header=False)
create_training_and_test(img_dir, label_dir)
print("Successfully converted xml to csv.")
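
Before training, it can be worth checking that the generated label files match the format described earlier. The sketch below is one possible check and is not part of the original notebook: the function name `check_label_line` is hypothetical, and treating any coordinate outside the range 0 to 1 as a problem is an assumption rather than a documented Darknet rule.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one label line (empty if it looks fine)."""
    problems = []
    fields = line.split()
    if len(fields) != 5:
        return ["expected 5 space-separated fields, found %d" % len(fields)]
    try:
        class_index = int(fields[0])
        coords = [float(v) for v in fields[1:]]
    except ValueError:
        return ["non-numeric field in line: %r" % line]
    if not 0 <= class_index < num_classes:
        problems.append("class index %d outside 0..%d" % (class_index, num_classes - 1))
    for name, value in zip(("x_center", "y_center", "width", "height"), coords):
        if not 0.0 <= value <= 1.0:
            problems.append("%s=%s is outside the range 0 to 1" % (name, value))
    return problems

print(check_label_line("10 0.63 0.285 0.285 0.215", num_classes=37))  # well-formed line
print(check_label_line("10,0.63,0.285,0.285,0.215", num_classes=37))  # comma-separated line
print(check_label_line("40 0.63 1.2 0.285 0.215", num_classes=37))    # bad class and y_center
```

Running the check over every file matched by `glob.glob("petdata/labels/*.txt")` would flag malformed labels before Darknet reads them.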

## Configuring the Training Input

We need a configuration file that has the path information for the training and test sets.

The format of the config file is as follows:

```python
classes= 37
train = /content/petdata/train.txt
valid = /content/petdata/test.txt
names = /content/petdata/class.data
backup = /content/yolov3_model
```

where the classes variable takes the number of object classes our training images have (37 pet classes in our example), the train and valid variables take the path to the training and validation lists that we created earlier, names takes the path to the file containing class names, and the backup variable points to the directory path where the trained YOLO model will be saved.

Save this text file and give it a name with a .cfg extension. In our case, we save this file as pet_input.cfg. We will then upload this file to Colab in the directory path /content/darknet/cfg.
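
As an alternative to writing the file by hand and uploading it, the config can be generated inside Colab. This is a minimal sketch under the assumption that the five keys shown above are all the file needs; the function name `render_data_config` is hypothetical.

```python
def render_data_config(num_classes, train_list, valid_list, names_file, backup_dir):
    """Build the text of the data config file in the format shown above."""
    lines = [
        "classes= %d" % num_classes,
        "train = %s" % train_list,
        "valid = %s" % valid_list,
        "names = %s" % names_file,
        "backup = %s" % backup_dir,
    ]
    return "\n".join(lines) + "\n"

config_text = render_data_config(
    num_classes=37,
    train_list="/content/petdata/train.txt",
    valid_list="/content/petdata/test.txt",
    names_file="/content/petdata/class.data",
    backup_dir="/content/yolov3_model",
)
print(config_text)

# To save it where the training command expects it (path taken from the text above):
# with open("/content/darknet/cfg/pet_input.cfg", "w") as f:
#     f.write(config_text)
```

The sketch does not create the backup directory, so it is probably safest to create `/content/yolov3_model` (for example with `os.makedirs`) before training starts.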

In [14]:
%%shell

# download the updated pet_input.cfg from GitHub
wget https://raw.githubusercontent.com/rahiakela/building-computer-vision-applications-using-artificial-neural-networks/master/6-deep-learning-in-object-detection/pet_input.cfg

# copy the downloaded file to /content/darknet/cfg
cp pet_input.cfg /content/darknet/cfg

--2020-12-01 10:49:42--  https://raw.githubusercontent.com/rahiakela/building-computer-vision-applications-using-artificial-neural-networks/master/6-deep-learning-in-object-detection/pet_input.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 148 [text/plain]
Saving to: ‘pet_input.cfg’


2020-12-01 10:49:42 (4.34 MB/s) - ‘pet_input.cfg’ saved [148/148]





## Configuring the Darknet Neural Network

Download the sample network config file `/content/darknet/cfg/yolov3-voc.cfg` from Colab and save it on your local computer. You may rename this file to something relevant to your dataset. For example, we have renamed it to `yolov3-pet.cfg` for this exercise.

We will edit this file to match our data. The most important part of the file that we are going to edit is the yolo layer.

Search for the section `[yolo]` in the config file. There should be three yolo layers. We will edit the number of object classes, which is 37 in our case. In all three places, we will change the number of classes to 37.

In addition, we will change the filters values in the convolutional layer just before the yolo layer in all three places. The value of filters in the convolutional layer before the yolo layer is determined by the following formula:

```python
filters = (num / 3) * (num_classes + 5)
filters = (9 / 3) * (37 + 5) = 126
```

Make sure you have changed the classes and filters values in all three places in the config file.
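
The filters formula can be checked with a few lines of Python. This is a small sketch with a hypothetical function name; the 5 in the formula covers the four box coordinates plus one objectness score predicted for each anchor.

```python
def yolo_filters(num_classes, num_anchors=9, num_yolo_layers=3):
    """filters = (num / 3) * (num_classes + 5) for the layer before each yolo layer."""
    anchors_per_layer = num_anchors // num_yolo_layers
    return anchors_per_layer * (num_classes + 5)

print(yolo_filters(37))  # 126 for the 37 pet classes
```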

Other parameters that we will edit are as follows:

- **width=416**, which is the width of the input image. All images will
be resized to this width.
- **height=416**, which is the height of the input image. All images will
be resized to this height.
- **batch=64**, which is the number of training images processed per
iteration. The weights are updated once per batch.
- **subdivisions=16**, which splits each batch into smaller mini-batches
of batch/subdivisions images (64/16 = 4 here) that are loaded into GPU
memory at one time. If you see an “out of memory” exception when you
execute the training, increase this number (for example, to 32 and then 64)
until you see no memory error.
- **max_batches=74000**, which indicates how many batches the training
should run. If you set it too high, the training may take a long time
to complete. If it is too low, the network will not learn enough.
As a rule of thumb, max_batches should be 2,000 times the number of
classes. In our case, we have 37 classes, so the max_batches value
should be 2,000×37 = 74,000. If you have only one class, set the
max_batches value to a minimum of 4,000.

Save the config file and then upload it to the cfg directory path:
`/content/darknet/cfg`.
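
The arithmetic behind these settings can be sketched as follows. The function names are hypothetical, and the sketch relies on Darknet processing each batch in chunks of batch/subdivisions images, so a larger subdivisions value means fewer images in GPU memory at once.

```python
def suggested_max_batches(num_classes, per_class=2000, minimum=4000):
    """Rule of thumb from the text: 2,000 batches per class, never fewer than 4,000."""
    return max(per_class * num_classes, minimum)

def images_per_forward_pass(batch, subdivisions):
    """Number of images held in GPU memory at once for a given batch and subdivisions."""
    if batch % subdivisions != 0:
        raise ValueError("batch should be divisible by subdivisions")
    return batch // subdivisions

print(suggested_max_batches(37))        # 37 pet classes
print(suggested_max_batches(1))         # single-class dataset
print(images_per_forward_pass(64, 16))  # values used in yolov3-pet.cfg
print(images_per_forward_pass(64, 32))  # a setting to try after a memory error
```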



In [20]:
%%shell

# download the updated yolov3-pet.cfg from GitHub
wget https://raw.githubusercontent.com/rahiakela/building-computer-vision-applications-using-artificial-neural-networks/master/6-deep-learning-in-object-detection/yolov3-pet.cfg

# copy the downloaded file to /content/darknet/cfg
cp yolov3-pet.cfg /content/darknet/cfg

--2020-12-01 11:22:47--  https://raw.githubusercontent.com/rahiakela/building-computer-vision-applications-using-artificial-neural-networks/master/6-deep-learning-in-object-detection/yolov3-pet.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8337 (8.1K) [text/plain]
Saving to: ‘yolov3-pet.cfg.2’


2020-12-01 11:22:47 (85.9 MB/s) - ‘yolov3-pet.cfg.2’ saved [8337/8337]





## Training a YOLOv3 Model

Now let's execute the YOLOv3 training. The parameters passed to the training command are the paths to `pet_input.cfg`, `yolov3-pet.cfg`, and the pre-trained Darknet-53 weights.

In [None]:
%%shell

cd darknet/
./darknet detector train cfg/pet_input.cfg cfg/yolov3-pet.cfg /content/pretrained/darknet53.conv.74

Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.540412, .5R: -nan, .75R: -nan,  count: 0
Region 94 Avg IOU: 0.054834, Class: 0.494438, Obj: 0.664120, No Obj: 0.513917, .5R: 0.000000, .75R: 0.000000,  count: 2
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.469831, .5R: -nan, .75R: -nan,  count: 0
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.538877, .5R: -nan, .75R: -nan,  count: 0
Region 94 Avg IOU: 0.251527, Class: 0.298048, Obj: 0.152079, No Obj: 0.512447, .5R: 0.000000, .75R: 0.000000,  count: 1
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.468161, .5R: -nan, .75R: -nan,  count: 0
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.541067, .5R: -nan, .75R: -nan,  count: 0
Region 94 Avg IOU: 0.313674, Class: 0.000000, Obj: 0.164733, No Obj: 0.512926, .5R: 0.000000, .75R: 0.000000,  count: 1
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.468786, .5R: -nan, .75R: -nan,  count: 0
Region 82 Avg IOU: -nan, Class: -nan,

The output above is truncated. When the training is completely done, the last three lines
of the output show the location where the checkpoints, intermediate weights, and final weights are saved.

You should copy the entire directory containing the final model to your private
Google Drive so that you can use the trained model in your applications.

While the training is running, the console prints a lot of information, all of which is displayed in the web browser. After a while, the browser tab may become unresponsive. Clearing the console output from time to time helps prevent the browser from freezing or crashing.

## How Long the Training Should Run

Typically the training should run for at least 2,000 iterations per class, but not less than 4,000 iterations in total. In our example with a pet dataset, we have 37 classes. That means we should set max_batches to 74000.

Observe the output while the training is going on, and notice the losses after each iteration. If the loss stabilizes and does not change over batches, we should consider stopping the training. Ideally, the loss should be close to zero. However, for most practical purposes, our goal should be to have losses stabilized below 0.05.
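
If the console output is saved to a log file, the stopping rule above can be checked automatically. The sketch below is only an illustration: it assumes Darknet prints a per-iteration summary line of the form `<iteration>: <loss>, <avg loss> avg, <rate> rate, ...`, the sample log lines are made up rather than taken from a real run, and the window and tolerance values are arbitrary choices.

```python
import re

# Assumed shape of a Darknet iteration summary line, for example:
# "120: 0.061000, 0.052000 avg, 0.001000 rate, 5.100000 seconds, 7680 images"
SUMMARY = re.compile(r"^\s*(\d+):\s*([\d.]+),\s*([\d.]+) avg")

def parse_avg_losses(log_lines):
    """Return a list of (iteration, avg_loss) pairs from lines that match SUMMARY."""
    results = []
    for line in log_lines:
        match = SUMMARY.match(line)
        if match:
            results.append((int(match.group(1)), float(match.group(3))))
    return results

def has_stabilized(avg_losses, window=3, tolerance=0.005, target=0.05):
    """True if the last `window` losses are below target and vary by less than tolerance."""
    if len(avg_losses) < window:
        return False
    recent = avg_losses[-window:]
    return max(recent) < target and (max(recent) - min(recent)) < tolerance

# Made-up log lines for illustration only; they are not output from a real run.
sample_log = [
    "Region 82 Avg IOU: 0.71, Class: 0.9, Obj: 0.8, No Obj: 0.01, .5R: 1.0, .75R: 0.5,  count: 2",
    "100: 0.410000, 0.390000 avg, 0.001000 rate, 5.200000 seconds, 6400 images",
    "101: 0.050000, 0.048000 avg, 0.001000 rate, 5.100000 seconds, 6464 images",
    "102: 0.047000, 0.047000 avg, 0.001000 rate, 5.100000 seconds, 6528 images",
    "103: 0.046000, 0.046000 avg, 0.001000 rate, 5.000000 seconds, 6592 images",
]
history = parse_avg_losses(sample_log)
print(history)
print(has_stabilized([loss for _, loss in history]))
```

In practice a much larger window (hundreds of iterations) would be more meaningful than the three used here.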

## Final Model

After the network finishes learning, the final YOLOv3 model will be saved in the directory `/content/yolov3_model`. The name of the model file will be `yolov3-pet_final.weights`.

Download this model or save it to your private Google Drive folder, because Google Colab deletes all your files when the session expires. We will use this model in object detection in real time, both in images and in videos.
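
One way to save the model directory is to copy it to a mounted Google Drive folder. The following is a minimal sketch: the helper name `backup_model_dir` and the Drive folder name are assumptions, and the helper replaces any earlier backup with the same name.

```python
import os
import shutil

def backup_model_dir(model_dir, backup_root):
    """Copy model_dir into backup_root and return the destination path."""
    destination = os.path.join(backup_root, os.path.basename(os.path.normpath(model_dir)))
    os.makedirs(backup_root, exist_ok=True)
    if os.path.exists(destination):
        shutil.rmtree(destination)  # replace an older backup of the same name
    shutil.copytree(model_dir, destination)
    return destination

# In Colab, Drive would first be mounted with:
#   from google.colab import drive
#   drive.mount("/content/drive")
# and the call would then look like this (the Drive folder name is an assumption):
#   backup_model_dir("/content/yolov3_model", "/content/drive/MyDrive/yolo_backups")
```

Because intermediate weights are written during training, the same call can also be made from another cell between runs so that a session timeout does not lose all progress.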