##Training YOLO on a custom dataset.

We'll use the official YOLO-v4 implementation to identify the location of buses and trucks in images.



##1.Installing Darknet

First, pull the `darknet` repository from GitHub and compile it in the environment. Darknet is written in C and CUDA rather than Python, so it has to be built from source and is run from the command line instead of being imported like a PyTorch model.

In [20]:
import os
if not os.path.exists('darknet'):
  # 1. Pull the Git repo
  !git clone https://github.com/AlexeyAB/darknet
  %cd darknet

  # 2. Reconfigure the Makefile to build with OpenCV support
  !sed -i 's/OPENCV=0/OPENCV=1/' Makefile

  # !!! In case you don't have a GPU, comment out the three lines below !!!
  !sed -i 's/GPU=0/GPU=1/' Makefile
  !sed -i 's/CUDNN=0/CUDNN=1/' Makefile
  !sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile

  # 3. Compile the darknet source code
  !make

  # 4. Install the torch_snippets package
  %pip install -q torch_snippets

  # 5. Download and extract the dataset, then remove the archive to save space
  !wget --quiet https://www.dropbox.com/s/agmzwk95v96ihic/open-images-bus-trucks.tar.xz
  !tar -xf open-images-bus-trucks.tar.xz
  !rm open-images-bus-trucks.tar.xz

  # 6. Fetch the pretrained weights to make a sample prediction
  !wget --quiet https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights
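
The `sed` calls above flip build flags in the Makefile by hand. As a rough sketch, the same substitutions can be done in Python and switched on only when a GPU appears to be present. The helper name `set_makefile_flags` is hypothetical, and treating a visible `nvidia-smi` binary as proof of a usable GPU is an assumption, not something darknet itself checks.

```python
import re
import shutil

def set_makefile_flags(makefile_text, settings):
    """Return makefile_text with each `NAME=<number>` line set to the requested value."""
    for name, value in settings.items():
        # Anchor at the start of a line so only the flag definition itself is rewritten.
        pattern = rf"^{re.escape(name)}=\d+"
        makefile_text = re.sub(pattern, f"{name}={int(value)}", makefile_text, flags=re.MULTILINE)
    return makefile_text

# Assumption: finding `nvidia-smi` on the PATH is a reasonable proxy for having a GPU.
has_gpu = shutil.which("nvidia-smi") is not None
settings = {"OPENCV": 1, "GPU": has_gpu, "CUDNN": has_gpu, "CUDNN_HALF": has_gpu}

# Demonstrate on an in-memory sample instead of touching the real Makefile.
sample = "GPU=0\nCUDNN=0\nCUDNN_HALF=0\nOPENCV=0\n"
print(set_makefile_flags(sample, settings))
```

To apply it for real, one would read `Makefile`, pass its text through the function, and write the result back before running `make`.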

In [21]:
# 7. Test whether the installation is successful by running the following command
!./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/person.jpg

 CUDA-version: 12020 (12020), cuDNN: 8.9.6, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV version: 4.5.4
 0 : compute_capability = 750, cudnn_half = 1, GPU: Tesla T4 
net.optimized_memory = 0 
mini_batch = 1, batch = 8, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 Create CUDA-stream - 0 
 Create cudnn-handle 0 
conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   3 route  1 		                           ->  304 x 304 x  64 
   4 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   5 conv     32       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  32 0.379 BF
   6 conv     64       3 x 3/ 1    304 x 304 x  32 ->  304 x 304 x  64 3.407 BF
   7 Shortcut Layer: 4,  wt = 0, wn = 0, outputs: 304 x 304 x  64 0.006 BF
   8 conv   

##2.Setting up the dataset format

YOLO uses a fixed format for training. Once we store the images and labels in the required format, we can train on the dataset with a single command. So, let's learn about the files and folder structure needed for YOLO to train.

There are three important steps:

###2.1 Step 1

Create a text file at `data/obj.names` containing the class names, one per line, by running the following cell. (`%%writefile` is a cell magic that writes the rest of the notebook cell to the given path, here `data/obj.names`.)

In [22]:
%%writefile data/obj.names
bus
trucks

Overwriting data/obj.names
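
Outside a notebook, the same file can be produced with plain Python. The sketch below writes the class names and reads them back as an index-to-name mapping, on the understanding that darknet uses the zero-based line position as the class ID. The helper names are hypothetical, and the demo writes to a temporary folder so it does not touch the real `data/obj.names`.

```python
import tempfile
from pathlib import Path

def write_class_names(path, names):
    """Write one class name per line, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(names) + "\n")

def read_class_names(path):
    """Map each zero-based line index to its class name, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    non_empty = (line.strip() for line in lines if line.strip())
    return dict(enumerate(non_empty))

# Demo in a temporary folder; the names match the cell above.
with tempfile.TemporaryDirectory() as tmp:
    names_file = Path(tmp) / "data" / "obj.names"
    write_class_names(names_file, ["bus", "trucks"])
    print(read_class_names(names_file))
```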


###2.2 Step 2

Create a text file at `data/obj.data` that describes the dataset: the number of classes, the locations of the text files listing the training and validation image paths, the location of the file containing the object names, and the folder in which to save trained models.

In [23]:
%%writefile data/obj.data
classes = 2
train = data/train.txt
valid = data/val.txt
names = data/obj.names
backup = backup/

Overwriting data/obj.data
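
The file is a flat list of `key = value` pairs. A minimal sketch of reading it back into a dictionary can help when sanity-checking paths before training. `parse_data_file` is a hypothetical helper written for illustration, not part of darknet.

```python
def parse_data_file(text):
    """Parse `key = value` lines into a dict of strings, ignoring blanks and # comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options

# Same content as the cell above.
obj_data = """classes = 2
train = data/train.txt
valid = data/val.txt
names = data/obj.names
backup = backup/
"""

options = parse_data_file(obj_data)
print(options["classes"], options["train"], options["backup"])
```

All values come back as strings, so `classes` needs an explicit `int(...)` before it is compared with the number of lines in `data/obj.names`.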


###2.3 Step 3

Move all images and ground-truth text files to the `data/obj` folder. We will copy the images from the `open-images-bus-trucks` dataset to this folder along with their labels.

In [24]:
# Check the directory structure
!ls -R open-images-bus-trucks

Streaming output truncated to the last 5000 lines.
24aa737199e24579.jpg  66d8f759e1be2c7f.jpg  a98ca2ce4b63fc7f.jpg  ed3cb8048829a578.jpg
24b2059b792ee2cd.jpg  66da6fb135055e1d.jpg  a98e5109cc06a2db.jpg  ed3cd743fef9acc0.jpg
24b368990c14441a.jpg  66dc119f39718479.jpg  a99ad9cd9d49af8c.jpg  ed4102556a6ec2b6.jpg
24b4857ee3b13b9f.jpg  66de56e0b1cf6b80.jpg  a99d97524c1526b3.jpg  ed4cd8488a64f7dd.jpg
24b75c0ff5281493.jpg  66e7580e0884b39e.jpg  a9a46979bab4a968.jpg  ed4fdacc15e446ca.jpg
24b961e9ecb7b64d.jpg  66e7d5eeeea529b6.jpg  a9ad0d08b557bf50.jpg  ed50118d3669d005.jpg
24c0f633d7be7bed.jpg  66f6e6793898737c.jpg  a9b257f8365cd2e7.jpg  ed580415426f81b5.jpg
24c1e2e2dbd25394.jpg  66faac0779c9bbf4.jpg  a9b2c91225c14ab6.jpg  ed5d834f596f4e5e.jpg
24c3fafffaa55b26.jpg  670079e2f613c3bb.jpg  a9b72f64baa3866d.jpg  ed5dbd8b5f3c23ca.jpg
24c498ce015ae930.jpg  6703fde0c7ae8596.jpg  a9b92c90ed5b622d.jpg  ed62ade9944c9229.jpg
24c6002d892671f5.jpg  670920575be7e7b0.jpg  a9baad8ad1099c13.jpg 

In [25]:
!mkdir -p data/obj
!cp -r open-images-bus-trucks/images/* data/obj/
!cp -r open-images-bus-trucks/yolo_labels/all/{train,val}.txt data/
!cp -r open-images-bus-trucks/yolo_labels/all/labels/*.txt data/obj/

Note that all the training and validation images are in the same `data/obj` folder. We also copy the label text files to the same folder. Each file that contains the ground truth for an image shares the same name as the image, with a `.txt` extension.
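
To make the pairing concrete, here is a small sketch of how a label file relates to its image. It assumes the usual darknet label layout of one object per line, written as `<class_id> <x_center> <y_center> <width> <height>` with coordinates normalized by the image size. The dataset's `yolo_labels` folder already ships in that form, so this conversion is illustrative only, and the helper names and the pixel box below are made up.

```python
from pathlib import Path

def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box into a normalized YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

def label_path_for(image_path):
    """Return the label path that sits next to an image: same name, .txt extension."""
    return Path(image_path).with_suffix(".txt")

# Hypothetical box for class 0 in an 800x600 image.
print(to_yolo_line(0, 100, 200, 300, 400, 800, 600))
print(label_path_for("data/obj/24aa737199e24579.jpg").name)
```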

##3.Configuring the architecture



In [26]:
# create a copy of the existing configuration and modify it in place
!cp cfg/yolov4-tiny-custom.cfg cfg/yolov4-tiny-bus-trucks.cfg

# max_batches to 4000 (since the dataset is small enough)
!sed -i 's/max_batches = 500200/max_batches=4000/' cfg/yolov4-tiny-bus-trucks.cfg

# number of sub-batches per batch
!sed -i 's/subdivisions=1/subdivisions=16/' cfg/yolov4-tiny-bus-trucks.cfg

# number of batches after which learning rate is decayed
!sed -i 's/steps=400000,450000/steps=3200,3600/' cfg/yolov4-tiny-bus-trucks.cfg

# number of classes is 2 as opposed to 80 (which is the number of COCO classes)
!sed -i 's/classes=80/classes=2/g' cfg/yolov4-tiny-bus-trucks.cfg

#in the classification and regression heads, change number of output convolution filters
#from 255 -> 21 and 57 -> 33, since we have fewer classes we don't need as many filters
!sed -i 's/filters=255/filters=21/g' cfg/yolov4-tiny-bus-trucks.cfg
!sed -i 's/filters=57/filters=33/g' cfg/yolov4-tiny-bus-trucks.cfg
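
Several of the numbers substituted above follow from simple arithmetic. The sketch below reproduces them under the conventions described in the darknet README: the convolution in front of each `[yolo]` layer uses `(classes + 5) * 3` filters (three anchors per scale, each predicting four box coordinates, one objectness score, and one score per class), and the learning-rate steps sit at 80% and 90% of `max_batches`. The function names are hypothetical, and `batch=64` in the mini-batch line is an assumed value, not one read from the cfg file.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters for the conv layer feeding a [yolo] layer: (classes + 5) per anchor."""
    return (num_classes + 5) * anchors_per_scale

def lr_decay_steps(max_batches, percents=(80, 90)):
    """Batch indices at which the learning rate is decayed, as percentages of max_batches."""
    return [max_batches * p // 100 for p in percents]

def mini_batch_size(batch, subdivisions):
    """Images processed at once when a batch is split into sub-batches."""
    return batch // subdivisions

print(yolo_head_filters(80))      # COCO default
print(yolo_head_filters(2))       # bus and truck
print(lr_decay_steps(4000))       # matches steps=3200,3600 above
print(mini_batch_size(64, 16))    # batch=64 is an assumption
```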

##4.Training and testing the model

We will download the pretrained weights (`yolov4-tiny.conv.29`) from the following GitHub location and store a copy in `build/darknet/x64`.

In [27]:
!wget --quiet https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29
!cp yolov4-tiny.conv.29 build/darknet/x64/

Finally, we will train the model using the following line.

In [28]:
!./darknet detector train data/obj.data cfg/yolov4-tiny-bus-trucks.cfg yolov4-tiny.conv.29 -dont_show -mapLastAt


Streaming output truncated to the last 5000 lines.
 total_bbox = 312387, rewritten_bbox = 0.274019 % 
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 30 Avg (IOU: 0.859175), count: 4, class_loss = 0.447845, iou_loss = 0.045148, total_loss = 0.492993 
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 37 Avg (IOU: 0.808794), count: 1, class_loss = 0.368147, iou_loss = 1.123369, total_loss = 1.491516 
 total_bbox = 312392, rewritten_bbox = 0.274015 % 
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 30 Avg (IOU: 0.600583), count: 2, class_loss = 0.404588, iou_loss = 0.085781, total_loss = 0.490370 
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 37 Avg (IOU: 0.633851), count: 2, class_loss = 0.797693, iou_loss = 1.492771, total_loss = 2.290464 
 total_bbox = 312396, rewritten_bbox = 0.274011 % 
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 30 Avg (IOU: 0.832717), count: 4, class_los
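
The per-region lines in this log can be turned into numbers for plotting or monitoring. The following is a rough sketch that assumes the line layout shown in the output above. Other darknet versions or configurations may print different fields (or values such as `nan`), so the regular expression is an assumption rather than a stable format. Lines that do not match, such as a truncated one, are simply skipped.

```python
import re

# Pattern written against the log lines shown above; not an official format.
REGION_LINE = re.compile(
    r"Region (?P<region>\d+) Avg \(IOU: (?P<iou>[\d.]+)\), count: (?P<count>\d+), "
    r"class_loss = (?P<class_loss>[\d.]+), iou_loss = (?P<iou_loss>[\d.]+), "
    r"total_loss = (?P<total_loss>[\d.]+)"
)

def parse_region_lines(log_text):
    """Extract region index, object count, average IoU, and losses from a training log."""
    rows = []
    for match in REGION_LINE.finditer(log_text):
        fields = match.groupdict()
        row = {"region": int(fields["region"]), "count": int(fields["count"])}
        for key in ("iou", "class_loss", "iou_loss", "total_loss"):
            row[key] = float(fields[key])
        rows.append(row)
    return rows

log = (
    "v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 30 Avg (IOU: 0.859175), "
    "count: 4, class_loss = 0.447845, iou_loss = 0.045148, total_loss = 0.492993 \n"
    "v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 37 Avg (IOU: 0.808794), "
    "count: 1, class_loss = 0.368147, iou_loss = 1.123369, total_loss = 1.491516 \n"
)
for row in parse_region_lines(log):
    print(row["region"], row["iou"], row["total_loss"])
```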

The `-dont_show` flag skips showing intermediate prediction images, and `-mapLastAt` will periodically print the mean average precision on the validation data. Training may take one to two hours in total. The weights are periodically stored in the `backup/` folder set in `data/obj.data` and can be used after training for predictions, as in the following code, which makes predictions on new images.

In [30]:
from torch_snippets import Glob, stem, show, read
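# Upload your own images to this folder
image_paths = [str(f) for f in Glob('images-of-trucks-and-busses')]
for f in image_paths:
  # Run detection with the weights saved after 4000 batches; darknet writes the result to predictions.jpg
  !./darknet detector test data/obj.data cfg/yolov4-tiny-bus-trucks.cfg\
  backup/yolov4-tiny-bus-trucks_4000.weights {f}
  # Rename the output so the next image does not overwrite it
  !mv predictions.jpg {stem(f)}_pred.jpg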

for i in Glob('*_pred.jpg', silent=True):
  show(read(i, 1), sz=20)
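
The cell above hard-codes the checkpoint saved at batch 4000. If training is stopped early, a small helper can pick the most recent numbered checkpoint instead. This sketch assumes checkpoints are named `<cfg name>_<batch number>.weights`, as the `_4000` file above suggests. The helper name and the other file names in the demo list are hypothetical.

```python
import re
from pathlib import Path

def latest_checkpoint(filenames, prefix="yolov4-tiny-bus-trucks"):
    """Return the file with the highest batch number, or None if none match."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.weights$")
    numbered = []
    for name in filenames:
        match = pattern.match(Path(name).name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Tuples compare by batch number first, so max() picks the latest checkpoint.
    return max(numbered)[1] if numbered else None

# Hypothetical folder listing for illustration.
files = [
    "backup/yolov4-tiny-bus-trucks_1000.weights",
    "backup/yolov4-tiny-bus-trucks_4000.weights",
    "backup/yolov4-tiny-bus-trucks_last.weights",
]
print(latest_checkpoint(files))
```

In the notebook, the list could come from `glob.glob('backup/*.weights')`, and the returned path could replace the hard-coded weights file in the `darknet detector test` command.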