[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sergiodl-naist/yolov4-training-with-oidv6/blob/main/YOLOv4_OIDv6_Training.ipynb)

# Training YOLOv4 with Custom Dataset from Open Images Database v6 (OIDv6)

This Jupyter Notebook is based on [YOLOv4: A step-by-step guide for Custom Data Preparation with Code](https://techylem.com/yolov4-guide-with-code/), with additional information from [TRAIN A CUSTOM YOLOv4 OBJECT DETECTOR (Using Google Colab)](https://medium.com/analytics-vidhya/train-a-custom-yolov4-object-detector-using-google-colab-61a659d4868).

It also uses the [OIDv6 tool](https://github.com/DmitryRyumin/OIDv6) to download the dataset from Google's [Open Images Dataset](https://storage.googleapis.com/openimages/web/index.html). Its helper scripts are inspired by [OIDv4_ToolKit](https://github.com/ahsan44411/OIDv4_ToolKit).

Details about darknet customization options that work better on Google Colab are available at:

* [Darknet FAQ](https://www.ccoderun.ca/programming/darknet_faq/)
* [CFG Parameters in the [net] section](https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-%5Bnet%5D-section)

Information about YOLOv4-tiny training and when to stop training is available at [AlexeyAB/darknet](https://github.com/AlexeyAB/darknet#how-to-train-tiny-yolo-to-detect-your-custom-objects).

## Choose an environment

Choose the environment this notebook runs in: offline on your local machine or in Google Colab.

In [None]:
GOOGLE_COLAB_ENV = True
MODEL_TO_TRAIN = "yolov4-tiny" # (Only supported options: yolov4 or yolov4-tiny)

from os import path, getcwd
if GOOGLE_COLAB_ENV:
    CONTENT = "/content"
    DATASET = CONTENT + "/multidata"
    SCRIPTS = CONTENT + "/yolov4-training-with-oidv6"
    DARKNET = CONTENT + "/darknet"
else:
    CONTENT = path.realpath(getcwd())
    DATASET = CONTENT + "/multidata"
    SCRIPTS = CONTENT
    DARKNET = CONTENT + "/darknet"

CFG_FILE = ""
PRE_TRAINED_WEIGHTS = ""
PTW_FILENAME = ""
G_DRIVE_MOUNTPOINT = CONTENT + "/drive"  # drive.mount() needs an empty or non-existent directory
G_DRIVE_DATASETZIP = G_DRIVE_MOUNTPOINT + "/MyDrive/Training/Data/dataset.zip"

if MODEL_TO_TRAIN == "yolov4":
    CFG_FILE = DARKNET + "/cfg/yolov4-custom.cfg"
    PRE_TRAINED_WEIGHTS_URL = "https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137"
    PTW_FILENAME = "yolov4.conv.137"
elif MODEL_TO_TRAIN == "yolov4-tiny":
    CFG_FILE = DARKNET + "/cfg/yolov4-tiny-custom.cfg"
    PRE_TRAINED_WEIGHTS_URL = "https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29"
    PTW_FILENAME = "yolov4-tiny.conv.29"
else:
    raise ValueError("MODEL_TO_TRAIN must be 'yolov4' or 'yolov4-tiny', got: " + MODEL_TO_TRAIN)

## Clone this repository's tools

This repository contains a set of tools to easily customize darknet's YOLOv4 configuration. Please clone it first if you haven't already.

In [None]:
# You only need to clone the repository if you are in Google Colab
!git clone https://github.com/sergiodl-naist/yolov4-training-with-oidv6

The scripts will be available at `/content/yolov4-training-with-oidv6` if this notebook is run in Google's Colab.

## Prepare Custom Dataset

### Download your dataset

**This part should be done on your local development machine.  
If you want to use the dataset on Google Colab, you will later upload it to Google Drive, mount the drive, and copy the data from there.**

If you want to install oidv6 for your user account, outside of any virtual environment, run

`$ pip3 install --user oidv6`

However, **it is better to use a Python virtual environment** through `pipenv`, which isolates the installed packages to this directory.

If using `pipenv`, you only need to run `$ pipenv install -r path/to/requirements.txt` to install all Python dependencies. Jupyter Notebook for local sessions will be installed automatically too.

Create a `classes.txt` file where each line will be a class you want to download.

You can download the class names from https://storage.googleapis.com/openimages/web/download.html under the section "Annotations and metadata", row "Metadata", by pressing the button "Class Names".

Think about how many pictures for training, validation and testing you want.
Example: train - 300, validation - 75, test - 10.

Here, the 300 training images are 80% and the 75 validation images are 20% of a 375-image dataset per class.

Note that the number of images available per class in the OID varies. Some classes have fewer than 100 images.
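
As a quick check of this arithmetic, the split can be computed from the total number of images wanted per class. This is only a minimal sketch: `split_counts` and its `train_fraction` parameter are illustrative names and are not part of the OIDv6 tool.

```python
def split_counts(total_per_class, train_fraction=0.8):
    """Return (train, validation) image counts for one class."""
    train = round(total_per_class * train_fraction)
    validation = total_per_class - train
    return train, validation


train_count, validation_count = split_counts(375)
print(train_count, validation_count)  # 300 75
```

The two numbers can be used as the `--limit` values of the `train` and `validation` downloads.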

Download your dataset for training with

`$ oidv6 downloader en --type_data train --classes ./classes.txt --limit 300 --multi_classes`

the dataset for validation with

`$ oidv6 downloader en --type_data validation --classes ./classes.txt --limit 75 --multi_classes`

and the dataset for testing with

`$ oidv6 downloader en --type_data test --classes ./classes.txt --limit 10 --multi_classes`

Note that `--limit NN` specifies how many images to download for each class of the dataset.

`--multi_classes` puts all pictures in one directory (`train`, `validation` or `test`, according to the selected `--type_data`) and all annotation labels in one directory named `labels` inside your dataset directory. Without this option, each class's pictures are downloaded into their own directory, each with its own `labels` directory containing the annotation files.

**Some classes might not have any pictures in the OID.** A red warning is shown during the download to let you know about these classes.


### Adapt your Dataset to YOLOv4 format

The downloaded annotation files describe each bounding box in this format

`label_name x1 y1 x2 y2`

This is not the annotation format that YOLO expects, which is

`label_index box_center_x box_center_y box_width box_height`

where

**label_index**: is the index of the label inside `classes.txt`

**box_center_x**: is the x value* of the center of the bounding box

**box_center_y**: is the y value* of the center of the bounding box

**box_width**: is the width* of the bounding box

**box_height**: is the height* of the bounding box

\* These coordinates, width and height are floats in the range \[0, 1\], where 0 is the origin and 1 is the maximum width or height of the image.
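
The conversion between the two formats can be sketched as follows. This is a minimal illustration, not the code of `prepare_dataset.py`. It assumes that `x1 y1 x2 y2` are the left, top, right and bottom edges of the box in pixels and that the image size is already known. The function name `to_yolo_line` and its parameters are hypothetical.

```python
def to_yolo_line(label_name, x1, y1, x2, y2, img_width, img_height, classes):
    """Convert one corner-style box (assumed to be in pixels) to a YOLO label line."""
    label_index = classes.index(label_name)   # position of the label in classes.txt
    box_center_x = (x1 + x2) / 2 / img_width  # normalized to [0, 1]
    box_center_y = (y1 + y2) / 2 / img_height
    box_width = (x2 - x1) / img_width
    box_height = (y2 - y1) / img_height
    return (f"{label_index} {box_center_x:.6f} {box_center_y:.6f} "
            f"{box_width:.6f} {box_height:.6f}")


classes = ["Apple", "Banana"]  # example contents of classes.txt
print(to_yolo_line("Banana", 100, 50, 300, 250, 400, 500, classes))
# 1 0.500000 0.300000 0.500000 0.400000
```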

Also, the darknet tool will ask for a list of all the file paths of the images for training and validation.
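
Such a list is a plain text file with one image path per line. The sketch below shows one possible way to build it. The helper names and the extension filter are assumptions made for illustration, and the real script may work differently.

```python
from pathlib import Path


def list_images(image_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return the sorted absolute paths of the image files in image_dir."""
    return sorted(
        str(p.resolve())
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


def write_image_list(image_dir, list_file):
    """Write one image path per line to list_file and return the image count."""
    paths = list_images(image_dir)
    Path(list_file).write_text("\n".join(paths) + "\n")
    return len(paths)
```

A call such as `write_image_list("multidata/train", "multidata/train.txt")` would then produce the training list. Both paths are placeholders.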

All this is solved with the script `prepare_dataset.py`.

It will also generate the configuration file `objects.txt` needed for darknet.
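
A darknet data file is a short list of `key = value` lines naming the number of classes, the training and validation image lists, the class names file and a backup directory for the weights. The sketch below renders such a file. The function name and the paths in the usage example are placeholders, and the file that `prepare_dataset.py` actually writes may differ.

```python
def render_data_file(num_classes, train_list, valid_list, names_file, backup_dir):
    """Build the text of a darknet data file from its five entries."""
    lines = [
        f"classes = {num_classes}",
        f"train = {train_list}",
        f"valid = {valid_list}",
        f"names = {names_file}",
        f"backup = {backup_dir}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder paths, for illustration only
print(render_data_file(2, "multidata/train.txt", "multidata/validation.txt",
                       "classes.txt", "backup"))
```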

Run `$ python3 prepare_dataset.py`

Now, inside the OIDv6 directory, you will have a `multidata` directory with your dataset, a copy of `classes.txt` and the configuration file `objects.txt`.

Zip the 3 of them up into a `dataset.zip` file, and **upload it to your Google Drive** inside a directory structure like this: **`Training/Data/dataset.zip`**
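
If you prefer to build the archive from Python instead of a file manager or the `zip` command, the standard `zipfile` module can do it. This is a minimal sketch: `zip_dataset` is a hypothetical helper, and it assumes the three items sit directly inside `base_dir`.

```python
import zipfile
from pathlib import Path


def zip_dataset(base_dir, zip_path,
                items=("multidata", "classes.txt", "objects.txt")):
    """Pack the listed files and directories, relative to base_dir, into zip_path."""
    base = Path(base_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for item in items:
            source = base / item
            # A directory is added file by file, a single file is added as-is
            files = sorted(source.rglob("*")) if source.is_dir() else [source]
            for file in files:
                if file.is_file():
                    archive.write(file, str(file.relative_to(base)))
    return zip_path
```

Calling `zip_dataset(".", "dataset.zip")` from the directory that holds the three items would create the archive to upload.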


### Mount Google Drive

Let's mount your Google Drive into Colab's runtime (or unzip your data in the current directory if you are running this notebook off-line locally)

In [None]:
# Only needed on Google Colab
if GOOGLE_COLAB_ENV:
    from google.colab import drive
    drive.mount(G_DRIVE_MOUNTPOINT)

Unzip your dataset into Colab's environment filesystem

In [None]:
!unzip "$G_DRIVE_DATASETZIP" -d "$CONTENT"

## Prepare Darknet Tool

### Download darknet source code

Clone the darknet project. This is a framework/tool to train and customize several YOLO versions.

In [None]:
!git clone https://github.com/AlexeyAB/darknet

Modify the Makefile to enable GPU, cuDNN and OpenCV support

In [None]:
%cd darknet
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile
!sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile

Check whether the CUDA compiler is installed

In [None]:
!/usr/local/cuda/bin/nvcc --version

Compile darknet

In [None]:
!make

### (Optional) Test darknet

Download pre-trained YOLOv4 weights

In [None]:
!wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights

Define a function to show output images

In [None]:
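import cv2
import matplotlib.pyplot as plt
%matplotlib inline

def imShow(path):
    """Display an image file inline, enlarged 3x."""
    image = cv2.imread(path)
    if image is None:  # cv2.imread returns None instead of raising an error
        raise FileNotFoundError(f"Could not read image: {path}")
    height, width = image.shape[:2]
    resized_image = cv2.resize(image, (3 * width, 3 * height), interpolation=cv2.INTER_CUBIC)

    fig = plt.gcf()
    fig.set_size_inches(18, 10)

    plt.axis("off")
    # OpenCV loads images as BGR, while matplotlib expects RGB
    plt.imshow(cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB))
    plt.show()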

Run the object detection model on a test image.

In [None]:
!./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/person.jpg
imShow('predictions.jpg')

### Customize Darknet's Configuration

Customize `darknet/cfg/yolov4-tiny-custom.cfg` or `darknet/cfg/yolov4-custom.cfg` to your liking and save it as `my-yolov4-tiny.cfg` or `my-yolov4.cfg` respectively.

Alternatively, run the script `customize_yolov4.py` to adapt them automatically to your dataset according to your `classes.txt` file.

The script makes these changes:

```
[net]
max_batches = (# of Classes * 2000)
steps = (80% of max_batches), (90% of max_batches)

#### Last section of the configuration file
###### Three pairs if yolov4-custom, two pairs if yolov4-tiny-custom

[convolutional]
filters = ( (# of Classes + 5) * 3 )
[yolo]
classes = # of Classes

[convolutional]
filters = ( (# of Classes + 5) * 3 )
[yolo]
classes = # of Classes

[convolutional]
filters = ( (# of Classes + 5) * 3 )
[yolo]
classes = # of Classes
```
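
For a concrete number of classes, these values can be worked out as in the sketch below. It only mirrors the formulas listed above. The function name is hypothetical, and the integer rounding of `steps` is an assumption, since the rounding used by `customize_yolov4.py` is not stated here.

```python
def yolo_cfg_values(num_classes):
    """Compute the class-dependent cfg values from the number of classes."""
    max_batches = num_classes * 2000
    # 80% and 90% of max_batches, using integer arithmetic (assumed rounding)
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    filters = (num_classes + 5) * 3
    return {
        "max_batches": max_batches,
        "steps": steps,
        "filters": filters,
        "classes": num_classes,
    }


print(yolo_cfg_values(3))
# {'max_batches': 6000, 'steps': (4800, 5400), 'filters': 24, 'classes': 3}
```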

Run the customizing tool:

In [None]:
!python3 "$SCRIPTS"/customize_yolov4.py "$CFG_FILE" "$SCRIPTS"/classes.txt

CUSTOM_CFG_FILE = SCRIPTS + "/my-" + MODEL_TO_TRAIN + ".cfg"

Download pre-trained weights

In [None]:
!wget "$PRE_TRAINED_WEIGHTS_URL"
PRE_TRAINED_WEIGHTS = DARKNET + "/" + PTW_FILENAME

Or, if you want to use different weights or resume a past training session, set them here (the path below is an example; point it to your own weights file)

In [None]:
PRE_TRAINED_WEIGHTS = G_DRIVE_MOUNTPOINT + "/MyDrive/Training/backup/my-yolov4-tiny_last.weights"

Begin training

In [None]:
# %%capture
!./darknet detector train \
  "$CONTENT"/objects.txt \
  "$CUSTOM_CFG_FILE" \
  "$PRE_TRAINED_WEIGHTS" \
  -dont_show \
  -map