<a href="https://colab.research.google.com/github/elyorman/YOLOv5-train-on-custom-data/blob/main/Train_yolov5_on_gun_knife_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# YOLOv5 train on custom data

* Collect dataset from [Open Images](https://storage.googleapis.com/openimages/web/index.html) using [OIDv4_ToolKit](https://github.com/EscVM/OIDv4_ToolKit.git)
* Convert image labels into YOLO format 
* Train YOLOv5 based on the [Official YOLOv5 repository](https://github.com/ultralytics/yolov5)
* Detect with the trained model 

## Dataset collection 

In [1]:
%cd /content/drive/MyDrive/portfolio/yolov5/kaggle_yolov5

/content/drive/MyDrive/portfolio/yolov5/kaggle_yolov5


In [2]:
!git clone https://github.com/EscVM/OIDv4_ToolKit.git 
%cd ./OIDv4_ToolKit
!pip install -r requirements.txt

Cloning into 'OIDv4_ToolKit'...
remote: Enumerating objects: 422, done.
remote: Total 422 (delta 0), reused 0 (delta 0), pack-reused 422
Receiving objects: 100% (422/422), 34.08 MiB | 18.32 MiB/s, done.
Resolving deltas: 100% (146/146), done.
/content/drive/MyDrive/portfolio/yolov5/kaggle_yolov5/OIDv4_ToolKit
Collecting awscli
  Downloading awscli-1.20.48-py3-none-any.whl (3.7 MB)
Collecting s3transfer<0.6.0,>=0.5.0
  Downloading s3transfer-0.5.0-py3-none-any.whl (79 kB)
Collecting colorama<0.4.4,>=0.2.5
  Downloading colorama-0.4.3-py2.py3-none-any.whl (15 kB)
Collecting docutils<0.16,>=0.10
  Downloading docutils-0.15.2-py3-none-any.whl (547 kB)
Collecting botocore==1.21.48
  Downloading botocore-1.21.48-py3-none-any.whl (7.9 MB)

In [1]:
%cd /content/drive/MyDrive/portfolio/yolov5/kaggle_yolov5/OIDv4_ToolKit

/content/drive/MyDrive/portfolio/yolov5/kaggle_yolov5/OIDv4_ToolKit


## Download the images 

In [2]:
# Download training images for the listed classes into one shared folder (--multiclasses 1); --limit 200 caps the number of images downloaded
!python3 main.py downloader --classes Handgun Shotgun Knife --type_csv train --multiclasses 1 --limit 200

    [INFO] | Downloading ['Handgun', 'Shotgun', 'Knife'] together.
   [ERROR] | Missing the class-descriptions-boxable.csv file.
[DOWNLOAD] | Do you want to download the missing file? [

## Prepare the dataset
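Convert the annotations into YOLO format so they can be used for training. The downloaded labels store each box as a class name followed by XMin, YMin, XMax, YMax in pixels; YOLO expects a class index followed by the box center, width, and height, all normalized by the image size. The conversion code is borrowed from [theAIGuysCode](https://github.com/theAIGuysCode/OIDv4_ToolKit/blob/master/convert_annotations.py).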

In [5]:
# source: https://github.com/theAIGuysCode/OIDv4_ToolKit/blob/master/convert_annotations.py
import os
import cv2
import numpy as np
from tqdm import tqdm

# function that turns XMin, YMin, XMax, YMax coordinates to normalized yolo format
def convert(filename_str, coords):
    os.chdir("..")
    image = cv2.imread(filename_str + ".jpg")
    coords[2] -= coords[0]
    coords[3] -= coords[1]
    x_diff = int(coords[2]/2)
    y_diff = int(coords[3]/2)
    coords[0] = coords[0]+x_diff
    coords[1] = coords[1]+y_diff
    coords[0] /= int(image.shape[1])
    coords[1] /= int(image.shape[0])
    coords[2] /= int(image.shape[1])
    coords[3] /= int(image.shape[0])
    os.chdir("Label")
    return coords

ROOT_DIR = os.getcwd()

# create dict to map class names to numbers for yolo
classes = {}
with open("classes.txt", "r") as myFile:
    for num, line in enumerate(myFile, 0):
        line = line.rstrip("\n")
        classes[line] = num
# step into dataset directory
os.chdir(os.path.join("OID", "Dataset"))
DIRS = os.listdir(os.getcwd())

# for all train, validation and test folders
for DIR in DIRS:
    if os.path.isdir(DIR):
        os.chdir(DIR)
        print("Currently in subdirectory:", DIR)
        
        CLASS_DIRS = os.listdir(os.getcwd())
        # for all class folders step into directory to change annotations
        for CLASS_DIR in CLASS_DIRS:
            if os.path.isdir(CLASS_DIR):
                os.chdir(CLASS_DIR)
                print("Converting annotations for class: ", CLASS_DIR)
                
                # Step into Label folder where annotations are generated
                os.chdir("Label")

                for filename in tqdm(os.listdir(os.getcwd())):
                    filename_str = str.split(filename, ".")[0]
                    if filename.endswith(".txt"):
                        annotations = []
                        with open(filename) as f:
                            for line in f:
                                for class_type in classes:
                                    line = line.replace(class_type, str(classes.get(class_type)))
                                labels = line.split()
                                coords = np.asarray([float(value) for value in labels[1:5]])
                                coords = convert(filename_str, coords)
                                # class index, then normalized x_center, y_center, width, height
                                newline = " ".join([labels[0]] + [str(value) for value in coords])
                                annotations.append(newline)
                        # write the converted file one level up, next to the images
                        os.chdir("..")
                        with open(filename, "w") as outfile:
                            for line in annotations:
                                outfile.write(line)
                                outfile.write("\n")
                        os.chdir("Label")
                os.chdir("..")
                os.chdir("..")
        os.chdir("..")

Currently in subdirectory: train
Converting annotations for class:  Handgun_Shotgun_Knife


100%|██████████| 588/588 [04:31<00:00,  2.16it/s]
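
The arithmetic inside `convert` can be sketched as a pure function. This is a minimal illustration, not the borrowed script itself: the name `to_yolo_box` and its arguments are hypothetical, and unlike the script it keeps the half-width and half-height as fractions instead of truncating them to integers.

```python
def to_yolo_box(x_min, y_min, x_max, y_max, img_width, img_height):
    """Convert pixel corner coordinates to normalized YOLO box values.

    Returns (x_center, y_center, width, height), each divided by the
    corresponding image dimension so the values fall between 0 and 1.
    """
    width = x_max - x_min
    height = y_max - y_min
    x_center = x_min + width / 2
    y_center = y_min + height / 2
    return (
        x_center / img_width,
        y_center / img_height,
        width / img_width,
        height / img_height,
    )


# A 200x200 pixel box with its top-left corner at (100, 50) in an 800x400 image
print(to_yolo_box(100, 50, 300, 250, 800, 400))  # (0.25, 0.375, 0.25, 0.5)
```

Keeping the function free of file access and `os.chdir` calls makes the conversion easy to check by hand before running it over the whole dataset.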


The annotations are now in YOLO format, so the data is ready for training. The next step is to clone the YOLOv5 repository and copy the data into the project.

## Setting up YOLOv5 

In [5]:
# clone yolov5 repo
!git clone https://github.com/ultralytics/yolov5.git


In [7]:
%cd /content/drive/MyDrive/portfolio/yolov5/yolov5_custom/yolov5

/content/drive/MyDrive/portfolio/yolov5/yolov5_custom/yolov5


In [8]:
!pip install -r requirements.txt

Collecting PyYAML>=5.3.1
  Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)
[K     |████████████████████████████████| 636 kB 5.3 MB/s 
Collecting thop
  Downloading thop-0.0.31.post2005241907-py3-none-any.whl (8.7 kB)
Installing collected packages: thop, PyYAML
  Attempting uninstall: PyYAML
    Found existing installation: PyYAML 3.13
    Uninstalling PyYAML-3.13:
      Successfully uninstalled PyYAML-3.13
Successfully installed PyYAML-5.4.1 thop-0.0.31.post2005241907


In [11]:
#make a folder for our custom data in yolov5 
!mkdir -p Dataset/custom_data

In [12]:
#copy and move our images and labels into custom_data folder
!cp /content/drive/MyDrive/portfolio/yolov5/yolov5_custom/OIDv4_ToolKit/OID/Dataset/train/Handgun_Shotgun_Knife/*.txt ./Dataset/custom_data
!cp /content/drive/MyDrive/portfolio/yolov5/yolov5_custom/OIDv4_ToolKit/OID/Dataset/train/Handgun_Shotgun_Knife/*.jpg ./Dataset/custom_data
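
After copying, each label file sits beside its image under the same base name. A quick check like the one below can confirm that no image was copied without its label. It is a sketch under that layout assumption; the helper name `images_without_labels` is hypothetical.

```python
from pathlib import Path


def images_without_labels(folder):
    """Return sorted names of .jpg files in `folder` that have no same-named .txt file."""
    folder = Path(folder)
    return sorted(
        image.name
        for image in folder.glob("*.jpg")
        if not image.with_suffix(".txt").exists()
    )


# Example usage from the yolov5 directory; skipped if the folder does not exist
data_dir = Path("Dataset/custom_data")
if data_dir.is_dir():
    missing = images_without_labels(data_dir)
    print(f"{len(missing)} images have no label file")
```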

In [13]:
# split the data by calling the autosplit function (80% train, 20% val, 0% test)
from utils.datasets import autosplit
autosplit('./Dataset/custom_data', weights=(0.8, 0.2, 0.0))

Autosplitting images from Dataset/custom_data


100%|██████████| 588/588 [00:04<00:00, 141.01it/s]
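
The idea behind a weighted split can be sketched in a few lines. This is a simplified illustration, not the YOLOv5 implementation: the function name `split_images` and the `seed` argument are hypothetical. Each image is assigned to a split independently, so the resulting proportions are close to the weights rather than exactly 80/20.

```python
import random


def split_images(image_names, weights=(0.8, 0.2, 0.0), seed=0):
    """Randomly assign each image name to 'train', 'val', or 'test' using the given weights."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    names = sorted(image_names)
    buckets = rng.choices(["train", "val", "test"], weights=weights, k=len(names))
    splits = {"train": [], "val": [], "test": []}
    for name, bucket in zip(names, buckets):
        splits[bucket].append(name)
    return splits


splits = split_images([f"img_{i}.jpg" for i in range(588)])
print({key: len(value) for key, value in splits.items()})
```

With a test weight of 0.0, no image is ever assigned to the test split.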


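Create the custom_data.yaml file, which sets the dataset paths, the number of classes, and the class names. The order of the names must match the order in classes.txt, because the conversion step replaced each class name with its line index in that file. The validation entry points to autosplit_val.txt, the list that autosplit writes alongside autosplit_train.txt, so the model is not validated on its own training images.

In [17]:
# ">" creates the file (overwriting any earlier copy); ">>" appends the remaining lines
!echo "path: ./Dataset" > data/custom_data.yaml
!echo "train: autosplit_train.txt" >> data/custom_data.yaml
!echo "val: autosplit_val.txt" >> data/custom_data.yaml

!echo "nc: 3" >> data/custom_data.yaml
!echo "names: ['Shotgun', 'Handgun', 'Knife']" >> data/custom_data.yaml

!cat data/custom_data.yaml

path: ./Dataset
train: autosplit_train.txt
val: autosplit_val.txt
nc: 3
names: ['Shotgun', 'Handgun', 'Knife']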
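
The same file can also be generated from Python, which avoids duplicated lines when the cell is re-run and keeps the class order tied to classes.txt. The sketch below makes two assumptions: validation should use autosplit_val.txt (the list autosplit writes alongside the training list), and the helper names `read_class_names` and `build_data_yaml` are hypothetical.

```python
from pathlib import Path


def read_class_names(classes_file):
    """Read one class name per line, skipping blank lines, preserving file order."""
    text = Path(classes_file).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_data_yaml(class_names, dataset_root="./Dataset"):
    """Return the contents of a YOLOv5 data file for the given class names."""
    names = ", ".join(f"'{name}'" for name in class_names)
    return (
        f"path: {dataset_root}\n"
        "train: autosplit_train.txt\n"
        "val: autosplit_val.txt\n"  # assumption: validate on the held-out list
        f"nc: {len(class_names)}\n"
        f"names: [{names}]\n"
    )


print(build_data_yaml(["Shotgun", "Handgun", "Knife"]))
```

To write the file, pass the result to `Path("data/custom_data.yaml").write_text(...)`, with the class names read from the same classes.txt that the conversion step used.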


## Start training YOLOv5

In [18]:
# Train YOLOv5s on custom data for 200 epochs
!python train.py --img 640 --batch 16 --epochs 200 --data custom_data.yaml --weights yolov5s.pt --cache

train: weights=yolov5s.pt, cfg=, data=custom_data.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=200, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, entity=None, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias=latest, local_rank=-1, freeze=0, patience=100
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 2021-9-27 torch 1.9.0+cu102 CUDA:0 (Tesla K80, 11441.1875MB)

hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degr