# Training a new model on YOLOv5

YOLOv5 already provides some pre-trained models for object detection. Nonetheless, you can also train a new model to identify the labels you need.

In our case, as we intend to identify some labels that are not covered by those pre-trained models, a new model is essential to achieve our goal.

This tutorial was inspired by the article [How to Create an End to End Object Detector using Yolov5?](https://towardsdatascience.com/how-to-create-an-end-to-end-object-detector-using-yolov5-35fbb1a02810). Some code in this notebook is based on the content from that article.

## Setup Environment

### Installing PyTorch

Under the hood, YOLOv5 uses PyTorch as its framework. To get the best performance out of PyTorch (using the GPU for training and prediction), we need to follow some steps when installing it.

> This method should work on Nvidia GPUs. If you have a different GPU or want to use the CPU, visit the [PyTorch Download](https://pytorch.org/get-started/locally/) page for instructions.

First, you need to install the latest Nvidia GPU driver and CUDA 11.x.
After that, I recommend installing cuDNN for CUDA 11.x.

In your Python environment (I recommend creating a new venv for this project), run
the following command:

```sh
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
```

If everything goes right, PyTorch should be installed and we can move on to installing YOLOv5.
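
Before moving on, it can be worth checking that PyTorch actually sees the GPU. The snippet below is a minimal sketch of such a check; the helper name `describe_torch_install` is my own and not part of PyTorch or YOLOv5.

```python
def describe_torch_install():
    """Return a one-line summary of the PyTorch install and GPU visibility."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if torch.cuda.is_available():
        # Report the name of the first CUDA device PyTorch can see
        device_name = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__} with CUDA on {device_name}"

    return f"PyTorch {torch.__version__}, CPU only (no CUDA device visible)"


print(describe_torch_install())
```

If the message reports "CPU only" on a machine with an Nvidia GPU, the driver or the CUDA build of the wheel is the first thing I would check.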


### Installing YOLOv5

Now that PyTorch is installed, we're ready to set up YOLOv5. For that, you'll need to download the YOLOv5 repository and install its dependencies.

```sh
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip3 install -r requirements.txt
```

## Handling Dataset

### Image annotation

When creating a new dataset for our model, we need to add some images with labels indicating what is in the image and the location of those objects. For that we need an annotation tool to make our job easier.

In this project, we used [**labelImg**](https://github.com/tzutalin/labelImg), as it is free and licensed under the MIT License. Put all images in a single folder and open that folder in **labelImg**, with the save format set to YOLO. As you save annotations, a `<filename>.txt` file is created for each image.
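
To make the annotation files less opaque: in YOLO format, each line of a `.txt` file describes one object as `class x_center y_center width height`, with the four box values normalized to the range 0 to 1 relative to the image size. The sketch below converts one such line back to pixel coordinates. The function name and the sample line are made up for illustration.

```python
def yolo_line_to_pixel_box(line, image_width, image_height):
    """Convert one YOLO label line to (class_id, (left, top, right, bottom)) in pixels."""
    class_id, x_center, y_center, width, height = line.split()
    x_center, y_center, width, height = (
        float(v) for v in (x_center, y_center, width, height)
    )

    # Box values are fractions of the image size, so scale them back up
    left = (x_center - width / 2) * image_width
    top = (y_center - height / 2) * image_height
    right = (x_center + width / 2) * image_width
    bottom = (y_center + height / 2) * image_height
    return int(class_id), (left, top, right, bottom)


# Hypothetical annotation: class 0, centered in a 640x480 image,
# covering half of the width and half of the height
print(yolo_line_to_pixel_box("0 0.5 0.5 0.5 0.5", 640, 480))
```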

### Organize dataset into training data and validation data

To train the model, YOLOv5 needs the dataset to be organized into two sets:
- Training data (used to train the model and estimate new weights)
- Validation data (used to test the accuracy of the model)

Now that we have a folder containing all the images and annotations we want to use for training, the script below can be used to randomly organize the data into those two sets.

```python
# Run from inside the YOLOv5 directory

import glob
import os
import random
import shutil

# Put your own path here (folder under training/ with the images and .txt labels)
dataset_path = 'dataset'
source_dir = os.path.join('training', dataset_path)

# Percentage of images to be used for the validation set
percentage_test = 20

# Create the folders that dataset.yaml will point to
for kind in ('images', 'labels'):
    for split in ('train', 'valid'):
        os.makedirs(f'training/data/{kind}/{split}', exist_ok=True)

p = percentage_test / 100

for image_path in glob.iglob(os.path.join(source_dir, '*.png')):
    title, ext = os.path.splitext(os.path.basename(image_path))
    label_path = os.path.join(source_dir, f'{title}.txt')

    # Each image goes to the validation set with probability p
    split = 'valid' if random.random() <= p else 'train'
    print(f'{title}: {split}')

    shutil.copy(image_path, f'training/data/images/{split}')
    # Images that were never annotated have no label file to copy
    if os.path.exists(label_path):
        shutil.copy(label_path, f'training/data/labels/{split}')
```
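
After the split, a quick sanity check can catch images that ended up without a label file. This is only a sketch under the folder layout used in this tutorial (`training/data/images/<split>` and `training/data/labels/<split>`); the helper `find_unlabelled_images` is a name I made up.

```python
from pathlib import Path


def find_unlabelled_images(data_dir, split, image_ext=".png"):
    """Return names of images in a split that have no matching .txt label file."""
    data_dir = Path(data_dir)
    labels_dir = data_dir / "labels" / split
    images = sorted((data_dir / "images" / split).glob(f"*{image_ext}"))
    return [
        image.name
        for image in images
        if not (labels_dir / f"{image.stem}.txt").exists()
    ]


for split in ("train", "valid"):
    missing = find_unlabelled_images("training/data", split)
    print(f"{split}: {len(missing)} image(s) without labels {missing}")
```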


## Train Model

Take a look at the [model descriptions](https://pytorch.org/hub/ultralytics_yolov5/) to choose which of the available models to use.

For the model, copy the configuration file of the model you want (e.g. `models/yolov5s.yaml` -> `training/yolov5s.yaml`). Don't forget to change the `nc` value in that file to the number of classes the model is being trained to identify.
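
If you prefer to script that edit, something like the following could work. It is a sketch that assumes the config has a top-level line of the form `nc: <number>`; the helper `set_num_classes` and the shortened sample text are mine, not part of YOLOv5.

```python
import re


def set_num_classes(config_text, num_classes):
    """Return config_text with the first top-level `nc:` value replaced."""
    new_text, count = re.subn(
        r"^nc:\s*\d+",
        f"nc: {num_classes}",
        config_text,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no top-level 'nc:' line found")
    return new_text


# Shortened stand-in for the start of a model config file
sample = "# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33\n"
print(set_num_classes(sample, 4))

# To apply it to the real file, read models/yolov5s.yaml as text,
# pass it through set_num_classes, and write the result to training/yolov5s.yaml.
```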

You should also create a new file `training/dataset.yaml` containing:

```yaml
# path to train and validation datasets
train: training/data/images/train/
val: training/data/images/valid/

# number of classes
nc: 4

# class names
names: ['cuphead', 'player-bullet', 'boss-carrot', 'boss-radish']
```
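
Since `nc` has to match the number of entries in `names`, one option is to generate the file from a single list of class names. The sketch below does that with the standard library only; `build_dataset_yaml` is a hypothetical helper, and the list assumes four separate classes. The names are written as a JSON list, which is also valid YAML.

```python
import json


def build_dataset_yaml(train_dir, val_dir, class_names):
    """Build the text of dataset.yaml so that nc always matches the names list."""
    lines = [
        "# path to train and validation datasets",
        f"train: {train_dir}",
        f"val: {val_dir}",
        "",
        "# number of classes",
        f"nc: {len(class_names)}",
        "",
        "# class names",
        f"names: {json.dumps(class_names)}",
    ]
    return "\n".join(lines) + "\n"


class_names = ["cuphead", "player-bullet", "boss-carrot", "boss-radish"]
text = build_dataset_yaml(
    "training/data/images/train/", "training/data/images/valid/", class_names
)
print(text)

# To save it: open("training/dataset.yaml", "w").write(text)
```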

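To train from scratch (the blank `--weights` value means no pre-trained checkpoint is loaded), run the following from the YOLOv5 directory: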

```sh
python train.py --img 640 --batch 16 --epochs 300 --data training/dataset.yaml --cfg training/yolov5s.yaml --weights '' --workers 2
```
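
As far as I know, YOLOv5 saves each run under `runs/train/exp`, `runs/train/exp2`, and so on by default, with the best checkpoint at `weights/best.pt` inside that folder. Under that assumption, the sketch below picks the most recently written checkpoint; `find_latest_weights` is a name I made up.

```python
from pathlib import Path


def find_latest_weights(runs_dir="runs/train", filename="best.pt"):
    """Return the most recently modified exp*/weights/<filename>, or None."""
    candidates = list(Path(runs_dir).glob(f"exp*/weights/{filename}"))
    if not candidates:
        return None
    # Newest file by modification time
    return max(candidates, key=lambda path: path.stat().st_mtime)


print(find_latest_weights())
```

The returned path can then be passed to `detect.py` through its `--weights` option, together with `--source` pointing at the images to test.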