# Enemy detection model - YOLOv5

This notebook will be used to test alternative ways of training a convolutional neural network (CNN) model using [YOLOv5](https://github.com/ultralytics/yolov5).

When doing a regular training process, where all the images used by the model have been labeled beforehand, the labeling stage proved to be the longest, most tiring and most time-consuming one.

To make the labeling stage shorter and more flexible, this notebook presents two alternative ways of training instead of a regular one. In both, a sequence of models is trained, and each model's training images are labeled automatically by inference with the previous model.

Two types of training will be tested:
- **incremental training:** each subsequent model continues from where the previous one stopped (the previous model's `best.pt` is used as the starting weights for the new training run);
- **training from zero:** similar to incremental training, but each model starts from the pretrained `yolov5s.pt` weights instead of its predecessor's `best.pt`.

If either of these alternative trainings shows good results, the model can be trained with a smaller manually labeled dataset, and all new images can be labeled automatically by the model. Images from different sources could then be added to improve the model at a much faster rate.

The models generated by YOLOv5 will detect enemies in real time on The Farm, the first map of the Endless mode in the game [Dusk](https://store.steampowered.com/app/519860/DUSK/). The models should be able to detect the five different types of enemies present on that map.

To keep up with Dusk's frantic pace, the model must be both accurate and fast. YOLOv5 was therefore chosen as a good compromise between accuracy and speed compared to other YOLO versions, at least on typical consumer GPUs.

## 1. First steps

### 1.1 Google Colab

This subsection will be helpful if you plan on running this notebook on Google Colab. Otherwise, skip to subsection 1.2.

First, upload this notebook to the desired directory of your Google Drive.

After doing that, execute the following cell to connect your Drive account to Colab:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

After that, insert the path where this notebook was saved in your Drive account:

In [None]:
import os

NOTEBOOK_PATH = 'dusk-aimbot/model training' # CAN BE CHANGED
GOOGLE_DRIVE_PATH = os.path.join('drive', 'My Drive', NOTEBOOK_PATH)

%cd ./$GOOGLE_DRIVE_PATH

### 1.2 Access YOLOv5 and import libraries

To allow the model to label new images automatically, a few files in the `yolov5` folder have been modified (explained in section 3.2).

For that reason, the modified `yolov5` directory is already present and can be accessed with the command below:

In [None]:
%cd yolov5

Install the packages required by YOLOv5 with the following command:

In [None]:
%pip install -qr requirements.txt

Don't forget to import all the needed libraries:

In [None]:
import torch
import utils  # YOLOv5's utils package (the working directory is now yolov5/)
import os
import yaml
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import glob
import random
import shutil

from IPython import display
from IPython.display import clear_output
from pathlib import Path
from sklearn.model_selection import train_test_split


%matplotlib inline

random.seed(108)

## 2. Data handling

Initially, it's important to have an overall idea of how you plan to train your models.

For this notebook, we will train a sequence of models on datasets ranging from 256 to 2048 images: 256, 384, 512, 768, 1024, 1536, and 2048 images.

For each model except the last one, the dataset will be split into 80% training data and 20% validation data. For the last model (2048 images), the train, validation, and test sets will contain 80%, 10%, and 10% of the data, respectively.
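The split sizes for each stage can be tabulated with a short sketch. It assumes the 80/20 and 80/10/10 ratios above and truncates with `int()`, giving any rounding remainder to the last split; the notebook's own cells may round slightly differently.

```python
from typing import Tuple

# Dataset sizes of the model sequence described above
MODEL_SIZES = [256, 384, 512, 768, 1024, 1536, 2048]


def split_sizes(total: int, is_last: bool) -> Tuple[int, int, int]:
    """Return (train, val, test) image counts for one model.

    80/20 train/val for every model except the last, which uses 80/10/10.
    int() truncates, and the remainder goes to the final split so the counts sum to total.
    """
    train = int(total * 0.8)
    if is_last:
        val = int(total * 0.1)
        test = total - train - val
    else:
        val = total - train
        test = 0
    return train, val, test


for size in MODEL_SIZES:
    train, val, test = split_sizes(size, is_last=(size == MODEL_SIZES[-1]))
    print(f"{size:>5} images -> train {train:>4} | val {val:>3} | test {test:>3}")
```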

The first model will use the manually labeled data as its training set. The validation set of all models (and the test set of the last model) will use data that needs to be manually labeled as well.

### 2.1 Organize Directories

[According to the YOLOv5 wiki](https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data), it is recommended to organize the data inside a `/datasets` directory next to the `/yolov5` directory, as represented below:

![data_directory_structure](https://miro.medium.com/max/1400/1*J2UTo9Z2hJCeaTwB1d_aNw.png)

The following function creates the `train`, `valid`, and `test` subdirectories for both images and labels inside `/datasets`:

In [None]:
def create_data_directories(data_name):
    # Creates images/{train,valid,test} and labels/{train,valid,test} under ../datasets/{data_name}
    for kind in ("images", "labels"):
        for split in ("train", "valid", "test"):
            Path(f"../datasets/{data_name}/{kind}/{split}").mkdir(parents=True, exist_ok=True)

In [None]:
# Creating 'datasets' directory for each type of training
create_data_directories('dusk_enemies/incremental')
create_data_directories('dusk_enemies/from_zero')

### 2.2 Organize data

Our data directories were generated, but we still need to split our data and move it there.
Thus, create a `/image_data` directory next to the `/datasets` and `/yolov5` directories.

Inside this directory, create: 
1. `/background_images` for images without YOLO labels;
2. `/labeled_images` for images with YOLO labels related to them;
3. `/labels` for YOLO labels (.txt files) related to those images in `/labeled_images`.

as shown below:
```bash
.
├── yolov5/
├── datasets/
└── image_data/
    ├── background_images/
    ├── labeled_images/
    └── labels/
```

Background images are kept apart from labeled images because they are split differently: only about 10% of each split should be background images.

However, if you have all your images in a single folder, put all of them inside `/labeled_images` and run the following cell to create `/background_images` automatically and move your background images inside it:

In [None]:
IMG_PATH = "../image_data/labeled_images"
LABEL_PATH = "../image_data/labels"
BACKGROUND_PATH = "../image_data/background_images"

# Creates a directory for background images if it doesn't exist
Path(BACKGROUND_PATH).mkdir(parents=True, exist_ok=True)

bg_count = 0
for image in glob.iglob(f"{IMG_PATH}/*.jpeg"):

    # Gets the image name without its directory and extension
    name = Path(image).stem
    
    # Checks if there is a corresponding txt label file related to the image
    label_exists = os.path.isfile(f"{LABEL_PATH}/{name}.txt")
    
    # If the label doesn't exist, the image is a background image (no labels related to it)
    if not label_exists:
        try: # Move background image to an appropriate directory
            shutil.move(image, BACKGROUND_PATH)
            bg_count += 1
        except OSError as err:
            raise RuntimeError(f"Could not move {image} to {BACKGROUND_PATH}") from err

print(f"\n{bg_count} background images moved from \"{IMG_PATH}\" to \"{BACKGROUND_PATH}\".\n")

### 2.3 Split data

After generating data directories, our data should be split and moved to their respective directories.

The first model will have 80% of training data and 20% of validation data.

For subsequent models, the training set is built by automatic labeling (section 3.2), so only the validation set (and the test set, for the last model) is taken from the manually labeled data below.

To choose which model the validation set belongs to, set `MODEL_IMGS` below to the total number of images of the current model:

In [None]:
## Code adapted from https://blog.paperspace.com/train-yolov5-custom-data/

IMG_PATH = "../image_data/labeled_images"
LABEL_PATH = "../image_data/labels"

MODEL_IMGS = 256

# Read images and labels
images = [os.path.join(IMG_PATH, x) for x in os.listdir(IMG_PATH)]
labels = [os.path.join(LABEL_PATH, x) for x in os.listdir(LABEL_PATH) if x.endswith(".txt")]

# Sorting pairs each image with its label, assuming both share the same file stem
images.sort()
labels.sort()

# Split the dataset into two halves: one to draw training data from, one to draw validation data from
train_images, val_images, train_labels, val_labels = train_test_split(images, labels, train_size = 0.5, random_state = 1)

# Our first model has 0.8 of training data. 10% of that should be background images, though.
TRAIN_RATE = MODEL_IMGS * 0.8
train_images = train_images[:(int(TRAIN_RATE) - int(TRAIN_RATE * 0.1))]
train_labels = train_labels[:(int(TRAIN_RATE) - int(TRAIN_RATE * 0.1))]

# Our first model has 0.2 of validation data. 10% of that should be background images, though.
# This is valid for all models.
VALID_RATE = MODEL_IMGS * 0.2
val_images = val_images[:(int(VALID_RATE) - int(VALID_RATE * 0.1))]
val_labels = val_labels[:(int(VALID_RATE) - int(VALID_RATE * 0.1))]
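The split above pairs `images[i]` with `labels[i]` purely by sorted position, so it would silently mismatch if any image lacked a label (or vice versa). A small sanity check can be run before any file is moved; this is a sketch that assumes every image has exactly one `.txt` label with the same file stem.

```python
from pathlib import Path
from typing import List, Sequence


def check_pairs(images: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means images[i] matches labels[i] for every i."""
    problems = []
    if len(images) != len(labels):
        problems.append(f"{len(images)} images vs {len(labels)} labels")
    for img, lbl in zip(images, labels):
        # Compare file names without directory and extension
        if Path(img).stem != Path(lbl).stem:
            problems.append(f"mismatch: {img} <-> {lbl}")
    return problems
```

In the notebook, `assert not check_pairs(images, labels)` right after sorting would stop the cell before the data is split and moved.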

In [None]:
## Code adapted from https://blog.paperspace.com/train-yolov5-custom-data/

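# Utility function to move files to one or more directories
def move_files_to_directories(list_of_files, *destination_dir):
    for f in list_of_files:
        for directory in destination_dir:
            # Creates the destination (e.g. 'valid_{MODEL_IMGS}') if it doesn't exist yet
            Path(directory).mkdir(parents=True, exist_ok=True)
            try:
                shutil.copy(f, directory)
            except OSError as err:
                raise RuntimeError(f"Could not copy {f} to {directory}") from err
        # Removes the original only after it was copied to every destination
        os.remove(f)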

In [None]:
ZERO_IMG_PATH = '../datasets/dusk_enemies/from_zero/images'
ZERO_LABEL_PATH = '../datasets/dusk_enemies/from_zero/labels'

INCR_IMG_PATH = '../datasets/dusk_enemies/incremental/images'
INCR_LABEL_PATH = '../datasets/dusk_enemies/incremental/labels'

# Move the splits into their folders

# Should be used solely for the first models
#move_files_to_directories(train_images, f'{ZERO_IMG_PATH}/train/', f'{INCR_IMG_PATH}/train/')
#move_files_to_directories(train_labels, f'{ZERO_LABEL_PATH}/train/', f'{INCR_LABEL_PATH}/train/')

# For each model, creates a validation partition in a folder named 'valid_{MODEL_IMGS}'
move_files_to_directories(val_images, f'{ZERO_IMG_PATH}/valid_{MODEL_IMGS}/', f'{INCR_IMG_PATH}/valid_{MODEL_IMGS}/')
move_files_to_directories(val_labels, f'{ZERO_LABEL_PATH}/valid_{MODEL_IMGS}/', f'{INCR_LABEL_PATH}/valid_{MODEL_IMGS}/')

Background images, on the other hand, are split by slicing a shuffled list:

In [None]:
BG_IMG_PATH = "../image_data/background_images"

MODEL_IMGS = 256 # Should match the value used for the labeled images above

# Read background images
bg_images = [os.path.join(BG_IMG_PATH, x) for x in os.listdir(BG_IMG_PATH)]

random.shuffle(bg_images) # Shuffle background images' list to guarantee randomness

# 10% of the training set should be background images
TRAIN_RATE = int(MODEL_IMGS * 0.8 * 0.1)
train_bg_images = bg_images[:TRAIN_RATE]

# 10% of the validation set should be background images
VALID_RATE = int(MODEL_IMGS * 0.2 * 0.1)
val_bg_images = bg_images[TRAIN_RATE : (TRAIN_RATE + VALID_RATE)]

In [None]:
ZERO_IMG_PATH = '../datasets/dusk_enemies/from_zero/images'
INCR_IMG_PATH = '../datasets/dusk_enemies/incremental/images'

# Move the splits into their folders
move_files_to_directories(train_bg_images, f'{ZERO_IMG_PATH}/train/', f'{INCR_IMG_PATH}/train/')
move_files_to_directories(val_bg_images, f'{ZERO_IMG_PATH}/valid_{MODEL_IMGS}/', f'{INCR_IMG_PATH}/valid_{MODEL_IMGS}/')

### 2.4 Create dataset.yaml

A dataset config file (YAML file) should also be created. It defines:

1. the dataset root directory `path` and relative paths to `train` / `val` / `test` image directories;
2. the number of classes `nc` that you want to detect;
3. and the names corresponding to those classes, represented by `names`.
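As an illustration, such a file could also be generated from Python. This is a sketch: the class names below are hypothetical placeholders (the real names of Dusk's five enemy types are not listed here), and the layout is assumed to match section 2.1, with `val` pointing at the current model's `valid_{MODEL_IMGS}` split relative to `path`.

```python
from typing import Optional, Sequence


def dataset_yaml(root: str, val_dir: str, class_names: Sequence[str],
                 test_dir: Optional[str] = None) -> str:
    """Build the text of a YOLOv5 dataset config; split paths are relative to `root`."""
    lines = [
        f"path: {root}",          # dataset root directory
        "train: images/train",    # training images, relative to path
        f"val: {val_dir}",        # validation images, relative to path
    ]
    if test_dir is not None:
        lines.append(f"test: {test_dir}")
    lines.append(f"nc: {len(class_names)}")
    # YAML flow sequence, e.g. names: ['a', 'b']
    lines.append("names: [" + ", ".join(f"'{n}'" for n in class_names) + "]")
    return "\n".join(lines) + "\n"


# Hypothetical placeholder class names, one per enemy type
CLASS_NAMES = [f"enemy_{i}" for i in range(5)]

text = dataset_yaml("../datasets/dusk_enemies/incremental", "images/valid_256", CLASS_NAMES)
print(text)
# To save it: open("data/dusk_enemies_incremental.yaml", "w").write(text)
```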

YAML files are commonly placed inside `/yolov5/data`. This project uses two YAML files there, one for each type of training, which can be inspected with the cells below:

In [None]:
!cat data/dusk_enemies_incremental.yaml

In [None]:
!cat data/dusk_enemies_from_zero.yaml

## 3. Training

### 3.1 Initial models

The YOLOv5s model will be used to start training from pretrained weights. The training parameters may be changed as desired.

Both training procedures start from the 256-image dataset in exactly the same way:

In [None]:
# Incremental training (256 images)
!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_incremental.yaml' --weights 'yolov5s.pt'\
    --project 'dusk_incremental' --name 'train_256_'

clear_output()

In [None]:
# Training from zero (256 images)
!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_from_zero.yaml' --weights 'yolov5s.pt'\
    --project 'dusk_from_zero' --name 'train_256_'

clear_output()

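If training stops for any reason, run `train.py` with the `--resume` flag to resume the most recent run (a specific run can be resumed by passing the path to its `last.pt`, e.g. `--resume path/to/last.pt`):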

In [None]:
!python train.py --resume

clear_output()

### 3.2 Subsequent models

Subsequent models will need more training data to improve. To automate the labeling process for new images, the script `model_labeling.py` has been created and can be executed in the cell below.

The `-i` flag (or `--images`) is the total number of images used by the model being prepared (e.g. `384` for the 384-image model). Based on that number, the script automatically calculates how many images must be labeled and moved from their source directory.

The `-t` flag (or `--training`) is a boolean value that selects which type of model labels the images: if `True`, the incremental model is used; otherwise, the model trained from zero is used.

Both flags are mandatory.
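The script itself is not reproduced here. For readers writing a similar command-line tool, the following is a hypothetical sketch of how `-i` and `-t` could be parsed with `argparse`; it is not the script's actual code. Note that `type=bool` would be a trap, since `bool("False")` is `True`, so the string is converted explicitly.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert 'True'/'False' style strings (case-insensitive) into booleans."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the two mandatory flags described above
    parser = argparse.ArgumentParser(description="Auto-label images with a trained model")
    parser.add_argument("-i", "--images", type=int, required=True,
                        help="total number of images used by the model")
    parser.add_argument("-t", "--training", type=str_to_bool, required=True,
                        help="True for incremental training, False for training from zero")
    return parser


args = build_parser().parse_args(["-i", "384", "-t", "False"])
print(args.images, args.training)  # 384 False
```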

In [None]:
!python "../auxiliar_scripts/model_labeling.py" -i 384 -t True

In that script, the model's confidence threshold is set to 0.5.

If no prediction in an image surpasses this threshold, the image is considered a background image and sent to the `images/train` subset.

Otherwise, the normalized entropy of the confidence values of all classes is calculated for each prediction made (in this project, `yolov5` has been modified to return the confidence values for all classes instead of only for the predicted class).

If the result is greater than the chosen entropy threshold (in this case, 0.4), the image is simply ignored.

If not, the image is moved to `images/train` and a label annotation is created for each predicted bounding box. Finally, the corresponding label file is moved to `labels/train`.
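The decision rule above can be sketched as follows. This is not the script's code: the prediction layout is hypothetical, the image is assumed to be ignored if *any* confident prediction exceeds the entropy threshold, and the predicted class is assumed to be the one with the highest confidence. Label rows follow the YOLO format (`class x_center y_center width height`, normalized to 0–1).

```python
import math
from typing import List, Sequence, Tuple

CONF_THRESHOLD = 0.5      # confidence threshold quoted above
ENTROPY_THRESHOLD = 0.4   # normalized-entropy threshold quoted above

# Hypothetical prediction layout: (confidence, per-class confidences, normalized (xc, yc, w, h))
Prediction = Tuple[float, Sequence[float], Tuple[float, float, float, float]]


def normalized_entropy(class_confs: Sequence[float]) -> float:
    """Shannon entropy of the class distribution divided by log(K), so it lies in [0, 1]."""
    total = sum(class_confs)
    k = len(class_confs)
    if total <= 0 or k < 2:
        return 0.0
    probs = [c / total for c in class_confs]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(k)


def decide(predictions: Sequence[Prediction]) -> str:
    """Return 'background', 'ignored' or 'labeled' for one image."""
    confident = [p for p in predictions if p[0] > CONF_THRESHOLD]
    if not confident:
        return "background"
    if any(normalized_entropy(p[1]) > ENTROPY_THRESHOLD for p in confident):
        return "ignored"
    return "labeled"


def label_lines(predictions: Sequence[Prediction]) -> List[str]:
    """One YOLO label row per confident box: class x_center y_center width height."""
    lines = []
    for conf, class_confs, (xc, yc, w, h) in predictions:
        if conf > CONF_THRESHOLD:
            cls = max(range(len(class_confs)), key=lambda i: class_confs[i])
            lines.append(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines
```

A low normalized entropy means the confidence mass is concentrated on one class, which is why it works as a proxy for how trustworthy an automatic label is.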

The script also follows the YOLOv5 guidelines by making sure that no more than 10% of the analyzed and moved images are background images.
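The 10% cap translates into a simple bound: with `n` labeled images, `b` background images are allowed when `b / (n + b) <= 0.1`, i.e. `b <= n / 9`. A minimal sketch, not the script's code, which assumes background candidates are sampled at random:

```python
import random
from typing import List, Sequence


def cap_background(labeled: Sequence[str], background: Sequence[str],
                   max_percent: int = 10, seed: int = 0) -> List[str]:
    """Pick background images so they make up at most `max_percent`% of all moved images.

    b / (n + b) <= p / 100  is equivalent to  b <= n * p / (100 - p).
    """
    # Integer arithmetic avoids floating-point rounding at the boundary
    limit = len(labeled) * max_percent // (100 - max_percent)
    rng = random.Random(seed)
    chosen = list(background)
    rng.shuffle(chosen)
    return chosen[:limit]
```

For example, 90 labeled images allow at most 10 background images (10 out of 100 moved images).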

For each new model, change the `val` path inside its YAML file so that it points to that model's own validation set (`valid_{MODEL_IMGS}`).
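That edit can be scripted. The sketch below assumes each YAML file has exactly one top-level `val:` line and that validation images live in `images/valid_{MODEL_IMGS}` relative to the dataset `path`; YOLOv5 then looks for the matching labels in `labels/valid_{MODEL_IMGS}`. Any comment on the `val:` line is dropped.

```python
import re
from pathlib import Path


def with_val_split(yaml_text: str, model_imgs: int) -> str:
    """Return the YAML text with its `val:` entry pointing at images/valid_{model_imgs}."""
    new_text, count = re.subn(r"^val:.*$", f"val: images/valid_{model_imgs}",
                              yaml_text, flags=re.MULTILINE)
    if count != 1:
        raise ValueError(f"expected exactly one top-level 'val:' line, found {count}")
    return new_text


def update_yaml_file(path: str, model_imgs: int) -> None:
    """Rewrite a dataset YAML file in place."""
    file = Path(path)
    file.write_text(with_val_split(file.read_text(), model_imgs))
```

Usage from the notebook (working directory `yolov5/`): `update_yaml_file("data/dusk_enemies_incremental.yaml", 384)`, and likewise for `data/dusk_enemies_from_zero.yaml`.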

#### 3.2.1 Incremental models

In [None]:
# Incremental training (384 images)
PATH_256_MODEL = "" # Path to the 256-image model's best.pt, e.g. 'dusk_incremental/train_256_/weights/best.pt'

!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_incremental.yaml' --weights "{PATH_256_MODEL}"\
    --project 'dusk_incremental' --name 'train_384_'

clear_output()

In [None]:
# Incremental training (512 images)
PATH_384_MODEL = "" # Path to the 384-image model's best.pt, e.g. 'dusk_incremental/train_384_/weights/best.pt'

!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_incremental.yaml' --weights "{PATH_384_MODEL}"\
    --project 'dusk_incremental' --name 'train_512_'

clear_output()

In [None]:
# Incremental training (768 images)
PATH_512_MODEL = "" # Path to the 512-image model's best.pt, e.g. 'dusk_incremental/train_512_/weights/best.pt'

!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_incremental.yaml' --weights "{PATH_512_MODEL}"\
    --project 'dusk_incremental' --name 'train_768_'

clear_output()

In [None]:
# Incremental training (1024 images)
PATH_768_MODEL = "" # Path to the 768-image model's best.pt, e.g. 'dusk_incremental/train_768_/weights/best.pt'

!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_incremental.yaml' --weights "{PATH_768_MODEL}"\
    --project 'dusk_incremental' --name 'train_1024_'

clear_output()

In [None]:
# Incremental training (1536 images)
PATH_1024_MODEL = "" # Path to the 1024-image model's best.pt, e.g. 'dusk_incremental/train_1024_/weights/best.pt'

!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_incremental.yaml' --weights "{PATH_1024_MODEL}"\
    --project 'dusk_incremental' --name 'train_1536_'

clear_output()

In [None]:
# Incremental training (2048 images)
PATH_1536_MODEL = "" # Path to the 1536-image model's best.pt, e.g. 'dusk_incremental/train_1536_/weights/best.pt'

!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_incremental.yaml' --weights "{PATH_1536_MODEL}"\
    --project 'dusk_incremental' --name 'train_2048_'

clear_output()

#### 3.2.2 Models from zero

In [None]:
# Training from zero (384 images)
!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_from_zero.yaml' --weights 'yolov5s.pt'\
    --project 'dusk_from_zero' --name 'train_384_'

clear_output()

In [None]:
# Training from zero (512 images)
!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_from_zero.yaml' --weights 'yolov5s.pt'\
    --project 'dusk_from_zero' --name 'train_512_'

clear_output()

In [None]:
# Training from zero (768 images)
!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_from_zero.yaml' --weights 'yolov5s.pt'\
    --project 'dusk_from_zero' --name 'train_768_'

clear_output()

In [None]:
# Training from zero (1024 images)
!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_from_zero.yaml' --weights 'yolov5s.pt'\
    --project 'dusk_from_zero' --name 'train_1024_'

clear_output()

In [None]:
# Training from zero (1536 images)
!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_from_zero.yaml' --weights 'yolov5s.pt'\
    --project 'dusk_from_zero' --name 'train_1536_'

clear_output()

In [None]:
# Training from zero (2048 images)
!python train.py --img 1280 --batch 16 --epochs 100\
    --data 'data/dusk_enemies_from_zero.yaml' --weights 'yolov5s.pt'\
    --project 'dusk_from_zero' --name 'train_2048_'

clear_output()
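The training cells in 3.2.1 and 3.2.2 differ only in the stage size, the YAML file, the project, and the starting weights, so their arguments can be generated. This is a sketch under one assumption: each previous run was saved as `{project}/{name}/weights/best.pt` without a numeric suffix. YOLOv5 appends a number to the run name when that directory already exists, so check the real path. The labeling script and the YAML `val` update still have to run between stages; this helper only builds the `train.py` call.

```python
import subprocess
from typing import List

STAGES = [256, 384, 512, 768, 1024, 1536, 2048]


def train_command(stage: int, incremental: bool) -> List[str]:
    """Build the train.py arguments used by the cells above for one stage."""
    kind = "incremental" if incremental else "from_zero"
    project = f"dusk_{kind}"
    idx = STAGES.index(stage)
    if incremental and idx > 0:
        # Assumes the previous stage's run directory had no auto-incremented suffix
        weights = f"{project}/train_{STAGES[idx - 1]}_/weights/best.pt"
    else:
        weights = "yolov5s.pt"  # pretrained YOLOv5s weights
    return ["python", "train.py", "--img", "1280", "--batch", "16", "--epochs", "100",
            "--data", f"data/dusk_enemies_{kind}.yaml", "--weights", weights,
            "--project", project, "--name", f"train_{stage}_"]


def run_stage(stage: int, incremental: bool) -> None:
    """Run one training stage from the yolov5/ directory; raises if train.py fails."""
    subprocess.run(train_command(stage, incremental), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, which matter on Google Drive paths containing spaces.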

### 3.3 Plotting results

If Google Colab is used, the `plot_results` function can turn a run's training results (its `results.csv` file) into a PNG image.

This can be done for each trained model:

In [None]:
from utils.plots import plot_results

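# CSV to PNG: path to the results.csv of the run to be plotted
CSV_RESULTS_PATH = ""
plot_results(CSV_RESULTS_PATH)

# Shows the resulting plots (plot_results saves results.png in the same folder as the CSV)
PNG_RESULTS_PATH = ""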
display.Image(PNG_RESULTS_PATH, width=1000)
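To compare models numerically rather than visually, the best epoch of each run can be read from the same `results.csv`. This sketch assumes YOLOv5's CSV layout, where headers are padded with spaces and mAP@0.5 is stored in a column named `metrics/mAP_0.5`; verify the column name against your own file.

```python
import csv
from typing import Iterable, Tuple


def best_epoch(lines: Iterable[str], metric: str = "metrics/mAP_0.5") -> Tuple[int, float]:
    """Return (epoch, value) of the row with the highest `metric` in a results.csv.

    Keys and values are stripped because the CSV headers are padded with spaces.
    """
    best = None
    for row in csv.DictReader(lines):
        row = {k.strip(): v.strip() for k, v in row.items()}
        value = float(row[metric])
        if best is None or value > best[1]:
            best = (int(row["epoch"]), value)
    if best is None:
        raise ValueError("results file has no rows")
    return best
```

Usage: `with open(CSV_RESULTS_PATH) as f: print(best_epoch(f))`.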

## 4. Validation

The validation script will be used to evaluate the trained models. The `--task` flag controls which dataset partition will be used on the validation process.

Below, the last model of each training type is evaluated on the test partition:

In [None]:
PATH_LAST_INCR_MODEL = "" # Path to the last incremental model's best.pt
!python val.py --weights "{PATH_LAST_INCR_MODEL}" --batch 32\
    --data 'data/dusk_enemies_incremental.yaml' --task test\
    --project 'dusk_incremental' --name 'validation_on_test_data'

clear_output()

In [None]:
PATH_LAST_ZERO_MODEL = "" # Path to the last from-zero model's best.pt
!python val.py --weights "{PATH_LAST_ZERO_MODEL}" --batch 32\
    --data 'data/dusk_enemies_from_zero.yaml' --task test\
    --project 'dusk_from_zero' --name 'validation_on_test_data'

clear_output()