# Draw, I'll Help

This notebook trains YOLOv5 Nano for the task of shape correction in a web-based drawing app.

It is heavily based on the notebook from the [Ultralytics tutorial on training with custom data](https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data).

## Step 0. Setup

YOLOv5 is used for training, so let's clone the source and install its dependencies.

In [None]:
!git clone https://github.com/ultralytics/yolov5
%pip install -r yolov5/requirements.txt --quiet

## Step 1. Preparing the dataset

For this step, you will need a zipped archive with the dataset in YOLOv5 format. You may either use the existing `dataset.zip` file from [Releases](https://github.com/illright/draw-ill-help/releases) or create your own dataset by uploading your annotations to Roboflow and exporting them in YOLOv5 format.

In [None]:
from pathlib import Path

image_size = 416  # pixels (square images assumed)
dataset_location = Path('./dataset').resolve()
archive_location = Path('./dataset.zip').resolve()

If you're running this in Google Colab, run the cell below to upload the file directly. Otherwise, make sure that the dataset archive is present in the current directory under the name `dataset.zip` (unless you have changed the name in the cell above).

In [None]:
from google.colab import files

archive = next(iter(files.upload().values()))

with open(archive_location, 'wb') as file:
    file.write(archive)

In [None]:
import os
import shutil

# exist_ok=True lets this cell be re-run without failing on an existing directory
os.makedirs(dataset_location, exist_ok=True)
shutil.unpack_archive(archive_location, dataset_location)

# Fix the faulty paths to the data files
with open(dataset_location / 'data.yaml', 'w') as data:
    print(f'train: {dataset_location / "train" / "images"}', file=data)
    print(f'val: {dataset_location / "valid" / "images"}', file=data)
    print('nc: 2', file=data)
    print("names: ['Circle', 'Rectangle']", file=data)
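
Before training, it can help to sanity-check the unpacked labels. The sketch below assumes the usual YOLOv5 label layout, where each line of a label `.txt` file reads `class x_center y_center width height` with coordinates normalized to the range 0 to 1. `check_label_line` is a hypothetical helper written for illustration; it is not part of YOLOv5.

```python
def check_label_line(line: str, num_classes: int = 2) -> bool:
    """Return True if `line` looks like a valid YOLOv5 label row."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        class_id = int(parts[0])
        coords = [float(value) for value in parts[1:]]
    except ValueError:
        return False
    # The class index must be in range; coordinates are normalized to [0, 1].
    return 0 <= class_id < num_classes and all(0.0 <= c <= 1.0 for c in coords)


print(check_label_line("0 0.5 0.5 0.25 0.25"))  # a well-formed row for class 0
print(check_label_line("2 0.5 0.5 0.25 0.25"))  # class index out of range
```

To apply it to the dataset, one could iterate over the `.txt` files next to the images (assumed here to live in a `labels` directory beside each `images` directory) and report any line for which the check fails.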

## Step 2. Training

YOLOv5 Nano, pretrained on the COCO dataset, is chosen as the starting point because it is the smallest and fastest of the YOLOv5 models.

Training is configured for 3000 epochs, but the model is expected to stop much earlier: early stopping has a default patience of 100 epochs, so training ends once 100 consecutive epochs pass without improvement.
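
To make the role of patience concrete, here is a simplified sketch of patience-based early stopping. It is not YOLOv5's actual implementation; the `EarlyStopper` class and the fitness scores are made up for illustration, and the patience is shortened to 3 so the example stays small.

```python
class EarlyStopper:
    """Stop when the fitness score has not improved for `patience` epochs."""

    def __init__(self, patience: int = 100):
        self.patience = patience
        self.best_fitness = float("-inf")
        self.best_epoch = 0

    def should_stop(self, epoch: int, fitness: float) -> bool:
        # Remember the epoch with the best score seen so far.
        if fitness > self.best_fitness:
            self.best_fitness = fitness
            self.best_epoch = epoch
        # Stop once `patience` epochs have passed since that best epoch.
        return epoch - self.best_epoch >= self.patience


stopper = EarlyStopper(patience=3)
scores = [0.10, 0.30, 0.29, 0.28, 0.27, 0.26]  # illustrative values only
stopped_at = next(
    (epoch for epoch, score in enumerate(scores) if stopper.should_stop(epoch, score)),
    None,
)
print(stopped_at)
```

Here the best score occurs at epoch 1, so with a patience of 3 the loop stops at epoch 4.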

In [None]:
%rm -rf yolov5/runs

In [None]:
!python yolov5/train.py --img {image_size} --batch-size 256 --epochs 3000 --data {dataset_location}/data.yaml --weights yolov5n.pt --cache

## Step 3. Conversion to TensorFlow.js

We use the built-in converter to export the model into a format suitable for TensorFlow.js.

The Bash snippet `$(ls yolov5/runs/train | sort | tail -1)` extracts the name of the latest training run's directory. YOLOv5's training script creates the directories `exp/`, `exp2/`, `exp3/`, ... in succession, one per run.
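
The same lookup can be sketched in Python. Note that plain `sort` orders names lexicographically, which would place `exp10` before `exp2`; the hypothetical `latest_run` helper below compares the numeric suffixes instead, assuming the `exp`, `exp2`, ... naming described above.

```python
import re
from pathlib import Path


def latest_run(runs_dir: Path) -> Path:
    """Return the `exp`, `exp2`, `exp3`, ... subdirectory with the highest number."""

    def run_number(path: Path) -> int:
        match = re.fullmatch(r"exp(\d*)", path.name)
        # Unrelated directory names rank below every real run.
        if match is None:
            return -1
        # A bare `exp` is treated as the first run.
        return int(match.group(1) or 1)

    candidates = [path for path in runs_dir.iterdir() if path.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no runs found in {runs_dir}")
    return max(candidates, key=run_number)
```

For example, `latest_run(Path('yolov5/runs/train')) / 'weights' / 'best.pt'` would point at the best checkpoint of the most recent run.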

We set top-K to 1 because the web app only ever submits a single drawing to the model for prediction, so there is no point in looking for any more objects.
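
Conceptually, keeping the top-K detections means sorting the candidates by confidence and discarding all but the first K. The sketch below illustrates that idea only; the `top_k_detections` helper and the `(label, confidence)` tuple layout are assumptions made for illustration, not the exported model's output format.

```python
def top_k_detections(detections, k=1, min_confidence=0.0):
    """Keep the `k` most confident detections at or above `min_confidence`."""
    kept = [d for d in detections if d[1] >= min_confidence]
    # Highest confidence first, then truncate to k entries.
    return sorted(kept, key=lambda d: d[1], reverse=True)[:k]


candidates = [("Circle", 0.62), ("Rectangle", 0.91), ("Circle", 0.15)]  # made-up scores
best = top_k_detections(candidates, k=1)
print(best)
```

With K set to 1, only the single most confident shape survives, which matches the one-drawing-per-request use case.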

In [None]:
top_k = 1
!python yolov5/export.py --weights "yolov5/runs/train/$(ls yolov5/runs/train | sort | tail -1)/weights/best.pt" --img {image_size} --include tfjs --topk-all {top_k} --topk-per-class {top_k}

## Step 4. Performance evaluation

### TensorBoard

First, let's plot some charts with TensorBoard from YOLOv5's log files.

The logs are written to `yolov5/runs/`, separately for each training and inference run.

In [None]:
%load_ext tensorboard
%tensorboard --logdir yolov5/runs
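
The metrics can also be inspected without TensorBoard. The sketch below assumes that the training run directory contains a `results.csv` file with one row per epoch and space-padded column headers, as recent YOLOv5 versions are expected to produce; `best_epoch` is a hypothetical helper, and the metric name passed to it must match a column that actually exists in your file.

```python
import csv
from pathlib import Path


def best_epoch(results_csv: Path, metric: str) -> dict:
    """Return the row of `results_csv` with the highest value of `metric`."""
    with open(results_csv, newline="") as file:
        reader = csv.DictReader(file, skipinitialspace=True)
        # Header cells may be padded with spaces, so strip keys and values.
        rows = [
            {key.strip(): value.strip() for key, value in row.items()}
            for row in reader
        ]
    if not rows:
        raise ValueError(f"{results_csv} contains no data rows")
    return max(rows, key=lambda row: float(row[metric]))
```

Calling it on the latest run's `results.csv` with the name of a metric column returns that epoch's full row as a dictionary of strings.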

### Testing the model's inference

Now, let's run inference on the test set of images to assess the performance of the trained model.

In [None]:
!python yolov5/detect.py --weights yolov5/runs/train/$(ls yolov5/runs/train | sort | tail -1)/weights/best.pt --img {image_size} --conf 0.75 --max-det 1 --source {dataset_location}/test/images

In [None]:
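from IPython.display import Image, display
from pathlib import Path

# Pick the most recently modified run; sorting names would put exp10 before exp2
last_detect_run = max(Path('yolov5/runs/detect').iterdir(), key=lambda path: path.stat().st_mtime)

# Show every annotated test image produced by the detection run
for image_path in sorted(last_detect_run.glob('*.jpg')):
    display(Image(filename=str(image_path.absolute())))
    print("\n")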