<img src="https://raw.githubusercontent.com/maxsitt/insect-detect-docs/main/docs/assets/logo.png" width="500">

# YOLOv5 classification model training + ONNX export

Author: &nbsp; Maximilian Sittinger &nbsp;
[<img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="24">](https://github.com/maxsitt) &nbsp;
[<img src="https://upload.wikimedia.org/wikipedia/commons/0/06/ORCID_iD.svg" width="24">](https://orcid.org/0000-0002-4096-8556)

- 📑 [**Insect Detect Docs**](https://maxsitt.github.io/insect-detect-docs/)
- [`insect-detect-ml` GitHub repo](https://github.com/maxsitt/insect-detect-ml)

&nbsp;

**Train a [YOLOv5](https://github.com/ultralytics/yolov5) image classification model on your own custom training data!**

- Importing your dataset from [Roboflow](https://roboflow.com/) is recommended, but not required.
> Otherwise, choose the option *Upload dataset from Google Drive* or *Upload dataset from your local file system* instead (slower!).
- Connecting to Google Drive is recommended, but not required.
> Otherwise, choose the options *Upload dataset from your local file system* and *Download* instead of *Export to Google Drive* (slower!).
- Go to **File** in the top menu bar and choose **Save a copy in Drive** before running the notebook.
- Go to **Runtime** and make sure that **GPU** is selected as Hardware accelerator under **Change runtime type**.
- If you are using Firefox, please make sure to allow notifications for this website.

&nbsp;

---

**References**

1. Official YOLOv5 classification tutorial notebook by Ultralytics &nbsp;
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ultralytics/yolov5/blob/master/classify/tutorial.ipynb)
2. Roboflow tutorial notebook for YOLOv5 classification training &nbsp;
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/train-yolov5-classification-on-custom-data.ipynb)

# Initialization

## Show GPU + Linux distribution

In [None]:
!nvidia-smi -L
print("\n")
!head -n 2 /etc/*release

## YOLOv5 setup

In [None]:
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
%pip install -qr requirements.txt

import torch
import utils
display = utils.notebook_init()

## Recommended: Upload dataset from Roboflow

If you are not sure how to export your annotated dataset, check the [Roboflow docs](https://docs.roboflow.com/exporting-data).

> Alternatively, you can upload your dataset (folder structure) from **[Google Drive](#scrollTo=RxOnnOadc5vR)** or directly from your **[local file system](#scrollTo=qKTCWdtkOUw7)** in the next steps!

---

**Folder structure of your classification dataset:**

```
dataset_name
├── train
│   ├── class_1
│   │   ├── IMG_123.jpg
│   └── class_2
│       ├── IMG_456.jpg
├── val
│   ├── class_1
│   │   ├── IMG_789.jpg
│   └── class_2
│       ├── IMG_101.jpg
├── test
│   ├── class_1
│   │   ├── IMG_121.jpg
│   └── class_2
│       ├── IMG_341.jpg
```
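
Before training, it can help to confirm that your dataset really follows this layout. The following is a minimal sketch using only the Python standard library, not part of YOLOv5 or Roboflow: the helper name `count_images` and the set of accepted file extensions are assumptions chosen for illustration.

```python
from collections import Counter
from pathlib import Path

# Assumption: these extensions cover the images in your dataset; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(dataset_dir, splits=("train", "val", "test")):
    """Return {split: Counter({class_name: image_count})} for every split folder that exists."""
    counts = {}
    for split in splits:
        split_dir = Path(dataset_dir) / split
        if not split_dir.is_dir():
            continue  # skip splits that are missing or named differently
        per_class = Counter()
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            per_class[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
        counts[split] = per_class
    return counts
```

Calling `count_images(dataset_location)` returns the number of images per class for each split, which makes empty or missing class folders easy to spot. If your validation folder is named `valid` instead of `val`, pass `splits=("train", "valid", "test")`.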

In [None]:
%pip install -q roboflow
from roboflow import Roboflow

**Copy only the last three lines of the Download Code and insert them at the top of the next code cell:**

In [None]:
### Paste your Download Code here:
rf = Roboflow(api_key="XXXXXXXXXXXXXXXXXXXX")
project = rf.workspace("maximilian-sittinger").project("insect_detect_classification")
dataset = project.version(2).download("folder")
###

dataset_location = dataset.location

from pathlib import Path
print(f"\nLocation of dataset: {dataset_location}\n")
print("Number of classes:", len(list(Path(f"{dataset_location}/train").glob("*"))))
print("\nNumber of training images:", len(list(Path(f"{dataset_location}/train").glob("**/*.jpg"))))
print("Number of validation images:", len(list(Path(f"{dataset_location}/valid").glob("**/*.jpg"))))
print("Number of test images:", len(list(Path(f"{dataset_location}/test").glob("**/*.jpg"))))

## Recommended: Connect to Google Drive

In [None]:
from google.colab import drive
drive.mount("/content/drive")

In [None]:
#@title ## Optional: Upload dataset from Google Drive {display-mode: "form"}

#@markdown ### Google Drive path to dataset folder:
dataset_path = "MyDrive/Datasets/yolov5-cls_dataset" #@param {type: "string"}

%cp -ai /content/drive/{dataset_path} /content/yolov5

from pathlib import Path
dataset_name = Path(dataset_path).name  # .name keeps dots in folder names (.stem would cut off everything after the last dot)
dataset_location = f"/content/yolov5/{dataset_name}"

print(f"Location of dataset: {dataset_location}\n")
print("Number of classes:", len(list(Path(f"{dataset_location}/train").glob("*"))))
print("\nNumber of training images:", len(list(Path(f"{dataset_location}/train").glob("**/*.jpg"))))
print("Number of validation images:", len(list(Path(f"{dataset_location}/val").glob("**/*.jpg"))))
print("Number of test images:", len(list(Path(f"{dataset_location}/test").glob("**/*.jpg"))))

In [None]:
#@title ## Optional: Upload dataset from your local file system {display-mode: "form"}

#@markdown ### Name of your (zipped) dataset folder:
dataset_name = "yolov5-cls_dataset" #@param {type: "string"}
#@markdown - Please make sure to compress your dataset folder to a **.zip file** before uploading!
#@markdown - The .zip file must have the same name as the dataset folder.

dataset_location = f"/content/yolov5/{dataset_name}"

from google.colab import files
uploaded = files.upload()

import zipfile
if len(list(zipfile.Path(f"{dataset_name}.zip").iterdir())) > 1:
  !unzip -uq {dataset_name}.zip -d /content/yolov5/{dataset_name}
else:
  !unzip -uq {dataset_name}.zip -d /content/yolov5
%rm {dataset_name}.zip

from pathlib import Path
print(f"\nLocation of dataset: {dataset_location}\n")
print("Number of classes:", len(list(Path(f"{dataset_location}/train").glob("*"))))
print("\nNumber of training images:", len(list(Path(f"{dataset_location}/train").glob("**/*.jpg"))))
print("Number of validation images:", len(list(Path(f"{dataset_location}/val").glob("**/*.jpg"))))
print("Number of test images:", len(list(Path(f"{dataset_location}/test").glob("**/*.jpg"))))
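
The cell above handles two ways a dataset can be zipped: with a single top-level folder, or with `train`, `val` and `test` directly at the archive root. The following is a minimal sketch of that decision using only the standard library. The function name `extraction_dir` is hypothetical, and the default base directory is the one used in this notebook.

```python
import zipfile
from pathlib import Path


def extraction_dir(zip_path, base_dir="/content/yolov5"):
    """Pick the unzip target so the dataset ends up in base_dir/<zip name without .zip>."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # Collect the distinct first path components of all archive members.
        top_level = {name.split("/")[0] for name in archive.namelist() if name}
    if len(top_level) > 1:
        # Splits sit at the archive root: extract into a new folder named after the zip.
        return Path(base_dir) / zip_path.stem
    # A single top-level folder is already present: extract directly into base_dir.
    return Path(base_dir)
```

Either way, the dataset is expected to end up in a folder named after the .zip file, which is why the archive and the dataset folder should share the same name.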

# Model training

In [None]:
#@title ## Optional: Select external logger {display-mode: "form"}

logger = "Weights&Biases" #@param ["Weights&Biases", "Comet", "ClearML"]

#@markdown **More info:**
#@markdown - [Weights & Biases](https://docs.wandb.ai/guides/integrations/yolov5)
#@markdown - [Comet](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/comet)
#@markdown - [ClearML](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml)

if logger == "Weights&Biases":
  %pip install -q wandb
  import wandb
  wandb.login()
elif logger == "Comet":
  %pip install -q comet_ml
  import comet_ml
  comet_ml.init()
elif logger == "ClearML":
  %pip install -q clearml
  import clearml
  clearml.browser_login()

## Tensorboard logger

> If you are using Firefox, **disable Enhanced Tracking Protection** for this website (click on the shield to the left of the address bar) for the Tensorboard logger to work correctly!

In [None]:
%load_ext tensorboard
%tensorboard --logdir /content/yolov5/runs/train-cls

## Train YOLOv5 classification model

- `--name` name of the training run
- `--img` input image size (recommended: [median size](https://blog.roboflow.com/resize-images-with-dimension-insights/) of your training images, divisible by 32)
- `--batch` specify batch size (recommended: 64)
- `--epochs` set the number of training [epochs](https://machine-learning.paperspace.com/wiki/epoch) (recommended: 50-100+ epochs)
- `--data` path to dataset folder
- `--model` specify the pretrained [classification model](https://github.com/ultralytics/yolov5#classification)
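
As a rough illustration of the `--img` recommendation above, the sketch below takes a list of image side lengths, computes their median and rounds it to the nearest multiple of 32. The function name `suggest_image_size` and the choice to round to the nearest multiple (rather than always up or down) are assumptions, not part of YOLOv5.

```python
from statistics import median


def suggest_image_size(side_lengths, stride=32):
    """Round the median side length (in pixels) to the nearest multiple of `stride`."""
    if not side_lengths:
        raise ValueError("side_lengths must not be empty")
    med = median(side_lengths)
    # Never return less than one stride, even for very small images.
    return max(stride, int(round(med / stride)) * stride)


# Hypothetical side lengths of cropped training images, in pixels.
print(suggest_image_size([96, 120, 130, 150, 210]))  # 128
```

Treat the result as a starting point only: larger image sizes can preserve more detail but increase training time and memory use.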

In [None]:
training_run_name = "YOLOv5s-cls_128_batch64_epochs100" #@param {type: "string"}
#@markdown **Add UTC timestamp in front of training run name:**
Add_timestamp = True #@param {type:"boolean"}
#@markdown ---

image_size = 128 #@param {type: "integer"}
batch_size = 64 #@param {type:"slider", min:16, max:128, step:16}
number_epochs = 100 #@param {type:"slider", min:10, max:600, step:10}
model = "yolov5s-cls.pt" #@param ["yolov5n-cls.pt", "yolov5s-cls.pt", "yolov5m-cls.pt", "yolov5l-cls.pt", "yolov5x-cls.pt", "resnet18.pt", "resnet34.pt", "resnet50.pt", "resnet101.pt", "efficientnet_b0.pt", "efficientnet_b1.pt", "efficientnet_b2.pt", "efficientnet_b3.pt"]

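if Add_timestamp:
  from datetime import datetime, timezone
  # Use an explicit UTC clock so the timestamp is UTC regardless of the runtime's time zone
  utc_timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H-%M")
  train_run_name = f"{utc_timestamp}_{training_run_name}"
else:
  train_run_name = training_run_name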

%cd /content/yolov5

!python classify/train.py \
--name {train_run_name} \
--img {image_size} \
--batch {batch_size} \
--epochs {number_epochs} \
--data {dataset_location} \
--model {model}

### Export trained model weights to ONNX format for faster CPU inference

In [None]:
%cd /content/yolov5

!python export.py \
--weights runs/train-cls/{train_run_name}/weights/best.pt \
--img {image_size} \
--include onnx \
--simplify

In [None]:
#@title ## Export to Google Drive or Download training results {display-mode: "form"}

training_results = "Export_Google_Drive" #@param ["Export_Google_Drive", "Download"]
#@markdown ---

#@markdown ### Path for saving training results in Google Drive:
GDrive_save_path = "MyDrive/Training_results/YOLOv5-cls"  #@param {type: "string"}

if training_results == "Export_Google_Drive":
  %mkdir -p /content/drive/{GDrive_save_path}
  %cp -ai /content/yolov5/runs/train-cls/{train_run_name} /content/drive/{GDrive_save_path}
elif training_results == "Download":
  %cd /content/yolov5/runs/train-cls
  !zip -rq {train_run_name}.zip {train_run_name}
  from google.colab import files
  files.download(f"{train_run_name}.zip")

# Model validation

Check the performance of your model on the dataset test split.

> Copy the validation results (cell output) and save them to a .txt file, as they will not be saved automatically.

Change the weights from `best.onnx` to `best.pt` if you want to use the original PyTorch model for validation.

In [None]:
%cd /content/yolov5

!python classify/val.py \
--name {train_run_name}_validate \
--weights runs/train-cls/{train_run_name}/weights/best.onnx \
--data {dataset_location} \
--img {image_size}
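
One way to keep the validation results without copying them by hand is to run the command from Python and write its output to a text file. The following is a minimal sketch using the standard `subprocess` module. The helper name `run_and_save` and the log file name in the commented usage are assumptions; the command itself simply mirrors the cell above.

```python
import subprocess
from pathlib import Path


def run_and_save(command, log_path):
    """Run `command` (a list of strings), save its output to `log_path`, return the exit code."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so log messages are captured too
        text=True,
    )
    Path(log_path).write_text(result.stdout)
    print(result.stdout)
    return result.returncode


# Hypothetical usage inside this notebook (run from /content/yolov5):
# run_and_save(
#     ["python", "classify/val.py",
#      "--name", f"{train_run_name}_validate",
#      "--weights", f"runs/train-cls/{train_run_name}/weights/best.onnx",
#      "--data", dataset_location,
#      "--img", str(image_size)],
#     f"{train_run_name}_validate.txt",
# )
```

Unlike the `!python` cell, this variant prints the output only after the command has finished.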

# Model inference

Run your model on the images of the dataset test split to inspect its predictions.

Change the weights from `best.onnx` to `best.pt` if you want to use the original PyTorch model for prediction.

In [None]:
%cd /content/yolov5

!python classify/predict.py \
--name {train_run_name}_predict \
--weights runs/train-cls/{train_run_name}/weights/best.onnx \
--source {dataset_location}/test/*/*/ \
--img {image_size}

## Show inference results on test images

In [None]:
import glob
from IPython.display import Image, display

# Show every annotated test image saved by classify/predict.py, in alphabetical order
for image_path in sorted(glob.glob(f"/content/yolov5/runs/predict-cls/{train_run_name}_predict/*.jpg")):
  display(Image(filename=image_path))
  print("\n")
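
A classification model outputs one raw score (logit) per class, which is typically converted to probabilities with softmax and then sorted to find the most likely classes. The following is a minimal pure-Python sketch of that post-processing step for a single image, not the code YOLOv5 itself runs. The function name `top_k_classes`, the example scores and the class names are made up for illustration.

```python
import math


def top_k_classes(logits, class_names, k=5):
    """Apply softmax to raw class scores and return the k most probable (name, probability) pairs."""
    if len(logits) != len(class_names):
        raise ValueError("logits and class_names must have the same length")
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores for three hypothetical classes.
for name, prob in top_k_classes([2.0, 0.5, -1.0], ["class_1", "class_2", "class_3"], k=2):
    print(f"{name}: {prob:.2f}")
```

A low top probability, or two classes with similar probabilities, can point to images that are worth checking manually.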

# Model deployment

That's it! You trained your own [YOLOv5](https://github.com/ultralytics/yolov5) classification model on your custom dataset and exported it to ONNX format for faster CPU inference.

> To deploy the classification model on your local PC (CPU), check out the deployment instructions in the [**Insect Detect Docs**](https://maxsitt.github.io/insect-detect-docs/deployment/classification/).