<img src="https://raw.githubusercontent.com/maxsitt/insect-detect-docs/main/docs/assets/logo.png" width="500">

# YOLOv5 classification model training + ONNX export

[![DOI](https://zenodo.org/badge/580963598.svg)](https://zenodo.org/badge/latestdoi/580963598)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://choosealicense.com/licenses/agpl-3.0/)

Author: &nbsp; Maximilian Sittinger &nbsp;
[<img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="24">](https://github.com/maxsitt) &nbsp;
[<img src="https://upload.wikimedia.org/wikipedia/commons/0/06/ORCID_iD.svg" width="24">](https://orcid.org/0000-0002-4096-8556)

- [**Insect Detect Docs**](https://maxsitt.github.io/insect-detect-docs/) 📑
- [`insect-detect-ml`](https://github.com/maxsitt/insect-detect-ml) GitHub repo

&nbsp;

**Train an image classification model on your own custom dataset with [YOLOv5](https://github.com/ultralytics/yolov5#classification)!**

- Go to **File** in the top menu bar and choose **Save a copy in Drive** before running the notebook.
- Go to **Runtime** and make sure that **GPU** is selected as Hardware accelerator under **Change runtime type**.
- If you are using Firefox, please make sure to allow notifications for this website.
- Importing the dataset from [Roboflow](https://roboflow.com/) compresses the images, which can decrease model accuracy.
> Choose option [`Upload dataset from Google Drive`](#scrollTo=hFA-ROJ8rUWU) or `Upload dataset from Zenodo` instead.
- Connecting to Google Drive is recommended, but is not required.
> Choose option [`Upload dataset from your local file system`](#scrollTo=qKTCWdtkOUw7) (slower!) and [`Download results`](#scrollTo=h90_4rFQx0mp) instead.

&nbsp;

---

**References**

1. Official YOLOv5 classification tutorial notebook by Ultralytics &nbsp;
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ultralytics/yolov5/blob/master/classify/tutorial.ipynb)
1. Roboflow tutorial notebook for YOLOv5 classification training &nbsp;
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/train-yolov5-classification-on-custom-data.ipynb)

# Initialization

## Show GPU + CPU and Linux distribution

In [None]:
!nvidia-smi -L
print("\nCPU:")
!grep "model name" /proc/cpuinfo
print("\nLinux distribution:")
!grep "PRETTY_NAME" /etc/os-release

## YOLOv5 setup

In [None]:
!git clone https://github.com/maxsitt/yolov5 # custom YOLOv5 fork
%cd /content/yolov5

# Delete onnxruntime from requirements.txt
!sed -i "/onnxruntime/d" requirements.txt

%pip install -qr requirements.txt

# Don't use the package albumentations for image augmentations (installed by default in Google Colab)
# -> images will only be resized (opencv): https://github.com/maxsitt/yolov5/blob/master/utils/augmentations.py#L356-L370
# -> these augmentations will not be used: https://github.com/maxsitt/yolov5/blob/master/utils/augmentations.py#L312-L348
%pip uninstall -y albumentations

import torch
import utils

# Install onnxruntime-gpu if CUDA is available or onnxruntime for CPU inference
if torch.cuda.is_available():
  %pip install -q onnxruntime-gpu
else:
  %pip install -q onnxruntime

display = utils.notebook_init()

## Recommended: Connect to Google Drive

In [None]:
from google.colab import drive
drive.mount("/content/drive")

## Folder structure of your classification dataset

Separating the dataset into a training ("train"), validation ("val") and test split is necessary to correctly evaluate the performance of your model. A split ratio of 70% train, 20% val and 10% test is recommended. You can find more info in this [blog article](https://blog.roboflow.com/train-test-split/).

> You can upload your original dataset and [**split it into train/val/test subsets**](#scrollTo=vkr0vBcOlT-t) in one of the following steps before training!

```
dataset_name
├── train
│   ├── class_1
│   │   ├── IMG_123.jpg
│   └── class_2
│       ├── IMG_456.jpg
├── val
│   ├── class_1
│   │   ├── IMG_789.jpg
│   └── class_2
│       ├── IMG_101.jpg
├── test
│   ├── class_1
│   │   ├── IMG_121.jpg
│   └── class_2
│       ├── IMG_341.jpg
```
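
Before training, it can help to confirm that every split contains the same class folders. The following is a minimal sketch, not part of the original notebook; the function name `check_dataset_structure` is hypothetical and it only assumes the `train`/`val`/`test` layout shown above.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def check_dataset_structure(dataset_dir):
    """Return a list of problems found in a train/val/test classification dataset."""
    root = Path(dataset_dir)
    problems = []
    classes_per_split = {}
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing split folder: {split}")
            continue
        # Every subfolder of a split is treated as one class
        classes_per_split[split] = {d.name for d in split_dir.iterdir() if d.is_dir()}
    train_classes = classes_per_split.get("train", set())
    for split, classes in classes_per_split.items():
        if classes != train_classes:
            # Classes that are present in only one of the two splits
            diff = sorted(classes ^ train_classes)
            problems.append(f"classes in {split} differ from train: {diff}")
    return problems
```

An empty list means that all three splits exist and share the same class names, e.g. `check_dataset_structure("/content/yolov5/dataset_name")`.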

In [None]:
#@title ## Upload dataset from Google Drive {display-mode: "form"}

#@markdown ### Google Drive path to your (zipped) dataset folder:
dataset_path = "/content/drive/MyDrive/classification_dataset.zip" #@param {type: "string"}
#@markdown - Please make sure to compress your dataset folder to a **.zip** file for much faster upload speed!

from pathlib import Path

dataset_location = f"/content/yolov5/{Path(dataset_path).stem}"

print("Uploading dataset from Google Drive...\n")
!rsync -ah --info=progress2 --no-i-r {dataset_path} /content/yolov5
if Path(dataset_path).suffix == ".zip":
  import zipfile
  zip_path = f"/content/yolov5/{Path(dataset_path).stem}.zip"
  if len(list(zipfile.Path(zip_path).iterdir())) > 1:
    !unzip -uq {zip_path} -d {dataset_location}
  else:
    !unzip -uq {zip_path} -d /content/yolov5
  %rm {zip_path}
print("\nDataset was successfully uploaded!")

if Path(f"{dataset_location}/valid").exists():
  Path(f"{dataset_location}/valid").rename(f"{dataset_location}/val")

print(f"\nLocation of dataset: {dataset_location}")
print(f"\nTotal number of images: {len(list(Path(dataset_location).glob('**/*.jpg')))}")

if Path(f"{dataset_location}/train").exists():
  classes_train = sorted(list(Path(f"{dataset_location}/train").glob("*")))
  print(f"\nNumber of classes: {len(classes_train)}")
  print(f"\nNumber of training images: {len(list(Path(f'{dataset_location}/train').glob('**/*.jpg')))}")
  print("Number of images per class in the train split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_train))
else:
  classes = sorted(list(Path(dataset_location).glob("*")))
  print(f"\nNumber of classes: {len(classes)}")
  print("\nNumber of images per class:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes))
if Path(f"{dataset_location}/val").exists():
  classes_val = sorted(list(Path(f"{dataset_location}/val").glob("*")))
  print(f"\nNumber of validation images: {len(list(Path(f'{dataset_location}/val').glob('**/*.jpg')))}")
  print("Number of images per class in the val split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_val))
if Path(f"{dataset_location}/test").exists():
  classes_test = sorted(list(Path(f"{dataset_location}/test").glob("*")))
  print(f"\nNumber of test images: {len(list(Path(f'{dataset_location}/test').glob('**/*.jpg')))}")
  print("Number of images per class in the test split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_test))
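
The summary printed by the cell above only counts files matching `*.jpg`, and this pattern is case-sensitive on Linux. Files such as `IMG_1.JPG`, `.jpeg` or `.png` are therefore not included in the reported numbers. A minimal sketch of a more tolerant count follows; the helper name `count_images_per_class` and the suffix list are assumptions, not part of the notebook.

```python
from pathlib import Path

# Assumption: adjust this set to the image formats used in your dataset
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Count image files per class subfolder, ignoring upper/lower case in the suffix."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

For example, `count_images_per_class(f"{dataset_location}/train")` returns a dictionary that maps each class name to its number of images.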

In [None]:
#@title ## Upload dataset from Zenodo {display-mode: "form"}

#@markdown ### Zenodo DOI of dataset:
zenodo_doi = "10.5281/zenodo.8325384" #@param {type: "string"}
#@markdown - Only works with a single dataset folder or .zip file in the Zenodo record.

%pip install -q zenodo_get

from pathlib import Path
import zenodo_get

print("\n")
!zenodo_get {zenodo_doi} --output-dir /content/zenodo

dataset_folder = [f for f in Path("/content/zenodo").iterdir() if f.is_dir()]
if len(dataset_folder) > 1:
  print("\nFound more than one dataset folder!")
elif len(dataset_folder) == 1:
  # Use only the folder name, not the full "/content/zenodo/..." path
  dataset_location = f"/content/yolov5/{dataset_folder[0].name}"
  Path(dataset_folder[0]).rename(dataset_location)
else:
  import zipfile
  dataset_zip = list(Path("/content/zenodo").glob("*.zip"))
  if len(dataset_zip) > 1:
    print("\nFound more than one dataset .zip file!")
  elif len(dataset_zip) == 0:
    print("\nFound no dataset folder or .zip file in the Zenodo record!")
  else:
    zip_path = dataset_zip[0]
    dataset_location = f"/content/yolov5/{zip_path.stem}"
    if len(list(zipfile.Path(zip_path).iterdir())) > 1:
      !unzip -uq {zip_path} -d {dataset_location}
    else:
      !unzip -uq {zip_path} -d /content/yolov5
    %rm {zip_path}
print("\nDataset was successfully uploaded!")

if Path(f"{dataset_location}/valid").exists():
  Path(f"{dataset_location}/valid").rename(f"{dataset_location}/val")

print(f"\nLocation of dataset: {dataset_location}")
print(f"\nTotal number of images: {len(list(Path(dataset_location).glob('**/*.jpg')))}")

if Path(f"{dataset_location}/train").exists():
  classes_train = sorted(list(Path(f"{dataset_location}/train").glob("*")))
  print(f"\nNumber of classes: {len(classes_train)}")
  print(f"\nNumber of training images: {len(list(Path(f'{dataset_location}/train').glob('**/*.jpg')))}")
  print("Number of images per class in the train split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_train))
else:
  classes = sorted(list(Path(dataset_location).glob("*")))
  print(f"\nNumber of classes: {len(classes)}")
  print("\nNumber of images per class:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes))
if Path(f"{dataset_location}/val").exists():
  classes_val = sorted(list(Path(f"{dataset_location}/val").glob("*")))
  print(f"\nNumber of validation images: {len(list(Path(f'{dataset_location}/val').glob('**/*.jpg')))}")
  print("Number of images per class in the val split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_val))
if Path(f"{dataset_location}/test").exists():
  classes_test = sorted(list(Path(f"{dataset_location}/test").glob("*")))
  print(f"\nNumber of test images: {len(list(Path(f'{dataset_location}/test').glob('**/*.jpg')))}")
  print("Number of images per class in the test split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_test))

In [None]:
#@title ## Upload dataset from your local file system {display-mode: "form"}

#@markdown ### Name of your zipped dataset folder:
dataset_name = "classification_dataset" #@param {type: "string"}
#@markdown - Please make sure to compress your dataset folder to a **.zip** file before uploading!
#@markdown - The name of the .zip file must match the name of the dataset folder.

from pathlib import Path
import zipfile
from google.colab import files

dataset_location = f"/content/yolov5/{dataset_name}"

uploaded = files.upload()

if len(list(zipfile.Path(f"{dataset_name}.zip").iterdir())) > 1:
  !unzip -uq {dataset_name}.zip -d {dataset_location}
else:
  !unzip -uq {dataset_name}.zip -d /content/yolov5
%rm {dataset_name}.zip

if Path(f"{dataset_location}/valid").exists():
  Path(f"{dataset_location}/valid").rename(f"{dataset_location}/val")

print(f"\nLocation of dataset: {dataset_location}")
print(f"\nTotal number of images: {len(list(Path(dataset_location).glob('**/*.jpg')))}")

if Path(f"{dataset_location}/train").exists():
  classes_train = sorted(list(Path(f"{dataset_location}/train").glob("*")))
  print(f"\nNumber of classes: {len(classes_train)}")
  print(f"\nNumber of training images: {len(list(Path(f'{dataset_location}/train').glob('**/*.jpg')))}")
  print("Number of images per class in the train split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_train))
else:
  classes = sorted(list(Path(dataset_location).glob("*")))
  print(f"\nNumber of classes: {len(classes)}")
  print("\nNumber of images per class:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes))
if Path(f"{dataset_location}/val").exists():
  classes_val = sorted(list(Path(f"{dataset_location}/val").glob("*")))
  print(f"\nNumber of validation images: {len(list(Path(f'{dataset_location}/val').glob('**/*.jpg')))}")
  print("Number of images per class in the val split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_val))
if Path(f"{dataset_location}/test").exists():
  classes_test = sorted(list(Path(f"{dataset_location}/test").glob("*")))
  print(f"\nNumber of test images: {len(list(Path(f'{dataset_location}/test').glob('**/*.jpg')))}")
  print("Number of images per class in the test split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_test))
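
The upload cells above choose the unzip destination from the number of top-level entries in the archive. If the archive contains a single wrapping folder, it is extracted directly into `/content/yolov5`; if it contains several entries (e.g. `train`, `val`, `test`), it is extracted into a new folder named after the dataset. A minimal sketch of this decision, with the hypothetical helper name `extraction_target`:

```python
import zipfile


def extraction_target(zip_path, dataset_location, parent_dir="/content/yolov5"):
    """Pick the unzip destination depending on the archive's top-level layout."""
    with zipfile.ZipFile(zip_path) as zf:
        # zipfile.Path(...).iterdir() lists only the entries at the archive root
        top_level = list(zipfile.Path(zf).iterdir())
    if len(top_level) > 1:
        # No wrapping folder: extract into a new folder named after the dataset
        return dataset_location
    # Single top-level entry (the dataset folder itself): extract next to it
    return parent_dir
```

Both cases end with the dataset at `dataset_location`, provided the wrapping folder has the same name as the .zip file.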

## Upload dataset from Roboflow

If you are not sure how to export your dataset, check the [Roboflow docs](https://docs.roboflow.com/exporting-data).

In [None]:
%pip install -q roboflow

**Copy only the last three lines of your Download Code and insert them in the next code cell:**

In [None]:
from pathlib import Path
from roboflow import Roboflow

%cd /content/yolov5

### Paste your Download Code here:
rf = Roboflow(api_key="XXXXXXXXXXXXXXXXXXXX")
project = rf.workspace("maximilian-sittinger").project("insect_detect_classification_v2")
dataset = project.version(1).download("folder")
###

dataset_location = dataset.location

if Path(f"{dataset_location}/valid").exists():
  Path(f"{dataset_location}/valid").rename(f"{dataset_location}/val")

print(f"\nLocation of dataset: {dataset_location}")
print(f"\nTotal number of images: {len(list(Path(dataset_location).glob('**/*.jpg')))}")

if Path(f"{dataset_location}/train").exists():
  classes_train = sorted(list(Path(f"{dataset_location}/train").glob("*")))
  print(f"\nNumber of classes: {len(classes_train)}")
  print(f"\nNumber of training images: {len(list(Path(f'{dataset_location}/train').glob('**/*.jpg')))}")
  print("Number of images per class in the train split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_train))
else:
  classes = sorted(list(Path(dataset_location).glob("*")))
  print(f"\nNumber of classes: {len(classes)}")
  print("\nNumber of images per class:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes))
if Path(f"{dataset_location}/val").exists():
  classes_val = sorted(list(Path(f"{dataset_location}/val").glob("*")))
  print(f"\nNumber of validation images: {len(list(Path(f'{dataset_location}/val').glob('**/*.jpg')))}")
  print("Number of images per class in the val split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_val))
if Path(f"{dataset_location}/test").exists():
  classes_test = sorted(list(Path(f"{dataset_location}/test").glob("*")))
  print(f"\nNumber of test images: {len(list(Path(f'{dataset_location}/test').glob('**/*.jpg')))}")
  print("Number of images per class in the test split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_test))

In [None]:
#@title ## Optional: Split dataset into train/val/test subsets {display-mode: "form"}

train_ratio = 0.7 #@param {type:"slider", min:0, max:1.0, step:0.1}
val_ratio = 0.2 #@param {type:"slider", min:0, max:1.0, step:0.1}
test_ratio = 0.1 #@param {type:"slider", min:0, max:1.0, step:0.1}
#@markdown ---

#@markdown Set random seed for shuffling of the images before splitting:
random_seed = 1 #@param {type: "integer"}
#@markdown **Use the same seed to make splits reproducible. Change the seed to generate a new split.**
#@markdown > More info about other options: [`split-folders`](https://github.com/jfilter/split-folders#usage)

%pip install -q split-folders

from pathlib import Path
import splitfolders

if Path(f"{dataset_location}/train").exists():
  print("Train split of your dataset already exists!")
else:
  input_dir = dataset_location
  output_dir = f"{input_dir}_split"
  dataset_location = Path(output_dir)

  splitfolders.ratio(input_dir, output_dir, seed=random_seed, ratio=(train_ratio, val_ratio, test_ratio))

  print(f"\nNew location of dataset after split: {dataset_location}")
  print(f"\nTotal number of images: {len(list(Path(dataset_location).glob('**/*.jpg')))}")

  classes_train = sorted(list(Path(f"{dataset_location}/train").glob("*")))
  print(f"\nNumber of classes: {len(classes_train)}")
  print(f"\nNumber of training images: {len(list(Path(f'{dataset_location}/train').glob('**/*.jpg')))}")
  print("Number of images per class in the train split:")
  print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_train))
  if Path(f"{dataset_location}/val").exists():
    classes_val = sorted(list(Path(f"{dataset_location}/val").glob("*")))
    print(f"\nNumber of validation images: {len(list(Path(f'{dataset_location}/val').glob('**/*.jpg')))}")
    print("Number of images per class in the val split:")
    print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_val))
  if Path(f"{dataset_location}/test").exists():
    classes_test = sorted(list(Path(f"{dataset_location}/test").glob("*")))
    print(f"\nNumber of test images: {len(list(Path(f'{dataset_location}/test').glob('**/*.jpg')))}")
    print("Number of images per class in the test split:")
    print("\n".join(f"{c.name}: {len(list((Path(c).glob('*.jpg'))))}" for c in classes_test))
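
The three ratio sliders in the cell above can be moved independently, so the selected values may not add up to 1. A minimal pre-check, assuming that the ratios are meant to sum to 1, could look like the sketch below; `check_split_ratios` is a hypothetical helper and the image counts are only a rough estimate, because the actual split is done per class.

```python
def check_split_ratios(train_ratio, val_ratio, test_ratio, num_images=None):
    """Validate that the ratios sum to 1 and optionally estimate images per split."""
    ratios = (train_ratio, val_ratio, test_ratio)
    if any(r < 0 for r in ratios):
        raise ValueError("Ratios must not be negative.")
    # Round to absorb floating point noise (a sum of floats may not be exactly 1.0)
    if round(sum(ratios), 5) != 1:
        raise ValueError(f"Ratios sum to {sum(ratios):.2f}, expected 1.0")
    if num_images is None:
        return None
    # Rough estimate only: rounding per class can shift a few images between splits
    return tuple(round(num_images * r) for r in ratios)
```

With the default slider values and 1000 images, `check_split_ratios(0.7, 0.2, 0.1, 1000)` gives an estimate of 700 training, 200 validation and 100 test images.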

In [None]:
#@title ## Optional: Calculate metrics of your image dataset {display-mode: "form"}

#@markdown - In our experiments, upscaling of most images in the dataset led to better training
#@markdown   results compared to downscaling of the images with the `cv2.INTER_LINEAR` method that
#@markdown   is used by default during YOLOv5 image preprocessing
#@markdown   ([comparison of OpenCV interpolation algorithms](https://web.archive.org/web/20190424180810/http://tanbakuchi.com/posts/comparison-of-openv-interpolation-algorithms/)).
#@markdown   You can use the 90th percentile of the image sizes (divisible by 32) in your dataset
#@markdown   as reference point to set the input image size for model training in the next step.
#@markdown > Compare models trained with different image sizes to find the best accuracy for your dataset!

from pathlib import Path
from statistics import mean, median

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

images = list(Path(f"{dataset_location}").glob("**/*.jpg"))
print(f"Found {len(images)} .jpg images in the dataset folder.\n")

img_widths = []
img_heights = []

for img in images:
  with Image.open(img) as im:
    img_widths.append(im.width)
    img_heights.append(im.height)

print(f"Mean image width:    {round(mean(img_widths))} (min: {min(img_widths)} / max: {max(img_widths)})")
print(f"Mean image height:   {round(mean(img_heights))} (min: {min(img_heights)} / max: {max(img_heights)})\n")
print(f"Median image width:  {round(median(img_widths))}")
print(f"Median image height: {round(median(img_heights))}\n")

p90_image_size = round(mean([np.percentile(img_widths, 90), np.percentile(img_heights, 90)]))
print(f"Mean 90th percentile of image width/height: {p90_image_size}")
print(f"Recommended image_size for training:        {int(32 * round(p90_image_size / 32))}\n")

plt.scatter(img_widths, img_heights, marker=".", alpha=0.3, edgecolors="black")
plt.axhline(y=mean(img_heights), color="green", linestyle="--", label="Mean")
plt.axvline(x=mean(img_widths), color="green", linestyle="--")
plt.axhline(y=np.percentile(img_heights, 90), color="red", linestyle="--", label="90th percentile")
plt.axvline(x=np.percentile(img_widths, 90), color="red", linestyle="--")
plt.legend()
plt.gca().set_axisbelow(True) # draw grid below the data points (rcParams would only affect axes created afterwards)
plt.grid(color="gray", linewidth=0.5, alpha=0.2)
plt.title("Distribution of image width/height in the dataset")
plt.xlabel("Image width")
plt.ylabel("Image height")
#plt.gcf().set_dpi(300) # higher quality for saving to .png
plt.show()
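
The recommended image size in the cell above is the mean 90th percentile rounded to the nearest multiple of 32. The sketch below isolates this rounding step. Clamping to the range 32-224 is an added assumption that mirrors the `image_size` slider used for training; the function name is hypothetical.

```python
def recommended_image_size(p90_size, stride=32, min_size=32, max_size=224):
    """Round a size to the nearest multiple of `stride`, clamped to a slider range."""
    # Same rounding as in the notebook cell: nearest multiple of `stride`
    size = int(stride * round(p90_size / stride))
    # Assumption: keep the result inside the range of the training slider
    return max(min_size, min(size, max_size))


print(recommended_image_size(150))  # 150 / 32 = 4.6875 -> 5 * 32 = 160
print(recommended_image_size(100))  # 100 / 32 = 3.125  -> 3 * 32 = 96
print(recommended_image_size(300))  # 288 before clamping -> 224
```

Note that Python's `round()` rounds exact ties to the nearest even integer, so a value of 80 (2.5 × 32) results in 64, while 112 (3.5 × 32) results in 128.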

# Model training

In [None]:
#@title ## Optional: Select external logger {display-mode: "form"}

logger = "Weights&Biases" #@param ["Weights&Biases", "Comet", "ClearML"]

#@markdown > More info:
#@markdown - [Weights & Biases](https://docs.wandb.ai/guides/integrations/yolov5)
#@markdown - [Comet](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/comet)
#@markdown - [ClearML](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml)

if logger == "Weights&Biases":
  %pip install -q wandb
  import wandb
  wandb.login()
elif logger == "Comet":
  %pip install -q comet_ml
  import comet_ml
  comet_ml.init()
elif logger == "ClearML":
  %pip install -q clearml
  import clearml
  clearml.browser_login()

## Tensorboard logger

> If you are using Firefox, **disable Enhanced Tracking Protection** for this website (click on the shield to the left of the address bar) for the Tensorboard logger to work correctly!

In [None]:
%load_ext tensorboard
%tensorboard --logdir /content/yolov5/runs/train-cls

## Train image classification model

- `--name` name of the training run
- `--imgsz` input image size
- `--batch` specify batch size (recommended: 64)
- `--epochs` set the number of training [epochs](https://machine-learning.paperspace.com/wiki/epoch) (recommended: 10-30)
- `--data` path to dataset folder
- `--model` specify the pretrained [classification model](https://github.com/ultralytics/yolov5#classification) (recommended: EfficientNet-B0)
- `--cache` cache images in RAM for faster training
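
To illustrate how these options fit together, the following sketch assembles the run name and the argument list for `classify/train.py`. It is not the notebook's own code: `build_train_command` and its `now` parameter are hypothetical, and only the flags listed above are used.

```python
from datetime import datetime, timezone


def build_train_command(name, image_size, batch_size, epochs, data, model,
                        add_timestamp=True, now=None):
    """Assemble the classify/train.py call as a list of command-line arguments."""
    if add_timestamp:
        # Use an explicit UTC time so that run names sort consistently
        now = now or datetime.now(timezone.utc)
        name = f"{now.strftime('%Y%m%d_%H-%M')}_{name}"
    return [
        "python", "classify/train.py",
        "--name", name,
        "--imgsz", str(image_size),
        "--batch", str(batch_size),
        "--epochs", str(epochs),
        "--data", str(data),
        "--model", model,
        "--cache",
    ]
```

The returned list could be passed to `subprocess.run(...)` from within `/content/yolov5`, or joined with spaces to inspect the full command before starting a long training run.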

In [None]:
training_run_name = "EfficientNet-B0_128_batch64_epochs20" #@param {type: "string"}
#@markdown Add UTC timestamp in front of training run name:
add_timestamp = True #@param {type:"boolean"}
#@markdown ---

image_size = 128 #@param {type:"slider", min:32, max:224, step:32}
batch_size = 64 #@param {type:"slider", min:32, max:128, step:32}
number_epochs = 20 #@param {type:"slider", min:5, max:50, step:5}
model = "efficientnet_b0.pt" #@param ["yolov5n-cls.pt", "yolov5s-cls.pt", "yolov5m-cls.pt", "yolov5l-cls.pt", "yolov5x-cls.pt", "resnet18.pt", "resnet34.pt", "resnet50.pt", "resnet101.pt", "efficientnet_b0.pt", "efficientnet_b1.pt", "efficientnet_b2.pt", "efficientnet_b3.pt"]

if add_timestamp:
  from datetime import datetime, timezone
  utc_timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H-%M")
  train_run_name = f"{utc_timestamp}_{training_run_name}"
else:
  train_run_name = training_run_name

%cd /content/yolov5

!python classify/train.py \
--name {train_run_name} \
--imgsz {image_size} \
--batch {batch_size} \
--epochs {number_epochs} \
--data {dataset_location} \
--model {model} \
--cache

### Export trained model weights to ONNX format for faster CPU inference

In [None]:
%cd /content/yolov5

!python export.py \
--weights runs/train-cls/{train_run_name}/weights/best.pt \
--imgsz {image_size} \
--include onnx \
--simplify \
--device 0 # use "--device cpu" if not connected to GPU runtime

In [None]:
#@title ## Export to Google Drive or Download training results {display-mode: "form"}

training_results = "Export_Google_Drive" #@param ["Export_Google_Drive", "Download"]
#@markdown ---

#@markdown ### Path for saving training results in Google Drive:
GDrive_save_path = "/content/drive/MyDrive/Training_results/YOLOv5-cls" #@param {type: "string"}

if training_results == "Export_Google_Drive":
  print("Exporting training results to Google Drive...\n")
  !rsync -ah --mkpath --info=progress2 --no-i-r /content/yolov5/runs/train-cls/{train_run_name} {GDrive_save_path}
  print("\nTraining results were successfully exported!")
elif training_results == "Download":
  from google.colab import files
  %cd /content/yolov5/runs/train-cls
  !zip -rq {train_run_name}.zip {train_run_name}
  %cd -
  files.download(f"/content/yolov5/runs/train-cls/{train_run_name}.zip")

# Model validation

Test the classification accuracy of your model on the validation and/or test dataset.

> Change the weights from `best.onnx` to `best.pt` if you want to use the original PyTorch model for validation.

In [None]:
task = "val" #@param ["val", "test"]
#@markdown > Use `task: test` to validate on the dataset test split.

from IPython.display import Image, display

val_run_name = f"{train_run_name}_validate_{task}"

%cd /content/yolov5

!python classify/val.py \
--name {val_run_name} \
--weights runs/train-cls/{train_run_name}/weights/best.onnx \
--data {dataset_location} \
--imgsz {image_size} \
--task {task}

print("\n")
display(Image(f"runs/val-cls/{val_run_name}/confusion_matrix_{task}.png", width=800))
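
As a reminder of what the validation reports, the sketch below computes top-1 accuracy and confusion matrix counts from lists of true and predicted class names. It is a simplified illustration with hypothetical function names and does not reproduce the implementation in `classify/val.py`.

```python
from collections import Counter


def top1_accuracy(true_labels, pred_labels):
    """Fraction of images whose predicted class equals the true class."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("Label lists must be non-empty and of equal length.")
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, pred_labels):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(true_labels, pred_labels))


true = ["bee", "bee", "fly", "fly"]
pred = ["bee", "fly", "fly", "fly"]
print(top1_accuracy(true, pred))                   # 3 of 4 correct
print(confusion_counts(true, pred)["bee", "fly"])  # bees misclassified as fly
```

Each off-diagonal entry of the confusion matrix corresponds to one of these `(true, predicted)` pairs with differing class names.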

In [None]:
#@title ## Export to Google Drive or Download validation results {display-mode: "form"}

validation_results = "Export_Google_Drive" #@param ["Export_Google_Drive", "Download"]
#@markdown ---

#@markdown ### Path for saving validation results in Google Drive:
GDrive_save_path = "/content/drive/MyDrive/Training_results/YOLOv5-cls" #@param {type: "string"}

if validation_results == "Export_Google_Drive":
  print("Exporting validation results to Google Drive...\n")
  !rsync -ah --mkpath --info=progress2 --no-i-r /content/yolov5/runs/val-cls/{val_run_name} {GDrive_save_path}/{train_run_name}
  print("\nValidation results were successfully exported!")
elif validation_results == "Download":
  from google.colab import files
  %cd /content/yolov5/runs/val-cls
  !zip -rq {val_run_name}.zip {val_run_name}
  %cd -
  files.download(f"/content/yolov5/runs/val-cls/{val_run_name}.zip")

# Model inference

Use your model to classify the images in the dataset test split.

> Change the weights from `best.onnx` to `best.pt` if you want to use the original PyTorch model for prediction.

In [None]:
from IPython.display import Image, display

pred_run_name = f"{train_run_name}_predict"

%cd /content/yolov5

!python classify/predict.py \
--name {pred_run_name} \
--weights runs/train-cls/{train_run_name}/weights/best.onnx \
--source {dataset_location}/test/*/*/ \
--imgsz {image_size}

display(Image(f"runs/predict-cls/{pred_run_name}/results/top1_prob.png", width=800))
print("\n")
display(Image(f"runs/predict-cls/{pred_run_name}/results/top1_prob_mean.png", width=800))
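
The inference results refer to the top 1 class and its probability for each image. The sketch below shows how such a value can be derived, assuming that the model outputs one raw score (logit) per class and that a softmax converts the scores to probabilities. The function names and class names are illustrative only.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1(logits, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


name, prob = top1([1.0, 3.0, 2.0], ["ant", "bee", "fly"])
print(name, round(prob, 3))
```

A low top-1 probability indicates an uncertain prediction, which can be a useful criterion for filtering images that should be checked manually.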

In [None]:
#@title ## Export to Google Drive or Download inference results {display-mode: "form"}

inference_results = "Export_Google_Drive" #@param ["Export_Google_Drive", "Download"]
#@markdown Include images with inference results (top 1 class + probability):
include_images = False #@param {type:"boolean"}
#@markdown ---

#@markdown ### Path for saving inference results in Google Drive:
GDrive_save_path = "/content/drive/MyDrive/Training_results/YOLOv5-cls" #@param {type: "string"}

if include_images:
  %cd /content/yolov5/runs/predict-cls
  !zip -rq {pred_run_name}.zip {pred_run_name}
  %cd -

if inference_results == "Export_Google_Drive":
  print("\nExporting inference results to Google Drive...\n")
  if include_images:
    !rsync -ah --mkpath --info=progress2 --no-i-r /content/yolov5/runs/predict-cls/{pred_run_name}.zip {GDrive_save_path}/{train_run_name}
  else:
    !rsync -ah --mkpath --info=progress2 --no-i-r /content/yolov5/runs/predict-cls/{pred_run_name}/results {GDrive_save_path}/{train_run_name}/{pred_run_name}
  print("\nInference results were successfully exported!")
elif inference_results == "Download":
  from google.colab import files
  if include_images:
    files.download(f"/content/yolov5/runs/predict-cls/{pred_run_name}.zip")
  else:
    %cd /content/yolov5/runs/predict-cls
    !zip -rq {pred_run_name}.zip {pred_run_name}/results
    %cd -
    files.download(f"/content/yolov5/runs/predict-cls/{pred_run_name}.zip")

## Show inference results on test images

In [None]:
from pathlib import Path
from IPython.display import Image, display

for img in Path(f"/content/yolov5/runs/predict-cls/{pred_run_name}").glob("*.jpg"):
  display(Image(img))
  print("\n")

# Model deployment

That's it! You trained an image classification model on your custom dataset with [YOLOv5](https://github.com/ultralytics/yolov5#classification) and exported it to ONNX format for faster CPU inference.

> To deploy the classification model on your local PC, check out the deployment instructions in the [**Insect Detect Docs**](https://maxsitt.github.io/insect-detect-docs/deployment/classification/).