<a href="https://colab.research.google.com/github/heinsense2/AIO_CaseStudy/blob/main/Training_on_FathomNet_Custom_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Custom training using YOLOv5 on a FathomNet custom dataset

This notebook explains how to train YOLOv5 on a custom dataset to recognize different marine species present in Monterey Bay. It serves as a guide to reproducing the results presented in the paper

 *Demystifying image-based machine learning: a practical guide to automated analysis of imagery using modern machine learning tools*.


The data is prepared using code available [here](https://github.com/heinsense2/AIO_CaseStudy/tree/main/data/scripts).

NOTE: If you wish to use this notebook, you will need to change the dataset locations and Python script parameters, among other things, to match your own setup.

Here are the relevant steps:

*   Create the dataset and annotations (labels). Organize directories.
*   Export the dataset to YOLOv5 format
*   Train YOLOv5 to recognize the objects (marine animals) in our dataset
*   Evaluate our YOLOv5 model's performance
*   Run inference to view the model at work


# 1. Install requirements

In [None]:
# Clone YOLOv5
!git clone https://github.com/ultralytics/yolov5  # clone repo
%cd yolov5
%pip install -qr requirements.txt # install dependencies
%pip install torch==1.8.1 torchvision==0.9.1

import torch
import os
from IPython.display import Image, clear_output  # to display images

print(f"Setup complete. Using torch {torch.__version__} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")


# 2. Assemble Dataset

To train our model, we need to assemble a dataset of representative images with bounding boxes around the objects we want to detect. Our dataset must be in YOLOv5 format.
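
For orientation, YOLOv5 format stores one `.txt` label file per image, where each line reads `class x_center y_center width height` with all coordinates normalized to the range 0 to 1. The sketch below converts a pixel-space box into such a line. The function name `to_yolo_line` is hypothetical, and the data preparation scripts linked in this notebook may do this differently.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to a normalized YOLO label line."""
    # Box center, normalized by the image dimensions
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    # Box size, normalized by the image dimensions
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x200 pixel box in a 1000x800 image, class index 0
print(to_yolo_line(0, 100, 200, 300, 400, 1000, 800))
# prints: 0 0.200000 0.375000 0.200000 0.250000
```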

The Fathomnet data is downloaded and prepared using code available [here](https://github.com/heinsense2/AIO_CaseStudy/tree/main/data/scripts).


When using Google Colab, it is recommended to keep the data on Google Drive, so we first need to mount our Google Drive.


In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/gdrive')


Mounted at /content/gdrive


In [None]:
# List the directory where the data resides
!ls "/content/gdrive/My Drive/data"

dataset45.yaml	images	labels
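
The contents of `dataset45.yaml` are not shown in this notebook. As a rough sketch, recent YOLOv5 releases expect a dataset config that gives a root path, the train/val/test image folders, the number of classes, and the class names. The helper below (`make_dataset_yaml`, a hypothetical name) builds such a file as text. The `images/train`-style subfolders and the class names are placeholders, not the actual FathomNet layout.

```python
import json


def make_dataset_yaml(root, class_names):
    """Return the text of a YOLOv5-style dataset config (assumed layout)."""
    lines = [
        f"path: {json.dumps(root)}",  # quoted so paths with spaces stay intact
        "train: images/train",        # placeholder subfolder names
        "val: images/val",
        "test: images/test",
        f"nc: {len(class_names)}",    # number of classes
        f"names: {json.dumps(class_names)}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder class names, for illustration only
print(make_dataset_yaml("/content/gdrive/My Drive/data", ["class_a", "class_b"]))
```

Check the file produced by the data preparation scripts before relying on this layout.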


# 3. Train Our Custom YOLOv5 model

We can pass a number of arguments to `train.py`. Here is what we used:
- **img:** define the input image size
- **batch:** determine the batch size
- **epochs:** define the number of training epochs. (Note: the 10 epochs used below are only enough for a quick run; real training typically needs far more, often hundreds or thousands.)
- **data:** path to the dataset YAML file, which tells YOLOv5 where the images and labels are and which classes they contain
- **weights:** specify a path to weights to start transfer learning from. Here we choose the generic COCO pretrained checkpoint.
- **cache:** cache images for faster training

In [None]:
!python train.py --img 640 --batch 16 --epochs 10 --data {data.directory}/{domain}.yaml --weights yolov5s.pt --cache
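
In a notebook, the `{...}` placeholders in a `!` command are filled in from Python variables, so `data.directory` and `domain` must be defined before the cell runs. As an alternative, the sketch below assembles the same command from plain arguments and quotes paths that contain spaces (such as `My Drive`). The helper name `build_train_command` is hypothetical, and the YAML path is only an example based on the directory listing above.

```python
import shlex


def build_train_command(data_yaml, img=640, batch=16, epochs=10,
                        weights="yolov5s.pt", cache=True):
    """Assemble the train.py command line with shell-safe quoting."""
    args = [
        "python", "train.py",
        "--img", str(img),
        "--batch", str(batch),
        "--epochs", str(epochs),
        "--data", data_yaml,
        "--weights", weights,
    ]
    if cache:
        args.append("--cache")
    # shlex.join quotes any argument containing spaces or special characters
    return shlex.join(args)


cmd = build_train_command("/content/gdrive/My Drive/data/dataset45.yaml")
print(cmd)
```

In Colab the resulting string can then be run with `!{cmd}`.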

# Evaluate Custom YOLOv5 Detector Performance
All results are logged by default to `runs/train`, with a new experiment directory created for each training run (`runs/train/exp`, `runs/train/exp2`, `runs/train/exp3`, and so on).

Training losses and performance metrics are logged to TensorBoard and also saved to a CSV log file, `results.csv`.

If you are new to these metrics, the one you want to focus on is `mAP_0.5` - learn more about mean average precision [here](https://blog.roboflow.com/mean-average-precision/).
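
To check this metric without TensorBoard, one option is to read `results.csv` directly. The sketch below finds the epoch with the highest `mAP_0.5`. It assumes the column is named `metrics/mAP_0.5` and that header names may be padded with spaces, which can vary between YOLOv5 versions. The sample values are synthetic and only illustrate the parsing.

```python
import csv
import io


def best_epoch(csv_text, metric="metrics/mAP_0.5"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    reader = csv.reader(io.StringIO(csv_text))
    # Header names may be padded with whitespace, so strip them
    header = [name.strip() for name in next(reader)]
    metric_col = header.index(metric)
    epoch_col = header.index("epoch")
    rows = [[cell.strip() for cell in row] for row in reader if row]
    best = max(rows, key=lambda row: float(row[metric_col]))
    return int(best[epoch_col]), float(best[metric_col])


# Synthetic example data, not real training results
sample = (
    "   epoch,   metrics/mAP_0.5\n"
    "       0,              0.10\n"
    "       1,              0.35\n"
    "       2,              0.42\n"
    "       3,              0.40\n"
)
print(best_epoch(sample))
```

For a real run, pass the file contents instead, for example `best_epoch(open("runs/train/exp/results.csv").read())`.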

In [None]:
# Start tensorboard
# Launch after you have started training
# logs save in the folder "runs/train/exp*"
%load_ext tensorboard
%tensorboard --logdir runs

You can also validate the trained detection model on the test dataset by using the val.py script in YOLOv5.


In [None]:
!python val.py --data {data.directory}/{domain}.yaml --weights runs/train/exp/weights/best.pt --task test
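
The command above assumes the weights are in `runs/train/exp`, which only holds for the first training run; later runs go to `exp2`, `exp3`, and so on. The sketch below picks the run directory with the highest number. The helper name `latest_run` is hypothetical.

```python
import re
from pathlib import Path


def latest_run(train_dir="runs/train"):
    """Return the exp* directory with the highest run number, or None."""
    def run_number(path):
        match = re.fullmatch(r"exp(\d*)", path.name)
        if not match:
            return -1
        # "exp" has no suffix and counts as run 1; "exp2" is run 2, and so on
        return int(match.group(1) or 1)

    runs = [p for p in Path(train_dir).glob("exp*")
            if p.is_dir() and run_number(p) > 0]
    return max(runs, key=run_number, default=None)


print(latest_run())  # None if no training run exists yet
```

If a run is found, `latest_run() / "weights" / "best.pt"` gives the path to pass to `--weights`.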

# Run Inference With Trained Weights
Run inference with the trained weights (`best.pt`) on the contents of the `test/images` folder.

In [None]:
!python detect.py --weights runs/train/exp/weights/best.pt --img 640 --conf 0.65 --source {dataset.location}/test/images

In [None]:
# Display inference results for all test images

import glob
from IPython.display import Image, display

for imageName in glob.glob('/content/yolov5/runs/detect/exp/*.png'): #assuming PNG
    display(Image(filename=imageName))
    print("\n")
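
The loop above only picks up PNG files. If the test images use other formats, a small helper can collect them by suffix instead. This is a sketch: `list_result_images` is a hypothetical name and the suffix set is an assumption to adjust as needed.

```python
from pathlib import Path

# Assumed set of image suffixes; extend if your data uses others
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_result_images(result_dir):
    """Return image files in result_dir, sorted by path, matched by suffix."""
    return sorted(
        p for p in Path(result_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Each returned path can then be shown in the notebook with `display(Image(filename=str(path)))`.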

# Export Trained Weights for Future Inference
You can now download the trained weights from our detector and use them for inference on another device.


In [None]:
# Export your model's weights for future use
from google.colab import files
files.download('./runs/train/exp/weights/best.pt')