# Training YOLOv8 on a Custom Dataset
Since I last spent several hours tuning a YOLOv5 model to detect objects in the construction images, YOLOv8 has been released, and its interface is much easier to use than YOLOv5's. *YOLOv8L* is probably the right choice for the classification/detection model, given the marginal difference between *YOLOv8L* and *YOLOv8X*. Most of the problems should not be an issue here, since the companion Jupyter notebook that consumes the exported *YOLOv8L* weights will also be in PyTorch.

In [None]:
!pip install ultralytics
!pip install wandb
!pip install roboflow

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ultralytics
  Downloading ultralytics-8.0.10-py3-none-any.whl (258 kB)
Collecting hydra-core>=1.2.0
  Downloading hydra_core-1.3.1-py3-none-any.whl (154 kB)
Collecting sentry-sdk
  Downloading sentry_sdk-1.13.0-py2.py3-none-any.whl (177 kB)
Collecting GitPython>=3.1.24
  Downloading GitPython-3.1.30-py3-none-any.whl (184 kB)
Collecting thop>=0.1.1
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wandb
  Downloading wandb-0.13.9-py2.py3-none-any.whl (2.0 MB)
Collecting setproctitle
  Downloading setproctitle-1.3.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (31 kB)
Collecting pathtools
  Downloading pathtools-0.1.2.tar.gz (11 kB)
  Preparing metadata (setup.py) ... done
Collecting docker-pycreds>=0.4.0
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Building wheels for collected packages: pathtools
  Building wheel for pathtools (setup.py) ... done
  Created wheel for pathtools: filename=pathtools-0.1.2-py3-none-any.whl size=8806 sha256=49929aabc3a9bfb0b555d8f942aa6b358ed229e276bcec88696a03b0e7065c8f
  Stored in directory: /root/.cache/pip/wheels/4c/8e/7e/72fbc2

In [None]:
import os

from ultralytics import YOLO
from roboflow import Roboflow
from dotenv import load_dotenv

import wandb
wandb.login()

wandb.init(project="yolov8-1", entity="nyu-construction-lab")

wandb: Currently logged in as: batavm01 (nyu-construction-lab). Use `wandb login --relogin` to force relogin


Protect the API key by storing it in Google Drive rather than in the notebook itself.

In [None]:
from google.colab import drive
drive.mount("./content/")

Mounted at ./content/


Load the API key with *dotenv*, then use it to download the Roboflow construction dataset.

In [None]:
load_dotenv('content/MyDrive/Colab Notebooks/keys.env')

API_KEY = os.getenv('ROBOFLOW_KEY')
rf = Roboflow(api_key=API_KEY)

project = rf.workspace("michael-batavia").project("construction-annotations")
dataset = project.version(4).download("yolov8")

loading Roboflow workspace...
loading Roboflow project...
Downloading Dataset Version Zip in Construction-Annotations-4 to yolov8: 100% [33993220 / 33993220] bytes


Extracting Dataset Version Zip to Construction-Annotations-4 in yolov8:: 100%|██████████| 2644/2644 [00:00<00:00, 7157.70it/s]
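
If `keys.env` is missing or the variable name is misspelled, `os.getenv` quietly returns `None`, which can make the resulting failure harder to diagnose. A minimal sketch of a guard follows; `require_env` is a hypothetical helper written for illustration, not part of dotenv or Roboflow, and the key value is a placeholder.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    Hypothetical helper for illustration; not a dotenv or Roboflow API.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check the path passed to load_dotenv")
    return value


# Demo with a placeholder value; in the notebook the value would come from keys.env.
os.environ["ROBOFLOW_KEY"] = "placeholder-key"
api_key = require_env("ROBOFLOW_KEY")
```

In the notebook, `API_KEY = require_env('ROBOFLOW_KEY')` could stand in for the bare `os.getenv` call, so a bad path fails immediately with a readable message.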


`model` -> Pre-trained YOLOv8 Model with COCO weights \\
`base_model` -> Basic YOLOv8 Model (untrained)

In [None]:
model = YOLO("yolov8l.pt")
# base_model = YOLO("yolov8l.yaml")

Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8l.pt to yolov8l.pt...


  0%|          | 0.00/83.7M [00:00<?, ?B/s]




In [None]:
%cat {dataset.location}/data.yaml
print()
print(f"{dataset.location}/data.yaml")

names:
- barrier_lights
- car
- construction_cones
- construction_equipment
- construction_signs
- curbs
- ladder
- person
- trash_bin
- truck
nc: 10
roboflow:
  license: CC BY 4.0
  project: construction-annotations
  url: https://universe.roboflow.com/michael-batavia/construction-annotations/dataset/4
  version: 4
  workspace: michael-batavia
test: ../test/images
train: Construction-Annotations-4/train/images
val: Construction-Annotations-4/valid/images

/content/Construction-Annotations-4/data.yaml
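
Before training, it can be worth checking that the class count and the split paths in `data.yaml` agree with each other. The sketch below runs that check on a plain dict holding the values printed above; `check_data_config` is a hypothetical helper, not an Ultralytics or Roboflow function.

```python
def check_data_config(cfg: dict) -> None:
    """Raise ValueError if the class count or the split keys look inconsistent.

    Hypothetical helper for illustration; not an Ultralytics or Roboflow API.
    """
    names = cfg.get("names", [])
    if cfg.get("nc") != len(names):
        raise ValueError(f"nc={cfg.get('nc')} but {len(names)} class names are listed")
    missing = [key for key in ("train", "val") if key not in cfg]
    if missing:
        raise ValueError(f"missing split paths: {missing}")


# Values copied from the data.yaml printed above.
cfg = {
    "names": [
        "barrier_lights", "car", "construction_cones", "construction_equipment",
        "construction_signs", "curbs", "ladder", "person", "trash_bin", "truck",
    ],
    "nc": 10,
    "test": "../test/images",
    "train": "Construction-Annotations-4/train/images",
    "val": "Construction-Annotations-4/valid/images",
}
check_data_config(cfg)
n_classes = len(cfg["names"])
```

In the notebook itself, the dict could be loaded from the file with PyYAML's `yaml.safe_load` instead of being typed out by hand.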


Hyperparameters

In [None]:
batch_size = 16
image_size = 416
epochs = 200
device = '' # empty string lets Ultralytics pick the device (GPU if available)
dropout = True
data_loc = "datasets/Construction-Annotations-4/data.yaml"
optimizer = "SGD" # TODO: check whether Adam performs any better

print(data_loc)

datasets/Construction-Annotations-4/data.yaml


Move the dataset into the `datasets` folder, where YOLOv8 expects to find it.

In [None]:
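import shutil

# YOLOv8 expects every dataset to live inside a `datasets` folder
os.makedirs("datasets", exist_ok=True)

# Move the download only if it has not been moved already, so the cell is safe to re-run
src, dst = "Construction-Annotations-4", "datasets/Construction-Annotations-4"
if os.path.exists(src) and not os.path.exists(dst):
  shutil.move(src, "datasets")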

In [None]:
results = model.train(data=data_loc, epochs=epochs, imgsz=image_size, batch=batch_size, 
                      dropout=dropout, optimizer=optimizer, device=device, plots=True, name='exp')

Ultralytics YOLOv8.0.10 🚀 Python-3.8.10 torch-1.13.1+cu116 CUDA:0 (Tesla T4, 15110MiB)
yolo/engine/trainer: task=detect, mode=train, model=yolov8l.yaml, data=datasets/Construction-Annotations-4/data.yaml, epochs=200, patience=50, batch=16, imgsz=416, save=True, cache=False, device=, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=False, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, overlap_mask=True, mask_ratio=4, dropout=True, val=True, save_json=False, save_hybrid=False, conf=0.001, iou=0.7, max_det=300, half=False, dnn=False, plots=False, source=ultralytics/assets/, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, retina_masks=False, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=F

  0%|          | 0.00/755k [00:00<?, ?B/s]

Overriding model.yaml nc=80 with nc=10

                   from  n    params  module                                       arguments                     
  0                  -1  1      1856  ultralytics.nn.modules.Conv                  [3, 64, 3, 2]                 
  1                  -1  1     73984  ultralytics.nn.modules.Conv                  [64, 128, 3, 2]               
  2                  -1  3    279808  ultralytics.nn.modules.C2f                   [128, 128, 3, True]           
  3                  -1  1    295424  ultralytics.nn.modules.Conv                  [128, 256, 3, 2]              
  4                  -1  6   2101248  ultralytics.nn.modules.C2f                   [256, 256, 6, True]           
  5                  -1  1   1180672  ultralytics.nn.modules.Conv                  [256, 512, 3, 2]              
  6                  -1  6   8396800  ultralytics.nn.modules.C2f                   [512, 512, 6, True]           
  7                  -1  1   2360320  ultralytic

In [None]:
# Test on Validation Set
val_results = model.val()

Ultralytics YOLOv8.0.10 🚀 Python-3.8.10 torch-1.13.1+cu116 CUDA:0 (Tesla T4, 15110MiB)
Fusing... 
Model summary: 268 layers, 43614318 parameters, 0 gradients, 164.9 GFLOPs
val: Scanning /content/datasets/Construction-Annotations-4/valid/labels.cache... 109 images, 1 backgrounds, 0 corrupt: 100%|██████████| 109/109 [00:00<?, ?it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:03<00:00,  2.02it/s]
                   all        109        530      0.799      0.638      0.679      0.405
        barrier_lights        109         66      0.842      0.545      0.656      0.322
                   car        109         74      0.707      0.619      0.654       0.31
    construction_cones        109          9          1      0.363      0.503      0.259
construction_equipment        109         64      0.745      0.484      0.513      0.287
    construction_signs        109         11       0.79      0.687      0
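
The table reports precision (P) and recall (R) separately. A single summary number, the F1 score (their harmonic mean), can be worked out from the `all` row above. This is a rough sketch: F1 computed from the averaged P and R is not the same as the mean of per-class F1 scores, and `f1_score` is a helper defined here, not an Ultralytics function.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R taken from the `all` row of the validation table above.
f1_all = f1_score(0.799, 0.638)
print(f"F1 (all classes): {f1_all:.3f}")
```

Because recall (0.638) is well below precision (0.799), the F1 score sits closer to the recall figure, which suggests missed detections matter more here than false positives.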

# Results

In [None]:
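import pandas as pd

# results.csv is written inside the run folder (runs/detect/<name>/), so reading
# runs/detect/results.csv raises FileNotFoundError. name='exp' was passed to
# model.train(); adjust if Ultralytics created a different folder (e.g. train or exp2).
results_path = "runs/detect/exp/results.csv"
res_table = pd.read_csv(results_path)
res_table.columns = res_table.columns.str.strip() # headers may be space-padded
res_table.tail()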

FileNotFoundError: ignored

Best result: the epoch with the highest mAP50

In [None]:
desired_col = "metrics/mAP50(B)"
maxAP = res_table.loc[res_table[desired_col].idxmax()]
maxAP.head()
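
Picking the best epoch is an arg-max over one column. The stdlib-only sketch below shows the same idea on made-up numbers (not results from this run). It assumes the CSV headers may be padded with spaces, so it strips them before the lookup. `best_row` is a hypothetical helper.

```python
import csv
import io

# Made-up sample; these are NOT results from the training run above.
sample_csv = """\
     epoch,  metrics/mAP50(B)
         0,             0.10
         1,             0.35
         2,             0.30
"""


def best_row(csv_text: str, column: str) -> dict:
    """Return the row (as a dict of stripped strings) with the largest value in `column`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [h.strip() for h in next(reader)]  # drop padding around column names
    rows = [dict(zip(header, (v.strip() for v in row))) for row in reader if row]
    return max(rows, key=lambda r: float(r[column]))


best = best_row(sample_csv, "metrics/mAP50(B)")
print(best["epoch"], best["metrics/mAP50(B)"])
```

With pandas, `idxmax()` followed by `.loc[...]` does the same job in one line, as long as the column name matches exactly.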

Save the best weights of the model

In [None]:
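from google.colab import files

# Weights are saved inside the run folder; name='exp' was passed to model.train().
# Adjust if Ultralytics created a different folder (e.g. train or exp2).
best_weights_loc = "runs/detect/exp/weights/best.pt"

files.download(best_weights_loc)

# Drive was mounted at ./content/ earlier, so My Drive lives at content/MyDrive
%cp {best_weights_loc} content/MyDrive/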
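
A wrong source path in the copy step is easy to miss in a long notebook run. As a sketch, the copy can be done in Python with an explicit existence check. `copy_weights` is a hypothetical helper, and the demo uses a temporary directory with a dummy file in place of the real `best.pt`.

```python
import shutil
import tempfile
from pathlib import Path


def copy_weights(src: str, dst_dir: str) -> Path:
    """Copy the file at `src` into `dst_dir`, creating the folder if needed.

    Hypothetical helper for illustration. Raises FileNotFoundError if `src` is missing.
    """
    src_path = Path(src)
    if not src_path.is_file():
        raise FileNotFoundError(f"no weights file at {src_path}")
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src_path, dst))


# Demo in a temporary directory; the dummy file stands in for best.pt.
with tempfile.TemporaryDirectory() as tmp:
    dummy = Path(tmp) / "best.pt"
    dummy.write_bytes(b"not real weights")
    copied = copy_weights(str(dummy), str(Path(tmp) / "backup"))
    copied_ok = copied.read_bytes() == b"not real weights"
```

In the notebook, the destination would be the mounted Drive folder, and a missing `best.pt` would stop the cell with a clear error.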