<a href="https://colab.research.google.com/github/jackiemalooly/aml-group-project/blob/jackie-yolov5/yolov5_train_job.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook uses a fork of the Ultralytics YOLOv5 🚀 GitHub repository to run a training job on [the latest YOLOv5 release, version 7.0](https://github.com/ultralytics/yolov5/releases).

# Setup

In [None]:
!git clone https://github.com/jackiemalooly/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt comet_ml  # install

import torch
import utils
display = utils.notebook_init()  # checks

YOLOv5 🚀 f2f86eb Python-3.11.12 torch-2.6.0+cu124 CUDA:0 (NVIDIA A100-SXM4-40GB, 40507MiB)


Setup complete ✅ (12 CPUs, 83.5 GB RAM, 41.1/235.7 GB disk)


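Fetch changes from Ultralytics so the codebase is up to date. If the `upstream` remote was added earlier in the session, `git remote add` reports that it already exists; the fetch and merge still run.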

In [None]:
!git remote add upstream https://github.com/ultralytics/yolov5
!git fetch upstream
!git merge upstream/master

error: remote upstream already exists.
Updating f2f86eb3..fe1d4d99
Fast-forward
 .github/workflows/links.yml |   12 +-
 classify/tutorial.ipynb     |  106 ++--
 segment/tutorial.ipynb      |   98 ++--
 tutorial.ipynb              | 1195 ++++++++++++++++++++++---------------------
 4 files changed, 729 insertions(+), 682 deletions(-)
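The `remote upstream already exists` error above shows up whenever the cell is re-run in the same session. A minimal sketch of an idempotent variant follows. The helper name `upstream_sync_commands` is hypothetical, and it only builds the git commands (switching to `git remote set-url` when the remote is already registered) rather than running them.

```python
def upstream_sync_commands(existing_remotes, url, name="upstream", branch="master"):
    """Return the git commands needed to sync a fork with its upstream repository.

    Hypothetical helper: `existing_remotes` is the list of names printed by `git remote`.
    """
    if name in existing_remotes:
        # Remote already registered: just make sure it points at the right URL.
        first = ["git", "remote", "set-url", name, url]
    else:
        first = ["git", "remote", "add", name, url]
    return [
        first,
        ["git", "fetch", name],
        ["git", "merge", f"{name}/{branch}"],
    ]


commands = upstream_sync_commands(
    ["origin", "upstream"], "https://github.com/ultralytics/yolov5"
)
for cmd in commands:
    print(" ".join(cmd))
```

In a notebook, each command list could be passed to `subprocess.run(cmd, check=True)`, with `existing_remotes` taken from the output of `git remote`.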


Optional: configure GitHub credentials so you can push to a working branch. Do not push work to the master branch.

In [None]:
# Set up your GitHub credentials
import os
from getpass import getpass
username = "jackiemalooly"
token = getpass("GitHub personal access token: ")  # prompt at run time; never hardcode a token in the notebook

# Configure the repository with your credentials
repo_url = "https://github.com/jackiemalooly/yolov5.git"
authenticated_url = f"https://{username}:{token}@github.com/jackiemalooly/yolov5.git"

# Set the remote URL with your credentials
!git remote set-url origin {authenticated_url}
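Because the remote URL above embeds the token, it helps to keep the token out of anything that is printed or saved with the notebook. The sketch below is one way to build the URL and a masked copy for logging. The helper names `build_authenticated_url` and `mask_token` are hypothetical, and `example-token` is a placeholder, not a real credential.

```python
from urllib.parse import quote


def build_authenticated_url(username, token, repo_path):
    """Embed credentials in an HTTPS GitHub URL, percent-encoding both parts."""
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@github.com/{repo_path}"


def mask_token(url, token):
    """Replace the (encoded) token with asterisks so the URL is safe to print."""
    return url.replace(quote(token, safe=""), "***")


url = build_authenticated_url("jackiemalooly", "example-token", "jackiemalooly/yolov5.git")
print(mask_token(url, "example-token"))
```

Percent-encoding matters when a username or token contains characters such as `/` or `@`, which would otherwise change how the URL is parsed.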

Change to the working branch. It is easiest to create this branch on GitHub first and then check it out in the Colab session.

In [None]:
!git checkout jackie_finetune_job

Already on 'jackie_finetune_job'
Your branch is ahead of 'origin/jackie_finetune_job' by 3 commits.
  (use "git push" to publish your local commits)


Commit any changes. If the merge pulled in updates from the original repo, be sure to commit and push them to the forked repo.

In [None]:
!git config --global user.email "jackie.malooly@gmail.com"
!git config --global user.name "jackiemalooly"
!git add . && git commit -m "Update codebase" && git push

On branch jackie_finetune_job
Your branch is ahead of 'origin/jackie_finetune_job' by 3 commits.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean


Connect to Google Drive and import [the military aircraft detection dataset found on Kaggle](https://www.kaggle.com/datasets/a2015003713/militaryaircraftdetectiondataset/data).

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
import shutil

# Source paths, pulling 50% of the dataset.
source_base = '/content/drive/MyDrive/updated dataset for aircraft detection/split_50_per'
source_train = os.path.join(source_base, 'train')
source_valid = os.path.join(source_base, 'valid')
source_test = os.path.join(source_base, 'test')
source_yaml = os.path.join(source_base, 'data.yaml')

# Target paths - saved in the datasets folder
target_base = '/content/yolov5/datasets'
target_valid = os.path.join(target_base, 'valid')
target_train = os.path.join(target_base, 'train')
target_test = os.path.join(target_base, 'test')
target_yaml = os.path.join(target_base, 'data.yaml')

# Create the target base directory if it doesn't exist
os.makedirs(target_base, exist_ok=True)

# Create symbolic links for each element
for src, tgt in [(source_train, target_train), (source_valid, target_valid), (source_test, target_test), (source_yaml, target_yaml)]:
    if not os.path.lexists(tgt):  # lexists also catches a broken link already sitting at the target
        if os.path.exists(src):
            os.symlink(src, tgt)
            print(f"Symbolic link created successfully for {os.path.basename(src)}!")
        else:
            print(f"Source path {src} doesn't exist. Please check the path.")
    else:
        print(f"Target path {tgt} already exists. Skipping.")

Symbolic link created successfully for train!
Symbolic link created successfully for valid!
Symbolic link created successfully for test!
Symbolic link created successfully for data.yaml!
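A link can be created successfully and still stop resolving later, for example if the Drive mount drops mid-session. The following sketch checks that each expected entry still resolves. The function name `check_links` is hypothetical, and the demonstration uses a temporary directory rather than the real dataset paths.

```python
import os
import tempfile


def check_links(target_base, names=("train", "valid", "test", "data.yaml")):
    """Map each expected entry to 'ok', 'broken link', or 'missing'."""
    report = {}
    for name in names:
        path = os.path.join(target_base, name)
        if os.path.exists(path):      # follows symlinks, so the source is reachable
            report[name] = "ok"
        elif os.path.islink(path):    # the link exists but its source does not
            report[name] = "broken link"
        else:
            report[name] = "missing"
    return report


# Self-contained demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source_train")
    os.makedirs(source)
    os.symlink(source, os.path.join(tmp, "train"))                     # valid link
    os.symlink(os.path.join(tmp, "gone"), os.path.join(tmp, "valid"))  # dangling link
    demo_report = check_links(tmp)
    print(demo_report)
```

In this notebook the call would be `check_links('/content/yolov5/datasets')`, run just before training.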


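Train YOLOv5 on the 50% split of the military aircraft detection dataset (`--data datasets/data.yaml`), starting from pretrained weights: `--weights yolov5n.pt` in the baseline configuration listed below, `--weights yolov5s.pt` in the training command run in this notebook.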

*   Pretrained models are downloaded automatically from the latest YOLOv5 release.
*   Training results are saved to `runs/train/` with incrementing run directories, e.g. `runs/train/exp2`, `runs/train/exp3`.
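The incrementing directory names can be illustrated with a simplified sketch. This is not YOLOv5's own implementation; `next_run_dir` is a hypothetical helper that mimics the naming scheme described above (`exp`, then `exp2`, `exp3`, and so on).

```python
import tempfile
from pathlib import Path


def next_run_dir(base, name="exp"):
    """Return base/name if unused, else the first free base/name2, base/name3, ..."""
    candidate = Path(base) / name
    n = 2
    while candidate.exists():
        candidate = Path(base) / f"{name}{n}"
        n += 1
    return candidate


# Simulate three consecutive runs in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    created = []
    for _ in range(3):
        run_dir = next_run_dir(tmp)
        run_dir.mkdir(parents=True)
        created.append(run_dir.name)
    print(created)  # ['exp', 'exp2', 'exp3']
```

Passing `--name` to `train.py` replaces the `exp` stem, so re-running a job under the same name should produce a numbered sibling directory rather than overwrite the earlier results.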

Initialize Comet (`comet_ml`) as the experiment logger.



In [None]:
%pip install -q comet_ml
import comet_ml
from getpass import getpass
comet_ml.login(getpass("Comet API key: "), project_name='aml_group_project')  # prompt instead of hardcoding the key

If running the baseline:
*   In `train.py`, change the `create_dataloader()` call to use `augment=False`, and save the file before running the training job.
*   Update `hyp.no-augmentation.yaml` with the [baseline hyperparameters](https://surreyac-my.sharepoint.com/:w:/r/personal/rs02294_surrey_ac_uk/_layouts/15/Doc.aspx?sourcedoc=%7B1F5A5F81-FADD-4581-871E-6F9162E07C50%7D&file=Baseline%20Hyperparemeters.docx&action=default&mobileredirect=true).
*   Confirm that `--hyp hyp.no-augmentation.yaml` is passed as an argument to `python train.py`.
*   Full set of options for the `train.py` command: `--img 640 --batch 16 --epochs 50 --data datasets/data.yaml --weights yolov5n.pt --cache --optimizer AdamW --name yolov5_50_split_baseline --seed 42 --hyp hyp.no-augmentation.yaml`
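Keeping the baseline options in one dictionary makes it harder to drop a flag such as `--hyp` when switching between runs. The sketch below assembles the command from the options listed above. The helper `build_train_command` is hypothetical and only builds the argument list; it does not start training.

```python
import shlex

# Baseline options copied from the list above
baseline_args = {
    "img": 640,
    "batch": 16,
    "epochs": 50,
    "data": "datasets/data.yaml",
    "weights": "yolov5n.pt",
    "cache": True,  # boolean flag, emitted without a value
    "optimizer": "AdamW",
    "name": "yolov5_50_split_baseline",
    "seed": 42,
    "hyp": "hyp.no-augmentation.yaml",
}


def build_train_command(args, script="train.py"):
    """Turn a dict of options into an argument list for the training script."""
    parts = ["python", script]
    for key, value in args.items():
        if value is True:
            parts.append(f"--{key}")
        elif value is not False and value is not None:
            parts.extend([f"--{key}", str(value)])
    return parts


command = build_train_command(baseline_args)
print(shlex.join(command))
```

The printed string can be pasted after `!` in a notebook cell, or the list can be passed to `subprocess.run(command, check=True)`.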

Kick off training job.



In [None]:
!python train.py --img 640 --batch 32 --epochs 100 --data datasets/data.yaml --weights yolov5s.pt --cache --name yolov5_50_split_baseline_yolo_defaults_no_hyp --seed 42

Streaming output truncated to the last 5000 lines.
  with torch.cuda.amp.autocast(amp):
  ...

Save to the GitHub branch.

In [None]:
!git config --global user.email "jackie.malooly@gmail.com"
!git config --global user.name "jackiemalooly"
!git add .
!git commit -m "Training job for yolov5 baseline no hyp all yolo defaults used"

[jackie_finetune_job f948e82b] Training job for yolov5 baseline no hyp all yolo defaults used
 4 files changed, 4 insertions(+)
 create mode 120000 datasets/data.yaml
 create mode 120000 datasets/test
 create mode 120000 datasets/train
 create mode 120000 datasets/valid


Check the status to confirm what branch you are on before pushing changes.

In [None]:
!git status
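The branch check can also be automated before the push. This is a minimal sketch under the assumption that `master` and `main` are the branches to protect. Both `is_safe_to_push` and `current_branch` are hypothetical helper names; `git rev-parse --abbrev-ref HEAD` prints the current branch name, or `HEAD` when no branch is checked out.

```python
import subprocess


def is_safe_to_push(branch, protected=("master", "main")):
    """True when `branch` is a real branch name that is not in the protected list."""
    return bool(branch) and branch != "HEAD" and branch not in protected


def current_branch():
    """Ask git for the checked-out branch (must be run inside a repository)."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print(is_safe_to_push("jackie_finetune_job"))  # True
print(is_safe_to_push("master"))               # False
```

Inside the cloned repository, `is_safe_to_push(current_branch())` gives a quick yes or no before running `!git push`.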

In [None]:
!git push

Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 12 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 577 bytes | 577.00 KiB/s, done.
Total 7 (delta 4), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas:  50% (2/4)