# Retrain the YOLO model

To retrain the YOLO model we need a prepared dataset of car images labeled with moderate and severe accidents.  We have such a dataset (from RoboFlow): its images are annotated and already split into training and validation sets.  We will use the training set to retrain our current YOLO model.

1. The encoded classes of objects we want to teach our model to detect are 0 ('moderate') and 1 ('severe').
2. We have created a folder for the dataset (data) with 2 subfolders in it: 'train' and 'valid'.  Within each of these we have created 2 subfolders: 'images' and 'labels'.
3. Each image has an annotation text file in the 'labels' subfolder. Each annotation file has the same base name as its image file, with a '.txt' extension.
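The layout described in the list above can be sanity-checked before training. The following is a minimal sketch, not part of the original lab: the helper name `check_split`, the set of image extensions, and the assumption that each label line starts with an integer class id followed by at least four coordinate values are illustrative. Note that an image without a label file is not necessarily an error, since Ultralytics counts such images as backgrounds.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the actual dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
# Class encoding taken from the list above.
CLASS_NAMES = {0: "moderate", 1: "severe"}


def check_split(split_dir):
    """Return (images_without_labels, bad_label_lines) for one split folder.

    split_dir is expected to contain 'images' and 'labels' subfolders.
    bad_label_lines is a list of (label_file_name, line_number) tuples.
    """
    split_dir = Path(split_dir)
    missing, bad = [], []
    for image in sorted((split_dir / "images").iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = split_dir / "labels" / (image.stem + ".txt")
        if not label.exists():
            missing.append(image.name)
            continue
        for line_no, line in enumerate(label.read_text().splitlines(), start=1):
            parts = line.split()
            if not parts:
                continue  # ignore blank lines
            # First field: class id; remaining fields: normalized coordinates.
            if len(parts) < 5 or not parts[0].isdigit() or int(parts[0]) not in CLASS_NAMES:
                bad.append((label.name, line_no))
    return missing, bad
```

Calling `check_split("data/train")` and `check_split("data/valid")` would then list any images lacking annotations and any label lines with an unexpected class id.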

Once the images and associated annotations are ready, we create a dataset descriptor YAML file (data.yaml) that points to the created datasets and describes the object classes in them.  This YAML file is passed to the 'train' method of the model to start the training process.
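To make the descriptor concrete, here is a minimal sketch of what such a file could contain, generated from Python. The keys `path`, `train`, `val` and `names` follow the Ultralytics dataset YAML format; the relative paths are assumptions based on the folder layout described above, and the helper name `build_data_yaml` is hypothetical. The downloaded archive may already ship its own data.yaml, in which case that file should be used as-is.

```python
from pathlib import Path


def build_data_yaml(dataset_root):
    """Build the text of a dataset descriptor for the two accident classes.

    The relative 'train' and 'val' paths are assumed from the folder
    layout described above (train/images and valid/images).
    """
    root = Path(dataset_root).resolve()
    return (
        f"path: {root}\n"          # dataset root directory
        "train: train/images\n"    # training images, relative to 'path'
        "val: valid/images\n"      # validation images, relative to 'path'
        "names:\n"
        "  0: moderate\n"
        "  1: severe\n"
    )


if __name__ == "__main__":
    # Print the descriptor; write it out with Path("data.yaml").write_text(...)
    print(build_data_yaml("data"))
```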

Let's get started by installing ultralytics!

In [1]:
!pip install ultralytics
from ultralytics import YOLO

[notice] A new release of pip is available: 23.0.1 -> 23.3.2
[notice] To update, run: pip install --upgrade pip


Next, let's load the pretrained YOLO model 'yolov8m.pt'.

In [2]:
# Load model
model = YOLO('yolov8m.pt')  # load a pretrained model (recommended for training)

Once we've loaded our model, we are going to start a training loop.  'data' is the only required option; you pass the YAML descriptor file to it.
Each epoch (one pass over the training set) has a training phase followed by a validation phase.

## Get the training data
The training data is available as a zip file (accident.zip), which we have placed in an S3 bucket.  You can download this data and use it to retrain the model.

In [5]:
%%bash

# Check if the directory exists, if not, create it
if [ ! -d "./datasets/" ]; then
    mkdir -p ./datasets/
fi

cd ./datasets/

URL="https://rhods-public.s3.amazonaws.com/sample-data/accident-data/accident.zip" 

# Check if the file exists, if not, download it
if [ ! -e "accident.zip" ]; then
    # curl $URL -o accident.zip
    echo "Downloading file"
    time curl -L -O -J \
        --retry 3 \
        --retry-delay 5 \
        --retry-max-time 30 \
        "$URL"
    ls -alh accident.zip    

    echo "unzipping file"
    time unzip -q accident.zip 
fi

## Re-training our YOLO model
Fully retraining the model would take close to 6 hours.  You can run the following code, but be prepared to interrupt it so that you can continue with the lab.

In [15]:
# Train model

results = model.train(data='data.yaml', epochs=7, imgsz=640)
#results = model.train(data='datasets/data.yaml', epochs=1, imgsz=640)


Ultralytics YOLOv8.0.232 🚀 Python-3.8.6 torch-1.13.1+cpu CPU (AMD EPYC 7R32)
engine/trainer: task=detect, mode=train, model=yolov8m.pt, data=data.yaml, epochs=7, time=None, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train11, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, form

train: Scanning /opt/app-root/src/data/train/labels.cache... 9758 images, 7 backgrounds, 0 corrupt: 100%|██████████| 9758/9758 [00:00<?, ?it/s]
val: Scanning /opt/app-root/src/data/valid/labels.cache... 1347 images, 1 backgrounds, 0 corrupt: 100%|██████████| 1347/1347 [00:00<?, ?it/s]

Plotting labels to runs/detect/train11/labels.jpg...

optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.001667, momentum=0.9) with parameter groups 77 weight(decay=0.0), 84 weight(decay=0.0005), 83 bias(decay=0.0)
7 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        1/7         0G      1.535       4.14      1.854         32        640:   0%|          | 2/610 [00:31<2:40:02, 15.79s/it]


KeyboardInterrupt: 
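The progress bar in the output above can be read with a little arithmetic. The sketch below takes its numbers from the log (9758 training images, batch=16, roughly 15.79 s per iteration on this CPU) and derives the iteration count per epoch and a rough time per epoch. The per-iteration rate was measured after only two iterations, so it is a noisy estimate, and other hardware will behave differently.

```python
import math

train_images = 9758        # from the "train: Scanning ..." line in the log
batch_size = 16            # batch=16 in the trainer arguments
seconds_per_iter = 15.79   # rate reported by the progress bar (early, noisy)
epochs = 7                 # epochs=7 in the call to model.train

# One iteration processes one batch; the last batch may be partial.
iters_per_epoch = math.ceil(train_images / batch_size)
epoch_hours = iters_per_epoch * seconds_per_iter / 3600
total_hours = epoch_hours * epochs  # training phase only, excludes validation

print(iters_per_epoch)  # 610, matching the "2/610" shown in the progress bar
print(f"about {epoch_hours:.1f} h per epoch, {total_hours:.1f} h for {epochs} epochs")
```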

In [None]:
#export 'best' model to ONNX format
#ObjDetOXModel = YOLO("runs/detect/train6/weights/best.pt").export(format="onnx")
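The commented export line hard-codes the run folder 'train6', while the run logged above wrote to 'runs/detect/train11'. A small helper can pick up whichever run most recently produced weights. This is a minimal sketch: the function name `latest_best_weights` is hypothetical, and it assumes the usual 'runs/detect/<run name>/weights/best.pt' layout. A run interrupted before its first epoch completes may not have a 'best.pt' at all.

```python
from pathlib import Path


def latest_best_weights(runs_dir="runs/detect"):
    """Return the most recently modified best.pt under runs_dir, or None."""
    candidates = list(Path(runs_dir).glob("*/weights/best.pt"))
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

If a path is returned, it can be passed to the export call from the cell above in place of the hard-coded one, for example `YOLO(str(latest_best_weights())).export(format="onnx")`.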


Now that we have retrained our model, let's test it against images with car accidents!  <B>Please go to notebook '04-04-accident-recog.ipynb'.</B>