# Retrain the YOLO model

We have a prepared dataset (from RoboFlow) that contains annotated images, already split into training and validation sets.  As before, we'll use the training set to teach the model and the validation set to test what it has learned and measure the quality of the trained model.

1. The encoded classes of objects we want to teach our model to detect are 0-'moderate' and 1-'severe'.
2. We have created a folder for the dataset (data) with 2 subfolders in it: 'train' and 'valid'.  Within each subfolder we have created 2 subfolders:  'images' and 'labels'.
3. Each image has an annotation text file in the 'labels' subfolder. The annotation text files have the same names as the image files.
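The layout described above can be sanity-checked before training. The following is a minimal sketch, not part of the original lab: the helper name `find_unlabeled_images` and the set of image suffixes are assumptions, and it only checks that each image has a same-named `.txt` file in the sibling 'labels' folder.

```python
from pathlib import Path

# Assumed image formats; adjust to match the files in your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(split_dir):
    """Return the sorted names of images in <split_dir>/images that have
    no annotation file with the same stem in <split_dir>/labels."""
    split = Path(split_dir)
    label_stems = {p.stem for p in (split / "labels").glob("*.txt")}
    return sorted(
        p.name
        for p in (split / "images").iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem not in label_stems
    )
```

For example, `find_unlabeled_images("datasets/train")` would list any training images that lack an annotation file. An empty list means every image is paired with a label file.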

Once the images and associated annotations are ready, we create a dataset descriptor YAML file (data.yaml) that points to the created datasets and describes the object classes in them.  This YAML file is passed to the 'train' method of the model to start the training process.
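To make the descriptor concrete, here is a hedged sketch of what such a file could contain. The downloaded dataset ships its own data.yaml, whose exact contents are not shown here, so treat the keys below (`path`, `train`, `val`, `names`, following the Ultralytics dataset YAML convention) and the placeholder root `/path/to/datasets` as assumptions. The helper `build_data_yaml` is hypothetical.

```python
def build_data_yaml(root, class_names):
    """Build the text of a YOLO dataset descriptor.

    root        -- dataset root folder (placeholder value in the example)
    class_names -- class names, ordered by their numeric class id
    """
    lines = [
        f"path: {root}",          # dataset root
        "train: train/images",    # training images, relative to root
        "val: valid/images",      # validation images, relative to root
        "names:",
    ]
    lines += [f"  {idx}: {name}" for idx, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# '/path/to/datasets' is a placeholder, not the lab's real location.
print(build_data_yaml("/path/to/datasets", ["moderate", "severe"]))
```

The order of `class_names` matters: position 0 maps to 'moderate' and position 1 to 'severe', matching the class encoding listed above.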

Let's get started by installing ultralytics!

In [3]:
!pip install ultralytics 
from ultralytics import YOLO


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


Next, let's load the pretrained YOLO model 'yolov8m.pt'.

In [4]:
# Load model
model = YOLO('yolov8m.pt')  # load a pretrained model (recommended for training)

Once we've loaded our model, we are going to start a training loop.  'data' is the only required option; you pass the YAML descriptor file to it.
Each epoch (training cycle) has a training phase and a validation phase.

## Get the training data


In [5]:
%%bash

# Check if the directory exists, if not, create it
if [ ! -d "./datasets/" ]; then
    mkdir -p ./datasets/
fi

cd ./datasets/

URL="https://rhods-public.s3.amazonaws.com/sample-data/accident-data/accident.zip" 

# Check if the file exists, if not, download it
if [ ! -e "accident.zip" ]; then
    # curl $URL -o accident.zip
    echo "Downloading file"
    time curl -L -O -J \
        --retry 3 \
        --retry-delay 5 \
        --retry-max-time 30 \
        "$URL"
    ls -alh accident.zip    

    echo "unzipping file"
    time unzip -q accident.zip 
fi

Downloading file


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  901M  100  901M    0     0  48.5M      0  0:00:18  0:00:18 --:--:-- 43.5M

real	0m18.591s
user	0m0.694s
sys	0m1.678s


-rw-r--r--. 1 1000830000 1000830000 902M Dec 11 22:32 accident.zip
unzipping file



real	0m5.940s
user	0m4.442s
sys	0m1.430s


In [None]:
# Train model
# results = model.train(data='data.yaml', epochs=7, imgsz=640)
results = model.train(data='datasets/data.yaml', epochs=1, imgsz=640)


Ultralytics YOLOv8.0.226 🚀 Python-3.9.18 torch-2.0.1+cu118 CPU (Intel Xeon Platinum 8259CL 2.50GHz)
[34m[1mengine/trainer: [0mtask=detect, mode=train, model=yolov8m.pt, data=datasets/data.yaml, epochs=1, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=Non

[34m[1mtrain: [0mScanning /opt/app-root/src/insurance-claim-processing/lab-materials/03/datasets/train/labels... 9758 images, 7 backgrounds, 0 corrupt: 100%|██████████| 9758/9758 [00:05<00:00, 1820.95it/s]


[34m[1mtrain: [0mNew cache created: /opt/app-root/src/insurance-claim-processing/lab-materials/03/datasets/train/labels.cache


[34m[1mval: [0mScanning /opt/app-root/src/insurance-claim-processing/lab-materials/03/datasets/valid/labels... 1347 images, 1 backgrounds, 0 corrupt: 100%|██████████| 1347/1347 [00:00<00:00, 1801.04it/s]

[34m[1mval: [0mNew cache created: /opt/app-root/src/insurance-claim-processing/lab-materials/03/datasets/valid/labels.cache





Plotting labels to runs/detect/train/labels.jpg... 
[34m[1moptimizer:[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
[34m[1moptimizer:[0m AdamW(lr=0.001667, momentum=0.9) with parameter groups 77 weight(decay=0.0), 84 weight(decay=0.0005), 83 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to [1mruns/detect/train[0m
Starting training for 1 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        1/1         0G      1.676       4.71      2.009         28        640:   0%|          | 1/610 [00:11<1:51:51, 11.02s/it]

In [None]:
#export 'best' model to ONNX format
#ObjDetOXModel = YOLO("runs/detect/train6/weights/best.pt").export(format="onnx")

## Interpreting our Training Results

For each epoch, the training output shows a summary of both the training and validation phases: lines 1 and 2 show the results of the training phase, and lines 3 and 4 show the results of the validation phase.

The training phase includes a calculation of the amount of error in a loss function, so the most valuable metrics here are box_loss and cls_loss.

box_loss shows the amount of error in detected bounding boxes.
cls_loss shows the amount of error in detected object classes.

If the model really learns something from the data, then you should see these values decrease from epoch to epoch.
In a previous screenshot, the box_loss decreased (1.271, 1.113, 0.8679) and the cls_loss decreased too (1.893, 1.404, 0.9703).
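The "decreasing from epoch to epoch" check can be written down directly. This is a small illustrative sketch using the loss values quoted above. The helper name `is_strictly_decreasing` is hypothetical, and in practice losses can fluctuate slightly between epochs, so a strict check like this is only a rough signal.

```python
def is_strictly_decreasing(values):
    """Return True if every value is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


# Loss values quoted in the text above.
box_loss = [1.271, 1.113, 0.8679]
cls_loss = [1.893, 1.404, 0.9703]

print("box_loss decreasing:", is_strictly_decreasing(box_loss))
print("cls_loss decreasing:", is_strictly_decreasing(cls_loss))
```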

The most valuable quality metric is mAP50-95: the mean Average Precision (mAP), averaged over IoU (intersection over union) thresholds from 0.50 to 0.95. If the model learns and improves, this value should grow from epoch to epoch.  In a previous screenshot, mAP50-95 increased: 0.314 (epoch 1), 0.663 (epoch 4), 0.882 (epoch 7).
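The "50-95" in the metric name refers to the overlap thresholds at which a predicted box counts as a match for a ground-truth box. The sketch below only illustrates that idea: it computes the intersection over union (IoU) of two boxes and lists the thresholds from 0.50 to 0.95 (in steps of 0.05) that the overlap satisfies. It is a simplification, not the metric itself, since the real computation also builds a precision-recall curve per class at each threshold. The helper names and the `(x1, y1, x2, y2)` box format are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# The ten thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(predicted_box, true_box):
    """List the IoU thresholds at which the prediction counts as a match."""
    overlap = iou(predicted_box, true_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]
```

A prediction that overlaps the true box only loosely passes the lower thresholds but fails the stricter ones, which is why mAP50-95 rewards tightly fitting boxes more than mAP at a single 0.50 threshold would.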

If after the last epoch you did not get acceptable precision, you can increase the number of epochs and run the training again. Also, you can tune other parameters like batch, lr0, lrf or change the optimizer you're using.
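As a sketch of what such a re-run might look like, the snippet below collects the tunable options in one place. The helper names `build_train_args` and `retrain` are hypothetical and the values are illustrative starting points, not recommendations. The option names themselves (`epochs`, `batch`, `lr0`, `lrf`, `optimizer`) are the same ones that appear in the trainer configuration printed in the training log. Note that the log shows `optimizer=auto` ignoring `lr0`, so this sketch assumes you name an optimizer explicitly when you want your learning rate to be used.

```python
def build_train_args(epochs=7, batch=16, lr0=0.01, lrf=0.01, optimizer="SGD"):
    """Collect training options in one dict (illustrative values only)."""
    return {
        "data": "datasets/data.yaml",  # descriptor used earlier in this notebook
        "imgsz": 640,
        "epochs": epochs,        # more epochs give the model more passes over the data
        "batch": batch,          # images per optimization step
        "lr0": lr0,              # initial learning rate
        "lrf": lrf,              # final learning rate, as a fraction of lr0
        "optimizer": optimizer,  # explicit choice instead of 'auto'
    }


def retrain(weights="yolov8m.pt", **overrides):
    """Start a new training run with the given overrides."""
    from ultralytics import YOLO  # imported here so the helper above has no dependencies

    model = YOLO(weights)
    return model.train(**build_train_args(**overrides))
```

For instance, `retrain(epochs=10, optimizer="AdamW", lr0=0.001)` would start a new run with those three settings changed and the rest left at the values above.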

During training, the model is saved after each epoch to the runs/detect/train/weights/last.pt file, and the model with the highest precision is saved to the runs/detect/train/weights/best.pt file. So, after training is finished, you can take the best.pt file to use in production.
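When training is run more than once, each run gets its own folder (the commented-out export cell above refers to `train6`, for example), so the newest best.pt is not always under `train`. The following is a minimal sketch for locating it, assuming run folders under runs/detect whose names start with `train`. The helper `latest_best_weights` is hypothetical.

```python
from pathlib import Path


def latest_best_weights(runs_dir="runs/detect"):
    """Return the most recently modified best.pt under <runs_dir>/train*/weights,
    or None if no training run has produced one yet."""
    candidates = list(Path(runs_dir).glob("train*/weights/best.pt"))
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

The returned path can then be passed to `YOLO(...)` to load the retrained model.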

Note:  In real-world problems, you need to run many more epochs than we have shown here and be prepared to wait hours or days (like we did!) until training finishes.




### Now that we have retrained our model let's test it!  Open notebook:  4-test-retrained-model.ipynb