# Retrain the YOLO model

We have a prepared dataset (from RoboFlow) of annotated images, already split into training and validation sets.  As before, we'll use the training set to teach the model and the validation set to check what it has learned and measure the quality of the trained model.

1. The object classes we want to teach our model to detect are encoded as 0 = 'moderate' and 1 = 'severe'.
2. We have created a folder for the dataset (data) with 2 subfolders in it: 'train' and 'valid'.  Within each subfolder we have created 2 subfolders:  'images' and 'labels'.
3. Each image has an annotation text file in the 'labels' subfolder. Each annotation file has the same base name as its image file, with a .txt extension.
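
Before training, it can help to confirm that the layout described above holds. The following is a minimal sketch, assuming the dataset lives in a folder named `data` and that the images use `.jpg`, `.jpeg` or `.png` extensions. The helper name `find_unlabeled_images` is hypothetical and not part of any library.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: adjust to your dataset


def find_unlabeled_images(split_dir):
    """Return names of images in split_dir/images with no matching .txt in split_dir/labels."""
    split_dir = Path(split_dir)
    label_stems = {p.stem for p in (split_dir / "labels").glob("*.txt")}
    return sorted(
        p.name
        for p in (split_dir / "images").iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem not in label_stems
    )


if __name__ == "__main__":
    for split in ("train", "valid"):
        split_dir = Path("data") / split
        if split_dir.is_dir():
            missing = find_unlabeled_images(split_dir)
            print(f"{split}: {len(missing)} image(s) without a label file")
```

An image without a label file is not necessarily an error. The training log further down reports a few "backgrounds", which appear to be images with no annotated objects.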

Once the images and associated annotations are ready, we create a dataset descriptor YAML file (data.yaml) that points to the created datasets and describes the object classes in them.  This YAML file is passed to the 'train' method of the model to start the training process.
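
As an illustration, the descriptor could be generated with a few lines of Python. This is a sketch under stated assumptions: the keys `path`, `train`, `val` and `names` follow the Ultralytics dataset YAML format, the root folder is the one that appears in the training log later in this notebook, and the helper name `build_data_yaml` is hypothetical. The file exported by RoboFlow may be laid out differently.

```python
CLASS_NAMES = {0: "moderate", 1: "severe"}  # class ids and names from the list above


def build_data_yaml(root="/opt/app-root/src/data", train="train/images",
                    val="valid/images", names=CLASS_NAMES):
    """Return the text of a YOLO dataset descriptor (assumed layout, adjust as needed)."""
    lines = [
        f"path: {root}  # dataset root folder",
        f"train: {train}  # training images, relative to path",
        f"val: {val}  # validation images, relative to path",
        "names:",
    ]
    lines += [f"  {class_id}: {name}" for class_id, name in sorted(names.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the descriptor; save the text as data.yaml once the paths look right.
    print(build_data_yaml())
```

An absolute `path` is used here because the way relative paths are resolved can depend on the Ultralytics settings.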

Let's get started by installing ultralytics!

In [4]:
!pip install ultralytics
from ultralytics import YOLO


Next, let's load the pretrained YOLO model 'yolov8m.pt'.

In [5]:
# Load model
model = YOLO('yolov8m.pt')  # load a pretrained model (recommended for training)

Once we've loaded our model, we start the training loop.  'data' is the only required option; you pass the YAML descriptor file to it.  Here we also set the number of epochs and the image size.
Each epoch has a training phase and a validation phase.

In [6]:
# Train model
results = model.train(data='data.yaml', epochs=7, imgsz=640)

Ultralytics YOLOv8.0.222 🚀 Python-3.8.6 torch-1.13.1+cpu CPU (AMD EPYC 7R32)
engine/trainer: task=detect, mode=train, model=yolov8m.pt, data=data.yaml, epochs=7, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train5, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=Fa

train: Scanning /opt/app-root/src/data/train/labels.cache... 9758 images, 7 backgrounds, 0 corrupt: 100%|██████████| 9758/9758 [00:00<?, ?it/s]
val: Scanning /opt/app-root/src/data/valid/labels.cache... 1347 images, 1 backgrounds, 0 corrupt: 100%|██████████| 1347/1347 [00:00<?, ?it/s]


Plotting labels to runs/detect/train5/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.001667, momentum=0.9) with parameter groups 77 weight(decay=0.0), 84 weight(decay=0.0005), 83 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs/detect/train5
Starting training for 7 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        1/7         0G      1.271      1.893      1.645         27        640: 100%|██████████| 610/610 [2:29:54<00:00, 14.74s/it]  
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 43/43 [07:02<00:00,  9.82s/it]

                   all       1347       1406      0.304      0.387      0.314      0.148






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        2/7         0G      1.347      1.792      1.678         33        640: 100%|██████████| 610/610 [2:29:35<00:00, 14.71s/it]  
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 43/43 [07:02<00:00,  9.83s/it]

                   all       1347       1406      0.428      0.449      0.398      0.236






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        3/7         0G      1.226       1.58      1.583         26        640: 100%|██████████| 610/610 [2:29:27<00:00, 14.70s/it]  
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 43/43 [07:01<00:00,  9.80s/it]

                   all       1347       1406      0.533      0.556      0.549      0.359






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        4/7         0G      1.113      1.404      1.491         25        640: 100%|██████████| 610/610 [2:29:29<00:00, 14.70s/it]  
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 43/43 [07:02<00:00,  9.83s/it]

                   all       1347       1406      0.612      0.655      0.663      0.488






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        5/7         0G      1.048       1.26      1.438         31        640: 100%|██████████| 610/610 [2:29:32<00:00, 14.71s/it]  
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 43/43 [07:00<00:00,  9.77s/it]

                   all       1347       1406      0.698      0.662      0.728      0.545






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        6/7         0G     0.9534      1.112      1.364         22        640: 100%|██████████| 610/610 [2:29:42<00:00, 14.73s/it]  
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 43/43 [07:01<00:00,  9.80s/it]

                   all       1347       1406      0.775      0.738      0.813      0.657






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        7/7         0G     0.8679     0.9703      1.298         37        640: 100%|██████████| 610/610 [2:29:39<00:00, 14.72s/it]  
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 43/43 [07:06<00:00,  9.91s/it]

                   all       1347       1406      0.827       0.79      0.882      0.733






7 epochs completed in 18.286 hours.
Optimizer stripped from runs/detect/train5/weights/last.pt, 52.0MB
Optimizer stripped from runs/detect/train5/weights/best.pt, 52.0MB

Validating runs/detect/train5/weights/best.pt...
Ultralytics YOLOv8.0.222 🚀 Python-3.8.6 torch-1.13.1+cpu CPU (AMD EPYC 7R32)
Model summary (fused): 218 layers, 25840918 parameters, 0 gradients, 78.7 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 43/43 [06:44<00:00,  9.40s/it]


                   all       1347       1406      0.826       0.79      0.882      0.733
              moderate       1347        329      0.799      0.705      0.823      0.684
                severe       1347       1077      0.854      0.875      0.941      0.783
Speed: 0.6ms preprocess, 293.1ms inference, 0.0ms loss, 0.2ms postprocess per image
Results saved to runs/detect/train5


In [None]:
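# Export the 'best' model to ONNX format (uncomment to run).
# The path must match the run folder reported above (runs/detect/train5 for this run).
# ObjDetOXModel = YOLO("runs/detect/train5/weights/best.pt").export(format="onnx")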

## Interpreting our Training Results

For each epoch, the output shows a summary of both the training and validation phases: lines 1 and 2 (a header row and a row of values) show the results of the training phase, and lines 3 and 4 show the results of the validation phase.

The training phase measures the model's error with a loss function, so the most valuable metrics here are box_loss and cls_loss.

box_loss shows the amount of error in detected bounding boxes.
cls_loss shows the amount of error in detected object classes.

If the model really learns something from the data, then you should see these values decrease from epoch to epoch.
In the training output above, the box_loss decreased overall: 1.271 (epoch 1), 1.113 (epoch 4), 0.8679 (epoch 7), and the cls_loss decreased too: 1.893, 1.404, 0.9703.
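
To look at the trend across all seven epochs rather than three sample points, the values can be compared pairwise. This is a small sketch using the losses copied by hand from the training log above; the helper names `epoch_deltas` and `rising_epochs` are made up for this example.

```python
# Training losses per epoch, copied from the training log above.
box_loss = [1.271, 1.347, 1.226, 1.113, 1.048, 0.9534, 0.8679]
cls_loss = [1.893, 1.792, 1.58, 1.404, 1.26, 1.112, 0.9703]


def epoch_deltas(values):
    """Return the change from each epoch to the next (negative means the loss fell)."""
    return [round(b - a, 4) for a, b in zip(values, values[1:])]


def rising_epochs(values):
    """Return the 1-based epochs whose loss is higher than the previous epoch's."""
    return [i + 2 for i, delta in enumerate(epoch_deltas(values)) if delta > 0]


print("box_loss rose at epochs:", rising_epochs(box_loss))
print("cls_loss rose at epochs:", rising_epochs(cls_loss))
```

In this run the box_loss rises once, at epoch 2, and falls in every later epoch. An occasional small rise like this is not unusual; the overall direction is what matters.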

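The most valuable quality metric is mAP50-95, the mean average precision averaged over IoU (intersection over union) thresholds from 0.50 to 0.95. If the model learns and improves, this value should grow from epoch to epoch.  In the training output above, mAP50-95 increased: 0.148 (epoch 1), 0.488 (epoch 4), 0.733 (epoch 7). Over the same epochs mAP50, which uses only the 0.50 threshold, increased: 0.314, 0.663, 0.882.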
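
The IoU thresholds behind these metrics can be illustrated with a toy example. The sketch below only shows the matching criterion for a single predicted box; it is not the full mAP calculation, which also ranks predictions by confidence and averages precision over recall levels and classes. Both boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds used by mAP50-95: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

truth = (0, 0, 10, 10)      # hypothetical ground-truth box
predicted = (2, 0, 12, 10)  # hypothetical prediction, shifted 2 pixels to the right

overlap = iou(truth, predicted)
passed = [t for t in THRESHOLDS if overlap >= t]
print(f"IoU = {overlap:.3f}; counts as a match at thresholds {passed}")
```

Here the IoU is 80 / 120, about 0.667, so the prediction counts as correct for mAP50 but fails the stricter thresholds from 0.70 upward. This is why mAP50-95 is always at most as high as mAP50.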

If after the last epoch you did not get acceptable precision, you can increase the number of epochs and run the training again. Also, you can tune other parameters like batch, lr0, lrf or change the optimizer you're using.
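
A possible way to organize such a rerun is sketched below. The argument names (`epochs`, `batch`, `lr0`, `lrf`, `optimizer`) are Ultralytics training settings, but every value shown is a placeholder rather than a recommendation, and the helper names `build_train_args` and `retrain` are hypothetical. Note that the training log above reports that `lr0` is ignored while `optimizer=auto`, so the sketch sets the optimizer explicitly.

```python
def build_train_args(epochs=20, batch=8, lr0=0.001, lrf=0.01, optimizer="AdamW"):
    """Collect keyword arguments for model.train(); the default values are placeholders."""
    return {
        "data": "data.yaml",
        "epochs": epochs,
        "imgsz": 640,
        "batch": batch,
        "optimizer": optimizer,  # set explicitly so lr0 is not overridden by optimizer=auto
        "lr0": lr0,  # initial learning rate
        "lrf": lrf,  # final learning rate as a fraction of lr0
    }


def retrain(weights="yolov8m.pt", **overrides):
    """Start a new training run; requires the ultralytics package."""
    from ultralytics import YOLO  # imported lazily so build_train_args works without it

    return YOLO(weights).train(**build_train_args(**overrides))
```

Calling `retrain(epochs=30)` would start a new run with 30 epochs and the other placeholder values unchanged.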

During training, the model is saved after each epoch to the runs/detect/train/weights/last.pt file, and the best-scoring model so far is saved to the runs/detect/train/weights/best.pt file. The run folder gets a numeric suffix when earlier runs exist; in the output above it is runs/detect/train5. So, after training is finished, you can take the best.pt file to use in production.

Note:  In real-world problems, you need to run many more epochs (than we have shown here) and be prepared to wait hours or days (like we did!) until training finishes.
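
To get a feel for the waiting time before adding epochs, the per-batch timings from the log above can be extrapolated. This is a rough sketch: the default values are approximate averages read from this CPU-only run (610 training batches at about 14.7 s each and 43 validation batches at about 9.8 s each), and the function name `estimate_hours` is made up for this example.

```python
def estimate_hours(epochs, train_batches=610, train_s_per_batch=14.72,
                   val_batches=43, val_s_per_batch=9.82):
    """Estimate total training time in hours from per-batch timings."""
    seconds_per_epoch = (train_batches * train_s_per_batch
                         + val_batches * val_s_per_batch)
    return epochs * seconds_per_epoch / 3600


for epochs in (7, 50):
    print(f"{epochs} epochs: about {estimate_hours(epochs):.1f} hours")
```

For 7 epochs the estimate comes out close to the 18.286 hours reported in the log, which suggests the same arithmetic gives a usable ballpark for longer runs on the same hardware.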




### Now that we have retrained our model let's test it!  Open notebook:  4-test-retrained-model.ipynb