# Training Ultralytics YOLOv11 Object Detection on Google Colab

This notebook shows how to train an Ultralytics YOLOv11 model on a custom dataset.

It runs on Google Colab, which provides two key features:

- GPU execution, which is essential for training computer vision models in a reasonable time.

- Google Drive storage, where both the dataset and the trained model checkpoints are kept.



# 1. Installing Ultralytics



In [2]:
# Install ultralytics package.

!pip install -qq ultralytics

In [3]:
# Check that the package has been installed
import ultralytics

print(ultralytics.__version__)

8.3.80


In [4]:
# Once the package is installed, the yolo CLI command is available.
# `yolo settings` shows how the package is configured, including some
# important directory paths:
# - datasets_dir: where datasets are expected to be stored, and
# - runs_dir: where processes store their results
#             (such as the files produced by training)

!yolo settings

JSONDict("/root/.config/Ultralytics/settings.json"):
{
  "settings_version": "0.0.6",
  "datasets_dir": "/content/datasets",
  "weights_dir": "weights",
  "runs_dir": "runs",
  "uuid": "569f3ba64b326db489132663f79cd37279811de477381b83ac131e6cdd129cbb",
  "sync": true,
  "api_key": "",
  "openai_api_key": "",
  "clearml": true,
  "comet": true,
  "dvc": true,
  "hub": true,
  "mlflow": true,
  "neptune": true,
  "raytune": true,
  "tensorboard": true,
  "wandb": false,
  "vscode_msg": true
}
💡 Learn more about Ultralytics Settings at https://docs.ultralytics.com/quickstart/#ultralytics-settings


# 2. Preparing the dataset

In [5]:
# First, mount Google Drive.
# When you run this cell, a pop-up will ask you to grant
# access to your Google Drive.

# This lets the training process save its progress
# to your Google Drive.

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [43]:

# TODO:
# Create a folder in your Google Drive.
# This folder will be the working directory.
# Then set the DRIVE_WORKING_DIR variable to the path of that folder.

# In my case it is:
DRIVE_WORKING_DIR = "/content/drive/MyDrive/hsicity_rgb_dataset"


In [44]:
# Let's change the working directory
%cd {DRIVE_WORKING_DIR}

/content/drive/MyDrive/hsicity_rgb_dataset


## Custom dataset

I created a Cats and Dogs dataset as a subset of the MS COCO dataset, exporting only the images that contain dogs or cats.

You can [download it from Kaggle](https://www.kaggle.com/datasets/estebanuriz/cats-and-dogs-for-object-detection/data).

You can also use your own dataset. The procedure is exactly the same, as long as the dataset is formatted for YOLO.
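
As a reference for what "formatted for YOLO" means for detection: each image has a text file with one line per object, `class_id cx cy w h`, where the box center and size are normalized to [0, 1] by the image width and height. The sketch below converts one such line to pixel corner coordinates. The function name `yolo_to_xyxy` is illustrative and not part of Ultralytics.

```python
def yolo_to_xyxy(line, img_w, img_h):
    """Convert one YOLO detection label line to (class_id, x1, y1, x2, y2) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])

    # The label stores the box center and size, so the corners are
    # half a width/height away from the center on each side.
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return class_id, x1, y1, x2, y2


# A box centered in a 200x100 image, covering half of each dimension.
print(yolo_to_xyxy("1 0.5 0.5 0.5 0.5", img_w=200, img_h=100))
# → (1, 50.0, 25.0, 150.0, 75.0)
```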


In [8]:
# I've uploaded the zipped dataset file cats_dogs.zip into my Google Drive folder:
!ls -lh *.zip

ls: cannot access '*.zip': No such file or directory


In [9]:
# Decompress the zip onto the Colab instance's local disk.

# NOTE:
# Training is faster when the files are on the Colab instance's
# local disk, because this avoids network latency.

!unzip -o -qq cats_dogs.zip -d /content/datasets

unzip:  cannot find or open cats_dogs.zip, cats_dogs.zip.zip or cats_dogs.zip.ZIP.
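
The output above shows `unzip` failing because the archive is not in the working directory. As an alternative, here is a minimal sketch that does the same extraction with Python's standard `zipfile` module and reports a missing archive before attempting to open it. The helper name `extract_dataset` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract zip_path into dest_dir.

    Returns True on success, or False if the archive does not exist.
    """
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        # Print the absolute path so a wrong working directory is easy to spot.
        print(f"Archive not found: {zip_path.resolve()}")
        return False

    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return True


# Example call with the paths used in this notebook:
# extract_dataset("cats_dogs.zip", "/content/datasets")
```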


In [10]:
# The dataset files should now be in /content/datasets:

!ls /content/datasets/cats_dogs

ls: cannot access '/content/datasets/cats_dogs': No such file or directory


In [11]:
# This script prints the dataset structure in a tree-like format.

import os

def count_files(dir_path, extensions):
    return sum(1 for f in os.listdir(dir_path) if os.path.isfile(os.path.join(dir_path, f)) and os.path.splitext(f)[1].lower() in extensions)


def count_image_files(dir_path):
    # Define common image extensions
    image_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.gif'}
    return count_files(dir_path, image_extensions)

def count_label_files(dir_path):
    # Define common label extensions
    label_extensions = {'.txt', '.xml'}
    return count_files(dir_path, label_extensions)

def print_dataset_structure(root_dir, indent="", is_last=True):
    """
      Prints the dataset structure in a tree-like format.

      Inputs:
        root_dir: str (path where a dataset is placed)


    """
    # Print the root directory
    if indent == "":
        print(root_dir)

    # Get sorted directory contents
    items = sorted(os.listdir(root_dir))
    items_count = len(items)

    for i, item in enumerate(items):
        item_path = os.path.join(root_dir, item)
        is_item_last = (i == items_count - 1)  # Check if the item is the last one

        # Determine prefix
        prefix = "└── " if is_item_last else "├── "
        next_indent = indent + ("    " if is_item_last else "│   ")

        if item == ".DS_Store":
            pass

        elif os.path.isdir(item_path):
            # Print directory name
            print(f"{indent}{prefix}{item}")
            # Recurse into subdirectory
            subdirs = sorted(os.listdir(item_path))
            for j, sub in enumerate(subdirs):
                sub_path = os.path.join(item_path, sub)
                if os.path.isdir(sub_path):
                    img_count = count_image_files(sub_path)
                    lbl_count = count_label_files(sub_path)
                    sub_prefix = "└── " if j == len(subdirs) - 1 else "├── "
                    found = []
                    if img_count > 0:
                        found.append(f"{img_count} images")
                    if lbl_count > 0:
                        found.append(f"{lbl_count} labels")
                    if len(found) == 0:
                      count = ""
                    else:
                      count = "[" + ", ".join(found) + "]"
                    print(f"{next_indent}{sub_prefix}{sub} {count}")
        else:
            print(f"{indent}{prefix}{item}")


In [12]:
print_dataset_structure("/content/drive/MyDrive/hsicity_proposed_dataet")

/content/drive/MyDrive/hsicity_proposed_dataet
├── .ipynb_checkpoints
├── dataset.yaml
├── images
│   ├── train [1054 images]
│   └── val [276 images]
├── labels
│   ├── train [1030 labels]
│   ├── val [276 labels]
├── runs
│   └── segment 
├── yolo11n-seg.pt
└── yolo11n.pt
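
The tree above lists 1054 training images but only 1030 training labels, so some images have no label file. Images without labels are typically treated as background during training, so it is worth knowing which ones they are. The sketch below lists them; it assumes the `images/<split>` and `labels/<split>` layout shown above, and the helper name `images_without_labels` is hypothetical.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".gif"}


def images_without_labels(data_root, split):
    """Return sorted image file names in images/<split> that have no
    matching labels/<split>/<stem>.txt file."""
    image_dir = Path(data_root) / "images" / split
    label_dir = Path(data_root) / "labels" / split

    # A label matches an image when both share the same file stem.
    label_stems = {p.stem for p in label_dir.glob("*.txt")}

    return sorted(
        p.name
        for p in image_dir.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_EXTENSIONS
        and p.stem not in label_stems
    )


# Example call with the dataset path used in this notebook:
# missing = images_without_labels("/content/drive/MyDrive/hsicity_proposed_dataet", "train")
# print(len(missing), missing[:5])
```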


In [13]:
# The dataset.yaml file defines the dataset parameters, such as
# the paths of the splits and the class names:

!cat /content/drive/MyDrive/hsicity_proposed_dataet/dataset.yaml

train: /content/drive/MyDrive/hsicity_proposed_dataet/images/train
val: /content/drive/MyDrive/hsicity_proposed_dataet/images/val
names:
  0: road
  1: sidewalk
  2: building
  3: wall
  4: fence
  5: pole
  6: traffic light
  7: traffic sign
  8: vegetation
  9: terrain
  10: sky
  11: person
  12: rider
  13: car
  14: truck
  15: bus
  16: train
  17: motorcycle
  18: bicycle


## Visualizing Some Examples

When working with a dataset, it helps to visualize a few examples to see what they look like.

It is also advisable to check that the labels are correct and in the format we expect.
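
Part of that format check can be automated. The sketch below validates a single label line, assuming the usual YOLO conventions: an integer class id followed by either 4 normalized values (a detection box) or an even number of at least 6 normalized values (a segmentation polygon). The function name `check_label_line` and the wording of its messages are illustrative.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one YOLO label line (empty if none)."""
    parts = line.split()
    if not parts:
        return ["empty line"]

    try:
        class_id = int(parts[0])
    except ValueError:
        return [f"class id is not an integer: {parts[0]!r}"]

    try:
        coords = [float(v) for v in parts[1:]]
    except ValueError:
        return ["non-numeric coordinate"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} out of range")

    # 4 values: box (cx, cy, w, h); 6 or more in pairs: polygon points.
    is_box = len(coords) == 4
    is_polygon = len(coords) >= 6 and len(coords) % 2 == 0
    if not (is_box or is_polygon):
        problems.append(f"unexpected number of coordinates: {len(coords)}")

    if any(not 0.0 <= v <= 1.0 for v in coords):
        problems.append("coordinate outside [0, 1]")

    return problems


print(check_label_line("13 0.5 0.5 0.2 0.1", num_classes=19))  # → []
print(check_label_line("19 0.5 0.5 0.2 0.1", num_classes=19))  # → ['class id 19 out of range']
```

Running a check like this over every file in `labels/train` and `labels/val` before training can surface format problems early.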

The following functions allow us to plot some images and their corresponding labels:

In [14]:
import cv2
from matplotlib.patches import Rectangle
from matplotlib import pyplot as plt
import numpy as np
import glob
import os



In [15]:
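def read_labels(file_name):
  """
    Reads a YOLO label file and returns a list of
    (class_id, values) tuples, one per non-empty line.
  """

  with open(file_name, "r") as f:
    lines = f.read().splitlines()

  labels = []
  for line in lines:
    # split() with no argument tolerates repeated spaces and tabs
    values = [float(n) for n in line.split()]
    if not values:
      continue  # skip blank lines
    labels.append((int(values[0]), values[1:]))
  return labels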


def plot(rgb, labels, class_names=['road','sidewalk','building','wall','fence','pole',
                                   'traffic light','traffic sign','vegetation','terrain',
                                   'sky','person','rider','car','truck','bus','train','motorcycle','bicycle']):
  """
    Given an RGB image and its labels,
    plots the image with the bounding boxes.
  """

  plt.figure(figsize=(18, 8))
  plt.imshow(rgb)

  img_w, img_h = rgb.shape[1], rgb.shape[0]

  ax = plt.gca()
  for label in labels:
    class_id, bbox = label
    cx, cy, w, h = bbox

    cx = img_w * cx
    cy = img_h * cy
    w = img_w * w
    h = img_h * h

    hw = w / 2
    hh = h / 2

    pt = (cx - hw, cy - hh)

    if class_id == 0:
      color = 'blue'
    else:
      color = 'red'

    ax.add_patch(Rectangle(
        pt, w, h,
        edgecolor = color,
        fill=None
    ))

    # Add class label above the bounding box
    label_text = class_names[int(class_id)]
    label_position = (cx - hw, cy - hh - 5)  # Slightly above the bbox
    ax.text(
      label_position[0], label_position[1],
      label_text,
      color=color,
      fontsize=12,
      fontweight='bold',
      bbox=dict(facecolor='white', edgecolor=color, alpha=0.7)
    )


  plt.show()

In [16]:
import glob
import numpy as np

def sample_some(
    data_root,
    split=None,
    n=10,
    seed=1234
):
  """

    Randomly samples a specified number of label and corresponding image files from a dataset.

    Parameters:
    -----------
    data_root : str.
        The root directory of the dataset.
    split : str, optional
        The split or subfolder within the `labels` directory to sample from. Defaults to `None`,
        which matches all subdirectories (i.e., "**").
    n : int, optional
        The number of label files to sample. Defaults to 10.
    seed : int, optional
        The random seed for reproducibility. Defaults to 1234.

    Returns:
    --------
    img_files : list of str
        A list of file paths to the corresponding image files in the `images` subdirectory.
    lbl_files : list of str
        A list of file paths to the sampled label files in the `labels` subdirectory.

    """

  np.random.seed(seed)

  if split is None:
    split = "**"

  pattern = os.path.join(data_root, f"labels/{split}/*.txt")
  label_files = sorted(glob.glob(pattern))

  lbl_files = np.random.choice(label_files, n, replace=False)
  lbl_files = list(lbl_files)

  img_files = [
      f.replace("labels", "images").replace(".txt", ".png")
      for f in lbl_files
  ]

  return img_files, lbl_files


In [17]:
img_files, lbl_files = sample_some(
    data_root="/content/drive/MyDrive/hsicity_proposed_dataet",
    split="val",
    n=3,
    seed=42
)

In [18]:
img_files

['/content/drive/MyDrive/hsicity_proposed_dataet/images/val/20210409_143420_01.png',
 '/content/drive/MyDrive/hsicity_proposed_dataet/images/val/20210409_170324_03.png',
 '/content/drive/MyDrive/hsicity_proposed_dataet/images/val/20210410_113423_02.png']

In [19]:
lbl_files

['/content/drive/MyDrive/hsicity_proposed_dataet/labels/val/20210409_143420_01.txt',
 '/content/drive/MyDrive/hsicity_proposed_dataet/labels/val/20210409_170324_03.txt',
 '/content/drive/MyDrive/hsicity_proposed_dataet/labels/val/20210410_113423_02.txt']

In [20]:
# choose some random files
# data_root = "/content/drive/MyDrive/hsicity_proposed_dataet"
# samples = sample_some(data_root, split='val')

# for image_file, label_file in zip(*samples):
#   image = cv2.imread(image_file)
#   rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
#   labels = read_labels(label_file)

#   for label in labels:
#     class_id, bbox = label
#     print(f"class_id: {class_id}, bbox: {bbox}")

#   plot(rgb, labels)


# 3. Training the model

Training requires intensive computation, which GPUs provide. Training on a CPU is possible in principle, but it would take a long time, so a GPU is important for this step. Data preparation and evaluation of results can run on a CPU, which is much cheaper. The recommendation is therefore to use a GPU runtime only during training.

In [21]:
import torch

cuda_available = torch.cuda.is_available()

if cuda_available:

  print("CUDA is available")
  !nvidia-smi

else:

  message = """
    WARNING: Training the model on a GPU is strongly advised.
    Change the runtime type to GPU from the menu:
      Runtime -> Change runtime type -> Hardware accelerator -> GPU.
      Then run all the cells again.
  """
  print(message)

CUDA is available
Wed Feb 26 16:59:39 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   39C    P8             11W /   70W |       2MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                              

In [22]:
!pwd

/content/drive/MyDrive/hsicity_proposed_dataet


In [23]:
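def serialized_model_file(
    checkpoint="best",
    use_run="train",
    task="detect",
):
  """
    Returns the path of a checkpoint saved by a training run.

    Ultralytics groups runs by task, so a segmentation model
    writes to runs/segment/... rather than runs/detect/...
  """
  return f"runs/{task}/{use_run}/weights/{checkpoint}.pt"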
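
`serialized_model_file` builds a fixed path, but the training log in this notebook shows the run being saved under `runs/segment/train6`, so the run name may carry a numeric suffix and the task folder may not be `detect`. As a possible complement, the sketch below searches a runs directory for the most recently modified checkpoint. The helper name `find_latest_checkpoint` is hypothetical, and it assumes the `<runs_root>/<task>/<run>/weights/<checkpoint>.pt` layout seen in that log.

```python
from pathlib import Path


def find_latest_checkpoint(runs_root="runs", checkpoint="last"):
    """Return the most recently modified
    <runs_root>/<task>/<run>/weights/<checkpoint>.pt, or None if none exists."""
    candidates = list(Path(runs_root).glob(f"*/*/weights/{checkpoint}.pt"))
    if not candidates:
        return None
    # Pick the checkpoint whose file was written last.
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Example: prefer the newest checkpoint when one exists.
# latest = find_latest_checkpoint("runs", "last")
# print(latest if latest is not None else "no checkpoint found")
```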


In [52]:
import os

import torch  # used below for the CUDA availability check
from ultralytics import YOLO

def train(
    data,
    use_run="train",
    fallback="yolo11n-seg.yaml",
    #fallback="yolo11n-seg.pt",
    epochs=100,
    augment=True,
):

  cuda_available = torch.cuda.is_available()
  if not cuda_available:
    print("CUDA is not available, skipping train.")
    return

  model_file = serialized_model_file("last", use_run)

  if os.path.exists(model_file):
    resume_training = True
    use_model = model_file
  else:
    resume_training = False
    use_model=fallback


  model = YOLO(
      use_model
  )

  model.train(
      data=data,
      resume=resume_training,
      epochs=epochs,
      optimizer="AdamW",
      lr0=0.0001,
      imgsz=320,
      batch=64,
      augment=augment
  )


In [None]:
train(data="/content/drive/MyDrive/hsicity_rgb_dataset/dataset.yaml", use_run=None)

Ultralytics 8.3.80 🚀 Python-3.11.11 torch-2.5.1+cu124 CUDA:0 (Tesla T4, 15095MiB)
[34m[1mengine/trainer: [0mtask=segment, mode=train, model=yolo11n-seg.yaml, data=/content/drive/MyDrive/hsicity_rgb_dataset/dataset.yaml, epochs=100, time=None, patience=100, batch=64, imgsz=320, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train6, exist_ok=False, pretrained=True, optimizer=AdamW, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=True, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=

[34m[1mtrain: [0mScanning /content/drive/MyDrive/hsicity_rgb_dataset/labels/train.cache... 0 images, 225 backgrounds, 51 corrupt: 100%|██████████| 276/276 [00:00<?, ?it/s]

[34m[1malbumentations: [0mBlur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, num_output_channels=3, method='weighted_average'), CLAHE(p=0.01, clip_limit=(1.0, 4.0), tile_grid_size=(8, 8))



[34m[1mval: [0mScanning /content/drive/MyDrive/hsicity_rgb_dataset/labels/val.cache... 276 images, 0 backgrounds, 0 corrupt: 100%|██████████| 276/276 [00:00<?, ?it/s]


Plotting labels to runs/segment/train6/labels.jpg... 
zero-size array to reduction operation maximum which has no identity
[34m[1moptimizer:[0m AdamW(lr=0.0001, momentum=0.937) with parameter groups 90 weight(decay=0.0), 101 weight(decay=0.0005), 100 bias(decay=0.0)
[34m[1mTensorBoard: [0mmodel graph visualization added ✅
Image sizes 320 train, 320 val
Using 2 dataloader workers
Logging results to [1mruns/segment/train6[0m
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


      1/100      2.73G          0          0      123.3          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:04<00:00,  1.62s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


      2/100      2.68G          0          0      122.1          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.08it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.68s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


      3/100      2.68G          0          0        122          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.21it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:10<00:00,  3.54s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


      4/100       2.7G          0          0        106          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.01it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.57s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


      5/100       2.7G          0          0      89.45          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.93it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:11<00:00,  3.84s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


      6/100       2.7G          0          0      58.52          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.85it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:06<00:00,  2.20s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


      7/100       2.7G          0          0      41.33          0          0        320: 100%|██████████| 4/4 [00:03<00:00,  1.21it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.86s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


      8/100       2.7G          0          0      28.84          0          0        320: 100%|██████████| 4/4 [00:03<00:00,  1.27it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.40s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


      9/100       2.7G          0          0      20.01          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.69it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:10<00:00,  3.46s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     10/100       2.7G          0          0      14.03          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.79it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.96s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     11/100       2.7G          0          0      10.02          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.90it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:10<00:00,  3.34s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     12/100       2.7G          0          0      7.339          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.93it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.08s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     13/100       2.7G          0          0      5.492          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.04it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:10<00:00,  3.34s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     14/100       2.7G          0          0      4.228          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.19it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.15s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     15/100       2.7G          0          0      3.377          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.90it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.32s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     16/100       2.7G          0          0      2.753          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.25it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.89s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     17/100       2.7G          0          0      2.259          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.05it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.10s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     18/100       2.7G          0          0      1.862          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.78it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.35s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     19/100       2.7G          0          0      1.557          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.42it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.77s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     20/100       2.7G          0          0      1.357          0          0        320: 100%|██████████| 4/4 [00:03<00:00,  1.32it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:06<00:00,  2.26s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     21/100       2.7G          0          0      1.202          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.50it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:10<00:00,  3.34s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     22/100       2.7G          0          0      1.071          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.69it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.82s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     23/100       2.7G          0          0     0.9563          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.94it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:10<00:00,  3.62s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     24/100       2.7G          0          0     0.8814          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.93it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.93s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     25/100       2.7G          0          0     0.8345          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.04it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.33s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     26/100       2.7G          0          0     0.7911          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.09it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.01s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     27/100       2.7G          0          0     0.7554          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.91it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.31s/it]


                   all        276      18647          0          0          0          0          0          0          0          0

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     28/100       2.7G          0          0     0.7268          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.92it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.73s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     29/100       2.7G          0          0     0.6973          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.95it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.11s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     30/100       2.7G          0          0     0.6638          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.27it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.49s/it]


                   all        276      18647          0          0          0          0          0          0          0          0

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     31/100       2.7G          0          0     0.6227          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.57it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:10<00:00,  3.55s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     32/100       2.7G          0          0     0.5873          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.70it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.41s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     33/100       2.7G          0          0     0.5615          0          0        320: 100%|██████████| 4/4 [00:03<00:00,  1.32it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.97s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     34/100       2.7G          0          0     0.5414          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.43it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.95s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     35/100       2.7G          0          0     0.5242          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.46it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.15s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     36/100       2.7G          0          0       0.51          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.80it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.88s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     37/100       2.7G          0          0     0.4975          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.86it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.33s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     38/100       2.7G          0          0     0.4862          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.91it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.01s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     39/100       2.7G          0          0     0.4737          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.23it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.20s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     40/100       2.7G          0          0      0.464          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.97it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.20s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     41/100       2.7G          0          0     0.4562          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.96it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.03s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     42/100       2.7G          0          0     0.4475          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.60it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.07s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     43/100       2.7G          0          0     0.4387          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.98it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.89s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     44/100       2.7G          0          0       0.43          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.01it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.92s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     45/100       2.7G          0          0     0.4192          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.95it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.53s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     46/100       2.7G          0          0     0.4067          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.47it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.74s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     47/100       2.7G          0          0      0.397          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.34it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.59s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     48/100       2.7G          0          0     0.3857          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.52it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.93s/it]


                   all        276      18647          0          0          0          0          0          0          0          0

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     49/100       2.7G          0          0     0.3773          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.64it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:08<00:00,  2.83s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     50/100       2.7G          0          0     0.3703          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.98it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.26s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     51/100       2.7G          0          0     0.3645          0          0        320: 100%|██████████| 4/4 [00:02<00:00,  1.99it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:09<00:00,  3.13s/it]

                   all        276      18647          0          0          0          0          0          0          0          0






      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size


     52/100       2.7G          0          0     0.3592          0          0        320: 100%|██████████| 4/4 [00:01<00:00,  2.10it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95):   0%|          | 0/3 [00:00<?, ?it/s]

In [30]:
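import locale
import os

# Force UTF-8 as the preferred encoding. The replacement accepts the same
# optional argument as the original locale.getpreferredencoding.
def getpreferredencoding(do_setlocale=True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding

# `!export` only affects a throwaway subshell, so set the variables on the
# notebook process instead; later `!` commands inherit them.
os.environ["LC_ALL"] = "en_US.UTF-8"
os.environ["LANG"] = "en_US.UTF-8"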

In [31]:
!yolo settings

JSONDict("/root/.config/Ultralytics/settings.json"):
{
  "settings_version": "0.0.6",
  "datasets_dir": "/content/datasets",
  "weights_dir": "weights",
  "runs_dir": "runs",
  "uuid": "569f3ba64b326db489132663f79cd37279811de477381b83ac131e6cdd129cbb",
  "sync": true,
  "api_key": "",
  "openai_api_key": "",
  "clearml": true,
  "comet": true,
  "dvc": true,
  "hub": true,
  "mlflow": true,
  "neptune": true,
  "raytune": true,
  "tensorboard": true,
  "wandb": false,
  "vscode_msg": true
}
💡 Learn more about Ultralytics Settings at https://docs.ultralytics.com/quickstart/#ultralytics-settings


# 4. Reviewing Training Results

After training, the resulting files are in the runs directory.

## Checking results in the "runs" directory

The Ultralytics package creates the "runs" directory, where the results are stored, with one subdirectory per task. An object detector writes to detect/train; this run used the segment task (see args.yaml below), so its results are in segment/train.

If we train again with a different configuration, new directories train2, train3, train4… are created, unless we resume training, in which case the existing directory is reused.
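
Because every new run gets the next numbered directory, it can be handy to locate the most recent one in code instead of hard-coding `train`. The following is a minimal sketch: `latest_run_dir` is a hypothetical helper, not part of Ultralytics, and it assumes the `train`, `train2`, `train3`… naming described above.

```python
import re
from pathlib import Path


def latest_run_dir(task_dir):
    """Return the highest-numbered train* directory under task_dir, or None.

    Hypothetical helper: assumes run directories are named train, train2, ...
    """
    best, best_idx = None, -1
    for path in Path(task_dir).glob("train*"):
        match = re.fullmatch(r"train(\d*)", path.name)
        if path.is_dir() and match:
            # A bare "train" counts as run number 1.
            idx = int(match.group(1) or 1)
            if idx > best_idx:
                best, best_idx = path, idx
    return best
```

For this notebook the call would be `latest_run_dir("runs/segment")`.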

In [35]:
!ls -l runs/segment/train

total 6
-rw------- 1 root root 1563 Feb 26 16:54 args.yaml
drwx------ 2 root root 4096 Feb 26 16:54 weights


### Training history

The loss curves show a sharp jump for the worse, after which the model starts improving. One possible explanation: my dataset is a subset of COCO, and the pre-trained weights were created using the same images, so the model already knew how to detect cats and dogs. Perhaps I started with too high a learning rate and destroyed that previously learned knowledge. Should I start with a much lower learning rate? Or should I freeze the first layers to avoid this effect and converge much faster?
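
One way to probe both ideas is to rerun training with a smaller initial learning rate and with the first layers frozen. The sketch below relies on the Ultralytics training arguments `lr0` (initial learning rate) and `freeze` (number of leading layers to freeze). The helper names and the specific values are illustrative assumptions, not tuned settings.

```python
def finetune_overrides(lr0=1e-4, freeze=10):
    """Training overrides for a gentler fine-tuning run (illustrative values)."""
    return {
        "optimizer": "AdamW",  # same optimizer as recorded in args.yaml
        "lr0": lr0,            # lower initial learning rate
        "freeze": freeze,      # freeze the first `freeze` layers
    }


def finetune(weights, data, epochs=100):
    """Start a fine-tuning run; needs the ultralytics package and a dataset."""
    # Imported lazily so finetune_overrides() has no dependencies.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data, epochs=epochs, imgsz=320, **finetune_overrides())
```

With the values from this run, the call would look like `finetune("yolo11n-seg.pt", "/content/datasets/cats_dogs/data.yaml")`. Comparing the resulting loss curves against the original run would show whether the initial jump disappears.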

In [36]:

from PIL import Image

Image.open("runs/segment/train/results.png")

FileNotFoundError: [Errno 2] No such file or directory: 'runs/segment/train/results.png'

### args.yaml

The `args.yaml` file records the arguments of the run, such as the total number of epochs and the enabled augmentations.

Check its contents to see which parameters were used for this experiment.

In [37]:
!cat runs/segment/train/args.yaml

task: segment
mode: train
model: yolo11n-seg.pt
data: /content/datasets/cats_dogs/data.yaml
epochs: 100
time: null
patience: 100
batch: 64
imgsz: 320
save: true
save_period: -1
cache: false
device: null
workers: 8
project: null
name: train
exist_ok: false
pretrained: true
optimizer: AdamW
verbose: true
seed: 0
deterministic: true
single_cls: false
rect: false
cos_lr: false
close_mosaic: 10
resume: false
amp: true
fraction: 1.0
profile: false
freeze: null
multi_scale: false
overlap_mask: true
mask_ratio: 4
dropout: 0.0
val: true
split: val
save_json: false
save_hybrid: false
conf: null
iou: 0.7
max_det: 300
half: false
dnn: false
plots: true
source: null
vid_stride: 1
stream_buffer: false
visualize: false
augment: true
agnostic_nms: false
classes: null
retina_masks: false
embed: null
show: false
save_frames: false
save_txt: false
save_conf: false
save_crop: false
show_labels: true
show_conf: true
show_boxes: true
line_width: null
format: torchscript
keras: false
optimize: false
int8: fa
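
When several runs accumulate, comparing their arguments by eye gets tedious. The following sketch loads two flat `key: value` files like the one above and reports the keys that differ. `parse_flat_args` and `diff_args` are hypothetical helpers, and the parser assumes no nesting or multi-line values; a real YAML parser such as PyYAML's `yaml.safe_load` is the more robust choice.

```python
def parse_flat_args(text):
    """Parse flat `key: value` lines into a dict of strings.

    Assumption: no nesting and no multi-line values.
    """
    args = {}
    for line in text.splitlines():
        # Split at the first colon only, so values such as paths stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            args[key.strip()] = value.strip()
    return args


def diff_args(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

A typical use would read two files with `Path(...).read_text()` and print `diff_args(...)` to see what changed between experiments.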

### Checkpoint files

The weights directory holds the serialized PyTorch checkpoint files:
- best.pt
- last.pt

These files contain the model weights as well as the training state, so training can continue from them. Use last.pt to continue training and best.pt for production.
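
The rule above can be sketched as a small lookup that also fails early when a checkpoint is missing, for example when training stopped before any weights were written. `pick_checkpoint` is a hypothetical helper introduced for illustration.

```python
from pathlib import Path


def pick_checkpoint(weights_dir, purpose):
    """Return the checkpoint for 'resume' (last.pt) or 'deploy' (best.pt).

    Raises FileNotFoundError if the expected file does not exist.
    """
    names = {"resume": "last.pt", "deploy": "best.pt"}
    if purpose not in names:
        raise ValueError(f"purpose must be one of {sorted(names)}")
    path = Path(weights_dir) / names[purpose]
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found")
    return path
```

To continue an interrupted run, the returned path can be loaded with `YOLO(str(path))` and followed by `model.train(resume=True)`.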

In [38]:
!ls -lh runs/segment/train/weights

total 0


### Batch visualizations

Files like `train_batch#.jpg` and `val_batch#_pred.jpg` are grids of batch images with their corresponding bounding boxes:

#### Training Visualization
Here is a training batch visualization.
Note the applied augmentations, such as mosaic, flip, and intensity shifts.

In [39]:

Image.open("runs/segment/train/train_batch0.jpg")


FileNotFoundError: [Errno 2] No such file or directory: 'runs/segment/train/train_batch0.jpg'

#### Validation Visualization
Here is a validation batch visualization. We can see the predicted bounding boxes, each with its class and prediction confidence.

In [40]:
Image.open("runs/segment/train/val_batch0_pred.jpg")

FileNotFoundError: [Errno 2] No such file or directory: 'runs/segment/train/val_batch0_pred.jpg'

### PR_curve.png

The Precision-Recall plot helps us evaluate the object detector by showing the trade-off between precision and recall across different confidence thresholds.

By analyzing the PR curve, we can choose a confidence threshold that matches the goals of the application, such as prioritizing precision (e.g., for safety-critical tasks) or recall (e.g., for exhaustive search tasks).
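
The threshold choice can be made concrete with a small sketch. It assumes each detection has already been marked as a true or false positive (for example by IoU matching against the labels), and it picks the lowest confidence threshold that reaches a target precision. Because recall can only drop as the threshold rises, the lowest qualifying threshold is also the one with the highest recall. The function names are hypothetical.

```python
def precision_recall_at(threshold, scores, is_true_positive, n_ground_truth):
    """Precision and recall when keeping detections with score >= threshold."""
    kept = [tp for s, tp in zip(scores, is_true_positive) if s >= threshold]
    tp = sum(kept)
    # With no detections kept, precision is conventionally reported as 1.0.
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_ground_truth if n_ground_truth else 0.0
    return precision, recall


def lowest_threshold_for_precision(target, scores, is_true_positive, n_ground_truth):
    """Smallest score threshold whose precision reaches `target`, or None."""
    for t in sorted(set(scores)):
        p, _ = precision_recall_at(t, scores, is_true_positive, n_ground_truth)
        if p >= target:
            return t
    return None
```

As a toy example (made-up numbers, not results from this notebook), take scores `[0.9, 0.8, 0.7, 0.6, 0.5]`, flags `[True, True, False, True, False]`, and 4 ground-truth objects. A target precision of 0.9 then selects the threshold 0.8, where recall is 0.5.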


In [None]:
Image.open("runs/segment/train/PR_curve.png")

### confusion_matrix.png

A confusion matrix in object detection is useful for analyzing how well the model distinguishes between object classes (e.g., cat vs. dog) and background, highlighting misclassifications and false positives/negatives at a glance.

By analyzing the confusion matrix, you can pinpoint which classes are most problematic, evaluate the balance between precision and recall, and identify areas where the model might need improvement, such as adjusting thresholds, augmenting the dataset, or fine-tuning the training process.

In this case, a quick glance shows that the model's weakest point is its False Positives, that is, cases where the model claims that there are cats or dogs but, according to the labels, they are not really there.
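
To turn the counts of a confusion matrix into per-class numbers, precision and recall can be derived from its rows and columns. The sketch below assumes `matrix[i][j]` counts detections predicted as class `i` whose true class is `j`, with a final "background" row and column. Check this layout against the plot being read, since the axes of a given tool may be arranged differently.

```python
def per_class_precision_recall(matrix, names):
    """Return {class_name: (precision, recall)} from a confusion matrix.

    Assumed layout: matrix[i][j] = predicted class i, true class j.
    The last row/column is 'background': the row holds missed objects
    (false negatives), the column holds detections with no matching
    label (false positives).
    """
    stats = {}
    for k, name in enumerate(names[:-1]):  # skip the background entry itself
        tp = matrix[k][k]
        predicted = sum(matrix[k])               # everything predicted as k
        actual = sum(row[k] for row in matrix)   # everything labelled k
        stats[name] = (
            tp / predicted if predicted else 0.0,
            tp / actual if actual else 0.0,
        )
    return stats
```

With toy counts `[[8, 1, 1], [2, 6, 2], [0, 3, 0]]` and names `["cat", "dog", "background"]` (made-up values), cats score 0.8 on both metrics and dogs 0.6.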

In [None]:
Image.open("runs/segment/train/confusion_matrix.png")

## Testing the model

In [None]:
!pwd

### Predicting via the CLI


In [None]:
# !yolo detect predict model=runs/detect/train/weights/best.pt source=/content/datasets/cats_dogs/images/val/000000547502.jpg

In [None]:
# from PIL import Image
# Image.open("runs/segment/predict4/000000547502.jpg")

### Predicting using Python

#### Load the best.pt model

In [None]:
from ultralytics import YOLO

# Load the best model we have so far:
model_file = serialized_model_file("best")
print(f"loading checkpoint {model_file}")
model = YOLO(model_file)


#### Plotting the results using matplotlib

Ultralytics provides tools for plotting the results, but in a real-world application we'll want to use the prediction results programmatically.

The following code reads the bounding boxes from the prediction results and plots them using matplotlib.


In [None]:
from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle

def convert_to_pixel_coords(box, img_width, img_height):
    """
    Convert YOLO format box (cx, cy, w, h) to pixel coordinates.
    """
    return [
        box[0] * img_width,  # cx
        box[1] * img_height, # cy
        box[2] * img_width,  # w
        box[3] * img_height  # h
    ]

def calculate_iou(box1, box2):
    """
    Calculate Intersection over Union (IoU) for two bounding boxes.
    Boxes are in the format (cx, cy, w, h).
    """
    # Convert (cx, cy, w, h) to (x1, y1, x2, y2)
    x1_box1, y1_box1 = box1[0] - box1[2] / 2, box1[1] - box1[3] / 2
    x2_box1, y2_box1 = box1[0] + box1[2] / 2, box1[1] + box1[3] / 2

    x1_box2, y1_box2 = box2[0] - box2[2] / 2, box2[1] - box2[3] / 2
    x2_box2, y2_box2 = box2[0] + box2[2] / 2, box2[1] + box2[3] / 2

    # Calculate intersection
    inter_x1 = max(x1_box1, x1_box2)
    inter_y1 = max(y1_box1, y1_box2)
    inter_x2 = min(x2_box1, x2_box2)
    inter_y2 = min(y2_box1, y2_box2)

    inter_area = max(0, inter_x2 - inter_x1) * max(0, inter_y2 - inter_y1)

    # Calculate union
    box1_area = (x2_box1 - x1_box1) * (y2_box1 - y1_box1)
    box2_area = (x2_box2 - x1_box2) * (y2_box2 - y1_box2)

    union_area = box1_area + box2_area - inter_area

    return inter_area / union_area if union_area > 0 else 0

def plot_result(rgb, result, label=None, iou_threshold=0.5):
    """
    Plot YOLO prediction results and compare with ground truth labels if provided.

    Parameters:
      - rgb: numpy array of the RGB image.
      - result: YOLO prediction result.
      - label: Optional ground truth labels [(class_id, [cx, cy, w, h]), ...].
      - iou_threshold: IoU threshold to match predictions with ground truth.
    """
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.imshow(rgb)

    class_names = result.names
    height, width, _ = rgb.shape

    predictions = result.boxes.xywh.cpu().numpy()
    pred_classes = result.boxes.cls.cpu().numpy()

    gt_used = [False] * len(label) if label else []

    for i, pred_box in enumerate(predictions):
        pred_class_id = int(pred_classes[i])
        pred_box = pred_box.tolist()
        matched = False

        if label:
            for j, (gt_class_id, gt_box) in enumerate(label):
                if not gt_used[j] and gt_class_id == pred_class_id:
                    # Convert ground truth box to pixel values
                    gt_box_pixel = convert_to_pixel_coords(gt_box, width, height)
                    iou = calculate_iou(pred_box, gt_box_pixel)
                    if iou >= iou_threshold:
                        matched = True
                        gt_used[j] = True
                        break

        color = 'green' if matched else 'red'
        cx, cy, w, h = pred_box
        hw, hh = w / 2, h / 2

        ax.add_patch(Rectangle(
            (cx - hw, cy - hh), w, h,
            edgecolor=color,
            fill=None,
            linewidth=2
        ))

        label_text = f"{class_names[pred_class_id]} ({iou:.2f})" if matched else class_names[pred_class_id]
        ax.text(
            cx - hw, cy - hh - 5,
            label_text,
            color=color,
            fontsize=10,
            fontweight='bold',
            bbox=dict(facecolor='white', edgecolor=color, alpha=0.7)
        )

    if label:
        for gt_class_id, gt_box in label:
            # Convert ground truth box to pixel values
            gt_box_pixel = convert_to_pixel_coords(gt_box, width, height)
            cx, cy, w, h = gt_box_pixel
            hw, hh = w / 2, h / 2

            ax.add_patch(Rectangle(
                (cx - hw, cy - hh), w, h,
                edgecolor='blue',
                fill=None,
                linestyle='--',
                linewidth=1
            ))

    plt.show()
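
As a quick sanity check of `calculate_iou`, consider two illustrative boxes in (cx, cy, w, h) format, A = (5, 5, 10, 10) and B = (10, 5, 10, 10); the numbers are made up for the example. In corner form they are A = (0, 0, 10, 10) and B = (5, 0, 15, 10), so:

$$
\begin{aligned}
\text{intersection} &= (10 - 5)\,(10 - 0) = 50 \\
\text{union} &= 100 + 100 - 50 = 150 \\
\text{IoU} &= \frac{50}{150} \approx 0.33
\end{aligned}
$$

With the default `iou_threshold=0.5`, this pair would not count as a match, so the prediction would be drawn in red.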


In [None]:
samples = sample_some(
    data_root="/content/datasets/cats_dogs",
    split="val",
    seed=42,
    n=16
)


In [None]:
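import cv2

for image_file, labels_file in zip(*samples):
    # OpenCV loads images as BGR, which is the channel order Ultralytics
    # expects for numpy inputs; the RGB copy is only used for plotting.
    img = cv2.imread(image_file)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    labels = read_labels(labels_file)
    result = model.predict(img, conf=0.5, iou=0.3)[0]

    plot_result(rgb, result, labels)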