# COMP9517 Group Project

### Model 4: YOLOv8-Nano Segmentation

State-of-the-Art Benchmark

To train this state-of-the-art pretrained model as a benchmark on our turtles dataset, we first need to convert the dataset from COCO format into YOLO format. The pretrained YOLO weights were trained on the COCO dataset, so this conversion is well supported by existing tools. Ultralytics (the developers of YOLOv8) provides one such tool, JSON2YOLO, and we use it for this model: https://github.com/ultralytics/JSON2YOLO
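A YOLO segmentation label file has one line per object. Each line holds the class index followed by the polygon's x, y vertices, and every coordinate is normalised to the 0-1 range by the image width and height. The sketch below shows that conversion for a single polygon. `polygon_to_yolo_line` is a hypothetical helper written for illustration; it is not part of JSON2YOLO.

```python
def polygon_to_yolo_line(class_id, points, img_width, img_height):
    """Format one polygon as a YOLO segmentation label line.

    points: list of (x, y) pixel coordinates (hypothetical helper, for illustration).
    """
    if len(points) < 3:
        raise ValueError("a polygon needs at least 3 points")
    coords = []
    for x, y in points:
        # Normalise by image size so the label does not depend on resolution
        coords.append(f"{x / img_width:.6f}")
        coords.append(f"{y / img_height:.6f}")
    return f"{class_id} " + " ".join(coords)


# A triangle in a 200x100 image, class 1
print(polygon_to_yolo_line(1, [(0, 0), (100, 0), (100, 50)], 200, 100))
# 1 0.000000 0.000000 0.500000 0.000000 0.500000 0.500000
```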

First, we convert every annotation into YOLO format and write the results to a single `labels` folder, with one `.txt` file per image. This code is based on Ultralytics' JSON2YOLO conversion. We edited it to load our `updated_annotations.json` and to turn its RLE masks into YOLO polygons.

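Expected output: run the following cell from the project root (the folder containing `turtles-data/`). It creates a new `yolo-dataset/labels/` folder holding one YOLO annotation file per image. The project should then look like this (the YAML files are written separately):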

```
.
├── seaturtle-seg.YAML
└── yolo-dataset
    ├── seaturtle-seg.YAML
    └── labels
```

```python
import os
from pathlib import Path

import cv2
import numpy as np
from pycocotools import mask as maskUtils
from pycocotools.coco import COCO

coco = COCO("./turtles-data/data/updated_annotations.json")
labels_dir = "./yolo-dataset/labels/"
Path(labels_dir).mkdir(parents=True, exist_ok=True)

# Get category IDs and map them to contiguous YOLO class IDs (0, 1, 2, ...)
category_ids = coco.getCatIds()
categories = coco.loadCats(category_ids)
category_mapping = {cat["id"]: idx for idx, cat in enumerate(categories)}

# Iterate over all images in the dataset
for image_id in coco.getImgIds():
    # Load image information
    image_info = coco.loadImgs(image_id)[0]
    img_width, img_height = image_info["width"], image_info["height"]
    image_name = Path(image_info["file_name"]).stem

    # Get all annotations for the current image
    annotations = coco.loadAnns(coco.getAnnIds(imgIds=image_id))

    # YOLO segmentation lines for this image: "<class_id> x1 y1 x2 y2 ..."
    yolo_annotations = []

    for ann in annotations:
        yolo_class_id = category_mapping[ann["category_id"]]
        segmentation = ann.get("segmentation")

        if isinstance(segmentation, dict) and "counts" in segmentation:
            # RLE mask. Uncompressed RLE (counts is a list) must first be converted
            # to compressed RLE; compressed RLE can be decoded directly.
            if isinstance(segmentation["counts"], list):
                rle = maskUtils.frPyObjects(segmentation, img_height, img_width)
            else:
                rle = segmentation
            binary_mask = maskUtils.decode(rle)  # (H, W) uint8 mask of 0s and 1s

            # Trace the outer contour of each connected region of the mask
            contours, _ = cv2.findContours(
                binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
            )
            polygons = [contour.reshape(-1, 2) for contour in contours]
        elif isinstance(segmentation, list) and segmentation:
            # Polygon format: a list of flat [x1, y1, x2, y2, ...] lists
            polygons = [np.array(poly).reshape(-1, 2) for poly in segmentation]
        elif "bbox" in ann:
            # Fallback when there is no segmentation: use the bbox
            # ([x_min, y_min, width, height]) as a 4-corner polygon, so every
            # line keeps the polygon format used for segmentation training
            x_min, y_min, box_w, box_h = ann["bbox"]
            polygons = [
                np.array(
                    [
                        [x_min, y_min],
                        [x_min + box_w, y_min],
                        [x_min + box_w, y_min + box_h],
                        [x_min, y_min + box_h],
                    ]
                )
            ]
        else:
            continue

        for polygon in polygons:
            if len(polygon) < 3:  # need at least 3 points to form a polygon
                continue
            # Normalise pixel coordinates to the 0-1 range
            yolo_format_points = []
            for x, y in polygon:
                yolo_format_points.extend([x / img_width, y / img_height])
            # Class ID first, followed by the normalised points
            yolo_annotations.append(
                f"{yolo_class_id} " + " ".join(map(str, yolo_format_points))
            )

    # Write annotations to the YOLO label file for this image
    label_file_path = os.path.join(labels_dir, f"{image_name}.txt")
    with open(label_file_path, "w") as f:
        f.write("\n".join(yolo_annotations))
```

```
loading annotations into memory...
Done (t=3.77s)
creating index...
index created!
```
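The cell above relies on `pycocotools` to decode RLE masks. To show what an uncompressed COCO RLE actually stores, here is a pure-Python sketch. It is an illustration only, not the `pycocotools` implementation. In this format, `counts` alternates runs of 0s and 1s, always starting with a run of 0s, over the mask flattened in column-major (Fortran) order. `decode_uncompressed_rle` is a hypothetical name.

```python
def decode_uncompressed_rle(counts, height, width):
    """Decode an uncompressed COCO RLE into a list of rows of 0/1 values."""
    flat = []
    value = 0  # COCO RLE always starts with a run of zeros (possibly of length 0)
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not cover the whole mask")
    # Pixels are stored column by column, so pixel (row, col) is flat[col * height + row]
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


# 2x3 mask whose middle column is foreground: 2 zeros, 2 ones, 2 zeros
print(decode_uncompressed_rle([2, 2, 2], 2, 3))
# [[0, 1, 0], [0, 1, 0]]
```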


Next, run the following cell to copy all the images into the `yolo-dataset` folder as well, under a subfolder called `images`.

```python
# Run this to get all images from turtles-data and save them all in a single folder. Later we will partition
# the images into training and testing sets.

import os
import shutil
from tqdm import tqdm

# Source and destination directories
source_dir = "./turtles-data/data/images/"
destination_dir = "./yolo-dataset/images/"

# Create the destination directory if it doesn't exist
os.makedirs(destination_dir, exist_ok=True)

image_extensions = (".jpg", ".jpeg", ".png")
image_files = []

# Traverse through all directories and subdirectories in the source directory
for root, dirs, files in os.walk(source_dir):
    for file in files:
        if file.lower().endswith(image_extensions):
            image_files.append(os.path.join(root, file))

# Progress bar for visualisation instead of a lot of prints
for source_file_path in tqdm(image_files, desc="Copying images . . .", unit="file"):
    destination_file_path = os.path.join(destination_dir, os.path.basename(source_file_path))
    shutil.copy2(source_file_path, destination_file_path)
```

```
Copying images . . .: 100%|██████████| 8709/8709 [01:06<00:00, 131.17file/s]
```
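The copy flattens every subfolder into one directory using only each file's base name. Two images with the same name in different subfolders would therefore silently overwrite each other. Their label files, which are also named by stem, would collide as well, so comparing image and label counts would not reveal it. A direct check is cheap. Below is a minimal sketch; `find_basename_collisions` is a hypothetical helper.

```python
import os
from collections import defaultdict


def find_basename_collisions(paths):
    """Return {base name: [paths...]} for base names shared by more than one file."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[os.path.basename(path)].append(path)
    # Keep only names that occur more than once
    return {name: group for name, group in by_name.items() if len(group) > 1}


# Usage with the list built in the copy cell:
# collisions = find_basename_collisions(image_files)
# print(f"{len(collisions)} colliding file names")
```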


Next, we check that the number of images matches the number of labels (no missing labels or images).

```python
# Check how many images are in folder
print(f"Number of images in folder: {len(os.listdir(destination_dir))}")

# Check how many labels are in the labels folder
print(f"Number of labels in folder: {len(os.listdir('./yolo-dataset/labels/'))}")
```

```
Number of images in folder: 8709
Number of labels in folder: 8709
```
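Equal counts do not guarantee that every image has *its own* label. A slightly stronger check compares file names with their extensions removed (stems). Below is a sketch; `unmatched_stems` is a hypothetical helper.

```python
import os


def unmatched_stems(image_names, label_names):
    """Return (images without a label, labels without an image), matched by stem."""
    image_stems = {os.path.splitext(n)[0] for n in image_names}
    label_stems = {os.path.splitext(n)[0] for n in label_names if n.endswith(".txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Usage against the folders above:
# missing_labels, missing_images = unmatched_stems(
#     os.listdir("./yolo-dataset/images/"), os.listdir("./yolo-dataset/labels/"))
```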


Unfortunately, our devices do not have the computing power to train on all of these images. To keep training and validation manageable, we reduce the dataset from 8709 to 1000 image-label pairs, which deletes about 89% of it. The 1000 remaining pairs are later split into 800 for training and 200 for validation. The following script randomly selects the 1000 pairs to keep and deletes the rest, always removing an image together with its label.

```python
# Randomly keep 1000 image-label pairs and delete the other 7709
import os
import random

from tqdm import tqdm

image_dir = "./yolo-dataset/images/"
label_dir = "./yolo-dataset/labels/"

image_files = {os.path.splitext(img)[0]: img for img in os.listdir(image_dir) if img.lower().endswith((".jpg", ".jpeg", ".png"))}
label_files = {os.path.splitext(txt)[0]: txt for txt in os.listdir(label_dir) if txt.endswith(".txt")}
# Match pairs by stem. Sorting makes the order (and so the sample) independent
# of Python's per-process string hashing, which changes set iteration order.
matching_files = sorted(image_files.keys() & label_files.keys())

# print(len(matching_files)) # 8709

# Randomly select 1000 pairs to keep (fixed seed for reproducibility; the value is arbitrary)
random.seed(9517)
keep_pairs = set(random.sample(matching_files, 1000))

# Determine which pairs to delete
delete_pairs = set(matching_files) - keep_pairs

# Delete the 7709 unselected pairs
for filename in tqdm(delete_pairs, desc="Deleting unselected pairs", unit="pair"):
    os.remove(os.path.join(image_dir, image_files[filename]))
    os.remove(os.path.join(label_dir, label_files[filename]))
```

```
Deleting unselected pairs: 100%|██████████| 7709/7709 [00:05<00:00, 1304.92pair/s]
```


```python
# Check how many images are in folder
print(f"Number of images in folder: {len(os.listdir(destination_dir))}")

# Check how many labels are in the labels folder
print(f"Number of labels in folder: {len(os.listdir('./yolo-dataset/labels/'))}")
```

```
Number of images in folder: 1000
Number of labels in folder: 1000
```


Next, we split the dataset into training and test sets in an 80-20 ratio. The test set is also used for validation during training. The resulting file structure looks like this:

```
.
├── seaturtle-seg.YAML
├── images
│   ├── train
│   └── test
└── labels
    ├── train               
    └── test
```

```python
import os
import random
import shutil

# Paths for images and labels
images_dir = "./yolo-dataset/images/"
labels_dir = "./yolo-dataset/labels/"

# Paths for train/test split
train_images_dir = "./yolo-dataset/images/train"
train_labels_dir = "./yolo-dataset/labels/train"
test_images_dir = "./yolo-dataset/images/test"
test_labels_dir = "./yolo-dataset/labels/test"

# Create train/test directories if they don't exist
os.makedirs(train_images_dir, exist_ok=True)
os.makedirs(train_labels_dir, exist_ok=True)
os.makedirs(test_images_dir, exist_ok=True)
os.makedirs(test_labels_dir, exist_ok=True)

# Filter only image files (adjust extensions if needed)
image_extensions = (".jpg", ".jpeg", ".png")
image_files = sorted(f for f in os.listdir(images_dir) if f.lower().endswith(image_extensions))

# Shuffle, then split 80-20
random.shuffle(image_files)
split_index = int(0.8 * len(image_files))
train_files = image_files[:split_index]
test_files = image_files[split_index:]


# Move each image and its matching label file into the destination folders
def move_files(
    file_list, src_images_dir, src_labels_dir, dest_images_dir, dest_labels_dir
):
    for image_file in file_list:
        label_file = (
            os.path.splitext(image_file)[0] + ".txt"
        )  # Get corresponding label filename

        # Check if both image and label exist
        src_image_path = os.path.join(src_images_dir, image_file)
        src_label_path = os.path.join(src_labels_dir, label_file)

        if os.path.exists(src_image_path) and os.path.exists(src_label_path):
            # Move image and label to destination
            shutil.move(src_image_path, os.path.join(dest_images_dir, image_file))
            shutil.move(src_label_path, os.path.join(dest_labels_dir, label_file))


# Move train and test files
move_files(train_files, images_dir, labels_dir, train_images_dir, train_labels_dir)
move_files(test_files, images_dir, labels_dir, test_images_dir, test_labels_dir)

print("[COMPLETE]: Dataset split into train and test sets with an 80-20 ratio.")

# Check how many in each folder
print(f"Number of images in train folder: {len(os.listdir(train_images_dir))}")
print(f"Number of labels in train folder: {len(os.listdir(train_labels_dir))}")
print(f"Number of images in test folder: {len(os.listdir(test_images_dir))}")
print(f"Number of labels in test folder: {len(os.listdir(test_labels_dir))}")
```

```
[COMPLETE]: Dataset split into train and test sets with an 80-20 ratio.
Number of images in train folder: 800
Number of labels in train folder: 800
Number of images in test folder: 200
Number of labels in test folder: 200
```
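A random 80-20 split of images does not guarantee the same class balance in both sets. One way to check is to count object instances per class ID in each labels folder, because the first token of every YOLO label line is the class index. Below is a minimal sketch; `count_instances` is a hypothetical helper.

```python
import os
from collections import Counter


def count_instances(labels_dir):
    """Count object instances per YOLO class ID across all .txt files in a folder."""
    counts = Counter()
    for name in os.listdir(labels_dir):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(labels_dir, name)) as f:
            for line in f:
                parts = line.split()
                if parts:  # skip blank lines
                    counts[int(parts[0])] += 1
    return counts


# Usage:
# print("train:", count_instances("./yolo-dataset/labels/train"))
# print("test:", count_instances("./yolo-dataset/labels/test"))
```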


```python
# images/ now contains only the train and test subfolders (2 entries)
print(f"Number of images in folder: {len(os.listdir(destination_dir))}")

# labels/ likewise now contains only its train and test subfolders
print(f"Number of labels in folder: {len(os.listdir('./yolo-dataset/labels/'))}")
```

```
Number of images in folder: 2
Number of labels in folder: 2
```
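The training cell reads `yolo-dataset/seaturtle-seg.yaml`, but the notebook does not show its contents. The sketch below generates such a file. It assumes the standard Ultralytics dataset YAML keys (`path`, `train`, `val`, `names`) and the train/test folders created above. `build_dataset_yaml` is a hypothetical helper. The class names must follow the same order as `category_mapping` in the conversion cell, which is the order returned by `coco.loadCats(coco.getCatIds())`.

```python
from pathlib import Path


def build_dataset_yaml(dataset_root, class_names):
    """Return Ultralytics-style dataset YAML text for an images/{train,test} layout."""
    lines = [
        f"path: {Path(dataset_root).resolve().as_posix()}",  # absolute dataset root
        "train: images/train",
        "val: images/test",  # the test split doubles as the validation set
        "names:",
    ]
    # YOLO class IDs are the 0-based positions used when the labels were written
    lines += [f"  {idx}: {name}" for idx, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Usage (class names in the same order as category_mapping):
# names = [cat["name"] for cat in coco.loadCats(coco.getCatIds())]
# Path("yolo-dataset/seaturtle-seg.yaml").write_text(build_dataset_yaml("yolo-dataset", names))
```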


```python
import torch
from ultralytics import YOLO

# Optional: check for a CUDA GPU and select it (this run fell back to the CPU)
# print(torch.cuda.is_available())
# torch.cuda.set_device(0)

# Load the pretrained YOLOv8n segmentation model
model = YOLO("yolov8n-seg.pt")
print(model.device)

# Train the model on the 800/200 turtle split at a 128-pixel input size
results = model.train(
    data="yolo-dataset/seaturtle-seg.yaml",
    epochs=50,
    imgsz=128,
)

# Validate the model
metrics = model.val()
```

```
cpu
Ultralytics 8.3.28  Python-3.11.9 torch-2.0.1+cpu CPU (Intel Core(TM) i7-8700 3.20GHz)
engine\trainer: task=segment, mode=train, model=yolov8n-seg.pt, data=yolo-dataset/seaturtle-seg.yaml, epochs=50, time=None, patience=100, batch=16, imgsz=128, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train3, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels

train: Scanning D:\Dropbox\UNSW - MIT\COMP9517 - Computer Vision\Group Project\SeaTurtleID2022\yolo-dataset\labels\train.cache... 800 images, 0 backgrounds, 0 corrupt: 100%|██████████| 800/800 [00:00<?, ?it/s]
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, num_output_channels=3, method='weighted_average'), CLAHE(p=0.01, clip_limit=(1.0, 4.0), tile_grid_size=(8, 8))
val: Scanning D:\Dropbox\UNSW - MIT\COMP9517 - Computer Vision\Group Project\SeaTurtleID2022\yolo-dataset\labels\test.cache... 200 images, 0 backgrounds, 0 corrupt: 100%|██████████| 200/200 [00:00<?, ?it/s]
Plotting labels to runs\segment\train3\labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.001429, momentum=0.9) with parameter groups 66 weight(decay=0.0), 77 weight(decay=0.0005), 76 bias(decay=0.0)
TensorBoard: model graph visualization added
Image sizes 128 train, 128 val
Using 0 dataloader workers
Logging results to runs\segment\train3
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size
       1/50         0G      1.633      3.284       2.83      1.157        109        128: 100%|██████████| 50/50 [00:45<00:00,  1.10it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:07<00:00,  1.05s/it]
                   all        200        902      0.896      0.271      0.353      0.181       0.73      0.146       0.15     0.0515

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size
       2/50         0G       1.55      2.574      1.406      1.067        151        128: 100%|██████████| 50/50 [00:46<00:00,  1.08it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:07<00:00,  1.08s/it]
                   all        200        902      0.691      0.466      0.527      0.306      0.568      0.371      0.374      0.144

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size
       3/50         0G      1.434      2.377       1.11      1.045        143        128: 100%|██████████| 50/50 [00:42<00:00,  1.17it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:07<00:00,  1.02s/it]
                   all        200        902      0.711      0.475      0.541      0.312      0.602      0.403      0.413      0.172

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size
       4/50         0G      1.393      2.288      1.014      1.032        129        128: 100%|██████████| 50/50 [00:41<00:00,  1.20it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:06<00:00,  1.00it/s]
                   all        200        902      0.705      0.552      0.573      0.333      0.596      0.445      0.438      0.198

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size
       5/50         0G      1.391      2.216     0.9924      1.032        113        128: 100%|██████████| 50/50 [00:40<00:00,  1.22it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:07<00:00,  1.02s/it]
                   all        200        902      0.721      0.565      0.602      0.333      0.692      0.469      0.513       0.23

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size
       6/50         0G      1.328      2.148     0.9375       1.02        133        128: 100%|██████████| 50/50 [00:41<00:00,  1.20it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:07<00:00,  1.02s/it]
                   all        200        902      0.813       0.56      0.629      0.376      0.725      0.497       0.54      0.244

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size
       7/50         0G      1.271      2.013     0.8785     0.9974        110        128: 100%|██████████| 50/50 [00:41<00:00,  1.21it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:06<00:00,  1.02it/s]
                   all        200        902       0.79      0.578      0.631      0.372      0.625      0.429      0.429      0.165

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size
       8/50         0G      1.242      2.009     0.8607     0.9877        117        128: 100%|██████████| 50/50 [00:40<00:00,  1.24it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 7/7 [00:07<00:00,  1.04s/it]
                   all        200        902        0.8      0.607      0.653      0.405      0.749      0.533      0.566      0.269

      Epoch    GPU_mem   box_loss   seg_loss   cls_loss   dfl_loss  Instances       Size
       9/50         0G      1.251      1.981     0.8467     0.9952        139        128:  72%|███████▏  | 36/50 [00:30<00:12,  1.17it/s]

KeyboardInterrupt:
```
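Training was interrupted during epoch 9, so `model.val()` never ran. Among the completed epochs in the log above, the best mask mAP50-95 is 0.269, reached at epoch 8. Ultralytics also records per-epoch metrics in a `results.csv` file in the run directory (`runs\segment\train3` here). The sketch below picks the best epoch from that file. It assumes the mask metric column is named `metrics/mAP50-95(M)`; header formatting has varied between Ultralytics versions, so the code strips padding, and you should check the header of your own file. `best_epoch` is a hypothetical helper.

```python
import csv
import io


def best_epoch(csv_text, metric="metrics/mAP50-95(M)"):
    """Return (epoch, value) with the highest value of `metric` in results.csv text.

    The default column name is an assumption; check your own file's header.
    """
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip padding that some versions add around headers and values
        row = {k.strip(): v.strip() for k, v in row.items()}
        value = float(row[metric])
        if best is None or value > best[1]:
            best = (int(float(row["epoch"])), value)
    return best


# Usage:
# with open("runs/segment/train3/results.csv") as f:
#     print(best_epoch(f.read()))
```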

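Predictions: we load trained weights (from an earlier training run saved under `runs/segment/train`) and save the predicted masks for one test image.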

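```python
import cv2
import numpy as np
from ultralytics import YOLO

img_path = "yolo-dataset/images/test/UVGxbpffXc.jpg"
model_path = "runs/segment/train/weights/best.pt"
model = YOLO(model_path)
results = model(img_path)

img = cv2.imread(img_path)
height, width, _ = img.shape

for result in results:
    if result.masks is None:  # no objects detected in this image
        continue
    # Combine all predicted masks into one image. masks.data is at the inference
    # resolution (96x128 here), so each mask is resized to the original image size.
    combined = np.zeros((height, width), dtype=np.uint8)
    for mask in result.masks.data:
        mask = mask.cpu().numpy().astype(np.uint8) * 255
        mask = cv2.resize(mask, (width, height), interpolation=cv2.INTER_NEAREST)
        combined = np.maximum(combined, mask)
    cv2.imwrite("sample_test_yolov8n_seg.jpg", combined)
```

```
image 1/1 d:\Dropbox\UNSW - MIT\COMP9517 - Computer Vision\Group Project\SeaTurtleID2022\yolo-dataset\images\test\UVGxbpffXc.jpg: 96x128 1 turtle, 2 flippers, 1 head, 15.0ms
Speed: 1.0ms preprocess, 15.0ms inference, 1.0ms postprocess per image at shape (1, 3, 96, 128)
```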
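The prediction ran at shape (1, 3, 96, 128), even though the model was trained with `imgsz=128`. This is because Ultralytics letterboxes images for inference. As we understand it, the longer side is scaled to `imgsz`, the aspect ratio is kept, and the shorter side is padded only as far as needed to reach a multiple of the model stride. The arithmetic below is a simplified sketch of that rule, assuming a stride of 32. `letterbox_shape` is our own illustration, not an Ultralytics API.

```python
def letterbox_shape(height, width, imgsz=128, stride=32):
    """Return the (height, width) an image is letterboxed to for inference.

    Simplified sketch: scale so the longer side equals imgsz, keep the aspect
    ratio, then pad each side by (imgsz - size) % stride pixels. When imgsz is
    a multiple of stride, this rounds each side up to a multiple of stride.
    """
    r = min(imgsz / height, imgsz / width)
    new_h, new_w = round(height * r), round(width * r)
    pad_h = (imgsz - new_h) % stride
    pad_w = (imgsz - new_w) % stride
    return new_h + pad_h, new_w + pad_w
```

For example, a 4:3 landscape photo such as 2000x1500 (width x height) maps to `letterbox_shape(1500, 2000) == (96, 128)`, which is consistent with the logged inference shape.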
