# Installations

To use the YOLO model, we first need to install the Ultralytics package. We also install `opencv-python`, which the optional image registration step below relies on.

In [2]:
%pip install ultralytics
%pip install opencv-python

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


# Image Registration (optional)

In [None]:
import cv2
import shutil
from pathlib import Path

# Paths
root_dir = Path("/Users/ikmalbasirun/Documents/GitHub/G5_image_classification_project/WiSARD_dataset/registered")
dataset_splits = ["train", "val", "test"]

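# Align thermal to RGB. Note: this only resizes the thermal image to the RGB
# resolution; it does not estimate a geometric transform between the cameras.
def register_thermal_to_rgb(rgb_path, thermal_path, output_path):
    rgb_img = cv2.imread(str(rgb_path), cv2.IMREAD_COLOR)
    thermal_img = cv2.imread(str(thermal_path), cv2.IMREAD_GRAYSCALE)

    # cv2.imread returns None (no exception) when a file cannot be read
    if rgb_img is None or thermal_img is None:
        print(f"Skipping unreadable pair: {rgb_path}")
        return

    # Resize thermal to the RGB size; cv2.resize expects (width, height)
    height, width = rgb_img.shape[:2]
    thermal_resized = cv2.resize(thermal_img, (width, height))

    # Save the resized thermal image
    output_path.parent.mkdir(parents=True, exist_ok=True)
    cv2.imwrite(str(output_path), thermal_resized)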

# Resize thermal images and copy labels for each split
for split in dataset_splits:
    rgb_dir = root_dir / split / "rgb"
    thermal_dir = root_dir / split / "thermal"
    labels_dir = root_dir / split / "labels"

    # Create output labels dir
    labels_dir.mkdir(parents=True, exist_ok=True)

    for rgb_img_path in rgb_dir.glob("*.jpg"):
        base_name = rgb_img_path.stem

        # 1. Resize thermal to match RGB (overwrites the thermal file in place)
        thermal_img_path = thermal_dir / f"{base_name}.jpg"
        if thermal_img_path.exists():
            register_thermal_to_rgb(rgb_img_path, thermal_img_path, thermal_img_path)

        # 2. Copy label file from source labels folder
        source_label = Path(f"data/{split}/labels/{base_name}.txt")  # adjust if your original labels are elsewhere
        dest_label = labels_dir / f"{base_name}.txt"
        if source_label.exists():
            shutil.copy(source_label, dest_label)

print("Thermal registration and label copying complete.")

Thermal registration and label copying complete.
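
The loop above silently skips any RGB image that has no thermal file of the same name. As a minimal sketch, assuming the same `rgb` / `thermal` folder layout, the helper below lists those unpaired images so they can be checked by hand. The function name `find_unpaired` and the example path are hypothetical, not part of the original notebook.

```python
from pathlib import Path


def find_unpaired(split_dir, suffix=".jpg"):
    """Return sorted stems of RGB images with no thermal image of the same name."""
    split_dir = Path(split_dir)
    rgb_stems = {p.stem for p in (split_dir / "rgb").glob(f"*{suffix}")}
    thermal_stems = {p.stem for p in (split_dir / "thermal").glob(f"*{suffix}")}
    return sorted(rgb_stems - thermal_stems)


# Example usage (the path is a placeholder, adjust to your own dataset root):
# for split in ["train", "val", "test"]:
#     print(split, find_unpaired(Path("WiSARD_dataset/registered") / split))
```

An empty list for every split means each RGB image has a thermal counterpart.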


# Model Training

With Ultralytics installed, we can now import the `YOLO` class.

In [2]:
from ultralytics import YOLO

YOLO comes in several versions and sizes, so we need to specify which one to load. Here we use a **pre-trained** YOLO11 nano model (`yolo11n.pt`).

In [None]:
# Load a model
model = YOLO("yolo11n.pt")  # load a pretrained YOLO11 nano model

We train the model from a dataset YAML file, which tells Ultralytics where the images are stored and what the classes are called. We would have preferred a CSV file, but the YOLO trainer does not accept that format. YAML files are unfortunately more complicated, so we will discuss this further during the meeting.
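
To make the YAML format less abstract, here is a minimal sketch that builds the text of such a file. The keys (`path`, `train`, `val`, `test`, `names`) follow the Ultralytics detection dataset layout; the root folder, the `test/images` subfolder, and the single class name `human` are assumptions for illustration and may not match the actual `wisard_dataset.yaml`.

```python
# Assumed values: adjust to your own layout and class names.
DATASET_ROOT = "WiSARD_dataset"  # placeholder; the notebook uses an absolute path
CLASS_NAMES = ["human"]          # assumption: a single class with index 0


def build_dataset_yaml(root, class_names):
    """Return the text of an Ultralytics-style detection dataset YAML."""
    lines = [
        f"path: {root}",        # dataset root directory
        "train: train/images",  # training images, relative to path
        "val: val/images",      # validation images, relative to path
        "test: test/images",    # test images (assumed location)
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


yaml_text = build_dataset_yaml(DATASET_ROOT, CLASS_NAMES)
print(yaml_text)
# To save it: open("wisard_dataset.yaml", "w").write(yaml_text)
```

Ultralytics looks for the label files in a `labels` folder next to each `images` folder, which is why only the image paths are listed.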

In [12]:
# Train the model
results = model.train(data="wisard_dataset.yaml", epochs=3)

Ultralytics 8.3.91 🚀 Python-3.9.6 torch-2.6.0 CPU (Apple M4 Pro)
[34m[1mengine/trainer: [0mtask=detect, mode=train, model=yolo11n.pt, data=wisard_dataset.yaml, epochs=3, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train9, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line

[34m[1mtrain: [0mScanning /Users/ikmalbasirun/Documents/GitHub/G5_image_classification_project/WiSARD_dataset/train/labels.cache... 185 images, 0 backgrounds, 0 corrupt: 100%|██████████| 185/185 [00:00<?, ?it/s]
[34m[1mval: [0mScanning /Users/ikmalbasirun/Documents/GitHub/G5_image_classification_project/WiSARD_dataset/val/labels.cache... 79 images, 3 backgrounds, 0 corrupt: 100%|██████████| 79/79 [00:00<?, ?it/s]

Plotting labels to runs/detect/train9/labels.jpg... 

optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.002, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs/detect/train9
Starting training for 3 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        1/3         0G      2.865      3.268      1.002         57        640: 100%|██████████| 12/12 [01:03<00:00,  5.31s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.65s/it]

                   all         79        301    0.00169      0.133    0.00122   0.000287

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size



        2/3         0G      2.654      2.853     0.9582         39        640: 100%|██████████| 12/12 [00:55<00:00,  4.61s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.52s/it]

                   all         79        301    0.00321      0.252     0.0036   0.000723

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size



        3/3         0G      2.549      2.518     0.9465         69        640: 100%|██████████| 12/12 [00:52<00:00,  4.37s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.51s/it]

                   all         79        301    0.00494      0.389     0.0114    0.00343

3 epochs completed in 0.054 hours.





Optimizer stripped from runs/detect/train9/weights/last.pt, 5.4MB
Optimizer stripped from runs/detect/train9/weights/best.pt, 5.4MB

Validating runs/detect/train9/weights/best.pt...
Ultralytics 8.3.91 🚀 Python-3.9.6 torch-2.6.0 CPU (Apple M4 Pro)
YOLO11n summary (fused): 100 layers, 2,582,347 parameters, 0 gradients, 6.3 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.45s/it]


                   all         79        301    0.00489      0.385     0.0113    0.00342
Speed: 0.2ms preprocess, 60.5ms inference, 0.0ms loss, 0.8ms postprocess per image
Results saved to runs/detect/train9
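
The progress bar counts in this log (12/12 for training, 3/3 for validation) follow from the dataset sizes and the batch size. The short calculation below reproduces them. The training figure uses `batch=16` from the log; the validation figures rest on our assumption that validation during training uses twice the training batch size, while the standalone `model.val()` call later uses a batch of 16, which would explain its 5/5 progress bar.

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to cover all images once."""
    return math.ceil(num_images / batch_size)


print(batches_per_epoch(185, 16))  # training: 185 images, batch=16 -> 12
print(batches_per_epoch(79, 32))   # validation during training (assumed batch 32) -> 3
print(batches_per_epoch(79, 16))   # standalone model.val() (assumed batch 16) -> 5
```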


The `model.val()` method evaluates the trained model on the validation split and returns its performance metrics.

In [13]:
model.val()

Ultralytics 8.3.91 🚀 Python-3.9.6 torch-2.6.0 CPU (Apple M4 Pro)
YOLO11n summary (fused): 100 layers, 2,582,347 parameters, 0 gradients, 6.3 GFLOPs


[34m[1mval: [0mScanning /Users/ikmalbasirun/Documents/GitHub/G5_image_classification_project/WiSARD_dataset/val/labels.cache... 79 images, 3 backgrounds, 0 corrupt: 100%|██████████| 79/79 [00:00<?, ?it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:08<00:00,  1.64s/it]


                   all         79        301    0.00489      0.385     0.0113    0.00342
Speed: 0.2ms preprocess, 71.7ms inference, 0.0ms loss, 0.8ms postprocess per image
Results saved to runs/detect/train92


ultralytics.utils.metrics.DetMetrics object with attributes:

ap_class_index: array([0])
box: ultralytics.utils.metrics.Metric object
confusion_matrix: <ultralytics.utils.metrics.ConfusionMatrix object at 0x174de96a0>
curves: ['Precision-Recall(B)', 'F1-Confidence(B)', 'Precision-Confidence(B)', 'Recall-Confidence(B)']
curves_results: [[array([          0,    0.001001,    0.002002,    0.003003,    0.004004,    0.005005,    0.006006,    0.007007,    0.008008,    0.009009,     0.01001,    0.011011,    0.012012,    0.013013,    0.014014,    0.015015,    0.016016,    0.017017,    0.018018,    0.019019,     0.02002,    0.021021,    0.022022,    0.023023,
          0.024024,    0.025025,    0.026026,    0.027027,    0.028028,    0.029029,     0.03003,    0.031031,    0.032032,    0.033033,    0.034034,    0.035035,    0.036036,    0.037037,    0.038038,    0.039039,     0.04004,    0.041041,    0.042042,    0.043043,    0.044044,    0.045045,    0.046046,    0.047047,
          0.048048,    
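
The raw object dump above is hard to read. As a rough sketch, the headline numbers can be pulled out of the returned metrics object through its `box` attribute, which exposes mean precision (`mp`), mean recall (`mr`), `map50` and `map`. The helper name `summarize_box_metrics` is hypothetical, and the stand-in object only mimics those four attributes using the values printed in the validation output, so that the example runs without a trained model.

```python
from types import SimpleNamespace


def summarize_box_metrics(metrics):
    """Collect the headline box metrics from a DetMetrics-like object."""
    box = metrics.box
    return {
        "precision": float(box.mp),
        "recall": float(box.mr),
        "mAP50": float(box.map50),
        "mAP50-95": float(box.map),
    }


# Stand-in object using the values printed above, so the sketch runs on its own.
# In the notebook you would pass the object returned by model.val() instead.
demo = SimpleNamespace(
    box=SimpleNamespace(mp=0.00489, mr=0.385, map50=0.0113, map=0.00342)
)
print(summarize_box_metrics(demo))
```

In the notebook this would be used as `metrics = model.val()` followed by `summarize_box_metrics(metrics)`.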

# Predicting Humans in an Image

We now run the trained model on a single image from the validation set.

In [19]:
from PIL import Image

im1 = Image.open("/Users/ikmalbasirun/Documents/GitHub/G5_image_classification_project/WiSARD_dataset/val/images/210417_MtErie_Enterprise_VIS_0003_00000193.jpeg")
results = model.predict(source=[im1])


0: 384x640 (no detections), 21.8ms
Speed: 2.1ms preprocess, 21.8ms inference, 0.2ms postprocess per image at shape (1, 3, 384, 640)


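Here, `(no detections)` indicates that the model did not detect any humans in the image. Given how low the validation scores are after only 3 epochs (mAP50 of 0.0113), this is not surprising.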
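
To check detections programmatically rather than reading the log, one option is to count the boxes in each result. The sketch below assumes each result exposes a `boxes` attribute that supports `len()`, as Ultralytics results do; the helper name `detection_counts` is hypothetical, and the stand-in result only exists so the example runs without a model. The commented lines show how one might retry with a lower confidence threshold through the `conf` argument of `predict` (to our knowledge the default is 0.25); the value 0.05 is an arbitrary choice for illustration.

```python
from types import SimpleNamespace


def detection_counts(results):
    """Return the number of predicted boxes for each result in the list."""
    return [len(r.boxes) for r in results]


# Stand-in for a prediction result with no boxes, so the sketch runs on its own.
demo_results = [SimpleNamespace(boxes=[])]
print(detection_counts(demo_results))

# In the notebook, one might retry with a lower confidence threshold:
# results = model.predict(source=[im1], conf=0.05)
# print(detection_counts(results))
```

A lower threshold will surface weaker predictions, but with a model this undertrained many of them are likely to be false positives.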