# Can YOLOv8 detect types of bears with a small custom dataset?
First, build YOLOv8 nano from scratch (no pretrained weights), train it on the bears for 10 epochs, and see what happens.
The dataset has only 38 images, so it truly is very small. I labeled it myself.
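With hand-made labels it can be worth sanity-checking the files before training. The sketch below is a minimal example, assuming the labels are standard YOLO-format `.txt` files in one folder (one `class_id x_center y_center width height` line per box, coordinates normalized to [0, 1]); `count_yolo_labels` is a hypothetical helper, not part of Ultralytics. It counts boxes and images per class id and flags malformed lines.

```python
from collections import Counter
from pathlib import Path


def count_yolo_labels(label_dir):
    """Count boxes and images per class id in a folder of YOLO .txt label files.

    Each non-empty line is expected to be:
        <class_id> <x_center> <y_center> <width> <height>
    with the four coordinates normalized to [0, 1].
    Returns (boxes_per_class, images_per_class, problems).
    """
    boxes, images = Counter(), Counter()
    problems = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        seen = set()  # classes present in this image
        for line_no, line in enumerate(path.read_text().splitlines(), start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if len(parts) != 5:
                problems.append(f"{path.name}:{line_no}: expected 5 fields, got {len(parts)}")
                continue
            cls = int(parts[0])
            coords = [float(v) for v in parts[1:]]
            if not all(0.0 <= v <= 1.0 for v in coords):
                problems.append(f"{path.name}:{line_no}: coordinates outside [0, 1]")
            boxes[cls] += 1
            seen.add(cls)
        images.update(seen)  # each class counted at most once per image
    return boxes, images, problems
```

Called as, for example, `count_yolo_labels("bears/labels")` (the folder path is an assumption), the per-class box counts should add up to the instance count Ultralytics reports during validation, and the per-class image counts show how balanced the bear types are.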

In [2]:
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.yaml")  # build a new model from scratch

# Use the model
model.train(data="config.yaml", epochs=10)  # train the model



                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128

### prediction and actual labels (10 epochs)
<img src="./runs/detect/train3/val_batch0_pred.jpg" alt="prediction" width="500">

<img src="./runs/detect/train3/val_batch0_labels.jpg" alt="labels" width="500">

It didn't predict anything at all. Maybe we need more training. Let's try ONE HUNDRED EPOCHS.

In [3]:
model.train(data="config.yaml", epochs=100)  # train the model

Ultralytics YOLOv8.0.157  Python-3.11.3 torch-2.0.1+cpu CPU (Intel Core(TM) i7-6820HQ 2.70GHz)
[34m[1mengine\trainer: [0mtask=detect, mode=train, model=yolov8n.yaml, data=config.yaml, epochs=100, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=0, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, in

### prediction and actual labels (100 epochs)
<img src="./runs/detect/train4/val_batch0_pred.jpg" alt="prediction" width="500">

<img src="./runs/detect/train4/val_batch0_labels.jpg" alt="labels" width="500">
<br>
<img src="./runs/detect/train4/val_batch1_pred.jpg" alt="prediction" width="500">

<img src="./runs/detect/train4/val_batch1_labels.jpg" alt="labels" width="500">


It actually predicts some bears now, but not very well. Adding more data would probably help a lot, but I don't want to do any more labelling.


Let's try a pretrained model and see how well it performs, first without fine-tuning.

In [7]:
model = YOLO("yolov8n.pt")  # load a pretrained model
model.val(data="config.yaml") # try out the pretrained model

Ultralytics YOLOv8.0.157  Python-3.11.3 torch-2.0.1+cpu CPU (Intel Core(TM) i7-6820HQ 2.70GHz)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients
[34m[1mval: [0mScanning D:\proj\yolo_intro\bears\labels.cache... 38 images, 0 backgrounds, 0 corrupt: 100%|██████████| 38/38 [00:00<?, ?it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:07<00:00,  2.44s/it]
                   all         38         43          0          0          0          0
Speed: 1.3ms preprocess, 124.4ms inference, 0.0ms loss, 2.1ms postprocess per image
Results saved to [1mruns\detect\val2[0m


ultralytics.utils.metrics.DetMetrics object with attributes:

ap_class_index: []
box: ultralytics.utils.metrics.Metric object
confusion_matrix: <ultralytics.utils.metrics.ConfusionMatrix object at 0x0000028282939DD0>
fitness: 0.0
keys: ['metrics/precision(B)', 'metrics/recall(B)', 'metrics/mAP50(B)', 'metrics/mAP50-95(B)']
maps: array([], dtype=float64)
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43

### prediction (pretrained model)
<img src="./runs/detect/val2/val_batch0_pred.jpg" alt="prediction" width="500">

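It already detects bears pretty well, with a couple of misclassifications. The validation metrics above are still all zero, most likely because the pretrained model predicts COCO class indices (bear is class 21 in the `names` list above), which don't line up with the class indices of the custom bear-type dataset, so none of its boxes count as correct matches. Now let's see how well it can detect bear types after training on the custom data.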

In [11]:
model.train(data="config.yaml", epochs=10)

Ultralytics YOLOv8.0.157  Python-3.11.3 torch-2.0.1+cpu CPU (Intel Core(TM) i7-6820HQ 2.70GHz)
[34m[1mengine\trainer: [0mtask=detect, mode=train, model=yolov8n.pt, data=config.yaml, epochs=10, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=

### prediction and actual labels (pretrained model, 10 epochs on custom set)
<img src="./runs/detect/train6/val_batch0_pred.jpg" alt="prediction" width="500">

<img src="./runs/detect/train6/val_batch0_labels.jpg" alt="labels" width="500">

Nothing, just like the model trained from scratch. Once again, let's try ONE HUNDRED EPOCHS.

In [12]:
model.train(data="config.yaml", epochs=100)

Ultralytics YOLOv8.0.157  Python-3.11.3 torch-2.0.1+cpu CPU (Intel Core(TM) i7-6820HQ 2.70GHz)
[34m[1mengine\trainer: [0mtask=detect, mode=train, model=yolov8n.pt, data=config.yaml, epochs=100, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=0, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8

### prediction and actual labels (pretrained model, 100 epochs on custom set)
<img src="./runs/detect/train7/val_batch0_pred.jpg" alt="prediction" width="500">

<img src="./runs/detect/train7/val_batch0_labels.jpg" alt="labels" width="500">

<br>

<img src="./runs/detect/train7/confusion_matrix_normalized.png" alt="confusion matrix (normalized)" width="500">

<img src="./runs/detect/train7/results.png" alt="results" width="700">

The results look absolutely perfect. But now I should let you in on a secret: the validation set is part of the training set, so these metrics say nothing about how well the model generalizes, and it's probably heavily overfitting.
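An honest validation score needs images the model never trained on. Below is a minimal sketch of a deterministic hold-out split, assuming the images sit in `bears/images/*.jpg` with matching YOLO label files in `bears/labels/` (the `labels.cache` path in the validation log suggests a `bears/labels` folder, but the image folder name and the `.jpg` extension are assumptions). `split_stems` and `copy_split` are hypothetical helpers. The destination keeps the `images`/`labels` sibling layout because Ultralytics locates each label by swapping `images` for `labels` in the image path.

```python
import random
import shutil
from pathlib import Path


def split_stems(stems, val_fraction=0.2, seed=0):
    """Deterministically shuffle file stems and split them into (train, val) lists."""
    stems = sorted(stems)  # sort first so the result doesn't depend on directory order
    random.Random(seed).shuffle(stems)
    n_val = max(1, round(len(stems) * val_fraction))
    return stems[n_val:], stems[:n_val]


def copy_split(src_root, dst_root, val_fraction=0.2, seed=0, pattern="*.jpg"):
    """Copy src_root/{images,labels} into dst_root/{train,val}/{images,labels}.

    Hypothetical layout: each image in src_root/images has a YOLO label file
    with the same stem in src_root/labels (e.g. bear01.jpg -> bear01.txt).
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    images = {p.stem: p for p in (src_root / "images").glob(pattern)}
    train, val = split_stems(images.keys(), val_fraction, seed)

    for split, stems in (("train", train), ("val", val)):
        for sub in ("images", "labels"):
            (dst_root / split / sub).mkdir(parents=True, exist_ok=True)
        for stem in stems:
            shutil.copy2(images[stem], dst_root / split / "images" / images[stem].name)
            label = src_root / "labels" / f"{stem}.txt"
            if label.exists():  # an image with no label file is treated as background
                shutil.copy2(label, dst_root / split / "labels" / label.name)
    return train, val
```

After something like `copy_split("bears", "bears_split")` (the output folder name is made up), the `train` and `val` entries in `config.yaml` would point at `bears_split/train/images` and `bears_split/val/images`. With 38 images a 20% split leaves only about 8 for validation, so the numbers will be noisy, but they would no longer be measured on training images.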

Let's try inference on some cute videos to see if it actually works.

In [18]:
%pip install yt-dlp

Collecting yt-dlp
  Obtaining dependency information for yt-dlp from https://files.pythonhosted.org/packages/5c/da/ef08140cea3392288a8f6cd60f6f12510a4c5776eead53e90151f139af19/yt_dlp-2023.7.6-py2.py3-none-any.whl.metadata
  Downloading yt_dlp-2023.7.6-py2.py3-none-any.whl.metadata (157 kB)
     ---------------------------------------- 0.0/158.0 kB ? eta -:--:--
     -------------------------------------- 158.0/158.0 kB 4.8 MB/s eta 0:00:00
Collecting mutagen (from yt-dlp)
  Downloading mutagen-1.46.0-py3-none-any.whl (193 kB)
     ---------------------------------------- 0.0/193.6 kB ? eta -:--:--
     -------------------------------------- 193.6/193.6 kB 5.9 MB/s eta 0:00:00
Collecting websockets (from yt-dlp)
  Downloading websockets-11.0.3-cp311-cp311-win_amd64.whl (124 kB)
     ---------------------------------------- 0.0/124.7 kB ? eta -:--:--
     ---------------------------------------- 124.7/124.7 kB ? eta 0:00:00
Downloading yt_dlp-2023.7.6-py2.py3-none-any.whl (3.0 MB)
   ---

In [30]:
from yt_dlp import YoutubeDL

video_urls = [
    'https://www.youtube.com/watch?v=ylCIa-12ILk',
    'https://www.youtube.com/watch?v=oUle-4E1qoQ',
    'https://www.youtube.com/watch?v=UwbtyBEYiTQ',

]

options = {
    'outtmpl': 'bears/videos/%(id)s.%(ext)s',
}

with YoutubeDL(options) as ydl:
    ydl.download(video_urls)

[youtube] Extracting URL: https://www.youtube.com/watch?v=ylCIa-12ILk
[youtube] ylCIa-12ILk: Downloading webpage
[youtube] ylCIa-12ILk: Downloading ios player API JSON
[youtube] ylCIa-12ILk: Downloading android player API JSON
[youtube] ylCIa-12ILk: Downloading m3u8 information
[info] ylCIa-12ILk: Downloading 1 format(s): 22
[download] Destination: bears\videos\ylCIa-12ILk.mp4
[download] 100% of   15.76MiB in 00:00:02 at 6.69MiB/s     
[youtube] Extracting URL: https://www.youtube.com/watch?v=oUle-4E1qoQ
[youtube] oUle-4E1qoQ: Downloading webpage
[youtube] oUle-4E1qoQ: Downloading ios player API JSON
[youtube] oUle-4E1qoQ: Downloading android player API JSON
[youtube] oUle-4E1qoQ: Downloading m3u8 information
[info] oUle-4E1qoQ: Downloading 1 format(s): 22
[download] Destination: bears\videos\oUle-4E1qoQ.mp4
[download] 100% of    1.87MiB in 00:00:00 at 4.88MiB/s   
[youtube] Extracting URL: https://www.youtube.com/watch?v=UwbtyBEYiTQ
[youtube] UwbtyBEYiTQ: Downloading webpage
[youtube]

In [68]:
import os

bear_vids_path = os.path.join('bears', 'videos')
file_list = os.listdir(bear_vids_path)
file_list

['oUle-4E1qoQ.mp4', 'out', 'UwbtyBEYiTQ.mp4', 'ylCIa-12ILk.mp4']

In [72]:
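import cv2  # OpenCV, for reading and writing video frames

out_dir = os.path.join(bear_vids_path, "out")
os.makedirs(out_dir, exist_ok=True)

for file in file_list:
    # file_list also contains the "out" folder, so skip anything that isn't a video
    if not file.endswith(".mp4"):
        continue

    file_path = os.path.join(bear_vids_path, file)
    out_path = os.path.join(out_dir, file)

    cap = cv2.VideoCapture(file_path)

    # Write annotated frames at the source video's frame rate and resolution.
    # 'avc1' (H.264) depends on the codecs in your OpenCV build; 'mp4v' is a common fallback.
    out = cv2.VideoWriter(
        out_path,
        cv2.VideoWriter_fourcc(*'avc1'),
        cap.get(cv2.CAP_PROP_FPS),
        (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
         int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    )

    while cap.isOpened():
        ret, frame = cap.read()

        if not ret:  # end of the video (or a read error)
            break

        results = model(frame)  # run the fine-tuned detector on one BGR frame
        annotated_frame = results[0].plot()  # frame with boxes and class labels drawn on it

        out.write(annotated_frame)

    cap.release()
    out.release()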



0: 640x384 1 brown, 118.7ms
Speed: 3.0ms preprocess, 118.7ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 384)

0: 640x384 1 brown, 109.1ms
Speed: 2.0ms preprocess, 109.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)

0: 640x384 1 brown, 112.5ms
Speed: 3.0ms preprocess, 112.5ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)

0: 640x384 (no detections), 107.6ms
Speed: 2.0ms preprocess, 107.6ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)

0: 640x384 1 brown, 111.5ms
Speed: 2.1ms preprocess, 111.5ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)

0: 640x384 (no detections), 108.8ms
Speed: 3.0ms preprocess, 108.8ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 384)

0: 640x384 (no detections), 104.3ms
Speed: 2.0ms preprocess, 104.3ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 384)

0: 640x384 (no detections), 106.6ms
Speed: 3.0ms preprocess, 106.6ms inference, 0.7ms 

<div style="display: flex; justify-content: space-between; max-width: 1200px; margin: 0 auto;">
    <video controls style="width: 30%; max-height: 100%; border: 1px solid #ccc; box-sizing: border-box;">
        <source src="https://files.catbox.moe/jour4q.mp4" type="video/mp4">
    </video>
    <video controls style="width: 30%; max-height: 100%; border: 1px solid #ccc; box-sizing: border-box;">
        <source src="https://files.catbox.moe/5hfnkd.mp4" type="video/mp4">
    </video>
    <video controls style="width: 30%; max-height: 100%; border: 1px solid #ccc; box-sizing: border-box;">
        <source src="https://files.catbox.moe/zx9wkb.mp4" type="video/mp4">
    </video>
</div>

# Conclusion

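Pretty good for only about 12 images per type of bear to train on. The middle video kept classifying the polar bear as a brown bear, but I'm very impressed with the other two. Since the validation set overlapped the training set, these three videos are the only real test of the model here.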
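To put numbers on a judgment like this instead of eyeballing the videos, one option is to record the predicted classes for every frame and tally them per video. A minimal sketch: inside the frame loop you could collect `[int(c) for c in results[0].boxes.cls]` for each frame (Ultralytics exposes predicted class ids on `results[0].boxes.cls` and the id-to-name mapping on `results[0].names`), then pass the per-frame lists to the hypothetical helper below. The class names in the example are placeholders; only `brown` appears in the logs above.

```python
from collections import Counter


def tally_labels(class_ids_per_frame, names):
    """Summarize per-frame detections for one video.

    class_ids_per_frame: one list of integer class ids per frame
    names: mapping from class id to class name
    Returns (number of frames containing each class, number of frames with no detections).
    """
    frames_with = Counter()
    empty = 0
    for ids in class_ids_per_frame:
        if not ids:
            empty += 1
            continue
        # count each class at most once per frame, even if it was detected twice
        frames_with.update({names[i] for i in ids})
    return frames_with, empty


# Placeholder class names and detections, for illustration only
names = {0: "black", 1: "brown", 2: "polar"}
print(tally_labels([[1], [1, 1], [], [2], [1, 2]], names))
```

For a polar bear video, a large `brown` count relative to `polar` would confirm the misclassification, and the empty-frame count shows how often the bear was missed entirely.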