# Can YOLOv8 detect types of bears with a small custom dataset?
First, build YOLOv8 nano from scratch (untrained weights), train it on the bear dataset for 10 epochs, and see what happens.
The dataset has only 38 images, so it truly is very small. I labeled it myself.
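
The training calls in this notebook point at a `config.yaml` that is not shown. As a rough sketch only, an Ultralytics-style dataset file lists a root path, train and val image folders, and the class names. The folder layout and all three class names below are placeholders I made up for illustration, not the actual contents of the file used here.

```python
# Hypothetical class names and folder layout; replace with your own labels.
CLASS_NAMES = ["brown_bear", "polar_bear", "black_bear"]

def build_dataset_config(root, train_dir, val_dir, class_names):
    """Return the text of an Ultralytics-style dataset YAML file."""
    lines = [
        f"path: {root}",        # dataset root folder
        f"train: {train_dir}",  # training images, relative to the root
        f"val: {val_dir}",      # validation images, relative to the root
        "names:",
    ]
    # One "index: name" entry per class, indented under "names:"
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"

config_text = build_dataset_config("bears", "images/train", "images/val", CLASS_NAMES)
print(config_text)
# To save it: open("config.yaml", "w").write(config_text)
```

The class indices have to match the ids used in the label files, so the order of the names matters.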

In [None]:
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.yaml")  # build a new model from scratch

# Use the model
model.train(data="config.yaml", epochs=10)  # train the model


### prediction and actual labels (10 epochs)
<img src="notebook_resource/train0_pred.jpg" alt="prediction" width="500">

<img src="notebook_resource/train0_labels.jpg" alt="labels" width="500">

It didn't predict anything at all. Maybe we need more training. Let's try ONE HUNDRED EPOCHS.

In [None]:
model.train(data="config.yaml", epochs=100)  # train the model

### prediction and actual labels (100 epochs)
<img src="./runs/detect/train4/val_batch0_pred.jpg" alt="prediction" width="500">

<img src="./runs/detect/train4/val_batch0_labels.jpg" alt="labels" width="500">
<br>
<img src="./runs/detect/train4/val_batch1_pred.jpg" alt="prediction" width="500">

<img src="./runs/detect/train4/val_batch1_labels.jpg" alt="labels" width="500">


It actually predicts some boxes now, but the results are not great. Adding more data would probably help a lot, but I don't want to do any more labeling.


Let's try a pre-trained model and see how well it performs, first without fine-tuning.

In [None]:
model = YOLO("yolov8n.pt")  # load a pretrained model
model.val(data="config.yaml") # try out the pretrained model

### prediction (pretrained model)
<img src="./runs/detect/val2/val_batch0_pred.jpg" alt="prediction" width="500">

It already detects bears pretty well, with a couple of misclassifications. Now let's see how well it can tell bear types apart after fine-tuning on the custom data.

In [None]:
model.train(data="config.yaml", epochs=10)

### prediction and actual labels (pretrained model, 10 epochs on custom set)
<img src="./runs/detect/train6/val_batch0_pred.jpg" alt="prediction" width="500">

<img src="./runs/detect/train6/val_batch0_labels.jpg" alt="labels" width="500">

Nothing, just like the model trained from scratch. Once again, let's try ONE HUNDRED EPOCHS.

In [None]:
model.train(data="config.yaml", epochs=100)

### prediction and actual labels (pretrained model, 100 epochs on custom set)
<img src="./runs/detect/train7/val_batch0_pred.jpg" alt="prediction" width="500">

<img src="./runs/detect/train7/val_batch0_labels.jpg" alt="labels" width="500">

<br>

<img src="./runs/detect/train7/confusion_matrix_normalized.png" alt="confusion matrix (normalized)" width="500">

<img src="./runs/detect/train7/results.png" alt="results" width="700">

The results look absolutely perfect. But now I should let you in on a secret: the validation set is part of the training set, so these scores mostly show that the model can recall images it has already seen. It's probably heavily overfitting.
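
For a more honest validation score, the images would need to be split so that no validation image is also used for training. Here is a minimal sketch of such a split, using only the standard library. The file names, the 20% validation fraction, and the helper name `split_train_val` are assumptions for illustration, not how this dataset was actually prepared.

```python
import random

def split_train_val(image_paths, val_fraction=0.2, seed=0):
    """Shuffle the paths and split them into disjoint train and val lists."""
    paths = sorted(image_paths)         # sort first so the split is reproducible
    random.Random(seed).shuffle(paths)  # seeded shuffle, independent of global state
    n_val = max(1, round(len(paths) * val_fraction))
    return paths[n_val:], paths[:n_val]

# Hypothetical file names standing in for the 38 labeled images.
images = [f"bear_{i:02d}.jpg" for i in range(38)]
train, val = split_train_val(images)
print(len(train), len(val))
```

With 38 images this leaves 30 for training and 8 for validation. Each image's label file would have to move with it, and with so few images per class the validation numbers would still be noisy.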

Let's try inference on some cute videos to see if it actually works.

In [1]:
%pip install yt-dlp

Note: you may need to restart the kernel to use updated packages.


In [30]:
from yt_dlp import YoutubeDL

video_urls = [
    'https://www.youtube.com/watch?v=ylCIa-12ILk',
    'https://www.youtube.com/watch?v=oUle-4E1qoQ',
    'https://www.youtube.com/watch?v=UwbtyBEYiTQ',

]

options = {
    'outtmpl': 'bears/videos/%(id)s.%(ext)s',
}

with YoutubeDL(options) as ydl:
    ydl.download(video_urls)

[youtube] Extracting URL: https://www.youtube.com/watch?v=ylCIa-12ILk
[youtube] ylCIa-12ILk: Downloading webpage
[youtube] ylCIa-12ILk: Downloading ios player API JSON
[youtube] ylCIa-12ILk: Downloading android player API JSON
[youtube] ylCIa-12ILk: Downloading m3u8 information
[info] ylCIa-12ILk: Downloading 1 format(s): 22
[download] Destination: bears\videos\ylCIa-12ILk.mp4
[download] 100% of   15.76MiB in 00:00:02 at 6.69MiB/s     
[youtube] Extracting URL: https://www.youtube.com/watch?v=oUle-4E1qoQ
[youtube] oUle-4E1qoQ: Downloading webpage
[youtube] oUle-4E1qoQ: Downloading ios player API JSON
[youtube] oUle-4E1qoQ: Downloading android player API JSON
[youtube] oUle-4E1qoQ: Downloading m3u8 information
[info] oUle-4E1qoQ: Downloading 1 format(s): 22
[download] Destination: bears\videos\oUle-4E1qoQ.mp4
[download] 100% of    1.87MiB in 00:00:00 at 4.88MiB/s   
[youtube] Extracting URL: https://www.youtube.com/watch?v=UwbtyBEYiTQ
[youtube] UwbtyBEYiTQ: Downloading webpage
[youtube]

In [68]:
import os

bear_vids_path = os.path.join('bears', 'videos')
file_list = os.listdir(bear_vids_path)
file_list

['oUle-4E1qoQ.mp4', 'out', 'UwbtyBEYiTQ.mp4', 'ylCIa-12ILk.mp4']

In [None]:
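import cv2

out_dir = os.path.join(bear_vids_path, "out")
os.makedirs(out_dir, exist_ok=True)  # make sure the output folder exists

for file in file_list:
    if not file.endswith(".mp4"):  # skip the "out" folder and any non-video files
        continue

    file_path = os.path.join(bear_vids_path, file)
    out_path = os.path.join(out_dir, file)

    cap = cv2.VideoCapture(file_path)

    # Write the annotated video with the same frame rate and size as the input
    out = cv2.VideoWriter(
        out_path,
        cv2.VideoWriter_fourcc(*'avc1'),
        cap.get(cv2.CAP_PROP_FPS),
        (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
         int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    )

    while cap.isOpened():
        ret, frame = cap.read()

        if not ret:  # no more frames to read
            break

        results = model(frame)  # run detection on this frame
        annotated_frame = results[0].plot()  # draw boxes and labels on the frame

        out.write(annotated_frame)

    cap.release()
    out.release()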


<div style="display: flex; justify-content: space-between; max-width: 1200px; margin: 0 auto;">
    <video controls style="width: 30%; max-height: 100%; border: 1px solid #ccc; box-sizing: border-box;">
        <source src="https://files.catbox.moe/jour4q.mp4" type="video/mp4">
    </video>
    <video controls style="width: 30%; max-height: 100%; border: 1px solid #ccc; box-sizing: border-box;">
        <source src="https://files.catbox.moe/5hfnkd.mp4" type="video/mp4">
    </video>
    <video controls style="width: 30%; max-height: 100%; border: 1px solid #ccc; box-sizing: border-box;">
        <source src="https://files.catbox.moe/zx9wkb.mp4" type="video/mp4">
    </video>
</div>

# Conclusion

Pretty good for a model trained on only about 12 images per type of bear. The middle video kept classifying the polar bear as a brown bear, but I'm very impressed with the other two.
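
To put a number on an impression like "kept classifying the polar bear as a brown bear", the predicted classes could be tallied per video. Here is a small sketch of that idea. The class names and the per-frame detections are made up for illustration, and `tally_labels` is a hypothetical helper, not part of any library.

```python
from collections import Counter

def tally_labels(class_ids, names):
    """Count how often each class name appears in a sequence of class ids."""
    return Counter(names[int(class_id)] for class_id in class_ids)

# Hypothetical id-to-name mapping and per-frame detections, for illustration only.
names = {0: "brown_bear", 1: "polar_bear"}
frames = [[1.0], [0.0], [0.0, 0.0], []]  # one list of class ids per frame

totals = Counter()
for class_ids in frames:
    totals += tally_labels(class_ids, names)

print(totals.most_common())
```

In the video loop above, the ids for each frame would come from `results[0].boxes.cls.tolist()` and the mapping from `results[0].names`.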