<a href="https://colab.research.google.com/github/nyp-sit/iti121-2025s2/blob/main/L6/yolo_custom_train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Object Detection using YOLO

Welcome to this week's hands-on lab. In this lab, we are going to learn how to train a gold fish detector!

At the end of this exercise, you will be able to:

- create an object detection dataset in YOLO format
- fine-tune a YOLO pretrained model with the custom dataset
- monitor the training progress and evaluation metrics
- deploy the trained model for object detection

## Create an object detection dataset

We will use a goldfish dataset to illustrate the process of annotation and packaging the dataset into different format for object detection (e.g. YOLO, Pascal VOC, COCO, etc).

To annotate, there are many different tools available, such as the very basic [LabelImg](https://github.com/HumanSignal/labelImg) , or the more feature-packed tool such as [Label Studio](https://labelstud.io/), or online service such as [Roboflow](https://roboflow.com/).

### Raw Image Dataset

You can download the goldfish images (without annotations) from this link:

https://github.com/nyp-sit/iti121-2025s2/raw/refs/heads/main/L6/data/goldfish_v1_raw.zip


Unzip the file to a local folder.

There are total of 74 images. You should divide the images into both training and validation set (e.g. 80%-20%, i.e. 59 images for train, and 15 for test).


### Option 1: Label Studio

You can follow the [steps](https://labelstud.io/guide/quick_start) here to setup Label Studio on your PC. It is recommended to setup a conda environment before you install the Label Studio.  

Here are the steps that need to be done:
1. Create a new Project
2. Import the images into Label Studio
3. Set up the Labelling UI tempalte (choose Object Detection with Bounding Box template)
4. Export the dataset in YOLO format.

The exported dataset will have the following folder structure:
```
<root folder>
classes.txt    --> contains the labels, with each class label on a new line
--images --> contains the images
--labels --> contains the annotations (i.e. bbox coordinates)
notes.json --> some info about this dataset (i.e. not used)
```

For training with YOLOv11 (from Ultralytics), you need to organize the files into `train` and `validate` (and optionally `test`) folders, and to create a `data.yaml` file to provide information about the folder location of test and validation set:

```
<root folder>
--train
----images
----labels
--valid
----images
----labels
data.yaml
```

The data.yaml file should specify the following:
```
train: ../train/images
val: ../valid/images
test: ../test/images

names:
    0: goldfish
```

If you have more than one class of object to detect, specify the rest of the names under the names field.


## Option 2: Roboflow

Alternatively, you can use the online service Roboflow to do annotation. Roboflow integrates very well with Ultralytics and you can easily export the dataset in a format recognized by Ultralytics trainer (for YOLO model)

You can create a new account with [Roboflow](https://roboflow.com/).

After logging in, you can create a new project, upload all the raw images, annotate them and then export.

You can choose the format to be YOLOv11 and choose local directory to download the dataset locally instead of pushing it to the Roboflow universal wish.

Here is a [introductory blog](https://blog.roboflow.com/getting-started-with-roboflow/) on using the Roboflow to annotate.





## Auto Labelling using Grounding DINO

Both Label Studio and Roboflow supports the use of Grounding DINO to auto label the dataset.

[Grounding DINO](https://github.com/IDEA-Research/GroundingDINO) is open-set object detector, marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs (prompts) such as category names or referring expressions.

### Using Grounding DINO with Label Studio

You can follow the instruction [here](https://labelstud.io/blog/using-text-prompts-for-image-annotation-with-grounding-dino-and-label-studio/)  to setup the Grounding DINO ML backend to integrate with your label studio.

### Using Grounding DINO with Roboflow

Here is a [video tutorial](https://youtu.be/SDV6Gz0suAk) on using Grounding DINO with Roboflow.


### Download Annotated Dataset

To save you time for this lab, you can download a pre-annotated goldfish_v1 dataset [here](https://github.com/nyp-sit/iti121-2025s2/raw/refs/heads/main/L6/data/goldfish_v1.zip).

We download and unzip to the directory called `datasets/goldfish_v1`



In [None]:
%%capture
%%bash

wget https://github.com/nyp-sit/iti121-2025s2/raw/refs/heads/main/L6/data/goldfish_v1.zip
mkdir -p datasets/goldfish_v1/
unzip goldfish_v1.zip -d datasets/

In [1]:
%%capture
!pip install ultralytics

## Training the Model

YOLOv11 comes with different sizes of pretrained models: yolo11n, yolo11s, .... They differs in terms of their sizes, inference speeds and mean average precision:

<img src="https://github.com/nyp-sit/iti121-2025s2/blob/main/L6/assets/yolo11-models.png?raw=true" width="70%"/>


We will use the small pretrained model yolo11s and finetune it on our custom dataset.


### Setup the logging

Ultralytics support logging to `wandb`, `comet.ml` and `tensorboard` and `mlflow` out of the box. Here we only enable wandb.

You need to create an account at [`wandb`](https://wandb.ai) and get the API key from https://wandb.ai/authorize.

*For mlflow users, you can refer to Ultralytics's mlflow integration here: https://docs.ultralytics.com/integrations/mlflow/*


In [2]:
pip install wandb

Collecting wandb
  Downloading wandb-0.24.0-py3-none-win_amd64.whl.metadata (12 kB)
Collecting click>=8.0.1 (from wandb)
  Using cached click-8.3.1-py3-none-any.whl.metadata (2.6 kB)
Collecting gitpython!=3.1.29,>=1.0.0 (from wandb)
  Using cached gitpython-3.1.46-py3-none-any.whl.metadata (13 kB)
Collecting protobuf!=4.21.0,!=5.28.0,<7,>=3.19.0 (from wandb)
  Downloading protobuf-6.33.4-cp310-abi3-win_amd64.whl.metadata (593 bytes)
Collecting pydantic<3 (from wandb)
  Using cached pydantic-2.12.5-py3-none-any.whl.metadata (90 kB)
Collecting sentry-sdk>=2.0.0 (from wandb)
  Using cached sentry_sdk-2.49.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting annotated-types>=0.6.0 (from pydantic<3->wandb)
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.41.5 (from pydantic<3->wandb)
  Using cached pydantic_core-2.41.5-cp313-cp313-win_amd64.whl.metadata (7.4 kB)
Collecting typing-inspection>=0.4.2 (from pydantic<3->wandb)
  Using cached typing

In [3]:
import wandb

# Replace 'YOUR_LONG_API_KEY_HERE' with the actual key you copied
wandb.login(key="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")

from ultralytics import settings

settings.update({"wandb": True,
                 "clearml": False,
                 "comet": False})

[34m[1mwandb[0m: [wandb.login()] Using explicit session credentials for https://api.wandb.ai.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: C:\Users\jzhong\_netrc
[34m[1mwandb[0m: Currently logged in as: [33mjzhng[0m ([33mjzhng-nanyang-polytechnic[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


### Training

We specify the path to `data.yaml` file, and train with a batch size of 64 (reduce accordingly if you encounter out-of-memory error), and we also save the checkpoint at each epoch (save_period=1). We assume here you are connected to a GPU, hence we can specify the device to use as `device=0` to select the first GPU.  We specify the project name as `goldfish_v1`, this will create a folder called `goldfish_v1` to store the weights and various training artifacts such as F1, PR curves, confusion matrics, training results (loss, mAP, etc).

For a complete listing of train settings, you can see [here](https://docs.ultralytics.com/modes/train/#train-settings).

You can also specify the type of data [augmentation](https://docs.ultralytics.com/modes/train/#augmentation-settings-and-hyperparameters)  you want as part of the train pipeline.

You can monitor your training progress at wandb (the link is given in the train output below)


In [None]:
!ls -la datasets/goldfish_v1/train/images | wc -l

In [4]:
import torch
from ultralytics import YOLO
from ultralytics import settings

# Load a pre-trained YOLO model
model = YOLO("yolo11s.pt")

# choose device and a safe batch size (use smaller batch on CPU)
device = 0 if torch.cuda.is_available() else "cpu"
batch_size = 64 if torch.cuda.is_available() else 8

# make sure the correct data.yaml path is used
result = model.train(data="datasets\data.yaml",
                     epochs=30,
                     save_period=1,
                     batch=batch_size,
                     device=device,
                     project='fries_burger_v1',
                     plots=True)

  result = model.train(data="datasets\data.yaml",


New https://pypi.org/project/ultralytics/8.4.0 available  Update with 'pip install -U ultralytics'
Ultralytics 8.3.253  Python-3.13.7 torch-2.9.1+cpu CPU (13th Gen Intel Core i7-13800H)
[34m[1mengine\trainer: [0magnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=8, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=datasets\data.yaml, degrees=0.0, deterministic=True, device=cpu, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=30, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolo11s.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train, nbs=64, nms=False, opset=No

Overriding model.yaml nc=80 with nc=2

                   from  n    params  module                                       arguments                     
  0                  -1  1       928  ultralytics.nn.modules.conv.Conv             [3, 32, 3, 2]                 
  1                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]                
  2                  -1  1     26080  ultralytics.nn.modules.block.C3k2            [64, 128, 1, False, 0.25]     
  3                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
  4                  -1  1    103360  ultralytics.nn.modules.block.C3k2            [128, 256, 1, False, 0.25]    
  5                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
  6                  -1  1    346112  ultralytics.nn.modules.block.C3k2            [256, 256, 1, True]           
  7                  -1  1   1180672  ultralytics

[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


0,1
lr/pg0,▁▂▃▄▅▅▆▆▇▇████▇▇▆▆▆▅▅▄▄▃▃▃▂▂▁▁
lr/pg1,▁▂▃▄▅▅▆▆▇▇████▇▇▆▆▆▅▅▄▄▃▃▃▂▂▁▁
lr/pg2,▁▂▃▄▅▅▆▆▇▇████▇▇▆▆▆▅▅▄▄▃▃▃▂▂▁▁
metrics/mAP50(B),▆▆▆▄▅▄▄▃▂▂▃▂▁▁▅▄▄▅▆▆▇▆▆▆▇█████
metrics/mAP50-95(B),▆▄▅▃▅▄▃▃▁▂▂▂▁▁▃▃▄▄▅▆▆▆▅▆▇█████
metrics/precision(B),▆▆▅▄▆▄▄▃▂▃▄▃▁▂▄▅▃▅▅▅▆▅▅▇▇▇▇▇██
metrics/recall(B),▆▇▆▄▆▄▅▆▃▄▂▃▂▁▆▅▇▇▇█▇▇▇▅▆▇██▇▇
model/GFLOPs,▁
model/parameters,▁
model/speed_PyTorch(ms),▁

0,1
lr/pg0,7e-05
lr/pg1,7e-05
lr/pg2,7e-05
metrics/mAP50(B),0.88629
metrics/mAP50-95(B),0.53268
metrics/precision(B),0.87993
metrics/recall(B),0.78864
model/GFLOPs,21.551
model/parameters,9428566
model/speed_PyTorch(ms),102.568


You can see the various graphs in your wandb dashboard, for example:

*metrics*

<img src="https://github.com/nyp-sit/iti121-2025s2/blob/main/L6/assets/wandb-metrics.png?raw=true" width="70%"/>

*Train and validation loss*

<img src="https://github.com/nyp-sit/iti121-2025s2/blob/main/L6/assets/wandb-loss.png?raw=true" width="70%"/>

You can go to the folder `goldfish_v1-->train-->weights` and you will files like epoch0.pt, epoch1.pt, .... and also best.pt.
The epoch0.pt, epoch1.pt are the checkpoints that are saved every period (in our case, we specify period as 1 epoch).  The best.pt contains the best checkpoint.

We can run the best model (using the best checkpoint) against the validation dataset to see the overall model performance on validation set.  

You should see around `0.95` for `mAP50`, and `0.45` for `mAP50-95`.

In [6]:
from ultralytics import YOLO

model = YOLO("fries_burger_v1/train/weights/best.pt")
# select GPU device 0 if available, otherwise use CPU
device = 0 if torch.cuda.is_available() else "cpu"
validation_results = model.val(data="datasets/data.yaml", device=device)

Ultralytics 8.3.253  Python-3.13.7 torch-2.9.1+cpu CPU (13th Gen Intel Core i7-13800H)
YOLO11s summary (fused): 100 layers, 9,413,574 parameters, 0 gradients, 21.3 GFLOPs
[34m[1mval: [0mFast image access  (ping: 0.10.0 ms, read: 831.5643.5 MB/s, size: 210.2 KB)
[K[34m[1mval: [0mScanning C:\Users\jzhong\Downloads\project-2Object\datasets\val\labels.cache... 39 images, 0 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 39/39 11.7Mit/s 0.0s
[K                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 3/3 1.2s/it 3.5s2.9s
                   all         39         53      0.756      0.869      0.886      0.565
          french fries         31         33      0.874      0.838      0.875      0.524
             hamburger         17         20      0.638        0.9      0.896      0.606
Speed: 0.4ms preprocess, 81.7ms inference, 0.0ms loss, 0.3ms postprocess per image
Results saved to [1mC:\Users\jzhong\Downloads\project-2Object\runs\detect\v

## Export and Deployment

Your model is in pytorch format (.pt). You can export the model to various format, e.g. TorchScript, ONNX, OpenVINO, TensorRT, etc. depending on your use case, and deployment platform (e.g. CPU or GPU, etc)

You can see the list of [supported formats](https://docs.ultralytics.com/modes/export/#export-formats)  and the option they support in terms of further optimization (such as imagesize, int8, half-precision, etc) in the ultralytics site.

Ultralytics provide a utility function to benchmark your model using different supported formats automatically. You can uncomment the code in the following code cell to see the benchmark result. If you are benchmarking for CPU only, the change the `device=0` to `device='cpu'`.  

**Beware: it will take quite a while to complete the benchmark**

In [None]:
# from ultralytics.utils.benchmarks import benchmark

# # Benchmark on GPU (device=0 means the 1st GPU device)
# benchmark(model="goldfish_v1/train/weights/best.pt", data="datasets/data.yaml", imgsz=640, half=False, device=0)


In the following code, we export it as OpenVINO. OpenVINO is optimized for inference on Intel CPUs and since we will use the model later on to do inference on local Windows machine (which runs Intel chip), we will export it as OpenVINO format. We also specify using int8 quantization, which results in faster inference, at the cost of accuracy.

For more information on OpenVINO, go to the [official documentation](https://docs.openvino.ai/2024/index.html).

After export, you can find the openvino model in `goldfish_v1\train\weights\best_openvino_model` directory.

In [7]:
model = YOLO("fries_burger_v1/train/weights/best.pt")
exported_path = model.export(format="openvino", int8=True)

Ultralytics 8.3.253  Python-3.13.7 torch-2.9.1+cpu CPU (13th Gen Intel Core i7-13800H)
YOLO11s summary (fused): 100 layers, 9,413,574 parameters, 0 gradients, 21.3 GFLOPs

[34m[1mPyTorch:[0m starting from 'fries_burger_v1\train\weights\best.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 6, 8400) (18.3 MB)

[34m[1mOpenVINO:[0m starting export with openvino 2025.4.1-20426-82bbf0292c5-releases/2025/4...
[34m[1mOpenVINO:[0m collecting INT8 calibration images from 'data=coco8.yaml'
Fast image access  (ping: 0.10.1 ms, read: 158.079.8 MB/s, size: 54.0 KB)
[KScanning C:\Users\jzhong\Downloads\project-2Object\coco8\labels\val.cache... 4 images, 0 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 4/4 2.4Mit/s 0.0s
INFO:nncf:15 ignored nodes were found by patterns in the NNCFGraph
INFO:nncf:1 ignored nodes were found by types in the NNCFGraph
INFO:nncf:Not adding activation input quantizer for operation: 168 __module.model.23.dfl/aten::view/Reshape
INFO:nncf:Not adding acti



[34m[1mOpenVINO:[0m export success  10.1s, saved as 'fries_burger_v1\train\weights\best_int8_openvino_model\' (9.8 MB)

Export complete (10.5s)
Results saved to [1mC:\Users\jzhong\Downloads\project-2Object\fries_burger_v1\train\weights[0m
Predict:         yolo predict task=detect model=fries_burger_v1\train\weights\best_int8_openvino_model imgsz=640 int8 
Validate:        yolo val task=detect model=fries_burger_v1\train\weights\best_int8_openvino_model imgsz=640 data=datasets\data.yaml int8 
Visualize:       https://netron.app


## Inference

Let's test our model on some sample pictures. You can optionally specify the confidence threshold (e.g. `conf=0.5`), and the IoU (e.g. `iou=0.6`) for the NMS. The model will only output the bounding boxes of those detection that exceeds the confidence threshould and the IoU threshold.  

In [13]:
import ultralytics
from ultralytics import YOLO
from PIL import Image
import os

source = 'test\\Untitled3.jpg'
model = YOLO("fries_burger_v1\\train\\weights\\best_int8_openvino_model", task='detect')

# Use predict() with save=True to save to runs\detect\predictXX folder
result = model.predict(source, conf=0.5, iou=0.6, save=True)

# Visualize the results
for i, r in enumerate(result):
    print(r)
    print(f"\nResults saved to: {r.save_dir}")
    
    # The annotated image is automatically saved to save_dir by predict() with save=True
    # You can find it at: {save_dir}/{image_name}
    saved_image_path = os.path.join(r.save_dir, os.path.basename(source))
    print(f"Annotated image saved at: {saved_image_path}")
    
    # Plot results image
    im_bgr = r.plot()  # BGR-order numpy array
    im_rgb = Image.fromarray(im_bgr[..., ::-1])  # RGB-order PIL image

    # Show results to screen (in supported environments)
    r.show()

Loading fries_burger_v1\train\weights\best_int8_openvino_model for OpenVINO inference...
Using OpenVINO LATENCY mode for batch=1 inference on (CPU)...

image 1/1 c:\Users\jzhong\Downloads\project-2Object\test\Untitled3.jpg: 640x640 1 hamburger, 37.7ms
Speed: 2.4ms preprocess, 37.7ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)
Results saved to [1mC:\Users\jzhong\Downloads\project-2Object\runs\detect\predict21[0m
ultralytics.engine.results.Results object with attributes:

boxes: ultralytics.engine.results.Boxes object
keypoints: None
masks: None
names: {0: 'french fries', 1: 'hamburger'}
obb: None
orig_img: array([[[30, 28, 28],
        [29, 27, 27],
        [28, 26, 26],
        ...,
        [45, 38, 41],
        [46, 39, 42],
        [43, 36, 39]],

       [[32, 30, 30],
        [32, 30, 30],
        [31, 29, 29],
        ...,
        [46, 39, 42],
        [46, 39, 42],
        [48, 41, 44]],

       [[34, 32, 32],
        [34, 32, 32],
        [34, 32, 32],
    

## Download the Model

If you are training your model on Google Colab, you will download the exported OpenVINO model to a local PC. If you are training your model locally, then the exported model should already be on your local PC.

Run the following code to zip up the OpenVINO folder and download to local PC.

*Note: If you encountered error message "NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968", uncomment the following cell and run it.*


In [None]:
# import locale
# locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
%%bash
mv ./goldfish_v1/train/weights/best_int8_openvino_model/ .
zip -r goldfish_v1_openvino_model.zip best_int8_openvino_model

# Now go to best_openvino_model to download the best_openvino_model.zip file

## Streaming

We can also do real-time detection on a video or camera steram.

The code below uses openCV library to display video in a window, and can only be run locally on a local laptop.




### Video File

You need `OpenCV` to run the following code.  In your conda environment, install `opencv` for python using the following command:

```
pip3 install opencv-python
```
or
```
conda install opencv
```

Let's donwload the sample video file.

In [None]:
!wget https://raw.githubusercontent.com/nyp-sit/iti121-2025S2/refs/heads/main/L6/samples/goldfish_480p_10s.mp4

### Streaming and display video

In [15]:
from ultralytics import YOLO
import cv2

# Load the YOLO model
model = YOLO("fries_burger_v1\\train\\weights\\best_int8_openvino_model", task="detect")

# Open the video file
video_path = "test\\BURGERS.mp4"
cap = cv2.VideoCapture(video_path)

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Run YOLO inference on the frame on GPU Device 0
        results = model(frame, device="cpu")

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Display the annotated frame
        cv2.imshow("YOLO Inference", annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()

Loading fries_burger_v1\train\weights\best_int8_openvino_model for OpenVINO inference...
Using OpenVINO LATENCY mode for batch=1 inference on (CPU)...

0: 640x640 2 french friess, 2 hamburgers, 42.5ms
Speed: 2.0ms preprocess, 42.5ms inference, 0.8ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 french fries, 1 hamburger, 31.6ms
Speed: 2.4ms preprocess, 31.6ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 french fries, 1 hamburger, 22.5ms
Speed: 2.2ms preprocess, 22.5ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 french fries, 2 hamburgers, 31.8ms
Speed: 2.2ms preprocess, 31.8ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 french fries, 2 hamburgers, 30.2ms
Speed: 2.4ms preprocess, 30.2ms inference, 0.5ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 french fries, 3 hamburgers, 36.8ms
Speed: 2.1ms preprocess, 36.8ms inference, 0.6ms postprocess per image 

### Detect and write to a video file

In [18]:
%pip install tqdm

Collecting tqdm
  Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Using cached tqdm-4.67.1-py3-none-any.whl (78 kB)
Installing collected packages: tqdm
Successfully installed tqdm-4.67.1
Note: you may need to restart the kernel to use updated packages.


In [4]:
# ensure tqdm is installed for progress bars

from ultralytics import YOLO
import cv2
from tqdm.auto import tqdm

def write_video(video_in_filepath, video_out_filepath, model):
    # Open the video file
    video_reader = cv2.VideoCapture(video_in_filepath)

    nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
    frame_h = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
    frame_w = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))
    fps = video_reader.get(cv2.CAP_PROP_FPS)

    print(f"Video info: {nb_frames} frames, {frame_w}x{frame_h}, {fps} fps")

    if nb_frames == 0:
        print(f"Error: Could not read video from {video_in_filepath}")
        video_reader.release()
        return

    video_writer = cv2.VideoWriter(video_out_filepath,
                            cv2.VideoWriter_fourcc(*'mp4v'),
                            fps,
                            (frame_w, frame_h))

    # Loop through the video frames
    for i in tqdm(range(nb_frames), desc="Processing frames"):
        # Read a frame from the video
        success, frame = video_reader.read()

        if success:
            # Run YOLO inference on the frame using CPU
            results = model(frame, conf=0.6, device="cpu", verbose=False)

            # Visualize the results on the frame
            annotated_frame = results[0].plot()

            # Write the annotated frame
            video_writer.write(annotated_frame)
        else:
            print(f"Warning: Failed to read frame {i}")
            break

    video_reader.release()
    video_writer.release()
    cv2.destroyAllWindows()
    cv2.waitKey(1)
    print(f"\nVideo processing complete! Saved to: {video_out_filepath}")


  from .autonotebook import tqdm as notebook_tqdm


In [None]:
from pathlib import Path
import os

# INPUT: Your original video file
video_in_file = "test\\burgerking2.mp4"

# OUTPUT: The detected video file
video_out_file = "burgerking2_label.mp4"

model = YOLO("fries_burger_v1\\train\\weights\\best_int8_openvino_model", task="detect")
write_video(video_in_file, video_out_file, model)

Video info: 1357 frames, 360x640, 30.0 fps


Processing frames:   0%|          | 0/1357 [00:00<?, ?it/s]

Loading fries_burger_v1\train\weights\best_int8_openvino_model for OpenVINO inference...
Using OpenVINO LATENCY mode for batch=1 inference on (CPU)...


Processing frames: 100%|██████████| 1357/1357 [00:25<00:00, 53.84it/s]



Video processing complete! Saved to: McDonald_label.mp4


## Improving the object detection model

You will notice that the trained goldfish detector did not perform well in detecting the goldfishes in the fish tank.  

Can you give a reason why?

### Exercise

Now take a look at goldfish_v2 dataset.  What do you observe? Do you think this is a better dataset to use for the domain that was set.

Now train a goldfish detector, using the goldfish_v2 dataset.  Evaluate the model on the same sample image and video. Compare the results and discuss.