## Vision Bootcamp - Day 2: Object Detection Using AI Techniques

The second day of the **Vision Bootcamp** delves into modern approaches to **object detection** using **artificial intelligence (AI)** techniques. Participants will gain a deeper understanding of how machine learning algorithms, such as deep learning and convolutional neural networks (CNNs), can be applied to specific machine vision tasks. Practical examples will demonstrate how these technologies enable efficient and accurate object detection in real-world scenarios.


## Importing Required Libraries

In this first step, we import the necessary libraries that will be used throughout this workshop:

- **OpenCV (`cv2`)**: This is the main library we’ll use for image processing tasks, such as reading, displaying, and manipulating images.
- **NumPy (`np`)**: NumPy is a powerful library that helps with numerical operations, including matrix and array manipulations, which are essential in computer vision tasks.
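- **OS (`os`)**: This helps with file and directory operations, such as building paths and creating folders for loading and saving images.
- **Utilities (`Utilities.General`)**: A user-defined module of this project with helper functions for smoother work with data, such as building the dataset path, loading and saving label files, and updating the YAML configuration.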

This is the foundation for a smooth YOLO-based object detection workflow, and with these tools, we can start detecting objects in images using AI in an interactive and efficient way!

In [1]:
# OpenCV library for computer vision tasks
import cv2

# Numpy for numerical operations and handling arrays
import numpy as np

# OS module for file handling and accessing directories
import os

# User-defined utilities for smoother work with data
import Utilities.General

## Initialization of Constants

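This cell declares the global constants for the workshop. It selects the camera stand and the dataset, builds the path to the dataset used for training, validation, and testing, writes that path into the YOLO configuration file, and chooses the YOLO model size and whether the backbone should be frozen during training.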

### Machine Vision Stands

Two different machine vision stands are available:

- **Stand_1**:  
  - Camera: Basler a2A2448-23gcPRO GigE Camera  
  - Lighting: VMR-11566 Multi-Angle Ring Light  

- **Stand_2**:  
  - Camera: Basler a2A1920-51gcPRO GigE Camera  
  - Lighting: EFFI-FD-200-200-000 High-Power Flat Light  

### Configuration Parameters

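- **Camera Stand Name** (`CONST_CAMERA_STAND_NAME`): `Stand_1` or `Stand_2`
- **Dataset Name** (`CONST_DATASET_NAME`): `Dataset_v1` or `Dataset_v2`
- **YOLO Model Size** (`CONST_YOLO_SIZE`): `yolov8n`, `yolov8s`, `yolov8m`, `yolov8l`, or `yolov8x`
- **Freeze Backbone** (`CONST_FREEZE_BACKBONE`): `True` or `False`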


In [2]:
# HW setup of machine vision stands
#   Stand_1: Basler a2A2448-23gcPRO GigE Camera; VMR-11566 Multi Angle Ring Light
#   Stand_2: Basler a2A1920-51gcPRO GigE Camera; EFFI-FD-200-200-000 High-Power Flat Light

# Name of the camera stand
CONST_CAMERA_STAND_NAME = 'Stand_1'
# Name of the dataset
CONST_DATASET_NAME = 'Dataset_v1'

# Get the full path to the dataset
full_path = Utilities.General.Get_Full_Path(CONST_CAMERA_STAND_NAME, CONST_DATASET_NAME)

# Ensure the dataset directory exists
os.makedirs(full_path, exist_ok=True)

# Update the YAML configuration with the new path
yaml_config_file = '../YOLO/Configuration/config_tmp.yaml'
Utilities.General.Update_Yaml(yaml_config_file, full_path)

# Select the desired size of the YOLOv8 detection model:
#   Nano: 'yolov8n', Small: 'yolov8s', Medium: 'yolov8m', Large: 'yolov8l', XLarge: 'yolov8x'
CONST_YOLO_SIZE = 'yolov8n'
# An indication of whether the backbone layers of the model should be frozen
CONST_FREEZE_BACKBONE = True

YAML file <../YOLO/Configuration/config_tmp.yaml> successfully updated and saved.


## Testing Labeling Accuracy for Dataset Creation

In this section, we will:

1. **Define paths**: Locate the project and dataset folders.
2. **Create directories**: Ensure the dataset directory exists.
3. **Load image and label data**: Read the image and corresponding label file.
4. **Check image validity**: Verify the image exists and raise an error if not.
5. **Convert image format**: Convert the image from BGR to RGB.
6. **Draw bounding boxes**: Convert the normalized YOLO-style bounding boxes to pixel coordinates and draw them on the image.
7. **Save result**: Save the annotated image to the `Eval` folder of the selected stand.

This script helps verify that the image labeling was done correctly by visualizing the bounding boxes.
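In a YOLO-format dataset, every image has a plain-text label file with the same name, stored in a parallel `labels/` folder (for example, `images/train/Image_1.png` pairs with `labels/train/Image_1.txt`); Ultralytics finds label files by swapping the last `images` directory for `labels` and the extension for `.txt`. The helper below is a minimal sketch of that mapping. The function name `label_path_for` is hypothetical (it is not part of `Utilities.General`), and the sketch assumes POSIX-style `/` separators.

```python
from pathlib import PurePosixPath


def label_path_for(image_path: str) -> str:
    """Map an image path inside an 'images' folder to its YOLO label file path.

    Hypothetical helper: replaces the last 'images' directory with 'labels'
    and swaps the file extension for '.txt'.
    """
    parts = list(PurePosixPath(image_path).parts)
    # Search the directory components from the end (the file name itself is skipped)
    for i in range(len(parts) - 2, -1, -1):
        if parts[i] == "images":
            parts[i] = "labels"
            break
    else:
        raise ValueError(f"No 'images' directory in path: {image_path}")
    return str(PurePosixPath(*parts).with_suffix(".txt"))


print(label_path_for("../Data/Stand_1/Dataset_v1/images/train/Image_1.png"))
# ../Data/Stand_1/Dataset_v1/labels/train/Image_1.txt
```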

In [None]:
# Locate the path to the project folder
project_folder = os.getcwd().split('Vision-Bootcamp')[0] + 'Vision-Bootcamp'

# Define file path for the dataset location
file_path = f'../Data/{CONST_CAMERA_STAND_NAME}/{CONST_DATASET_NAME}/'

# Create the directory if it doesn't already exist
os.makedirs(file_path, exist_ok=True)

# Define the path to the image and label file
img_path_in = os.path.join(file_path, 'images/train/Image_1.png')
label_path_in = os.path.join(file_path, 'labels/train/Image_1')

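# Load label data from the file (one row per box: class_id x_center y_center width height);
# np.atleast_2d keeps a single-box label file in the same 2-D shape as a multi-box one
label_data = np.atleast_2d(Utilities.General.Load_Data(label_path_in, 'txt', ' '))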

# Load the image using OpenCV
img_raw = cv2.imread(img_path_in)

# Check if the image exists
if img_raw is None:
    raise FileNotFoundError(f'Image not found at {img_path_in}')

# Convert the image from BGR to RGB color format
img_rgb = cv2.cvtColor(img_raw, cv2.COLOR_BGR2RGB)

# Get image dimensions
img_height, img_width, _ = img_rgb.shape

# Define a dictionary for class_id to color mapping
class_colors = {0: (0, 255, 255), 1: (255, 0, 0)}  # Yellow and Blue in BGR

# Draw bounding boxes
for label_data_i in label_data:
    # YOLO bounding box (class_id, x_center, y_center, width, height), normalized to 0..1
    class_id, x_center, y_center, width, height = label_data_i

    # Convert YOLO coordinates to pixel coordinates
    x_c = int(x_center * img_width)
    y_c = int(y_center * img_height)
    w = int(width * img_width)
    h = int(height * img_height)

    # Calculate the top-left and bottom-right corners of the bounding box
    x_top_left = int(x_c - w / 2)
    y_top_left = int(y_c - h / 2)
    x_bottom_right = int(x_c + w / 2)
    y_bottom_right = int(y_c + h / 2)

    # Get the bounding box color based on the class_id (IDs may be loaded as floats)
    class_id_color = class_colors.get(int(class_id), (0, 255, 0))  # Green as default

    # Draw the bounding box on the original BGR image
    cv2.rectangle(img_raw, (x_top_left, y_top_left), (x_bottom_right, y_bottom_right), class_id_color, 2)

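# Define the output image path and make sure the output folder exists (cv2.imwrite does not create folders)
output_dir = f'../Data/{CONST_CAMERA_STAND_NAME}/Eval/'
os.makedirs(output_dir, exist_ok=True)
output_path = os.path.join(output_dir, 'Image_1_result.png')

# Save the image with bounding boxes (cv2.imwrite returns False on failure)
if not cv2.imwrite(output_path, img_raw):
    raise IOError(f'Could not write the result image to {output_path}')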

print(f'Result saved at: {output_path}')
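The coordinate conversion inside the loop can be isolated into a small function, which makes it easy to check on known values. This is a minimal sketch with a hypothetical name (`yolo_to_pixel_corners`); it differs slightly from the cell above in that it rounds instead of truncating and clips the corners to the image bounds.

```python
def yolo_to_pixel_corners(x_center, y_center, width, height, img_width, img_height):
    """Convert one normalized YOLO box to integer pixel corners (x1, y1, x2, y2)."""
    # Scale the normalized values (0..1) to pixels
    x_c = x_center * img_width
    y_c = y_center * img_height
    w = width * img_width
    h = height * img_height
    # Shift from the box center to its corners and round to the nearest pixel
    x1, y1 = round(x_c - w / 2), round(y_c - h / 2)
    x2, y2 = round(x_c + w / 2), round(y_c + h / 2)
    # Clip to the image so drawing code never receives out-of-range coordinates
    x1, x2 = max(0, x1), min(img_width - 1, x2)
    y1, y2 = max(0, y1), min(img_height - 1, y2)
    return x1, y1, x2, y2


# A box centered in a 2064 x 1544 image, a quarter of its width and half of its height
print(yolo_to_pixel_corners(0.5, 0.5, 0.25, 0.5, 2064, 1544))  # (774, 386, 1290, 1158)
```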

## Data Augmentation for Object Detection

In this section, we apply data augmentation to enlarge the dataset used to train the object detection model. The script performs the following tasks:

1. **Define augmentation transformations**: Apply a series of random transformations (translation and rotation via an affine transform, Gaussian blur, and random resized crop) to the images and their corresponding bounding boxes.
2. **Generate augmented data**: Each image and its annotations are augmented `CONST_SCALE_DATASET` times (10 by default), producing one new image-label pair per round. With 24 training and 6 validation images, this adds 240 training and 60 validation images.
3. **Save augmented data**: The augmented images and their updated labels are saved next to the originals in the dataset directory, with an `_Aug_<n>` suffix.
4. **Progress display**: A progress bar (`tqdm`) displays the status of the augmentation.

This script ensures that the dataset is enriched with augmented variations to improve the robustness of the model.
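Geometric augmentations must move the bounding boxes together with the pixels; Albumentations does this bookkeeping internally, which is why the pipeline passes `bbox_params=A.BboxParams(format='yolo', ...)`. To illustrate what that bookkeeping means, the sketch below implements a horizontal flip by hand in NumPy. A flip is *not* part of this notebook's pipeline; it is used here only because its effect on YOLO coordinates is easy to see: the normalized center `x` becomes `1 - x`, while `y`, width, and height stay the same. The function name `hflip_with_yolo_boxes` is hypothetical.

```python
import numpy as np


def hflip_with_yolo_boxes(image: np.ndarray, boxes):
    """Mirror an image left-right and update its normalized YOLO boxes.

    `boxes` has shape (N, 4) with columns (x_center, y_center, width, height),
    all normalized to the range 0..1. The inputs are not modified.
    """
    # Reverse the column order (axis 1 is the image width); works for HxW and HxWxC
    flipped_image = image[:, ::-1].copy()
    flipped_boxes = np.asarray(boxes, dtype=float).reshape(-1, 4).copy()
    # Only the horizontal center moves; box size and vertical position are unchanged
    flipped_boxes[:, 0] = 1.0 - flipped_boxes[:, 0]
    return flipped_image, flipped_boxes


img = np.zeros((4, 10, 3), dtype=np.uint8)
img[:, 0] = 255  # a white stripe on the left edge
out_img, out_boxes = hflip_with_yolo_boxes(img, [[0.25, 0.5, 0.1, 0.2]])
print(out_boxes)  # the box center moves from x = 0.25 to x = 0.75
```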

In [None]:
# Albumentations (Library for image augmentation) [pip3 install albumentations]
import albumentations as A

# tqdm (Progress bar library) for displaying the progress of iterations
from tqdm import tqdm

# Scaling factor for data augmentation: number of augmented copies generated per source image
CONST_SCALE_DATASET = 10

# Number of source images in each partition of the dataset
CONST_NUM_OF_DATA_IN_PARTITION = {'train': 24, 'valid': 6}

# Define file path for the dataset location
# (note: this cell reads from and writes to Dataset_v2, independently of CONST_DATASET_NAME)
file_path = f'../Data/{CONST_CAMERA_STAND_NAME}/Dataset_v2/'

# Create the directory if it doesn't already exist
os.makedirs(file_path, exist_ok=True)

# Augmentation pipeline: random translation and rotation, Gaussian blur, and random resized crop.
# The bounding boxes are given in YOLO format and are transformed together with the image.
transformation = A.Compose([
    A.Affine(
        translate_px={'x': (-50, 50), 'y': (-50, 50)},
        rotate=(-10, 10),
        p=0.75
    ),
    A.GaussianBlur(blur_limit=(5, 5), sigma_limit=(0.01, 1.0), p=0.5),
    A.RandomResizedCrop(height=1544, width=2064, scale=(0.95, 1.0), p=0.5)
], bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']))

for counter in range(CONST_SCALE_DATASET):
    for key, value in CONST_NUM_OF_DATA_IN_PARTITION.items():
        # Progress bar to show the current processing status
        for j in tqdm(range(value), desc=f'Iteration <{counter+1}>; Augmenting <{key}> partition', unit='image'):
            # Images are numbered consecutively across partitions (train: 1..24, valid: 25..30)
            index = (list(CONST_NUM_OF_DATA_IN_PARTITION.keys()).index(key) * CONST_NUM_OF_DATA_IN_PARTITION['train']) + j + 1
            img_path_in = os.path.join(file_path, f'images/{key}/Image_{index}.png')
            label_path_in = os.path.join(file_path, f'labels/{key}/Image_{index}')

            try:
                # Load label data (one row per box: class_id x_center y_center width height)
                label_data = np.atleast_2d(Utilities.General.Load_Data(label_path_in, 'txt', ' '))

                # Load the image using OpenCV
                img_raw = cv2.imread(img_path_in)

                # Skip this image if it could not be read
                if img_raw is None:
                    print(f'Warning: Image not found at {img_path_in}')
                    continue

                # Apply augmentation to the image and its bounding boxes
                augmented = transformation(image=img_raw, bboxes=label_data[:, 1:], class_labels=label_data[:, 0])

                # Save the augmented image
                cv2.imwrite(os.path.join(file_path, f'images/{key}/Image_{index}_Aug_{counter+1}.png'), augmented['image'])

                # Prepare label data for saving; reshape keeps both arrays 2-D even if
                # the augmentation removed every box (e.g. shifted out of the image)
                v = np.array(augmented['class_labels']).reshape(-1, 1)
                M = np.array(augmented['bboxes']).reshape(-1, 4)

                # One line per box: integer class_id followed by four coordinates with 6 decimals
                formatted_output = '\n'.join(
                    f'{int(row[0])} ' + ' '.join(f'{val:.6f}' for val in row[1:])
                    for row in np.hstack((v, M))
                )

                # Save the label data (bounding boxes) to a file
                Utilities.General.Save_Data(os.path.join(file_path, f'labels/{key}/Image_{index}_Aug_{counter+1}'), formatted_output, 'txt', '')

            except Exception as e:
                print(f'Error processing {img_path_in}: {e}')


## Importing Libraries for YOLO Training

In this step, we import the libraries needed to train and evaluate the YOLO model:

- **Ultralytics (`YOLO`)**: This library provides real-time object detection and image segmentation models. It enables us to use the YOLO model for accurate and efficient object detection.
- **`Freeze_Backbone` (`Utilities.Model`)**: A user-defined function that freezes the backbone layers of the model during training.

With these imports in place, we can build, train, validate, and run the YOLO model on our dataset.

In [None]:
# Ultralytics (Real-time object detection and image segmentation 
# model) [pip install ultralytics]
from ultralytics import YOLO

# Function to freeze the backbone layers of the model
from Utilities.Model import Freeze_Backbone

## Training a Custom YOLO Model

In this section, we set up and train a custom YOLOv8 model on our dataset. The script performs the following steps:

1. **Check and remove existing YOLO model**: If a copy of the pre-trained weights already exists in the model folder, it is removed so that Ultralytics downloads a fresh copy of the official pre-trained weights when the model is loaded in the next step.
2. **Load pre-trained YOLO model**: The pre-trained YOLOv8 weights selected by `CONST_YOLO_SIZE` (e.g., `yolov8n.pt`) are loaded from the model directory of the selected stand and dataset.
3. **Freeze backbone (optional)**: If `CONST_FREEZE_BACKBONE` is `True`, the `Freeze_Backbone` callback is registered for the `on_train_start` event, so the backbone layers are frozen when training starts.
4. **Train the model**: The model is trained with the dataset configuration from `config_tmp.yaml` and the following settings: image size `imgsz=640`, `epochs=100`, automatic batch size (`batch=-1`), early stopping disabled (`patience=0`), and rectangular training batches (`rect=True`).

Training fine-tunes the pre-trained model on our custom dataset so that it learns to detect the specific objects of this task.

For more information, see: [YOLOv8 Training Documentation](https://docs.ultralytics.com/modes/train/#arguments)
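`Freeze_Backbone` comes from the project's own `Utilities.Model` module, and its implementation is not shown here. The sketch below is one hypothetical way such an `on_train_start` callback could work; it is not the project's code. It assumes that (a) Ultralytics passes the trainer object to the callback, (b) `trainer.model` exposes PyTorch-style `named_parameters()`, and (c) the backbone consists of the first 10 modules (`model.0` to `model.9`), as in the standard YOLOv8 detection architecture. As an alternative to a custom callback, Ultralytics' `train()` also accepts a built-in `freeze` argument (e.g. `freeze=10`) that freezes the first N layers.

```python
def freeze_backbone_sketch(trainer, num_backbone_layers: int = 10) -> int:
    """Hypothetical 'on_train_start' callback that freezes the backbone layers.

    Assumes parameter names of the form 'model.<layer_index>.<...>' (optionally
    prefixed with 'module.' when the model is wrapped for multi-GPU training).
    Returns the number of parameter tensors that were frozen.
    """
    frozen = 0
    for name, param in trainer.model.named_parameters():
        parts = name.split(".")
        # Drop a 'module.' prefix added by a DistributedDataParallel wrapper
        if parts[0] == "module":
            parts = parts[1:]
        # Backbone layers are assumed to be model.0 ... model.<num_backbone_layers - 1>
        is_backbone = (
            len(parts) > 1
            and parts[0] == "model"
            and parts[1].isdigit()
            and int(parts[1]) < num_backbone_layers
        )
        if is_backbone:
            param.requires_grad = False  # stop computing gradients for this weight
            frozen += 1
    print(f"[INFO] Froze {frozen} backbone parameter tensors.")
    return frozen
```

It would be registered the same way as the project's callback: `model.add_callback('on_train_start', freeze_backbone_sketch)`.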

In [None]:

# Locate the path to the project folder
project_folder = os.getcwd().split('Vision-Bootcamp')[0] + 'Vision-Bootcamp'

# Remove the YOLO model, if it already exists
if os.path.isfile(f'../YOLO/Model/{CONST_CAMERA_STAND_NAME}_{CONST_DATASET_NAME}/{CONST_YOLO_SIZE}.pt'):
    print(f'[INFO] Removing the YOLO model.')
    os.remove(f'../YOLO/Model/{CONST_CAMERA_STAND_NAME}_{CONST_DATASET_NAME}/{CONST_YOLO_SIZE}.pt')

# Load a pre-trained YOLO model
model = YOLO(f'../YOLO/Model/{CONST_CAMERA_STAND_NAME}_{CONST_DATASET_NAME}/{CONST_YOLO_SIZE}.pt')

if CONST_FREEZE_BACKBONE:
    # Register the callback; it is triggered when training starts
    model.add_callback('on_train_start', Freeze_Backbone)

# Train the model on the custom dataset with the chosen settings (number of epochs, image size, etc.)
model.train(data=f'{project_folder}/YOLO/Configuration/config_tmp.yaml', batch=-1, imgsz=640, epochs=100, patience=0,
            rect=True, name=f'{project_folder}/YOLO/Results/{CONST_CAMERA_STAND_NAME}_{CONST_DATASET_NAME}/train_fb_{CONST_FREEZE_BACKBONE}')

## Validation of a Custom YOLO Model on the Test Dataset

In this section, we validate the custom-trained YOLOv8 model on the test split of the dataset. The script performs the following steps:

1. **Load pre-trained YOLO model**: The best weights of the trained YOLOv8 model are loaded from the specified checkpoint directory.
2. **Evaluate performance**: The model is evaluated on the test split (`split='test'`) with a confidence threshold of `0.001` and an IoU threshold for non-maximum suppression of `0.6`, reporting metrics such as precision, recall, mAP50, and mAP50-95.
3. **Save results**: The predicted bounding boxes are saved as text files (`save_txt=True`) together with their confidence scores (`save_conf=True`), alongside evaluation plots such as the confusion matrix and precision-recall curves.

Validating the model after training measures its accuracy on images it has not seen during training; the stored results support further analysis and help decide whether the model needs further tuning.

For more information, see: [YOLOv8 Validation Documentation](https://docs.ultralytics.com/modes/val/)
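Validation uses a very low confidence threshold (`conf=0.001`) so that nearly every candidate detection is kept: mAP is computed from the whole precision-recall curve, obtained by sweeping over confidence levels, and discarding low-confidence boxes up front would truncate that curve. The sketch below shows how precision and recall change with the confidence cut-off. It assumes detections have already been matched to ground-truth boxes (each flagged as a true or false positive); the function name and the detection data are hypothetical and illustrative only.

```python
def precision_recall_at(detections, num_ground_truth, conf_threshold):
    """Precision and recall when only detections with confidence >= conf_threshold are kept.

    detections: iterable of (confidence, is_true_positive) pairs, already matched
    to ground truth. num_ground_truth: number of labeled objects.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= conf_threshold]
    true_positives = sum(kept)
    # Convention used here: precision is 1.0 when nothing is predicted
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Hypothetical matched detections: (confidence, is_true_positive); 4 labeled objects in total
detections = [(0.95, True), (0.80, True), (0.60, False), (0.30, True), (0.05, False)]
for threshold in (0.5, 0.001):
    p, r = precision_recall_at(detections, num_ground_truth=4, conf_threshold=threshold)
    print(f'conf >= {threshold}: precision={p:.2f}, recall={r:.2f}')
# conf >= 0.5: precision=0.67, recall=0.50
# conf >= 0.001: precision=0.60, recall=0.75
```

Lowering the threshold trades some precision for higher recall; the validation metrics summarize this whole trade-off instead of a single operating point.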

In [None]:
# Load a pre-trained custom YOLO model.
model = YOLO(f'{project_folder}/YOLO/Results/{CONST_CAMERA_STAND_NAME}_{CONST_DATASET_NAME}/train_fb_{CONST_FREEZE_BACKBONE}/weights/best.pt')

# Evaluate the performance of the model on the test split of the dataset.
model.val(data=f'{project_folder}/YOLO/Configuration/config_tmp.yaml', batch=32, imgsz=640, conf=0.001, iou=0.6, rect=True, 
          save_txt=True, save_conf=True, save_json=False, split='test', name=f'{project_folder}/YOLO/Results/{CONST_CAMERA_STAND_NAME}_{CONST_DATASET_NAME}/valid_fb_{CONST_FREEZE_BACKBONE}')

## Prediction (Testing) Using the YOLOv8 Model on the Test Dataset

In this section, we perform prediction (testing) using a trained YOLOv8 model on new images. The script performs the following steps:

1. **Load pre-trained YOLO model**: The best weights of the trained YOLOv8 model are loaded from the specified checkpoint file.
2. **Make predictions**: The model predicts the classes and locations of objects in each image of the test set, with a confidence threshold of `0.5`, an IoU threshold for non-maximum suppression of `0.7`, and an input size of `imgsz=[480, 640]` (height, width).
3. **Save results**: The predictions are saved as image outputs with bounding boxes, and the corresponding class labels and confidence scores are stored in text files.
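At prediction time, `conf=0.5` keeps only boxes whose confidence exceeds 0.5, and `iou=0.7` is the overlap threshold of non-maximum suppression (NMS): among overlapping boxes, the highest-scoring one is kept and any other box whose intersection over union (IoU) with it exceeds 0.7 is suppressed. The sketch below is a simplified, class-agnostic greedy NMS in plain Python that illustrates this logic. Ultralytics' own implementation is vectorized and, by default, class-aware (it only suppresses boxes of the same class), so treat this as an illustration rather than a reproduction of its behavior; the function names are hypothetical.

```python
def iou_xyxy(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, conf=0.5, iou=0.7):
    """Return indices of the boxes kept after a confidence filter and greedy NMS."""
    # Drop low-confidence boxes, then visit the rest from highest to lowest score
    candidates = sorted(
        (i for i, s in enumerate(scores) if s > conf),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap an already kept box too much
        if all(iou_xyxy(boxes[i], boxes[k]) <= iou for k in kept):
            kept.append(i)
    return kept


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260), (205, 205, 255, 255)]
scores = [0.9, 0.85, 0.6, 0.4]
print(filter_and_nms(boxes, scores, conf=0.5, iou=0.7))  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.92 and is suppressed, while box 3 is already removed by the confidence filter.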

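This step shows the model's detections on the test images and stores the prediction results for further analysis. Unlike validation, prediction does not compute accuracy metrics; its output is meant for inspecting the detections visually.

For more information, see: [YOLOv8 Prediction Documentation](https://docs.ultralytics.com/modes/predict/)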

In [None]:
# Load a pre-trained custom YOLO model.
model = YOLO(f'{project_folder}/YOLO/Results/{CONST_CAMERA_STAND_NAME}_{CONST_DATASET_NAME}/train_fb_{CONST_FREEZE_BACKBONE}/weights/best.pt')

# Run inference (prediction) with the model on the test images.
model.predict(source=f'{project_folder}/Data/{CONST_CAMERA_STAND_NAME}/{CONST_DATASET_NAME}/images/test', save=True, save_txt=True, save_conf=True, 
              imgsz=[480, 640], conf=0.5, iou=0.7, name=f'{project_folder}/YOLO/Results/{CONST_CAMERA_STAND_NAME}_{CONST_DATASET_NAME}/predict_fb_{CONST_FREEZE_BACKBONE}')