<div align="center">

# YOLOv8 Custom Object Detection: Model Training on Custom Data

</div>

Welcome to this notebook on YOLOv8 custom object detection! It walks through training a custom object detection model on images of ants and insects from Open Images: downloading the annotations, selecting the target classes, downloading the matching images, converting the data to the YOLO format, and training a YOLOv8 model. Let's get started!

In [19]:
import os
import shutil

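def delete_directory_contents(directory_path=None):
    """Delete every file and subdirectory inside directory_path."""
    # resolve the default at call time rather than at function definition time
    if directory_path is None:
        directory_path = os.getcwd()
    for filename in os.listdir(directory_path):
        file_path = os.path.join(directory_path, filename)
        try:
            if os.path.isfile(file_path) or os.path.islink(file_path):
                os.unlink(file_path)
            elif os.path.isdir(file_path):
                shutil.rmtree(file_path)
        except Exception as e:
            print('Failed to delete %s. Reason: %s' % (file_path, e))

# WARNING: this empties the current working directory to start from a clean workspace
delete_directory_contents()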


## Importing libraries, modules and files

### Downloading annotations and metadata for training, validation and (optional) testing.

In [20]:
# training annotations and metadata
!wget https://storage.googleapis.com/openimages/v6/oidv6-train-annotations-bbox.csv

--2023-04-18 14:07:43--  https://storage.googleapis.com/openimages/v6/oidv6-train-annotations-bbox.csv
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.141.128, 142.251.2.128, 74.125.137.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.141.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2258447590 (2.1G) [text/csv]
Saving to: ‘oidv6-train-annotations-bbox.csv’


2023-04-18 14:07:58 (140 MB/s) - ‘oidv6-train-annotations-bbox.csv’ saved [2258447590/2258447590]



In [21]:
# validation annotations and metadata
!wget https://storage.googleapis.com/openimages/v5/validation-annotations-bbox.csv

--2023-04-18 14:07:58--  https://storage.googleapis.com/openimages/v5/validation-annotations-bbox.csv
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.141.128, 142.251.2.128, 74.125.137.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.141.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 25105048 (24M) [text/csv]
Saving to: ‘validation-annotations-bbox.csv’


2023-04-18 14:07:59 (175 MB/s) - ‘validation-annotations-bbox.csv’ saved [25105048/25105048]



In [22]:
# testing annotations and metadata
!wget https://storage.googleapis.com/openimages/v5/test-annotations-bbox.csv

--2023-04-18 14:07:59--  https://storage.googleapis.com/openimages/v5/test-annotations-bbox.csv
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.141.128, 142.251.2.128, 74.125.137.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.141.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 77484237 (74M) [text/csv]
Saving to: ‘test-annotations-bbox.csv’


2023-04-18 14:07:59 (148 MB/s) - ‘test-annotations-bbox.csv’ saved [77484237/77484237]
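
The three files downloaded above are plain CSV files with one bounding box per row. The sketch below parses a small in-memory sample to show the columns that the rest of this notebook relies on. The header names are those of the Open Images box annotation files; the sample row values and the `parse_box_row` helper are hypothetical and only serve as an illustration.

```python
import csv
import io

# Hypothetical sample: the header matches the Open Images box annotation
# files, but the row values are made up for illustration.
SAMPLE_CSV = (
    "ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,"
    "IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside\n"
    "f9e0434389a1d4dd,xclick,/m/0_k2,1,0.25,0.75,0.125,0.625,0,0,0,0,0\n"
)

def parse_box_row(row):
    """Return (image ID, class ID, (xmin, xmax, ymin, ymax)) for one CSV row."""
    box = tuple(float(row[key]) for key in ("XMin", "XMax", "YMin", "YMax"))
    return row["ImageID"], row["LabelName"], box

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
parsed_rows = [parse_box_row(row) for row in reader]
print(parsed_rows[0])
```

Note the column order: both x coordinates come before both y coordinates, and all four values are normalized to the image size.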



### Cloning the GitHub repository of the project

In [None]:
!git clone https://github.com/mohamedamine99/YOLOv8-custom-object-detection

### Downloading the `downloader.py` file used later to download the dataset from OpenImages

In [None]:
!wget https://raw.githubusercontent.com/openimages/dataset/master/downloader.py

### Importing and installing required libraries and modules

In [None]:
# required by the downloader.py file
!pip install boto3

In [26]:
import os
import shutil

# install and import pyyaml used to create custom config files for training
%pip install pyyaml
import yaml


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


## Data preprocessing

In [27]:
# This function creates a custom YAML file that is used later to configure the training data
def create_yolo_yaml_config(yaml_filepath, dataset_path, dataset_labels):

    data = {'path': dataset_path,
            'train': os.path.join('images', 'train'),
            'val': os.path.join('images', 'validation'),
            'names': {i: label for i, label in enumerate(dataset_labels)}
            }

    # Save the config to the file
    with open(yaml_filepath, 'w') as fp:
        # set sort_keys = False to preserve the order of keys
        yaml.dump(data, fp, sort_keys=False)



In [28]:
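# This function returns a dict that maps the name of each selected object
# (the classes we want the model to detect) to its Open Images class ID
def get_class_id(classes_file, class_names):
    id_name_dict = {}
    with open(classes_file, 'r') as f:
        for line in f:
            # each line has the form "<class ID>,<label>"; split only on the first
            # comma so that a label containing a comma does not raise an error
            class_id, label = line.split(',', 1)
            label = label.strip()
            if label in class_names:
                print(label)
                id_name_dict[label] = class_id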
        
    return id_name_dict
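
As an alternative sketch, the same lookup can be written with Python's `csv` module, which also copes with quoted labels. It is run here on a small in-memory sample instead of `class-descriptions-boxable.csv`. The `Ant` and `Insect` IDs are the ones this notebook prints later on; the third row and the `get_class_ids_from_rows` name are hypothetical.

```python
import csv
import io

# In-memory stand-in for class-descriptions-boxable.csv. The last row is a
# made-up entry that shows a quoted label containing a comma.
SAMPLE_CLASSES = (
    "/m/0_k2,Ant\n"
    "/m/03vt0,Insect\n"
    '/m/hypothetical,"Label, with comma"\n'
)

def get_class_ids_from_rows(lines, class_names):
    """Map each requested class name to its class ID."""
    wanted = set(class_names)
    id_by_name = {}
    for class_id, label in csv.reader(lines):
        if label in wanted:
            id_by_name[label] = class_id
    return id_by_name

sample_ids = get_class_ids_from_rows(io.StringIO(SAMPLE_CLASSES), ["Ant", "Insect"])
print(sample_ids)
```

`csv.reader` accepts any iterable of lines, so an open file object can be passed in place of the `io.StringIO` sample.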

### Downloading Images and Labels for YOLOv8 Model Training on Target Objects

In [29]:
# Target objects to be detected
names = ['Ant', 'Insect']

# path to the dataset directory (recommended to use absolute path)
dataset_path = os.path.abspath(os.path.join('.', 'data')) 

# path to the YAML file that contains training configuration
yaml_filepath = os.path.join('.', 'config.yaml')

# Create a custom YAML config file based on the above selected target objects and dataset path
create_yolo_yaml_config(yaml_filepath, dataset_path, names)

In [30]:
# get the class IDs of the target objects; all detectable objects and their IDs
# are predefined by Open Images in the class-descriptions-boxable.csv file
class_ids = get_class_id('./YOLOv8-custom-object-detection/class-descriptions-boxable.csv',
                         names)
print(class_ids)
print(names)

Insect
Ant
{'Insect': '/m/03vt0', 'Ant': '/m/0_k2'}
['Ant', 'Insect']


In [31]:
# Create a list of annotated images to be downloaded by the downloader script downloader.py
# the list is a text file in the following format : $SPLIT/$IMAGE_ID
# example : 
# train/f9e0434389a1d4dd
# train/1a007563ebc18664


train_bboxes_filename = os.path.join('.', 'oidv6-train-annotations-bbox.csv')
validation_bboxes_filename = os.path.join('.', 'validation-annotations-bbox.csv')
test_bboxes_filename = os.path.join('.', 'test-annotations-bbox.csv')

image_list_file_path = os.path.join('.', 'image_list_file.txt')

# sets give constant-time membership tests; a list would be rescanned for every row
seen_image_ids = set()
target_class_ids = set(class_ids.values())

with open(image_list_file_path, 'w') as fw:
    for j, filename in enumerate([train_bboxes_filename, validation_bboxes_filename, test_bboxes_filename]):
        split = ['train', 'validation', 'test'][j]
        print(filename)
        with open(filename, 'r') as f:
            for line in f:
                # columns: ImageID, Source, LabelName, ...
                image_id, _, class_name = line.split(',')[:3]
                if class_name in target_class_ids and image_id not in seen_image_ids:
                    seen_image_ids.add(image_id)
                    fw.write('{}/{}\n'.format(split, image_id))

./oidv6-train-annotations-bbox.csv
./validation-annotations-bbox.csv
./test-annotations-bbox.csv
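
Before starting the download, it can be worth checking how many images each split contributes. The following is a minimal sketch that counts `$SPLIT/$IMAGE_ID` entries per split. The `count_images_per_split` helper is hypothetical; the first two sample entries are the ones from the comment above, and the third image ID is made up.

```python
from collections import Counter

def count_images_per_split(lines):
    """Count `$SPLIT/$IMAGE_ID` entries per split, ignoring blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        split, _image_id = line.split("/", 1)
        counts[split] += 1
    return counts

sample_lines = [
    "train/f9e0434389a1d4dd\n",
    "train/1a007563ebc18664\n",
    "validation/0123456789abcdef\n",
]
split_counts = count_images_per_split(sample_lines)
print(dict(split_counts))
```

On the real list, an open file object can be passed directly, for example `count_images_per_split(open(image_list_file_path))`.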


In [32]:
DATA_ALL_DIR = os.path.join('.', 'data_all') # directory that contains all downloaded data selected from the list
DATA_OUT_DIR = os.path.join('.', 'data') # directory that contains data reorganized in YOLO format
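# exist_ok=True lets this cell be re-run without raising FileExistsError
os.makedirs(DATA_ALL_DIR, exist_ok=True)
os.makedirs(DATA_OUT_DIR, exist_ok=True)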

In [33]:
# run the downloader script to download the images of the target objects
# listed in image_list_file.txt
!python downloader.py ./image_list_file.txt --download_folder=./data_all --num_processes=5

Downloading images: 100% 7430/7430 [05:38<00:00, 21.94it/s]


### Re-structuring the Dataset in YOLO-Compatible Format

In [34]:
# Create a dataset in the YOLO format with two main directories : images and labels
# each containing train, validation and test data


# Create the empty dataset in yolo format
for dir_ in ['images', 'labels']:
    for set_ in ['train', 'validation', 'test']:
        new_dir = os.path.join(DATA_OUT_DIR, dir_, set_)
        if os.path.exists(new_dir):
            shutil.rmtree(new_dir)
        os.makedirs(new_dir)


In [35]:

# Fill the dataset with the appropriate images and labels in the appropriate format

# map each Open Images class ID to its YOLO class index (0 for 'Ant', 1 for 'Insect')
class_id_to_index = {v: names.index(k) for k, v in class_ids.items()}

for j, filename in enumerate([train_bboxes_filename, validation_bboxes_filename, test_bboxes_filename]):
    set_ = ['train', 'validation', 'test'][j]
    print(filename)
    with open(filename, 'r') as f:
        for line in f:
            # columns: ImageID, Source, LabelName, Confidence, XMin, XMax, YMin, YMax, ...
            image_id, _, class_name, _, x1, x2, y1, y2 = line.split(',')[:8]
            if class_name not in class_id_to_index:
                continue

            # copy the image into its split directory the first time it is seen
            src_image = os.path.join(DATA_ALL_DIR, '{}.jpg'.format(image_id))
            dst_image = os.path.join(DATA_OUT_DIR, 'images', set_, '{}.jpg'.format(image_id))
            if not os.path.exists(dst_image):
                shutil.copy(src_image, dst_image)

            # YOLO label format: class_id, xc, yc, w, h (coordinates normalized to [0, 1])
            x1, x2, y1, y2 = [float(v) for v in [x1, x2, y1, y2]]
            xc = (x1 + x2) / 2
            yc = (y1 + y2) / 2
            w = x2 - x1
            h = y2 - y1

            # append one line per bounding box to the label file of the image
            label_path = os.path.join(DATA_OUT_DIR, 'labels', set_, '{}.txt'.format(image_id))
            with open(label_path, 'a') as f_ann:
                f_ann.write('{} {} {} {} {}\n'.format(class_id_to_index[class_name], xc, yc, w, h))

./oidv6-train-annotations-bbox.csv
./validation-annotations-bbox.csv
./test-annotations-bbox.csv
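
The coordinate conversion in the loop above can be isolated as a small function. Open Images stores each box as normalized corner coordinates (XMin, XMax, YMin, YMax), while YOLO label files expect the normalized box center followed by the width and height. The `corners_to_yolo` name and the sample values below are illustrative.

```python
def corners_to_yolo(xmin, xmax, ymin, ymax):
    """Convert normalized corner coordinates to YOLO (xc, yc, w, h)."""
    xc = (xmin + xmax) / 2
    yc = (ymin + ymax) / 2
    w = xmax - xmin
    h = ymax - ymin
    return xc, yc, w, h

# A box spanning x in [0.25, 0.75] and y in [0.125, 0.625]
yolo_box = corners_to_yolo(0.25, 0.75, 0.125, 0.625)

# Label line for class index 0 ('Ant' in this notebook)
label_line = "{} {} {} {} {}".format(0, *yolo_box)
print(label_line)  # 0 0.5 0.375 0.5 0.5
```

Each label file contains one such line per bounding box in the corresponding image.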


## Training the object detection model
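
Before training, it can help to confirm that every image has a matching label file. The sketch below assumes the `images/<split>` and `labels/<split>` layout created earlier. `find_unlabeled_images` is a hypothetical helper, and the demo runs on a temporary directory rather than on the real dataset.

```python
import os
import tempfile

def find_unlabeled_images(dataset_dir, split):
    """Return sorted image IDs in images/<split> that lack labels/<split>/<id>.txt."""
    images_dir = os.path.join(dataset_dir, "images", split)
    labels_dir = os.path.join(dataset_dir, "labels", split)
    missing = []
    for filename in os.listdir(images_dir):
        image_id, ext = os.path.splitext(filename)
        if ext.lower() != ".jpg":
            continue
        if not os.path.exists(os.path.join(labels_dir, image_id + ".txt")):
            missing.append(image_id)
    return sorted(missing)

# Demo on a throwaway directory: image "a" has a label, image "b" does not
with tempfile.TemporaryDirectory() as tmp_dir:
    for sub in ("images", "labels"):
        os.makedirs(os.path.join(tmp_dir, sub, "train"))
    for name in ("a.jpg", "b.jpg"):
        open(os.path.join(tmp_dir, "images", "train", name), "w").close()
    open(os.path.join(tmp_dir, "labels", "train", "a.txt"), "w").close()
    unlabeled = find_unlabeled_images(tmp_dir, "train")

print(unlabeled)
```

For the dataset built in this notebook, the call would be `find_unlabeled_images(DATA_OUT_DIR, 'train')`, and an empty list means that every training image has a label file.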

In [None]:
%pip install ultralytics
import ultralytics
ultralytics.checks()

In [None]:
from ultralytics import YOLO


# Load a model
model = YOLO("yolov8n.pt")  

# Use the model for training
results = model.train(data='./config.yaml', 
                      epochs=150)  # train the model


## Preparing results for download

In [None]:
!zip -r train_results.zip ./runs
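
Where the `zip` command is not available, a similar archive can be built from Python with `shutil.make_archive`. This is a sketch and not part of the original workflow: the `zip_directory` helper is hypothetical, and the demo zips a temporary stand-in for `./runs`.

```python
import os
import shutil
import tempfile
import zipfile

def zip_directory(parent_dir, dir_name, archive_base):
    """Zip parent_dir/dir_name into archive_base + '.zip' and return the archive path."""
    return shutil.make_archive(archive_base, "zip", root_dir=parent_dir, base_dir=dir_name)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for the ./runs directory written during training
    results_dir = os.path.join(tmp_dir, "runs", "detect")
    os.makedirs(results_dir)
    with open(os.path.join(results_dir, "results.txt"), "w") as f:
        f.write("placeholder\n")

    archive_path = zip_directory(tmp_dir, "runs", os.path.join(tmp_dir, "train_results"))
    with zipfile.ZipFile(archive_path) as zf:
        archived_names = zf.namelist()

print(archived_names)
```

In this notebook the equivalent call would be `zip_directory('.', 'runs', 'train_results')`.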