## Example of annotation creation in YOLOv8 format

First, set the dataset directory, the path to the annotation folder, and the path to the image folder.

The target classes selected for format conversion are listed in the file "metadata/metadata.py". For instance, we can set the following classes for the Tomato detection dataset:

- "leaf": 2 
- "fruit": 3

In [1]:
dataset_dir = './datasets/Tomato detection/Tomato detection/' 
ann_folder = 'ann'
img_folder = 'img'

obj_classes = [2, 3] # 2 -- leaf, 3 -- fruit

Then we read the file 'part_statistics.csv', which contains the location and class of each object.

In [2]:
import pandas as pd

dataset_statistics = pd.read_csv(dataset_dir + 'part_statistics.csv') 
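
The conversion step in this notebook reads columns by position (0, 10, 13, 14, 27, 28), so it may help to confirm which column names sit at those positions before relying on them. The sketch below is only an illustration: `columns_at` is a hypothetical helper, and the small table with its column names is made up to stand in for the real CSV.

```python
import pandas as pd


def columns_at(df, positions):
    """Return {position: column name} for the requested positional indices."""
    return {pos: df.columns[pos] for pos in positions}


# Toy table standing in for the real CSV; these column names are made up.
toy = pd.DataFrame({"file": ["a.jpg"], "class_id": [2], "x_center": [0.5]})
print(columns_at(toy, [0, 1, 2]))
# {0: 'file', 1: 'class_id', 2: 'x_center'}
```

With the real table, the equivalent call would be `columns_at(dataset_statistics, [0, 10, 13, 14, 27, 28])`.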

To train a YOLOv8 model, the dataset must be presented in the following format, where each line in an annotation file gives the class and location of one object:

{object_class_id} {x_center} {y_center} {width} {height}
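
In this format the four location values are normalized by the image width and height, so they fall between 0 and 1. The statistics file used here appears to store normalized values already, so no conversion is needed in this notebook. For boxes given in pixels, a minimal sketch of the arithmetic could look as follows; `to_yolo_line` and its parameters are hypothetical names chosen for illustration.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to a normalized YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x100 px box with its top-left corner at (100, 50) in a 400x200 image.
print(to_yolo_line(3, 100, 50, 300, 150, 400, 200))
# 3 0.500000 0.500000 0.500000 0.500000
```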

In [6]:
def convert_annotation_yolov8(dataset_statistics):
    # Build YOLOv8 detection labels: {file name: list of label lines}.
    # Relies on the global obj_classes list defined above.

    file_dict = {}
    for _, row in dataset_statistics.iterrows():
        # Columns are read by position; .iloc avoids the deprecated
        # integer-key lookup on a labeled Series.
        file_name = row.iloc[0].split('.')[0]
        obj_class = row.iloc[10]
        x_center = row.iloc[13]  # normalized centroid x
        y_center = row.iloc[14]  # normalized centroid y
        width = row.iloc[28]     # normalized box width
        height = row.iloc[27]    # normalized box height

        if obj_class in obj_classes:
            # Class ids are written as-is (2, 3); remap them (e.g. obj_class --> 0)
            # if training expects different ids.
            line = ' '.join(str(x) for x in [obj_class, x_center, y_center, width, height])
            file_dict.setdefault(file_name, []).append(line)

    return file_dict

In [7]:
file_dict = convert_annotation_yolov8(dataset_statistics)
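
The label lines produced here keep the original class ids (2 and 3). If the training configuration expects zero-based ids, they can be remapped before the files are written. The following is a sketch under that assumption: `remap_class_ids` and the mapping `{2: 0, 3: 1}` are hypothetical and not part of the original notebook, and the example dictionary is made up.

```python
def remap_class_ids(file_dict, class_map):
    """Return a copy of file_dict with the leading class id of each line remapped."""
    remapped = {}
    for name, lines in file_dict.items():
        new_lines = []
        for line in lines:
            class_id, rest = line.split(" ", 1)
            # int(float(...)) accepts ids stored as "2" as well as "2.0".
            new_lines.append(f"{class_map[int(float(class_id))]} {rest}")
        remapped[name] = new_lines
    return remapped


# Made-up label lines in the same shape as those produced by the conversion.
example = {"tomato0": ["2 0.5 0.5 0.2 0.1", "3 0.3 0.4 0.1 0.1"]}
print(remap_class_ids(example, {2: 0, 3: 1}))
# {'tomato0': ['0 0.5 0.5 0.2 0.1', '1 0.3 0.4 0.1 0.1']}
```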

In [8]:
# list the image files for which annotations will be created
file_dict.keys()

dict_keys(['tomato0', 'tomato102', 'tomato108', 'tomato116', 'tomato130', 'tomato139', 'tomato142', 'tomato161', 'tomato170', 'tomato190', 'tomato199', 'tomato212', 'tomato221', 'tomato23', 'tomato233', 'tomato238', 'tomato240', 'tomato241', 'tomato246', 'tomato268', 'tomato272', 'tomato28', 'tomato292', 'tomato301', 'tomato303', 'tomato31', 'tomato317', 'tomato321', 'tomato330', 'tomato333', 'tomato357', 'tomato360', 'tomato362', 'tomato375', 'tomato388', 'tomato391', 'tomato414', 'tomato415', 'tomato450', 'tomato453', 'tomato491', 'tomato50', 'tomato501', 'tomato503', 'tomato513', 'tomato521', 'tomato525', 'tomato528', 'tomato556', 'tomato558', 'tomato560', 'tomato596', 'tomato601', 'tomato613', 'tomato615', 'tomato616', 'tomato620', 'tomato630', 'tomato64', 'tomato68', 'tomato650', 'tomato667', 'tomato681', 'tomato687', 'tomato706', 'tomato728', 'tomato738', 'tomato77', 'tomato781', 'tomato793', 'tomato804', 'tomato819', 'tomato829', 'tomato83', 'tomato830', 'tomato831', 'tomato838'

For each image file, we create a separate annotation file:

In [9]:
for key, lines in file_dict.items():
    # one .txt file per image, one label line per object
    with open(dataset_dir + ann_folder + '/' + key + '.txt', 'w') as f:
        for line in lines:
            f.write(f"{line}\n")
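
Once the files are written, a quick sanity check on the label lines can catch malformed rows before training. The sketch below assumes each line should hold an integer class id followed by four values in the range 0 to 1; `is_valid_yolo_line` is a hypothetical helper, not part of the original notebook.

```python
def is_valid_yolo_line(line):
    """Check that a label line has an integer class id and four values in [0, 1]."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return False
    return class_id >= 0 and all(0.0 <= c <= 1.0 for c in coords)


print(is_valid_yolo_line("3 0.5 0.5 0.2 0.1"))  # True
print(is_valid_yolo_line("3 0.5 1.4 0.2 0.1"))  # False: y_center is outside [0, 1]
print(is_valid_yolo_line("fruit 0.5 0.5 0.2"))  # False: only four fields
```

The same check can be applied to every line of `file_dict` or to the lines read back from the written `.txt` files.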