# EN3160: Image Processing and Machine Vision

## Project on Deep Learning for Vision

### Selected Project: ICIP 2022 Challenge Parasitic Egg Detection and Classification in Microscopic Images

> **Team Oculus**

> 200462U: N.W.P.R.A. Perera

> 200558U: A.M.P.S. Samarasekera

Our solution to this challenge is based mainly on the Ultralytics YOLOv8 model.

Ultralytics YOLOv8 builds on earlier YOLO versions and adds features and improvements aimed at better performance and flexibility. It is designed to be fast, accurate, and easy to use, and it supports object detection and tracking, instance segmentation, image classification, and pose estimation.

Reference: https://docs.ultralytics.com/

**Installing Ultralytics**

In [1]:
! pip install ultralytics

Collecting ultralytics
  Downloading ultralytics-8.0.201-py3-none-any.whl (644 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 644.7/644.7 kB 24.0 MB/s eta 0:00:00
Collecting thop>=0.1.1 (from ultralytics)
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Installing collected packages: thop, ultralytics
Successfully installed thop-0.1.1.post2209072238 ultralytics-8.0.201


**Setting up the WANDB library** 

In [2]:
! pip install wandb

# Log in to wandb with an API key read from the WANDB_API_KEY environment variable
# (the key itself should not be stored in the notebook)
! wandb login "$WANDB_API_KEY"

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


**Importing Other Dependencies**

In [3]:
import os 
import shutil
import json
import pandas as pd
from sklearn.model_selection import train_test_split
from IPython.display import FileLink
from ultralytics import YOLO



**Data Pre-Processing (Preparing the Chula Parasite Dataset for Training and Validation)**
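
The cell below converts COCO-style boxes to YOLO-style boxes. As a minimal sketch of that arithmetic, the helper here (`coco_to_yolo` is a name introduced for illustration, not part of the notebook) applies the same formula to the first annotation in the dataset: a 1280x672 image with the COCO box `[555.0, 76.0, 177.0, 188.0]`.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to a
    YOLO box [x_center, y_center, width, height] normalized to the image size."""
    x_min, y_min, box_width, box_height = bbox
    x_center = (x_min + box_width / 2) / image_width
    y_center = (y_min + box_height / 2) / image_height
    return [x_center, y_center, box_width / image_width, box_height / image_height]


# First annotation of the dataset (values taken from the Annotations DataFrame)
yolo_box = coco_to_yolo([555.0, 76.0, 177.0, 188.0], image_width=1280, image_height=672)
print(yolo_box)
```

The first three values agree with the `bbox_yolo` entry displayed for row 0 in the cell output (0.502734375, 0.25297619047619047, 0.13828125).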

In [4]:
#-------------------------------------------------------------------------------------------------------------#

### Loading the dataset labels from a JSON file.

with open('/kaggle/input/chula-parasite-dataset/Chula-ParasiteEgg-11/Chula-ParasiteEgg-11/Chula-ParasiteEgg-11/labels.json') as file:
    data = json.load(file)
print()
print('Data from labels.json file')
print(data)
print()
print('SUCCESSFULLY LOADED THE DATASET LABELS!')
print()

#-------------------------------------------------------------------------------------------------------------#

### Converting the JSON file into a pandas DataFrame.

# Convert parts of the JSON file into 2 pandas DataFrames for easier manipulation
# (pd.json_normalize already returns a DataFrame, so no further conversion is needed)
image_dataframe = pd.json_normalize(data['images'])
print()
print('Image DataFrame')
display(image_dataframe)
annotations_dataframe = pd.json_normalize(data['annotations'])
print()
print('Annotations DataFrame')
display(annotations_dataframe)
# Flag annotation rows whose image_id has already appeared (images with more than one annotation)
duplicate_values = annotations_dataframe['image_id'].duplicated()
print()
print('Duplicate Values in Annotations DataFrame')
display(duplicate_values)

# Merges the 2 DataFrames based on the 'id' column of image_dataframe and 'image_id' column of annotations_dataframe
merged_dataframe = pd.merge(image_dataframe, annotations_dataframe, left_on='id', right_on='image_id', how='inner')
# Drop the extra 'image_id' column as it's now redundant
merged_dataframe.drop(columns=['image_id'], inplace=True)  
print()
print('Merged DataFrame')
display(merged_dataframe)

#-------------------------------------------------------------------------------------------------------------#

### Calculating the YOLO bounding box values for each image.

"""
In COCO, a bounding box is defined by four values in pixels [x_min, y_min, width, height].

These are coordinates of the top-left corner along with the width and height of the bounding box.


In YOLO, a bounding box is represented by four values [x_center, y_center, width, height].

x_center and y_center are the normalized coordinates of the center of the bounding box.
To normalize them, we take the pixel coordinates (x, y) of the box center,
then divide x by the width of the image and y by the height of the image.

width and height are the width and the height of the bounding box,
normalized in the same way (divided by the image width and height respectively).
"""

# Computes YOLO-style bounding box coordinates and dimensions based on original bounding box data and merges them into a new column
merged_dataframe['bbox_yolo'] = merged_dataframe.apply(lambda row: [
    ((row['bbox'][0] + row['bbox'][2] / 2) / row['width']),
    ((row['bbox'][1] + row['bbox'][3] / 2) / row['height']),
    (row['bbox'][2] / row['width']),
    (row['bbox'][3] / row['height'])
], axis=1)

# Display the new DataFrame with the bbox_yolo field
print()
print('New DataFrame with bbox_yolo field')
display(merged_dataframe)     
print('SUCCESSFULLY CONVERTED THE JSON FILE INTO A PANDAS DATAFRAME WITH YOLO BOUNDING BOX VALUES!')
print()

#-------------------------------------------------------------------------------------------------------------#

### Splitting the merged DataFrame into training and validation sets.

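# Split the merged DataFrame into training and validation sets using an 80-20 split
# Note: the split is made over annotation rows, so an image with more than one annotation can land in both sets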
training_dataframe, validation_dataframe = train_test_split(merged_dataframe, test_size=0.2, random_state=42)
print()
print('Training DataFrame')
display(training_dataframe)
print()
print('Validation DataFrame')
display(validation_dataframe)
print('SUCCESSFULLY SPLIT THE MERGED DATAFRAME INTO TRAINING AND VALIDATION SETS!')
print()

#-------------------------------------------------------------------------------------------------------------#

### Copying the training and validation images to their respective directories.

# Specify the source and destination paths
source_path = r"/kaggle/input/chula-parasite-dataset/Chula-ParasiteEgg-11/Chula-ParasiteEgg-11/Chula-ParasiteEgg-11/data"
training_set_destination_path   = r"/kaggle/working/Chula-ParasiteEgg-11/images/training_set"
validation_set_destination_path = r"/kaggle/working/Chula-ParasiteEgg-11/images/validation_set"

# Create destination directories if they don't exist
os.makedirs(training_set_destination_path,   exist_ok=True)
os.makedirs(validation_set_destination_path, exist_ok=True)

# Copy files for the training set
for index, row in training_dataframe.iterrows():
    filename = row['file_name']
    src_file = os.path.join(source_path, filename)
    dst_file = os.path.join(training_set_destination_path, filename)
    if os.path.exists(src_file):
        shutil.copy(src_file, dst_file)
    else:
        print(f"Source file {src_file} does not exist.")

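# Copy files for the validation set (with the same existence check as the training set)
for index, row in validation_dataframe.iterrows():
    filename = row['file_name']
    src_file = os.path.join(source_path, filename)
    dst_file = os.path.join(validation_set_destination_path, filename)
    if os.path.exists(src_file):
        shutil.copy(src_file, dst_file)
    else:
        print(f"Source file {src_file} does not exist.")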

print('SUCCESSFULLY COPIED THE TRAINING SET AND VALIDATION SET IMAGES TO KAGGLE WORKING DIRECTORY!')
print()

#-------------------------------------------------------------------------------------------------------------#

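### Creating text files for the training and validation labels.
### These text files will be used by the yolov8n model for training and validation.
### Each file holds one line: <category_id> <x_center> <y_center> <width> <height>.
### Note: files are opened in 'w' mode, so an image with several annotation rows keeps only the row written last.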

# Define the output directories
training_set_labels_dir   = '/kaggle/working/Chula-ParasiteEgg-11/labels/training_set'
validation_set_labels_dir = '/kaggle/working/Chula-ParasiteEgg-11/labels/validation_set'

# Create output directories if they don't exist
os.makedirs(training_set_labels_dir,   exist_ok=True)
os.makedirs(validation_set_labels_dir, exist_ok=True)

# Creating text files for training_dataframe
for index, row in training_dataframe.iterrows():
    category_id = row['category_id']
    bbox = row['bbox_yolo']
    filename = os.path.splitext(row['file_name'])[0]    # Remove the '.jpg' extension
    text_content = f"{category_id} {' '.join(map(str, bbox))}"   # Create the text content
    output_filename = os.path.join(training_set_labels_dir, f"{filename}.txt")   
    with open(output_filename, 'w') as text_file:
        text_file.write(text_content)   # Write the text content to a file in the training output directory

# Creating text files for validation_dataframe
for index, row in validation_dataframe.iterrows():
    category_id = row['category_id']
    bbox = row['bbox_yolo']
    filename = os.path.splitext(row['file_name'])[0]   # Remove the '.jpg' extension
    text_content = f"{category_id} {' '.join(map(str, bbox))}"   # Create the text content
    output_filename = os.path.join(validation_set_labels_dir, f"{filename}.txt")
    with open(output_filename, 'w') as text_file:
        text_file.write(text_content)   # Write the text content to a file in the validation output directory

#-------------------------------------------------------------------------------------------------------------#


Data from labels.json file
{'info': {'year': '2022', 'version': '1.00', 'description': 'Chula-ParasiteEgg-11', 'contributor': '', 'url': 'https://icip2022challenge.piclab.ai/', 'date_created': '2022-01-30T08:01:37'}, 'licenses': [{'id': 1, 'name': 'Attribution 4.0 International License', 'url': 'https://creativecommons.org/licenses/by/4.0/'}], 'categories': [{'id': 0, 'name': 'Ascaris lumbricoides', 'supercategory': None}, {'id': 1, 'name': 'Capillaria philippinensis', 'supercategory': None}, {'id': 2, 'name': 'Enterobius vermicularis', 'supercategory': None}, {'id': 3, 'name': 'Fasciolopsis buski', 'supercategory': None}, {'id': 4, 'name': 'Hookworm egg', 'supercategory': None}, {'id': 5, 'name': 'Hymenolepis diminuta', 'supercategory': None}, {'id': 6, 'name': 'Hymenolepis nana', 'supercategory': None}, {'id': 7, 'name': 'Opisthorchis viverrine', 'supercategory': None}, {'id': 8, 'name': 'Paragonimus spp', 'supercategory': None}, {'id': 9, 'name': 'Taenia spp. egg', 'supercategory':

|  | id | file_name | height | width | license | coco_url |
|---|---|---|---|---|---|---|
| 0 | 1 | Hymenolepis nana_0001.jpg | 672 | 1280 | 1 |  |
| 1 | 2 | Hymenolepis nana_0002.jpg | 960 | 896 | 1 |  |
| 2 | 3 | Hymenolepis nana_0003.jpg | 672 | 1280 | 1 |  |
| 3 | 4 | Hymenolepis nana_0004.jpg | 960 | 1280 | 1 |  |
| 4 | 5 | Hymenolepis nana_0005.jpg | 960 | 1280 | 1 |  |
| ... | ... | ... | ... | ... | ... | ... |
| 10995 | 10996 | Hookworm egg_0996.jpg | 3264 | 1714 | 1 |  |
| 10996 | 10997 | Hookworm egg_0997.jpg | 3264 | 2448 | 1 |  |
| 10997 | 10998 | Hookworm egg_0998.jpg | 3264 | 2448 | 1 |  |
| 10998 | 10999 | Hookworm egg_0999.jpg | 2285 | 2448 | 1 |  |



Annotations DataFrame


|  | id | image_id | category_id | bbox | area |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 6 | [555.0, 76.0, 177.0, 188.0] | 33276.0 |
| 1 | 2 | 2 | 6 | [549.0, 459.0, 178.0, 151.0] | 26878.0 |
| 2 | 3 | 3 | 6 | [538.0, 449.0, 206.0, 170.0] | 35020.0 |
| 3 | 4 | 4 | 6 | [542.0, 384.0, 173.0, 166.0] | 28718.0 |
| 4 | 5 | 5 | 6 | [483.0, 373.0, 224.0, 190.0] | 42560.0 |
| ... | ... | ... | ... | ... | ... |
| 11026 | 11027 | 10996 | 4 | [523.0, 1611.0, 198.0, 262.0] | 51876.0 |
| 11027 | 11028 | 10997 | 4 | [1353.0, 1482.0, 218.0, 267.0] | 58206.0 |
| 11028 | 11029 | 10998 | 4 | [1366.0, 1754.0, 171.0, 298.0] | 50958.0 |
| 11029 | 11030 | 10999 | 4 | [1132.0, 1323.0000000000002, 211.0, 242.0] | 51062.0 |



Duplicate Values in Annotations DataFrame


0        False
1        False
2        False
3        False
4        False
         ...  
11026    False
11027    False
11028    False
11029    False
11030    False
Name: image_id, Length: 11031, dtype: bool


Merged DataFrame


|  | id_x | file_name | height | width | license | coco_url | id_y | category_id | bbox | area |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Hymenolepis nana_0001.jpg | 672 | 1280 | 1 |  | 1 | 6 | [555.0, 76.0, 177.0, 188.0] | 33276.0 |
| 1 | 2 | Hymenolepis nana_0002.jpg | 960 | 896 | 1 |  | 2 | 6 | [549.0, 459.0, 178.0, 151.0] | 26878.0 |
| 2 | 3 | Hymenolepis nana_0003.jpg | 672 | 1280 | 1 |  | 3 | 6 | [538.0, 449.0, 206.0, 170.0] | 35020.0 |
| 3 | 4 | Hymenolepis nana_0004.jpg | 960 | 1280 | 1 |  | 4 | 6 | [542.0, 384.0, 173.0, 166.0] | 28718.0 |
| 4 | 5 | Hymenolepis nana_0005.jpg | 960 | 1280 | 1 |  | 5 | 6 | [483.0, 373.0, 224.0, 190.0] | 42560.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11026 | 10996 | Hookworm egg_0996.jpg | 3264 | 1714 | 1 |  | 11027 | 4 | [523.0, 1611.0, 198.0, 262.0] | 51876.0 |
| 11027 | 10997 | Hookworm egg_0997.jpg | 3264 | 2448 | 1 |  | 11028 | 4 | [1353.0, 1482.0, 218.0, 267.0] | 58206.0 |
| 11028 | 10998 | Hookworm egg_0998.jpg | 3264 | 2448 | 1 |  | 11029 | 4 | [1366.0, 1754.0, 171.0, 298.0] | 50958.0 |
| 11029 | 10999 | Hookworm egg_0999.jpg | 2285 | 2448 | 1 |  | 11030 | 4 | [1132.0, 1323.0000000000002, 211.0, 242.0] | 51062.0 |



New DataFrame with bbox_yolo field


|  | id_x | file_name | height | width | license | coco_url | id_y | category_id | bbox | area | bbox_yolo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Hymenolepis nana_0001.jpg | 672 | 1280 | 1 |  | 1 | 6 | [555.0, 76.0, 177.0, 188.0] | 33276.0 | [0.502734375, 0.25297619047619047, 0.13828125,... |
| 1 | 2 | Hymenolepis nana_0002.jpg | 960 | 896 | 1 |  | 2 | 6 | [549.0, 459.0, 178.0, 151.0] | 26878.0 | [0.7120535714285714, 0.5567708333333333, 0.198... |
| 2 | 3 | Hymenolepis nana_0003.jpg | 672 | 1280 | 1 |  | 3 | 6 | [538.0, 449.0, 206.0, 170.0] | 35020.0 | [0.50078125, 0.7946428571428571, 0.1609375, 0.... |
| 3 | 4 | Hymenolepis nana_0004.jpg | 960 | 1280 | 1 |  | 4 | 6 | [542.0, 384.0, 173.0, 166.0] | 28718.0 | [0.491015625, 0.4864583333333333, 0.13515625, ... |
| 4 | 5 | Hymenolepis nana_0005.jpg | 960 | 1280 | 1 |  | 5 | 6 | [483.0, 373.0, 224.0, 190.0] | 42560.0 | [0.46484375, 0.4875, 0.175, 0.19791666666666666] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11026 | 10996 | Hookworm egg_0996.jpg | 3264 | 1714 | 1 |  | 11027 | 4 | [523.0, 1611.0, 198.0, 262.0] | 51876.0 | [0.3628938156359393, 0.5337009803921569, 0.115... |
| 11027 | 10997 | Hookworm egg_0997.jpg | 3264 | 2448 | 1 |  | 11028 | 4 | [1353.0, 1482.0, 218.0, 267.0] | 58206.0 | [0.5972222222222222, 0.49494485294117646, 0.08... |
| 11028 | 10998 | Hookworm egg_0998.jpg | 3264 | 2448 | 1 |  | 11029 | 4 | [1366.0, 1754.0, 171.0, 298.0] | 50958.0 | [0.5929330065359477, 0.5830269607843137, 0.069... |
| 11029 | 10999 | Hookworm egg_0999.jpg | 2285 | 2448 | 1 |  | 11030 | 4 | [1132.0, 1323.0000000000002, 211.0, 242.0] | 51062.0 | [0.5055147058823529, 0.6319474835886215, 0.086... |


SUCCESSFULLY CONVERTED THE JSON FILE INTO A PANDAS DATAFRAME WITH YOLO BOUNDING BOX VALUES!


Training DataFrame


|  | id_x | file_name | height | width | license | coco_url | id_y | category_id | bbox | area | bbox_yolo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3101 | 3092 | Capillaria philippinensis_0092.jpg | 3264 | 1714 | 1 |  | 3102 | 1 | [325.0, 1675.0, 148.0, 146.0] | 21608.0 | [0.23278879813302217, 0.5355392156862745, 0.08... |
| 10982 | 10953 | Hookworm egg_0953.jpg | 3264 | 2448 | 1 |  | 10983 | 4 | [1118.0, 1930.0, 206.00000000000003, 281.0] | 57886.0 | [0.4987745098039216, 0.6343443627450981, 0.084... |
| 2122 | 2113 | Enterobius vermicularis_0113.jpg | 672 | 1280 | 1 |  | 2123 | 2 | [630.0, 94.0, 203.0, 182.0] | 36946.0 | [0.571484375, 0.27529761904761907, 0.15859375,... |
| 8935 | 8926 | Opisthorchis viverrine_0926.jpg | 3264 | 2448 | 1 |  | 8936 | 7 | [1232.0, 1822.9999999999998, 124.0, 68.0] | 8432.0 | [0.5285947712418301, 0.5689338235294117, 0.050... |
| 1921 | 1913 | Ascaris lumbricoides_0913.jpg | 2285 | 2448 | 1 |  | 1922 | 0 | [1000.0, 457.0, 241.0, 260.0] | 62660.0 | [0.4577205882352941, 0.25689277899343543, 0.09... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5734 | 5725 | Paragonimus spp_0725.jpg | 960 | 1280 | 1 |  | 5735 | 8 | [553.0, 302.0, 232.0, 358.0] | 83056.0 | [0.52265625, 0.5010416666666667, 0.18125, 0.37... |
| 5191 | 5182 | Paragonimus spp_0182.jpg | 1080 | 1344 | 1 |  | 5192 | 8 | [803.0, 457.0, 252.0, 158.0] | 39816.0 | [0.6912202380952381, 0.4962962962962963, 0.187... |
| 5390 | 5381 | Paragonimus spp_0381.jpg | 2822 | 3024 | 1 |  | 5391 | 8 | [1408.0, 690.0, 308.0, 381.0] | 117348.0 | [0.5165343915343915, 0.3120127569099929, 0.101... |
| 860 | 860 | Hymenolepis nana_0860.jpg | 4032 | 3024 | 1 |  | 861 | 6 | [1515.0, 2045.0000000000002, 224.0, 236.0] | 52864.0 | [0.5380291005291006, 0.5364583333333334, 0.074... |



Validation DataFrame


|  | id_x | file_name | height | width | license | coco_url | id_y | category_id | bbox | area | bbox_yolo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8054 | 8045 | Opisthorchis viverrine_0045.jpg | 960 | 1280 | 1 |  | 8055 | 7 | [514.0, 463.0, 100.0, 69.0] | 6900.0 | [0.440625, 0.5182291666666666, 0.078125, 0.071... |
| 1595 | 1588 | Ascaris lumbricoides_0588.jpg | 4032 | 2117 | 1 |  | 1596 | 0 | [466.0, 1971.0, 364.0, 243.0] | 88452.0 | [0.3060935285781767, 0.5189732142857143, 0.171... |
| 8029 | 8020 | Opisthorchis viverrine_0020.jpg | 960 | 896 | 1 |  | 8030 | 7 | [155.0, 456.0, 121.99999999999999, 64.0] | 7808.0 | [0.24107142857142858, 0.5083333333333333, 0.13... |
| 4862 | 4853 | Hymenolepis diminuta_0853.jpg | 672 | 1280 | 1 |  | 4863 | 5 | [527.0, 31.0, 250.0, 257.0] | 64250.0 | [0.509375, 0.23735119047619047, 0.1953125, 0.3... |
| 7189 | 7180 | Fasciolopsis buski_0180.jpg | 4032 | 2117 | 1 |  | 7190 | 3 | [265.0, 1606.0, 640.0, 504.0] | 322560.0 | [0.27633443552196507, 0.46081349206349204, 0.3... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9100 | 9088 | Taenia spp. egg_0088.jpg | 960 | 896 | 1 |  | 9101 | 9 | [118.0, 394.0, 146.0, 167.0] | 24382.0 | [0.21316964285714285, 0.4973958333333333, 0.16... |
| 1020 | 1020 | Ascaris lumbricoides_0020.jpg | 960 | 1280 | 1 |  | 1021 | 0 | [529.0, 366.0, 213.0, 267.0] | 56871.0 | [0.496484375, 0.5203125, 0.16640625, 0.278125] |
| 705 | 705 | Hymenolepis nana_0705.jpg | 1080 | 1920 | 1 |  | 706 | 6 | [890.0, 485.0, 142.0, 116.0] | 16472.0 | [0.5005208333333333, 0.5027777777777778, 0.073... |
| 2948 | 2939 | Enterobius vermicularis_0939.jpg | 2285 | 2448 | 1 |  | 2949 | 2 | [1001.0, 1815.0, 145.0, 216.0] | 31320.0 | [0.43852124183006536, 0.8415754923413566, 0.05... |


SUCCESSFULLY SPLIT THE MERGED DATAFRAME INTO TRAINING AND VALIDATION SETS!

SUCCESSFULLY COPIED THE TRAINING SET AND VALIDATION SET IMAGES TO KAGGLE WORKING DIRECTORY!
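
The label-writing loops in the cell above produce one text file per annotation row. A YOLO label file can hold several lines, one per object, so an image with more than one annotation would need its rows grouped first. The following is a minimal sketch of that grouping using only the standard library; `write_yolo_labels` and the example rows are hypothetical and were not part of the original run.

```python
import os
import tempfile
from collections import defaultdict


def write_yolo_labels(rows, labels_dir):
    """Write one label file per image, with one line per annotation.

    `rows` is an iterable of (file_name, category_id, bbox_yolo) tuples.
    Returns the sorted list of file stems that were written.
    """
    os.makedirs(labels_dir, exist_ok=True)
    lines_per_image = defaultdict(list)
    for file_name, category_id, bbox_yolo in rows:
        stem = os.path.splitext(file_name)[0]   # drop the image extension
        lines_per_image[stem].append(f"{category_id} {' '.join(map(str, bbox_yolo))}")
    for stem, lines in lines_per_image.items():
        with open(os.path.join(labels_dir, f"{stem}.txt"), "w") as label_file:
            label_file.write("\n".join(lines) + "\n")
    return sorted(lines_per_image)


# Hypothetical rows: 'example_0001.jpg' carries two annotations
example_rows = [
    ("example_0001.jpg", 6, [0.5, 0.25, 0.1, 0.2]),
    ("example_0001.jpg", 4, [0.3, 0.6, 0.1, 0.1]),
    ("example_0002.jpg", 0, [0.4, 0.4, 0.2, 0.2]),
]
with tempfile.TemporaryDirectory() as labels_dir:
    written = write_yolo_labels(example_rows, labels_dir)
    with open(os.path.join(labels_dir, "example_0001.txt")) as label_file:
        first_file_lines = label_file.read().splitlines()
print(written, first_file_lines)
```

With the notebook's DataFrames, the rows could be supplied as `training_dataframe[['file_name', 'category_id', 'bbox_yolo']].itertuples(index=False)`. Splitting on unique `file_name` values before grouping would also keep all annotations of an image in the same set.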



**Training the Model**

In [5]:
# Build the yolov8n (nano) object detection model from its architecture file 'yolov8n.yaml',
# then load the pretrained weights from 'yolov8n.pt'
# The YAML file defines the network architecture (layers, channels, and scaling)
model = YOLO('yolov8n.yaml').load('yolov8n.pt')

# Specify the number of epochs for training
n_epochs = 5

# Training the model
results  = model.train(data="/kaggle/input/parasite-configuration/parasite_configure.yaml", epochs=n_epochs)
print('SUCCESSFULLY TRAINED THE MODEL!')


                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]             
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]                
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]             
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]               
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128

SUCCESSFULLY TRAINED THE MODEL!
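
The training call above reads its dataset description from `parasite_configure.yaml`, which is not shown in this notebook. As a hedged sketch, an Ultralytics dataset file of this kind typically lists a root `path`, `train` and `val` image directories relative to it, and a `names` mapping from class index to class name. The helper `build_dataset_yaml` below is hypothetical, and the directory values are assumptions taken from the paths created during pre-processing. Only the five class names visible in the truncated validation log are used; the real file would list every class.

```python
def build_dataset_yaml(root, train_images, val_images, class_names):
    """Return the text of an Ultralytics-style dataset YAML file."""
    lines = [
        f"path: {root}",            # dataset root directory
        f"train: {train_images}",   # training images, relative to path
        f"val: {val_images}",       # validation images, relative to path
        "names:",
    ]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Assumption: placeholder class names in the style printed by the validation log
example_names = [f"class_{index}" for index in range(5)]
config_text = build_dataset_yaml(
    root="/kaggle/working/Chula-ParasiteEgg-11",
    train_images="images/training_set",
    val_images="images/validation_set",
    class_names=example_names,
)
print(config_text)
```

Ultralytics locates the label files by swapping `images` for `labels` in each image path, which is consistent with the `labels/validation_set` directory scanned in the validation log.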


**Evaluating the Model**

In [6]:
# Evaluate the model's performance on the validation set
metrics = model.val()
print('SUCCESSFULLY EVALUATED THE MODEL ON THE VALIDATION SET!')

Ultralytics YOLOv8.0.201 🚀 Python-3.10.12 torch-2.0.0 CUDA:0 (Tesla P100-PCIE-16GB, 16281MiB)
YOLOv8n summary (fused): 168 layers, 3007793 parameters, 0 gradients, 8.1 GFLOPs
val: Scanning /kaggle/working/Chula-ParasiteEgg-11/labels/validation_set.cache... 2205 images, 0 backgrounds, 3 corrupt: 100%|██████████| 2205/2205 [00:00<?, ?it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 138/138 [01:05<00:00,  2.09it/s]
                   all       2202       2202      0.951      0.955      0.982      0.876
               class_0       2202        201      0.989      0.891      0.982       0.92
               class_1       2202        213      0.989      0.848       0.98      0.837
               class_2       2202        182      0.894      0.929      0.933      0.794
               class_3       2202        210      0.983      0.957       0.99      0.906
               class_4       2202        190      0.974      0

SUCCESSFULLY EVALUATED THE MODEL ON THE VALIDATION SET!
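
In the table above, `mAP50` is the mean average precision when a prediction counts as correct at an intersection-over-union (IoU) of at least 0.5 with a ground-truth box, and `mAP50-95` averages that score over IoU thresholds from 0.5 to 0.95. The sketch below shows how IoU can be computed for two boxes in the YOLO `[x_center, y_center, width, height]` format; `iou_yolo` and the example boxes are illustrative and are not taken from the Ultralytics code.

```python
def iou_yolo(box_a, box_b):
    """IoU of two boxes given as [x_center, y_center, width, height]."""
    def corners(box):
        x_center, y_center, width, height = box
        return (x_center - width / 2, y_center - height / 2,
                x_center + width / 2, y_center + height / 2)

    ax1, ay1, ax2, ay2 = corners(box_a)
    bx1, by1, bx2, by2 = corners(box_b)
    # Width and height of the overlap region (zero when the boxes do not touch)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = [0.5, 0.5, 0.2, 0.2]
prediction = [0.55, 0.5, 0.2, 0.2]   # hypothetical prediction, shifted right by 0.05
overlap = iou_yolo(ground_truth, prediction)
print(round(overlap, 3))
```

Here the overlap area is 0.15 x 0.2 = 0.03 and the union is 0.04 + 0.04 - 0.03 = 0.05, giving an IoU of about 0.6, so this prediction would count as a match at the 0.5 threshold.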


**Packaging the results as ZIP archives for ease of downloading**

In [7]:
source_directory = '/kaggle/working/wandb'
zip_file_path    = '/kaggle/working/wandb.zip'
shutil.make_archive(os.path.splitext(zip_file_path)[0], 'zip', source_directory)

'/kaggle/working/wandb.zip'

In [8]:
source_directory = '/kaggle/working/runs'
zip_file_path    = '/kaggle/working/runs.zip'
shutil.make_archive(os.path.splitext(zip_file_path)[0], 'zip', source_directory)

'/kaggle/working/runs.zip'

In [9]:
print('SUCCESSFULLY CREATED THE ZIP FILES!')

SUCCESSFULLY CREATED THE ZIP FILES!
