# EN3160: Image Processing and Machine Vision

## Project on Deep Learning for Vision

### Selected Project: ICIP 2022 Challenge Parasitic Egg Detection and Classification in Microscopic Images

> **Team Oculus**

> 200462U: N.W.P.R.A. Perera

> 200558U: A.M.P.S. Samarasekera

Our solution to this challenge is based mainly on the Ultralytics YOLOv8 model.

Ultralytics YOLOv8 builds on earlier YOLO versions and adds new features and improvements aimed at better performance and flexibility. It is designed to be fast, accurate, and easy to use, and it supports object detection and tracking, instance segmentation, image classification, and pose estimation.

Reference: https://docs.ultralytics.com/

**Installing Ultralytics**

In [None]:
! pip install ultralytics

**Setting up the WANDB library** 

In [None]:
! pip install wandb

# Logging into the WANDB library (replace YOUR_WANDB_API_KEY with your own key; never publish a real key)
! wandb login YOUR_WANDB_API_KEY

**Importing Other Dependencies**

In [None]:
import os 
import shutil
import json
import pandas as pd
from sklearn.model_selection import train_test_split
from IPython.display import FileLink
from ultralytics import YOLO

**Data Pre-Processing (Preparing the Chula Parasite Dataset for Training and Validation)**

In [None]:
#-------------------------------------------------------------------------------------------------------------#

### Loading the dataset labels from a JSON file.

# Use a context manager so the file is closed once the labels are loaded
with open('/kaggle/input/chula-parasite-dataset/Chula-ParasiteEgg-11/Chula-ParasiteEgg-11/Chula-ParasiteEgg-11/labels.json') as file:
    data = json.load(file)
print()
print('SUCCESSFULLY LOADED THE DATASET LABELS!')
print()

#-------------------------------------------------------------------------------------------------------------#

### Converting the JSON file into a pandas DataFrame.

# Convert parts of the JSON file into 2 pandas DataFrames for easier manipulation
# (pd.json_normalize already returns a DataFrame, so no further conversion is needed)
image_dataframe = pd.json_normalize(data['images'])
print()
print('Image DataFrame')
display(image_dataframe)
annotations_dataframe = pd.json_normalize(data['annotations'])
print()
print('Annotations DataFrame')
display(annotations_dataframe)
duplicate_values = annotations_dataframe['image_id'].duplicated()
print()
print('Duplicate Values in Annotations DataFrame')
display(duplicate_values)

# Merges the 2 DataFrames based on the 'id' column of image_dataframe and 'image_id' column of annotations_dataframe
merged_dataframe = pd.merge(image_dataframe, annotations_dataframe, left_on='id', right_on='image_id', how='inner')
# Drop the extra 'image_id' column as it's now redundant
merged_dataframe.drop(columns=['image_id'], inplace=True)  
print()
print('Merged DataFrame')
display(merged_dataframe)

#-------------------------------------------------------------------------------------------------------------#

### Calculating the YOLO bounding box values for each image.

"""
In COCO, a bounding box is defined by four values in pixels [x_min, y_min, width, height].

These are coordinates of the top-left corner along with the width and height of the bounding box.


In YOLO, a bounding box is represented by four values [x_center, y_center, width, height].

x_center and y_center are the normalized coordinates of the center of the bounding box.
To normalize them, we take the pixel values of x and y that mark the center
of the bounding box on the x-axis and y-axis,
then divide x by the width of the image and y by the height of the image.

width and height represent the width and the height of the bounding box. 
They are normalized as well.
"""

# Compute the YOLO-style bounding box (normalized center and size) from the COCO box and store it in a new column
merged_dataframe['bbox_yolo'] = merged_dataframe.apply(lambda row: [
    ((row['bbox'][0] + row['bbox'][2] / 2) / row['width']),
    ((row['bbox'][1] + row['bbox'][3] / 2) / row['height']),
    (row['bbox'][2] / row['width']),
    (row['bbox'][3] / row['height'])
], axis=1)

# Display the new DataFrame with the bbox_yolo field
print()
print('New DataFrame with bbox_yolo field')
display(merged_dataframe)     
print('SUCCESSFULLY CONVERTED THE JSON FILE INTO A PANDAS DATAFRAME WITH YOLO BOUNDING BOX VALUES!')
print()

#-------------------------------------------------------------------------------------------------------------#

### Splitting the merged DataFrame into training and validation sets.

# Split the merged DataFrame into training and validation sets using an 80-20 split
training_dataframe, validation_dataframe = train_test_split(merged_dataframe, test_size=0.2, random_state=42)
print()
print('Training DataFrame')
display(training_dataframe)
print()
print('Validation DataFrame')
display(validation_dataframe)
print('SUCCESSFULLY SPLIT THE MERGED DATAFRAME INTO TRAINING AND VALIDATION SETS!')
print()

#-------------------------------------------------------------------------------------------------------------#

### Copying the training and validation images to their respective directories.

# Specify the source and destination paths
source_path = r"/kaggle/input/chula-parasite-dataset/Chula-ParasiteEgg-11/Chula-ParasiteEgg-11/Chula-ParasiteEgg-11/data"
training_set_destination_path   = r"/kaggle/working/Chula-ParasiteEgg-11/images/training_set"
validation_set_destination_path = r"/kaggle/working/Chula-ParasiteEgg-11/images/validation_set"

# Create destination directories if they don't exist
os.makedirs(training_set_destination_path,   exist_ok=True)
os.makedirs(validation_set_destination_path, exist_ok=True)

# Copy files for the training set
for index, row in training_dataframe.iterrows():
    filename = row['file_name']
    src_file = os.path.join(source_path, filename)
    dst_file = os.path.join(training_set_destination_path, filename)
    if os.path.exists(src_file):
        shutil.copy(src_file, dst_file)
    else:
        print(f"Source file {src_file} does not exist.")

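# Copy files for the validation set (same existence check as for the training set)
for index, row in validation_dataframe.iterrows():
    filename = row['file_name']
    src_file = os.path.join(source_path, filename)
    dst_file = os.path.join(validation_set_destination_path, filename)
    if os.path.exists(src_file):
        shutil.copy(src_file, dst_file)
    else:
        print(f"Source file {src_file} does not exist.")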

print('SUCCESSFULLY COPIED THE TRAINING SET AND VALIDATION SET IMAGES TO KAGGLE WORKING DIRECTORY!')
print()

#-------------------------------------------------------------------------------------------------------------#

### Creating text files for the training and validation labels.
### These text files will be used by the yolov8n model for training and validation.

# Define the output directories
training_set_labels_dir   = '/kaggle/working/Chula-ParasiteEgg-11/labels/training_set'
validation_set_labels_dir = '/kaggle/working/Chula-ParasiteEgg-11/labels/validation_set'

# Create output directories if they don't exist
os.makedirs(training_set_labels_dir,   exist_ok=True)
os.makedirs(validation_set_labels_dir, exist_ok=True)

def write_yolo_labels(dataframe, labels_dir):
    # Write one label file per image. Grouping by file name keeps every annotation
    # of an image (one line each) instead of overwriting the file for each row.
    for file_name, rows in dataframe.groupby('file_name'):
        stem = os.path.splitext(file_name)[0]    # Remove the '.jpg' extension
        lines = [
            f"{row['category_id']} {' '.join(map(str, row['bbox_yolo']))}"
            for _, row in rows.iterrows()
        ]
        output_filename = os.path.join(labels_dir, f"{stem}.txt")
        with open(output_filename, 'w') as text_file:
            text_file.write('\n'.join(lines) + '\n')

# Creating text files for training_dataframe
write_yolo_labels(training_dataframe, training_set_labels_dir)

# Creating text files for validation_dataframe
write_yolo_labels(validation_dataframe, validation_set_labels_dir)

#-------------------------------------------------------------------------------------------------------------#
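
The COCO-to-YOLO conversion stored in `bbox_yolo` can be checked on a single box. The following is a minimal sketch and not part of the original pipeline; the function name `coco_to_yolo` and the sample numbers are illustrative assumptions.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] (pixels) into a
    YOLO box [x_center, y_center, width, height] (normalized to the image size)."""
    x_min, y_min, box_width, box_height = bbox
    return [
        (x_min + box_width / 2) / image_width,    # normalized x of the box center
        (y_min + box_height / 2) / image_height,  # normalized y of the box center
        box_width / image_width,                  # normalized box width
        box_height / image_height,                # normalized box height
    ]

# Hypothetical example: a 200 x 100 px box with its top-left corner at (100, 100)
# inside an 800 x 400 px image.
print(coco_to_yolo([100, 100, 200, 100], 800, 400))
# prints [0.25, 0.375, 0.25, 0.25]
```

All four values lie between 0 and 1, which is a quick sanity check worth running on the converted column as well.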

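The 80-20 split above is applied to annotation rows. If an image carries more than one annotation, its rows can end up on both sides of the split. One possible way to avoid that, sketched here with the standard library only, is to split the unique file names and then assign rows by file name. The helper `split_by_image` and the sample records are hypothetical and not part of the original notebook.

```python
import random

def split_by_image(records, validation_fraction=0.2, seed=42):
    """Split annotation records so that all records of one image stay together."""
    file_names = sorted({record["file_name"] for record in records})
    random.Random(seed).shuffle(file_names)   # deterministic shuffle for a fixed seed
    n_validation = max(1, round(len(file_names) * validation_fraction))
    validation_names = set(file_names[:n_validation])
    training = [r for r in records if r["file_name"] not in validation_names]
    validation = [r for r in records if r["file_name"] in validation_names]
    return training, validation

# Hypothetical records: image 'a.jpg' has two annotations.
records = [
    {"file_name": "a.jpg", "category_id": 0},
    {"file_name": "a.jpg", "category_id": 3},
    {"file_name": "b.jpg", "category_id": 1},
    {"file_name": "c.jpg", "category_id": 2},
    {"file_name": "d.jpg", "category_id": 5},
    {"file_name": "e.jpg", "category_id": 7},
]
training, validation = split_by_image(records)
training_names = {r["file_name"] for r in training}
validation_names = {r["file_name"] for r in validation}

# No image appears in both sets.
print(training_names.isdisjoint(validation_names))
```
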
**Training the Model**

In [None]:
# Build the yolov8n architecture from 'yolov8n.yaml', then load the pretrained weights from 'yolov8n.pt'
# yolov8n is the nano (smallest) YOLOv8 model and is used here for object detection
# The YAML file specifies the model architecture; the .pt file holds the pretrained weights
model = YOLO('yolov8n.yaml').load('yolov8n.pt')  

# Specify the number of epochs for training
n_epochs = 30

# Training the model
results  = model.train(data="/kaggle/input/parasite-configuration/parasite_configure.yaml", epochs=n_epochs)
print('SUCCESSFULLY TRAINED THE MODEL!')
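
The file `parasite_configure.yaml` passed to `model.train` is not shown in this notebook. Below is a minimal sketch of what an Ultralytics dataset configuration for the directories created during pre-processing could look like. The class names `class_0` to `class_10` are placeholders (the actual names and their order are not stated here), the count of 11 is inferred from the dataset name, and the output file name is hypothetical.

```python
import os
import tempfile

# Placeholder class names; replace with the actual categories from labels.json.
class_names = [f"class_{i}" for i in range(11)]

# Dataset root plus image folders relative to it, followed by the id-to-name mapping.
lines = [
    "path: /kaggle/working/Chula-ParasiteEgg-11",
    "train: images/training_set",
    "val: images/validation_set",
    "names:",
]
lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
config_text = "\n".join(lines) + "\n"

# Hypothetical output location (a temporary directory, to avoid clobbering real files).
config_path = os.path.join(tempfile.gettempdir(), "parasite_configure_sketch.yaml")
with open(config_path, "w") as config_file:
    config_file.write(config_text)

print(config_text)
```

Ultralytics derives each label path from the image path by replacing the `images` directory with `labels`, which matches the `images/...` and `labels/...` layout created during pre-processing.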

**Evaluating the Model**

In [None]:
# Evaluate the model's performance on the validation set
metrics = model.val()
print('SUCCESSFULLY EVALUATED THE MODEL ON THE VALIDATION SET!')

**Packaging the results as ZIP archives for ease of downloading**

In [None]:
source_directory = '/kaggle/working/wandb'
zip_file_path    = '/kaggle/working/wandb.zip'
# make_archive expects the archive name without its extension
shutil.make_archive(os.path.splitext(zip_file_path)[0], 'zip', source_directory)

In [None]:
source_directory = '/kaggle/working/runs'
zip_file_path    = '/kaggle/working/runs.zip'
# make_archive expects the archive name without its extension
shutil.make_archive(os.path.splitext(zip_file_path)[0], 'zip', source_directory)

In [None]:
print('SUCCESSFULLY CREATED THE ZIP FILES!')