<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Capstone Project - TheiaVision: Object Detection Technology for PMD Safety Alerts

> Author: Ng Wei
---

**Problem Statement:**  
How can we enhance the safety of Personal Mobility Devices (PMDs) in urban environments by using object detection to improve PMD users' ability to perceive and respond to their surroundings?

**Target Audience:**
Management team of a PMD manufacturer

**Summary:**
This project aims to develop an object detection system that identifies obstacles such as pedestrians, vehicles, and traffic signs. By leveraging YOLO, a CNN-based object detection model, this system would help Darren, a project manager, lead the development of an alert system built on object detection technology.

There are a total of six notebooks for this project:  
 1. `01_EDA.ipynb`   
 2. `02_Modelling_Pytorch_SimpleCNN_SGTrafficSign.ipynb`   
 3. `03_Modelling_YOLOv8_labeled_SGTrafficSign.ipynb`
 4. `04_Merge_MultiDataset.ipynb`
 5. `05_Modelling_YOLOv8_combined_data.ipynb`
 6. `06_YOLOv8_Hyperparameter_Tuning.ipynb`

---
**This Notebook**
- We download an additional dataset (the Udacity Self Driving Car dataset) and merge it with the labeled Singapore Traffic Sign dataset.

# 4.1 Download the Udacity Self Driving Car Dataset (Labeled on Roboflow)
This section installs the Roboflow Python package and downloads the labeled Self Driving Car dataset in YOLOv8 format.

## 4.1.1 Download an additional dataset from Roboflow to merge with the existing training dataset
The public dataset link: [Roboflow](https://public.roboflow.com/object-detection/self-driving-car/)

In [None]:
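!pip install roboflow

import os
from roboflow import Roboflow

# Read the API key from an environment variable instead of hard-coding it in the notebook
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])
project = rf.workspace("roboflow-gw7yv").project("self-driving-car")
version = project.version(3)
dataset = version.download("yolov8")  # Download images and labels in YOLOv8 format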

## 4.1.2 Merge the two datasets into `Capstone/code` as `Master_Dataset`

# 4.2 Modify label files in `Master_Dataset`

- We are merging the Udacity dataset into the Traffic Sign dataset. However, both datasets number their classes from 0, so their class indices overlap and the Udacity labels must be renumbered.
- Since the traffic sign dataset has 7 classes (classes 0 to 6), the Udacity class indices are shifted by 7 so that they start at class 7 (class 0 --> class 7).
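Each line of a YOLO label file has the form `class x_center y_center width height`, with the box coordinates normalized to the image size, so only the first field needs to change. A minimal sketch of the shift on a single line (the function name `offset_label_line` is hypothetical, used only for illustration):

```python
def offset_label_line(line: str, offset: int) -> str:
    """Add `offset` to the class index (first field) of one YOLO label line."""
    class_index, *coords = line.split()
    # The bounding-box coordinates are kept exactly as written.
    return " ".join([str(int(class_index) + offset), *coords])

# A Udacity 'class 0' box becomes class 7 in the merged dataset.
print(offset_label_line("0 0.512 0.430 0.061 0.098", 7))  # prints "7 0.512 0.430 0.061 0.098"
```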

## 4.2.1 Modify label files for the Udacity dataset by incrementing the class index

In [None]:
import os

def adjust_label_indices(label_dir, adjustment):
    """Shift the class index of every annotation in a folder of YOLO label files.

    The files are rewritten in place, so run this cell only once.
    """
    for label_file in os.listdir(label_dir):
        if not label_file.endswith('.txt'):
            continue  # Skip non-label files
        path = os.path.join(label_dir, label_file)
        with open(path, 'r+') as file:
            lines = file.readlines()
            file.seek(0)
            file.truncate()  # Clear existing content before rewriting
            for line in lines:
                parts = line.split()
                if not parts:
                    continue  # Skip blank lines
                parts[0] = str(int(parts[0]) + adjustment)  # Adjust the class index
                file.write(" ".join(parts) + "\n")

# Shift the self-driving car labels so that their class indices start at 7
adjust_label_indices('/home/mangguai/capstone/code/Master_Dataset/export/labels', 7)
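Because `adjust_label_indices` rewrites the files in place, re-running the cell would shift the indices a second time (class 7 would become 14). One way to guard against this, sketched under the assumption that the Udacity labels originally start at 0 and that each `.txt` file holds YOLO-format lines, is to check the smallest class index before shifting. The helper name `min_class_index` is hypothetical.

```python
import os

def min_class_index(label_dir):
    """Return the smallest class index in the .txt label files, or None if there are none."""
    smallest = None
    for label_file in os.listdir(label_dir):
        if not label_file.endswith('.txt'):
            continue  # Only YOLO label files are inspected
        with open(os.path.join(label_dir, label_file)) as file:
            for line in file:
                parts = line.split()
                if parts:  # Ignore blank lines
                    index = int(parts[0])
                    smallest = index if smallest is None else min(smallest, index)
    return smallest

# Only shift when some labels still use the original Udacity indices (below 7):
# smallest = min_class_index(label_dir)
# if smallest is not None and smallest < 7:
#     adjust_label_indices(label_dir, 7)
```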

The `count_classes` function reads every label file in a directory and counts how many annotated objects (instances) belong to each class index. It stores the counts in a `defaultdict` from Python's `collections` module, reads each file line by line, and takes the first field of each line as the class index. The example prints the count for each class and the number of unique classes found.


In [None]:
import os
from collections import defaultdict

def count_classes(label_dir):
    class_counts = defaultdict(int)  # Number of annotated objects per class index

    for label_file in os.listdir(label_dir):
        if not label_file.endswith('.txt'):
            continue  # Skip non-label files
        path = os.path.join(label_dir, label_file)
        with open(path, 'r') as file:
            for line in file:
                parts = line.split()
                if parts:  # Skip blank lines
                    class_counts[int(parts[0])] += 1  # Increment count for this class

    return class_counts

label_dir = '/home/mangguai/capstone/code/Master_Dataset/export/labels'
class_counts = count_classes(label_dir)
print("Class counts:")
for class_index in sorted(class_counts):
    print(f"Class {class_index}: {class_counts[class_index]} instances")
print("Number of unique classes:", len(class_counts))
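`count_classes` counts annotated objects, so an image with five cars adds five to the car class. When judging class balance it can also help to know how many images contain each class. A minimal sketch that computes both, assuming the input maps each label file name to its text (the function name `instance_and_image_counts` and the toy labels are hypothetical):

```python
from collections import Counter

def instance_and_image_counts(label_texts):
    """Count annotations per class and the number of images containing each class.

    `label_texts` maps a label file name to the text of that YOLO label file.
    """
    instances = Counter()
    images = Counter()
    for text in label_texts.values():
        classes_in_image = [int(line.split()[0]) for line in text.splitlines() if line.strip()]
        instances.update(classes_in_image)
        images.update(set(classes_in_image))  # Each class counted at most once per image
    return instances, images

labels = {
    "a.txt": "8 0.1 0.1 0.1 0.1\n8 0.3 0.3 0.1 0.1\n9 0.5 0.5 0.1 0.1\n",
    "b.txt": "8 0.2 0.2 0.1 0.1\n",
}
instances, images = instance_and_image_counts(labels)
print(instances[8], images[8])  # prints "3 2"
```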


The `find_label_files_with_classes` function collects up to five example label files for each class index in a given set, which makes it easy to open a few images per class and confirm what each index represents. It groups file names by class index in a `defaultdict(list)` from the `collections` module, reads each label file line by line, and records a file under a class only once, even when the image contains several objects of that class. The example usage prints the files found for each class in `class_range`, one file name per line.


In [None]:
import os
from collections import defaultdict

def find_label_files_with_classes(label_dir, class_range, max_files=5):
    class_files = defaultdict(list)  # Lists of example files for each class index

    for label_file in os.listdir(label_dir):
        if not label_file.endswith('.txt'):
            continue  # Skip non-label files
        path = os.path.join(label_dir, label_file)
        with open(path, 'r') as file:
            found_classes = set()  # Avoid adding the same file twice for the same class
            for line in file:
                parts = line.split()
                if not parts:
                    continue  # Skip blank lines
                class_index = int(parts[0])  # Get the class index from each line
                if class_index in class_range and class_index not in found_classes:
                    if len(class_files[class_index]) < max_files:  # Keep at most max_files per class
                        class_files[class_index].append(label_file)
                    found_classes.add(class_index)

    return class_files

# Define the class indices you are interested in
class_range = set(range(7, 18))  # Classes 7 to 17

label_dir = '/home/mangguai/capstone/code/Master_Dataset/export/labels'
class_files = find_label_files_with_classes(label_dir, class_range)

# Print the results in class order, one file name per line
for class_index in sorted(class_range):
    if class_index in class_files:  # Check whether any files were found for the class
        files = "\n".join(class_files[class_index])
        print(f"Class {class_index} has the following label files:\n{files}\n")
    else:
        print(f"Class {class_index} has no label files.\n")


## 4.2.2 Merge Traffic Light "Left" Variants into Their Base Traffic Light Classes

To consolidate the traffic light classes with their "left" directional variants and keep the class indices contiguous, the following mapping is applied:

- **Biker (7)** remains as class 7.
- **Car (8)** remains as class 8.
- **Pedestrians (9)** remains as class 9.
- **Traffic Light (10)** remains as class 10.
- **Traffic Light-Green (11)** merges with **Traffic Light-GreenLeft (12)** to form the new class 11.
- **Traffic Light-Red (13)** merges with **Traffic Light-RedLeft (14)** to form the new class 12.
- **Traffic Light-Yellow (15)** merges with **Traffic Light-YellowLeft (16)** to form the new class 13.
- **Truck (17)** is reassigned to class 14.

Merging each standard traffic light class with its "left" variant and renumbering the classes that follow reduces the Udacity classes from 11 (indices 7 to 17) to 8 (indices 7 to 14). The updated class structure is:

- **7**: Biker
- **8**: Car
- **9**: Pedestrians
- **10**: Traffic Light
- **11**: Traffic Light-Green (merged with Traffic Light-GreenLeft)
- **12**: Traffic Light-Red (merged with Traffic Light-RedLeft)
- **13**: Traffic Light-Yellow (merged with Traffic Light-YellowLeft)
- **14**: Truck
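The same mapping can be derived from the class names instead of being typed by hand, which avoids off-by-one mistakes when renumbering. A minimal sketch, assuming the Udacity class names are exactly those listed above in old-index order (the function name `merge_left_variants` is hypothetical):

```python
def merge_left_variants(names, first_index=7):
    """Map old class indices to new ones, folding each '...Left' name into its base name.

    `names` lists the class names in old-index order, starting at `first_index`.
    Returns (mapping, new_names).
    """
    new_names = []
    mapping = {}
    for offset, name in enumerate(names):
        base = name[:-len("Left")] if name.endswith("Left") else name
        if base not in new_names:
            new_names.append(base)  # First occurrence defines the new index
        mapping[first_index + offset] = first_index + new_names.index(base)
    return mapping, new_names

udacity_names = [
    "Biker", "Car", "Pedestrians", "Traffic Light",
    "Traffic Light-Green", "Traffic Light-GreenLeft",
    "Traffic Light-Red", "Traffic Light-RedLeft",
    "Traffic Light-Yellow", "Traffic Light-YellowLeft", "Truck",
]
mapping, new_names = merge_left_variants(udacity_names)
print(mapping)  # Same old -> new pairs as the mapping in the next cell
```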

In [None]:
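import os

def adjust_classes(label_dir, mapping):
    for label_file in os.listdir(label_dir):
        if not label_file.endswith('.txt'):
            continue  # Skip non-label files
        path = os.path.join(label_dir, label_file)
        with open(path, 'r+') as file:
            lines = file.readlines()
            file.seek(0)
            file.truncate()  # Clear the file before rewriting it
            for line in lines:
                parts = line.split()
                if not parts:
                    continue  # Skip blank lines
                class_index = int(parts[0])
                if class_index in mapping:
                    parts[0] = str(mapping[class_index])  # Update class index based on mapping
                file.write(" ".join(parts) + "\n")

# Old index -> new index; each "Left" variant is folded into its base traffic light class
mapping = {
    7: 7,    # Biker
    8: 8,    # Car
    9: 9,    # Pedestrians
    10: 10,  # Traffic Light
    11: 11,  # Traffic Light-Green
    12: 11,  # Traffic Light-GreenLeft -> Traffic Light-Green
    13: 12,  # Traffic Light-Red
    14: 12,  # Traffic Light-RedLeft -> Traffic Light-Red
    15: 13,  # Traffic Light-Yellow
    16: 13,  # Traffic Light-YellowLeft -> Traffic Light-Yellow
    17: 14   # Truck
}

# Run this cell only once: a second pass would remap the new indices again (e.g. 12 -> 11)
label_dir = '/home/mangguai/capstone/code/Master_Dataset/export/labels'
adjust_classes(label_dir, mapping)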


To verify that the traffic light variants were merged and the indices renumbered correctly, the class indices in the label files are counted again. After the merge, only the following classes should appear:

- **7**: Biker
- **8**: Car
- **9**: Pedestrians
- **10**: Traffic Light
- **11**: Traffic Light-Green (merged with Traffic Light-GreenLeft)
- **12**: Traffic Light-Red (merged with Traffic Light-RedLeft)
- **13**: Traffic Light-Yellow (merged with Traffic Light-YellowLeft)
- **14**: Truck

If the printed counts contain exactly these eight indices (7 to 14), the merge matches the intended class structure.


In [None]:
import os
from collections import defaultdict

def count_classes(label_dir):
    class_counts = defaultdict(int)  # Number of annotated objects per class index

    for label_file in os.listdir(label_dir):
        if not label_file.endswith('.txt'):
            continue  # Skip non-label files
        path = os.path.join(label_dir, label_file)
        with open(path, 'r') as file:
            for line in file:
                parts = line.split()
                if parts:  # Skip blank lines
                    class_counts[int(parts[0])] += 1  # Increment count for this class

    return class_counts

label_dir = '/home/mangguai/capstone/code/Master_Dataset/export/labels'
class_counts = count_classes(label_dir)
print("Class counts:")
for class_index in sorted(class_counts):
    print(f"Class {class_index}: {class_counts[class_index]} instances")
print("Number of unique classes:", len(class_counts))
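Rather than reading the printed counts by eye, the check can be made explicit. A minimal sketch, assuming `class_counts` is the dictionary returned by `count_classes` and that the expected indices after the merge are 7 to 14; the helper name `check_class_indices` and the toy counts are hypothetical:

```python
def check_class_indices(class_counts, expected=range(7, 15)):
    """Return (missing, unexpected) class indices compared with the expected set."""
    expected = set(expected)
    present = set(class_counts)
    return sorted(expected - present), sorted(present - expected)

# Toy counts: classes 9, 10, 12, 13, 14 are absent and 16 should have been merged away
missing, unexpected = check_class_indices({7: 1, 8: 1, 11: 1, 16: 1})
print(missing, unexpected)  # prints "[9, 10, 12, 13, 14] [16]"
```

On the real labels, both lists should be empty.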


Next, update `data.yaml` so that its class count and class names match the modified class indices above.
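A minimal sketch for producing the class-related entries, assuming the Ultralytics `data.yaml` convention of an `nc` count and a `names` mapping from class index to name; the dataset path entries are not shown. The seven traffic-sign class names are not listed in this notebook, so `sign_names` below uses placeholders (`sign_0` to `sign_6`) that must be replaced with the real names from the traffic-sign `data.yaml`. The helper `names_block` is hypothetical.

```python
def names_block(sign_names, udacity_names):
    """Build the `nc` and `names` lines of a YOLO data.yaml for the merged dataset."""
    names = list(sign_names) + list(udacity_names)  # Traffic-sign classes keep indices 0-6
    lines = [f"nc: {len(names)}", "names:"]
    lines += [f"  {index}: '{name}'" for index, name in enumerate(names)]
    return "\n".join(lines) + "\n"

sign_names = [f"sign_{i}" for i in range(7)]  # Placeholders: replace with the real names
udacity_names = ["Biker", "Car", "Pedestrians", "Traffic Light",
                 "Traffic Light-Green", "Traffic Light-Red", "Traffic Light-Yellow", "Truck"]
print(names_block(sign_names, udacity_names))
```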

---

# 4.3 Combining `Self Driving Car` image files into the `Singapore Traffic Sign` Train/Test/Valid folders

The `Self Driving Car` images and labels are stored in the `export` folder and need to be merged into the `Singapore Traffic Sign` train/test/valid folders.

## 4.3.1 Dataset Splitting Summary

#### 4.3.1.1 Objective
To split the dataset into training (70%), validation (20%), and testing (10%) sets while keeping rarer classes represented in proportion across all three splits.

#### 4.3.1.2 Process Overview
1. **Count Class Frequencies:** Calculate the frequency of each class appearing in the label files. This helps identify rare classes.
2. **Assign Images to Splits Based on Rarity:** Tag each image with the rarest class it contains and use that tag when assigning images to splits. The goal is for images containing rare classes to be spread across the training, validation, and testing sets in the same 70/20/10 proportion, rather than concentrated in one split.
3. **Distribute Files:** Move images and their corresponding label files into the respective directories based on their assigned split.
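Step 2 can be made concrete with a stratified split: group images by the rarest class they contain, then cut every group separately at the 70/20/10 boundaries. A stdlib-only sketch, assuming `image_classes` maps each label file name to the set of class indices it contains; here rarity is measured by the number of images containing a class, and the helper names and toy data are hypothetical:

```python
import random
from collections import Counter, defaultdict

def rarest_class(classes, class_counts):
    """Return the class in `classes` that appears in the fewest images (ties: lowest index)."""
    return min(classes, key=lambda c: (class_counts[c], c))

def stratified_split(image_classes, ratios=(0.7, 0.2, 0.1), seed=42):
    """Split image ids into train/valid/test, stratified by each image's rarest class."""
    class_counts = Counter(c for classes in image_classes.values() for c in classes)
    groups = defaultdict(list)
    for image_id, classes in image_classes.items():
        groups[rarest_class(classes, class_counts)].append(image_id)

    rng = random.Random(seed)  # Fixed seed for reproducible splits
    splits = {"train": [], "valid": [], "test": []}
    for label in sorted(groups):
        members = sorted(groups[label])  # Sort first so the shuffle is reproducible
        rng.shuffle(members)
        n_train = round(len(members) * ratios[0])
        n_valid = round(len(members) * ratios[1])
        splits["train"] += members[:n_train]
        splits["valid"] += members[n_train:n_train + n_valid]
        splits["test"] += members[n_train + n_valid:]  # Remainder goes to test
    return splits

# Toy data: 100 car-only images and 10 images that also contain a rare class 13
image_classes = {f"car_{i}.txt": {8} for i in range(100)}
image_classes.update({f"yellow_{i}.txt": {8, 13} for i in range(10)})
splits = stratified_split(image_classes)
print({name: len(files) for name, files in splits.items()})  # prints "{'train': 77, 'valid': 22, 'test': 11}"
```

Without stratification, a random 10% test cut could easily contain none of the ten rare images; here each split receives its share (7/2/1).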

#### 4.3.1.3 Implementation Details
- **Directories Involved:**
  - Source Directory: `/home/mangguai/capstone/code/Master_Dataset/export`
  - Target Directories:
    - Training: `/home/mangguai/capstone/code/Master_Dataset/train`
    - Validation: `/home/mangguai/capstone/code/Master_Dataset/valid`
    - Testing: `/home/mangguai/capstone/code/Master_Dataset/test`
- **File Management:** Ensure each image and its corresponding label file are moved together to maintain consistency between image data and annotations.

#### 4.3.1.4 Considerations
- **File Formats:** Adjust the script if your image or label files use formats other than .jpg for images or .txt for labels.
- **Random Seed:** Set a random seed (e.g., 42) for reproducibility of the splits.
- **Directory Checks:** Verify the existence of the target directories before running the script or incorporate directory creation logic within the script.

#### 4.3.1.5 Execution
Run the Python script below to perform the split. Afterwards, check that every image was moved together with its label file and that each class is spread across the splits in roughly the 70/20/10 ratio.


In [None]:
import os
import shutil
from collections import defaultdict
from sklearn.model_selection import train_test_split

def count_classes(label_dir):
    class_counts = defaultdict(int)    # Annotations per class index
    image_classes = defaultdict(set)   # Class indices present in each label file

    for label_file in os.listdir(label_dir):
        if not label_file.endswith('.txt'):
            continue  # Skip non-label files
        path = os.path.join(label_dir, label_file)
        with open(path, 'r') as file:
            for line in file:
                parts = line.split()
                if not parts:
                    continue  # Skip blank lines
                class_index = int(parts[0])
                class_counts[class_index] += 1
                image_classes[label_file].add(class_index)
    # Label files without annotations never enter image_classes, so they are not split
    return class_counts, image_classes

def split_datasets(base_dir, output_dirs, split_ratio=(0.7, 0.2, 0.1), image_ext='.jpg'):
    label_dir = os.path.join(base_dir, 'labels')
    image_dir = os.path.join(base_dir, 'images')

    class_counts, image_classes = count_classes(label_dir)

    # Tag each label file with the rarest class it contains. train_test_split shuffles,
    # so sorting by rarity alone would be undone; stratifying on this tag keeps rare
    # classes in proportion across the splits (each tag needs at least 2 files).
    label_files = sorted(image_classes)
    rarest = {f: min(image_classes[f], key=lambda c: (class_counts[c], c)) for f in label_files}

    # Split off the test set, then divide the remainder into training and validation
    train_val, test = train_test_split(
        label_files, test_size=split_ratio[2], random_state=42,
        stratify=[rarest[f] for f in label_files])
    train, val = train_test_split(
        train_val, test_size=split_ratio[1] / (split_ratio[0] + split_ratio[1]), random_state=42,
        stratify=[rarest[f] for f in train_val])

    def move_files(files, dest_dir):
        # Create the destination folders if they do not exist yet
        os.makedirs(os.path.join(dest_dir, 'labels'), exist_ok=True)
        os.makedirs(os.path.join(dest_dir, 'images'), exist_ok=True)
        for file_name in files:
            # Move the label file
            shutil.move(os.path.join(label_dir, file_name), os.path.join(dest_dir, 'labels', file_name))
            # Move the matching image file (assumes all images share the same extension)
            image_file_name = os.path.splitext(file_name)[0] + image_ext
            shutil.move(os.path.join(image_dir, image_file_name), os.path.join(dest_dir, 'images', image_file_name))

    # Move files to their respective directories
    move_files(train, output_dirs['train'])
    move_files(val, output_dirs['valid'])
    move_files(test, output_dirs['test'])

base_dir = '/home/mangguai/capstone/code/Master_Dataset/export'
output_dirs = {
    'train': '/home/mangguai/capstone/code/Master_Dataset/train',
    'valid': '/home/mangguai/capstone/code/Master_Dataset/valid',
    'test': '/home/mangguai/capstone/code/Master_Dataset/test'
}
split_datasets(base_dir, output_dirs)
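To confirm that the class distribution goals are met, the annotation counts of each split can be compared side by side. A minimal sketch, assuming each split folder contains a `labels` subfolder of YOLO `.txt` files (as in `output_dirs` above); `split_class_shares` is a hypothetical helper that returns, for each class, the fraction of its annotations that landed in each split:

```python
import os
from collections import Counter

def split_class_shares(output_dirs):
    """Return {class_index: {split_name: share}} based on annotation counts per split."""
    counts = {}
    for split_name, split_dir in output_dirs.items():
        label_dir = os.path.join(split_dir, 'labels')
        counter = Counter()
        for label_file in os.listdir(label_dir):
            if not label_file.endswith('.txt'):
                continue  # Only YOLO label files are counted
            with open(os.path.join(label_dir, label_file)) as file:
                counter.update(int(line.split()[0]) for line in file if line.strip())
        counts[split_name] = counter

    shares = {}
    for class_index in sorted(set().union(*counts.values())):
        total = sum(counter[class_index] for counter in counts.values())
        shares[class_index] = {name: counter[class_index] / total for name, counter in counts.items()}
    return shares

# For a 70/20/10 split, each class should land at roughly 0.7 / 0.2 / 0.1:
# for class_index, share in split_class_shares(output_dirs).items():
#     print(class_index, {name: round(value, 2) for name, value in share.items()})
```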


Out of the total dataset, **3500 images** were identified as having no associated labels, so the split above left them in the `export` folder. These leftover images were not used, since the supervised training in this project relies on labeled data.
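A minimal sketch for listing these leftover images, assuming a label file counts as empty when it is missing or contains only whitespace, and that each label file shares its image's file name stem; `find_unlabeled_images` is a hypothetical helper:

```python
import os

def find_unlabeled_images(image_dir, label_dir):
    """Return image file names whose YOLO label file is missing or has no annotations."""
    unlabeled = []
    for image_file in sorted(os.listdir(image_dir)):
        stem, _ = os.path.splitext(image_file)
        label_path = os.path.join(label_dir, stem + '.txt')
        if not os.path.exists(label_path):
            unlabeled.append(image_file)  # No label file at all
            continue
        with open(label_path) as file:
            if not file.read().strip():
                unlabeled.append(image_file)  # Label file exists but is empty
    return unlabeled

# leftovers = find_unlabeled_images(
#     '/home/mangguai/capstone/code/Master_Dataset/export/images',
#     '/home/mangguai/capstone/code/Master_Dataset/export/labels')
# print(len(leftovers))
```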