# Deep Learning for Deciphering Traffic Signs
## Fixing Test Data Notebook
_________________________________________________________________________________________________________________________________________________________________________________

##### Contributors:
 Victor Floriano, Yifan Fan, Jose Salerno

## Problem Statement & Motivation
As the world advances toward autonomous vehicles, our team has followed the efforts of large car manufacturers, who are working with data scientists to develop fully autonomous cars. We want to contribute to this technology by building a neural network model that classifies traffic signs. Our ultimate goal is to help carmakers overcome the challenges of deploying neural network models that reliably read traffic signs, advancing their work toward fully autonomous cars and assisted driving. We consider autonomous driving an important problem because of the economic benefits it can bring to car manufacturers and its potential to improve overall driving safety.

## Data Preparation
We selected the German Traffic Sign Recognition Benchmark (GTSRB) as our primary dataset. It is known for its complexity, with more than 50,000 images across 43 traffic-sign classes. The GTSRB is publicly available from two sources: the official archive (Stallkamp et al.) and a Kaggle mirror (Mykola), both listed in the References. To manage this large and varied dataset efficiently, our strategy combines preprocessing for uniformity, data augmentation for robustness, and batch processing for computational efficiency. We plan to use distributed computing to parallelize operations and speed up processing, and stratified sampling to run quick experiments on a subset without compromising its representativeness.
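Stratified sampling keeps each class's share of the data when drawing a smaller experimental subset, which matters for GTSRB because some sign classes are much rarer than others. A minimal sketch of the idea, assuming the labels are available as a list of class ids (for example, a `ClassId` column); the helper name `stratified_sample`, the toy labels, and the 10% fraction are hypothetical and not taken from the project. scikit-learn's `train_test_split(..., stratify=labels)` implements the same idea; this version uses only the standard library so the logic is visible.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return sorted indices of a subset that preserves each class's share.

    labels: sequence of class ids, one per image.
    fraction: share of each class to keep, with 0 < fraction <= 1.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Group image indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed -> reproducible subsets
    chosen = []
    for idxs in by_class.values():
        # Keep at least one image so rare classes never vanish from the subset
        k = max(1, round(len(idxs) * fraction))
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


# Hypothetical toy labels: class 0 is common, class 2 is rare
labels = [0] * 100 + [1] * 20 + [2] * 5
subset = stratified_sample(labels, fraction=0.1)
print(Counter(labels[i] for i in subset))  # Counter({0: 10, 1: 2, 2: 1})
```

The `max(1, ...)` floor slightly over-represents very small classes in tiny subsets; that trade-off keeps every class visible during quick experiments.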



---





# Fixing Test Dataset

We found that all 12,629 test images were stored in a single folder, whereas the training images are organized into one folder per class. To fix this, we used the provided CSV file (`Test.csv`), which lists the relative path and class label of each test image. We iterated over its rows, created a folder for each class label the first time it appeared, and copied each image into the folder for its class.
________________________________________________________________________________________________________________________________________________

Results:
- Created a new test folder, `Test_organized`, containing 43 class folders for the test data.
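Before copying thousands of files on Google Drive, the mapping described above can be checked as a dry run. The sketch below uses a hypothetical helper, `plan_test_copies`, that mirrors the copy loop's path logic (join the base path with each `Path`, place the file under a folder named after its `ClassId`, keep the original file name) but only returns the planned (source, destination) pairs. It also flags rows that would overwrite each other. It assumes the CSV has `Path` and `ClassId` columns, as the notebook's code does; the sample rows and paths are invented for illustration.

```python
import csv
import io
import os


def plan_test_copies(csv_file, base_path, new_base_path):
    """Return (source, destination) pairs for every row of a Test.csv-style file.

    Expects 'Path' and 'ClassId' columns; nothing on disk is read or written.
    """
    plan = []
    for row in csv.DictReader(csv_file):
        src = os.path.join(base_path, row["Path"])
        class_dir = os.path.join(new_base_path, str(int(row["ClassId"])))
        dst = os.path.join(class_dir, os.path.basename(row["Path"]))
        plan.append((src, dst))

    # Two rows pointing at the same destination would silently overwrite a file
    destinations = [dst for _, dst in plan]
    if len(destinations) != len(set(destinations)):
        raise ValueError("two test images map to the same destination file")
    return plan


# Made-up rows in a two-column shape (not copied from the real Test.csv)
sample = io.StringIO("ClassId,Path\n16,Test/00000.png\n1,Test/00001.png\n16,Test/00002.png\n")
for src, dst in plan_test_copies(sample, "/data/GTSRB", "/data/GTSRB/Test_organized"):
    print(src, "->", dst)
```

With the notebook's `csv_path`, `base_path`, and `new_base_path` defined, the real file could be passed in as `open(csv_path, newline="")` to preview the plan before any copying starts.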

```python
import numpy as np
import pandas as pd
import tensorflow as tf
import os

# Image related
import cv2
from PIL import Image

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns

# Time
import time
import datetime
```

```python
data = []
labels = []

# Define the total number of classes in the dataset
classes = 43

# Retrieve the current working directory (used later to load the data)
cur_path = os.getcwd()
```

```python
from google.colab import drive

# Mount Google Drive so the GTSRB files stored there are accessible
drive.mount('/content/drive')
```

```
Mounted at /content/drive
```


```python
import os
import shutil

import pandas as pd

# Folder that holds the original Kaggle GTSRB files (Test.csv and the flat Test/ folder)
base_path = '/content/drive/MyDrive/BU_MSBA/BA865 - Neural Networks/BA865 - Group Project/GTSRBkaggle/'

# Load the CSV that lists each test image's relative path and class label
csv_path = os.path.join(base_path, 'Test.csv')
y_test_df = pd.read_csv(csv_path)

# Destination for the reorganized test images (one sub-folder per class, no resizing)
new_base_path = os.path.join(base_path, 'Test_organized')
os.makedirs(new_base_path, exist_ok=True)

# Copy every image into the folder named after its ClassId
for i, (img_path, label) in enumerate(zip(y_test_df['Path'], y_test_df['ClassId']), start=1):
    full_path = os.path.join(base_path, img_path)
    new_dir_path = os.path.join(new_base_path, str(label))

    # Create the class folder the first time this label appears
    os.makedirs(new_dir_path, exist_ok=True)

    # Keep the original file name inside the class folder
    new_img_path = os.path.join(new_dir_path, os.path.basename(img_path))
    try:
        # Copy the file as-is (the image is not resized)
        shutil.copy(full_path, new_img_path)
    except OSError as e:
        print(f"Error processing image {full_path}: {e}")

    # Print progress every 100 images
    if i % 100 == 0:
        print(f"Processed {i} images")

print("Image organization complete.")
```


```
Processed 100 images
Processed 200 images
Processed 300 images
...
Processed 4500 images
Processed 4600 imag
```
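The captured progress log stops partway through, so on its own it does not show that every image reached the right folder. A hedged sketch of a follow-up check: `check_organized_counts` (a hypothetical helper, not part of the original notebook) counts rows per `ClassId` in the CSV and files per class folder under the organized directory, then reports any class where the two disagree. It assumes the layout produced by the copy cell (one sub-folder per class, named by the class id) and a CSV with a `ClassId` column.

```python
import csv
import os
from collections import Counter


def check_organized_counts(csv_path, new_base_path):
    """Compare per-class image counts in the CSV with files on disk.

    Returns {class_id: (expected, found)} for every class that disagrees;
    an empty dict means every class folder has the expected number of files.
    """
    # Expected counts, keyed by the same folder names the copy loop creates
    with open(csv_path, newline="") as f:
        expected = Counter(str(int(row["ClassId"])) for row in csv.DictReader(f))

    # Actual counts: regular files inside each class sub-folder
    found = Counter()
    for class_dir in os.listdir(new_base_path):
        class_path = os.path.join(new_base_path, class_dir)
        if os.path.isdir(class_path):
            found[class_dir] = sum(
                1 for name in os.listdir(class_path)
                if os.path.isfile(os.path.join(class_path, name))
            )

    return {
        cls: (expected[cls], found[cls])
        for cls in set(expected) | set(found)
        if expected[cls] != found[cls]
    }
```

In the notebook it could be called as `check_organized_counts(csv_path, new_base_path)`. It compares counts only, so it is a quick sanity check rather than proof that each individual file landed in the correct class.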

-----

# References: 
- Generative AI was used for debugging, code improvement, sentence structure, and grammar.
- Chollet, François. “The Keras Blog.” The Keras Blog, blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html. Accessed 24 Apr. 2024.
- Elhamod, Mohammad. “Transfer_Learning.Ipynb.” GitHub, 2024, github.com/elhamod/BA865-2024.git.  
- Mykola. “GTSRB - German Traffic Sign Recognition Benchmark.” Kaggle, 25 Nov. 2018, www.kaggle.com/datasets/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign. 
- Poojahira, Hiranandani. “Poojahira/Gtsrb-Pytorch: Pytorch Implementation of Kaggle GTSRB Challenge with 99.8% Accuracy.” GitHub, 2018, github.com/poojahira/gtsrb-pytorch. 
- Psomas, Bill. “Billpsomas/Traffic_signs_classification: German Traffic Signs Classification Using Neural Networks (MLP, Lenet, Alexnet, Vggnet, RestrictNet) in Tensorflow Framework.” GitHub, 2019, github.com/billpsomas/Traffic_Signs_Classification. 
- Saglani, Vatsal. “Multi-Class Image Classification Using CNN over Pytorch, and the Basics of CNN.” Medium, Medium, 20 Apr. 2020, thevatsalsaglani.medium.com/multi-class-image-classification-using-cnn-over-pytorch-and-the-basics-of-cnn-fdf425a11dc0. 
- Stallkamp, Johannes, et al. “German Traffic Sign Recognition Benchmark GTSRB.” Public Archive: DAAEAC0D7CE1152AEA9B61D9F1E19370, May 2019, sid.erda.dk/public/archives/daaeac0d7ce1152aea9b61d9f1e19370/published-archive.html.
- Tantai, Hengtao. “Use Weighted Loss Function to Solve Imbalanced Data Classification Problems.” Medium, Medium, 27 Feb. 2023, medium.com/@zergtant/use-weighted-loss-function-to-solve-imbalanced-data-classification-problems-749237f38b75. 
- Weights & Biases. “Sweep Configuration Structure: Weights & Biases Documentation.” Define Sweep Configuration for Hyperparameter Tuning, 2024, docs.wandb.ai/guides/sweeps/define-sweep-configuration.