# Deep Learning for Deciphering Traffic Signs
# SVM Notebook
_________________________________________________________________________________________________________________________________________________________________________________

##### Contributors:
 Victor Floriano, Yifan Fan, Jose Salerno

## Problem Statement & Motivation
As the automotive industry moves toward autonomous vehicles, large car manufacturers are working with data scientists to develop fully autonomous cars. Our team aims to contribute to this effort by building a neural network model that classifies traffic signs. Our ultimate goal is to help car makers overcome the challenges of deploying models that read traffic signs reliably, in support of fully autonomous or assisted driving. We consider autonomous driving an important problem because of the economic benefits it can generate for car manufacturers and its potential to improve general driving safety.

## Data Preparation
We selected the German Traffic Sign Recognition Benchmark (GTSRB) as our primary dataset. It is known for its complexity, with over 50,000 images across more than 40 classes of traffic signs, and it is publicly accessible through two resources. To manage a dataset of this size, our strategy combines preprocessing for uniformity, data augmentation for robustness, and batch processing for computational efficiency. We also plan to use distributed computing to parallelize operations and stratified sampling for quick experimentation without compromising representativeness.
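
The augmentation step mentioned above could look something like the following. This is a minimal sketch, not the team's actual pipeline: the helper `augment_image`, its parameters `max_shift` and `brightness_range`, and the synthetic input image are all assumptions made for illustration. It applies a small random translation and a brightness change to a grayscale image scaled to [0, 1]. Flips are deliberately left out, since mirroring can change the meaning of directional signs.

```python
import numpy as np

def augment_image(img, rng, max_shift=2, brightness_range=(0.8, 1.2)):
    """Return a randomly shifted and brightness-scaled copy of a 2-D image in [0, 1]."""
    h, w = img.shape
    # Pad with edge pixels, then crop a window offset by a random shift.
    padded = np.pad(img, max_shift, mode="edge")
    dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
    shifted = padded[dy:dy + h, dx:dx + w]
    # Scale brightness and clip back into the valid range.
    factor = rng.uniform(*brightness_range)
    return np.clip(shifted * factor, 0.0, 1.0)

rng = np.random.default_rng(42)
image = rng.random((32, 32))  # synthetic stand-in for a 32x32 grayscale sign
augmented = augment_image(image, rng)
print(augmented.shape, augmented.min(), augmented.max())
```

Augmented copies would normally be generated for the training split only, so that validation scores still reflect unmodified images.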



---





# SVM - Baseline Model 

 
________________________________________________________________________________________________________________________________________________

Results: 

    - 

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#Runtime ~22min
import os
import cv2
import numpy as np

def load_images_and_labels(base_path, max_images=100000):
    data = []
    labels = []
    image_count = 0  # Initialize a counter for the images
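    # Note: sorted() orders names lexicographically ('10' before '2'), so labels match
    # numeric class IDs only if the folder names are zero-padded.
    classes = sorted(os.listdir(base_path))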
    total_classes = len(classes)

    for label, cls in enumerate(classes):
        cls_folder = os.path.join(base_path, cls)
        if os.path.isdir(cls_folder):
            image_files = os.listdir(cls_folder)
            total_images = len(image_files)

            for idx, img_filename in enumerate(image_files):
                if image_count >= max_images:
                    print("Reached the maximum number of images to process.")
                    return np.array(data), np.array(labels)  # Return the data collected so far

                img_path = os.path.join(cls_folder, img_filename)
                img = cv2.imread(img_path)
                if img is not None:
                    img = cv2.resize(img, (32, 32))
                    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                    img = img.flatten() /255 #Flattens and normalizes the images
                    data.append(img)
                    labels.append(label)
                    image_count += 1  # Increment the counter

                #Print progress for each class every 1000 images and at the last image
                if (idx + 1) % 1000 == 0 or idx == total_images - 1:
                    print(f"Processed {idx + 1}/{total_images} images in class {label + 1}/{total_classes} ({cls})")

    return np.array(data), np.array(labels)


base_path = '/content/drive/MyDrive/BU_MSBA/BA865 - Neural Networks/BA865 - Group Project/GTSRBkaggle/Train'

#Load data
train_data, train_labels = load_images_and_labels(base_path)


In [15]:
from sklearn.model_selection import train_test_split

#Split data into training and validation sets
X_train, X_validation, y_train, y_validation = train_test_split(train_data, train_labels, test_size=0.2, random_state=42)
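
Because the classes differ widely in size, one option worth considering is a stratified split, which keeps class proportions roughly the same in both subsets. The sketch below is an assumption, not what the cell above does: it uses synthetic arrays (`X_demo`, `y_demo`) in place of the real data and passes `stratify=` to `train_test_split`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.random((1000, 16))                  # synthetic features
y_demo = np.repeat([0, 1, 2], [700, 250, 50])    # deliberately imbalanced labels

# stratify=y_demo asks for the same class proportions in both subsets.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

full_share = np.bincount(y_demo) / len(y_demo)
val_share = np.bincount(y_val) / len(y_val)
print("Class share in full data: ", full_share)
print("Class share in validation:", val_share)
```

With the real data, the equivalent change would be adding `stratify=train_labels` to the existing call.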


In [16]:
#Runtime ~9min
from sklearn.svm import SVC

#Since our features are already scaled to [0, 1], we skip StandardScaler()
svm_classifier = SVC(kernel='rbf', random_state=42) #RBF is SVC's default kernel and a common choice for image classification

#Train the model on the training data
svm_classifier.fit(X_train, y_train)
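
To make the kernel choice more concrete, the sketch below computes an RBF kernel matrix by hand and compares it with scikit-learn's `rbf_kernel`. Since the classifier above does not set `gamma`, it uses scikit-learn's default `gamma='scale'`, which is `1 / (n_features * X.var())`. The array `X_demo` is synthetic and stands in for flattened 32x32 images; it is an assumption for illustration only.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_demo = rng.random((5, 1024))  # five synthetic "images", values in [0, 1)

# Value that gamma='scale' resolves to for this data.
gamma = 1.0 / (X_demo.shape[1] * X_demo.var())

# RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)
sq_dists = ((X_demo[:, None, :] - X_demo[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

K_sklearn = rbf_kernel(X_demo, gamma=gamma)
print("Kernels match:", np.allclose(K_manual, K_sklearn))
```

Because `gamma` depends on the variance of the inputs, rescaling the features changes the effective kernel width, which is one reason to be consistent about normalization.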


In [22]:
#Runtime ~ 4min
from sklearn.metrics import accuracy_score, classification_report

#Generate predictions on validation data
y_val_pred = svm_classifier.predict(X_validation)

#Calculate accuracy
accuracy = accuracy_score(y_validation, y_val_pred)
print(f"Validation Accuracy: {accuracy:.2f}")
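
A single accuracy figure hides which classes get mistaken for each other. One possible follow-up, sketched here on small made-up label arrays (`y_true_demo` and `y_pred_demo` are hypothetical, not results from this notebook), is to build a confusion matrix and look up the most frequent misclassification.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only.
y_true_demo = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred_demo = np.array([0, 0, 1, 1, 2, 1, 1, 2, 2])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo)

# Per-class recall: correct predictions divided by the row total.
recall_per_class = cm.diagonal() / cm.sum(axis=1)

# Zero the diagonal so only errors remain, then find the largest entry.
errors = cm.copy()
np.fill_diagonal(errors, 0)
true_cls, pred_cls = np.unravel_index(errors.argmax(), errors.shape)
print(f"Most common error: true class {true_cls} predicted as {pred_cls}")
```

On the real data, the same steps would take `y_validation` and `y_val_pred` as inputs.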

In [23]:
#Check accuracy by class
class_report = classification_report(y_validation, y_val_pred)
print("Classification Report by Class:\n", class_report)

Classification Report by Class:
               precision    recall  f1-score   support

           0       1.00      0.71      0.83        38
           1       0.91      0.89      0.90       496
           2       0.86      0.85      0.86       451
           3       0.69      0.73      0.71       281
           4       0.85      0.83      0.84       417
           5       0.67      0.75      0.71       357
           6       0.84      0.75      0.80        65
           7       0.90      0.77      0.83       254
           8       0.57      0.83      0.68       303
           9       0.99      0.89      0.94       276
          10       0.65      0.92      0.76       395
          11       0.89      0.92      0.90       252
          12       0.83      0.93      0.87       442
          13       0.98      0.96      0.97       457
          14       0.99      0.92      0.96       143
          15       0.94      0.87      0.90       108
          16       1.00      0.81      0.90     

----

# Sources:
- Generative AI was used for debugging, code improvement, sentence structure, and grammar.