# Capstone Project: German Traffic Sign Classification Using Convolutional Neural Networks

<img src="https://www.rac.co.uk/drive/_next/image/?url=https%3A%2F%2Fimages.contentstack.io%2Fv3%2Fassets%2Fblt1bd0ccbd0b7d1870%2Fbltc62cdaa8d7c55a69%2F68a30fa384813bd7029e48b1%2Fgerman-road-signs-header.jpg%3Fwidth%3D450%26quality%3D100%26crop%3D4%253A3%26gravity%3Dcenter&w=1920&q=75">

## 🎯 Project Objective

The goal of this project is to design and evaluate a deep-learning-based image classification system capable of recognizing German traffic signs.  
The project compares a **custom Convolutional Neural Network (CNN)** with a **transfer learning approach using a pretrained ResNet model** in order to analyze performance differences.

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import cv2
import pandas as pd
import os
import numpy as np
import seaborn as sns
from PIL import Image
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D, Dropout, BatchNormalization, Reshape
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.utils import to_categorical

## 📁 Dataset Description

This project uses the **German Traffic Sign Recognition Benchmark (GTSRB)** dataset, which contains more than **50,000 images across 43 traffic sign classes**.

The dataset includes real-world challenges such as:
- Varying image resolutions
- Different lighting conditions
- Motion blur and partial occlusions
- Changes in viewing angles

These characteristics make GTSRB a realistic and widely used benchmark for traffic sign recognition tasks.

In [None]:
train=pd.read_csv('/kaggle/input/gtsrb-german-traffic-sign/Train.csv')
train_img_path='/kaggle/input/gtsrb-german-traffic-sign/Train/'

In [None]:
test=pd.read_csv('/kaggle/input/gtsrb-german-traffic-sign/Test.csv')
test_img_path='/kaggle/input/gtsrb-german-traffic-sign/Test/'

In [None]:
meta=pd.read_csv('/kaggle/input/gtsrb-german-traffic-sign/Meta.csv')
meta_img_path='/kaggle/input/gtsrb-german-traffic-sign/Meta/'

In [None]:
labels=['7', '17', '19', '22', '2', '35', '23', '10', '5', '36', '20', '27', '41', '39', '32', '25', '42', 
        '8', '38', '12', '0', '31', '34', '18', '28', '16', '13', '26', '15', '3', '1', '30', '14', '4', 
        '9', '21', '40', '6', '11', '37', '33', '29', '24']

In [None]:
img_list1=[]
label_list1=[]

for label in labels:
    for img_file in os.listdir(train_img_path+label):
        img_list1.append(train_img_path+label+"/"+img_file)
        label_list1.append(label)
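The loop above relies on the hard-coded `labels` list of folder names. Below is a minimal sketch of the same pass that derives each label from its folder name instead. It assumes, as the notebook does, that every sub-folder of `Train/` is named after its integer ClassId. `collect_image_labels` is a hypothetical helper, not part of the notebook, and the accepted extensions (`.png`, `.ppm`) are an assumption.

```python
from pathlib import Path

def collect_image_labels(root):
    """Return (image_path, class_id) pairs for a Train/<ClassId>/<file> layout.

    Hypothetical helper: assumes each numeric sub-folder of `root` is a ClassId.
    Non-numeric entries and files with other extensions are skipped.
    """
    pairs = []
    for class_dir in sorted(Path(root).iterdir()):
        if not (class_dir.is_dir() and class_dir.name.isdigit()):
            continue  # ignore stray files or non-class folders
        class_id = int(class_dir.name)
        for img_file in sorted(class_dir.iterdir()):
            if img_file.suffix.lower() in {".png", ".ppm"}:  # assumed image types
                pairs.append((str(img_file), class_id))
    return pairs
```

Used as `pd.DataFrame(collect_image_labels(train_img_path), columns=["img", "label"])`, this yields integer labels directly and does not depend on the folder order.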

In [None]:
train_df=pd.DataFrame({'img':img_list1,'label':label_list1})

In [None]:
counts = train_df["label"].value_counts().sort_index()

plt.figure(figsize=(14,4))
plt.bar(counts.index.astype(str), counts.values, color="pink")
plt.title("Train Label Distribution (ClassId)")
plt.xlabel("ClassId")
plt.ylabel("Count")
plt.xticks(rotation=90)
plt.show()
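The chart shows a clearly imbalanced training set. One common response, which this notebook does not use, is to weight the loss by inverse class frequency. Below is a minimal sketch of the "balanced" heuristic, the same formula scikit-learn uses for `class_weight="balanced"`. The resulting dict could be passed to Keras through `model.fit(..., class_weight=...)`. `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    A class with average frequency gets weight 1.0; rarer classes get more.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in sorted(counts.items())}
```

For example, `balanced_class_weights(train_df["label"].astype(int).tolist())` would give the rarest classes the largest weights.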

In [None]:
label_cod={0:'Speed limit (20km/h)',1:'Speed limit (30km/h)', 2:'Speed limit (50km/h)', 
           3:'Speed limit (60km/h)', 4:'Speed limit (70km/h)', 5:'Speed limit (80km/h)', 
           6:'End of speed limit (80km/h)', 7:'Speed limit (100km/h)', 8:'Speed limit (120km/h)', 
           9:'No passing', 10:'No passing veh over 3.5 tons', 11:'Right-of-way at intersection', 
           12:'Priority road', 13:'Yield', 14:'Stop', 15:'No vehicles', 16:'Veh > 3.5 tons prohibited', 
           17:'No entry', 18:'General caution', 19:'Dangerous curve left', 20:'Dangerous curve right', 
           21:'Double curve', 22:'Bumpy road', 23:'Slippery road', 24:'Road narrows on the right', 
           25:'Road work', 26:'Traffic signals', 27:'Pedestrians', 28:'Children crossing', 
           29:'Bicycles crossing', 30:'Beware of ice/snow',31:'Wild animals crossing', 
           32:'End speed + passing limits', 33:'Turn right ahead', 34:'Turn left ahead', 35:'Ahead only', 
           36:'Go straight or right', 37:'Go straight or left', 38:'Keep right', 39:'Keep left', 
           40:'Roundabout mandatory', 41:'End of no passing', 42:'End no passing veh > 3.5 tons' }

In [None]:
train_df['label']=train_df['label'].astype(int)

In [None]:
train_df['encode_label']=train_df['label'].map(label_cod)

In [None]:
train_df.head()

In [None]:
img_list2 = []
label_list2 = []

for index, row in test.iterrows():
    img_full_path = "/kaggle/input/gtsrb-german-traffic-sign/" + row["Path"]
    class_id = int(row["ClassId"])

    if os.path.exists(img_full_path):
        img_list2.append(img_full_path)
        label_list2.append(class_id)

In [None]:
test_df = pd.DataFrame({"img": img_list2, "label": label_list2})

In [None]:
counts_test = test_df["label"].value_counts().sort_index()

plt.figure(figsize=(14,4))
plt.bar(counts_test.index.astype(str), counts_test.values, color=plt.cm.plasma(counts_test.values / counts_test.values.max()))
plt.title("Test Label Distribution")
plt.xlabel("Class ID")
plt.ylabel("Count")
plt.xticks(rotation=90)
plt.show();

In [None]:
test_df["label"] = test_df["label"].astype(int)

In [None]:
test_df["encode_label"] = test_df["label"].map(label_cod)

In [None]:
test_df.head()

In [None]:
img_list3 = []
label_list3 = []

for index, row in meta.iterrows():
    img_full_path = "/kaggle/input/gtsrb-german-traffic-sign/" + row["Path"]
    class_id = row["ClassId"]

    if os.path.exists(img_full_path):
        img_list3.append(img_full_path)
        label_list3.append(class_id)

In [None]:
meta_df=pd.DataFrame({'img':img_list3,'label':label_list3})

In [None]:
counts_meta = meta_df["label"].value_counts().sort_index()

plt.figure(figsize=(14,3))
plt.bar(counts_meta.index.astype(str), counts_meta.values, color="purple")
plt.title("Meta Label Distribution")
plt.xlabel("ClassId")
plt.ylabel("Count")
plt.xticks(rotation=90)
plt.show()

In [None]:
meta_df['label']=meta_df['label'].astype(int)

In [None]:
meta_df['encode_label']=meta_df['label'].map(label_cod)

In [None]:
meta_df.head()

## ⚙️ Data Preprocessing

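Before training, every image (from the Train, Test, and Meta folders) was preprocessed as follows:

- Converting images from OpenCV's default BGR channel order to RGB
- Resizing all images to a fixed resolution of 30×30 pixels
- Normalizing pixel values from [0, 255] to [0, 1]
- Using the integer ClassId (0 to 42) directly as the target, which pairs with the sparse categorical cross-entropy loss

Resizing gives every sample the fixed input shape the network requires. Scaling pixel values to [0, 1] keeps input magnitudes small, which helps stabilize training and speeds up convergence.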
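The per-image color and scaling steps can be sketched as a small NumPy function. Reversing the last axis is equivalent to `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` for a 3-channel image. Resizing is left to `cv2.resize(img, (30, 30))` as in the notebook. `preprocess` is a hypothetical name.

```python
import numpy as np

def preprocess(img_bgr):
    """BGR uint8 image of shape (H, W, 3) -> RGB float32 image in [0, 1]."""
    rgb = img_bgr[..., ::-1]               # swap channel order: B,G,R -> R,G,B
    return rgb.astype(np.float32) / 255.0  # scale 0..255 to 0.0..1.0, keep float32
```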

In [None]:
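# Pool Train, Test, and Meta images into one table; note that this also places the
# official GTSRB Test images into the pool that is later split for training and validation.
df=pd.concat([train_df, test_df, meta_df], ignore_index=True)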

In [None]:
sample = train_df.sample(12, random_state=42).reset_index(drop=True)

plt.figure(figsize=(12,8))
for i in range(12):
    path = sample.loc[i, "img"]
    lab  = int(sample.loc[i, "label"])
    name = sample.loc[i, "encode_label"]

    img = cv2.imread(str(path))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    plt.subplot(3,4,i+1)
    plt.imshow(img)
    plt.title(f"{lab}: {name}")
    plt.axis("off")

plt.tight_layout()
plt.show()

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.shape

In [None]:
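x = []
y = []
for img_path, label in zip(df['img'], df['label']):
    img = cv2.imread(str(img_path))
    if img is None:
        # Skip unreadable files together with their labels so x and y stay aligned
        print(f"Could not load image: {img_path}")
        continue
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR
    img = cv2.resize(img, (30, 30))             # fixed input size for the CNN
    img = img.astype(np.float32) / 255.0        # scale to [0, 1]; float32 halves memory vs float64
    x.append(img)
    y.append(label)

In [None]:
x = np.array(x)

In [None]:
y = np.array(y)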
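Holding every image in memory at once is feasible here, but the dtype matters. In NumPy, dividing a uint8 image by `255.0` produces float64 unless the image is cast first. Below is a quick sketch of the arithmetic. It assumes the commonly cited GTSRB sizes of 39,209 training and 12,630 test images, plus 43 Meta images, for 51,882 in total. `array_nbytes` is a hypothetical helper.

```python
import numpy as np

def array_nbytes(n_images, height=30, width=30, channels=3, dtype=np.float64):
    """Bytes needed for an (n_images, height, width, channels) array of `dtype`."""
    return n_images * height * width * channels * np.dtype(dtype).itemsize
```

For 51,882 images at 30×30×3, this comes to about 1.12 GB as float64 and about 0.56 GB as float32.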

In [None]:
img_path = train_df.loc[0, "img"]
label = train_df.loc[0, "encode_label"]

img = cv2.imread(str(img_path))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

plt.imshow(img)
plt.title(label)
plt.axis("off")
plt.show()

## 🧠 Baseline Model: Custom CNN

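A custom Convolutional Neural Network (CNN) was implemented as a baseline model.  
The architecture stacks five convolutional blocks for feature extraction. These blocks use 32, 64, 128, 256, and 512 filters of size 3×3, each followed by batch normalization, with 2×2 max pooling after the first four. Four fully connected layers (1024, 512, 256, and 128 units) and a 43-way softmax output then perform the classification.

The results below show that even this comparatively simple CNN, trained from scratch, reaches high accuracy on traffic sign recognition.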
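To see how the feature map shrinks through this stack, below is a plain-Python sketch that traces the spatial size and counts parameters. It uses the layer sizes from the model definition: 3×3 kernels with `padding='same'` and 2×2 max pooling after the first four blocks. It is a hand calculation under those assumptions, not a Keras API. `model.summary()` reports the authoritative numbers, and `trace_cnn` is a hypothetical name.

```python
def trace_cnn(input_size=30, in_channels=3,
              conv_filters=(32, 64, 128, 256, 512),
              pool_after=(True, True, True, True, False),
              dense_units=(1024, 512, 256, 128), num_classes=43, kernel=3):
    """Return (spatial size after each block, total parameter count)."""
    size, channels, params = input_size, in_channels, 0
    sizes = [size]
    for filters, pool in zip(conv_filters, pool_after):
        # Conv2D with padding='same' keeps the spatial size; weights + biases
        params += kernel * kernel * channels * filters + filters
        # BatchNormalization: gamma, beta, moving mean, moving variance
        params += 4 * filters
        channels = filters
        if pool:
            size //= 2  # MaxPooling2D(2x2), stride 2, 'valid' padding floors odd sizes
        sizes.append(size)
    features = size * size * channels  # Flatten
    for units in dense_units + (num_classes,):
        params += features * units + units  # Dense weights + biases
        features = units
    return sizes, params
```

For the notebook's configuration, this gives spatial sizes 30 → 15 → 7 → 3 → 1 and about 2.79 million parameters (2,792,427, including the non-trainable batch-norm statistics). Because the 512-filter convolution runs on a 1×1 map, 'same' padding means only the centre tap of each 3×3 kernel ever touches real data.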

In [None]:
x_train,x_test,y_train,y_test=train_test_split(x,y, random_state=42, test_size=0.20)
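`train_test_split` shuffles individual images. GTSRB training images come in tracks of near-identical frames of the same physical sign, so a random split can put near-duplicates on both sides. Below is a minimal sketch of a grouped alternative. It assumes Train file names of the form `<class>_<track>_<frame>.png` (e.g. `00020_00003_00017.png`). That naming is an assumption about the Kaggle copy and should be verified. The sketch only applies to images whose names follow the pattern. `track_key` and `grouped_split` are hypothetical helpers, and scikit-learn's `GroupShuffleSplit` offers the same idea as a library implementation.

```python
import random
from collections import defaultdict
from pathlib import Path

def track_key(path):
    """Group key '<class>_<track>' from an assumed '<class>_<track>_<frame>' file name."""
    class_part, track_part, _frame = Path(path).stem.split("_")
    return f"{class_part}_{track_part}"

def grouped_split(paths, test_size=0.2, seed=42):
    """Split indices so that all frames of one track land on the same side."""
    groups = defaultdict(list)
    for i, p in enumerate(paths):
        groups[track_key(p)].append(i)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)  # reproducible shuffle of whole tracks
    test_keys = set(keys[:round(len(keys) * test_size)])
    train_idx = [i for k in keys if k not in test_keys for i in groups[k]]
    test_idx = [i for k in keys if k in test_keys for i in groups[k]]
    return sorted(train_idx), sorted(test_idx)
```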

In [None]:
model=Sequential()

model.add(Input(shape=(30,30,3)))

model.add(Conv2D(32,kernel_size=(3,3),activation='relu',padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64,kernel_size=(3,3),activation='relu',padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(128,kernel_size=(3,3),activation='relu',padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(256,kernel_size=(3,3),activation='relu',padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(512,kernel_size=(3,3),activation='relu',padding='same'))
model.add(BatchNormalization())

model.add(Flatten())
model.add(Dense(1024,activation='relu'))
model.add(Dense(512,activation='relu'))
model.add(Dense(256,activation='relu'))
model.add(Dense(128,activation='relu'))
model.add(Dense(43,activation='softmax'))
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
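The model is compiled with `sparse_categorical_crossentropy`, which takes the integer ClassIds directly instead of one-hot vectors. Below is a NumPy sketch showing that it computes the same value as categorical cross-entropy on one-hot targets, which is what `to_categorical` plus `categorical_crossentropy` would use. The small clipping constant is an assumption to avoid `log(0)`, and both function names are illustrative.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean of -log p[true class]; y_true holds integer labels of shape (N,)."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.log(probs[np.arange(len(y_true)), y_true])))

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    """Same loss with one-hot targets of shape (N, num_classes)."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.sum(y_onehot * np.log(probs), axis=1)))
```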

In [None]:
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True)

In [None]:
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.keras",
    monitor="val_loss",
    save_best_only=True)

In [None]:
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    patience=3,
    factor=0.3,
    verbose=1)

In [None]:
callbacks = [early_stop, checkpoint, reduce_lr]
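The three callbacks interact. The learning-rate schedule uses `patience=3` and early stopping uses `patience=5`, so a plateau cuts the learning rate by `factor=0.3` before training can stop. This gives the lower rate a couple of epochs to help. Below is a simplified plain-Python replay of that bookkeeping. It assumes the Keras defaults of `min_delta=1e-4` for `ReduceLROnPlateau` and `0` for `EarlyStopping`, ignores `cooldown` and `min_lr`, and starts from a learning rate of 1e-3 (Adam's default). `simulate_callbacks` is hypothetical.

```python
def simulate_callbacks(val_losses, lr=1e-3, lr_patience=3, lr_factor=0.3,
                       es_patience=5, lr_min_delta=1e-4, es_min_delta=0.0):
    """Return (lr after each epoch, best epoch index, stop epoch index or None)."""
    lrs = []
    lr_best = es_best = float("inf")
    lr_wait = es_wait = 0
    best_epoch = None
    for epoch, loss in enumerate(val_losses):
        # EarlyStopping: count stale epochs and remember the best one
        if loss < es_best - es_min_delta:
            es_best, es_wait, best_epoch = loss, 0, epoch
        else:
            es_wait += 1
        # ReduceLROnPlateau: cut the LR after lr_patience stale epochs, then reset
        if loss < lr_best - lr_min_delta:
            lr_best, lr_wait = loss, 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr *= lr_factor
                lr_wait = 0
        lrs.append(lr)
        if es_wait >= es_patience:
            return lrs, best_epoch, epoch
    return lrs, best_epoch, None
```

For example, with `simulate_callbacks([1.0, 0.5, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45])`, the rate drops to 3e-4 at the sixth epoch and training stops after the eighth. The third epoch is the best one, which `restore_best_weights` and `ModelCheckpoint` would keep.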

In [None]:
history = model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=30,
    batch_size=32,
    callbacks=callbacks)

In [None]:
model.save("traffic.keras")

In [None]:
labels = [label_cod[i] for i in range(43)]
import json
with open("labels.json", "w", encoding="utf-8") as f:
    json.dump(labels, f, ensure_ascii=False, indent=2)
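`labels.json` stores the class names in ClassId order, so position `i` in the model's 43-way softmax output maps to `labels[i]`. Below is a sketch of how the saved file might be used at inference time. `load_labels` and `decode_prediction` are hypothetical helpers, and `probs` would come from something like `model.predict(batch)[0]`.

```python
import json

def load_labels(path="labels.json"):
    """Load the ClassId-ordered list of class names written by the notebook."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def decode_prediction(probs, labels, top_k=3):
    """Return the top_k (class_id, name, probability) tuples for one softmax vector."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, labels[i], float(probs[i])) for i in ranked[:top_k]]
```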

In [None]:
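import json
# Cast to float: some values (e.g. the LR logged by ReduceLROnPlateau) may be NumPy floats that json cannot serialize
history_dict = {k: [float(v) for v in vals] for k, vals in history.history.items()}
with open("history.json", "w") as f:
    json.dump(history_dict, f)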

## 📊 Results & Evaluation

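The final model achieved **very strong performance** on the GTSRB data:

- **Final Training Accuracy:** ~99.4%
- **Final Validation Loss:** ~0.03

The low validation loss suggests little overfitting, but three factors likely make this estimate optimistic. First, the validation set is a random 20% split of the pooled Train, Test, and Meta images, so the official GTSRB test images are not held out. Second, the same split drives early stopping and checkpoint selection. Third, GTSRB training images come in tracks of near-identical frames, and a random split can place frames from one track on both sides.  
These results show that a CNN trained from scratch can be highly effective for traffic sign recognition.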
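Overfitting is better judged from the gap between training and validation metrics than from validation loss alone. Below is a minimal sketch that reads the saved `history.json` and reports the best epoch by validation loss and the accuracy gap at that epoch. It assumes the key names Keras produces with `metrics=['accuracy']` (`loss`, `accuracy`, `val_loss`, `val_accuracy`). `summarize_history` and `summarize_history_file` are hypothetical helpers.

```python
import json

def summarize_history(history):
    """Best epoch by val_loss plus the train/val accuracy gap at that epoch."""
    val_loss = history["val_loss"]
    best = min(range(len(val_loss)), key=lambda i: val_loss[i])
    return {
        "best_epoch": best + 1,  # 1-based, matching Keras' "Epoch n/N" log lines
        "val_loss": val_loss[best],
        "train_val_acc_gap": history["accuracy"][best] - history["val_accuracy"][best],
    }

def summarize_history_file(path="history.json"):
    with open(path) as f:
        return summarize_history(json.load(f))
```

A gap close to zero at the best epoch supports the "little overfitting" reading. The caveats above about how the validation split was built still apply.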