# Traffic Sign Recognition using Histogram of Oriented Gradients and Convolutional Neural Network
This notebook showcases the training process of a traffic sign recognizer built on a neural network. Before you click Run All, check that the following packages are installed on your host machine.
  1. opencv-python
  2. tensorflow
  3. numpy
  4. scikit-image
  5. scipy
  6. scikit-learn

## Move your dataset to the src folder
Note that you should rename the files of each sign type to the form "XXXX_", where XXXX is the label of the image.
The label must not be longer than 4 characters!
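
As a rough illustration of the naming rule above, the sketch below reads a label back from a file name. It assumes the label is everything before the first underscore; the helper `label_from_filename` and the file names are made up for this example and are not part of `common`.

```python
from pathlib import Path

MAX_LABEL_LENGTH = 4  # limit stated in the note above


def label_from_filename(filename: str) -> str:
    """Return the label part of a file name such as '80_0001.jpg' (hypothetical name)."""
    stem = Path(filename).stem              # drop the directory and the extension
    label, separator, _ = stem.partition('_')
    if not separator:
        raise ValueError(f'{filename!r} does not contain "_"')
    if not 1 <= len(label) <= MAX_LABEL_LENGTH:
        raise ValueError(f'label {label!r} must be 1 to {MAX_LABEL_LENGTH} characters long')
    return label


print(label_from_filename('80_0001.jpg'))
print(label_from_filename('src/SP_12.png'))
```

A name such as `STOPS_1.jpg` would be rejected here because its label has 5 characters.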


## Data Acquisition
Load the dataset along with the respective labels.
If the dataset has one __80 speed limit__ sign and one __stop__ sign,
the labels will be ['80', 'Stop'].
__Note__: The data and the labels should have the same size; report to us if you find that the sizes differ.

In [1]:
import common as cm

cm.files_rename('src')
data, labels = cm.load_data('src')
print('Size of Data:', len(data))
print('Size of Label:', len(labels))

Size of Data: 4081
Size of Label: 4081


## Preprocessing Stage
Every image in the dataset goes through the preprocessing stage, which includes the following operations:
  1. Histogram Normalization - enhances the contrast and detail of the image.
  2. Hough Transform - detects the largest circle in the image and returns the content within it; it is tuned to accept shapes that are not perfectly round as well.
  3. HOG feature descriptor - extracts features from the image, here the changes in gradient.
  4. Reshaping - reshapes the image to (*image.shape, 1) for the classifier.

You may notice that the data size drops to a lower number.
This is because the preprocessor filters out photos of very poor quality, as well as photos in which the Hough transform fails to detect a round shape.
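
The filtering step can be pictured with the small sketch below. It assumes that a rejected image comes back as `None`, which is how the single-image `cm.preprocess_image` call is checked at the end of this notebook; the helper `drop_failed` is hypothetical and only illustrates why data and labels shrink together.

```python
def drop_failed(images, labels):
    """Keep only the (image, label) pairs whose image survived preprocessing."""
    kept = [(image, label) for image, label in zip(images, labels) if image is not None]
    kept_images = [image for image, _ in kept]
    kept_labels = [label for _, label in kept]
    return kept_images, kept_labels


# Placeholder strings stand in for preprocessed images; None marks a rejected photo.
images, labels = drop_failed(['img-a', None, 'img-c'], ['80', 'SP', 'TC'])
print(len(images), labels)
```

Dropping the label together with its image is what keeps the two lists aligned after filtering.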

In [2]:
import numpy as np
from parallel import preprocess_image_parallel as preprocess

dataset, label = preprocess(data, labels) # run the preprocessing pipeline over all images
data_size = len(dataset)
data_type = type(dataset)
print(f'''
Data size: {data_size}
Data type: {data_type}
''')

Data size: 4081 and data chunk per core: 408

Data size: 3889
Data type: <class 'numpy.ndarray'>



The following code shows how many samples there are of each type of traffic sign.
Depending on the traffic sign dataset you downloaded, the counts can be different.
The counts are presented in the following format:
(name_of_sign, number_of_samples)

In [3]:
import collections

counter = collections.Counter(label)
counter.most_common()

[('80', 1258), ('TC', 989), ('NE', 703), ('SP', 528), ('TK', 300), ('TR', 111)]

## Preprocessing Stage 2
The image dataset must be converted to a numpy array for the classifier to work with.
For the labels, we first convert them into numeric form ('80' -> 0, 'SP' -> 1, etc.).
Then the labels are one-hot encoded, which is the format the CNN expects:
say there are 5 types of sign and the first one, the __80 speed limit sign__, has index 0 after the conversion; its label becomes
  [1, 0, 0, 0, 0]
where the remaining 4 zeros are reserved for the other 4 signs.
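
The encoding described above can be sketched in plain Python as follows. This is only an illustration of the idea behind `to_categorical`, not its implementation; the helper `one_hot` and the sample labels are made up, and the labels are sorted here so that the numbering is the same on every run (the notebook itself numbers them in set order).

```python
def one_hot(index, num_classes):
    """Return a list of zeros with a single 1.0 at position `index`."""
    row = [0.0] * num_classes
    row[index] = 1.0
    return row


sample_labels = ['80', 'SP', '80', 'TC']

# Step 1: text label -> integer index
class_index = {name: i for i, name in enumerate(sorted(set(sample_labels)))}

# Step 2: integer index -> one-hot vector
encoded = [one_hot(class_index[name], len(class_index)) for name in sample_labels]

print(class_index)
for name, row in zip(sample_labels, encoded):
    print(name, row)
```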

In [4]:
from tensorflow.keras.utils import to_categorical
from json import dumps

X = np.array(dataset)

print('X shape:', X.shape)

label_set = set(label)
classes = { val: key for (key, val) in enumerate(label_set)}
# cache the label types for testing purpose
with open('classes.txt', 'w') as output:
    output.write(dumps(classes))
print(classes)

# np.int was removed from recent NumPy releases; the builtin int is the equivalent dtype
Y = np.fromiter([classes[y] for y in label], dtype=int)
Y = to_categorical(Y)
dense = len(Y[0])

X shape: (3889, 512, 256, 1)
{'TR': 0, 'TK': 1, '80': 2, 'TC': 3, 'NE': 4, 'SP': 5}


## Data splitting
We need to set aside part of the data for testing; here 20% of the samples are held out.

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
X_train.shape

(3111, 512, 256, 1)
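
The size of the training set above follows from simple arithmetic. The sketch below assumes, as scikit-learn does to the best of our knowledge, that a fractional `test_size` is rounded up and the rest goes to training; `split_sizes` is a hypothetical helper written for this note.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test part up."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test


print(split_sizes(3889, 0.2))  # (3111, 778)
```

The 778 test samples are the same 778 that appear as the total support in the classification report further down.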

## Construct the layers of our CNN model.

In [6]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Flatten

# initiate model
model = Sequential()

# add model layers
# input_shape must match X.shape[1:], which is (512, 256, 1)
model.add(Conv2D(32, (4, 4), activation='relu', input_shape=(512, 256, 1)))
model.add(MaxPool2D((2, 2)))
model.add(Conv2D(64, (4, 4), activation='relu'))
model.add(MaxPool2D((2, 2)))
model.add(Conv2D(128, (4, 4), activation='relu'))
model.add(Flatten())
model.add(Dense(dense, activation='softmax'))
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 509, 253, 32)      544       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 254, 126, 32)      0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 251, 123, 64)      32832     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 125, 61, 64)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 122, 58, 128)      131200    
_________________________________________________________________
flatten (Flatten)            (None, 905728)            0         
_________________________________________________________________
dense (Dense)                (None, 6)                 5
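
To see where the shapes in the summary come from, the sketch below replays the layer stack by hand. It assumes the Keras defaults used in the cell above (no padding, stride 1 for the convolutions, non-overlapping 2x2 pooling); the helper functions are written for this note and are not Keras APIs.

```python
def conv_out(size, kernel):
    """Output length of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output length of non-overlapping pooling (any remainder is dropped)."""
    return size // pool


def conv_params(kernel, in_channels, out_channels):
    """Weights of a square kernel plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


height, width, channels = 512, 256, 1  # taken from X.shape[1:]
layers = [('conv', 32), ('pool', 2), ('conv', 64), ('pool', 2), ('conv', 128)]

shapes, params = [], []
for kind, value in layers:
    if kind == 'conv':
        params.append(conv_params(4, channels, value))
        height, width, channels = conv_out(height, 4), conv_out(width, 4), value
    else:
        params.append(0)  # pooling has nothing to learn
        height, width = pool_out(height, value), pool_out(width, value)
    shapes.append((height, width, channels))

flattened = height * width * channels
dense_params = flattened * 6 + 6  # 6 classes: one weight per input plus one bias each

for shape, count in zip(shapes, params):
    print(shape, count)
print(flattened, dense_params)
```

Almost all trainable parameters sit in the final dense layer, because the flattened feature map is so large.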

In [7]:
# compile model using accuracy to measure the performance
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [8]:
# train the model (the recorded output below comes from a 3-epoch run)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x2e444140430>

## Never forget to save the model!

In [9]:
model.save('saved_models/my_model')

INFO:tensorflow:Assets written to: saved_models/my_model\assets


# Post-training
Now, let's run some tests with our trained model to find out how accurate and precise it really is.
The measurements include the following scores:
  1. Accuracy
  2. Precision
  3. Recall
  4. F Score
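
For reference, the four scores can be computed from a confusion matrix as sketched below. This is a plain-Python illustration with a made-up 2-class matrix, not the code that `sklearn.metrics` runs; rows are taken to be the actual classes and columns the predicted ones.

```python
def per_class_scores(matrix):
    """Return a (precision, recall, f1) tuple for every class of a square matrix."""
    size = len(matrix)
    scores = []
    for k in range(size):
        true_pos = matrix[k][k]
        false_pos = sum(matrix[row][k] for row in range(size)) - true_pos  # rest of column k
        false_neg = sum(matrix[k]) - true_pos                              # rest of row k
        precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
        recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores


def accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


toy_matrix = [[8, 2],
              [1, 9]]
print(accuracy(toy_matrix))
for precision, recall, f1 in per_class_scores(toy_matrix):
    print(round(precision, 2), round(recall, 2), round(f1, 2))
```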

## Let's predict!

In [10]:
Y_predict = model.predict(X_test)
# Each row of the prediction tells you how likely the item belongs to each class.
# The class with the highest value is the predicted one.
Y_predict

array([[1.2107743e-12, 9.9996102e-01, 7.1311042e-06, 3.7313619e-06,
        2.7994107e-05, 1.8911867e-07],
       [5.9810273e-13, 1.9131225e-08, 9.9999809e-01, 1.0206600e-06,
        8.1619375e-15, 7.9946363e-07],
       [2.2624832e-12, 8.4217919e-12, 8.4528771e-21, 1.7805880e-07,
        9.9999988e-01, 5.4051525e-12],
       ...,
       [9.9994338e-01, 1.1207113e-06, 3.4379937e-05, 4.2873339e-06,
        7.5084586e-07, 1.6125881e-05],
       [3.7622783e-13, 3.3539950e-14, 4.4315536e-23, 1.2600932e-08,
        1.0000000e+00, 7.5525984e-13],
       [4.0655745e-09, 1.5037463e-05, 9.9764556e-01, 2.3237057e-03,
        3.8136543e-06, 1.1935613e-05]], dtype=float32)

In [12]:
# the actual labels in y_test
# the 1 marks the class that the item belongs to
y_test

array([[0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       ...,
       [1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 1., 0., 0., 0.]], dtype=float32)

As you can see in the outputs above, both the actual and the predicted label of each item are vectors of size 6 (the number of sign types we trained our model to recognize). The predicted vector is filled with the *probability* of the sign belonging to each class, while the actual vector holds a single 1 at the true class.
 
## But
Our sklearn.metrics scoring functions do not quite understand this kind of format.
Therefore, before we get started, we should convert our actual and predicted results into plain class indices that the metric functions can work with.

In [13]:
# Let's map each row to the exact class that it belongs to.
Y_predict_formatted = [np.argmax(y) for y in Y_predict]
Y_actual_formatted = [np.argmax(y) for y in y_test]
print(Y_predict_formatted[:5])
print(Y_actual_formatted[:5])

[1, 2, 4, 2, 3]
[1, 2, 4, 2, 3]


In [15]:
# Which sign each number represents
# TR: Turn Right
# TK: Truck (Alert large vehicle approaching)
# 80: 80 Speed Limit
# TC: Two cars (No overtaking)
# NE: No entry
# SP: Stop 
print(classes)

{'TR': 0, 'TK': 1, '80': 2, 'TC': 3, 'NE': 4, 'SP': 5}
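
To turn predicted numbers back into readable names, the mapping can simply be inverted. The sketch below uses the mapping printed above (written as the JSON text that goes into classes.txt) and the sign names from the comments in the cell; `describe` is a hypothetical helper, and the indices will differ if your run numbers the classes differently.

```python
import json

# JSON text equivalent to what the notebook stores in classes.txt for the recorded run.
classes_text = '{"TR": 0, "TK": 1, "80": 2, "TC": 3, "NE": 4, "SP": 5}'
classes = json.loads(classes_text)

sign_names = {
    'TR': 'Turn Right',
    'TK': 'Truck (Alert large vehicle approaching)',
    '80': '80 Speed Limit',
    'TC': 'Two cars (No overtaking)',
    'NE': 'No entry',
    'SP': 'Stop',
}

# Invert the mapping: class index -> label code
index_to_code = {index: code for code, index in classes.items()}


def describe(index):
    """Return a readable 'code: name' string for a predicted class index."""
    code = index_to_code[index]
    return f'{code}: {sign_names[code]}'


first_predictions = [1, 2, 4, 2, 3]  # the first five predictions printed earlier
for index in first_predictions:
    print(index, '->', describe(index))
```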


In [14]:
from sklearn.metrics import classification_report, confusion_matrix

classification_result = classification_report(Y_actual_formatted, Y_predict_formatted)
print(classification_result)

              precision    recall  f1-score   support

           0       0.94      1.00      0.97        17
           1       0.96      1.00      0.98        64
           2       0.99      0.98      0.99       252
           3       1.00      1.00      1.00       215
           4       1.00      1.00      1.00       146
           5       1.00      0.98      0.99        84

    accuracy                           0.99       778
   macro avg       0.98      0.99      0.99       778
weighted avg       0.99      0.99      0.99       778



In [25]:
confusion_result = confusion_matrix(Y_actual_formatted, Y_predict_formatted)
print(f'''
Confusion Matrix
{confusion_result}
''')


Confusion Matrix
Turn Right    :[ 17   0   0   0   0   0]
Heavy Vehicle :[  0  64   0   0   0   0]
Speed Limit   :[  1   3 248   0   0   0]
No Overtaking :[  0   0   0 215   0   0]
No Entry      :[  0   0   0   0 146   0]
Stop          :[  0   0   2   0   0  82]
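
The few mistakes are easier to spot when the off-diagonal cells are listed explicitly. The sketch below copies the recorded matrix with its rows in class-index order (TR, TK, 80, TC, NE, SP), rows being the actual class and columns the predicted class as in sklearn's `confusion_matrix`; the variable names are made up for this note.

```python
codes = ['TR', 'TK', '80', 'TC', 'NE', 'SP']
matrix = [
    [17,  0,   0,   0,   0,  0],   # TR
    [ 0, 64,   0,   0,   0,  0],   # TK
    [ 1,  3, 248,   0,   0,  0],   # 80
    [ 0,  0,   0, 215,   0,  0],   # TC
    [ 0,  0,   0,   0, 146,  0],   # NE
    [ 0,  0,   2,   0,   0, 82],   # SP
]

# Every non-zero cell off the diagonal is a group of misclassified samples.
errors = [
    (codes[actual], codes[predicted], matrix[actual][predicted])
    for actual in range(len(codes))
    for predicted in range(len(codes))
    if actual != predicted and matrix[actual][predicted] > 0
]

correct = sum(matrix[i][i] for i in range(len(codes)))
total = sum(sum(row) for row in matrix)

for actual, predicted, count in errors:
    print(f'{count} x {actual} predicted as {predicted}')
print(f'{correct} of {total} correct')
```

In this run only 6 of the 778 test samples are misclassified, and all of them involve the 80 speed limit class either as the actual or as the predicted sign.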



# Voila
The trained model turned out to be surprisingly accurate, although we do not know yet whether it suffers from overfitting.
As far as we can tell, the model can accurately recognize the known traffic signs in pictures taken from Google Images.

In [38]:
# test with a random image downloaded from Google Images
# It is a stop sign, hence we expect the predicted answer to be classes['SP'] (5 in this run)
test_img = cm.read_image('a-stop-sign.jpg')
test_img = cm.preprocess_image(test_img)
if test_img is not None:
    predict_data = np.array([test_img])
    prediction = model.predict(predict_data)
    answer = np.argmax(prediction)
    # look the index up instead of hard-coding it, since the numbering can change between runs
    if answer == classes['SP']:
        print('The model recognized the sign as Stop sign!!!')
    else:
        print('The model failed to recognize the stop sign :\'(')


The model recognized the sign as Stop sign!!!
