In [None]:
%matplotlib inline

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt, ceil

import pickle

import tensorflow as tf

from tensorflow.keras.utils import to_categorical

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, Flatten, BatchNormalization, Dense, Dropout, ReLU, Softmax

from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import SparseCategoricalAccuracy
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint, LearningRateScheduler

In [None]:
# CONSTANTS
EPOCHS = 15
BATCH_SIZE = 32
STEPS_PER_EPOCH = ceil(86989 / BATCH_SIZE)  # number of batches per epoch must be a whole number
#MODEL_SELECTION_VALIDATION_STEPS = 86989 / BATCH_SIZE

<img src="https://ehhnkw.am.files.1drv.com/y4mN1WYhHx5QRWJtliWHht6niSnYdS1_nUvQE4YWRX6kehKQ_L6oqijE8c-P9eaetFFUQXYrl7Of5XZpoNrJHQFAjvz90r2F2mHM1KhKtYlmyiwto1BwesdpS3UrJrURUK4l4lhFLh28bZR2G9L6MSPfuAUSr3oGU5GFao85JzFtuFogGerE6Tgn0u0z6qrTQR8R6n1u83B1EOv07NNvi5NHQ?width=1809&height=576&cropmode=none">

# Did you see the sign?
### Author: Georgi Stoyanov
#### January 2020

## Abstract
In this article, I show one way to build a convolutional network: architecture selection, optimization, visualization of the model and its layers, measuring model accuracy, saving the model together with its training history and statistics, etc. Finally, we will see how well the model handles the task, what could be done as future work, and, last but not least, what practical application the model might have.

## Introduction
In this article we will use the Traffic Signs Preprocessed dataset<sup id="fnref:1"><a href="#fn:1" class="footnote">[1]</a></sup>. It contains ready-to-use preprocessed traffic sign data, derived from the German Traffic Sign Recognition Benchmark (GTSRB)<sup id="fnref:2"><a href="#fn:2" class="footnote">[2]</a></sup> and saved in nine pickle files. We will use the file data2.pickle, which was prepared with shuffling, /255.0 scaling and mean normalization. First, we will load the data from the file and preview some of it. The next step will be to create several architectures and choose the best one, which we will then try to improve with different techniques. Finally, we will see how well it handles image class prediction.

## Domain Knowledge<sup id="fnref:3"><a href="#fn:3" class="footnote">[3]</a></sup>
<b>Traffic signs</b> or <b>road signs</b> are signs erected at the side of or above roads to give instructions or provide information to road users. The earliest signs were simple wooden or stone milestones. Later, signs with directional arms were introduced, for example, the fingerposts in the United Kingdom and their wooden counterparts in Saxony.

With traffic volumes increasing since the 1930s, many countries have adopted pictorial signs or otherwise simplified and standardized their signs to overcome language barriers, and enhance traffic safety. Such pictorial signs use symbols (often silhouettes) in place of words and are usually based on international protocols. Such signs were first developed in Europe, and have been adopted by most countries to varying degrees.

### International conventions
Various international conventions have helped to achieve a degree of uniformity in traffic signing across countries.

## Read and prepare the dataset<sup id="fnref:4"><a href="#fn:4" class="footnote">[4]</a></sup>
Let's read and prepare the data and see if we read it correctly - size, structure, types, etc.

In [None]:
with open("../input/traffic-signs-preprocessed/data2.pickle", "rb") as f:
    data = pickle.load(f, encoding="latin1")

# Preparing y_train and y_validation for using in Keras
data["y_train"] = to_categorical(data["y_train"], num_classes=43)
data["y_validation"] = to_categorical(data["y_validation"], num_classes=43)

# Making channels come at the end
data["x_train"] = data["x_train"].transpose(0, 2, 3, 1)
data["x_validation"] = data["x_validation"].transpose(0, 2, 3, 1)
data["x_test"] = data["x_test"].transpose(0, 2, 3, 1)

# Showing loaded data from file
for i, j in data.items():
    if i == "labels":
        print(F"{i}:", len(j))
    else: 
        print(F"{i}:", j.shape)

In [None]:
data["y_validation"][0]

Well, we see that the labels are one-hot encoded: each label is an array of size 43 with a 1 at the position of the sample's class and 0 everywhere else. Our classes are mutually exclusive (each sample belongs to exactly one class; these are road signs, after all :)), so no label spreads probability over two or more classes. For this reason, we will use sparse categorical crossentropy as the loss and sparse categorical accuracy as the metric, both of which expect each label to be a single number representing the class. Categorical crossentropy, by contrast, expects a full vector per label, which is what we would need if the labels were soft probabilities (like [0.5, 0.3, 0.2]). We therefore need to convert the labels back to class indices.
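The relationship between the two label formats can be sketched in plain Python, without Keras. The function names and the three-class probabilities below are illustrative assumptions, not part of the notebook or of the Keras API; the point is only that both losses give the same number when the label is one-hot.

```python
from math import log

def categorical_crossentropy(one_hot, probs):
    # Full sum over classes: -sum(y_k * log(p_k)); zero entries contribute nothing
    return -sum(y * log(p) for y, p in zip(one_hot, probs) if y > 0)

def sparse_categorical_crossentropy(class_index, probs):
    # The label is just the index of the true class: -log(p_true)
    return -log(probs[class_index])

probs = [0.1, 0.7, 0.2]    # hypothetical model output for 3 classes
one_hot = [0.0, 1.0, 0.0]  # one-hot label for class 1
class_index = one_hot.index(max(one_hot))  # same idea as np.argmax

loss_dense = categorical_crossentropy(one_hot, probs)
loss_sparse = sparse_categorical_crossentropy(class_index, probs)
print(class_index, loss_dense, loss_sparse)
```

Both values agree, so converting one-hot labels to class indices changes only the storage format of the labels, not what the model is trained to do.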

In [None]:
data["y_validation"] = np.argmax(data["y_validation"], axis=1)
data["y_train"] = np.argmax(data["y_train"], axis=1)

In [None]:
data["y_train"].shape

In [None]:
data["y_train"][0]

In [None]:
data["y_validation"].shape

In [None]:
data["y_validation"][0]

## Visualize set of examples <sup id="fnref:4"><a href="#fn:4" class="footnote">[4]</a></sup>
We will now create a function that will convert input 4D tensor to a grid.

In [None]:
# Preparing function for plotting a set of examples
# As input it will take 4D tensor and convert it to the grid
# Values will be scaled to the range [0, 255]
def convert_to_grid(x_input):
    N, H, W, C = x_input.shape
    grid_size = int(ceil(sqrt(N)))
    grid_height = H * grid_size + 1 * (grid_size - 1)
    grid_width = W * grid_size + 1 * (grid_size - 1)
    grid = np.zeros((grid_height, grid_width, C)) + 255
    next_idx = 0
    y0, y1 = 0, H
    for y in range(grid_size):
        x0, x1 = 0, W
        for x in range(grid_size):
            if next_idx < N:
                img = x_input[next_idx]
                low, high = np.min(img), np.max(img)
                grid[y0:y1, x0:x1] = 255.0 * (img - low) / (high - low)
                next_idx += 1
            x0 += W + 1
            x1 += W + 1
        y0 += H + 1
        y1 += H + 1

    return grid

We use the function to show 100 examples of data in the grid.

In [None]:
# Visualizing some examples of training data
examples = data["x_train"][100:200, :, :, :]
print(examples.shape)  # (100, 32, 32, 3)

# Plotting some examples
fig = plt.figure()
grid = convert_to_grid(examples)
plt.imshow(grid.astype("uint8"), cmap="gray")
plt.axis("off")
plt.gcf().set_size_inches(15, 15)
plt.title("Some examples of training data", fontsize=18)

# Showing the plot
plt.show()

# Saving the plot
fig.savefig("training_examples.png")
plt.close()

## Model Architectures
The next task is to test different convolutional network architectures and choose the best one for further improvement.<br><br>
To compare the models used in this article, we create 3 collections: one for the results of the models with which we test architectures, a second for the final models, and a third for all the models we have used.

In [None]:
models_architectures_results = list()
final_models = list()
all_models = list()

The following 3 functions will save us a lot of code: they train the models, save and show results, save and visualize charts of the training history, and evaluate the models.

Function <b>plot_hist</b> - save and visualize charts with training history

In [None]:
def plot_hist(hist,
              first_param,
              second_param,
              name_of_plot,
              first_legend_label,
              second_legend_label,
              x_label,
              y_label,
              save_plot_img = False):
    
    plt.rcParams["figure.figsize"] = (15.0, 5.0)
    plt.rcParams["image.interpolation"] = "nearest"
    plt.rcParams["font.family"] = "Consolas"
    
    fig = plt.figure()
    plt.plot(hist.history[first_param], "-o", linewidth=3.0)
    plt.plot(hist.history[second_param], "-o", linewidth=3.0)
    
    plt.plot(range(hist.params["epochs"]),
             hist.history[first_param],
             c = "g",
             label = first_legend_label)
    plt.plot(range(hist.params["epochs"]),
             hist.history[second_param],
             c = "r",
             label = second_legend_label)
    plt.legend(fontsize="xx-large")
    
    plt.xticks(list(range(0, hist.params["epochs"])), range(1, hist.params["epochs"] + 1))
    max_ylim = max(max(hist.history[first_param]), max(hist.history[second_param])) + 0.02
    min_ylim = 0
    if "accuracy" in first_param:
        min_ylim = min(min(hist.history[first_param]), min(hist.history[second_param])) - 0.02
    plt.ylim(min_ylim, max_ylim)
    
    plt.title(name_of_plot, fontsize=22)
    plt.xlabel(x_label, fontsize = 18)
    plt.ylabel(y_label, fontsize = 18)
    plt.tick_params(labelsize=18)
    
    plt.show()
    
    if save_plot_img:
        fig.savefig(F"{hist.model.name}_{name_of_plot}.png")
        plt.close()

Function <b>train_and_plot_results</b> - train, save and show results

In [None]:
def train_and_plot_results(model, callbacks = None):
    # Avoid a mutable default argument: build the list of extra callbacks per call
    extra_callbacks = list(callbacks) if callbacks is not None else []
    model_hist = model.fit(data["x_train"],
                           data["y_train"],
                           batch_size=BATCH_SIZE,
                           epochs = EPOCHS,
                           steps_per_epoch = STEPS_PER_EPOCH,
                           validation_data = (data["x_validation"], data["y_validation"]),
                           callbacks = [TensorBoard(log_dir = F"TensorBoardLogs/{model.name}/", profile_batch = 100000000), 
                                        ModelCheckpoint(filepath = F"ModelsCheckpoints/{model.name}/")] + extra_callbacks,
                           verbose = 0)
    plot_hist(model_hist,
          "loss",
          "val_loss",
          "Loss plot",
          "train loss",
          "validation loss",
          "Epoch",
          "Loss",
          True)
    plot_hist(model_hist,
          "sparse_categorical_accuracy",
          "val_sparse_categorical_accuracy",
          "Accuracy plot",
          "train accuracy",
          "validation accuracy",
          "Epoch",
          "Accuracy",
          True)
    train_loss, train_accuracy = model.evaluate(data["x_train"],
                                               data["y_train"],
                                               batch_size = BATCH_SIZE,
                                               verbose = 0)
    val_loss, val_accuracy = model.evaluate(data["x_validation"],
                                               data["y_validation"],
                                               batch_size = BATCH_SIZE,
                                               verbose = 0)
    print(F"Loss - Train: {train_loss:.3f}, Validation: {val_loss:.3f}")
    print(F"Accuracy - Train: {train_accuracy:.3f}, Validation: {val_accuracy:.3f}")
    print(F"Variance %: {((train_accuracy - val_accuracy) * 100):.3f}")
    return (model.name, train_loss, val_loss, train_accuracy, val_accuracy)

Function <b>evaluate_architectures_models</b> - evaluate the models

In [None]:
def evaluate_architectures_models(models):
    for model in models:
        train_loss, train_accuracy = model.evaluate(data["x_train"],
                                               data["y_train"],
                                               batch_size = BATCH_SIZE,
                                               verbose = 0)
        val_loss, val_accuracy = model.evaluate(data["x_validation"],
                                               data["y_validation"],
                                               batch_size = BATCH_SIZE,
                                               verbose = 0)
        print(model.name)
        print(F"Train        Loss: {train_loss:.3f}, Accuracy: {train_accuracy:.3f}")
        print(F"Validation   Loss: {val_loss:.3f}, Accuracy: {val_accuracy:.3f}")
        print(F"Variance %         {((train_accuracy - val_accuracy) * 100):.3f}")
        print()

In the convolutional layers we will use 3x3 filters with same padding.<br>
Now let's try architectures of different types and complexity.
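To see what 3x3 filters with same padding do to a tensor, here is a small sketch of the shape and parameter arithmetic in plain Python. The helper names are made up for illustration and are not Keras functions; the numbers (a 32x32x3 input, 32 filters, 2x2 pooling) match the first architecture below, and stride 1 for the convolution is assumed because that is the Keras default.

```python
def conv2d_same_output(height, width, filters):
    # "same" padding with stride 1 keeps height and width unchanged;
    # the channel count becomes the number of filters
    return height, width, filters

def conv2d_params(kernel_size, in_channels, filters):
    # One kernel_size x kernel_size x in_channels kernel plus one bias per filter
    return (kernel_size * kernel_size * in_channels + 1) * filters

def max_pool_output(height, width, channels, pool_size=2):
    # With the default "valid" padding and stride equal to pool_size,
    # each spatial dimension is floor-divided by pool_size
    return height // pool_size, width // pool_size, channels

shape = (32, 32, 3)
params = conv2d_params(3, shape[2], 32)
shape = conv2d_same_output(shape[0], shape[1], 32)
pooled = max_pool_output(*shape)
print(shape, params, pooled)
```

These values can be compared with the output of `summary()` for the first model.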

#### Model architecture №1


##### Model
The simplest.<br>
1 convolution layer + 1 fully connected layer.<br>
The models contain standard layers such as Input, MaxPool2D, Flatten, BatchNormalization, Dropout, etc. Some of them will be explained when we need to change them.

In [None]:
n1_model = Sequential(
    name = "n1_model",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 32, kernel_size = 3, padding = "same", activation = ReLU()),
    MaxPool2D(pool_size = 2),
    Flatten(),
    Dense(30, activation = ReLU()),
    Dense(43, activation = Softmax())
])

In [None]:
n1_model.summary()

In [None]:
n1_model.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
models_architectures_results.append(train_and_plot_results(n1_model))

We see that the model has a high variance.<br>
If we look at the loss plot, we can see that our model has gone off the rails and is not working!

In [None]:
all_models.append(n1_model)

#### Model architecture №2

##### Model
1 convolution layer + 2 fully connected layers.<br>
In addition to adding another fully connected layer, we also add dropout layers to reduce variance and make the neurons less dependent on each other.
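What a dropout layer does during training can be sketched in a few lines of plain Python. This is an illustration of the usual "inverted dropout" idea, not the Keras implementation; the function name and the fixed seed are assumptions made for the example.

```python
import random

def dropout(activations, rate, rng):
    # Inverted dropout: zero each value with probability `rate` and scale
    # the survivors by 1 / (1 - rate) so the expected sum stays the same
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.1, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(zero_fraction, mean_after)
```

About 10% of the values are zeroed, while the mean stays close to 1. Because a different random subset is dropped at every step, no neuron can rely on a particular neighbour being present.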

In [None]:
n2_model = Sequential(
    name = "n2_model",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 32, kernel_size = 3, padding = "same", activation = ReLU()),
    MaxPool2D(pool_size = 2),
    Flatten(),
    Dense(100, activation = ReLU()),
    Dropout(0.1),
    Dense(50, activation = ReLU()),
    Dropout(0.1),
    Dense(43, activation = Softmax())
])

In [None]:
n2_model.summary()

In [None]:
n2_model.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
models_architectures_results.append(train_and_plot_results(n2_model))

The variance has dropped slightly.<br>
But the model is still not ok!

In [None]:
all_models.append(n2_model)

#### Model architecture №3

##### Model
3 convolution layers with an increasing number of filters + 1 fully connected layer.<br>
Like the first model, but with more convolutional layers and one batch normalization layer between them and the fully connected layers. This layer normalizes the activations it receives over each batch, which keeps their scale stable and helps against vanishing (0) and exploding (∞) gradients.
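A minimal sketch of what a batch normalization layer computes during training, for a single feature, in plain Python. The function name and the sample values are assumptions made for illustration; `epsilon = 1e-3` mirrors the Keras default, and `gamma` and `beta` stand for the learnable scale and shift. At inference time the real layer uses moving averages instead of the batch statistics, which this sketch leaves out.

```python
from math import sqrt

def batch_norm(values, gamma=1.0, beta=0.0, epsilon=1e-3):
    # Normalize one feature across the batch, then apply the
    # learnable scale (gamma) and shift (beta)
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / sqrt(variance + epsilon) + beta for v in values]

batch = [10.0, 200.0, 3000.0, 45.0]  # hypothetical activations with a wide range
normalized = batch_norm(batch)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
print(norm_mean, norm_var)
```

Whatever the scale of the incoming activations, the output has a mean close to 0 and a variance close to 1.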

In [None]:
n3_model = Sequential(
    name = "n3_model",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 32, kernel_size = 3, padding = "same", activation = ReLU()),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU()),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU()),
    MaxPool2D(pool_size = 2),
    Flatten(),
    BatchNormalization(),
    Dense(30, activation = ReLU()),
    Dense(43, activation = Softmax())
])

In [None]:
n3_model.summary()

In [None]:
n3_model.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
models_architectures_results.append(train_and_plot_results(n3_model))

The variance has dropped.<br>
But the model is still not ok! Let's bring back the dropout layers.

In [None]:
all_models.append(n3_model)

#### Model architecture №4

##### Model
3 convolution layers with an increasing number of filters + 2 fully connected layers.<br>
Combination of second and third models.

In [None]:
n4_model = Sequential(
    name = "n4_model",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 32, kernel_size = 3, padding = "same", activation = ReLU()),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU()),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU()),
    MaxPool2D(pool_size = 2),
    Flatten(),
    BatchNormalization(),
    Dense(100, activation = ReLU()),
    Dropout(0.1),
    Dense(50, activation = ReLU()),
    Dropout(0.1),
    Dense(43, activation = Softmax())
])

In [None]:
n4_model.summary()

In [None]:
n4_model.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
models_architectures_results.append(train_and_plot_results(n4_model))

The variance has dropped. The model is starting to look better.

In [None]:
all_models.append(n4_model)

#### Model architecture №5

##### Model
We will now make a more complex model: 6 convolutional layers with 3 max pool layers, which shrink the feature maps and therefore the number of trainable parameters in the first fully connected layer. After that, we will add 3 fully connected layers with dropout layers of different rates, and 2 batch normalization layers: one after Flatten and one after the second fully connected layer.
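The effect of the three max pool layers on the size of the first fully connected layer can be checked with a bit of arithmetic. The helper names below are hypothetical; the numbers (32x32 input, 256 channels after the last convolution, 200 units in the first dense layer) are taken from this model.

```python
def flattened_size(side, channels, num_pools, pool_size=2):
    # Each 2x2 max pool halves height and width; channels are unchanged
    for _ in range(num_pools):
        side //= pool_size
    return side * side * channels

def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit
    return inputs * units + units

with_pools = flattened_size(32, 256, num_pools=3)
without_pools = flattened_size(32, 256, num_pools=0)

print(with_pools, dense_params(with_pools, 200))
print(without_pools, dense_params(without_pools, 200))
```

With pooling, Flatten produces 4 x 4 x 256 = 4096 values and the first dense layer needs about 0.8 million parameters; without pooling it would need more than 52 million.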

In [None]:
n5_model = Sequential(
    name = "n5_model",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU()),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU()),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU()),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU()),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU()),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU()),
    MaxPool2D(pool_size = 2),
    Flatten(),
    BatchNormalization(),
    Dense(200, activation = ReLU()),
    Dropout(0.2),
    Dense(150, activation = ReLU()),
    BatchNormalization(),
    Dropout(0.1),
    Dense(100, activation = ReLU()),
    Dropout(0.05),  
    Dense(43, activation = Softmax())
])

In [None]:
n5_model.summary()

In [None]:
n5_model.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
models_architectures_results.append(train_and_plot_results(n5_model))

Our model now looks like a model :)<br>Maybe we have an architecture to improve!

In [None]:
all_models.append(n5_model)

#### Compare & select best model architecture

In [None]:
evaluate_architectures_models(all_models)

We obviously have a winner - Model architecture №5

## Improve best model architecture

We will now try to improve our best architecture. I suspect our model is overfitting the training data. Let's see if that's the case.

#### Final model №1

##### Model
To reduce variance, we can add regularization to the layers. Let's start with the fully connected layers.
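As a rough sketch of what an L2 kernel regularizer contributes, the penalty is the regularization factor times the sum of squared weights, added on top of the data loss. The function name, the weights and the loss value below are made up for illustration; the factor 0.01 is the default used when `l2()` is called without arguments.

```python
def l2_penalty(weights, factor=0.01):
    # L2 regularization adds factor * sum(w ** 2) to the loss
    return factor * sum(w * w for w in weights)

weights = [0.5, -1.0, 2.0]  # hypothetical kernel weights
data_loss = 0.30            # hypothetical cross-entropy value

total_loss = data_loss + l2_penalty(weights)
print(l2_penalty(weights), total_loss)
```

Large weights are penalized much more than small ones, which pushes the network toward simpler solutions. Note that the loss Keras reports includes this penalty, so loss values of regularized and unregularized models are not directly comparable; accuracy is the fairer comparison.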

In [None]:
final_model_1 = Sequential(
    name = "final_model_1",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU()),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU()),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU()),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU()),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU()),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU()),
    MaxPool2D(pool_size = 2),
    Flatten(),
    BatchNormalization(),
    Dense(200, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.2),
    Dense(150, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.1),
    Dense(100, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.05),  
    Dense(43, activation = Softmax(), kernel_regularizer = l2())
])

In [None]:
final_model_1.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
train_and_plot_results(final_model_1)

In [None]:
all_models.append(final_model_1)
final_models.append(final_model_1)

There is definitely an improvement.

#### Final model №2

##### Model
Let's try to regularize the convolutional layers.

In [None]:
final_model_2 = Sequential(
    name = "final_model_2",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    MaxPool2D(pool_size = 2),
    Flatten(),
    BatchNormalization(),
    Dense(200, activation = ReLU()),
    Dropout(0.2),
    Dense(150, activation = ReLU()),
    Dropout(0.1),
    Dense(100, activation = ReLU()),
    Dropout(0.05),  
    Dense(43, activation = Softmax())
])

In [None]:
final_model_2.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
train_and_plot_results(final_model_2)

In [None]:
all_models.append(final_model_2)
final_models.append(final_model_2)

We can see that regularizing only the convolutional layers does not have a good effect on the model.

#### Final model №3

##### Model
What if we applied regularization to both types of layers?

In [None]:
final_model_3 = Sequential(
    name = "final_model_3",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2()),
    MaxPool2D(pool_size = 2),
    Flatten(),
    BatchNormalization(),
    Dense(200, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.2),
    Dense(150, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.1),
    Dense(100, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.05),  
    Dense(43, activation = Softmax(), kernel_regularizer = l2())
])

In [None]:
final_model_3.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
train_and_plot_results(final_model_3)

In [None]:
all_models.append(final_model_3)
final_models.append(final_model_3)

Looks better, but we lost accuracy.

#### Final model №4

##### Model
We can see that regularization with the default value in the convolutional layers does not have a good effect on the model. Evidently, we need to reduce the regularization there to get more accurate filters.

In [None]:
final_model_4 = Sequential(
    name = "final_model_4",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    MaxPool2D(pool_size = 2),
    Flatten(),
    BatchNormalization(),
    Dense(200, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.2),
    Dense(150, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.1),
    Dense(100, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.05),  
    Dense(43, activation = Softmax(), kernel_regularizer = l2())
])

In [None]:
final_model_4.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
train_and_plot_results(final_model_4)

In [None]:
all_models.append(final_model_4)
final_models.append(final_model_4)

It looks much better.

#### Final model №5

##### Model
The chosen architecture is the deepest of all, so we can add batch normalization layers after each pair of convolutional layers. This may improve the model by reducing problems like vanishing (0) and exploding (∞) gradients.

In [None]:
final_model_5 = Sequential(
    name = "final_model_5",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    BatchNormalization(),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    BatchNormalization(),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-3)),
    BatchNormalization(),
    MaxPool2D(pool_size = 2),
    Flatten(),
    Dense(200, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.2),
    Dense(150, activation = ReLU(), kernel_regularizer = l2()),
    BatchNormalization(),
    Dropout(0.1),
    Dense(100, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.05),  
    Dense(43, activation = Softmax(), kernel_regularizer = l2())
])

In [None]:
final_model_5.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
train_and_plot_results(final_model_5)

In [None]:
all_models.append(final_model_5)
final_models.append(final_model_5)

Hmm...

#### Final model №6

##### Model
Let's reduce the regularization in the convolutional layers a little and see if it works.<br>
We can also add a callback that reduces the learning rate with each passing epoch. This should keep the loss from wandering around the minimum.
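To get a feeling for such a schedule, the sketch below prints the learning rate for every epoch of an exponential decay of the form `base_rate * decay ** (epoch + offset)`. The function name is hypothetical; the values 1e-3, 0.95 and an offset of 15 are chosen to match the annealer used for this model, and the epoch index starts at 0, as it does in Keras callbacks.

```python
NUM_EPOCHS = 15  # same value as the EPOCHS constant in this notebook

def annealed_learning_rate(epoch, base_rate=1e-3, decay=0.95, offset=NUM_EPOCHS):
    # Exponential decay: every epoch multiplies the rate by `decay`
    return base_rate * decay ** (epoch + offset)

schedule = [annealed_learning_rate(epoch) for epoch in range(NUM_EPOCHS)]
for epoch, rate in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: lr = {rate:.6f}")
```

Because of the offset, the first epoch already starts at roughly 4.6e-4 rather than at 1e-3, and the rate shrinks by 5% with every following epoch.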

In [None]:
final_model_6 = Sequential(
    name = "final_model_6",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 7e-4)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 7e-4)),
    BatchNormalization(),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 7e-4)),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 7e-4)),
    BatchNormalization(),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 7e-4)),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 7e-4)),
    BatchNormalization(),
    MaxPool2D(pool_size = 2),
    Flatten(),
    Dense(200, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.2),
    Dense(150, activation = ReLU(), kernel_regularizer = l2()),
    BatchNormalization(),
    Dropout(0.1),
    Dense(100, activation = ReLU(), kernel_regularizer = l2()),
    Dropout(0.05),  
    Dense(43, activation = Softmax(), kernel_regularizer = l2())
])

In [None]:
final_model_6.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
annealer = LearningRateScheduler(lambda x: 1e-3 * 0.95 ** (x + EPOCHS))

In [None]:
train_and_plot_results(final_model_6, [annealer])

In [None]:
all_models.append(final_model_6)
final_models.append(final_model_6)

#### Final model №7

##### Model
We reduce the regularization in the convolutional layers a little further (to 1e-4) and increase the regularization in the fully connected layers (from the default 0.01 to 0.5).

In [None]:
final_model_7 = Sequential(
    name = "final_model_7",
    layers = [
    Input(shape = (32, 32, 3)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-4)),
    Conv2D(filters = 64, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-4)),
    BatchNormalization(),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-4)),
    Conv2D(filters = 128, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-4)),
    BatchNormalization(),
    MaxPool2D(pool_size = 2),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-4)),
    Conv2D(filters = 256, kernel_size = 3, padding = "same", activation = ReLU(), kernel_regularizer = l2(l = 1e-4)),
    BatchNormalization(),
    MaxPool2D(pool_size = 2),
    Flatten(),
    Dense(200, activation = ReLU(), kernel_regularizer = l2(0.5)),
    Dropout(0.2),
    Dense(150, activation = ReLU(), kernel_regularizer = l2(0.5)),
    BatchNormalization(),
    Dropout(0.1),
    Dense(100, activation = ReLU(), kernel_regularizer = l2(0.5)),
    Dropout(0.05),  
    Dense(43, activation = Softmax(), kernel_regularizer = l2(0.5))
])

In [None]:
final_model_7.compile(
    optimizer = Adam(learning_rate = 0.001),
    loss = SparseCategoricalCrossentropy(),
    metrics = [SparseCategoricalAccuracy()])

##### Train & Results

In [None]:
train_and_plot_results(final_model_7, [annealer])

In [None]:
all_models.append(final_model_7)
final_models.append(final_model_7)

Enough, let's move on to the choice of model and tests.

#### Evaluate & select best model

In [None]:
evaluate_architectures_models(final_models)

Let's test final_model_1.

## Test best model

In [None]:
test_loss, test_accuracy = final_model_1.evaluate(data["x_test"],
                                               data["y_test"],
                                               batch_size = BATCH_SIZE,
                                               verbose = 0)
print(F"Test - Loss: {test_loss:.3f}, Accuracy: {test_accuracy:.3f}")

We achieved an accuracy of 0.978 on the test data. Now we test all models.

## Test all models

The following function makes it easier for us to test models.

In [None]:
def evaluate_all_models(models):
    """Print loss and accuracy on the train, validation and test sets for each model."""
    for model in models:
        # evaluate() returns [loss, metric] because each model is compiled with one metric
        train_loss, train_accuracy = model.evaluate(data["x_train"],
                                                    data["y_train"],
                                                    batch_size = BATCH_SIZE,
                                                    verbose = 0)
        val_loss, val_accuracy = model.evaluate(data["x_validation"],
                                                data["y_validation"],
                                                batch_size = BATCH_SIZE,
                                                verbose = 0)
        test_loss, test_accuracy = model.evaluate(data["x_test"],
                                                  data["y_test"],
                                                  batch_size = BATCH_SIZE,
                                                  verbose = 0)
        print(model.name)
        print(F"Train        Loss: {train_loss:.3f}, Accuracy: {train_accuracy:.3f}")
        print(F"Validation   Loss: {val_loss:.3f}, Accuracy: {val_accuracy:.3f}")
        print(F"Test         Loss: {test_loss:.3f}, Accuracy: {test_accuracy:.3f}")
        print()

In [None]:
evaluate_all_models(all_models)

I think the best model is final_model_1.

## Conclusion

After repeatedly training the models, I found that the results are unstable and difficult to reproduce: the accuracy can deviate by 1-5% between runs. The future work section lists ideas that could stabilize the results of the models. Even so, the achievable accuracy on this dataset appears to be about 96-98%. As I was unable to find information on the best achievable accuracy for this dataset, I would treat any higher value as unrealistic (a likely sign of overfitting).
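One common way to reduce run-to-run variation is to fix the random seeds before building and training a model. The sketch below is not part of the original experiments: the helper name `set_global_seed` and the seed value are assumptions made for illustration. Seeding alone does not guarantee identical results on a GPU, where some operations are non-deterministic.

```python
import random

import numpy as np
import tensorflow as tf


def set_global_seed(seed):
    """Seed the Python, NumPy and TensorFlow random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)


# 42 is an arbitrary value chosen for illustration.
set_global_seed(42)
```

Calling the helper once at the top of the notebook, before any model is created, keeps weight initialization and dropout masks on the same random sequence in every run.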

## Future work

The following things can be done as future actions:
* Different filter sizes, for example 5x5 and 7x7.
* Different batch sizes.
* Selection of other architectures, optimization of other parameters, or optimization of the same parameters with other approaches.
* Detailed study of the wrongly predicted samples to find connections between them.
* Finding more data, or shuffling all the data and splitting it into training, validation and test sets again.
* Visualization and examination of filter heatmaps and feature maps.

Or whatever you think will be helpful - Welcome to Deep Learning :)
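As a possible starting point for studying the wrongly predicted samples, the sketch below builds a confusion matrix and lists the class pairs that are mixed up most often. The function names and the toy labels are assumptions made for illustration. In the notebook, `y_true` would presumably be `data["y_test"]` and `y_pred` the argmax of the model's predictions.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count samples for every (true class, predicted class) pair."""
    matrix = np.zeros((num_classes, num_classes), dtype = int)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix


def most_confused_pairs(matrix, top = 3):
    """Return up to `top` largest off-diagonal entries as (true, predicted, count)."""
    errors = matrix.copy()
    np.fill_diagonal(errors, 0)  # keep only the misclassifications
    order = np.argsort(errors, axis = None)[::-1][:top]
    rows, cols = np.unravel_index(order, errors.shape)
    return [(int(r), int(c), int(errors[r, c]))
            for r, c in zip(rows, cols) if errors[r, c] > 0]


# Toy labels for illustration only, not results from the models above.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 0])

matrix = confusion_matrix(y_true, y_pred, num_classes = 3)
print(most_confused_pairs(matrix))  # [(2, 0, 2), (0, 1, 1)]
```

Each tuple reads as "true class, predicted class, number of samples", so the first entry says that class 2 was predicted as class 0 twice.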

## Practical application of model

Okay, we have a model. How can it be useful to us, and how can we use it?<br>
Here is an example:
an in-car device that shows the most recently seen speed limit sign, that is, the speed we should not exceed :)<br><br>
What we need:
1. A Raspberry Pi with a camera and an LCD display.
2. A sign detection model that takes the camera data and finds the coordinates of the sign in the image (a frame from the video).
3. A step that crops the image at the given coordinates and passes the result to our classification model.
4. A check on the result: if the sign is a speed limit sign, we show it on the display.

That is a high-level description of how to use our model :)
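The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: `detect_sign` and `classify_sign` are hypothetical callables standing in for the detection model and for our classification model, and the class-to-speed mapping assumes the usual GTSRB class ids. Resizing the crop to the model's input size and applying the same normalization as the training data are left out.

```python
import numpy as np

# Assumption: the usual GTSRB class ids for speed limit signs (km/h).
SPEED_LIMITS_KMH = {0: 20, 1: 30, 2: 50, 3: 60, 4: 70, 5: 80, 7: 100, 8: 120}


def crop_sign(frame, box):
    """Cut the region (x, y, width, height) out of a height x width x channels frame."""
    x, y, width, height = box
    return frame[y:y + height, x:x + width]


def last_speed_limit(frame, detect_sign, classify_sign, current_limit = None):
    """Return the speed limit to display after processing one frame."""
    box = detect_sign(frame)
    if box is None:
        return current_limit  # no sign in this frame, keep the previous value
    class_id = classify_sign(crop_sign(frame, box))
    # Signs that are not speed limits leave the displayed value unchanged.
    return SPEED_LIMITS_KMH.get(class_id, current_limit)


# Stand-ins for illustration: a blank frame, a fixed box and a fixed class id.
frame = np.zeros((240, 320, 3))
limit = last_speed_limit(frame, lambda f: (100, 50, 32, 32), lambda crop: 2)
print(limit)  # 50
```

On the device, `last_speed_limit` would be called for every camera frame, with the previous result passed back in as `current_limit`.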

## References

<div class="footnotes">
  <ol>
      <li id="fn:1">
       <p><a href="https://www.kaggle.com/valentynsichkar/traffic-signs-preprocessed" rel="noopener" target="_blank"> Traffic Signs Preprocessed</a><a href="#fnref:1" class="reversefootnote">&#8617;</a></p>
    </li>
      <br/>
    <li id="fn:2">
       <p><a href="https://www.kaggle.com/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign">GTSRB - German Traffic Sign Recognition Benchmark</a><a href="#fnref:2" class="reversefootnote">&#8617;</a></p>
    </li>
      <br/>
      <li id="fn:3">
       <p><a href="https://en.wikipedia.org/wiki/Traffic_sign" rel="noopener" target="_blank">Wikipedia – Traffic sign</a><a href="#fnref:3" class="reversefootnote">&#8617;</a></p>
    </li>
      <br/>
      <li id="fn:4">
       <p><a href="https://www.kaggle.com/valentynsichkar/traffic-signs-classification-with-cnn" rel="noopener" target="_blank">Traffic Signs Classification with CNN</a> by <a href="https://www.kaggle.com/valentynsichkar" rel="noopener" target="_blank">Valentyn Sichkar</a><a href="#fnref:4" class="reversefootnote">&#8617;</a></p>
    </li>
  </ol>
</div>