# Writeup

Starting from the nature of the problem, we want to think from the ground up: what abstractions does this task require? Our one goal is to reach above ~95% accuracy on the test set.

Traffic signs have fairly simple outer geometries (triangles with acute angles, squares with right angles, and circles), so a few convolutions should be able to capture the outer shape. When the signs are skewed, however, other shapes appear (parallelograms, ellipses).

On top of that, the network needs to represent the inner content of the sign, such as numbers and other symbols, since signs are categorized by, for example, their specific speed limit.

![100](unnamed.png)

Here is a sample of images in the 100km/h speed limit sign category. 

## Timeline

Before trying any large, bloated architectures, we wanted a baseline from a simple neural network. The author also wanted to avoid relying on collaboration or copying existing solutions, and to reason from basic concepts instead. With one convolutional layer and one fully connected layer, the accuracy hovered in the 80s (percent) with layer widths around 64.

From here, the network was built up step by step. Max pooling with stride 2 was added to help recognize the blurred sign images in the input. With the performance still under 90%, the network was deepened with two residual layers to preserve signal and gradients, and grayscaling was applied to the images. However, this network did not perform much better either.
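As a rough illustration of two of the steps above, here is a minimal pure-Python sketch of grayscale conversion followed by 2x2 max pooling with stride 2. The function names are hypothetical, and the luma weights (ITU-R BT.601) are an assumption, since the writeup does not say how grayscaling was done; the actual project presumably used a deep learning framework rather than nested lists.

```python
def to_grayscale(image):
    """Convert an H x W image of (r, g, b) tuples into an H x W luminance grid."""
    # Assumption: ITU-R BT.601 luma weights; the writeup does not specify them.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def max_pool_2x2(grid):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    height, width = len(grid), len(grid[0])
    return [
        [
            max(grid[y][x], grid[y][x + 1], grid[y + 1][x], grid[y + 1][x + 1])
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]
```

Each pooling step halves the height and width, so small shifts or blur in the input change the pooled output less than they change the raw pixels.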

More fully connected layers were added, and the performance reached just above 90%. When more convolutional layers were added, the performance crept up to ~92%. It seemed that the fully connected layers were slowing down training more than helping, so instead a longer sequence of convolutional layers was placed at the front of the network, while the fully connected layers at the end were reduced to two. Too many convolutional layers, however, decreased performance, so fewer than 10 seemed to be the right level of abstraction.

Since the validation loss and training loss started to diverge, which is a sign of overfitting, 2D batch normalization was added to the convolutional layers to keep their activations normalized. Then dropout was added to make the network more robust. The default dropout rate of one half was a bit much, so it was lowered to around one fourth. The accuracy went up to 94%, close to the target.
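To make the two regularization steps concrete, here is a minimal pure-Python sketch of what batch normalization and dropout compute on a single channel. The function names are hypothetical, the batch normalization omits the learnable scale and shift and the running statistics that a framework layer would keep, and the dropout uses the common "inverted" scaling; none of this is taken from the project's code.

```python
import math
import random


def batch_norm(values, eps=1e-5):
    """Normalize one channel's activations across a batch to zero mean, unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # Simplification: no learnable scale/shift and no running statistics.
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def dropout(activations, rate=0.25, training=True, rng=None):
    """Inverted dropout: zero each activation with probability `rate` while training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # dropout is a no-op at evaluation time
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Survivors are scaled by 1/keep so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

With a rate of one fourth, roughly three quarters of the activations survive each training step, which regularizes less aggressively than the default of one half.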

To reach a deeper minimum, control logic was implemented to lower the learning rate after certain thresholds. After training the network for 256 epochs, some models reached the goal of >95%.
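The threshold logic can be sketched as a simple step schedule. This is only an illustration: the writeup does not say which quantity was monitored or what the thresholds were, so the function name, the use of validation accuracy as the monitored value, the threshold values, and the decay factor below are all assumptions.

```python
def scheduled_lr(progress, base_lr=1e-3, thresholds=(0.90, 0.93, 0.95), factor=0.5):
    """Return the learning rate after decaying once per threshold already passed.

    `progress` is the monitored quantity (assumed here: validation accuracy).
    All default values are hypothetical placeholders.
    """
    passed = sum(1 for t in thresholds if progress >= t)
    return base_lr * factor ** passed
```

For example, `scheduled_lr(0.94)` has passed two of the three thresholds, so the base rate is multiplied by `factor` twice. The same function works with epoch counts if `thresholds` holds epoch numbers instead.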

The loss of one of these models with >95% test accuracy is plotted below:

![f1](Figure_1.png)

and the validation accuracy trace: 

![f2](Figure_2.png)

## Next steps

With the goal reached, we wanted to experiment with more complicated architectures. Disclaimer: the experiments here did not exactly succeed, but they are included to document what else was tried.

To be robust to scale, a multi-scale architecture was attempted: convolutional layers were stacked in sequence, and the outputs of the earlier layers were also concatenated into the input of the final fully connected layer, so that the network can pick and choose between finer and broader details when weighing the final classification.
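The wiring of the multi-scale idea can be sketched without a framework: each stage's feature maps are flattened, and the flattened vectors are concatenated into one input for the classifier. The function names and the feature map sizes in the usage note are hypothetical and only show how the concatenated width comes about.

```python
def flatten(feature_maps):
    """Flatten a C x H x W nested list of feature maps into one flat list."""
    return [v for channel in feature_maps for row in channel for v in row]


def multi_scale_features(stage_outputs):
    """Concatenate the flattened outputs of several conv stages, early to late."""
    combined = []
    for feature_maps in stage_outputs:
        combined.extend(flatten(feature_maps))
    return combined


def constant_maps(channels, height, width, value=0.0):
    """Build a dummy C x H x W feature map filled with `value` (for illustration)."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]
```

For instance, an early stage with 2 channels of 4x4 maps and a late stage with 4 channels of 2x2 maps give a combined vector of 32 + 16 = 48 values. The fully connected layer then sees both the finer early features and the broader late ones, at the cost of a wider input.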