# Traffic Sign Classifier

The goals / steps of this project are the following:

- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report

## 1. Dataset overview

Basic overview:
- Number of training examples = 34799
- Number of testing examples = 34799
- Number of validation examples = 34799
- Image data shape = (32, 32, 3)
- Number of classes = 43

Number of elements per class:

![title](./writeup/classes_distr.png)

Class examples:

![title](./writeup/data_examples.png)

## 2. Data augmentation

The training set is not particularly well balanced across classes, so I decided to artificially generate new data for the classes where the imbalance is most significant (those whose number of examples is below the average taken from the class distribution).
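The selection rule above can be sketched in a few lines. This is a minimal illustration, not the code used in the project: the function name `augmentation_targets` is hypothetical, and topping each under-represented class up to the average is an assumption about how many images to generate.

```python
import math
from collections import Counter


def augmentation_targets(labels):
    """Map each below-average class to the number of extra samples
    needed to bring it up to the average class size (assumed target)."""
    counts = Counter(labels)
    average = sum(counts.values()) / len(counts)
    return {
        cls: math.ceil(average) - n
        for cls, n in sorted(counts.items())
        if n < average
    }


# Toy labels: class 0 has 6 samples, class 1 has 2, class 2 has 1 (average = 3).
toy_labels = [0] * 6 + [1] * 2 + [2] * 1
print(augmentation_targets(toy_labels))  # {1: 1, 2: 2}
```

Classes at or above the average are left untouched, so only the rare classes receive generated images.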

Unfortunately, the images in the traffic sign dataset are of low resolution and quality, so I decided not to use any techniques related to blurring or adding noise; even a small change of that kind could obscure the actual image content. I focused only on rotating and cropping images, which shouldn't decrease image quality.
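A rough sketch of rotation and cropping using only NumPy is shown here. The function names, the nearest-neighbour sampling, the maximum angle of 15 degrees and the maximum crop margin of 3 pixels are all assumptions made for illustration; the project may well have used an image library and different ranges.

```python
import numpy as np


def rotate_nearest(image, angle_deg):
    """Rotate an HxWxC image around its centre with nearest-neighbour
    sampling; coordinates falling outside are clamped to the border."""
    h, w = image.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, find its source pixel.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    return image[src_y, src_x]


def crop_and_resize(image, margin):
    """Cut `margin` pixels from every side, then scale back to the
    original size with nearest-neighbour sampling."""
    h, w = image.shape[:2]
    cropped = image[margin:h - margin, margin:w - margin]
    rows = np.arange(h) * cropped.shape[0] // h
    cols = np.arange(w) * cropped.shape[1] // w
    return cropped[rows][:, cols]


def augment(image, rng, max_angle=15.0, max_margin=3):
    """Apply a random rotation followed by a random crop (assumed ranges)."""
    angle = rng.uniform(-max_angle, max_angle)
    margin = int(rng.integers(0, max_margin + 1))
    return crop_and_resize(rotate_nearest(image, angle), margin)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)
augmented = augment(image, rng)
print(augmented.shape, augmented.dtype)
```

Both operations only move existing pixels around, so no new intensity values are introduced.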

Class distribution after augmentation:

![title](./writeup/classes_distr_aug.png)

Example of augmented data:

![title](./writeup/augmented_data.png)

## 3. Data preprocessing

The data preprocessing consists of:
- histogram equalization (in order to improve image contrast)
- grayscale conversion (brings better results)
- min/max normalization
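The three steps can be sketched with plain NumPy as follows. This is an illustration under assumptions: the function names are hypothetical, the grayscale weights are the common BT.601 luminance weights, and the conversion to gray is done before the equalization for simplicity, which may differ from the order used in the project.

```python
import numpy as np


def to_gray(image_rgb):
    """Luminance-weighted grayscale conversion (BT.601 weights, assumed)."""
    weights = np.array([0.299, 0.587, 0.114])
    return image_rgb.astype(np.float64) @ weights


def equalize_histogram(gray):
    """Spread the intensity histogram of an 8-bit image over 0..255."""
    gray = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    if cdf[-1] == cdf_min:  # constant image, nothing to equalize
        return gray
    lut = np.rint((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


def min_max_normalize(image):
    """Scale values linearly into the range [0, 1]."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def preprocess(image_rgb):
    """Return a float32 array of shape (H, W, 1) with values in [0, 1]."""
    equalized = equalize_histogram(to_gray(image_rgb))
    return min_max_normalize(equalized)[..., np.newaxis]


# Low-contrast synthetic image standing in for a traffic sign.
demo_rng = np.random.default_rng(1)
demo = demo_rng.integers(100, 121, size=(32, 32, 3)).astype(np.uint8)
print(preprocess(demo).shape)  # (32, 32, 1)
```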

## 4. Model architecture

The architecture is based on LeNet, which was extended in order to reach the required accuracy of 0.93. I've also applied techniques known from AlexNet, such as local response normalization and dropout.

I wrote a set of helper methods to make assembling the network easier:
- `conv2d` creates the weights and biases, applies the convolution and returns the result after the ReLU function
- `fc` creates a fully connected layer with a given number of inputs and outputs; it also applies ReLU
- `lrn` is a simple wrapper for `tf.nn.local_response_normalization` that keeps the code cleaner
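To make clear what the normalization wrapped by `lrn` computes, here is a NumPy sketch of the formula documented for `tf.nn.local_response_normalization`: each activation is divided by `(bias + alpha * sqr_sum) ** beta`, where `sqr_sum` is the sum of squares over neighbouring channels. The function below is an illustration only, not the helper from the project, and the default `bias=1.0` is an assumption taken from the TensorFlow default.

```python
import numpy as np


def local_response_normalization(x, depth_radius, alpha, beta, bias=1.0):
    """NumPy illustration of local response normalization.

    x has shape (batch, height, width, channels). Every value is divided by
    (bias + alpha * sum of squares over channels d-radius..d+radius) ** beta.
    """
    channels = x.shape[-1]
    squared = np.square(x)
    out = np.empty_like(x, dtype=np.float64)
    for d in range(channels):
        lo = max(0, d - depth_radius)
        hi = min(channels, d + depth_radius + 1)
        sqr_sum = squared[..., lo:hi].sum(axis=-1)
        out[..., d] = x[..., d] / (bias + alpha * sqr_sum) ** beta
    return out


# Parameters as they appear in the lrn(...) calls of this writeup,
# assuming the positional arguments are radius, alpha and beta.
activations = np.ones((1, 2, 2, 24))
normalized = local_response_normalization(activations, 2, 2e-05, 0.75)
print(normalized.shape)  # (1, 2, 2, 24)
```

With an `alpha` this small the output stays very close to the input unless the activations are large.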

The overall architecture might be described as follows:

##### Layer 1: 

```python
# Convolution: filter size=(7, 7), filters = 24, strides = (1, 1)
conv1 = conv2d(x, 7, 24, 1, 'conv1')
# Max pooling: kernel=(2, 2), strides=(2, 2)
conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
# Local response normalization
conv1 = lrn(conv1, 2, 2e-05, 0.75, name='norm1')
```

##### Layer 2:

Similarly to AlexNet, this layer only convolves and does not pool, as I don't want to shrink the feature maps too quickly.

```python
# Convolution: filter size=(3, 3), filters = 32, strides = (1, 1)
conv2 = conv2d(conv1, 3, 32, 1, 'conv2')
```

##### Layer 3:

```python
# Convolution: filter size=(3, 3), filters = 32, strides = (1, 1)
conv3 = conv2d(conv2, 3, 32, 1, 'conv3')
# Max pooling: kernel=(2, 2), strides=(2, 2)
conv3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
# Local response normalization
conv3 = lrn(conv3, 2, 2e-05, 0.75, name='norm3')
```

##### Flatten:

```python
# Flatten before passing to the fully connected layers
fc0 = flatten(conv3)
fc0 = tf.nn.dropout(fc0, keep_prob)
```

##### Layer 4:

```python
# Fully connected layer with relu: inputs=512, outputs=512
fc1 = fc(fc0, 512, 512, 'fc1')
fc1 = tf.nn.dropout(fc1, keep_prob)
```

##### Layer 5:

```python
# Fully connected layer with relu: inputs=512, outputs=512
fc2 = fc(fc1, 512, 512, 'fc2')
fc2 = tf.nn.dropout(fc2, keep_prob)
```

##### Logits

```python
# Fully connected layer: inputs=512, outputs=43
fc(fc2, 512, 43, 'logits', False)
```
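The input size of 512 for the first fully connected layer can be checked with a little shape arithmetic. The sketch below assumes that the `conv2d` helper uses 'VALID' padding like the pooling layers do (the helper itself is not shown here); under that assumption the numbers add up exactly.

```python
def valid_output_size(size, kernel, stride):
    """Spatial output size of a convolution or pooling with 'VALID' padding."""
    return (size - kernel) // stride + 1


size = 32                              # input images are 32x32
size = valid_output_size(size, 7, 1)   # conv1, 7x7 filter    -> 26
size = valid_output_size(size, 2, 2)   # max pooling, 2x2     -> 13
size = valid_output_size(size, 3, 1)   # conv2, 3x3 filter    -> 11
size = valid_output_size(size, 3, 1)   # conv3, 3x3 filter    -> 9
size = valid_output_size(size, 2, 2)   # max pooling, 2x2     -> 4
flattened = size * size * 32           # conv3 has 32 filters
print(flattened)  # 512
```

The result, 4 x 4 x 32 = 512, matches the `inputs=512` passed to `fc1`.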

## 5. Solution

##### Approach:

The starting point for the classifier was LeNet, which was iteratively improved until it reached the required accuracy. Every modification was checked against the performance improvement it brought. I finally ended up with the network described above.

##### Training:
- optimizer: Adam
- learning_rate: 0.001
- epochs: 10
- batch_size: 128
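As a rough idea of what these settings mean in terms of training steps, the sketch below splits the examples into shuffled mini-batches. The helper name `iterate_batches` is hypothetical, and the example count is the size of the training set before augmentation, so the real number of steps per epoch would be higher.

```python
import math
import random


def iterate_batches(num_examples, batch_size, rng):
    """Yield lists of shuffled example indices, one list per mini-batch."""
    indices = list(range(num_examples))
    rng.shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]


EPOCHS = 10
BATCH_SIZE = 128
NUM_EXAMPLES = 34799  # training set size before augmentation

rng = random.Random(0)
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
batches = list(iterate_batches(NUM_EXAMPLES, BATCH_SIZE, rng))
print(steps_per_epoch, len(batches[-1]), EPOCHS * steps_per_epoch)  # 272 111 2720
```

The last batch of every epoch is smaller than the others, because 34799 is not a multiple of 128.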

##### Regularization:
- dropout (keep probability: 0.5 during training, 1.0 during validation)
- weight decay: L2 loss, ratio = 0.001
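The difference between the two dropout probabilities can be illustrated with a NumPy sketch of the behaviour documented for `tf.nn.dropout`: each element is kept with probability `keep_prob` and scaled by `1 / keep_prob`, otherwise set to zero. The function below is an illustration, not the code of the project.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Keep each unit with probability keep_prob and rescale the survivors
    by 1 / keep_prob; with keep_prob = 1.0 the input passes through."""
    if keep_prob >= 1.0:
        return x
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)
activations = np.ones(8)
print(dropout(activations, 0.5, rng))  # training: zeros and rescaled values
print(dropout(activations, 1.0, rng))  # validation: unchanged
```

Because of the rescaling, the expected value of every activation is the same in both modes, which is why no extra correction is needed at validation time.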

Whether dropout works well together with weight decay is still open to discussion; however, applying this combination to the problem improved the final accuracy.
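For reference, the weight decay term can be sketched as follows. This assumes the penalty follows the convention of `tf.nn.l2_loss`, which is `sum(w ** 2) / 2` per weight tensor, and that it is simply added to the cross-entropy loss; the function names are hypothetical and the exact form used in the project may differ.

```python
import numpy as np


def l2_penalty(weights, ratio=0.001):
    """Weight decay term: ratio * sum over layers of sum(w ** 2) / 2."""
    return ratio * sum(float(np.sum(np.square(w))) / 2.0 for w in weights)


def total_loss(cross_entropy, weights, ratio=0.001):
    """Data loss plus the L2 penalty on all weight tensors."""
    return cross_entropy + l2_penalty(weights, ratio)


# Two toy weight tensors standing in for the layers of the network.
toy_weights = [np.ones((2, 2)), np.full((3,), 2.0)]
print(total_loss(1.0, toy_weights))
```

For the toy weights the squared sums are 4 and 12, so the penalty is 0.001 * (2 + 6) = 0.008.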

## 6. Confusion matrix

## 7. Layers visualization