# **Traffic Sign Classifier**

The goals / steps of this project are the following:
* Load the data set.
* Explore, summarize and visualize the data set.
* Design, train and test a model architecture.
* Use the model to make predictions on new images.
* Analyze the softmax probabilities of the new images.
* Summarize the results with a written report.

[//]: # (Image References)
[image1]: ./photos_report/Each_sign.png
[image2]: ./photos_report/Histogram.png 
[image3]: ./photos_report/Grayscale.png
[image4]: ./photos_report/Rotated.png 
[image5]: ./photos_report/web_images_with_pred.png
[image6]: ./photos_report/top5.png


# **Rubric Points**
Here I consider the rubric points individually and describe how I addressed each point in my implementation.

## Data Set Summary & Exploration

### 1. Provide a basic summary of the data set. In the code, the analysis should be done using python, numpy and/or pandas methods rather than hardcoding results manually.

The analysis was done using numpy.

* The size of the training set is 34799
* The size of the validation set is 4410
* The size of the test set is 12630
* The shape of a traffic sign image is 32x32x3
* The number of classes in the data set is 43
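
A minimal sketch of how these figures can be computed with numpy. It assumes the splits have already been loaded into arrays named `X_train`, `X_valid`, `X_test` and `y_train`; these names and the `summarize_dataset` helper are illustrative assumptions, not taken from the project code.

```python
import numpy as np

def summarize_dataset(X_train, X_valid, X_test, y_train):
    """Basic statistics of the splits (hypothetical helper).

    X_* are image arrays of shape (N, H, W, C); y_train holds integer labels.
    """
    return {
        "n_train": len(X_train),
        "n_valid": len(X_valid),
        "n_test": len(X_test),
        "image_shape": X_train[0].shape,       # e.g. (32, 32, 3)
        "n_classes": len(np.unique(y_train)),  # distinct labels seen in training
    }

if __name__ == "__main__":
    # Tiny synthetic stand-in for the real data, only to show the call.
    rng = np.random.default_rng(0)
    X_train = rng.integers(0, 256, size=(10, 32, 32, 3)).astype(np.uint8)
    y_train = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
    print(summarize_dataset(X_train, X_train[:4], X_train[:6], y_train))
```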

### 2. Include an exploratory visualization of the dataset.

Here is a frequency diagram of the training set. It shows which classes have fewer examples and which have more, and these counts are used later to decide how much each class needs to be augmented. Since the data is imbalanced, I use data augmentation later to rectify this. Because only the training data is used for training, I did not perform any histogram analysis on the test and validation sets.

![Frequency Diagram][image2]
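
The per-class counts behind a diagram like this can be obtained with `np.bincount`. The sketch below is illustrative; `class_frequencies` and `rarest_and_most_common` are hypothetical helper names, not functions from the project.

```python
import numpy as np

def class_frequencies(y, n_classes=43):
    """Count examples per class; index i holds the count for class i."""
    return np.bincount(y, minlength=n_classes)

def rarest_and_most_common(counts, k=3):
    """Return the k least frequent and the k most frequent class ids."""
    order = np.argsort(counts, kind="stable")  # ascending by count
    return order[:k].tolist(), order[::-1][:k].tolist()
```

Passing the counts to `matplotlib.pyplot.bar(range(43), counts)` draws a bar chart of the class frequencies.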

Here is one image of each class along with its class label, so that we can recognize what each sign is. This is only for our own understanding and does not affect the model in any way.

![One picture of each sign][image1]

## Design and Test a Model Architecture
### 1. Describe how you preprocessed the image data. What techniques were chosen and why did you choose these techniques? 

The preprocessing steps are:
* Converting the images to grayscale and normalizing them
* Randomly rotating the images

Here is a pair of an original and a grayscaled image. Grayscaling is done by the gray_dataset function in the code. Normalization is done by dividing the pixel values by 255.

![grayscaled image][image3]
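
The exact implementation of gray_dataset is not shown here, so the following is only a sketch of the same two steps. It assumes RGB input and uses the ITU-R BT.601 luminance weights (the ones OpenCV uses for RGB to gray conversion); the author's function may weight the channels differently. `gray_and_normalize` is a hypothetical name.

```python
import numpy as np

# Assumed luminance weights (ITU-R BT.601); the project code may differ.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)

def gray_and_normalize(images):
    """Convert (N, H, W, 3) RGB uint8 images to (N, H, W, 1) float32 in [0, 1]."""
    gray = images.astype(np.float32) @ LUMA_WEIGHTS  # weighted sum over channels -> (N, H, W)
    gray = gray / 255.0                              # normalize by dividing by 255
    return gray[..., np.newaxis]                     # keep a channel axis: 32x32x1 network input
```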

Next, I decided to randomly rotate the images to make the model more robust. This is done via the add_rotation function.

![Rotated image][image4]
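
The add_rotation function itself is not reproduced in this report. The sketch below shows one way to rotate grayscale images about their centre using only numpy (inverse mapping with nearest-neighbour sampling). The ±15 degree range, the zero fill for uncovered corners, and the names `rotate_image` and `random_rotations` are all assumptions; it expects images of shape (N, H, W), i.e. with the channel axis squeezed out.

```python
import numpy as np

def rotate_image(img, angle_deg):
    """Rotate a 2-D (H, W) image about its centre by angle_deg degrees.

    Nearest-neighbour inverse mapping; output pixels whose source falls
    outside the image are filled with 0 (an assumed fill value).
    """
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    y0, x0 = ys - cy, xs - cx
    # For every output pixel, find the source pixel it is sampled from.
    src_x = np.rint(np.cos(theta) * x0 + np.sin(theta) * y0 + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * x0 + np.cos(theta) * y0 + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out

def random_rotations(images, n_new, max_angle=15.0, rng=None):
    """Create n_new rotated copies of randomly chosen images (assumed ±max_angle range)."""
    rng = np.random.default_rng() if rng is None else rng
    if n_new <= 0:
        return np.empty((0,) + images.shape[1:], dtype=images.dtype)
    idx = rng.integers(0, len(images), size=n_new)
    angles = rng.uniform(-max_angle, max_angle, size=n_new)
    return np.stack([rotate_image(images[i], a) for i, a in zip(idx, angles)])
```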

As a result, the augmented dataset contains many more images than the original. The number of rotated images added to each class depends on how much data that class already has: it is the smaller of 500 and 2200 minus the number of existing examples of that class.
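
The rule above can be written as a one-line numpy expression. This is a sketch with a hypothetical helper name; the report does not say what happens when a class already has more than 2200 examples, so clamping the result at 0 is an assumption.

```python
import numpy as np

def rotations_to_add(y, n_classes=43, cap=500, target=2200):
    """Rotated images to add per class: min(cap, target - count).

    Assumption: classes with more than `target` examples get 0 (clamped).
    """
    counts = np.bincount(y, minlength=n_classes)
    return np.maximum(0, np.minimum(cap, target - counts))
```

For example, a class with 2000 examples receives 200 rotated images, while a class with 100 examples receives the full 500.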

### 2. Describe what your final model architecture looks like

The final model consists of these layers.

| Layer           | Description                                   |
|-----------------|-----------------------------------------------|
| Input           | 32x32x1 grayscale image                       |
| Convolution 5x5 | 12 filters, output shape 28x28x12             |
| ReLU            | Output shape 28x28x12                         |
| Dropout         | Randomly sets some outputs to 0               |
| Max pooling     | Stride 2x2, output shape 14x14x12             |
| Convolution 5x5 | 25 filters, output shape 10x10x25             |
| ReLU            | Output shape 10x10x25                         |
| Dropout         | Randomly sets some outputs to 0               |
| Max pooling     | Stride 2x2, output shape 5x5x25               |
| Fully connected | Input 625 (flattened 5x5x25), output 300      |
| ReLU            | Output shape 300                              |
| Dropout         | Randomly sets some outputs to 0               |
| Fully connected | Input 300, output 100                         |
| ReLU            | Output shape 100                              |
| Dropout         | Randomly sets some outputs to 0               |
| Fully connected | Input 100, output 43                          |
| Softmax         | Output shape 43                               |
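
As a sanity check on the table, the output shapes and parameter counts can be derived layer by layer. This sketch assumes stride-1 "valid" (unpadded) convolutions and 2x2 max pooling with stride 2, which is what the listed shapes imply; ReLU, dropout and softmax change neither the shape nor the parameter count, so they are omitted. The helper names are hypothetical.

```python
def conv_valid(shape, kernel, filters):
    """Output shape and parameter count of a stride-1, 'valid'-padded conv layer."""
    h, w, c = shape
    out = (h - kernel + 1, w - kernel + 1, filters)
    return out, kernel * kernel * c * filters + filters  # weights + biases

def max_pool(shape, stride=2):
    h, w, c = shape
    return (h // stride, w // stride, c), 0  # pooling has no parameters

def dense(n_in, n_out):
    return n_out, n_in * n_out + n_out  # weights + biases

def summarize_model():
    """List of (layer name, output shape, parameter count) for the table above."""
    layers = []
    shape = (32, 32, 1)
    shape, p = conv_valid(shape, 5, 12); layers.append(("conv1", shape, p))
    shape, p = max_pool(shape);          layers.append(("pool1", shape, p))
    shape, p = conv_valid(shape, 5, 25); layers.append(("conv2", shape, p))
    shape, p = max_pool(shape);          layers.append(("pool2", shape, p))
    n = shape[0] * shape[1] * shape[2]   # flatten: 5 * 5 * 25 = 625
    for name, n_out in [("fc1", 300), ("fc2", 100), ("logits", 43)]:
        n, p = dense(n, n_out); layers.append((name, n, p))
    return layers

for name, out, params in summarize_model():
    print(name, out, params)
```

Under these assumptions the network has 230,080 trainable parameters, most of them (187,800) in the first fully connected layer.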

I tried many different values for the hyperparameters, namely the number of epochs, the convolution filter size, the number of filters, the output dimensions of the fully connected layers and the learning rate. This is the best combination I found. Adding dropout increased the accuracy on the test set drastically, showing its regularization effect.

### 3. Describe how you trained your model. 

The model was trained with the Adam optimizer on the cross-entropy loss. The training pipeline makes passes over the augmented dataset in shuffled batches and performs a gradient update after each batch. Training ran for 40 epochs with a learning rate of 0.001. ReLU activations were used in all layers except the final layer, where softmax was used.
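
The training procedure (shuffled mini-batches, softmax cross-entropy, Adam, 40 epochs, learning rate 0.001) can be illustrated in plain numpy. To keep the sketch short, a linear softmax classifier stands in for the convolutional network, and the batch size of 128 is an assumption since the report does not state it. This is not the project's training code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))

class Adam:
    """Adam update rule with the commonly used default betas and epsilon."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = {}, {}, 0

    def step(self, params, grads):
        self.t += 1
        for k in params:
            if k not in self.m:
                self.m[k] = np.zeros_like(params[k])
                self.v[k] = np.zeros_like(params[k])
            self.m[k] = self.b1 * self.m[k] + (1 - self.b1) * grads[k]
            self.v[k] = self.b2 * self.v[k] + (1 - self.b2) * grads[k] ** 2
            m_hat = self.m[k] / (1 - self.b1 ** self.t)  # bias correction
            v_hat = self.v[k] / (1 - self.b2 ** self.t)
            params[k] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

def train(X, y, n_classes, epochs=40, batch_size=128, lr=0.001, seed=0):
    """Shuffled mini-batch training of a linear softmax model (stand-in for the CNN)."""
    rng = np.random.default_rng(seed)
    params = {"W": rng.normal(0.0, 0.01, (X.shape[1], n_classes)),
              "b": np.zeros(n_classes)}
    opt = Adam(lr)
    history = []
    for _ in range(epochs):
        order = rng.permutation(len(X))  # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            probs = softmax(xb @ params["W"] + params["b"])
            # Gradient of mean cross-entropy w.r.t. logits: (probs - one_hot) / batch size
            d = probs.copy()
            d[np.arange(len(yb)), yb] -= 1.0
            d /= len(yb)
            opt.step(params, {"W": xb.T @ d, "b": d.sum(axis=0)})  # one update per batch
        history.append(cross_entropy(softmax(X @ params["W"] + params["b"]), y))
    return params, history
```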

### 4. Describe the approach taken for finding a solution and getting the validation set accuracy to be at least 0.93.

The first model was a standard convolutional network with one fully connected layer, and small tweaks were then made to it. The main reason the validation accuracy rose above 0.93 is the preprocessing and data augmentation. The random rotations made the network more robust, leading to a higher test set accuracy. Grayscaling makes the network smaller, because color information is unnecessary for this task: grayscale images give very similar results and reduce the depth of the first-layer convolution filters from 3 channels to 1.
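
A quick worked count of what grayscaling saves in the first layer, using the 5x5 kernels and 12 filters from the architecture table (`conv_params` is a hypothetical helper):

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a convolution layer."""
    return kernel * kernel * in_channels * filters + filters

rgb = conv_params(5, 3, 12)   # first layer on a 32x32x3 colour input
gray = conv_params(5, 1, 12)  # first layer on a 32x32x1 grayscale input
print(rgb, gray)  # 912 312
```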

The initial architecture was very simple and did not have the capacity to learn the distribution of the dataset. So I increased the number of layers, which led to a drastic increase in training set accuracy but very poor validation accuracy. I then introduced dropout to provide a regularizing effect and finally reached a model with sufficient accuracy.
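
Dropout, listed in the table as "randomly sets some outputs to 0", is commonly implemented as inverted dropout, which is also how TensorFlow's `tf.nn.dropout` behaves: surviving activations are scaled up during training so nothing changes at inference time. The sketch below assumes a keep probability of 0.5, since the value used in the project is not stated.

```python
import numpy as np

def dropout(x, keep_prob=0.5, training=True, rng=None):
    """Inverted dropout (keep_prob=0.5 is an assumed value).

    During training, each activation is zeroed with probability 1 - keep_prob
    and survivors are scaled by 1 / keep_prob, so the expected activation is
    unchanged. At inference time the input is returned as-is.
    """
    if not training or keep_prob >= 1.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)
```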

Parameter tuning involved adjusting the learning rate, the number of filters in the convolutions and similar settings. The tuning was done mainly to counteract bad observations: for example, if the loss is increasing, a common remedy is to decrease the learning rate.

The final architecture is very similar to LeNet. LeNet has proven effective on many small image classification problems, and arriving at a similar network through an iterative approach suggests that its size is well matched to a relatively simple task like this one. It strikes a good balance between complexity and accuracy.

The final results are:
* Training set accuracy: 99.8%
* Validation set accuracy: 93.6%
* Test set accuracy: 95.8%
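
Accuracies like these are typically computed by running the network over a split in batches and counting arg-max predictions that match the labels. A framework-independent sketch, where `predict_logits` is a hypothetical stand-in for the trained network and the batch size of 128 is an assumption:

```python
import numpy as np

def evaluate(predict_logits, X, y, batch_size=128):
    """Fraction of examples whose arg-max logit equals the true label.

    predict_logits: any callable mapping a batch of inputs to (batch, n_classes) scores.
    """
    correct = 0
    for start in range(0, len(X), batch_size):
        logits = predict_logits(X[start:start + batch_size])
        correct += int(np.sum(np.argmax(logits, axis=1) == y[start:start + batch_size]))
    return correct / len(X)
```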

## Test a Model on New Images
### 1. Choose five German traffic signs found on the web and provide them in the report. For each image, discuss what quality or qualities might be difficult to classify.

These are the five images I chose:

![Five images][image5]

The five signs are the 30 km/h and 50 km/h speed limits, a warning road sign, priority road and stop.

Some of these signs, such as stop, the warning road sign and priority road, must be detected immediately and without fail, so I thought they should be tested. The 30 km/h and 50 km/h signs were chosen because they look very similar and it is essential to distinguish between them.

The results of running the network on these images are shown below. The image shows the five categories with the highest softmax probabilities for each sign.

![Top 5 softmax probabilities][image6]
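
In TensorFlow the top five probabilities are usually obtained with `tf.nn.top_k` applied to the softmax output. The numpy sketch below does the equivalent on a matrix of logits (one row per image); `top_k_softmax` is a hypothetical name.

```python
import numpy as np

def top_k_softmax(logits, k=5):
    """Return (probabilities, class ids) of the k most likely classes per row."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ids = np.argsort(-probs, axis=1)[:, :k]         # class ids in descending probability
    return np.take_along_axis(probs, ids, axis=1), ids
```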