[//]: # (Image References)

[image1]: ./examples/visualization.jpg "Visualization"
[image2]: ./examples/grayscale.jpg "Grayscaling"
[image3]: ./examples/random_noise.jpg "Random Noise"
[image4]: ./examples/placeholder.png "Traffic Sign 1"
[image5]: ./examples/placeholder.png "Traffic Sign 2"
[image6]: ./examples/placeholder.png "Traffic Sign 3"
[image7]: ./examples/placeholder.png "Traffic Sign 4"
[image8]: ./examples/placeholder.png "Traffic Sign 5"
[image9]: ./writeup__supporting_data/labels_histogram.png "Labels histogram"
[image10]: ./writeup__supporting_data/sample_images_training.png "Sample images training"
[image11]: ./writeup__supporting_data/original_normalized.png
[image12]: ./writeup__supporting_data/sample_images_training_cropped.png "Sample images training cropped"
[image13]: ./writeup__supporting_data/sample_images_training_cropped_normalized.png
[image14]: ./writeup__supporting_data/sample_images_training_cropped_grayscaled.png
[image15]: ./writeup__supporting_data/sample_images_training_cropped_grayscaled_normalized.png


## Project 2 writeup
Here is a link to my [project code](https://github.com/udacity/CarND-Traffic-Sign-Classifier-Project/blob/master/Traffic_Sign_Classifier.ipynb) with the corresponding output.

### Data set summary and exploration

#### 1. Data set(s) summary.

I used the numpy library combined with Python's native methods to calculate summary statistics of the traffic signs data set. Here are the summaries:

* Number of training examples = 34799
* Shape of an example image (in all datasets) = (32, 32, 3)
* Number of validation examples = 4410
* Number of testing examples = 12630
* Number of examples in my custom generated testing set = 5
* Number of classes = 43

#### 2. Data visualization.

Here's a histogram showing how many examples of each class are available in each dataset (if the x-axis labels look a bit off, find the nearest bar to the tick). At first glance, the examples of each class look pretty evenly distributed in each dataset. This is the last time we'll look at any statistic about the test set before we test our model.

![labels_histogram][image9]

<br/><br/><br/>

Here are some randomly sampled images from the training dataset, arranged in increasing order of their number of occurrences. Each one is labelled with the name of the traffic sign and the number of times that sign appears in the training dataset:

![sample_images_training][image10]

<br/><br/><br/>


### Design and Test a Model Architecture


#### 1. Data set(s) preparation.

#### Transformations:
I decided to apply a few transformations to the data and test out my model's performance on each one of them. For each transformation, I'm including a few images after that transformation was applied to show what they look like (this is for general intuition -- I don't think it matters for how the neural net operates).
*Similar transformations were applied to images in each dataset*

**1. Normalization:**
* I took each dataset, computed the mean pixel value over all of its images, and used it to normalize every pixel in every image with the formula `new_pixel_value = (old_pixel_value - mean_pixel_value) / mean_pixel_value`. The expectation is that this makes the images easier for the model to work with, since the mean pixel value of the dataset is now close to zero. Here's what the training dataset looks like now:

![original_normalized][image11]
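
The normalization formula above can be sketched in a few lines of numpy. This is a minimal illustration, not the notebook's code: the function name `normalize_by_mean` and the random stand-in images are assumptions made for the example.

```python
import numpy as np


def normalize_by_mean(images):
    """Apply (pixel - mean) / mean, with the mean taken over the whole dataset."""
    # Convert to float first so that subtracting the mean cannot wrap around in uint8.
    images = images.astype(np.float32)
    mean_pixel_value = images.mean()
    return (images - mean_pixel_value) / mean_pixel_value


# Hypothetical stand-in for a dataset of 32x32 RGB images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)

normalized = normalize_by_mean(fake_images)
print(normalized.shape, float(normalized.mean()))
```

With this formula a pixel value of 0 maps to -1, and the dataset mean maps to 0.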

<br/>
**2. Cropping the original images:**
* The provided dataset(s) have a 'coords' property which gives information about a bounding box around the traffic sign within the image. I cropped out the part specified by the bounding box in each of the training, validation and test datasets and brought the image back to 32x32 by padding it with black pixels. This should help the neural network cope with translational variance in the images. For my testing dataset of 5 images, I purposefully did something similar by only taking tightly bounded pictures from the internet. Here's what the training dataset looks like now:

![sample_images_training_cropped][image12]
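
A rough sketch of the crop-and-pad step is shown here. It assumes the bounding box is given as `(x1, y1, x2, y2)` in the pixel coordinates of the 32x32 image and that the crop is centered on the black canvas; both are assumptions for illustration, and the helper name `crop_and_pad` is hypothetical.

```python
import numpy as np


def crop_and_pad(image, box, size=32):
    """Crop `image` to `box` and paste the crop, centered, onto a black canvas."""
    x1, y1, x2, y2 = box  # assumed order, in pixel coordinates of `image`
    crop = image[y1:y2, x1:x2]
    height, width = crop.shape[:2]

    # A zero-filled canvas is black; it keeps the channel count of the input.
    canvas = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    top = (size - height) // 2
    left = (size - width) // 2
    canvas[top:top + height, left:left + width] = crop
    return canvas


# Hypothetical uniform gray image and bounding box.
image = np.full((32, 32, 3), 200, dtype=np.uint8)
cropped = crop_and_pad(image, box=(6, 5, 26, 27))
print(cropped.shape)
```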

<br/>
**3. Normalization on the cropped images:**
* It's the same normalization, except that it is applied on the cropped images:

![sample_images_training_cropped_normalized][image13]

<br/>
**4. Grayscaling the cropped images:**
* Grayscaling reduces the number of channels from 3 to 1 for each "pixel". This may help if the model is too sensitive to slight variations in the conditions under which the images were shot.

![sample_images_training_cropped_grayscaled][image14]
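
One common way to go from 3 channels to 1 is a weighted sum of the R, G and B values. The sketch below uses the standard luminance weights (0.299, 0.587, 0.114); whether the notebook uses these weights or a plain average is not stated here, so treat the weights and the function name `to_grayscale` as assumptions.

```python
import numpy as np


def to_grayscale(images):
    """Convert (..., 3) RGB images to (..., 1) grayscale with luminance weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # The matrix product contracts the channel axis: (..., 3) @ (3,) -> (...).
    gray = images.astype(np.float32) @ weights
    # Keep an explicit channel axis so the result is 32x32x1 rather than 32x32.
    return gray[..., np.newaxis]


# Hypothetical batch of white 32x32 RGB images.
white_images = np.full((4, 32, 32, 3), 255, dtype=np.uint8)
gray_images = to_grayscale(white_images)
print(gray_images.shape)
```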

<br/>
**5. Normalized grayscaled cropped images:**
* This is the same kind of normalization applied to the previous dataset. 

![sample_images_training_cropped_grayscaled_normalized][image15]

**Other things I would've tried if I had more time:**
* I was very curious to explore how generating more training images with the signs rotated by varying degrees would impact model performance. Sub-sampling takes care of translational variance but the model may still be sensitive to rotation.

<br/><br/><br/>


#### 2. Model architecture.

My final model consisted of the following layers:

| Layer         		|     Description	        					| 
|:---------------------:|:---------------------------------------------:| 
| Input         		| 32x32x3 for RGB image / 32x32x1 for grayscale | 
| Convolution 5x5/5x5   | 1x1 stride, VALID padding, outputs 28x28x6 	|
| RELU					|												|
| Max pooling	      	| 2x2 stride, VALID padding, outputs 14x14x6    |
| Convolution 5x5	    | 1x1 stride, VALID padding, outputs 10x10x16 	|
| Max pooling	      	| 2x2 stride, VALID padding, outputs  5x 5x6    |
| Flatten	      	    | outputs 400                                   |
| Fully connected		| outputs 120       							|
| RELU					|												|
| Fully connected		| outputs 84        							|
| RELU					|												|
| Fully connected		| outputs 43        							|
| Logit          		| outputs 43        							|
| Softmax				| outputs 43        							|
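
The spatial sizes in the table can be checked with a little arithmetic. The sketch below assumes VALID padding throughout, a 2x2 pooling window, and 16 feature maps out of the second convolution (the standard LeNet value, and the one a flattened size of 400 = 5 x 5 x 16 implies). The helper name `valid_output_size` is hypothetical.

```python
def valid_output_size(input_size, kernel_size, stride):
    """Spatial output size of a convolution or pooling layer with VALID padding."""
    return (input_size - kernel_size) // stride + 1


size = 32
size = valid_output_size(size, kernel_size=5, stride=1)  # first convolution
after_conv1 = size
size = valid_output_size(size, kernel_size=2, stride=2)  # first max pooling
after_pool1 = size
size = valid_output_size(size, kernel_size=5, stride=1)  # second convolution
after_conv2 = size
size = valid_output_size(size, kernel_size=2, stride=2)  # second max pooling
after_pool2 = size

# Assumed depth of the second convolution: 16 feature maps.
flattened = after_pool2 * after_pool2 * 16
print(after_conv1, after_pool1, after_conv2, after_pool2, flattened)
```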
 

Further, I used the AdamOptimizer with a learning rate of 0.001. The optimization objective is to reduce the mean cross-entropy between the one-hot actual labels and the softmax probabilities of the model.
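
To make the objective concrete, here is a small numpy sketch of the mean cross-entropy between one-hot labels and the full softmax distribution over the 43 classes. It only illustrates the loss, not the notebook's training code, and the function names are assumptions.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row maximum keeps exp() stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def mean_cross_entropy(logits, labels, num_classes=43):
    """Mean cross-entropy between one-hot labels and softmax probabilities."""
    one_hot = np.eye(num_classes)[labels]
    probabilities = softmax(logits)
    # The small constant guards against log(0).
    per_example = -(one_hot * np.log(probabilities + 1e-12)).sum(axis=1)
    return per_example.mean()


# Hypothetical batch of two examples with all-zero logits (uniform predictions).
logits = np.zeros((2, 43))
labels = np.array([3, 17])
loss = mean_cross_entropy(logits, labels)
print(loss)
```

For a uniform prediction over 43 classes the loss equals ln(43), which is a useful sanity check for an untrained network.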

The accuracy is calculated as the fraction of correct predictions.
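
As a sketch, assuming the predicted class is the index of the largest logit, the accuracy can be computed like this (the function name and the toy values are made up for illustration):

```python
import numpy as np


def accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == labels))


# Hypothetical logits for four examples and three classes.
toy_logits = np.array([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.3, 0.2, 0.9],
    [1.1, 0.4, 0.2],
])
toy_labels = np.array([0, 1, 2, 1])
toy_accuracy = accuracy(toy_logits, toy_labels)
print(toy_accuracy)
```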

I trained the model for 500 epochs (`EPOCHS`) with a batch size (`BATCH_SIZE`) of 128.
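
For a sense of scale, the sketch below slices the 34799 training examples into batches of 128 and counts the steps. The generator `iterate_batches` is a hypothetical helper, not code from the notebook.

```python
def iterate_batches(num_examples, batch_size=128):
    """Yield (start, end) index pairs covering the dataset once."""
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)


EPOCHS = 500
BATCH_SIZE = 128

batches = list(iterate_batches(34799, BATCH_SIZE))
steps_per_epoch = len(batches)
total_steps = EPOCHS * steps_per_epoch
print(steps_per_epoch, batches[-1], total_steps)
```

The last batch is smaller than the others because 34799 is not a multiple of 128.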
 
#### 3. Results.

**Training, validation and test:**
Here are the plots of training and validation accuracies while using the different transformations of the dataset. Each one of the datasets produced models that could achieve at least 93% accuracy over 500 epochs. The curves, as seen in the plots below (and in the raw values that I saw), had mostly flattened out with only minor variance in accuracy from one epoch to the next, so I wasn't too hopeful that performance would increase with more training iterations.

**Accuracy:**
* *Training dataset:* 100% on all datasets.
* *Validation dataset:* Over 93% on all datasets. Over 94% on the cropped dataset, the cropped grayscaled dataset and the cropped grayscaled normalized dataset.
* *Test dataset:* *92%* for the original dataset, *92.2%* for the cropped grayscaled dataset, *92.6%* for the cropped dataset, *92.7%* for the original normalized dataset, *93.1%* for the cropped grayscaled normalized dataset, *93.7%* for the cropped normalized dataset.

All the results are comparable and satisfactory, and the differences between them are too small to declare a winner here. The LeNet architecture proved to be a suitable choice for this problem, mainly because of its similarity to the handwritten digit recognition problem, where LeNet has already proven to work well. Our problem has similar characteristics, where translational invariance, finding hidden features etc. help the network classify better. I also believe that the number of classes in the handwritten digit recognition problem and in our problem is of a comparable order, which helps the architecture work in our favour, but I have no evidence that LeNet won't perform well when the number of classes is huge.

The higher accuracy on the training dataset compared to the validation and test datasets points to overfitting, but since we have gotten the accuracy values we wanted, I'll move on.

The attached ipynb has the relevant outputs.

**Other things I would've tried if I had more time:**
* The model architecture is the same as the one from the LeNet lab in the course. I did not have to alter it to achieve the required accuracy on the validation set. However, I believe that choosing a more informed kernel size (through trial and error) after analyzing the images in the dataset, considering dropout techniques, trying different kinds of padding etc. could have improved the performance and are worth exploring.

 

<br/><br/><br/>




### Test a Model on New Images

#### 1. Choose five German traffic signs found on the web and provide them in the report. For each image, discuss what quality or qualities might be difficult to classify.

Here are five German traffic signs that I found on the web:

![alt text][image4] ![alt text][image5] ![alt text][image6] 
![alt text][image7] ![alt text][image8]

The first image might be difficult to classify because ...

#### 2. Discuss the model's predictions on these new traffic signs and compare the results to predicting on the test set. At a minimum, discuss what the predictions were, the accuracy on these new predictions, and compare the accuracy to the accuracy on the test set (OPTIONAL: Discuss the results in more detail as described in the "Stand Out Suggestions" part of the rubric).

Here are the results of the prediction:

| Image			        |     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| Stop Sign      		| Stop sign   									| 
| U-turn     			| U-turn 										|
| Yield					| Yield											|
| 100 km/h	      		| Bumpy Road					 				|
| Slippery Road			| Slippery Road      							|


The model was able to correctly guess 4 of the 5 traffic signs, which gives an accuracy of 80%. This compares favorably to the accuracy on the test set of ...

#### 3. Describe how certain the model is when predicting on each of the five new images by looking at the softmax probabilities for each prediction. Provide the top 5 softmax probabilities for each image along with the sign type of each probability. (OPTIONAL: as described in the "Stand Out Suggestions" part of the rubric, visualizations can also be provided such as bar charts)

The code for making predictions on my final model is located in the 11th cell of the IPython notebook.

For the first image, the model is relatively sure that this is a stop sign (probability of 0.6), and the image does contain a stop sign. The top five softmax probabilities were:

| Probability         	|     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| .60         			| Stop sign   									| 
| .20     				| U-turn 										|
| .05					| Yield											|
| .04	      			| Bumpy Road					 				|
| .01				    | Slippery Road      							|
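
A table like the one above can be produced from the logits of a single image. The numpy sketch below is only an illustration: the logit values and class indices are made up, and the helper name `top_k_softmax` is hypothetical.

```python
import numpy as np


def top_k_softmax(logits, k=5):
    """Return the k most probable class indices and their softmax probabilities."""
    shifted = logits - np.max(logits)  # stabilizes the exponentials
    probabilities = np.exp(shifted) / np.sum(np.exp(shifted))
    # argsort is ascending, so reverse it and keep the first k entries.
    top_indices = np.argsort(probabilities)[::-1][:k]
    return top_indices, probabilities[top_indices]


# Hypothetical logits for one image over 43 classes.
example_logits = np.zeros(43)
example_logits[[14, 13, 1, 22, 23]] = [5.0, 3.0, 2.0, 1.5, 1.0]

top_indices, top_probabilities = top_k_softmax(example_logits)
for index, probability in zip(top_indices, top_probabilities):
    print(f"class {index}: {probability:.3f}")
```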


For the second image ... 

### (Optional) Visualizing the Neural Network (See Step 4 of the IPython notebook for more details)
#### 1. Discuss the visual output of your trained network's feature maps. What characteristics did the neural network use to make classifications?

