# **Traffic Sign Recognition**

---

**Build a Traffic Sign Recognition Project**

The goals / steps of this project are the following:
* Load the data set (see below for links to the project data set)
* Explore, summarize and visualize the data set
* Design, train and test a model architecture
* Use the model to make predictions on new images
* Analyze the softmax probabilities of the new images
* Summarize the results with a written report


[//]: # (Image References)

[image1_train]: ./examples/train_hist.png "Class distribution of training set"
[image1_validation]: ./examples/validation_hist.png "Class distribution of validation set"
[image1_test]: ./examples/test_hist.png "Class distribution of test set"

[image4]: ./examples/3.jpg "Traffic Sign 1"
[image5]: ./examples/11.jpg "Traffic Sign 2"
[image6]: ./examples/14.jpg "Traffic Sign 3"
[image7]: ./examples/27.jpg "Traffic Sign 4"
[image8]: ./examples/30.jpg "Traffic Sign 5"

[image4c]: ./samples/3.jpg "Traffic Sign 1(cropped)"
[image5c]: ./samples/11.jpg "Traffic Sign 2(cropped)"
[image6c]: ./samples/14.jpg "Traffic Sign 3(cropped)"
[image7c]: ./samples/27.jpg "Traffic Sign 4(cropped)"
[image8c]: ./samples/30.jpg "Traffic Sign 5(cropped)"
## Rubric Points
### Here I will consider the [rubric points](https://review.udacity.com/#!/rubrics/481/view) individually and describe how I addressed each point in my implementation.

---
### Writeup / README

Here is a link to my [project code](https://github.com/kzinmr/CarND-Traffic-Sign-Classifier-Project/blob/master/Traffic_Sign_Classifier.ipynb)


### Data Set Summary & Exploration

I used NumPy to calculate summary statistics of the data set:

* The size of the training set is 31367
* The size of the validation set is 3921
* The size of the test set is 3921
* The shape of a traffic sign image is (32, 32, 3), which becomes (36, 36, 3) after zero padding.
* The number of unique classes/labels in the data set is 43
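
As a minimal sketch of how such statistics can be read off the arrays, the snippet below uses a tiny synthetic stand-in for the data. The function name `summarize` and the demo arrays are hypothetical and are not taken from the notebook.

```python
import numpy as np

def summarize(images, labels):
    """Return basic summary statistics for an image data set."""
    return {
        "n_examples": images.shape[0],        # number of images
        "image_shape": images.shape[1:],      # (height, width, channels)
        "n_classes": np.unique(labels).size,  # number of distinct labels
    }

# Tiny synthetic stand-in for the real data (hypothetical values).
X_demo = np.zeros((6, 32, 32, 3), dtype=np.uint8)
y_demo = np.array([0, 1, 1, 2, 2, 2])

summary = summarize(X_demo, y_demo)
print(summary)
```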

#### Exploratory visualization of the data set

Here is an exploratory visualization of the data set: bar charts showing
the distribution of class labels in the training, validation, and test sets, respectively.

![alt text][image1_train]
![alt text][image1_validation]
![alt text][image1_test]
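
The per-class counts behind bar charts like these can be computed with `np.bincount`. This is a sketch under assumptions: the helper name `class_counts` and the demo labels are hypothetical, and the plotting call itself is left out.

```python
import numpy as np

def class_counts(labels, n_classes=43):
    """Count how many examples fall into each class id (0 .. n_classes - 1)."""
    # minlength keeps a zero entry for classes that do not occur in `labels`.
    return np.bincount(labels, minlength=n_classes)

# Hypothetical labels; the real ones come from the data set.
y_demo = np.array([0, 1, 1, 2, 2, 2, 42])
counts = class_counts(y_demo)
print(counts[:3], counts[42])
```

The resulting array can be passed to a bar-plot function, with the class id on the x axis and the count on the y axis.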


### Design and Test a Model Architecture

#### Description of my preprocessing
I did not convert the images to grayscale because I got better results with the RGB input.
I zero-padded each image because the convolution layers use valid padding.
As a last step, I normalized the image data to the range [-1, 1].
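
A minimal sketch of these two steps, in the order described (padding first, normalization last), might look as follows. The exact normalization formula and the function name `preprocess` are assumptions; the notebook may scale the pixels differently.

```python
import numpy as np

def preprocess(images, pad=2):
    """Zero-pad 32x32 images to 36x36, then scale pixel values to [-1, 1]."""
    # Pad only the height and width axes; batch and channel axes are untouched.
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                    mode="constant", constant_values=0)
    # Assumed formula: map 0..255 linearly onto -1..1.
    # Because padding happens first, the zero border ends up at -1.
    return (padded.astype(np.float32) - 127.5) / 127.5

# Hypothetical all-white batch of one image.
demo = np.full((1, 32, 32, 3), 255, dtype=np.uint8)
out = preprocess(demo)
print(out.shape, out.min(), out.max())
```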

#### Description of my model architecture
My model consists of two convolution layers, each followed by average pooling, then two fully connected hidden layers and a fully connected output layer.
Weights are initialized with Xavier initialization.
Batch normalization in particular gave my model a large performance gain.
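
For reference, the training-time forward pass of batch normalization can be sketched in NumPy as below. This illustrates the general technique and is not the notebook's code; the function name and the demo activations are hypothetical, and at inference time running averages of the mean and variance are normally used instead of batch statistics.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift

# Hypothetical activations: 64 examples, 8 features, far from zero mean.
rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 8))
normalized = batch_norm_forward(activations, gamma=np.ones(8), beta=np.zeros(8))
print(normalized.mean(axis=0).round(3), normalized.std(axis=0).round(3))
```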

The layers of my final model are summarized below:

| Layer         		|     Description	        					| 
|:---------------------:|:---------------------------------------------:| 
| Input         		| 36x36x3 RGB image   							| 
| Convolution 5x5     	| 1x1 stride, valid padding, outputs 32x32x16 	|
| Batch Normalization	|												|
| RELU					|												|
| Dropout				| drop probability is 0.5						|
| Average pooling 2x2	| 1x1 stride, valid padding,  outputs 32x32x16 	|
| Convolution 5x5     	| 1x1 stride, valid padding, outputs 30x30x32 	|
| Batch Normalization	|												|
| RELU					|												|
| Dropout				| drop probability is 0.5						|
| Average pooling 2x2	| 2x2 stride, valid padding,  outputs 15x15x32 	|
| Fully connected		| outputs 256      								|
| Batch Normalization	|												|
| RELU					|												|
| Dropout				| drop probability is 0.5						|
| Fully connected		| outputs 128      								|
| Batch Normalization	|												|
| RELU					|												|
| Dropout				| drop probability is 0.5						|
| Fully connected		| outputs 43      								|
| Softmax				| probabilities over the 43 classes				|
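
With valid padding, the spatial output size of a layer follows `(input - kernel) // stride + 1`. The sketch below applies this standard formula to the first convolution and the final pooling rows of the table. The helper name is hypothetical, and the flattened size is derived from the table rather than read from the notebook.

```python
def valid_output_size(input_size, kernel_size, stride=1):
    """Spatial output size of a convolution or pooling layer with valid padding."""
    return (input_size - kernel_size) // stride + 1

# First convolution: 36x36 input, 5x5 kernel, stride 1.
conv1_out = valid_output_size(36, 5)

# Final average pooling: 30x30 input, 2x2 kernel, stride 2.
pool2_out = valid_output_size(30, 2, stride=2)

# Number of values fed to the first fully connected layer (32 channels).
flat_size = pool2_out * pool2_out * 32
print(conv1_out, pool2_out, flat_size)
```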


#### Description of my hyper parameters selection
I used the Adam optimizer because it provides decent performance.
The learning rate and batch size were chosen from a few experiments. The keep probability was fixed at 0.5.
After 10 epochs, accuracy on the validation set seemed high enough (99.5%).

#### Discussion of my final model


My final model results were:
* training set accuracy of 99.904%
* validation set accuracy of 99.2%
* test set accuracy of 99.2%

The main factor in raising the validation set accuracy above 0.93 was batch normalization: without it, the same architecture reached only about 85%.

I took an iterative approach:
* I first tried an architecture with a single convolution layer and a single fully connected layer with dropout, which performed poorly at around 80% validation set accuracy (underfitting).
* Next I tried the same architecture as the final one but without batch normalization, which gave around 85% accuracy (possibly overfitting).
* Finally, I reached the final performance by adding batch normalization.
* I adjusted the layer sizes relative to the number of classes, using the LeNet weight sizes as a reference.
* After settling on the final architecture, I tuned the batch size and number of epochs to achieve decent accuracy.
* A LeNet-style architecture suits this problem because convolutional layers can capture the local features of each class of traffic sign.
* I could not determine why batch normalization works so well compared with dropout.
 

### Test a Model on New Images

#### Description of my test images
I collected five images of German traffic signs from the web. Here are the original images:

![alt text][image4] ![alt text][image5] ![alt text][image6] 
![alt text][image7] ![alt text][image8]

My model failed to classify any of these skewed images at top-1 except Image 3 (20% accuracy).
I therefore manually cropped the images to squares, which improved the accuracy to 80%.
Here are the cropped images actually used for prediction:
![alt text][image4c] ![alt text][image5c] ![alt text][image6c] 
![alt text][image7c] ![alt text][image8c]

#### Description of my final predictions

Here are the results of the prediction:

| Image			        |     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| 60km/h        		| 60km/h    									| 
| Right-of-way 			| Right-of-way 									|
| Stop					| Stop											|
| Pedestrians	      	| Pedestrians					 				|
| Beware of ice/snow	| 100km/h           							|


The model correctly classified 4 of the 5 traffic signs, an accuracy of 80%.
The 'Beware of ice/snow' image (30.jpg) was still misclassified.
Judging from the feature maps, I suspect the resolution is too low to capture the details of the sign.

#### Description of the certainty

The top softmax probability for each of the five images was:

| Probability         	|     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| .99         			| 60km/h    									| 
| .99     				| Right-of-way									|
| .99					| Stop											|
| .98	      			| Pedestrians					 				|
| .99				    | 100km/h           							|


From these results, my model performs well on the cropped images, although it seems to be overfitting for each class: it assigns a probability of .99 even to the misclassified image.

### (Optional) Visualizing the Neural Network (See Step 4 of the IPython notebook for more details)

From the feature maps of the two convolution layers, I could interpret the first-layer feature maps but not the second-layer ones. For '30.jpg' (Beware of ice/snow), even the first-layer feature map is difficult to interpret.