# **Traffic Sign Recognition** 
**Build a Traffic Sign Recognition Project**

The goals / steps of this project are the following:
* Load the data set (see below for links to the project data set)
* Explore, summarize and visualize the data set
* Design, train and test a model architecture
* Use the model to make predictions on new images
* Analyze the softmax probabilities of the new images
* Summarize the results with a written report


## Rubric Points
### Here I will consider the [rubric points](https://review.udacity.com/#!/rubrics/481/view) individually and describe how I addressed each point in my implementation. 


[image1]: ./pictures_for_writeup/3_test_signs.png "3 Traffic Sign"
[image2]: ./pictures_for_writeup/training_distribution.png "training distribution"
[image3]: ./pictures_for_writeup/validation_distribution.png "validation distribution"
[image4]: ./pictures_for_writeup/testing_distribution.png "testing distribution"
[image5]: ./pictures_for_writeup/Before_converted_and_normalize.png "before convert"
[image6]: ./pictures_for_writeup/After_converted_and_normalize.png "after convert"
[image7]: ./pictures_for_writeup/1-Figure2-1.png "CNN architecture"

---
### Writeup / README
You're reading it! Here is a link to my [project code](https://github.com/sucre1990/CarND_2ndProject/blob/master/Traffic_Sign_Classifier.ipynb).

### Data Set Summary & Exploration

#### 1. Provide a basic summary of the data set.

The code for this section is in cell 2 of the Python notebook. Here are some statistics:

* Number of training examples = 34799
* Number of validation examples = 4410
* Number of testing examples = 12630
* Image data shape = (32, 32, 3)
* Number of classes = 43
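
As a minimal sketch of how such statistics can be computed with NumPy: the function name `summarize` and the tiny stand-in arrays below are assumptions for illustration, not the notebook's actual code or data.

```python
import numpy as np

def summarize(X_train, y_train, X_valid, X_test):
    """Collect basic data set statistics from image and label arrays."""
    return {
        "n_train": X_train.shape[0],
        "n_valid": X_valid.shape[0],
        "n_test": X_test.shape[0],
        "image_shape": X_train.shape[1:],       # (height, width, channels)
        "n_classes": len(np.unique(y_train)),   # distinct labels in training set
    }

# Tiny stand-in arrays (hypothetical); the real data set has 34799 training images.
X_train = np.zeros((8, 32, 32, 3), dtype=np.uint8)
y_train = np.array([0, 1, 2, 2, 1, 0, 3, 3])
X_valid = np.zeros((2, 32, 32, 3), dtype=np.uint8)
X_test = np.zeros((4, 32, 32, 3), dtype=np.uint8)

summary = summarize(X_train, y_train, X_valid, X_test)
print(summary)
```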

#### 2. Include an exploratory visualization of the dataset.
Here is an exploratory visualization of the data set.
* First, here are 3 random pictures from the training data set, plotted with matplotlib.


![3 random testing signs][image1]

* The histograms below show the distribution of traffic sign classes in the training, validation and testing data.

![training distribution][image2]

![validation distribution][image3]

![testing distribution][image4]

**As the three distribution graphs show, the classes are unevenly represented in the training, validation and testing data. This imbalance may cause problems, since the model sees far fewer examples of the rare classes.**
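
The imbalance can also be quantified directly from the labels. The following is a small sketch; the helper names `class_counts` and `imbalance_ratio` and the toy label array are assumptions for illustration.

```python
import numpy as np

def class_counts(labels, n_classes=43):
    """Number of samples per class id (classes with no samples get 0)."""
    return np.bincount(labels, minlength=n_classes)

def imbalance_ratio(counts):
    """Ratio between the largest and smallest non-empty class."""
    present = counts[counts > 0]
    return present.max() / present.min()

# Toy labels (hypothetical): class 0 appears three times, class 1 once.
labels = np.array([0, 0, 0, 1, 2, 2])
counts = class_counts(labels, n_classes=4)
ratio = imbalance_ratio(counts)
print(counts, ratio)
```

A ratio far above 1 on the real training labels would confirm what the histograms suggest.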

### Design and Test a Model Architecture
#### 1. Describe how you preprocessed the image data.

*I was inspired by Pierre Sermanet and Yann LeCun's "Traffic Sign Recognition with Multi-Scale Convolutional Networks".*

**Image Augmentation**

As the data exploration shows, the training data is not evenly distributed. For better results, I augmented the training set until every category reached around 2200 samples. The augmentation methods, applied at random, are horizontal flipping, adding Gaussian noise and random dropout.
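
The balancing idea can be sketched roughly as follows. This is not the notebook's code: the function names, the noise level, the 5% pixel dropout rate and the choice of exactly one method per image are all assumptions made for illustration.

```python
import numpy as np

def augment(image, rng):
    """Apply one randomly chosen augmentation to a uint8 HxWxC image."""
    choice = rng.integers(3)
    if choice == 0:
        out = image[:, ::-1, :]                       # horizontal flip
    elif choice == 1:
        noise = rng.normal(0.0, 10.0, size=image.shape)  # assumed sigma = 10
        out = np.clip(image.astype(np.float32) + noise, 0, 255)
    else:
        keep = rng.random(image.shape[:2]) > 0.05     # assumed: drop ~5% of pixels
        out = image * keep[..., None]
    return out.astype(np.uint8)

def balance(X, y, target, rng):
    """Top up every class with augmented copies until it has `target` samples."""
    X_parts, y_parts = [X], [y]
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        missing = target - idx.size
        if missing <= 0:
            continue  # class already has enough samples
        picks = rng.choice(idx, size=missing, replace=True)
        X_parts.append(np.stack([augment(X[i], rng) for i in picks]))
        y_parts.append(np.full(missing, label, dtype=y.dtype))
    return np.concatenate(X_parts), np.concatenate(y_parts)

rng = np.random.default_rng(0)
X = np.zeros((5, 32, 32, 3), dtype=np.uint8)
y = np.array([0, 0, 0, 1, 2])
X_bal, y_bal = balance(X, y, target=3, rng=rng)
print(np.bincount(y_bal))
```

One caveat worth keeping in mind: horizontal flipping does not preserve the label for every sign (directional arrows, for example), so a careful version would restrict it to symmetric classes.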

**Reason for Image Augmentation**

1. Balance the training set
2. Increase the total number of training samples

Here is the new distribution:

[image2.1]: ./pictures_for_writeup/Augmented_training_distribution.png "Augmented training distribution"

![][image2.1]

**Image Processing (data preprocessing)**

As the first step, I converted the RGB images to grayscale and normalized them. (Instead of converting RGB to YUV space and using the UV channels as in Sermanet and LeCun's work, I chose to use only the Y channel, which is the grayscale channel.)

**Reason for Converting to Grayscale and Normalization**

1. Converting to grayscale helps capture the content of the picture and reduces irrelevant information.

2. Normalizing the grayscale picture helps train the CNN. SGD relies on backpropagation to update the weights throughout the network as training examples pass through. If the input values were not scaled, some feature values would likely have very different ranges from others, so the weights would be updated unevenly (some overcompensated and some undercorrected), making it hard to reach an optimum.

Here is an example of a traffic sign image before and after grayscaling.

![][image5]
![][image6]

After grayscaling the pictures, I normalized them, since the CNN benefits from normalized input.
Both steps are handled by the **CVT_Gray_Norm** function.
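
For readers without the notebook at hand, the two steps could look roughly like the sketch below. It is not the actual **CVT_Gray_Norm** implementation: the luma weights (the standard BT.601 coefficients for the Y channel) and the `(x - 128) / 128` scaling are assumptions, and the function is named `cvt_gray_norm_sketch` to make that clear.

```python
import numpy as np

def cvt_gray_norm_sketch(images):
    """Convert (N, H, W, 3) uint8 RGB images to (N, H, W, 1) floats in about [-1, 1)."""
    # Y (luma) channel of YUV: weighted sum of R, G and B.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = images.astype(np.float32) @ weights   # shape (N, H, W)
    gray = gray[..., np.newaxis]                 # keep a channel axis for the CNN
    # Assumed normalization: center around 0 and scale to roughly unit range.
    return (gray - 128.0) / 128.0

white = np.full((1, 32, 32, 3), 255, dtype=np.uint8)
black = np.zeros((1, 32, 32, 3), dtype=np.uint8)
batch = cvt_gray_norm_sketch(np.concatenate([white, black]))
print(batch.shape, batch.min(), batch.max())
```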

*Future improvements: there is no further data processing after normalization, but I noticed that some pictures are too dark, so I will consider additional processing steps in the future.*

#### 2. Describe what your final model architecture looks like (including model type, layers, layer sizes, connectivity, etc.)

Here I used a picture from [Pierre Sermanet and Yann LeCun's Traffic Sign Recognition with Multi-Scale Convolutional Networks](http://yann.lecun.com/exdb/publis/pdf/sermanet-ijcnn-11.pdf) to illustrate my CNN architecture.
![][image7]


| Layer                 | Description                                               |
|:---------------------:|:---------------------------------------------------------:|
| Input                 | 32x32x1 grayscale image                                   |
| Convolution 5x5       | 1x1 stride, valid padding, 30 filters, outputs 28x28x30   |
| RELU                  |                                                           |
| Max pooling           | 2x2 stride, outputs 14x14x30, as CONV1                    |
| Convolution 5x5       | 1x1 stride, valid padding, 64 filters, outputs 10x10x64   |
| RELU                  |                                                           |
| Max pooling           | 2x2 stride, outputs 5x5x64, as CONV2                      |
| Fully connected       | **CONV1 and CONV2 are flattened and concatenated, then fed to 1 fully connected layer.** |
| RELU                  |                                                           |
| Dropout               | Keep Probability = 0.3                                    |
| Softmax               | one hot encoded                                           |
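
The layer sizes in the table can be checked with a little shape arithmetic. The sketch below assumes unpadded ("valid") convolutions, which is what the listed 28x28 and 10x10 outputs imply, and a 2x2 pooling window (the table only states the stride). The helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # "valid": no padding

def pool_output_size(size, window=2, stride=2):
    """Spatial output size of max pooling along one dimension."""
    return (size - window) // stride + 1

conv1 = conv_output_size(32, 5)       # 28
pool1 = pool_output_size(conv1)       # 14 -> CONV1 is 14x14x30
conv2 = conv_output_size(pool1, 5)    # 10
pool2 = pool_output_size(conv2)       # 5  -> CONV2 is 5x5x64

# Multi-scale input to the fully connected layer: both stages flattened together.
flat_conv1 = pool1 * pool1 * 30       # 5880
flat_conv2 = pool2 * pool2 * 64       # 1600
fc_inputs = flat_conv1 + flat_conv2   # 7480
print(conv1, pool1, conv2, pool2, fc_inputs)
```

With "same" padding the first convolution would instead keep the 32x32 resolution, which is why the outputs in the table correspond to valid padding.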

#### 3. Describe how you trained your model.

I trained the model with mini-batch stochastic gradient descent (**SGD**), using **AdamOptimizer** as the optimizer.
During training and experimentation, my focus was to find a good learning pace while minimizing overfitting.

The parameters I tuned were **EPOCHS**, **BATCH_SIZE**, **learning rate** and **dropout**. **BATCH_SIZE** and **learning rate** were tuned to find a good learning pace, so that gradient descent would not take too long, get stuck in a local optimum, or fail to converge. **dropout** was used to prevent overfitting.

Here are the parameters:

* **EPOCHS** = 15
* **BATCH_SIZE** = 128
* **learning rate** = 0.001
* **dropout** = 0.7
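
To illustrate how **EPOCHS** and **BATCH_SIZE** shape the training loop, here is a framework-independent sketch of mini-batch iteration. The generator name `iterate_minibatches` and the stand-in data are assumptions; the actual optimizer step (Adam in the notebook) is only indicated by a comment.

```python
import numpy as np

EPOCHS = 15
BATCH_SIZE = 128

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (images, labels) batches covering the data set once."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.zeros((300, 32, 32, 1), dtype=np.float32)   # stand-in data (hypothetical)
y = np.arange(300) % 43

batch_sizes = []
for epoch in range(EPOCHS):
    for X_batch, y_batch in iterate_minibatches(X, y, BATCH_SIZE, rng):
        if epoch == 0:
            batch_sizes.append(len(X_batch))
        # An optimizer step on (X_batch, y_batch) would go here.

print(batch_sizes)
```

Reshuffling at the start of every epoch is what makes the gradient estimates stochastic; the last batch of an epoch is simply smaller when the data set size is not a multiple of the batch size.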

#### 4. Describe the approach taken for finding a solution and getting the validation set accuracy to be at least 0.93.
After reading [Pierre Sermanet and Yann LeCun's Traffic Sign Recognition with Multi-Scale Convolutional Networks](http://yann.lecun.com/exdb/publis/pdf/sermanet-ijcnn-11.pdf), I changed LeNet's architecture and adopted their multi-scale convolutional network, which, instead of feeding only the second convolutional stage's output to the fully connected layer, also feeds in the first stage's output.

Then I tuned parameters like **EPOCHS**, **BATCH_SIZE**, **learning rate** and **dropout** to bring the validation set accuracy to at least 0.93. The result is pretty good. Because of the stochastic nature of SGD, the accuracies oscillated, but overall **the training accuracy is 99.8% and the validation accuracy reaches 96.5% in the final epoch**.

I noticed the training accuracy is close to 100%, which may be a sign of overfitting, but with **dropout = 0.7** and a testing accuracy similar to the validation accuracy, I believe the model is not overfitting.


**Multi-Scale Convolutional Networks (MSCNN) architecture vs LeNet architecture**: *MSCNN feeds the outputs of both the 1st and 2nd convolutional stages to the classifier, which provides different scales to the model. As [Sermanet and LeCun](http://yann.lecun.com/exdb/publis/pdf/sermanet-ijcnn-11.pdf) put it: "The motivation for combining representation from multiple stages in the classifier is to provide different scales of receptive fields to the classifier. In the case of 2 stages of features, the second stage extracts “global” and invariant shapes and structures, while the first stage extracts “local” motifs with more precise details. We demonstrate the accuracy gain of using such layer-skipping connections in section III-B."*

My final model results were:
* training set accuracy of 99.8%
* validation set accuracy of 96.5%
* test set accuracy of 95%

Here is the training log:

| EPOCH | Training Accuracy | Validation Accuracy |
|:-----:|:-----------------:|:-------------------:|
| 1     | 0.871             | 0.773               |
| 2     | 0.960             | 0.884               |
| 3     | 0.978             | 0.912               |
| 4     | 0.986             | 0.933               |
| 5     | 0.992             | 0.938               |
| 6     | 0.992             | 0.919               |
| 7     | 0.996             | 0.946               |
| 8     | 0.996             | 0.945               |
| 9     | 0.997             | 0.952               |
| 10    | 0.998             | 0.950               |
| 11    | 0.998             | 0.951               |
| 12    | 0.998             | 0.954               |
| 13    | 0.998             | 0.952               |
| 14    | 0.999             | 0.961               |
| 15    | 0.998             | 0.965               |

### Test a Model on New Images
[image8]: ./pictures_for_writeup/resize_Caution.jpg "Caution"
[image9]: ./pictures_for_writeup/resize_PriorityRoad.jpg "PriorityRoad"
[image10]: ./pictures_for_writeup/resize_stop.jpg "stop"
[image11]: ./pictures_for_writeup/resize_30speed.jpg "30speed"
[image12]: ./pictures_for_writeup/resize_Do-Not-Enter.jpg "Do-Not-Enter"
[image13]: ./pictures_for_writeup/resize_roadWorks-2.jpg "roadWorks"


Here are 6 newly downloaded pictures:
![][image8]
![][image9]
![][image10]
![][image11]
![][image12]
![][image13]

I think the 4th and the last pictures are a little difficult to predict, since they have some background noise.

#### 2. Discuss the model's predictions on these new traffic signs and compare the results to predicting on the test set.
On the set of pictures downloaded from the web, the accuracy is 100%, compared to **95% on the testing data**. This difference is not surprising, since the web set is very small (only 6 pictures).

Here are the results of the predictions:

| Image			        |     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| Caution        		| Caution   									| 
| Priority Road     	| Priority Road									|
| Stop Sign				| Stop Sign										|
| 30 km/h limit	      	| 30 km/h limit					 				|
| Do-Not-Enter          |Do-Not-Enter|
| Road Works			| Road Works      							    |

As shown above, the model's predictions on those 6 new pictures are all correct.
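
The comparison with the 95% test accuracy can be made concrete with a quick calculation. Assuming (and this is only an assumption) that the web pictures are about as hard as the test set and that predictions are independent, a 95%-accurate classifier gets all 6 right most of the time:

```python
test_accuracy = 0.95   # test set accuracy reported above
n_images = 6           # number of web pictures

# Probability that all 6 independent predictions are correct.
p_all_correct = test_accuracy ** n_images
print(round(p_all_correct, 3))  # 0.735
```

So a perfect score on 6 pictures is the expected outcome roughly three times out of four and says little about whether the model is better than 95% accurate.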

#### 3. Describe how certain the model is when predicting on each of the five new images by looking at the softmax probabilities for each prediction. Provide the top 5 softmax probabilities for each image along with the sign type of each probability.

Here is the top softmax probability for each of the 6 newly downloaded pictures:

|  Top 1 Probability         	|     Prediction	        					| 
|:---------------------:|:---------------------------------------------:| 
| 1.00         			| Caution  									    | 
| 1.00     				| Priority Road									|
| 1.00					| Stop Sign										|
| 0.557	      			| 30 km/h limit						 			|
| 1.00				    | Do-Not-Enter     							    |
| 0.991                 | Road Works                                    |

Here are the pie charts showing the relative probability distributions:

[image14]: ./pictures_for_writeup/picture_0.jpg "picture1"
[image15]: ./pictures_for_writeup/picture_1.jpg "picture2"
[image16]: ./pictures_for_writeup/picture_2.jpg "picture3"
[image17]: ./pictures_for_writeup/picture_3.jpg "picture4"
[image18]: ./pictures_for_writeup/picture_4.jpg "picture5"
[image19]: ./pictures_for_writeup/picture_5.jpg "picture6"

![][image14]
![][image15]
![][image16]
![][image17]
![][image18]
![][image19]

**We can see here that, compared to Pictures 1, 2, 3 and 5, the predictions for Pictures 4 and 6 are less certain, especially Picture 4 (top probability 0.557).**
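
For reference, the top 5 softmax probabilities can be extracted from raw logits as in the NumPy sketch below. The function names and the example logits are made up for illustration; in TensorFlow, `tf.nn.top_k` provides the same operation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def top_k(probs, k=5):
    """Return the k largest probabilities and their class ids, largest first."""
    idx = np.argsort(probs, axis=-1)[..., ::-1][..., :k]
    return np.take_along_axis(probs, idx, axis=-1), idx

# Hypothetical logits for one image over 6 classes.
logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.5, 3.0]])
probs = softmax(logits)
values, indices = top_k(probs, k=5)
print(indices[0], values[0])
```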

### (Optional) Visualizing the Neural Network (See Step 4 of the IPython notebook for more details)

I used the "Caution" sign as an example.

Processed Image

[image20]: ./pictures_for_writeup/visualization1.jpg "visualization1"
![][image20]

After being converted to grayscale and normalized

[image21]: ./pictures_for_writeup/visualization2.jpg "visualization2"
![][image21]

First convolutional layer, 30 filters

[image22]: ./pictures_for_writeup/First_layer_filters.png "1st layer"
![][image22]

Second convolutional layer, 64 filters

[image23]: ./pictures_for_writeup/Second_layer_filters.png "2nd layer"
![][image23]

**As we can see, the first layer seems to find the general shape, and the second layer tries to identify certain patterns.**