# **Traffic Sign Recognition** 

---

The goals / steps of this project are the following:
* Load the data set (see below for links to the project data set)
* Explore, summarize and visualize the data set
* Design, train and test a model architecture
* Use the model to make predictions on new images
* Analyze the softmax probabilities of the new images
* Summarize the results with a written report


[//]: # (Image References)

[image1]: ./example_images/all_labelled_images.jpg "Visualization"
[image2]: ./example_images/histogram.jpg "Histogram"
[image3]: ./examples/random_noise.jpg "Random Noise"
[image4]: ./examples/placeholder.png "Traffic Sign 1"
[image5]: ./examples/placeholder.png "Traffic Sign 2"
[image6]: ./examples/placeholder.png "Traffic Sign 3"
[image7]: ./examples/placeholder.png "Traffic Sign 4"
[image8]: ./examples/placeholder.png "Traffic Sign 5"

---

### Data Set Summary & Exploration

#### 1. Provide a basic summary of the data set. In the code, the analysis should be done using python, numpy and/or pandas methods rather than hardcoding results manually.

Here is a visual summary of the number of training images with respect to each label:
![alt text][image2]

There are a total of 34799 training images, 4410 validation images, and 12630 testing images. Each image has dimensions $32 \times 32 \times 3$. There are 43 labels, each of which uniquely identifies one of 43 German traffic signs.
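As a minimal sketch of how these numbers can be computed rather than hardcoded, the snippet below derives them from the arrays with NumPy. The function names and the argument names (`x_train`, `y_train`, `x_valid`, `x_test`) are assumptions for illustration, not the names used in the project notebook.

```python
import numpy as np

def summarize_dataset(x_train, y_train, x_valid, x_test):
    """Return basic data set statistics computed from the arrays themselves."""
    return {
        "n_train": x_train.shape[0],
        "n_valid": x_valid.shape[0],
        "n_test": x_test.shape[0],
        "image_shape": x_train.shape[1:],
        "n_classes": np.unique(y_train).size,
    }

def images_per_class(y_train):
    """Count the training images for each label (the data behind the histogram)."""
    labels, counts = np.unique(y_train, return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))
```

Passing the loaded training, validation and testing arrays to `summarize_dataset()` should reproduce the figures quoted above, and `images_per_class()` gives the per-label counts shown in the histogram.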

#### 2. Include an exploratory visualization of the dataset.

Here is a visualization of each unique label and its corresponding traffic sign:
![alt text][image1]

### Design and Test a Model Architecture

#### Image Processing

I tried training the neural network with several image pre-processing techniques and found that normalization plays a very important role. Since I have a powerful GPU on my local machine (GTX 980 Ti), I decided to feed the three-channel RGB images to the network directly instead of converting them to grayscale. First, I normalized with `(pixel - 128) / 128`, and the validation accuracy could barely reach 85%. Next, I applied standard statistical normalization to the training data using `(pixel - pixel.mean()) / pixel.std()`, which improved the accuracy dramatically.
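The two normalization schemes can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the statistics are computed over the whole array, which may differ from how the notebook computes them.

```python
import numpy as np

def scale_fixed(images):
    """Map uint8 pixels to roughly [-1, 1) using fixed constants."""
    return (images.astype(np.float32) - 128.0) / 128.0

def standardize(images, mean=None, std=None):
    """Subtract the mean, then divide by the standard deviation.

    If mean/std are not supplied they are computed over the whole array.
    """
    images = images.astype(np.float32)
    mean = images.mean() if mean is None else mean
    std = images.std() if std is None else std
    # The parentheses matter: the mean is subtracted before dividing.
    return (images - mean) / std
```

After `standardize()`, the data has approximately zero mean and unit standard deviation regardless of how bright or dark the data set is overall, whereas `scale_fixed()` only rescales the value range.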

To increase the randomness of the data augmentation, I perform it on the fly, i.e., each image is randomly modified in every training epoch. I believe brightness plays a big role in image classification, so I randomly adjust the brightness of the normalized images during each epoch using the `brightness_adjustment()` function. I also tried Gaussian blur to further increase the variety of augmented images, but the improvement was not significant, so it was removed to speed up data processing.
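One possible way to implement an on-the-fly brightness adjustment is sketched below. This is not the project's `brightness_adjustment()` function; the name `random_brightness_shift`, the additive-offset approach, and the `max_delta` parameter are all assumptions made for illustration.

```python
import numpy as np

def random_brightness_shift(batch, max_delta=0.5, rng=None):
    """Add one random brightness offset per image to a batch of normalized images.

    batch: float array of shape (N, H, W, C).
    max_delta: largest absolute offset (hypothetical default).
    """
    rng = np.random.default_rng() if rng is None else rng
    # One offset per image, broadcast over height, width and channels.
    deltas = rng.uniform(-max_delta, max_delta, size=(batch.shape[0], 1, 1, 1))
    return batch.astype(np.float32) + deltas.astype(np.float32)
```

Calling such a function on every batch inside the epoch loop means the network sees a differently lit version of each image in every epoch, without storing an enlarged data set.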

![alt text][image2]

Here is an example of an original image and an augmented image:


#### Neural Network Models
I experimented with two different Convolutional Neural Network (CNN) models. Here is the one I ended up using, `nn_model()`:

| Layer         		|     Description	        					| 
|:---------------------:|:---------------------------------------------:| 
| Input         		| 32x32x3 RGB image   							| 
| Convolution 5x5    	| 1x1 stride, Valid padding, outputs 28x28x32 	|
| RELU                  |                                               |
| Convolution 5x5       | 1x1 stride, Valid padding, outputs 24x24x64   |
| RELU                  |                                               |
| Max Pooling	      	| 2x2 stride, outputs 12x12x64 				    |
| Convolution 3x3       | 1x1 stride, Valid padding, outputs 10x10x128  |
| RELU                  |                                               |
| Max Pooling           | 2x2 stride, outputs 5x5x128                   |
| Convolution 2x2       | 1x1 stride, Valid padding, outputs 4x4x256    |
| RELU                  |                                               |
| Max Pooling           | 2x2 stride, outputs 2x2x256                   |
| Flatten               | outputs 1024                                  |
| Fully connected       | outputs 512                                   |
| RELU                  |                                               |
| Dropout               |                                               |
| Fully Connected       | outputs 256                                   |
| RELU                  |                                               |
| Dropout               |                                               |
| Fully Connected       | outputs 43                                    |
| Softmax				| outputs 43     								|

This CNN structure is inspired by VGG16. The idea is to stack multiple convolutional layers with decreasing kernel size as the network gets deeper, while the number of filters doubles from one convolutional layer to the next (32, 64, 128, ...). The intuition is that the first few convolutional layers learn general features, such as edges, and the deeper layers use smaller filters to learn more detailed features. To reduce the risk of overfitting, I added two dropout layers, each of which drops 50% of the activations coming from the previous layer.
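The spatial size after each layer follows from the valid-padding rule `out = (in - kernel) // stride + 1`. The sketch below traces a 32x32x3 input through the convolution and pooling stack of `nn_model()`, assuming every pooling layer uses a 2x2 window with stride 2 (the window size is an assumption). The `LAYERS` encoding and the function names are hypothetical.

```python
def valid_output_size(size, kernel, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution or pooling."""
    return (size - kernel) // stride + 1

# (name, kernel, stride, output channels); None keeps the channel count.
LAYERS = [
    ("conv 5x5", 5, 1, 32),
    ("conv 5x5", 5, 1, 64),
    ("max pool", 2, 2, None),
    ("conv 3x3", 3, 1, 128),
    ("max pool", 2, 2, None),
    ("conv 2x2", 2, 1, 256),
    ("max pool", 2, 2, None),
]

def trace_shapes(size, channels, layers):
    """Return the (name, height, width, channels) shape after every layer."""
    shapes = []
    for name, kernel, stride, out_channels in layers:
        size = valid_output_size(size, kernel, stride)
        channels = channels if out_channels is None else out_channels
        shapes.append((name, size, size, channels))
    return shapes

for name, h, w, c in trace_shapes(32, 3, LAYERS):
    print(f"{name}: {h}x{w}x{c}")
```

Under these assumptions the spatial sizes are 28, 24, 12, 10, 5, 4 and 2, so the flattened vector has 2 x 2 x 256 = 1024 entries.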

For reference, the second CNN model, `nn_model_2()`, is similar but has far fewer layers:

| Layer         		|     Description	        					| 
|:---------------------:|:---------------------------------------------:| 
| Input         		| 32x32x3 RGB image   							| 
| Convolution 5x5    	| 1x1 stride, Valid padding, outputs 28x28x32 	|
| Max Pooling	      	| 2x2 stride, outputs 14x14x32 				    |
| RELU                  |                                               |
| Convolution 2x2       | 1x1 stride, Valid padding, outputs 12x12x64   |
| Max Pooling           | 2x2 stride, outputs 6x6x64                    |
| RELU                  |                                               |
| Flatten               | outputs 2304                                  |
| Fully connected       | outputs 1152                                  |
| RELU                  |                                               |
| Dropout               |                                               |
| Fully Connected       | outputs 576                                   |
| RELU                  |                                               |
| Dropout               |                                               |
| Fully Connected       | outputs 43                                    |
| Softmax				| outputs 43     								|


To train the model, I initialize the weights of the CNN from a distribution with a mean of 0 and a variance of 0.01. I split the training data set into batches of 256 images and train for 20 epochs. The training process is relatively fast with a dedicated GPU.
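A rough NumPy sketch of these two training details is given below. It takes the stated variance of 0.01 at face value, which corresponds to a standard deviation of 0.1, and uses a plain normal distribution; the actual initializer and batching code in the notebook may differ. The function names are hypothetical.

```python
import numpy as np

def init_weights(shape, variance=0.01, rng=None):
    """Draw weights from a zero-mean normal distribution with the given variance."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(loc=0.0, scale=np.sqrt(variance), size=shape).astype(np.float32)

def iterate_minibatches(x, y, batch_size=256, rng=None):
    """Yield shuffled (x, y) batches that cover the data set once (one epoch)."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]
```

With 34799 training images and a batch size of 256, one epoch consists of 136 batches, the last of which holds the remaining 239 images.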

#### CNN Design Process
The design process was iterative. Starting with LeNet from Yann LeCun, the validation accuracy could only reach roughly 85%, even with image augmentation. In my experience, a more complex network structure, i.e., more convolutional filters and deeper layers, is needed to train a better model. But there is no free lunch: a more complicated network requires more data to train. It would be overkill to use something like ResNet-50 for this particular classifier, so VGG-16 drew my attention during the design process. The idea is to design a CNN that can capture enough features while minimizing the number of layers and parameters. In the end, we do not want an overcomplicated model that is difficult to train. After some trial and error, I ended up with a CNN that has 10 layers.

My final model has a validation accuracy around 96%. 


### Test a Model on New Images

To test whether my CNN can correctly classify images it has never seen before, I found eight German traffic signs on the web:


Before feeding the testing images into the pipeline, I resize them to match the input size required by the CNN (in this case, 32x32x3). The input must then be normalized using the technique introduced earlier. The code is as follows:

```
import numpy as np
from PIL import Image

# Resize every new image to the 32x32x3 input size expected by the CNN
test_images_reshaped = np.empty(shape=(num_test_images, 32, 32, 3), dtype=np.uint16)
for index in range(num_test_images):
    im = Image.fromarray(new_test_images_array[index])
    im_reshaped = im.resize((32, 32), Image.LANCZOS)
    test_images_reshaped[index] = np.asarray(im_reshaped)

# Ground-truth class IDs of the eight new images
y_new_test = np.array([17, 28, 8, 25, 33, 14, 4, 11])

# Feature matrix of the new images
x_new_test = test_images_reshaped

# Normalization: subtract the mean, then divide by the standard deviation
x_new_test_normalized = (x_new_test - x_new_test.mean()) / x_new_test.std()
```

My model successfully identified all of the testing images from the web. Here are the prediction results:

| Testing Image	        |     Predicted Class ID	        					| 
|:---------------------:|:---------------------------------------------:| 
| No entry      		| 17 (No entry)   					            | 
| Children crossing		| 28 (Children crossing)		                |
| Speed limit 120 km/h	| 8  (Speed limit 120 km/h)			            |
| Road work	      		| 25 (Road work) 				                |
| Turn right ahead		| 33 (Turn right ahead)			                |
| Stop					| 14 (Stop)				                        |
| Speed limit 70 km/h	| 4  (Speed limit 70 km/h)  					|
| Right-of-way at the next intersection | 11 (Right-of-way at the next intersection) |

The code for making predictions on my final model is the following:

```
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Restore the saved weights (saver is assumed to be the tf.train.Saver created during training)
    saver.restore(sess, "./model")
    test_accuracy = evaluate(x_new_test_normalized, y_new_test)
    print('The test accuracy is ', test_accuracy*100, '%')
```

It restores the previously saved model and then uses the `evaluate()` function to compute the prediction accuracy on the new images.

For the testing image `No Entry`, the model assigns a probability of 1.0 to the class `No Entry`.

The top-5 softmax probabilities are as follows:


| Probability         	|     Predicted Class ID		       					| 
|:---------------------:|:---------------------------------------------:| 
| 1.0         			| 17  									| 
| 4.889838e-13		    | 42 										|
| 1.712276e-13		    | 6											|
| 3.379616e-15	        | 14					 				|
| 7.102676e-16          | 32      							|

Note that, mathematically, the probabilities of all classes must sum to 1.0. The first entry is reported as 1.0 only because of floating-point rounding; the true value is very slightly below 1. The remaining values, such as 4.889838e-13, are so small that they can be treated as 0.
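The following NumPy sketch shows how top-5 softmax probabilities can be obtained from a vector of logits and why the leading probability can print as exactly 1.0 in 32-bit floats. The function names and the example logits are made up for illustration; in TensorFlow the same result is usually obtained with `tf.nn.softmax` followed by `tf.nn.top_k`.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def top_k(probabilities, k=5):
    """Return (values, class ids) of the k largest entries, largest first."""
    ids = np.argsort(probabilities)[::-1][:k]
    return probabilities[ids], ids

# Hypothetical logits in which one class dominates strongly.
logits = np.array([30.0, 2.0, 1.0, 0.0, -1.0, -2.0], dtype=np.float32)
values, class_ids = top_k(softmax(logits), k=5)
for p, c in zip(values, class_ids):
    print(c, p)
```

In this example the dominant class rounds to 1.0 in float32 while the other classes keep tiny but strictly positive probabilities, which is the same pattern as in the table above.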

For the second image 
