# Traffic Sign Recognition
Build a Traffic Sign Recognition Project
The goals / steps of this project are the following:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
### Here I will consider the rubric points individually and describe how I addressed each point in my implementation.
### Writeup / README
#### 1. Provide a Writeup / README that includes all the rubric points and how you addressed each one.
You're reading it! And here is a link to my project code.
### Data Set Summary & Exploration
#### 1. Provide a basic summary of the data set.
I used the pandas library to calculate summary statistics of the traffic signs data set:
- The size of the training set is 34799
- The size of the validation set is 4410
- The size of the test set is 12630
- The shape of a traffic sign image is (32, 32, 3)
- The number of unique classes/labels in the data set is 43
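A minimal sketch of how these numbers can be computed (the pickle file name and dictionary keys are assumptions based on the project data set):

```python
import pickle
import pandas as pd

# File name is an assumption; each project pickle holds 'features' and 'labels'.
with open('train.p', 'rb') as f:
    train = pickle.load(f)
X_train, y_train = train['features'], train['labels']

n_train = len(X_train)                    # 34799
image_shape = X_train[0].shape            # (32, 32, 3)
n_classes = pd.Series(y_train).nunique()  # 43
```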
#### 2. Include an exploratory visualization of the dataset.
Here is an exploratory visualization of the data set. It is a bar chart showing how the data is distributed across classes: some classes have around 2000 samples, while others have only around 200. So in the data pre-processing section we'll first try to generate more data for the under-represented classes.
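The chart can be produced with a few lines of matplotlib (a sketch, reusing `y_train` from the loading snippet above):

```python
import numpy as np
import matplotlib.pyplot as plt

counts = np.bincount(y_train, minlength=43)  # samples per class id
plt.bar(np.arange(43), counts)
plt.xlabel('class id')
plt.ylabel('number of training samples')
plt.show()
```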
### Design and Test a Model Architecture
#### 1. Describe how you preprocessed the image data.
- convert to grayscale using `np.dot(rgb[..., :3], [0.299, 0.587, 0.114])`
- normalize using `(pixel - 128) / 128`
As a first step, I decided to convert the images to grayscale because it reduces the number of features and speeds up training.
Here is an example of a traffic sign image before and after grayscaling.
Then I normalized the image data because it helps gradient descent converge faster.
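Putting the two steps together, a minimal sketch of the preprocessing (the function name is illustrative, not the project's actual code):

```python
import numpy as np

def preprocess(images):
    # RGB (N, 32, 32, 3) uint8 -> grayscale, normalized to roughly [-1, 1).
    gray = np.dot(images[..., :3], [0.299, 0.587, 0.114])  # luminance weights
    norm = (gray - 128.0) / 128.0
    return norm[..., np.newaxis]  # keep a channel axis for the conv layers
```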
I decided to generate additional data because the model had a noticeably higher training accuracy than validation accuracy, so it was somewhat overfitting.
The following techniques are applied to generate more data:
- find the classes with fewer than 500 training samples, shift their images by 2 pixels along the height and width axes (axis=(1, 2)), then add the results to the original set. This helps even out the training sample distribution.
- zoom by 1.2x, then crop back to the original image size
- zoom by 0.8x, then pad back to size with np.pad
- rotate by 5, -5, 10, and -10 degrees
Adding all the generated data together, as sketched below, grows the training set from the original 34799 samples to 328433.
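A sketch of such an augmentation pipeline using scipy.ndimage (the function name and details are an illustrative reconstruction, not the project's actual code):

```python
import numpy as np
from scipy import ndimage

def augment(images, labels):
    # `images`: float array of shape (N, 32, 32) after preprocessing.
    aug_x, aug_y = [images], [labels]

    # Shift by 2 pixels along the height/width axes; in the project this is
    # applied only to classes with fewer than 500 training samples.
    aug_x.append(np.roll(images, 2, axis=(1, 2)))
    aug_y.append(labels)

    # Zoom in by 1.2x, then crop the center back to 32x32.
    zoomed_in = ndimage.zoom(images, (1.0, 1.2, 1.2))        # -> (N, 38, 38)
    aug_x.append(zoomed_in[:, 3:35, 3:35])
    aug_y.append(labels)

    # Zoom out by 0.8x, then pad back to 32x32 with np.pad.
    zoomed_out = ndimage.zoom(images, (1.0, 0.8, 0.8))       # -> (N, 26, 26)
    aug_x.append(np.pad(zoomed_out, ((0, 0), (3, 3), (3, 3)), mode='edge'))
    aug_y.append(labels)

    # Rotate by +/-5 and +/-10 degrees in the image plane.
    for angle in (5, -5, 10, -10):
        aug_x.append(ndimage.rotate(images, angle, axes=(2, 1), reshape=False))
        aug_y.append(labels)

    return np.concatenate(aug_x), np.concatenate(aug_y)
```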
#### 2. Describe what your final model architecture looks like.
My final model consisted of the following layers:
Layer | Description |
---|---|
Input | 32x32x1 grayscale image |
Convolution 5x5 | 1x1 stride, valid padding, outputs 28x28x6 |
RELU | |
Max pooling | 2x2 stride, outputs 14x14x6 |
Convolution 5x5 | 1x1 stride, valid padding, outputs 10x10x16 |
RELU | |
Max pooling | 2x2 stride, outputs 5x5x16 |
Flatten | flatten both pooled conv1 (14x14x6 = 1176) and conv2 (5x5x16 = 400) outputs, concatenated: 1576 |
Fully connected | input 1576, output 120 |
RELU | |
Fully connected | input 120, output 84 |
RELU | |
Fully connected | input 84, output 43 |
Softmax | |
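For reference, here is a sketch of this architecture in Keras (the project itself was written against TensorFlow's low-level API; this reconstruction simply mirrors the table above):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model():
    inp = layers.Input(shape=(32, 32, 1))                                # grayscale input
    c1 = layers.Conv2D(6, 5, padding='valid', activation='relu')(inp)    # 28x28x6
    p1 = layers.MaxPooling2D(2)(c1)                                      # 14x14x6
    c2 = layers.Conv2D(16, 5, padding='valid', activation='relu')(p1)    # 10x10x16
    p2 = layers.MaxPooling2D(2)(c2)                                      # 5x5x16
    # Multi-scale: feed both pooled conv outputs to the classifier.
    flat = layers.Concatenate()([layers.Flatten()(p1), layers.Flatten()(p2)])  # 1176 + 400 = 1576
    x = layers.Dense(120, activation='relu')(flat)
    x = layers.Dense(84, activation='relu')(x)
    out = layers.Dense(43, activation='softmax')(x)
    return tf.keras.Model(inp, out)
```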
#### 3. Describe how you trained your model.
Here are the details to train the model:
- Optimizer: Adam
- Batch size: 128
- Epochs: 15
- Initial learning rate: 0.001
I also tried exponentially decaying the learning rate whenever the validation accuracy decreased:
```python
if validation_accuracy < validation_accuracy_pre:
    rate = 0.5 * rate
```
Since I train for a relatively large number of epochs (15), I also tried stopping training early once the training accuracy is high enough:
```python
if 1.0 - train_accuracy < 0.0001:
    print("Train accuracy 1.0, break")
    break
```
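Putting these pieces together, the training loop looks roughly like this (`train_one_epoch` and `evaluate` are hypothetical helpers standing in for the notebook's session code):

```python
EPOCHS = 15
BATCH_SIZE = 128
rate = 0.001                  # initial learning rate for the Adam optimizer
validation_accuracy_pre = 0.0

for epoch in range(EPOCHS):
    train_one_epoch(rate, BATCH_SIZE)                 # hypothetical helper
    train_accuracy = evaluate(X_train, y_train)       # hypothetical helper
    validation_accuracy = evaluate(X_valid, y_valid)

    # Halve the learning rate whenever validation accuracy drops.
    if validation_accuracy < validation_accuracy_pre:
        rate = 0.5 * rate
    validation_accuracy_pre = validation_accuracy

    # Stop early once training accuracy is essentially 1.0.
    if 1.0 - train_accuracy < 0.0001:
        print("Train accuracy 1.0, break")
        break
```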
#### 4. Describe the approach taken for finding a solution.
My final model results were:
- training set accuracy of 1.0
- validation set accuracy of 0.955
- test set accuracy of 0.945
- I first tried LeNet directly, with no generated data, which gave about 0.9 accuracy.
- Then I tried adding one more fully connected layer, which gave a higher training accuracy but an even lower validation accuracy. I think adding more layers made the network overfit, so I removed the extra layer and fell back to LeNet.
- Since the architecture was already prone to overfitting, I decided to focus on getting more training data in the following tests.
- Step by step, I added rotation, zooming, and shifting to get more data. Among these, I found zooming had the most effect, then rotation, then shifting.
- Then, following the paper, I fed the output of the first convolutional layer, together with the second, into the fully connected layers. This gave a little more accuracy, which is the final result above.
### Test a Model on New Images
#### 1. Choose five German traffic signs found on the web and provide them in the report.
Here are 10 German traffic signs that I downloaded from Google:
The "right of way" image is difficult to classify, because it have lots of rotation, and the image is too close to the right side.
The "Generala caution" one have another sigh underneath it, so that'll cause confusion also.
#### 2. Discuss the model's predictions on these new traffic signs.
Here are the results of the prediction:
Image | Prediction |
---|---|
Stop | Stop |
30 km/h | 30 km/h |
Right of way | Yield |
Priority road | Priority road |
Yield | Yield |
Children crossing | Children crossing |
General caution | No passing |
Pedestrian | General caution |
Keep right | Keep right |
Road work | Road work |
The model was able to correctly guess 7 of the 10 traffic signs, an accuracy of 70%, compared with the test set accuracy of 94.5%. Since this sample is very small (only 10 images), it's hard to conclude that the model would do this badly on other images in general. When I tried clearer images from https://en.wikipedia.org/wiki/Road_signs_in_Germany, the model could sometimes reach 100% accuracy.
#### 3. Describe how certain the model is when predicting.
The following image shows the top 5 probabilities for the test images. For the correctly predicted images, the model is quite sure about the right label, with probabilities very close to 1.0, like the stop sign, the 30 km/h limit, etc.
For the incorrect ones: the "right of way" image is a difficult one because it is rotated quite a lot, and the model is noticeably less sure about it than about the correctly labeled images.
The "General caution" image is incorrectly labeled as "No passing". I guess the German text on the extra sign underneath confused the model.
The "General caution" and "Pedestrian" signs look fairly similar, so it is not much of a surprise that the model labeled the "Pedestrian" image as "General caution".
### Visualizing the Neural Network
#### 1. Discuss the visual output of your trained network's feature maps.
I visualized the feature maps of convolutional layers 1 and 2, together with their ReLU and max pooling layers, six images in total.
The convolutional layer maintains the shape of the sign, and the ReLU layer makes the image nearly binary, either black or white. The max pooling layer drops quite a lot of information, making the result quite difficult for humans to read. But I suspect this is because the original image is quite small; for real-world images with higher resolution, max pooling can help reduce a lot of features.
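A sketch of the visualization helper (the tensor and placeholder names are assumptions, not the notebook's exact code):

```python
import matplotlib.pyplot as plt

def show_feature_maps(sess, layer_tensor, image):
    # Run one image through the network and plot every channel of `layer_tensor`.
    # `sess` holds the trained model; `x` is the input placeholder (assumed).
    activation = sess.run(layer_tensor, feed_dict={x: [image]})
    n_maps = activation.shape[-1]
    cols = (n_maps + 1) // 2
    for i in range(n_maps):
        plt.subplot(2, cols, i + 1)
        plt.imshow(activation[0, :, :, i], cmap='gray')
        plt.axis('off')
    plt.show()
```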
Following are the feature map images: