The goals / steps of this project are the following:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
In this project, the German Traffic Sign Dataset has been used to train a convolutional neural network. Each image is represented as a 32x32 pixel RGB image. The number of sign images is then split into Training, Validation and Testing sets as follows.
Data | Number of Images |
---|---|
Training | 34799 |
Validation | 4410 |
Testing | 12630 |
The testing data is never looked at except at the very end to calculate model performance. The data consists of 43 variety of traffic signs. An example of each sign from the training dataset has been shown below.
The number of images for each sign is not equal in the dataset. Some signs occur more frequently than the other. Thus, precaution needs to be taken so that the model is not biased towards more frequent signs. The frequency plot is given below.
First, the motivation is to increase the amount of data to enhance the learning. Some of the traffic signs have an axis of symmetry i.e. they are invariant to fliping them vertically, horizontally or diagonally. An example is the "Yield" sign which as a vertical axis of symmetry passing through its center. Including this flipped data will help the model capture this inherent symmetry in some of the traffic signs. Also, some pairs of signs interchange between each other on flipping like "Turn Right" becomes "Turn Left". The Symmetric properties of signs are given in the table below
ID | Data | Line of Symmetry |
---|---|---|
0 | Speed limit (20km/h) | |
1 | Speed limit (30km/h) | Horizontal |
2 | Speed limit (50km/h) | |
3 | Speed limit (60km/h) | |
4 | Speed limit (70km/h) | |
5 | Speed limit (80km/h) | Horizontal |
6 | End of speed limit (80km/h) | |
7 | Speed limit (100km/h) | |
8 | Speed limit (120km/h) | |
9 | No passing | |
10 | No passing for vehicles over 3.5 metric tons | |
11 | Right-of-way at the next intersection | Vertical |
12 | Priority road | Vertical, Horizontal |
13 | Yield | Vertical |
14 | Stop | |
15 | No vehicles | Vertical, Horizontal |
16 | Vehicles over 3.5 metric tons prohibited | |
17 | No entry | Vertical, Horizontal |
18 | General caution | Vertical |
19 | Dangerous curve to the left | Interchange 1 |
20 | Dangerous curve to the right | Interchange 1 |
21 | Double curve | |
22 | Bumpy road | Vertical |
23 | Slippery road | |
24 | Road narrows on the right | |
25 | Road work | |
26 | Traffic signals | Vertical |
27 | Pedestrians | |
28 | Children crossing | |
29 | Bicycles crossing | |
30 | Beware of ice/snow | Vertical |
31 | Wild animals crossing | |
32 | End of all speed and passing limits | Diagonal |
33 | Turn right ahead | Interchange 2 |
34 | Turn left ahead | Interchange 2 |
35 | Ahead only | Vertical |
36 | Go straight or right | Interchange 3 |
37 | Go straight or left | Interchange 3 |
38 | Keep right | Interchange 4 |
39 | Keep left | Interchange 4 |
40 | Roundabout mandatory | Diagonal |
41 | End of no passing | |
42 | End of no passing by vehicles over 3.5 metric tons |
Using these symmeric properties, the amount of training data is increased to 56368 images. Some newally added images are shown below.
Horizontal Symmetry: "Speed Limit 80km/h"
Vertical Symmetry: "Ahead Only"
Diagonal Symmetry: "End of all speed and passing limit"
Vertical and Horizontal Symmetry: "Priority Road"
Sign Interchange: "Keep Right"
In real world, a traffic sign can be viewed from a car at a slight camera angle or rotation. To make the learning algorithm more robust to such changes, minor modifications to training data is done and added. After this step, the training data consists of 225472 traffic sign images. A sample modification is shown below.
Rotation transformation of a sign image
Prespective transformation of a sign image
This step concludes the enrichment of training data.
The images in training data consists of very large vaiations in the amount of lighting and brightness depending upon the time of day. To improve the contrast properties of training data, techniques of Grayscaling and Adaptive Histogram Equalization are used. This results in transformation of training data into gray 32x32 pixel images. Some examples are given below.
Finally, each image is normalized such that each pixel value varies in the range (-1,1). We use the expression $$ norm=\frac{px-128}{128}$$
The final model architecture of Convolutional Neural Network used to solve the Traffic sign classification is as follows
Layer | Description |
---|---|
Input | 32x32x1 grayscale image |
Convolution 5x5 | 1x1 stride, valid padding, outputs 28x28x12 |
RELU | |
Max pooling | 2x2 stride, outputs 14x14x12 |
Convolution 5x5 | 1x1 stride, valid padding, outputs 10x10x36 |
RELU | |
Max pooling | 2x2 stride, outputs 5x5x36 |
Flatten | Outputs 900 |
Fully connected | Outputs 300 |
RELU | |
Fully connected | Outputs 150 |
RELU | |
Fully connected | Output logits 43 |
To train the model, hyperparameters used were EPOCHS = 25, BATCH_SIZE = 256 and RATE = 0.001. AdamOptimizer was used for learning. The final model results were:
- training set accuracy of 98.7%
- validation set accuracy of 97.0%
- test set accuracy of 94.8%
The architecture of LeNet was used as the starting point of model. Training it with un-augmented data gave results around 90% accuracy. Increasing the complexity of LeNet by increasing layer nodes lead to overfitting. Thus, the decision to generate more data was taken. Using new data with original LeNet model led to underfitting. Finally, a more complex final model was used using which much better accuracy and generalization was achieved. In our model, dropout layers weren't used because their no affect on accuracy was observed.
Here are eight German traffic signs that I found on the web to test the model:
These images seem simple to classify since there are well lit and contain no background noise. They are resized into 32x32 images, preprocessed and then fed to the model. Here are the results of the prediction:
The model was able to correctly guess all of the traffic signs, which gives an accuracy of 100%. Finally, we check the top five softmax probabilities of the predictions. This is to check the confidence with which our model has made these predictions. For these images, we find that the model is almost 100% confident about its predictions. This makes sense since the images seemed easy to classify.
This project is submitted as partial fulfillment of the Udacity's Self-Driving Car Engineer Nanodegree program. The helper code is available here.