# Hands-on tutorial: Traffic sign classifier with TensorFlow

## Preparation steps
You can download this Jupyter notebook with git:
`git clone https://github.com/olesalscheider/traffic-sign-tutorial`

Please complete these preparation steps before the hands-on tutorial so that your system is ready:

* Please install all required dependencies:
  * Python 3
  * Jupyter
    * `pip3 install --upgrade jupyter`
  * Pillow
    * `pip3 install --upgrade pillow`
  * urllib3
    * `pip3 install --upgrade urllib3`
  * numpy
    * `pip3 install --upgrade numpy`
  * TensorFlow (the code uses the TensorFlow 1.x graph API, such as `tf.Session`, which TensorFlow 2 only provides under `tf.compat.v1`)
    * `pip3 install --upgrade tensorflow` for the CPU variant
    * `pip3 install --upgrade tensorflow-gpu` if you have a GPU with CUDA and cuDNN support
    * More details on https://www.tensorflow.org/install/
* Download and extract the traffic sign dataset (GTSRB). To do so, execute the first Python code cell of the Jupyter notebook.
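
Before the tutorial, you can check which of the packages above are importable from your Python 3 interpreter. This is a minimal sketch using only the standard library; the helper `check_dependencies` is hypothetical, and it assumes the import names `PIL` (for Pillow), `urllib3`, `numpy` and `tensorflow`. Jupyter is left out because it is normally started from the command line.

```python
import importlib.util

# Import names of the packages listed above (Pillow is imported as PIL).
REQUIRED_MODULES = ['PIL', 'urllib3', 'numpy', 'tensorflow']


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    # find_spec() locates a top-level module without importing it.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == '__main__':
    for name, found in check_dependencies().items():
        print('%-10s %s' % (name, 'found' if found else 'MISSING'))
```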

## Prepare the dataset

Let's start by downloading the data, converting the images to PNG and splitting them into a training and a test dataset. We store the image paths and labels in two tab-separated CSV files and use approximately 80% of the data for training:

```python
import io
import os
import urllib.request
import zipfile

import numpy as np
from PIL import Image

DATA_PATH = 'data'
DATA_URL = 'http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Training_Images.zip'

print('Downloading and extracting data...')
with urllib.request.urlopen(DATA_URL) as response:
    archive = response.read()
with zipfile.ZipFile(io.BytesIO(archive)) as zip_ref:
    zip_ref.extractall(DATA_PATH)

print('Converting images and splitting datasets...')
train_file_path = os.path.join(DATA_PATH, 'train')
test_file_path = os.path.join(DATA_PATH, 'test')
with open(train_file_path, 'w') as train_file, open(test_file_path, 'w') as test_file:
    # Iterate over all image files in the extracted training data
    # and store their paths and labels in the CSV files.
    for dirpath, dirnames, files in os.walk(DATA_PATH):
        # Sign numbers are only unique within one class directory,
        # so the train/test decision is reset for every directory.
        is_train_example = {}
        for file in files:
            if file.endswith('.ppm'):
                # Convert ppm to png (TensorFlow cannot decode ppm)
                newfile = os.path.splitext(file)[0] + '.png'
                with Image.open(os.path.join(dirpath, file)) as im:
                    im.save(os.path.join(dirpath, newfile))

                # The last directory name encodes the class of the
                # training example. Converting it to an integer
                # strips the leading zeros.
                label = int(os.path.basename(dirpath))

                # There are multiple images of each sign.
                # The number before the '_' gives the sign number.
                # Make sure that all images of the same sign are
                # stored either in the training or in the test set.
                sign_no = int(file.split('_')[0])

                # The line for the CSV file: image path and class label,
                # separated by a tab.
                line = os.path.join(dirpath, newfile) + '\t' + str(label) + '\n'

                # Decide once per sign: randint(0, 10) returns 0..9, so "> 1"
                # keeps about 80% of the data for training and 20% for testing.
                if sign_no not in is_train_example:
                    is_train_example[sign_no] = np.random.randint(0, 10) > 1
                if is_train_example[sign_no]:
                    train_file.write(line)
                else:
                    test_file.write(line)
print('Finished.')
```

```
Downloading and extracting data...
Converting images and splitting datasets...
Finished.
```
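
The key point of this split is that all images of the same physical sign (same number before the `_` in the file name) end up in the same split, so the test set never contains another view of a sign that was seen during training. The stdlib-only sketch below isolates that logic on made-up file names. The function `split_by_sign` and the sample names are hypothetical, and it uses a seeded `random.Random` instead of `np.random` so the result is reproducible.

```python
import random


def split_by_sign(filenames, train_fraction=0.8, seed=0):
    """Assign each file to 'train' or 'test', keeping all images of one sign together.

    File names are assumed to look like '<sign>_<image>.ppm'.
    """
    rng = random.Random(seed)
    is_train_sign = {}
    split = {}
    for name in filenames:
        sign_no = int(name.split('_')[0])
        if sign_no not in is_train_sign:
            # One random decision per sign, reused for all of its images.
            is_train_sign[sign_no] = rng.random() < train_fraction
        split[name] = 'train' if is_train_sign[sign_no] else 'test'
    return split


# Hypothetical directory content: 10 signs with 3 images each.
files = ['%05d_%05d.ppm' % (sign, img) for sign in range(10) for img in range(3)]
split = split_by_sign(files)
for sign in range(10):
    print(sign, {split[f] for f in files if f.startswith('%05d_' % sign)})
```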


## Define a data reader

The data reader reads the CSV files for training and testing. These files contain one example per line: the path to the image file (PNG) and the class label (an integer between 0 and 42), separated by a tab.

The data reader is implemented as one `tf.data.Dataset` per split plus a generic (reinitializable) `tf.data.Iterator` that can iterate over the elements of either dataset.
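
Inside the graph, `tf.decode_csv` splits each `<path>\t<label>` line using the record defaults `[[''], [0]]`, i.e. one string column and one integer column. The pure-Python sketch below mirrors that parsing step for illustration only; `parse_line` is a hypothetical helper, not part of TensorFlow, the example path is made up, and the range check for the 43 classes is an extra assumption that `tf.decode_csv` does not perform.

```python
def parse_line(line):
    """Split one '<path>\\t<label>' line into (path, label)."""
    path, label = line.rstrip('\n').split('\t')
    label = int(label)
    # Assumed extra check: GTSRB has 43 classes, so labels are 0..42.
    if not 0 <= label <= 42:
        raise ValueError('label out of range: %d' % label)
    return path, label


print(parse_line('data/example/00013/00002_00029.png\t13\n'))
```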

```python
import tensorflow as tf

# Create a TensorFlow graph and add all operations to it from now on.
graph = tf.Graph()
with graph.as_default():

    # Define a function that takes a line from the CSV file
    # and returns the decoded image and label.
    def read_data(line):
        # Split the line into the image path (string) and the label (int32)
        path, label = tf.decode_csv(line, [[''], [0]], field_delim='\t')
        # Read the binary data from the image file
        contents = tf.read_file(path)
        # Decode the PNG image as RGB (3 channels)
        image = tf.image.decode_png(contents, channels=3)

        # Resize the image to 48x48 pixels. resize_bilinear expects a
        # batch, so add and remove a batch dimension around it.
        image = tf.expand_dims(image, axis=0)
        image = tf.image.resize_bilinear(image, [48, 48])
        image = tf.squeeze(image, axis=0)
        image.set_shape([48, 48, 3])
        return image, label

    # Create the training dataset
    train_dataset = tf.data.TextLineDataset(os.path.join(DATA_PATH, 'train'))
    # Shuffle the training dataset with a buffer of 30000 lines
    train_dataset = train_dataset.shuffle(30000)
    # Call the read_data function for each entry (2 calls in parallel)
    train_dataset = train_dataset.map(read_data, num_parallel_calls=2)
    # Repeat the dataset 2 times (2 epochs)
    train_dataset = train_dataset.repeat(2)
    # Create batches with 32 training examples
    train_dataset = train_dataset.batch(32)

    # Create the test dataset
    test_dataset = tf.data.TextLineDataset(os.path.join(DATA_PATH, 'test'))
    # Call the read_data function for each entry
    test_dataset = test_dataset.map(read_data, num_parallel_calls=2)
    # Use batches with a single example for evaluation
    test_dataset = test_dataset.batch(1)

    # Create a generic iterator that can be bound to either dataset
    iterator = tf.data.Iterator.from_structure(train_dataset.output_types,
                                               train_dataset.output_shapes)

    # Create initializer operations for the iterator.
    # These bind it to either the training or the test dataset.
    train_init_op = iterator.make_initializer(train_dataset)
    test_init_op = iterator.make_initializer(test_dataset)
```
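
`shuffle(30000)` does not shuffle the whole file at once: tf.data keeps a buffer of up to 30000 elements and emits a randomly chosen one, refilling the buffer from the input. Because `repeat(2)` comes before `batch(32)`, a batch can span the boundary between the two epochs, so the pipeline yields ceil(2N / 32) batches for N training lines. The generators below are a simplified pure-Python sketch of these semantics, not TensorFlow's implementation; the function names and the toy sizes (100 lines, buffer of 30) are assumptions for illustration.

```python
import math
import random


def buffer_shuffle(items, buffer_size, rng):
    """Yield items in random order using a fixed-size shuffle buffer."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: emit a randomly chosen element.
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            yield buffer.pop()
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer


def repeat(make_epoch, count):
    """Re-create the (reshuffled) input for each of `count` epochs."""
    for _ in range(count):
        yield from make_epoch()


def batch(items, batch_size):
    """Group consecutive items into lists of batch_size; the last may be smaller."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


rng = random.Random(0)
lines = list(range(100))  # stand-ins for 100 CSV lines


def make_epoch():
    # Each call reshuffles, like shuffle() placed before repeat() in tf.data.
    return buffer_shuffle(lines, buffer_size=30, rng=rng)


batches = list(batch(repeat(make_epoch, 2), 32))
print(len(batches), math.ceil(2 * len(lines) / 32))
print([len(b) for b in batches])
```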

## Define the model

First we define a class for one ResNet module. A ResNet module looks like this:

![ResNet module](img/resnet_module.png)

Variant 1 is used when the number of input channels equals the number of output channels.
But if the number of channels is different, the tensors cannot be summed element-wise.
In this case we have to use variant 2. Here, we add a 1x1 convolution in the skip connection. This convolution adjusts the number of channels so that both tensors can be summed element-wise.
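
A 1x1 convolution is a matrix multiplication applied independently at every pixel: it maps the C_in channel values of a pixel to C_out values. The stdlib-only sketch below shows both variants on tiny feature maps stored as nested lists of shape H x W x C. The helper names, the random values and the missing bias are assumptions for illustration, and the output of the main path (the two 3x3 convolutions) is simply given as a ready-made tensor.

```python
import random


def conv1x1(feature_map, weights):
    """1x1 convolution without bias; weights has shape C_in x C_out."""
    c_out = len(weights[0])
    return [[[sum(pixel[i] * weights[i][o] for i in range(len(pixel)))
              for o in range(c_out)]
             for pixel in row]
            for row in feature_map]


def add(a, b):
    """Element-wise sum of two H x W x C feature maps."""
    return [[[x + y for x, y in zip(pa, pb)] for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def shape(t):
    return (len(t), len(t[0]), len(t[0][0]))


def residual(x, main_path_output, projection=None):
    """Variant 1 if projection is None, otherwise variant 2 (1x1 conv on the skip)."""
    skip = x if projection is None else conv1x1(x, projection)
    if shape(skip) != shape(main_path_output):
        raise ValueError('cannot add %s and %s' % (shape(skip), shape(main_path_output)))
    return add(main_path_output, skip)


rng = random.Random(0)
x = [[[rng.random() for _ in range(2)] for _ in range(4)] for _ in range(4)]     # 4x4x2
main = [[[rng.random() for _ in range(3)] for _ in range(4)] for _ in range(4)]  # 4x4x3
projection = [[rng.random() for _ in range(3)] for _ in range(2)]                # 2 -> 3
print(shape(residual(x, main, projection)))
```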

Let's implement this module as a `tf.keras.Model`:

```python
REGULARIZER_WEIGHT = 1e-5

class ResnetModule(tf.keras.Model):
    def __init__(self, name, num_output_channels):
        super().__init__(name=name)
        self.num_output_channels = num_output_channels

    # build() is called before the model is used for the first
    # time. At that point the input shapes are (partially) known
    # and passed as a parameter. We instantiate the sub-layers here.
    def build(self, input_shapes):
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.bn2 = tf.keras.layers.BatchNormalization()

        self.conv1 = tf.keras.layers.Conv2D(self.num_output_channels,
            (3, 3),
            padding='same',
            kernel_initializer=tf.keras.initializers.glorot_normal(),
            kernel_regularizer=tf.keras.regularizers.l2(REGULARIZER_WEIGHT),
            name='conv1')

        self.conv2 = tf.keras.layers.Conv2D(self.num_output_channels,
            (3, 3),
            padding='same',
            kernel_initializer=tf.keras.initializers.glorot_normal(),
            kernel_regularizer=tf.keras.regularizers.l2(REGULARIZER_WEIGHT),
            name='conv2')

        # Variant 2: add a 1x1 convolution to the skip connection
        # if the number of channels changes.
        self.conv3 = None
        if input_shapes[-1] != self.num_output_channels:
            self.conv3 = tf.keras.layers.Conv2D(self.num_output_channels,
                (1, 1),
                kernel_initializer=tf.keras.initializers.glorot_normal(),
                kernel_regularizer=tf.keras.regularizers.l2(REGULARIZER_WEIGHT),
                name='conv3')

        # Call the build() function of our parent class
        super().build(input_shapes)

    # call() is executed when the model is evaluated. It applies
    # BN -> ReLU -> conv twice and adds the skip connection.
    def call(self, x):
        y = x
        x = self.bn1(x, training=True)
        x = tf.keras.activations.relu(x)
        if self.conv3 is not None:
            y = self.conv3(x)
        x = self.conv1(x)
        x = self.bn2(x, training=True)
        x = tf.keras.activations.relu(x)
        x = self.conv2(x)
        return x + y
```
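
As a worked example of what `build()` creates, the sketch below counts the weights of one `ResnetModule` with the standard formulas: a k x k convolution from C_in to C_out channels has k*k*C_in*C_out kernel weights plus C_out biases, and a batch normalization layer over C channels has 2*C trainable parameters (gamma, beta) plus 2*C moving statistics. The helper names are hypothetical; the sketch assumes the Keras defaults used above (biases enabled in `Conv2D`, `center` and `scale` enabled in `BatchNormalization`) and the input channel counts of the network defined next (32, 64 and 128).

```python
def conv_params(k, c_in, c_out):
    """Kernel weights plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


def bn_params(c):
    """Trainable gamma and beta, plus non-trainable moving mean and variance."""
    return {'trainable': 2 * c, 'non_trainable': 2 * c}


def resnet_module_params(c_in, c_out):
    """Return (trainable, non_trainable) parameter counts of one module."""
    trainable = conv_params(3, c_in, c_out) + conv_params(3, c_out, c_out)
    if c_in != c_out:
        trainable += conv_params(1, c_in, c_out)  # projection shortcut (conv3)
    non_trainable = 0
    # bn1 normalizes the module input (c_in), bn2 the output of conv1 (c_out).
    for c in (c_in, c_out):
        trainable += bn_params(c)['trainable']
        non_trainable += bn_params(c)['non_trainable']
    return trainable, non_trainable


for name, c_in, c_out in [('rm1', 32, 64), ('rm2', 64, 128), ('rm3', 128, 256)]:
    print(name, resnet_module_params(c_in, c_out))
```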

Now we define a simple model that consists of some ResNet modules. It will look like this:

![Network](img/network.png)

The first layer is a convolution with a 7x7 kernel, stride 2, 32 output channels and a ReLU activation.
It is followed by a batch normalization layer.
Then three ResNet modules alternate with three 2x2 max-pooling layers. The ResNet modules have 64, 128 and 256 channels, respectively.
The last layer is a fully-connected (dense) layer. It maps the flattened features to one output (logit) per traffic sign class in our dataset (43 classes).
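
The feature map sizes follow from the layer settings. Keras uses `padding='valid'` by default, so a convolution or pooling window of size k with stride s turns a size n into floor((n - k) / s) + 1, while the ResNet modules use `padding='same'` with stride 1 and keep the size. The sketch below traces a 48x48 input through the network; `trace_shapes` and `valid_output_size` are hypothetical helpers written for this illustration, and batch normalization is omitted because it does not change shapes.

```python
def valid_output_size(n, kernel, stride):
    """Output length of an unpadded ('valid') convolution or pooling window."""
    return (n - kernel) // stride + 1


def trace_shapes(size=48, num_classes=43):
    """List the output shape of each stage for a square input image."""
    shapes = []
    size = valid_output_size(size, 7, 2)               # first_conv: 7x7, stride 2
    shapes.append(('first_conv', size, size, 32))
    channels = 32
    for name, channels in [('module1', 64), ('module2', 128), ('module3', 256)]:
        shapes.append((name, size, size, channels))    # 'same' padding keeps the size
        size = valid_output_size(size, 2, 2)           # 2x2 max pooling, stride 2
        shapes.append(('maxpool', size, size, channels))
    shapes.append(('flatten', size * size * channels))
    shapes.append(('fc', num_classes))
    return shapes


for entry in trace_shapes():
    print(entry)
```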

Let's also implement this as a `tf.keras.Model`:

```python
class TrafficSignModel(tf.keras.Model):
    def __init__(self, name):
        super().__init__(name=name)

    # build() is called before the model is used for the first
    # time. At that point the input shapes are (partially) known
    # and passed as a parameter. We instantiate the sub-layers here.
    def build(self, input_shapes):
        # 7x7 convolution with stride 2, 32 output channels and ReLU
        self.first_conv = tf.keras.layers.Conv2D(32,
            (7, 7),
            strides=(2, 2),
            activation=tf.keras.activations.relu,
            kernel_initializer=tf.keras.initializers.glorot_normal(),
            kernel_regularizer=tf.keras.regularizers.l2(REGULARIZER_WEIGHT),
            name='first_conv')

        self.bn1 = tf.keras.layers.BatchNormalization()

        # 2x2 max pooling with stride 2 (roughly halves the spatial size)
        self.maxpool1 = tf.keras.layers.MaxPooling2D(2, 2)
        self.maxpool2 = tf.keras.layers.MaxPooling2D(2, 2)
        self.maxpool3 = tf.keras.layers.MaxPooling2D(2, 2)
        self.flatten = tf.keras.layers.Flatten()

        self.module1 = ResnetModule('rm1', 64)
        self.module2 = ResnetModule('rm2', 128)
        self.module3 = ResnetModule('rm3', 256)

        self.fc = tf.keras.layers.Dense(43)  # We have 43 classes

        # Call the build() function of our parent class
        super().build(input_shapes)

    # call() is executed when the model is evaluated. It applies
    # the sub-layers in the order shown in the figure above.
    def call(self, image):
        # Cast the image to float
        x = tf.cast(image, tf.float32)
        # Normalize the pixel values from [0, 255] to about [-1, 1]
        x = (x - 128.0) / 128.0

        # Run the neural network layers on the image
        x = self.first_conv(x)
        x = self.bn1(x, training=True)
        x = self.module1(x)
        x = self.maxpool1(x)
        x = self.module2(x)
        x = self.maxpool2(x)
        x = self.module3(x)
        x = self.maxpool3(x)
        x = self.flatten(x)
        x = self.fc(x)
        return x
```
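
The normalization in `call` maps the pixel range [0, 255] to [-1, 127/128]. The plain-Python sketch below checks the endpoints; `normalize` is a hypothetical helper that only mirrors the arithmetic `(x - 128) / 128`.

```python
def normalize(pixel):
    """Same arithmetic as in TrafficSignModel.call: (x - 128) / 128."""
    return (float(pixel) - 128.0) / 128.0


for value in (0, 128, 255):
    print(value, normalize(value))
```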

## Train and evaluate the model

Now that we have defined a data reader and the model, we can train it.

We instantiate the `TrafficSignModel` class, feed it the images from the dataset iterator and get the predictions (logits) as output.

From the predictions and the ground-truth labels, we calculate the loss. Here we use the cross-entropy loss, to which we add the L2 regularization losses of the convolution kernels.
Then we instantiate the Adam optimizer, which minimizes the loss by adjusting the weights of the model.
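
For a single example with logits z and true class y, the sparse softmax cross-entropy is log(sum_j exp(z_j)) - z_y. `tf.reduce_sum` adds these values over the batch, and each Keras L2 regularizer contributes `REGULARIZER_WEIGHT * sum(w**2)` for its kernel. The stdlib-only sketch below reproduces this arithmetic and the batch accuracy on toy numbers; the function names are hypothetical, and it uses the usual log-sum-exp trick for numerical stability. With all-zero logits every class gets probability 1/43, so a batch of 32 has a cross-entropy of 32 * ln(43) ≈ 120.4, the same order of magnitude as the first loss in the training log.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """-log softmax(logits)[label], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def batch_loss(batch_logits, labels, kernels=(), weight=1e-5):
    """Summed cross-entropy over the batch plus L2 penalties weight * sum(w^2)."""
    loss = sum(sparse_softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels))
    loss += sum(weight * sum(w * w for w in kernel) for kernel in kernels)
    return loss


def accuracy(batch_logits, labels):
    """Fraction of examples whose arg-max logit equals the label."""
    correct = sum(max(range(len(z)), key=z.__getitem__) == y
                  for z, y in zip(batch_logits, labels))
    return correct / len(labels)


uniform = [[0.0] * 43 for _ in range(32)]
print(batch_loss(uniform, [0] * 32), 32 * math.log(43))
print(accuracy([[2.0, 1.0], [0.5, 3.0]], [0, 0]))
```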

Finally, we initialize all variables and run the optimizer in a training loop to train the network.
We then evaluate the trained model on the test dataset.

```python
with graph.as_default():
    session = tf.Session(graph=graph)

    # Instantiate the model we want to train
    net = TrafficSignModel('net')
    images, labels = iterator.get_next()
    logits = net(images)

    # Define the loss: cross-entropy summed over the batch
    loss_op = tf.reduce_sum(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    loss_op += tf.reduce_sum(net.losses)  # Add the L2 regularization losses

    # Define an op that calculates the accuracy of the current batch
    correct_prediction = tf.equal(tf.cast(tf.argmax(logits, 1), tf.int32), labels)
    accuracy_op = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Define the learning rate schedule (halved every 500 steps)
    # and the optimizer op.
    global_step = tf.Variable(0, name='global_step', trainable=False)
    learning_rate = tf.train.exponential_decay(1e-3, global_step, 500, 0.5)
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss_op, global_step=global_step)
    train_op = tf.group(train_op, net.updates)  # Add the batch norm updates

    # Initialize the model variables (randomly).
    session.run(tf.global_variables_initializer())

    print('Train the traffic sign classifier. This might take a while...')
    # Initialize the dataset iterator for training.
    session.run(train_init_op)
    i = 0
    while True:
        try:
            _, accuracy, loss = session.run([train_op, accuracy_op, loss_op])
            if i % 20 == 0:
                print('Step %4i - Accuracy: %.4f, loss: %.4f' % (i, accuracy, loss))
            i += 1
        except tf.errors.OutOfRangeError:
            break  # All training batches have been consumed

    print('Evaluate the classifier. This might take a while...')
    # Initialize the dataset iterator for testing.
    session.run(test_init_op)
    i = 0
    total_accuracy = 0.0
    while True:
        try:
            accuracy = session.run(accuracy_op)
            total_accuracy += accuracy
            i += 1
        except tf.errors.OutOfRangeError:
            break  # All test batches have been consumed
    # With batches of one example, the mean of the per-batch
    # accuracies equals the accuracy over the whole test set.
    print('Mean accuracy on the test dataset: %.4f' % (total_accuracy / i))
```

```
Train the traffic sign classifier. This might take a while...
Step    0 - Accuracy: 0.0312, loss: 134.4945
Step   20 - Accuracy: 0.1875, loss: 115.2175
Step   40 - Accuracy: 0.1250, loss: 106.6475
Step   60 - Accuracy: 0.2812, loss: 80.0061
Step   80 - Accuracy: 0.5625, loss: 55.2583
Step  100 - Accuracy: 0.5625, loss: 46.9011
Step  120 - Accuracy: 0.7188, loss: 30.2140
Step  140 - Accuracy: 0.9062, loss: 14.3047
Step  160 - Accuracy: 0.8438, loss: 16.3402
Step  180 - Accuracy: 0.8750, loss: 11.6525
Step  200 - Accuracy: 0.8750, loss: 9.5030
Step  220 - Accuracy: 0.9375, loss: 8.1932
Step  240 - Accuracy: 0.9062, loss: 7.1703
Step  260 - Accuracy: 0.9062, loss: 7.1646
Step  280 - Accuracy: 0.9688, loss: 3.5025
Step  300 - Accuracy: 1.0000, loss: 0.4277
Step  320 - Accuracy: 0.9688, loss: 2.5462
Step  340 - Accuracy: 0.9062, loss: 3.9800
Step  360 - Accuracy: 0.9688, loss: 1.4479
Step  380 - Accuracy: 1.0000, loss: 1.3221
Step  400 - Accuracy: 1.0000, loss: 2.5511
Step  420 - Accuracy:
```
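
The training cell uses `tf.train.exponential_decay(1e-3, global_step, 500, 0.5)`. Because `staircase` is left at its default (False), the learning rate decays smoothly as 1e-3 * 0.5 ** (step / 500), i.e. it halves every 500 steps. The sketch below evaluates that formula in plain Python; the function name is hypothetical and it only mirrors the documented formula, not TensorFlow's code.

```python
def exponential_decay(step, initial_rate=1e-3, decay_steps=500, decay_rate=0.5,
                      staircase=False):
    """initial_rate * decay_rate ** (step / decay_steps); integer division if staircase."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_rate * decay_rate ** exponent


for step in (0, 250, 500, 1000, 1500):
    print(step, exponential_decay(step))
```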