# Introduction

In this notebook, I want to continue working with the model from experiment 1. The model was able to learn the steering angles for the three hand-picked images, but the question is whether it can learn to actually steer the car in the simulator's autonomous mode. Given the discussion about recovery in the project material, it is unlikely that the provided sample training data is enough to teach the model to drive, but a test with that data should at least give a baseline to work from.

Here is the overall plan:
1. Recreate the model from experiment 1
1. Create training data using the provided sample data
1. Train the model using the whole training data and see if any learning takes place
1. If needed, tweak the model to get better training performance
1. Test the model with the simulator to see how it performs

Here are some utility functions.

In [17]:
import os
from PIL import Image

def get_record_and_image(index):
    record = df.iloc[index]
    path = os.path.join('data', record.center)
    return record, Image.open(path)

def layer_info(model):
    for n, layer in enumerate(model.layers, 1):
        print('Layer {:2} {:16} input shape {} output shape {}'.format(n, layer.name, layer.input_shape, layer.output_shape))

## Step 1: Recreate the model from experiment 1

This is an exact copy of the model from experiment 1 with one difference: the input image size is halved, because the images will be downscaled this time. The reason for the downscaling is explained in Step 2.

In [41]:
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Flatten, Dropout  # Dropout is used by the later models
from keras.layers import Convolution2D, MaxPooling2D

model = Sequential()
model.add(Convolution2D(6, 5, 5, border_mode='valid', subsample=(5, 5), input_shape=(80, 160, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(16, 5, 5, border_mode='valid', subsample=(2, 2)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(120))
model.add(Activation('relu'))
model.add(Dense(84))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('tanh'))

layer_info(model)

Layer  1 convolution2d_25 input shape (None, 80, 160, 3) output shape (None, 16, 32, 6)
Layer  2 activation_61    input shape (None, 16, 32, 6) output shape (None, 16, 32, 6)
Layer  3 maxpooling2d_21  input shape (None, 16, 32, 6) output shape (None, 8, 16, 6)
Layer  4 convolution2d_26 input shape (None, 8, 16, 6) output shape (None, 2, 6, 16)
Layer  5 activation_62    input shape (None, 2, 6, 16) output shape (None, 2, 6, 16)
Layer  6 maxpooling2d_22  input shape (None, 2, 6, 16) output shape (None, 1, 3, 16)
Layer  7 flatten_13       input shape (None, 1, 3, 16) output shape (None, 48)
Layer  8 dense_37         input shape (None, 48) output shape (None, 120)
Layer  9 activation_63    input shape (None, 120) output shape (None, 120)
Layer 10 dense_38         input shape (None, 120) output shape (None, 84)
Layer 11 activation_64    input shape (None, 84) output shape (None, 84)
Layer 12 dense_39         input shape (None, 84) output shape (None, 1)
Layer 13 activation_65    input shape
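The shapes in this listing follow from the output-size formula for a 'valid' (unpadded) convolution, out = (in - kernel) // stride + 1, with each 2x2 max pooling halving a side and rounding down. A short sketch with hypothetical helper names traces the model and reproduces the 48-value Flatten output above. Assuming experiment 1 used full-size 160x320 frames, the same trace gives 336 values, so halving the input also shrinks the first dense layer.

```python
# Output-size arithmetic for the experiment-1 model (a sketch; helper names are hypothetical).

def conv_valid(size, kernel, stride=1):
    """Length along one axis after a 'valid' convolution (no padding)."""
    return (size - kernel) // stride + 1

def pool2(size):
    """Length along one axis after 2x2 max pooling with stride 2."""
    return size // 2

def flattened_size(height, width):
    """Trace the model layer by layer and return the Flatten layer's length."""
    h, w = conv_valid(height, 5, 5), conv_valid(width, 5, 5)  # 6@5x5, stride 5
    h, w = pool2(h), pool2(w)                                 # 2x2 max pooling
    h, w = conv_valid(h, 5, 2), conv_valid(w, 5, 2)           # 16@5x5, stride 2
    h, w = pool2(h), pool2(w)                                 # 2x2 max pooling
    return h * w * 16                                         # 16 feature maps

print(flattened_size(80, 160))   # 48, matching the Flatten row above
print(flattened_size(160, 320))  # 336 for full-size frames (assumed 160x320)
```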

## Step 2: Create training set

In [21]:
import numpy as np
import pandas as pd

df = pd.read_csv('data/driving_log.csv')

Now I need to create the actual training data, X_train and y_train. I will just read all the images and store them as NumPy arrays in X_train. Similarly, I read the corresponding steering angles and store them in y_train.

Note: I ended up scaling the images down to half size to conserve memory and speed up training. This was also mentioned in the project cheat sheet (https://carnd-forums.udacity.com/questions/26214464/behavioral-cloning-cheatsheet).
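To put the memory savings in numbers, here is a back-of-the-envelope estimate. It assumes the simulator frames are 160x320 RGB (halving them gives the 80x160 model input) and that the normalized arrays end up as float64, which is what NumPy produces when dividing integer arrays. Halving each side cuts memory by a factor of four: the normalized half-size set needs roughly 2.5 GB instead of roughly 9.9 GB.

```python
# Rough memory needed to hold all 8036 center images in RAM at once.
# Assumption: the simulator's frames are 160x320 RGB.
N_IMAGES = 8036

def dataset_bytes(height, width, bytes_per_value, channels=3, n_images=N_IMAGES):
    """Bytes for n_images arrays of shape (height, width, channels)."""
    return n_images * height * width * channels * bytes_per_value

for label, (h, w) in [('full size', (160, 320)), ('half size', (80, 160))]:
    for dtype, nbytes in [('uint8', 1), ('float64', 8)]:
        gb = dataset_bytes(h, w, nbytes) / 1e9
        print('{:9}  {:7}  {:5.2f} GB'.format(label, dtype, gb))
```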

In [22]:
from tqdm import tqdm

X_train = []
y_train = []
for i in tqdm(range(len(df))):
    record, image = get_record_and_image(i)
    small = image.resize((image.width // 2, image.height // 2))  # downscale to half size
    X_train.append(np.array(small))
    image.close()  # close the file opened by get_record_and_image
    y_train.append(record['steering'])

100%|██████████| 8036/8036 [00:40<00:00, 200.22it/s]


Some preprocessing: normalize the images and convert y_train to a NumPy array, because that is what Keras's fit() seems to want. This step takes some time and also consumes a lot of memory; downscaling the images above helps.

In [23]:
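X_train = np.array(X_train)  # list of 80x160x3 uint8 arrays -> one 4-D array
X_min = np.min(X_train)
X_max = np.max(X_train)
X_normalized = (X_train - X_min) / (X_max - X_min) - 0.5  # scale to [-0.5, 0.5]
y_train = np.array(y_train)  # steering angles collected in Step 2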
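Dividing integer arrays in NumPy yields float64, so X_normalized takes eight bytes per pixel value. A possible lower-memory variant converts to float32 once and then normalizes in place, mapping the pixel values linearly onto [-0.5, 0.5]. The helper name and the float32 choice are assumptions; this is not what the cell above does.

```python
import numpy as np

def normalize_inplace_float32(images):
    """Map pixel values linearly onto [-0.5, 0.5] using float32 storage.

    np.array(..., dtype=np.float32) always creates a new array here, so the
    in-place operations below never modify the caller's data.
    """
    x = np.array(images, dtype=np.float32)
    x_min, x_max = x.min(), x.max()
    x -= x_min              # now in [0, x_max - x_min]
    x /= (x_max - x_min)    # now in [0, 1]
    x -= 0.5                # now in [-0.5, 0.5]
    return x
```

In-place operations avoid the full-size temporary arrays that the one-line expression creates, which matters when the whole data set is held in memory.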

## Step 3: Train the model

Here I use all of the sample training data: 8036 images and their steering angles. Instead of using the training data generator as in experiment 1, I just pass the whole training set to model.fit and let it split the data into training and validation sets. After training, I save the model so that it can be loaded into the simulator for testing if the training seems to proceed well.

In [48]:
def train(model, nb_epoch=10):
    model.compile('adam', 'mse')
    model.fit(X_normalized, y_train, validation_split=0.2, nb_epoch=nb_epoch, verbose=2)
    model.save('model.h5')
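One thing to keep in mind with validation_split: Keras takes the validation samples from the end of the arrays as given, before any shuffling. Here that means the last 1608 rows of driving_log.csv, which is likely one contiguous stretch of driving if the log is in recording order. A sketch of shuffling once before splitting, using a hypothetical shuffled_split helper with the same split index as Keras (int(8036 * 0.8) = 6428 training samples); the result could be passed to model.fit through validation_data instead.

```python
import numpy as np

def shuffled_split(X, y, validation_split=0.2, seed=0):
    """Shuffle X and y together once, then hold out the last fraction for validation.

    The split index matches Keras's validation_split: int(len(X) * (1 - validation_split)).
    Note that X[order] copies the whole array.
    """
    order = np.random.RandomState(seed).permutation(len(X))
    X, y = X[order], y[order]
    split_at = int(len(X) * (1.0 - validation_split))
    return (X[:split_at], y[:split_at]), (X[split_at:], y[split_at:])

# Possible use (not what the notebook ran):
# (X_tr, y_tr), (X_val, y_val) = shuffled_split(X_normalized, y_train)
# model.fit(X_tr, y_tr, validation_data=(X_val, y_val), nb_epoch=10, verbose=2)
```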

In [42]:
train(model)

Train on 6428 samples, validate on 1608 samples
Epoch 1/10
12s - loss: 0.0132 - val_loss: 0.0122
Epoch 2/10
11s - loss: 0.0102 - val_loss: 0.0108
Epoch 3/10
11s - loss: 0.0093 - val_loss: 0.0110
Epoch 4/10
11s - loss: 0.0086 - val_loss: 0.0111
Epoch 5/10
11s - loss: 0.0080 - val_loss: 0.0109
Epoch 6/10
12s - loss: 0.0075 - val_loss: 0.0107
Epoch 7/10
11s - loss: 0.0072 - val_loss: 0.0124
Epoch 8/10
11s - loss: 0.0068 - val_loss: 0.0106
Epoch 9/10
11s - loss: 0.0064 - val_loss: 0.0116
Epoch 10/10
11s - loss: 0.0061 - val_loss: 0.0113


The validation loss does not get much lower after the first few epochs (from epoch 2 onward it hovers between 0.0106 and 0.0124), whereas the training loss keeps falling, from 0.0132 to 0.0061. This indicates overfitting and poor generalization.
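Stopping when the validation loss no longer improves is what early stopping automates. A small pure-Python sketch (the function name is hypothetical) replays the val_loss column from the log above: with a patience of 2 epochs, training would have stopped at epoch 4 and kept the epoch-2 weights. Keras offers the same idea as the EarlyStopping callback (monitor='val_loss', patience=...), often paired with ModelCheckpoint(save_best_only=True); its bookkeeping may differ slightly from this sketch.

```python
def early_stop_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-based, for patience-based early stopping.

    Training stops once the validation loss has failed to beat the best value
    so far for `patience` consecutive epochs.
    """
    best_loss = float('inf')
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, 1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

# val_loss values copied from the training log above
val_losses = [0.0122, 0.0108, 0.0110, 0.0111, 0.0109, 0.0107, 0.0124, 0.0106, 0.0116, 0.0113]
print(early_stop_epoch(val_losses, patience=2))  # (2, 4)
```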

Let's do a bit of random sampling of the predicted steering angles to get a feel for how they match the actual angles.

In [43]:
from random import randrange

def sample_predictions(model):
    for i in range(10):
        index = randrange(len(df))
        X = np.expand_dims(X_normalized[index], axis=0)
        y = y_train[index]
        print('Actual steering angle {} model prediction {}'.format(y, model.predict(X)[0][0]))
        
sample_predictions(model)

Actual steering angle 0.0 model prediction 0.1719207465648651
Actual steering angle 0.0 model prediction 0.028600577265024185
Actual steering angle 0.0 model prediction 0.017018768936395645
Actual steering angle 0.0 model prediction -0.05043351650238037
Actual steering angle -0.41104840000000004 model prediction -0.14591500163078308
Actual steering angle 0.0 model prediction -0.03481806069612503
Actual steering angle 0.0 model prediction 0.023184774443507195
Actual steering angle 0.0 model prediction -0.05616610124707222
Actual steering angle 0.1287396 model prediction 0.16128838062286377
Actual steering angle -0.0787459 model prediction -0.05938925966620445


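The sample predictions do not look very good: the one sharp turn in the sample (-0.41) is predicted as only -0.15, and the straight-ahead frames (0.0) get predictions ranging from -0.06 to 0.17. Some tweaks to the model are in order.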
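Ten random samples are a noisy yardstick. A steadier check, sketched here as a hypothetical helper, is the validation MSE of a trivial model that always predicts the same angle, namely the mean of the training angles. If val_loss (about 0.011 above) is not clearly below that number, the network has learned little beyond the average steering angle.

```python
import numpy as np

def constant_baseline_mse(y_fit, y_val):
    """Validation MSE of a 'model' that always predicts the mean training angle."""
    constant = np.mean(y_fit)
    return float(np.mean((np.asarray(y_val, dtype=float) - constant) ** 2))

# Possible use with the split Keras made above (first 6428 rows train, last 1608 validate):
# print(constant_baseline_mse(y_train[:6428], y_train[6428:]))
```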

## Step 4: Tweaking the model

So what could be done to improve the model? Basically, there are three different approaches:

1. Keep the model as it is, but try to improve its generalization ability
2. Keep the current architecture, but increase the number of weights
3. Make changes to the model's architecture

Before going for options 2 or 3, let's consider option 1, as it is the most conservative of the three. A simple way to try to improve generalization is to add dropout layers, which randomly zero a fraction of the activations during training and thereby force the model to learn redundant representations. Let's try that.
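A minimal NumPy sketch of what a Dropout(0.5) layer does during training, assuming the common "inverted dropout" formulation: each activation is zeroed with probability 0.5 and the survivors are scaled by 1 / (1 - 0.5), so the expected activation is unchanged and nothing needs rescaling at inference time, when dropout does nothing. The function name is hypothetical.

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        return x                              # inference: pass activations through
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob    # True where the unit survives
    return x * mask / keep_prob               # keeps the expected value equal to x

rng = np.random.default_rng(0)
activations = np.ones((4, 5))
print(inverted_dropout(activations, 0.5, rng))                  # entries are 0.0 or 2.0
print(inverted_dropout(activations, 0.5, rng, training=False))  # unchanged
```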

In [44]:
model_2 = Sequential()
model_2.add(Convolution2D(6, 5, 5, border_mode='valid', subsample=(5, 5), input_shape=(80, 160, 3)))
model_2.add(Dropout(0.5))
model_2.add(Activation('relu'))
model_2.add(MaxPooling2D(pool_size=(2, 2)))
model_2.add(Convolution2D(16, 5, 5, border_mode='valid', subsample=(2, 2)))
model_2.add(Dropout(0.5))
model_2.add(Activation('relu'))
model_2.add(MaxPooling2D(pool_size=(2, 2)))
model_2.add(Flatten())
model_2.add(Dense(120))
model_2.add(Activation('relu'))
model_2.add(Dense(84))
model_2.add(Activation('relu'))
model_2.add(Dense(1))
model_2.add(Activation('tanh'))

layer_info(model_2)

Layer  1 convolution2d_27 input shape (None, 80, 160, 3) output shape (None, 16, 32, 6)
Layer  2 dropout_11       input shape (None, 16, 32, 6) output shape (None, 16, 32, 6)
Layer  3 activation_66    input shape (None, 16, 32, 6) output shape (None, 16, 32, 6)
Layer  4 maxpooling2d_23  input shape (None, 16, 32, 6) output shape (None, 8, 16, 6)
Layer  5 convolution2d_28 input shape (None, 8, 16, 6) output shape (None, 2, 6, 16)
Layer  6 dropout_12       input shape (None, 2, 6, 16) output shape (None, 2, 6, 16)
Layer  7 activation_67    input shape (None, 2, 6, 16) output shape (None, 2, 6, 16)
Layer  8 maxpooling2d_24  input shape (None, 2, 6, 16) output shape (None, 1, 3, 16)
Layer  9 flatten_14       input shape (None, 1, 3, 16) output shape (None, 48)
Layer 10 dense_40         input shape (None, 48) output shape (None, 120)
Layer 11 activation_68    input shape (None, 120) output shape (None, 120)
Layer 12 dense_41         input shape (None, 120) output shape (None, 84)
Layer 13 a

In [45]:
train(model_2)
sample_predictions(model_2)

Train on 6428 samples, validate on 1608 samples
Epoch 1/10
13s - loss: 0.0176 - val_loss: 0.0156
Epoch 2/10
12s - loss: 0.0145 - val_loss: 0.0145
Epoch 3/10
12s - loss: 0.0135 - val_loss: 0.0135
Epoch 4/10
12s - loss: 0.0128 - val_loss: 0.0139
Epoch 5/10
12s - loss: 0.0118 - val_loss: 0.0156
Epoch 6/10
12s - loss: 0.0116 - val_loss: 0.0143
Epoch 7/10
12s - loss: 0.0111 - val_loss: 0.0134
Epoch 8/10
12s - loss: 0.0111 - val_loss: 0.0127
Epoch 9/10
12s - loss: 0.0107 - val_loss: 0.0139
Epoch 10/10
12s - loss: 0.0107 - val_loss: 0.0131
Actual steering angle 0.1478767 model prediction 0.026851575821638107
Actual steering angle 0.0 model prediction 0.03950202092528343
Actual steering angle 0.1765823 model prediction 0.04238557070493698
Actual steering angle 0.0 model prediction 0.013100355863571167
Actual steering angle -0.002791043 model prediction -0.016838598996400833
Actual steering angle 0.1957194 model prediction 0.07659861445426941
Actual steering angle 0.1765823 model prediction 0.0

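The performance is even poorer now: the validation loss stays between 0.0127 and 0.0156, higher than the 0.0106 to 0.0124 of the first model, and the predictions are pulled toward zero (an actual angle of 0.1766 is predicted as 0.0). So the model is probably not complex enough to learn the given data set. I could increase the layer dimensions directly, but there is another way: remove the pooling layers. Pooling downsamples the feature maps; it has no weights of its own, but it shrinks the input to the first dense layer and thereby the number of weights there. Let's strip the pooling layers (and the stride of the second convolution layer) and see what happens.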

In [46]:
model_3 = Sequential()
model_3.add(Convolution2D(6, 5, 5, border_mode='valid', subsample=(5, 5), input_shape=(80, 160, 3)))
model_3.add(Dropout(0.5))
model_3.add(Activation('relu'))
#model_3.add(MaxPooling2D(pool_size=(2, 2)))
model_3.add(Convolution2D(16, 5, 5, border_mode='valid'))
model_3.add(Dropout(0.5))
model_3.add(Activation('relu'))
#model_3.add(MaxPooling2D(pool_size=(2, 2)))
model_3.add(Flatten())
model_3.add(Dense(120))
model_3.add(Activation('relu'))
model_3.add(Dense(84))
model_3.add(Activation('relu'))
model_3.add(Dense(1))
model_3.add(Activation('tanh'))

layer_info(model_3)

Layer  1 convolution2d_29 input shape (None, 80, 160, 3) output shape (None, 16, 32, 6)
Layer  2 dropout_13       input shape (None, 16, 32, 6) output shape (None, 16, 32, 6)
Layer  3 activation_71    input shape (None, 16, 32, 6) output shape (None, 16, 32, 6)
Layer  4 convolution2d_30 input shape (None, 16, 32, 6) output shape (None, 12, 28, 16)
Layer  5 dropout_14       input shape (None, 12, 28, 16) output shape (None, 12, 28, 16)
Layer  6 activation_72    input shape (None, 12, 28, 16) output shape (None, 12, 28, 16)
Layer  7 flatten_15       input shape (None, 12, 28, 16) output shape (None, 5376)
Layer  8 dense_43         input shape (None, 5376) output shape (None, 120)
Layer  9 activation_73    input shape (None, 120) output shape (None, 120)
Layer 10 dense_44         input shape (None, 120) output shape (None, 84)
Layer 11 activation_74    input shape (None, 84) output shape (None, 84)
Layer 12 dense_45         input shape (None, 84) output shape (None, 1)
Layer 13 activation
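Pooling, dropout, activation and flatten layers have no trainable weights, so the effect of removing the pooling shows up in the first dense layer, whose input grows from 48 to 5376 values (see the Flatten rows of the two listings). Counting weights plus biases per layer, assuming the default bias in Convolution2D and Dense, gives about 35 times as many weights as before. The helper names are hypothetical.

```python
# Weight counts (weights + biases) for the LeNet-style models; helper names are hypothetical.
# Activation, Dropout, MaxPooling2D and Flatten layers have no trainable weights.

def conv_params(kernel, in_channels, filters):
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_inputs, units):
    return n_inputs * units + units

def lenet_like_params(flatten_size):
    convs = conv_params(5, 3, 6) + conv_params(5, 6, 16)
    dense = dense_params(flatten_size, 120) + dense_params(120, 84) + dense_params(84, 1)
    return convs + dense

print(dense_params(48, 120), lenet_like_params(48))      # with pooling:    5880 19001
print(dense_params(5376, 120), lenet_like_params(5376))  # without pooling: 645240 658361
```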

In [49]:
train(model_3, 20)
sample_predictions(model_3)

Train on 6428 samples, validate on 1608 samples
Epoch 1/20
26s - loss: 0.0085 - val_loss: 0.0108
Epoch 2/20
25s - loss: 0.0083 - val_loss: 0.0105
Epoch 3/20
24s - loss: 0.0082 - val_loss: 0.0106
Epoch 4/20
24s - loss: 0.0081 - val_loss: 0.0101
Epoch 5/20
25s - loss: 0.0079 - val_loss: 0.0100
Epoch 6/20
27s - loss: 0.0080 - val_loss: 0.0100
Epoch 7/20
25s - loss: 0.0074 - val_loss: 0.0104
Epoch 8/20
26s - loss: 0.0077 - val_loss: 0.0098
Epoch 9/20
26s - loss: 0.0075 - val_loss: 0.0099
Epoch 10/20
24s - loss: 0.0073 - val_loss: 0.0103
Epoch 11/20
25s - loss: 0.0076 - val_loss: 0.0098
Epoch 12/20
25s - loss: 0.0073 - val_loss: 0.0098
Epoch 13/20
24s - loss: 0.0072 - val_loss: 0.0102
Epoch 14/20
25s - loss: 0.0072 - val_loss: 0.0101
Epoch 15/20
25s - loss: 0.0071 - val_loss: 0.0100
Epoch 16/20
25s - loss: 0.0071 - val_loss: 0.0100
Epoch 17/20
25s - loss: 0.0069 - val_loss: 0.0101
Epoch 18/20
25s - loss: 0.0068 - val_loss: 0.0103
Epoch 19/20
25s - loss: 0.0065 - val_loss: 0.0100
Epoch 20/20

A bit better: the validation loss settles at roughly 0.0098 to 0.0104, compared with 0.0106 to 0.0124 for the first model, but even after 20 epochs that is not much of an improvement. I am beginning to suspect that I need to increase the model's complexity quite a bit. At this point I will try to replicate the architecture from the NVIDIA paper (http://images.nvidia.com/content/tegra/automotive/images/2016/solutions/pdf/end-to-end-dl-using-px.pdf) and see what kind of difference it makes.

In [57]:
model_4 = Sequential()
model_4.add(Convolution2D(24, 5, 5, border_mode='valid', subsample=(2, 2), input_shape=(80, 160, 3)))
model_4.add(Activation('relu'))
model_4.add(Convolution2D(36, 5, 5, border_mode='valid', subsample=(2, 2)))
model_4.add(Activation('relu'))
model_4.add(Convolution2D(48, 5, 5, border_mode='valid', subsample=(2, 2)))
model_4.add(Activation('relu'))
model_4.add(Convolution2D(64, 3, 3, border_mode='valid'))
model_4.add(Activation('relu'))
model_4.add(Convolution2D(64, 3, 3, border_mode='valid'))
model_4.add(Activation('relu'))
model_4.add(Flatten())
model_4.add(Dense(100))
model_4.add(Activation('relu'))
model_4.add(Dense(50))
model_4.add(Activation('relu'))
model_4.add(Dense(10))
model_4.add(Activation('relu'))
model_4.add(Dense(1))
model_4.add(Activation('tanh'))

layer_info(model_4)

Layer  1 convolution2d_51 input shape (None, 80, 160, 3) output shape (None, 38, 78, 24)
Layer  2 activation_117   input shape (None, 38, 78, 24) output shape (None, 38, 78, 24)
Layer  3 convolution2d_52 input shape (None, 38, 78, 24) output shape (None, 17, 37, 36)
Layer  4 activation_118   input shape (None, 17, 37, 36) output shape (None, 17, 37, 36)
Layer  5 convolution2d_53 input shape (None, 17, 37, 36) output shape (None, 7, 17, 48)
Layer  6 activation_119   input shape (None, 7, 17, 48) output shape (None, 7, 17, 48)
Layer  7 convolution2d_54 input shape (None, 7, 17, 48) output shape (None, 5, 15, 64)
Layer  8 activation_120   input shape (None, 5, 15, 64) output shape (None, 5, 15, 64)
Layer  9 convolution2d_55 input shape (None, 5, 15, 64) output shape (None, 3, 13, 64)
Layer 10 activation_121   input shape (None, 3, 13, 64) output shape (None, 3, 13, 64)
Layer 11 flatten_23       input shape (None, 3, 13, 64) output shape (None, 2496)
Layer 12 dense_67         input shape (

In [58]:
train(model_4)
sample_predictions(model_4)

Train on 6428 samples, validate on 1608 samples
Epoch 1/10
137s - loss: 0.0118 - val_loss: 0.0110
Epoch 2/10
133s - loss: 0.0095 - val_loss: 0.0104
Epoch 3/10
130s - loss: 0.0090 - val_loss: 0.0099
Epoch 4/10
131s - loss: 0.0084 - val_loss: 0.0111
Epoch 5/10
130s - loss: 0.0081 - val_loss: 0.0111
Epoch 6/10
133s - loss: 0.0077 - val_loss: 0.0117
Epoch 7/10
131s - loss: 0.0071 - val_loss: 0.0106
Epoch 8/10
131s - loss: 0.0065 - val_loss: 0.0111
Epoch 9/10
131s - loss: 0.0061 - val_loss: 0.0107
Epoch 10/10
130s - loss: 0.0052 - val_loss: 0.0127
Actual steering angle 0.0 model prediction -0.012478272430598736
Actual steering angle 0.1765823 model prediction 0.15921710431575775
Actual steering angle 0.0 model prediction -0.024579377844929695
Actual steering angle 0.0 model prediction -0.008382455445826054
Actual steering angle 0.0 model prediction 0.041543230414390564
Actual steering angle 0.1670138 model prediction 0.09181094169616699
Actual steering angle 0.0 model prediction 0.083860784
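The NVIDIA-style model has fewer weights than model_3, yet an epoch takes about 130 s instead of about 25 s. A rough explanation is that its convolutions run on much larger feature maps. The sketch below (hypothetical names; shapes taken from the layer listings and model definitions above) counts parameters and multiply-accumulate operations per image: about 387 thousand parameters versus 658 thousand, but about 27.8 million multiply-accumulates versus 1.7 million, roughly 16 times as many. Wall-clock time also depends on the backend and hardware, so this is only an indication.

```python
# (out_h, out_w, out_channels, kernel, in_channels) for each convolution layer
MODEL_3_CONVS = [(16, 32, 6, 5, 3), (12, 28, 16, 5, 6)]
MODEL_3_DENSE = [(5376, 120), (120, 84), (84, 1)]  # (inputs, units)
MODEL_4_CONVS = [(38, 78, 24, 5, 3), (17, 37, 36, 5, 24), (7, 17, 48, 5, 36),
                 (5, 15, 64, 3, 48), (3, 13, 64, 3, 64)]
MODEL_4_DENSE = [(2496, 100), (100, 50), (50, 10), (10, 1)]

def cost(convs, dense):
    """Return (parameters, multiply-accumulates per image), ignoring activations and biases' adds."""
    params = macs = 0
    for out_h, out_w, out_c, k, in_c in convs:
        params += k * k * in_c * out_c + out_c
        macs += out_h * out_w * out_c * k * k * in_c  # one k*k*in_c dot product per output value
    for n_in, n_out in dense:
        params += n_in * n_out + n_out
        macs += n_in * n_out
    return params, macs

print(cost(MODEL_3_CONVS, MODEL_3_DENSE))  # (658361, 1692084)
print(cost(MODEL_4_CONVS, MODEL_4_DENSE))  # (386619, 27828806)
```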

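The NVIDIA-style model overfits as well: the training loss drops from 0.0118 to 0.0052, but the validation loss is lowest at epoch 3 (0.0099) and ends at 0.0127. Next, I add a dropout layer after each convolutional layer.

In [61]: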
model_4 = Sequential()
model_4.add(Convolution2D(24, 5, 5, border_mode='valid', subsample=(2, 2), input_shape=(80, 160, 3)))
model_4.add(Activation('relu'))
model_4.add(Dropout(0.5))
model_4.add(Convolution2D(36, 5, 5, border_mode='valid', subsample=(2, 2)))
model_4.add(Activation('relu'))
model_4.add(Dropout(0.5))
model_4.add(Convolution2D(48, 5, 5, border_mode='valid', subsample=(2, 2)))
model_4.add(Activation('relu'))
model_4.add(Dropout(0.5))
model_4.add(Convolution2D(64, 3, 3, border_mode='valid'))
model_4.add(Activation('relu'))
model_4.add(Dropout(0.5))
model_4.add(Convolution2D(64, 3, 3, border_mode='valid'))
model_4.add(Activation('relu'))
model_4.add(Dropout(0.5))
model_4.add(Flatten())
model_4.add(Dense(100))
model_4.add(Activation('relu'))
model_4.add(Dense(50))
model_4.add(Activation('relu'))
model_4.add(Dense(10))
model_4.add(Activation('relu'))
model_4.add(Dense(1))
model_4.add(Activation('tanh'))

layer_info(model_4)

Layer  1 convolution2d_61 input shape (None, 80, 160, 3) output shape (None, 38, 78, 24)
Layer  2 activation_135   input shape (None, 38, 78, 24) output shape (None, 38, 78, 24)
Layer  3 dropout_24       input shape (None, 38, 78, 24) output shape (None, 38, 78, 24)
Layer  4 convolution2d_62 input shape (None, 38, 78, 24) output shape (None, 17, 37, 36)
Layer  5 activation_136   input shape (None, 17, 37, 36) output shape (None, 17, 37, 36)
Layer  6 dropout_25       input shape (None, 17, 37, 36) output shape (None, 17, 37, 36)
Layer  7 convolution2d_63 input shape (None, 17, 37, 36) output shape (None, 7, 17, 48)
Layer  8 activation_137   input shape (None, 7, 17, 48) output shape (None, 7, 17, 48)
Layer  9 dropout_26       input shape (None, 7, 17, 48) output shape (None, 7, 17, 48)
Layer 10 convolution2d_64 input shape (None, 7, 17, 48) output shape (None, 5, 15, 64)
Layer 11 activation_138   input shape (None, 5, 15, 64) output shape (None, 5, 15, 64)
Layer 12 dropout_27       inpu

In [None]:
train(model_4)
sample_predictions(model_4)

Train on 6428 samples, validate on 1608 samples
Epoch 1/10
173s - loss: 0.0147 - val_loss: 0.0130
Epoch 2/10
167s - loss: 0.0113 - val_loss: 0.0112
Epoch 3/10
