# State Farm Distracted Driver Detection: Take 3
_Can computer vision spot distracted drivers?_

---

## Lesson 3 Homework Assignment

Dataset: https://www.kaggle.com/c/state-farm-distracted-driver-detection

### Dealing with Overfitting
In [`lesson2-hmwk.ipynb`](https://github.com/iconix/fast.ai/blob/master/nbs/lesson2-hmwk.ipynb), my final results (after 5 epochs) were as follows:
> `loss: 0.6260 - acc: 0.7907 - val_loss: 1.6719 - val_acc: 0.4978`

When training accuracy is far above validation accuracy (`acc >> val_acc`), this is a clear indicator of overfitting on the training data.

On the bright side, this means that my neural net architecture is complex enough to model the data. The next step is to get the model to generalize better.
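As a quick sanity check, the size of the train/validation gap can be computed straight from the log line above. This is only a minimal sketch: the helper name `fit_gap` and the `0.1` cutoff are illustrative assumptions of mine, not something from the course.

```python
def fit_gap(acc, val_acc):
    """Return training accuracy minus validation accuracy."""
    return acc - val_acc

# Final metrics reported for lesson2-hmwk.ipynb after 5 epochs
gap = fit_gap(acc=0.7907, val_acc=0.4978)

# Hypothetical rule of thumb: the 0.1 cutoff is an arbitrary, illustrative choice
overfitting = gap > 0.1
print("gap = %.4f, overfitting = %s" % (gap, overfitting))
```

A large positive gap points to overfitting; a gap near zero (or negative) with low accuracy on both sides points to underfitting.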

Here is the prioritized list of approaches to reducing overfitting provided during class:
1. Add more data
2. Use data augmentation
3. Use architectures that generalize well
4. Add regularization (dropout, L2/L1 regularization)
5. Reduce architecture complexity

### Downloading and creating the datasets

This week, I am skipping the steps that download the data and split it into training, validation, test, and sample datasets, relying instead on the data split from last week. See [`lesson2-hmwk.ipynb`](https://github.com/iconix/fast.ai/blob/master/nbs/lesson2-hmwk.ipynb) for those steps, if needed.

Additionally, I will be starting with the weights from last week below (`finetune2.h5`).

### Basic Configuration

In [1]:
import os

current_dir = os.getcwd()
LESSON_HOME_DIR = current_dir
DATA_HOME_DIR = current_dir + '/data/statefarm/'

# point to your training images
train_dir = DATA_HOME_DIR + 'train'

# point to the 'driver_imgs_list.csv'
lookup = DATA_HOME_DIR + 'driver_imgs_list.csv'

# point to the validation directory, which will be created in the next block
val_dir = DATA_HOME_DIR + 'valid'

sample_dir = DATA_HOME_DIR + 'sample'

test_dir = DATA_HOME_DIR + 'test'

#path = DATA_HOME_DIR + 'sample/'
path = DATA_HOME_DIR
model_path = path + 'models/'
if not os.path.exists(model_path): os.mkdir(model_path)

In [2]:
%matplotlib inline

In [3]:
import utils; reload(utils)
from utils import *

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
Using Theano backend.


In [4]:
batch_size=64

### Load pre-trained model layers

In [5]:
def get_week2_model():
    model = vgg_ft(10)
    finetune2_path = model_path+'finetune2.h5'
    model.load_weights(finetune2_path);
    return model

In [None]:
model = get_week2_model()

In [None]:
model.summary()

In [None]:
layers = model.layers
last_conv_idx = [index for index,layer in enumerate(layers) 
                    if type(layer) is Convolution2D][-1]

conv_layers = layers[:last_conv_idx+1]
# Dense layers - also known as fully connected or 'FC' layers
fc_layers = layers[last_conv_idx+1:]

### Data Augmentation

Define a generator that applies data augmentation (a convenient feature of Keras).

In [6]:
gen = image.ImageDataGenerator(rotation_range=10, width_shift_range=0.1, 
       height_shift_range=0.1, shear_range=0.15, zoom_range=0.1, 
       channel_shift_range=10., horizontal_flip=True)
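
To make concrete what two of these transforms do, here is a small sketch on a tiny 3x4 "image" stored as nested lists. It is not how Keras implements augmentation: `hflip` and `shift_right` are hypothetical helpers, and padding with zeros is a simplification I chose for readability.

```python
def hflip(img):
    """Mirror each row left-to-right (conceptually what horizontal_flip does)."""
    return [row[::-1] for row in img]

def shift_right(img, n, fill=0):
    """Shift every row n pixels to the right, padding the left edge with `fill`."""
    if n == 0:
        return [list(row) for row in img]
    return [[fill] * n + list(row[:-n]) for row in img]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]

flipped = hflip(img)
shifted = shift_right(img, 1)
print(flipped)
print(shifted)
```

The generator above applies randomized versions of transforms like these to each batch, so the network rarely sees exactly the same pixels twice.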

In [7]:
trn_batches = get_batches(path+'train', gen, batch_size=batch_size)
# NB: We don't want to augment or shuffle the validation set
val_batches = get_batches(path+'valid', shuffle=False, batch_size=batch_size)

Found 18298 images belonging to 10 classes.
Found 4126 images belonging to 10 classes.


Jeremy's [explanation](http://forums.fast.ai/t/lesson-3-discussion/186/33) as to why we aren't training the convolutional layers here:
> The early layers are so general (e.g. remember Zeiler's visualizations - layer 1 just finds edges and gradients) that it's extremely unlikely that you'll need to change them, unless you're looking at very different kinds of images. e.g. if you're classifying line drawings, instead of photos, you'll probably need to retrain many conv layers too.

In [None]:
for layer in layers[:last_conv_idx+1]: layer.trainable=False

In [None]:
# Updating slowly because it is finely tuned
K.set_value(model.optimizer.lr, 0.00001)

In [None]:
model.fit_generator(trn_batches, samples_per_epoch=trn_batches.nb_sample, nb_epoch=8, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

After 8 epochs, we are overfitting much much less, which is great!
> `loss: 1.9278 - acc: 0.5002 - val_loss: 1.7463 - val_acc: 0.5099`

In fact, now we're underfitting very slightly. This seems like a good time to try decreasing dropout a smidge, just to see what happens.

Let's save our intermediate weights first, just in case.

In [None]:
finetune_hw3_1_path = model_path+'finetune_hw3_1.h5'
if not os.path.exists(finetune_hw3_1_path):
    model.save_weights(finetune_hw3_1_path)
model.load_weights(finetune_hw3_1_path)

#### Intermediate Kaggle submission

Interestingly, despite the significant decrease in overfitting, `val_acc` only improved a small amount from `0.4978`... I'd like to try submitting these results to Kaggle, just to see how this result is reflected in the rankings.

In [None]:
from IPython.display import FileLink

subm_name = 'subm_hmwk3_1.gz'
subm_path = path + 'results/' + subm_name

#Get the classes from the validation batch
val_preprocess = get_batches(path+'valid', shuffle=False, batch_size=1)
classes = sorted(val_preprocess.class_indices, key=val_preprocess.class_indices.get)
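
The `sorted(..., key=....get)` idiom above orders the class names by their integer index, so the submission columns line up with the columns of the prediction array. A tiny sketch with a made-up mapping (the dictionary below is hypothetical, shaped like a Keras `class_indices` dict):

```python
# Hypothetical mapping in the shape of a Keras `class_indices` dict
class_indices = {'c2': 2, 'c0': 0, 'c1': 1}

# Iterating a dict yields its keys; sort them by the index each key maps to
classes = sorted(class_indices, key=class_indices.get)
print(classes)
```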

Note on `.predict` vs `.predict_generator`: I kept running into kernel death with `.predict`, so I use `.predict_generator` below. The forum advice for training seems to apply to prediction too: "the precomputed data will be large. Then it is likely that you will encounter errors such as OOM or **kernel death** during training. In this case, you might want to use model.fit_generator() instead of model.fit()" ([source](http://forums.fast.ai/t/fine-tuning-vgg-taking-very-long/3825/11)).

In [None]:
test_batches = get_batches(path+'test', shuffle=False, batch_size = batch_size)
preds = model.predict_generator(test_batches, test_batches.nb_sample)

In [70]:
# (number of classes - 1) = 9 (http://forums.fast.ai/t/moving-up-the-ncfm-leaderboard-by-100-positions-do-clip/1035/6)
def do_clip(arr, mx): return np.clip(arr, (1-mx)/9, mx)

In [None]:
subm = do_clip(preds, 0.9) # still unsure how to set mx
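
To get a feel for how `mx` trades off, here is a rough sketch of the best and worst per-image log loss that clipping allows. It assumes the scorer uses the submitted probabilities as-is (no renormalization), which may not hold on Kaggle; `clip_floor`, `best_case_loss`, and `worst_case_loss` are hypothetical helper names.

```python
import math

def clip_floor(mx, n_classes=10):
    """Lower clip bound used by do_clip: leftover mass spread over the other classes."""
    return (1.0 - mx) / (n_classes - 1)

def best_case_loss(mx):
    """Log loss for one image when the true class sits at the upper bound."""
    return -math.log(mx)

def worst_case_loss(mx, n_classes=10):
    """Log loss for one image when the true class sits at the lower bound."""
    return -math.log(clip_floor(mx, n_classes))

for mx in (0.9, 0.95, 0.99):
    print("mx=%.2f floor=%.6f best=%.4f worst=%.4f"
          % (mx, clip_floor(mx), best_case_loss(mx), worst_case_loss(mx)))
```

Raising `mx` shaves a little off the loss for confident correct predictions but makes every confident mistake more expensive. One option for choosing it would be to sweep a few values against the validation predictions and keep the one with the lowest log loss.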

In [None]:
submission = pd.DataFrame(subm, columns=classes)
submission.insert(0, 'img', [a[8:] for a in test_batches.filenames])
submission.head()

In [None]:
submission.to_csv(subm_path, index=False, compression='gzip')
FileLink(subm_path)

Private Score: `1.41527` ... comparable to my week 2 results.

And interestingly, weeks ago, my Public Score tended to be the lower/better score, or at least comparable; now, it is `1.48239` in comparison. See [this Quora post](https://www.quora.com/What-is-the-difference-between-public-and-private-leaderboard-in-Kaggle/answer/Giuliano-Janson) for an explanation:
> The public LB is computed on a portion of the test set, the private is computed on the remainder of the test set (not the whole test set).
> "Fitting the LB" is a Kaggle term used to describe when you're tuning your models to perform well on the public LB. There is an art and a science in doing so and experience Kagglers are able to make the most out of it without overfitting. If not done well, that usually lends itself to worse scores on the private LB, sometimes disasters. In general the key is to build a model that generalizes well.

### Data Augmentation + Decrease Dropout
We're going to experiment with decreasing the dropout rate from Vgg16's 50% to 25%.
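
For intuition about what changing the rate does, here is a sketch of generic "inverted" dropout on a list of activations. It is an illustration of the technique under my own assumptions, not a copy of the Keras layer; the `dropout` helper and the fixed seed are hypothetical.

```python
import random

def dropout(activations, p, rng):
    """Zero each activation with probability p and scale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
acts = [1.0] * 10000

results = {}
for p in (0.5, 0.25):
    out = dropout(acts, p, rng)
    zeroed = sum(1 for a in out if a == 0.0) / float(len(out))
    mean = sum(out) / len(out)
    results[p] = (zeroed, mean)
    print("p=%.2f zeroed=%.3f mean=%.3f" % (p, zeroed, mean))
```

At `p=0.25` only about a quarter of the units are silenced on each pass instead of half, so the dense layers keep more capacity per step, at the cost of weaker regularization.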

In [8]:
# will start again with conv_layers from last week's model
model = get_week2_model()

In [9]:
def get_fc_model(fcl, conv_input_shape):
    m = Sequential([
        MaxPooling2D(input_shape=conv_input_shape),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.25),
        Dense(4096, activation='relu'),
        Dropout(0.25),
        Dense(10, activation='softmax')
        ])

    for l1,l2 in zip(m.layers, fcl): l1.set_weights(l2.get_weights())

    m.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return m

In [10]:
layers = model.layers
last_conv_idx = [index for index,layer in enumerate(layers) 
                    if type(layer) is Convolution2D][-1]

conv_layers = layers[:last_conv_idx+1]
fc_layers = layers[last_conv_idx+1:]

In [11]:
conv_model = Sequential(conv_layers)
fc_model = get_fc_model(fc_layers, conv_layers[-1].output_shape[1:])

Add the new `fc_model` with less dropout to the original `conv_model` from last week...

In [12]:
for layer in conv_model.layers: layer.trainable=False
conv_model.add(fc_model)

Then compile the new combined model and fit!

In [13]:
conv_model.compile(optimizer=Adam(0.00001), loss='categorical_crossentropy', metrics=['accuracy'])

In [14]:
conv_model.fit_generator(trn_batches, samples_per_epoch=trn_batches.nb_sample, nb_epoch=8, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8


<keras.callbacks.History at 0x7f947ea4e650>

In [15]:
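# back to default learning rate; K.set_value updates the existing variable
# (as done earlier in this notebook) instead of rebinding the attribute
K.set_value(conv_model.optimizer.lr, 0.001)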
conv_model.fit_generator(trn_batches, samples_per_epoch=trn_batches.nb_sample, nb_epoch=1, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

Epoch 1/1


<keras.callbacks.History at 0x7f947260f710>

Drumroll...

> `loss: 0.0554 - acc: 0.9819 - val_loss: 1.3206 - val_acc: 0.7038`

Wow, back to massive overfitting. But I was expecting this since I reduced dropout in a model that was already pretty well balanced in fit thanks to data augmentation in the last section.

Fortunately, as I was also hoping, there has been a _huge_ increase in the upper bounds of both `acc` and `val_acc` - dropout was stunting the model's learning ability! So I'm hoping this means that my next (and final) attempt at reducing overfitting will work again, but with much higher accuracies than the `~0.5` I was seeing earlier.

Let's save weights again.

In [None]:
finetune_hw3_2_path = model_path+'finetune_hw3_2.h5'
if not os.path.exists(finetune_hw3_2_path):
    conv_model.save_weights(finetune_hw3_2_path)

### Add Batch Normalization

In [16]:
# start with the model just trained above
model = conv_model

In [34]:
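# verify that we have the correct model loaded
val_data = load_array(model_path+'valid_data.bc')
# val_labels was not defined above; this assumes valid_data.bc was saved
# in the same unshuffled order as val_batches
val_labels = onehot(val_batches.classes)
model.evaluate(val_data, val_labels)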



[1.6006800007369411, 0.62990790118711182]

In [None]:
def get_bn_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dense(4096, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(4096, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(1000, activation='softmax')
        ]
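
For reference, the core computation of a batch normalization step at training time can be sketched in a few lines for a single feature. This is a simplified illustration, not the Keras layer (which, among other things, also keeps running statistics for use at inference); `batch_norm`, `gamma`, `beta`, and the `eps` default are names and values I picked for the sketch.

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then apply scale and shift."""
    mean = sum(xs) / float(len(xs))
    var = sum((x - mean) ** 2 for x in xs) / float(len(xs))
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]

batch = [10.0, 20.0, 30.0, 40.0]   # one activation observed on four images
normed = batch_norm(batch)
print(["%.4f" % v for v in normed])
```

Whatever scale the incoming activations have, the layer hands roughly zero-mean, unit-variance values to the next layer, with `gamma` and `beta` learned during training.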

In [None]:
def load_fc_weights_from_vgg16bn(m):
    "Load weights for model from the dense layers of the Vgg16BN model."
    # See imagenet_batchnorm.ipynb for info on how the weights for
    # Vgg16BN can be generated from the standard Vgg16 weights.
    from vgg16bn import Vgg16BN
    vgg16_bn = Vgg16BN()
    _, fcl = split_at(vgg16_bn.model, Convolution2D)
    copy_weights(fcl, m.layers)

In [35]:
bn_fc_model = Sequential(get_bn_layers(0.25))

In [36]:
load_fc_weights_from_vgg16bn(bn_fc_model)

In [37]:
bn_fc_model.pop()
for layer in bn_fc_model.layers: layer.trainable=False
bn_fc_model.add(Dense(10,activation='softmax'))

In [38]:
bn_fc_model.compile(Adam(), 'categorical_crossentropy', metrics=['accuracy'])

In [40]:
layers = model.layers
last_conv_idx = [index for index,layer in enumerate(layers) 
                     if type(layer) is Convolution2D][-1]
conv_layers = layers[:last_conv_idx+1]
conv_model = Sequential(conv_layers)

In [50]:
bn_fc_layers = get_bn_layers(0.25)

In [51]:
bn_fc_layers.pop()
bn_fc_layers.append(Dense(10,activation='softmax'))

In [52]:
final_model = Sequential(conv_layers)
for layer in final_model.layers: layer.trainable = False
for layer in bn_fc_layers: final_model.add(layer)

In [56]:
for l1,l2 in zip(bn_fc_model.layers, bn_fc_layers):
    l2.set_weights(l1.get_weights())

In [60]:
final_model.compile(optimizer=Adam(0.00001), loss='categorical_crossentropy', metrics=['accuracy'])

In [61]:
final_model.fit_generator(trn_batches, samples_per_epoch=trn_batches.nb_sample, nb_epoch=1, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

Epoch 1/1


<keras.callbacks.History at 0x7f9434a28390>

In [62]:
final_model.fit_generator(trn_batches, samples_per_epoch=trn_batches.nb_sample, nb_epoch=3, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f9452f5d150>

In [64]:
final_model.fit_generator(trn_batches, samples_per_epoch=trn_batches.nb_sample, nb_epoch=2, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f94343e9cd0>

In [65]:
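# K.set_value updates the existing learning-rate variable rather than rebinding the attribute
K.set_value(final_model.optimizer.lr, 0.001)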
final_model.fit_generator(trn_batches, samples_per_epoch=trn_batches.nb_sample, nb_epoch=2, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f94343e9bd0>

In [66]:
final_model.fit_generator(trn_batches, samples_per_epoch=trn_batches.nb_sample, nb_epoch=1, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

Epoch 1/1


<keras.callbacks.History at 0x7f94343e9ad0>

I was feeling disappointed after Epoch 1 - it looked like the accuracy upper bound had regressed from the awesome results after reducing dropout in the last section.
> `loss: 7.9419 - acc: 0.3832 - val_loss: 9.5732 - val_acc: 0.3303`

Thankfully after Epoch 9, things are looking much better.
> `loss: 0.4997 - acc: 0.8909 - val_loss: 2.2943 - val_acc: 0.6837`

We're still overfitting a ton, although not as much as before: the `acc - val_acc` delta dropped from `~0.36` to `~0.21` after 9 epochs.

I bet I could increase dropout again (somewhere between the current `0.25` and the original `0.5`) to reduce overfitting more.

In [67]:
finetune_hw3_3_path = model_path+'finetune_hw3_3.h5'
if not os.path.exists(finetune_hw3_3_path):
    final_model.save_weights(finetune_hw3_3_path)

## Submissions to Kaggle

I'm really interested to compare Kaggle performance between Data Augmentation + Decrease Dropout results and Add Batch Normalization results. The former had better accuracy results overall with more overfitting; the latter had less overfitting with a hit to the accuracy upper bound (although `val_acc` improved, which is likely what really matters).

My guess is that Add Batch Normalization results will be the better of the two (which is not that brave of a guess, considering the contents of this week's lecture).

In [71]:
subm_model_2 = model
subm_model_3 = final_model

In [72]:
test_batches = get_batches(path+'test', shuffle=False, batch_size = batch_size)
preds_2 = subm_model_2.predict_generator(test_batches, test_batches.nb_sample)

Found 79726 images belonging to 1 classes.


In [73]:
preds_3 = subm_model_3.predict_generator(test_batches, test_batches.nb_sample)

In [74]:
from IPython.display import FileLink

subm_name_2 = 'subm_hmwk3_2.gz'
subm_path_2 = path + 'results/' + subm_name_2

subm_name_3 = 'subm_hmwk3_3.gz'
subm_path_3 = path + 'results/' + subm_name_3

In [89]:
subm_2 = do_clip(preds_2, 0.9) # still unsure how to set mx
subm_3 = do_clip(preds_3, 0.9)

In [90]:
val_preprocess = get_batches(path+'valid', shuffle=False, batch_size=1)
classes = sorted(val_preprocess.class_indices, key=val_preprocess.class_indices.get)

submission2 = pd.DataFrame(subm_2, columns=classes)
submission2.insert(0, 'img', [a[8:] for a in test_batches.filenames])
submission2.head()

submission3 = pd.DataFrame(subm_3, columns=classes)
submission3.insert(0, 'img', [a[8:] for a in test_batches.filenames])
submission3.head()

Found 4126 images belonging to 10 classes.


| | img | c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | img_81601.jpg | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.9 | 0.011111 | 0.011111 | 0.011111 |
| 1 | img_14887.jpg | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.893773 | 0.104057 | 0.011111 |
| 2 | img_62885.jpg | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.9 | 0.011111 |
| 3 | img_45125.jpg | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.9 |
| 4 | img_22633.jpg | 0.011111 | 0.9 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 | 0.011111 |


In [79]:
submission2.to_csv(subm_path_2, index=False, compression='gzip')
FileLink(subm_path_2)

In [91]:
submission3.to_csv(subm_path_3, index=False, compression='gzip')
FileLink(subm_path_3)

I'm not going to submit to Kaggle with Jupyter anymore because it caused `OSError: [Errno 12] Cannot allocate memory` last time. But this is the command:

> `kg submit $subm_path -c state-farm-distracted-driver-detection`

So! My guess appears to be wrong... but I'm pretty sure I must have a bug somewhere.

`subm_hmwk3_2.gz: Private Score: 1.07994, Public Score: 1.34504`

^ that's awesome! best score yet (558/1440= top 38.75%)

`subm_hmwk3_3.gz: Private Score: 4.00094, Public Score: 3.98079`

^ that is one of my worst scores to date.

I find it hard to believe that the moral here is that Batch Normalization makes things terrible. Let's evaluate the models again to make sure I didn't accidentally overwrite something and submit the wrong model for `subm_hmwk3_3.gz`:

In [81]:
subm_model_2.evaluate(val_data, val_labels)



[1.6006800007369411, 0.62990790118711182]

In [82]:
subm_model_3.evaluate(val_data, val_labels)



[2.2942806485204286, 0.6837130393643317]

Interesting... `subm_hmwk3_3.gz` gave me my highest validation accuracy so far, but simultaneously a pretty terrible `categorical_crossentropy` loss score. How can that be?

> Let's say you have a label [0 0 1] and your first prediction is [0 0.5 0.1]. Low accuracy and low loss. Then later you predict [0.3 0.3 0.4]. Higher loss and higher accuracy. Of course it depends on how you calculate loss.. - [forums](http://forums.fast.ai/t/kaggle-intel-mobileodt-cervical-cancer-screening-competition/2355/28)

This makes sense... meanwhile, my training loss is so low! Perhaps even though it doesn't look like my accuracy is overfitting as badly, my loss is telling another story. If I want to improve this, I'd like to mess with dropout some more, I think.
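
A small worked example of how accuracy and cross-entropy can disagree. The labels and predictions are made up for illustration; `accuracy` and `log_loss` are minimal helpers written for this sketch, not library functions.

```python
import math

def accuracy(preds, labels):
    """Fraction of rows whose highest-probability class matches the label."""
    hits = 0
    for p, y in zip(preds, labels):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += int(predicted == y)
    return hits / float(len(labels))

def log_loss(preds, labels):
    """Mean negative log of the probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(preds, labels)) / len(labels)

labels = [0, 1, 2, 0]

# Cautious model: right only half the time, but never badly wrong
cautious = [[0.5, 0.3, 0.2], [0.4, 0.35, 0.25], [0.3, 0.3, 0.4], [0.3, 0.4, 0.3]]

# Confident model: right more often, but its one mistake is near-certain
confident = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98],
             [0.0001, 0.9998, 0.0001]]

for name, preds in (("cautious", cautious), ("confident", confident)):
    print("%s: acc=%.2f loss=%.4f" % (name, accuracy(preds, labels), log_loss(preds, labels)))
```

The confident model wins on accuracy yet loses on loss, because a single near-certain mistake dominates the average. That matches the pattern in my evaluations above: higher `val_acc` alongside a worse `val_loss`.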

When I stopped training `final_model` last, my loss was still trending down. Let's run a few more epochs just to see if that continues.

In [92]:
final_model.fit_generator(trn_batches, samples_per_epoch=trn_batches.nb_sample, nb_epoch=2, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f944a0b2d90>

In [93]:
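# K.set_value updates the existing learning-rate variable rather than rebinding the attribute
K.set_value(final_model.optimizer.lr, 0.01)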
final_model.fit_generator(trn_batches, samples_per_epoch=trn_batches.nb_sample, nb_epoch=2, 
                        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

Epoch 1/2
Epoch 2/2
  640/18298 [>.............................] - ETA: 443s - loss: 0.2279 - acc: 0.9453

KeyboardInterrupt: 