# Training a better model

In [2]:
from theano.sandbox import cuda

In [3]:
%matplotlib inline
import utils; reload(utils)
from utils import *
from __future__ import division, print_function

Using Theano backend.


In [6]:
path = "data/redux/sample/"
#path = "data/redux/"
model_path = path + 'models/'
if not os.path.exists(model_path): os.mkdir(model_path)

#batch_size=64
batch_size=4

## Are we underfitting?

So far, our validation accuracy has been higher than our training accuracy. That leads to two obvious questions:

1. How is this possible?
2. Is this desirable?

Answer(1): it happens because of *dropout*. 
- Dropout = a layer that randomly deletes (i.e. sets to zero) each activation in the previous layer with probability *p* (generally 0.5). 
- This only happens during training, not when calculating the accuracy on the validation set. That's why validation accuracy > training accuracy.
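
To make the train/validation difference concrete, here is a minimal NumPy sketch of the behaviour described above. The function name `dropout_forward` and the toy data are illustrative assumptions, not part of the notebook or of Keras, and real libraries usually also rescale the kept activations, which is left out here to match the description.

```python
import numpy as np

def dropout_forward(activations, p=0.5, training=True, rng=None):
    """Zero each activation with probability p while training; pass through otherwise."""
    if not training or p == 0.0:
        return activations
    rng = np.random.RandomState(0) if rng is None else rng
    # True where the unit is kept, which happens with probability 1 - p
    keep_mask = rng.uniform(size=activations.shape) >= p
    return activations * keep_mask

acts = np.ones((1000, 100))  # toy activations, all equal to 1

train_out = dropout_forward(acts, p=0.5, training=True)   # roughly half are zeroed
valid_out = dropout_forward(acts, p=0.5, training=False)  # unchanged
```

During training the network only ever sees the thinned-out `train_out`, while validation uses the full `valid_out`, so the same weights can score better on the validation set.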

The purpose of dropout is to avoid overfitting.
- Applying dropout during training ensures that no one part of the neural network can overfit to one part of the training set.
- Dropout allows us to create rich models without overfitting.
- If overused, it can result in underfitting, so we need to be careful about how much of it our model uses.

Answer(2): this is probably not desirable.
- Validation accuracy that is higher than training accuracy is a strong sign of underfitting.
- If this happens, it's likely that we can get better validation set results with less (or no) dropout.

Let's try removing dropout entirely, and see what happens!
(The VGG model has dropout because the VGG authors found it necessary for the ImageNet competition. But it may not be necessary for dogs v cats.)

## Removing Dropout

High-level approach:
- Start with our fine-tuned cats vs dogs model (with dropout).
- Remove dropout from the dense layers.
- Fine-tune all the dense layers.

Steps:
1. Re-create and load our modified VGG model with a binary dependent variable (i.e. dogs v cats)
2. Split the model between the convolutional (*conv*) layers and the dense layers
3. Pre-calculate the output of the conv layers (*OCVL*) (to avoid re-calculating them on every epoch)
4. Create a new model with just the dense layers, and dropout p set to zero
5. Train this new model using *OCVL* as training data.

### (1) Load finetuned VGG model with binary dependent
Start with a working model. Load the VGG16 model and change it to predict our binary dependent variable (dogs v cats):

In [4]:
model = vgg_ft(2)

Load our fine-tuned weights:

In [9]:
model.load_weights(model_path+'finetune3.h5')

### (2) Split the model between the convolutional (*conv*) layers and the dense layers

It is best for us to pre-calculate the input to the fully connected part of the model, i.e. the output of the last convolutional layer, which feeds the remaining pooling, Flatten() and dense layers.

#### (2.1) Find the index of the last conv layer

In [10]:
layers = model.layers

In [11]:
last_conv_idx = [index for index,layer in enumerate(layers)
                    if type(layer) is Convolution2D][-1]

In [12]:
last_conv_idx

30

In [13]:
layers[last_conv_idx]

<keras.layers.convolutional.Convolution2D at 0x7f1b608c3e50>

#### (2.2) Create a new model that contains just the layers up to and including this layer

In [14]:
conv_layers = layers[:last_conv_idx+1]
conv_model = Sequential(conv_layers)
# Dense layers - aka. fully connected (FC) layers
fc_layers = layers[last_conv_idx+1:]
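
The index-and-slice logic can be seen in isolation with a toy layer list. The `Toy*` classes below are hypothetical stand-ins, not Keras layers; the sketch only shows where the split falls.

```python
# Hypothetical stand-in layer types (not Keras classes)
class ToyConv(object): pass
class ToyPool(object): pass
class ToyFlatten(object): pass
class ToyDense(object): pass

toy_layers = [ToyConv(), ToyConv(), ToyPool(), ToyConv(), ToyPool(),
              ToyFlatten(), ToyDense(), ToyDense()]

# Index of the last conv layer: collect all matching indices, take the final one
toy_last_conv_idx = [i for i, layer in enumerate(toy_layers)
                     if type(layer) is ToyConv][-1]

toy_conv_part = toy_layers[:toy_last_conv_idx + 1]  # up to and including the last conv
toy_fc_part = toy_layers[toy_last_conv_idx + 1:]    # everything after it
```

Note that the pooling layer that follows the last conv layer ends up in the second half, which is why the dense-only model built later starts with a `MaxPooling2D` layer.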

### (3) Pre-calculate the output of the model with only the conv layers: conv_model
We create the features with the same approach we used in the last lesson, when we built the linear model from the ImageNet predictions. Only the model has changed.
- There's a small number of "recipes" that can get us a long way!
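
The pre-calculation recipe can be sketched without Keras. In the toy NumPy example below, everything (the frozen weights, the data, the logistic-regression head, the learning rate) is a made-up assumption for illustration; the point is only that the frozen part runs once and every epoch reuses the cached output.

```python
import numpy as np

rng = np.random.RandomState(0)
frozen_w = 0.1 * rng.randn(20, 8)   # hypothetical frozen weights, never updated
calls = {"frozen": 0}               # counts how often the frozen part runs

def frozen_features(x):
    """Stand-in for the conv layers: a fixed projection followed by ReLU."""
    calls["frozen"] += 1
    return np.maximum(x.dot(frozen_w), 0.0)

x_train = rng.randn(64, 20)
y_train = (x_train[:, 0] > 0).astype(float)

# Step 1: run the expensive frozen part exactly once and keep the result
cached = frozen_features(x_train)

# Step 2: train only a small head on the cached features for many epochs
head_w = np.zeros(8)
losses = []
for epoch in range(20):
    probs = 1.0 / (1.0 + np.exp(-cached.dot(head_w)))
    losses.append(-np.mean(y_train * np.log(probs)
                           + (1.0 - y_train) * np.log(1.0 - probs)))
    grad = cached.T.dot(probs - y_train) / len(y_train)
    head_w -= 0.5 * grad
```

After 20 epochs `calls["frozen"]` is still 1, whereas training the full stack would have recomputed the frozen features on every epoch.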

In [15]:
batches = get_batches(path+'train', shuffle=False, batch_size=batch_size)
val_batches = get_batches(path+'valid', shuffle=False, batch_size=batch_size)

trn_classes = batches.classes
val_classes = val_batches.classes
trn_labels = onehot(trn_classes)
val_labels = onehot(val_classes)

Found 16 images belonging to 2 classes.
Found 8 images belonging to 2 classes.
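
The `onehot` helper comes from the course's `utils` module, whose source is not shown here. As a rough idea of the encoding it is expected to produce, this is a minimal NumPy sketch; `onehot_sketch` is a hypothetical name, not the real helper.

```python
import numpy as np

def onehot_sketch(class_ids, num_classes=None):
    """Turn integer class ids into one row per sample with a single 1."""
    class_ids = np.asarray(class_ids, dtype=int)
    if num_classes is None:
        num_classes = int(class_ids.max()) + 1
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes)[class_ids]

toy_labels = onehot_sketch([0, 0, 1, 1])
# toy_labels has shape (4, 2): rows [1, 0], [1, 0], [0, 1], [0, 1]
```

This two-column layout is what a two-unit softmax output trained with categorical cross-entropy expects as its target.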


In [18]:
val_features = conv_model.predict_generator(val_batches, val_batches.nb_sample)

In [19]:
trn_features = conv_model.predict_generator(batches, batches.nb_sample)

In [21]:
save_array(model_path + 'train_convlayer_features.bc', trn_features)
save_array(model_path + 'valid_convlayer_features.bc', val_features)

In [22]:
trn_features = load_array(model_path+'train_convlayer_features.bc')
val_features = load_array(model_path+'valid_convlayer_features.bc')

In [23]:
trn_features.shape

(16, 512, 14, 14)
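
Each sample is therefore a 512 x 14 x 14 block. The dense-only model in the next step begins with max pooling and a flatten, so a quick bit of arithmetic shows how many inputs the first dense layer receives. The 2x2 pooling window is an assumption here (it is the usual default).

```python
conv_out_shape = (512, 14, 14)  # per-sample shape, i.e. trn_features.shape[1:]
pool = 2                        # assumed 2x2 max pooling

channels, height, width = conv_out_shape
pooled_shape = (channels, height // pool, width // pool)        # (512, 7, 7)
flat_size = pooled_shape[0] * pooled_shape[1] * pooled_shape[2]  # 512 * 7 * 7 = 25088
```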

### (4) Create a new model with just the dense layers, and dropout p set to zero
We'll create our new fully connected model using the exact same architecture as the last layers of VGG16, so that we can copy the pre-trained weights over from that model.
- Set each dropout layer's p value to zero, so as to remove dropout.

In [24]:
# Halve the weights because we'll remove dropout.
def proc_wgts(layer):
    return [o/2 for o in layer.get_weights()]
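
The reasoning behind the halving can be checked numerically. The sketch below uses made-up activations and weights and assumes dropout that zeroes activations with p = 0.5 *without* rescaling the survivors. Under that assumption the next layer saw, on average, half of the total input during training, so scaling the weights by 1 - p restores the same expected input once dropout is removed. If the dropout implementation already rescales kept activations during training (inverted dropout), the expectation is already preserved and this correction would not apply.

```python
import numpy as np

rng = np.random.RandomState(0)
p = 0.5
acts = rng.uniform(0.0, 1.0, size=512)  # hypothetical non-negative activations
weights = rng.randn(512)                # hypothetical weights of one downstream unit

# Average input to the downstream unit over many non-rescaled dropout masks
n_masks = 4000
masks = rng.uniform(size=(n_masks, acts.size)) >= p
avg_with_dropout = (masks * acts).dot(weights).mean()

# Input with dropout removed and the weights scaled by 1 - p (halved for p = 0.5)
no_dropout_scaled = acts.dot(weights * (1.0 - p))
```

The two values agree up to sampling noise, which is the sense in which halving compensates for removing dropout under the stated assumption.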

In [25]:
# A finely tuned model needs to be updated very slowly.
opt = RMSprop(lr=0.00001, rho=0.7)
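
To get a feel for how slow these updates are, here is a sketch of a single textbook RMSprop step in NumPy, using the same `lr` and `rho` values as above. `rmsprop_step` is a hypothetical helper, and library implementations may differ in details such as where the epsilon term is added.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.00001, rho=0.7, eps=1e-8):
    """One RMSprop update: scale the gradient by a running RMS of past gradients."""
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

w0 = np.array([1.0, -2.0])      # toy weights
grad = np.array([0.5, -4.0])    # toy gradient with very different magnitudes
w1, avg_sq1 = rmsprop_step(w0, grad, np.zeros_like(w0))

step_sizes = np.abs(w1 - w0)    # both close to lr / sqrt(1 - rho), about 1.8e-5
```

Because the gradient is divided by its own running magnitude, each weight moves by a similarly tiny amount on the first step regardless of how large its gradient is, which suits a model that is already well tuned.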

In [27]:
# Copy the weights from the pre-trained model.
def get_fc_model():
    #based on: vgg16.py :: create, FCBlock
    model = Sequential([
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.),
        Dense(4096, activation='relu'),
        Dropout(0.),
        Dense(2, activation='softmax')
        ])
    
    for l1,l2 in zip(model.layers, fc_layers):
        l1.set_weights(proc_wgts(l2))
    
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [28]:
fc_model = get_fc_model()

### (5) Train this model using the output of conv layers as training data.
Fit the model:

In [29]:
fc_model.fit(trn_features, trn_labels, nb_epoch=8,
            batch_size=batch_size, validation_data=(val_features, val_labels))

Train on 16 samples, validate on 8 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8


<keras.callbacks.History at 0x7f1b4c1a6210>

In [30]:
fc_model.save_weights(model_path+'no_dropout.h5')

In [31]:
fc_model.load_weights(model_path+'no_dropout.h5')