# Statefarm Data - Phase 4B - All-Convolutional Models

Vgg16 performed better than InceptionV3 and Resnet50 in the Phase 4 experiments, but it became over-fitted quite quickly. That is no wonder when trying to train over 3 million parameters in the dense layers using images of only 50 subjects (approx. 22000 training images in total). Using dropout to control over-fitting is not an efficient way of creating a stable model either. Over-fitting results when there is not enough data for the number of parameters being trained. (With infinitely flexible non-linear models, though, over-fitting will eventually happen with too much training unless something is done to disrupt that process.) Most of the parameters in the Vgg16 network are contributed by the dense fully connected layers, and comparatively few by the convolutional layers. All-convolutional architectures are a way to reduce the number of parameters in a model and so help control over-fitting when not much training data is available.
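To make the split between dense and convolutional parameters concrete, here is a rough count for the stock Vgg16 layout (3x3 convolutions, 224x224 RGB input, 1000 ImageNet classes). It is only a sketch: the layer list and the helper names `conv_params` and `dense_params` are written out for illustration and are not taken from the models trained in this notebook.

```python
# Parameter counts for the stock VGG16 layout. 'M' marks a 2x2 max-pool.
VGG16_CONV = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
              512, 512, 512, 'M', 512, 512, 512, 'M']

def conv_params(cfg, in_channels=3, kernel=3):
    """Weights + biases of every conv layer listed in cfg."""
    total = 0
    for item in cfg:
        if item == 'M':
            continue  # pooling layers have no parameters
        total += kernel * kernel * in_channels * item + item
        in_channels = item
    return total

def dense_params(sizes):
    """Weights + biases of a dense stack given [n_in, hidden..., n_out]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

conv_total = conv_params(VGG16_CONV)
# Five 2x2 poolings take 224 down to 7, so the flattened input is 512 * 7 * 7.
dense_total = dense_params([512 * 7 * 7, 4096, 4096, 1000])

print(conv_total)   # 14714688
print(dense_total)  # 123642856
```

Under these assumptions the dense layers hold roughly nine tenths of all the weights, which is the imbalance the all-convolutional tops are meant to avoid.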

#### In this notebook, my objective is to compare the performance of various all-convolutional models based on Vgg19.

In [1]:
import theano
from theano.sandbox import cuda
cuda.use('gpu0')

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)


In [2]:
%matplotlib inline
IMPORT_DIR = '/home/ubuntu/nbs'
%cd $IMPORT_DIR

/home/ubuntu/nbs


In [3]:
from __future__ import division,print_function

import os, json
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
import daveutils
from daveutils import *
import davenet
from davenet import *
import my_cv_modeler
from my_cv_modeler import *

Using Theano backend.


In [4]:
ALL_DATA_DIR = '/home/ubuntu/'
DATA_HOME_DIR = ALL_DATA_DIR+'statefarm1/'
TRAIN_DIR = DATA_HOME_DIR+'train/'
VALID_DIR = DATA_HOME_DIR+'valid/'
SAMPLE_DIR = DATA_HOME_DIR+'sample/'
MODELS_DIR = DATA_HOME_DIR+'models/'
RESULTS_DIR = DATA_HOME_DIR+'results/'
TEST_DIR = DATA_HOME_DIR+'test/'

# 1. Prepare Data

#### Identify and remove poor quality training data

Previously Identified Data that is badly classified or multi-class:

In [5]:
bad_img_nums=np.array(['16927','101091','31121','27454','49471','47068','18737','14223','68147','68040','54867',
                  '38427', '8131', '62871', '99733', '92769','75819', '79819'])
#n.b. some of these image numbers are in the validation folder

In [6]:
%cd $DATA_HOME_DIR

/home/ubuntu/statefarm1


Move bad images from /train to /bad folder

In [7]:
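from shutil import move
from shutil import copytree #(src, dst, symlinks=False, ignore=None)
%cd $DATA_HOME_DIR
def move_bad_to_bad_folder(from_dir, bad_filenames, bad_dir = 'bad_train'):  #bad_dir must not already exist
    count = 0
    # n.b. copytree copies the whole tree, images included, so bad_dir starts out as a full copy of from_dir
    copytree(from_dir, bad_dir)
    g = glob(from_dir+'/c?/*.jpg')
    for filename in g:
        # filename looks like '<from_dir>/c7/img_68040.jpg': skip '<from_dir>/c7/img_' (+8) and drop '.jpg' (-4)
        if filename[len(from_dir)+8:][:-4] in bad_filenames:
            print(filename[len(from_dir)+1:])
            # take the flagged image out of from_dir, replacing its copy under bad_dir
            move(filename, bad_dir+'/'+filename[len(from_dir)+1:])
            count+=1
    print(count,"items successfully moved from /",from_dir,"folder to: ../",bad_dir)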

/home/ubuntu/statefarm1


In [8]:
move_bad_to_bad_folder('train', bad_img_nums, 'bad_train')

c7/img_68040.jpg
c7/img_75819.jpg
c8/img_18737.jpg
c8/img_49471.jpg
c8/img_68147.jpg
c0/img_14223.jpg
c0/img_47068.jpg
c0/img_79819.jpg
c0/img_31121.jpg
c0/img_101091.jpg
c0/img_16927.jpg
c2/img_62871.jpg
c4/img_92769.jpg
c4/img_38427.jpg
c9/img_99733.jpg
c1/img_54867.jpg
c5/img_8131.jpg
c5/img_27454.jpg
18 items successfully moved from / train folder to: ../ bad_train
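The function above picks the image number out of each path with fixed slice offsets, which only works while every class folder has a two-character name such as `c7`. A slightly more defensive way to get the same number, shown here as a sketch with a hypothetical helper name `image_number`, is to parse the file name itself:

```python
import os
import re

def image_number(path):
    """Return the digits in 'img_<digits>.jpg', or None if the name does not match."""
    match = re.match(r'img_(\d+)\.jpg$', os.path.basename(path))
    return match.group(1) if match else None

# Toy example: two flagged numbers and three candidate paths.
bad = {'68040', '8131'}
paths = ['train/c7/img_68040.jpg', 'train/c5/img_8131.jpg', 'train/c0/img_100.jpg']
flagged = [p for p in paths if image_number(p) in bad]
print(flagged)  # ['train/c7/img_68040.jpg', 'train/c5/img_8131.jpg']
```

Matching on the base name means the check no longer depends on the length of the directory prefix.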


# 2. Create a Sequential Vgg Model 

### 1. Add fc_bn layers, and train only the final layer

Import the Vgg19 network, fully trained on ImageNet

In [18]:
from keras.applications.vgg19 import VGG19
from keras.applications.vgg19 import preprocess_input
from keras.models import Model

vgg19layers = VGG19(include_top=True, weights='imagenet')
#base_model = VGG19(weights='imagenet')
#model = Model(input=base_model.input, output=base_model.get_layer('block4_pool').output)

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_th_dim_ordering_th_kernels.h5


In [19]:
save_model(vgg19layers,1,'vgg19')

In [20]:
for i, layer in enumerate(vgg19layers.layers):
    print(i, layer)

0 <keras.engine.topology.InputLayer object at 0x7f263fc00c90>
1 <keras.layers.convolutional.Convolution2D object at 0x7f263fb9d050>
2 <keras.layers.convolutional.Convolution2D object at 0x7f263fbb1350>
3 <keras.layers.pooling.MaxPooling2D object at 0x7f263fb1b650>
4 <keras.layers.convolutional.Convolution2D object at 0x7f263fb1b7d0>
5 <keras.layers.convolutional.Convolution2D object at 0x7f263fb41a10>
6 <keras.layers.pooling.MaxPooling2D object at 0x7f263fb44d10>
7 <keras.layers.convolutional.Convolution2D object at 0x7f263fb44ed0>
8 <keras.layers.convolutional.Convolution2D object at 0x7f263fad3190>
9 <keras.layers.convolutional.Convolution2D object at 0x7f263fafa450>
10 <keras.layers.convolutional.Convolution2D object at 0x7f2643a0d4d0>
11 <keras.layers.pooling.MaxPooling2D object at 0x7f26439d3fd0>
12 <keras.layers.convolutional.Convolution2D object at 0x7f26439f01d0>
13 <keras.layers.convolutional.Convolution2D object at 0x7f2643488bd0>
14 <keras.layers.convolutional.Convolution2D 

In [21]:
vgg19layers.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_3 (InputLayer)             (None, 3, 224, 224)   0                                            
____________________________________________________________________________________________________
block1_conv1 (Convolution2D)     (None, 64, 224, 224)  1792        input_3[0][0]                    
____________________________________________________________________________________________________
block1_conv2 (Convolution2D)     (None, 64, 224, 224)  36928       block1_conv1[0][0]               
____________________________________________________________________________________________________
block1_pool (MaxPooling2D)       (None, 64, 112, 112)  0           block1_conv2[0][0]               
___________________________________________________________________________________________

Make the pre-trained layers non-trainable (the loop below freezes every layer, convolutional and dense)

# Freeze Conv Layers to FC1 and Add new Dense Layer

In [28]:
count_frozen = 0
for layer in vgg19layers.layers:
    layer.trainable = False
    if layer.trainable == False: count_frozen+=1
print(count_frozen,"layers are frozen") 

26 layers are frozen


Create a functional model

In [86]:
#model = Model(input=vgg19layers.input, output=vgg19layers.output)
model = Model(input=vgg19layers.input, output=vgg19layers.get_layer('fc1').output)

In [52]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_3 (InputLayer)             (None, 3, 224, 224)   0                                            
____________________________________________________________________________________________________
block1_conv1 (Convolution2D)     (None, 64, 224, 224)  0           input_3[0][0]                    
____________________________________________________________________________________________________
block1_conv2 (Convolution2D)     (None, 64, 224, 224)  0           block1_conv1[0][0]               
____________________________________________________________________________________________________
block1_pool (MaxPooling2D)       (None, 64, 112, 112)  0           block1_conv2[0][0]               
___________________________________________________________________________________________

In [91]:
x = model.get_layer('fc1')

# Baseline Model: Finetune a truncated Vgg19 model (1x4096 fc hidden layer)

In [93]:
predictions = Dense(10, activation='softmax')(x.output)

In [96]:
vgg19short = Model(input=vgg19layers.input,output=predictions)

In [97]:
vgg19short.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_3 (InputLayer)             (None, 3, 224, 224)   0                                            
____________________________________________________________________________________________________
block1_conv1 (Convolution2D)     (None, 64, 224, 224)  0           input_3[0][0]                    
____________________________________________________________________________________________________
block1_conv2 (Convolution2D)     (None, 64, 224, 224)  0           block1_conv1[0][0]               
____________________________________________________________________________________________________
block1_pool (MaxPooling2D)       (None, 64, 112, 112)  0           block1_conv2[0][0]               
___________________________________________________________________________________________

### Train the Baseline Model (1 hidden dense layer with 4096 units), including use of 14k pseudo-labelled test cases

Use ImageDataGenerator because there are too many training images to store (resized) in an array.
1. Not using data augmentation at this stage.
2. Not using validation data for training at this stage.

n.b. MixIterator was not used.  Only test data having a prediction probability >0.995 has been used.
This data is considered to be of such good quality that it can be mixed with real data. The pseudo-labelled data will make up 43% of the training data at this stage (39% after the validation data is added). Yes, it's a little high, but let's see how it goes.
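The selection rule described above (keep a test image only when its top predicted probability exceeds 0.995) can be sketched as follows. This is an illustration, not the code used to build the pseudo-labelled set: the helper names and the toy predictions are hypothetical, and 14000 is the rounded "14k" figure quoted in this notebook.

```python
def select_pseudo_labels(filenames, probs, threshold=0.995):
    """Keep (filename, class_index) pairs whose top probability exceeds threshold."""
    selected = []
    for name, row in zip(filenames, probs):
        best = max(range(len(row)), key=lambda i: row[i])
        if row[best] > threshold:
            selected.append((name, best))
    return selected

def pseudo_fraction(n_pseudo, n_total):
    """Share of the training set made up of pseudo-labelled images."""
    return n_pseudo / float(n_total)

# Toy predictions for three hypothetical test images over three classes.
names = ['img_1.jpg', 'img_2.jpg', 'img_3.jpg']
probs = [[0.998, 0.001, 0.001],
         [0.600, 0.300, 0.100],
         [0.002, 0.001, 0.997]]
picked = select_pseudo_labels(names, probs)
print(picked)  # [('img_1.jpg', 0), ('img_3.jpg', 2)]

# Roughly 14k pseudo-labelled images in a training folder of 32821 images.
print(round(pseudo_fraction(14000, 32821), 2))  # 0.43
```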

Create the image generator (no augmentation)

In [98]:
TRAIN_DIR = ALL_DATA_DIR+'statefarm/train' # yes, this still includes the pseudo-labelled data
VALID_DIR = ALL_DATA_DIR+'statefarm/valid' # n.b. I've gone back to the original directory here

In [100]:
gen = ImageDataGenerator()

In [101]:
generator = gen.flow_from_directory(
        TRAIN_DIR,
        target_size=(224, 224),
        batch_size=64,
        class_mode='categorical',
        shuffle=True)

Found 32821 images belonging to 10 classes.


In [102]:
val_generator = gen.flow_from_directory(
        VALID_DIR,
        target_size=(224, 224),
        batch_size=64,
        class_mode='categorical',
        shuffle=True)
val_generator.N

Found 3827 images belonging to 10 classes.


3827

In [104]:
vgg19short.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

In [105]:
vgg19short.fit_generator(
        generator,
        samples_per_epoch=generator.N,
        nb_epoch=2,
        validation_data=val_generator,
        nb_val_samples=1000)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f263ccb99d0>

In [106]:
vgg19short.optimizer.lr=0.0001
vgg19short.fit_generator(
        generator,
        samples_per_epoch=generator.N,
        nb_epoch=2,
        validation_data=val_generator,
        nb_val_samples=2000)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f2631f749d0>

### Improve the Baseline Vgg19 Model - 2 x hidden fully-connected layers (nf = 256 + 256)

Even though the validation data comes from completely different test subjects, the model still appears to be massively over-fitting, judging by the gap between training and validation accuracy and by comparison with previous modelling attempts, where a validation accuracy of up to 0.84 was achieved.  So let's reduce the number of parameters in the hidden dense layer and add another hidden dense layer.

In [107]:
conv_model = Model(input=vgg19layers.input, output=vgg19layers.get_layer('block5_pool').output)
x = conv_model.get_layer('block5_pool').output

x = MaxPooling2D((2, 2))(x)
x = BatchNormalization()(x)
x = Flatten()(x)
x = Dense(256, activation='softmax')(x)
x = BatchNormalization()(x)
x = Dense(256, activation='softmax')(x)
x = BatchNormalization()(x)
outp = Dense(10, activation='softmax')(x)
vgg19shorter = Model(input=vgg19layers.input,output=outp)

In [112]:
vgg19shorter.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
#vgg19shorter.optimizer.lr=0.001
vgg19shorter.fit_generator(
        generator,
        samples_per_epoch=generator.N,
        nb_epoch=1,
        validation_data=val_generator,
        nb_val_samples=2000)

Epoch 1/1


<keras.callbacks.History at 0x7f263d3746d0>

In [115]:
conv_model = Model(input=vgg19layers.input, output=vgg19layers.get_layer('block5_pool').output)
x = conv_model.get_layer('block5_pool').output

x = MaxPooling2D((2, 2))(x)
x = BatchNormalization()(x)
x = Flatten()(x)
x = Dense(256, activation='softmax')(x)
x = BatchNormalization()(x)
outp = Dense(10, activation='softmax')(x)
vgg19shortest = Model(input=vgg19layers.input,output=outp)
vgg19shortest.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

In [116]:
vgg19shortest.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_3 (InputLayer)             (None, 3, 224, 224)   0                                            
____________________________________________________________________________________________________
block1_conv1 (Convolution2D)     (None, 64, 224, 224)  0           input_3[0][0]                    
____________________________________________________________________________________________________
block1_conv2 (Convolution2D)     (None, 64, 224, 224)  0           block1_conv1[0][0]               
____________________________________________________________________________________________________
block1_pool (MaxPooling2D)       (None, 64, 112, 112)  0           block1_conv2[0][0]               
___________________________________________________________________________________________

In [117]:
vgg19shortest.optimizer.lr=0.01
vgg19shortest.fit_generator(
        generator,
        samples_per_epoch=generator.N,
        nb_epoch=1,
        validation_data=val_generator,
        nb_val_samples=2000)

Epoch 1/1


<keras.callbacks.History at 0x7f2615489f50>

In [121]:
vgg19shortest.optimizer.lr=0.001
vgg19shortest.fit_generator(
        generator,
        samples_per_epoch=generator.N,
        nb_epoch=1,
        validation_data=val_generator,
        nb_val_samples=2000)

Epoch 1/1


<keras.callbacks.History at 0x7f261daac790>

The model doesn't seem to converge with either bigger or smaller learning rates, so it is abandoned here for now. vgg19short is used as the baseline for measuring the all-convolutional models below.
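One possible contributor, offered as an assumption and not something verified in this notebook, is that the hidden Dense(256) layers in the cells above use a softmax activation. Softmax forces the 256 outputs of a layer to sum to 1, so each unit carries only a small share of the signal, whereas relu keeps positive activations at their original scale. The toy comparison below uses made-up pre-activation values and plain Python, not the trained model:

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

# 256 hypothetical pre-activations, spread evenly over [-1, 1].
pre = [-1.0 + 2.0 * i / 255 for i in range(256)]

soft = softmax(pre)
rect = relu(pre)

print(sum(soft))  # the softmax outputs always sum to (approximately) 1
print(max(soft))  # so even the largest unit is a small fraction
print(max(rect))  # relu leaves the largest activation unchanged
```

If the model is revisited, swapping the hidden activations to relu would be a cheap experiment to run before ruling the architecture out.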

# Vgg19 Fully Convolutional Model

In [130]:
from keras.layers.pooling import GlobalMaxPooling2D
conv_model = Model(input=vgg19layers.input, output=vgg19layers.get_layer('block5_pool').output)
x = conv_model.get_layer('block5_pool').output

nf = 512
x = BatchNormalization()(x)
x = Convolution2D(nf,3,3, border_mode='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
x = Convolution2D(nf//2,3,3, border_mode='same')(x)
x = BatchNormalization()(x)
x = Convolution2D(nf//4,3,3, border_mode='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
        
x = Convolution2D(10,3,3, border_mode='same')(x)
x = GlobalAveragePooling2D()(x)#x = GlobalMaxPooling2D()(x)#GlobalMaxPooling2D(dim_ordering='default')#
outp = Activation('softmax')(x)

th


In [131]:
vgg19allConv = Model(input=vgg19layers.input,output=outp)
vgg19allConv.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

vgg19allConv.fit_generator(
        generator,
        samples_per_epoch=generator.N,
        nb_epoch=2,
        validation_data=val_generator,
        nb_val_samples=1000
)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f2608cb5790>

In [132]:
save_model(vgg19allConv, 1, cross='vgg19allConv')

In [133]:
conv_model = Model(input=vgg19layers.input, output=vgg19layers.get_layer('block5_pool').output)
x = conv_model.get_layer('block5_pool').output

nf = 512
x = BatchNormalization()(x)
x = Convolution2D(nf,3,3, border_mode='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
        
x = Convolution2D(10,3,3, border_mode='same')(x)
x = GlobalAveragePooling2D()(x)#x = GlobalMaxPooling2D()(x)#GlobalMaxPooling2D(dim_ordering='default')#
outp = Activation('softmax')(x)

th


In [134]:
vgg19smallConv = Model(input=vgg19layers.input,output=outp)
vgg19smallConv.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

vgg19smallConv.fit_generator(
        generator,
        samples_per_epoch=generator.N,
        nb_epoch=1,
        validation_data=val_generator,
        nb_val_samples=1000
)

Epoch 1/1


<keras.callbacks.History at 0x7f261205d150>

In [135]:
vgg19smallConv.optimizer.lr=0.0001
vgg19smallConv.fit_generator(
        generator,
        samples_per_epoch=generator.N,
        nb_epoch=1,
        validation_data=val_generator,
        nb_val_samples=1000
)

Epoch 1/1


<keras.callbacks.History at 0x7f261205d910>

In [136]:
save_model(vgg19smallConv, 1, cross='vgg19smallConv')

Prepare for final modelling stage. Add the validation images back into the training images.

# Conclusions

Two all-convolutional models were built and trained.  Each model consists of the frozen convolutional layers of Vgg19 plus a custom 'top'. The baseline (0) and the two all-convolutional models (1, 2) are described by these layers:
0. vgg19short(0.63) = Frozen( vgg19conv + flatten + Dense(4096) ) + Dense(10)
1. vgg19allConv(0.75) = Frozen( vgg19conv ) + Conv2D(512,3,3) + Conv2D(256,3,3) + Conv2D(128,3,3) + Conv2D(10,3,3) + GAP
2. vgg19smallConv(0.67) = Frozen( vgg19conv ) + Conv2D(512,3,3) + Conv2D(10,3,3) + GAP (i.e. GlobalAveragePooling2D())

Additionally, the all-convolutional models contained MaxPooling and BatchNormalization layers in their custom tops.

In the list above, the best validation classification accuracy achieved is shown in parentheses after the model name.  All of these models were trained with the assistance of approx. 14k high-quality (>0.995 prob) pseudo-labelled test images, plus the original training data less approx. 3600 validation cases.  All of the models quickly showed evidence of over-fitting.  In the case of the all-convolutional models (particularly vgg19smallConv) the over-fitting can hardly be blamed on too many parameters for the amount of training data. See other notebooks in this series to see what could be achieved with a similar number of parameters.
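For a sense of scale, the convolution weights in each custom top can be counted directly from the layer widths used in the cells above. This is a back-of-envelope sketch: it ignores the BatchNormalization parameters, and the helper names are hypothetical.

```python
def conv2d_params(n_in, n_out, k=3):
    """Weights + biases of one k x k convolution."""
    return k * k * n_in * n_out + n_out

def dense_layer_params(n_in, n_out):
    """Weights + biases of one dense layer."""
    return n_in * n_out + n_out

def chain(params_fn, widths):
    """Sum params_fn over consecutive (n_in, n_out) pairs in widths."""
    return sum(params_fn(a, b) for a, b in zip(widths, widths[1:]))

# Layer widths taken from the model definitions in this notebook.
all_conv_top = chain(conv2d_params, [512, 512, 256, 128, 10])
small_conv_top = chain(conv2d_params, [512, 512, 10])

# Baseline: frozen fc1 on the flattened 512x7x7 block5_pool output, plus the new Dense(10).
frozen_fc1 = dense_layer_params(512 * 7 * 7, 4096)
new_dense = dense_layer_params(4096, 10)

print(all_conv_top)    # 3846282
print(small_conv_top)  # 2405898
print(frozen_fc1)      # 102764544
print(new_dense)       # 40970
```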

In retrospect, further model training using data augmentation could have been done to reduce over-fitting. However, it was noted that the all-convolutional models (Vgg16 based) trained by Jeremy Howard to classify the statefarm distracted driver data set also resulted in slightly lower classification accuracy than a lightweight model with some fixed layers.   Hence, I shelved all-convolutional modelling at this point.

Recommended further work for the all-convolutional models:
1. Additional training of the models described in this notebook using augmentation of the training and pseudo-labelled images.
2. Optimization of the number of filters and architecture of these customised tops.
3. Unfreezing and annealing of the upper Vgg19 original convolutional layers with a very small learning rate.
4. Repeat the work using the Vgg16 network as the basis for the transfer learning.
5. Use of GlobalMaxPooling2D instead of GlobalAveragePooling2D.

The reason for the final recommendation is that most of the classes can be determined by just one instance of a key feature, such as a hand and what that hand is holding or touching. Other important features involve much larger areas, such as the torso and face direction.  Use of GlobalAveragePooling2D may dilute the importance of the hand as a predictor.
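The dilution effect can be seen on a toy feature map. The sketch below uses a made-up 4x4 activation map with a single strong response (standing in for a small, localised feature such as a hand) and plain-Python stand-ins for the two pooling operations; it is not the Keras implementation.

```python
def global_average_pool(feature_map):
    """Mean of all activations in a 2-D feature map."""
    cells = [v for row in feature_map for v in row]
    return sum(cells) / float(len(cells))

def global_max_pool(feature_map):
    """Largest activation in a 2-D feature map."""
    return max(v for row in feature_map for v in row)

# Hypothetical 4x4 activation map: one strong response, zeros elsewhere.
hand_map = [[0.0, 0.0, 0.0, 0.0],
            [0.0, 8.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]

print(global_average_pool(hand_map))  # 0.5
print(global_max_pool(hand_map))      # 8.0
```

Averaging spreads the single response over all 16 cells, while max pooling passes it through unchanged, which is why max pooling may suit small, decisive features better.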

Naturally, if an all convolutional model is found to be competitive, final training should be performed by returning the validation images to the training dataset.

Next step: include the validation images in the model.  See ..phase5.  Use data in statefarm1 because the validation data was not split off.  Use Vgg model because it is better than ResNet50 so far.  