# Statefarm Data - Phase7C - Remove Dropout & Fine Tune

Comparing various models after removing marginal-quality data and adding 14,000 cases of pseudo-labeled data.

In [1]:
import theano
from theano.sandbox import cuda
cuda.use('gpu0')

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)


In [2]:
%matplotlib inline
IMPORT_DIR = '/home/ubuntu/nbs'
%cd $IMPORT_DIR

/home/ubuntu/nbs


In [3]:
from __future__ import division,print_function

import os, json
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
import daveutils
from daveutils import *
import davenet
from davenet import *
import my_cv_modeler
from my_cv_modeler import *

Using Theano backend.


In [4]:
ALL_DATA_DIR = '/home/ubuntu/'
DATA_HOME_DIR = ALL_DATA_DIR+'statefarm1/'
TRAIN_DIR = DATA_HOME_DIR+'train/'
VALID_DIR = DATA_HOME_DIR+'valid/'
SAMPLE_DIR = DATA_HOME_DIR+'sample/'
MODELS_DIR = DATA_HOME_DIR+'models/'
RESULTS_DIR = DATA_HOME_DIR+'results/'
TEST_DIR = DATA_HOME_DIR+'test/'
CACHE_DIR = DATA_HOME_DIR+'cache/'

# 1. Prepare Data

#### Identify and remove poor-quality training data

Previously identified data that is badly classified or multi-class:

In [5]:
%cd $DATA_HOME_DIR

/home/ubuntu/statefarm1


# 2. Reload our previous best Sequential Vgg16 Model 

In [6]:
#del vgg; 
vgg = Dave16()
model = vgg.model

In [7]:
# Index of the last Convolution2D layer; everything up to it is the conv block
last_conv_idx = [i for i,l in enumerate(model.layers) if type(l) is Convolution2D][-1]
conv_layers = model.layers[:last_conv_idx+1]
# Freeze the first 10 layers of the conv block; the remaining layers stay trainable
count_frozen = 0
for layer in conv_layers[:10]:
    layer.trainable = False
    count_frozen += 1
print(count_frozen,"layers are frozen")

10 layers are frozen


In [8]:
model = Sequential(conv_layers)

In [9]:
def add_bn_layers(p, model):
    """Append a dense head with BatchNormalization and Dropout to `model`.

    The final Dropout layer uses rate p and the two earlier ones use p/2.
    `model` is modified in place (no copy is made) and also returned.
    """
    new_model = model
    new_model.add(MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]))
    new_model.add(Flatten())
    new_model.add(Dropout(p/2))
    new_model.add(Dense(128, activation='relu'))
    # Optional: copy weights from a previously trained head `top`
    #new_model.layers[-1].set_weights(top.layers[3].get_weights())
    new_model.add(BatchNormalization())
    new_model.add(Dropout(p/2))
    new_model.add(Dense(128, activation='relu'))
    #new_model.layers[-1].set_weights(top.layers[6].get_weights())
    new_model.add(BatchNormalization())
    new_model.add(Dropout(p))
    new_model.add(Dense(10, activation='softmax'))  # one output per class c0..c9
    #new_model.layers[-1].set_weights(top.layers[9].get_weights())
    return new_model

In [10]:
model = add_bn_layers(0, model)

In [11]:
DATA_HOME_DIR = ALL_DATA_DIR+'statefarm/'
CACHE_DIR = DATA_HOME_DIR+'cache/'

In [12]:
#model = read_model(4, cross='vgg16final') # Error - vgg_mean is not global
model.load_weights(os.path.join(CACHE_DIR, 'model_weights4vgg16final.h5'))

In [13]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
lambda_1 (Lambda)                (None, 3, 224, 224)   0           lambda_input_1[0][0]             
____________________________________________________________________________________________________
zeropadding2d_1 (ZeroPadding2D)  (None, 3, 226, 226)   0           lambda_1[0][0]                   
                                                                   lambda_1[0][0]                   
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D)  (None, 64, 224, 224)  0           zeropadding2d_1[0][0]            
                                                                   zeropadding2d_1[1][0]            
____________________________________________________________________________________________________
...
maxpooling2d_6 (MaxPooling2D)    (None, 512, 7, 7)     0           convolution2d_13[1][0]           
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 25088)         0           maxpooling2d_6[0][0]             
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 25088)         0           flatten_2[0][0]                  
____________________________________________________________________________________________________
dense_4 (Dense)                  (None, 128)           3211392     dropout_3[0][0]                  
____________________________________________________________________________________________________
batchnormalization_3 (BatchNormal(None, 128)           256         dense_4[0][0]                    
____________________________________________________________________________________________________
...
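The parameter counts in the summary can be checked by hand. This sketch assumes only the shapes shown above: the final MaxPooling2D output is (512, 7, 7), and a Dense layer has one weight per input-output pair plus one bias per output. `dense_params` is a hypothetical helper for illustration.

```python
def dense_params(n_in, n_out):
    """Hypothetical helper: weights (n_in * n_out) plus biases (n_out) of a Dense layer."""
    return n_in * n_out + n_out

flat = 512 * 7 * 7                 # Flatten() of the (512, 7, 7) MaxPooling2D output
dense_4 = dense_params(flat, 128)  # first Dense(128) of the head
bn_3 = 2 * 128                     # one gamma (scale) and one beta (shift) per feature

print(flat)     # 25088
print(dense_4)  # 3211392
print(bn_3)     # 256
```

The summary appears to count only trainable parameters: the frozen `convolution2d_1` reports 0, and the BatchNormalization figure of 256 matches gamma and beta alone, not the running statistics.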

In [14]:
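# Intended to rescale weights now that dropout has been removed. Note that Keras
# Dropout layers hold no weights (get_weights() returns []), so this loop leaves
# the model unchanged despite the message; for weights trained in Keras, which
# uses inverted dropout, no rescaling is needed anyway.
for i, layer in enumerate(model.layers):
    if "dropout" in layer.name:
        layer.set_weights([o*0.5 for o in layer.get_weights()])
        print("layer#",i, layer.name, "weights halved")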


layer# 33 dropout_3 weights halved
layer# 36 dropout_4 weights halved
layer# 39 dropout_5 weights halved
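Whether weights need rescaling when dropout is removed depends on how dropout was applied during training. This numpy sketch (an illustration, not Keras code) compares the two conventions on a constant activation. With inverted dropout, which Keras implements, surviving activations are scaled by 1/(1-p) during training, so their mean already matches inference without dropout. With classic dropout the mean is (1-p) times smaller, so the weights of the layer after a removed Dropout would need multiplying by (1-p), i.e. halving for p = 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                      # dropout rate used during training
x = np.ones(100_000)         # constant activation, so expectations are easy to read

mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p

# Inverted dropout: survivors are scaled up by 1/(1-p) at training time,
# so the expected activation equals x, as at inference with dropout removed.
inverted = np.where(mask, x / (1 - p), 0.0)

# Classic dropout: no scaling at training time, so the next layer learns
# weights sized for an average input of (1-p) * x.
classic = np.where(mask, x, 0.0)

print(round(inverted.mean(), 2))  # close to 1.0
print(round(classic.mean(), 2))   # close to 0.5
```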


In [15]:
model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

# 3. Train the Model - without dropout

n.b. MixIterator was not used. Only test images with a predicted probability > 0.995 were used as pseudo-labeled data.
These are considered to be of such good quality that they can be mixed with the real data. The pseudo-labeled data will make up 43% of the training data at this stage (39% after the validation data is added). Yes, that's a little high, but let's see how it goes.
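Selecting pseudo-labels by confidence can be sketched as follows, assuming `test_probs` is an (N, 10) array of softmax outputs on the test images, like `preds` later in this notebook. `select_pseudo_labels` is a hypothetical helper; copying the selected images into the class folders under `train/` is not shown.

```python
import numpy as np

def select_pseudo_labels(test_probs, threshold=0.995):
    """Hypothetical helper: keep test rows whose top probability exceeds threshold.

    Returns the kept row indices and their predicted class (argmax) as pseudo-labels.
    """
    top_prob = test_probs.max(axis=1)
    keep = np.where(top_prob > threshold)[0]
    return keep, test_probs[keep].argmax(axis=1)

# Toy example: 3 test images, 10 classes
probs = np.zeros((3, 10))
probs[0, 0], probs[0, 1] = 0.998, 0.002   # confident c0 -> kept
probs[1, 3], probs[1, 4] = 0.600, 0.400   # uncertain    -> dropped
probs[2, 7], probs[2, 2] = 0.999, 0.001   # confident c7 -> kept

keep, labels = select_pseudo_labels(probs)
print(keep, labels)  # [0 2] [0 7]
```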

In [16]:
%cd $ALL_DATA_DIR'statefarm/'

/home/ubuntu/statefarm


In [17]:
gen = ImageDataGenerator()
val_generator = gen.flow_from_directory(
        'valid',
        target_size=(224, 224),
        batch_size=64,
        class_mode='categorical',
        shuffle=True)
val_generator.N

Found 3827 images belonging to 10 classes.


3827

In [18]:
dgen = ImageDataGenerator(  rotation_range=3,
                            width_shift_range=0.1,
                            height_shift_range=0.05,
                         )
tgenerator = dgen.flow_from_directory(
        'train',
        target_size=(224, 224),
        batch_size=64,
        class_mode='categorical',
        shuffle=True
)

Found 36648 images belonging to 10 classes.


In [19]:
model.optimizer.lr=0.00001
model.fit_generator(
        tgenerator,
        samples_per_epoch=tgenerator.N,
        nb_epoch=3,
)

Epoch 1/3
Epoch 2/3
Epoch 3/3


KeyboardInterrupt: 

In [20]:
save_model(model, 7, cross='vgg16final_c')

# Submit Results

In [21]:
def do_clip(arr, mx): return np.clip(arr, (1-mx)/9, mx)
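Clipping matters because the competition is scored with multi-class log loss (assumed here to renormalise each row before taking the log, as Kaggle's metric does). A confident but wrong prediction costs -ln(p) for a tiny p, and clipping bounds that cost. This sketch re-defines `do_clip` so it is self-contained; `multiclass_log_loss` is a hypothetical helper.

```python
import numpy as np

def do_clip(arr, mx):
    # Cap every probability at mx and floor it at (1 - mx) / 9,
    # i.e. the leftover mass split evenly across the other 9 classes.
    return np.clip(arr, (1 - mx) / 9, mx)

def multiclass_log_loss(y_true, probs):
    """Hypothetical helper: mean -log(probability of the true class), rows renormalised."""
    probs = probs / probs.sum(axis=1, keepdims=True)
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

# One confident but wrong prediction: almost all mass on c9, true class c0
wrong = np.full((1, 10), 1e-6)
wrong[0, 9] = 1 - 9e-6
y = np.array([0])

clipped = do_clip(wrong, 0.93)
print(round(clipped[0, 0], 6))                    # 0.007778, the floor seen in the submission
print(round(multiclass_log_loss(y, wrong), 2))    # 13.82
print(round(multiclass_log_loss(y, clipped), 2))  # 4.86
```

The price is paid on confident correct predictions, which now score -ln(0.93) ≈ 0.073 instead of nearly 0; `mx` sets the trade-off between the two cases.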

In [22]:
%cd $ALL_DATA_DIR'statefarm1/'

/home/ubuntu/statefarm1


In [23]:
test_generator = gen.flow_from_directory(
        'test',
        target_size=(224, 224),
        batch_size=1,
        class_mode='categorical',
        shuffle=False)
test_generator.N

Found 79726 images belonging to 1 classes.


79726

In [24]:
preds = model.predict_generator(test_generator, test_generator.N)
# Keras 1 signature: predict_generator(generator, val_samples, max_q_size=10, nb_worker=1)

In [25]:
subm = do_clip(preds,0.93)

In [26]:
subm_name = os.path.join(ALL_DATA_DIR,'statefarm1','results','submodel7c.gz')

In [27]:
batches = get_batches(os.path.join(ALL_DATA_DIR,'statefarm','train'), batch_size=64)

Found 36648 images belonging to 10 classes.


In [28]:
(val_classes, trn_classes, val_labels, trn_labels, 
    val_filenames, filenames, test_filenames) = get_classes(os.path.join(ALL_DATA_DIR,'statefarm1'))

Found 20996 images belonging to 10 classes.
Found 1410 images belonging to 10 classes.
Found 79726 images belonging to 1 classes.


In [29]:
classes = sorted(batches.class_indices, key=batches.class_indices.get)

In [30]:
submission = pd.DataFrame(subm, columns=classes)
submission.insert(0, 'img', [a[8:] for a in test_filenames])
submission.head()

| | img | c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | img_81601.jpg | 0.007778 | 0.022368 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.518309 | 0.017359 | 0.007778 | 0.436324 |
| 1 | img_14887.jpg | 0.010862 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.93 |
| 2 | img_62885.jpg | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.93 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 |
| 3 | img_45125.jpg | 0.007778 | 0.007778 | 0.052559 | 0.007778 | 0.007778 | 0.007778 | 0.05774 | 0.007778 | 0.882181 | 0.007778 |
| 4 | img_22633.jpg | 0.013276 | 0.082433 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.882602 | 0.016254 |
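The `a[8:]` slice above strips a fixed-width directory prefix from each test path. This sketch assumes the paths look like `unknown/img_81601.jpg` (the single test subdirectory name `unknown/` is an assumption, chosen because it is 8 characters) and shows `os.path.basename` as an alternative that does not depend on the prefix length.

```python
import os

# Assumed format of the test filenames: "<subdir>/img_xxx.jpg"
test_filenames = ["unknown/img_81601.jpg", "unknown/img_14887.jpg"]

sliced = [a[8:] for a in test_filenames]               # fixed-width slice, as in the cell above
based = [os.path.basename(a) for a in test_filenames]  # works for any subdirectory name

print(sliced)           # ['img_81601.jpg', 'img_14887.jpg']
print(sliced == based)  # True
```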


In [31]:
submission.to_csv(subm_name, index=False, compression='gzip')

In [None]:
FileLink(subm_name)

# Conclusion

In conclusion, the validation results look highly accurate, but do not be misled: the same subjects appear in both the training and validation datasets, albeit the versions in the training dataset are augmented, so validation accuracy overstates how well the model generalizes to unseen drivers.

Pose estimation is difficult when using transfer learning from VGG16 (or VGG19), because different poses were not distinct classes in the original VGG models, nor were hands, hands holding cups, hands holding a mobile phone down low, or hands holding a mobile phone up near the ear. 22k images is hardly sufficient training data for classifying very similar poses in 224x224 images of very different test subjects, especially when bounding boxes are not provided. The task is harder still because there are 10 classes: a uniform random guess is expected to be correct only about 10% of the time, so the gap to a 100% correct classification rate (c.c.r.) is much bigger than for a two-class classifier, where a random guess is already right 50% of the time.
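The gap argument can be made concrete. For K equally likely classes, a uniform random guess is right with probability 1/K, and, assuming the multi-class log loss used for scoring, predicting 1/K for every class scores ln K. `chance_baseline` is a hypothetical helper.

```python
import math

def chance_baseline(k):
    """Hypothetical helper: (accuracy of a uniform random guess, log loss of predicting 1/k everywhere)."""
    return 1 / k, math.log(k)

for k in (2, 10):
    acc, ll = chance_baseline(k)
    print(k, acc, round(ll, 4))
# 2 0.5 0.6931
# 10 0.1 2.3026
```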

The next logical step is to incorporate bounding boxes (or segmentation) for the hands, steering wheel, face, phone and make-up mirror, plus gaze direction, into a multi-label neural network model to improve performance. For example, a model that predicts bounding boxes for hands could be trained on an annotated hand dataset (e.g. http://www.robots.ox.ac.uk/~vgg/data/hands/) and then used to predict the hand bounding boxes in the images of the distracted-driver dataset. The same approach applies to the steering wheel, face, phone and make-up mirror, and to gaze direction, which can simply be two points forming a vector from the estimated centre of the eyeball to the centre of the pupil. Using the Keras functional API, the bounding-box outputs, with a linear (regression) activation, can be connected to the second-last layer of the model alongside the classification output.
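One reading of this design is a shared second-last layer feeding two heads: a softmax classifier and a linear bounding-box regressor, trained on a weighted sum of their losses. In Keras this would be a functional `Model` with two outputs, each with its own loss, weighted via `loss_weights` in `compile`. The numpy sketch below is not Keras code; it only illustrates the forward pass and combined loss. The sizes, the single (x, y, w, h) box and the loss weight of 0.001 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_classes, n_box = 2, 128, 10, 4  # assumed sizes; 4 = (x, y, w, h)

# Shared second-last layer activations, e.g. the output of a Dense(128) layer
features = rng.normal(size=(n_samples, n_features))

# Two heads branch from the same features
W_cls, b_cls = rng.normal(scale=0.01, size=(n_features, n_classes)), np.zeros(n_classes)
W_box, b_box = rng.normal(scale=0.01, size=(n_features, n_box)), np.zeros(n_box)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class_probs = softmax(features @ W_cls + b_cls)  # classification head: softmax over c0..c9
boxes = features @ W_box + b_box                 # regression head: linear activation

# Combined objective: cross-entropy on the class plus a weighted MSE on the box
y_cls = np.array([0, 9])
y_box = rng.uniform(size=(n_samples, n_box))
cross_entropy = -np.mean(np.log(class_probs[np.arange(n_samples), y_cls]))
box_mse = np.mean((boxes - y_box) ** 2)
box_weight = 0.001  # assumed weighting between the two losses
total_loss = cross_entropy + box_weight * box_mse
print(class_probs.shape, boxes.shape, round(float(total_loss), 3))
```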