# Enter State Farm

In [1]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

In [1]:
%matplotlib inline
from __future__ import print_function, division
path = "data/state/"
#path = "data/state/sample/"
import imp
import utils
from utils import *
from IPython.display import FileLink

Using TensorFlow backend.


In [2]:
batch_size= 16

## Setup batches

In [3]:
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)

Found 20924 images belonging to 10 classes.
Found 1500 images belonging to 10 classes.


In [4]:
(val_classes, trn_classes, val_labels, trn_labels, 
    val_filenames, filenames, test_filenames) = get_classes(path)

Found 20924 images belonging to 10 classes.
Found 1500 images belonging to 10 classes.
Found 79726 images belonging to 1 classes.


Rather than using batches, we could load all the data into an array to save some processing time. (Most examples here still use batches, simply because that's how I happened to start out.)

In [None]:
trn = get_data(path+'train')
val = get_data(path+'valid')

In [None]:
save_array(path+'results/val.dat', val)
save_array(path+'results/trn.dat', trn)

In [None]:
val = load_array(path+'results/val.dat')
trn = load_array(path+'results/trn.dat')

## Re-run sample experiments on full dataset

Everything that worked on the sample (see statefarm-sample.ipynb) should work on the full dataset too, only better, because we now have more data. Let's see how they go: the models in this section are exact copies of the sample notebook models.

### Single conv layer

In [5]:
def conv1(batches):
    model = Sequential([
            BatchNormalization(axis=1, input_shape=(3,224,224)),
            Conv2D(filters=32,kernel_size=(3,3), activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Conv2D(filters=64,kernel_size=(3,3), activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Flatten(),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dense(10, activation='softmax')
        ])

    model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    #model.fit_generator(batches, epochs=2, validation_data=val_batches)
    model.fit_generator(batches, steps_per_epoch=batches.n // batch_size, epochs=2, validation_data=val_batches, 
                    validation_steps=val_batches.n // batch_size)
    model.optimizer.lr = 0.001
    model.fit_generator(batches, steps_per_epoch=batches.n // batch_size, epochs=4, validation_data=val_batches, 
                    validation_steps=val_batches.n // batch_size)
    return model

In [6]:
batch_size

64

In [7]:
batches.n //batch_size

326

In [8]:
len(batches)

327

In [9]:
batches.n

20924
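The three values above are related by integer arithmetic: `batches.n // batch_size` rounds down and ignores the final partial batch, while `len(batches)` rounds up and counts it. A minimal sketch in plain Python, using the numbers printed above (the helper names are illustrative and not part of Keras or `utils`):

```python
import math

def full_batches(n_samples, batch_size):
    """Number of complete batches (what n // batch_size gives)."""
    return n_samples // batch_size

def all_batches(n_samples, batch_size):
    """Number of batches including a final partial one."""
    return math.ceil(n_samples / batch_size)

n, bs = 20924, 64
print(full_batches(n, bs))           # 326
print(all_batches(n, bs))            # 327
print(n - full_batches(n, bs) * bs)  # 60 images left over in the final partial batch
```

Which of the two to pass as `steps_per_epoch` is a judgment call; when the number of outputs has to match the number of labels exactly, as in feature pre-computation, the rounded-up count is the one that covers every image.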

In [10]:
model = conv1(batches)

Epoch 1/2
Epoch 2/2
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Interestingly, even with no regularization or augmentation, our simple convolutional model gives reasonable results. With augmentation, we can hope to see some very good ones.

### Data augmentation

In [11]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)

Found 20924 images belonging to 10 classes.
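To get a feel for the shift parameters, the sketch below converts them to pixels. It assumes, as described in the Keras `ImageDataGenerator` documentation, that float shift ranges below 1 are fractions of the image dimension, and that images are 224x224 as in `input_shape=(3,224,224)`. The helper name is hypothetical.

```python
def max_shift_pixels(fraction, size):
    """Largest shift in pixels for a fractional shift range."""
    return fraction * size

side = 224  # image width and height used by the models in this notebook

print(max_shift_pixels(0.1, side))   # width_shift_range=0.1
print(max_shift_pixels(0.05, side))  # height_shift_range=0.05
```

So the horizontal shift is at most roughly 22 pixels and the vertical shift at most roughly 11 pixels, which is mild compared to the image size.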


In [12]:
model = conv1(batches)

Epoch 1/2
Epoch 2/2
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


In [None]:
1e-4

In [13]:
# Continue training the previous model with the learning rate set to 0.0001
model.optimizer.lr = 0.0001
# model.fit_generator(batches, steps_per_epoch=batches.n // batch_size, epochs=15, validation_data=val_batches, 
#                     validation_steps=val_batches.n // batch_size)
model.fit_generator(batches, epochs=15, validation_data=val_batches)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x2af7ffe4588>

I'm shocked by *how* good these results are! We're regularly seeing 75-80% accuracy on the validation set, which puts us into the top third or better of the competition. With such a simple model and no dropout or semi-supervised learning, this really speaks to the power of data augmentation.

### Four conv/pooling pairs + dropout (could not finish on this GPU)

Unfortunately, the results are still very unstable - the validation accuracy jumps from epoch to epoch. Perhaps a deeper model with some dropout would help.

Because my GPU is too weak, training in this section could not be completed.

In [50]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)

Found 20924 images belonging to 10 classes.


In [51]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Conv2D(32,3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Conv2D(64,3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Conv2D(128,3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(10, activation='softmax')
    ])

In [52]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_15 (Batc (None, 3, 224, 224)       12        
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 32, 222, 222)      896       
_________________________________________________________________
batch_normalization_16 (Batc (None, 32, 222, 222)      128       
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 32, 111, 111)      0         
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 64, 109, 109)      18496     
_________________________________________________________________
batch_normalization_17 (Batc (None, 64, 109, 109)      256       
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 64, 54, 54)        0         
__________

In [17]:
10e-5

0.0001

In [18]:
10e-4

0.001
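The two checks above are worth spelling out, because `10e-5` is easy to misread as `1e-5`. In Python's float notation `10e-5` means 10 × 10⁻⁵, which is the same value as `1e-4`. A short check in plain Python:

```python
lr_a = 10e-5  # 10 * 10**-5
lr_b = 1e-4   # 1 * 10**-4

print(lr_a == lr_b)   # True: both literals denote 0.0001
print(lr_a == 1e-5)   # False: 1e-5 is ten times smaller
print(10e-4 == 1e-3)  # True: both literals denote 0.001
```

Writing learning rates with a leading `1` (`1e-4`, `1e-3`) avoids the ambiguity.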

In [19]:
model.compile(Adam(lr=10e-5), loss='categorical_crossentropy', metrics=['accuracy'])

In [20]:
model.fit_generator(batches, steps_per_epoch=batches.n // batch_size, epochs=2, validation_data=val_batches, 
                    validation_steps=val_batches.n // batch_size)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x2af7e6cdfd0>

In [21]:
model.optimizer.lr=0.001 # raise the learning rate

In [22]:
model.fit_generator(batches, steps_per_epoch=batches.n // batch_size, epochs=10, validation_data=val_batches, 
                    validation_steps=val_batches.n // batch_size)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2af7e6cd940>

In [24]:
model.optimizer.lr=0.00001 # lower the learning rate

In [25]:
model.fit_generator(batches, steps_per_epoch=batches.n // batch_size, epochs=10, validation_data=val_batches, 
                    validation_steps=val_batches.n // batch_size)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2af7e6cd630>

This is looking quite a bit better: the accuracy is similar, but the stability is higher. There's still some way to go, however...

### Imagenet conv features (raises an error; could not be run here for now)

Since we have so little data, and it is similar to imagenet images (full color photos), using pre-trained VGG weights is likely to be helpful. In fact, it seems likely that we won't need to fine-tune the convolutional layer weights much, if at all. So we can pre-compute the output of the last convolutional layer, as we did in lesson 3 when we experimented with dropout. (However, this means that we can't use full data augmentation, since we can't pre-compute something that changes each image on every pass.)
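The pre-computation idea can be illustrated with a toy sketch in plain Python. The functions `frozen_conv` and `head` are hypothetical stand-ins for the frozen VGG convolutional stack and the small trainable dense part; no real model is involved.

```python
calls = {"conv": 0}

def frozen_conv(image):
    """Stand-in for the frozen convolutional stack: expensive, never updated."""
    calls["conv"] += 1
    return [2 * px for px in image]

def head(features, weight):
    """Stand-in for the small trainable dense part."""
    return weight * sum(features)

images = [[1, 2], [3, 4], [5, 6]]

# Pre-compute the conv output once per image...
cached = [frozen_conv(img) for img in images]

# ...then run as many "epochs" over the head as we like from the cache.
for epoch in range(10):
    weight = 0.1 * (epoch + 1)
    outputs = [head(f, weight) for f in cached]

print(calls["conv"])  # 3: one call per image, regardless of the number of epochs
```

The cache is only valid for the exact inputs it was computed from. A generator that applies a fresh random transform on every pass produces new inputs each time, which is why full augmentation and pre-computation do not combine directly.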

In [2]:
vgg = Vgg16()

In [3]:
model = vgg.model

In [28]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lambda_1 (Lambda)            (None, 3, 224, 224)       0         
_________________________________________________________________
zero_padding2d_1 (ZeroPaddin (None, 3, 226, 226)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 64, 224, 224)      1792      
_________________________________________________________________
zero_padding2d_2 (ZeroPaddin (None, 64, 226, 226)      0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 64, 224, 224)      36928     
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 64, 112, 112)      0         
_________________________________________________________________
zero_padding2d_3 (ZeroPaddin (None, 64, 114, 114)      0         
__________

In [55]:
# Find the last convolutional layer

for i,layer in enumerate(model.layers):
    if type(layer) is Conv2D:
        print(i,type(layer))

2 <class 'keras.layers.convolutional.Conv2D'>
4 <class 'keras.layers.convolutional.Conv2D'>
7 <class 'keras.layers.convolutional.Conv2D'>
9 <class 'keras.layers.convolutional.Conv2D'>
12 <class 'keras.layers.convolutional.Conv2D'>
14 <class 'keras.layers.convolutional.Conv2D'>
16 <class 'keras.layers.convolutional.Conv2D'>
19 <class 'keras.layers.convolutional.Conv2D'>
21 <class 'keras.layers.convolutional.Conv2D'>
23 <class 'keras.layers.convolutional.Conv2D'>
26 <class 'keras.layers.convolutional.Conv2D'>
28 <class 'keras.layers.convolutional.Conv2D'>
30 <class 'keras.layers.convolutional.Conv2D'>


In [30]:
[i for i,layer in enumerate(model.layers) if type(layer) is Conv2D]

[2, 4, 7, 9, 12, 14, 16, 19, 21, 23, 26, 28, 30]

In [4]:
last_conv_idx = [i for i,layer in enumerate(model.layers) if type(layer) is Conv2D][-1]

In [32]:
last_conv_idx

30

In [5]:
conv_layers = model.layers[:last_conv_idx+1]

In [6]:
# Build a Sequential model from the conv layers
conv_model = Sequential(conv_layers)

In [7]:
batch_size = 16

In [8]:
# shuffle must be set to False when pre-computing features, so that they stay aligned with the labels
batches = get_batches(path+'train', batch_size=batch_size, shuffle=False)

Found 20924 images belonging to 10 classes.


In [16]:
(val_classes, trn_classes, val_labels, trn_labels, 
    val_filenames, filenames, test_filenames) = get_classes(path)

Found 20924 images belonging to 10 classes.
Found 1500 images belonging to 10 classes.
Found 79726 images belonging to 1 classes.


In [9]:
conv_feat = conv_model.predict_generator(batches, steps=len(batches), verbose=1) # training set; len(batches) includes the final partial batch



InternalError: Dst tensor is not initialized.
	 [[Node: conv2d_13_1/Relu/_195 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_162_conv2d_13_1/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

In [38]:

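# test_batches is not defined earlier in this notebook; assumed to follow the same get_batches pattern
test_batches = get_batches(path+'test', batch_size=batch_size*2, shuffle=False)
conv_val_feat = conv_model.predict_generator(val_batches, steps=len(val_batches)) # validation set
conv_test_feat = conv_model.predict_generator(test_batches, steps=len(test_batches)) # test set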

InternalError: Dst tensor is not initialized.
	 [[Node: conv2d_20_1/Relu/_1149 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_162_conv2d_20_1/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

In [None]:
save_array(path+'results/conv_val_feat.dat', conv_val_feat)
save_array(path+'results/conv_test_feat.dat', conv_test_feat)
save_array(path+'results/conv_feat.dat', conv_feat)

In [None]:
conv_feat = load_array(path+'results/conv_feat.dat')
conv_val_feat = load_array(path+'results/conv_val_feat.dat')
conv_val_feat.shape

### Batchnorm dense layers on pretrained conv layers (has problems)

Since we've pre-computed the output of the last convolutional layer, we need to create a network that takes that as input, and predicts our 10 classes. Let's try using a simplified version of VGG's dense layers.

In [None]:
def get_bn_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dropout(p/2),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(p/2),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(10, activation='softmax')
        ]

In [None]:
p=0.8

In [None]:
bn_model = Sequential(get_bn_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, epochs=1, 
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.optimizer.lr=0.01

In [None]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, epochs=2, 
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.save_weights(path+'models/conv8.h5')

Looking good! Let's try pre-computing 5 epochs worth of augmented data, so we can experiment with combining dropout and augmentation on the pre-trained model.

### Pre-computed data augmentation + dropout

We'll use our usual data augmentation parameters:

In [3]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
da_batches = get_batches(path+'train', gen_t, batch_size=batch_size, shuffle=False)

Found 20924 images belonging to 10 classes.


We use those to create a dataset of convolutional features 5x bigger than the training set.

In [4]:
da_conv_feat = conv_model.predict_generator(da_batches, steps=len(da_batches)*5) # 5 full passes over the augmented training set

NameError: name 'conv_model' is not defined

In [None]:
save_array(path+'results/da_conv_feat2.dat', da_conv_feat)

In [None]:
da_conv_feat = load_array(path+'results/da_conv_feat2.dat')

Let's include the real training data as well in its non-augmented form.

In [None]:
da_conv_feat = np.concatenate([da_conv_feat, conv_feat])

Since we've now got a dataset 6x bigger than before, we'll need to copy our labels 6 times too.

In [None]:
da_trn_labels = np.concatenate([trn_labels]*6)
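Repeating the labels six times only works because the generator was created with `shuffle=False`: every pass yields the images in the same order, so features and labels stay aligned. A toy sketch in plain Python (the strings and the helper `ordered_passes` are hypothetical stand-ins for real features and the generator):

```python
def ordered_passes(samples, n_passes):
    """Simulate reading an unshuffled generator for several full passes."""
    out = []
    for _ in range(n_passes):
        out.extend(samples)
    return out

labels = ["c0", "c1", "c2", "c1"]  # toy labels, in directory order

feats_aug = ordered_passes([f"aug({l})" for l in labels], 5)
feats_orig = [f"orig({l})" for l in labels]

all_feats = feats_aug + feats_orig  # 5 augmented copies + 1 original
all_labels = labels * 6             # same order, repeated 6 times

# Every feature row still sits next to the label it was derived from.
assert all(l in f for f, l in zip(all_feats, all_labels))
print(len(all_feats), len(all_labels))
```

With a shuffled generator the order would differ on every pass, and the repeated label array would no longer match the features.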

Based on some experiments, the previous architecture works well with bigger dense layers.

In [None]:
def get_bn_da_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dropout(p),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(10, activation='softmax')
        ]

In [None]:
p=0.8

In [None]:
bn_model = Sequential(get_bn_da_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

Now we can train the model as usual, with pre-computed augmented data.

In [None]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, epochs=1, 
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.optimizer.lr=0.01

In [None]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, epochs=4, 
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.optimizer.lr=0.0001

In [None]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, epochs=4, 
             validation_data=(conv_val_feat, val_labels))

Looks good - let's save those weights.

In [None]:
bn_model.save_weights(path+'models/da_conv8_1.h5')

### Pseudo labeling

We're going to try using a combination of [pseudo labeling](http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf) and [knowledge distillation](https://arxiv.org/abs/1503.02531) to allow us to use unlabeled data (i.e. do semi-supervised learning). For our initial experiment we'll use the validation set as the unlabeled data, so that we can see that it is working without using the test set. At a later date we'll try using the test set.

To do this, we simply calculate the predictions of our model...

In [None]:
val_pseudo = bn_model.predict(conv_val_feat, batch_size=batch_size)

...concatenate them with our training labels...

In [None]:
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])

In [None]:
comb_feat = np.concatenate([da_conv_feat, conv_val_feat])

...and fine-tune our model using that data.

In [None]:
bn_model.load_weights(path+'models/da_conv8_1.h5')

In [None]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, epochs=1, 
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, epochs=4, 
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.optimizer.lr=0.00001

In [None]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, epochs=4, 
             validation_data=(conv_val_feat, val_labels))

That's a distinct improvement, even though the validation set isn't very big. This looks encouraging for when we try this on the test set.

In [None]:
bn_model.save_weights(path+'models/bn-ps8.h5')
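As a recap of the pseudo-labeling steps in this section, here is a toy sketch in plain Python of what the combined target array looks like: hard one-hot rows for the labeled data, followed by the model's own soft predictions for the unlabeled rows. The numbers and the `one_hot` helper are made up for illustration.

```python
def one_hot(index, n_classes):
    """One-hot target row for a known class index."""
    return [1.0 if i == index else 0.0 for i in range(n_classes)]

# Labeled training targets (hard, one-hot).
train_targets = [one_hot(0, 3), one_hot(2, 3)]

# Model predictions on unlabeled rows (soft probabilities), used as-is as targets.
pseudo_targets = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]

combined_targets = train_targets + pseudo_targets
for row in combined_targets:
    print(row, sum(row))
```

Because the pseudo targets keep the full probability vector rather than just the top class, they also carry the model's uncertainty, which is the knowledge-distillation part of the approach.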

### Submit

We'll find a good clipping amount using the validation set, prior to submitting.

In [None]:
def do_clip(arr, mx): return np.clip(arr, (1-mx)/9, mx)
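To see why clipping can help a log-loss score, the sketch below compares the per-sample loss with and without clipping, in plain Python. It mirrors the bounds used by `do_clip` (upper bound `mx`, lower bound `(1-mx)/9` for 10 classes); the function names here are illustrative.

```python
import math

def clip(p, mx, n_classes=10):
    """Clip one probability to [(1 - mx) / (n_classes - 1), mx]."""
    lo = (1 - mx) / (n_classes - 1)
    return min(max(p, lo), mx)

def log_loss_true_class(p_true):
    """Log loss contribution of one sample, given the probability of its true class."""
    return -math.log(p_true)

# A confidently wrong prediction: almost zero probability on the true class.
wrong_unclipped = log_loss_true_class(1e-15)
wrong_clipped = log_loss_true_class(clip(1e-15, 0.93))

# A confidently right prediction pays a small price for clipping.
right_unclipped = log_loss_true_class(0.999)
right_clipped = log_loss_true_class(clip(0.999, 0.93))

print(wrong_unclipped, wrong_clipped)
print(right_unclipped, right_clipped)
```

Clipping trades a small, constant cost on correct predictions for a much smaller penalty on confident mistakes, so the best `mx` depends on how often the model is confidently wrong. That is why it is tuned on the validation set.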

In [None]:
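val_preds = bn_model.predict(conv_val_feat, batch_size=batch_size*2) # assumed definition: val_preds is not set anywhere earlier in this notebook
keras.metrics.categorical_crossentropy(val_labels, do_clip(val_preds, 0.93)).eval()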

In [None]:
conv_test_feat = load_array(path+'results/conv_test_feat.dat')

In [None]:
preds = bn_model.predict(conv_test_feat, batch_size=batch_size*2)

In [None]:
subm = do_clip(preds,0.93)

In [None]:
subm_name = path+'results/subm.gz'

In [None]:
classes = sorted(batches.class_indices, key=batches.class_indices.get)

In [None]:
submission = pd.DataFrame(subm, columns=classes)
submission.insert(0, 'img', [a[4:] for a in test_filenames])
submission.head()

In [None]:
submission.to_csv(subm_name, index=False, compression='gzip')

In [None]:
FileLink(subm_name)

This gets 0.534 on the leaderboard.

## The "things that didn't really work" section

You can safely ignore everything from here on, because none of it really helped.

### Finetune some conv layers too

In [None]:
for l in get_bn_layers(p): conv_model.add(l)

In [None]:
for l1,l2 in zip(bn_model.layers, conv_model.layers[last_conv_idx+1:]):
    l2.set_weights(l1.get_weights())

In [None]:
for l in conv_model.layers: l.trainable =False

In [None]:
for l in conv_model.layers[last_conv_idx+1:]: l.trainable =True

In [None]:
comb = np.concatenate([trn, val])

In [None]:
gen_t = image.ImageDataGenerator(rotation_range=8, height_shift_range=0.04, 
                shear_range=0.03, channel_shift_range=10, width_shift_range=0.08)

In [None]:
batches = gen_t.flow(comb, comb_pseudo, batch_size=batch_size)

In [None]:
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)

In [None]:
conv_model.compile(Adam(lr=0.00001), loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
conv_model.fit_generator(batches, steps_per_epoch=len(batches), epochs=1, validation_data=val_batches, 
                 validation_steps=len(val_batches))

In [None]:
conv_model.optimizer.lr = 0.0001

In [None]:
conv_model.fit_generator(batches, steps_per_epoch=len(batches), epochs=3, validation_data=val_batches, 
                 validation_steps=len(val_batches))

In [None]:
for l in conv_model.layers[16:]: l.trainable =True

In [None]:
conv_model.optimizer.lr = 0.00001

In [None]:
conv_model.fit_generator(batches, steps_per_epoch=len(batches), epochs=8, validation_data=val_batches, 
                 validation_steps=len(val_batches))

In [None]:
conv_model.save_weights(path+'models/conv8_ps.h5')

In [None]:
conv_model.load_weights(path+'models/conv8_da.h5')

In [None]:
val_pseudo = conv_model.predict(val, batch_size=batch_size*2)

In [None]:
save_array(path+'models/pseudo8_da.dat', val_pseudo)

### Ensembling

In [None]:
drivers_ds = pd.read_csv(path+'driver_imgs_list.csv')
drivers_ds.head()

In [None]:
img2driver = drivers_ds.set_index('img')['subject'].to_dict()

In [None]:
driver2imgs = {k: g["img"].tolist() 
               for k,g in drivers_ds[['subject', 'img']].groupby("subject")}

In [None]:
def get_idx(driver_list):
    return [i for i,f in enumerate(filenames) if img2driver[f[3:]] in driver_list]

In [None]:
drivers = list(driver2imgs.keys()) # a list, so np.random.permutation accepts it under Python 3

In [None]:
rnd_drivers = np.random.permutation(drivers)

In [None]:
ds1 = rnd_drivers[:len(rnd_drivers)//2]
ds2 = rnd_drivers[len(rnd_drivers)//2:]
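Splitting by driver rather than by image is presumably meant to keep all images of one driver on the same side of the split; that motivation is an assumption, since the notebook does not state it. A minimal sketch in plain Python, with a hypothetical helper and toy file names:

```python
import random

def split_by_group(items, group_of, seed=0):
    """Split items into two parts so that no group appears in both."""
    groups = sorted({group_of[item] for item in items})
    rng = random.Random(seed)
    rng.shuffle(groups)
    half = len(groups) // 2
    first, second = set(groups[:half]), set(groups[half:])
    part1 = [it for it in items if group_of[it] in first]
    part2 = [it for it in items if group_of[it] in second]
    return part1, part2

# Toy mapping from image file name to driver id.
img2driver_toy = {"img_1.jpg": "p002", "img_2.jpg": "p002",
                  "img_3.jpg": "p012", "img_4.jpg": "p014"}

part_a, part_b = split_by_group(list(img2driver_toy), img2driver_toy)
print(part_a, part_b)
```

Whatever the random order, the two image lists never share a driver.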

In [None]:
models=[fit_conv([d]) for d in drivers]
models=[m for m in models if m is not None]

In [None]:
all_preds = np.stack([m.predict(conv_test_feat, batch_size=128) for m in models])
avg_preds = all_preds.mean(axis=0)
avg_preds = avg_preds/np.expand_dims(avg_preds.sum(axis=1), 1)
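The averaging step can be written out in plain Python to make the array shapes explicit: predictions are indexed by model, then sample, then class; the mean is taken over models, and each row is rescaled to sum to 1. The function name and numbers are illustrative.

```python
def average_predictions(all_preds):
    """all_preds[m][i][c]: probability from model m for sample i and class c."""
    n_models = len(all_preds)
    n_samples = len(all_preds[0])
    n_classes = len(all_preds[0][0])
    avg = [[sum(all_preds[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)]
           for i in range(n_samples)]
    # Renormalise each row so that it sums to 1.
    return [[p / sum(row) for p in row] for row in avg]

toy_preds = [
    [[0.6, 0.4], [0.2, 0.8]],  # model 1
    [[0.8, 0.2], [0.4, 0.6]],  # model 2
]
ensemble = average_predictions(toy_preds)
print(ensemble)
```

When every model already outputs rows that sum to 1, the renormalisation changes nothing beyond floating-point noise; it matters only if the inputs were clipped or otherwise altered beforehand.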

In [None]:
keras.metrics.categorical_crossentropy(val_labels, np.clip(avg_val_preds,0.01,0.99)).eval()

In [None]:
keras.metrics.categorical_accuracy(val_labels, np.clip(avg_val_preds,0.01,0.99)).eval()