# Lesson 2: Linear models with CNN features

## Linear model in Keras

* Dense() layers are just linear models, followed by a simple "activation function".
* Example linear model:

In [68]:
import os
import numpy as np

from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD, RMSprop
from keras.preprocessing.image import ImageDataGenerator

from sklearn.preprocessing import OneHotEncoder

import utils; reload(utils)
from utils import plots, get_batches

In [28]:
x = np.random.random((60, 2))
y = np.dot(x, [3., 7.]) + 1

In [29]:
x[:5]

array([[ 0.5086,  0.1983],
       [ 0.2424,  0.9294],
       [ 0.5754,  0.0879],
       [ 0.091 ,  0.6535],
       [ 0.6799,  0.1307]])

In [30]:
y[:5]

array([ 3.9137,  8.2332,  3.3416,  5.8474,  3.9543])

Can create a simple linear model (a Dense() layer with no activation function) and optimise it using stochastic gradient descent (SGD), minimising mean squared error (mse):

In [31]:
lm = Sequential([Dense(1, input_shape=(2,))])
lm.compile(optimizer=SGD(lr=0.1), loss='mse')

In [32]:
lm.evaluate(x, y, verbose=0)

45.158372751871745

In [33]:
lm.fit(x, y, nb_epoch=5, batch_size=1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f017e241890>
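Roughly what `lm.fit` is doing here can be sketched in plain NumPy: per-sample SGD (batch size 1) on mean squared error, for the same `y = 3*x0 + 7*x1 + 1` data. This is an illustration, not Keras's implementation. It assumes zero-initialised weights and a reshuffle every epoch (Keras initialises weights differently), so the numbers won't match the run above exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((60, 2))
y = x @ np.array([3.0, 7.0]) + 1.0

w = np.zeros(2)  # weights, assumed zero-initialised for this sketch
b = 0.0          # bias (intercept)
lr = 0.1         # same learning rate as SGD(lr=0.1) above

def mse(w, b):
    """Mean squared error of the linear model over the whole dataset."""
    return np.mean((x @ w + b - y) ** 2)

print("before:", mse(w, b))
for epoch in range(5):                    # nb_epoch=5
    for i in rng.permutation(len(x)):     # batch_size=1, shuffled each epoch
        err = x[i] @ w + b - y[i]         # prediction error for one sample
        # Gradient of (pred - y)**2 is 2*err*x for w and 2*err for b.
        w -= lr * 2 * err * x[i]
        b -= lr * 2 * err
print("after:", mse(w, b), w, b)
```

Because the targets have no noise, every sample pulls the weights toward the same solution, so the loss drops towards zero as in the Keras run.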

In [34]:
lm.evaluate(x, y, verbose=0)

0.002923798778404792

In [35]:
# Note that the weights are close to the true coefficients (3, 7), and the bias is close to the intercept of 1.
lm.get_weights()

[array([[ 2.8937],
        [ 6.842 ]], dtype=float32), array([ 1.1387], dtype=float32)]
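Since `y` was generated with no noise, the exact answer is also available in closed form. As a sanity check (not part of the original notebook), ordinary least squares via NumPy recovers the weights once a column of ones is appended to `x` for the intercept:

```python
import numpy as np

x = np.random.random((60, 2))
y = np.dot(x, [3., 7.]) + 1

# Append a column of ones so the last coefficient plays the role of the bias.
X = np.hstack([x, np.ones((len(x), 1))])
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # approximately [3, 7, 1]
```

SGD only approaches this solution iteratively, which is why the trained weights above are close to, but not exactly, (3, 7) and 1.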

Can use a Dense() layer to convert the 1000 ImageNet predictions into class probabilities (e.g. Dog vs Cat, or here female, male or none), by training a linear model that takes the 1000 predictions as input and returns the class as output.

In [36]:
path = "/home/ubuntu/nbs/data/male_female_training_set_20170601/"
model_path = path + 'models/'  # trailing slash, so model_path + 'train_data.bc' lands inside this directory
if not os.path.exists(model_path): os.mkdir(model_path)

In [37]:
batch_size = 64

In [38]:
from vgg16 import Vgg16
vgg = Vgg16()
model = vgg.model

Approach:

1. Get true labels for every image.
2. Get 1,000 ImageNet category predictions for each image.
3. Feed predictions as input to simple linear model.

In [39]:
val_batches = get_batches(path+'test', shuffle=False, batch_size=1)
batches = get_batches(path+'train', shuffle=False, batch_size=1)

Found 668 images belonging to 3 classes.
Found 6450 images belonging to 3 classes.


Can save the preprocessed arrays using bcolz, which also compresses the arrays.

In [40]:
import bcolz

def save_array(fname, arr):
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()
    
def load_array(fname):
    return bcolz.open(fname)[:]
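If bcolz isn't installed, a similar pair of helpers can be sketched with NumPy's own `np.save`/`np.load`. This is a swapped-in alternative, not what the notebook uses: `.npy` files are uncompressed (`np.savez_compressed` would add compression), and the `_npy` helper names are hypothetical.

```python
import os
import tempfile
import numpy as np

def save_array_npy(fname, arr):
    # Hypothetical stand-in for save_array: writes one uncompressed .npy file.
    np.save(fname, arr)

def load_array_npy(fname):
    # Loads the whole array back into memory, like bcolz.open(fname)[:].
    return np.load(fname)

arr = np.random.random((4, 3)).astype(np.float32)
fname = os.path.join(tempfile.mkdtemp(), 'train_data.npy')
save_array_npy(fname, arr)
restored = load_array_npy(fname)
print(restored.shape, restored.dtype)  # (4, 3) float32
```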

In [41]:
def get_data(path, target_size=(224, 224)):
    batches = get_batches(path, shuffle=False, batch_size=1, class_mode=None, target_size=target_size)
    return np.concatenate([batches.next() for i in range(batches.nb_sample)])

In [42]:
val_data = get_data(path+'test')
trn_data = get_data(path+'train')

Found 668 images belonging to 3 classes.
Found 6450 images belonging to 3 classes.


In [43]:
trn_data.shape

(6450, 3, 224, 224)
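The shape is channels-first: (images, colour channels, height, width). A back-of-envelope size check shows why caching these arrays to disk is worthwhile; it assumes each pixel value is stored as float32 (Keras's default float type).

```python
import numpy as np

n_images, channels, height, width = 6450, 3, 224, 224  # trn_data.shape above
bytes_per_value = np.dtype(np.float32).itemsize        # 4, assuming float32 pixels

n_values = n_images * channels * height * width
n_bytes = n_values * bytes_per_value
print(n_values)                   # 970905600
print(round(n_bytes / 2**30, 2))  # 3.62 (GiB)
```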

In [44]:
save_array(model_path+ 'train_data.bc', trn_data)
save_array(model_path + 'valid_data.bc', val_data)

Can now load the training and validation data without having to recalculate.

In [45]:
trn_data = load_array(model_path+'train_data.bc')
val_data = load_array(model_path+'valid_data.bc')

Need to convert the classes to one-hot encoding, since Keras returns them as a single column of integer class indices.

In [46]:
val_batches.classes[:5]

array([0, 0, 0, 0, 0], dtype=int32)

In [47]:
def onehot(x):
    return np.array(OneHotEncoder().fit_transform(x.reshape(-1,1)).todense())

val_labels = onehot(val_batches.classes)
trn_labels = onehot(batches.classes)
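For integer class indices, the same one-hot encoding can be sketched without scikit-learn by indexing rows of an identity matrix. `onehot_np` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def onehot_np(classes, num_classes=None):
    # Row i of the identity matrix is the one-hot vector for class i.
    classes = np.asarray(classes)
    if num_classes is None:
        num_classes = classes.max() + 1
    return np.eye(num_classes)[classes]

labels = onehot_np(np.array([0, 0, 1, 2]))
print(labels)
```

Passing `num_classes` explicitly guarantees the training and validation labels get the same number of columns even if a class is missing from one split. `OneHotEncoder` fitted separately on each split, as above, infers its categories from whichever data it is fitted on.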

In [48]:
val_labels[:2]

array([[ 1.,  0.,  0.],
       [ 1.,  0.,  0.]])

Grab the 1000 ImageNet category probabilities that VGG16 predicts for each image. These become the input features for the linear model (step 2 of the approach above).

In [49]:
trn_features = model.predict(trn_data, batch_size=batch_size)
val_features = model.predict(val_data, batch_size=batch_size)

In [50]:
trn_features.shape

(6450, 1000)

In [51]:
save_array(model_path+ 'train_lastlayer_features.bc', trn_features)
save_array(model_path + 'valid_lastlayer_features.bc', val_features)

In [52]:
trn_features = load_array(model_path+'train_lastlayer_features.bc')
val_features = load_array(model_path+'valid_lastlayer_features.bc')

Now define a simple linear model that takes the 1000 ImageNet probabilities as inputs and outputs 3 categories (female, male, none).

In [56]:
# 1000 inputs, since that's the size of the saved features, and 3 outputs, for female, male and none
lm = Sequential([ Dense(3, activation='softmax', input_shape=(1000,)) ])
lm.compile(optimizer=RMSprop(lr=0.1), loss='categorical_crossentropy', metrics=['accuracy'])

In [57]:
lm.fit(trn_features, trn_labels, nb_epoch=3, batch_size=4, 
       validation_data=(val_features, val_labels))

Train on 6450 samples, validate on 668 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f016ed0f590>

In [59]:
lm.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
dense_6 (Dense)                  (None, 3)             3003        dense_input_3[0][0]              
Total params: 3003
____________________________________________________________________________________________________
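The 3003 parameters in the summary are a 1000 x 3 weight matrix plus 3 biases. Below is a minimal NumPy sketch of what this `Dense(3, activation='softmax')` layer computes and how categorical cross-entropy trains it (i.e. multinomial logistic regression). Random probability vectors stand in for the real VGG16 features, random labels stand in for the real ones, and plain gradient descent stands in for RMSprop, so it is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_in, n_out = 200, 1000, 3

# Stand-in features: each row is a probability vector, like VGG16's softmax output.
feats = rng.random((n, n_in))
feats /= feats.sum(axis=1, keepdims=True)
labels = np.eye(n_out)[rng.integers(0, n_out, size=n)]  # random one-hot targets

W = np.zeros((n_in, n_out))
b = np.zeros(n_out)
print("params:", W.size + b.size)  # 1000*3 + 3 = 3003

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def crossentropy(p, t):
    # Mean over samples of -sum(target * log(prediction)).
    return -np.mean(np.sum(t * np.log(p), axis=1))

lr = 0.5
losses = []
for step in range(50):
    probs = softmax(feats @ W + b)
    losses.append(crossentropy(probs, labels))
    grad_z = (probs - labels) / n  # gradient of mean cross-entropy w.r.t. the logits
    W -= lr * feats.T @ grad_z
    b -= lr * grad_z.sum(axis=0)
print(losses[0], losses[-1])
```

With all-zero weights every class starts at probability 1/3, so the initial loss is log(3); training then lowers it.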


### About activation functions

* Adding an activation parameter to a layer means another function is applied to the layer's output after the linear part is calculated.
* Nearly all deep model layers have a non-linear activation function, such as ``tanh``, ``sigmoid (1 / (1 + exp(-x)))`` or ``relu (max(0, x))``.
  * This is because stacking linear layers just gives another linear layer, e.g. ```-2 * (2 * x) = -4 * x```.
  * But with an activation function in between, you could get ```-2 * max(0, 2 * x)```, which does not simplify to a linear function.
  * This lets you build arbitrarily complex functions.
* The last layer usually has a different activation function from the other layers, to ensure the output is in the appropriate form.
  * Since our labels are one-hot encoded, we want a single output to be much higher than the rest, so softmax is used.
  * Softmax is defined as ```exp(x[i]) / sum(exp(x))```, as shown below:
  
  ```
  output    exp      softmax
   4.51     91.05    0.99
  -2.11      0.12    0.00
  -0.57      0.57    0.01
  -4.06      0.02    0.00
  -3.85      0.02    0.00
  sum       91.78    1.00
  ```
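The points above can be checked numerically. This NumPy sketch (not from the original notebook) shows that two stacked linear functions collapse to one, that a ReLU in between breaks this, and reproduces the softmax column of the table from its outputs:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtracting the max doesn't change the result
    return e / e.sum()

xs = np.array([-1.0, 0.0, 1.0])
# Two stacked linear functions are still linear: this equals -4 * xs everywhere.
linear_stack = -2 * (2 * xs)
# With a ReLU in between, negative inputs map to 0, so it is no longer -4 * xs.
relu_stack = -2 * relu(2 * xs)

outputs = np.array([4.51, -2.11, -0.57, -4.06, -3.85])
probs = softmax(outputs)

print(sigmoid(0.0))  # 0.5
print(linear_stack, relu_stack)
print(probs.round(2))
```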

## Modifying the model

### Retrain the last layer's linear model.

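* In the original VGG16 network, the last layer is already a Dense layer (a linear model with a softmax activation) that outputs the 1000 ImageNet categories. In the example above, we simply added another linear layer on top of it, which is wasteful: our model only sees the 1000 ImageNet probabilities rather than the richer features that the final layer is computed from.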
* What we want to do instead is replace the final layer with one specific to our problem.
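* To do that, remove the last layer from the model, freeze all the remaining layers (set them as non-trainable) so their weights don't change, and add a new Dense layer with one output per class.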

In [60]:
vgg.model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
lambda_1 (Lambda)                (None, 3, 224, 224)   0           lambda_input_1[0][0]             
____________________________________________________________________________________________________
zeropadding2d_1 (ZeroPadding2D)  (None, 3, 226, 226)   0           lambda_1[0][0]                   
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D)  (None, 64, 224, 224)  1792        zeropadding2d_1[0][0]            
____________________________________________________________________________________________________
zeropadding2d_2 (ZeroPadding2D)  (None, 64, 226, 226)  0           convolution2d_1[0][0]            
___________________________________________________________________________________________

In [61]:
model.pop()
for layer in model.layers:
    layer.trainable = False

In [62]:
model.add(Dense(3, activation='softmax'))

Now compile the model and retrain.

In [63]:
model.compile(optimizer=RMSprop(lr=0.1), loss='categorical_crossentropy', metrics=['accuracy'])

In [69]:
gen = ImageDataGenerator()
batches = gen.flow(trn_data, trn_labels, batch_size=batch_size, shuffle=True)
val_batches = gen.flow(val_data, val_labels, batch_size=batch_size, shuffle=False)

In [71]:
model.fit_generator(batches, samples_per_epoch=batches.N, nb_epoch=1, 
                    validation_data=val_batches, nb_val_samples=val_batches.N)

Epoch 1/1


<keras.callbacks.History at 0x7f016cc65f10>
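With `samples_per_epoch=batches.N` and `batch_size=64`, one epoch works through the whole training set in ceil(N / 64) batches. A quick check of the batch counts, assuming the iterator's final batch of each pass holds only the leftover samples:

```python
import math

batch_size = 64
n_train, n_val = 6450, 668  # training and validation image counts found above

n_batches = math.ceil(n_train / batch_size)
last_batch = n_train - (n_batches - 1) * batch_size
n_val_batches = math.ceil(n_val / batch_size)
last_val_batch = n_val - (n_val_batches - 1) * batch_size

print(n_batches, last_batch)          # 101 50
print(n_val_batches, last_val_batch)  # 11 28
```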