This tutorial will guide you through the process of using _transfer learning_ to learn an accurate image classifier from a small number of training samples. Transfer learning refers to the process of taking an existing neural network which was previously trained to good performance on a larger dataset, and using it as the basis for a new model which leverages the previous model's accuracy for the new task. This procedure is also called "fine-tuning" because we take a previously trained neural network and fine-tune its weights on a new dataset.

How it works: we first load a previously trained neural net, Keras's built-in VGG16 model, which was trained to do ImageNet classification on 1000 classes. We remove its final layer, the 1000-neuron softmax classification layer, and replace it with a new classification layer sized for the number of classes we are training on. We then "freeze" all the weights in the network except the ones connecting to the new classification layer, and re-train the model on our new dataset. By keeping the earlier weights fixed, we get to use the features that were discovered in the previous model.

We will compare using this method to training a small neural network from scratch on the new dataset, and as we shall see, it will dramatically improve our accuracy.

#http://stackoverflow.com/questions/41378461/how-to-use-models-from-keras-applications-for-transfer-learnig/41386444#41386444

In [1]:
import os
os.environ["THEANO_FLAGS"] = "mode=FAST_RUN,device=gpu,floatX=float32"

import random
import numpy as np
import keras
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv2D, MaxPooling2D
from keras.models import Model

Using Theano backend.
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GTX TITAN X (CNMeM is disabled, cuDNN 5103)


    train, test, val split

In [2]:
root = '../data/101_ObjectCategories'
exclude = ['BACKGROUND_Google', 'Motorbikes', 'airplanes', 'Faces_easy', 'Faces']
train_split, val_split = 0.7, 0.15

categories = [x[0] for x in os.walk(root) if x[0]][1:]
categories = [c for c in categories 
              if c not in [os.path.join(root, e) for e in exclude]]

# helper function to load image and return it and input vector
def get_image(path):
    img = image.load_img(path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    return img, x

# load all the images from root folder
data = []
for c, category in enumerate(categories):
    images = [os.path.join(dp, f) for dp, dn, filenames 
              in os.walk(category) for f in filenames 
              if os.path.splitext(f)[1].lower() in ['.jpg','.png','.jpeg']]
    for img_path in images:
        img, x = get_image(img_path)
        data.append({'x':np.array(x[0]), 'y':c})

# count the number of classes
num_classes = len(categories)

# shuffle 
random.shuffle(data)

# create training / validation / test split (70%, 15%, 15%)
idx_val = int(train_split * len(data))
idx_test = int((train_split + val_split) * len(data))
train = data[:idx_val]
val = data[idx_val:idx_test]
test = data[idx_test:]

# separate data for labels
x_train, y_train = np.array([t["x"] for t in train]), [t["y"] for t in train]
x_val, y_val = np.array([t["x"] for t in val]), [t["y"] for t in val]
x_test, y_test = np.array([t["x"] for t in test]), [t["y"] for t in test]

# normalize data
x_train = x_train.astype('float32') / 255.
x_val = x_val.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

# convert labels to one-hot vectors
y_train = keras.utils.to_categorical(y_train, num_classes)
y_val = keras.utils.to_categorical(y_val, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# summary
print("finished loading %d images from %d categories"%(len(data), num_classes))
print("train / validation / test split: %d, %d, %d"%(len(x_train), len(x_val), len(x_test)))
print("training data shape: ", x_train.shape)
print("training labels shape: ", y_train.shape)


finished loading 6209 images from 97 categories
train / validation / test split: 4346, 931, 932
('training data shape: ', (4346, 224, 224, 3))
('training labels shape: ', (4346, 97))
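
The split sizes printed above follow directly from the index arithmetic in the loading cell. As a small worked example (the function name `split_sizes` is ours, not part of the notebook), the same calculation can be reproduced without loading any images:

```python
def split_sizes(n_samples, train_split=0.7, val_split=0.15):
    """Return (train, val, test) sizes using the same index arithmetic as the loading cell."""
    idx_val = int(train_split * n_samples)                  # end of the training slice
    idx_test = int((train_split + val_split) * n_samples)   # end of the validation slice
    return idx_val, idx_test - idx_val, n_samples - idx_test


print(split_sizes(6209))  # (4346, 931, 932)
```

Because `int()` truncates, the test set absorbs the rounding remainder, which is why it ends up one sample larger than the validation set here.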


Let's quickly look at a few sample images.
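
One possible way to do this, sketched under assumptions: the helper below (a hypothetical `make_image_strip`, not taken from the notebook) picks a few random images from an array shaped like `x_train` and joins them side by side. Since the arrays were mean-subtracted by `preprocess_input`, it rescales the pixels to the range [0, 1] purely for display, so colours will not match the original files exactly.

```python
import numpy as np


def make_image_strip(images, count=8, seed=0):
    """Pick `count` random images from an (N, H, W, C) array and join them horizontally."""
    rng = np.random.RandomState(seed)
    picks = rng.choice(len(images), size=min(count, len(images)), replace=False)
    strip = np.concatenate([images[i] for i in picks], axis=1)
    # Rescale to [0, 1] so that preprocessed (mean-subtracted) pixels can be displayed.
    lo, hi = strip.min(), strip.max()
    if hi > lo:
        return (strip - lo) / (hi - lo)
    return np.zeros_like(strip, dtype="float32")


# Stand-in data with the same layout as x_train; replace with x_train in the notebook.
fake_images = np.random.rand(10, 4, 4, 3).astype("float32")
strip = make_image_strip(fake_images, count=5)
print(strip.shape)  # (4, 20, 3)
```

In the notebook, the resulting array could then be passed to `matplotlib.pyplot.imshow`.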

Now we are going to build a neural network from scratch for doing classification on our dataset. This will give us a baseline to compare our fine-tuned network against later.

This network will contain four convolutional layers, each followed by a max-pooling layer, with a dropout layer after every second conv/pooling pair. After the last pooling layer, we attach a fully-connected layer with 256 neurons, another dropout layer, and finally a softmax classification layer for our classes.

Our loss function will be, as usual, `categorical_crossentropy`, and our optimizer will be RMSprop. Various things about this network could be changed to get better performance (a larger network or a different optimizer might help), but for the purposes of this notebook the goal is just to get an approximate baseline for comparison's sake.

Upon compiling the network, let's run `model.summary()` to get a snapshot of its layers.

In [3]:
model = Sequential()

model.add(Conv2D(32, (3, 3), input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Dropout(0.25))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))

model.add(Dropout(0.5))

model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 222, 222, 32)      896       
_________________________________________________________________
activation_1 (Activation)    (None, 222, 222, 32)      0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 111, 111, 32)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 109, 109, 32)      9248      
_________________________________________________________________
activation_2 (Activation)    (None, 109, 109, 32)      0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 54, 54, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 54, 54, 32)        0         
__________

We've created a medium-sized network with >1.2 million weights and biases. Most of them are leading into the one pre-softmax fully-connected layer "dense_5". 
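
The parameter count can be checked by hand. The sketch below assumes the architecture defined in the cell above (3x3 convolutions without padding, 2x2 max-pooling, 32 filters per layer, 224x224 RGB input, 97 classes); the helper names are ours.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square-kernel convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out


def small_cnn_params(input_size=224, channels=3, filters=32, num_classes=97):
    size, in_channels, total = input_size, channels, 0
    for _ in range(4):
        total += conv_params(3, in_channels, filters)
        size = (size - 2) // 2   # an unpadded 3x3 conv shrinks by 2, then 2x2 pooling halves
        in_channels = filters
    flattened = size * size * filters           # 12 * 12 * 32 = 4608
    hidden = dense_params(flattened, 256)       # the 256-neuron layer
    total += hidden + dense_params(256, num_classes)
    return total, hidden


total, hidden = small_cnn_params()
print(total, hidden)  # 1233473 1179904
```

Under these assumptions the 256-neuron layer alone accounts for 1,179,904 of the 1,233,473 parameters, which is why most of the weights sit in front of the softmax.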

We can now go ahead and train our model. We'll also record its history so we can plot the loss over time later.

In [4]:
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=50,
                    validation_data=(x_val, y_val))

Train on 4346 samples, validate on 931 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x7f2858c161d0>

Let's plot the loss over time.
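
A minimal plotting sketch, assuming matplotlib is installed and that the history dictionary uses the keys `loss`, `val_loss`, `acc` and `val_acc` (the key names printed elsewhere in this notebook; newer Keras versions use `accuracy` instead of `acc`). The function name `plot_history` and the example numbers are placeholders, not results from this notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed when plotting inline in a notebook
import matplotlib.pyplot as plt


def plot_history(history_dict, acc_key="acc"):
    """Plot training and validation loss and accuracy side by side."""
    epochs = range(1, len(history_dict["loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(16, 4))

    ax_loss.plot(epochs, history_dict["loss"], label="training")
    ax_loss.plot(epochs, history_dict["val_loss"], label="validation")
    ax_loss.set_title("loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.legend()

    ax_acc.plot(epochs, history_dict[acc_key], label="training")
    ax_acc.plot(epochs, history_dict["val_" + acc_key], label="validation")
    ax_acc.set_title("accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.legend()
    return fig


# Placeholder numbers for illustration only.
example = {
    "loss": [2.0, 1.5, 1.2], "val_loss": [2.1, 1.8, 1.7],
    "acc": [0.2, 0.4, 0.5], "val_acc": [0.2, 0.3, 0.35],
}
fig = plot_history(example)
```

In the notebook this would be called as `plot_history(history.history)`.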


After 50 epochs, we've got a final validation accuracy of around 50%. We've held out a test set for final evaluation which we do in the following cell.

In [5]:
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', loss)
print('Test accuracy:', accuracy)

('Test loss:', 3.7721598874857496)
('Test accuracy:', 0.49463519313304721)


Finally, we've achieved a top-1 accuracy of around 50%. That's not too bad, considering that if we were to use a baseline strategy of taking random guesses, we would have only gotten around 1% accuracy. 

Now we can move on to a better strategy for training an image classifier on our small dataset: starting with a larger network that has already been trained.

To start, we will load the [VGG16 network](https://arxiv.org/pdf/1409.1556.pdf) from Keras, which was trained on ImageNet and whose weights are hosted online. If this is your first time loading VGG16, you'll need to wait a bit for the weights to download from the internet. Once the network is loaded, we can inspect the layers with the `summary()` method again.

In [6]:
vgg = keras.applications.VGG16(weights='imagenet', include_top=True)
vgg.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

Notice that VGG16 is much bigger than the network we constructed. It contains 13 convolutional layers followed by three fully connected layers (two hidden layers and the final softmax classifier), and has over 138 million parameters, around 100 times more than the network we made above. Like our first network, a large proportion of the weights are stored in the connections leading into the first fully-connected layer.

VGG16 was made to solve ImageNet, and achieves an [8.8% top-5 error rate](https://github.com/jcjohnson/cnn-benchmarks), which means that 91.2% of test samples were classified correctly within the top 5 predictions for each image. Its top-1 accuracy, which is equivalent to the accuracy metric we've been using (the top prediction is correct), is 73%. This is especially impressive since there are 1000 classes, meaning that random guesses would get us only 0.1% accuracy.
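
To make the difference between top-1 and top-5 accuracy concrete, here is a small pure-Python sketch. The function `top_k_accuracy` and the toy scores are ours, chosen only to illustrate the definition.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy predictions over three classes (made-up numbers).
scores = [
    [0.1, 0.6, 0.3],   # true class 1: correct at top-1
    [0.5, 0.2, 0.3],   # true class 2: only correct within the top 2
    [0.2, 0.7, 0.1],   # true class 2: missed for k = 1 and k = 2
]
labels = [1, 2, 2]

print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

Here one of three samples is correct at top-1 and two of three within the top 2. The top-k error rate is simply one minus this value.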

In order to use this network for our task, we remove the final classification layer, the 1000-neuron softmax layer at the end, which corresponds to ImageNet, and instead replace it with a new softmax layer for our dataset, which contains 97 neurons in the case of the 101_ObjectCategories dataset. This can be done in the following way, by using the keras `Model` class to initialize a new model whose input layer is the same as VGG but whose output layer is our new softmax layer, called `new_classification_layer`. Note: although it appears we are duplicating this large network, internally it is actually just copying all the layers by reference, and thus we don't need to worry about overloading the memory.

In [41]:
new_classification_layer = Dense(num_classes, activation='softmax')
inp = vgg.input
out = new_classification_layer(vgg.layers[-2].output)
model_new = Model(inp, out)

We are going to retrain this network on the new classifications. But first, we need to freeze the weights and biases in all the layers in the network, except our new one at the end. The point is that the features that were learned in VGG should still be fairly relevant to the new image classification task. They are not perfectly optimal, but most likely better than what we could learn from our limited dataset. By setting the `trainable` flag of each layer to `False` (except for our new classification layer), we ensure all the weights and biases in those layers remain fixed, and we simply "fine-tune" the parameters in the one layer at the end.

Note that sometimes you don't have to freeze all the pre-classification layers. If you have enough data, it may be desirable to retrain more than one layer at the end. In our case, we don't have enough samples for this to be worthwhile, so we simply go ahead with the plan.
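
As a sketch of how retraining more than one layer could look, the hypothetical helper below freezes everything except the last `num_trainable` layers. It only relies on each layer having a `trainable` attribute, so it is demonstrated here on stand-in objects; with Keras one would pass `model_new.layers` and compile the model afterwards, since changes to `trainable` take effect at compile time.

```python
from types import SimpleNamespace


def set_trainable_tail(layers, num_trainable=1):
    """Freeze all layers except the last `num_trainable`; return the split index."""
    split = max(len(layers) - num_trainable, 0)
    for layer in layers[:split]:
        layer.trainable = False
    for layer in layers[split:]:
        layer.trainable = True
    return split


# Stand-in layer objects for illustration (not real Keras layers).
layers = [SimpleNamespace(name="layer_%d" % i, trainable=True) for i in range(5)]
set_trainable_tail(layers, num_trainable=2)
print([layer.trainable for layer in layers])  # [False, False, False, True, True]
```

Computing the split index explicitly avoids the pitfall that `layers[:-0]` is an empty list, which would leave every layer trainable when `num_trainable` is 0.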

After freezing the other layers, we compile the model with the same loss function as in our first network. Note that the cell below uses the Adam optimizer rather than the RMSprop optimizer used for the first network, so the comparison is not perfectly like-for-like. We then run `summary` again to look at the network's architecture.

In [42]:
for l, layer in enumerate(model_new.layers[:-1]):
    layer.trainable = False
for l, layer in enumerate(model_new.layers[-1:]):
    layer.trainable = True
    
model_new.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model_new.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

Looking at the summary, we see the network is identical to the VGG16 model we instantiated earlier, except the last layer, formerly a 1000-neuron softmax, has been replaced by a new 97-neuron softmax. Additionally, we still have roughly 134 million weights, but now the vast majority of them are "non-trainable params" because we froze the layers they are contained in. We now only have about 397,000 trainable parameters, which is roughly a third of the number of parameters we trained in the first model.
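
These totals can be reproduced with a short calculation. The sketch below assumes the standard VGG16 layout (five blocks of padded 3x3 convolutions with 64, 128, 256, 512 and 512 filters, each block ending in 2x2 pooling, then two 4096-neuron fully connected layers and a classifier); the function and variable names are ours.

```python
# (number of conv layers, filters) for each of the five VGG16 blocks
VGG16_CONV_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_param_counts(num_classes, input_size=224):
    """Return parameter counts for the frozen body, the first FC layer, and the classifier head."""
    in_channels, size, conv_total = 3, input_size, 0
    for n_convs, out_channels in VGG16_CONV_BLOCKS:
        for _ in range(n_convs):
            conv_total += 3 * 3 * in_channels * out_channels + out_channels
            in_channels = out_channels
        size //= 2                                # padded convs keep the size; pooling halves it
    flattened = size * size * in_channels         # 7 * 7 * 512 = 25088
    fc1 = flattened * 4096 + 4096
    fc2 = 4096 * 4096 + 4096
    head = 4096 * num_classes + num_classes
    return {"frozen": conv_total + fc1 + fc2, "fc1": fc1, "head": head}


imagenet = vgg16_param_counts(1000)
ours = vgg16_param_counts(97)
print(imagenet["frozen"] + imagenet["head"])  # 138357544
print(ours["frozen"] + ours["head"])          # 134657953
print(ours["head"])                           # 397409
```

Under these assumptions the first fully connected layer alone holds 102,764,544 weights, and the 397,409 trainable parameters of the new head are about 32% of the 1,233,473 parameters in the small network trained from scratch.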

As before, we go ahead and train the new model for the same number of epochs (50), this time with a batch size of 16 instead of 64. We also keep track of its history as we go.

In [35]:
history = model_new.fit(x_train, y_train, 
                        batch_size=16, 
                        epochs=50, 
                        validation_data=(x_val, y_val))

Train on 4346 samples, validate on 931 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [37]:




print(history.history['loss'])
print(history.history['acc'])

print(history.history['val_loss'])
print(history.history['val_acc'])




[4.2949959259673962, 2.4873292677642782, 1.9644532456575754, 1.6791490945627323, 1.5039151877891728, 1.3699527028185181, 1.2620284194448166, 1.1865798278552731, 1.1214212294388519, 1.0554262819895808, 1.0195639578911, 0.96521879693521528, 0.92043418778398867, 0.8921702675593377, 0.86708424102516701, 0.81102828983880937, 0.75015084647770258, 0.69931884568734926, 0.63562792278157643, 0.61131660624039075, 0.58806850390531917, 0.5609743296894345, 0.54923666146182271, 0.51905008086920112, 0.49481008632483153, 0.4915171367983307, 0.45482381573911229, 0.45004009403154643, 0.43933838564736249, 0.40959554736138598, 0.4239096541936585, 0.3998301382842176, 0.38497586167399039, 0.3687508155536498, 0.36423922145505189, 0.37044950948708405, 0.33821546537095526, 0.32558506235413448, 0.32490420896513666, 0.31721067352214416, 0.32447247091274461, 0.31139463429530811, 0.28598313295834016, 0.27618101911231002, 0.2814358420450595, 0.27332117898904862, 0.25761526321359679, 0.2609408490644366, 0.24704756533

In [38]:
loss, accuracy = model_new.evaluate(x_test, y_test, verbose=0)

print('Test loss:', loss)
print('Test accuracy:', accuracy)
# with rmsprop - 16

('Test loss:', 1.4147466119266887)
('Test accuracy:', 0.75214592274678116)


In [43]:





# with adam - 16
history = model_new.fit(x_train, y_train, 
                        batch_size=16, 
                        epochs=100, 
                        validation_data=(x_val, y_val))





Train on 4346 samples, validate on 931 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100


In [44]:

# with adam - 16
print(history.history['loss'])
print(history.history['acc'])

print(history.history['val_loss'])
print(history.history['val_acc'])

loss, accuracy = model_new.evaluate(x_test, y_test, verbose=0)

print('Test loss:', loss)
print('Test accuracy:', accuracy)


[3.7790937493925978, 2.0970398478391692, 1.6259035915479207, 1.2606343830778366, 1.1240255247871831, 1.0093628710938993, 0.89224485450223245, 0.85471659687549573, 0.76169356513122122, 0.71047449165426313, 0.68537774897367787, 0.64579114655278769, 0.65955451748034288, 0.6968140838624034, 0.67212362732143371, 0.64097165518294152, 0.60772312134572892, 0.49952661492242495, 0.48521878651578582, 0.5216190933056597, 0.49623357044473748, 0.50231457658341772, 0.4446272985539127, 0.45016871240492301, 0.43783838417787352, 0.45232719857223397, 0.43351669246017988, 0.39091518754396698, 0.36086868335828493, 0.42931456646842997, 0.37546639811304161, 0.49386546966005623, 0.32687784339786879, 0.36310052895640765, 0.33182020846701166, 0.42094313476647832, 0.6341937119822677, 0.71568800993195147, 0.6935976320115641, 0.74672852675578338, 0.70043119345223759, 0.68886478043521771, 0.56551276913853055, 0.60562490715931983, 0.65656631367835672, 0.54612868000764792, 0.51909095522194582, 0.54829271179158878, 0.

In [45]:




new_classification_layer = Dense(num_classes, activation='softmax')
inp = vgg.input
out = new_classification_layer(vgg.layers[-2].output)
model_new = Model(inp, out)


for l, layer in enumerate(model_new.layers[:-1]):
    layer.trainable = False
for l, layer in enumerate(model_new.layers[-1:]):
    layer.trainable = True
    
model_new.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model_new.summary()

history = model_new.fit(x_train, y_train, 
                        batch_size=8, 
                        epochs=100, 
                        validation_data=(x_val, y_val))



# with adam - 8
print(history.history['loss'])
print(history.history['acc'])

print(history.history['val_loss'])
print(history.history['val_acc'])

loss, accuracy = model_new.evaluate(x_test, y_test, verbose=0)

print('Test loss:', loss)
print('Test accuracy:', accuracy)



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [46]:






new_classification_layer = Dense(num_classes, activation='softmax')
inp = vgg.input
out = new_classification_layer(vgg.layers[-2].output)
model_new = Model(inp, out)


for l, layer in enumerate(model_new.layers[:-1]):
    layer.trainable = False
for l, layer in enumerate(model_new.layers[-1:]):
    layer.trainable = True
    
model_new.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

model_new.summary()

history = model_new.fit(x_train, y_train, 
                        batch_size=8, 
                        epochs=100, 
                        validation_data=(x_val, y_val))



# with adadelta - 8
print(history.history['loss'])
print(history.history['acc'])

print(history.history['val_loss'])
print(history.history['val_acc'])

loss, accuracy = model_new.evaluate(x_test, y_test, verbose=0)

print('Test loss:', loss)
print('Test accuracy:', accuracy)





_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [47]:






new_classification_layer = Dense(num_classes, activation='softmax')
inp = vgg.input
out = new_classification_layer(vgg.layers[-2].output)
model_new = Model(inp, out)


for l, layer in enumerate(model_new.layers[:-1]):
    layer.trainable = False
for l, layer in enumerate(model_new.layers[-1:]):
    layer.trainable = True
    
model_new.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model_new.summary()

history = model_new.fit(x_train, y_train, 
                        batch_size=64, 
                        epochs=100, 
                        validation_data=(x_val, y_val))



# with adam - 64
print(history.history['loss'])
print(history.history['acc'])

print(history.history['val_loss'])
print(history.history['val_acc'])

loss, accuracy = model_new.evaluate(x_test, y_test, verbose=0)

print('Test loss:', loss)
print('Test accuracy:', accuracy)





_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [48]:









new_classification_layer = Dense(num_classes, activation='softmax')
inp = vgg.input
out = new_classification_layer(vgg.layers[-2].output)
model_new = Model(inp, out)


for l, layer in enumerate(model_new.layers[:-1]):
    layer.trainable = False
for l, layer in enumerate(model_new.layers[-1:]):
    layer.trainable = True
    
model_new.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

model_new.summary()

history = model_new.fit(x_train, y_train, 
                        batch_size=100, 
                        epochs=100, 
                        validation_data=(x_val, y_val))



# with adadelta - 100
print(history.history['loss'])
print(history.history['acc'])

print(history.history['val_loss'])
print(history.history['val_acc'])

loss, accuracy = model_new.evaluate(x_test, y_test, verbose=0)

print('Test loss:', loss)
print('Test accuracy:', accuracy)





_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________
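
The preceding cells repeat the same build, compile, fit and evaluate steps with different optimizers and batch sizes. One way to organise such a comparison is a small loop like the sketch below. `run_sweep` and the `train_and_evaluate` callback are hypothetical names: in the notebook the callback would rebuild `model_new`, fit it, and return the final validation accuracy. Here a stand-in callback returns a placeholder score so the example runs without Keras.

```python
from itertools import product


def run_sweep(train_and_evaluate, optimizers, batch_sizes):
    """Try every optimizer / batch size pair and return the runs sorted best-first."""
    results = []
    for optimizer, batch_size in product(optimizers, batch_sizes):
        score = train_and_evaluate(optimizer, batch_size)
        results.append({"optimizer": optimizer,
                        "batch_size": batch_size,
                        "val_accuracy": score})
    return sorted(results, key=lambda r: r["val_accuracy"], reverse=True)


def placeholder_trainer(optimizer, batch_size):
    """Stand-in for real training: returns a placeholder score, not a measurement."""
    return 1.0 / batch_size


for run in run_sweep(placeholder_trainer, ["adam", "adadelta"], [8, 64, 100]):
    print(run)
```

Selecting the configuration on validation accuracy keeps the test set untouched until the final evaluation.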

In [28]:
import matplotlib.pyplot
from matplotlib.pyplot import imshow

matplotlib.pyplot.switch_backend('agg')

matplotlib.pyplot.figure(figsize=(16,4))
matplotlib.pyplot.plot(np.array([1,2,3,4,5]))
matplotlib.pyplot.show()

history.history['val_acc']



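Improving results

These results can likely be improved by augmenting the training data, training for longer, and trying different optimizers and hyperparameters (keeping the validation set for comparing them).

A further step is to compare these results to the ImageNet results quoted earlier.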
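
As one simple illustration of data augmentation, the sketch below doubles a training set by adding a horizontally mirrored copy of every image. It assumes image arrays shaped (N, height, width, channels) like `x_train` and one-hot labels like `y_train`; the function name `augment_with_flips` is ours, and whether mirroring is appropriate depends on the classes involved.

```python
import numpy as np


def augment_with_flips(x, y):
    """Append a horizontally mirrored copy of every image, duplicating the labels."""
    flipped = x[:, :, ::-1, :]   # reverse the width axis
    return np.concatenate([x, flipped], axis=0), np.concatenate([y, y], axis=0)


# Tiny stand-in arrays; in the notebook these would be x_train and y_train.
x_small = np.arange(2 * 2 * 3 * 1, dtype="float32").reshape(2, 2, 3, 1)
y_small = np.array([[1, 0], [0, 1]], dtype="float32")
x_aug, y_aug = augment_with_flips(x_small, y_small)
print(x_aug.shape, y_aug.shape)  # (4, 2, 3, 1) (4, 2)
```

Only the training set should be augmented; the validation and test sets stay as they are so the reported accuracies remain comparable.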



In [8]:
history = model_new.fit(x_train, y_train, 
                        batch_size=64, 
                        epochs=50, 
                        validation_data=(x_val, y_val))

Train on 4967 samples, validate on 621 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [29]:
loss, accuracy = model_new.evaluate(x_test, y_test, verbose=0)

print('Test loss:', loss)
print('Test accuracy:', accuracy)


('Test loss:', 5.8760007768229867)
('Test accuracy:', 0.53218884120171672)
