# Transfer learning (fine-tuning)

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

* Let's try this on the flower dataset.

* Without data augmentation, 60 epochs, batch size 32
    * Scratch: 70.54%
    * VGG16: 83.39%
    * InceptionV3: 72.92%

* With data augmentation, 60 epochs, batch size 32
    * Scratch: 69.58%
    * VGG16: 83.86%
    * InceptionV3: 76.99%
    * InceptionV3 + slim hyperparams: 83.13% (different number of epochs)

In [1]:
import keras
from keras import backend as K
from keras.layers import Conv2D, BatchNormalization, Activation, MaxPooling2D, Dense, Flatten, Dropout, Input
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator
import numpy as np

from keras import applications
from keras import optimizers
from keras.models import Sequential

from keras.layers import GlobalAveragePooling2D

Using TensorFlow backend.


In [2]:
keras.__version__

'2.0.6'

In [3]:
np.random.seed(0)

In [4]:
image_shape = [128, 128, 3]

## Setting up the data generators

* These settings should be fair to all three models: learning from scratch, transfer learning from VGG16, and transfer learning from InceptionV3.

In [5]:
# without data augmentation
naive_datagen = ImageDataGenerator(rescale=1./255)

# with data augmentation
distort_datagen = ImageDataGenerator(rescale=1./255, 
                                     shear_range=0.2,
                                     zoom_range=0.2,
                                     horizontal_flip=True)

In [6]:
batch_size = 32

# Wrap this in a function so the settings are easy to change.
# On second thought, that is pointless...
def get_datagen_flow_from_dir(training):
    if training:
        return distort_datagen.flow_from_directory(directory='./data/flower_photos/train/', 
                                                   target_size=image_shape[:2], 
                                                   batch_size=batch_size)
    else:
        return naive_datagen.flow_from_directory(directory='./data/flower_photos/test/',
                                                 target_size=image_shape[:2],
                                                 batch_size=batch_size)

In [7]:
train_generator = get_datagen_flow_from_dir(training=True)
test_generator = get_datagen_flow_from_dir(training=False)

Found 3306 images belonging to 5 classes.
Found 364 images belonging to 5 classes.
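
As a quick check of the arithmetic behind the step counts that the training cells in this notebook compute with `samples // batch_size`, the sketch below plugs in the image counts reported above. The variable names are illustrative only.

```python
# Image counts reported by flow_from_directory above; batch size from In [6].
train_samples = 3306
test_samples = 364
batch_size = 32

# Floor division, as in steps_per_epoch=train_generator.samples // batch_size.
steps_per_epoch = train_samples // batch_size
validation_steps = test_samples // batch_size

# Images that do not fit into the last full batch.
train_leftover = train_samples % batch_size
test_leftover = test_samples % batch_size

print("train: {} steps, {} images left over".format(steps_per_epoch, train_leftover))
print("test:  {} steps, {} images left over".format(validation_steps, test_leftover))
```

Because of the floor division, the step counts correspond to 3296 and 352 images' worth of full batches, slightly fewer than the full train and test sets.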


## Learning from scratch

In [10]:
def build_model_functional(input_shape, output_units):
    input_tensor = Input(input_shape)
    net = input_tensor
    n_filters = 32

    for _ in range(3):
        net = Conv2D(n_filters, [3,3], padding='same', use_bias=False)(net)
        net = BatchNormalization()(net)
        net = Activation('relu')(net)
        net = Conv2D(n_filters, [3,3], padding='same', use_bias=False)(net)
        net = BatchNormalization()(net)
        net = Activation('relu')(net)
        net = MaxPooling2D(padding='same')(net)
        net = Dropout(0.3)(net)
        
        n_filters *= 2

    net = Flatten()(net)
    net = Dense(output_units, activation='softmax')(net)

    model = Model(input_tensor, net)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    
    return model

In [11]:
model = build_model_functional(image_shape, 5)
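
To see what `build_model_functional` produces for a 128x128 input, the sketch below traces the spatial size and channel count through the three blocks by hand. It assumes the Keras defaults for `MaxPooling2D` (2x2 pool, stride 2); the helper name `scratch_model_shapes` is made up for this illustration.

```python
import math

def scratch_model_shapes(input_size=128, n_blocks=3, first_filters=32, n_classes=5):
    """Trace spatial size and channel count through the conv blocks."""
    size, filters = input_size, first_filters
    channels = filters
    for _ in range(n_blocks):
        # 3x3 'same' convolutions keep the spatial size; 2x2 max pooling
        # with padding='same' halves it (rounding up).
        size = int(math.ceil(size / 2.0))
        channels = filters
        filters *= 2
    flat_features = size * size * channels
    # Dense layer: one weight per (feature, class) pair plus one bias per class.
    dense_params = flat_features * n_classes + n_classes
    return size, channels, flat_features, dense_params

print(scratch_model_shapes())  # (16, 128, 32768, 163845)
```

So the final softmax layer sees a 16x16x128 feature map flattened to 32768 values.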

In [12]:
result = model.fit_generator(generator=train_generator,
                             steps_per_epoch=train_generator.samples//batch_size,
                             epochs=60,
                             validation_data=test_generator,
                             validation_steps=test_generator.samples//batch_size,
                             verbose=2)

Epoch 1/60
12s - loss: 7.6030 - acc: 0.3419 - val_loss: 7.3710 - val_acc: 0.2415
Epoch 2/60
11s - loss: 3.5879 - acc: 0.4339 - val_loss: 3.4401 - val_acc: 0.1867
Epoch 3/60
11s - loss: 3.0145 - acc: 0.4657 - val_loss: 1.8344 - val_acc: 0.3283
Epoch 4/60
11s - loss: 2.6832 - acc: 0.5044 - val_loss: 1.9863 - val_acc: 0.3976
Epoch 5/60
11s - loss: 2.2068 - acc: 0.5493 - val_loss: 2.4137 - val_acc: 0.5723
Epoch 6/60
11s - loss: 2.1647 - acc: 0.5389 - val_loss: 1.8236 - val_acc: 0.5361
Epoch 7/60
11s - loss: 1.9690 - acc: 0.5791 - val_loss: 3.2194 - val_acc: 0.3946
Epoch 8/60
11s - loss: 1.8541 - acc: 0.5768 - val_loss: 2.1677 - val_acc: 0.5512
Epoch 9/60
11s - loss: 1.8348 - acc: 0.5885 - val_loss: 3.0305 - val_acc: 0.4639
Epoch 10/60
11s - loss: 1.7405 - acc: 0.6032 - val_loss: 1.3072 - val_acc: 0.5572
Epoch 11/60
11s - loss: 1.5155 - acc: 0.6211 - val_loss: 1.0913 - val_acc: 0.6446
Epoch 12/60
11s - loss: 1.2111 - acc: 0.6226 - val_loss: 1.0123 - val_acc: 0.6386
Epoch 13/60
11s - loss: 1

In [13]:
print("{:.2%}".format(np.average(result.history['val_acc'][-5:])))

69.58%
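
The score reported in this notebook is the mean validation accuracy over the last five epochs, not the final or best epoch. A minimal sketch of that metric follows; the accuracy values are hypothetical and are not taken from any run here, and `mean_of_last` is an illustrative helper.

```python
def mean_of_last(values, n=5):
    """Average the final n entries of a per-epoch metric list."""
    tail = values[-n:]
    return sum(tail) / float(len(tail))

# Hypothetical per-epoch validation accuracies (not from this notebook).
val_acc = [0.50, 0.62, 0.66, 0.70, 0.68, 0.71, 0.69]
print("{:.2%}".format(mean_of_last(val_acc)))  # 68.80%
```

Averaging the tail smooths out epoch-to-epoch noise, which matters here because the validation accuracy of the scratch model fluctuates a lot between epochs.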


## Transfer learning

### VGG16

In [14]:
K.clear_session()

# Weights are downloaded automatically when instantiating a model. They are stored in `~/.keras/models/`.
model = applications.VGG16(weights='imagenet', include_top=False)

In [15]:
input_tensor = Input(image_shape)
net = model(input_tensor)
print(net) # check last conv block shape
net = Flatten()(net)
net = Dense(256, activation='relu')(net)
net = Dropout(0.5)(net)
net = Dense(5, activation='softmax')(net)

Tensor("vgg16/block5_pool/MaxPool:0", shape=(?, 4, 4, 512), dtype=float32)
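
The `(?, 4, 4, 512)` shape above follows from VGG16's five 2x2, stride-2 max-pooling layers, each of which halves the spatial size. The sketch below reproduces that arithmetic and the size of the first added `Dense(256)` layer; the helper name is made up for illustration.

```python
def vgg16_feature_shape(input_size, n_pools=5, channels=512):
    """Spatial size after VGG16's five 2x2, stride-2 max-pooling layers."""
    size = input_size
    for _ in range(n_pools):
        size //= 2
    return size, size, channels

h, w, c = vgg16_feature_shape(128)
flat = h * w * c                      # features after Flatten()
dense_256_params = flat * 256 + 256   # weights plus biases of Dense(256)
print("{} -> {} features, {} params".format((h, w, c), flat, dense_256_params))
```

With 128x128 inputs the flattened feature vector has 8192 entries, so the added `Dense(256)` layer alone holds about 2.1 million parameters.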


In [16]:
# fine-tuning only last conv block + added 2 fc layers
# for layer in model.layers:
#     print layer.trainable
for layer in model.layers[:-4]:
    layer.trainable = False
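
To check what `model.layers[:-4]` leaves trainable, the sketch below rebuilds the list of VGG16 layer names (without the top) and applies the same slice. The `block{n}_conv{m}` / `block{n}_pool` names follow the pattern seen in the tensor name printed above; `input_1` is only a placeholder for whatever name the input layer actually has.

```python
# Rebuild the layer-name list of VGG16 with include_top=False.
vgg16_layers = ["input_1"]  # placeholder name for the input layer (assumed)
convs_per_block = [2, 2, 3, 3, 3]
for block, n_convs in enumerate(convs_per_block, start=1):
    for conv in range(1, n_convs + 1):
        vgg16_layers.append("block{}_conv{}".format(block, conv))
    vgg16_layers.append("block{}_pool".format(block))

frozen = vgg16_layers[:-4]     # same slice as in the loop above
trainable = vgg16_layers[-4:]

print("{} layers, {} frozen".format(len(vgg16_layers), len(frozen)))
print(trainable)
```

The last four entries are the three `block5` convolutions and `block5_pool`, so only the last conv block of VGG16 is fine-tuned, together with the newly added dense layers.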

In [17]:
model = Model(input_tensor, net)

In [18]:
# According to the tutorial, a small learning rate is suitable for fine-tuning.
# For the same reason, it advises against adaptive learning-rate optimizers (like Adam) here.
# So we use SGD with momentum.
model.compile(optimizer=optimizers.SGD(lr=1e-4, momentum=0.9), 
              loss='categorical_crossentropy', metrics=['accuracy'])
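
For intuition about how small the steps are with `lr=1e-4` and `momentum=0.9`, here is a minimal sketch of the classical (non-Nesterov) momentum update on a single scalar parameter. The constant gradient of 1.0 is a made-up value for illustration, and `sgd_momentum_step` is not a Keras function.

```python
def sgd_momentum_step(param, velocity, grad, lr=1e-4, momentum=0.9):
    """One classical momentum update: v <- m*v - lr*g, then p <- p + v."""
    velocity = momentum * velocity - lr * grad
    param = param + velocity
    return param, velocity

param, velocity = 1.0, 0.0
for step in range(3):
    # Hypothetical constant gradient, just to show how velocity accumulates.
    param, velocity = sgd_momentum_step(param, velocity, grad=1.0)
    print("step {}: param={:.6f}, velocity={:.6f}".format(step + 1, param, velocity))
```

With a constant gradient the velocity approaches `-lr / (1 - momentum)`, i.e. an effective step of about 1e-3 here, which is still small enough to nudge the pretrained weights rather than overwrite them.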

In [19]:
result = model.fit_generator(generator=train_generator,
                             steps_per_epoch=train_generator.samples//batch_size,
                             epochs=60,
                             validation_data=test_generator,
                             validation_steps=test_generator.samples//batch_size,
                             verbose=1)



In [20]:
print("{:.2%}".format(np.average(result.history['val_acc'][-5:])))

83.86%


### InceptionV3

* https://keras.io/applications/

In [21]:
K.clear_session()

base_model = applications.inception_v3.InceptionV3(weights='imagenet', include_top=False)
# net = base_model.output
input_tensor = Input(image_shape)
net = base_model(input_tensor)
print(net)
net = GlobalAveragePooling2D()(net) # 2048
net = Dense(1024, activation='relu')(net)
net = Dense(5, activation='softmax')(net)

model = Model(inputs=input_tensor, outputs=net)

# freeze whole base model
for layer in base_model.layers:
    layer.trainable = False
    
# keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# https://github.com/tensorflow/models/blob/master/slim/scripts/finetune_inception_v3_on_flowers.sh
# step=1000 => epoch=10
# model.compile(optimizer=optimizers.RMSprop(lr=0.01, decay=0.00004), 
#               loss='categorical_crossentropy', 
#               metrics=['accuracy'])

Tensor("inception_v3/mixed10/concat:0", shape=(?, 2, 2, 2048), dtype=float32)
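
`GlobalAveragePooling2D` turns the `(?, 2, 2, 2048)` tensor above into a 2048-dimensional vector by averaging each channel over its spatial positions. The sketch below shows the same operation on a toy 2x2 feature map with 3 channels, written in plain Python; the values are invented and `global_average_pool` is an illustrative helper, not the Keras layer.

```python
def global_average_pool(feature_map):
    """Average each channel over all spatial positions.

    feature_map is a nested list indexed as [row][column][channel].
    """
    height = len(feature_map)
    width = len(feature_map[0])
    n_channels = len(feature_map[0][0])
    totals = [0.0] * n_channels
    for row in feature_map:
        for pixel in row:
            for channel, value in enumerate(pixel):
                totals[channel] += value
    return [total / (height * width) for total in totals]

# Hypothetical 2x2 feature map with 3 channels (the real one has 2048).
toy = [[[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
       [[5.0, 4.0, 2.0], [7.0, 0.0, 2.0]]]
print(global_average_pool(toy))  # [4.0, 1.0, 2.0]
```

Unlike `Flatten`, the output length depends only on the number of channels, not on the input image size.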


In [22]:
result = model.fit_generator(generator=train_generator,
                             steps_per_epoch=train_generator.samples//batch_size,
                             epochs=60,
                             validation_data=test_generator,
                             validation_steps=test_generator.samples//batch_size,
                             verbose=1)



In [23]:
print(len(model.layers[1].layers))

311


In [24]:
# Train the top 2 inception blocks.
# base_model is the same object as model.layers[1], so setting trainable on
# base_model.layers[249:] also sets model.layers[1].layers[249:] to trainable=True.
for layer in base_model.layers[249:]:
    layer.trainable = True

In [25]:
model.compile(optimizer=optimizers.SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])

# https://github.com/tensorflow/models/blob/master/slim/scripts/finetune_inception_v3_on_flowers.sh
# step=500 => epoch=5
# model.compile(optimizer=optimizers.RMSprop(lr=0.0001, decay=0.00004), 
#               loss='categorical_crossentropy', 
#               metrics=['accuracy'])
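
The `step=1000 => epoch=10` and `step=500 => epoch=5` notes in the comments are rough conversions from optimizer steps to epochs. The sketch below redoes that conversion, assuming the referenced script uses the same batch size of 32 as this notebook and using this notebook's 3306 training images; `steps_to_epochs` is an illustrative helper.

```python
def steps_to_epochs(steps, batch_size, n_samples):
    """Number of passes over the data that a given step count amounts to."""
    return steps * batch_size / float(n_samples)

n_train = 3306   # training images found by flow_from_directory
batch_size = 32  # assumed to match the referenced script

for steps in (1000, 500):
    epochs = steps_to_epochs(steps, batch_size, n_train)
    print("{} steps ~ {:.2f} epochs".format(steps, epochs))
```

Under those assumptions, 1000 steps come to a little under 10 epochs and 500 steps to a little under 5, which matches the rounded values in the comments.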

In [26]:
result = model.fit_generator(generator=train_generator,
                             steps_per_epoch=train_generator.samples//batch_size,
                             epochs=60,
                             validation_data=test_generator,
                             validation_steps=test_generator.samples//batch_size,
                             verbose=1)



In [27]:
print("{:.2%}".format(np.average(result.history['val_acc'][-5:])))

76.99%


### InceptionV3 - tf suggested hyperparams

* https://github.com/tensorflow/models/blob/master/slim/scripts/finetune_inception_v3_on_flowers.sh

In [52]:
K.clear_session()

base_model = applications.inception_v3.InceptionV3(weights='imagenet', include_top=False)
# net = base_model.output
input_tensor = Input(image_shape)
net = base_model(input_tensor)
print(net)
net = GlobalAveragePooling2D()(net) # 2048
# net = Dense(1024, activation='relu')(net) # more like original inceptionV3
net = Dense(5, activation='softmax')(net)

model = Model(inputs=input_tensor, outputs=net)

# freeze whole base model
for layer in base_model.layers:
    layer.trainable = False
    
# model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
# https://github.com/tensorflow/models/blob/master/slim/scripts/finetune_inception_v3_on_flowers.sh
# step=1000 => epoch=10
model.compile(optimizer=optimizers.RMSprop(lr=0.01, decay=0.00004), 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

Tensor("inception_v3/mixed10/concat:0", shape=(?, 2, 2, 2048), dtype=float32)
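
A note on what `decay=0.00004` does in the `RMSprop` call above: to my understanding, Keras 2.0.x optimizers treat `decay` as a time-based learning-rate decay, `lr / (1 + decay * iterations)`, where `iterations` counts batches. The sketch below evaluates that formula for this notebook's 103 steps per epoch; `decayed_lr` is an illustrative helper, not a Keras function.

```python
def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay: lr / (1 + decay * iterations)."""
    return initial_lr / (1.0 + decay * iterations)

steps_per_epoch = 3306 // 32  # 103 batches per epoch in this notebook
for epoch in (0, 1, 10, 20):
    iterations = epoch * steps_per_epoch
    print("after {:2d} epochs: lr = {:.6f}".format(
        epoch, decayed_lr(0.01, 0.00004, iterations)))
```

Over 20 epochs this lowers the learning rate by less than 10%. It is also a learning-rate schedule rather than an L2 weight-decay penalty, so if the 0.00004 in the referenced script is a weight-decay coefficient, passing it as `decay` here does not reproduce that setting.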


In [53]:
result = model.fit_generator(generator=train_generator,
                             steps_per_epoch=train_generator.samples//batch_size,
                             epochs=20,
                             validation_data=test_generator,
                             validation_steps=test_generator.samples//batch_size,
                             verbose=1)



In [54]:
# trainable=True for whole model
for layer in base_model.layers:
    layer.trainable = True

In [55]:
# model.compile(optimizer=optimizers.SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])

# https://github.com/tensorflow/models/blob/master/slim/scripts/finetune_inception_v3_on_flowers.sh
# step=500 => epoch=5
model.compile(optimizer=optimizers.RMSprop(lr=0.0001, decay=0.00004), 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

In [60]:
result = model.fit_generator(generator=train_generator,
                             steps_per_epoch=train_generator.samples//batch_size,
                             epochs=10,
                             validation_data=test_generator,
                             validation_steps=test_generator.samples//batch_size,
                             verbose=1)



In [61]:
#3
print("{:.2%}".format(np.average(result.history['val_acc'][-5:])))

85.84%


In [59]:
#2
print("{:.2%}".format(np.average(result.history['val_acc'][-5:])))

86.75%


In [57]:
#1
print("{:.2%}".format(np.average(result.history['val_acc'][-5:])))

85.54%
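
The three results above come from repeated runs of the same cells. A small sketch for summarizing them, using only the three percentages printed above:

```python
# Validation accuracies (in percent) printed by runs #1, #2 and #3 above.
runs = [85.54, 86.75, 85.84]

mean_acc = sum(runs) / len(runs)
spread = max(runs) - min(runs)

print("mean: {:.2f}%, range: {:.2f} points".format(mean_acc, spread))
```

The run-to-run spread is a bit over one percentage point, which is worth keeping in mind when comparing models whose scores differ by a similar amount.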


# Check `top` of InceptionV3

* The tf-slim code has something called auxLogit (auxiliary logit).
* This is probably the auxiliary classifier part of InceptionV3.
* Let's check whether the Keras version has it as well.

In [44]:
comp_model = applications.inception_v3.InceptionV3(weights='imagenet', include_top=True)

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.5/inception_v3_weights_tf_dim_ordering_tf_kernels.h5

In [45]:
len(comp_model.layers)

313

In [51]:
comp_model.layers[-8:]

[<keras.layers.normalization.BatchNormalization at 0x7efcea946d10>,
 <keras.layers.core.Activation at 0x7efceadb0f10>,
 <keras.layers.merge.Concatenate at 0x7efceabea910>,
 <keras.layers.merge.Concatenate at 0x7efcea9c9710>,
 <keras.layers.core.Activation at 0x7efcea973750>,
 <keras.layers.merge.Concatenate at 0x7efcea90af50>,
 <keras.layers.pooling.GlobalAveragePooling2D at 0x7efceb28ce10>,
 <keras.layers.core.Dense at 0x7efcea8c4f50>]

In [50]:
base_model.layers[-8:]

[<keras.layers.core.Activation at 0x7efcf539af50>,
 <keras.layers.core.Activation at 0x7efcf536bd50>,
 <keras.layers.normalization.BatchNormalization at 0x7efcf52fd410>,
 <keras.layers.core.Activation at 0x7efcf56d3f10>,
 <keras.layers.merge.Concatenate at 0x7efcf550dc50>,
 <keras.layers.merge.Concatenate at 0x7efcf532eb10>,
 <keras.layers.core.Activation at 0x7efcf5294e50>,
 <keras.layers.merge.Concatenate at 0x7efcf5252090>]
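
To compare the two outputs above more directly, the sketch below copies the layer class names from both tails and checks whether the `include_top=True` model is the base model plus two extra layers. The lists are transcribed by hand from the printed output, so this only checks the last few layers, not the whole graph.

```python
# Class names of the last 8 layers, transcribed from the two outputs above.
comp_tail = ["BatchNormalization", "Activation", "Concatenate", "Concatenate",
             "Activation", "Concatenate", "GlobalAveragePooling2D", "Dense"]
base_tail = ["Activation", "Activation", "BatchNormalization", "Activation",
             "Concatenate", "Concatenate", "Activation", "Concatenate"]

# Layer counts reported above: 313 with the top, 311 without it.
extra = 313 - 311

# Drop the extra layers from the full model and compare the overlapping part.
tails_match = comp_tail[:-extra] == base_tail[-(len(comp_tail) - extra):]
print(tails_match)
print(comp_tail[-extra:])
```

The overlapping parts match, and the two extra layers are `GlobalAveragePooling2D` and `Dense`. That is consistent with `include_top=True` adding only a pooling and classification head; an auxiliary classifier branch would be expected to need more than two extra layers, so it does not appear to be part of the Keras model.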