https://www.analyticsvidhya.com/blog/2017/06/transfer-learning-the-art-of-fine-tuning-a-pre-trained-model/

In [1]:
import pandas as pd
import numpy as np
import os
import keras
import matplotlib.pyplot as plt
from keras.layers import Dense,GlobalAveragePooling2D
from keras.applications import MobileNet
from keras.preprocessing import image
from keras.applications.mobilenet import preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.optimizers import Adam
from keras.preprocessing.image import img_to_array, array_to_img


Using TensorFlow backend.


Load the MNIST dataset and prepare the parts common to all scenarios.

In [25]:
train=pd.read_csv("data/mnist_train.csv")
test=pd.read_csv("data/mnist_test.csv")
train_path="data/"
test_path="data/"

# Convert the pixel columns (column 0 holds the label) to NumPy arrays and apply the Keras MobileNet preprocess_input

train_img=train.iloc[:,1:].values
train_img=preprocess_input(train_img)


test_img=test.iloc[:,1:].values
test_img=preprocess_input(test_img)
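For reference, `keras.applications.mobilenet.preprocess_input` rescales pixel values from [0, 255] to [-1, 1]. The NumPy sketch below reproduces that arithmetic on a made-up row of pixels; the helper name `scale_to_unit_range` is hypothetical and not part of Keras.

```python
import numpy as np

def scale_to_unit_range(pixels):
    """Map pixel values from [0, 255] to [-1, 1] (hypothetical helper)."""
    pixels = np.asarray(pixels, dtype="float32")
    return pixels / 127.5 - 1.0

# Made-up pixel values: black, mid-grey, white
row = np.array([0, 127.5, 255])
scaled = scale_to_unit_range(row)
print(scaled)  # [-1.  0.  1.]
```

The notebook also divides by 255 a few cells later, so the two rescalings compound; it may be worth keeping only one of them.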



In [29]:
# Convert the images into 3 channels

train_img=np.dstack([train_img] * 3)
test_img=np.dstack([test_img]*3)
train_img.shape,test_img.shape

((59999, 784, 3), (9999, 784, 3))

In [32]:
# Reshape images as per the tensor format required by tensorflow

train_img = train_img.reshape(-1, 28,28,3)
test_img= test_img.reshape (-1,28,28,3)
train_img.shape,test_img.shape

((59999, 28, 28, 3), (9999, 28, 28, 3))
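The two cells above replicate the single grey channel three times and then restore the 28x28 layout. A small sketch on toy data (two fake "images" rather than the real MNIST arrays) suggests that this gives the same result as reshaping first and repeating the channel axis, and that all three channels are identical:

```python
import numpy as np

# Toy stand-in for the flattened MNIST rows: 2 images of 28*28 = 784 pixels.
flat = np.arange(2 * 784, dtype="float32").reshape(2, 784)

# Route used in the notebook: stack three copies, then reshape.
stacked = np.dstack([flat] * 3).reshape(-1, 28, 28, 3)

# Alternative route: reshape to one channel, then repeat it along the last axis.
repeated = np.repeat(flat.reshape(-1, 28, 28, 1), 3, axis=-1)

print(stacked.shape)                                     # (2, 28, 28, 3)
print(np.array_equal(stacked, repeated))                 # True
print(np.array_equal(stacked[..., 0], stacked[..., 2]))  # True
```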

In [37]:
# Resize the images to 48x48 to match the input_shape passed to the pre-trained network below

train_img = np.asarray([img_to_array(array_to_img(im, scale=False).resize((48,48))) for im in train_img])
test_img = np.asarray([img_to_array(array_to_img(im, scale=False).resize((48,48))) for im in test_img])
train_img.shape, test_img.shape

((59999, 48, 48, 3), (9999, 48, 48, 3))
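The notebook resizes through PIL via `array_to_img(...).resize(...)`. As a dependency-free illustration of what a resize does, here is a nearest-neighbour version in plain NumPy. This is a different, simpler technique than the PIL call, and `resize_nearest` is a hypothetical helper name:

```python
import numpy as np

def resize_nearest(images, size):
    """Nearest-neighbour resize of a batch shaped (N, H, W, C) to (N, size, size, C)."""
    _, height, width, _ = images.shape
    rows = np.arange(size) * height // size  # source row for each target row
    cols = np.arange(size) * width // size   # source column for each target column
    return images[:, rows][:, :, cols]

# Random stand-in for a batch of 28x28 three-channel images
batch = np.random.rand(4, 28, 28, 3).astype("float32")
resized = resize_nearest(batch, 48)
print(resized.shape)  # (4, 48, 48, 3)
```

Each target pixel simply copies the source pixel it maps onto, so no new intensity values are introduced.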

In [42]:
# Normalise the data and change data type

train_img = train_img / 255.
test_img = test_img / 255.
train_img = train_img.astype('float32')
test_img = test_img.astype('float32')

# Scenario 1

Dataset size is small and data similarity is very high.

Here we use the pre-trained network as a feature extractor. The source article uses VGG16; the code below substitutes MobileNet, but the approach is the same. We pass the extracted features to dense layers that are trained on our data set, and the original output layer, a softmax over the 1000 ImageNet categories, is replaced with a new softmax layer of 10 categories. Only the weights of these new layers are trained to identify the digits.



In [58]:
model=MobileNet(weights='imagenet',include_top=False,input_shape=(48, 48, 3)) #imports the mobilenet model and discards the last 1000 neuron layer.


In [59]:
model.summary() # just the convolutional layers, not the dense / final ones

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_7 (InputLayer)         (None, 48, 48, 3)         0         
_________________________________________________________________
conv1_pad (ZeroPadding2D)    (None, 49, 49, 3)         0         
_________________________________________________________________
conv1 (Conv2D)               (None, 24, 24, 32)        864       
_________________________________________________________________
conv1_bn (BatchNormalization (None, 24, 24, 32)        128       
_________________________________________________________________
conv1_relu (ReLU)            (None, 24, 24, 32)        0         
_________________________________________________________________
conv_dw_1 (DepthwiseConv2D)  (None, 24, 24, 32)        288       
_________________________________________________________________
conv_dw_1_bn (BatchNormaliza (None, 24, 24, 32)        128       
__________

In [60]:
# Extracting features from the train dataset using the pre-trained MobileNet model
features_train=model.predict(train_img)
features_train.shape

(59999, 1, 1, 1024)
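The `1 x 1` spatial size in the output above follows from the network's downsampling. The summary shows `conv1_pad` growing the input to 49x49 and `conv1` halving it to 24x24. Assuming five such stride-2 stages in total (an assumption based on the MobileNet v1 architecture; the truncated summary does not show them all), the arithmetic can be sketched as:

```python
def strided_conv_out(size, kernel=3, stride=2, pad=1):
    """Output length of a 'valid' convolution applied after `pad` pixels of zero padding."""
    return (size + pad - kernel) // stride + 1

size = 48
sizes = [size]
for _ in range(5):  # assumed: five stride-2 stages in MobileNet v1
    size = strided_conv_out(size)
    sizes.append(size)

print(sizes)  # [48, 24, 12, 6, 3, 1]
```

With a 48x48 input the feature map shrinks to a single spatial position, which is why each image ends up as one 1024-dimensional vector.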

In [64]:
# flattening the features to conform to MLP input (one 1024-vector per image)
train_x=features_train.reshape(features_train.shape[0], -1)

# converting target variable to array
train_y=np.asarray(train.iloc[:,0])
train_y=pd.get_dummies(train_y)
train_y=np.array(train_y)

# creating training and validation set
from sklearn.model_selection import train_test_split
X_train, X_valid, Y_train, Y_valid=train_test_split(train_x,train_y,test_size=0.3, random_state=42)
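`pd.get_dummies` creates one column per label value that actually occurs in the data. A minimal NumPy alternative that always yields 10 columns, shown on made-up labels rather than the real `train_y`, could look like this:

```python
import numpy as np

labels = np.array([3, 0, 9, 3])                # made-up digit labels
one_hot = np.eye(10, dtype="float32")[labels]  # row i of the identity encodes digit i

print(one_hot.shape)           # (4, 10)
print(one_hot.argmax(axis=1))  # [3 0 9 3]
```

Indexing the identity matrix keeps the column order fixed at 0..9 even if a digit is missing from a particular subset.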

In [73]:
# creating an MLP model on top of the extracted features
from keras.layers import Dense, Activation, Dropout
from keras.models import Sequential

# note: this rebinds `model`, replacing the MobileNet feature extractor above
model=Sequential()

model.add(Dense(1000, input_dim=1024, activation='relu',kernel_initializer='uniform'))
# Dropout layers must be added to the model; constructing one on its own has no effect
model.add(Dropout(0.3))

model.add(Dense(500,activation='sigmoid'))
model.add(Dropout(0.4))

model.add(Dense(150,activation='sigmoid'))
model.add(Dropout(0.2))

model.add(Dense(units=10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer="adam", metrics=['accuracy'])

# fitting the model 

model.fit(X_train, Y_train, epochs=20, batch_size=128,validation_data=(X_valid,Y_valid))
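The model above ends in a 10-way softmax and is trained with categorical cross-entropy. The following NumPy sketch shows what those two pieces compute on made-up logits. It is an illustration of the formulas, not Keras's actual implementation, and the function names are hypothetical:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.sum(y_true * np.log(y_prob), axis=1).mean()

# Two made-up samples, both with true class 3
logits = np.zeros((2, 10))
logits[0, 3] = 5.0             # sample 0: confident and correct
y_true = np.eye(10)[[3, 3]]    # sample 1 keeps all-zero logits (uniform guess)

probs = softmax(logits)
print(probs.sum(axis=1))       # each row sums to 1
print(categorical_crossentropy(y_true, probs))
```

A uniform guess over 10 classes costs ln(10), roughly 2.30, so a training loss near that value means the head has not learned anything yet.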

 

# Scenario 2

Here we freeze the weights of the first 8 layers of the VGG16 network and retrain the subsequent layers. The first few layers capture universal features such as curves and edges, which are also relevant to our new problem. We keep those weights intact and let the network focus on learning dataset-specific features in the later layers.

In [2]:
from IPython.core.debugger import set_trace
from keras.applications.vgg16 import VGG16

In [3]:
model = VGG16(weights='imagenet', include_top=True)

In [4]:
len(model.layers)

23

In [5]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [6]:
#removing the output layer
model.layers.pop()

<keras.layers.core.Dense at 0x2284ee2d3c8>

In [7]:
model.layers

[<keras.engine.input_layer.InputLayer at 0x2284dc0a3c8>,
 <keras.layers.convolutional.Conv2D at 0x2284dc0a470>,
 <keras.layers.convolutional.Conv2D at 0x2282f4ec048>,
 <keras.layers.pooling.MaxPooling2D at 0x2284dc63278>,
 <keras.layers.convolutional.Conv2D at 0x2284dc0a780>,
 <keras.layers.convolutional.Conv2D at 0x2284ec5eb00>,
 <keras.layers.pooling.MaxPooling2D at 0x2284ec73e80>,
 <keras.layers.convolutional.Conv2D at 0x2284ec8f400>,
 <keras.layers.convolutional.Conv2D at 0x2284ecab9e8>,
 <keras.layers.convolutional.Conv2D at 0x2284ecc1be0>,
 <keras.layers.pooling.MaxPooling2D at 0x2284ecfa320>,
 <keras.layers.convolutional.Conv2D at 0x2284ecfaf28>,
 <keras.layers.convolutional.Conv2D at 0x2284ed29240>,
 <keras.layers.convolutional.Conv2D at 0x2284ed29e48>,
 <keras.layers.pooling.MaxPooling2D at 0x2284ed5d630>,
 <keras.layers.convolutional.Conv2D at 0x2284ed5dcf8>,
 <keras.layers.convolutional.Conv2D at 0x2284ed935c0>,
 <keras.layers.convolutional.Conv2D at 0x2284edac4e0>,
 <keras.

In [8]:
# the new output is the output of the last remaining layer: the fc2 Dense layer, shape (None, 4096)
model.outputs = [model.layers[-1].output]
model.outputs

[<tf.Tensor 'fc2/Relu:0' shape=(?, 4096) dtype=float32>]

In [9]:
model.layers[-1].outbound_nodes = [] # detach fc2 from the removed output layer (idiom from older Keras versions)

In [15]:
x=Dense(10, activation='softmax')(model.output)

In [30]:
model.input

<tf.Tensor 'input_1:0' shape=(?, 224, 224, 3) dtype=float32>

In [16]:
model=Model(model.input,x)

In [19]:
model.output

<tf.Tensor 'dense_2/Softmax:0' shape=(?, 10) dtype=float32>

In [27]:
model.get_config()


{'name': 'model_2',
 'layers': [{'name': 'input_1',
   'class_name': 'InputLayer',
   'config': {'batch_input_shape': (None, 224, 224, 3),
    'dtype': 'float32',
    'sparse': False,
    'name': 'input_1'},
   'inbound_nodes': []},
  {'name': 'block1_conv1',
   'class_name': 'Conv2D',
   'config': {'name': 'block1_conv1',
    'trainable': False,
    'filters': 64,
    'kernel_size': (3, 3),
    'strides': (1, 1),
    'padding': 'same',
    'data_format': 'channels_last',
    'dilation_rate': (1, 1),
    'activation': 'relu',
    'use_bias': True,
    'kernel_initializer': {'class_name': 'VarianceScaling',
     'config': {'scale': 1.0,
      'mode': 'fan_avg',
      'distribution': 'uniform',
      'seed': None}},
    'bias_initializer': {'class_name': 'Zeros', 'config': {}},
    'kernel_regularizer': None,
    'bias_regularizer': None,
    'activity_regularizer': None,
    'kernel_constraint': None,
    'bias_constraint': None},
   'inbound_nodes': [[['input_1', 0, 0, {}]]]},
  {'name

In [29]:
model.layers[:8]

[<keras.engine.input_layer.InputLayer at 0x2284dc0a3c8>,
 <keras.layers.convolutional.Conv2D at 0x2284dc0a470>,
 <keras.layers.convolutional.Conv2D at 0x2282f4ec048>,
 <keras.layers.pooling.MaxPooling2D at 0x2284dc63278>,
 <keras.layers.convolutional.Conv2D at 0x2284dc0a780>,
 <keras.layers.convolutional.Conv2D at 0x2284ec5eb00>,
 <keras.layers.pooling.MaxPooling2D at 0x2284ec73e80>,
 <keras.layers.convolutional.Conv2D at 0x2284ec8f400>]

In [22]:
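# Set the first 8 entries of model.layers to non-trainable (weights will not be updated).
# The model must be compiled after this change for it to take effect during training.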

for layer in model.layers[:8]:
    layer.trainable = False
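Note that `model.layers[:8]` includes the input layer and two pooling layers, which have no weights, so only five convolutional layers are actually frozen. The sketch below works out how many parameters that covers. The first four counts match the `model.summary()` output; the channel sizes for `block3_conv1` are assumed from the standard VGG16 layout because the summary is truncated before that layer.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square 2D convolution."""
    return (kernel * kernel * in_channels + 1) * filters

# (name, input channels, filters) for the conv layers inside model.layers[:8]
frozen_convs = [
    ("block1_conv1", 3, 64),
    ("block1_conv2", 64, 64),
    ("block2_conv1", 64, 128),
    ("block2_conv2", 128, 128),
    ("block3_conv1", 128, 256),  # assumed from the standard VGG16 layout
]

counts = {name: conv_params(3, c_in, c_out) for name, c_in, c_out in frozen_convs}
for name, n in counts.items():
    print(name, n)
print("frozen total:", sum(counts.values()))  # frozen total: 555328
```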



In [28]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________