# Transfer Learning

- CNN = convolutional layers + fully connected layers
- Convolutional layers = feature extractor
- Fully connected layers = classifier for the image
- When a CNN is trained on image data, the early layers (those closest to the input) learn to extract general features such as edges and colour distributions.
- Deeper in the network, the layers extract increasingly specific features.
- We can reuse pretrained models, which already know how to extract features, and avoid training from scratch. This concept is known as transfer learning.

# Transfer Learning using Keras

- There are two ways to create models in Keras: the Sequential model and the functional API.
- The Sequential model is a linear stack of layers. You keep adding layers to it by calling the `add` method.
- The functional API lets you create more complex models, for example models with multiple inputs and outputs.
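As a rough illustration of what the functional API allows beyond a linear stack, here is a minimal sketch of a model with two inputs. The input names, shapes and layer sizes are made up for the example and are not part of the original notebook.

```python
import keras

# Two hypothetical inputs: a small image and a vector of extra features.
image_in = keras.Input(shape=(32, 32, 3), name="image")
meta_in = keras.Input(shape=(4,), name="meta")

# Image branch: one convolution followed by global average pooling.
x = keras.layers.Conv2D(8, kernel_size=3, activation="relu")(image_in)
x = keras.layers.GlobalAveragePooling2D()(x)

# Merge both branches and predict a single value.
merged = keras.layers.Concatenate()([x, meta_in])
out = keras.layers.Dense(1)(merged)

# A Sequential model cannot express this, because it has two entry points.
two_input_model = keras.Model(inputs=[image_in, meta_in], outputs=out)
two_input_model.summary()
```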

# VGG 16

In [2]:
from keras.applications.vgg16 import VGG16

## Sequential

In [3]:
import keras
model=keras.models.Sequential()
model.add(VGG16(weights='imagenet'))
model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
vgg16 (Functional)           (None, 1000)              138357544 
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________


## Functional

In [4]:
# Input shape is the default image shape for VGG16: 224x224x3
inp=keras.Input(shape=(224,224,3))
out=VGG16(weights='imagenet')(inp)
model=keras.Model(inputs=inp,outputs=out)
model.summary()

Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
vgg16 (Functional)           (None, 1000)              138357544 
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________


To use the pretrained weights, set the `weights` argument to `'imagenet'`, which is also the default value. To train the model from scratch instead, set the **weights argument to None**. This initializes the weights in the network randomly.

# Attaching our own classifier

We can remove the default classifier and attach our own classifier to the pretrained model. To exclude the default classifier, set the argument **include_top to False.**

- In the following example, I remove the default classifier from VGG and attach my own classifier, which is just one dense layer. We also have to add a flatten layer before the dense layer to convert the 4D output of the convolutional layers (batch, height, width, channels) to 2D (batch, features), since the dense classifier expects one feature vector per image.

In [5]:
model1=keras.models.Sequential()
model1.add(VGG16(include_top=False,input_shape=(224,224,3)))

model1.add(keras.layers.Flatten())
model1.add(keras.layers.Dense(10))

model1.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
vgg16 (Functional)           (None, 7, 7, 512)         14714688  
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
dense (Dense)                (None, 10)                250890    
Total params: 14,965,578
Trainable params: 14,965,578
Non-trainable params: 0
_________________________________________________________________
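The numbers in the summary above can be reproduced by hand. The following sketch only does the arithmetic and does not use Keras. It assumes a standard dense layer with one weight per input-unit pair plus one bias per unit.

```python
# Output of VGG16 without its classifier for a 224x224x3 input (from the summary above).
feature_map_shape = (7, 7, 512)
num_units = 10  # units in the Dense layer

# Flatten turns each (7, 7, 512) feature map into a single vector.
flat_size = 1
for dim in feature_map_shape:
    flat_size *= dim

# Dense layer: one weight per (input, unit) pair plus one bias per unit.
dense_params = flat_size * num_units + num_units

# Add the convolutional base reported in the summary.
total_params = 14714688 + dense_params

print(flat_size)     # 25088
print(dense_params)  # 250890
print(total_params)  # 14965578
```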


## Input Shape

- VGG16 is trained on RGB images of size (224, 224), which is the default input size of the network. When `include_top=False`, we can also feed images of other sizes, but the height and width must each be at least 32 pixels.

In [6]:
model1=keras.models.Sequential()
model1.add(VGG16(include_top=False,input_shape=(32,64,3)))

model1.add(keras.layers.Flatten())
model1.add(keras.layers.Dense(10))

model1.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
vgg16 (Functional)           (None, 1, 2, 512)         14714688  
_________________________________________________________________
flatten_1 (Flatten)          (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                10250     
Total params: 14,724,938
Trainable params: 14,724,938
Non-trainable params: 0
_________________________________________________________________
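The output shapes in the two summaries above follow from the five pooling layers in VGG16 (`block1_pool` to `block5_pool`), each of which halves the height and width. The helper below is a hypothetical sketch of that arithmetic, not a Keras function. It assumes the convolutions keep the spatial size unchanged and that each pooling step rounds down.

```python
def vgg16_feature_map_shape(height, width):
    """Spatial output shape of VGG16 with include_top=False.

    Assumes size-preserving convolutions and five 2x2 max-pooling
    layers with stride 2, each halving the size (rounded down).
    """
    for _ in range(5):
        height //= 2
        width //= 2
    return (height, width, 512)

print(vgg16_feature_map_shape(224, 224))  # (7, 7, 512)
print(vgg16_feature_map_shape(32, 64))    # (1, 2, 512)
```

This also shows where the 32-pixel minimum comes from: five halvings divide each side by 32, so anything smaller would shrink to zero.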


- We can also define the input shape by providing an input tensor through the `input_tensor` argument, as shown in the following example.

In [7]:
model=keras.models.Sequential()

# Define the input tensor; VGG16 takes its input shape from it
inp=keras.Input(shape=(32,64,3))

model.add(VGG16(include_top=False,input_tensor=inp))

model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
vgg16 (Functional)           (None, 1, 2, 512)         14714688  
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________


# Pooling

- We can apply two types of pooling to the final output of the convolutional layers: global average pooling (`pooling='avg'`) and global max pooling (`pooling='max'`).

- Global pooling is useful when the input images vary in size. Suppose images of two different sizes produce output tensors of shape (3, 3, 512) and (7, 7, 512). Applying global pooling to either tensor gives a fixed-size vector of length 512, so the final output stays a fixed-size vector even when the input size varies.

In [8]:
model=keras.models.Sequential()
model.add(VGG16(include_top=False,input_shape=(224,224,3),pooling='avg'))
model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
vgg16 (Functional)           (None, 512)               14714688  
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
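The fixed-size claim can be checked outside Keras. The following NumPy sketch reduces two feature maps of different spatial sizes over their height and width axes. The random arrays stand in for real convolutional outputs and the helper names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical feature maps of shape (height, width, channels).
small = rng.random((3, 3, 512))
large = rng.random((7, 7, 512))

def global_avg_pool(feature_map):
    # Average over height and width, keeping one value per channel.
    return feature_map.mean(axis=(0, 1))

def global_max_pool(feature_map):
    # Maximum over height and width, keeping one value per channel.
    return feature_map.max(axis=(0, 1))

print(global_avg_pool(small).shape, global_avg_pool(large).shape)  # (512,) (512,)
print(global_max_pool(small).shape, global_max_pool(large).shape)  # (512,) (512,)
```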


# Freezing layers

In [12]:
VGG= VGG16(include_top=False,input_shape=(224,224,3))

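# Freeze the first 10 layers (the input layer through block3_conv3)
for layer in VGG.layers[:10]:
	layer.trainable=False

# Print each layer's name and whether it is still trainable
for layer in VGG.layers:
	sp=' '[len(layer.name)-9:]
	print(layer.name,sp,layer.trainable)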

input_10   False
block1_conv1  False
block1_conv2  False
block1_pool  False
block2_conv1  False
block2_conv2  False
block2_pool  False
block3_conv1  False
block3_conv2  False
block3_conv3  False
block3_pool  True
block4_conv1  True
block4_conv2  True
block4_conv3  True
block4_pool  True
block5_conv1  True
block5_conv2  True
block5_conv3  True
block5_pool  True


- If the current dataset is similar to the dataset these networks were trained on, it is good to freeze all layers of the pretrained network, since images in both datasets have similar features. If the dataset is different, we should freeze only the early layers (closest to the input) and train the later ones, because the early layers extract general features. The more similar the dataset, the more layers we should freeze.
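To get a feel for how much of the network remains trainable after freezing the first 10 layers as above, the parameter counts can be worked out by hand. This sketch assumes the standard VGG16 configuration (3x3 kernels with a bias per filter, and the channel widths listed below). Pooling and input layers have no weights, so only the convolutional layers are counted.

```python
# (layer name, input channels, output channels) for every conv layer in VGG16.
conv_layers = [
    ("block1_conv1", 3, 64), ("block1_conv2", 64, 64),
    ("block2_conv1", 64, 128), ("block2_conv2", 128, 128),
    ("block3_conv1", 128, 256), ("block3_conv2", 256, 256),
    ("block3_conv3", 256, 256),
    ("block4_conv1", 256, 512), ("block4_conv2", 512, 512),
    ("block4_conv3", 512, 512),
    ("block5_conv1", 512, 512), ("block5_conv2", 512, 512),
    ("block5_conv3", 512, 512),
]

def conv_params(in_ch, out_ch, kernel=3):
    # kernel x kernel weights per (input, output) channel pair, plus one bias per filter.
    return kernel * kernel * in_ch * out_ch + out_ch

# Conv layers among the first 10 layers, i.e. those printed with trainable=False.
frozen_names = {name for name, _, _ in conv_layers[:7]}

total = sum(conv_params(i, o) for _, i, o in conv_layers)
frozen_params = sum(conv_params(i, o) for n, i, o in conv_layers if n in frozen_names)
trainable_params = total - frozen_params

print(total)             # 14714688
print(frozen_params)     # 1735488
print(trainable_params)  # 12979200
```

Freezing blocks 1 to 3 removes only about 12% of the parameters from training, because most of the weights sit in the wider layers of blocks 4 and 5.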

# Using specific layers

- In the following example, I add the 3rd layer of the pretrained model (`block1_conv2`, at index 2 of `layers`, counting the input layer as index 0) to a sequential model.

In [14]:
model=keras.models.Sequential()
model.add(keras.layers.Conv2D(64,kernel_size=3,input_shape=(224,224,3)))
model.add(VGG16().layers[2])
model.summary()

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 222, 222, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        multiple                  36928     
Total params: 38,720
Trainable params: 38,720
Non-trainable params: 0
_________________________________________________________________
