# Transfer Learning 

Transfer learning is the technique of applying what a model has learned on one task to a new, related task. Traditional models are typically trained in isolation, each one from scratch on its own dataset. Transfer learning breaks with this isolated learning paradigm by reusing knowledge acquired on one problem to solve related ones.

In practice, transfer learning usually means using a pre-trained model: a model that was trained on a large benchmark dataset (such as ImageNet) to solve a problem similar to the one we want to solve.

When you are repurposing a pre-trained model for your own needs, you start by removing the original classifier. You add a new classifier that fits your purposes, and finally, you have to fine-tune your model according to one of three strategies:
* Train the entire model. In this case, you use the architecture of the pre-trained model and train it on your dataset. Because every weight is updated, you are effectively learning the model from scratch and therefore need a large dataset (and a lot of computational power).
* Train some layers and leave the others frozen. Recall that lower layers capture general (problem-independent) features, while higher layers capture specific (problem-dependent) features. Here, we play with that dichotomy by choosing how much of the network's weights we want to adjust (a frozen layer does not change during training). Usually, if you have a small dataset and many parameters, you leave more layers frozen to avoid overfitting. By contrast, if the dataset is large and the number of parameters is small, you can improve your model by training more layers on the new task, since overfitting is less of a concern.
* Freeze the convolutional base. This case corresponds to an extreme situation of the train/freeze trade-off. The main idea is to keep the convolutional base in its original form and then use its outputs to feed the classifier. You are using the pre-trained model as a fixed feature extraction mechanism, which can be helpful if you are short on computational power, your dataset is small, and/or a pre-trained model solves a problem very similar to the one you want to solve.
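
The three strategies above differ only in which layers are marked trainable. The following is a minimal sketch of that idea, not a definitive recipe: `build_base` and the strategy names `"full"`, `"partial"`, and `"frozen"` are hypothetical names introduced for illustration, and unfreezing only the layers whose names start with `block5` is just one possible choice for the partial strategy. The sketch defaults to `weights=None` so that it runs without downloading anything; pass `weights="imagenet"` to actually start from the pre-trained weights.

```python
from tensorflow.keras.applications import VGG16


def build_base(strategy, weights=None):
    """Return a VGG16 convolutional base configured for one strategy.

    strategy: "full", "partial", or "frozen" (hypothetical names for this sketch).
    weights:  pass "imagenet" to load pre-trained weights; None skips the download.
    """
    base = VGG16(weights=weights, input_shape=(224, 224, 3), include_top=False)

    if strategy == "full":
        # Strategy 1: every layer is updated during training.
        base.trainable = True
    elif strategy == "partial":
        # Strategy 2: freeze the lower blocks and train only the last block.
        base.trainable = True
        for layer in base.layers:
            layer.trainable = layer.name.startswith("block5")
    elif strategy == "frozen":
        # Strategy 3: the whole base acts as a fixed feature extractor.
        base.trainable = False
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return base


# Count the trainable weight tensors (kernels and biases) under each strategy.
trainable_counts = {
    name: len(build_base(name).trainable_weights)
    for name in ("full", "partial", "frozen")
}
print(trainable_counts)
```

Comparing the counts shows the trade-off directly: the more layers you freeze, the fewer weight tensors the optimizer is allowed to change.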


# A sample code explanation using the VGG16 pre-trained model

```python
import tensorflow
from tensorflow.keras.applications import VGG16

# Initialize the pre-trained model without its original classifier
feature_extractor = VGG16(weights='imagenet',
                          input_shape=(224, 224, 3),
                          include_top=False)

# Freeze the base so its weights are not updated during training
feature_extractor.trainable = False

# Set the input layer
input_ = tensorflow.keras.Input(shape=(224, 224, 3))

# Pass the input through the feature extractor in inference mode
x = feature_extractor(input_, training=False)

# Set the pooling layer
x = tensorflow.keras.layers.GlobalAveragePooling2D()(x)

# Set the final layer with a sigmoid activation function
output_ = tensorflow.keras.layers.Dense(1, activation='sigmoid')(x)

# Create the new model object
model = tensorflow.keras.Model(input_, output_)

# Compile it
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Print the summary of the model
model.summary()
```


```text
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_6 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 vgg16 (Functional)          (None, 7, 7, 512)         14714688  
                                                                 
 global_average_pooling2d_1   (None, 512)              0         
 (GlobalAveragePooling2D)                                        
                                                                 
 dense_1 (Dense)             (None, 1)                 513       
                                                                 
Total params: 14,715,201
Trainable params: 513
Non-trainable params: 14,714,688
_________________________________________________________________
```

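We have used VGG16 as the backbone for our new model. Because `include_top=False` drops VGG16's original classifier, we created a new input layer and added our own head: a global average pooling layer followed by a single sigmoid unit for binary classification (size this final layer according to your number of classes). Note that we set the base model (VGG16) as non-trainable so that its pre-trained weights stay fixed during training and only the new head is learned.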
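
The parameter counts reported by `model.summary()` can be checked by hand, which is a useful sanity test that only the new head is being trained. The sketch below uses plain Python and assumes the standard VGG16 layout of thirteen 3x3 convolutions with biases; the helper names `conv_params` and `dense_params` are introduced here for illustration only.

```python
# Output channels of each 3x3 convolution in VGG16, block by block.
# Pooling layers are omitted because they have no parameters.
VGG16_CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256,
                       512, 512, 512, 512, 512, 512]


def conv_params(in_channels, out_channels, kernel_size=3):
    """Kernel weights plus biases of one convolutional layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


base_params = 0
in_channels = 3  # RGB input
for out_channels in VGG16_CONV_CHANNELS:
    base_params += conv_params(in_channels, out_channels)
    in_channels = out_channels

# The new head: 512 pooled features feeding a single sigmoid unit.
head_params = dense_params(512, 1)

print(base_params, head_params, base_params + head_params)
```

The three printed values correspond to the non-trainable, trainable, and total parameter counts in the summary (14,714,688, 513, and 14,715,201).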

# Inception Net
Inception Net has several versions:
 * [Inception v1](https://arxiv.org/pdf/1409.4842v1.pdf)
 * [Inception v2 and Inception v3](https://arxiv.org/pdf/1512.00567v3.pdf)
 * [Inception v4 and Inception-ResNet](https://arxiv.org/pdf/1602.07261.pdf)

Go through the papers if you want to know more about the architecture of Inception Net.




# This week's work
This week, train your transfer learning model using [InceptionV3](https://keras.io/api/applications/inceptionv3/), which is readily available in Keras, instead of the VGG16 model shown here.
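
As a starting point, the VGG16 example can be adapted roughly as follows. This is a hedged sketch under stated assumptions rather than a reference solution: `build_inception_classifier` is a hypothetical helper name, the 299x299 input size is InceptionV3's default, the `Rescaling` layer assumes raw pixel values in the range 0 to 255 and maps them to the range -1 to 1 that InceptionV3 expects, and the single sigmoid output assumes a binary task. It defaults to `weights=None` so it runs without a download; pass `weights="imagenet"` for actual transfer learning.

```python
import tensorflow
from tensorflow.keras.applications import InceptionV3


def build_inception_classifier(weights=None):
    """Frozen InceptionV3 base with a new binary classification head."""
    base = InceptionV3(weights=weights,
                       input_shape=(299, 299, 3),
                       include_top=False)
    base.trainable = False  # use the base as a fixed feature extractor

    inputs = tensorflow.keras.Input(shape=(299, 299, 3))
    # Assumption: inputs are raw pixels in [0, 255]; scale them to [-1, 1].
    x = tensorflow.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    # training=False keeps the batch normalization layers in inference mode.
    x = base(x, training=False)
    x = tensorflow.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tensorflow.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tensorflow.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_inception_classifier()
model.summary()
```

For a multi-class dataset, you would replace the final layer with one unit per class, a softmax activation, and a matching categorical loss.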