# Transfer learning in CNN

Welcome to this new chapter in CNN.
Here you will learn how to use a trained model to stimulate the learning of your own model.

This course is divided into three steps:

1.   Import and implement a pre-train model in our own model.
2.   Freeze layers in the pre-train model
3.   Use your own previous models to improve the precision of a brand new one.


## General introduction

Have you heard of the term "don't reinvent the wheel" ?
As developers, we are often self determined to create a new application, a new tool or a new software from scratch.
However, as much as intriguing and exiting it is, there is a lot of chance that somebody already built your tool in a much better way. Yes, I know it's sad, but there is also a lot of chance that this same tool has been open sourced for you to use freely! Yeah!!!

In data science, it is even more the case, as training a model from scratch require a tremendous amount of data requiring weeks of preparations.

This is why having the possibility to share knowledge between one model to another became a must, and this phenomenon was baptised: transfer learning


#### Transfer learning 

Transfer learning refer to the situation where what has been learned in one setting … is exploited to improve generalization in another setting.

It allows a model trained on one task to be re-purposed on a second related task. It allow rapid progress or improved performance when modeling the second task.


Here is the difference between creating your model from scratch and using Transfer learning:

Bellow in Strategy 1, we are creating a model from scratch using only on input data and letting or Neural Network do the heavy lifting over all of our layers.

however in Strategy 3, We are actualy using a previous model where we **FREEZE** its layers to freeze the weight saved in each layer of the network, then we add the additional layers for our specific use case.

![alt text](https://miro.medium.com/max/5994/1*9t7Po_ZFsT5_lZj445c-Lw.png)


#### the 3 adventages of using transfer learning 



1.   **Higher start**: The initial skill (before refining the model) on the new model is higher than it otherwise would be. In order word you get a boost at the intial phase of the training

2.   **Higher slope**: The rate of improvement of skill during training of the source model is steeper than it otherwise would be. Using pre-trained layers you use previous learning to increase the speed of your new model learning rate

3.   **Higher asymptote**: The converged skill of the trained model is better than it otherwise would be. The predictions are better


![alt text](https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2017/09/Three-ways-in-which-transfer-might-improve-learning.png
)



### Project overview 

Here we will use a dataset that you are already familiar with: Cat and Dog.
You can of course follow this tutorial using any dataset you want. But I recommend following the course using the dataset provided.



We will start by talking about the project we would like to do:

We would like to create a model that will allow us to categorize an image as a cat or a dog at a high rate of prediction.

However, we only have a small number of images to do so: 4000 images of cats and 4000 images of dogs.

In order to increase or precision, we will use the VGG-16 model to "transfer its learning"


### But what is VGG-16 ?

Well I am glad you asked Timmy!


Here it is :

![alt text](https://neurohive.io/wp-content/uploads/2018/11/vgg16-1-e1542731207177.png)


VGG16 is a convolutionary neural network model presented by K. Simonyan and A. Zisserman of the University of Oxford in the newspaper "Very Deep Convolutional Networks for Large-Scale Image Recognition".

It trained more than 14 million images to predict 1,000 different classes. You could say it's a pretty good model to use as a basic model.


By the way, you need to be sure now that you understand the different layers described on the image!

Can you guess what the full neck + Relu is in Keras?





## Coding time

We will start by creating a new model by using VGG-16 as the base layer,
Then we will add a couple of Dense layers before finishing with a softmax layer with 2 nodes to predict wether we have a cat or a dog.

Let start with using the VGG-16 as our base layer.





In [1]:
from keras.models import Model, Sequential
from keras.layers import Dense,GlobalAveragePooling2D,Dropout,Flatten, Conv2D, MaxPooling2D
from keras.applications.vgg16 import VGG16

# lets initialize the VGG-16 model
# We then remove the final layer of the model as we will add our own to only classify cats and dogs
# We also decide the size of the input images: here they are 64px by 64px.

prior_model = VGG16(weights='imagenet',include_top=False, input_shape=(64,64,3))

# lets create our model

model = Sequential()

# and here we add a all the VGG16 as a layer

model.add(prior_model)



Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5


We can now Check how our model looks like by using the summary methods

In [2]:
# apply the summary method on the model
#----- HERE -------

model.summary()

#------------------

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
vgg16 (Functional)           (None, 2, 2, 512)         14714688  
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________


Pay attention to the layer type: Model!

With Keras, you can actually add a full model as a layer.

Lets check now how this layer/model is composed of.

In [3]:
model.layers[0].summary()

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 64, 64, 3)]       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 64, 64, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 64, 64, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 32, 32, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 32, 32, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 32, 32, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 16, 16, 128)       0     

We see that VGG-16 is composed of Conv2D layers.
and its final layer is a MaxPooling2D layer.

In order to finish our model, we need to flatten it before providing it a Dense layer for the classification.

You could add a couple of additional layers such as a Dropout or an other Dense layer before adding the softmax's one just like bellow.

In [4]:
model.add(Flatten())
model.add(Dense(256,activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(2, activation='softmax'))

In [5]:
# lets check the summary of our model 
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
vgg16 (Functional)           (None, 2, 2, 512)         14714688  
_________________________________________________________________
flatten (Flatten)            (None, 2048)              0         
_________________________________________________________________
dense (Dense)                (None, 256)               524544    
_________________________________________________________________
dropout (Dropout)            (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 514       
Total params: 15,239,746
Trainable params: 15,239,746
Non-trainable params: 0
_________________________________________________________________


## Freezing the layers of our prior model

Now that we have all the layers set up, we need to freeze those within the prior model and this for a simple reason: **we don't want to train it again as we will need its knowledge to boost the learning and precision of our own model !!!**

To do so, we are going to loop over all the layers in the VGG model and set them to learning false, "freezing" the weights already saved inside the model.

In [6]:
for layers in model.layers[0].layers: # looping over each layers in layer 0 to freeze them
    layers.trainable = False

model.layers[0].trainable = False # freezing layer 0 as well for good measure

Now that our model is ready: it's time for you to do the next steps for image classifications.




In [7]:
from keras.preprocessing.image import ImageDataGenerator

# compiling the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


# defining the constants for the model training

BATCH_SIZE = 32
EPOCHS = 20
URL_TRAINING = './training_set' 
URL_TESTING = './test_set' 


# creating the image generator

generator = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rescale=1/255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest',
    validation_split=0.2
)
test_generator = ImageDataGenerator(
    rescale=1/255,
)


# creating the train and test sets

train_set = generator.flow_from_directory(URL_TRAINING, target_size=(64,64), batch_size=BATCH_SIZE)
test_set = test_generator.flow_from_directory(URL_TESTING, target_size=(64,64), batch_size=BATCH_SIZE)

 
# fitting the model

model.fit_generator(train_set, steps_per_epoch=len(train_set.filenames)//BATCH_SIZE, epochs=EPOCHS, validation_data = test_set, validation_steps=len(test_set.filenames)//BATCH_SIZE )


Found 1589 images belonging to 2 classes.
Found 378 images belonging to 2 classes.
Instructions for updating:
Please use Model.fit, which supports generators.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
 2/49 [>.............................] - ETA: 16s - loss: 0.4647 - accuracy: 0.7736

KeyboardInterrupt: 

### Testing

Now it's time to test our brand new model using transfer learning!
Then save it!

In [7]:
import numpy as np
from keras.preprocessing import image

TEST_IMAGE_URL = './test_image.jpg'

test_image = image.load_img( TEST_IMAGE_URL , target_size = (64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)
result = model.predict(test_image)
print(result)

[[0. 1.]]


In [8]:
model.evaluate_generator(test_set, steps=BATCH_SIZE)

[0.448882337615424, 0.7549407107556761]

In [9]:
# time to save the model

PATH = './64_by_64.h5'

model.save(PATH)

# Conclusion


Here we go! You have now mastered the the super power of transfer learning:

1.   You know how to use a previous model as the a launching pad for a new model.


2.   You know how to freeze layers in a model


Next stop we will use the model we've just saved in to use it as the base of a brand new model in order to increase its accuracy using Progressive Resizing .



