
# Ungraded Lab: Transfer Learning

In this lab, you will see how you can use a pre-trained model to achieve good results even with a small training dataset. This is called _transfer learning_ and you do this by leveraging the trained layers of an existing model and adding your own layers to fit your application. For example, you can:

1. Take just the convolution layers of a pretrained model
2. Attach some dense layers to them
3. Train only the dense layers
4. Evaluate the results

Doing this saves time when building your application, because you skip the weeks of training that very deep networks require. You reuse the features the pretrained model has already learned and adapt them to your dataset. Let's see how this is done in the next sections.

**IMPORTANT NOTE:** This notebook is designed to run as a Colab. Running the notebook on your local machine might result in some of the code blocks throwing errors.

## Set up the pretrained model

You will need to prepare the pretrained model and configure the layers that you need. For this exercise, you will use the convolution layers of the [InceptionV3](https://arxiv.org/abs/1512.00567) architecture as your base model. To do that, you need to:

1. Set the input shape to fit your application. In this case, set it to `150x150x3`, as you've been doing in the last few labs.

2. Pick and freeze the convolution layers to take advantage of the features they have already learned.

3. Add dense layers which you will train.

Let's see how to do these in the next cells.

First, in preparing the input to the model, you want to fetch the pretrained weights of the `InceptionV3` model and remove the fully connected layer at the end because you will be replacing it later. You will also specify the input shape that your model will accept. Lastly, you want to freeze the weights of these layers because they have been trained already.
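The freeze-then-train idea can be illustrated without TensorFlow. The sketch below is a toy, pure-Python stand-in and is not InceptionV3. `BASE_W` is a hypothetical fixed "pretrained" projection that plays the role of the frozen convolution layers. Only a single sigmoid head is updated, using full-batch gradient descent on synthetic data.

```python
import math
import random

random.seed(0)

# Hypothetical "pretrained" base: a fixed 2 -> 3 projection followed by ReLU.
# It is never updated below, which is what freezing means.
BASE_W = [[0.9, -0.4], [0.2, 0.8], [-0.7, 0.5]]

def base_features(x):
    """Frozen feature extractor: ReLU(BASE_W @ x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BASE_W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction, clipped for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Synthetic binary data: label 1 when x0 + x1 > 0.
data = []
for _ in range(200):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

# Trainable head: one sigmoid unit on top of the frozen features.
head_w = [0.0, 0.0, 0.0]
head_b = 0.0

def predict(x):
    f = base_features(x)
    return f, sigmoid(sum(w * fi for w, fi in zip(head_w, f)) + head_b)

def mean_loss():
    return sum(bce(predict(x)[1], y) for x, y in data) / len(data)

initial_loss = mean_loss()
frozen_snapshot = [row[:] for row in BASE_W]

lr = 0.5
for _ in range(50):
    grad_w = [0.0, 0.0, 0.0]
    grad_b = 0.0
    for x, y in data:
        f, p = predict(x)
        err = p - y  # gradient of BCE with respect to the sigmoid's logit
        grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
        grad_b += err
    # Update ONLY the head; BASE_W is left untouched.
    head_w = [w - lr * g / len(data) for w, g in zip(head_w, grad_w)]
    head_b -= lr * grad_b / len(data)

final_loss = mean_loss()
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In the notebook, `layer.trainable = False` plays the role of never touching `BASE_W`. Keras then excludes those weights from the optimizer's updates.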

In [None]:
# Download the pre-trained weights. No top means it excludes the fully connected layer it uses for classification.
!wget --no-check-certificate \
    https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \
    -O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5

In [None]:
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras import layers

# Set the weights file you downloaded into a variable
local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'

# Initialize the base model.
# Set the input shape and remove the dense layers.
pre_trained_model = InceptionV3(input_shape = (150, 150, 3), 
                                include_top = False, 
                                weights = None)

# Load the pre-trained weights you downloaded.
pre_trained_model.load_weights(local_weights_file)

# Freeze the weights of the layers.
for layer in pre_trained_model.layers:
  layer.trainable = False

You can see the summary of the model below. It is a very deep network, and you can choose the layer up to which you want to use it. As Laurence showed in the exercise, you will use everything up to `mixed7` as your base model and build on top of it. The original last layer may be too specialized to the task it was trained on to transfer well to your application. The features at `mixed7`, on the other hand, are more general and make a good starting point. After the exercise, feel free to try other layers and see what results you get.

In [None]:
pre_trained_model.summary()


In [None]:
# Choose `mixed7` as the last layer of your base model
last_layer = pre_trained_model.get_layer('mixed7')
# layer.output.shape works across TF 2.x and Keras 3 (output_shape was removed in Keras 3)
print('last layer output shape: ', last_layer.output.shape)
# last_output is the output tensor of the selected layer ('mixed7');
# the new dense layers will be attached to it.
last_output = last_layer.output
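Before you add the dense layers, it helps to know how large they will be. The arithmetic below assumes that `mixed7` outputs a `7x7x768` feature map for a `150x150x3` input. Check this against the shape printed above. The sketch also assumes the head defined in the next section: Flatten, then `Dense(1024)`, then `Dropout`, then `Dense(1)`. Flatten and Dropout have no weights, so they add no parameters.

```python
from math import prod

# Assumed mixed7 output per image (height, width, channels); verify with the
# shape printed by the previous cell (it appears there with a leading None batch dim).
mixed7_shape = (7, 7, 768)

# Flatten keeps every value, so the vector length is the product of the dims.
flat_units = prod(mixed7_shape)

def dense_params(inputs, units):
    """A Dense layer has one weight per (input, unit) pair plus one bias per unit."""
    return inputs * units + units

hidden_params = dense_params(flat_units, 1024)  # Dense(1024, activation='relu')
output_params = dense_params(1024, 1)           # Dense(1, activation='sigmoid')

print(flat_units, hidden_params, output_params, hidden_params + output_params)
```

If the shape assumption holds, the total is the number of trainable parameters that `model.summary()` should report once the frozen base is excluded.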

## Add dense layers for your classifier

Next, you will add dense layers to your model. These are the layers that you will train, and they are tasked with recognizing cats and dogs. You will also add a [Dropout](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout) layer to regularize the network and reduce overfitting.
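To make the Dropout behavior concrete, here is a minimal pure-Python sketch of *inverted* dropout. This is the scheme the Keras `Dropout` documentation describes. While training, each activation is zeroed with probability `rate` and the surviving activations are scaled by `1 / (1 - rate)`. At inference, the input passes through unchanged. The function name `dropout` is illustrative only; it is not the Keras API.

```python
import random

def dropout(values, rate, training, rng=random):
    """Inverted dropout on a list of activations (illustrative, not Keras code).

    Training: zero each value with probability `rate` and scale survivors by
    1 / (1 - rate) so the expected sum is unchanged.
    Inference: return the values unchanged.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(42)
activations = [1.0] * 10_000
dropped = dropout(activations, rate=0.2, training=True, rng=rng)
zero_fraction = sum(v == 0.0 for v in dropped) / len(dropped)
print(f"fraction zeroed: {zero_fraction:.3f}")  # close to 0.2, varies with the seed
```

Note that dropout acts on a layer's outputs (activations), not on its weights. `model.fit()` runs the layer in training mode, while `model.predict()` and `model.evaluate()` do not.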

In [None]:
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras import Model
# This head is built with TensorFlow's Functional API rather than the
# Sequential API. InceptionV3 is itself a Functional model with parallel
# branches, and you are tapping an intermediate layer ('mixed7') rather than
# the model's final output, so the Functional API is the natural fit.
# You can confirm the model type with:
# print(f"The pretrained model has type: {type(pre_trained_model)}")
# In the Functional API, each layer is called like a function on the output
# of the previous one.

# Flatten the mixed7 feature maps so they can be fed into dense layers.
# For each image, the (height, width, channels) tensor is reshaped into a
# single 1-D vector; the batch dimension is preserved.
# The input to Flatten is last_output, the output of 'mixed7', which was
# selected manually instead of the last layer of the pretrained model.
x = layers.Flatten()(last_output)
# Each call below takes the previous x and returns a new tensor, so the new
# layers are chained one after another in the Functional graph.

# Add a fully connected layer with 1024 hidden units and ReLU activation
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# During training, Dropout randomly sets 20% of the previous layer's outputs
# (activations, not weights) to zero on each step and scales the rest by
# 1 / (1 - 0.2), so the network cannot rely too heavily on any single unit.
# It is a common regularizer against overfitting, which image classifiers
# are prone to even with data augmentation. At inference time it does nothing.

# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)
# A single output neuron is enough because this is a BINARY classification
# problem: the sigmoid outputs a value between 0 and 1, read as the
# probability that the image belongs to class 1. flow_from_directory assigns
# labels alphabetically by folder name, so 'cats' is 0 and 'dogs' is 1.
# For a multi-class problem with N classes, use N output neurons with a
# softmax activation, e.g. Dense(N, activation='softmax'). (A single-target
# regression model also ends in one neuron, typically with a linear
# activation, but that is a different kind of problem.)
# For two classes, Dense(2, activation='softmax') would also work, but
# Dense(1, activation='sigmoid') is equivalent to logistic regression, needs
# fewer parameters, and pairs directly with the 'binary_crossentropy' loss.
# Use Dense(1, activation='sigmoid') only when there are exactly two classes.

# Append the dense network to the base model
# Create the complete model by using the Model class
model = Model(pre_trained_model.input, x)
# Model ties the Functional graph together: its input is the pretrained
# model's input, and its output is x, the output of the new dense head.
# In a convolutional network, the early layers capture general features
# (edges, textures, shapes), while the later layers become specific to the
# task the network was originally trained on.
#
# Transfer learning therefore reuses the general convolutional features,
# learned on a much larger and more diverse dataset than this one, and
# combines them with dense layers trained for this particular problem. Even
# though the original model was trained for a different number of classes,
# you only need to size the final dense layer for your task (here, Dense(1)
# with a sigmoid for binary classification). The combined model applies the
# general features from the original training and learns adjustments
# specific to your dataset, which may look very different from the data the
# original model saw.
#
# This saves time and computational resources you may not have, since the
# base model was already trained on better hardware for a much longer time.

# Print the model summary. See your dense network connected at the end.
model.summary()
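The comments above claim that, for two classes, `Dense(1, activation='sigmoid')` and `Dense(2, activation='softmax')` can represent the same probabilities. The quick numeric check below uses made-up logits. A 2-way softmax over logits `(a, b)` gives `P(class 1) = e^b / (e^a + e^b) = sigmoid(b - a)`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Arbitrary example logits (a, b) for the two softmax outputs.
pairs = []
for a, b in [(0.3, 1.7), (-2.0, 0.5), (4.0, -1.0)]:
    pairs.append((softmax([a, b])[1], sigmoid(b - a)))

for p_softmax, p_sigmoid in pairs:
    print(f"softmax: {p_softmax:.6f}  sigmoid: {p_sigmoid:.6f}")
```

If you switch to the 2-unit softmax version, you would also need to change the loss and the generators' `class_mode` to match the label format.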

In [None]:
# Set the training parameters
model.compile(optimizer = RMSprop(learning_rate=0.0001), 
              loss = 'binary_crossentropy', 
              metrics = ['accuracy'])

# RMSprop, like 'adam', adapts the effective step size of each weight using
# a running average of that weight's squared gradients. The base
# learning_rate set here stays fixed unless you attach a learning-rate
# schedule or a callback.
# A small learning rate (1e-4) keeps the updates gentle. A rate that is too
# high can make the loss jump around and miss the optimum, while one that is
# too low makes progress slow from one epoch to the next. Schedules, when
# used, typically lower the rate as training progresses rather than raise it.

# model.summary() prints the layer-by-layer output shapes and parameter
# counts itself and returns None, so call it directly, for example in
# another cell:
## model.summary()
# Wrapping it as print(model.summary()) prints the same table followed by
# "None". No arguments are needed to print the full table.
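The sketch below uses a textbook form of the RMSprop update for a single scalar weight, to show what "adaptive" means. The values `rho=0.9` and `eps=1e-7` are assumed to mirror the Keras defaults, and the exact placement of epsilon in Keras's own implementation may differ slightly. Two weights whose gradients differ by a factor of 1000 end up taking almost the same step, even though the base learning rate never changes.

```python
import math

def rmsprop_step(w, grad, v, lr=1e-4, rho=0.9, eps=1e-7):
    """One textbook RMSprop update for a scalar weight.

    v is a running average of squared gradients; dividing by sqrt(v) rescales
    each weight's step, while the base learning rate `lr` stays fixed.
    """
    v = rho * v + (1.0 - rho) * grad * grad
    w = w - lr * grad / (math.sqrt(v) + eps)
    return w, v

# First update for two weights starting at 0 with very different gradients.
w_small, v_small = rmsprop_step(0.0, 0.001, 0.0)
w_large, v_large = rmsprop_step(0.0, 1.0, 0.0)
print(f"step sizes: {abs(w_small):.3e} vs {abs(w_large):.3e}")
```

This per-weight rescaling is what RMSprop and Adam share. Adam additionally keeps a running average of the gradients themselves, which acts like momentum.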

## Prepare the dataset

Now you will prepare the dataset. This is basically the same code as the one you used in the data augmentation lab.

In [None]:
# Download the dataset
!wget https://storage.googleapis.com/tensorflow-1-public/course2/cats_and_dogs_filtered.zip

In [None]:
import os
import zipfile
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Extract the archive
zip_ref = zipfile.ZipFile("./cats_and_dogs_filtered.zip", 'r')
zip_ref.extractall("tmp/")
zip_ref.close()

# Define our example directories and files
base_dir = 'tmp/cats_and_dogs_filtered'

train_dir = os.path.join( base_dir, 'train')
validation_dir = os.path.join( base_dir, 'validation')

# Directory with training cat pictures
train_cats_dir = os.path.join(train_dir, 'cats') 

# Directory with training dog pictures
train_dogs_dir = os.path.join(train_dir, 'dogs') 

# Directory with validation cat pictures
validation_cats_dir = os.path.join(validation_dir, 'cats') 

# Directory with validation dog pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

# Add our data-augmentation parameters to ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255.,
                                   rotation_range = 40,
                                   width_shift_range = 0.2,
                                   height_shift_range = 0.2,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator( rescale = 1.0/255. )

# Flow training images in batches of 20 using train_datagen generator
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    batch_size = 20,
                                                    class_mode = 'binary', 
                                                    target_size = (150, 150))     

# Flow validation images in batches of 20 using test_datagen generator
validation_generator =  test_datagen.flow_from_directory( validation_dir,
                                                          batch_size  = 20,
                                                          class_mode  = 'binary', 
                                                          target_size = (150, 150))
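`flow_from_directory` treats each subdirectory as a class and, according to its documentation, assigns label indices in alphanumeric order of the folder names. The hypothetical helper `infer_class_indices` below mimics that rule on a throwaway directory tree so you can see where the 0/1 labels come from. On the real generators, check the mapping with `train_generator.class_indices`.

```python
import os
import tempfile

def infer_class_indices(directory):
    """Hypothetical helper: map each subdirectory name to an index, in sorted
    (alphanumeric) order, mirroring how flow_from_directory labels classes."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}

# Build a temporary tree shaped like cats_and_dogs_filtered/train.
with tempfile.TemporaryDirectory() as root:
    for name in ("dogs", "cats"):  # creation order does not matter
        os.makedirs(os.path.join(root, name))
    indices = infer_class_indices(root)

print(indices)
```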

## Train the model

With that, you can now train the model. You will do 20 epochs and plot the results afterwards.
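The `steps_per_epoch` and `validation_steps` values in the next cell are tied to the batch size of 20 used by the generators. The arithmetic below assumes the split sizes that `flow_from_directory` should report for this dataset: 2,000 training images (1,000 per class) and 1,000 validation images (500 per class). Compare these figures with the "Found ... images" lines printed above.

```python
import math

batch_size = 20        # batch_size passed to flow_from_directory above
train_images = 2000    # assumed size of train/ (1,000 cats + 1,000 dogs)
val_images = 1000      # assumed size of validation/ (500 cats + 500 dogs)

steps_per_epoch = 100
validation_steps = 50

train_seen = steps_per_epoch * batch_size    # training images drawn per epoch
val_seen = validation_steps * batch_size     # validation images scored per epoch

# Steps needed for exactly one full pass over each split.
train_full_pass = math.ceil(train_images / batch_size)
val_full_pass = math.ceil(val_images / batch_size)

print(train_seen, val_seen, train_full_pass, val_full_pass)
```

Under these assumptions, each epoch sees every training image once (with fresh random augmentation) and scores the whole validation set.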

In [None]:
# Train the model.
history = model.fit(
            train_generator,
            validation_data = validation_generator,
            steps_per_epoch = 100,
            epochs = 20,
            validation_steps = 50,
            verbose = 2)

## Evaluate the results

You will use the same code as before to plot the results. The validation accuracy should trend upwards along with the training accuracy, which is a good sign that your model is no longer overfitting.
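Besides eyeballing the plot, you can quantify the gap between training and validation accuracy for each epoch. The helper `accuracy_gaps` below is a hypothetical sketch that expects the dict stored in `history.history` by `model.fit()`. The example values are synthetic placeholders for illustration and are NOT results from this lab. In the notebook, you would call `accuracy_gaps(history.history)`.

```python
def accuracy_gaps(history_dict):
    """Per-epoch (training - validation) accuracy.

    Expects 'accuracy' and 'val_accuracy' lists of equal length, as stored in
    history.history. A large, growing positive gap suggests overfitting.
    """
    return [train - val for train, val in
            zip(history_dict['accuracy'], history_dict['val_accuracy'])]

# Synthetic placeholder values, NOT produced by this lab.
example_history = {
    'accuracy':     [0.50, 0.60, 0.70],
    'val_accuracy': [0.45, 0.60, 0.72],
}
gaps = accuracy_gaps(example_history)
print([round(g, 3) for g in gaps])
```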

In [None]:
import matplotlib.pyplot as plt
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend(loc=0)
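plt.figure()

# Plot training and validation loss on a second figure
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend(loc=0)

plt.show()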