##### Copyright 2019 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Notice
Remember to enable GPU to make everything run faster (Runtime -> Change runtime type -> Hardware accelerator -> GPU).
Also, if you run into trouble, simply reset the entire environment and start from the beginning:
*   Edit -> Clear all outputs
*   Runtime -> Reset all runtimes

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/rses-dl-course/rses-dl-course.github.io/blob/master/notebooks/python/L05_tensorflow_hub_and_transfer_learning.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/rses-dl-course/rses-dl-course.github.io/blob/master/notebooks/python/L05_tensorflow_hub_and_transfer_learning.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

# Lab 06: TensorFlow Hub and Transfer Learning

[TensorFlow Hub](http://tensorflow.org/hub) is an online repository of pre-trained TensorFlow models that you can use. Many of these are also available through the [Keras R package](https://keras.rstudio.com/index.html).
These models can either be used as-is, or they can be used for Transfer Learning.

Transfer learning is a process where you take an existing trained model and extend it to do additional work. This involves leaving the bulk of the model unchanged, while adding and retraining the final layers, in order to get a different set of possible outputs.

In this Colab we will do both.

You can browse all the models available on [TensorFlow Hub](https://tfhub.dev/). You can find a list of the models available through the [Keras R package](https://keras.rstudio.com/index.html) under [**Applications** in the reference manual](https://keras.rstudio.com/reference/index.html#section-applications).

## Concepts that will be covered in this Colab

1. Use a TensorFlow Hub model for prediction.
2. Use a TensorFlow Hub model on the Dogs vs. Cats dataset.
3. Do simple transfer learning with TensorFlow Hub.
4. Fine-tune your transferred model.

Before starting this Colab, you should reset the Colab environment by selecting `Runtime -> Reset all runtimes...` from the menu above.

# Install and load dependencies


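First, you'll need to install and load the R package keras, the R interface to Keras and TensorFlow. Installing the R package does not by itself install the Python TensorFlow backend; if it is not already available in your environment, `keras::install_keras()` can install it. We'll also install the fs package, which has useful functionality for working with our filesystem.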

In [None]:
install.packages(c("keras", "fs"))
library(keras)

# Part 1: Use a TensorFlow Hub MobileNet for prediction

In this part of the Colab, we'll take a trained model, load it and try it out.

The model that we'll use is MobileNet v2 (but any [TF2-compatible image classifier from tfhub.dev](https://tfhub.dev/s?q=tf2&module-type=image-classification) would work).

## Load the classifier

We can load the MobileNet model and create a Keras model from it using the `keras` function `application_mobilenet_v2()`.
MobileNet expects images of 224 $\times$ 224 pixels, in 3 color channels (RGB).

In [None]:
model <- application_mobilenet_v2()

In [None]:
model

## Run it on a single image

MobileNet has been trained on the ImageNet dataset. ImageNet has 1000 different output classes, and one of them is "military uniform".
Let's get an image containing a military uniform that is not part of ImageNet, and see if our model can predict that it is a military uniform.

First let's download and preprocess our image.

In [None]:
image_res = 224
grace_hopper <- get_file('image.jpg',
                         'https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg') %>%
                  image_load() %>% # Load downloaded image
                  image_array_resize(image_res, image_res)/255  # resize to fit model input_shape and rescale

Let's inspect some of the processed image properties.

In [None]:
range(grace_hopper)
dim(grace_hopper)

Let's create a function to plot the image and have a look at it.

In [None]:
plot_rgb_image <- function(image_array){
  image_array %>%
  array_reshape(dim = c(dim(.)[1:3])) %>%
  as.raster(max = 1) %>%
  plot()
}

plot_rgb_image(grace_hopper)

Now, let's feed the image into the MobileNet model we loaded to get a prediction. Remember, models always expect a batch of images to process. So here, we add a batch dimension using `array_reshape()` and pass the image to the model for prediction.
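You can check what adding a batch dimension does on a stand-in array. The base-R sketch below uses random values in place of the real image so it runs without any download. `array_reshape()` fills in row-major (C) order, while `dim<-` below works in R's column-major order. When the only change is prepending a dimension of length 1, neither approach changes the element order, so the original image comes back from the first position of the batch.

```r
# Stand-in for a 224 x 224 RGB image with values in [0, 1] (random, for illustration)
toy_res <- 224
toy_img <- array(runif(toy_res * toy_res * 3), dim = c(toy_res, toy_res, 3))

# Prepend a batch dimension of length 1: (224, 224, 3) -> (1, 224, 224, 3)
toy_batch <- toy_img
dim(toy_batch) <- c(1, dim(toy_img))

dim(toy_batch)
# The first (and only) image in the batch is the original array
identical(toy_batch[1, , , ], toy_img)
```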

In [None]:
result <- model %>%
  predict(array_reshape(grace_hopper, c(1, dim(grace_hopper))))

str(result)

The result is a 1 $\times$ 1000 matrix of class probabilities (MobileNet's final layer applies a softmax). Each column gives the predicted probability of one ImageNet class for the image.

Because there is only one row, we can find the top class ID directly with `which.max()` (rather than applying `which.max` to each row). But how can we know what class this actually is and in particular if that class ID in the ImageNet dataset denotes a military uniform or something else?

In [None]:
predicted_class <- which.max(result)
predicted_class
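Because the classifier's last layer applies softmax, each row of `result` is a probability distribution over the classes. The sketch below uses a made-up 2 × 3 score matrix (hypothetical values, not MobileNet output). It shows what softmax does and how to pick the top class for each row when a batch holds more than one image. Note that `which.max()` returns a 1-based position.

```r
# Row-wise softmax: exponentiate, then normalise so each row sums to 1.
# Subtracting each row's maximum first avoids overflow in exp().
softmax_rows <- function(scores) {
  shifted <- scores - apply(scores, 1, max)  # recycled down columns, so row i loses its own max
  e <- exp(shifted)
  e / rowSums(e)
}

# Hypothetical raw scores for a batch of 2 images and 3 classes
toy_scores <- matrix(c(2.0, 1.0, 0.1,
                       0.5, 2.5, 0.3), nrow = 2, byrow = TRUE)
toy_probs <- softmax_rows(toy_scores)

rowSums(toy_probs)              # each row sums to 1
apply(toy_probs, 1, which.max)  # top class per image (1-based positions): 1 2
```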

## Decode the predictions

To see which ImageNet class our `predicted_class` corresponds to, we can use the `keras` function `imagenet_decode_predictions()`. By default, it returns information on the top 5 predicted classes.

In [None]:
imagenet_decode_predictions(result)

Bingo. Our model correctly predicted **military uniform** as the most probable class with a much higher probability than the next class (_suit_)!
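`imagenet_decode_predictions()` looks up the class names for us. The ranking step behind a "top 5" list can be sketched in base R with `order()`. The probability vector and the class names below are hypothetical stand-ins for one row of `result` and the ImageNet class list.

```r
# Hypothetical probabilities for one image over 6 classes
toy_row <- c(0.02, 0.70, 0.06, 0.15, 0.03, 0.04)
toy_classes <- paste0("class_", 1:6)

# Rank classes by probability and keep the k best
top_k <- function(probs, class_names, k = 5) {
  idx <- order(probs, decreasing = TRUE)[seq_len(k)]  # positions of the k largest values
  data.frame(rank = seq_len(k), class = class_names[idx], score = probs[idx])
}

toy_top <- top_k(toy_row, toy_classes, k = 3)
toy_top
```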

# Part 2: Use a TensorFlow Hub model for the Cats vs. Dogs dataset

Now we'll use the full MobileNet model and see how it can perform on the Dogs vs. Cats dataset.

## Dataset

We download the dataset again. The dataset we are using is a filtered version of <a href="https://www.kaggle.com/c/dogs-vs-cats/data" target="_blank">Dogs vs. Cats</a> dataset from Kaggle (ultimately, this dataset is provided by Microsoft Research).

In [None]:
URL <- "https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip"
zip_dir <- get_file('cats_and_dogs_filtered.zip', origin = URL, extract = TRUE)

The dataset we have downloaded has the following directory structure.

<pre style="font-size: 10.0pt; font-family: Arial; line-height: 2; letter-spacing: 1.0pt;" >
<b>cats_and_dogs_filtered</b>
|__ <b>train</b>
    |______ <b>cats</b>: [cat.0.jpg, cat.1.jpg, cat.2.jpg ...]
    |______ <b>dogs</b>: [dog.0.jpg, dog.1.jpg, dog.2.jpg ...]
|__ <b>validation</b>
    |______ <b>cats</b>: [cat.2000.jpg, cat.2001.jpg, cat.2002.jpg ...]
    |______ <b>dogs</b>: [dog.2000.jpg, dog.2001.jpg, dog.2002.jpg ...]
</pre>

We can list the directories with the following command:

In [None]:
zip_dir_base <- dirname(zip_dir)
fs::dir_tree(zip_dir_base, recurse = 2)

We'll now assign variables with the proper file paths for the training and validation sets. We'll also create some variables that hold the sizes of our datasets.

In [None]:
base_dir <- fs::path(zip_dir_base, "cats_and_dogs_filtered")
train_dir <- fs::path(base_dir, "train")
validation_dir <- fs::path(base_dir, "validation")

train_cats_dir <- fs::path(train_dir, "cats")
train_dogs_dir <- fs::path(train_dir, "dogs")
validation_cats_dir <- fs::path(validation_dir, "cats")
validation_dogs_dir <- fs::path(validation_dir, "dogs")


num_cats_tr <- length(fs::dir_ls(train_cats_dir))
num_dogs_tr <- length(fs::dir_ls(train_dogs_dir))

num_cats_val <- length(fs::dir_ls(validation_cats_dir))
num_dogs_val <- length(fs::dir_ls(validation_dogs_dir))

total_train <- num_cats_tr + num_dogs_tr
total_val <- num_cats_val + num_dogs_val

Let's create an image generator that reads our images from their directories and rescales pixel values to the range 0 to 1. We'll then create a flow from our training directory that also resizes the images to the resolution expected by our MobileNet model, with a batch size of 10.

In [None]:
pred_image_generator <- image_data_generator(rescale = 1/255)

batch_size <- 10
pred_data_gen <- flow_images_from_directory(directory = train_dir,
                                             generator = pred_image_generator,
                                             target_size = c(image_res, image_res),
                                             class_mode = "binary",
                                             batch_size = batch_size)

Now that we've created an image generator to read our images from their directories, let's get the first batch (index `0`, because the generator is a Python object and uses zero-based indexing).

In [None]:
pred_batch <- pred_data_gen[0]
str(pred_batch)

Now we can use the model to predict a class for the images contained in the first element of the generator output (the image array). Let's also decode the predictions, returning only the top predicted class for each image, and bind all rows together into a single data.frame.

In [None]:
predictions <- model %>%
  predict(pred_batch[[1]]) %>%
  imagenet_decode_predictions(top = 1) %>%
  do.call(rbind, .)
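`imagenet_decode_predictions()` returns a list with one data.frame per image, so the pipeline ends with `do.call(rbind, .)`. Here is a minimal base-R illustration using two hand-made data frames. The values are hypothetical, and the column names match the ones this notebook uses later (`class_description`, `score`).

```r
# One single-row data.frame per image (hypothetical values)
toy_decoded <- list(
  data.frame(class_description = "tabby", score = 0.81),
  data.frame(class_description = "beagle", score = 0.64)
)

# do.call() passes the list elements as separate arguments,
# i.e. rbind(toy_decoded[[1]], toy_decoded[[2]])
toy_predictions <- do.call(rbind, toy_decoded)
toy_predictions
```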


Let's add the actual labels for each row to the data.frame and have a look at our predictions. We can see that the MobileNet predictions are a lot more fine-grained, but overall it seems to be doing a pretty good job at discriminating between cats and dogs!

In [None]:
predictions$label <- names(pred_data_gen$class_indices)[pred_batch[[2]] + 1]
predictions
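The `+ 1` above converts the generator's 0-based labels into 1-based R indices. `flow_images_from_directory()` assigns class indices in alphanumeric order of the sub-directory names, so `cats` is 0 and `dogs` is 1. With a hypothetical batch of labels:

```r
toy_class_names <- c("cats", "dogs")  # alphanumeric order, as in names(pred_data_gen$class_indices)
toy_labels <- c(0, 1, 1, 0)           # hypothetical 0-based labels, like pred_batch[[2]]
toy_class_names[toy_labels + 1]       # "cats" "dogs" "dogs" "cats"
```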

Let's now plot the images from our Dogs vs. Cats dataset, with the ImageNet predicted labels above them and their actual labels below, by writing a modified version of our RGB plotting function.

In [None]:
plot_rgb_image_ttl <- function(image_array, prediction){
  image_array %>%
  array_reshape(dim = c(dim(.)[1:3])) %>%
  as.raster(max = 1) %>%
  plot()
  title(main = paste0(prediction$class_description, 
                      " (", format(prediction$score * 100, digits = 2), "%)"),
        sub = prediction$label)
}

In [None]:
options(repr.plot.width = 16, repr.plot.height = 8)

# Set the layout
layout(matrix(1:batch_size, ncol = 5))

# Loop plotting over the batch of images
for(i in 1:batch_size){
  plot_rgb_image_ttl(pred_batch[[1]][i,,,], predictions[i,])
  }

# Part 3: Do simple transfer learning with pretrained MobileNet V2 model architecture

Let's now use the [MobileNet V2 model architecture](https://keras.rstudio.com/reference/application_mobilenet_v2.html) to do Transfer Learning.

With transfer learning we reuse parts of an already trained model and change the final layer, or several layers, of the model, and then retrain those layers on our own dataset.

In addition to the complete models, the models available through `keras` (listed under [applications](https://keras.rstudio.com/reference/index.html#section-applications) in the documentation) can be loaded without the final classification layer, which makes transfer learning easy. To do so, we set `include_top = FALSE` when loading; this controls whether the fully-connected layer at the top of the network is included. We also set `pooling = "avg"`, an optional pooling mode for feature extraction when `include_top` is `FALSE`. `"avg"` means that global average pooling is applied to the output of the last convolutional layer, so the output of the model is a 2D tensor (images × features).
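Global average pooling averages each feature map over its spatial positions. For MobileNet V2 with a 224 × 224 input, the last convolutional output is 7 × 7 × 1280 per image, and pooling turns that into 1280 numbers. The base-R sketch below does the same on a small random array. The small sizes are only there to keep it quick, and the array uses the channels-last layout (images, height, width, channels).

```r
# Toy batch: 2 images, 3 x 3 spatial grid, 4 channels (channels-last)
set.seed(1)
toy_maps <- array(rnorm(2 * 3 * 3 * 4), dim = c(2, 3, 3, 4))

# Average over height and width (dims 2 and 3), keeping images (1) and channels (4)
toy_pooled <- apply(toy_maps, c(1, 4), mean)

dim(toy_pooled)   # 2 4: a 2D (images x features) matrix
```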

We'll also continue to use the Dogs vs Cats dataset, so we will be able to compare the performance of this model against the ones we created from scratch earlier.

Note that we're calling the partial model (without the final classification layer) a `feature_extractor`. The reasoning for this term is that it will take the input all the way to a layer containing a number of features. So it has done the bulk of the work in identifying the content of an image, except for creating the final probability distribution. That is, it has extracted the features of the image.

In [None]:
feature_extractor <- application_mobilenet_v2(input_shape = c(image_res, image_res, 3),
                                      include_top = FALSE, pooling = "avg")

Let's run the batch of cat & dog images we've already loaded through this, and see the final shape. 10 is the number of images, and 1280 is the number of neurons in the last layer of the partial model.

In [None]:
feature_batch <- feature_extractor(pred_batch[[1]])
dim(feature_batch)

### Prepare image data generators

Before we proceed let's set up our image generators. We'll use the data augmentation generators we developed in the previous lab. We'll also set our batch size to 32.

In [None]:
batch_size <- 32
# training generators
train_image_generator <- image_data_generator(rescale = 1/255,
                                              rotation_range = 45,
                                              width_shift_range = 0.2,
                                              height_shift_range = 0.2,
                                              shear_range = 0.2,
                                              zoom_range = 0.2,
                                              horizontal_flip = TRUE,
                                              fill_mode = 'nearest')

train_data_gen <- flow_images_from_directory(directory = train_dir,
                                             generator = train_image_generator,
                                             target_size = c(image_res, image_res),
                                             class_mode = "binary",
                                             batch_size = batch_size)

# validation generators
val_image_generator <- image_data_generator(rescale = 1/255)

val_data_gen <- flow_images_from_directory(directory = validation_dir,
                                             generator = val_image_generator,
                                             target_size = c(image_res, image_res),
                                             class_mode = "binary",
                                             batch_size = batch_size)

Now, let's go back to our model and freeze the variables in the feature extractor, so that training only modifies the final classifier layer.

In [None]:
freeze_weights(feature_extractor)

Now let's use our feature extractor as part of a Keras sequential model and add a new classification layer of 2 units with softmax activation. Note that when we print the model, it reports 2562 trainable parameters, all belonging to the dense layer we have added.

In [None]:
model <- keras_model_sequential() %>%
          feature_extractor() %>%
          layer_dense(units = 2, activation = "softmax")
model
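The 2562 trainable parameters all come from the new dense layer. Each of the 2 output units has one weight per input feature (1280) plus a bias. You can check this in R (the helper `dense_params` is our own, for illustration only):

```r
# Parameters in a fully connected layer: weights (inputs x units) plus one bias per unit
dense_params <- function(n_inputs, n_units) n_inputs * n_units + n_units

dense_params(1280, 2)      # 2562: our new cats-vs-dogs classifier layer
dense_params(1280, 1000)   # 1281000: a 1000-class ImageNet classifier on the same features
```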

## Train the model

We now train this model like any other using an image generator, by first calling `compile` followed by `fit`. 
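Why `sparse_categorical_crossentropy`? The generators use `class_mode = "binary"`, so each label is a single integer: 0 (cats) or 1 (dogs). The model, however, outputs two softmax probabilities per image. The sparse loss takes the integer label directly. For each image it is the negative log of the probability assigned to the true class, and these values are averaged over the batch. A base-R sketch with hypothetical predictions:

```r
# Hypothetical softmax outputs for 3 images (columns: cats, dogs)
toy_pred <- matrix(c(0.9, 0.1,
                     0.2, 0.8,
                     0.6, 0.4), ncol = 2, byrow = TRUE)
toy_y <- c(0, 1, 1)   # integer labels, as produced with class_mode = "binary"

# Pick each row's probability for its true class (+ 1 for R's 1-based columns)
p_true <- toy_pred[cbind(seq_len(nrow(toy_pred)), toy_y + 1)]
toy_loss <- mean(-log(p_true))
toy_loss
```

The third image gets only 0.4 for its true class, so it contributes most of the loss.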

In [None]:
model %>% compile(optimizer="adam",
                  loss = 'sparse_categorical_crossentropy',
                  metrics = "accuracy")

epochs = 6
history <- model %>%
              fit(x = train_data_gen,
              epochs = epochs,
              validation_data = val_data_gen)

cat('Validation loss:', format(tail(history$metrics$val_loss, 1), digits = 2), "\n")
cat('Validation accuracy:', format(tail(history$metrics$val_accuracy, 1), digits = 2), "\n")

You can see we get ~98% validation accuracy, which is absolutely awesome. This is a huge improvement over the model we created in the previous lesson, where we were able to get ~77% accuracy after 40 epochs of training. The reason for this difference is that MobileNet was carefully designed over a long time by experts, then trained on a massive dataset (ImageNet).

Let's plot the training and validation accuracy/loss graphs.

In [None]:
# Plot history
options(repr.plot.width = 12, repr.plot.height = 6)
plot(history)

What is a bit curious here is that validation performance is better than training performance, right from the start to the end of execution.

One reason for this is that validation performance is measured at the end of the epoch, whereas training performance is averaged over all batches across the epoch.
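A toy calculation shows this first effect. Suppose (hypothetically) that per-batch training accuracy climbs steadily during one epoch. The reported training accuracy is the mean over those batches, while validation uses the weights as they stand at the end of the epoch:

```r
# Hypothetical per-batch training accuracy rising over one epoch
toy_batch_acc <- seq(0.90, 0.98, length.out = 5)   # 0.90 0.92 0.94 0.96 0.98

mean(toy_batch_acc)      # 0.94: what the training curve reports for this epoch
tail(toy_batch_acc, 1)   # 0.98: closer to what the end-of-epoch weights achieve
```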

The bigger reason, though, is that we're reusing a large part of MobileNet, which has already been trained on ImageNet, a dataset that includes many dog and cat images. During training, the image generator still applies data augmentation to the training images, but not to the validation images. This means the training images may be harder to classify than the unaltered images in the validation dataset.

## Check the predictions

To redo the plot from before, let's use our new model to make some predictions on the images from our `pred_batch`. 

Let's compile our predictions into a data.frame, add the actual label for each row, and have a look. The predictions from our transfer learning model are very accurate, with very high confidence in each prediction!

In [None]:
pred_mat <- model %>%
  predict(pred_batch[[1]]) 

predictions <- data.frame(class_description = names(pred_data_gen$class_indices)[apply(pred_mat, 1, which.max)],
                          score = apply(pred_mat, 1, max),
                          label = names(pred_data_gen$class_indices)[pred_batch[[2]] + 1])
predictions

Let's now plot the images again, this time with our new model's predicted labels above them and their actual labels below, reusing our `plot_rgb_image_ttl()` plotting function.

In [None]:
options(repr.plot.width = 16, repr.plot.height = 8)

# Set the layout
layout(matrix(1:10, ncol = 5))

# Loop plotting over the batch of images
for(i in 1:10){
  plot_rgb_image_ttl(pred_batch[[1]][i,,,], predictions[i,])
  }

## Fine-tuning the transferred model

In our previous example, we froze the `feature_extractor` so that its weights do not update during training. In some cases, performance can be improved further by also training those layers; this is called fine-tuning.

This fine-tuning process is done after we've trained the model with our custom classifier. First, we unfreeze the `feature_extractor`. Printing the model shows that the parameters associated with the `feature_extractor` are now trainable as well.

In [None]:
unfreeze_weights(feature_extractor)
model

Then re-compile the model. As you are training a much larger model and want to readapt the pretrained weights, it is important to use a lower learning rate at this stage. Otherwise, your model could overfit very quickly.

In [None]:
model %>% compile(optimizer = optimizer_adam(learning_rate = 0.0001),
                  loss = 'sparse_categorical_crossentropy',
                  metrics = "accuracy")

Finally, re-fit the model, print the validation metrics, and plot the model fit history.

In [None]:
epochs = 6
history <- model %>%
              fit(x = train_data_gen,
              epochs = epochs,
              validation_data = val_data_gen)

cat('Validation loss:', format(tail(history$metrics$val_loss, 1), digits = 2), "\n")
cat('Validation accuracy:', format(tail(history$metrics$val_accuracy, 1), digits = 2), "\n")

# Plot history
options(repr.plot.width = 12, repr.plot.height = 6)
plot(history)