#### Copyright 2019 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Transfer Learning

In deep learning, "transfer learning" is the reuse of knowledge captured by a pretrained model to train a new model. Instead of training from scratch, we start from the pretrained model's weights and adapt them to our own data. This approach is especially useful in few-shot learning, where there are many class labels but few samples per class. Using transfer learning, we can often train accurate models in a fraction of the time and with far less training data.

Transfer learning has been touted as the future of Artificial General Intelligence (AGI) because it allows knowledge learned on one task to be consolidated and reused on another. Models pretrained on [ImageNet](http://www.image-net.org) are generally good at classifying images, and we can extend such a model by training it on new images. [ModelZoo](https://modelzoo.co/) is a good resource for browsing a wide variety of pretrained models for deep learning.

### Train a convolutional neural network using fastai

The first thing to do when training a CNN model is to change the runtime type of your Colab to GPU. It should already be activated, but be aware of this requirement, because training on a CPU takes much, much longer. This is generally the case for deep learning, and Google Cloud offers various GPU options. There are also tutorials on setting up fastai with different GPU cloud service providers [here](https://course.fast.ai/start_gcp.html).

Use these magic commands to render plots inline in Colab and to automatically reload any library code that you change.

In [0]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

Now you can import [fastai](https://course.fast.ai/), which is built on top of PyTorch. [PyTorch](https://pytorch.org) is a deep learning framework like TensorFlow, and fastai provides a high-level API to build, train, and deploy models. Check out this [PyTorch tutorial](https://pytorch.org/tutorials/beginner/nn_tutorial.html) from Jeremy Howard, former Kaggle President and creator of fastai.

In [0]:
from fastai.vision import *
from fastai.metrics import error_rate

If you get a memory error, it is because the GPU that Colab provisioned does not have enough memory to process batches of 64 images at a time. You might need to reduce the batch size to 16.

In [0]:
bs = 64
# Uncomment the following line if you run out of memory even after restarting
# the runtime (Runtime -> Restart runtime in Colab).
# bs = 16

### Getting the Data

We will be working with the [Oxford-IIIT Pet Dataset](http://www.robots.ox.ac.uk/~vgg/data/pets/) by [O. M. Parkhi et al., 2012](http://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf). The dataset contains 37 classes: 12 cat breeds and 25 dog breeds. There are roughly 200 example images of each class, for a total of 7349 images.

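First, untar the data. A `.tar` file is an archive that bundles many files into one, similar to a zip file. It is usually compressed as well (for example, with gzip) so that it is smaller to download. `untar_data` downloads the archive and extracts it.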

In [0]:
path = untar_data(URLs.PETS); path

In [0]:
path.ls()

We can then set the paths for both images and annotations, or labels.

In [0]:
path_anno = path/'annotations'
path_img = path/'images'

We need to extract each image's label from its filename, so we use a regular expression (regex). fastai has a method called `ImageDataBunch.from_name_re`, which applies the regex to each filename and uses the captured group as that image's label.

In [0]:
fnames = get_image_files(path_img)
fnames[:5]

We create the regex pattern that the data bunch will use to extract the labels, and set a random seed so that the results are reproducible from run to run.

In [0]:
np.random.seed(2)
pat = r'/([^/]+)_\d+\.jpg$'
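
To see what this pattern captures, here is a minimal sketch using Python's standard `re` module. The file paths are hypothetical examples written in the dataset's naming style, and the dot before `jpg` is escaped here so that it matches only a literal period.

```python
import re

# Same idea as `pat`: capture everything between the last '/' and the
# trailing '_<digits>.jpg'. The dot is escaped to match a literal '.'.
label_pattern = re.compile(r'/([^/]+)_\d+\.jpg$')


def label_from_path(file_path):
    """Return the breed label captured from a path, or None if no match."""
    match = label_pattern.search(file_path)
    return match.group(1) if match else None


# Hypothetical paths, written in the dataset's '<breed>_<number>.jpg' style.
example_paths = [
    '/data/images/great_pyrenees_173.jpg',
    '/data/images/Abyssinian_1.jpg',
    '/data/images/notes.txt',
]
labels = [label_from_path(p) for p in example_paths]
print(labels)
```

Breed names that contain underscores still work, because `[^/]+` is greedy and only the final `_<digits>` is treated as the image number.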

Now, let's create the data bunch from the image path, the filenames, and the regex that extracts labels from the filenames. `get_transforms()` provides default transformations for the images (cropping, resizing, padding, and other augmentations). Every image is resized to 224 x 224 pixels, so that all images in a batch have the same shape. The batch size was set above, and `normalize(imagenet_stats)` standardizes the pixel values of each color channel using the mean and standard deviation of ImageNet, the dataset the pretrained model was trained on.

In [0]:
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=224, bs=bs
                                  ).normalize(imagenet_stats)
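
As a rough sketch of what normalization does to a single pixel, the snippet below subtracts a per-channel mean and divides by a per-channel standard deviation. The numbers are the commonly published ImageNet channel statistics, which `imagenet_stats` is assumed to contain; `normalize_pixel` is a hypothetical helper, not a fastai function.

```python
# Commonly published ImageNet channel statistics (RGB, pixel values in [0, 1]).
# Assumed to match fastai's `imagenet_stats`.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Standardize one RGB pixel channel by channel: (value - mean) / std."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


# A pixel equal to the channel means maps to zero in every channel.
mean_pixel = normalize_pixel(IMAGENET_MEAN)
# A pure white pixel ends up a couple of standard deviations above zero.
white_pixel = normalize_pixel((1.0, 1.0, 1.0))
print(mean_pixel, white_pixel)
```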

We show a random selection of images after they have been processed. This will often help as a sanity check, and also to find any images that do not belong, or are otherwise unusable for the task.

In [0]:
data.show_batch(rows=3, figsize=(7, 6))

In [0]:
display(sorted(data.classes, key=lambda s: s.lower()))
len(data.classes),data.c

### Train CNN using ResNet-34

We call `cnn_learner` to create a CNN model from our data bunch `data` and the residual neural network (ResNet) architecture. In this case, ResNet-34 is a 34-layer convolutional architecture. You can read more about residual neural networks and ResNet [here](https://arxiv.org/abs/1512.03385). ResNet works well for many image classification tasks. If ResNet-34 doesn't work well enough, there is also ResNet-50, but it takes longer to train.

By default, the data bunch holds out a random 20% of the images as a validation set. The model is not trained on these images, so the error rate measured on them tells us whether the model is overfitting. You can tune these parameters if you need to, but the defaults should provide a good benchmark.
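
A random holdout split can be sketched with the standard library. This is an illustration only: `split_train_valid` is a hypothetical helper, the 20% fraction is assumed to mirror the fastai v1 default (`valid_pct=0.2`), and fastai's own implementation differs in detail.

```python
import random


def split_train_valid(items, valid_pct=0.2, seed=2):
    """Shuffle a copy of `items` and hold out `valid_pct` of them."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    # The first n_valid shuffled items are held out; the rest are for training.
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical filenames standing in for the real image paths.
filenames = [f'breed_{i}.jpg' for i in range(100)]
train_files, valid_files = split_train_valid(filenames)
print(len(train_files), len(valid_files))
```

Fixing the seed makes the split repeatable, so error rates from different runs are measured on the same held-out images.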

We will download the pretrained ResNet-34 model. This model was trained on the ImageNet dataset of roughly 1.3 million training images across 1,000 classes. The model is essentially a set of weights, stored in matrices, that we will use as the starting point for transfer learning.

In [0]:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

View the model architecture of the CNN using `.model`.

In [0]:
learn.model

Now that we have a `learn` object we can train it in one cycle, over 4 epochs.

*Note: This may take a few minutes to run.*

In [0]:
learn.fit_one_cycle(4)

Our model beat the model from the research paper associated with this dataset. Our error rate of roughly 7%, or overall accuracy of roughly 93% across all 37 breeds of dogs and cats, beat the researchers' reported accuracy of 59%. Your exact numbers may differ slightly from run to run.
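
For reference, the error rate is the fraction of validation images that were classified incorrectly, so accuracy is one minus the error rate. A minimal sketch with made-up labels (these are illustrative, not model output, and `error_rate_from_labels` is a hypothetical helper):

```python
def error_rate_from_labels(predicted, actual):
    """Fraction of predictions that differ from the true labels."""
    if len(predicted) != len(actual):
        raise ValueError('predicted and actual must have the same length')
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Illustrative labels only: 1 mistake out of 4 predictions.
actual = ['beagle', 'pug', 'Bengal', 'Persian']
predicted = ['beagle', 'pug', 'Bengal', 'Ragdoll']
err = error_rate_from_labels(predicted, actual)
accuracy = 1 - err
print(err, accuracy)
```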

In [0]:
# Save the model weights to disk (a .pth file in the learner's models directory).
learn.save('stage-1')

### Interpreting the Results

We can use fastai to interpret where the model that we trained is failing.

In [0]:
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()
len(data.valid_ds)==len(losses)==len(idxs)

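The plot below shows the validation images with the highest losses, each titled with the predicted class, the actual class, the loss, and the probability the model assigned to the actual class. Depending on your fastai version, a heatmap may be overlaid to highlight the image regions that most influenced the prediction. The loss is highest when the model was confident in a wrong prediction.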

In [0]:
interp.plot_top_losses(9, figsize=(15, 11))
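
To make the link between loss and confidence concrete, here is a small sketch. It assumes the loss is cross-entropy, which is the usual choice for single-label classification; that is an assumption here, not something the notebook states. For one image, cross-entropy is the negative log of the probability the model assigned to the correct class.

```python
import math


def cross_entropy(prob_of_true_class):
    """Cross-entropy loss for one sample: -log(p) of the correct class."""
    return -math.log(prob_of_true_class)


# Illustrative probabilities assigned to the *correct* class.
unsure_loss = cross_entropy(0.40)             # model hedged between classes
confidently_wrong_loss = cross_entropy(0.01)  # model was sure of a wrong class
print(round(unsure_loss, 2), round(confidently_wrong_loss, 2))
```

The less probability the model leaves for the correct class, the larger the loss, which is why confidently wrong predictions appear at the top of the list.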

We can also plot a confusion matrix.

In [0]:
interp.plot_confusion_matrix(figsize=(12, 12), dpi=60)

Below, we use the `.most_confused` method to inspect the pairs of classes that the model confuses most often, along with how often each confusion occurs.

In [0]:
interp.most_confused(min_val=2)
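
Conceptually, this amounts to counting the off-diagonal entries of the confusion matrix. The sketch below reproduces the idea with the standard library; `most_confused_pairs` is a hypothetical helper and the labels are made up.

```python
from collections import Counter


def most_confused_pairs(actual, predicted, min_val=1):
    """Return (actual, predicted, count) for misclassified pairs,
    most frequent first, keeping pairs seen at least `min_val` times."""
    counts = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return [(a, p, n) for (a, p), n in counts.most_common() if n >= min_val]


# Made-up labels for illustration.
actual = ['Ragdoll', 'Ragdoll', 'Birman', 'beagle', 'Ragdoll', 'beagle']
predicted = ['Birman', 'Birman', 'Ragdoll', 'beagle', 'Ragdoll', 'basset_hound']
pairs = most_confused_pairs(actual, predicted, min_val=2)
print(pairs)
```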

### Iterating on the saved model

- Unfreezing
- Fine-tuning
- Learning rates

Training for only 4 epochs kept our training time short. fastai applied transfer learning for us: it added some extra layers to the end of the pretrained ResNet-34 model and trained only those extra layers, leaving the pretrained layers frozen.

By unfreezing the `learn` object, we can go back and retrain the whole model.

In [0]:
learn.unfreeze()
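
The effect of freezing can be illustrated with a toy model in plain Python. This is a conceptual sketch only: the `LayerGroup` class and `sgd_step` function are hypothetical, the gradients are made up, and fastai works with PyTorch parameters rather than Python lists.

```python
from dataclasses import dataclass


@dataclass
class LayerGroup:
    """Toy stand-in for a group of layers with a frozen/trainable flag."""
    name: str
    weights: list
    trainable: bool = True


def sgd_step(groups, grads, lr=0.1):
    """Apply one gradient-descent update, skipping frozen groups."""
    for group in groups:
        if not group.trainable:
            continue
        g = grads[group.name]
        group.weights = [w - lr * gi for w, gi in zip(group.weights, g)]


body = LayerGroup('body', [1.0, 2.0], trainable=False)  # pretrained, frozen
head = LayerGroup('head', [0.5, 0.5])                   # new layers, trainable
grads = {'body': [1.0, 1.0], 'head': [1.0, 1.0]}        # made-up gradients

sgd_step([body, head], grads)   # only the head changes
frozen_body = list(body.weights)

body.trainable = True           # the analogue of unfreezing
sgd_step([body, head], grads)   # now both groups change
print(frozen_body, body.weights, head.weights)
```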

After the model is unfrozen, we can train it again using `fit_one_cycle` for 1 epoch.

In [0]:
learn.fit_one_cycle(1)

Wait, so our loss is higher now! What happened?

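Our attempt at fine-tuning the model didn't work because it trained all the layers at the same speed, and that speed was too high for the pretrained layers. This speed is called the *learning rate*, and we can use fastai to tune this parameter as well. First, we reload the weights that we saved earlier.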

In [0]:
learn.load('stage-1')

Now that the model is loaded, we can call `lr_find()` to plot the loss against the learning rate and search for a good learning rate: the fastest we can train the model before the loss starts to diverge. Vanishing gradients are a different difficulty that arises when training deep networks; if you are interested, you can read more about the Vanishing Gradient Problem [here](https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484).

In [0]:
learn.lr_find()

In [0]:
learn.recorder.plot()
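
The idea behind a learning rate finder can be sketched on a toy problem. The code below is not fastai's implementation: it minimizes the one-parameter loss `w**2` with plain gradient descent, raises the learning rate exponentially after every step, and stops once the loss exceeds four times the best loss seen so far. The function name, the range of rates, and the stopping threshold are all assumptions chosen for illustration.

```python
def lr_range_test(start_lr=1e-4, end_lr=10.0, num_it=100, w=5.0):
    """Take one gradient step per learning rate on loss(w) = w**2,
    recording (lr, loss) until the loss clearly diverges."""
    lrs, losses = [], []
    best_loss = float('inf')
    for i in range(num_it):
        # Exponentially spaced learning rates from start_lr to end_lr.
        lr = start_lr * (end_lr / start_lr) ** (i / (num_it - 1))
        w -= lr * 2 * w            # gradient of w**2 is 2*w
        loss = w ** 2
        lrs.append(lr)
        losses.append(loss)
        best_loss = min(best_loss, loss)
        if loss > 4 * best_loss:   # stop once the loss has blown up
            break
    return lrs, losses


lrs, losses = lr_range_test()
lr_at_min_loss = lrs[losses.index(min(losses))]
print(f'stopped after {len(lrs)} steps; loss was lowest near lr={lr_at_min_loss:.2f}')
```

In a plot of loss against learning rate, the useful region is where the loss is falling steeply, well before the point where it starts to rise.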

Looking at this plot, we can see that the loss explodes at a learning rate of around `1e-02`. We want the learning rate where the loss is decreasing the fastest, at a point before it starts getting worse. A learning rate of `1e-04` meets both these criteria.

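Note that we don't need to train all the layers at the same rate. The early layers of a pretrained model already detect general features and need only small adjustments, while the later layers are more specific to our task and can be trained faster. We can therefore apply a range of learning rates to our `learn` object.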

In [0]:
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-4, 1e-3))
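
To illustrate how a range of learning rates might be distributed, the sketch below spaces the rates geometrically across layer groups, from the lower bound for the earliest group to the upper bound for the last. Geometric spacing and the choice of three layer groups are assumptions about what fastai does internally with a `slice`; `spread_learning_rates` is a hypothetical helper.

```python
def spread_learning_rates(lower, upper, n_groups):
    """Geometrically space `n_groups` learning rates from lower to upper."""
    if n_groups == 1:
        return [upper]
    step = (upper / lower) ** (1 / (n_groups - 1))
    return [lower * step ** i for i in range(n_groups)]


# Earliest layers get the smallest rate, the final layers the largest.
group_lrs = spread_learning_rates(1e-4, 1e-3, n_groups=3)
print([f'{lr:.2e}' for lr in group_lrs])
```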

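As a rule of thumb, the upper end of the range should be at least 10 times smaller than the learning rate at which the loss starts to get worse in the plot. We are using a Python `slice` to define the range from `1e-4` to `1e-3`: the earliest layers train at `1e-4`, the last layers at `1e-3`, and the layers in between at rates between the two.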

### Implement ResNet-50

ResNet-50 has 50 layers. By starting from a pretrained ResNet-50, we train a deeper network, and we might be able to obtain better results than with ResNet-34.

In [0]:
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=299, bs=bs//2
                                   ).normalize(imagenet_stats)
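
A likely reason for halving the batch size here (an inference, since the notebook does not state it) is memory: larger images and a deeper network both increase the memory needed per image. The arithmetic for the image size alone:

```python
old_size, new_size = 224, 299

# Each image has size * size pixels per channel, so activations in the
# convolutional layers grow roughly in proportion to the pixel count.
pixel_ratio = (new_size ** 2) / (old_size ** 2)
print(f'{pixel_ratio:.2f}x more pixels per image')
```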

In [0]:
learn = cnn_learner(data, models.resnet50, metrics=error_rate)

In [0]:
learn.lr_find()
learn.recorder.plot()

In [0]:
learn.fit_one_cycle(8)

In [0]:
learn.save('stage-1-50')

Try to fine-tune the ResNet-50 model by unfreezing it and setting a range of learning rates as before.

In [0]:
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-3, 1e-1))

If fine-tuning doesn't work, you can load your previous model, and look at the classification interpretation.

In [0]:
learn.load('stage-1-50');

In [0]:
interp = ClassificationInterpretation.from_learner(learn)

In [0]:
interp.most_confused(min_val=2)

# Resources

* [fast.ai Lesson 1](https://course.fast.ai/videos/?lesson=1)
* [fast.ai](https://www.fast.ai/)
* [fast.ai Documentation](https://docs.fast.ai/)
* [ModelZoo](https://modelzoo.co/)

# Exercises

## Exercise 1

Complete [fastai lesson 2](https://course.fast.ai/videos/?lesson=2).

### Student Solution

In [0]:
# Your answer goes here

### Answer Key

**Solution**

In [0]:
# Put the recommended solution here; if there is more than one "good" solution
# that you think students should know put those solutions in subsequent code
# boxes with "# Solution" in the first line.