> **DO NOT EDIT IF INSIDE neural_networks folder**


# Week 6: Transfer learning and multitask learning

Deep neural networks are **extremely expensive** to train. Training a good classifier on a complex task, like telling objects in images apart or determining whether a move in a board game is good, can take weeks on multiple GPUs, cost millions of dollars in cloud computing fees, and release massive amounts of CO$_2$ into the atmosphere ([in some cases more than five cars emit over their entire lifetimes!](https://arxiv.org/pdf/1906.02243.pdf)). Because of this, we want to be able to **reuse** the weights of models that have already been trained. This is called *transfer learning*. The fundamental idea is that things learned in one context can be *transferred* to another context.



## Exercises

We will follow [a very nice blog post](https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/) written by Jason Brownlee of 'Machine Learning
Mastery' for most of these exercises. In his blog post, Jason takes the reader through
the process of using pretrained models in Keras. Below I have outlined the steps you
will go through, with references to his blog post. I strongly recommend you read from the
top down to 'Models for Transfer Learning' before proceeding.

### Loading pretrained models

The first practical thing we need to figure out when doing transfer learning is loading pretrained models. Keras makes this very easy by offering a number of pretrained models for image classification which can be downloaded through their [Applications API](https://keras.io/applications). 
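
As a rough illustration of what loading through the Applications API looks like, here is a minimal sketch. It assumes the TensorFlow-bundled Keras (`tensorflow.keras`); the helper name `load_densenet` is made up for this example and is not part of Keras or Jason's post.

```python
from tensorflow.keras.applications import DenseNet121


def load_densenet(weights="imagenet", include_top=True):
    """Build DenseNet121 through the Keras Applications API.

    weights="imagenet" downloads the pretrained ImageNet weights the first
    time it is called; weights=None builds the same architecture with
    randomly initialised weights (no download).
    """
    return DenseNet121(weights=weights, include_top=include_top)
```

Calling `model = load_densenet()` followed by `model.summary()` prints the layers of the loaded network, which is a handy way to see what the arguments in the next section change.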

#### Applications API arguments

When loading pretrained models, we will want to provide some arguments that depend on what
we want to do with the model after loading. Below I ask you to explain, in your own words,
what some of these parameters do. See the Applications API reference for some of the models
and the 'Models for Transfer Learning' section in Jason's blog post for help.

> **Ex. 6.1.1**: In your own words, explain what the following function arguments do in
the different model loading functions:
1. `include_top`
1. `weights`
1. `input_shape`
1. `pooling`
1. `classes`
1. Explain what 'global pooling' does, and why it is needed when `include_top=False`

#### Load a model and predict an image

> **Ex. 6.1.2**: Following Jason's example under 'Pre-Trained Model as Classifier',
classify [this image](https://images.squarespace-cdn.com/content/v1/58f0ecc029687fbef7b86b03/1583064484458-IM0UKAZIONS6E2CFCDJC/ke17ZwdGBToddI8pDm48kD5ENJpXCfmjfXuRxqpPb-1Zw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZUJFbgE-7XRK3dMEBRBhUpyN2spBBImrH38afc2UL8XBF0s2RHqmX-QW0wG37RpCsIsNysB0CO3b7e86dkNKVNs/Otter+Makes+an+Immediate+U-Turn+Back+to+the+Water.jpg?format=1500w).
Print not just the most likely label, but everything that `decode_predictions` returns.
>
> ***Important***: *Don't use VGG as he does. It's 500 MB to download and will take too long.
> Use one of the smaller models instead, such as DenseNet121 (see [this overview of model sizes](https://keras.io/applications/#documentation-for-individual-models)).*

### Adapting pretrained models

#### Simple feature extractor for ML prediction

By removing the last layer, we can turn a pretrained convolutional neural network into a
feature extractor. We can then use it to extract features of a large number of images and
classify those using any machine learning model. Jason describes this under 'Pre-Trained Model as Feature Extractor Preprocessor'.
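
The idea can be sketched roughly as follows. This is only one possible arrangement, assuming `tensorflow.keras` and scikit-learn are installed; the helper names (`build_extractor`, `extract_features`, `fit_classifier`) and the batch size are choices made for this illustration, not something taken from Jason's post.

```python
from sklearn.svm import SVC
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.applications.densenet import preprocess_input


def build_extractor(weights="imagenet", input_shape=(32, 32, 3)):
    # include_top=False drops the ImageNet classification layer;
    # pooling="avg" makes the output one flat feature vector per image.
    return DenseNet121(weights=weights, include_top=False,
                       pooling="avg", input_shape=input_shape)


def extract_features(extractor, images, batch_size=64):
    # images: array of shape (n, height, width, 3) with pixel values in 0..255.
    # astype() makes a copy, so the caller's array is left untouched.
    x = preprocess_input(images.astype("float32"))
    return extractor.predict(x, batch_size=batch_size, verbose=0)


def fit_classifier(features, labels):
    # Any scikit-learn classifier could be used here; SVC is just one option.
    clf = SVC()
    clf.fit(features, labels)
    return clf
```

With real data you would call `extract_features` on the training and test images separately, fit on the training features, and score on the test features. Note that Keras datasets such as cifar10 return labels with shape `(n, 1)`, so they likely need `labels.ravel()` before being passed to scikit-learn.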

> **Ex. 6.2.1:** Extract features for every datapoint (i.e., image) in the [cifar10](https://keras.io/api/datasets/cifar10/) dataset and build a feature matrix X. To do so, use an architecture such as DenseNet121 without the top layer. The predictions of such a model are the features, which we store in X. Once X is built, train an SVM classifier on X to predict the image labels, and report the accuracy on the test data.
>
> *Hint: You can import SVM from sklearn. It has a simple API, just check out some of the examples on the [documentation page](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).*
> 
> Wanna know more about cifar10? Read [here](https://www.cs.toronto.edu/~kriz/cifar.html).

#### Changing the prediction task (switching out the last layer)

Another way to achieve roughly the same thing is to remove the last layer and insert more layers. Jason describes this under 'Pre-Trained Model as Feature Extractor in Model'.
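
A minimal sketch of this pattern, assuming `tensorflow.keras`: the pretrained layers are frozen and a small new head is attached on top. The function name `build_transfer_model`, the hidden layer size of 256, and the optimizer and loss are illustrative assumptions, not values prescribed by the blog post.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121


def build_transfer_model(num_classes=10, weights="imagenet",
                         input_shape=(32, 32, 3)):
    base = DenseNet121(weights=weights, include_top=False,
                       input_shape=input_shape)
    base.trainable = False  # freeze every pretrained layer

    # New, trainable layers stacked on the frozen base.
    x = layers.Flatten()(base.output)
    x = layers.Dense(256, activation="relu")(x)  # hypothetical hidden size
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs=base.input, outputs=outputs)
    # sparse_categorical_crossentropy expects integer class labels.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Because the base is frozen, `model.fit(...)` only updates the two new `Dense` layers; `len(model.trainable_weights)` is a quick way to check that.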

> **Ex. 6.2.2**: Do the same as above, but by following Jason's example under 'Pre-Trained Model as Feature Extractor in Model'. You will have to retrain the model to get weights for the new layers you've added; you should freeze the other layers. Compare to the accuracy you got in 6.2.1.