### Transfer Learning

Transfer learning is the technique of adapting a pre-trained model to a new task. What does **pre-trained model** mean? It is a model that has already been trained on a different dataset, most commonly [ImageNet](http://www.image-net.org/), chosen for the size of the database and its number of classes (1000 in the commonly used subset). Because of the sheer volume of data, such models are expensive to train from scratch: training can take a very long time, even weeks.

Well-known pre-trained architectures include VGG, ResNet, and DenseNet.

Essentially, transfer learning lets us reuse an already trained model and apply its knowledge to another task.

### Useful Layers

When considering transfer learning, it is important to recognize that a pre-trained model has been trained for a particular dataset. Its last layers perform a task that is highly specialized to that dataset, so unless our dataset is very similar, we replace them: we swap in a new final layer, adjust the number of output classes, and train only the **classifier** (last layer).

In this way we use the rest of the model as a **feature detector** that has already done most of the work for us, and the newly trained classifier learns to map its outputs to our classes. Below is an example using the VGG model.

![Feature extractor](part3_images/feature_extractor.png)
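The feature-detector idea can be sketched in PyTorch as follows. This is a minimal illustration, not the notes' own code: `TinyNet` is a hypothetical stand-in for a pre-trained network (it mimics the `features` / `classifier` split that torchvision's VGG models expose), and `num_new_classes = 5` is an arbitrary assumption.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pre-trained model with 1000 output classes."""

    def __init__(self, num_classes=1000):
        super().__init__()
        # Convolutional part, playing the role of the feature detector.
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Final layer, playing the role of the specialized classifier.
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyNet()  # in practice the weights would be loaded from a checkpoint

# Freeze every existing parameter so the feature detector is not updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer; a freshly created layer is trainable by default.
num_new_classes = 5  # assumed number of classes in the new task
model.classifier = nn.Linear(model.classifier.in_features, num_new_classes)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['classifier.weight', 'classifier.bias']

logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 5])
```

With a real torchvision VGG model the same pattern should apply, except that `classifier` is itself a small stack of layers, so only its final `nn.Linear` would be replaced.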

### Fine-Tuning

Fine-tuning is a technique that involves updating some or all of the pre-trained model's parameters by continuing training on the new data.

How to apply transfer learning depends largely on the size of the new data set and on how similar it is to the original data set. There are four main cases:

1. New data set is small, new data is similar to original training data.
2. New data set is small, new data is different from original training data.
3. New data set is large, new data is similar to original training data.
4. New data set is large, new data is different from original training data.

![Guide to Transfer Learning](part3_images/guide_transfer_learning.png)
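As a compact summary of the four cases, the decision can be written as a small lookup. This is only an illustrative sketch: the function name `choose_strategy` and the boolean inputs are hypothetical, and in practice "large" and "similar" are judgment calls rather than hard thresholds.

```python
# Maps (is_large, is_similar) to the strategy described for each case.
STRATEGIES = {
    (False, True): "Case 1: freeze the pre-trained layers, train a new last layer",
    (False, False): "Case 2: keep only the first few layers frozen, train a new last layer",
    (True, True): "Case 3: replace the last layer, then fine-tune the whole network",
    (True, False): "Case 4: retrain from scratch, or fine-tune as in Case 3",
}


def choose_strategy(is_large: bool, is_similar: bool) -> str:
    """Return the suggested transfer learning strategy for a new data set."""
    return STRATEGIES[(is_large, is_similar)]


print(choose_strategy(is_large=False, is_similar=True))
```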

**Case 1: new data set is small and similar to the original training data (End of ConvNet)**
- replace the last layer with a new fully connected layer that has the appropriate number of output classes
- keep the weights of all the pre-trained layers frozen
- initialize the new layer's weights randomly and train only that layer
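To check that freezing behaves as described in Case 1, the sketch below runs a single training step and compares weights before and after. The `backbone` here is a tiny hypothetical stand-in for the pre-trained layers, and the data is random; only the pattern matters.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the frozen pre-trained layers.
backbone = nn.Sequential(nn.Linear(10, 16), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad = False

# New last layer with randomly initialized weights (3 classes assumed).
head = nn.Linear(16, 3)
model = nn.Sequential(backbone, head)

# Only the new layer's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8, 10)          # random stand-in batch
labels = torch.randint(0, 3, (8,))   # random stand-in labels

backbone_before = backbone[0].weight.detach().clone()
head_before = head.weight.detach().clone()

# One training step.
optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(backbone[0].weight, backbone_before)
head_changed = not torch.equal(head.weight, head_before)
print(backbone_unchanged, head_changed)
```

Passing only `head.parameters()` to the optimizer is not strictly required once the other parameters are frozen, but it makes the intent explicit.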

**Case 2: new data set is small but different from the original training data (Start of ConvNet)**
- remove all but the first few layers of the pre-trained model, which capture the most generic low-level features, and add a new fully connected layer that matches the number of output classes
- initialize the new layer's weights randomly and freeze the weights of the remaining pre-trained layers, so that only the new layer is trained
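Case 2 might look like the following sketch, which keeps only the first convolutional block of a network and attaches a new classifier to it. The `pretrained_features` stack is a hypothetical stand-in, and both the cut point (the first three modules) and the pooling layer added before the classifier are illustrative assumptions.

```python
import torch
from torch import nn

# Hypothetical stand-in for the convolutional part of a pre-trained model.
pretrained_features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)

# Keep only the first block (conv, ReLU, pool) and freeze it.
early_layers = pretrained_features[:3]
for param in early_layers.parameters():
    param.requires_grad = False

num_new_classes = 4  # assumed number of classes in the new task

# Pool the 8 feature maps to a vector, then classify with a new layer.
model = nn.Sequential(
    early_layers,
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, num_new_classes),
)

out = model(torch.randn(2, 3, 32, 32))
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(out.shape)   # torch.Size([2, 4])
print(trainable)   # ['3.weight', '3.bias']
```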

**Case 3: new data set is large and similar to the original training data (Fine-tune)**
- remove the last fully connected layer and replace it with a new one that has the appropriate number of output classes
- randomly initialize the weights of the new layer
- initialize the rest of the network with the pre-trained weights and leave them unfrozen
- re-train the entire neural network
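A fine-tuning setup along the lines of Case 3 could be sketched as below. The `backbone` is again a hypothetical stand-in for the pre-trained layers. Giving the pre-trained layers a smaller learning rate than the new layer is a common practice but an assumption here, not something the steps above require; the specific values are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for layers that carry pre-trained weights.
backbone = nn.Sequential(nn.Linear(10, 16), nn.ReLU())

# New last layer with randomly initialized weights (7 classes assumed).
new_head = nn.Linear(16, 7)
model = nn.Sequential(backbone, new_head)

# Nothing is frozen: every parameter takes part in training.
for param in model.parameters():
    param.requires_grad = True

# Assumption: a smaller learning rate for the pre-trained layers, so that
# their weights are adjusted gently while the new layer learns faster.
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},
        {"params": new_head.parameters(), "lr": 1e-2},
    ],
    lr=1e-2,
    momentum=0.9,
)
criterion = nn.CrossEntropyLoss()

# One training step on a random stand-in batch.
inputs = torch.randn(8, 10)
labels = torch.randint(0, 7, (8,))
optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()

all_have_grads = all(p.grad is not None for p in model.parameters())
print(all_have_grads)
```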

**Case 4: new data set is large but different from the original training data (Fine-tune or retrain)**
- remove the last fully connected layer and replace it with a new one that has the appropriate number of output classes
- retrain the network from scratch with randomly initialized weights
- alternatively, initialize from the pre-trained weights and fine-tune as in Case 3
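For the "retrain from scratch" option, one way to discard learned weights while keeping the architecture is to re-run each layer's own initialization. The sketch below does this on a hypothetical stand-in model; the helper name `reset_weights` is an assumption, and it relies on built-in layers such as `nn.Linear` providing a `reset_parameters()` method.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pre-trained model with 1000 output classes.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1000))

# Replace the last layer to match the new task (6 classes assumed).
model[2] = nn.Linear(16, 6)


def reset_weights(module):
    """Re-initialize a layer if it defines its own reset_parameters()."""
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()


before = model[0].weight.detach().clone()
model.apply(reset_weights)  # visits every submodule, then the model itself
reinitialized = not torch.equal(model[0].weight, before)
print(reinitialized)
```

In practice the simpler route is usually to construct the architecture without loading pre-trained weights in the first place; the explicit reset is shown only to make the "randomly initialized weights" step visible.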