# Transfer Learning 
We're going to look at it in the context of image processing!

#### Transfer Learning means reusing an already trained network for a similar purpose.

The benefits of transfer learning are:

- you can reuse pre-trained networks
- it saves lots of training time
- it allows you to train with much smaller datasets than training from scratch would require

## There are three strategies to reuse a pretrained network:

### Use intermediate output 
- Use the intermediate output of one network layer (e.g. the last fully connected layer before the output). 
- Use that output as the input for a conventional Machine Learning model (e.g. an SVM). 
- This approach is successfully used for image classification tasks.
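A minimal sketch of this strategy follows. It assumes TensorFlow's bundled Keras and scikit-learn are installed. So that it runs offline, a tiny randomly initialized CNN stands in for the pretrained network; in practice you would load a trained one instead, e.g. `keras.applications.ResNet50(weights="imagenet")`. The layer name `penultimate` and the synthetic two-class data are hypothetical.

```python
import numpy as np
from tensorflow import keras
from sklearn.svm import SVC

# Hypothetical stand-in for a pretrained network (tiny, random weights).
inputs = keras.Input(shape=(32, 32, 3))
x = keras.layers.Conv2D(8, 3, activation="relu")(inputs)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dense(16, activation="relu", name="penultimate")(x)
outputs = keras.layers.Dense(10, activation="softmax", name="output")(x)
pretrained = keras.Model(inputs, outputs)

# Feature extractor: same input, but it stops at the last layer before the output.
extractor = keras.Model(pretrained.inputs[0],
                        pretrained.get_layer("penultimate").output)

# Hypothetical two-class dataset: class-1 images are brighter than class-0 images.
rng = np.random.default_rng(0)
X = rng.random((40, 32, 32, 3)).astype("float32")
y = np.array([0, 1] * 20)
X[y == 1] += 0.5

# The intermediate outputs become feature vectors for a conventional SVM.
features = extractor.predict(X, verbose=0)
svm = SVC(kernel="rbf").fit(features, y)
pred = svm.predict(features)
```

Only the SVM is trained here; the network is used purely as a fixed feature function.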

### Partial Retraining
- If you exchange the last layers of the network (e.g. to change the number of output classes), you can retrain the network for a slightly different purpose. For instance, we could retrain a CNN that recognizes cats so that it recognizes foxes instead.

- Of course, the layers whose pretrained weights you want to reuse must have exactly the same architecture as in the original network; otherwise the shapes and number of their parameters won't match.

**The earlier (reused) layers are frozen; only the new layers are trained!**
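A minimal sketch of partial retraining, assuming TensorFlow's bundled Keras. The small 10-class model is a hypothetical stand-in for a pretrained network (e.g. the cat recognizer), and the 3-class head, layer names and random data are likewise hypothetical. The reused layers are frozen, the old output layer is dropped, and a new output layer is attached to the last reused layer:

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-in for a pretrained 10-class CNN.
inputs = keras.Input(shape=(32, 32, 3))
x = keras.layers.Conv2D(8, 3, activation="relu", name="conv")(inputs)
x = keras.layers.GlobalAveragePooling2D(name="pool")(x)
x = keras.layers.Dense(16, activation="relu", name="dense")(x)
old_output = keras.layers.Dense(10, activation="softmax", name="old_output")(x)
pretrained = keras.Model(inputs, old_output)

# Freeze the reused layers: fit() will not update their weights.
for layer in pretrained.layers:
    layer.trainable = False

# Replace the output layer: a new 3-class head on top of the last reused layer.
new_output = keras.layers.Dense(3, activation="softmax", name="new_output")(
    pretrained.get_layer("dense").output)
model = keras.Model(pretrained.inputs[0], new_output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Hypothetical training data for the new task.
rng = np.random.default_rng(0)
X = rng.random((30, 32, 32, 3)).astype("float32")
y = rng.integers(0, 3, size=30)

conv_before = model.get_layer("conv").get_weights()[0].copy()
model.fit(X, y, epochs=2, verbose=0)
conv_after = model.get_layer("conv").get_weights()[0]  # unchanged: layer is frozen
```

Only the kernel and bias of `new_output` end up in `model.trainable_weights`.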

### Combine the pretrained network
- If you remove the output layer, you can put other layers or whole networks on top of the pretrained network. For instance, combining a pretrained CNN with an LSTM would allow you to train a network that produces image captions.
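The captioning idea can be sketched as below, assuming TensorFlow's bundled Keras. This is one possible wiring, not the only one: here the frozen CNN's features initialize the LSTM's hidden and cell state, and the LSTM predicts the next word at every position of the caption. The tiny CNN, the vocabulary size, caption length and the random "captions" are all hypothetical.

```python
import numpy as np
from tensorflow import keras

VOCAB, MAXLEN, UNITS = 50, 8, 32  # hypothetical vocabulary size, caption length, LSTM size

# Hypothetical stand-in for a pretrained CNN whose output layer was removed.
cnn = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
], name="cnn")
cnn.trainable = False  # keep the pretrained weights fixed

# Image features initialize the LSTM state; the caption so far is the sequence input.
image = keras.Input(shape=(32, 32, 3), name="image")
tokens = keras.Input(shape=(MAXLEN,), dtype="int32", name="tokens")
features = cnn(image)
h0 = keras.layers.Dense(UNITS, activation="tanh")(features)
c0 = keras.layers.Dense(UNITS, activation="tanh")(features)
embedded = keras.layers.Embedding(VOCAB, UNITS)(tokens)
seq = keras.layers.LSTM(UNITS, return_sequences=True)(embedded, initial_state=[h0, c0])
next_word = keras.layers.Dense(VOCAB, activation="softmax")(seq)
captioner = keras.Model([image, tokens], next_word)
captioner.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Hypothetical data: the target at each step is the next token of the caption.
rng = np.random.default_rng(0)
imgs = rng.random((16, 32, 32, 3)).astype("float32")
captions = rng.integers(1, VOCAB, size=(16, MAXLEN + 1)).astype("int32")
captioner.fit([imgs, captions[:, :-1]], captions[:, 1:], epochs=1, verbose=0)
probs = captioner.predict([imgs[:2], captions[:2, :-1]], verbose=0)
```

`probs` holds one next-word distribution per caption position; generating a caption would feed the predicted words back in step by step.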

### Frozen Layers

In transfer learning, the weights of the first layers are usually fixed. Only the last layers are trainable.

To freeze a layer in a Keras model, set `layer.trainable = False`. To freeze layers of an already existing model, access them through `model.layers`:

```python
# freeze the first layer (in a functional model, layers[0] is the InputLayer)
model.layers[0].trainable = False
```
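Freezing everything except the last few layers is usually done with a loop over `model.layers`. A minimal sketch, assuming TensorFlow's bundled Keras; the small model and the choice `n_trainable = 1` are hypothetical:

```python
from tensorflow import keras

# Hypothetical existing model; in practice this would be a loaded pretrained network.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

n_trainable = 1  # hypothetical choice: only the last layer stays trainable
for layer in model.layers[:-n_trainable]:
    layer.trainable = False

# Compile after changing the flags so that training respects them.
model.compile(optimizer="adam", loss="binary_crossentropy")
flags = [layer.trainable for layer in model.layers]
```

For a `Sequential` model, `model.layers` does not include the input layer, so `flags` lists only the three `Dense` layers.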

**Note:** **Caching** the output of the frozen layers for all data points can speed up training considerably, because that output then needs to be computed only once instead of in every epoch!
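A minimal sketch of caching, assuming TensorFlow's bundled Keras. The frozen base, the one-layer head and the random data are hypothetical. The frozen layers are run once with `predict`, and only the small head is trained on the cached features:

```python
import numpy as np
from tensorflow import keras

# Hypothetical data and a hypothetical frozen base standing in for pretrained layers.
rng = np.random.default_rng(0)
X = rng.random((64, 32, 32, 3)).astype("float32")
y = rng.integers(0, 2, size=64)
base = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
])
base.trainable = False

# Run the frozen layers once and keep their outputs (they could also be saved to disk).
cached = base.predict(X, verbose=0)

# Every training epoch now runs only the small trainable head on the cached features.
head = keras.Sequential([
    keras.Input(shape=(cached.shape[1],)),
    keras.layers.Dense(1, activation="sigmoid"),
])
head.compile(optimizer="adam", loss="binary_crossentropy")
head.fit(cached, y, epochs=5, verbose=0)

# For inference on raw images, chain the frozen base and the trained head again.
inputs = keras.Input(shape=(32, 32, 3))
full = keras.Model(inputs, head(base(inputs)))
probs = full.predict(X[:4], verbose=0)
```

Caching only works if the frozen layers see identical inputs in every epoch; with random data augmentation the inputs change each time, so the features would have to be recomputed.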

### Adding extra layers

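A flexible way to add new layers to an existing model is the Keras functional API (rather than the `Sequential` syntax). Here each layer is called on the output of the previous one, so the input and output of every layer are connected explicitly: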

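```python
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Activation
from keras.models import Model

# include_top=False drops ResNet50's 1000-class ImageNet classifier;
# pooling="avg" reduces the last feature maps to one vector per image
m = ResNet50(weights="imagenet", include_top=False, pooling="avg")

# connect a new binary-classification head to the output of the existing model
dense = Dense(1)(m.output)
act = Activation("sigmoid")(dense)
m2 = Model(inputs=m.input, outputs=[act])

# Xtrain: images, e.g. of shape (n, 224, 224, 3), preprocessed with
# keras.applications.resnet50.preprocess_input; ytrain: binary labels (0/1)
m2.compile(optimizer="rmsprop", loss="binary_crossentropy")
m2.fit(Xtrain, ytrain)
```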

After adding layers or changing `trainable` flags, you need to (re)compile the model before training; otherwise Keras does not take the changes into account.

### Warmup Training
- When old and new layers are brought together, the new layers are randomly initialized, so their errors, and therefore their gradients, are very large at first. To prevent these gradients from distorting the pretrained weights, training usually starts with a few warmup epochs in which the pretrained layers are completely frozen and only the new layers adjust to them. Only after that does the real training begin, with the pretrained layers unfrozen.
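A minimal two-phase sketch, assuming TensorFlow's bundled Keras. The base model, the data, the epoch counts and both learning rates are hypothetical; using a much smaller learning rate once the pretrained layers are unfrozen is a common precaution rather than a fixed rule.

```python
import numpy as np
from tensorflow import keras

# Hypothetical data and a hypothetical "pretrained" base.
rng = np.random.default_rng(0)
X = rng.random((32, 32, 32, 3)).astype("float32")
y = rng.integers(0, 2, size=32)
base = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(16, activation="relu"),
], name="base")

# New, randomly initialized head on top of the base.
inputs = keras.Input(shape=(32, 32, 3))
outputs = keras.layers.Dense(1, activation="sigmoid")(base(inputs))
model = keras.Model(inputs, outputs)

# Phase 1 (warmup): base frozen, only the new head adjusts.
base.trainable = False
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="binary_crossentropy")
model.fit(X, y, epochs=3, verbose=0)
warmup_trainable = len(model.trainable_weights)    # head kernel + bias

# Phase 2: unfreeze the base and recompile so the change takes effect.
base.trainable = True
model.compile(optimizer=keras.optimizers.Adam(1e-5), loss="binary_crossentropy")
model.fit(X, y, epochs=3, verbose=0)
finetune_trainable = len(model.trainable_weights)  # base + head weights
```

Recompiling between the phases is what makes Keras pick up the changed `trainable` flag.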