## Transfer Learning 

Transfer learning consists of taking a saved network that was trained on a large dataset and reusing parts of it for a different task. If the saved network was trained on a large and general enough dataset, the spatial hierarchy of features it learned can be transferred to new tasks.

VGG is a common choice for image classification tasks. You can take its convolutional base layers, freeze them so they are not retrained, and replace the original densely connected classifier on top with new Dense layers that you train on your own data.

Pretrained image models available in `keras.applications` include Xception, Inception V3, ResNet50, VGG16, VGG19 and MobileNet.

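```python
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet',         # weights pretrained on ImageNet
                  include_top=False,          # drop the densely connected classifier on top
                  input_shape=(150, 150, 3))  # optional: shape of the images you will feed in
conv_base.summary()
```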

```text
_________________________________________________________________
Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         (None, 150, 150, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584
...
```
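The summary output above stops after block 2. As a cross-check, here is a minimal pure-Python sketch that recomputes each layer's output shape and parameter count, assuming the standard VGG16 layout: five blocks of 3x3 convolutions with 'same' padding (64, 128, 256, 512 and 512 filters), each block followed by 2x2 max pooling with stride 2. The names `VGG16_BLOCKS` and `vgg16_conv_base` are hypothetical helpers, not part of Keras. For a 150x150 input the last layer comes out as (4, 4, 512), which is the feature shape used in the extraction code of Option 1, and the first rows reproduce the parameter counts in the summary.

```python
# Hypothetical description of the VGG16 conv base: (number of 3x3 conv layers, filters) per block
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_conv_base(height, width, channels=3):
    """Return (layer_name, output_shape, param_count) rows, similar to model.summary()."""
    rows = [("input", (height, width, channels), 0)]
    for block_idx, (num_convs, filters) in enumerate(VGG16_BLOCKS, start=1):
        for conv_idx in range(1, num_convs + 1):
            # 3x3 kernel weights for every (input, output) channel pair, plus one bias per filter
            params = 3 * 3 * channels * filters + filters
            channels = filters  # 'same' padding keeps height and width unchanged
            rows.append((f"block{block_idx}_conv{conv_idx}", (height, width, channels), params))
        # 2x2 max pooling with stride 2 halves height and width, rounding down
        height, width = height // 2, width // 2
        rows.append((f"block{block_idx}_pool", (height, width, channels), 0))
    return rows


rows = vgg16_conv_base(150, 150)
for name, shape, params in rows:
    print(f"{name:<14} {str(shape):<18} {params}")
print("total params:", sum(params for _, _, params in rows))
```

The same function with `vgg16_conv_base(224, 224)` shows why the default ImageNet input size ends at a 7x7 feature map instead.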

At this point there are two ways to continue. The first is significantly cheaper computationally; the second is cleaner in terms of code and lets you use data augmentation, among other things.

----------------------------------------------------------------

### Training Option 1: Feature Extraction 

The first option is to use the convolutional base (or another pretrained network) as a feature extractor: run your data through it once, save the outputs, then feed those outputs into a new Dense network. This is essentially the same as training a normal network, except that you split the network in two and save the data at the split point. You then build and train a new, small model made of Dense layers on the flattened features. Because this is just a feed-forward network, it trains quickly and can run on a CPU; the single pass of your data through the conv base is the expensive part and is still best done on a GPU. Here is a sketch of the extraction step. In the real world you would probably read the images with a generator instead of loading them all into memory.

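```python
import numpy as np

def extract_features(x_data, batch_size=20):
    # conv_base output for a 150x150 input is (4, 4, 512) per image
    features = np.zeros(shape=(len(x_data), 4, 4, 512))
    for start in range(0, len(x_data), batch_size):
        batch = x_data[start:start + batch_size]  # predict expects a batch, not a single image
        features[start:start + len(batch)] = conv_base.predict(batch)
    return features  # conv_base outputs for all your input data

# train_data: preprocessed images of shape (num_images, 150, 150, 3); y_data: their labels
train_features, train_labels = extract_features(train_data), y_data
# val_features...
# test_features...
```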

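Next, build a feed-forward network as normal, using the conv_base outputs from above as its input data. Flatten each (4, 4, 512) feature map into a vector first, since the first Dense layer expects inputs of length 4*4*512.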

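```python
import numpy as np
from keras import layers
from keras.models import Sequential

# Flatten each (4, 4, 512) feature map into a vector of length 8192
train_features = np.reshape(train_features, (len(train_features), 4 * 4 * 512))
model = Sequential()
model.add(layers.Dense(256, activation='relu', input_shape=(4 * 4 * 512,)))
model.add(layers.Dense(1, activation='sigmoid'))  # binary classifier
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_features, train_labels)  # inputs are the conv_base outputs
```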
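The Dense classifier trained in this option is small, which is why it trains quickly. A back-of-the-envelope sketch, assuming the Dense(256) and Dense(1) layers from the code above (the helper `dense_params` is hypothetical), counts its parameters: about 2.1 million, compared with roughly 14.7 million in the VGG16 convolutional base.

```python
def dense_params(inputs, units):
    """Parameters of a Dense layer: one weight per (input, unit) pair plus one bias per unit."""
    return inputs * units + units


flat_features = 4 * 4 * 512                # flattened conv_base output for 150x150 images
hidden = dense_params(flat_features, 256)  # Dense(256, activation='relu')
output = dense_params(256, 1)              # Dense(1, activation='sigmoid')
head_total = hidden + output
print(flat_features, hidden, output, head_total)  # 8192 2097408 257 2097665
```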

-------------------------------------------------------------------------

### Training Option 2: Train as Normal 

The second option is much cleaner in terms of code, but expensive (only attempt it with a GPU). Even though the convolutional base is frozen, every training image passes through the full VGG base on every epoch instead of just once, and VGG is big. The benefit is that you can use data augmentation, or add other components to the network, much more easily.
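The cost difference between the two options comes down to how often the conv base runs. A small sketch under simplifying assumptions (only forward passes through the base are counted; the function name and the 2000-image, 30-epoch figures are purely illustrative):

```python
def conv_base_passes(num_images, epochs, option):
    """Count how many images go through the conv base during training.

    option 1: features are extracted once, then reused for every epoch.
    option 2: the conv base is part of the model, so every epoch re-runs it.
    """
    if option == 1:
        return num_images
    if option == 2:
        return num_images * epochs
    raise ValueError("option must be 1 or 2")


print(conv_base_passes(2000, 30, option=1))  # 2000
print(conv_base_passes(2000, 30, option=2))  # 60000
```

Re-running the base every epoch is also what makes data augmentation possible: each epoch can feed it freshly transformed images, whereas Option 1 only ever sees the features saved once.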

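It is important to freeze the convolutional base by setting `conv_base.trainable = False` before compiling the model. Otherwise, the large gradient updates caused by the randomly initialized Dense layers will propagate into the base and destroy its previously learned representations.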

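```python
from keras import layers
from keras.models import Sequential

conv_base.trainable = False  ### IMPORTANT: freeze before compiling ###
model = Sequential()
model.add(conv_base)
model.add(layers.Flatten())  # (4, 4, 512) -> vector of length 8192
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_data, y_data)  # train as normal, e.g. on augmented images
```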
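What freezing means can be illustrated with a toy model: the optimizer simply skips frozen weights when it applies an update. This is a pure-Python sketch, not how Keras implements it internally; `ToyLayer` and `sgd_step` are hypothetical, and each "layer" holds a single weight for simplicity.

```python
from dataclasses import dataclass


@dataclass
class ToyLayer:
    """Hypothetical stand-in for a layer: one weight and a trainable flag."""
    name: str
    weight: float
    trainable: bool = True


def sgd_step(layers, grads, lr=0.1):
    """Apply one gradient-descent update, skipping frozen (non-trainable) layers."""
    for layer, grad in zip(layers, grads):
        if layer.trainable:
            layer.weight -= lr * grad


# Pretend the first two layers are the pretrained conv base and the last one is the new head
model = [ToyLayer("block1_conv1", 0.5), ToyLayer("block5_conv3", -0.2), ToyLayer("dense", 0.0)]
for layer in model[:2]:
    layer.trainable = False  # analogous to conv_base.trainable = False

sgd_step(model, grads=[1.0, 1.0, 1.0])
print([(layer.name, layer.weight) for layer in model])
```

Only the `dense` weight changes; the two "pretrained" weights keep their values even though they received gradients.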

------------------------------------------------------------------------

## Fine Tuning 

You can also fine-tune a few layers of VGG, which may improve results. Only unfreeze the last convolutional block (`block5` in VGG16), right before the Dense layers: its layers encode more specialized features, which are worth adapting to your task. The early layers encode generic features (such as edges, colors and textures) that are present in almost all images. Fine-tuning works best after the new Dense layers have already been trained with the base frozen. You can set specific layers to trainable like so:

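```python
from keras import optimizers

conv_base.trainable = True  # unfreeze the base as a whole first
for layer in conv_base.layers:
    if 'block5' in layer.name:
        print(layer.name)
        layer.trainable = True   # fine-tune the last conv block
    else:
        layer.trainable = False  # keep earlier blocks frozen

# Recompile so the changes take effect; a very low learning rate limits damage to block5's weights
model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-5),
              loss='binary_crossentropy', metrics=['accuracy'])
```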

This will take longer to train, but you may see better results. 
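Part of the extra training cost is visible in the number of trainable parameters. A rough count, assuming the 150x150 input and the Dense(256) and Dense(1) head from Option 2 (`conv3x3_params` is a hypothetical helper): unfreezing block 5 raises the trainable parameters from about 2.1 million to about 9.2 million.

```python
def conv3x3_params(in_channels, filters):
    """Parameters of a 3x3 convolution: kernel weights plus one bias per filter."""
    return 3 * 3 * in_channels * filters + filters


block5 = 3 * conv3x3_params(512, 512)             # block5_conv1..3 each map 512 -> 512 channels
head = (4 * 4 * 512) * 256 + 256 + 256 * 1 + 1    # Dense(256) + Dense(1) on flattened features

frozen_trainable = head                 # Option 2: only the Dense head trains
fine_tune_trainable = head + block5     # fine-tuning: head plus block5
print(frozen_trainable, fine_tune_trainable)  # 2097665 9177089
```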