# Transfer Learning

### Motivation: snow is a kind of precipitation

If two learning tasks are similar in some way, we may not need to build two different models from scratch. Instead, we can build a model for `Task1` and reuse it for `Task2`, retraining only the outer layers.
 
A commonly cited example in the literature is retraining a large, industry-standard image recognition model to classify a smaller set of image types, such as cats and dogs. The idea is that the large, proven model has already learned to detect features common to all images (e.g. edges and colors), so we can reuse that general knowledge to solve a more specific problem.
 
Similarly, some of the quantities we are trying to predict in the Capstone project appear to be related. Specifically, as part of the main work on the project we built a number of models to predict `Precipitation` (a binary classifier). Whether or not it is going to _snow_ (the `_is_snow` binary variable in our processed data set) appears to be a special case of `Precipitation`.
 
Thus, in this demo we'll show how to use a complex model built for `Precipitation` to predict `Snow`.

### Deep Convolutional Model to Predict Precipitation

We use a single Conv1D / MaxPooling1D stage, followed by a hidden Dense layer and an output layer, to predict `Precipitation`:

In [1]:
import tensorflow as tf

# NOTE: REGRESSION_METRICS and CLASSIFICATION_METRICS are defined elsewhere in the project.

def build_model_dcnn(is_binary, label_width, input_cnt):
    _activation, _loss, _metrics = get_activation_loss_and_metrics(is_binary)
    conv_width = 5

    model = tf.keras.Sequential([

        # Conv and Pooling Layers to learn some features
        tf.keras.layers.Conv1D(filters=512,
                               kernel_size=conv_width,
                               activation='relu'),
        tf.keras.layers.MaxPooling1D(pool_size=4),

        # Extra Hidden Densely Connected Layer
        tf.keras.layers.Flatten(),
        # Integer division keeps `units` an int (`input_cnt / 10` would yield a float)
        tf.keras.layers.Dense(units=input_cnt // 10, activation='relu'),

        # Output Layer
        tf.keras.layers.Dense(units=label_width, activation=_activation,
                              kernel_initializer=tf.initializers.zeros()),
        tf.keras.layers.Reshape([label_width, 1]),
    ])

    model.compile(loss=_loss, optimizer='adam', metrics=[_metrics])
    return model

def get_activation_loss_and_metrics(is_binary):
    # Regression defaults; binary targets switch to classification settings
    activation, loss, metrics = "linear", 'mean_squared_error', REGRESSION_METRICS
    if is_binary:
        activation, loss, metrics = "sigmoid", tf.keras.losses.BinaryCrossentropy(), CLASSIFICATION_METRICS

    return activation, loss, metrics

When we train this kind of model to predict Precipitation **12h in advance**, we get the following results:


| Precision | Recall |
| --- | --- |
| 0.76 | 0.47 |


### Re-purpose the DCNN model for Precipitation to Predict Snow

The approach is to freeze the convolutional and pooling layers and train only a new Dense layer and a new output layer on top of them. That way we reuse the previously learned features that matter for determining `Precipitation` and fit only the final layers to the `Snow` problem.

Here is what the `Precipitation` Model looks like when we load it from file:

```
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d (Conv1D)              (None, 32, 512)           410112    
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 8, 512)            0         
_________________________________________________________________
flatten (Flatten)            (None, 4096)              0         
_________________________________________________________________
dense (Dense)                (None, 576)               2359872   
_________________________________________________________________
dense_1 (Dense)              (None, 7)                 4039      
_________________________________________________________________
reshape (Reshape)            (None, 7, 1)              0         
```
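
The shapes and parameter counts in this summary can be reproduced by hand. The sketch below does so in plain Python. The input window of 36 time steps by 160 features is an assumption: it is not stated in the text, but it is the combination that matches the `conv1d` output length and parameter count, and it would also give `input_cnt // 10 = 576` if `input_cnt` is the flattened input size.

```python
def conv1d_output_len(input_len, kernel_size):
    """Output length of a Conv1D layer with stride 1 and 'valid' padding (the Keras defaults)."""
    return input_len - kernel_size + 1

def conv1d_params(in_channels, kernel_size, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_size * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units

# ASSUMPTION: input window inferred from the summary, not given in the text
time_steps, features = 36, 160

conv_len = conv1d_output_len(time_steps, kernel_size=5)  # matches (None, 32, 512)
pooled_len = conv_len // 4                               # MaxPooling1D(pool_size=4) -> (None, 8, 512)
flat_units = pooled_len * 512                            # Flatten -> (None, 4096)
hidden_units = (time_steps * features) // 10             # hidden Dense width

print("shapes:", conv_len, pooled_len, flat_units, hidden_units)
print("conv1d params: ", conv1d_params(features, 5, 512))
print("dense params:  ", dense_params(flat_units, hidden_units))
print("dense_1 params:", dense_params(hidden_units, 7))
```

The three printed parameter counts should agree with the `Param #` column for `conv1d`, `dense` and `dense_1`.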

Here is the code that builds the new model. It freezes the Conv/Pooling layers, drops the original Dense, output and Reshape layers, and adds fresh trainable ones in their place:

In [2]:
def tweak_model(model_to_tweak):
    # Load the previously trained model from file
    model = tf.keras.models.load_model(libcommons.libcommons.get_model_file(model_to_tweak))
    model.summary()

    new_model = tf.keras.models.Sequential()
    for layer_no, layer in enumerate(model.layers):
        # Freeze Convolutional and Pooling Layers
        if layer.name in ('conv1d', 'max_pooling1d'):
            layer.trainable = False

        # Keep everything except the last three layers (Dense, output Dense, Reshape)
        if layer_no < len(model.layers) - 3:
            new_model.add(layer)

    # Put in new Dense & Output Layers
    new_model.add(tf.keras.layers.Dense(name='Deep_Dense_2', units=100, activation='relu'))
    new_model.add(tf.keras.layers.Dense(name='Dense_Out', units=7, activation='sigmoid',
                                        kernel_initializer=tf.initializers.zeros()))
    new_model.add(tf.keras.layers.Reshape([7, 1]))

    for layer in new_model.layers:
        print("NEW LAYER: {}, trainable= {}".format(layer.name, layer.trainable))

    new_model.compile(loss=tf.keras.losses.BinaryCrossentropy(), optimizer='adam', metrics=[CLASSIFICATION_METRICS])

    return new_model


Running this code outputs:

```
NEW LAYER: conv1d, trainable= False
NEW LAYER: max_pooling1d, trainable= False
NEW LAYER: flatten, trainable= True
NEW LAYER: Deep_Dense_2, trainable= True
NEW LAYER: Dense_Out, trainable= True
NEW LAYER: reshape, trainable= True
```
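
To get a feel for how much work the optimizer is spared, we can count the parameters that remain trainable. This is a back-of-the-envelope sketch in plain Python: the counts for the original layers are copied from the model summary above, the counts for the new layers are computed from the layer sizes in `tweak_model`, and it assumes that all layers of the original model were trainable when it was first fitted.

```python
def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units

# Parameter counts copied from the Precipitation model summary
conv1d = 410112
old_dense = 2359872
old_out = 4039
flat_units = 4096  # width of the Flatten output that feeds the new layers

# New layers added in tweak_model: Dense(100) followed by Dense(7)
new_dense = dense_params(flat_units, 100)
new_out = dense_params(100, 7)

# ASSUMPTION: every layer of the original model was trainable
original_trainable = conv1d + old_dense + old_out
tweaked_trainable = new_dense + new_out
tweaked_frozen = conv1d  # Conv1D is frozen; pooling and Flatten have no weights

print(f"original trainable: {original_trainable:,}")
print(f"tweaked trainable:  {tweaked_trainable:,}")
print(f"tweaked frozen:     {tweaked_frozen:,}")
print(f"share of original:  {tweaked_trainable / original_trainable:.1%}")
```

Under these assumptions the tweaked model updates roughly 15% as many weights as the original one, which is one plausible reason for shorter training times.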

We now use the function above to train three models that predict `Snow` **6h**, **12h** and **18h** in advance:

In [None]:
for target_model in ([config.PREDICTION_TARGET_IS_SNOW_6H, 
                      config.PREDICTION_TARGET_IS_SNOW_12H, 
                      config.PREDICTION_TARGET_IS_SNOW_18H]):
    new_model = tweak_model(config.PREDICTION_TARGET_IS_PRECIP_12H)
    create_model(config.PREDICTION_TARGET_IS_PRECIP_12H, target_model, new_model, '../../processed-data')

### Results

True to the promise of transfer learning, the `Snow` models train noticeably faster than the original `Precipitation+12h` model, since only the new Dense and output layers are updated. Here is how each of the models performed:

| Lookahead (h) | Precision | Recall |
| --- | --- | --- |
| 6 | 0.78 | 0.6 |
| 12 | 0.62 | 0.43 |
| 18 | 0.47 | 0.35 |

Not bad, given that we got 3 models out of 1 and did not have to fully train any of them!
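
One way to compare these precision/recall pairs at a glance is to fold each into a single F1 score (the harmonic mean of the two). The sketch below is an illustration only: F1 was not part of the original evaluation, and the numbers are simply the values reported in this section for the original model and for the three `Snow` models (6h, 12h and 18h lookahead).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the tables in this section
reported = {
    "Precipitation, 12h (original model)": (0.76, 0.47),
    "Snow, 6h": (0.78, 0.60),
    "Snow, 12h": (0.62, 0.43),
    "Snow, 18h": (0.47, 0.35),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.2f}")
```

Keep in mind that `Precipitation` and `Snow` are different targets, so the first row is a reference point rather than a like-for-like baseline.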