# Topic 42: Transfer Learning and Tuning Neural Networks

1. Using GridSearch/Talos for finding optimal parameter combinations 
2. Saving your neural network to disk
3. Transfer Learning

## 1. Tuning hyperparameters with Keras Models

There are a few ways to test combinations of hyperparameters, grid-search style:
* **Using scikit-learn's GridSearch**: https://chrisalbon.com/deep_learning/keras/tuning_neural_network_hyperparameters/
    * This involves wrapping your model in an object so that scikit-learn's existing GridSearch functions work with your neural net.
* **Using KerasTuner**: https://www.tensorflow.org/tutorials/keras/keras_tuner
* **Using Talos**: https://autonomio.github.io/talos/#/Scan
    * This library lets you tune without having to create the model object, and it can automatically write the scores for each parameter combination to a csv file.

In [None]:
# !pip install talos
import talos

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

import tensorflow.keras as keras

from keras.models import Sequential
from keras import layers
from keras.callbacks import EarlyStopping



## Data:
Let's return to our Seattle Housing data from the last study group.

We are going to split the testing set into a validation and a holdout set.

In [None]:
train = pd.read_csv('https://raw.githubusercontent.com/learn-co-students/011121-pt-ds/main/phase_4/41-building-deep-neural-networks/train.csv')
test = pd.read_csv('https://raw.githubusercontent.com/learn-co-students/011121-pt-ds/main/phase_4/41-building-deep-neural-networks/test.csv')
split = int(len(test)*.5)
val = test.iloc[:split, :]
holdout = test.iloc[split:, :]

X_train, y_train = (train.drop('price', axis=1), train['price'])
X_val, y_val = (val.drop('price', axis=1), val['price'])
X_holdout, y_holdout = (holdout.drop('price', axis=1), holdout['price'])

display(train)
train.info()

# Define a Function to Create a Model

To work with Talos, we create a specially formed function that returns our model and the model training history.

The function must take `x_train, y_train, x_test, y_test, params`, in that order.  `params` is a dictionary of hyperparameters.  Each key is a name for a hyperparameter; the name is arbitrary, but should be descriptive.  In the dictionary you define for the search, each value is the list of candidate values that hyperparameter could take.

On each experiment, Talos passes the function a `params` dictionary that contains just one of the candidate values for each hyperparameter.

Inside the function, we set each hyperparameter of the model to the value found in `params`.

Another note: we can adjust the depth of the model by adding layers in a loop, a series of loops, or even nested loops.
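
As a rough illustration of the calling convention described above, the sketch below mimics how a grid-style search could expand a dictionary of candidate values into one single-valued `params` dictionary per experiment. It is not Talos's implementation; `expand_grid` and `toy_params` are hypothetical names used only here.

```python
from itertools import product

def expand_grid(param_grid):
    """Yield one dict per experiment, holding a single value per hyperparameter."""
    names = list(param_grid)
    for combo in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, combo))

# Hypothetical candidate values, for illustration only
toy_params = {'dropout': [0.2, 0.5], 'dense_layers': [1, 2]}

experiments = list(expand_grid(toy_params))
for single in experiments:
    # Each `single` has the shape of the `params` argument a model function receives
    print(single)
```

Inside a model function, `params['dropout']` is therefore a single number, not a list.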

In [None]:
def dense_network(x_train, y_train, x_test, y_test, params):

    #we build the model like we would normally do it
    model = Sequential()
    
    #input layer
    model.add(layers.InputLayer(input_shape=(x_train.shape[1],)))
    
    # hidden layers
    for layer in range(params['dense_layers']):
        model.add(layers.Dense(params['nodes'], activation=params['activation']))
        model.add(layers.Dropout(params['dropout']))
    
    # output layer
    model.add(layers.Dense(1, activation='relu'))

    model.compile(loss='mse', optimizer=params['optimizer'], metrics=['mae', 'mse'])
    
    #callback to prevent over-training
    earlystopping = EarlyStopping(monitor='val_loss', patience=2)
    
    out = model.fit(x_train, y_train, 
                   validation_data=(x_test, y_test),
                   batch_size=50,
                   epochs=10,
                   verbose=0,
                   callbacks = [earlystopping])

    return out, model
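
The `EarlyStopping` callback in the function above stops training once `val_loss` has failed to improve for `patience` consecutive epochs. The sketch below is a simplified, pure-Python imitation of that rule, not the Keras implementation; `early_stopping_epoch` and the toy loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch where training would stop, or None if it never stops."""
    best = float('inf')
    wait = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Toy validation losses: improvement stalls after the second epoch
toy_val_losses = [5.0, 4.0, 4.5, 4.2, 3.9]
stop_at = early_stopping_epoch(toy_val_losses, patience=2)
print(stop_at)
```

With a small patience, training can stop just before a later improvement (the final value in the toy list is never reached), which is one reason to treat `patience` as something to tune.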

## Parameters:

Define a dictionary of possible parameter values.  Remember that each additional parameter multiplies the number of models Talos will compare.  It's okay to run a series of smaller experiments, each informed by the results of the previous ones, rather than one big grid search.

Alternatively, you can use [Probabilistic Reduction](https://autonomio.github.io/talos/#/Probabilistic_Reduction) to instruct Scan to drop hyperparameter values that earlier experiments suggest are unlikely to improve results.

In [None]:
params = {'dropout': [0.2, 0.5], 
          'optimizer': ['adam', 'sgd'], 
          'activation': ['relu', 'tanh'], 
          'dense_layers': [5,10],
          'nodes': [100,200]}
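
To make the cost of the grid concrete, the sketch below counts the experiments a full grid over this dictionary implies, and compares it with a hypothetical two-stage plan that searches two groups of hyperparameters separately. The staging is only an assumption for illustration; `grid_size`, `stage_one`, and `stage_two` are not part of Talos.

```python
from math import prod

params = {'dropout': [0.2, 0.5],
          'optimizer': ['adam', 'sgd'],
          'activation': ['relu', 'tanh'],
          'dense_layers': [5, 10],
          'nodes': [100, 200]}

def grid_size(param_grid):
    """Number of combinations in a full grid over the candidate values."""
    return prod(len(values) for values in param_grid.values())

full_grid = grid_size(params)

# Hypothetical staged plan: search optimizer and activation first, then,
# with those fixed to their best values, search the remaining three.
stage_one = {k: params[k] for k in ('optimizer', 'activation')}
stage_two = {k: params[k] for k in ('dropout', 'dense_layers', 'nodes')}
staged = grid_size(stage_one) + grid_size(stage_two)

print(full_grid, staged)
```

The staged plan trains fewer models, but it can miss interactions between hyperparameters that sit in different stages.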

`talos.Scan()` creates a new scan object and runs all of the experiments defined in `params`, using the model function passed to the `model=` argument and the data provided.

Your scan object (named `results` below) will contain the record of your experiments, including all of the fitted models.  It also saves records to disk under the folder defined in `experiment_name=`.



In [None]:
results = talos.Scan(X_train, y_train, 
                     x_val=X_val,
                     y_val=y_val,
                     params=params, 
                     model=dense_network,
                     experiment_name='grid',
                     minimize_loss=True)


# Experimental Record

The record of all experiments is stored in a dataframe in the `.data` attribute.  

The scan object will keep a record of scores according to the metrics defined when the model was compiled.  You can use these metrics to order the data frame.

In [None]:
results.data.sort_values(by='val_mae', ascending=True)

We can take the top `n_models` tested by the scan object and do cross validation on a given dataset for further validation using the `.evaluate_models` method.  This adds columns to the `.data` dataframe attribute with the mean scores of cross validation.  Notice we have to define the metric we want to use.

In [None]:
results.evaluate_models(X_val.values,
                        y_val.values,
                        task='continuous',
                        n_models = 10,
                        metric='val_mae',
                        folds=5,
                        shuffle=True,
                        asc = True)
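
The idea behind fold-wise evaluation can be sketched in plain NumPy. This sketch assumes that an already-fitted model is scored separately on each fold of the evaluation data and that the scores are then averaged; it is not Talos's code, and `foldwise_mae` and `toy_predict` are hypothetical stand-ins.

```python
import numpy as np

def foldwise_mae(predict_fn, X, y, folds=5, shuffle=True, seed=0):
    """Score an already-fitted predictor on each fold and return the per-fold MAE."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    scores = []
    for fold_idx in np.array_split(idx, folds):
        preds = predict_fn(X[fold_idx])
        scores.append(np.mean(np.abs(y[fold_idx] - preds)))
    return np.array(scores)

# Toy data and a stand-in "model" whose predictions are always 1 too high
X_toy = np.arange(20, dtype=float).reshape(-1, 1)
y_toy = 3.0 * X_toy[:, 0]
toy_predict = lambda X: 3.0 * X[:, 0] + 1.0

fold_scores = foldwise_mae(toy_predict, X_toy, y_toy)
print(fold_scores.mean(), fold_scores.std())
```

Looking at the spread across folds, not just the mean, gives a sense of how stable a model's score is.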

In [None]:
results.data.sort_values(by='eval_mae_mean', ascending=True)

In [None]:
# choose the best model by validation MAE, the same metric used to sort the record above
best_model = results.best_model(metric='val_mae', asc=True)

In [None]:
best_model.summary()

## Leveling Up

A full grid search can take a long time, especially with deep learning models, which tend to be slower to train than traditional models.  One way to reduce the search time is to use a reduction parameter when you initialize your Talos Scan object.  This argument uses previous results to remove future experiments that are unlikely to return improved results.  This is called *Probabilistic Reduction*. 

You can learn more at: https://autonomio.github.io/talos/#/Probabilistic_Reduction

## 2. Saving your model

In [None]:
best_model.save('best_model.h5')
best_model.save_weights('model_weights.h5')

In [None]:
from keras.models import load_model

my_model = load_model('best_model.h5')
y_pred = my_model.predict(X_holdout)
score = mean_absolute_error(y_holdout, y_pred)
print(f'final model score on holdout: {score:.2f}')

## Notice:  
`results.best_model()` returns an uncompiled model.  While we can use the model for prediction, we won't be able to train it further until we recompile it with the desired optimizer and loss function.  These can be looked up in the experimental record from the results file.  

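Recompiling the model does not change its layer weights; it only attaches a new optimizer and loss function.  If you ever need to restore the tuned weights, you can reload them from the saved weights file.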

Another Talos walkthrough: https://medium.com/swlh/how-to-perform-keras-hyperparameter-optimization-x3-faster-on-tpu-for-free-602b97812602

## 3. Transfer Learning and Pretrained Models

* A pretrained network (also known in image classification as a convolutional base) consists of layers that have already been trained, typically on a large, general dataset
* For images, these layers have already learned general patterns, textures, colors, etc. such that when you feed in your training data, certain features can immediately be detected. This part is **feature extraction**.
* You typically add your own final layers and train them to classify/regress based on your problem. Optionally unfreezing some of the pretrained layers and continuing to train them on your data is **fine tuning**
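
The idea of keeping pretrained layers fixed while training only the newly added layers can be illustrated with a toy NumPy model. Everything here is a stand-in chosen for illustration: `W_base` plays the role of a frozen pretrained base, `w_head` and `b_head` are the new trainable head, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pretrained base": a fixed projection followed by ReLU (frozen)
W_base = rng.normal(size=(4, 8))
W_base_before = W_base.copy()

def extract_features(X):
    return np.maximum(X @ W_base, 0.0)

# Synthetic regression data
X = rng.normal(size=(200, 4))
y = X[:, 0] - 2.0 * X[:, 1]

# Trainable "head": a single linear layer on top of the frozen features
w_head = np.zeros(8)
b_head = 0.0

def mse(w, b):
    return np.mean((extract_features(X) @ w + b - y) ** 2)

loss_before = mse(w_head, b_head)
features = extract_features(X)  # computed once, because the base never changes
for _ in range(500):
    error = features @ w_head + b_head - y
    # Gradients update only the head; W_base is never touched
    w_head -= 0.01 * 2 * features.T @ error / len(y)
    b_head -= 0.01 * 2 * error.mean()
loss_after = mse(w_head, b_head)

print(loss_before, loss_after)
```

Because the base is frozen, its features can be computed once and reused, which is part of why training only a head is comparatively cheap.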

Here are the pretrained models that exist within Keras: https://keras.io/api/applications/

To demonstrate the utility of pretrained networks, we'll compare model performance between a baseline model and a model using a pretrained network (VGG16).

# The Data

We will be shifting gears now and moving to some transfer learning using image classification.  We will be using the CIFAR-10 dataset, a dataset of 60,000 images divided into 10 categories for our model to predict.

In [None]:
(X_train_img, y_train_img), (X_test_img, y_test_img) = keras.datasets.cifar10.load_data()

y_train_img = keras.utils.to_categorical(y_train_img)
y_test_img = keras.utils.to_categorical(y_test_img)

image_size = (X_train_img.shape[1], 
              X_train_img.shape[2], 
              X_train_img.shape[3])
image_size
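
For intuition about what `to_categorical` does to the labels, here is a small NumPy sketch of one-hot encoding and of decoding back to class indices with `argmax`. The helper `one_hot` and the toy labels are hypothetical; the real CIFAR-10 labels load as integers with shape `(n, 1)`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 in the label's column."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Toy labels shaped like the CIFAR-10 label array, (n, 1)
toy_labels = np.array([[6], [9], [0]])
encoded = one_hot(toy_labels, num_classes=10)

# argmax recovers the original class index from each one-hot row
decoded = encoded.argmax(axis=1)
print(encoded.shape, decoded)
```

The same `argmax` step is a convenient way to turn a one-hot row, or a row of predicted probabilities, back into a readable class index.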

In [None]:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1,2, figsize=(10,10))
for i in range(2):
    axes[i].imshow(X_train_img[i])
    axes[i].set_title(y_train_img[i])
    
fig.suptitle('First Two Images in Cifar10')
plt.show()

# Adding Pretrained Layers


## VGG16

[VGG16](https://neurohive.io/en/popular-networks/vgg16/) is a very deep image classification model architecture.  

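By default it expects an input shape of (224, 224, 3).  The first two dimensions are the height and width of the image, and the last dimension is the 3 RGB color channels.  Because we load the model without its top (classification) layers, we can pass a different input shape, here the shape of our CIFAR-10 images.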

It uses convolutional and max pooling layers, but don't worry about what these are yet.  We will learn more about these soon.
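
One consequence of the pooling layers is easy to check with arithmetic. The sketch below assumes what the VGG16 architecture specifies: five 2x2 max-pooling stages with stride 2, and convolutions that preserve height and width. The function name is hypothetical.

```python
def spatial_size_after_pooling(size, n_pools=5):
    """Each 2x2 max-pooling layer with stride 2 halves height and width (floor division)."""
    for _ in range(n_pools):
        size //= 2
    return size

default_out = spatial_size_after_pooling(224)  # default VGG16 input size
cifar_out = spatial_size_after_pooling(32)     # CIFAR-10 image size

print(default_out, cifar_out)  # 7 1
```

So for 32x32 images the convolutional base ends in a 1x1 feature map per channel, which leaves very little spatial information for the layers added on top.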

We will be setting the weights of this model to a collection of pre-trained weights called 'imagenet'.  These weights are the result of the VGG16 model's training on the [imagenet dataset](https://en.wikipedia.org/wiki/ImageNet).  This is a database of 14 million images in 20,000 different categories.  

VGG16: https://keras.io/api/applications/vgg/#vgg16-function

In [None]:
#Load the VGG16 model
from keras.applications.vgg16 import VGG16
pretrained = VGG16(weights='imagenet', 
                 include_top=False, 
                 input_shape=image_size)

pretrained.summary()

In [None]:
cnn_transfer = Sequential()
cnn_transfer.add(pretrained)

# freezing layers so they don't get retrained with your new data
for layer in cnn_transfer.layers:
    layer.trainable=False 

In [None]:
# adding our own dense layers
cnn_transfer.add(layers.Flatten())
cnn_transfer.add(layers.Dense(132, activation='relu'))
cnn_transfer.add(layers.Dense(10, activation='softmax'))

In [None]:
# to verify that the weights are "frozen" 
for layer in cnn_transfer.layers:
    print(layer.name, layer.trainable)

In [None]:
cnn_transfer.compile(loss='categorical_crossentropy', 
                     optimizer='adam', 
                     metrics=['accuracy'])
cnn_transfer.summary()

## Create a custom image classification model.

In [None]:
cnn_custom = Sequential()
cnn_custom.add(layers.Input(shape=image_size))
cnn_custom.add(layers.Conv2D(64, 2, padding='same', activation='relu'))
cnn_custom.add(layers.Conv2D(64, 2, padding='same', activation='relu'))
cnn_custom.add(layers.MaxPool2D())
cnn_custom.add(layers.Conv2D(64, 2, padding='same', activation='relu'))
cnn_custom.add(layers.Conv2D(64, 2, padding='same', activation='relu'))
cnn_custom.add(layers.MaxPool2D())
cnn_custom.add(layers.Conv2D(64, 2, padding='same', activation='relu'))
cnn_custom.add(layers.Conv2D(64, 2, padding='same', activation='relu'))
cnn_custom.add(layers.MaxPool2D())
cnn_custom.add(layers.Conv2D(64, 2, padding='same', activation='relu'))
cnn_custom.add(layers.Conv2D(64, 2, padding='same', activation='relu'))
cnn_custom.add(layers.Flatten())
cnn_custom.add(layers.Dense(132, activation='relu'))
cnn_custom.add(layers.Dense(10, activation='softmax'))

cnn_custom.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

cnn_custom.summary()

## Let's test each:

Notice that the custom model is much shallower than VGG16, but has more trainable weights.  Take note of accuracy and training time between the two models.

Transfer learning is not always the most successful tactic, but it tends to reduce training time and is sometimes more successful.  It is one more tool for the toolkit!
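
The claim about trainable weights can be checked by hand with the standard parameter-count formulas for convolutional and dense layers. The sketch assumes the layer definitions in the cells above, 32x32x3 inputs, and a VGG16 base output of 1x1x512 for images of that size; the helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus biases for a Dense layer."""
    return inputs * units + units

# Custom CNN: one conv on 3 input channels, then seven convs on 64 channels
custom_trainable = conv2d_params(2, 3, 64) + 7 * conv2d_params(2, 64, 64)
flat_custom = 4 * 4 * 64  # 32x32 halved by three pooling layers, 64 channels
custom_trainable += dense_params(flat_custom, 132) + dense_params(132, 10)

# Transfer model: the VGG16 base is frozen, so only the two dense layers train
flat_transfer = 1 * 1 * 512
transfer_trainable = dense_params(flat_transfer, 132) + dense_params(132, 10)

print(custom_trainable, transfer_trainable)
```

These totals should match the "Trainable params" lines printed by `cnn_custom.summary()` and `cnn_transfer.summary()`.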

In [None]:
%%time
#fit and evaluate the custom CNN.
cnn_custom.fit(X_train_img, y_train_img,
               epochs=1,
               batch_size=1024,
               validation_split=.2)
custom_loss, custom_accuracy = cnn_custom.evaluate(X_test_img, y_test_img)

print(f'Loss of custom model on test set: {custom_loss:.2f}')
print(f'Accuracy of custom model on test set: {custom_accuracy: .2f}')

In [None]:
%%time
#fit and evaluate the CNN using transfer learning.
cnn_transfer.fit(X_train_img, y_train_img,
                 epochs=1,
                 batch_size=1024,
                 validation_split=.2)
transfer_loss, transfer_accuracy = cnn_transfer.evaluate(X_test_img, y_test_img)
print(f'Loss of transfer model on test set: {transfer_loss:.2f}')
print(f'Accuracy of transfer model on test set: {transfer_accuracy: .2f}')

# Conclusion

In this notebook you:

1. Used Talos to search for optimal hyperparameters for a multi-layered perceptron with dense and dropout layers.
2. Saved the tuned model and its weights to disk, then reloaded the model to score it on the holdout set.
3. Used transfer learning to reduce the training time for an image classification model.