<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_09_1_keras_transfer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
**Module 9: Transfer Learning**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 9 Material

* **Part 9.1: Introduction to Keras Transfer Learning** [[Video]](https://www.youtube.com/watch?v=AtoeoNwmd7w&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_09_1_keras_transfer.ipynb)
* Part 9.2: Keras Transfer Learning for Computer Vision [[Video]](https://www.youtube.com/watch?v=nXcz0V5SfYw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_09_2_keras_xfer_cv.ipynb)
* Part 9.3: Transfer Learning for NLP with Keras [[Video]](https://www.youtube.com/watch?v=PyRsjwLHgAU&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_09_3_transfer_nlp.ipynb)
* Part 9.4: Transfer Learning for Facial Feature Recognition [[Video]](https://www.youtube.com/watch?v=uUZg33DfCls&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_09_4_facial_points.ipynb)
* Part 9.5: Transfer Learning for Style Transfer [[Video]](https://www.youtube.com/watch?v=pLWIaQwkJwU&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_09_5_style_transfer.ipynb)

# Google CoLab Instructions

The following code ensures that Google CoLab is running the correct version of TensorFlow.

In [None]:
# Start CoLab
try:
    %tensorflow_version 2.x
    COLAB = True
    print("Note: using Google CoLab")
except:
    print("Note: not using Google CoLab")
    COLAB = False

Note: not using Google CoLab


# Part 9.1: Introduction to Keras Transfer Learning

Human beings learn new skills throughout their entire lives. However, this learning is rarely from scratch. Whatever task a person learns, they most likely draw on experiences gained earlier in life to learn the new skill. In this way, humans learn much differently than most deep learning projects do.

At some point, a human being learns to tell the difference between a cat and a dog. To teach a neural network the same distinction, you would obtain many cat pictures and dog pictures. The neural network would iterate over all of these pictures and train on the differences. A child learning to distinguish between the two animals would probably need to see only a few examples, with parents naming each type of animal. The child would use previous knowledge from looking at different living and non-living objects to help make this classification, and would already know the physical appearance of sub-objects, such as fur, eyes, ears, noses, tails, and teeth.

Transfer learning attempts to teach a neural network by similar means. Rather than training your neural network from scratch, you begin training with a preloaded set of weights. Usually, you remove the topmost layers of the pretrained neural network and replace them with new uppermost layers. The layers from the previous neural network are locked (frozen) so that training does not change their weights; only the newly added layers are trained.
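As a minimal sketch of that recipe in Keras, the block below uses a small, randomly initialized `Sequential` model as a stand-in for a pretrained network (an assumption; in practice its weights would come from earlier training). It freezes the stand-in with `trainable = False`, stacks a new output layer on top, and counts which parameters remain trainable. The names `base`, `xfer_model`, and `count` are hypothetical.

```python
import numpy as np
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense

# Hypothetical stand-in for a pretrained network (4 inputs -> 50 -> 25).
# Its weights are random here; in practice they would come from earlier training.
base = Sequential([
    Input(shape=(4,)),
    Dense(50, activation="relu"),
    Dense(25, activation="relu"),
])

# Lock the transferred layers so training does not change their weights.
# Freeze before compile() so the setting takes effect during training.
base.trainable = False

# Add a new uppermost layer: one linear unit for a regression target.
xfer_model = Sequential([
    Input(shape=(4,)),
    base,
    Dense(1),
])
xfer_model.compile(loss="mean_squared_error", optimizer="adam")

def count(weights):
    """Total number of scalar parameters in a list of weight variables."""
    return int(sum(np.prod(tuple(w.shape)) for w in weights))

n_trainable = count(xfer_model.trainable_weights)      # new layer only
n_frozen = count(xfer_model.non_trainable_weights)     # transferred layers
print(n_trainable, n_frozen)  # 26 1525
```

Only the new layer's 25 weights and 1 bias are trainable; the 1,525 parameters of the two transferred layers stay fixed.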

Training a neural network on a large image dataset can take a great deal of computing power. Google, Facebook, Microsoft, and other tech companies have used GPU arrays to train high-quality neural networks for various applications. Transferring these weights into your own neural network can save considerable effort and compute time. Because a pretrained model is unlikely to fit the exact application you seek to implement, finding the closest pretrained model and adapting it through transfer learning is an essential skill for a deep learning engineer.

## Transfer Learning Example

Let's look at a simple example of using transfer learning to build upon a previously trained neural network. We will begin by training a neural network on Fisher's Iris Dataset. This network takes four measurements and classifies each observation into one of three iris species. However, what if we later received a dataset that included the same four measurements, plus a cost as the target? This dataset does not contain the species, but it uses the same four inputs as the base model we just trained.

We can take our previously trained iris network and, through transfer learning, transfer its weights to a new neural network that learns to predict the cost. Also of note, the original neural network was a classification network, yet we now use it to build a regression neural network. Such a transformation is common in transfer learning. For reference, I randomly generated this iris cost dataset.

The first step is to train our neural network for the regular Iris Dataset. The code presented here is the same as we saw in Module 3.

In [None]:
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris.csv", 
    na_values=['NA', '?'])

# Convert to numpy - Classification
x = df[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
# dtype=float keeps the one-hot targets numeric (pandas 2.x defaults to bool)
dummies = pd.get_dummies(df['species'], dtype=float) # Classification
species = dummies.columns
y = dummies.values


# Build neural network
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y.shape[1], activation='softmax')) # Output

model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x, y, verbose=2, epochs=100)

Epoch 1/100
5/5 - 1s - loss: 1.7638 - 634ms/epoch - 127ms/step
Epoch 2/100
5/5 - 0s - loss: 1.2951 - 9ms/epoch - 2ms/step
Epoch 3/100
5/5 - 0s - loss: 1.0713 - 8ms/epoch - 2ms/step
Epoch 4/100
5/5 - 0s - loss: 1.0110 - 12ms/epoch - 2ms/step
Epoch 5/100
5/5 - 0s - loss: 0.9364 - 9ms/epoch - 2ms/step
Epoch 6/100
5/5 - 0s - loss: 0.8444 - 8ms/epoch - 2ms/step
Epoch 7/100
5/5 - 0s - loss: 0.7800 - 12ms/epoch - 2ms/step
Epoch 8/100
5/5 - 0s - loss: 0.7321 - 13ms/epoch - 3ms/step
Epoch 9/100
5/5 - 0s - loss: 0.6806 - 13ms/epoch - 3ms/step
Epoch 10/100
5/5 - 0s - loss: 0.6377 - 12ms/epoch - 2ms/step
Epoch 11/100
5/5 - 0s - loss: 0.6021 - 13ms/epoch - 3ms/step
Epoch 12/100
5/5 - 0s - loss: 0.5693 - 10ms/epoch - 2ms/step
Epoch 13/100
5/5 - 0s - loss: 0.5470 - 11ms/epoch - 2ms/step
Epoch 14/100
5/5 - 0s - loss: 0.5219 - 11ms/epoch - 2ms/step
Epoch 15/100
5/5 - 0s - loss: 0.4992 - 24ms/epoch - 5ms/step
Epoch 16/100
5/5 - 0s - loss: 0.4757 - 15ms/epoch - 3ms/step
Epoch 17/100
5/5 - 0s - loss: 0.45

<keras.callbacks.History at 0x7fea3fb1ef50>

To keep this example simple, we are not setting aside a validation set. The goal is to show how to create a multi-layer neural network whose weights we then transfer to another network. We begin by evaluating the network's accuracy on the training set (in-sample accuracy).


In [None]:
from sklearn.metrics import accuracy_score
pred = model.predict(x)
predict_classes = np.argmax(pred,axis=1)
expected_classes = np.argmax(y,axis=1)
correct = accuracy_score(expected_classes,predict_classes)
print(f"Training Accuracy: {correct}")

Training Accuracy: 0.9866666666666667


The model summary shows the three layers defined previously, as expected.

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                250       
                                                                 
 dense_1 (Dense)             (None, 25)                1275      
                                                                 
 dense_2 (Dense)             (None, 3)                 78        
                                                                 
Total params: 1,603
Trainable params: 1,603
Non-trainable params: 0
_________________________________________________________________
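The `Param #` column follows directly from the layer sizes: a `Dense` layer with n inputs and m units holds an n × m weight matrix plus m biases. The short check below reproduces the summary's counts from the 4 → 50 → 25 → 3 sizes used above; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Parameters in a fully connected layer: one weight per
    input-output pair plus one bias per output unit."""
    return n_in * n_out + n_out

# Layer sizes from the iris classifier: 4 inputs -> 50 -> 25 -> 3 outputs
sizes = [4, 50, 25, 3]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [250, 1275, 78] 1603
```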


## Create a New Iris Network

Now that we've trained a neural network on the iris dataset, we can transfer the knowledge of this neural network to other neural networks. It is possible to create a new neural network from some or all of the layers of this neural network. We will create a new neural network that is essentially a clone of the first neural network to demonstrate the technique. We now transfer all of the layers from the original neural network into the new one.

In [None]:
model2 = Sequential()
for layer in model.layers:
    model2.add(layer)
model2.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                250       
                                                                 
 dense_1 (Dense)             (None, 25)                1275      
                                                                 
 dense_2 (Dense)             (None, 3)                 78        
                                                                 
Total params: 1,603
Trainable params: 1,603
Non-trainable params: 0
_________________________________________________________________


As a sanity check, we calculate the accuracy of the newly created model. Because the new model contains the same layers and weights, its in-sample accuracy should match that of the original model.

In [None]:
from sklearn.metrics import accuracy_score
pred = model2.predict(x)
predict_classes = np.argmax(pred,axis=1)
expected_classes = np.argmax(y,axis=1)
correct = accuracy_score(expected_classes,predict_classes)
print(f"Training Accuracy: {correct}")

Training Accuracy: 0.9866666666666667


The in-sample accuracy of the newly created neural network is the same as the first neural network. We've successfully transferred all of the layers from the original neural network.
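One detail worth knowing: `model2.add(layer)` does not copy a layer. The new model holds the very same layer objects as `model`, so the two networks share weights, and further training of either one changes both. When an independent copy is needed, `clone_model` followed by `set_weights` is one option. The sketch below demonstrates both behaviors on a small, randomly initialized stand-in for the iris network (an assumption; it does not reuse the trained `model`). The names `source`, `shared`, and `independent` are hypothetical.

```python
import numpy as np
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import clone_model

# Hypothetical stand-in for the trained iris classifier (same layer sizes).
source = Sequential([
    Input(shape=(4,)),
    Dense(50, activation="relu"),
    Dense(25, activation="relu"),
    Dense(3, activation="softmax"),
])

# Adding existing layers to a new Sequential reuses the same layer objects,
# so both models share (not copy) the weights.
shared = Sequential([Input(shape=(4,))])
for layer in source.layers:
    shared.add(layer)

# clone_model builds new layers with the same architecture but fresh weights;
# set_weights then copies the current values into them.
independent = clone_model(source)
independent.set_weights(source.get_weights())

sample = np.random.default_rng(0).normal(size=(5, 4)).astype("float32")
same_output = np.allclose(source.predict(sample, verbose=0),
                          independent.predict(sample, verbose=0))
print(shared.layers[0] is source.layers[0])       # True
print(independent.layers[0] is source.layers[0])  # False
print(same_output)                                # True
```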

## Transferring to a Regression Network

The Iris Cost Dataset contains measurements of iris samples that use the same predictors as the original iris dataset: sepal length, sepal width, petal length, and petal width. We present the cost dataset here.

In [None]:
df_cost = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris_cost.csv", 
    na_values=['NA', '?'])

df_cost

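|     | sepal_l | sepal_w | petal_l | petal_w | cost   |
|-----|---------|---------|---------|---------|--------|
| 0   | 7.8     | 3.0     | 6.2     | 2.0     | 10.740 |
| 1   | 5.0     | 2.2     | 1.7     | 1.5     | 2.710  |
| 2   | 6.9     | 2.6     | 3.7     | 1.4     | 4.624  |
| 3   | 5.9     | 2.2     | 3.7     | 2.4     | 6.558  |
| 4   | 5.1     | 3.9     | 6.8     | 0.7     | 7.395  |
| ... | ...     | ...     | ...     | ...     | ...    |
| 245 | 4.7     | 2.1     | 4.0     | 2.3     | 5.721  |
| 246 | 7.2     | 3.0     | 4.3     | 1.1     | 5.266  |
| 247 | 6.6     | 3.4     | 4.6     | 1.4     | 5.776  |
| 248 | 5.7     | 3.7     | 3.1     | 0.4     | 2.233  |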


For transfer learning to be effective, the inputs to the new neural network should closely match the inputs of the original neural network whose layers we transfer.
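Each weight in the first transferred layer is tied to one input position, so the new data must supply the same features in the same order. The minimal sketch below uses a tiny hypothetical DataFrame built from the first two rows of the cost data shown above, with its columns deliberately shuffled, and selects features by an explicit shared list rather than relying on each file's column order. `FEATURES`, `df_new`, and `x_new` are illustrative names.

```python
import pandas as pd

# Feature order the original network was trained on
FEATURES = ['sepal_l', 'sepal_w', 'petal_l', 'petal_w']

# Hypothetical new dataset whose columns arrive in a different order
df_new = pd.DataFrame({
    'cost': [10.740, 2.710],
    'petal_w': [2.0, 1.5],
    'sepal_l': [7.8, 5.0],
    'petal_l': [6.2, 1.7],
    'sepal_w': [3.0, 2.2],
})

# Fail early if any expected feature is missing
missing = set(FEATURES) - set(df_new.columns)
assert not missing, f"missing features: {missing}"

# Selecting by the shared list reorders columns to match the original inputs
x_new = df_new[FEATURES].values
print(x_new[0].tolist())  # [7.8, 3.0, 6.2, 2.0]
```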

We will strip away the last output layer, which contains the softmax activation function that performs the final classification, and create a new output layer that outputs the cost prediction. We will mark the first two layers as non-trainable and train only the weights in this new layer. The hope is that the first few layers have learned to abstract the raw input data in a way that is also helpful to the new neural network.
We do this by looping over the first two layers of the original network, marking each as non-trainable, and adding it to the new neural network. Because `model3` reuses the same layer objects as `model`, setting `trainable = False` also freezes these layers in the original network. We output a summary of the new neural network to verify that it no longer contains the previous output layer.

In [None]:
model3 = Sequential()
for i in range(2):
    layer = model.layers[i]
    layer.trainable = False
    model3.add(layer)
model3.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                250       
                                                                 
 dense_1 (Dense)             (None, 25)                1275      
                                                                 
Total params: 1,525
Trainable params: 0
Non-trainable params: 1,525
_________________________________________________________________


We add a final regression output layer to complete the new neural network.

In [None]:
model3.add(Dense(1)) # Output

model3.compile(loss='mean_squared_error', optimizer='adam')
model3.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                250       
                                                                 
 dense_1 (Dense)             (None, 25)                1275      
                                                                 
 dense_3 (Dense)             (None, 1)                 26        
                                                                 
Total params: 1,551
Trainable params: 26
Non-trainable params: 1,525
_________________________________________________________________
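Because the two transferred layers are frozen, they act as a fixed feature extractor, and the only trainable part, a linear `Dense(1)` layer fit with mean squared error, amounts to an ordinary linear least-squares problem on the 25 hidden features. The NumPy sketch below illustrates that view with random frozen weights and synthetic data (both assumptions; this is not the iris cost data and not the trained `model`). Adam in Keras works toward a minimizer of the same squared-error objective iteratively, so after 100 epochs it need not reach the closed-form optimum computed here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins: 249 samples with 4 measurements and a cost target
x_demo = rng.uniform(0.5, 8.0, size=(249, 4))
y_demo = x_demo @ np.array([0.5, -0.2, 1.0, 1.5]) + rng.normal(0, 0.3, 249)

def relu(z):
    return np.maximum(z, 0.0)

# Frozen "transferred" layers: random weights here, trained weights in practice
W1, b1 = rng.normal(size=(4, 50)), rng.normal(size=50)
W2, b2 = rng.normal(size=(50, 25)), rng.normal(size=25)
features = relu(relu(x_demo @ W1 + b1) @ W2 + b2)   # shape (249, 25)

# New Dense(1) output layer: 25 weights + 1 bias = 26 trainable parameters.
# Appending a column of ones lets least squares fit the bias as well.
design = np.hstack([features, np.ones((len(features), 1))])
params, *_ = np.linalg.lstsq(design, y_demo, rcond=None)

rmse_head = np.sqrt(np.mean((design @ params - y_demo) ** 2))
rmse_mean = np.sqrt(np.mean((y_demo.mean() - y_demo) ** 2))  # baseline
print(params.shape, rmse_head <= rmse_mean)  # (26,) True
```

The comparison against always predicting the mean holds by construction: that constant predictor is one of the parameter settings least squares can choose, so the fitted head can do no worse.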


Now we train just the output layer to predict the cost. The cost in the made-up dataset is dependent on the species, so the previous learning should be helpful.

In [None]:
# Convert to numpy - Regression
x = df_cost[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
y = df_cost.cost.values

# Train the last layer of the network
model3.fit(x,y,verbose=2,epochs=100)

Epoch 1/100
8/8 - 0s - loss: 14.0400 - 379ms/epoch - 47ms/step
Epoch 2/100
8/8 - 0s - loss: 12.6133 - 10ms/epoch - 1ms/step
Epoch 3/100
8/8 - 0s - loss: 11.3224 - 12ms/epoch - 1ms/step
Epoch 4/100
8/8 - 0s - loss: 10.1006 - 19ms/epoch - 2ms/step
Epoch 5/100
8/8 - 0s - loss: 9.0898 - 19ms/epoch - 2ms/step
Epoch 6/100
8/8 - 0s - loss: 8.1514 - 13ms/epoch - 2ms/step
Epoch 7/100
8/8 - 0s - loss: 7.3497 - 11ms/epoch - 1ms/step
Epoch 8/100
8/8 - 0s - loss: 6.6789 - 14ms/epoch - 2ms/step
Epoch 9/100
8/8 - 0s - loss: 6.0785 - 11ms/epoch - 1ms/step
Epoch 10/100
8/8 - 0s - loss: 5.5620 - 11ms/epoch - 1ms/step
Epoch 11/100
8/8 - 0s - loss: 5.1035 - 11ms/epoch - 1ms/step
Epoch 12/100
8/8 - 0s - loss: 4.7415 - 12ms/epoch - 2ms/step
Epoch 13/100
8/8 - 0s - loss: 4.4169 - 13ms/epoch - 2ms/step
Epoch 14/100
8/8 - 0s - loss: 4.1181 - 19ms/epoch - 2ms/step
Epoch 15/100
8/8 - 0s - loss: 3.8847 - 20ms/epoch - 3ms/step
Epoch 16/100
8/8 - 0s - loss: 3.6586 - 13ms/epoch - 2ms/step
Epoch 17/100
8/8 - 0s - los

<keras.callbacks.History at 0x7fea3f9bc890>

We can evaluate the in-sample RMSE for the new model containing transferred layers from the previous model.

In [None]:
pred = model3.predict(x)
# RMSE: square root of the mean squared error, in the same units as cost
score = np.sqrt(metrics.mean_squared_error(y, pred))
print(f"Final score (RMSE): {score}")

Final score (RMSE): 1.3716589625823072
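RMSE is the square root of the mean squared difference between predictions and targets, so it is expressed in the same units as `cost`. The minimal NumPy sketch below uses made-up numbers (hypothetical, not model output) and flattens both arrays first. A Keras `predict` result has shape `(n, 1)`, and subtracting a 1-D target of shape `(n,)` from it would broadcast to an `(n, n)` matrix instead of pairing each prediction with its target. The `rmse` helper is illustrative, not part of scikit-learn or Keras.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error; flattens Keras-style (n, 1) predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

y_true = np.array([10.0, 3.0, 5.0])
y_pred = np.array([[9.0], [4.0], [5.0]])   # shape (3, 1), as predict() returns
print(rmse(y_true, y_pred))  # about 0.8165, i.e. sqrt(2/3)
```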


# Module 9 Assignment

You can find the first assignment here: [assignment 9](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/assignments/assignment_yourname_class9.ipynb)