<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_03_4_early_stop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
**Module 3: Introduction to TensorFlow**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 3 Material

* Part 3.1: Deep Learning and Neural Network Introduction [[Video]](https://www.youtube.com/watch?v=zYnI4iWRmpc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_1_neural_net.ipynb)
* Part 3.2: Introduction to Tensorflow and Keras [[Video]](https://www.youtube.com/watch?v=PsE73jk55cE&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_2_keras.ipynb)
* Part 3.3: Saving and Loading a Keras Neural Network [[Video]](https://www.youtube.com/watch?v=-9QfbGM1qGw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_3_save_load.ipynb)
* **Part 3.4: Early Stopping in Keras to Prevent Overfitting** [[Video]](https://www.youtube.com/watch?v=m1LNunuI2fk&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_4_early_stop.ipynb)
* Part 3.5: Extracting Weights and Manual Calculation [[Video]](https://www.youtube.com/watch?v=7PWgx16kH8s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_5_weights.ipynb)

# Google CoLab Instructions

The following code ensures that Google CoLab is running the correct version of TensorFlow.

In [1]:
try:
    %tensorflow_version 2.x
    COLAB = True
    print("Note: using Google CoLab")
except:
    print("Note: not using Google CoLab")
    COLAB = False

Note: not using Google CoLab


# Part 3.4: Early Stopping in Keras to Prevent Overfitting

**Overfitting** occurs when a neural network is trained to the point that it begins to memorize rather than generalize, as demonstrated in Figure 3.OVER. 

**Figure 3.OVER: Training vs Validation Error for Overfitting**
![Training vs Validation Error for Overfitting](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_training_val.png "Training vs Validation Error for Overfitting")

It is important to segment the original dataset into several datasets:

* **Training Set**
* **Validation Set**
* **Holdout Set**

There are several different ways that these sets can be constructed.  The following programs demonstrate some of these.

The first method uses a training set and a validation set.  The training data are used to train the neural network until the error on the validation set no longer improves, which attempts to stop at a near-optimal training point.  This method gives accurate "out of sample" predictions only for the validation set, which is usually about 20% of the data.  Predictions for the training data will be overly optimistic, because these are the data the neural network was trained on.  Figure 3.VAL demonstrates how a dataset is divided.

**Figure 3.VAL: Training with a Validation Set**
![Training with a Validation Set](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_1_train_val.png "Training with a Validation Set")
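
To make the idea of segmenting a dataset concrete, the following is a minimal sketch that uses only the Python standard library. The function name `split_rows` and the 20%/10% fractions are assumptions chosen for illustration; the examples in this part use scikit-learn's `train_test_split` with a training and validation set only.

```python
import random

def split_rows(rows, val_frac=0.2, holdout_frac=0.1, seed=42):
    """Shuffle rows, then split them into training, validation, and holdout lists.

    The fractions are illustrative assumptions, not values from the course.
    """
    rows = list(rows)                  # copy so the caller's data keeps its order
    random.Random(seed).shuffle(rows)  # fixed seed makes the split repeatable
    n_val = int(len(rows) * val_frac)
    n_hold = int(len(rows) * holdout_frac)
    validation = rows[:n_val]
    holdout = rows[n_val:n_val + n_hold]
    training = rows[n_val + n_hold:]
    return training, validation, holdout

# 150 row indexes, the same size as the iris dataset
train, val, hold = split_rows(range(150))
print(len(train), len(val), len(hold))  # 105 30 15
```

Every row lands in exactly one of the three lists, so the validation and holdout rows are never seen during training.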

### Early Stopping with Classification

In [1]:
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris.csv", 
    na_values=['NA', '?'])

# Convert to numpy - Classification
x = df[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
dummies = pd.get_dummies(df['species']) # Classification
species = dummies.columns
y = dummies.values

# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42)

# Build neural network
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y.shape[1],activation='softmax')) # Output
model.compile(loss='categorical_crossentropy', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, 
        verbose=1, mode='auto', restore_best_weights=True)
model.fit(x_train,y_train,validation_data=(x_test,y_test),
        callbacks=[monitor],verbose=2,epochs=1000)


Epoch 1/1000
4/4 - 2s - loss: 1.3495 - val_loss: 1.1960
Epoch 2/1000
4/4 - 0s - loss: 1.1812 - val_loss: 1.0673
Epoch 3/1000
4/4 - 0s - loss: 1.0641 - val_loss: 0.9881
Epoch 4/1000
4/4 - 0s - loss: 0.9784 - val_loss: 0.9281
Epoch 5/1000
4/4 - 0s - loss: 0.9318 - val_loss: 0.9035
Epoch 6/1000
4/4 - 0s - loss: 0.9085 - val_loss: 0.8730
Epoch 7/1000
4/4 - 0s - loss: 0.8755 - val_loss: 0.8325
Epoch 8/1000
4/4 - 0s - loss: 0.8376 - val_loss: 0.7981
Epoch 9/1000
4/4 - 0s - loss: 0.8103 - val_loss: 0.7639
Epoch 10/1000
4/4 - 0s - loss: 0.7805 - val_loss: 0.7274
Epoch 11/1000
4/4 - 0s - loss: 0.7510 - val_loss: 0.6948
Epoch 12/1000
4/4 - 0s - loss: 0.7246 - val_loss: 0.6628
Epoch 13/1000
4/4 - 0s - loss: 0.6923 - val_loss: 0.6356
Epoch 14/1000
4/4 - 0s - loss: 0.6669 - val_loss: 0.6104
Epoch 15/1000
4/4 - 0s - loss: 0.6445 - val_loss: 0.5842
Epoch 16/1000
4/4 - 0s - loss: 0.6199 - val_loss: 0.5592
Epoch 17/1000
4/4 - 0s - loss: 0.5984 - val_loss: 0.5358
Epoch 18/1000
4/4 - 0s - loss: 0.5750 - 

<tensorflow.python.keras.callbacks.History at 0x171ceb0c1f0>

Several parameters are passed to the **EarlyStopping** object.

* **min_delta** The minimum change in the monitored error that counts as an improvement.  Keep this value small; setting it even smaller is unlikely to have much impact.
* **patience** The number of epochs that training should wait for the validation error to improve before stopping.
* **verbose** How much progress information do you want?
* **mode** Specifies whether the monitored value should be minimized or maximized.  Consider accuracy, where higher numbers are desired, versus log-loss/RMSE, where lower numbers are desired.  In general, set this to "auto" so that Keras infers the direction from the name of the monitored quantity.
* **restore_best_weights** This should always be set to true.  It restores the weights to the values they had at the epoch with the best monitored value (here, the lowest validation loss).  Unless you are manually tracking the weights yourself (we do not use this technique in this course), you should have Keras perform this step for you.
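
The way these parameters interact can be easier to see in a small simulation. The sketch below is a simplified illustration in plain Python, not the Keras implementation; the function name `simulate_early_stopping` and the sample loss values are assumptions made for this example.

```python
def simulate_early_stopping(history, min_delta=1e-3, patience=5, mode="min"):
    """Return (stop_epoch, best_epoch, best_value) for a list of per-epoch values.

    Simplified sketch of early stopping; not the actual Keras code.
    """
    sign = 1 if mode == "min" else -1  # "max" is handled by minimizing the negative
    best = float("inf")
    best_epoch = None
    wait = 0                           # epochs since the last real improvement
    for epoch, value in enumerate(history, start=1):
        score = sign * value
        if score < best - min_delta:   # must improve by more than min_delta
            best, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:       # patience exhausted: stop training here
                return epoch, best_epoch, sign * best
    return len(history), best_epoch, sign * best

# Hypothetical validation losses; epoch 6 improves by less than min_delta
val_loss = [1.0, 0.8, 0.7, 0.71, 0.705, 0.6995, 0.72, 0.73, 0.69]
print(simulate_early_stopping(val_loss))  # (8, 3, 0.7)
```

In this example training stops at epoch 8, five epochs after the best value at epoch 3. With **restore_best_weights** enabled, the weights from epoch 3 are the ones that would be kept, and the later value of 0.69 is never reached.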

As you can see from the output above, not all of the requested epochs were used.  Training stopped once the validation loss no longer improved.

In [3]:
from sklearn.metrics import accuracy_score

pred = model.predict(x_test)
predict_classes = np.argmax(pred,axis=1)
expected_classes = np.argmax(y_test,axis=1)
correct = accuracy_score(expected_classes,predict_classes)
print(f"Accuracy: {correct}")

Accuracy: 1.0
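
The accuracy calculation relies on converting each row of predicted probabilities, and each one-hot encoded target, into a class index by taking the position of the largest value. The following sketch shows that step by hand in plain Python; the probability values and the helper `argmax` are made up for illustration and do not come from the iris model.

```python
def argmax(row):
    """Return the index of the largest value in a sequence."""
    return max(range(len(row)), key=lambda i: row[i])

# Hypothetical softmax outputs for four samples and three classes
probs = [[0.70, 0.20, 0.10],
         [0.10, 0.30, 0.60],
         [0.20, 0.50, 0.30],
         [0.40, 0.35, 0.25]]

# Hypothetical one-hot encoded targets for the same samples
one_hot = [[1, 0, 0],
           [0, 0, 1],
           [0, 1, 0],
           [0, 1, 0]]

predicted = [argmax(row) for row in probs]    # [0, 2, 1, 0]
expected = [argmax(row) for row in one_hot]   # [0, 2, 1, 1]

# Accuracy is the fraction of samples where the two indexes agree
accuracy = sum(p == e for p, e in zip(predicted, expected)) / len(expected)
print(f"Accuracy: {accuracy}")  # Accuracy: 0.75
```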


### Early Stopping with Regression

The following code demonstrates how we can apply early stopping to a regression problem.  The technique is similar to the early stopping for classification code that we just saw.

In [6]:
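from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
# The next two imports make this cell runnable without the classification cell
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
import pandas as pd
import io
import os
import requests
import numpy as np
from sklearn import metrics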

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/auto-mpg.csv", 
    na_values=['NA', '?'])

cars = df['name']

# Handle missing value
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

# Pandas to Numpy
x = df[['cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'year', 'origin']].values
y = df['mpg'].values # regression

# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42)

# Build the neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, 
        patience=5, verbose=1, mode='auto',
        restore_best_weights=True)
model.fit(x_train,y_train,validation_data=(x_test,y_test),
        callbacks=[monitor], verbose=2,epochs=1000)

Epoch 1/1000
10/10 - 1s - loss: 390036.1875 - val_loss: 294417.0938
Epoch 2/1000
10/10 - 0s - loss: 244529.0000 - val_loss: 180679.7812
Epoch 3/1000
10/10 - 0s - loss: 147188.6250 - val_loss: 103410.0703
Epoch 4/1000
10/10 - 0s - loss: 82251.0703 - val_loss: 55436.4102
Epoch 5/1000
10/10 - 0s - loss: 43512.7969 - val_loss: 29313.8906
Epoch 6/1000
10/10 - 0s - loss: 22102.9961 - val_loss: 13818.5889
Epoch 7/1000
10/10 - 0s - loss: 9850.4609 - val_loss: 5536.8110
Epoch 8/1000
10/10 - 0s - loss: 3709.7881 - val_loss: 1771.8453
Epoch 9/1000
10/10 - 0s - loss: 1103.6124 - val_loss: 460.4190
Epoch 10/1000
10/10 - 0s - loss: 289.3120 - val_loss: 138.1106
Epoch 11/1000
10/10 - 0s - loss: 121.8494 - val_loss: 104.6027
Epoch 12/1000
10/10 - 0s - loss: 110.5760 - val_loss: 109.5717
Epoch 13/1000
10/10 - 0s - loss: 112.2939 - val_loss: 107.1854
Epoch 14/1000
10/10 - 0s - loss: 107.9153 - val_loss: 100.9774
Epoch 15/1000
10/10 - 0s - loss: 102.3722 - val_loss: 96.5153
Epoch 16/1000
10/10 - 0s - los

Epoch 134/1000
10/10 - 0s - loss: 67.1112 - val_loss: 61.1433
Epoch 135/1000
10/10 - 0s - loss: 66.8004 - val_loss: 60.7718
Epoch 136/1000
10/10 - 0s - loss: 66.4782 - val_loss: 60.4750
Epoch 137/1000
10/10 - 0s - loss: 66.2206 - val_loss: 60.2280
Epoch 138/1000
10/10 - 0s - loss: 66.0565 - val_loss: 59.8757
Epoch 139/1000
10/10 - 0s - loss: 65.6932 - val_loss: 59.6399
Epoch 140/1000
10/10 - 0s - loss: 65.4174 - val_loss: 59.4523
Epoch 141/1000
10/10 - 0s - loss: 65.2317 - val_loss: 59.0549
Epoch 142/1000
10/10 - 0s - loss: 64.8965 - val_loss: 58.8534
Epoch 143/1000
10/10 - 0s - loss: 64.6559 - val_loss: 58.4761
Epoch 144/1000
10/10 - 0s - loss: 64.4625 - val_loss: 58.1943
Epoch 145/1000
10/10 - 0s - loss: 64.3151 - val_loss: 58.1052
Epoch 146/1000
10/10 - 0s - loss: 63.9739 - val_loss: 57.6159
Epoch 147/1000
10/10 - 0s - loss: 63.6170 - val_loss: 57.3801
Epoch 148/1000
10/10 - 0s - loss: 63.2566 - val_loss: 57.4417
Epoch 149/1000
10/10 - 0s - loss: 63.6356 - val_loss: 57.1869
Epoch 15

Epoch 267/1000
10/10 - 0s - loss: 40.3548 - val_loss: 33.5923
Epoch 268/1000
10/10 - 0s - loss: 40.3262 - val_loss: 33.6416
Epoch 269/1000
10/10 - 0s - loss: 40.3310 - val_loss: 33.4121
Epoch 270/1000
10/10 - 0s - loss: 40.0662 - val_loss: 33.5350
Epoch 271/1000
10/10 - 0s - loss: 40.6119 - val_loss: 33.1418
Epoch 272/1000
10/10 - 0s - loss: 39.9215 - val_loss: 33.6389
Epoch 273/1000
10/10 - 0s - loss: 39.6169 - val_loss: 33.1060
Epoch 274/1000
10/10 - 0s - loss: 39.6101 - val_loss: 32.9929
Epoch 275/1000
10/10 - 0s - loss: 39.4291 - val_loss: 32.7844
Epoch 276/1000
10/10 - 0s - loss: 39.6574 - val_loss: 32.7002
Epoch 277/1000
10/10 - 0s - loss: 39.4713 - val_loss: 33.3798
Epoch 278/1000
10/10 - 0s - loss: 39.0950 - val_loss: 32.5029
Epoch 279/1000
10/10 - 0s - loss: 38.9195 - val_loss: 32.6518
Epoch 280/1000
10/10 - 0s - loss: 38.7807 - val_loss: 32.3553
Epoch 281/1000
10/10 - 0s - loss: 38.6903 - val_loss: 32.1952
Epoch 282/1000
10/10 - 0s - loss: 38.5662 - val_loss: 32.5098
Epoch 28

10/10 - 0s - loss: 27.7236 - val_loss: 22.9129
Epoch 400/1000
10/10 - 0s - loss: 27.6417 - val_loss: 22.9174
Epoch 401/1000
10/10 - 0s - loss: 27.4922 - val_loss: 22.9766
Epoch 402/1000
10/10 - 0s - loss: 27.5527 - val_loss: 22.6549
Epoch 403/1000
10/10 - 0s - loss: 28.4039 - val_loss: 24.0018
Epoch 404/1000
10/10 - 0s - loss: 28.2054 - val_loss: 23.2164
Epoch 405/1000
10/10 - 0s - loss: 28.1508 - val_loss: 24.3742
Epoch 406/1000
10/10 - 0s - loss: 28.2765 - val_loss: 22.3770
Epoch 407/1000
10/10 - 0s - loss: 27.0944 - val_loss: 22.8562
Epoch 408/1000
10/10 - 0s - loss: 27.0292 - val_loss: 22.5485
Epoch 409/1000
10/10 - 0s - loss: 26.8810 - val_loss: 22.4339
Epoch 410/1000
10/10 - 0s - loss: 26.9847 - val_loss: 22.2984
Epoch 411/1000
10/10 - 0s - loss: 28.7695 - val_loss: 22.8068
Epoch 412/1000
10/10 - 0s - loss: 26.5466 - val_loss: 21.9482
Epoch 413/1000
10/10 - 0s - loss: 26.8524 - val_loss: 21.9657
Epoch 414/1000
10/10 - 0s - loss: 26.3521 - val_loss: 21.8245
Epoch 415/1000
10/10 - 

<tensorflow.python.keras.callbacks.History at 0x171d198a4f0>

Finally, we evaluate the error.

In [7]:
# Measure RMSE, a common error metric for regression.
pred = model.predict(x_test)
# scikit-learn expects the true values first, then the predictions
score = np.sqrt(metrics.mean_squared_error(y_test, pred))
print(f"Final score (RMSE): {score}")

Final score (RMSE): 4.595648277397226
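
RMSE is the square root of the mean of the squared differences between the expected and predicted values, so it is expressed in the same units as the target (miles per gallon here). The sketch below computes it by hand with the standard library; the function name `rmse` and the sample values are assumptions for illustration, not data from the auto-mpg model.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical mpg values: the errors are -2, 3, 0, and -4
actual = [20.0, 30.0, 25.0, 15.0]
predicted = [22.0, 27.0, 25.0, 19.0]

# Squared errors are 4, 9, 0, 16; their mean is 7.25
print(f"RMSE: {rmse(actual, predicted):.4f}")  # RMSE: 2.6926
```

Because the network above is trained with `loss='mean_squared_error'`, the `val_loss` values in the training log are squared errors. Squaring the final RMSE of about 4.6 gives roughly 21.1, which is on the same scale as the last `val_loss` values shown.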
