<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_03_4_early_stop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
**Module 3: Introduction to TensorFlow**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 3 Material

* Part 3.1: Deep Learning and Neural Network Introduction [[Video]](https://www.youtube.com/watch?v=zYnI4iWRmpc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_1_neural_net.ipynb)
* Part 3.2: Introduction to Tensorflow and Keras [[Video]](https://www.youtube.com/watch?v=PsE73jk55cE&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_2_keras.ipynb)
* Part 3.3: Saving and Loading a Keras Neural Network [[Video]](https://www.youtube.com/watch?v=-9QfbGM1qGw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_3_save_load.ipynb)
* **Part 3.4: Early Stopping in Keras to Prevent Overfitting** [[Video]](https://www.youtube.com/watch?v=m1LNunuI2fk&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_4_early_stop.ipynb)
* Part 3.5: Extracting Weights and Manual Calculation [[Video]](https://www.youtube.com/watch?v=7PWgx16kH8s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_5_weights.ipynb)

# Google CoLab Instructions

The following code ensures that Google CoLab is running the correct version of TensorFlow.

In [6]:
try:
    %tensorflow_version 2.x
    COLAB = True
    print("Note: using Google CoLab")
except:
    print("Note: not using Google CoLab")
    COLAB = False

Note: not using Google CoLab


# Part 3.4: Early Stopping in Keras to Prevent Overfitting

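**Overfitting** occurs when a neural network is trained to the point that it begins to memorize the training data rather than generalize from it. On a plot of training versus validation error, this shows up as training error that keeps falling while validation error levels off and then starts to rise.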

![Training vs Validation Error for Overfitting](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_training_val.png "Training vs Validation Error for Overfitting")

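It is important to segment the original dataset into several datasets:

* **Training Set** The data used to fit the neural network's weights.
* **Validation Set** Data kept out of training and used to decide when to stop training.
* **Holdout Set** Data kept aside until the very end to give a final, independent estimate of performance.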

There are several different ways that these sets can be constructed.  The following programs demonstrate some of these.

The first method uses a training set and a validation set. The training data are used to train the neural network until performance on the validation set no longer improves, which attempts to stop at a near-optimal training point. This method gives accurate "out of sample" predictions only for the validation set, which is usually about 20% of the data. Predictions for the training data will be overly optimistic, because these are the data the neural network was trained on.

![Training with a Validation Set](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_1_train_val.png "Training with a Validation Set")
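The notebook's code splits the data into only two sets with `train_test_split`. As a minimal numpy-only sketch of carving out all three sets from the list above, one could shuffle the row indices once and slice them. The function name `three_way_split`, the 20%/10% fractions, and the fixed seed are illustrative assumptions, not values from the course; calling scikit-learn's `train_test_split` twice achieves the same effect.

```python
import numpy as np

def three_way_split(x, y, val_frac=0.2, holdout_frac=0.1, seed=42):
    """Hypothetical helper: shuffle the rows once, slice off a holdout set and a
    validation set, and use whatever remains as the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))          # shuffled row indices
    n_hold = int(len(x) * holdout_frac)
    n_val = int(len(x) * val_frac)
    hold_idx = idx[:n_hold]
    val_idx = idx[n_hold:n_hold + n_val]
    train_idx = idx[n_hold + n_val:]
    # Index x and y with the same index arrays so features and targets stay aligned.
    return ((x[train_idx], y[train_idx]),
            (x[val_idx], y[val_idx]),
            (x[hold_idx], y[hold_idx]))

# Toy data: row i of x_demo is [2i, 2i+1] and its target is i.
x_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100)
train, val, hold = three_way_split(x_demo, y_demo)
print(len(train[0]), len(val[0]), len(hold[0]))  # 70 20 10
```

In this scheme the training set would go to `fit`, the validation set to `validation_data` (where early stopping watches it), and the holdout set would be scored only once, after training is finished.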

### Early Stopping with Classification

In [7]:
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris.csv", 
    na_values=['NA', '?'])

# Convert to numpy - Classification
x = df[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
dummies = pd.get_dummies(df['species']) # Classification
species = dummies.columns
y = dummies.values

# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42)

# Build neural network
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y.shape[1],activation='softmax')) # Output
model.compile(loss='categorical_crossentropy', optimizer='adam')

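# Stop once val_loss has failed to improve by more than 1e-3 for 5 consecutive
# epochs, then roll the weights back to the best epoch seen.
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1,
                        mode='auto', restore_best_weights=True)
# epochs=1000 is only an upper bound; early stopping usually ends training much sooner.
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          callbacks=[monitor], verbose=2, epochs=1000)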


Train on 112 samples, validate on 38 samples
Epoch 1/1000
112/112 - 1s - loss: 1.6608 - val_loss: 1.6149
Epoch 2/1000
112/112 - 0s - loss: 1.3935 - val_loss: 1.3683
Epoch 3/1000
112/112 - 0s - loss: 1.2071 - val_loss: 1.2028
Epoch 4/1000
112/112 - 0s - loss: 1.0931 - val_loss: 1.0934
Epoch 5/1000
112/112 - 0s - loss: 1.0243 - val_loss: 1.0194
Epoch 6/1000
112/112 - 0s - loss: 0.9771 - val_loss: 0.9524
Epoch 7/1000
112/112 - 0s - loss: 0.9273 - val_loss: 0.8971
Epoch 8/1000
112/112 - 0s - loss: 0.8818 - val_loss: 0.8490
Epoch 9/1000
112/112 - 0s - loss: 0.8413 - val_loss: 0.8025
Epoch 10/1000
112/112 - 0s - loss: 0.8041 - val_loss: 0.7589
Epoch 11/1000
112/112 - 0s - loss: 0.7652 - val_loss: 0.7222
Epoch 12/1000
112/112 - 0s - loss: 0.7322 - val_loss: 0.6868
Epoch 13/1000
112/112 - 0s - loss: 0.7023 - val_loss: 0.6530
Epoch 14/1000
112/112 - 0s - loss: 0.6723 - val_loss: 0.6233
Epoch 15/1000
112/112 - 0s - loss: 0.6449 - val_loss: 0.5974
Epoch 16/1000
112/112 - 0s - loss: 0.6211 - ...
...
112/112 - 0s - loss: 0.0903 - val_loss: 0.0839
Epoch 135/1000
112/112 - 0s - loss: 0.0978 - val_loss: 0.0858
Epoch 136/1000
112/112 - 0s - loss: 0.0902 - val_loss: 0.0696
Epoch 137/1000
112/112 - 0s - loss: 0.0940 - val_loss: 0.0686
Epoch 138/1000
112/112 - 0s - loss: 0.0928 - val_loss: 0.0791
Epoch 139/1000
112/112 - 0s - loss: 0.0906 - val_loss: 0.0797
Epoch 140/1000
112/112 - 0s - loss: 0.0912 - val_loss: 0.0711
Epoch 141/1000
Restoring model weights from the end of the best epoch.
112/112 - 0s - loss: 0.0887 - val_loss: 0.0723
Epoch 00141: early stopping


<tensorflow.python.keras.callbacks.History at 0x1a4eb2aa90>

The **EarlyStopping** object accepts several parameters:

* **monitor** The quantity to watch. Here it is `val_loss`, the loss on the validation data passed to `fit`.
* **min_delta** The minimum change in the monitored quantity that counts as an improvement. Keep this value small; making it even smaller is unlikely to have much impact.
* **patience** How many epochs training should continue without an improvement in the validation error before it stops.
* **verbose** How much progress information to display. With `verbose=1`, the callback prints a message when it stops training.
* **mode** Whether the monitored quantity should be minimized (`'min'`) or maximized (`'max'`). Accuracy, for example, should be maximized, while log-loss and RMSE should be minimized. In general, leave this set to `'auto'`, which lets Keras infer the direction from the name of the monitored quantity.
* **restore_best_weights** This should always be set to `True` (the default is `False`). It restores the weights from the epoch with the best value of the monitored quantity (here, the lowest validation loss); without it, the model keeps the weights from the final epoch. Unless you are tracking the weights yourself (a technique this course does not use), let Keras perform this step for you.
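The parameters above combine into a simple patience rule. The following is a simplified pure-Python sketch of that rule, modeled on how the Keras callback behaves rather than taken from its source; the function name `simulate_early_stopping` is hypothetical, and the sketch omits some of the callback's other options. It replays a list of per-epoch metric values and reports where training would stop and which epoch's weights `restore_best_weights` would bring back.

```python
import math

def simulate_early_stopping(history, monitor_mode="min", min_delta=1e-3, patience=5):
    """Hypothetical helper: replay per-epoch metric values through a patience rule.

    Returns (stop_epoch, best_epoch, best_value) with 1-based epoch numbers;
    stop_epoch is None if the rule never fires.
    """
    # Flip the sign for "max" so the comparison below always means "lower is better".
    sign = 1.0 if monitor_mode == "min" else -1.0
    best = math.inf      # best value seen so far, in signed space
    best_epoch = None    # epoch whose weights restore_best_weights would bring back
    wait = 0             # epochs since the last counted improvement
    for epoch, value in enumerate(history, start=1):
        if sign * value < best - min_delta:
            # Improved by more than min_delta: record it and reset the counter.
            best = sign * value
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch, sign * best
    return None, best_epoch, sign * best

# Epoch 4 (0.5995) is lower than epoch 3 (0.60), but not by more than min_delta,
# so it does not count; five non-improving epochs later, training stops.
val_loss = [1.00, 0.80, 0.60, 0.5995, 0.61, 0.62, 0.63, 0.64]
print(simulate_early_stopping(val_loss))  # (8, 3, 0.6)
```

The sign flip is one way to express `mode`: with `monitor_mode="max"`, a rising metric such as accuracy counts as an improvement instead.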

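As the output above shows, training did not use all 1,000 requested epochs. It stopped at epoch 141, once the validation loss had gone five epochs (the `patience` setting) without improving by more than `min_delta`, and the weights were restored from epoch 136, the last epoch that counted as an improvement.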

In [8]:
from sklearn.metrics import accuracy_score

pred = model.predict(x_test)
predict_classes = np.argmax(pred, axis=1)     # most probable class for each sample
expected_classes = np.argmax(y_test, axis=1)  # undo the one-hot (dummy) encoding
correct = accuracy_score(expected_classes, predict_classes)
print(f"Accuracy: {correct}")

Accuracy: 1.0
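To make the accuracy calculation above concrete, here is a numpy-only sketch with made-up numbers (the arrays below are illustrative, not output from the iris model). `argmax` turns each row of softmax probabilities, or each one-hot row, into a class index, and the mean of the element-wise matches is the same fraction `accuracy_score` reports for plain class labels. The `demo_` names avoid overwriting the notebook's `pred` and `y_test`.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes (each row sums to 1).
demo_pred = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.30, 0.50],
    [0.60, 0.30, 0.10],
])
# Hypothetical one-hot labels, shaped like pd.get_dummies(...).values.
demo_y = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

demo_pred_classes = np.argmax(demo_pred, axis=1)  # column of the largest probability per row
demo_expected = np.argmax(demo_y, axis=1)         # column holding the 1 in each one-hot row
demo_accuracy = np.mean(demo_pred_classes == demo_expected)  # fraction of matching rows
print(demo_pred_classes, demo_expected, demo_accuracy)  # [0 1 2 0] [0 1 2 1] 0.75
```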


### Early Stopping with Regression

In [9]:
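from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
import pandas as pd
import io
import os
import requests
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split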

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/auto-mpg.csv", 
    na_values=['NA', '?'])

cars = df['name']

# Handle missing values: fill missing horsepower with the median
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

# Pandas to Numpy
x = df[['cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'year', 'origin']].values
y = df['mpg'].values # regression

# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42)

# Build the neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto',
        restore_best_weights=True)
model.fit(x_train,y_train,validation_data=(x_test,y_test),callbacks=[monitor],verbose=2,epochs=1000)

Train on 298 samples, validate on 100 samples
Epoch 1/1000
298/298 - 1s - loss: 416476.3492 - val_loss: 314769.4375
Epoch 2/1000
298/298 - 0s - loss: 264400.6968 - val_loss: 195679.2900
Epoch 3/1000
298/298 - 0s - loss: 160242.9806 - val_loss: 116489.1669
Epoch 4/1000
298/298 - 0s - loss: 92886.4565 - val_loss: 65705.3525
Epoch 5/1000
298/298 - 0s - loss: 51821.5195 - val_loss: 34490.5341
Epoch 6/1000
298/298 - 0s - loss: 26432.8879 - val_loss: 17017.6602
Epoch 7/1000
298/298 - 0s - loss: 12628.8491 - val_loss: 7859.4910
Epoch 8/1000
298/298 - 0s - loss: 5661.1263 - val_loss: 3483.9925
Epoch 9/1000
298/298 - 0s - loss: 2499.7458 - val_loss: 1587.6598
Epoch 10/1000
298/298 - 0s - loss: 1219.2911 - val_loss: 908.8167
Epoch 11/1000
298/298 - 0s - loss: 757.9986 - val_loss: 712.8690
Epoch 12/1000
298/298 - 0s - loss: 650.2481 - val_loss: 667.7047
Epoch 13/1000
298/298 - 0s - loss: 628.3641 - val_loss: 661.6571
Epoch 14/1000
298/298 - 0s - loss: 627.6285 - val_loss: 661.0166
Epoch 15/1000
...
Epoch 126/1000
298/298 - 0s - loss: 324.5907 - val_loss: 322.3570
Epoch 127/1000
298/298 - 0s - loss: 323.2112 - val_loss: 319.9200
Epoch 128/1000
298/298 - 0s - loss: 321.4225 - val_loss: 317.7534
Epoch 129/1000
298/298 - 0s - loss: 319.5154 - val_loss: 316.3254
Epoch 130/1000
298/298 - 0s - loss: 316.5832 - val_loss: 313.6975
Epoch 131/1000
298/298 - 0s - loss: 314.3458 - val_loss: 311.1823
Epoch 132/1000
298/298 - 0s - loss: 312.8003 - val_loss: 309.2521
Epoch 133/1000
298/298 - 0s - loss: 310.5062 - val_loss: 306.9785
Epoch 134/1000
298/298 - 0s - loss: 308.3392 - val_loss: 304.5313
Epoch 135/1000
298/298 - 0s - loss: 306.9253 - val_loss: 302.1577
Epoch 136/1000
298/298 - 0s - loss: 304.2731 - val_loss: 300.1864
Epoch 137/1000
298/298 - 0s - loss: 302.8002 - val_loss: 299.2582
Epoch 138/1000
298/298 - 0s - loss: 300.4674 - val_loss: 296.3807
Epoch 139/1000
298/298 - 0s - loss: 298.3239 - val_loss: 293.3122
Epoch 140/1000
298/298 - 0s - loss: 296.2692 - val_loss: 291.2344
...
298/298 - 0s - loss: 115.5645 - val_loss: 99.0514
Epoch 251/1000
298/298 - 0s - loss: 114.6218 - val_loss: 98.1883
Epoch 252/1000
298/298 - 0s - loss: 113.6407 - val_loss: 97.3574
Epoch 253/1000
298/298 - 0s - loss: 113.3064 - val_loss: 98.8197
Epoch 254/1000
298/298 - 0s - loss: 113.1713 - val_loss: 94.0275
Epoch 255/1000
298/298 - 0s - loss: 110.1867 - val_loss: 95.5165
Epoch 256/1000
298/298 - 0s - loss: 109.4709 - val_loss: 94.4464
Epoch 257/1000
298/298 - 0s - loss: 108.3651 - val_loss: 91.5213
Epoch 258/1000
298/298 - 0s - loss: 107.6009 - val_loss: 93.2123
Epoch 259/1000
298/298 - 0s - loss: 106.6531 - val_loss: 89.9199
Epoch 260/1000
298/298 - 0s - loss: 106.0645 - val_loss: 87.8295
Epoch 261/1000
298/298 - 0s - loss: 104.0702 - val_loss: 88.9748
Epoch 262/1000
298/298 - 0s - loss: 103.3226 - val_loss: 88.5360
Epoch 263/1000
298/298 - 0s - loss: 102.1890 - val_loss: 87.0284
Epoch 264/1000
298/298 - 0s - loss: 101.2309 - val_loss: 85.2209
Epoch 265/1000
...
Epoch 378/1000
298/298 - 0s - loss: 41.2918 - val_loss: 30.7420
Epoch 379/1000
298/298 - 0s - loss: 41.0291 - val_loss: 32.0244
Epoch 380/1000
298/298 - 0s - loss: 40.6796 - val_loss: 30.6563
Epoch 381/1000
298/298 - 0s - loss: 40.7214 - val_loss: 31.3741
Epoch 382/1000
298/298 - 0s - loss: 40.2879 - val_loss: 30.9548
Epoch 383/1000
298/298 - 0s - loss: 40.3160 - val_loss: 30.4093
Epoch 384/1000
298/298 - 0s - loss: 40.2615 - val_loss: 31.7429
Epoch 385/1000
298/298 - 0s - loss: 39.6475 - val_loss: 29.3286
Epoch 386/1000
298/298 - 0s - loss: 39.6762 - val_loss: 30.5010
Epoch 387/1000
298/298 - 0s - loss: 39.3409 - val_loss: 29.6369
Epoch 388/1000
298/298 - 0s - loss: 39.1239 - val_loss: 29.6341
Epoch 389/1000
298/298 - 0s - loss: 40.2079 - val_loss: 30.0855
Epoch 390/1000
Restoring model weights from the end of the best epoch.
298/298 - 0s - loss: 38.7772 - val_loss: 30.1127
Epoch 00390: early stopping


<tensorflow.python.keras.callbacks.History at 0x1a50af27d0>

In [10]:
# Measure RMSE, a common error metric for regression.
pred = model.predict(x_test)
# scikit-learn expects (y_true, y_pred); MSE is symmetric, so the score is unchanged.
score = np.sqrt(metrics.mean_squared_error(y_test, pred))
print(f"Final score (RMSE): {score}")

Final score (RMSE): 5.415585597418776
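For this single-output network, `model.predict` returns a 2-D array of shape `(n, 1)`, while `y_test` is 1-D with shape `(n,)`. scikit-learn's `mean_squared_error` reconciles the two shapes, but computing RMSE by hand with numpy needs care. A minimal sketch, using made-up numbers and a hypothetical `rmse` helper, flattens both inputs before subtracting:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Hypothetical helper: root-mean-square error.
    Flattens both inputs so (n, 1) predictions line up with (n,) targets."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true_demo = np.array([20.0, 30.0, 25.0])       # made-up mpg targets, shape (3,)
y_pred_demo = np.array([[22.0], [27.0], [25.0]]) # shaped like predict() output, (3, 1)
print(rmse(y_true_demo, y_pred_demo))  # sqrt((4 + 9 + 0) / 3), about 2.0817

# Without flattening, numpy broadcasting silently builds a (3, 3) difference matrix.
print((y_pred_demo - y_true_demo).shape)  # (3, 3)
```

The broadcasting pitfall in the last line is the main reason to flatten: the unflattened subtraction runs without error but averages over every prediction/target pair instead of matching rows.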
