# T81-558: Applications of Deep Neural Networks
**Module 3: Introduction to TensorFlow**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 3 Material

* Part 3.1: Deep Learning and Neural Network Introduction [[Video]](https://www.youtube.com/watch?v=zYnI4iWRmpc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_1_neural_net.ipynb)
* Part 3.2: Introduction to Tensorflow and Keras [[Video]](https://www.youtube.com/watch?v=PsE73jk55cE&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_2_keras.ipynb)
* Part 3.3: Saving and Loading a Keras Neural Network [[Video]](https://www.youtube.com/watch?v=-9QfbGM1qGw&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_3_save_load.ipynb)
* **Part 3.4: Early Stopping in Keras to Prevent Overfitting** [[Video]](https://www.youtube.com/watch?v=m1LNunuI2fk&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_4_early_stop.ipynb)
* Part 3.5: Extracting Weights and Manual Calculation [[Video]](https://www.youtube.com/watch?v=7PWgx16kH8s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_03_5_weights.ipynb)

# Part 3.4: Early Stopping in Keras to Prevent Overfitting

**Overfitting** occurs when a neural network is trained to the point that it begins to memorize the training data rather than generalize to new data.

![Training vs Validation Error for Overfitting](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_training_val.png "Training vs Validation Error for Overfitting")

It is important to segment the original dataset into several datasets:

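* **Training Set** The data used to fit the neural network's weights.
* **Validation Set** Data withheld from weight updates and used to decide when to stop training.
* **Holdout Set** Data set aside until the very end and used to give a final estimate of how the model performs on unseen data.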

There are several different ways that these sets can be constructed.  The following programs demonstrate some of these.
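As a minimal sketch of one way to build all three sets, the plain-Python function below shuffles the rows with a fixed seed and slices off holdout and validation portions, leaving the remainder for training. The function name `three_way_split`, the 15% fractions, and the seed are illustrative assumptions, not values from this course. In practice, calling scikit-learn's `train_test_split` twice produces the same kind of three-way partition.

```python
import random


def three_way_split(rows, val_frac=0.15, holdout_frac=0.15, seed=42):
    """Shuffle rows, then partition them into training, validation, and holdout lists.

    Hypothetical helper: the default fractions and seed are illustrative only.
    """
    if val_frac + holdout_frac >= 1.0:
        raise ValueError("validation and holdout fractions must leave room for training")
    rng = random.Random(seed)          # seeded generator, so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_holdout = int(n * holdout_frac)  # rounded down
    n_val = int(n * val_frac)          # rounded down
    holdout = shuffled[:n_holdout]
    validation = shuffled[n_holdout:n_holdout + n_val]
    training = shuffled[n_holdout + n_val:]  # everything left over goes to training
    return training, validation, holdout


# Usage: split 150 row indices (the number of rows in the Iris data used in this part)
train, val, hold = three_way_split(range(150))
print(len(train), len(val), len(hold))  # 106 22 22
```

The holdout rows play no role in training or in the early-stopping decision, which is what keeps the final estimate honest.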

The first method uses a training set and a validation set. The training data are used to train the neural network until performance on the validation set no longer improves, which attempts to stop training at a near-optimal point. This method gives accurate "out-of-sample" predictions only for the validation set, which is usually 20% or so of the data. Predictions for the training data will be overly optimistic, because those are the data the neural network was trained on.

![Training with a Validation Set](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_1_train_val.png "Training with a Validation Set")
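The sample counts in the training logs below ("Train on 112 samples, validate on 38 samples" for the Iris data and "Train on 298 samples, validate on 100 samples" for the auto MPG data) follow from how scikit-learn sizes a split when `test_size` is a fraction and `train_size` is left unset: the test count is rounded up and training receives the rest. The hypothetical helper `split_sizes` below reproduces only that arithmetic. Note that the examples in this part hold out 25% of the data for validation rather than 20%.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Mirrors scikit-learn's rounding when train_size is not given:
    the test count is rounded up and training gets the remainder.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(150, 0.25))  # (112, 38): 0.25 * 150 = 37.5, rounded up to 38
print(split_sizes(398, 0.25))  # (298, 100): 0.25 * 398 = 99.5, rounded up to 100
```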

### Early Stopping with Classification

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris.csv",
    na_values=['NA', '?'])

# Convert to numpy - Classification
x = df[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
dummies = pd.get_dummies(df['species'])  # One-hot encode the target classes
species = dummies.columns
y = dummies.values

# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42)

# Build neural network
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], activation='relu'))  # Hidden 1
model.add(Dense(25, activation='relu'))  # Hidden 2
model.add(Dense(y.shape[1], activation='softmax'))  # Output
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Stop once val_loss fails to improve by at least 1e-3 for 5 consecutive epochs,
# then roll the weights back to the best epoch seen
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5,
                        verbose=1, mode='auto', restore_best_weights=True)
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          callbacks=[monitor], verbose=2, epochs=1000)
```




```
Train on 112 samples, validate on 38 samples
Epoch 1/1000
112/112 - 0s - loss: 1.2451 - val_loss: 1.0949
Epoch 2/1000
112/112 - 0s - loss: 1.0841 - val_loss: 1.0130
Epoch 3/1000
112/112 - 0s - loss: 1.0086 - val_loss: 0.9805
Epoch 4/1000
112/112 - 0s - loss: 0.9637 - val_loss: 0.9518
Epoch 5/1000
112/112 - 0s - loss: 0.9254 - val_loss: 0.9183
Epoch 6/1000
112/112 - 0s - loss: 0.8992 - val_loss: 0.8893
Epoch 7/1000
112/112 - 0s - loss: 0.8797 - val_loss: 0.8608
Epoch 8/1000
112/112 - 0s - loss: 0.8532 - val_loss: 0.8292
Epoch 9/1000
112/112 - 0s - loss: 0.8272 - val_loss: 0.8021
Epoch 10/1000
112/112 - 0s - loss: 0.7993 - val_loss: 0.7723
Epoch 11/1000
112/112 - 0s - loss: 0.7742 - val_loss: 0.7449
Epoch 12/1000
112/112 - 0s - loss: 0.7509 - val_loss: 0.7197
Epoch 13/1000
112/112 - 0s - loss: 0.7255 - val_loss: 0.6940
Epoch 14/1000
112/112 - 0s - loss: 0.7003 - val_loss: 0.6695
Epoch 15/1000
112/112 - 0s - loss: 0.6795 - val_loss: 0.6427
Epoch 16/1000
112/112 - 0s - loss: 0.6564 - val_l
```


The **EarlyStopping** object accepts several parameters:

* **monitor** The quantity to watch; here, `'val_loss'`, the loss on the validation set.
* **min_delta** The minimum change in the monitored quantity that counts as an improvement. Keep this value small; setting it even smaller is unlikely to have much impact.
* **patience** The number of epochs to wait for the monitored quantity to improve before training stops.
* **verbose** How much progress information to display; a value of 1 prints a message when the callback stops training.
* **mode** Whether the monitored quantity should be minimized (`'min'`) or maximized (`'max'`). Accuracy, for example, should be maximized, while log-loss and RMSE should be minimized. With `'auto'`, Keras infers the direction from the name of the monitored quantity, so in general you can leave this set to `'auto'`.
* **restore_best_weights** This should always be set to `True`. It restores the weights to the values they had at the epoch with the best monitored value (here, the lowest validation loss). Unless you are tracking the weights manually (a technique this course does not use), let Keras perform this step for you.

As the training log above shows, Keras did not use all 1,000 requested epochs. Training stopped once the validation loss no longer improved.
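To make the interaction of `min_delta`, `patience`, `mode`, and `restore_best_weights` concrete, here is a simplified plain-Python sketch of the stopping rule. It is not Keras's source code: the function name `early_stopping_epoch` is hypothetical, and it only replays a list of per-epoch metric values instead of training a model. Epochs are 0-indexed here, whereas the Keras log counts from 1, and `best_epoch` marks the epoch whose weights `restore_best_weights=True` would roll back to.

```python
def early_stopping_epoch(values, min_delta=1e-3, patience=5, mode="min"):
    """Replay per-epoch metric values through a simplified early-stopping rule.

    Returns (stop_epoch, best_epoch), both 0-indexed. stop_epoch is None
    when patience is never exhausted. Hypothetical sketch, not Keras code.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    sign = 1.0 if mode == "min" else -1.0  # negate 'max' metrics so lower is always better
    best = float("inf")
    best_epoch = None
    wait = 0  # consecutive epochs without a large enough improvement
    for epoch, value in enumerate(values):
        score = sign * value
        if score < best - min_delta:  # improved by more than min_delta
            best = score
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Validation losses for epochs 1-15 of the regression log in this part
regression_val_loss = [174.5299, 229.1096, 149.4208, 130.8720, 116.8375,
                       107.6508, 101.8580, 87.5160, 79.7848, 70.8645,
                       64.0160, 65.5996, 56.4486, 50.0813, 41.8455]
print(early_stopping_epoch(regression_val_loss))  # (None, 14)

# A made-up curve that plateaus: stops at index 8, best weights from index 5
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.7005, 0.71, 0.69, 0.72, 0.73, 0.74],
                           patience=3))  # (8, 5)
```

Replaying the regression log shows why patience matters: validation loss rises at epochs 2 and 12, but never for five epochs in a row, so training continues.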

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Convert softmax probabilities and one-hot targets into class indices
pred = model.predict(x_test)
predict_classes = np.argmax(pred, axis=1)
expected_classes = np.argmax(y_test, axis=1)
correct = accuracy_score(expected_classes, predict_classes)
print(f"Accuracy: {correct}")
```

```
Accuracy: 1.0
```
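The accuracy cell relies on `np.argmax` to turn each row of softmax probabilities, and each one-hot target row, into a class index. As a sketch of that step without NumPy, the plain-Python version below performs the same conversion and counts matches. The four rows of probabilities are made-up illustrative values, not output from the model above.

```python
def argmax(row):
    """Index of the largest value; ties go to the first occurrence, as with np.argmax."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(y_true_onehot, y_pred_probs):
    """Fraction of rows whose predicted class index matches the one-hot target."""
    expected = [argmax(r) for r in y_true_onehot]
    predicted = [argmax(r) for r in y_pred_probs]
    matches = sum(e == p for e, p in zip(expected, predicted))
    return matches / len(expected)


# Hypothetical targets and softmax outputs for three classes
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
probs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]
print(accuracy(y_true, probs))  # 0.75: the last row is predicted as class 0, not 1
```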


### Early Stopping with Regression

```python
import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/auto-mpg.csv",
    na_values=['NA', '?'])

cars = df['name']  # Car names (not used as a model input)

# Handle missing value
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

# Pandas to Numpy
x = df[['cylinders', 'displacement', 'horsepower', 'weight',
        'acceleration', 'year', 'origin']].values
y = df['mpg'].values  # regression

# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42)

# Build the neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], activation='relu'))  # Hidden 1
model.add(Dense(10, activation='relu'))  # Hidden 2
model.add(Dense(1))  # Output: a single linear unit for regression
model.compile(loss='mean_squared_error', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5,
                        verbose=1, mode='auto', restore_best_weights=True)
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          callbacks=[monitor], verbose=2, epochs=1000)
```

```
Train on 298 samples, validate on 100 samples
Epoch 1/1000
298/298 - 0s - loss: 469.4676 - val_loss: 174.5299
Epoch 2/1000
298/298 - 0s - loss: 205.0109 - val_loss: 229.1096
Epoch 3/1000
298/298 - 0s - loss: 174.4931 - val_loss: 149.4208
Epoch 4/1000
298/298 - 0s - loss: 132.4420 - val_loss: 130.8720
Epoch 5/1000
298/298 - 0s - loss: 119.8952 - val_loss: 116.8375
Epoch 6/1000
298/298 - 0s - loss: 111.4138 - val_loss: 107.6508
Epoch 7/1000
298/298 - 0s - loss: 103.4241 - val_loss: 101.8580
Epoch 8/1000
298/298 - 0s - loss: 97.9938 - val_loss: 87.5160
Epoch 9/1000
298/298 - 0s - loss: 88.0546 - val_loss: 79.7848
Epoch 10/1000
298/298 - 0s - loss: 78.6753 - val_loss: 70.8645
Epoch 11/1000
298/298 - 0s - loss: 71.6881 - val_loss: 64.0160
Epoch 12/1000
298/298 - 0s - loss: 65.5250 - val_loss: 65.5996
Epoch 13/1000
298/298 - 0s - loss: 62.7212 - val_loss: 56.4486
Epoch 14/1000
298/298 - 0s - loss: 58.9747 - val_loss: 50.0813
Epoch 15/1000
298/298 - 0s - loss: 50.2088 - val_loss: 41.8455
Epoc
```


```python
# Measure RMSE error.  RMSE is common for regression.
pred = model.predict(x_test)
score = np.sqrt(metrics.mean_squared_error(y_test, pred))  # (y_true, y_pred) order
print(f"Final score (RMSE): {score}")
```

```
Final score (RMSE): 4.146816559067575
```
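RMSE is the square root of the mean of the squared differences between the targets and the predictions. Because of the square root, it is reported in the same units as the target (miles per gallon here), which makes it easier to interpret than the raw mean squared error used as the training loss. A minimal plain-Python sketch of the calculation follows; the hypothetical `rmse` helper and its input values are illustrative, not the model's predictions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean of squared residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Residuals are -2, 1, and 0, so the mean squared error is 5/3
print(rmse([18.0, 25.0, 30.0], [20.0, 24.0, 30.0]))
```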
