## Evaluate the Performance

#### Data Splitting
The large amount of data and the complexity of the models require very long training times.

As such, it is typical to use a simple split of the data into training and test datasets, or training and validation datasets.

Keras provides two convenient ways of evaluating your deep learning algorithms this way:

- Use an automatic verification dataset.
- Use a manual verification dataset.

#### 1. Use an Automatic Verification Dataset
Keras can separate a portion of your training data into a validation dataset and evaluate the performance of your model on that validation dataset __each epoch__.

You can do this by setting the __validation_split__ argument on the __fit()__ function to the fraction (a value between 0 and 1) of your training dataset to hold back.

For example, a reasonable value might be 0.2 or 0.33 for 20% or 33% of your training data held back for validation.
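To make the hold-out concrete: the Keras documentation states that the validation rows are taken from the end of the data passed to fit(), before any shuffling. The sketch below mimics that slicing with plain NumPy. The function `tail_validation_split` and the demo arrays are hypothetical names used only for illustration; they are not part of Keras.

```python
import math
import numpy as np

def tail_validation_split(X, y, validation_split):
    """Hold back the last `validation_split` fraction of rows (no shuffling)."""
    split_at = int(math.floor(len(X) * (1.0 - validation_split)))
    return (X[:split_at], y[:split_at]), (X[split_at:], y[split_at:])

# Placeholder data with the same number of rows and columns as the tutorial's inputs.
X_demo = np.arange(768 * 8).reshape(768, 8)
y_demo = np.arange(768) % 2

(train_X, train_y), (val_X, val_y) = tail_validation_split(X_demo, y_demo, 0.33)
print(len(train_X), len(val_X))
```

With 768 rows and a split of 0.33, this leaves 514 rows for training and 254 for validation. Because the rows are not shuffled before the cut, data that is sorted by class or by time should be shuffled before it is passed to fit() with validation_split.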

In [1]:
# first neural network with keras tutorial
import pandas as pd
import numpy as np

from keras.models import Sequential
from keras.layers import Dense

In [4]:
location = r'D:\MYLEARN\datasets\pima.csv'

In [5]:
data = pd.read_csv(location)

In [6]:
# split into input (X) and output (y) variables
X = data.iloc[:, 0:8]
y = data.iloc[:, 8]

In [7]:
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit the model
model.fit(X, y, validation_split=0.33, epochs=150, batch_size=10)

Epoch 1/150
Epoch 2/150
...
Epoch 77/150
(remaining output truncated)

<tensorflow.python.keras.callbacks.History at 0x150f4f76790>
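The History object returned by fit() has a `history` attribute, a dict that maps each metric name to a list with one value per epoch; when validation is enabled it also holds `val_`-prefixed entries. The sketch below shows one way to find the best epoch from such a dict. The helper `summarize_history` and the numbers in `fake_history` are made up for illustration, and the exact accuracy key (`val_acc` or `val_accuracy`) depends on the Keras version.

```python
def summarize_history(history_dict, metric="val_loss", lower_is_better=True):
    """Return (best_epoch, best_value) for one metric; epochs are 1-indexed."""
    values = history_dict[metric]
    pick = min if lower_is_better else max
    best_index = pick(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

# Made-up numbers standing in for `history.history` after a 4-epoch run.
fake_history = {
    "loss": [0.70, 0.65, 0.61, 0.58],
    "val_loss": [0.69, 0.66, 0.67, 0.68],
}

best_epoch, best_val_loss = summarize_history(fake_history)
print(best_epoch, best_val_loss)
```

In the notebook this would be used as `history = model.fit(...)` followed by `summarize_history(history.history)`. A validation loss that rises while the training loss keeps falling, as in the made-up numbers, is the usual sign of overfitting.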

#### 2. Use a Manual Verification Dataset
Keras also allows you to manually specify the dataset to use for validation during training.

We can use the __train_test_split()__ function from the scikit-learn machine learning library to separate our data into training and test datasets. We use 67% of the data for training and the remaining 33% for validation.

The validation dataset is passed to the __fit()__ function in Keras through the __validation_data__ argument, which takes a tuple of the input and output datasets.

In [7]:
from sklearn.model_selection import train_test_split

In [8]:
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
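As a small check on what this split produces, the sketch below runs train_test_split() on synthetic stand-in data of the same size and prints the resulting shapes. It also passes `stratify`, an optional argument that the notebook does not use, to keep the class ratio similar in both parts. The arrays and class counts here are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 768 rows, 8 features, imbalanced binary labels.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(768, 8))
y_demo = np.array([0] * 500 + [1] * 268)

# Same 67/33 split as above, with stratification added as an assumption.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=1, stratify=y_demo
)

print(X_tr.shape, X_val.shape)
print(round(y_demo.mean(), 3), round(y_tr.mean(), 3), round(y_val.mean(), 3))
```

The second print shows the share of positive labels overall, in the training part, and in the validation part; with stratification the three values should be close to each other.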

In [11]:
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

In [12]:
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [13]:
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=150, batch_size=10)

Train on 514 samples, validate on 254 samples
Epoch 1/150
Epoch 2/150
...
Epoch 74/150
(remaining output truncated)

<keras.callbacks.callbacks.History at 0x20c6bebc9c8>

#### 3. Manual k-Fold Cross Validation
The gold standard for machine learning model evaluation is k-fold cross validation.

It provides a robust estimate of the performance of a model on unseen data. It does this by splitting the training dataset into __k subsets__, then taking turns training a model on all subsets except one, which is held out, and evaluating model performance on that held-out validation subset.

The process is repeated until all subsets are given an opportunity to be the held out validation set. 

The performance measure is then averaged across all models that are created.
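The procedure described above can be sketched with plain NumPy. This is a minimal illustration rather than the scikit-learn implementation: `k_fold_indices`, `cross_validate`, and the toy majority-class "model" are hypothetical names and stand-ins chosen so the example runs without Keras.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle row indices and cut them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(X, y, k, fit, score):
    """Hold each fold out once, train on the rest, and collect the scores."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[test_idx], y[test_idx]))
    return np.array(scores)

# Toy "model": always predict the majority class seen during training.
def fit_majority(X_train, y_train):
    return int(np.bincount(y_train).argmax())

def accuracy(majority_label, X_test, y_test):
    return float(np.mean(y_test == majority_label))

# Made-up data: 30 rows, 20 of class 0 and 10 of class 1.
X_demo = np.zeros((30, 2))
y_demo = np.array([0] * 20 + [1] * 10)

fold_scores = cross_validate(X_demo, y_demo, k=3, fit=fit_majority, score=accuracy)
print(fold_scores, fold_scores.mean())
```

Each row is used for validation exactly once and for training k - 1 times, which is why the averaged score is a steadier estimate than a single train/validation split.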

> Cross validation is often __not used__ for evaluating deep learning models because of the greater computational expense. 
For example, k-fold cross validation is often used with 5 or 10 folds. As such, 5 or 10 models must be constructed and evaluated, greatly adding to the evaluation time of a model.

> When the problem is small enough, or if you have sufficient compute resources, k-fold cross validation can give you a less biased estimate of the performance of your model.

- We can use the handy __StratifiedKFold__ class from the scikit-learn Python machine learning library to split up the training dataset into 10 folds. The folds are stratified, meaning that the algorithm attempts to balance the number of instances of each class in each fold. Note that the cells below use the plain __KFold__ class with 3 splits, which does not stratify.
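To illustrate what stratification does, the sketch below splits made-up imbalanced labels into 10 stratified folds and counts the classes in each held-out fold. The label counts and the placeholder feature array are assumptions for the example; only the StratifiedKFold class itself comes from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up labels: 500 negatives and 268 positives (features are unused placeholders).
y_demo = np.array([0] * 500 + [1] * 268)
X_demo = np.zeros((len(y_demo), 1))

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# Count how many samples of each class land in every held-out fold.
fold_counts = [
    np.bincount(y_demo[test_idx], minlength=2)
    for _, test_idx in skf.split(X_demo, y_demo)
]

for i, (neg, pos) in enumerate(fold_counts, start=1):
    print(f"fold {i}: {neg} negatives, {pos} positives")
```

Every fold ends up with nearly the same class ratio as the full dataset, which matters most when one class is much rarer than the other.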

The example creates and evaluates one model per split of the data and collects all of the scores. The verbose output for each epoch is turned off by passing verbose=0 to the fit() and evaluate() functions on the model.

The performance of each model is printed and stored. The average and standard deviation of the model performance are then printed at the end of the run to provide a robust estimate of model accuracy.

In [8]:
location = r'D:\MYLEARN\datasets\pima.csv'

In [9]:
data = pd.read_csv(location)

In [10]:
# split into input (X) and output (y) variables
X = data.iloc[:, 0:8]
y = data.iloc[:, 8]

In [11]:
from sklearn.model_selection import KFold

In [12]:
# define 3-fold cross validation test
kfold = KFold(n_splits=3, shuffle=True, random_state=1)

In [13]:
for train, test in kfold.split(X, y):
    print(len(train), len(test))

512 256
512 256
512 256


In [62]:
cvscores = []

for train, test in kfold.split(X, y):

    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    # Fit the model on the training rows of this fold
    # (train/test are positional indices, so select rows with .iloc)
    model.fit(X.iloc[train], y.iloc[train], epochs=150, batch_size=10, verbose=0)

    # evaluate the model on the held-out rows
    scores = model.evaluate(X.iloc[test], y.iloc[test], verbose=0)

    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

    cvscores.append(scores[1] * 100)

print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))

In [14]:
import numpy as np
from keras import models
from keras import layers

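# Note: keras.wrappers.scikit_learn is only available in older Keras/TensorFlow releases;
# newer setups typically get KerasClassifier from the separate scikeras package instead.
from keras.wrappers.scikit_learn import KerasClassifier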

from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

In [15]:
# Set random seed
np.random.seed(0)

In [16]:
# Number of features
number_of_features = 100

# Generate features matrix and target vector
features, target = make_classification(n_samples = 10000,
                                       n_features = number_of_features,
                                       n_informative = 3,
                                       n_redundant = 0,
                                       n_classes = 2,
                                       weights = [.5, .5],
                                       random_state = 0)

In [17]:
# Create function returning a compiled network
def create_network():
    
    # Start neural network
    network = models.Sequential()

    # Add fully connected layer with a ReLU activation function
    network.add(layers.Dense(units=16, activation='relu', input_shape=(number_of_features,)))

    # Add fully connected layer with a ReLU activation function
    network.add(layers.Dense(units=16, activation='relu'))

    # Add fully connected layer with a sigmoid activation function
    network.add(layers.Dense(units=1, activation='sigmoid'))

    # Compile neural network
    network.compile(loss='binary_crossentropy', # Cross-entropy
                    optimizer='rmsprop', # Root Mean Square Propagation
                    metrics=['accuracy']) # Accuracy performance metric
    
    # Return compiled network
    return network

In [18]:
# Wrap Keras model so it can be used by scikit-learn
neural_network = KerasClassifier(build_fn=create_network, 
                                 epochs=10, 
                                 batch_size=100, 
                                 verbose=0)

In [19]:
# Evaluate neural network using 10-fold cross-validation
cross_val_score(neural_network, 
                features, 
                target, 
                cv=10)

array([0.90700001, 0.93300003, 0.92400002, 0.935     , 0.93300003,
       0.93599999, 0.91299999, 0.92799997, 0.92000002, 0.91399997])
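The ten fold accuracies above are easier to compare and report once they are reduced to a mean and a standard deviation, as described earlier for the manual loop. A minimal sketch, with the scores copied from the output above:

```python
import numpy as np

# Fold accuracies copied from the cross_val_score output above.
scores = np.array([0.90700001, 0.93300003, 0.92400002, 0.935, 0.93300003,
                   0.93599999, 0.91299999, 0.92799997, 0.92000002, 0.91399997])

mean_acc = scores.mean()
std_acc = scores.std()  # population standard deviation (NumPy's default, ddof=0)

print("%.2f%% (+/- %.2f%%)" % (mean_acc * 100, std_acc * 100))
```

The standard deviation gives a rough sense of how much the estimate depends on which rows happen to fall in the held-out fold.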