# Neural Network Regularization

This assessment covers building and training a `tf.keras` `Sequential` model, then applying regularization. The dataset comes from a ["don't overfit" Kaggle competition](https://www.kaggle.com/c/dont-overfit-ii). There are 300 features labeled 0-299, and a target called "target". With only 250 records in total (fewer rows than features), this is a very small dataset for training a neural network.

_You can assume that the dataset has already been scaled._

In [1]:
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, cross_val_score

import tensorflow as tf
from tensorflow.keras import Sequential, regularizers
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
tf.logging.set_verbosity(tf.logging.ERROR)

In the cells below, the data is split into training and testing sets, and a neural network with three hidden layers is fit to the training set. Run the cells below to see how well the model performs.

In [2]:
df = pd.read_csv("data.csv")
df.drop("id", axis=1, inplace=True)

X = df.drop("target", axis=1)
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2020)
X_train.shape

(187, 300)
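
The 187 training rows follow from the hold-out fraction. Below is a minimal sketch of that arithmetic, assuming `train_test_split` is left at its default `test_size` of 0.25 and that the test count is rounded up. `split_sizes` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the rest goes to training.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(250)
print(n_train, n_test)  # 187 63
```

This leaves 63 rows in the holdout set, which is worth keeping in mind when reading the final accuracy scores.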

In [3]:

def build_model():
    classifier = Sequential()
    classifier.add(Dense(units=64, input_shape=(300,)))
    classifier.add(Dense(units=64))
    classifier.add(Dense(units=64))
    classifier.add(Dense(units=1, activation='sigmoid'))
    classifier.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])
    return classifier

In [4]:
def fit_and_cross_validate_model(model_func, X, y):
    """
    Given a function that builds a model, plus training X and y, report the model's
    training accuracy and its 5-fold cross-validated accuracy
    """
    # Wrap the builder that was passed in (model_func), not the global build_model
    keras_classifier = KerasClassifier(model_func, epochs=5, batch_size=50, verbose=1, shuffle=False)
    print("######################## Training cross-validated models ###########################")
    cross_val_scores = cross_val_score(keras_classifier, X, y, cv=5)
    print("########################### Training on full X_train ###############################")
    keras_classifier.fit(X, y)
    print("############################### Evaluation report ##################################")
    print("Approximate training accuracy:")
    print(accuracy_score(y, keras_classifier.predict(X)))
    print("Approximate testing accuracy:")
    print(np.mean(cross_val_scores), "+/-", np.std(cross_val_scores))
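
To make the `cv=5` step more concrete, here is a rough sketch of how 5-fold cross-validation partitions the 187 training rows. It assumes contiguous, unshuffled folds; scikit-learn's actual splitter may shuffle or stratify, so treat the exact indices as illustrative. `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) lists for k contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(187, k=5))
fold_sizes = [len(val_idx) for _, val_idx in folds]
print(fold_sizes)  # [38, 38, 37, 37, 37]
```

Each fold's model is trained on roughly 150 rows and scored on the remaining 37 or 38, which is one reason the fold-to-fold scores vary so much on this dataset.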

In [5]:
fit_and_cross_validate_model(build_model, X_train, y_train);

######################## Training cross-validated models ###########################
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
########################### Training on full X_train ###############################
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
############################### Evaluation report ##################################
Approximate training accuracy:
0.9411764705882353
Approximate testing accuracy:
0.60412517786026 +/- 0.05821661272907996


## 1) Modify the code below to use regularization


The model appears to be overfitting. To deal with this overfitting, modify the code below to include regularization in the model. You can add L1, L2, both L1 and L2, or dropout regularization.

Hint: these might be helpful

 - [`Dense` layer documentation](https://keras.io/layers/core/)
 - [`regularizers` documentation](https://keras.io/regularizers/)

In [6]:
def build_model_with_regularization():
    classifier = Sequential()
    classifier.add(Dense(units=64, input_shape=(300,), kernel_regularizer=regularizers.l2(0.0000000000000001)))
    # students might add a kernel regularizer
    classifier.add(Dense(units=64, kernel_regularizer=regularizers.l2(0.0000000000000001)))
    # students might add a dropout layer
    classifier.add(Dropout(0.8))
    classifier.add(Dense(units=64, kernel_regularizer=regularizers.l2(0.0000000000000001)))
    classifier.add(Dense(units=1, activation='sigmoid'))
    classifier.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])
    return classifier


In [7]:
fit_and_cross_validate_model(build_model_with_regularization, X_train, y_train);

######################## Training cross-validated models ###########################
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
########################### Training on full X_train ###############################
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
############################### Evaluation report ##################################
Approximate training accuracy:
0.9144385026737968
Approximate testing accuracy:
0.5560455083847046 +/- 0.041120110886002884


### Based on the cross-validated scores, did the regularization you performed help prevent overfitting? Is the first or the second model better?

In [None]:
# It may or may not have prevented overfitting, depending on random elements
# within the neural net as well as the student's choice of regularization technique
#
# (Results are not fully reproducible: random seeding with TensorFlow cannot be
# completely controlled in a Jupyter Notebook)
#
# The student should interpret the numbers they have
#
# In the example given above, a reasonable answer would be:
# The regularization did not clearly reduce overfitting.  Training accuracy fell
# slightly (about 0.94 to 0.91), but cross-validated accuracy fell too (about
# 0.60 to 0.56), so the gap between them did not narrow.  The regularization may
# even be causing some underfitting.  I think the original model is better, even
# though it is overfitting.
#
# It is also very likely that they will not have applied strong enough
# regularization to make a difference, so the scores for the two models will
# mainly differ based on random seeds
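
One way to make this comparison concrete is to compute the gap between training accuracy and cross-validated accuracy for each report. The sketch below uses the accuracies printed in the two evaluation reports above, rounded to four decimal places; `generalization_gap` is a hypothetical helper name.

```python
# Accuracies copied from the two evaluation reports, rounded to 4 places
reports = {
    "baseline": {"train": 0.9412, "cv": 0.6041},
    "regularized": {"train": 0.9144, "cv": 0.5560},
}


def generalization_gap(report):
    """Training accuracy minus cross-validated accuracy."""
    return report["train"] - report["cv"]


gaps = {name: round(generalization_gap(r), 4) for name, r in reports.items()}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.4f}")
# baseline: gap = 0.3371
# regularized: gap = 0.3584
```

A smaller gap would suggest less overfitting. Because the cross-validated scores carry a standard deviation of roughly 0.04 to 0.06, a difference of about 0.02 between the two gaps is well within run-to-run noise.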

### Now, evaluate both models on the holdout set

In [8]:
classifier_1 = build_model()
classifier_1.fit(X_train, y_train, epochs=5, verbose=1, batch_size=50, shuffle=False)

classifier_2 = build_model_with_regularization()
classifier_2.fit(X_train, y_train, epochs=5, verbose=1, batch_size=50, shuffle=False)

print("Accuracy score without regularization:", accuracy_score(y_test, classifier_1.predict_classes(X_test)))
print("Accuracy score with regularization:", accuracy_score(y_test, classifier_2.predict_classes(X_test)))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Accuracy score without regularization: 0.6031746031746031
Accuracy score with regularization: 0.5396825396825397
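
With a holdout set this small, it helps to translate the accuracies back into counts. The sketch below assumes the holdout set has 63 rows (250 records minus the 187 training rows) and uses a rough binomial approximation for the standard error. Both helper functions are hypothetical.

```python
import math

n_test = 63  # assumed: 250 records minus the 187 training rows


def correct_count(accuracy, n):
    """Recover the number of correct predictions from an accuracy score."""
    return round(accuracy * n)


def binomial_std_error(accuracy, n):
    """Rough standard error of an accuracy estimated from n independent samples."""
    return math.sqrt(accuracy * (1 - accuracy) / n)


acc_plain = 0.6031746031746031
acc_reg = 0.5396825396825397

print(correct_count(acc_plain, n_test), correct_count(acc_reg, n_test))  # 38 34
print(round(binomial_std_error(acc_plain, n_test), 3))  # 0.062
```

The two models differ by only four holdout predictions, and the accuracy difference of about 0.06 is roughly one standard error, so this result alone does not show that either model is reliably better.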


### 2) Explain how regularization is related to the bias/variance tradeoff within Neural Networks and how it's related to the results you just achieved in the training and test accuracies of the previous models. What does regularization change in the training process (be specific to what is being regularized and how it is regularizing)?


In [None]:
# Regularization helps prevent overfitting by constraining the model during training,
# for example by adding penalty terms to the cost function.  This prevents any one
# feature from having too much importance in a model.  One feature having too much
# importance can lead to overfitting (high variance).  On the other hand, too much
# regularization can lead to underfitting (high bias).
#
# The specific regularization used in the solution code is:
# L2 regularization: adds a penalty on the squared size of each layer's weights to
# the loss, which discourages large weights
# Dropout regularization: a random subset of nodes is ignored at each training step
#
# The current dataset is very small to be used with a neural network, so it's possible that
# we don't actually have enough information to create a good, generalizable model
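
The two techniques named above can be sketched in plain Python. This is an illustration of the ideas rather than what Keras runs internally: the weights, the `data_loss` value, and the helper names are all made up for the example, and the dropout shown is the common "inverted" variant that rescales the surviving activations.

```python
import random


def l2_penalty(weights, lam):
    """L2 term added to the loss: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


weights = [0.5, -2.0, 0.1]
data_loss = 0.40  # hypothetical cross-entropy value
total_loss = data_loss + l2_penalty(weights, lam=0.01)
print(round(total_loss, 4))  # 0.4426

rng = random.Random(0)
dropped = inverted_dropout([1.0] * 10, rate=0.8, rng=rng)
print(dropped)  # most entries are 0.0; survivors are scaled up to about 5.0
```

The large weight (-2.0) contributes almost all of the penalty, which is how L2 discourages any single weight from dominating. With `rate=0.8`, as in the solution code, about four out of five activations are silenced on each training step, which is a strong constraint for a 64-unit layer.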

### 3) How might L1 and dropout regularization change a neural network's architecture?

In [None]:
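# L1 regularization can push some weights to zero (or very close to it), which
# effectively eliminates those connections between nodes.  Dropout ignores a random
# subset of nodes at each training step, although all nodes are used at prediction time.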
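
The sparsity effect of L1 can be illustrated with a soft-thresholding step, contrasted with L2-style shrinkage. Note the assumption: soft-thresholding is a proximal update used here for illustration only. Keras's `regularizers.l1` instead adds a penalty to the loss, and a gradient-based optimizer may only push weights close to zero rather than exactly to zero. The weights and helper names are hypothetical.

```python
def l1_proximal_step(w, threshold):
    """Soft-thresholding: shrink w toward zero and set small values to exactly 0."""
    if w > threshold:
        return w - threshold
    if w < -threshold:
        return w + threshold
    return 0.0


def l2_shrink_step(w, factor):
    """L2-style weight decay: scale w toward zero without reaching it."""
    return w * (1.0 - factor)


weights = [0.9, -0.05, 0.02, -0.6]
l1_weights = [l1_proximal_step(w, 0.1) for w in weights]
l2_weights = [l2_shrink_step(w, 0.1) for w in weights]

n_removed = sum(1 for w in l1_weights if w == 0.0)
print(n_removed)  # 2
print(all(w != 0.0 for w in l2_weights))  # True
```

The two small weights are zeroed by the L1-style step, which is equivalent to deleting those connections, while the L2-style step keeps every connection and only makes it weaker.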