<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: [Available Here](https://lambdaschool-data-science.s3.amazonaws.com/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn+(1).csv)

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters
 
 Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?


In [116]:
import pandas as pd
import numpy as np

from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

In [117]:
# LOAD DATA
df = pd.read_csv('churn.csv')

In [118]:
# CLEAN DATA
df['gender'] = df['gender'].eq('Male').mul(1)

def yes_no(text):
    return 1 if text == 'Yes' else 0

yes_no_cols = ['Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'PaperlessBilling', 'Churn']
for col in yes_no_cols:
    df[col] = df[col].apply(yes_no)

In [119]:
# ONE HOT ENCODING
ohc = ['InternetService', 'Contract', 'PaymentMethod']
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df[ohc]).toarray()
columns = encoder.get_feature_names(ohc)
df_encoded = pd.DataFrame(encoded, columns=columns)

In [120]:
# MERGE DATAFRAMES
df = df.drop(ohc, axis=1)
df = df.drop(['customerID'], axis=1)
df = pd.concat([df, df_encoded], axis=1)

In [121]:
# OUTPUT AS NUMPY
target = 'Churn'
y = df[target].to_numpy()
X = df.drop(target, axis=1)
X['TotalCharges'] = X['TotalCharges'].replace(' ', 0.)
X = X.astype(float).to_numpy()

In [122]:
# NORMALIZE DATA
scaler = StandardScaler()
X = scaler.fit_transform(X)

In [165]:
# MODEL
def create_model(optimizer='adam', learning_rate=0.001):
    model = Sequential()
    model.add(Dense(26, input_dim=26, activation='relu'))
    model.add(Dense(26, activation='relu'))
    model.add(Dense(26, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer=optimizer, metrics=['mse', 'accuracy'])
    return model

In [159]:
model = create_model()
results = model.fit(X, y, epochs=20, verbose=False)

In [160]:
model.evaluate(X, y)



[0.11576873730023392, 0.11576871, 0.8368593]

In [161]:
def print_results(grid_result):
    print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print(f"Means: {mean}, Stdev: {stdev} with: {param}")

In [162]:
# HYPERPARAMETER TUNING, MUST USE GRID SEARCH AND CROSS VALIDATION
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.callbacks import EarlyStopping

stop = EarlyStopping(monitor='accuracy', min_delta=0.001, patience=5)

In [163]:
# BATCH SIZE
model = KerasClassifier(build_fn=create_model, verbose=False)
parameters = {
    'batch_size': [10, 20, 40, 60, 80, 100],
    'epochs': [20]
}

grid = GridSearchCV(
    estimator=model,
    param_grid=parameters,
    n_jobs=1,
    cv=3
)
grid_result = grid.fit(X, y, callbacks=[stop])
print_results(grid_result)

Best: 0.7949739694595337 using {'batch_size': 100, 'epochs': 20}
Means: 0.7802069783210754, Stdev: 0.00594349978026747 with: {'batch_size': 10, 'epochs': 20}
Means: 0.7843254804611206, Stdev: 0.003445340237015599 with: {'batch_size': 20, 'epochs': 20}
Means: 0.7884420355161031, Stdev: 0.003541267201158549 with: {'batch_size': 40, 'epochs': 20}
Means: 0.7946901321411133, Stdev: 0.00205129291787358 with: {'batch_size': 60, 'epochs': 20}
Means: 0.7925598621368408, Stdev: 0.002183593514392557 with: {'batch_size': 80, 'epochs': 20}
Means: 0.7949739694595337, Stdev: 0.0025898276908970287 with: {'batch_size': 100, 'epochs': 20}


In [152]:
# EPOCHS
model = KerasClassifier(build_fn=create_model, verbose=False)
parameters = {
    'batch_size': [10],
    'epochs': [5, 10, 20, 30, 40, 50]
}
grid_result = grid.fit(X, y, callbacks=[stop])
print_results(grid_result)

Best: 0.26537078619003296 using {'batch_size': 10, 'epochs': 20}
Means: 0.26537078619003296, Stdev: 0.004601538989727422 with: {'batch_size': 10, 'epochs': 20}
Means: 0.26537078619003296, Stdev: 0.004601538989727422 with: {'batch_size': 20, 'epochs': 20}
Means: 0.26537078619003296, Stdev: 0.004601538989727422 with: {'batch_size': 40, 'epochs': 20}
Means: 0.26537078619003296, Stdev: 0.004601538989727422 with: {'batch_size': 60, 'epochs': 20}
Means: 0.26537078619003296, Stdev: 0.004601538989727422 with: {'batch_size': 80, 'epochs': 20}
Means: 0.26537078619003296, Stdev: 0.004601538989727422 with: {'batch_size': 100, 'epochs': 20}


In [164]:
# OPTIMIZER
model = KerasClassifier(build_fn=create_model, verbose=False)
parameters = {
    'batch_size': [10],
    'optimizer': ['adam', 'nadam', 'sgd', 'adadelta', 'adagrad'],
    'epochs':[20]
}
grid = GridSearchCV(
    estimator=model,
    param_grid=parameters,
    n_jobs=1,
    cv=3
)
grid_result = grid.fit(X, y, callbacks=[stop])
print_results(grid_result)

Best: 0.7973870237668356 using {'batch_size': 10, 'epochs': 20, 'optimizer': 'sgd'}
Means: 0.7802066802978516, Stdev: 0.003907393050319533 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'adam'}
Means: 0.7789284586906433, Stdev: 0.00877427163311916 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'nadam'}
Means: 0.7973870237668356, Stdev: 0.0023548223398968332 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'sgd'}
Means: 0.6564104159673055, Stdev: 0.0697960940020746 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'adadelta'}
Means: 0.7854603926340739, Stdev: 0.0021578498765298898 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'adagrad'}


In [168]:
# LEARNING RATE
model = KerasClassifier(build_fn=create_model, verbose=False)
parameters = {
    'batch_size': [10],
    'epochs':[20],
    'optimizer': ['sgd'],
    'learning_rate': [0.001, 0.01, 0.1]

}
grid = GridSearchCV(
    estimator=model,
    param_grid=parameters,
    n_jobs=1,
    cv=3
)
grid_result = grid.fit(X, y, callbacks=[stop])
print_results(grid_result)

Best: 0.799659013748169 using {'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'optimizer': 'sgd'}
Means: 0.799659013748169, Stdev: 0.0019388835439777708 with: {'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'optimizer': 'sgd'}
Means: 0.7961095174153646, Stdev: 0.0005628553981703821 with: {'batch_size': 10, 'epochs': 20, 'learning_rate': 0.01, 'optimizer': 'sgd'}
Means: 0.7971039414405823, Stdev: 0.002576133387160183 with: {'batch_size': 10, 'epochs': 20, 'learning_rate': 0.1, 'optimizer': 'sgd'}


In [171]:
# MOMENTUM
from tensorflow.keras.optimizers import SGD

# MODEL
def create_model(momentum=0.0, learning_rate=0.001):
    model = Sequential()
    model.add(Dense(26, input_dim=26, activation='relu'))
    model.add(Dense(26, activation='relu'))
    model.add(Dense(26, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer=SGD(learning_rate=learning_rate, momentum=momentum), metrics=['mse', 'accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=False)

parameters = {
    'batch_size': [10],
    'epochs':[20],
    'learning_rate': [0.001],
    'momentum': [0.0, 0.01, 0.02, 0.05]

}
grid = GridSearchCV(
    estimator=model,
    param_grid=parameters,
    n_jobs=1,
    cv=3
)
grid_result = grid.fit(X, y, callbacks=[stop])
print_results(grid_result)

Best: 0.7706962625185648 using {'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}
Means: 0.7346291939417521, Stdev: 0.004601525031331852 with: {'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.0}
Means: 0.744992713133494, Stdev: 0.01619429611592491 with: {'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.01}
Means: 0.7706962625185648, Stdev: 0.020058926012406053 with: {'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}
Means: 0.7652935981750488, Stdev: 0.026420731922726974 with: {'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.05}


In [172]:
# ACTIVATION FUNCTIONS

def create_model(momentum=0.0, learning_rate=0.001, add_layers=1, activation_fxn='relu'):
    model = Sequential()
    model.add(Dense(26, input_dim=26, activation=activation_fxn))
    for layer in range(add_layers):
        model.add(Dense(26, activation=activation_fxn))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer=SGD(learning_rate=learning_rate, momentum=momentum), metrics=['mse', 'accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=False)

parameters = {
    'batch_size': [10],
    'epochs':[20],
    'learning_rate': [0.001],
    'momentum': [0.02],
    'activation_fxn': ['relu', 'sigmoid', 'softmax']

}
grid = GridSearchCV(
    estimator=model,
    param_grid=parameters,
    n_jobs=1,
    cv=3
)
grid_result = grid.fit(X, y, callbacks=[stop])
print_results(grid_result)

Best: 0.7824799418449402 using {'activation_fxn': 'relu', 'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}
Means: 0.7824799418449402, Stdev: 0.004707336098485002 with: {'activation_fxn': 'relu', 'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}
Means: 0.7346291939417521, Stdev: 0.004601525031331852 with: {'activation_fxn': 'sigmoid', 'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}
Means: 0.7346291939417521, Stdev: 0.004601525031331852 with: {'activation_fxn': 'softmax', 'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}


In [175]:
# NUMBER OF LAYERS
model = KerasClassifier(build_fn=create_model, verbose=False)

parameters = {
    'batch_size': [10],
    'epochs':[20],
    'learning_rate': [0.001],
    'momentum': [0.02],
    'activation_fxn': ['relu'],
    'add_layers': [1, 2, 3, 4, 5]
}
grid = GridSearchCV(
    estimator=model,
    param_grid=parameters,
    n_jobs=1,
    cv=3
)
grid_result = grid.fit(X, y, callbacks=[stop])
print_results(grid_result)

Best: 0.7718306581179301 using {'activation_fxn': 'relu', 'add_layers': 1, 'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}
Means: 0.7718306581179301, Stdev: 0.0037230276835275387 with: {'activation_fxn': 'relu', 'add_layers': 1, 'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}
Means: 0.7346293330192566, Stdev: 0.003973197106041736 with: {'activation_fxn': 'relu', 'add_layers': 2, 'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}
Means: 0.7346291939417521, Stdev: 0.004601525031331852 with: {'activation_fxn': 'relu', 'add_layers': 3, 'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}
Means: 0.7346291939417521, Stdev: 0.004601525031331852 with: {'activation_fxn': 'relu', 'add_layers': 4, 'batch_size': 10, 'epochs': 20, 'learning_rate': 0.001, 'momentum': 0.02}
Means: 0.7346291939417521, Stdev: 0.004601525031331852 with: {'activation_fxn': 'relu', 'add_layers': 5, 'batch_size': 10, 'epochs': 2

In [None]:
# NETWORK WEIGHTS

In [114]:
# DROPOUT REGULARIZATION

In [None]:
# NUMBER OF NEURONS IN HIDDEN LAYER

## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimiation tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?