<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Your Mission, should you choose to accept it...

To tune hyperparameters and extract every ounce of accuracy out of this telecom customer churn dataset: [Available Here](https://lambdaschool-data-science.s3.amazonaws.com/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn+(1).csv)

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters.
 
 Try to get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?


## Loading and Cleaning the Dataset

In [1]:
url = "https://lambdaschool-data-science.s3.amazonaws.com/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn+(1).csv"

In [172]:
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
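# Note: this wrapper was removed from newer TensorFlow releases; the SciKeras package provides a replacement.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier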

In [114]:
df = pd.read_csv(url)
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [115]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


In [116]:
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'],errors='coerce')
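
The conversion above relies on `errors='coerce'`, which replaces anything that cannot be parsed with `NaN` rather than raising. A minimal sketch of that behaviour follows. The toy series is hypothetical, and it assumes the unparseable entries are blank strings, which the notebook does not confirm.

```python
import pandas as pd

# Hypothetical stand-in for the raw TotalCharges column:
# numeric strings plus one blank entry (an assumption about the real data).
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors='coerce' turns unparseable values into NaN instead of raising.
coerced = pd.to_numeric(raw, errors="coerce")

# Count how many values were lost in the conversion.
n_missing = int(coerced.isna().sum())
print(coerced.dtype, n_missing)
```

Running `df['TotalCharges'].isna().sum()` right after the conversion would show how many rows need the later `fillna(0)`.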

In [117]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


In [118]:
df.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [119]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

le = LabelEncoder()
encoder = OneHotEncoder()

cats = ["gender", "Partner", "Dependents", "PhoneService", "MultipleLines", "InternetService",
        "OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV", 
        "StreamingMovies", "Contract", "PaperlessBilling", "PaymentMethod"]

for col in cats:
    df[col]= le.fit_transform(df[col]) 
    
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,0,0,1,0,1,0,1,0,0,...,0,0,0,0,0,1,2,29.85,29.85,No
1,5575-GNVDE,1,0,0,0,34,1,0,0,2,...,2,0,0,0,1,0,3,56.95,1889.5,No
2,3668-QPYBK,1,0,0,0,2,1,0,0,2,...,0,0,0,0,0,1,3,53.85,108.15,Yes
3,7795-CFOCW,1,0,0,0,45,0,1,0,2,...,2,2,0,0,1,0,0,42.3,1840.75,No
4,9237-HQITU,0,0,0,0,2,1,0,1,0,...,0,0,0,0,0,1,2,70.7,151.65,Yes
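
`LabelEncoder` assigns integer codes in alphabetical order (for example, "Bank transfer (automatic)" becomes 0 and "Mailed check" becomes 3), so a network can read the codes as magnitudes even though the categories have no order. The cell above imports `OneHotEncoder` but never uses it. As a sketch of one alternative, and not what this notebook does, `pd.get_dummies` gives each category its own indicator column. The toy frame below is hypothetical and uses only category values visible in the output above.

```python
import pandas as pd

# Toy frame with one nominal column; values are taken from the head() output.
toy = pd.DataFrame({
    "PaymentMethod": [
        "Electronic check",
        "Mailed check",
        "Mailed check",
        "Bank transfer (automatic)",
    ]
})

# One indicator column per category instead of a single integer code.
one_hot = pd.get_dummies(toy, columns=["PaymentMethod"])
print(one_hot.columns.tolist())
```

One-hot encoding would change the input width, so `input_dim=19` in the models below would no longer apply.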


In [120]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   int32  
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   int32  
 4   Dependents        7043 non-null   int32  
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   int32  
 7   MultipleLines     7043 non-null   int32  
 8   InternetService   7043 non-null   int32  
 9   OnlineSecurity    7043 non-null   int32  
 10  OnlineBackup      7043 non-null   int32  
 11  DeviceProtection  7043 non-null   int32  
 12  TechSupport       7043 non-null   int32  
 13  StreamingTV       7043 non-null   int32  
 14  StreamingMovies   7043 non-null   int32  
 15  Contract          7043 non-null   int32  
 16  PaperlessBilling  7043 non-null   int32  


In [121]:
# Change the Churn values to numbers
df['Churn_Num'] = df['Churn'].map(dict(Yes=1, No=0))
df['Churn_Num'].head(10)

0    0
1    0
2    1
3    0
4    1
5    1
6    0
7    0
8    1
9    0
Name: Churn_Num, dtype: int64

In [122]:
X_df = df[df.columns[1:20]]
X_df.head()

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges
0,0,0,1,0,1,0,1,0,0,2,0,0,0,0,0,1,2,29.85,29.85
1,1,0,0,0,34,1,0,0,2,0,2,0,0,0,1,0,3,56.95,1889.5
2,1,0,0,0,2,1,0,0,2,2,0,0,0,0,0,1,3,53.85,108.15
3,1,0,0,0,45,0,1,0,2,0,2,2,0,0,1,0,0,42.3,1840.75
4,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,1,2,70.7,151.65


In [123]:
for col in X_df.columns:
    print(X_df[col].value_counts(dropna=False))

1    3555
0    3488
Name: gender, dtype: int64
0    5901
1    1142
Name: SeniorCitizen, dtype: int64
0    3641
1    3402
Name: Partner, dtype: int64
0    4933
1    2110
Name: Dependents, dtype: int64
1     613
72    362
2     238
3     200
4     176
     ... 
28     57
39     56
44     51
36     50
0      11
Name: tenure, Length: 73, dtype: int64
1    6361
0     682
Name: PhoneService, dtype: int64
0    3390
2    2971
1     682
Name: MultipleLines, dtype: int64
1    3096
0    2421
2    1526
Name: InternetService, dtype: int64
0    3498
2    2019
1    1526
Name: OnlineSecurity, dtype: int64
0    3088
2    2429
1    1526
Name: OnlineBackup, dtype: int64
0    3095
2    2422
1    1526
Name: DeviceProtection, dtype: int64
0    3473
2    2044
1    1526
Name: TechSupport, dtype: int64
0    2810
2    2707
1    1526
Name: StreamingTV, dtype: int64
0    2785
2    2732
1    1526
Name: StreamingMovies, dtype: int64
0    3875
2    1695
1    1473
Name: Contract, dtype: int64
1    4171
0    2872
Name

In [124]:
# Work on an explicit copy so the assignment does not trigger SettingWithCopyWarning
X_df = X_df.copy()
X_df['TotalCharges'] = X_df['TotalCharges'].fillna(0)


In [125]:
X = X_df.to_numpy()
X.shape

(7043, 19)

In [140]:
from tensorflow.keras.utils import normalize

X_norm = normalize(X)
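
`tensorflow.keras.utils.normalize` defaults to L2 normalization along the last axis, so it rescales each row (customer) to unit length rather than rescaling each column (feature). That is why the `X_train` rows printed further down end in values close to 1: `TotalCharges` dominates each row's norm. The sketch below contrasts the two approaches in plain NumPy. The toy rows are hypothetical, built from the tenure, MonthlyCharges and TotalCharges values shown in the `head()` output.

```python
import numpy as np

# Two toy rows: (tenure, MonthlyCharges, TotalCharges).
X_toy = np.array([[34.0, 56.95, 1889.5],
                  [2.0, 53.85, 108.15]])

# Row-wise L2 normalization: each sample is divided by its own Euclidean norm.
row_l2 = X_toy / np.linalg.norm(X_toy, axis=1, keepdims=True)

# Column-wise standardization: each feature gets zero mean and unit variance.
col_std = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(row_l2)
print(col_std)
```

Per-feature scaling (for example with scikit-learn's `StandardScaler`) is the more common choice for tabular inputs. Whether it would improve the scores reported below has not been tested here.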

In [126]:
Y_df = df['Churn_Num']  # select the target by name rather than by position
Y_df.head()

0    0
1    0
2    1
3    0
4    1
Name: Churn_Num, dtype: int64

In [127]:
Y = Y_df.to_numpy()
Y.shape

(7043,)

In [141]:
print(type(X))
print(type(Y))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


In [142]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_norm, Y)

In [143]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((5282, 19), (5282,), (1761, 19), (1761,))
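
The split above uses the defaults (75/25, no fixed seed, no stratification), so the class balance and the exact rows differ on every run. A hedged sketch of a reproducible, stratified split follows. The synthetic arrays and the roughly 27% positive rate are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for X_norm and Y (hypothetical data).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 19))
y_toy = (rng.random(1000) < 0.27).astype(int)

# stratify keeps the class ratio equal in both parts;
# random_state makes the split repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=42
)

print(X_tr.shape, X_te.shape, y_tr.mean(), y_te.mean())
```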

## Baseline Keras Model

In [48]:
# X_train = np.asarray(X_train)
# X_test = np.asarray(X_test)
# y_train = np.asarray(y_train)
# y_test = np.asarray(y_test)

In [131]:
type(X_train), type(y_train)

(numpy.ndarray, numpy.ndarray)

In [144]:
X_train

array([[1.52113529e-04, 0.00000000e+00, 1.52113529e-04, ...,
        0.00000000e+00, 1.66488257e-02, 9.99819407e-01],
       [4.39686344e-04, 4.39686344e-04, 0.00000000e+00, ...,
        8.79372687e-04, 4.58812700e-02, 9.98901420e-01],
       [1.89031964e-04, 0.00000000e+00, 0.00000000e+00, ...,
        1.89031964e-04, 1.59353946e-02, 9.99799509e-01],
       ...,
       [2.58073032e-03, 0.00000000e+00, 0.00000000e+00, ...,
        2.58073032e-03, 5.08403874e-02, 9.97452270e-01],
       [3.45221882e-02, 0.00000000e+00, 0.00000000e+00, ...,
        1.03566564e-01, 6.97348201e-01, 6.97348201e-01],
       [1.21747403e-03, 0.00000000e+00, 0.00000000e+00, ...,
        0.00000000e+00, 2.38016173e-02, 9.98267831e-01]])

In [133]:
y_train

array([1, 0, 1, ..., 0, 1, 1], dtype=int64)

In [103]:
from tensorflow.keras.optimizers import Adam

# optimizer=Adam(learning_rate=.0001, clipnorm=1)

In [145]:
# Create the model
model = Sequential()
model.add(Dense(20, input_dim=19, activation='sigmoid'))
model.add(Dense(20, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))

# Compile the Model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          epochs=10)

Train on 5282 samples, validate on 1761 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x133112637b8>

### Baseline accuracy of 76%
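
An accuracy figure is easier to judge next to the score of a model that always predicts the most frequent class. The helper below is a small sketch of that check. The function name is hypothetical, and the sample labels are just the first ten `Churn_Num` values printed earlier, not the full target.

```python
import numpy as np

def majority_class_accuracy(y):
    """Accuracy obtained by always predicting the most frequent label."""
    y = np.asarray(y)
    counts = np.bincount(y)
    return counts.max() / y.size

# First ten Churn_Num values shown earlier in the notebook.
y_head = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]
print(majority_class_accuracy(y_head))
```

Calling `majority_class_accuracy(Y)` on the full target gives the floor to compare against. Two of the searches below bottom out at exactly the same score (0.7346...), which may be that floor; it is worth confirming against the label counts.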

## Tune Hyperparameters

### Batch Size

In [146]:
# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(128, input_dim=19, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_norm, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7900053024291992 using {'batch_size': 10, 'epochs': 20}
Means: 0.7900053024291992, Stdev: 0.0077126740055040175 with: {'batch_size': 10, 'epochs': 20}
Means: 0.7881592988967896, Stdev: 0.005900721595859639 with: {'batch_size': 20, 'epochs': 20}
Means: 0.78560391664505, Stdev: 0.005686729215171136 with: {'batch_size': 40, 'epochs': 20}
Means: 0.7797819614410401, Stdev: 0.0033302999175679263 with: {'batch_size': 60, 'epochs': 20}
Means: 0.7775097370147706, Stdev: 0.002985235754600714 with: {'batch_size': 80, 'epochs': 20}
Means: 0.776231837272644, Stdev: 0.004237812947991486 with: {'batch_size': 100, 'epochs': 20}


#### Best Score: 79% with batch size of 10
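
Note that the search is fit on the full `X_norm` and `Y`, so the earlier train/test split is not used here, and the number of folds is whatever scikit-learn defaults to. The reporting loop is repeated in every section below. One option, sketched here, is to put the results in a sorted DataFrame. The helper name is hypothetical, and the dictionary mimics `grid_result.cv_results_` with three rounded rows from the output above.

```python
import pandas as pd

# Hypothetical stand-in for grid_result.cv_results_ (values rounded from the output above).
cv_results = {
    "params": [{"batch_size": 10}, {"batch_size": 20}, {"batch_size": 40}],
    "mean_test_score": [0.7900, 0.7882, 0.7856],
    "std_test_score": [0.0077, 0.0059, 0.0057],
}

def summarize(cv_results):
    """Return one row per parameter setting, best mean score first."""
    table = pd.DataFrame(cv_results["params"])
    table["mean"] = cv_results["mean_test_score"]
    table["std"] = cv_results["std_test_score"]
    return table.sort_values("mean", ascending=False).reset_index(drop=True)

summary = summarize(cv_results)
print(summary)
```

With a real search, `summarize(grid_result.cv_results_)` should work the same way, since `cv_results_` exposes these three keys.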

### Optimizer

In [147]:
# Function to create model, required for KerasClassifier
def create_model(optimizer='adam'):
    # create model
    model = Sequential()
    model.add(Dense(128, input_dim=19, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid = {'batch_size': [10],
              'optimizer': ['adam', 'nadam', 'sgd', 'adadelta', 'adagrad'],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_norm, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7865974307060242 using {'batch_size': 10, 'epochs': 20, 'optimizer': 'adam'}
Means: 0.7865974307060242, Stdev: 0.004771093032551508 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'adam'}
Means: 0.7857446432113647, Stdev: 0.008061100382002955 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'nadam'}
Means: 0.7550762295722961, Stdev: 0.006709139558555549 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'sgd'}
Means: 0.734344756603241, Stdev: 0.004624123504553289 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'adadelta'}
Means: 0.7386031985282898, Stdev: 0.013451801906467574 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'adagrad'}


#### Best Score: 78.7% with batch size of 10 and adam optimizer

### Learning Rate

In [153]:
from tensorflow.keras.optimizers import Adam

# Function to create model, required for KerasClassifier
def create_model(learn_rate=0.001):
    # create model
    model = Sequential()
    model.add(Dense(128, input_dim=19, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = Adam(learning_rate=learn_rate)  # `lr` is a deprecated alias
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=10, verbose=0)

param_grid = {'learn_rate': [.001, .01, .1, .2, .3, .5],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_norm, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.793838107585907 using {'epochs': 20, 'learn_rate': 0.01}
Means: 0.7860296249389649, Stdev: 0.005652749825580089 with: {'epochs': 20, 'learn_rate': 0.001}
Means: 0.793838107585907, Stdev: 0.004856577116414651 with: {'epochs': 20, 'learn_rate': 0.01}
Means: 0.7813415408134461, Stdev: 0.008761379738999897 with: {'epochs': 20, 'learn_rate': 0.1}
Means: 0.7451325535774231, Stdev: 0.02185186759967692 with: {'epochs': 20, 'learn_rate': 0.2}
Means: 0.7354796171188355, Stdev: 0.008877530709330712 with: {'epochs': 20, 'learn_rate': 0.3}
Means: 0.7346286535263061, Stdev: 0.004757632470334884 with: {'epochs': 20, 'learn_rate': 0.5}


#### Best Score: 79.4% with learning rate of 0.01

### Learning Rate - Refined

In [154]:
from tensorflow.keras.optimizers import Adam

# Function to create model, required for KerasClassifier
def create_model(learn_rate=0.001):
    # create model
    model = Sequential()
    model.add(Dense(128, input_dim=19, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = Adam(learning_rate=learn_rate)  # `lr` is a deprecated alias
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=10, verbose=0)

param_grid = {'learn_rate': [.005, .008, .01, .012, .015],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_norm, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7914241433143616 using {'epochs': 20, 'learn_rate': 0.012}
Means: 0.7867377400398254, Stdev: 0.011807875627288647 with: {'epochs': 20, 'learn_rate': 0.005}
Means: 0.7908575773239136, Stdev: 0.005547373957795438 with: {'epochs': 20, 'learn_rate': 0.008}
Means: 0.7895797729492188, Stdev: 0.006591223171022952 with: {'epochs': 20, 'learn_rate': 0.01}
Means: 0.7914241433143616, Stdev: 0.010386277592608524 with: {'epochs': 20, 'learn_rate': 0.012}
Means: 0.7908577799797059, Stdev: 0.006617300444945898 with: {'epochs': 20, 'learn_rate': 0.015}


#### Best Score: 79.1% with learning rate of 0.012
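
The same setting (learning rate 0.01, 20 epochs) scored 0.7938 in the first pass and 0.7896 in the refined pass, so run-to-run variation is about as large as the gaps between candidates. The sketch below checks which candidates fall within one standard deviation of the best mean. The numbers are copied (rounded) from the refined search output above. The one-standard-deviation rule is a rough heuristic chosen for illustration, not a formal test.

```python
# learning rate -> (mean accuracy, standard deviation), rounded from the output above.
results = {
    0.005: (0.7867, 0.0118),
    0.008: (0.7909, 0.0055),
    0.010: (0.7896, 0.0066),
    0.012: (0.7914, 0.0104),
    0.015: (0.7909, 0.0066),
}

# Candidate with the highest mean score.
best_lr = max(results, key=lambda lr: results[lr][0])
best_mean, best_std = results[best_lr]

# Candidates whose mean lies within one standard deviation of the best mean.
indistinguishable = [
    lr for lr, (mean, _) in results.items() if best_mean - mean <= best_std
]

print(best_lr, indistinguishable)
```

By this measure none of the five refined learning rates is clearly separated from the others.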

### Epochs

In [156]:
from tensorflow.keras.optimizers import Adam

# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(128, input_dim=19, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = Adam(learning_rate=0.012)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=10, verbose=0)

param_grid = {'epochs': [10, 20, 30, 50, 70, 90]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_norm, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7979570388793945 using {'epochs': 90}
Means: 0.7878759384155274, Stdev: 0.007381748793887982 with: {'epochs': 10}
Means: 0.7941220879554749, Stdev: 0.0063861040529461285 with: {'epochs': 20}
Means: 0.7928454995155334, Stdev: 0.007058988058106029 with: {'epochs': 30}
Means: 0.7971043348312378, Stdev: 0.010771027738452541 with: {'epochs': 50}
Means: 0.7948333263397217, Stdev: 0.00907692533317782 with: {'epochs': 70}
Means: 0.7979570388793945, Stdev: 0.008097939328829 with: {'epochs': 90}


#### Best Score: 79.8% with 90 epochs

### Hidden Layers

In [161]:
from tensorflow.keras.optimizers import Adam

# Function to create model, required for KerasClassifier
def create_model(layers=0):
    # create model
    model = Sequential()
    model.add(Dense(128, input_dim=19, activation='relu'))
    for i in range(layers):
        model.add(Dense(128, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = Adam(learning_rate=0.012)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=10, epochs=90, verbose=0)

param_grid = {'layers': [1, 2, 3, 4, 5]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_norm, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7936984658241272 using {'layers': 3}
Means: 0.7918507814407348, Stdev: 0.008351878522173386 with: {'layers': 1}
Means: 0.7927035570144654, Stdev: 0.005531252800906223 with: {'layers': 2}
Means: 0.7936984658241272, Stdev: 0.014055112682957675 with: {'layers': 3}
Means: 0.7704185485839844, Stdev: 0.03360924160177946 with: {'layers': 4}
Means: 0.7831918597221375, Stdev: 0.009983455816664327 with: {'layers': 5}


#### Best Score: 79.4% with 3 additional hidden layers (4 in total)
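
In `create_model(layers=k)` the first `Dense(128)` layer is always present, so the network has `1 + k` hidden layers. To get a feel for how quickly capacity grows with depth, the sketch below counts trainable weights and biases for a fully connected network. The helper name is hypothetical.

```python
def dense_param_count(n_inputs, hidden_sizes, n_outputs=1):
    """Count weights and biases in a fully connected network."""
    total = 0
    prev = n_inputs
    for size in list(hidden_sizes) + [n_outputs]:
        total += prev * size + size  # weight matrix plus bias vector
        prev = size
    return total

# create_model(layers=k) builds 1 + k hidden layers of 128 units on 19 inputs.
for extra in (1, 3, 5):
    print(extra, dense_param_count(19, [128] * (1 + extra)))
```

The result for a given model can be compared with the total reported by `model.summary()`.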

### Number of Neurons

In [163]:
from tensorflow.keras.optimizers import Adam

# Function to create model, required for KerasClassifier
def create_model(neurons=1):
    # create model
    model = Sequential()
    model.add(Dense(neurons, input_dim=19, activation='relu'))
    for i in range(3):
        model.add(Dense(neurons, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = Adam(learning_rate=0.012)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=10, epochs=90, verbose=0)

param_grid = {'neurons': [1, 5, 10, 20, 30, 50, 75, 100, 125, 150]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_norm, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7976723313331604 using {'neurons': 30}
Means: 0.7346286535263061, Stdev: 0.004757632470334884 with: {'neurons': 1}
Means: 0.7976708054542542, Stdev: 0.0061605871232331055 with: {'neurons': 5}
Means: 0.7918492674827575, Stdev: 0.004938415827607498 with: {'neurons': 10}
Means: 0.7922765135765075, Stdev: 0.006877493055978661 with: {'neurons': 20}
Means: 0.7976723313331604, Stdev: 0.0070004884473836996 with: {'neurons': 30}
Means: 0.7939803600311279, Stdev: 0.0064479522475314405 with: {'neurons': 50}
Means: 0.7907130002975464, Stdev: 0.011018603914706616 with: {'neurons': 75}
Means: 0.79525705575943, Stdev: 0.012273841895596672 with: {'neurons': 100}
Means: 0.791282308101654, Stdev: 0.011563328942493923 with: {'neurons': 125}
Means: 0.7917088389396667, Stdev: 0.008900482798879247 with: {'neurons': 150}


#### Best Score: 79.8% with 30 neurons per layer

### Activation Function

In [164]:
from tensorflow.keras.optimizers import Adam

# Function to create model, required for KerasClassifier
def create_model(activation='relu'):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=19, activation=activation))
    for i in range(3):
        model.add(Dense(30, activation=activation))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = Adam(learning_rate=0.012)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=10, epochs=90, verbose=0)

param_grid = {'activation': ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_norm, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7959675669670105 using {'activation': 'softplus'}
Means: 0.7952586650848389, Stdev: 0.005777923000001376 with: {'activation': 'softmax'}
Means: 0.7959675669670105, Stdev: 0.0074266496194318925 with: {'activation': 'softplus'}
Means: 0.7834745407104492, Stdev: 0.009029476684623903 with: {'activation': 'softsign'}
Means: 0.7942638516426086, Stdev: 0.007312034031957119 with: {'activation': 'relu'}
Means: 0.7400225400924683, Stdev: 0.012997464581681025 with: {'activation': 'tanh'}
Means: 0.791851794719696, Stdev: 0.0070104889256460984 with: {'activation': 'sigmoid'}
Means: 0.7951163172721862, Stdev: 0.006258636696086447 with: {'activation': 'hard_sigmoid'}
Means: 0.7816280484199524, Stdev: 0.0026800755794137254 with: {'activation': 'linear'}


#### Best Score: 79.6% with the softplus activation function

### Network Weight Initialization

In [167]:
from tensorflow.keras.optimizers import Adam

# Function to create model, required for KerasClassifier
def create_model(init_mode='uniform'):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=19, kernel_initializer=init_mode, activation='softplus'))
    for i in range(3):
        model.add(Dense(30, activation='softplus'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = Adam(learning_rate=0.012)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=10, epochs=90, verbose=0)

param_grid = {'init_mode': ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_norm, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7983824610710144 using {'init_mode': 'zero'}
Means: 0.7952580571174621, Stdev: 0.01056426899793999 with: {'init_mode': 'uniform'}
Means: 0.7956850051879882, Stdev: 0.007988649361056846 with: {'init_mode': 'lecun_uniform'}
Means: 0.790289294719696, Stdev: 0.008789148208859092 with: {'init_mode': 'normal'}
Means: 0.7983824610710144, Stdev: 0.0067319531353292154 with: {'init_mode': 'zero'}
Means: 0.7963949203491211, Stdev: 0.011529374921576763 with: {'init_mode': 'glorot_normal'}
Means: 0.7911394357681274, Stdev: 0.007590751519553474 with: {'init_mode': 'glorot_uniform'}
Means: 0.7962528824806213, Stdev: 0.009652582044656802 with: {'init_mode': 'he_normal'}
Means: 0.7925616264343261, Stdev: 0.006058667320114655 with: {'init_mode': 'he_uniform'}


#### Best Score: 79.8% with zero weight initialization (applied to the first layer only)
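
Zero initialization usually prevents hidden units from learning different features. In the code above, though, `init_mode` is passed only to the first `Dense` layer and the later layers keep their default initializer. The sketch below is a tiny hand-written network (4 inputs, 3 hidden units, random second layer), not the notebook's model. It illustrates one plausible reason the search did not penalize `zero`: the hidden activations start out identical, but the first-layer gradients already differ per unit because the downstream weights differ.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)         # one hypothetical input sample
y = 1.0                        # its label

W1 = np.zeros((4, 3))          # first layer initialized to zero
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))   # later layer keeps a random initializer

def softplus(z):
    return np.log1p(np.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
z1 = x @ W1 + b1
a1 = softplus(z1)              # identical across hidden units because W1 is zero
p = sigmoid(a1 @ W2)[0]

# Backward pass for binary cross-entropy
dz2 = p - y
da1 = dz2 * W2[:, 0]
dz1 = da1 * sigmoid(z1)        # derivative of softplus is sigmoid
grad_W1 = np.outer(x, dz1)     # columns differ because W2 entries differ

print(a1)
print(grad_W1)
```

Given the run-to-run variation seen in the learning-rate searches, the small lead of `zero` over the other initializers may also be noise.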

### Dropout Rate

In [173]:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.constraints import MaxNorm

# Function to create model, required for KerasClassifier
def create_model(dropout_rate=0.0, weight_constraint=0):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=19, kernel_initializer='zero', activation='softplus', kernel_constraint=MaxNorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    for i in range(3):
        model.add(Dense(30, activation='softplus'))
        model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = Adam(learning_rate=0.012)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=10, epochs=90, verbose=0)

param_grid = {'weight_constraint': [1, 2, 3, 4, 5],
              'dropout_rate': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_norm, Y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7709796905517579 using {'dropout_rate': 0.1, 'weight_constraint': 5}
Means: 0.750105345249176, Stdev: 0.015630362670744462 with: {'dropout_rate': 0.0, 'weight_constraint': 1}
Means: 0.7684233665466309, Stdev: 0.0053031211270675605 with: {'dropout_rate': 0.0, 'weight_constraint': 2}
Means: 0.7678556919097901, Stdev: 0.00458901024755599 with: {'dropout_rate': 0.0, 'weight_constraint': 3}
Means: 0.7684236764907837, Stdev: 0.0054214621678142835 with: {'dropout_rate': 0.0, 'weight_constraint': 4}
Means: 0.7618924021720886, Stdev: 0.013481438200899887 with: {'dropout_rate': 0.0, 'weight_constraint': 5}
Means: 0.7461310148239135, Stdev: 0.015387894859408604 with: {'dropout_rate': 0.1, 'weight_constraint': 1}
Means: 0.7601870536804199, Stdev: 0.004924949478876485 with: {'dropout_rate': 0.1, 'weight_constraint': 2}
Means: 0.7671448588371277, Stdev: 0.005949138590095312 with: {'dropout_rate': 0.1, 'weight_constraint': 3}
Means: 0.7664349317550659, Stdev: 0.007492221077262266 with: {'drop

#### All of these were terrible! The best was 77.1%, below the 79.8% reached without dropout, so I'm not using them.
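
This last search is also by far the most expensive one in the notebook, because grid search trains every combination once per fold. The sketch below counts the model fits involved. The fold count of 5 is an assumption (the default in recent scikit-learn versions when `cv` is not passed); the notebook does not state which version was used.

```python
from itertools import product

param_grid = {
    "weight_constraint": [1, 2, 3, 4, 5],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}
n_folds = 5   # assumption: scikit-learn's default in recent versions
epochs = 90   # as set on the KerasClassifier above

# Every combination of the two lists is one candidate.
n_candidates = len(list(product(*param_grid.values())))
n_fits = n_candidates * n_folds
total_epochs = n_fits * epochs

print(n_candidates, n_fits, total_epochs)
```

A coarser dropout grid, or a random search over the same ranges, would cut this cost considerably.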

## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in Keras and tune their hyperparameters with cross validation?
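
As a starting point for the random search stretch goal, scikit-learn's `ParameterSampler` draws a fixed number of candidate settings from a search space, so several hyperparameters can be varied at once instead of one at a time. The search space below is a hypothetical mix of values tried earlier in this notebook. Training and scoring each candidate is left out of the sketch.

```python
from sklearn.model_selection import ParameterSampler

# Hypothetical search space assembled from values explored above.
param_distributions = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "learn_rate": [0.001, 0.005, 0.01, 0.012, 0.05],
    "neurons": [5, 10, 20, 30, 50, 100],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3],
}

# Draw 8 of the 720 possible combinations; random_state makes the draw repeatable.
candidates = list(ParameterSampler(param_distributions, n_iter=8, random_state=0))

for params in candidates:
    print(params)
```

`RandomizedSearchCV` wraps the same sampling together with cross validation and takes the same `param_distributions` argument.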