# Bank Customer Churn Using Frameworks

## Introduction

The dataset is for an international bank with millions of clients, mostly in Spain, France, and Germany, but also throughout all of Europe. The bank decided to take action after noticing that customer attrition rates had started to rise relative to the typical rate over the previous six months. The bank made the decision to select a random sample of 10,000 of its clients in order to gather some data.

They observed the behavior of these 10,000 customers over a period of six months and examined who remained in the bank and who left. They therefore want us to create a model that can calculate the likelihood that a customer will leave the bank.

The objective of this work is to develop a geodemographic segmentation model that will inform the bank of which clients are most likely to depart.



## Data Preprocessing

I will start by importing the necessary libraries.

In [17]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.colors import n_colors
from plotly.subplots import make_subplots
import copy
import scipy
import random
import math
!pip install scikeras
from scikeras.wrappers import KerasClassifier
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
# Feature Scaling (very important)
from sklearn.preprocessing import StandardScaler
#Import Keras library and packages
import keras
import sys
from keras.models import Sequential #to initialize NN
from keras.layers import Dense #used to create layers in NN
from keras import regularizers

from sklearn.metrics import (accuracy_score, f1_score, average_precision_score, confusion_matrix,
                             precision_score, recall_score, roc_auc_score)
from mlxtend.plotting import plot_confusion_matrix



In [18]:
df = pd.read_csv('Bank_Customer_Churn_Prediction.csv')

In [19]:
df.head()

| | customer_id | credit_score | country | gender | age | tenure | balance | products_number | credit_card | active_member | estimated_salary | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15634602 | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 15647311 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 15619304 | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 15701354 | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 15737888 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customer_id       10000 non-null  int64  
 1   credit_score      10000 non-null  int64  
 2   country           10000 non-null  object 
 3   gender            10000 non-null  object 
 4   age               10000 non-null  int64  
 5   tenure            10000 non-null  int64  
 6   balance           10000 non-null  float64
 7   products_number   10000 non-null  int64  
 8   credit_card       10000 non-null  int64  
 9   active_member     10000 non-null  int64  
 10  estimated_salary  10000 non-null  float64
 11  churn             10000 non-null  int64  
dtypes: float64(2), int64(8), object(2)
memory usage: 937.6+ KB


In [21]:
df.describe()

| | customer_id | credit_score | age | tenure | balance | products_number | credit_card | active_member | estimated_salary | churn |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 15690940.0 | 650.5288 | 38.9218 | 5.0128 | 76485.889288 | 1.5302 | 0.7055 | 0.5151 | 100090.239881 | 0.2037 |
| std | 71936.19 | 96.653299 | 10.487806 | 2.892174 | 62397.405202 | 0.581654 | 0.45584 | 0.499797 | 57510.492818 | 0.402769 |
| min | 15565700.0 | 350.0 | 18.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 11.58 | 0.0 |
| 25% | 15628530.0 | 584.0 | 32.0 | 3.0 | 0.0 | 1.0 | 0.0 | 0.0 | 51002.11 | 0.0 |
| 50% | 15690740.0 | 652.0 | 37.0 | 5.0 | 97198.54 | 1.0 | 1.0 | 1.0 | 100193.915 | 0.0 |
| 75% | 15753230.0 | 718.0 | 44.0 | 7.0 | 127644.24 | 2.0 | 1.0 | 1.0 | 149388.2475 | 0.0 |
| max | 15815690.0 | 850.0 | 92.0 | 10.0 | 250898.09 | 4.0 | 1.0 | 1.0 | 199992.48 | 1.0 |


In [22]:
df.columns

Index(['customer_id', 'credit_score', 'country', 'gender', 'age', 'tenure',
       'balance', 'products_number', 'credit_card', 'active_member',
       'estimated_salary', 'churn'],
      dtype='object')

### Handling Categorical Data

In [23]:
df = pd.get_dummies(df, columns = ['country', 'gender'])

In [24]:
df.columns

Index(['customer_id', 'credit_score', 'age', 'tenure', 'balance',
       'products_number', 'credit_card', 'active_member', 'estimated_salary',
       'churn', 'country_France', 'country_Germany', 'country_Spain',
       'gender_Female', 'gender_Male'],
      dtype='object')

### Separate the Data
Let's separate the predictors X from the target variable y.

In [25]:
X = df[['credit_score', 'age', 'tenure', 'balance',
       'products_number', 'credit_card', 'active_member', 'estimated_salary','country_France', 'country_Germany', 'country_Spain',
       'gender_Female', 'gender_Male']].values
y = df[['churn']].values

In [26]:
print(X[:10,:], '\n')
print(y[:10])

[[6.1900000e+02 4.2000000e+01 2.0000000e+00 0.0000000e+00 1.0000000e+00
  1.0000000e+00 1.0000000e+00 1.0134888e+05 1.0000000e+00 0.0000000e+00
  0.0000000e+00 1.0000000e+00 0.0000000e+00]
 [6.0800000e+02 4.1000000e+01 1.0000000e+00 8.3807860e+04 1.0000000e+00
  0.0000000e+00 1.0000000e+00 1.1254258e+05 0.0000000e+00 0.0000000e+00
  1.0000000e+00 1.0000000e+00 0.0000000e+00]
 [5.0200000e+02 4.2000000e+01 8.0000000e+00 1.5966080e+05 3.0000000e+00
  1.0000000e+00 0.0000000e+00 1.1393157e+05 1.0000000e+00 0.0000000e+00
  0.0000000e+00 1.0000000e+00 0.0000000e+00]
 [6.9900000e+02 3.9000000e+01 1.0000000e+00 0.0000000e+00 2.0000000e+00
  0.0000000e+00 0.0000000e+00 9.3826630e+04 1.0000000e+00 0.0000000e+00
  0.0000000e+00 1.0000000e+00 0.0000000e+00]
 [8.5000000e+02 4.3000000e+01 2.0000000e+00 1.2551082e+05 1.0000000e+00
  1.0000000e+00 1.0000000e+00 7.9084100e+04 0.0000000e+00 0.0000000e+00
  1.0000000e+00 1.0000000e+00 0.0000000e+00]
 [6.4500000e+02 4.4000000e+01 8.0000000e+00 1.1375578e+

### Splitting the data
In order to train our model and later test its accuracy, we need to split the data into two datasets. For an ANN, feature scaling is very important: it puts all inputs in a comparable range, so that differences in the predicted value come from the weights assigned to the inputs rather than from the units in which the inputs are measured.
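
To make the scaling step concrete, here is a minimal sketch of standardization in plain Python. It assumes a single made-up feature column (`train_col`, `test_col` are hypothetical values, not rows from the dataset) and mirrors the idea behind `StandardScaler`: the mean and standard deviation are estimated on the training data only and then reused for the test data.

```python
import math

def fit_standardizer(values):
    """Return the mean and population standard deviation of a column."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)

def standardize(values, mean, std):
    """Scale values using the given mean and standard deviation."""
    return [(v - mean) / std for v in values]

# Hypothetical credit scores, for illustration only
train_col = [600.0, 650.0, 700.0]
test_col = [650.0, 800.0]

mean, std = fit_standardizer(train_col)
train_scaled = standardize(train_col, mean, std)
test_scaled = standardize(test_col, mean, std)  # reuses the training statistics
```

Reusing the training statistics on the test column is the reason the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.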

In [27]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling (very important)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

## Creating Deep Neural Network

### Initialize the weights
I will try "He Initialization"; this is named for the first author of He et al., 2015. (If you have heard of "Xavier initialization", this is similar except Xavier initialization uses a scaling factor for the weights $W^{[l]}$ of `sqrt(1./layers_dims[l-1])` where He initialization would use `sqrt(2./layers_dims[l-1])`.)

**Initialize_parameters_he**

He initialization is similar to plain random initialization. The only difference is that instead of multiplying `np.random.randn(..,..)` by a fixed constant such as 10, you multiply it by $\sqrt{\frac{2}{\text{dimension of the previous layer}}}$, which is what He initialization recommends for layers with a ReLU activation.
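
As a rough sketch of the rule above, the function below draws standard normal values and scales them by $\sqrt{2 / n_{prev}}$. It uses only the standard library so that it runs on its own; the `seed` argument and the dictionary layout (`W1`, `b1`, ...) are my own assumptions for illustration. Note that the Keras option used later, `'he_uniform'`, draws from a uniform distribution with the same variance rather than from a normal one.

```python
import math
import random

def initialize_parameters_he(layers_dims, seed=0):
    """He initialization: W[l] ~ N(0, 1) * sqrt(2 / n_prev), biases set to zero.

    layers_dims -- list with the size of each layer, input layer first
    """
    rng = random.Random(seed)
    parameters = {}
    for l in range(1, len(layers_dims)):
        n_prev, n_curr = layers_dims[l - 1], layers_dims[l]
        scale = math.sqrt(2.0 / n_prev)
        # One row of weights per unit in the current layer
        parameters[f"W{l}"] = [[rng.gauss(0.0, 1.0) * scale for _ in range(n_prev)]
                               for _ in range(n_curr)]
        parameters[f"b{l}"] = [0.0] * n_curr
    return parameters

# 13 input features, two hidden layers of 6 units, 1 output unit
params = initialize_parameters_he([13, 6, 6, 1])
```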

**Note:**
I have a notebook where I compare different initialization methods: Link

### Regularization Technique: L2
To use it in TensorFlow (Keras), we need to set `kernel_regularizer='l2'` on the layer.
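
As a rough illustration of what the L2 penalty adds to the loss, the sketch below sums the squared weights of a layer and scales the sum by a coefficient. The coefficient `0.01` is, to my understanding, the default Keras applies when the string `'l2'` is passed; the weight matrix and the data loss are made up for the example.

```python
def l2_penalty(weights, coeff=0.01):
    """L2 penalty: coeff * sum of squared weights (weights is a list of rows)."""
    return coeff * sum(w * w for row in weights for w in row)

# Hypothetical 2x3 weight matrix and data loss, for illustration only
W = [[0.5, -1.0, 2.0],
     [0.0, 1.5, -0.5]]
data_loss = 0.40

total_loss = data_loss + l2_penalty(W)
```

Because large weights increase the total loss, the optimizer is pushed toward smaller weights, which is the regularizing effect.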

### Batch Normalization
Related to what we learnt in class, Batch Norm (BN) is one of the most exciting innovations in optimizing deep neural networks. It is not an optimization algorithm but a method of adaptive re-parametrization, motivated by the difficulty of training very deep models. It reduces the problem of internal covariate shift and of coordinating updates across many layers.
In my model, I will use BN after every hidden layer.

To use BN in TensorFlow, I just need to add `BatchNormalization()` after the layer concerned.
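
To show what the layer computes during training, here is a minimal sketch for a single hidden unit over one mini-batch. The names `gamma`, `beta`, and `eps` and their default values are assumptions for the example; in Keras, `gamma` and `beta` are learned, and at inference time moving averages replace the batch statistics.

```python
import math

def batch_norm_forward(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one unit's activations over a mini-batch, then scale and shift."""
    mu = sum(batch) / len(batch)
    var = sum((x - mu) ** 2 for x in batch) / len(batch)
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in batch]
    return [gamma * xh + beta for xh in x_hat]

# Hypothetical activations of one hidden unit for a mini-batch of 4 customers
z = [2.0, 4.0, 6.0, 8.0]
z_norm = batch_norm_forward(z)
```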

### Features
Each feature of an observation is fed to one node of the input layer. With thirteen independent variables, the input layer therefore has thirteen nodes.

### Input Layer
**units:** number of nodes in the first hidden layer.

**kernel_initializer:** the initialization method. As I said, I will use He initialization.

**activation:** the activation function.

**input_dim:** number of nodes in the input layer that the hidden layer should expect.

In [28]:
from tensorflow.keras.layers import BatchNormalization
#Initialising the Model - Defining as a sequence of layers or a Graph
classifier = Sequential()
#Input Layer
classifier.add(Dense(units = 6, kernel_initializer = 'he_uniform',kernel_regularizer='l2', activation = 'relu', input_dim = 13 ))
classifier.add(BatchNormalization())

### Hidden Layers
From the input to the output the neurons are activated, and the impact they have in the predicted results is measured by the assigned weights. Depending on the number of hidden layers, the system propagates the activation until getting the predicted result y.

For the hidden layers, I will use **ReLU** as the activation function, **He initialization**, and, of course, **BN**.

In [29]:
#Second Hidden Layer
classifier.add(Dense(units = 6, kernel_initializer = 'he_uniform',kernel_regularizer='l2', activation = 'relu'))
classifier.add(BatchNormalization())

#Third Hidden Layer
classifier.add(Dense(units = 6, kernel_initializer = 'he_uniform',kernel_regularizer='l2', activation = 'relu'))
classifier.add(BatchNormalization())

#The Fourth Hidden Layer
classifier.add(Dense(units = 6, kernel_initializer = 'he_uniform',kernel_regularizer='l2', activation = 'relu'))
classifier.add(BatchNormalization())

### Output Layer


In [30]:
classifier.add(Dense(units = 1, kernel_initializer = 'he_uniform',kernel_regularizer='l2', activation = 'sigmoid'))
#classifier.add(BatchNormalization())

### Optimize the Model
**optimizer:** algorithm to use to find the best weights that will make our model powerful

**loss:** Loss function within our optimizer algorithm

**metric:** criteria to evaluate the model

In this case I will use SGD with momentum as the optimizer, binary cross-entropy as the loss, and accuracy as the metric.

In [31]:
optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
classifier.compile(optimizer = optimizer,loss= "binary_crossentropy",metrics=["accuracy"])
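
To illustrate what the momentum term does, here is a minimal sketch of the update on a toy one-dimensional problem. It follows the rule I understand Keras to use for `SGD` with `momentum > 0` (a velocity that accumulates past gradients); the objective function and the number of steps are made up for the example.

```python
def sgd_momentum(grad_fn, w0, learning_rate=0.01, momentum=0.9, steps=500):
    """Minimize a 1-D function with SGD plus momentum."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        g = grad_fn(w)
        # The velocity keeps a decaying sum of past gradient steps
        velocity = momentum * velocity - learning_rate * g
        w = w + velocity
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
```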

### Fitting the Model to Training Set
**batch_size:** number of observations after which we update the weights. In fact, we are using **mini-batch gradient descent**

**epochs:** number of complete passes over the training set
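
As a quick worked example of how these two settings interact, the arithmetic below counts the weight updates implied by the values used in this notebook (10,000 rows, a 20% test split, `batch_size = 64`, `epochs = 100`). The variable names are mine.

```python
import math

n_samples = 10_000   # rows in the dataset
batch_size = 64
epochs = 100

n_train = n_samples - n_samples // 5                  # 20% of the rows are held out for testing
updates_per_epoch = math.ceil(n_train / batch_size)   # the last batch may be smaller
total_updates = updates_per_epoch * epochs
```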

In [32]:
classifier.fit(X_train, y_train, batch_size = 64, epochs = 100)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x7f7ac8695f90>

### Making Predictions
I've trained the model, and now it is time to see how well it predicts future churn on the test set.

In [33]:
#Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred_train = classifier.predict(X_train)
#Threshold of 50%
y_pred = (y_pred > 0.5)
y_pred_train = (y_pred_train>0.5)



### Confusion Matrix

In [34]:
cm = confusion_matrix(y_test, y_pred)
cm

array([[1547,   48],
       [ 226,  179]])

In [35]:
accuracy_score(y_test,y_pred)

0.863

In [36]:
accuracy_score(y_train,y_pred_train)

0.858375

## Result Discussion
In the above code, I managed to build a deep neural network with **4 Hidden Layers**. I used mini-batch gradient descent with momentum, and the result was acceptable: 0.863 accuracy on the test set against 0.858 on the training set. Furthermore, I managed to implement Batch Normalization as well as the L2 regularization technique.

It is time to make the code more organized by creating helper functions. Moreover, I will run multiple models with different optimization algorithms. Finally, I will tune the hyperparameters and compare the performance of each model.

## Create Model with Different Optimization Algorithms
The following helper function automates the creation of the model by letting us choose the values of the different parameters.

In [89]:
#Function to create model 

def model_create(X, Y, layers_dims, opt, lr = 0.01, mini_batch_size = 64, beta = 0.9, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8, num_epochs = 100):
  """
  Arguments:
    X -- input data, of shape (number of examples, number of features)
    Y -- true "label" vector (1 for churn / 0 for not churn), of shape (number of examples, 1)
    layers_dims -- python list, containing the size of each Dense layer added before the output layer
    opt -- the optimizer to be used: 'momentum', 'adam', or 'rmsprop'
    lr -- the learning rate, scalar.
    mini_batch_size -- the size of a mini batch
    beta -- Momentum hyperparameter
    beta1 -- Exponential decay hyperparameter for the past gradients estimates 
    beta2 -- Exponential decay hyperparameter for the past squared gradients estimates (also passed to RMSprop as rho)
    epsilon -- hyperparameter preventing division by zero in Adam updates
    num_epochs -- number of epochs

  Returns:
    model -- the fitted Keras model
  """

  model = Sequential()
  #First hidden layer (it also declares the input dimension)
  model.add(Dense(units = layers_dims[0], kernel_initializer = 'he_uniform', kernel_regularizer = 'l2', activation = 'relu', input_dim = X.shape[1]))
  model.add(BatchNormalization())
  L = len(layers_dims)

  #One Dense + BatchNormalization block for each remaining entry of layers_dims
  #(with layers_dims = [6,6,6,1], the last block is a single ReLU unit placed before the output layer)
  for i in range(1, L):
    model.add(Dense(units = layers_dims[i], kernel_initializer = 'he_uniform', kernel_regularizer = 'l2', activation = 'relu'))
    model.add(BatchNormalization())

  #Output Layer
  model.add(Dense(units = 1, kernel_initializer = 'he_uniform', kernel_regularizer = 'l2', activation = 'sigmoid'))

  #Optimization algorithms
  if opt == 'momentum':
    optimizer = keras.optimizers.SGD(learning_rate = lr, momentum = beta)
  elif opt == 'adam':
    optimizer = keras.optimizers.Adam(learning_rate = lr, beta_1 = beta1, beta_2 = beta2, epsilon = epsilon)
  elif opt == 'rmsprop':
    optimizer = keras.optimizers.RMSprop(learning_rate = lr, rho = beta2)
  else:
    raise ValueError("opt must be 'momentum', 'adam', or 'rmsprop'")
  model.compile(optimizer = optimizer, loss = "binary_crossentropy", metrics = ["accuracy"])

  #Fit the model
  model.fit(X, Y, batch_size = mini_batch_size, epochs = num_epochs)

  return model

## Prediction Function

In [38]:
def predicts(model, train_set, test_set):
  """
  Using the learned parameters, predicts a class for each example in the training and testing sets

  Arguments:
  model -- the model that was created by model_create()
  train_set -- the training set that was used to train the model
  test_set -- the testing set that is used to test the model

  Returns:
  y_pred -- vector of predictions of our model using testing data set
  y_pred_train -- vector of predictions of our model using training data set
  """
  #Predicted churn probabilities for the test and training sets
  y_pred = model.predict(test_set)
  y_pred_train = model.predict(train_set)
  #Threshold of 50%
  y_pred = (y_pred > 0.5)
  y_pred_train = (y_pred_train > 0.5)
  return y_pred, y_pred_train

## Confusion Matrix
How can we measure the effectiveness of our model? The better the effectiveness, the better the performance, which is exactly what we want. This is where the confusion matrix comes in.
A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. For a binary problem, it is a table with 4 different combinations of predicted and actual values.

**True Positive:**

    - Interpretation: You predicted positive and it’s true.

    - You predicted that a woman is pregnant and she actually is.

**True Negative:**

    - Interpretation: You predicted negative and it’s true.

    - You predicted that a man is not pregnant and he actually is not.

**False Positive: (Type 1 Error)**

    - Interpretation: You predicted positive and it’s false.

    - You predicted that a man is pregnant but he actually is not.

**False Negative: (Type 2 Error)**

    - Interpretation: You predicted negative and it’s false.

    - You predicted that a woman is not pregnant but she actually is.
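
Applied to this notebook, "positive" means "churned". The sketch below reads the four counts from the confusion matrix reported earlier for the first model and derives accuracy, precision, and recall from them. It assumes the layout that scikit-learn's `confusion_matrix` uses for labels 0 and 1: rows are actual classes, columns are predicted classes, so the matrix is `[[TN, FP], [FN, TP]]`.

```python
# Confusion matrix of the first model (rows: actual, columns: predicted)
cm = [[1547, 48],
      [226, 179]]

tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted churners, how many actually churned
recall = tp / (tp + fn)      # of the actual churners, how many were caught
```

Here recall is well below accuracy, because most of the correct predictions come from the large "not churn" class.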

In [39]:
#Function: Evaluate
def evaluate(y_pred, y_pred_train, y_test, y_train):
  """
  Arguments:
  y_pred -- vector of predictions of our model using testing data set
  y_pred_train -- vector of predictions of our model using training data set
  y_test -- the true value from testing set
  y_train -- the true value from training set
  """

  cm = confusion_matrix(y_test, y_pred)
  print('The Confusion Matrix of the Model: \n')
  print(cm)
  print('-------------------')
  print('Train Accuracy: ',accuracy_score(y_train,y_pred_train))
  print('Test Accuracy: ',accuracy_score(y_test,y_pred))

## Test Cases
 
It's time to run the model and see how it performs on the bank churn dataset. The following code tests the model with multiple hidden layers of $n_h$ hidden units each, using different optimization algorithms.

### Run the Model with Adam

In [90]:
layers_dims = [6,6,6,1]
model  = model_create(X_train, y_train, layers_dims,opt='adam')
model_c = KerasClassifier(model=model, verbose=0)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78



In [91]:
y_pred, y_pred_train = predicts(model,X_train, X_test)



In [92]:
evaluate(y_pred, y_pred_train, y_test, y_train)

The Confusion Matrix of the Model: 

[[1494  101]
 [ 213  192]]
-------------------
Train Accuracy:  0.849875
Test Accuracy:  0.843


### Run the Model with Momentum

In [135]:
layers_dims = [6,6,6,1]
model_momentum  = model_create(X_train, y_train, layers_dims,opt='momentum')
model_c_momentum = KerasClassifier(model=model_momentum, verbose=0)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78



In [44]:
y_pred, y_pred_train = predicts(model_momentum,X_train, X_test)



In [45]:
evaluate(y_pred, y_pred_train, y_test, y_train)

The Confusion Matrix of the Model: 

[[1528   67]
 [ 204  201]]
-------------------
Train Accuracy:  0.86375
Test Accuracy:  0.8645


### Run the Model with RMSProp

In [46]:
layers_dims = [6,6,6,1]
model_rmsprop  = model_create(X_train, y_train, layers_dims,opt='rmsprop')
model_c_rmsprop = KerasClassifier(model=model_rmsprop, verbose=0)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

In [47]:
y_pred, y_pred_train = predicts(model_rmsprop,X_train, X_test)



In [48]:
evaluate(y_pred, y_pred_train, y_test, y_train)

The Confusion Matrix of the Model: 

[[1595    0]
 [ 405    0]]
-------------------
Train Accuracy:  0.796
Test Accuracy:  0.7975


### Comparing Results: Momentum Vs Adam Vs RMSProp
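The three models did not perform equally. **Momentum** gave the best performance so far (test accuracy 0.8645), followed by Adam (0.843). In both cases the train and test accuracies are close, so there is no sign of overfitting. The RMSProp model, however, predicted "no churn" for every customer in the test set, so its test accuracy of 0.7975 is just the share of non-churners (1595 of 2000). Accuracy also hides the missed churners: even the momentum model caught only 201 of the 405 customers who left. Next, I will use randomized search to tune different hyperparameters and then compare the performance of the different models.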

## Tuning Hyperparameters
### Let's Test How Hyperparameter Tuning Works

In [49]:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV
from keras.models import Sequential
from keras.layers import Dense

In [50]:
def build_classifier(optimizer):
    classifier = Sequential()
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 13))
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
    classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    classifier.compile(optimizer = optimizer, loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier
classifier = KerasClassifier(build_fn = build_classifier)

  


In [56]:
# Parameters for batch size and optimizer functions
parameters = {'batch_size': [16, 32],
              'optimizer': ['adam', 'rmsprop']}

In [57]:
# Setting up Randomized Search
random_grid_search = RandomizedSearchCV(estimator = classifier,
                           param_distributions = parameters,
                           scoring = 'accuracy',
                           cv = 10)
grid_search = random_grid_search.fit(X_train, y_train)
best_parameters = random_grid_search.best_params_
best_accuracy = random_grid_search.best_score_
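
As a rough sketch of how much work this search does, the snippet below enumerates the parameter combinations and counts the fits. It assumes the `RandomizedSearchCV` default of `n_iter = 10` sampled settings and sampling without replacement when every parameter is given as a list; it only mimics the bookkeeping and does not train anything.

```python
import itertools
import random

parameters = {'batch_size': [16, 32],
              'optimizer': ['adam', 'rmsprop']}
cv = 10
n_iter = 10  # assumed default number of sampled settings

# Every combination in the grid
names = sorted(parameters)
combos = [dict(zip(names, values))
          for values in itertools.product(*(parameters[n] for n in names))]

# Sample without replacement; a grid smaller than n_iter is simply exhausted
rng = random.Random(0)
sampled = rng.sample(combos, min(n_iter, len(combos)))
total_fits = len(sampled) * cv
```

With only 4 combinations, every setting is tried, so the randomized search behaves like an exhaustive grid search here.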





In [58]:
# Getting our best parameters
best_parameters

{'optimizer': 'adam', 'batch_size': 16}

In [59]:
# Getting our best average
print('Best average: ', ('%.4f' % best_accuracy))

Best average:  0.7960


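Try another set of values for the hyperparameters. Note that `'momentum'` is not an optimizer name that Keras recognizes (plain SGD is `'sgd'`), which is consistent with the failures reported below: 30 of the 90 fits, i.e. 3 batch sizes × 10 folds.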

In [60]:
# Parameters for batch size and optimizer functions
parameters = {'batch_size': [16, 32,64],
              'optimizer': ['adam', 'momentum','rmsprop']}

In [61]:
# Setting up Randomized Search
random_grid_search = RandomizedSearchCV(estimator = classifier,
                           param_distributions = parameters,
                           scoring = 'accuracy',
                           cv = 10)
grid_search = random_grid_search.fit(X_train, y_train)
best_parameters = random_grid_search.best_params_
best_accuracy = random_grid_search.best_score_





30 fits failed out of a total of 90.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
30 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.7/dist-packages/keras/wrappers/scikit_learn.py", line 236, in fit
    return super(KerasClassifier, self).fit(x, y, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/wrappers/scikit_learn.py", line 155, in fit
    self.model = self.build_fn(**self.filter_sk_params(self.build_fn))
  File "<ipython-input-50-76d0ebaa6485>", line 6, in build_classifier
    classifier.compile(op



In [62]:
# Getting our best parameters
best_parameters

{'optimizer': 'adam', 'batch_size': 16}

In [63]:
# Getting our best average
print('Best average: ', ('%.4f' % best_accuracy))

Best average:  0.7960


Tune Learning Rate and Momentum

In [94]:
def build_classifier(lr, momentum):
    classifier_m = Sequential()
    classifier_m.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 13))
    classifier_m.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
    classifier_m.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    classifier_m.compile(optimizer = keras.optimizers.SGD(learning_rate = lr, momentum = momentum), loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier_m
classifier_m = KerasClassifier(build_fn = build_classifier)

learn_rate = [0.001, 0.01, 0.1]
momentum = [0.9, 0.99, 0.999]
parameters = dict(lr=learn_rate,momentum=momentum)

  


In [95]:
# Setting up Randomized Search
random_grid_search = RandomizedSearchCV(estimator = classifier_m,
                           param_distributions = parameters,
                           scoring = 'accuracy',
                           cv = 10)
random_grid_search = random_grid_search.fit(X_train, y_train)
best_parameters = random_grid_search.best_params_
best_accuracy = random_grid_search.best_score_





In [96]:
# Getting our best parameters
best_parameters

{'momentum': 0.9, 'lr': 0.001}

In [97]:
# Getting our best average
print('Best average: ', ('%.4f' % best_accuracy))

Best average:  0.7960


### Let's Try Tuning Hyperparameters on our Models

#### Tuning Adam Model

In [107]:
#Function to create model with adam 

def build_adam(learning_rate,beta_1, beta_2,  epsilon):
  """
  Arguments:
    learning_rate -- the learning rate, scalar.
    beta_1 -- Exponential decay hyperparameter for the past gradients estimates 
    beta_2 -- Exponential decay hyperparameter for the past squared gradients estimates 
    epsilon -- hyperparameter preventing division by zero in Adam updates

  Returns:
  model
  """
  
  classifier_adam = Sequential()
  classifier_adam.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu', input_dim = 13))
  classifier_adam.add(BatchNormalization())
  classifier_adam.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_adam.add(BatchNormalization())
  classifier_adam.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_adam.add(BatchNormalization())
  classifier_adam.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_adam.add(BatchNormalization())
  classifier_adam.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_adam.add(BatchNormalization())
  classifier_adam.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
  classifier_adam.compile(optimizer = keras.optimizers.Adam(learning_rate=learning_rate,beta_1=beta_1,beta_2=beta_2,epsilon=epsilon), loss = 'binary_crossentropy', metrics = ['accuracy'])
  return classifier_adam

classifier_adam = KerasClassifier(build_fn = build_adam)
  



In [108]:
learning_rate = [0.001, 0.01, 0.1]
beta_1 = [0.9, 0.99, 0.999]
beta_2 = [0.997,0.998,0.999]
epsilon = [1e-8]
parameters = dict(learning_rate=learning_rate,beta_1=beta_1, beta_2=beta_2,epsilon=epsilon)
# Setting up Random Grid Search
random_grid_search = RandomizedSearchCV(estimator = classifier_adam,
                           param_distributions = parameters,
                           scoring = 'accuracy',
                           cv = 10)
random_grid_search = random_grid_search.fit(X_train, y_train)
best_parameters = random_grid_search.best_params_
best_accuracy = random_grid_search.best_score_



In [109]:
# Getting our best parameters for adam
best_parameters

{'learning_rate': 0.01, 'epsilon': 1e-08, 'beta_2': 0.998, 'beta_1': 0.9}

In [110]:
# Getting our best average for adam
print('Best average: ', ('%.4f' % best_accuracy))

Best average:  0.7960


#### Tuning Momentum Model


In [113]:
#Function to create model with momentum 

def build_momentum(learning_rate,momentum):
  """
  Arguments:
    learning_rate -- the learning rate, scalar.
    momentum -- Momentum hyperparameter of SGD

  Returns:
  model
  """
  
  classifier_momentum = Sequential()
  classifier_momentum.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu', input_dim = 13))
  classifier_momentum.add(BatchNormalization())
  classifier_momentum.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_momentum.add(BatchNormalization())
  classifier_momentum.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_momentum.add(BatchNormalization())
  classifier_momentum.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_momentum.add(BatchNormalization())
  classifier_momentum.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_momentum.add(BatchNormalization())
  classifier_momentum.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
  classifier_momentum.compile(optimizer = keras.optimizers.SGD(learning_rate=learning_rate,momentum=momentum), loss = 'binary_crossentropy', metrics = ['accuracy'])
  return classifier_momentum

classifier_momentum = KerasClassifier(build_fn = build_momentum)
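To make the role of `momentum` concrete, here is a minimal sketch of the update rule that Keras documents for SGD with momentum (`velocity = momentum * velocity - learning_rate * g`, then `w = w + velocity`), applied to a toy objective f(w) = w². The toy objective, the helper name `sgd_momentum_step`, and the starting point are assumptions made for illustration; they are not part of the churn model.

```python
def sgd_momentum_step(w, velocity, grad, learning_rate, momentum):
    """One SGD-with-momentum update for a scalar weight (illustrative helper)."""
    # The velocity accumulates an exponentially decaying sum of past gradients.
    velocity = momentum * velocity - learning_rate * grad
    # The weight moves along the velocity, not along the raw gradient.
    return w + velocity, velocity


w, velocity = 5.0, 0.0  # hypothetical starting point
for _ in range(200):
    grad = 2.0 * w  # gradient of the toy objective f(w) = w**2
    w, velocity = sgd_momentum_step(w, velocity, grad,
                                    learning_rate=0.1, momentum=0.9)

print("w after 200 steps:", w)
```

A larger `momentum` keeps more of the previous velocity, which smooths the path but can overshoot the minimum before settling.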
  



In [115]:
learning_rate = [0.001, 0.01, 0.1]
momentum = [0.9, 0.99, 0.999]
parameters = dict(learning_rate=learning_rate,momentum=momentum)
# Setting up Random Grid Search
random_grid_search = RandomizedSearchCV(estimator = classifier_momentum,
                           param_distributions = parameters,
                           scoring = 'accuracy',
                           cv = 10)
random_grid_search = random_grid_search.fit(X_train, y_train)
best_parameters = random_grid_search.best_params_
best_accuracy = random_grid_search.best_score_





In [116]:
# Getting our best parameters for momentum
best_parameters

{'momentum': 0.9, 'learning_rate': 0.1}

In [117]:
# Getting our best average for momentum
print('Best average: ', ('%.4f' % best_accuracy))

Best average:  0.7984


#### Tuning RMSprop

In [118]:
# Function to create the model with RMSprop

def build_rmsprop(learning_rate,rho):
  """
  Arguments:
    learning_rate -- the learning rate, scalar.
    rho -- discounting factor for the moving average of squared gradients

  Returns:
    model -- compiled Keras Sequential classifier
  """
  
  classifier_rmsprop = Sequential()
  classifier_rmsprop.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu', input_dim = 13))
  classifier_rmsprop.add(BatchNormalization())
  classifier_rmsprop.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_rmsprop.add(BatchNormalization())
  classifier_rmsprop.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_rmsprop.add(BatchNormalization())
  classifier_rmsprop.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_rmsprop.add(BatchNormalization())
  classifier_rmsprop.add(Dense(units = 6, kernel_initializer = 'he_uniform', kernel_regularizer='l2',activation = 'relu'))
  classifier_rmsprop.add(BatchNormalization())
  classifier_rmsprop.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
  classifier_rmsprop.compile(optimizer = keras.optimizers.RMSprop(learning_rate=learning_rate,rho=rho), loss = 'binary_crossentropy', metrics = ['accuracy'])
  return classifier_rmsprop

classifier_rmsprop = KerasClassifier(build_fn = build_rmsprop)
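As a rough illustration of what `rho` controls, the sketch below applies the textbook RMSprop update to a toy objective f(w) = w²: keep a moving average of squared gradients, then divide each gradient by the root of that average. The helper name `rmsprop_step`, the toy objective, and the placement of `epsilon` outside the square root are assumptions; the exact placement of `epsilon` in Keras may differ.

```python
import math


def rmsprop_step(w, avg_sq, grad, learning_rate, rho, epsilon=1e-7):
    """One textbook RMSprop update for a scalar weight (illustrative helper)."""
    # Moving (discounted) average of the squared gradient.
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    # Scale the step by the root of that average.
    w = w - learning_rate * grad / (math.sqrt(avg_sq) + epsilon)
    return w, avg_sq


w_rms, avg_sq = 1.0, 0.0  # hypothetical starting point
for _ in range(2000):
    grad = 2.0 * w_rms  # gradient of the toy objective f(w) = w**2
    w_rms, avg_sq = rmsprop_step(w_rms, avg_sq, grad,
                                 learning_rate=0.01, rho=0.9)

print("w after 2000 steps:", w_rms)
```

Because the gradient is normalised by its own recent magnitude, the step size stays close to `learning_rate` regardless of the gradient scale, which is why RMSprop usually works with smaller learning rates than plain SGD.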
  



In [119]:
learning_rate = [0.001, 0.01, 0.1]
rho= [0.9, 0.99, 0.999]
parameters = dict(learning_rate=learning_rate,rho=rho)
# Setting up Random Grid Search
random_grid_search = RandomizedSearchCV(estimator = classifier_rmsprop,
                           param_distributions = parameters,
                           scoring = 'accuracy',
                           cv = 10)
random_grid_search = random_grid_search.fit(X_train, y_train)
best_parameters = random_grid_search.best_params_
best_accuracy = random_grid_search.best_score_





In [120]:
# Getting our best parameters for rmsprop
best_parameters

{'rho': 0.9, 'learning_rate': 0.001}

In [121]:
# Getting our best average for rmsprop
print('Best average: ', ('%.4f' % best_accuracy))

Best average:  0.7987


## Try different Models with the Tuned Hyperparameters

### Adam Optimizer

In [129]:
layers_dims = [6,6,6,1]

model_adam  = model_create(X_train, y_train, layers_dims,opt='adam',lr=0.01,beta1 = 0.9, beta2 = 0.998,  epsilon = 1e-8,)
model_a = KerasClassifier(build_fn=model_adam, verbose=0)

[Training log: Epoch 1/100 through Epoch 77/100, no per-epoch metrics captured; output truncated in the export.]


In [130]:
y_pred, y_pred_train = predicts(model_adam,X_train, X_test)



In [131]:
evaluate(y_pred, y_pred_train, y_test, y_train)

The Confusion Matrix of the Model: 

[[1508   87]
 [ 206  199]]
-------------------
Train Accuracy:  0.855875
Test Accuracy:  0.8535
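Accuracy alone can hide how the model treats the minority class. As a rough sketch, the confusion matrix printed above can be turned into precision, recall, and F1 for the churn class, and compared with the accuracy of always predicting the majority class. The layout assumed here is scikit-learn's (rows are true labels, columns are predictions), and reading label 1 as "customer left" is an assumption based on the dataset description.

```python
import numpy as np

# Test-set confusion matrix of the Adam model, copied from the output above.
cm = np.array([[1508, 87],
               [206, 199]])

# scikit-learn layout: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()
total = cm.sum()

accuracy = (tn + tp) / total
precision = tp / (tp + fp)   # of predicted churners, how many really left
recall = tp / (tp + fn)      # of real churners, how many were caught
f1 = 2 * precision * recall / (precision + recall)

# Accuracy obtained by always predicting the larger class.
majority_baseline = max(tn + fp, fn + tp) / total

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f} baseline={majority_baseline:.4f}")
```

With these counts the model catches 199 of the 405 churners in the test set, a recall of about 0.49, while always predicting the majority class would already reach 0.7975 accuracy.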


### Momentum Optimizer

In [125]:
layers_dims = [6,6,6,1]

# Note: the random search above selected learning_rate=0.1; this run uses lr=0.01.
model_momentum  = model_create(X_train, y_train, layers_dims,opt='momentum',lr=0.01,beta=0.9)
model_m = KerasClassifier(build_fn=model_momentum, verbose=0)

[Training log: Epoch 1/100 through Epoch 77/100, no per-epoch metrics captured; output truncated in the export.]


In [126]:
y_pred, y_pred_train = predicts(model_momentum,X_train, X_test)



In [127]:
evaluate(y_pred, y_pred_train, y_test, y_train)

The Confusion Matrix of the Model: 

[[1509   86]
 [ 206  199]]
-------------------
Train Accuracy:  0.85925
Test Accuracy:  0.854


### RMSprop Optimizer

In [132]:
layers_dims = [6,6,6,1]

model_rmsprop  = model_create(X_train, y_train, layers_dims,opt='rmsprop',lr=0.001,beta2=0.9)
model_r = KerasClassifier(build_fn=model_rmsprop, verbose=0)

[Training log: Epoch 1/100 through Epoch 77/100, no per-epoch metrics captured; output truncated in the export.]


In [133]:
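# Predict with the RMSprop model. The original cell passed model_momentum here,
# so the output recorded below repeats the Momentum results; rerun to get RMSprop figures.
y_pred, y_pred_train = predicts(model_rmsprop,X_train, X_test)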



In [134]:
evaluate(y_pred, y_pred_train, y_test, y_train)

The Confusion Matrix of the Model: 

[[1509   86]
 [ 206  199]]
-------------------
Train Accuracy:  0.85925
Test Accuracy:  0.854


## Conclusion
Our original models scored almost 84% with all three optimizers. After hyperparameter tuning, test accuracy rose by almost 1 percentage point, to roughly 85%.

To examine whether higher accuracy is achievable, further improvements could be tried by raising the batch size and increasing the number of epochs to 1000. However, a GPU would be required because of the heavy computation involved, which I cannot afford: I am using only an i7-5600U and Google Colab.
