### Problem:
Given a bank customer, can we build a neural-network classifier that predicts whether they will
leave the bank or not?

Link to the Kaggle project site:
https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling

Case file: bank.csv

The points distribution for this case is as follows:
1. Read the dataset in a new python notebook.
2. Drop the columns which are unique for all users like IDs (2.5 points)
3. Distinguish the feature and target set (2.5 points)
4. Divide the data set into Train and test sets
5. Normalize the train and test data (2.5 points)
6. Initialize & build the model (10 points)
7. Optimize the model (5 points)
8. Predict the results using 0.5 as a threshold (5 points)
9. Print the Accuracy score and confusion matrix (2.5 points)

### Import libraries

In [1]:
import tensorflow as tf

In [2]:
tf.__version__

'2.0.0'

In [3]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix, accuracy_score

### 1. Read the dataset in a new python notebook.

In [4]:
data_raw = pd.read_csv('Data/bank.csv')

In [5]:
data_raw.head()

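| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |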


### 2. Drop the columns which are unique for all users like IDs (2.5 points)

In [6]:
data_raw.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1, inplace=True)
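One way to spot identifier-like columns before dropping them is to compare the number of distinct values with the number of rows. The snippet below is only a sketch on a toy frame built from the first rows shown above; the names `toy`, `unique_ratio` and `id_like` are made up for illustration. On the full dataset, continuous columns such as `Balance` may also come out nearly unique, so the ratio is a hint rather than a rule.

```python
import pandas as pd

# Toy frame with values copied from the data_raw.head() output above
toy = pd.DataFrame({
    "RowNumber": [1, 2, 3, 4, 5],
    "CustomerId": [15634602, 15647311, 15619304, 15701354, 15737888],
    "Geography": ["France", "Spain", "France", "France", "Spain"],
    "Gender": ["Female"] * 5,
})

# Fraction of distinct values per column; 1.0 means every row has its own value
unique_ratio = toy.nunique() / len(toy)

# Columns where every row is unique are candidates for dropping
id_like = unique_ratio[unique_ratio == 1.0].index.tolist()
print(id_like)
```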

In [7]:
data_raw.isna().any()

CreditScore        False
Geography          False
Gender             False
Age                False
Tenure             False
Balance            False
NumOfProducts      False
HasCrCard          False
IsActiveMember     False
EstimatedSalary    False
Exited             False
dtype: bool

In [8]:
data_raw.isnull().any()

CreditScore        False
Geography          False
Gender             False
Age                False
Tenure             False
Balance            False
NumOfProducts      False
HasCrCard          False
IsActiveMember     False
EstimatedSalary    False
Exited             False
dtype: bool

In [9]:
data_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(7), object(2)
memory usage: 859.5+ KB


### 3. Distinguish the feature and target set (2.5 points)

In [10]:
data = pd.get_dummies(data_raw)

In [11]:
data.head()

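| | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited | Geography_France | Geography_Germany | Geography_Spain | Gender_Female | Gender_Male |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 | 1 | 0 | 0 | 1 | 0 |
| 1 | 608 | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | 0 | 0 | 1 | 1 | 0 |
| 2 | 502 | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 | 1 | 0 | 0 | 1 | 0 |
| 3 | 699 | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 | 1 | 0 | 0 | 1 | 0 |
| 4 | 850 | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 | 0 | 0 | 1 | 1 | 0 |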


In [12]:
target = data['Exited']

In [13]:
y = np.array(target.values.astype('float32'))
X = data.drop('Exited', axis=1)
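`pd.get_dummies` creates one column per category, so `Gender_Female` and `Gender_Male` carry the same information. As an optional variation (not what this notebook uses), `drop_first=True` drops one level per categorical column. The sketch below runs on a small made-up frame called `toy`; applied to this dataset it would presumably leave 11 feature columns instead of 13.

```python
import pandas as pd

# Hypothetical three-row frame with the same categorical columns as the dataset
toy = pd.DataFrame({
    "CreditScore": [619, 608, 502],
    "Geography": ["France", "Spain", "Germany"],
    "Gender": ["Female", "Male", "Female"],
})

# Drop the first level of each categorical column (France and Female here)
encoded = pd.get_dummies(toy, drop_first=True)
print(list(encoded.columns))
```

The dropped level becomes the implicit baseline: a row with zeros in both `Geography_Germany` and `Geography_Spain` is a French customer.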

### 4. Divide the data set into Train and test sets

In [14]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
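The classes are imbalanced (roughly 20% of customers exit, judging by the confusion matrix reported at the end of this notebook), so a stratified split may be worth considering. The sketch below uses synthetic data; `X_demo` and `y_demo` are stand-ins, not the bank data, and `stratify` is the only change relative to the call above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 rows, 20% positives
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 200 + [0] * 800)

# stratify keeps the positive rate the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)
print(y_tr.mean(), y_te.mean())
```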

### 5. Normalize the train and test data (2.5 points)

In [15]:
sc = StandardScaler()

In [16]:
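X_train = sc.fit_transform(X_train)
# Reuse the mean and variance learned from the training set;
# refitting on the test set would scale the two sets differently
X_test = sc.transform(X_test)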

In [17]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((7000, 13), (7000,), (3000, 13), (3000,))

The train and test features are now standardized, so no further scaling is needed before they are passed to the model.
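The usual convention with `StandardScaler` is to learn the mean and standard deviation from the training rows only and reuse them on the test rows. A minimal sketch on synthetic data (`train_demo` and `test_demo` are made-up arrays, not the bank features):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the train and test feature matrices
rng = np.random.default_rng(0)
train_demo = rng.normal(loc=50.0, scale=10.0, size=(700, 2))
test_demo = rng.normal(loc=50.0, scale=10.0, size=(300, 2))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_demo)  # learns mean_ and scale_ from the training rows
test_scaled = scaler.transform(test_demo)        # applies the training statistics unchanged

# Equivalent manual computation for the test set
manual = (test_demo - train_demo.mean(axis=0)) / train_demo.std(axis=0)
print(np.allclose(test_scaled, manual))
```

The scaled test set will generally not have exactly zero mean and unit variance, which is expected: it is measured against the training distribution.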

### 6. Initialize & build the model (10 points)

In [18]:
def create_model():
    # Clear the session before building the model
    tf.keras.backend.clear_session()
    
    model = tf.keras.Sequential([
        # Normalizes the 13 input features batch by batch
        tf.keras.layers.BatchNormalization(),
        # Two hidden layers; Keras infers each input size from the previous layer
        tf.keras.layers.Dense(30, activation='relu'),
        tf.keras.layers.Dense(40, activation='relu'),
        # Alternatives tried during tuning:
        #tf.keras.layers.Dropout(0.25),
        #tf.keras.layers.Dense(60, activation='relu'),
        # Single sigmoid unit outputs the probability of exiting
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    
    # Compile model
    #adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, amsgrad=True)
    #sgd = tf.keras.optimizers.SGD(learning_rate=0.005, momentum=0.5, nesterov=True)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
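As a sanity check on the size of this network, the parameter count can be worked out by hand. The arithmetic below assumes the layer widths in `create_model()` (13 inputs, then 30, 40 and 1 units) and the standard Keras bookkeeping of four vectors per `BatchNormalization` feature; the total should match what `model.summary()` reports once the model has been built.

```python
# Layer widths taken from create_model(): 13 inputs -> 30 -> 40 -> 1
widths = [13, 30, 40, 1]

# A Dense layer has one weight per input-output pair plus one bias per output
dense_params = [n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:])]

# BatchNormalization over 13 features: gamma and beta are trainable,
# the moving mean and moving variance are not
bn_trainable = 2 * 13
bn_non_trainable = 2 * 13

total = sum(dense_params) + bn_trainable + bn_non_trainable
print(dense_params, total)
```

With fewer than two thousand parameters and 7000 training rows, the network is small, which may be one reason adding layers or neurons did not change the accuracy much.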

### 7. Optimize the model (5 points)

### Various techniques have been tried to improve the accuracy of the NN model:
- Adding more hidden layers
- Adding more neurons
- Adding Dropout layer
- Changing the optimizer: adam, sgd
- Changing optimizer parameters: learning rate, momentum, nesterov
- Changing metrics: accuracy, rmse, mae
- Changing the batch size
- Changing the number of epochs
- Also using KerasClassifier and Stratified KFold

### First, check whether stratified K-fold cross-validation gives higher accuracy; if not, proceed with the simple approach

In [19]:
estimator = KerasClassifier(build_fn=create_model, epochs=50, batch_size=32, verbose=False)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X.values, y, cv=kfold)
print("Mean Accuracy: %.2f%%" % (results.mean()*100))
print("Standard Deviation: %.2f%%" % (results.std()*100))

Mean Accuracy: 85.89%
Standard Deviation: 0.50%
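`StratifiedKFold` keeps the share of exiting customers roughly equal in every fold, which matters when one class is much rarer than the other. The sketch below shows this on synthetic labels (`y_demo` is made up, with 20% positives). Passing `random_state` is an addition not used in the cell above; it makes the folds reproducible between runs.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels: 200 positives, 800 negatives
y_demo = np.array([1] * 200 + [0] * 800)
X_demo = np.zeros((len(y_demo), 1))  # feature values do not affect how the folds are drawn

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Positive rate in each held-out fold
fold_rates = [y_demo[test_idx].mean() for _, test_idx in skf.split(X_demo, y_demo)]
print(fold_rates)
```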


In [20]:
# Fit the model on the training set, using the test set as validation data
model = create_model()
model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs=50, batch_size=32)

Train on 7000 samples, validate on 3000 samples
Epoch 1/50
...
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x2abcabf5a88>

### 8. Predict the results using 0.5 as a threshold (5 points)

In [21]:
y_pred = model.predict(X_test)

In [22]:
y_pred_df = pd.DataFrame(data=y_pred, columns=['Prob'])

In [23]:
y_pred_df.loc[y_pred_df.Prob >= 0.5, 'Prob'] = 1
y_pred_df.loc[y_pred_df.Prob < 0.5, 'Prob'] = 0
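The same thresholding can be written directly on the NumPy array returned by `model.predict`, without going through a DataFrame. A small sketch with made-up probabilities (`probs` is a stand-in for `y_pred`):

```python
import numpy as np

# Stand-in for model.predict output: shape (n, 1), one sigmoid probability per row
probs = np.array([[0.12], [0.50], [0.73], [0.49]])

# Values at or above 0.5 become class 1, the rest class 0
labels = (probs >= 0.5).astype(int).ravel()
print(labels)
```

Note that a probability of exactly 0.5 is assigned to class 1, matching the `>=` used above.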

### 9. Print the Accuracy score and confusion matrix (2.5 points)

In [24]:
print('Accuracy Score of the model: %.2f%%' % (accuracy_score(y_test, y_pred_df.values)*100))

Accuracy Score of the model: 86.50%


In [25]:
confusion_matrix(y_test, y_pred_df.values)

array([[2300,  116],
       [ 289,  295]], dtype=int64)
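Accuracy alone can hide how the model treats the minority class. Using only the confusion matrix printed above (rows are true labels, columns are predictions, following the scikit-learn convention), precision and recall for the "exited" class can be derived as follows:

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[2300, 116],
               [289, 295]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # share of predicted exits that really exited
recall = tp / (tp + fn)      # share of real exits that were caught
f1 = 2 * precision * recall / (precision + recall)
baseline = (tn + fp) / cm.sum()  # accuracy of always predicting "stays"

print(accuracy, precision, recall, f1, baseline)
```

This gives an accuracy of 0.865, precision of about 0.718, recall of about 0.505 and F1 of about 0.593. Always predicting "stays" would already reach about 0.805 accuracy on this test set, and the model still misses roughly half of the customers who actually leave.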

### After trying the techniques listed above, the best test accuracy I obtained was between 86% and 87%. I could not find a better model than this one.
### Model fine-tuning is an exploratory exercise, and further experimentation may uncover better parameters.