# ASSIGNMENT-4

Given a bank customer, build a neural network-based classifier that can determine whether they will leave or not in the next 6 months.
Dataset Description: The case study is from an open-source dataset from Kaggle. The dataset contains 10,000 sample points with 14 distinct features such as CustomerId, CreditScore, Geography, Gender, Age, Tenure, Balance, etc.
<br>Dataset: https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling
<br>Perform the following steps:
1. Read the dataset.
2. Distinguish the feature and target set and divide the data set into training and test sets.
3. Normalize the train and test data.
4. Initialize and build the model. Identify the points of improvement and implement the same.
5. Print the accuracy score and confusion matrix (5 points).

In [66]:
import numpy as np
import pandas as pd

In [67]:
df = pd.read_csv('churn_Modelling.csv', index_col='RowNumber')
df.head()

RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [68]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10000 entries, 1 to 10000
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   CustomerId       10000 non-null  int64  
 1   Surname          10000 non-null  object 
 2   CreditScore      10000 non-null  int64  
 3   Geography        10000 non-null  object 
 4   Gender           10000 non-null  object 
 5   Age              10000 non-null  int64  
 6   Tenure           10000 non-null  int64  
 7   Balance          10000 non-null  float64
 8   NumOfProducts    10000 non-null  int64  
 9   HasCrCard        10000 non-null  int64  
 10  IsActiveMember   10000 non-null  int64  
 11  EstimatedSalary  10000 non-null  float64
 12  Exited           10000 non-null  int64  
dtypes: float64(2), int64(8), object(3)
memory usage: 1.1+ MB


In [69]:
# Check for null values
df.isnull().values.any()

False

In [70]:
df.describe()

,CustomerId,CreditScore,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,15690940.0,650.5288,38.9218,5.0128,76485.889288,1.5302,0.7055,0.5151,100090.239881,0.2037
std,71936.19,96.653299,10.487806,2.892174,62397.405202,0.581654,0.45584,0.499797,57510.492818,0.402769
min,15565700.0,350.0,18.0,0.0,0.0,1.0,0.0,0.0,11.58,0.0
25%,15628530.0,584.0,32.0,3.0,0.0,1.0,0.0,0.0,51002.11,0.0
50%,15690740.0,652.0,37.0,5.0,97198.54,1.0,1.0,1.0,100193.915,0.0
75%,15753230.0,718.0,44.0,7.0,127644.24,2.0,1.0,1.0,149388.2475,0.0
max,15815690.0,850.0,92.0,10.0,250898.09,4.0,1.0,1.0,199992.48,1.0


In [71]:
x_columns = df.columns.tolist()[2:12]
y_columns = df.columns.tolist()[-1:]

In [72]:
print(f'All columns: {df.columns.tolist()}')

All columns: ['CustomerId', 'Surname', 'CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Exited']


In [73]:
print(f'X values: {x_columns}')
print(f'y values: {y_columns}')

X values: ['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']
y values: ['Exited']


In [74]:
x = df[x_columns].values # Credit Score through Estimated Salary
y = df[y_columns].values # Exited

## PREPROCESSING

### LABEL ENCODING

In [75]:
from sklearn.preprocessing import LabelEncoder

In [76]:
print(x[:8,1], '... will now become: ')

label_x_country_encoder = LabelEncoder()
x[:,1] = label_x_country_encoder.fit_transform(x[:,1])
print(x[:8,1])

['France' 'Spain' 'France' 'France' 'Spain' 'Spain' 'France' 'Germany'] ... will now become: 
[0 2 0 0 2 2 0 1]


In [77]:
print(x[:6,2], '... will now become: ')

label_x_gender_encoder = LabelEncoder()
x[:,2] = label_x_gender_encoder.fit_transform(x[:,2])
print(x[:6,2])

['Female' 'Female' 'Female' 'Female' 'Female' 'Male'] ... will now become: 
[0 0 0 0 0 1]
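
`LabelEncoder` assigns integers in the sorted order of the distinct labels, which is why France, Germany, and Spain become 0, 1, and 2 in the output above. A minimal pure-Python sketch of that mapping follows; the helper name `label_encode` is hypothetical and not part of scikit-learn.

```python
def label_encode(values):
    """Map each distinct label to its index in sorted order."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes

# The first eight Geography values printed in the cell above.
countries = ['France', 'Spain', 'France', 'France', 'Spain', 'Spain', 'France', 'Germany']
codes, classes = label_encode(countries)
print(classes)  # ['France', 'Germany', 'Spain']
print(codes)    # [0, 2, 0, 0, 2, 2, 0, 1]
```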


### ONE-HOT ENCODING
The problem here is that label encoding treats the countries as a single variable with ordinal values (0 < 1 < 2), even though countries have no natural order. One way to get rid of that problem is to split the countries into separate dimensions, one per country. That is,

| Country | | Label | | France | Germany | Spain |
|---|---|---|---|---|---|---|
| France | -> | 0 | -> | 1 | 0 | 0 |
| Germany | -> | 1 | -> | 0 | 1 | 0 |
| Spain | -> | 2 | -> | 0 | 0 | 1 |

The last three columns now represent the three countries that made up the "country" category. We essentially only need two of them: a 0 in both remaining columns means the country must be the one whose column was left out. Dropping that column removes a redundant dimension.

| France | Germany | Spain | | Germany | Spain |
|---|---|---|---|---|---|
| 1 | 0 | 0 | -> | 0 | 0 |
| 0 | 1 | 0 | -> | 1 | 0 |
| 0 | 0 | 1 | -> | 0 | 1 |

We achieve this with the `drop='first'` option of `OneHotEncoder`, which drops the first category in sorted order (France here).
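
As a hand-rolled illustration of dropping the first category, here is a small sketch. It assumes categories are ordered alphabetically, as scikit-learn does with `categories='auto'`, so France becomes the baseline encoded as all zeros. The function name `one_hot_drop_first` is hypothetical.

```python
def one_hot_drop_first(values):
    """One-hot encode labels, omitting the column for the first sorted category."""
    classes = sorted(set(values))
    kept = classes[1:]  # the dropped category is represented by a row of zeros
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return rows, kept

rows, kept = one_hot_drop_first(['France', 'Germany', 'Spain'])
print(kept)  # ['Germany', 'Spain']
for row in rows:
    print(row)
```

France maps to `[0, 0]`, Germany to `[1, 0]`, and Spain to `[0, 1]`, so no information is lost by removing one column.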

### FEATURE SCALING
Feature scaling standardizes the range of the independent variables (features). It puts all dimensions on a comparable scale so that one feature does not dominate another. For example, the account balance ranges from 0 to about 250,000, whereas gender is either 0 or 1. If one feature has a much broader range than the others, it dominates any distance or weighted sum computed from the inputs. The features are therefore rescaled so that each contributes roughly proportionately; `StandardScaler` does this by giving every column zero mean and unit variance.
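
The standardization itself is simple arithmetic: subtract the column mean and divide by the column standard deviation. The sketch below uses made-up balances (not rows from the dataset) and hypothetical helper names. It also learns the statistics from the training values only and reuses them for the test values, which is a common alternative to fitting on the full data before splitting, as this notebook does.

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Return the mean and population standard deviation of a training column."""
    return fmean(column), pstdev(column)

def transform(column, mean, std):
    """Standardize values using statistics learned from the training data."""
    return [(v - mean) / std for v in column]

# Illustrative balances, not rows from the dataset.
train_balance = [0.0, 50000.0, 100000.0, 150000.0]
test_balance = [75000.0, 200000.0]

mean, std = fit_scaler(train_balance)
scaled_train = transform(train_balance, mean, std)
scaled_test = transform(test_balance, mean, std)
print(scaled_train)
print(scaled_test)
```

After scaling, the training column has zero mean and unit standard deviation, while the test values are expressed on the same scale without influencing it.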

In [78]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline


pipeline = Pipeline(
    [('Categorizer', ColumnTransformer(
         [ # Gender (column 2): binary, so drop='first' leaves a single 0/1 column
          ("Gender One Hot", OneHotEncoder(categories='auto', drop='first'), [2]),
           # Geography (column 1): three countries become two indicator columns
          ("Geography One Hot", OneHotEncoder(categories='auto', drop='first'), [1])
         ], remainder='passthrough', n_jobs=1)),
     # Standardize every column to zero mean and unit variance for the classifier
    ('Normalizer', StandardScaler())
    ])

In [79]:
x = pipeline.fit_transform(x)

## Making the NN

In [80]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

In [81]:
# Initializing the ANN
classifier = Sequential()

In [82]:
# Splitting the dataset into the Training and Testing set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state = 0)

In [83]:
print(f'training shapes: {x_train.shape}, {y_train.shape}')
print(f'testing shapes: {x_test.shape}, {y_test.shape}')

training shapes: (8000, 11), (8000, 1)
testing shapes: (2000, 11), (2000, 1)


### ADDING INPUT LAYER

#### The breakdown of the inputs for the first layer is as follows:

`units`: `6` nodes in this (first hidden) layer. You can think of this as the number of nodes the input layer feeds into.

`activation`: `relu`, because this is a hidden layer. The layer applies the ReLU activation function, which is equivalent to $\max(0, Wx + b)$.

`input_shape`: `(11,)` because our input has 11 dimensions. This is only needed for the first added layer. Subsequent layers infer their input dimension from the previous layer's output dimension, so the next hidden layer will know what to expect.
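
To make the ReLU expression concrete, here is a minimal sketch of a dense layer's forward pass in plain Python. The weights, biases, and inputs are toy numbers chosen for illustration, and `dense_relu` is a hypothetical helper, not how Keras implements the layer.

```python
def relu(z):
    """Return z for positive inputs and 0 otherwise."""
    return max(0.0, z)

def dense_relu(x, weights, biases):
    """Forward pass of a dense layer: one output per (weight row, bias) pair."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

# Toy numbers chosen for illustration: 3 inputs feeding 2 units.
x = [1.0, -2.0, 0.5]
weights = [[0.5, 0.25, 1.0],
           [-1.0, 0.5, 0.0]]
biases = [0.0, 0.5]
print(dense_relu(x, weights, biases))  # [0.5, 0.0]
```

The second unit's weighted sum is negative (-1.5), so ReLU clips it to zero.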

In [84]:
# This adds the input layer (by specifying input dimension) AND the first hidden layer (units)
classifier.add(Dense(6, activation = 'relu', input_shape = (x_train.shape[1], )))
classifier.add(Dropout(rate=0.1)) 

### ADDING 2ND HIDDEN LAYER
We will make our second hidden layer also have 6 nodes, reusing the same rule of thumb used for the first hidden layer (the average of the input and output layer sizes): $(11+1)\div 2 = 6$.

In [85]:
# Adding the second hidden layer
# Notice that we do not need to specify input dim. 
classifier.add(Dense(6, activation = 'relu')) 
classifier.add(Dropout(rate=0.1))

### Adding the output layer

#### The breakdown of the inputs for the output layer is as follows:

`activation`: `sigmoid`, because this is the output layer. The sigmoid function is used for $\phi$ instead of ReLU because it produces a value between 0 and 1 that can be read as a probability. We want the probability that each customer leaves the bank.

`units`: `1` node, because the output is a single value per customer (the probability of leaving).

The input dimension does not need to be specified: it is inferred from the previous layer's output dimension (6).
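
As a quick illustration of why the sigmoid suits a probability output, the sketch below evaluates it at a few scores. The inputs are arbitrary example values.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 4))
```

Large negative scores map close to 0, a score of 0 maps to exactly 0.5, and large positive scores map close to 1.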


In [86]:
# Adding the output layer
# Notice that we do not need to specify input dim. 
# We have an output of 1 node, which is the desired dimension of our output (stay with the bank or not)
# We use the sigmoid because we want probability outcomes
classifier.add(Dense(1, activation = 'sigmoid')) 

In [87]:
classifier.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_6 (Dense)              (None, 6)                 72        
_________________________________________________________________
dropout_4 (Dropout)          (None, 6)                 0         
_________________________________________________________________
dense_7 (Dense)              (None, 6)                 42        
_________________________________________________________________
dropout_5 (Dropout)          (None, 6)                 0         
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 7         
Total params: 121
Trainable params: 121
Non-trainable params: 0
_________________________________________________________________


## Compiling the Neural Network

#### The breakdown of the inputs for compiling is as follows:

`optimizer`: `adam` The algorithm used to find the optimal set of weights in the neural network. Adam is an efficient variant of stochastic gradient descent.

`loss`: `binary_crossentropy` The loss function that the optimizer minimizes. This is the logarithmic loss. If the dependent (output) variable is binary, use `binary_crossentropy`; if it is categorical, use `categorical_crossentropy`.

`metrics`: `['accuracy']` The metrics reported during training and evaluation. They are only monitored; the optimizer minimizes the loss, not the metrics.
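
For intuition about the logarithmic loss, here is a minimal sketch of binary cross-entropy computed by hand. The labels and probabilities are illustrative, not model output, and the clipping constant `eps` is an assumption made to avoid `log(0)`.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean log loss over samples; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Illustrative labels and predicted probabilities, not model output.
good = binary_crossentropy([1, 0], [0.9, 0.1])  # confident and correct
bad = binary_crossentropy([1, 0], [0.1, 0.9])   # confident and wrong
print(good, bad)
```

Confident correct predictions give a small loss, while confident wrong predictions are penalized heavily, which is what pushes the predicted probabilities toward the true labels during training.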

In [88]:
classifier.compile(optimizer='adam', loss = 'binary_crossentropy', metrics=['accuracy'])

In [89]:
history = classifier.fit(x_train, y_train, batch_size=32, epochs=200, validation_split=0.1, verbose=2)

Epoch 1/200
225/225 - 0s - loss: 0.5073 - accuracy: 0.7961 - val_loss: 0.4655 - val_accuracy: 0.7950
Epoch 2/200
225/225 - 0s - loss: 0.4790 - accuracy: 0.7961 - val_loss: 0.4458 - val_accuracy: 0.7950
Epoch 3/200
225/225 - 0s - loss: 0.4625 - accuracy: 0.7961 - val_loss: 0.4355 - val_accuracy: 0.7950
Epoch 4/200
225/225 - 0s - loss: 0.4546 - accuracy: 0.7961 - val_loss: 0.4296 - val_accuracy: 0.7950
Epoch 5/200
225/225 - 0s - loss: 0.4544 - accuracy: 0.7961 - val_loss: 0.4264 - val_accuracy: 0.7950
Epoch 6/200
225/225 - 0s - loss: 0.4474 - accuracy: 0.7961 - val_loss: 0.4236 - val_accuracy: 0.7950
Epoch 7/200
225/225 - 0s - loss: 0.4432 - accuracy: 0.7961 - val_loss: 0.4217 - val_accuracy: 0.7950
Epoch 8/200
225/225 - 0s - loss: 0.4417 - accuracy: 0.7961 - val_loss: 0.4201 - val_accuracy: 0.7950
Epoch 9/200
225/225 - 0s - loss: 0.4376 - accuracy: 0.7961 - val_loss: 0.4179 - val_accuracy: 0.7950
Epoch 10/200
225/225 - 0s - loss: 0.4418 - accuracy: 0.7961 - val_loss: 0.4188 - val_accura

## Fitting the Neural Network
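This is where we fit the NN to our training set. Note that `fit` was already called in the previous cell, so the call below continues training from the current weights, which is why its first epoch starts at a higher accuracy.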

#### The breakdown of the inputs for fitting is as follows:

`x_train`: The independent variables (features) that the model is fitted on.

`y_train`: The target values the model should learn to produce.

`batch_size`: The number of samples processed before the error is back-propagated and the weights are updated.

`epochs`: The number of full passes over the training data used to tune the weights. Each additional epoch gives the optimizer another round of weight updates.


`validation_split`: `0.1` The fraction of the training data held out as validation data.
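
The `225/225` shown in the training log follows from these settings. A short sketch of the arithmetic, assuming the 8000 training rows and the `validation_split=0.1` and `batch_size=32` values passed to `fit` in the code cell:

```python
import math

n_train = 8000          # rows in x_train
validation_split = 0.1  # value passed to fit() in the code cell
batch_size = 32

n_fit = int(n_train * (1 - validation_split))  # rows used for weight updates
n_val = n_train - n_fit                        # rows held out for validation
steps_per_epoch = math.ceil(n_fit / batch_size)
print(n_fit, n_val, steps_per_epoch)  # 7200 800 225
```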


In [90]:
history = classifier.fit(x_train, y_train, batch_size=32, epochs=200, validation_split=0.1, verbose=2)

Epoch 1/200
225/225 - 0s - loss: 0.3833 - accuracy: 0.8351 - val_loss: 0.3325 - val_accuracy: 0.8575
Epoch 2/200
225/225 - 0s - loss: 0.3859 - accuracy: 0.8357 - val_loss: 0.3342 - val_accuracy: 0.8587
Epoch 3/200
225/225 - 0s - loss: 0.3817 - accuracy: 0.8340 - val_loss: 0.3334 - val_accuracy: 0.8562
Epoch 4/200
225/225 - 0s - loss: 0.3885 - accuracy: 0.8332 - val_loss: 0.3344 - val_accuracy: 0.8562
Epoch 5/200
225/225 - 0s - loss: 0.3867 - accuracy: 0.8319 - val_loss: 0.3362 - val_accuracy: 0.8537
Epoch 6/200
225/225 - 0s - loss: 0.3897 - accuracy: 0.8317 - val_loss: 0.3363 - val_accuracy: 0.8537
Epoch 7/200
225/225 - 0s - loss: 0.3866 - accuracy: 0.8351 - val_loss: 0.3390 - val_accuracy: 0.8575
Epoch 8/200
225/225 - 0s - loss: 0.3875 - accuracy: 0.8336 - val_loss: 0.3363 - val_accuracy: 0.8550
Epoch 9/200
225/225 - 0s - loss: 0.3880 - accuracy: 0.8332 - val_loss: 0.3363 - val_accuracy: 0.8550
Epoch 10/200
225/225 - 0s - loss: 0.3896 - accuracy: 0.8351 - val_loss: 0.3354 - val_accura

In [91]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()

In [92]:
# The history keys match the metric names in the training log ('accuracy', 'val_accuracy')
plt.plot(np.array(history.history['accuracy']) * 100)
plt.plot(np.array(history.history['val_accuracy']) * 100)
plt.ylabel('accuracy')
plt.xlabel('epochs')
plt.legend(['train', 'validation'])
plt.title('Accuracy over epochs')
plt.show()

## TESTING THE NN

In [93]:
y_pred = classifier.predict(x_test)
print(y_pred[:5])

[[0.3217796 ]
 [0.22783008]
 [0.17455697]
 [0.07636455]
 [0.11289239]]


These are the predicted probabilities of each customer in the test set leaving.

<br>To use the confusion matrix, we need to convert these probabilities into true/false labels. We use a cutoff of 0.5 to decide whether a customer is likely to exit or not.

In [94]:
y_pred = (y_pred > 0.5).astype(int)
print(y_pred[:5])

[[0]
 [0]
 [0]
 [0]
 [0]]
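
The cutoff is a choice rather than a fixed rule. The sketch below applies it to the five probabilities printed above and, for comparison, a lower cutoff of 0.3. That second value is purely hypothetical and only shows how the labels shift. `to_labels` is an illustrative helper.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert churn probabilities into 0/1 predictions."""
    return [int(p > threshold) for p in probabilities]

# The first five predicted probabilities printed above.
probs = [0.3217796, 0.22783008, 0.17455697, 0.07636455, 0.11289239]
print(to_labels(probs))                 # [0, 0, 0, 0, 0]
print(to_labels(probs, threshold=0.3))  # [1, 0, 0, 0, 0]
```

Lowering the cutoff flags more customers as likely to leave, which generally raises recall at the cost of precision.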


## REPORTS

In [95]:
from sklearn.metrics import classification_report, confusion_matrix

In [96]:
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix")
print(cm)

Confusion Matrix
[[1560   35]
 [ 250  155]]


In [97]:
cr = classification_report(y_test, y_pred)
print("Classification Report")
print(cr)

Classification Report
              precision    recall  f1-score   support

           0       0.86      0.98      0.92      1595
           1       0.82      0.38      0.52       405

    accuracy                           0.86      2000
   macro avg       0.84      0.68      0.72      2000
weighted avg       0.85      0.86      0.84      2000



In [98]:
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_score = 2 * ((precision * recall) / (precision + recall))

In [99]:
print( 
    'Accuracy:\t',accuracy*100,
    '\nPrecision:\t',precision*100,
    '\nRecall: \t',recall*100,
    '\nF1-Score:\t',f1_score*100)

Accuracy:	 85.75 
Precision:	 81.57894736842105 
Recall: 	 38.2716049382716 
F1-Score:	 52.100840336134446
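
Because only about 20% of customers exit, accuracy on its own can look better than it is. As a rough check, the sketch below compares the model's accuracy with a baseline that always predicts "stays", using the counts from the confusion matrix above.

```python
tn, fp, fn, tp = 1560, 35, 250, 155  # values from the confusion matrix above

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
majority_baseline = (tn + fp) / total  # accuracy of always predicting "stays"
recall = tp / (tp + fn)

print(f'Model accuracy:    {accuracy:.4f}')
print(f'Majority baseline: {majority_baseline:.4f}')
print(f'Churners caught:   {tp} of {tp + fn} ({recall:.1%})')
```

The model beats the 79.75% baseline by about six percentage points but identifies only 155 of the 405 customers who actually left, so recall is the most obvious point of improvement.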
