## Project - Bank Churn prediction

##### Objective:
Built a Classification model to predict if the customer will leave the bank in some time.

##### Context:
 All service providers have to worry about problem of 'Churn' i.e. customers leaving and joining another service provider. It is advantageous for organizations like banks to know what leads a client towards the decision to leave the company.Management can concentrate efforts on improvement of service, keeping in mind these priorities.

##### Data Description:
The case study is from an open-source dataset from Kaggle.The dataset contains 10,000 sample points with 14 distinct features such as CustomerId, CreditScore, Geography, Gender, Age, Tenure, Balance etc. Link to the Kaggle project site:https://www.kaggle.com/mathchi/churn-for-bank-customers


## Performing  standard imports

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.metrics import confusion_matrix
%matplotlib inline

### Loading the dataset

In [2]:
df = pd.read_csv("bank.csv")
df.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


###  Data Pre-processing

In [3]:
#RowNumber #CustomerId and #Surname are not required hence dropping it
df = df.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)
df.head()

Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [4]:
X = df.iloc[:,0:10].values # Credit Score through Estimated Salary - Independent Variables
Y = df.iloc[:,10].values # Exited-Dependent column
print(X.shape)
print(Y.shape)

(10000, 10)
(10000,)


In [5]:
#Handling categorical columns -Gender and Country

label_X_gender_encoder = LabelEncoder()
X[:,2] = label_X_gender_encoder.fit_transform(X[:,2])

countryhotencoder = ColumnTransformer([("countries", OneHotEncoder(), [1])], remainder="passthrough")
X = countryhotencoder.fit_transform(X)

#Dropping one of the dummy columns
X = X[:,1:] 
print(X.shape)

(10000, 11)


In [6]:
#Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X,Y, test_size = 0.2, random_state = 27)

In [7]:
#Standardizing the inputs
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

###  Creating the batches for both training and test dataset 

In [8]:
#Initializing the hyper-parameters

input_features=11 #No of input neurons
hidden1=6 #No of neurons in 1st hidden layer
hidden2 =6 #No of neurons in 2nd hiddeln layer
out_features=1 #No of Ouput neuron
learning_rate=.001
batch_size=32
epochs=20

In [11]:

train_data = TensorDataset(torch.FloatTensor(X_train.astype(float)),torch.FloatTensor(y_train.astype(float)))
test_data = TensorDataset(torch.FloatTensor(X_test.astype(float)),torch.FloatTensor(y_test.astype(float)))

train_loader = DataLoader(dataset=train_data,batch_size=batch_size, shuffle=False, drop_last=True)      
test_loader = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False, drop_last=True)  
print("NO. of batches in Training datset is : ",len(train_loader))
print("NO. of batches in Test datset is : ",len(test_loader))

NO. of batches in Training datset is :  266
NO. of batches in Test datset is :  66


### Creating the basic ANN architecture


In [12]:
#Creating the architecture of  ANN model
#CReating the ANN model
class ANN(nn.Module):
    def __init__(self,input_features,hidden1,hidden2,out_features):
        super().__init__()
        self.f_connected1=nn.Linear(input_features,hidden1)
        self.f_connected2=nn.Linear(hidden1,hidden2)
        self.out=nn.Linear(hidden2,out_features)
    def forward(self,x):
        x=F.relu(self.f_connected1(x))
        x=F.relu(self.f_connected2(x))
        x=F.sigmoid(self.out(x))
        return x

In [13]:
####instantiate the  ANN class
torch.manual_seed(20)
ANNModel=ANN(input_features,hidden1,hidden2,out_features)
ANNModel.parameters

<bound method Module.parameters of ANN(
  (f_connected1): Linear(in_features=11, out_features=6, bias=True)
  (f_connected2): Linear(in_features=6, out_features=6, bias=True)
  (out): Linear(in_features=6, out_features=1, bias=True)
)>

## Defining loss function & optimizer


In [14]:
###Backward Propogation-- Define the loss_function,define the optimizer
loss_function=nn.BCEWithLogitsLoss()
optimizer=torch.optim.Adam(ANNModel.parameters(),lr=learning_rate)

### Defining train and test functions

In [15]:
def train(model, train_loader, optimizer, epoch):
    model.train(); # It is specially required when Dropout and Batch Normalization is implemented.
    total_loss = 0
    
    # Iterate through dataset
    for data, target in train_loader:
        target = target.unsqueeze(1)
        
        # Zero grad
        optimizer.zero_grad()
        output = model(data)
        loss = loss_function(output, target)

        # Backward pass
        loss.backward()
        total_loss += loss.item()
        
        # Update
        optimizer.step()

    # Print average loss
    print("Train Epoch: {}\t Loss: {:.6f}".format(epoch, total_loss / len(train_loader.dataset)))

In [16]:
def test(model, test_loader):
    model.eval() #It is specially required when Dropout and Batch Normalization is implemented.
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            target = target.unsqueeze(1)
            
            output = model(data)
            test_loss += loss_function(output, target).item()
            pred = output>=0.5
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.3f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

In [17]:
# # Training and Testing the ANN Model

for epoch in range(0,epochs):
    train(ANNModel, train_loader, optimizer, epoch)
    test(ANNModel, test_loader)




Train Epoch: 0	 Loss: 0.026636

Test set: Average loss: 0.0240, Accuracy: 1574/2000 (78.700%)

Train Epoch: 1	 Loss: 0.023448

Test set: Average loss: 0.0230, Accuracy: 1574/2000 (78.700%)

Train Epoch: 2	 Loss: 0.023099

Test set: Average loss: 0.0229, Accuracy: 1574/2000 (78.700%)

Train Epoch: 3	 Loss: 0.023055

Test set: Average loss: 0.0229, Accuracy: 1574/2000 (78.700%)

Train Epoch: 4	 Loss: 0.023035

Test set: Average loss: 0.0229, Accuracy: 1574/2000 (78.700%)

Train Epoch: 5	 Loss: 0.023022

Test set: Average loss: 0.0228, Accuracy: 1574/2000 (78.700%)

Train Epoch: 6	 Loss: 0.023006

Test set: Average loss: 0.0228, Accuracy: 1574/2000 (78.700%)

Train Epoch: 7	 Loss: 0.022986

Test set: Average loss: 0.0228, Accuracy: 1574/2000 (78.700%)

Train Epoch: 8	 Loss: 0.022958

Test set: Average loss: 0.0228, Accuracy: 1574/2000 (78.700%)

Train Epoch: 9	 Loss: 0.022925

Test set: Average loss: 0.0227, Accuracy: 1574/2000 (78.700%)

Train Epoch: 10	 Loss: 0.022887

Test set: Average

## Accuracy of a basic ANN model is around 82.5 % after training for 20 epochs. 

#### It seems Accuracy is not improving much ;Other techniques can be implemented to expect betterperformance.
#### We will be implementing following techniques:
#### 1>Batch Normalization 
####  2>Drop Out layer


### Implementing Batch Normalization


In [18]:
#Creating ANN architecture and introducing Batch Normalization of internal outputs of hidden layers

class BNAnn(nn.Module):
    def __init__(self,input_features,hidden1,hidden2,out_features): 
        super().__init__()
        self.model_BNN = nn.Sequential(
            nn.Linear(input_features,hidden1),
            nn.BatchNorm1d(hidden1), #applying batch norm before the output of activation function of first hidden layers
            nn.ReLU(),
            nn.Linear(hidden1, hidden2),
            nn.BatchNorm1d(hidden2),  #applying batch norm before the output of activation function of secon hidden layer 
            nn.ReLU(),
            nn.Linear(hidden2, out_features),
            nn.BatchNorm1d(out_features),
            nn.Sigmoid()
        )
             
    def forward(self, x):
        x = self.model_BNN(x)
        return x
   

In [19]:
####instantiate the  Batch Normalization  Model
torch.manual_seed(20)
BNAnnModel=BNAnn(input_features,hidden1,hidden2,out_features)
BNoptimizer=torch.optim.Adam(BNAnnModel.parameters(),lr=learning_rate)
BNAnnModel.parameters

<bound method Module.parameters of BNAnn(
  (model_BNN): Sequential(
    (0): Linear(in_features=11, out_features=6, bias=True)
    (1): BatchNorm1d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Linear(in_features=6, out_features=6, bias=True)
    (4): BatchNorm1d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): Linear(in_features=6, out_features=1, bias=True)
    (7): BatchNorm1d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): Sigmoid()
  )
)>

In [20]:
# Training and Testing the BNAnnModel

for epoch in range(0,epochs):
    train(BNAnnModel, train_loader, BNoptimizer, epoch)
    test(BNAnnModel, test_loader)

Train Epoch: 0	 Loss: 0.028368

Test set: Average loss: 0.0274, Accuracy: 1345/2000 (67.250%)

Train Epoch: 1	 Loss: 0.026772

Test set: Average loss: 0.0263, Accuracy: 1512/2000 (75.600%)

Train Epoch: 2	 Loss: 0.025922

Test set: Average loss: 0.0256, Accuracy: 1549/2000 (77.450%)

Train Epoch: 3	 Loss: 0.025328

Test set: Average loss: 0.0251, Accuracy: 1571/2000 (78.550%)

Train Epoch: 4	 Loss: 0.024839

Test set: Average loss: 0.0246, Accuracy: 1583/2000 (79.150%)

Train Epoch: 5	 Loss: 0.024476

Test set: Average loss: 0.0243, Accuracy: 1597/2000 (79.850%)

Train Epoch: 6	 Loss: 0.024182

Test set: Average loss: 0.0240, Accuracy: 1621/2000 (81.050%)

Train Epoch: 7	 Loss: 0.023894

Test set: Average loss: 0.0236, Accuracy: 1656/2000 (82.800%)

Train Epoch: 8	 Loss: 0.023650

Test set: Average loss: 0.0234, Accuracy: 1669/2000 (83.450%)

Train Epoch: 9	 Loss: 0.023457

Test set: Average loss: 0.0232, Accuracy: 1671/2000 (83.550%)

Train Epoch: 10	 Loss: 0.023314

Test set: Average

###  Accuracy of a  ANN model  after introducing Batch Normalization is around 84.4 % after training for 19 epochs whic is better than that of basic ANN model

### Implementing Drop Out Layer

In [21]:
#Creating ANN architecture by introducing Drop Out layer.Initialized p values as 0.2 for both of the hidden layers

class DropOutAnn(nn.Module):
    def __init__(self,input_features,hidden1,hidden2,out_features): 
        super().__init__()
        self.model_dropout = nn.Sequential(
            nn.Linear(input_features,hidden1),
            nn.Dropout(0.2),
            nn.ReLU(),
            nn.Linear(hidden1,hidden2),
            nn.Dropout(0.2),
            nn.ReLU(),
            nn.Linear(hidden2,out_features),
            nn.Sigmoid()
        )
             
    def forward(self, x):
        x = self.model_dropout(x)
        return x



In [22]:
####instantiate the  DropOutAnn Model

DropOutAnnModel=DropOutAnn(input_features,hidden1,hidden2,out_features)
DropOutoptimizer=torch.optim.Adam(DropOutAnnModel.parameters(),lr=learning_rate)

In [23]:
#Training and testing the DropOutAnnModel

for epoch in range(0,epochs):
    train(DropOutAnnModel, train_loader, DropOutoptimizer, epoch)
    test(DropOutAnnModel, test_loader)

Train Epoch: 0	 Loss: 0.028805

Test set: Average loss: 0.0264, Accuracy: 1574/2000 (78.700%)

Train Epoch: 1	 Loss: 0.025359

Test set: Average loss: 0.0236, Accuracy: 1574/2000 (78.700%)

Train Epoch: 2	 Loss: 0.024348

Test set: Average loss: 0.0231, Accuracy: 1574/2000 (78.700%)

Train Epoch: 3	 Loss: 0.024049

Test set: Average loss: 0.0230, Accuracy: 1574/2000 (78.700%)

Train Epoch: 4	 Loss: 0.023936

Test set: Average loss: 0.0229, Accuracy: 1574/2000 (78.700%)

Train Epoch: 5	 Loss: 0.023885

Test set: Average loss: 0.0229, Accuracy: 1574/2000 (78.700%)

Train Epoch: 6	 Loss: 0.023825

Test set: Average loss: 0.0229, Accuracy: 1574/2000 (78.700%)

Train Epoch: 7	 Loss: 0.023697

Test set: Average loss: 0.0229, Accuracy: 1574/2000 (78.700%)

Train Epoch: 8	 Loss: 0.023650

Test set: Average loss: 0.0229, Accuracy: 1574/2000 (78.700%)

Train Epoch: 9	 Loss: 0.023569

Test set: Average loss: 0.0229, Accuracy: 1574/2000 (78.700%)

Train Epoch: 10	 Loss: 0.023499

Test set: Average

### Accuracy has been improved in the model where Batch Normalization is implemented . Accuracy is around 84.5 % which is the   highest amongs all three models.
### Accuracy of ANN model with Drop Out Implementation is around 82.%
#### Accuracy of a basic ANN Model is around 81% which is the least amongst all.

### Conclusions:
#### 1> Batch Normalization helped in improving the accuracy of the Model by 3.5% .
#### 2>Drop Out didnt help much in increasing the Accuracy which makes sense as no. of  neurons is  very less in each hidden layer(We can't increase it much as no of inputs are also not very large) and with the results its seems that model is not overfitted . Drop Out layer provides good resulst when the model is overfitted which is not the case here.
