## Analysis of the Churn Modelling dataset

### About the dataset

In [None]:
This dataset contains details of a bank's customers:

Attributes:
1)  RowNumber        - Row Numbers from 1 to 10000
2)  Customer Id      - Unique Ids for bank customer identification
3)  Surname          - Customer's last name
4)  Credit Score     - Credit score of the customer
5)  Geography        - The country the customer belongs to
6)  Gender           - Male or Female
7)  Age              - Age of the customer
8)  Tenure           - Number of years for which the customer has been with the bank
9)  Balance          - Bank balance of the customer
10) Num Of Products  - Number of bank products the customer is utilising
11) Has Credit Card  - Binary Flag for whether the customer holds a credit card with the bank or not
12) Is Active Member - Binary Flag for whether the customer is an active member with the bank or not
13) Estimated Salary - Estimated salary of the customer in dollars
14) Exited           - Binary flag: 1 if the customer closed their account with the bank, 0 if the customer was retained

In [None]:
The target variable is binary: it records whether the customer left the bank (closed their account) or remained a customer. The goal is to predict whether a customer will leave the bank or stay.
In this notebook I am going to:
1) Take a look at the data.
2) Analyze the data.
3) Perform preprocessing operations such as encoding and scaling.
4) Build an ANN model to predict the outcome.

### Importing the libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### Importing the dataset

In [2]:
dataset = pd.read_csv('C:\\Users\\Home\\Documents\\New folder\\Deep Learning\\Churn_Modelling.csv')


In [3]:
dataset.head(5)

#This shows the first five observations in the output.

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [4]:
#iloc is purely integer-location based indexing for selection by position.
#It is used to select rows and columns by number, in the order that they appear in the data frame.

X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

In [5]:
X

array([[619, 'France', 'Female', ..., 1, 1, 101348.88],
       [608, 'Spain', 'Female', ..., 0, 1, 112542.58],
       [502, 'France', 'Female', ..., 1, 0, 113931.57],
       ...,
       [709, 'France', 'Female', ..., 0, 1, 42085.58],
       [772, 'Germany', 'Male', ..., 1, 0, 92888.52],
       [792, 'France', 'Female', ..., 1, 0, 38190.78]], dtype=object)

In [6]:
y

array([1, 0, 1, ..., 1, 1, 0], dtype=int64)
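To make the positional slicing concrete, here is a minimal sketch on a hypothetical three-row frame (the frame `df` and its values are made up for illustration, and it has far fewer columns than the real dataset). It shows that `iloc[:, a:b]` keeps columns `a` through `b - 1`, which is why `3:13` above stops at EstimatedSalary and leaves Exited (column 13) for `y`.

```python
import pandas as pd

# Hypothetical miniature of the dataset: an ID column, two features, one target
df = pd.DataFrame({
    'RowNumber': [1, 2, 3],
    'CreditScore': [619, 608, 502],
    'Age': [42, 41, 42],
    'Exited': [1, 0, 1],
})

# Columns 1 and 2 (the end of the slice, 3, is excluded) become the features
features = df.iloc[:, 1:3].values

# Column 3 on its own becomes the target vector
target = df.iloc[:, 3].values

print(features.shape)  # two feature columns for three rows
print(target)
```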

### Encoding categorical data…

In [7]:
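#Encoding categorical data.
#Encoding the independent variables.

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer

#Gender (column 2) has two categories, so a 0/1 label is enough.
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])

#Geography (column 1) has three categories, so it is one-hot encoded.
#OneHotEncoder's categorical_features argument was removed in scikit-learn 0.22;
#ColumnTransformer selects the column instead and puts the encoded columns first.
ct = ColumnTransformer([('geography', OneHotEncoder(), [1])], remainder = 'passthrough')
X = np.array(ct.fit_transform(X), dtype = float)

#To avoid the dummy variable trap, remove one dummy variable column.
X = X[:, 1:]
X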



array([[0.0000000e+00, 0.0000000e+00, 6.1900000e+02, ..., 1.0000000e+00,
        1.0000000e+00, 1.0134888e+05],
       [0.0000000e+00, 1.0000000e+00, 6.0800000e+02, ..., 0.0000000e+00,
        1.0000000e+00, 1.1254258e+05],
       [0.0000000e+00, 0.0000000e+00, 5.0200000e+02, ..., 1.0000000e+00,
        0.0000000e+00, 1.1393157e+05],
       ...,
       [0.0000000e+00, 0.0000000e+00, 7.0900000e+02, ..., 0.0000000e+00,
        1.0000000e+00, 4.2085580e+04],
       [1.0000000e+00, 0.0000000e+00, 7.7200000e+02, ..., 1.0000000e+00,
        0.0000000e+00, 9.2888520e+04],
       [0.0000000e+00, 0.0000000e+00, 7.9200000e+02, ..., 1.0000000e+00,
        0.0000000e+00, 3.8190780e+04]])
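The one-hot step and the dummy variable trap can be illustrated without scikit-learn. The following is only a sketch in plain NumPy on four made-up Geography values, not what the encoder does internally; the variable names are my own. It shows why one column can be dropped: every row of the full one-hot matrix sums to 1, so any single column is implied by the others.

```python
import numpy as np

# Hypothetical sample of the Geography column
geography = np.array(['France', 'Spain', 'France', 'Germany'])

# Map each category to an integer code; np.unique returns the categories sorted
categories, codes = np.unique(geography, return_inverse=True)

# One column per category, with a single 1 in each row
one_hot = np.eye(len(categories))[codes]

# Dropping the first column (France) removes the redundancy:
# France is now represented by a row of zeros
dummies = one_hot[:, 1:]

print(categories)
print(dummies)
```

With this encoding France becomes `[0, 0]`, Germany `[1, 0]`, and Spain `[0, 1]`, which is the pattern visible in the first two columns of `X` above.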

### Splitting the dataset into the Training set and Test set

In [8]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
X_train.shape

(8000, 11)

### Feature Scaling

In [9]:
#Feature scaling puts the independent variables on a comparable scale. StandardScaler, used below, rescales each feature to zero mean and unit variance.
#It is performed during data pre-processing to handle features whose magnitudes, ranges, or units differ widely.
#Without feature scaling, a learning algorithm tends to give more weight to features with larger numeric values and less to those with smaller values, regardless of their units.

In [10]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
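The difference between `fit_transform` on the training set and `transform` on the test set is easier to see when the standardization is written out by hand. This is a minimal NumPy sketch on made-up numbers (the arrays `train` and `test` are hypothetical): the mean and standard deviation are learned from the training rows only and then reused for the test rows, so no information from the test set leaks into the preprocessing.

```python
import numpy as np

# Hypothetical toy data: two features on very different scales
train = np.array([[600.0,  50000.0],
                  [700.0, 150000.0],
                  [800.0, 100000.0]])
test = np.array([[650.0, 120000.0]])

# "fit": learn the per-column mean and standard deviation from the training set
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# "transform": apply the training statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column
print(test_scaled)
```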

### Importing the Keras libraries and packages

In [11]:
# Sequential model to initialise our ANN and dense module to build the layers.

import keras
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


### Initializing the ANN…

In [12]:
#Artificial neural networks (ANNs) are computational models capable of machine learning and pattern recognition.
#They are presented as systems of interconnected “neurons” that compute values from inputs.
#They are an information-processing technique loosely modelled on the way the human brain processes information.
#An ANN contains a large number of connected processing units that work together to turn inputs into meaningful results.
#Neural networks can be applied not only to classification but also to regression on continuous target attributes.
#An ANN consists of three types of layers:
#1) Input layer.
#2) Hidden layer(s).
#3) Output layer.
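The three kinds of layers listed above can be sketched as a forward pass in plain NumPy. This is only an illustration, not the Keras implementation: the layer sizes (11 inputs, two hidden layers of 6 units, 1 output) mirror the network built later in this notebook, while the weights are random and untrained, and the helper names `relu`, `sigmoid`, and `forward` are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Small random weights and zero biases for 11 -> 6 -> 6 -> 1 (arbitrary choice)
W1, b1 = rng.uniform(-0.05, 0.05, (11, 6)), np.zeros(6)
W2, b2 = rng.uniform(-0.05, 0.05, (6, 6)), np.zeros(6)
W3, b3 = rng.uniform(-0.05, 0.05, (6, 1)), np.zeros(1)

def forward(X):
    h1 = relu(X @ W1 + b1)        # input layer -> first hidden layer
    h2 = relu(h1 @ W2 + b2)       # first hidden layer -> second hidden layer
    return sigmoid(h2 @ W3 + b3)  # second hidden layer -> output in (0, 1)

# A hypothetical batch of 4 already-scaled customers with 11 features each
X_batch = rng.normal(size=(4, 11))
probs = forward(X_batch)

# Total number of trainable parameters (weights plus biases)
n_params = sum(a.size for a in (W1, b1, W2, b2, W3, b3))
print(probs.shape, n_params)
```

Counting the parameters by hand gives 11*6 + 6 for the first hidden layer, 6*6 + 6 for the second, and 6*1 + 1 for the output, 121 in total.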

In [13]:
classifier = Sequential()

### Adding the input layer and the first hidden layer…

In [14]:
#Activation functions are what allow an artificial neural network to learn complex, non-linear mappings between the inputs and the response variable.
#They introduce non-linear properties into the network.
#Their main purpose is to convert the input signal of a node in an ANN into an output signal.
#That output signal is then used as an input to the next layer in the stack.

In [15]:
#Most popular types of activation functions:
# 1: Sigmoid or Logistic
# 2: Tanh - Hyperbolic tangent
# 3: ReLU - Rectified linear unit
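For reference, the three activation functions listed above can each be written in a line of NumPy. This is just a sketch to show their shapes on a few sample inputs; the function names are mine and Keras provides its own implementations.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any real number into the interval (-1, 1)
    return np.tanh(x)

def relu(x):
    # Passes positive values through and clips negatives to 0
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))
print(tanh(x))
print(relu(x))
```

At `x = 0` the sigmoid returns 0.5 while tanh and ReLU both return 0; for negative inputs only ReLU is exactly zero.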

In [16]:
#ReLU (Rectified Linear Unit) is an activation function of the form R(x) = max(0, x).
#It is the most commonly used activation function in neural networks, especially in CNNs.
#If you are unsure which activation function to use in your network, ReLU is usually a good first choice.

#units and kernel_initializer are the Keras 2 names for the older output_dim and init arguments.
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))


### Adding the second hidden layer…

In [17]:
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))


### Adding the output layer…

In [18]:
#Sigmoid is an activation function of the form f(x) = 1 / (1 + exp(-x)).
#Its range is between 0 and 1, so the output can be read as the probability that the customer leaves.
#It is an S-shaped curve and is easy to interpret.

classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))


### Compiling the ANN…

In [19]:
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
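The `binary_crossentropy` loss named in the compile step can be computed by hand for a few predictions. The sketch below is my own NumPy version, not the Keras code; the function name, the clipping constant `eps`, and the sample values are assumptions made for illustration. It averages `-(y * log(p) + (1 - y) * log(1 - p))` over the samples, so confident wrong predictions are penalised heavily.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities away from 0 and 1 so the logarithm stays finite
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

# Hypothetical labels and predicted probabilities of churn
y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.1, 0.6, 0.4])

loss = binary_crossentropy(y_true, y_prob)
print(round(loss, 4))
```

The two confident predictions (0.9 and 0.1) each contribute about 0.105 to the sum, while the two hesitant ones (0.6 and 0.4) contribute about 0.511 each, giving a mean loss of roughly 0.308.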

### Fitting the ANN to the Training set

In [20]:
#epochs is the Keras 2 name for the older nb_epoch argument.
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)


Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.callbacks.History at 0x22e00c579e8>

### Making the predictions and evaluating the model

In [21]:
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
y_pred 

array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [False]])

### Making the Confusion Matrix

In [22]:
#A confusion matrix is a table that describes the performance of a classification model on a set of test data for which the true values are known.
#In scikit-learn, each row corresponds to a true class and each column to a predicted class.
#It shows how well the classifier has performed on each individual class, not just overall.

In [23]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
cm

array([[1546,   49],
       [ 262,  143]], dtype=int64)
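The counts in the confusion matrix are enough to recompute the headline metrics by hand. The sketch below copies the matrix printed above into a NumPy array (rows are true classes, columns are predicted classes, following scikit-learn's convention) and derives accuracy, precision, recall, and F1 for the churn class; the variable names are mine.

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[1546,  49],
               [ 262, 143]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all customers classified correctly
precision = tp / (tp + fp)        # share of predicted churners who really left
recall = tp / (tp + fn)           # share of actual churners who were detected
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Accuracy comes out at 0.8445, but recall for the churn class is only about 0.35: of the 405 customers who left, the model flags 143.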

In [24]:
# Import classification_report
# It expects the true labels first, then the predictions.
# The boolean predictions are cast to 0/1 so the row labels match y_test.

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred.astype(int)))

              precision    recall  f1-score   support

           0       0.86      0.97      0.91      1595
           1       0.74      0.35      0.48       405

    accuracy                           0.84      2000
   macro avg       0.80      0.66      0.69      2000
weighted avg       0.83      0.84      0.82      2000



In [25]:
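#The accuracy obtained on the test set is about 84%: (1546 + 143) / 2000 = 0.8445.
#Recall for the churn class is much lower: only 143 of the 405 customers who left were identified (about 35%).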

##  Conclusion

In [None]:
I used this dataset to explore and practise some very basic topics:
1) Encoding categorical data.
2) Feature Scaling.
3) Activation Functions.
4) Model building.

In [None]:
Thank You

Any comments, suggestions, or corrections are welcome.