## What is Churn Rate?

**Churn rate** (sometimes called attrition rate), in its broadest sense, is a measure of the number of individuals or items moving out of a collective group over a specific period. It is one of two primary factors that determine the steady-state level of customers a business will support.
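
As a minimal illustration of the definition above, the sketch below computes a churn rate for a single period. The helper name `churn_rate` and the customer counts are hypothetical and chosen only for illustration; they do not come from the bank dataset.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the starting customers who left during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start

# Hypothetical numbers: 10,000 customers at the start of a quarter, 500 of whom left.
rate = churn_rate(10_000, 500)
print(f"Churn rate: {rate:.1%}")
```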


## Part 1 - Data Preprocessing

### Importing the Libraries


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.
import matplotlib.pyplot as plt
import seaborn as sns

### Importing The Dataset

In [None]:
dataset = pd.read_csv("/kaggle/input/bank-churn/Bank_churn_modelling.csv")
dataset

After a first look at the dataset, we can see that the independent variables (features) for predicting bank churn are in columns 3 to 12.

So we store these columns, indexes 3 to 12, in X.

In [None]:
X = dataset.iloc[:, 3:13].values
X

We store the **dependent variable (the value to be predicted)** in **y** by selecting the column at index 13.

In [None]:
y = dataset.iloc[:, 13].values
y

### Converting Categorical Variable

Now we need to convert our categorical independent variables into numeric ones.

Categorical variables can hide a lot of interesting information in a data set, so it is crucial to learn how to deal with them.

The only two categorical features are Gender and Geography, which need to be converted into numerical data.

In [None]:
from sklearn.preprocessing import LabelEncoder

Creating label encoder object no. 2 to encode the Gender column (index 2 in the features)

In [None]:
labelencoder_X_2 = LabelEncoder()

Encoding Gender from strings to just two numbers. `LabelEncoder` numbers the classes in sorted order, so Female becomes 0 and Male becomes 1.

In [None]:
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
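
`LabelEncoder` assigns integer codes to the sorted unique labels, which means `Female` maps to 0 and `Male` maps to 1. The pure-Python sketch below mimics that behaviour on a made-up list. `encode_labels` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
def encode_labels(values):
    """Map each distinct label to an integer, numbering labels in sorted order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], classes

genders = ["Female", "Male", "Male", "Female"]  # made-up sample
codes, classes = encode_labels(genders)
print(classes)  # ['Female', 'Male']
print(codes)    # [0, 1, 1, 0]
```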

### Dummy Variables

A dummy variable (aka, an indicator variable) is a numeric variable that represents categorical data, such as gender, race, political affiliation, etc. Technically, dummy variables are dichotomous, quantitative variables. Their range of values is small; they can take on only two quantitative values.

Now we create the dummy variables for the column at index 1 of the features.

In [None]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
transformer = ColumnTransformer(transformers=[("OneHot",OneHotEncoder(),[1])],remainder='passthrough')
X = transformer.fit_transform(X.tolist())
X = X.astype('float64')
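
To make the idea of dummy variables concrete, here is a small pure-Python sketch of one-hot encoding. The helper `one_hot` and the region names are assumptions used for illustration only; the notebook itself relies on scikit-learn's `OneHotEncoder`.

```python
def one_hot(values):
    """Return one 0/1 indicator column per distinct category (sorted order)."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return rows, categories

regions = ["France", "Spain", "Germany", "France"]  # illustrative values
dummies, categories = one_hot(regions)
for region, row in zip(regions, dummies):
    print(region, row)
```

Each row contains exactly one 1, in the column that matches its category.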

### Dummy Variable Trap

When including dummy variables in a regression model, however, one should be careful of the Dummy Variable Trap. The Dummy Variable Trap is a scenario in which the independent variables are multicollinear: two or more variables are highly correlated, so that, in simple terms, one variable can be predicted from the others.

We can avoid the dummy variable trap by using one fewer dummy variable than there are categories.

For example, three dummies are created for the feature "Geography". If we remove any one of them, we avoid the trap. So in this case we remove the first column, which has index 0.

In [None]:
X = X[:, 1:]
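
The following sketch shows why one dummy column is redundant: it can be rebuilt exactly from the others. The four rows and the column order are made up for illustration.

```python
# Three dummy columns for four made-up customers (column order is an assumption).
dummies = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
]

# Drop the first column, as X[:, 1:] does.
reduced = [row[1:] for row in dummies]

# The dropped column can be rebuilt from the remaining ones,
# so it carried no extra information.
rebuilt_first = [1.0 - sum(row) for row in reduced]
print(rebuilt_first)
```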

### Splitting the dataset into the Training set and Test set

In machine learning, we build a model, which is an algorithm whose parameters must be adjusted so that it performs well at its task, i.e. it is able to predict the values we want.

How can we adjust those parameters so that the model does well?
We train the model using data that we call the training data or training set. The training data already contains the actual values that the model should predict, so the algorithm changes the parameter values to fit the data in the training set.

But how do we know, after training, that the model is good overall?
For that we have the test data or test set: different data for which we know the values but which was never shown to the model. If the trained model also performs well on the test set, we can say that the machine learning model is good.

If the model is not tested and is tuned only to perform well on the training data, its parameters will only be good enough to predict values for the data in the training set. Such a model does not generalize. This is called overfitting.

Holding out a test set keeps us from ending up with a useless model that is only good on the training set and not general enough.

Thus both the training set and the test set are important for building a better machine learning model.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
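
Conceptually, a train/test split shuffles the rows and holds some of them back. The sketch below illustrates that idea in plain Python. `simple_split` is a hypothetical helper, and its shuffling is not guaranteed to match the rows that `train_test_split` selects for the same seed.

```python
import random

def simple_split(rows, test_size=0.2, seed=0):
    """Shuffle row indices with a fixed seed, then hold out a fraction as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(10))  # stand-in for 10 observations
train, test = simple_split(rows)
print(len(train), len(test))
```

The two parts never overlap, so the model is always evaluated on rows it has not seen.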

### Feature Scaling

Feature scaling, here in the form of standardization, is applied to the independent variables (features) of the data. It rescales each feature to zero mean and unit variance so that all features are on a comparable scale. Sometimes it also helps speed up the calculations in an algorithm.

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
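
The standardization applied here is z = (x - mean) / std, with the mean and standard deviation learned from the training set only and then reused on the test set. A minimal sketch of that idea, using made-up values and hypothetical helper names (`fit_scaler`, `apply_scaler`):

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation from training values."""
    return fmean(column), pstdev(column)

def apply_scaler(column, mean, std):
    """Standardize values with previously learned statistics: z = (x - mean) / std."""
    return [(x - mean) / std for x in column]

train_ages = [20.0, 30.0, 40.0, 50.0]  # made-up training values
test_ages = [35.0, 60.0]               # made-up test values

mean, std = fit_scaler(train_ages)
train_scaled = apply_scaler(train_ages, mean, std)
test_scaled = apply_scaler(test_ages, mean, std)  # reuse the training statistics
print(train_scaled)
print(test_scaled)
```

Reusing the training statistics on the test set mirrors calling `fit_transform` on `X_train` and only `transform` on `X_test`.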

As you can see, Part 1 involves a lot of work, and we have not even started on our neural network.

It is important to know that **a big part of being a data scientist is data preprocessing**, so that we have usable data to feed to our models or neural networks.

So now let's start building our Artificial Neural Network.

You can also have a look at [ANN for Bank Churn Modeling](https://towardsdatascience.com/building-your-own-artificial-neural-network-from-scratch-on-churn-modeling-dataset-using-keras-in-690782f7d051)

## Part 2 - ANN

The steps involved in training the ANN with Stochastic Gradient Descent are:

1) Randomly initialize the weights to small numbers close to 0 (but not 0).

2) Input the first observation of your dataset in the input layer, each feature in one input node.

3) Forward propagation: from left to right, the neurons are activated in such a way that the impact of each neuron's activation is limited by the weights. Propagate the activations until you get the predicted result y.

4) Compare the predicted result with the actual result. Measure the generated error.

5) Back-propagation: from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.

6) Repeat steps 2 to 5 and update the weights after each observation (online, or stochastic, learning). Or: repeat steps 2 to 5 but update the weights only after a batch of observations (batch learning).

7) When the whole training set has passed through the ANN, that completes an epoch. Run more epochs.
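
The steps above can be sketched with a single sigmoid neuron trained one observation at a time. This is a toy illustration in plain Python, not the Keras model built later. The data, learning rate and number of epochs are made up.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data (made up): one feature, label 1 when the feature is positive.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

rng = random.Random(0)
w = rng.uniform(-0.05, 0.05)  # step 1: small random weight
b = 0.0
learning_rate = 0.5

def mean_loss():
    """Average binary cross-entropy over the toy data."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

loss_before = mean_loss()
for epoch in range(50):                 # step 7: several epochs
    for x, y in data:                   # step 2: one observation at a time
        p = sigmoid(w * x + b)          # step 3: forward propagation
        error = p - y                   # step 4: compare with the actual value
        w -= learning_rate * error * x  # step 5: update the weight from the error
        b -= learning_rate * error      # step 6: update after each observation
loss_after = mean_loss()
print(loss_before, loss_after)
```

For a sigmoid output with cross-entropy loss, the gradient with respect to the neuron's input is simply `p - y`, which is why the update rule is so short.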

### Importing the Keras libraries 

In [None]:
import keras

### Importing Keras Packages

`Sequential` is used for building the neural network layer by layer

In [None]:
from keras.models import Sequential

`Dense` creates the fully connected layers. It also takes care of randomly initializing the weights to small numbers close to 0 (but not 0).

In [None]:
from keras.layers import Dense

### Initializing the ANN

There are actually two ways of defining a deep learning model:

1. Defining each layer one by one
2. Defining a graph


We do not pass any arguments to the Sequential object because we will add the layers manually

In [None]:
classifier = Sequential()

### Adding the input layer and the first hidden layer

There is no fixed rule of thumb, but you can set the number of nodes in a hidden layer to the average of the number of nodes in the input and output layers: (11 + 1) / 2 = 6.

* So set `units` = 6 (this argument was called `output_dim` in older Keras versions)
* `kernel_initializer` (formerly `init`) initializes the hidden layer weights from a uniform distribution
* The activation function is the rectifier activation function (ReLU)
* `input_dim` tells us the number of nodes in the input layer. It is specified only once and is not needed in further layers.

In [None]:
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
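
The node-count heuristic and the rectifier activation used in this layer can be sketched in a few lines. `hidden_units` and `relu` are hypothetical helpers written for illustration. Keras applies the activation internally.

```python
def hidden_units(n_inputs: int, n_outputs: int) -> int:
    """Heuristic from the text: average of the input and output node counts."""
    return (n_inputs + n_outputs) // 2

def relu(z: float) -> float:
    """Rectifier: passes positive values through and clips negatives to 0."""
    return max(0.0, z)

print(hidden_units(11, 1))                  # 6
print([relu(z) for z in (-2.0, 0.0, 3.5)])  # [0.0, 0.0, 3.5]
```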

### Adding the second hidden layer

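* Set `units` = 6 (this argument was called `output_dim` in older Keras versions)
* `kernel_initializer` (formerly `init`) initializes the hidden layer weights from a uniform distribution
* The activation function is the rectifier activation function (ReLU)
* No need for `input_dim`.


In [None]:
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))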

### Adding the output layer

* Set `units` = 1 (this argument was called `output_dim` in older Keras versions)
* `kernel_initializer` (formerly `init`) initializes the output layer weights from a uniform distribution
* The activation function is the sigmoid activation function (sigmoid)

The **sigmoid activation function** is used when we need the probability of a binary outcome, i.e. a dependent variable with two categories (similar to logistic regression).


Switch to **Softmax** when the dependent variable has more than 2 categories.

In [None]:
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
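
To illustrate the difference between the two output activations mentioned above, here is a small plain-Python sketch. The input scores are made up, and the functions are illustrative stand-ins for what Keras computes internally.

```python
import math

def sigmoid(z):
    """Squash one score into a probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))              # 0.5
print(softmax([2.0, 1.0, 0.1]))  # three probabilities, largest first
```

A single sigmoid output is enough for two classes because the probability of the other class is one minus the output.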

### Compiling the ANN

In [None]:
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
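
The `binary_crossentropy` loss named in the compile call penalizes confident wrong predictions heavily. The sketch below shows the formula in plain Python. The labels, the probabilities and the clipping constant `eps` are illustrative assumptions, and the function is not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]        # made-up labels
good = [0.9, 0.1, 0.8, 0.2]  # confident and correct
bad = [0.4, 0.6, 0.3, 0.7]   # leaning the wrong way
print(binary_crossentropy(y_true, good))
print(binary_crossentropy(y_true, bad))
```

The first value is smaller than the second, which is the behaviour the optimizer exploits when it adjusts the weights.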

### Fitting the ANN to the Training set

In [None]:
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)

## Part 3 - Making the predictions and evaluating the model

### Predicting the Test set results

In [None]:
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)  # True (1) if the predicted probability is larger than 0.5, else False (0)

print(y_pred)

**When trained on the training data and evaluated on the test data, this model gives an accuracy of around 84% in both cases.** From our point of view, that is great!

### Making the Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

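The accuracy can be obtained from the confusion matrix by dividing the number of correct predictions (the diagonal entries) by the total number of predictions. Use the values from your own confusion matrix, as they may differ from run to run.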
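
As a sketch of that calculation, the example below derives the accuracy from a 2x2 confusion matrix in scikit-learn's layout (rows are actual classes, columns are predicted classes). The counts are hypothetical and are not the results of this notebook.

```python
# Hypothetical confusion matrix: rows are actual classes, columns are predicted classes.
cm = [[150, 10],   # actual 0: true negatives, false positives
      [25, 15]]    # actual 1: false negatives, true positives

tn, fp = cm[0]
fn, tp = cm[1]
accuracy = (tn + tp) / (tn + fp + fn + tp)
print(f"Accuracy: {accuracy:.3f}")
```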


**Congratulations! You just wrote your own neural network for the bank that gave you this task.**