# Demo - Deep learning
## Artificial Neural Network for bank customer prediction

This demo demonstrates the ability of an Artificial Neural Network (ANN) to generate predictions from a data set with several features and highly non-linear behaviour.

A fake but realistic bank customer data set is used to predict whether a customer will leave the bank. The prediction relies on historical data with several features such as the number of products, credit score and estimated salary.

This demo comes from an excellent [tutorial](https://www.superdatascience.com/deep-learning/) by Kirill Eremenko and Hadelin de Ponteves.

Requirements: the pandas, Keras (with the TensorFlow backend) and scikit-learn libraries. This code has been tested on a Windows 10 machine with Python 3.5.

### Importing libraries

In [1]:
import pandas as pd

### Importing the data set

In [2]:
dataset = pd.read_csv('Churn_Modelling.csv')
dataset.shape

(10000, 14)

The data set has 10,000 rows (observations) and 14 columns. Each row represents one unique customer. The columns are defined as follows:

0) RowNumber: row index (from 1 to 10,000)

1) CustomerId: 8-digit number identifying the customer

2) Surname: fake customer surname

3) CreditScore: credit score of the customer

4) Geography: country of the customer (3 countries: France, Spain, Germany)

5) Gender: Male or Female

6) Age: age of the customer

7) Tenure: number of years the customer has been with the bank

8) Balance: current bank account balance

9) NumOfProducts: number of bank products the customer subscribes to

10) HasCrCard: whether the customer has a credit card (1) or not (0)

11) IsActiveMember: whether the customer is an active member (1) or not (0)

12) EstimatedSalary: estimated salary of the customer (US\$)

13) Exited: whether the customer left the bank (1) or not (0), from historical records

The goal is to create an algorithm able to predict whether a customer will leave the bank (outcome 'Exited') from the features above, for customers in a new data set.

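As this is a supervised learning problem, the dependent variable (y) must be separated from the independent variables (X). Only the features with predictive potential are kept: columns 3 to 12 (CreditScore to EstimatedSalary) form X, while RowNumber, CustomerId and Surname are dropped because they only identify the customer. Column 13 (Exited) is y: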

In [3]:
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

In [4]:
X

array([[619, 'France', 'Female', ..., 1, 1, 101348.88],
       [608, 'Spain', 'Female', ..., 0, 1, 112542.58],
       [502, 'France', 'Female', ..., 1, 0, 113931.57],
       ..., 
       [709, 'France', 'Female', ..., 0, 1, 42085.58],
       [772, 'Germany', 'Male', ..., 1, 0, 92888.52],
       [792, 'France', 'Female', ..., 1, 0, 38190.78]], dtype=object)

In [5]:
y

array([1, 0, 1, ..., 1, 1, 0], dtype=int64)
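The positional slices can be checked against the column list above. The sketch below rebuilds only the column names in a one-row DataFrame whose values are made up (they are not taken from Churn_Modelling.csv), and shows which names `iloc[:, 3:13]` and `iloc[:, 13]` pick out:

```python
import pandas as pd

# Column names as listed above; the single row holds made-up values
columns = ['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
           'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts',
           'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Exited']
demo = pd.DataFrame([[1, 12345678, 'Doe', 600, 'France', 'Female', 40, 3,
                      0.0, 1, 1, 1, 50000.0, 0]], columns=columns)

# Same positional slices as in the cell that builds X and y
feature_names = list(demo.iloc[:, 3:13].columns)
target_name = demo.iloc[:, 13].name

print(feature_names)  # CreditScore ... EstimatedSalary (10 names)
print(target_name)    # Exited
```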

### Data Pre-processing
#### Encoding categorical data

The Geography and Gender columns contain categories (text labels), so they have to be encoded as numbers before the ANN can use them.

In [6]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

First, encode the categories as integers:

For Geography:

In [7]:
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])

In [8]:
X[:, 1]

array([0, 2, 0, ..., 0, 1, 0], dtype=object)

For Gender:

In [9]:
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])

In [10]:
X[:, 2]

array([0, 0, 0, ..., 0, 1, 0], dtype=object)
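LabelEncoder assigns integers in alphabetical order of the labels, which explains the codes printed above (the second customer, from Spain, gets 2; a male customer gets 1). A small check using only the label values:

```python
from sklearn.preprocessing import LabelEncoder

# Geography: classes are sorted alphabetically, then numbered 0, 1, 2
geo = LabelEncoder().fit(['France', 'Spain', 'Germany'])
print(geo.classes_.tolist())                          # ['France', 'Germany', 'Spain']
print(geo.transform(['Spain', 'France', 'Germany']))  # [2 0 1]

# Gender: Female -> 0, Male -> 1
gender = LabelEncoder().fit(['Female', 'Male'])
print(gender.classes_.tolist())                       # ['Female', 'Male']
```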

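After label encoding, Geography holds the integers 0, 1 and 2, which would wrongly suggest an order between the countries. It must therefore be split into one binary "dummy variable" column per country (one-hot encoding). Gender has only two values, so its 0/1 encoding can stay as a single column:

In [11]:
from sklearn.compose import ColumnTransformer
# OneHotEncoder(categorical_features = [1]) was removed in scikit-learn 0.22; ColumnTransformer replaces it
ct = ColumnTransformer([('geography', OneHotEncoder(), [1])], remainder = 'passthrough')
X = ct.fit_transform(X).astype(float)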
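The next step relies on the dummy columns being placed first. The sketch below reproduces this encoding with `ColumnTransformer`, which current scikit-learn versions use instead of `categorical_features`, on a hypothetical three-column matrix (CreditScore, Geography code, Gender code; values made up). The three Geography dummies (France, Germany, Spain) come first, followed by the untouched columns in their original order:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical mini-matrix: [CreditScore, Geography code, Gender code]
X_demo = np.array([[619, 0, 0],
                   [608, 2, 0],
                   [772, 1, 1]], dtype=object)

# One-hot encode column 1; 'passthrough' appends the other columns unchanged
ct_demo = ColumnTransformer([('geography', OneHotEncoder(), [1])],
                            remainder='passthrough')
X_enc = ct_demo.fit_transform(X_demo).astype(float)

# Columns: France, Germany, Spain, CreditScore, Gender
print(X_enc)
```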

To avoid the dummy variable trap (the three dummy columns always sum to 1, so any one of them is redundant), the first dummy column (France) is removed:

In [12]:
X = X[:,1:]

In [13]:
X

array([[  0.00000000e+00,   0.00000000e+00,   6.19000000e+02, ...,
          1.00000000e+00,   1.00000000e+00,   1.01348880e+05],
       [  0.00000000e+00,   1.00000000e+00,   6.08000000e+02, ...,
          0.00000000e+00,   1.00000000e+00,   1.12542580e+05],
       [  0.00000000e+00,   0.00000000e+00,   5.02000000e+02, ...,
          1.00000000e+00,   0.00000000e+00,   1.13931570e+05],
       ..., 
       [  0.00000000e+00,   0.00000000e+00,   7.09000000e+02, ...,
          0.00000000e+00,   1.00000000e+00,   4.20855800e+04],
       [  1.00000000e+00,   0.00000000e+00,   7.72000000e+02, ...,
          1.00000000e+00,   0.00000000e+00,   9.28885200e+04],
       [  0.00000000e+00,   0.00000000e+00,   7.92000000e+02, ...,
          1.00000000e+00,   0.00000000e+00,   3.81907800e+04]])
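The dummy variable trap can be made concrete with a rank check. With made-up Geography dummies for five customers, an intercept column equals the sum of the three dummies, so the matrix [intercept, France, Germany, Spain] has only 3 independent columns out of 4; after dropping France, all columns are independent. This collinearity matters most for linear models with an intercept; for the ANN, the dropped column is simply a redundant input.

```python
import numpy as np

# Hypothetical one-hot Geography columns [France, Germany, Spain] for five customers
dummies = np.array([[1, 0, 0],
                    [0, 0, 1],
                    [1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1]], dtype=float)
intercept = np.ones((dummies.shape[0], 1))

full = np.hstack([intercept, dummies])             # intercept + all 3 dummies
reduced = np.hstack([intercept, dummies[:, 1:]])   # intercept + Germany, Spain

print(np.linalg.matrix_rank(full), full.shape[1])        # 3 4 -> redundant column
print(np.linalg.matrix_rank(reduced), reduced.shape[1])  # 3 3 -> full rank
```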

#### Preparing training and testing data sets

To train and then test the algorithm, the data set is split into a training set and a test set:

In [14]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

The test set will have 20% of the data. A random seed of 0 is set for reproducibility.
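A quick check of both claims on a hypothetical stand-in array with the same 10,000 rows: `test_size = 0.2` leaves 8,000 rows for training and 2,000 for testing, and repeating the call with the same `random_state` returns exactly the same split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with the same number of rows as the real data set
X_demo = np.arange(10000 * 2).reshape(10000, 2)
y_demo = np.arange(10000) % 2

a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

print(a[0].shape, a[1].shape)                           # (8000, 2) (2000, 2)
print(all(np.array_equal(p, q) for p, q in zip(a, b)))  # True
```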

#### Scaling of the data

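Feature scaling is essential for an ANN: features on very different scales (e.g. Balance, in the tens of thousands, versus HasCrCard, 0 or 1) make the weight updates unbalanced and slow down training. The scaler is fitted on the training set only and then applied to the test set, so that no information from the test set leaks into training: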

In [15]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
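StandardScaler standardises each column as z = (x - mean) / std, with the mean and the population standard deviation computed on the training data; `transform` on the test data reuses those training statistics. A check on hypothetical numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical [CreditScore, Balance] values
train = np.array([[600.0, 0.0], [700.0, 50000.0], [800.0, 100000.0]])
test = np.array([[650.0, 25000.0]])

sc_demo = StandardScaler().fit(train)

# Manual standardisation with the training statistics (np.std uses ddof=0, like StandardScaler)
mu, sigma = train.mean(axis=0), train.std(axis=0)
manual = (test - mu) / sigma

print(np.allclose(sc_demo.transform(test), manual))  # True
```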

### Creating an ANN with Keras

#### Importing the Keras libraries and packages

In [16]:
from keras.wrappers.scikit_learn import KerasClassifier # Keras wrapper, needed for scikit-learn's k-fold cross-validation
from sklearn.model_selection import cross_val_score # k-fold cross-validation
from keras.models import Sequential # used to initialise the ANN
from keras.layers import Dense # used to create the layers of the ANN
from keras.layers import Dropout # used to reduce overfitting

Using TensorFlow backend.


Note: the parameter choices presented below (number of layers, output dimension (a.k.a. units), optimizer, number of epochs and batch size) have already been tuned with GridSearchCV to reach a reasonable accuracy. For the sake of simplicity, the tuning was done separately and is not presented here.
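As a hedged illustration of how GridSearchCV works (the actual tuning of this demo is not shown), the sketch below uses scikit-learn's own MLPClassifier as a stand-in for the Keras model and synthetic data from `make_classification`. The grid values are arbitrary examples, not the ones used for this demo. With KerasClassifier, the grid keys would be its own parameters, such as `batch_size` and `epochs`.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data with 11 features, like X_train
X_demo, y_demo = make_classification(n_samples=300, n_features=11, random_state=0)

# Every combination of these (arbitrary) values is cross-validated
param_grid = {
    'hidden_layer_sizes': [(16,), (16, 16)],
    'batch_size': [16, 32],
}
search = GridSearchCV(MLPClassifier(max_iter=200, random_state=0),
                      param_grid, scoring='accuracy', cv=3)

with warnings.catch_warnings():
    # A small max_iter keeps the demo fast but may stop before full convergence
    warnings.simplefilter('ignore', category=ConvergenceWarning)
    search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```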

#### Building the ANN

In [17]:
def build_classifier():
    # Initialising the ANN
    classifier = Sequential()
    # Adding the input layer and the first hidden layer
    classifier.add(Dense(units = 16, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
    classifier.add(Dropout(rate = 0.1))
    # Adding one more hidden layer (two hidden layers in total)
    for _ in range(1):
        classifier.add(Dense(units = 16, kernel_initializer = 'uniform', activation = 'relu'))
        classifier.add(Dropout(rate = 0.1))
    # Adding the output layer (the sigmoid gives the probability of exiting)
    classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    # Compiling the ANN
    classifier.compile(optimizer = 'sgd', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier
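To make the architecture concrete, the sketch below re-implements the network's forward pass in NumPy (an illustration, not how Keras computes it internally): 11 inputs, two ReLU hidden layers of 16 units and a sigmoid output, which gives (11·16 + 16) + (16·16 + 16) + (16·1 + 1) = 481 trainable parameters. Dropout only acts during training, so it is omitted. The ±0.05 uniform range is an assumption standing in for Keras's 'uniform' initializer, and the input rows are random stand-ins for scaled customers.

```python
import numpy as np

layer_sizes = [11, 16, 16, 1]  # input_dim, two hidden layers, output (as in build_classifier)

def count_params(sizes):
    # Each Dense layer has (inputs * units) weights plus one bias per unit
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def forward(x, weights, biases):
    # ReLU on the hidden layers, sigmoid on the output layer
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    z = h @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Assumed initialisation: uniform in [-0.05, 0.05] for weights, zeros for biases
weights = [rng.uniform(-0.05, 0.05, (n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

probs = forward(rng.standard_normal((4, 11)), weights, biases)
print(count_params(layer_sizes))  # 481
print(probs.shape)                # (4, 1), one exit probability per customer
```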

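Wrap the ANN in a scikit-learn compatible classifier, which builds the architecture with `build_classifier` and fits it with the given batch size and number of epochs: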

In [18]:
classifier = KerasClassifier(build_fn = build_classifier, batch_size = 24, epochs=500)
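The batch size and number of epochs set the amount of work per fit. A quick count, assuming the 8,000-row training set from the split above and the 10-fold cross-validation of the next cell: each fold trains on 7,200 rows, i.e. 300 mini-batches of 24 per epoch, so 500 epochs over 10 folds amount to 1.5 million weight updates, which gives an idea of the cost of that evaluation.

```python
import math

n_train = 8000      # 80% of the 10,000 customers
cv = 10             # number of folds
batch_size = 24
epochs = 500

fold_train = n_train * (cv - 1) // cv                 # rows used for fitting in each fold
steps_per_epoch = math.ceil(fold_train / batch_size)  # mini-batches per epoch
total_updates = steps_per_epoch * epochs * cv         # weight updates over all folds

print(fold_train, steps_per_epoch, total_updates)  # 7200 300 1500000
```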

#### Evaluating the ANN with k-fold cross-validation

In [19]:
# (Not run)
# Recommendation: download and run this code as a Python script, not in Jupyter.
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = 1)
mean = accuracies.mean()
std = accuracies.std()  # standard deviation of the 10 fold accuracies
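What `cross_val_score` does can be spelled out by hand. The sketch below uses synthetic data from `make_classification` and a LogisticRegression as a fast stand-in for the ANN (it is not the model of this demo). For a classifier and `cv = 10`, scikit-learn splits the data with `StratifiedKFold(10)`, fits a fresh copy of the estimator on 9 folds, scores its accuracy on the held-out fold, and repeats this 10 times; the manual loop reproduces those scores:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data and a fast stand-in model (not the ANN)
X_demo, y_demo = make_classification(n_samples=500, n_features=11, random_state=0)
model = LogisticRegression(max_iter=1000)

manual = []
for train_idx, val_idx in StratifiedKFold(n_splits=10).split(X_demo, y_demo):
    fold_model = clone(model).fit(X_demo[train_idx], y_demo[train_idx])  # fresh copy per fold
    manual.append(fold_model.score(X_demo[val_idx], y_demo[val_idx]))   # accuracy on held-out fold
manual = np.array(manual)

auto = cross_val_score(model, X_demo, y_demo, cv=10)
print(manual.mean(), manual.std())
```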

The mean accuracy of the 10-fold cross-validation is 86%, with a standard deviation of 0.7%. This is an acceptable accuracy, and further tuning could still improve the score.
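One way to judge whether a mean accuracy is acceptable is to compare it with a trivial baseline that always predicts the most frequent class; on an imbalanced target, such a baseline can already score high. A minimal sketch with scikit-learn's DummyClassifier on made-up labels (in practice it would be fitted on `y_train` and scored on `y_test`):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Made-up labels: 1 = exited, 0 = stayed
y_demo = np.array([0, 0, 0, 0, 1, 0, 0, 1, 0, 0])
X_demo = np.zeros((len(y_demo), 1))  # features are ignored by this baseline

baseline = DummyClassifier(strategy='most_frequent').fit(X_demo, y_demo)
print(baseline.score(X_demo, y_demo))  # 0.8, the share of the majority class
```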

### Conclusion

This short demo shows that an ANN can predict, with good accuracy on this data set, whether a bank customer will leave the bank, using diverse customer information (numeric, categorical, binary, etc.).