# Artificial Neural Network Simplified (Churn Dataset)

### **Goals of the project -** 
* To understand the basic implementation of an ANN
* To build the ANN layer by layer and understand the significance of each layer and of the arguments used
* To understand how to cross-validate the results of the ANN
* To fine-tune the ANN using the grid search mechanism


In [None]:
import pandas as pd
import warnings  
warnings.filterwarnings('ignore') # to ignore the warnings

In [None]:
data = pd.read_csv('../input/churn-modelling/Churn_Modelling.csv')
data.head()

## **Step 1 :** Pre-processing

In [None]:
# one-hot encoding the categorical columns; drop_first=True drops one redundant dummy per column (avoids the dummy variable trap)
geog = pd.get_dummies(data['Geography'], drop_first=True)
gend = pd.get_dummies(data["Gender"], drop_first=True)

In [None]:
# converting the dummy columns to 'int' (recent pandas versions return them as bool)
geog = geog.astype(int)
gend = gend.astype(int)
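To see what `drop_first=True` removes, here is a minimal pure-Python sketch that mimics dummy encoding on a few made-up `Geography` values. The helper `one_hot_drop_first` is hypothetical, not a pandas function. Like pandas, it orders the dummy columns by sorted category label, so `France` becomes the dropped baseline, which matches the `Germany` and `Spain` columns used below.

```python
def one_hot_drop_first(values):
    """Dummy-encode category labels, dropping the first level in sorted order (hypothetical helper)."""
    levels = sorted(set(values))   # dummy columns ordered by sorted category label
    kept = levels[1:]              # drop_first=True removes the first level
    rows = [[int(v == level) for level in kept] for v in values]
    return kept, rows

geography = ["France", "Spain", "Germany", "France"]   # made-up sample
columns, encoded = one_hot_drop_first(geography)
print(columns)
for label, row in zip(geography, encoded):
    print(label, row)

gender_columns, _ = one_hot_drop_first(["Female", "Male"])
print(gender_columns)
```

A customer from France is encoded as all zeros, so a separate France column would be fully determined by the other two and would add nothing but collinearity.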

In [None]:
# concatenating these encoded variables to the original dataset
data1 = pd.concat([data, gend, geog], axis=1)

In [None]:
# separating the independent (x) and dependent (y) variables

# keeping only the useful feature columns from the original dataset: identifier-like columns are left out,
# and Geography/Gender are replaced by their dummy columns
feature_cols = ['CreditScore', 'Age', 'Tenure', 'Balance',
                'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Male', 'Germany', 'Spain']

x = data1[feature_cols]
y = data1['Exited']

In [None]:
# splitting the data into training and testing
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)

In [None]:
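# standardizing the features (zero mean, unit variance) so that variables with large ranges,
# such as Balance and EstimatedSalary, do not dominate the variables with small ranges
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)   # learn the mean and standard deviation from the training set only
x_test = sc.transform(x_test)         # reuse the training statistics; refitting on the test set would leak test information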
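The fit-on-train, transform-on-test rule can be illustrated with a small standard-library sketch. It assumes, like `StandardScaler`, the population standard deviation (ddof=0); the helper names `fit_scaler` and `transform` and the toy numbers are made up for illustration.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation from the training column only."""
    return mean(column), pstdev(column)

def transform(column, mu, sigma):
    """Standardize a column with previously learned statistics."""
    return [(x - mu) / sigma for x in column]

train = [2, 4, 4, 4, 5, 5, 7, 9]   # toy training values: mean 5, std 2
test = [5, 9]                      # toy test values

mu, sigma = fit_scaler(train)
print(transform(train, mu, sigma))
print(transform(test, mu, sigma))              # same scale as the training data
print(transform(test, *fit_scaler(test)))      # refitting on test shifts the scale
```

With the training statistics, a test value of 9 maps to 2.0, exactly as it would in the training set; refitting on the test set maps it to 1.0 instead, so the network would see inconsistent inputs.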

## **Step 2 :** Building the Artificial Neural Network

In [None]:
# importing the required libraries to form an Artificial Neural Network
from keras.models import Sequential     # required to initialize the neural network, because an ANN is a sequence of layers
from keras.layers import Dense          # to build the fully connected layers of the ANN

In [None]:
# initializing the ANN
ann_classifier = Sequential()

**Step 2.1 :** Adding the input layer and the 1st hidden layer

In [None]:
ann_classifier.add(Dense(units=6, kernel_initializer = 'uniform', activation='relu', input_dim=11))

**Arguments used -**
* `units` = number of nodes (neurons) in the hidden layer; a common rule of thumb is the average of the number of input and output nodes, here (11 + 1) / 2 = 6
* `kernel_initializer='uniform'` = the initial weights are small random values drawn from a uniform distribution (Keras's `RandomUniform`, by default between -0.05 and 0.05)
* `activation='relu'` = the hidden layer uses the rectifier (ReLU) activation function, max(0, x)
* `input_dim=11` = number of input neurons, i.e. the number of feature columns in the training set
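As a rough illustration of what one `Dense` layer computes, the sketch below implements a forward pass in plain Python: each of the 6 units takes a weighted sum of the 11 inputs plus a bias and applies ReLU. It assumes Keras's defaults of `RandomUniform(-0.05, 0.05)` for `'uniform'` and zero-initialized biases; the helper names and the input row are hypothetical.

```python
import random

def relu(z):
    """Rectifier: passes positive values through, clips negatives to 0."""
    return max(0.0, z)

def dense_relu(inputs, weights, biases):
    """One fully connected layer: output_j = relu(sum_i inputs[i] * weights[i][j] + biases[j])."""
    return [relu(sum(x * w[j] for x, w in zip(inputs, weights)) + biases[j])
            for j in range(len(biases))]

n_inputs, n_outputs = 11, 1
units = (n_inputs + n_outputs) // 2   # rule of thumb: average of input and output nodes

random.seed(0)
# assumption: 'uniform' draws weights from U(-0.05, 0.05); biases start at zero
weights = [[random.uniform(-0.05, 0.05) for _ in range(units)] for _ in range(n_inputs)]
biases = [0.0] * units

x = [random.gauss(0, 1) for _ in range(n_inputs)]   # one made-up standardized customer row
hidden = dense_relu(x, weights, biases)

n_params = n_inputs * units + units   # weights plus one bias per unit
print(units, len(hidden), n_params)   # 6 6 72
```

The parameter count (72) is what `ann_classifier.summary()` would be expected to report for this first layer.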

**Step 2.2 :** Adding the 2nd hidden layer

This time there is no need to specify `input_dim`, because Keras infers this layer's input size from the output of the previous layer (6 units).

In [None]:
ann_classifier.add(Dense(units=6, kernel_initializer = 'uniform', activation='relu'))

**Step 2.3 :** Adding the output layer

In [None]:
ann_classifier.add(Dense(units=1, kernel_initializer = 'uniform', activation='sigmoid'))

**Step 2.4 :** Compiling the ANN

In [None]:
ann_classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

**Arguments used -** 
* `optimizer` = the algorithm used to update the weights; `'adam'` (Adam) is a variant of stochastic gradient descent (SGD) that adapts the learning rate for each weight
* `loss` = the function the optimizer minimizes to find the optimal weights; it is chosen according to the activation of the output layer and the type of the dependent variable, here `binary_crossentropy` for a binary target with a sigmoid output
* `metrics` = a list (hence the `[ ]`) of metrics reported during training and evaluation, here only `'accuracy'`; metrics are only monitored, they are not used to update the weights
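The quantity that `binary_crossentropy` minimizes can be written out for a few predictions. This sketch assumes the probabilities come from a sigmoid output and clips them with a small epsilon to avoid `log(0)`, similar in spirit to what Keras does; the function names and inputs are illustrative, not the Keras API.

```python
import math

def sigmoid(z):
    """Squashes any real number into (0, 1), so the output can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over all observations."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1]
y_prob = [sigmoid(z) for z in [2.0, -1.0, 0.0]]   # made-up pre-activation outputs
print(y_prob)
print(binary_crossentropy(y_true, y_prob))

# confident and right vs confident and wrong, for a customer who left (y = 1)
print(binary_crossentropy([1], [0.9]), binary_crossentropy([1], [0.1]))
```

Confident wrong predictions are penalized much more heavily than confident right ones, and the gradient of this loss is what Adam follows when it adjusts the weights.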

**Step 2.5 :** Fitting the ANN to the training set

**Arguments used -**
* `batch_size` means after how many observations the weights should be updated
* `epochs` means how many times you want to run through the network
* `1` epoch would signify that whole data has been passed through the network once
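To make `batch_size` and `epochs` concrete, this sketch counts weight updates. It assumes the CSV has 10,000 rows, so `test_size=0.2` leaves 8,000 training rows; the helper `count_updates` is hypothetical, and a final partial batch is assumed to still trigger an update.

```python
import math

def count_updates(n_rows, batch_size, epochs):
    """Return (batches per epoch, total weight updates); a final partial batch still counts."""
    batches_per_epoch = math.ceil(n_rows / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs

n_train = 8000   # assumption: 10,000 rows in the CSV, 20% held out for testing
print(count_updates(n_train, batch_size=10, epochs=100))   # (800, 80000)
print(count_updates(n_train, batch_size=25, epochs=100))   # (320, 32000)
```

Smaller batches mean more (and noisier) updates per epoch, which is one reason `batch_size` is worth tuning alongside `epochs`.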

In [None]:
ann_classifier.fit(x_train, y_train, batch_size=10, epochs=100)

## **Step 3 :** Predicting the results for the test set

In [None]:
y_pred = ann_classifier.predict(x_test)
y_pred = y_pred > 0.5       

* `predict` returns, for each customer, the sigmoid output, i.e. the predicted probability of leaving the bank (`Exited = 1`)
* We set a threshold of 0.5: customers whose probability is above it are classified as leaving
* `y_pred > 0.5` turns the probabilities into booleans: `True` if the value is greater than 0.5 (predicted to leave), `False` otherwise
* We then compute and print the confusion matrix and the accuracy on the test set
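The thresholding and the confusion matrix can be reproduced by hand. This sketch uses made-up labels and probabilities; it assumes scikit-learn's layout for binary labels, where rows are the true class and columns the predicted class, i.e. `[[TN, FP], [FN, TP]]`. The helper names are hypothetical.

```python
def threshold(probs, cutoff=0.5):
    """True means 'predicted to leave the bank'."""
    return [p > cutoff for p in probs]

def confusion_matrix_2x2(y_true, y_pred):
    """Rows = true class (0, 1), columns = predicted class (0, 1): [[TN, FP], [FN, TP]]."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[int(t)][int(p)] += 1
    return cm

y_true = [0, 0, 1, 1, 0, 1]
probs = [0.10, 0.70, 0.80, 0.40, 0.30, 0.95]   # made-up model outputs
y_pred = threshold(probs)

cm = confusion_matrix_2x2(y_true, y_pred)
accuracy = (cm[0][0] + cm[1][1]) / len(y_true)   # correct predictions lie on the diagonal
print(cm, accuracy)
```

Here one customer who stayed is flagged as leaving (FP) and one who left is missed (FN), giving an accuracy of 4/6.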

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("The accuracy obtained on testing set is", round((accuracy_score(y_test, y_pred) * 100), 2), '%')

## **Step 4 :** Evaluating the ANN (Cross-Validation)

**Step 4.1 :** Wrapping the Keras model into a scikit-learn estimator so that k-fold cross-validation can be applied

In [None]:
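# legacy wrapper, removed from recent Keras releases; with newer versions,
# the SciKeras package (scikeras.wrappers.KerasClassifier) is the usual replacement
from keras.wrappers.scikit_learn import KerasClassifier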
from sklearn.model_selection import cross_val_score

**Step 4.2 :** Building a function that initializes the ANN and its layers (the wrapper calls it to build a fresh, untrained model for each fold)

In [None]:
def build_classifier():
    from keras.models import Sequential
    from keras.layers import Dense
    ann_classifier = Sequential()
    ann_classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))
    ann_classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
    ann_classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
    ann_classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return ann_classifier

**Step 4.3 :** Performing the cross-validation

In [None]:
ann_classifier = KerasClassifier(build_fn = build_classifier, batch_size = 10, epochs = 100)
accuracies = cross_val_score(estimator=ann_classifier, X=x_train, y=y_train, cv=7, n_jobs=-1)
# with cv=7, this contains 7 accuracies, one per fold of the k-fold cross-validation
print("The average of the accuracies is", round((accuracies.mean() * 100), 2), '%')
print("The standard deviation of the accuracies is ", accuracies.std())
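The mechanics behind `cv=7` can be sketched in plain Python: the training set is split into 7 folds, and each fold serves once as the validation set while the model is trained on the other 6. This is only the plain k-fold idea; for a classifier with an integer `cv`, `cross_val_score` actually uses stratified folds, which also keep the churn ratio similar across folds. The 8,000-row size assumes a 10,000-row CSV, and the helper names are hypothetical.

```python
from statistics import mean, pstdev

def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)   # the first `extra` folds get one more sample
        folds.append(range(start, start + size))
        start += size
    return folds

def summarize(fold_accuracies):
    """Mean and population standard deviation (numpy's .std() defaults to ddof=0)."""
    return mean(fold_accuracies), pstdev(fold_accuracies)

n_train, k = 8000, 7   # assumption: 8,000 training rows
folds = kfold_indices(n_train, k)
for i, val_idx in enumerate(folds):
    # fold i is the validation set once; the model is trained on the other k - 1 folds
    print(f"fold {i}: train on {n_train - len(val_idx)} rows, validate on {len(val_idx)} rows")
```

A low standard deviation across the fold accuracies suggests the model's performance does not depend much on which customers it was trained on.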

## **Step 5 :** Tuning the ANN 
* Grid search automates the choice of hyperparameters (batch size, number of epochs, optimizer) instead of entering them manually over and over
* It trains and cross-validates one model per combination in the grid, replacing ad hoc trial and error with a systematic search, at the cost of more computation
* We use scikit-learn's `GridSearchCV` for this task

In [None]:
from sklearn.model_selection import GridSearchCV
def build_classifier(optimizer):
    from keras.models import Sequential
    from keras.layers import Dense
    ann_classifier = Sequential()
    ann_classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))
    ann_classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
    ann_classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
    ann_classifier.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return ann_classifier

The only change from the previous `build_classifier` is that `optimizer` is now a function argument, so the grid search can build the model with each optimizer listed in the parameter grid below.

In [None]:
# wrapping the builder; the grid search supplies the hyperparameter values to try
ann_classifier = KerasClassifier(build_fn = build_classifier)

# a dictionary mapping each parameter name to the list of candidate values
# ('epochs' is the Keras 2+ name of the fit argument; the old 'nb_epoch' is not recognized)
params = {'batch_size': [25, 32], 'epochs': [100, 200, 300], 'optimizer': ['adam', 'rmsprop']}

grid_search = GridSearchCV(estimator=ann_classifier, param_grid=params, cv=10, scoring='accuracy')
grid_search = grid_search.fit(x_train, y_train)
best_parameters = grid_search.best_params_      # will give the best parameters
best_accuracy = grid_search.best_score_         # will give the best accuracy score
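To get a feel for the cost of this search, the sketch below enumerates the same candidate values with `itertools.product` and counts the training runs. It assumes `GridSearchCV`'s default `refit=True`, which retrains the best combination once more on the whole training set.

```python
from itertools import product

# same candidate values as the grid in the previous cell
params = {'batch_size': [25, 32], 'epochs': [100, 200, 300], 'optimizer': ['adam', 'rmsprop']}
cv = 10

combos = [dict(zip(params, values)) for values in product(*params.values())]
n_search_fits = len(combos) * cv   # every combination is cross-validated
n_total_fits = n_search_fits + 1   # plus one refit of the best combination (refit=True)

print(len(combos), n_search_fits, n_total_fits)   # 12 120 121
print(combos[0])   # {'batch_size': 25, 'epochs': 100, 'optimizer': 'adam'}
```

With up to 300 epochs per fit, 121 fits can take a long time, which is why grids are usually kept small.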

In [None]:
# checking the parameters obtained by the grid search mechanism
print(best_parameters)
print(best_accuracy)

## **Step 6 :** Running the ANN again based on the parameters obtained above

In [None]:
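# rebuilding the same architecture; the optimizer, batch_size and epochs below are hard-coded,
# so replace them with the values in best_parameters if your grid search selects different ones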
ann_classifier2 = Sequential()
ann_classifier2.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))
ann_classifier2.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
ann_classifier2.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
ann_classifier2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
ann_classifier2.fit(x_train, y_train, batch_size=25, epochs=100)

In [None]:
# predicting the result
y_pred2 = ann_classifier2.predict(x_test)
y_pred2 = y_pred2 > 0.5

In [None]:
print('The accuracy obtained after tuning the ANN is', round((accuracy_score(y_test, y_pred2) * 100), 2), '%')