# ARTIFICIAL NEURAL NETWORKS

Below is an example of how to create an ANN in Python. In this case, the data comes from the following link, which you can download as a CSV:
https://docs.google.com/spreadsheets/d/18-kNzW_smJjpNChsrzLBzzYKGJYcnDwbpt-FuJEyVzA/edit?usp=sharing
With this data, the ANN below builds a model to predict whether a customer will churn (yes/no). However, this template can be used for other ANN tasks too.

# PART 1: DATA PREPROCESSING

Importing the libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset

In [None]:
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
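To see which columns `iloc[:, 3:13]` and `iloc[:, 13]` pick out, here is a plain-Python sketch of the same slicing. The column names below are an assumption about the CSV's layout (three identifier columns first, the churn label `Exited` last), not something this notebook states, so check them against `dataset.columns` before relying on the indices.

```python
# ASSUMED column order of Churn_Modelling.csv; verify with dataset.columns
columns = ['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
           'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts',
           'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Exited']

feature_columns = columns[3:13]  # same slice as dataset.iloc[:, 3:13]
target_column = columns[13]      # same index as dataset.iloc[:, 13]

print(feature_columns)  # 10 features; Geography and Gender sit at indices 1 and 2
print(target_column)
```

Under this assumption, indices 1 and 2 of `X` are Geography and Gender, which is why the encoding cells below target exactly those columns.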

Encoding categorical data
Encoding the independent variables (Geography at index 1 and Gender at index 2)

In [None]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
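`LabelEncoder` replaces each label with its position in the sorted (alphabetical) list of distinct labels. A minimal NumPy sketch of the same mapping, on a made-up sample of Geography values, shows why France becomes 0 here (and, by the same rule, why Female becomes 0 and Male 1):

```python
import numpy as np

# Hypothetical sample of the Geography column
geography = np.array(['France', 'Spain', 'Germany', 'France'])

# np.unique sorts the distinct labels; return_inverse gives each value's
# position in that sorted list, which is what LabelEncoder returns
classes, codes = np.unique(geography, return_inverse=True)

print(classes.tolist())  # ['France', 'Germany', 'Spain']
print(codes.tolist())    # [0, 2, 1, 0]
```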

Creating 3 dummy variables from the Geography column (index 1)

In [None]:
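from sklearn.compose import ColumnTransformer # OneHotEncoder(categorical_features = ...) no longer exists in recent scikit-learn
onehotencoder = ColumnTransformer([('geography', OneHotEncoder(), [1])], remainder = 'passthrough')
X = np.array(onehotencoder.fit_transform(X), dtype = float) # the 3 dummy columns come first, then the other features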

Removing one of these dummy variables to avoid the dummy variable trap

In [None]:
X = X[:, 1:]
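The dummy variable trap is easiest to see numerically. In this sketch (made-up Geography codes, NumPy only), the three dummy columns always sum to 1, so alongside an intercept column they are perfectly collinear; dropping the first dummy, as `X[:, 1:]` does, restores full column rank and leaves France as the baseline encoded as (0, 0).

```python
import numpy as np

# Hypothetical integer codes for Geography (0 = France, 1 = Germany, 2 = Spain)
codes = np.array([0, 2, 1, 0, 1])

# One-hot encoding: one column per country, 1 marks the customer's country
dummies = np.eye(3)[codes]
intercept = np.ones((len(codes), 1))

# Full dummy set + intercept: 4 columns but only rank 3 (perfect collinearity)
with_intercept = np.hstack([intercept, dummies])
print(with_intercept.shape[1], np.linalg.matrix_rank(with_intercept))  # 4 3

# Drop the first dummy (France): 3 columns and rank 3, so no redundancy
reduced = dummies[:, 1:]
reduced_with_intercept = np.hstack([intercept, reduced])
print(reduced_with_intercept.shape[1], np.linalg.matrix_rank(reduced_with_intercept))  # 3 3
```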

Splitting the dataset into the Training set and Test set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
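`StandardScaler` computes z = (x - mean) / std per feature. The sketch below reproduces that by hand for one made-up feature, to show the key point of the cell above: the mean and standard deviation are learned from the training set only (`fit_transform`) and then reused on the test set (`transform`).

```python
import numpy as np

# Hypothetical values of one feature (e.g. Age)
train = np.array([30.0, 40.0, 50.0])
test = np.array([40.0, 60.0])

# Statistics come from the training set only
mean = train.mean()
std = train.std()  # population standard deviation (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
# The test set reuses the training statistics, so nothing leaks from it
test_scaled = (test - mean) / std

print(train_scaled)  # mean 0, standard deviation 1
print(test_scaled)
```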

# PART 2: BUILDING THE ANN

Importing Keras packages and libraries

In [None]:
from keras.models import Sequential #used to initialise our model
from keras.layers import Dense #used to create the layers in our ANN

Initialising the ANN as a sequence of layers

In [None]:
classifier = Sequential()

Adding the input layer and the first hidden layer

In [None]:
classifier.add(Dense(units = 6, # number of nodes in the hidden layer; a common tip is the average of the input and output node counts ((11 + 1) / 2 = 6)
                     kernel_initializer ='uniform', # initial weights drawn from a uniform distribution: small random values close to, but not exactly, 0
                     activation = 'relu', # activation function; we use the rectifier ('relu') for the hidden layers
                     input_dim = 11 # because we have 11 independent variables
                     ))
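As a rough picture of what this layer computes, here is a NumPy sketch of `Dense(units = 6, activation = 'relu')` on a batch of made-up, already-scaled customers: output = relu(X @ W + b). It assumes Keras's `'uniform'` initializer corresponds to `RandomUniform` with its default range [-0.05, 0.05] and that biases start at zero (the `Dense` default).

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden = 11, 6  # 11 features in, 6 hidden nodes ((11 + 1) / 2 = 6)

# Small uniform random weights (assumed range [-0.05, 0.05]) and zero biases
W = rng.uniform(-0.05, 0.05, size=(n_inputs, n_hidden))
b = np.zeros(n_hidden)

def relu(z):
    """Rectifier: keeps positive values, sets negatives to 0."""
    return np.maximum(0.0, z)

# A batch of 4 hypothetical, already-scaled customers
X_batch = rng.standard_normal((4, n_inputs))

# The layer's output: one row per customer, one column per hidden node
hidden = relu(X_batch @ W + b)
print(hidden.shape)  # (4, 6)
```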

Adding the second hidden layer

In [None]:
classifier.add(Dense(units = 6,
                     kernel_initializer ='uniform', 
                     activation = 'relu'
                     #because we have created the first hidden layer, we don't have to code input dimensions for subsequent hidden layers
                     ))

Adding the output layer

In [None]:
classifier.add(Dense(units = 1, # the output layer has 1 node because the outcome is binary (churn or not)
                     kernel_initializer ='uniform', 
                     activation = 'sigmoid' # the sigmoid function outputs a probability between 0 and 1
                     ))
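Putting the three layers together, this NumPy sketch runs a forward pass through an untrained 11-6-6-1 network (ReLU hidden layers, sigmoid output) to show that the output is always a probability strictly between 0 and 1. It also counts the trainable parameters (weights plus biases). The weights are random placeholders, so the probabilities mean nothing until the model is trained.

```python
import numpy as np

rng = np.random.default_rng(1)
layer_sizes = [11, 6, 6, 1]  # input, hidden 1, hidden 2, output

# One (weights, bias) pair per Dense layer: small uniform weights, zero biases
params = [(rng.uniform(-0.05, 0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    a = X
    for W, b in params[:-1]:
        a = np.maximum(0.0, a @ W + b)  # hidden layers: ReLU
    W, b = params[-1]
    return sigmoid(a @ W + b)           # output layer: sigmoid gives a probability

X_batch = rng.standard_normal((5, 11))  # 5 hypothetical scaled customers
probs = forward(X_batch)

n_params = sum(W.size + b.size for W, b in params)
print(probs.ravel())  # five values strictly between 0 and 1
print(n_params)       # 121 = (11*6 + 6) + (6*6 + 6) + (6*1 + 1)
```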

Compiling the ANN

In [None]:
classifier.compile(optimizer = 'adam', # Adam, an adaptive variant of stochastic gradient descent
                   loss = 'binary_crossentropy', # with a sigmoid output and a binary target, we use a logarithmic loss (binary cross-entropy)
                   metrics = ['accuracy'] # criterion used to evaluate the model; metrics expects a list, hence the square brackets
                   )
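The `binary_crossentropy` loss is the mean of -[y log(p) + (1 - y) log(1 - p)] over the samples, where y is the true label and p the predicted probability. A small NumPy version (probabilities clipped to avoid log(0)) shows that confident correct predictions give a small loss and confident wrong ones a large loss:

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(y_prob, eps, 1 - eps)  # keep probabilities away from 0 and 1
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.8, 0.2])
confident_wrong = np.array([0.1, 0.9, 0.2, 0.8])

print(binary_crossentropy(y_true, confident_right))  # small loss
print(binary_crossentropy(y_true, confident_wrong))  # large loss
print(binary_crossentropy(np.array([1]), np.array([0.5])))  # log(2), about 0.693
```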

Fitting the ANN to the training set

In [None]:
classifier.fit(X_train,
               y_train,
               batch_size = 10, # no fixed rule for the right value; tune it by experiment
               epochs = 50  # no fixed rule for the right value; tune it by experiment
               )
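`batch_size` and `epochs` together set how many times the weights are updated: one update per batch, and every epoch passes over all training rows once (Keras shuffles them by default). The sketch assumes the CSV has 10,000 rows, so `test_size = 0.25` leaves 7,500 training rows; that row count is an assumption, not stated in this notebook.

```python
import math
import numpy as np

# ASSUMPTION: 10,000 rows in the CSV, so 7,500 remain after a 25% test split
n_train = 7500
batch_size = 10
epochs = 50

updates_per_epoch = math.ceil(n_train / batch_size)  # one weight update per batch
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)  # 750 37500

# One epoch's batches: shuffle the row indices, then slice them into chunks
rng = np.random.default_rng(0)
order = rng.permutation(n_train)
batches = [order[i:i + batch_size] for i in range(0, n_train, batch_size)]
print(len(batches))  # 750
```

Smaller batches mean more (noisier) updates per epoch; larger ones mean fewer, smoother updates.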

# PART 3: MAKING THE PREDICTIONS AND EVALUATING THE MODEL

Predicting the Test set results

In [None]:
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5) # converts probabilities to booleans: True (churn) if above 0.5, else False

Making the Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
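For 0/1 labels, scikit-learn's `confusion_matrix` puts the actual class on the rows and the predicted class on the columns, i.e. [[TN, FP], [FN, TP]]. This hand-rolled version on made-up predictions shows that layout and how accuracy follows from it: correct predictions (the diagonal) divided by the total.

```python
import numpy as np

# Hypothetical true labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.2, 0.7, 0.9, 0.4, 0.1, 0.6, 0.3, 0.05])
y_pred = y_prob > 0.5  # same thresholding as above

# Rows = actual class, columns = predicted class -> [[TN, FP], [FN, TP]]
cm = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(y_true, y_pred.astype(int)):
    cm[actual, predicted] += 1

accuracy = np.trace(cm) / cm.sum()  # (TN + TP) / total
print(cm.tolist())  # [[4, 1], [1, 2]]
print(accuracy)     # 0.75
```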

Predicting a single new observation
Predict whether a customer with the following information will leave the bank:
    Geography: France
    Credit Score: 600
    Gender: Male
    Age: 40
    Tenure: 3
    Balance: 60000
    Number of Products: 2
    Has Credit Card: Yes
    Is Active Member: Yes
    Estimated Salary: 50000

In [None]:
new_prediction = classifier.predict(sc.transform(
                                    np.array([[ # double square brackets make a 2D array holding one row
                                                0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000 # same column order/encoding as X: France = (0, 0), Male = 1
                                                ]])))
new_prediction = (new_prediction > 0.5) # True means the customer is predicted to leave
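Typing the encoded row by hand is error-prone. As a sketch, a small helper (hypothetical, not part of the original notebook) can build it from readable values, assuming the alphabetical LabelEncoder orderings (France, Germany, Spain; Female, Male) and the dropped France dummy used above:

```python
import numpy as np

# Category orders as LabelEncoder produces them (alphabetical)
GEOGRAPHIES = ['France', 'Germany', 'Spain']
GENDERS = ['Female', 'Male']

def encode_customer(geo, credit_score, gender, age, tenure, balance,
                    n_products, has_card, is_active, salary):
    """Hypothetical helper: one customer as a 1x11 row in the same order as X."""
    geo_dummies = [1.0 if geo == g else 0.0 for g in GEOGRAPHIES][1:]  # drop France
    return np.array([[*geo_dummies, credit_score, GENDERS.index(gender), age,
                      tenure, balance, n_products, int(has_card),
                      int(is_active), salary]], dtype=float)

row = encode_customer('France', 600, 'Male', 40, 3, 60000, 2, True, True, 50000)
print(row)  # matches [[0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]] as floats
```

The resulting `row` could then be passed through `sc.transform` and `classifier.predict` exactly as in the cell above.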

# PART 4: EVALUATING, IMPROVING AND TUNING THE ANN

Evaluating the ANN
We will use k-fold cross-validation to evaluate the bias and variance of our model's predictive power.
The cross_val_score function returns the 10 accuracies of the 10 folds made on our training set.
This function is only in scikit-learn, so we need the Keras wrapper to integrate it with our Keras model.
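The idea behind k-fold cross-validation can be sketched with NumPy alone: split the row indices into k folds, and in turn hold out each fold for validation while training on the rest, so every sample is validated exactly once. This is a plain (unstratified, unshuffled) version for illustration; scikit-learn's `cross_val_score` uses stratified folds when it recognises the estimator as a classifier, so its actual splits may differ.

```python
import numpy as np

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=25, k=10))
print(len(splits))  # 10

# For each split, a fresh model is trained on train_idx and scored on val_idx;
# the mean of the 10 scores reflects bias, their spread reflects variance.
```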

In [None]:
import keras
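from keras.wrappers.scikit_learn import KerasClassifier # only available in older Keras versions; newer ones removed it (the SciKeras package offers a replacement)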
from sklearn.model_selection import cross_val_score 
from keras.models import Sequential # Used to initialise our model
from keras.layers import Dense
from keras.layers import Dropout # See section on improving ANN below

This is a function that will build the ANN we created above

In [None]:
def build_classifier():
    classifier = Sequential()
    classifier.add(Dense(units = 6,
                         kernel_initializer = 'uniform',
                         activation = 'relu',
                         input_dim = 11
                         ))
    # With dropout if needed (add one Dropout after each hidden layer)
    classifier.add(Dropout(rate = 0.1)) # the fraction of neurons to disable at each iteration
    classifier.add(Dense(units = 6,
                         kernel_initializer = 'uniform',
                         activation = 'relu'
                         ))
    classifier.add(Dense(units = 1,
                         kernel_initializer = 'uniform',
                         activation = 'sigmoid'
                         ))
    classifier.compile(optimizer = 'adam',
                       loss = 'binary_crossentropy',
                       metrics = ['accuracy']
                       )
    return classifier # not only builds the classifier but also returns it

In [None]:
classifier = KerasClassifier(build_fn = build_classifier,
                             batch_size = 10,
                             epochs = 100
                             )

The cell below returns the 10 accuracies produced by the 10-fold cross-validation

In [None]:
accuracies =  cross_val_score(estimator = classifier, # object used to fit the data, which is our classifier
                              X = X_train, # the data to fit
                              y = y_train, # the target variable to try and predict
                              cv = 10, # the number of folds; there is no strict rule, but 10 is a common choice
                              n_jobs = 1 # the number of CPUs to use for the computation; -1 means all CPUs, so faster computation
                              )

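Average prediction accuracy of the 10 folds, plus their standard deviation (a high value indicates high variance)

In [None]:
mean = accuracies.mean()
std = accuracies.std() # spread of the 10 fold accuracies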

Improving the ANN:
Use dropout regularization to reduce overfitting if needed. Overfitting can be detected by high variance in the k-fold accuracy results.
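Dropout randomly zeroes a fraction `rate` of a layer's outputs at each training step, so the network cannot rely too heavily on any single neuron. The NumPy sketch below shows "inverted" dropout, which is how Keras's `Dropout` layer behaves as far as we know: surviving values are scaled by 1 / (1 - rate) so the expected activation is unchanged, and nothing is dropped at prediction time.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return activations  # no dropout when predicting
    keep = rng.random(activations.shape) >= rate
    # Scaling by 1 / (1 - rate) keeps the expected activation unchanged
    return np.where(keep, activations / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
hidden = np.ones((1000, 6))  # hypothetical hidden-layer output
dropped = dropout(hidden, rate=0.1, rng=rng)

print((dropped == 0).mean())  # close to 0.1
print(dropped.mean())         # close to 1.0
```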

Tuning the ANN:
If we want to improve the prediction accuracy of our model, we can use parameter tuning.
There are 2 types of parameters in our model: learned parameters (the weights) and hyperparameters that stay fixed (number of epochs and neurons, batch size, optimiser, etc.).
We will use GridSearchCV to test several combinations of these hyperparameters and bring back the best selection.

In [None]:
import keras
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential 
from keras.layers import Dense
def build_classifier(optimizer):
    classifier = Sequential()
    classifier.add(Dense(units = 6,
                         kernel_initializer = 'uniform',
                         activation = 'relu',
                         input_dim = 11
                         ))
    classifier.add(Dense(units = 6,
                         kernel_initializer = 'uniform',
                         activation = 'relu'
                         ))
    classifier.add(Dense(units = 1,
                         kernel_initializer = 'uniform',
                         activation = 'sigmoid'
                         ))
    classifier.compile(optimizer = optimizer, # use the optimizer chosen by the grid search, not a hard-coded one
                       loss = 'binary_crossentropy',
                       metrics = ['accuracy']
                       )
    return classifier

classifier = KerasClassifier(build_fn = build_classifier)

Now we create a dictionary of the hyperparameters; the grid search will try every combination of these values and return the best one

In [None]:
parameters = {'batch_size':[25, 32],
              'epochs':[100, 500],
              'optimizer':['adam','rmsprop']
              }
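It helps to count how much work this grid implies before running it. GridSearchCV tries every combination of the listed values, and with `cv = 10` each combination is trained 10 times; the sketch below enumerates the combinations with `itertools.product` to show the count.

```python
import itertools

parameters = {'batch_size': [25, 32],
              'epochs': [100, 500],
              'optimizer': ['adam', 'rmsprop']}

# Every combination of the listed values, as GridSearchCV would try them
names = list(parameters)
combinations = [dict(zip(names, values))
                for values in itertools.product(*parameters.values())]

cv = 10
print(len(combinations))       # 8 = 2 * 2 * 2
print(len(combinations) * cv)  # 80 training runs, plus one final refit of the best model
print(combinations[0])         # {'batch_size': 25, 'epochs': 100, 'optimizer': 'adam'}
```

With up to 500 epochs per run, this grid can take a long time; adding one more value to any list multiplies the total accordingly.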

The grid search still uses k-fold cross-validation to ensure we aren't biased and that there isn't too much variance

In [None]:
grid_search = GridSearchCV(estimator = classifier,
                          param_grid = parameters,
                          scoring = 'accuracy',
                          cv = 10)

Fitting it to our data

In [None]:
grid_search = grid_search.fit(X_train,
                              y_train)

Checking for the best parameters and best accuracy

In [None]:
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_