# Deep Learning with Keras
Tutorial from [Medium Post](https://medium.com/@pushkarmandot/build-your-first-deep-learning-neural-network-model-using-keras-in-python-a90b5864116d)

### Data and Business Problem:

Our aim is to predict customer churn for a bank, i.e. which customers are going to leave the bank's service. The dataset is small (for learning purposes) and contains 10000 rows with 14 columns. The data is not explained in detail here because the column names are self-explanatory.

**Step 1**: *Importing data. A pandas DataFrame offers extensive functionality for working with tabular data, so we use pandas to import the data.*

In [44]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [45]:
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
display(dataset.iloc[:9, :])

| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |
| 5 | 6 | 15574012 | Chu | 645 | Spain | Male | 44 | 8 | 113755.78 | 2 | 1 | 0 | 149756.71 | 1 |
| 6 | 7 | 15592531 | Bartlett | 822 | France | Male | 50 | 7 | 0.0 | 2 | 1 | 1 | 10062.8 | 0 |
| 7 | 8 | 15656148 | Obinna | 376 | Germany | Female | 29 | 4 | 115046.74 | 4 | 1 | 0 | 119346.88 | 1 |
| 8 | 9 | 15792365 | He | 501 | France | Male | 44 | 4 | 142051.07 | 2 | 0 | 1 | 74940.5 | 0 |


### Features and Target Value

**Step 2:** *Create the matrix of features and the target vector. We exclude columns 1, 2 and 3 (`RowNumber`, `CustomerId` and `Surname`) because they are not useful for the analysis. Column 14, `Exited`, is our target variable.*

In [46]:
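# Creating the features and target
# .copy() makes X an independent DataFrame, so the column assignments below do not trigger SettingWithCopyWarning
X = dataset.iloc[:, 3:13].copy()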
y = dataset.iloc[:, 13]
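
The column ranges above rely on half-open slicing: the stop index is excluded. NumPy slicing follows the same convention as `iloc`, so a small NumPy-only sketch can make the ranges concrete. The `table` array below is a made-up placeholder with 14 columns, not the real dataset.

```python
import numpy as np

# Placeholder table: 5 rows, 14 columns, each cell holds its own column index
table = np.tile(np.arange(14), (5, 1))

features = table[:, 3:13]   # columns 3..12 (the stop index 13 is excluded)
target = table[:, 13]       # column 13 only

print(features.shape)   # (5, 10)
print(features[0])      # [ 3  4  5  6  7  8  9 10 11 12]
print(target[0])        # 13
```

So the feature matrix keeps 10 columns (`CreditScore` through `EstimatedSalary`) and the target is the last column.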

### Encoding String values

Now, let's encode the string values in `Geography` and `Gender` as numerical values.

In [47]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hand-rolled label encoder (not used below; LabelEncoder does the same job)
def label_encode(column_values):
    column_encoder = {column: index for index, column in enumerate(sorted(set(column_values)))}
    return [column_encoder[val] for val in column_values]

X['Geography'] = LabelEncoder().fit_transform(X['Geography'])
X['Gender'] = LabelEncoder().fit_transform(X['Gender'])

In [67]:
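# `categorical_features` was removed from OneHotEncoder in scikit-learn 0.22; ColumnTransformer selects the column instead
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('geography', OneHotEncoder(), [1])], remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X)
X = X[:, 1:]  # drop the first dummy column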
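
To see what the two encoding steps do, here is a NumPy-only sketch on a made-up `geography` column (the sample values and variable names are illustrative, not taken from the notebook's run). `np.unique` assigns integer codes in sorted order, which mirrors how `LabelEncoder` orders its classes. Dropping the first one-hot column removes a redundant column, since it can always be inferred from the remaining ones.

```python
import numpy as np

# Hypothetical sample of the Geography column
geography = np.array(['France', 'Spain', 'Germany', 'France'])

# Label encoding: sorted unique classes get the codes 0, 1, 2, ...
classes, codes = np.unique(geography, return_inverse=True)

# One-hot encoding: one column per class, a single 1 per row
one_hot = np.eye(len(classes))[codes]

# Drop the first column; a row of all zeros now stands for 'France'
dummies = one_hot[:, 1:]

print(classes)   # ['France' 'Germany' 'Spain']
print(codes)     # [0 2 1 0]
print(dummies)
```

With three countries, two dummy columns are enough: `[0, 0]` is France, `[1, 0]` is Germany and `[0, 1]` is Spain.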

**Step 5:** *We use ScikitLearn’s `train_test_split` function to divide our data. Common train/test split ratios are `80:20`, `75:25` and `60:40`. Here we use `80:20`.*

In [70]:
# Split the dataset into the training and test dataset
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
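
A split like this amounts to shuffling the row indices and cutting them at the 80% mark. The sketch below does that by hand with NumPy on placeholder arrays shaped like this dataset (10000 rows, 11 features). The seed, the `_demo` names and the array contents are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, chosen arbitrarily
X_demo = rng.normal(size=(10000, 11))      # placeholder features
y_demo = rng.integers(0, 2, size=10000)    # placeholder 0/1 target

# Shuffle the row indices, then cut at the 80% mark
indices = rng.permutation(len(X_demo))
cut = int(0.8 * len(X_demo))
train_idx, test_idx = indices[:cut], indices[cut:]

X_train_demo, X_test_demo = X_demo[train_idx], X_demo[test_idx]
y_train_demo, y_test_demo = y_demo[train_idx], y_demo[test_idx]

print(X_train_demo.shape, X_test_demo.shape)   # (8000, 11) (2000, 11)
```

With `train_test_split` itself, passing a `random_state` value makes the shuffle reproducible between runs.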

**Step 6:** ***`StandardScaler`** is available in `ScikitLearn`. In the following code we fit the `StandardScaler` on the training data and transform it. The test data must be scaled with the same statistics, so we reuse the already fitted scaler to transform the test data.*

In [71]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
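
The difference between `fit_transform` and `transform` is easier to see when the arithmetic is written out. The following NumPy-only sketch standardizes two tiny made-up arrays by hand; the values and variable names are assumptions for illustration.

```python
import numpy as np

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# "fit": learn the per-column mean and standard deviation from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)   # population standard deviation (ddof=0), as StandardScaler uses

# "transform": apply the same training statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))   # approximately [0. 0.]
print(train_scaled.std(axis=0))    # approximately [1. 1.]
print(test_scaled)
```

Fitting on the training rows only keeps information about the test set out of the preprocessing step.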

**Step 7:** *Importing the required modules. We need the **Sequential** model class to initialize the NN and the **Dense** layer class to add hidden layers.*

In [77]:
# Importing keras modules
from keras.models import Sequential
from keras.layers import Dense, Activation

**Step 8:** *The model is named **classifier** because our business problem is the classification of customer churn.*

In [78]:
# Initializing the Neural Network
classifier = Sequential()

**Step 9:** Adding multiple hidden layers takes a bit of effort. We will add the hidden layers one by one using the **Dense** class.
<br><br>
The first parameter is **`output_dim`** (named `units` in Keras 2 and later). It is simply the number of nodes you want in this layer. **`init`** (named `kernel_initializer` in Keras 2 and later) selects how the layer's weights are initialized before training with Stochastic Gradient Descent begins. In a neural network each connection into a node carries a weight, which expresses how much that input matters to the node. At initialization the weights should be close to $0$, so we initialize them randomly with the **`uniform`** initializer. The **`input_dim`** parameter is needed only for the first layer, because the model cannot otherwise know the number of `input` variables. In our case the total number of input variables is $11$. For the second layer, the model infers the number of inputs from the output of the first hidden layer.

**`Activation Function`**: This is very important to understand. A neuron applies its activation function to the `weighted sum` of its inputs

$$\sum_i W_i x_i$$

For a function such as the sigmoid, the `closer` the `activation function` value is to $1$, the more activated the neuron is and the more strongly it passes the signal on. Choosing the activation function is a critical task. Here we use the **`rectifier (relu)`** function in the hidden layers and the **`sigmoid`** function in the output layer, because we want a binary result from the output layer. If the output has more than $2$ categories, use the `softmax` function instead.
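
The three activation functions named above are short enough to write out directly. This is a NumPy-only sketch using the standard textbook definitions, not Keras's internal implementation; the `weights` and `inputs` values are made up.

```python
import numpy as np

def relu(z):
    # Passes positive values through unchanged, clips negatives to 0
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1
    shifted = z - np.max(z)   # subtracting the max improves numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

weights = np.array([0.5, -1.0, 2.0])
inputs = np.array([1.0, 2.0, 0.5])
z = np.dot(weights, inputs)   # weighted sum: 0.5 - 2.0 + 1.0 = -0.5

print(relu(z))       # 0.0
print(sigmoid(z))    # roughly 0.38
print(softmax(np.array([1.0, 2.0, 3.0])))
```

For a two-class problem like churn, a single sigmoid output is enough, because the probability of the other class is one minus the output.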

In [79]:
# Adding the input layer and the first hidden layer
classifier.add(Dense(66, input_dim = 11))
classifier.add(Activation('relu'))

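# Adding the second hidden layer
# (Keras 2+ argument names: `units` replaces `output_dim`, `kernel_initializer` replaces `init`)
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))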
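
To make the layer shapes concrete, here is a NumPy-only sketch of a forward pass through the same 11 → 66 → 6 → 1 architecture. It is not how Keras runs the model internally. The ±0.05 weight range, the seed and the random input batch are assumptions chosen to illustrate small uniform weights close to $0$.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed, chosen arbitrarily
layer_sizes = [11, 66, 6, 1]       # inputs, hidden layer 1, hidden layer 2, output

# One (weights, bias) pair per Dense layer: small uniform weights, zero biases
params = [
    (rng.uniform(-0.05, 0.05, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(x):
    for i, (w, b) in enumerate(params):
        z = x @ w + b                          # weighted sum plus bias
        is_output = i == len(params) - 1
        # sigmoid on the output layer, relu on the hidden layers
        x = 1.0 / (1.0 + np.exp(-z)) if is_output else np.maximum(0.0, z)
    return x

batch = rng.normal(size=(4, 11))   # 4 made-up customers with 11 features each
probs = forward(batch)
n_params = sum(w.size + b.size for w, b in params)

print(probs.shape)   # (4, 1)
print(n_params)      # 1201
```

The parameter count breaks down as 11·66 + 66 = 792 for the first hidden layer, 66·6 + 6 = 402 for the second, and 6·1 + 1 = 7 for the output layer.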

  
  
