### Business Problem
- Dataset: Churn_Modelling.csv
- 10000 Customers (10000 rows)
- The bank has seen an unusually high churn rate, so it shared this dataset for us to investigate.
- Customers were observed for 6 months to record which ones left and which ones stayed.


### How to install TensorFlow on Windows:
- At the time of writing, TensorFlow on Windows only works with 64-bit Python 3.5.
- Create a new conda environment called "py35" (or any other name):
```bash
conda create -n py35 python=3.5
```
- Then activate the py35 environment:
```bash
activate py35
```
- Install ipykernel
```bash
pip install ipykernel
```
- Then register this environment as a kernel for Jupyter notebook:
```bash
python -m ipykernel install --name py35
```
- Finally, create a new notebook that uses the py35 kernel registered above.

- Install keras:
```bash
pip install keras
```

- Keras is a high-level wrapper around Theano and TensorFlow.

In [1]:
import os, sys

sys.path.append(os.path.abspath("Datasets"))
sys.path.append(os.path.abspath("Images"))

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [3]:
dataset = pd.read_csv("Datasets/ML_a_z/Churn_Modelling.csv")

In [4]:
dataset.head()

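| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |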


In [5]:
X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values
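
To see what these two slices select, here is a minimal sketch on a toy frame. The two rows are copied from the `dataset.head()` output, but the frame keeps only a few of the columns, so `demo`, `X_demo` and `y_demo` are illustrative stand-ins rather than the real dataset.

```python
import pandas as pd

# Toy stand-in: first two customers, only a subset of the columns
demo = pd.DataFrame({
    "RowNumber": [1, 2],
    "CustomerId": [15634602, 15647311],
    "Surname": ["Hargrave", "Hill"],
    "CreditScore": [619, 608],
    "Geography": ["France", "Spain"],
    "Exited": [1, 0],
})

# Columns 0-2 are identifiers, so start at column 3 and stop before the last
X_demo = demo.iloc[:, 3:-1].values
# The last column (Exited) is the label
y_demo = demo.iloc[:, -1].values

print(X_demo)
print(y_demo)
```

The identifier columns carry no predictive signal, which is why the slice starts at index 3.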

In [6]:
# Encoding categorical data (Geography, Gender)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])  # Geography
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])  # Gender
# One-hot encode only Geography (column 1). Gender has just 2
# values, so a single 0/1 column is enough.
one_hot_encoder = OneHotEncoder(categorical_features = [1])
X = one_hot_encoder.fit_transform(X).toarray()
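
The `categorical_features` argument used in this cell was removed in scikit-learn 0.22. On newer versions, one possible equivalent is `ColumnTransformer`; the sketch below assumes a small numeric toy array (`X_demo`, hypothetical) in which Geography and Gender are already label-encoded, as in the cell above.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy columns: CreditScore, Geography code, Gender code, Age
X_demo = np.array([
    [619, 0, 0, 42],
    [608, 2, 0, 41],
    [502, 1, 1, 35],
], dtype=float)

# One-hot encode column 1 only and pass the other columns through unchanged.
# sparse_threshold=0 asks for a dense array instead of a sparse matrix.
ct = ColumnTransformer(
    transformers=[("geo", OneHotEncoder(), [1])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X_demo)

print(X_encoded)
```

The encoded columns come first, followed by the untouched columns in their original order, which matches the layout this notebook works with afterwards.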

In [7]:
pd.DataFrame(X).head()

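| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 619.0 | 0.0 | 42.0 | 2.0 | 0.0 | 1.0 | 1.0 | 1.0 | 101348.88 |
| 1 | 0.0 | 0.0 | 1.0 | 608.0 | 0.0 | 41.0 | 1.0 | 83807.86 | 1.0 | 0.0 | 1.0 | 112542.58 |
| 2 | 1.0 | 0.0 | 0.0 | 502.0 | 0.0 | 42.0 | 8.0 | 159660.8 | 3.0 | 1.0 | 0.0 | 113931.57 |
| 3 | 1.0 | 0.0 | 0.0 | 699.0 | 0.0 | 39.0 | 1.0 | 0.0 | 2.0 | 0.0 | 0.0 | 93826.63 |
| 4 | 0.0 | 0.0 | 1.0 | 850.0 | 0.0 | 43.0 | 2.0 | 125510.82 | 1.0 | 1.0 | 1.0 | 79084.1 |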


In [8]:
# Keep K-1 dummy columns (drop the first) to avoid the dummy variable trap
X = X[:, 1:]
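
Why dropping one dummy column loses no information can be checked with a small numpy sketch. The `dummies` array is made-up one-hot data for a three-value category, not taken from the dataset.

```python
import numpy as np

# One-hot columns for a category with 3 possible values (4 example rows)
dummies = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
], dtype=float)

# Every row sums to 1, so the columns are linearly dependent
row_sums = dummies.sum(axis=1)

# Drop the first column, as X[:, 1:] does
reduced = dummies[:, 1:]

# The dropped column can be rebuilt from the remaining ones
recovered_first = 1.0 - reduced.sum(axis=1)

print(row_sums)
print(recovered_first)
```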

In [9]:
pd.DataFrame(X).head()

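| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 619.0 | 0.0 | 42.0 | 2.0 | 0.0 | 1.0 | 1.0 | 1.0 | 101348.88 |
| 1 | 0.0 | 1.0 | 608.0 | 0.0 | 41.0 | 1.0 | 83807.86 | 1.0 | 0.0 | 1.0 | 112542.58 |
| 2 | 0.0 | 0.0 | 502.0 | 0.0 | 42.0 | 8.0 | 159660.8 | 3.0 | 1.0 | 0.0 | 113931.57 |
| 3 | 0.0 | 0.0 | 699.0 | 0.0 | 39.0 | 1.0 | 0.0 | 2.0 | 0.0 | 0.0 | 93826.63 |
| 4 | 0.0 | 1.0 | 850.0 | 0.0 | 43.0 | 2.0 | 125510.82 | 1.0 | 1.0 | 1.0 | 79084.1 |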


In [10]:
# Splitting the dataset into the training set and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                   test_size=0.25, 
                                                   random_state=0)
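
As a quick sanity check of the split sizes, the sketch below runs the same call on placeholder arrays of zeros with the dataset's dimensions (10000 rows, 11 features). The `_demo` names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as X and y
X_demo = np.zeros((10000, 11))
y_demo = np.zeros(10000)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.25,
                                          random_state=0)

print(X_tr.shape, X_te.shape)  # (7500, 11) (2500, 11)
```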

In [11]:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
# Reuse the mean and standard deviation learned from the training set
X_test = sc.transform(X_test)
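
What fitting and transforming mean for a standard scaler can be sketched by hand in numpy: the statistics come from the training rows only and are then applied to any other rows. The small `train` and `test` arrays are made up for illustration.

```python
import numpy as np

train = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

# "fit": learn per-column statistics from the training set
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the same statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled)
print(test_scaled)
```

Scaling the test set with its own statistics would put the two sets on different scales, so the model would see test inputs that mean something different from what it was trained on.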

In [12]:
# Let's make the ANN using Keras
import keras

Using TensorFlow backend.


In [13]:
# Sequential module : initialize the Neural Network
# Dense module : to create the layer for the ANN
from keras.models import Sequential
from keras.layers import Dense

In [14]:
# Initializing the ANN
# Neural network: a sequence of layers (hidden layers and an
# output layer)
classifier = Sequential()

### Training the ANN with Stochastic Gradient Descent
<img src="Images/17_1_ann.JPG" width="600">
- The `Dense` layer will take care of the first step.
- 2nd step: apply the activation function.
<img src="Images/17_1_2nd_step.JPG" width="600">
- Other activation functions: rectifier and sigmoid.
<img src="Images/17_1_rectifier.JPG" width="600">
<img src="Images/17_1_sigmoid.JPG" width="600">
- 3rd step: pass the activation function's value on to the output y:
<img src="Images/17_1_3rd_step.JPG" width="600">
- What we want is the probability that a customer leaves the bank.
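
The steps above can be sketched in plain numpy for a single observation. The input values and weights below are made up, biases are left out for brevity, and the tiny 3-2-1 network is only an assumption for illustration, not the network built in this notebook.

```python
import numpy as np

def relu(z):
    # Rectifier: passes positive values, clips negatives to 0
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1), usable as a probability
    return 1.0 / (1.0 + np.exp(-z))

# One observation with 3 (already scaled) features
x = np.array([0.5, -1.0, 2.0])

# Hypothetical weights: 3 inputs -> 2 hidden nodes -> 1 output node
W_hidden = np.array([[0.1, 0.2],
                     [0.4, 0.3],
                     [-0.5, 0.2]])
w_out = np.array([0.7, -0.3])

hidden = relu(x @ W_hidden)        # rectifier in the hidden layer
p_leave = sigmoid(hidden @ w_out)  # sigmoid in the output layer

print(hidden, p_leave)
```

Using the sigmoid at the output keeps the prediction between 0 and 1, so it can be read as the probability of leaving.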


In [15]:
?Dense

In [16]:
X.shape

(10000, 11)

In [17]:
y.shape

(10000,)

In [18]:
# Adding the input layer and the first hidden layer
# output_dim (int): number of nodes in this Dense layer.
# To choose output_dim: there are 11 independent variables (the
# columns of X) and 1 output node (binary outcome, 1 or 0), so we
# take the average: (11 + 1) / 2 = 6 nodes in the hidden layer.
# init='uniform': initialize the weights to small random numbers
# drawn from a uniform distribution.
# (Keras 2 renamed output_dim to units and init to kernel_initializer.)
classifier.add(Dense(output_dim=6, init = 'uniform',
                    activation='relu', # rectifier activation function
                    input_dim=11))
# For the next layers we don't need to specify input_dim: Keras
# infers it from the output size of the previous layer.
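
The sizing rule and the resulting number of trainable parameters can be worked out in a few lines. This sketch assumes the architecture this notebook builds (11 inputs, two hidden layers of 6 nodes, 1 output node) and assumes every layer has one bias per node.

```python
n_inputs = 11   # independent variables in X
n_outputs = 1   # single binary output node

# Rule of thumb from the cell above: average of input and output nodes
n_hidden = (n_inputs + n_outputs) // 2

# Layer widths: input -> hidden 1 -> hidden 2 -> output
layer_sizes = [n_inputs, n_hidden, n_hidden, n_outputs]

# Each dense layer has fan_in * fan_out weights plus fan_out biases
params_per_layer = [
    fan_in * fan_out + fan_out
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(n_hidden, params_per_layer, total_params)  # 6 [72, 42, 7] 121
```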

<img src="Images/17_1_deep.JPG" width="600">
- Deep learning: an artificial neural network with many hidden layers.
- So we need to know how to add more hidden layers to the network.

In [19]:
# Adding the second hidden layer
# Recommended: use the rectifier activation function for the hidden
# layers and the sigmoid activation function for the output layer
classifier.add(Dense(output_dim=6, init = 'uniform',
                    activation='relu'))

In [20]:
# Adding the output layer
classifier.add(Dense(output_dim=1, init = 'uniform',
                    activation='sigmoid'))

In [21]:
# optimizer: the algorithm used to find the optimal set of weights
# in the neural network. Up to this point the weights have only
# been initialized, so we need an algorithm to train them: 'adam',
# a variant of stochastic gradient descent.
# loss: the loss function that gradient descent minimizes. For a
# binary outcome this is the logarithmic loss (binary_crossentropy).
# metrics: the criterion used to evaluate the model, here accuracy.
?classifier.compile

In [None]:
# Compiling the whole ANN
classifier.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
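
The logarithmic loss named in the compile step can be written out directly. This is a minimal numpy sketch of the standard binary cross-entropy formula, not Keras's own implementation; the clipping constant `eps` and the example labels and probabilities are assumptions.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0) for predictions of exactly 0 or 1
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
good_probs = np.array([0.9, 0.1, 0.9, 0.1])  # confident and correct
bad_probs = np.array([0.1, 0.9, 0.1, 0.9])   # confident and wrong

loss_good = binary_crossentropy(y_true, good_probs)
loss_bad = binary_crossentropy(y_true, bad_probs)

print(loss_good, loss_bad)
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, which is what pushes the weights toward well-calibrated probabilities.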

In [None]:
# Fitting to the training set: Step 6, 7
classifier.fit(X_train, y_train, batch_size=10,
              nb_epoch=100);
# This cell raised an error in the notebook; run it in Spyder or
# PyCharm instead. There, the loss gets smaller after each epoch
# and the accuracy increases.

Epoch 1/100
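
To get a feel for how much work this fit call does, the number of weight updates can be estimated from the settings used in this notebook (10000 rows, `test_size=0.25`, `batch_size=10`, 100 epochs). The sketch assumes one weight update per batch, which is how mini-batch gradient descent normally works.

```python
import math

n_rows = 10000
test_size = 0.25
batch_size = 10
epochs = 100

# Rows left for training after the split
n_train = int(n_rows * (1 - test_size))

# One weight update per batch; the last batch may be smaller
batches_per_epoch = math.ceil(n_train / batch_size)
total_updates = batches_per_epoch * epochs

print(n_train, batches_per_epoch, total_updates)  # 7500 750 75000
```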


In [1]:
# Predict
y_pred = classifier.predict(X_test) # Will return the probability
y_pred = (y_pred > 0.5)

NameError: name 'classifier' is not defined

In [None]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
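
How the threshold and the confusion matrix fit together can be seen on a handful of made-up predictions. The labels and probabilities below are illustrative, not model output; the probabilities have shape `(n, 1)` to mimic a single sigmoid output node.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([[0.1], [0.6], [0.8], [0.4], [0.2], [0.9]])

# Same 0.5 threshold as above, flattened to a 1-D array of 0/1 labels
y_hat = (y_prob > 0.5).astype(int).ravel()

# Rows are true classes, columns are predicted classes
cm_demo = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm_demo.ravel()

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy = (tn + tp) / cm_demo.sum()

print(cm_demo)
print(accuracy)
```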