## Deep Learning

Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example. Deep learning is a key technology behind driverless cars, enabling them to recognize a stop sign or to distinguish a pedestrian from a lamppost. It is also the key to voice control in consumer devices like phones, tablets, TVs, and hands-free speakers. Deep learning has been getting a lot of attention lately, and for good reason: it is achieving results that were not possible before.

In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Deep learning models can achieve state-of-the-art accuracy, sometimes exceeding human-level performance. Models are trained by using a large set of labeled data and neural network architectures that contain many layers.

In [1]:
# Importing the libraries
import numpy as np
import pandas as pd
# reading the dataset
dataset = pd.read_csv('../input/Churn_Modelling.csv')
dataset.head()

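| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |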


In [2]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
RowNumber          10000 non-null int64
CustomerId         10000 non-null int64
Surname            10000 non-null object
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


In [3]:
dataset.describe()

| | RowNumber | CustomerId | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 5000.5 | 15690940.0 | 650.5288 | 38.9218 | 5.0128 | 76485.889288 | 1.5302 | 0.7055 | 0.5151 | 100090.239881 | 0.2037 |
| std | 2886.89568 | 71936.19 | 96.653299 | 10.487806 | 2.892174 | 62397.405202 | 0.581654 | 0.45584 | 0.499797 | 57510.492818 | 0.402769 |
| min | 1.0 | 15565700.0 | 350.0 | 18.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 11.58 | 0.0 |
| 25% | 2500.75 | 15628530.0 | 584.0 | 32.0 | 3.0 | 0.0 | 1.0 | 0.0 | 0.0 | 51002.11 | 0.0 |
| 50% | 5000.5 | 15690740.0 | 652.0 | 37.0 | 5.0 | 97198.54 | 1.0 | 1.0 | 1.0 | 100193.915 | 0.0 |
| 75% | 7500.25 | 15753230.0 | 718.0 | 44.0 | 7.0 | 127644.24 | 2.0 | 1.0 | 1.0 | 149388.2475 | 0.0 |
| max | 10000.0 | 15815690.0 | 850.0 | 92.0 | 10.0 | 250898.09 | 4.0 | 1.0 | 1.0 | 199992.48 | 1.0 |


dataset.info() reports 10000 non-null entries in every column, so there are no missing values to handle. Next, remove the columns that carry no useful information for prediction (RowNumber, CustomerId and Surname), and then separate the dependent and independent variables.
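To confirm the absence of missing values programmatically, `isnull().sum()` counts them per column. The tiny DataFrame below is a hypothetical stand-in with one value deliberately missing; on the real data the same check would be `dataset.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame standing in for the churn data; one CreditScore is missing on purpose
df = pd.DataFrame({
    "CreditScore": [619, 608, np.nan],
    "Geography": ["France", "Spain", "France"],
    "Exited": [1, 0, 1],
})

# Number of missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Total number of missing cells; 0 would mean there is nothing to impute or drop
total_missing = int(missing_per_column.sum())
print("total missing:", total_missing)
```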

In [4]:
dataset = dataset.drop(['RowNumber','CustomerId','Surname'], axis=1)
dataset.head()

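| | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |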


Feature encoding: converting the categorical columns (Geography and Gender) to numerical dummy columns.

In [5]:
# One-hot encode Geography; drop_first=True keeps n-1 of the n dummy columns
geography = pd.get_dummies(dataset['Geography'],drop_first=True)
# Similarly for the Gender column. With n dummy columns, keep n-1 (drop one), since the dropped column is fully determined by the others.
gender = pd.get_dummies(dataset['Gender'],drop_first=True)
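To see which columns `drop_first=True` keeps, the sketch below applies the same calls to a small hypothetical sample. Categories are sorted, so France and Female are dropped, which matches the Germany, Spain and Male columns in the encoded dataset; a row of all zeros then means "France" or "Female".

```python
import pandas as pd

# Hypothetical sample of the two categorical columns
sample = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Female", "Male"],
})

# drop_first=True drops the first category in sorted order (France, Female)
geo = pd.get_dummies(sample["Geography"], drop_first=True)
gen = pd.get_dummies(sample["Gender"], drop_first=True)

print(geo.columns.tolist())
print(gen.columns.tolist())

# Cast to int for readability (recent pandas versions return boolean dummy columns)
encoded = pd.concat([geo, gen], axis=1).astype(int)
print(encoded)
```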

In [6]:
dataset = dataset.drop(['Geography','Gender'], axis=1)
# add the columns to the original dataset.
dataset = pd.concat([dataset,geography,gender],axis=1)
dataset.head()

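| | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited | Germany | Spain | Male |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 | 0 | 0 | 0 |
| 1 | 608 | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | 0 | 1 | 0 |
| 2 | 502 | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 | 0 | 0 | 0 |
| 3 | 699 | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 | 0 | 0 | 0 |
| 4 | 850 | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 | 0 | 1 | 0 |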


Separating the dependent variable (the target, Exited) from the independent variables (the features).

In [7]:
X = dataset.drop("Exited",axis=1)
y = dataset['Exited']

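Feature scaling can change the results considerably for some algorithms (for example, neural networks trained by gradient descent) and have little or no effect on others (for example, decision trees). In this dataset, Balance goes up to about 250,898 while NumOfProducts ranges from 1 to 4, so standardizing every feature to zero mean and unit variance puts them on a comparable scale. Standardization also removes the effect of the unit of measurement: if the balance column were recorded in paise instead of rupees, every value would be 100 times larger, but the scaled column would be the same.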
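The unit-invariance point can be checked directly. This sketch standardizes three Balance values from the head() output, once in rupees and once multiplied by 100 (paise), and confirms that `StandardScaler` produces the same result for both.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Three Balance values from the head() output, treated as rupees
balance_rupees = np.array([[0.0], [83807.86], [159660.8]])
balance_paise = balance_rupees * 100  # the same amounts expressed in paise

# Standardization: z = (x - mean) / std, computed per column
z_rupees = StandardScaler().fit_transform(balance_rupees)
z_paise = StandardScaler().fit_transform(balance_paise)

print(z_rupees.ravel())
print(np.allclose(z_rupees, z_paise))  # the unit of measurement no longer matters
```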

In [8]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()

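Before fitting the model, let us understand the terms:

**fit**(): learns the parameters from the data. For StandardScaler this means computing the mean and standard deviation of each column; for a model it means training.

**transform**(): applies the parameters learned by fit() to data, for example to pre-process it with one of the scalers from sklearn.preprocessing.

**fit_transform**(): the same as calling fit() and then transform() on the same data, as a shortcut.

This is why the scaler is fitted on X_train only and then used to transform X_test: the test set is scaled with the training-set statistics, so no information from the test data leaks into training.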
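A small sketch of the three calls, using hypothetical one-column training and test arrays: fit() stores the training mean and standard deviation (`mean_`, `scale_`), transform() reuses them on the test row, and fit_transform() gives the same result as fit() followed by transform().

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test values for a single feature (e.g. Age)
X_train_demo = np.array([[30.0], [40.0], [50.0]])
X_test_demo = np.array([[45.0]])

sc_demo = StandardScaler()
sc_demo.fit(X_train_demo)              # fit: learn mean and std from the training data only
print(sc_demo.mean_, sc_demo.scale_)

# transform: apply the training mean/std to the test data (the scaler is not refitted)
print(sc_demo.transform(X_test_demo))

# fit_transform is a shortcut for fit(...) followed by transform(...) on the same data
a = StandardScaler().fit_transform(X_train_demo)
b = StandardScaler().fit(X_train_demo).transform(X_train_demo)
print(np.allclose(a, b))
```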

**Please go through this link for a better understanding**

https://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html#weights

In [9]:
# Fit the scaler on the training set, then scale the test set with the same mean/std
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)



In [10]:
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()

Using TensorFlow backend.


The Keras Python library for deep learning builds models as a sequence of layers. The simplest model type is the Sequential class, which is a linear stack of layers.
You can create an empty Sequential model and add the layers one at a time with add(), as below; alternatively, all of the layers can be passed to the constructor as a list.

In [11]:
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
# Adding the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform',activation = 'sigmoid'))



**Input Layer**

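Holds the data your model will train on. Each neuron in the input layer represents a unique attribute in your dataset (e.g. height, hair color, etc.). Here the input layer has 11 neurons (input_dim = 11), one for each feature column left in X after encoding.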

**Hidden Layer**

Sits between the input and output layers and applies an activation function before passing on the results. There are often multiple hidden layers in a network. In traditional networks, hidden layers are typically fully-connected layers — each neuron receives input from all the previous layer’s neurons and sends its output to every neuron in the next layer. This contrasts with how convolutional layers work where the neurons send their output to only some of the neurons in the next layer.

**Output Layer**

The final layer in a network. It receives input from the previous hidden layer, optionally applies an activation function, and returns an output representing your model’s prediction.
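The three layer types can be made concrete with a forward pass in plain NumPy that mirrors the shapes of the Keras model above (11 inputs, two hidden layers of 6 ReLU units, 1 sigmoid output). The weights are hypothetical random values drawn uniformly from -0.05 to 0.05, assumed here to resemble the `'uniform'` initializer; the customer rows are random stand-ins for scaled data.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights with the same shapes as the model: 11 -> 6 -> 6 -> 1
W1, b1 = rng.uniform(-0.05, 0.05, (11, 6)), np.zeros(6)
W2, b2 = rng.uniform(-0.05, 0.05, (6, 6)), np.zeros(6)
W3, b3 = rng.uniform(-0.05, 0.05, (6, 1)), np.zeros(1)

# Input layer: a batch of 4 hypothetical, already-scaled customers with 11 features each
X_batch = rng.normal(size=(4, 11))

h1 = relu(X_batch @ W1 + b1)   # first hidden layer, shape (4, 6)
h2 = relu(h1 @ W2 + b2)        # second hidden layer, shape (4, 6)
p = sigmoid(h2 @ W3 + b3)      # output layer, shape (4, 1): one churn probability per customer

print(h1.shape, h2.shape, p.shape)
print(p.ravel())
```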

**Definition of activation function**: An activation function decides whether, and how strongly, a neuron is activated, based on the weighted sum of the neuron's inputs plus its bias. The purpose of the activation function is to introduce non-linearity into the output of a neuron; without it, any stack of layers would reduce to a single linear transformation.

**Explanation**:
Each neuron in a network computes its output from its weights, its bias and its activation function. During training, the weights and biases are updated on the basis of the error at the output; the error is propagated backwards through the layers, which is why the process is called back-propagation. Because the activation functions are differentiable (almost everywhere, in the case of ReLU), their gradients can be combined with the error through the chain rule to work out how much each weight and bias should change.
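A minimal sketch of this idea for a single sigmoid neuron with binary cross-entropy loss (the loss used later in compile()). For this pairing the chain rule simplifies to dL/dz = p - y; the code checks that analytic gradient against a finite-difference estimate and takes one gradient-descent step. The inputs, weights and learning rate are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    # Binary cross-entropy for a single example
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# One hypothetical neuron with 3 inputs and one training example labelled y = 1
x = np.array([0.5, -1.2, 2.0])
w = np.array([0.1, 0.4, -0.3])
b = 0.05
y = 1.0

# Forward pass
p = sigmoid(w @ x + b)

# Backward pass: dL/dp * dp/dz simplifies to (p - y) for sigmoid + cross-entropy
dz = p - y
grad_w = dz * x
grad_b = dz

# Numerical check of grad_w with central differences
eps = 1e-6
num_grad_w = np.array([
    (bce(sigmoid((w + eps * e) @ x + b), y) - bce(sigmoid((w - eps * e) @ x + b), y)) / (2 * eps)
    for e in np.eye(3)
])
print(grad_w, num_grad_w)

# One gradient-descent update with a hypothetical learning rate of 0.1
lr = 0.1
w_new, b_new = w - lr * grad_w, b - lr * grad_b
loss_before = bce(p, y)
loss_after = bce(sigmoid(w_new @ x + b_new), y)
print("loss before:", loss_before, "after:", loss_after)
```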

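**ReLU (Rectified Linear Unit)** is defined as max(0, x). It is one of the most widely used activation functions today, appearing in almost all convolutional neural networks and many other deep learning models, and it is used in both hidden layers here. The output layer instead uses a sigmoid, 1 / (1 + e^(-x)), which maps the result to a value between 0 and 1 that can be read as the probability that the customer leaves.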
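The two activation functions used in this model can be written in a few lines of NumPy. The last part illustrates why a non-linearity is needed: two linear layers without an activation (hypothetical random weights) are exactly equivalent to one linear layer.

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z), applied element-wise
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))
print(sigmoid(z))

# Without an activation, stacking linear layers gains nothing:
# (x @ A) @ B equals x @ (A @ B), a single linear layer
rng = np.random.default_rng(1)
A, B = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
x = rng.normal(size=(5, 3))
print(np.allclose((x @ A) @ B, x @ (A @ B)))
```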

In [12]:
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

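**adam**: Adam is an optimization algorithm that can be used instead of classical stochastic gradient descent to update network weights iteratively based on the training data. It keeps running averages of each parameter's gradients and squared gradients and uses them to adapt the step size for each parameter.

**binary_crossentropy**: the loss function for a two-class problem (churn or no churn) with a single sigmoid output.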
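A minimal one-parameter sketch of the Adam update rule, minimizing the hypothetical objective f(x) = (x - 3)^2. The β1, β2 and ε values follow the original Adam paper; the learning rate of 0.05 is an assumption chosen for this toy problem, and Keras's own defaults may differ. Note that the first step has size close to the learning rate regardless of the gradient's magnitude.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimal Adam sketch for a single scalar parameter (hyperparameters assumed)."""
    x, m, v = float(x0), 0.0, 0.0
    history = [x]
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g           # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g       # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)  # adaptive, per-parameter step
        history.append(x)
    return x, history

# Hypothetical objective f(x) = (x - 3)^2, whose gradient is 2(x - 3)
x_final, hist = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(hist[1], x_final)
```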



In [13]:
classifier.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 6)                 72        
_________________________________________________________________
dense_2 (Dense)              (None, 6)                 42        
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 7         
Total params: 121
Trainable params: 121
Non-trainable params: 0
_________________________________________________________________
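The Param # column can be reproduced by hand: a Dense layer has one weight per input-output pair plus one bias per output, i.e. n_in * n_out + n_out. For the first layer that is 11 * 6 + 6 = 72. The helper below (a hypothetical name) applies this to both architectures used in this notebook.

```python
def dense_param_count(layer_sizes):
    """Parameters per Dense layer: n_in * n_out weights plus n_out biases."""
    counts = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        counts.append(n_in * n_out + n_out)
    return counts

first = dense_param_count([11, 6, 6, 1])
print(first, sum(first))     # [72, 42, 7] 121

second = dense_param_count([11, 20, 10, 20, 1])
print(second, sum(second))   # [240, 210, 220, 21] 691
```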


One Epoch is when an ENTIRE dataset is passed forward and backward through the neural network only ONCE.


In [14]:
# Fitting the ANN to the Training set
history = classifier.fit(X_train, y_train, batch_size = 10, epochs = 10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


One epoch means that each sample in the training dataset has had an opportunity to update the internal model parameters. An epoch is made up of one or more batches. An epoch that contains a single batch (batch size equal to the whole training set) corresponds to batch gradient descent; in the run above, batch_size = 10 on the 8,000 training rows gives 800 batches per epoch, which is mini-batch gradient descent.

You can think of a for-loop over the number of epochs where each loop proceeds over the training dataset. Within this for-loop is another nested for-loop that iterates over each batch of samples, where one batch has the specified “batch size” number of samples.
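The nested loops described above can be sketched in NumPy. To keep it short, this trains a single sigmoid neuron (logistic regression) rather than the full network, on hypothetical random data with the same size as the training split (8,000 rows, 11 features). As Keras's fit() does by default, the data is shuffled at the start of each epoch. With batch_size = 10, each epoch performs 800 parameter updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data sized like the training split: 8,000 rows, 11 features
n_samples, n_features = 8000, 11
X_demo = rng.normal(size=(n_samples, n_features))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(float)

w, b = np.zeros(n_features), 0.0
lr, batch_size, epochs = 0.1, 10, 3
updates = 0

for epoch in range(epochs):                          # outer loop: one full pass per epoch
    order = rng.permutation(n_samples)               # shuffle at the start of each epoch
    for start in range(0, n_samples, batch_size):    # inner loop: one batch at a time
        idx = order[start:start + batch_size]
        p = 1.0 / (1.0 + np.exp(-(X_demo[idx] @ w + b)))
        err = p - y_demo[idx]                        # gradient of the loss w.r.t. the pre-activation
        w -= lr * X_demo[idx].T @ err / len(idx)     # one parameter update per batch
        b -= lr * err.mean()
        updates += 1
    acc = ((X_demo @ w + b > 0) == (y_demo == 1)).mean()
    print(f"epoch {epoch + 1}: {updates} updates so far, train accuracy {acc:.3f}")

print(updates)  # 3 epochs * 800 batches = 2400
```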

In [15]:
# Part 3 - Making predictions and evaluating the model

# Predict churn probabilities for the test set, then threshold at 0.5 to get True/False labels
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
y_pred[:5]

array([[False],
       [False],
       [False],
       [False],
       [False]])

In [16]:
# Making the classification report
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
print(classification_report(y_test,y_pred))


              precision    recall  f1-score   support

           0       0.85      0.98      0.91      1595
           1       0.77      0.30      0.43       405

   micro avg       0.84      0.84      0.84      2000
   macro avg       0.81      0.64      0.67      2000
weighted avg       0.83      0.84      0.81      2000



In [17]:
print(accuracy_score(y_test, y_pred)*100)

84.0


In [18]:
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[1560   35]
 [ 285  120]]
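The classification report can be recovered from this confusion matrix. In scikit-learn's layout, rows are the actual classes and columns the predicted ones, so the matrix reads [[TN, FP], [FN, TP]]. The arithmetic below reproduces the accuracy of 0.84 and the class-wise precision and recall in the report (0.77 and 0.30 for class 1, 0.85 and 0.98 for class 0).

```python
import numpy as np

# Confusion matrix printed above (rows: actual 0/1, columns: predicted 0/1)
cm = np.array([[1560, 35],
               [285, 120]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # (1560 + 120) / 2000
precision_1 = tp / (tp + fp)      # of customers predicted to churn, the share who did
recall_1 = tp / (tp + fn)         # of customers who churned, the share that was caught
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

print(accuracy)
print(round(precision_1, 2), round(recall_1, 2))
print(round(precision_0, 2), round(recall_0, 2))
```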


In [19]:
score = classifier.evaluate(X_test,y_test)
print(score)
print('loss = ', score[0])
print('acc = ', score[1])

[0.40701151728630064, 0.84]
loss =  0.40701151728630064
acc =  0.84


In [20]:
# Experiment notes from earlier runs (epochs changed from 2 to 5 and 10):
# with SGD we got 79% accuracy with 2, 5 and 20 epochs
# with adam we got 83% accuracy with 20 epochs
# Initialising a wider ANN (20 -> 10 -> 20 hidden units)
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 20, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
# Adding the second hidden layer
classifier.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the third hidden layer
classifier.add(Dense(units = 20, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling and fitting, this time for 20 epochs
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
history = classifier.fit(X_train, y_train, batch_size = 10, epochs = 20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [21]:
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
y_pred[:5]

array([[False],
       [False],
       [False],
       [False],
       [False]])

In [22]:

from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.86      0.95      0.90      1595
           1       0.68      0.40      0.51       405

   micro avg       0.84      0.84      0.84      2000
   macro avg       0.77      0.68      0.71      2000
weighted avg       0.83      0.84      0.82      2000



Making the Confusion Matrix

In [23]:
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[1517   78]
 [ 241  164]]


In [24]:
score = classifier.evaluate(X_test,y_test)
print(score)
print('loss = ', score[0])
print('acc = ', score[1])

[0.39525816226005556, 0.8405]
loss =  0.39525816226005556
acc =  0.8405


In [25]:
print(accuracy_score(y_test, y_pred)*100)

84.05


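More epochs gave a slightly better accuracy score here: 84.05% versus 84.0%, i.e. 1,681 versus 1,680 correct predictions out of 2,000. Because the second model also used wider hidden layers, the gain cannot be credited to the epochs alone, and training for many more epochs can eventually lead to overfitting. The clearer improvement is in recall for customers who churned (class 1), which rose from 0.30 to 0.40.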