# Deep Learning using Keras in Python : Customer Churn Predictions

In the targeted approach, the company tries to identify in advance the customers who are likely to churn and then targets them with special programs or incentives. This approach can cause large losses if the churn predictions are inaccurate, because the firm then wastes incentive money on customers who would have stayed anyway. Numerous predictive modeling techniques exist for predicting customer churn.

The data files state that the data are "artificial based on claims similar to real world". These data are also contained in the C50 R package.

Data and associated files are also available at: http://www.sgi.com/tech/mlc/db/churn.data

The analysis uses the Keras library in Python. The task is to predict whether a customer will churn, using the given features.

Start off by importing the libraries in Python.

In [1]:
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Now read the dataset from the url and view it.

In [2]:
#Importing the dataset
custchurn = pd.read_csv("http://www.sgi.com/tech/mlc/db/churn.data", header = None)

In [3]:
custchurn.head(10)

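|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | ... | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False. |
| 1 | OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | ... | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False. |
| 2 | NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | ... | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False. |
| 3 | OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.9 | ... | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False. |
| 4 | OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | ... | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False. |
| 5 | AL | 118 | 510 | 391-8027 | yes | no | 0 | 223.4 | 98 | 37.98 | ... | 101 | 18.75 | 203.9 | 118 | 9.18 | 6.3 | 6 | 1.7 | 0 | False. |
| 6 | MA | 121 | 510 | 355-9993 | no | yes | 24 | 218.2 | 88 | 37.09 | ... | 108 | 29.62 | 212.6 | 118 | 9.57 | 7.5 | 7 | 2.03 | 3 | False. |
| 7 | MO | 147 | 415 | 329-9001 | yes | no | 0 | 157.0 | 79 | 26.69 | ... | 94 | 8.76 | 211.8 | 96 | 9.53 | 7.1 | 6 | 1.92 | 0 | False. |
| 8 | LA | 117 | 408 | 335-4719 | no | no | 0 | 184.5 | 97 | 31.37 | ... | 80 | 29.89 | 215.8 | 90 | 9.71 | 8.7 | 4 | 2.35 | 1 | False. |
| 9 | WV | 141 | 415 | 330-8173 | yes | yes | 37 | 258.6 | 84 | 43.96 | ... | 111 | 18.87 | 326.4 | 97 | 14.69 | 11.2 | 5 | 3.02 | 0 | False. |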


The data is in a very usable format. Next, check for missing values (NAs).

## Part I : Data Preprocessing

In [4]:
custchurn.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 21 columns):
0     3333 non-null object
1     3333 non-null int64
2     3333 non-null int64
3     3333 non-null object
4     3333 non-null object
5     3333 non-null object
6     3333 non-null int64
7     3333 non-null float64
8     3333 non-null int64
9     3333 non-null float64
10    3333 non-null float64
11    3333 non-null int64
12    3333 non-null float64
13    3333 non-null float64
14    3333 non-null int64
15    3333 non-null float64
16    3333 non-null float64
17    3333 non-null int64
18    3333 non-null float64
19    3333 non-null int64
20    3333 non-null object
dtypes: float64(8), int64(8), object(5)
memory usage: 546.9+ KB
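
The `info()` output lists 3333 non-null entries for every column, so nothing is missing. A more direct check could count the missing values per column, roughly as in the sketch below. The toy frame `demo` and its values are made up for illustration; on the real data the equivalent call would be `custchurn.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for custchurn (values are made up)
demo = pd.DataFrame({
    "account_length": [128, 107, np.nan],
    "international_plan": ["no", "yes", "no"],
})

# Number of missing values in each column
missing_per_column = demo.isnull().sum()
print(missing_per_column)

# True if any value anywhere in the frame is missing
has_missing = bool(demo.isnull().values.any())
print(has_missing)
```

A column total of zero everywhere means no imputation step is needed before modeling.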


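Now assign the column names from the dataset description and shuffle the rows randomly. Because `sample` is called without a `random_state`, the row order differs from run to run, so the exact numbers reported below may not reproduce.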

In [5]:
custchurn.columns = ["state","account_length","area_code","phone_number","international_plan","voice_mail_plan","number_vmail_messages","total_day_minutes","total_day_calls","total_day_charge","total_eve_minutes","total_eve_calls","total_eve_charge","total_night_minutes","total_night_calls","total_night_charge","total_intl_minutes","total_intl_calls","total_intl_charge","number_customer_service_calls","churned"]
custchurn = custchurn.sample(frac=1).reset_index(drop=True)
custchurn.head(10)

|  | state | account_length | area_code | phone_number | international_plan | voice_mail_plan | number_vmail_messages | total_day_minutes | total_day_calls | total_day_charge | ... | total_eve_calls | total_eve_charge | total_night_minutes | total_night_calls | total_night_charge | total_intl_minutes | total_intl_calls | total_intl_charge | number_customer_service_calls | churned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MS | 140 | 408 | 372-5262 | no | no | 0 | 162.6 | 98 | 27.64 | ... | 109 | 17.53 | 141.6 | 66 | 6.37 | 8.2 | 2 | 2.21 | 1 | False. |
| 1 | WY | 16 | 415 | 400-3197 | no | no | 0 | 174.7 | 83 | 29.7 | ... | 122 | 23.87 | 171.7 | 80 | 7.73 | 10.5 | 8 | 2.84 | 5 | False. |
| 2 | IN | 130 | 408 | 334-9818 | no | no | 0 | 115.6 | 129 | 19.65 | ... | 104 | 14.26 | 141.8 | 124 | 6.38 | 12.6 | 9 | 3.4 | 1 | False. |
| 3 | VA | 121 | 415 | 357-7064 | no | no | 0 | 134.1 | 112 | 22.8 | ... | 104 | 16.58 | 159.6 | 139 | 7.18 | 10.5 | 2 | 2.84 | 2 | False. |
| 4 | OH | 56 | 408 | 349-2654 | no | no | 0 | 91.1 | 90 | 15.49 | ... | 115 | 15.24 | 300.7 | 89 | 13.53 | 11.9 | 8 | 3.21 | 2 | False. |
| 5 | VT | 43 | 408 | 331-8713 | no | no | 0 | 135.8 | 125 | 23.09 | ... | 88 | 13.87 | 229.8 | 106 | 10.34 | 12.6 | 3 | 3.4 | 0 | False. |
| 6 | SC | 130 | 415 | 396-4410 | no | no | 0 | 212.8 | 102 | 36.18 | ... | 137 | 16.13 | 170.1 | 105 | 7.65 | 10.6 | 4 | 2.86 | 0 | True. |
| 7 | CT | 37 | 408 | 347-7675 | no | no | 0 | 134.9 | 98 | 22.93 | ... | 130 | 21.11 | 236.2 | 113 | 10.63 | 14.7 | 2 | 3.97 | 3 | False. |
| 8 | OH | 127 | 408 | 396-9462 | no | no | 0 | 139.6 | 94 | 23.73 | ... | 112 | 20.48 | 127.1 | 88 | 5.72 | 8.8 | 4 | 2.38 | 2 | False. |
| 9 | OK | 138 | 510 | 406-5532 | no | yes | 33 | 155.2 | 139 | 26.38 | ... | 79 | 22.81 | 186.4 | 71 | 8.39 | 9.7 | 4 | 2.62 | 3 | False. |
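
If a repeatable shuffle is wanted, one option is to pass a seed to `sample`. The sketch below uses a made-up five-row frame, and the seed value 42 is an arbitrary assumption, not something taken from the original analysis.

```python
import pandas as pd

# Hypothetical toy frame standing in for custchurn
demo = pd.DataFrame({
    "state": ["KS", "OH", "NJ", "OK", "AL"],
    "churned": ["False.", "False.", "True.", "False.", "True."],
})

# random_state fixes the shuffle so repeated runs give the same order;
# the seed value 42 is an arbitrary choice
shuffled_a = demo.sample(frac=1, random_state=42).reset_index(drop=True)
shuffled_b = demo.sample(frac=1, random_state=42).reset_index(drop=True)

# Both shuffles are identical, and no rows are lost or duplicated
print(shuffled_a.equals(shuffled_b))
print(sorted(shuffled_a["state"]) == sorted(demo["state"]))
```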


### Create Feature vector and target vector 

Now create the feature matrix, leaving out 'state', 'area_code' and 'phone_number'. The target vector is the 'churned' variable.

In [6]:
X = custchurn.iloc[:,[1,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]].values
y = custchurn.iloc[:,20].values
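
The positional list above is easy to get wrong by one index. A name-based selection is a possible alternative, sketched here on a made-up two-row frame that carries only a subset of the real column names. On the full frame, dropping the same four columns would leave the same 17 feature columns in the same order.

```python
import pandas as pd

# Hypothetical two-row frame with a subset of the real column names
demo = pd.DataFrame({
    "state": ["KS", "OH"],
    "account_length": [128, 107],
    "area_code": [415, 415],
    "phone_number": ["382-4657", "371-7191"],
    "international_plan": ["no", "no"],
    "churned": ["False.", "False."],
})

# Drop the identifier-like columns and the target to get the features
excluded = ["state", "area_code", "phone_number", "churned"]
X_demo = demo.drop(columns=excluded).values
y_demo = demo["churned"].values

print(X_demo.shape, y_demo.shape)
```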

### Encoding Categorical Features

We have the categorical features 'international_plan' and 'voice_mail_plan', and the categorical target 'churned'. Encode them as numbers.

In [7]:
# Encoding categorical features
from sklearn.preprocessing import LabelEncoder
labelencoder_X_1 = LabelEncoder()
X[:,1] = labelencoder_X_1.fit_transform(X[:,1])
labelencoder_X_2 = LabelEncoder()
X[:,2] = labelencoder_X_2.fit_transform(X[:,2])
# Encoding Target variable 'churned'
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
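
To see which integer each label receives, the fitted encoder's `classes_` attribute can be inspected. The sketch below uses a few sample values in the style of the dataset rather than the real columns; the variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# Sample values in the style of the dataset
plan = ["no", "yes", "no", "yes"]
churned = ["False.", "True.", "False.", "False."]

plan_encoder = LabelEncoder()
plan_codes = plan_encoder.fit_transform(plan)

churn_encoder = LabelEncoder()
churn_codes = churn_encoder.fit_transform(churned)

# classes_ lists the original labels; a label's position is its integer code
print(list(plan_encoder.classes_), list(plan_codes))
print(list(churn_encoder.classes_), list(churn_codes))
```

Labels are sorted before codes are assigned, so 'no' and 'False.' map to 0 while 'yes' and 'True.' map to 1.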

### Splitting the dataset into the Training set and Test set

In [8]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
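
The split above is purely random. Since churners are a minority, a stratified split is a possible variation that keeps the churn rate equal in both subsets. The sketch below is not part of the original analysis; it uses made-up data with 15 positives out of 100 samples.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 100 samples, 15 of them positive
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 15 + [0] * 85)

# stratify=y_demo keeps the class proportions equal in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Share of positives in the training and test subsets
print(y_tr.mean(), y_te.mean())
```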

### Feature Scaling

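Feature scaling is strongly recommended for neural networks: when the inputs are on very different scales, gradient-based training tends to converge slowly or unstably. The scaler is fitted on the training set only and then applied to the test set, so that no information from the test set leaks into training.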

In [9]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
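
As a rough illustration of what the scaler does, the sketch below standardizes a tiny made-up array and reproduces the test-set transformation by hand. The arrays and variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales (values are made up)
X_train_demo = np.array([[100.0, 1.0], [200.0, 3.0], [300.0, 5.0]])
X_test_demo = np.array([[250.0, 2.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learns mean and std from training data
X_test_scaled = scaler.transform(X_test_demo)        # reuses the training mean and std

# Same result computed by hand: (x - mean) / std, using training statistics
mu = X_train_demo.mean(axis=0)
sigma = X_train_demo.std(axis=0)
manual_test_scaled = (X_test_demo - mu) / sigma

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(np.allclose(X_test_scaled, manual_test_scaled))
```

After scaling, each training column has mean 0 and standard deviation 1, while the test columns generally do not, because they are scaled with the training statistics.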



## Part II : Building the Artificial Neural Network

We build the ANN with the Keras library. We import the class 'Sequential' to initialize the network as a sequence of layers, and the class 'Dense' to build the actual layers.

In [10]:
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential # initialize ANN
from keras.layers import Dense # build layers

Using TensorFlow backend.


In [11]:
# Initialising the ANN
# Defining ANN as a sequence of layers
classifier = Sequential()

Here we build an input layer with 17 neurons, corresponding to the 17 input features, and two hidden layers with 9 neurons each. Because the problem is a binary classification, the output layer has only one neuron. The number of neurons in a hidden layer may be calculated as the average of the input and output sizes: (17+1)/2 = 9.

Also, let's use the 'rectifier' (ReLU) activation function for the hidden layers and the 'sigmoid' function for the output layer.

In [12]:
# Adding the input layer and the first hidden layer
# rectifier (ReLU) activation function for the hidden layers
# units = number of nodes in the hidden layer = (17+1)/2 = 9
classifier.add(Dense(units = 9, kernel_initializer = 'uniform', activation = 'relu', input_dim = 17))

In [13]:
# Adding the second hidden layer
classifier.add(Dense(units = 9, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer
# sigmoid act. function for output layers
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
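
For a sense of the model's size, the number of trainable parameters in this 17-9-9-1 network can be worked out by hand. The helper `dense_param_count` below is a hypothetical name introduced for illustration; it assumes each dense layer has one bias per unit, which is the Keras default. The total should agree with what `classifier.summary()` reports.

```python
# Layer sizes described above: 17 inputs, two hidden layers of 9, one output
layer_sizes = [17, 9, 9, 1]

def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer, total_params)  # [162, 90, 10] 262
```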

Now compile the ANN using the 'adam' optimizer and the 'binary_crossentropy' loss function, with accuracy as the evaluation metric.

In [14]:
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
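
To make the loss concrete, binary cross-entropy can be written out in a few lines of NumPy. This is a minimal sketch of the formula, not the Keras implementation; the function name, the clipping constant `eps` and the sample values are assumptions made for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps clipping avoids log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1.0 - y_true) * np.log(1.0 - y_prob)))

# Made-up labels and predicted churn probabilities
y_true_demo = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.3]
confident_wrong = [0.1, 0.9, 0.2, 0.7]

loss_right = binary_crossentropy(y_true_demo, confident_right)
loss_wrong = binary_crossentropy(y_true_demo, confident_wrong)
print(loss_right, loss_wrong)
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong, which is what the optimizer works to reduce.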

Having compiled the ANN, let's now fit it to our training set, in batches of 10, for 100 epochs (full passes over the training data).

In [15]:
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)

Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x1293990b8>
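
The batch size and epoch count together determine how many weight updates the network receives. The arithmetic can be sketched as follows, assuming the test count is rounded up from 0.2 × 3333; the resulting 667 matches the total of the confusion matrix in Part III.

```python
import math

n_total = 3333      # rows reported by custchurn.info()
test_size = 0.2     # fraction passed to train_test_split
batch_size = 10
epochs = 100

n_test = math.ceil(test_size * n_total)   # assumption: test count is rounded up
n_train = n_total - n_test

updates_per_epoch = math.ceil(n_train / batch_size)  # last batch may be smaller
total_updates = updates_per_epoch * epochs

print(n_train, updates_per_epoch, total_updates)  # 2666 267 26700
```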

## Part III : Making the predictions and evaluating the model

We now make predictions using the predict method and print the confusion matrix

In [16]:
#Part 3 - Making the predictions and evaluating the model

# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5) # convert predicted probabilities to True/False at a 0.5 threshold

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
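
The thresholding step turns the sigmoid outputs into class labels. Its effect can be seen on a few made-up probabilities, shaped as one column per sample like the network's output. The array values here are hypothetical.

```python
import numpy as np

# Made-up sigmoid outputs, shaped (n_samples, 1) like the network's predictions
probs = np.array([[0.10], [0.70], [0.50], [0.93]])

# Strictly greater than 0.5 counts as "will churn"
labels_bool = probs > 0.5

# Optional: flatten to 1-D integer labels (0 = stays, 1 = churns)
labels_int = labels_bool.ravel().astype(int)

print(labels_int)  # [0 1 0 1]
```

A probability of exactly 0.5 is classified as "stays" because the comparison is strict. Raising or lowering the threshold trades missed churners against false alarms.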

In [17]:
cm

array([[563,   6],
       [ 32,  66]])

In [18]:
(cm[0,0]+cm[1,1])/len(y_test)

0.94302848575712139
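
Accuracy alone hides how the errors are distributed between the two classes. Precision, recall and a majority-class baseline can be derived from the same confusion matrix, as in this sketch. The variable names are illustrative; the matrix values are the ones reported above.

```python
import numpy as np

# Confusion matrix reported above: rows = actual, columns = predicted
cm_reported = np.array([[563, 6],
                        [32, 66]])

tn, fp, fn, tp = cm_reported.ravel()

accuracy = (tp + tn) / cm_reported.sum()
precision = tp / (tp + fp)          # of predicted churners, share that churned
recall = tp / (tp + fn)             # of actual churners, share that was caught
majority_baseline = (tn + fp) / cm_reported.sum()  # always predicting "no churn"

print(round(accuracy, 3), round(precision, 3),
      round(recall, 3), round(majority_baseline, 3))
# 0.943 0.917 0.673 0.853
```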

## Concluding Remarks

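1. The Deep Learning classifier has an accuracy of 94.3 % on the test set. This is a good result for the dataset, although the classes are imbalanced: always predicting "no churn" would already score 569/667 ≈ 85.3 %, and the model misses 32 of the 98 churners in the test set. Adding more hidden layers or training for more epochs may improve the accuracy, though neither is guaranteed to help.

2. Statistical significance tests may be performed to find the relevant variables for the case.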
