# Churn Modelling

Customer churn is a pressing issue that is closely tied to the life cycle of a business. In recent years, churn prediction has become increasingly important in the banking and telecom industries. To deal with this problem, a business must identify at-risk customers before they churn. Developing a classifier that predicts future churn is therefore vital.
<br>
![Customer Churn](https://miro.medium.com/max/3384/1*WqId29D5dN_8DhiYQcHa2w.png)
<br>
This dataset contains details of a bank's customers. The target variable is binary and indicates whether the customer left the bank (closed their account) or continues to be a customer.

### Summary of the features and their types in the dataset:
<li>1) <b>RowNumber</b> : Serial number
<li>2) <b>CustomerId</b> : Unique IDs for bank customer identification
<li>3) <b>Surname</b> : Customer's last name
<li>4) <b>CreditScore</b> : Credit score of the customer
<li>5) <b>Geography</b> : The country to which the customer belongs
<li>6) <b>Gender</b> : Male or Female
<li>7) <b>Age</b> : The age of the customer
<li>8) <b>Tenure</b> : Number of years for which the customer has been with the bank
<li>9) <b>Balance</b> : Bank balance of the customer
<li>10) <b>NumOfProducts</b> : Number of bank products the customer is utilising
<li>11) <b>HasCrCard</b> : Binary Flag for whether the customer holds a credit card with the bank or not
<li>12) <b>IsActiveMember</b> : Binary Flag for whether the customer is an active member with the bank or not
<li>13) <b>EstimatedSalary</b> : Estimated salary of the customer in Dollars
<li>14) <b>Exited</b> : Binary flag: 1 if the customer closed their account with the bank and 0 if the customer is retained

## Introduction
The main goal of this notebook is to explore the data and predict whether a customer left the bank using an artificial neural network (ANN), and to show how to perform hyperparameter optimization with Keras Tuner. I have also performed exploratory data analysis and feature engineering.

## Importing Necessary Libraries

In [None]:
# Import Libraries
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import seaborn as sns
import plotly.graph_objs as px
import plotly.express as ex
from plotly.subplots import make_subplots
from matplotlib import pyplot as plt
%matplotlib inline

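# Importing the Keras libraries and packages
# (all imports come from tensorflow.keras so that models and layers are not
# mixed between the standalone keras package and the one bundled with TensorFlow)
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.layers import LeakyReLU, PReLU, ELU
try:
    from keras_tuner import RandomSearch  # current package name
except ImportError:
    from kerastuner.tuners import RandomSearch  # older releases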

# Feature Scaling and metrics
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score

In [None]:
# Read the csv file
data = pd.read_csv('/kaggle/input/churn-modelling/Churn_Modelling.csv')
data.head()

## Exploring the Dataset
<br>
In this step we compute summary statistics for the data. This includes the following steps:
<br>
<br>
<li>Shape of the raw dataset.
<li>Duplicate and null values if any.
<li>Dropping the columns that have little impact on the dependent variable.
<li>Detecting and removing outliers.

In [None]:
# Shape of the raw data
print("Shape of raw data: {}".format(data.shape))

> There are 10000 rows and 14 columns in the raw dataset.

In [None]:
data.info()

In [None]:
# Checking the duplicate value
data.duplicated().sum()

In [None]:
# Checking the null values
data.isnull().sum()

In [None]:
# Checking the null values with heatmap
sns.heatmap(data.isnull(),yticklabels = False,cbar = False,cmap = 'viridis')

> There are no duplicate or missing values in the dataset.

In [None]:
# Dropping the columns that have little impact on the dependent variable
data.drop(['RowNumber', 'CustomerId', 'Surname'], axis = 1, inplace=True)

> The RowNumber, CustomerId and Surname columns are dropped from the dataset since they have little impact on the dependent variable.
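
One way to sanity-check that a column behaves like an identifier is to look at how many distinct values it holds relative to the number of rows. The snippet below is a minimal sketch on a made-up toy frame. The "fully unique" rule is an assumption for illustration, not the criterion used in this notebook, and in the real data a column such as Surname will contain repeats.

```python
import pandas as pd

# Toy frame with the same kind of columns as the bank data (values are made up)
toy = pd.DataFrame({
    "RowNumber": [1, 2, 3, 4],
    "CustomerId": [15634602, 15647311, 15619304, 15701354],
    "Surname": ["Hargrave", "Hill", "Onio", "Boni"],
    "Geography": ["France", "Spain", "France", "France"],
    "Exited": [1, 0, 1, 0],
})

# Fraction of distinct values per column: 1.0 means every row is unique
unique_ratio = toy.nunique() / len(toy)

# Hypothetical rule of thumb: treat fully unique columns as identifiers
id_like = unique_ratio[unique_ratio == 1.0].index.tolist()
print(id_like)

# Drop the identifier-like columns and keep the rest
reduced = toy.drop(columns=id_like)
print(reduced.columns.tolist())
```

Columns flagged this way carry no pattern a model could generalize from, which is the usual reason for dropping them before training.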

In [None]:
# Statistical analysis of the data
data.describe()

In [None]:
# Detecting the outliers: values whose absolute z-score exceeds the threshold
def detect_outlier(data):
    outlier = []
    threshold = 3
    mean = np.mean(data)
    std = np.std(data)
    for i in data:
        z_score = (i - mean)/std
        if np.abs(z_score)>threshold:
            outlier.append(i)
    return outlier

In [None]:
CreditScore_list = data['CreditScore'].tolist()
Balance_list = data['Balance'].tolist()
EstimatedSalary_list = data['EstimatedSalary'].tolist()

In [None]:
CreditScore_outlier = detect_outlier(CreditScore_list)
CreditScore_outlier

In [None]:
Balance_outlier = detect_outlier(Balance_list)
Balance_outlier

In [None]:
EstimatedSalary_outlier = detect_outlier(EstimatedSalary_list)
EstimatedSalary_outlier

In [None]:
# Shape of Data before removing the outliers
print("Shape of Data before removing outliers: {}".format(data.shape))

In [None]:
# Removing the outliers
data.drop(data[data['CreditScore'] <= 359].index, inplace = True)

In [None]:
#Shape of Data after removing the outliers
print("Shape of Data after removing outliers: {}".format(data.shape))
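
The z-score rule used above can also be written without an explicit loop. The following is a rough sketch of a vectorized equivalent. The helper name `zscore_outlier_mask` and the sample scores are made up for illustration, and the threshold of 3 matches the one in `detect_outlier`.

```python
import numpy as np
import pandas as pd

def zscore_outlier_mask(values, threshold=3.0):
    """Return a boolean mask that is True where |z-score| exceeds threshold."""
    values = np.asarray(values, dtype=float)
    # values.std() is the population standard deviation, the same as np.std
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold

# Made-up scores: 20 typical values and one extremely low value
scores = pd.Series([650] * 10 + [700] * 10 + [100])

mask = zscore_outlier_mask(scores)
print(scores[mask].tolist())      # values flagged as outliers

cleaned = scores[~mask]           # keep only the non-outlier rows
print(len(scores), len(cleaned))
```

Filtering with a mask computed from the data avoids hard-coding a cut-off value, so the same code keeps working if the data changes.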

## Exploratory Data Analysis
<br>
In this step we plot graphs to gain deeper insight into the data and to illustrate the relationships between different variables:
<li>Proportion of churned vs. retained customers
<li>Churned vs. retained customers by gender
<li>Churned vs. retained customers by country
<li>Credit card usage according to country
<li>Credit card usage according to gender
<li>Country with highest credit score
<li>Country with highest Estimated salary
<li>Distribution of Credit score of customer
<li>Distribution of Customer Age

In [None]:
plt.figure(figsize =(7,5))
sns.set_style('darkgrid')
sns.countplot(x = 'Exited', data = data)
plt.title('Exited(No/Yes)')
data['Exited'].value_counts()

In [None]:
plt.pie(data['Exited'].value_counts(), labels = ['No', 'Yes'], shadow = True, autopct = '%1.2f%%');

> As the graphs above show, 20.31% of the samples are churned customers and 79.69% are retained customers.
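
This imbalance matters when reading accuracy later on: a model that always predicts "retained" would already be right about 79.69% of the time. The sketch below computes the class shares and that majority-class baseline on made-up labels with a similar 80/20 split. The data and variable names are illustrative only.

```python
import pandas as pd

# Made-up labels with roughly the same imbalance as reported above
exited = pd.Series([0] * 80 + [1] * 20, name="Exited")

# Share of each class in the data
class_share = exited.value_counts(normalize=True)
print(class_share)

# Accuracy of a trivial model that always predicts the majority class
majority_class = class_share.idxmax()
baseline_accuracy = (exited == majority_class).mean()
print(majority_class, baseline_accuracy)
```

Any classifier trained on this data should be judged against that baseline rather than against 50%.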

In [None]:
plt.figure(figsize =(7,5))
sns.set_style('darkgrid')
sns.countplot(x = 'Exited', hue = 'Gender', data = data)
plt.title('Exited against Gender');
pd.DataFrame(data.groupby(['Gender', 'Exited'])['Exited'].count())

In [None]:
plt.figure(figsize =(7,5))
sns.set_style('darkgrid')
sns.countplot(x = 'Exited', hue = 'Geography', data = data)
plt.title('Exited against Geography');
pd.DataFrame(data.groupby(['Geography', 'Exited'])['Exited'].count())

In [None]:
plt.figure(figsize =(7,5))
sns.set_style('darkgrid')
sns.countplot(x = 'HasCrCard', hue = 'Geography', data = data)
plt.title('Credit card used against Geography');
pd.DataFrame(data.groupby(['Geography', 'HasCrCard'])['HasCrCard'].count())

In [None]:
plt.figure(figsize =(7,5))
sns.set_style('darkgrid')
sns.countplot(x = 'HasCrCard', hue = 'Gender', data = data)
plt.title('Credit card used against Gender');
pd.DataFrame(data.groupby(['Gender', 'HasCrCard'])['HasCrCard'].count())

> <li>Proportionally, more female customers than male customers have exited the bank.
> <li>Proportionally, more German customers have exited the bank than customers from the other two countries.
> <li>More customers from France hold a credit card than customers from the other two countries.
> <li>The proportion of customers holding a credit card is almost equal across genders.

In [None]:
fig = ex.box(data, x="Exited", y="CreditScore", color = 'Geography')
fig.update_layout(title_text="Credit score by country and exit status (Exited: No/Yes)")
fig.show();

In [None]:
fig = ex.box(data, x="Exited", y="EstimatedSalary", color = 'Geography')
fig.update_layout(title_text="Estimated salary by country and exit status (Exited: No/Yes)")
fig.show();

> Credit score and estimated salary do not have much effect on exit rates. German customers exited more often and have higher credit scores than customers from the other two countries.

In [None]:
fig = make_subplots(rows=1, cols=1)

hist=px.Histogram(x=data['CreditScore'],name='Credit Score Histogram')

fig.add_trace(hist,row=1,col=1)

fig.update_layout(height=500, width=700, title_text="Distribution of the Credit score")
fig.show()

In [None]:
fig = make_subplots(rows=1, cols=1)

hist=px.Histogram(x=data['Age'],name='Age Histogram')

fig.add_trace(hist,row=1,col=1)

fig.update_layout(height=500, width=700, title_text="Distribution of the Customer Ages")
fig.show()

> The distributions of customer age and credit score in our dataset look fairly normal, so we can use these features under the normality assumption.
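
A visual impression of normality can be backed up with a simple number such as sample skewness, which is close to 0 for a symmetric distribution and positive for a right-skewed one. The sketch below uses synthetic data, not the bank data, and the distribution parameters are arbitrary assumptions chosen only to show the contrast.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic examples: one symmetric sample and one right-skewed sample
symmetric = pd.Series(rng.normal(loc=650, scale=95, size=5000))
right_skewed = pd.Series(rng.gamma(shape=2.0, scale=10.0, size=5000) + 18)

# Series.skew() returns the sample skewness
print("symmetric skew:   ", round(symmetric.skew(), 3))
print("right-skewed skew:", round(right_skewed.skew(), 3))
```

Running `data['Age'].skew()` and `data['CreditScore'].skew()` on the real columns would show how well the "fairly normal" reading holds for each feature.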

In [None]:
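# Correlation between the numeric features using heatmap
# (Geography and Gender are still text columns here, so they are excluded)
corrmat = data.select_dtypes(include='number').corr()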
plt.figure(figsize=(10,8))
#plot heat map
sns.heatmap(corrmat, annot=True, cmap="RdYlGn")

## Data Preprocessing
<li>Splitting the dataset into dependent and independent variables.
<li>Creating dummy variables for the categorical columns using one-hot encoding.
<li>Creating the train and test sets.
<li>Feature scaling using standard scaler
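
For the scaling step, the key point is that the mean and standard deviation are estimated on the training set only and then reused for the test set, so no information from the test set leaks into preprocessing. The following is a minimal NumPy sketch of that idea on made-up numbers. It mirrors what a standard scaler does (it also uses the population standard deviation) but is not the scikit-learn implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: 10 rows, 2 features on very different scales
X = rng.normal(loc=[600.0, 40.0], scale=[100.0, 10.0], size=(10, 2))
X_train, X_test = X[:8], X[8:]

# Statistics come from the training rows only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population std (ddof=0)

# The same training statistics are applied to both sets
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0).round(6))  # approximately 0 per feature
print(X_train_scaled.std(axis=0).round(6))   # approximately 1 per feature
```

The scaled test set will generally not have exactly zero mean and unit variance, which is expected.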

In [None]:
# Split the Dataset
X= data.drop(['Exited'], axis = 1)
y = data['Exited']

In [None]:
# Creating dummy variables
Dummies = pd.get_dummies(X[['Geography', 'Gender']],drop_first=True)
X = X.drop(['Geography', 'Gender'], axis = 1)
X = pd.concat([X, Dummies], axis = 1)
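
To make the effect of `drop_first=True` concrete, here is a small sketch on a made-up frame with the same two categorical columns. The first category of each column (alphabetically `France` and `Female`) becomes the implicit reference level and gets no column of its own. The `.astype(int)` call is only there to print 0/1 instead of booleans on newer pandas versions.

```python
import pandas as pd

# Made-up rows using the same categories as the dataset
toy = pd.DataFrame({
    "Geography": ["France", "Germany", "Spain", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
})

# One-hot encode and drop the first level of each column
dummies = pd.get_dummies(toy, drop_first=True).astype(int)
print(dummies.columns.tolist())
print(dummies)
```

A row of all zeros therefore represents a female customer from France, which is why three dummy columns are enough to encode both features.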

In [None]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

# Feature Scaling using standard scaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

## Data Modelling
### Artificial Neural Network (ANN)

In [None]:
# Initialising the ANN
classifier = Sequential()

# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 16, kernel_initializer = 'he_uniform',activation='relu',input_dim = 11))

# Adding the second hidden layer
classifier.add(Dense(units = 8, kernel_initializer = 'he_uniform',activation='relu'))

# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'glorot_uniform', activation = 'sigmoid'))

# Compiling the ANN
classifier.compile(optimizer = 'Adamax', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set
model_history=classifier.fit(X_train, y_train,validation_split=0.33, batch_size = 128, epochs = 100, verbose=0)
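
The network defined above maps 11 input features through hidden layers of 16 and 8 units to a single sigmoid output. As a rough illustration of what happens inside such a network, the sketch below runs a forward pass in plain NumPy and counts the trainable parameters. The weights are random placeholders and the helper names are hypothetical, so this is not the trained Keras model.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Layer widths: 11 inputs, hidden layers of 16 and 8 units, 1 output
layer_sizes = [11, 16, 8, 1]
rng = np.random.default_rng(0)

# Random placeholder weights; a trained network would hold learned values
weights = [rng.normal(scale=0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """ReLU on the hidden layers, sigmoid on the output layer."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)
    return sigmoid(h @ weights[-1] + biases[-1])

batch = rng.normal(size=(4, 11))   # 4 made-up, already scaled customers
probs = forward(batch)             # one churn probability per customer
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(probs.shape, n_params)
```

The parameter count is (11·16 + 16) + (16·8 + 8) + (8·1 + 1) = 337, which is small relative to the number of training rows.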

In [None]:
# summarize history for accuracy
plt.plot(model_history.history['accuracy'])
plt.plot(model_history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(model_history.history['loss'])
plt.plot(model_history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

In [None]:
# Making the predictions and evaluating the model

# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

# Making the classification report
print(classification_report(y_test, y_pred))

# Making the Confusion Matrix
print(confusion_matrix(y_test, y_pred))

# Calculate the Accuracy
print(accuracy_score(y_test, y_pred))
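
Because the classes are imbalanced, it helps to see how the reported metrics follow from the confusion matrix rather than relying on accuracy alone. The sketch below works through the arithmetic by hand on made-up labels and probabilities, using the same 0.5 threshold as above. The numbers are illustrative and are not results from this notebook.

```python
import numpy as np

# Made-up ground truth and predicted probabilities for 10 customers
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.2, 0.9, 0.7, 0.4, 0.3])
y_pred = (y_prob > 0.5).astype(int)

# Confusion matrix cells for the positive class (1 = exited)
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
tn = int(np.sum((y_true == 0) & (y_pred == 0)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # share of predicted churners who really churned
recall = tp / (tp + fn)      # share of actual churners the model caught
print(tp, tn, fp, fn)
print(accuracy, precision, recall)
```

In this toy case the accuracy is 0.7 even though half of the churners are missed, which is why recall for the churn class deserves attention in the classification report.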

## Hyperparameter Optimization using Keras Tuner
<li>How many hidden layers should we have?
<li>How many neurons should each hidden layer have?
<li>How much dropout should we use in case of overfitting?
<li>What learning rate should we use?
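
Conceptually, a random search answers these questions by drawing a handful of random configurations from the search space and keeping the one with the best validation score. The sketch below only illustrates the sampling step in plain Python, using the same ranges as the `build_model` function in this notebook. The function `sample_config` is hypothetical and this is not how Keras Tuner is implemented internally.

```python
import random

DROPOUT_CHOICES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
LEARNING_RATES = [1e-2, 1e-3, 1e-4]

def sample_config(rng):
    """Draw one random configuration from the search space."""
    num_layers = rng.randint(1, 10)  # 1 to 10 hidden layers, inclusive
    return {
        "num_layers": num_layers,
        # 8 to 128 units per layer in steps of 8
        "units": [rng.randrange(8, 129, 8) for _ in range(num_layers)],
        "dropout": [rng.choice(DROPOUT_CHOICES) for _ in range(num_layers)],
        "learning_rate": rng.choice(LEARNING_RATES),
    }

rng = random.Random(42)
trials = [sample_config(rng) for _ in range(5)]  # 5 trials, like max_trials=5
for trial in trials:
    print(trial)
```

With only five trials drawn from such a large space, the search covers a tiny fraction of the possible configurations, so results can vary noticeably with the seed.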

In [None]:
def build_model(hp):
    model = keras.Sequential()

    # Tune the number of hidden layers, plus the width and dropout rate of each one
    for i in range(hp.Int('num_layers', min_value=1, max_value=10)):
        # Only the first hidden layer needs the input dimension (11 features)
        input_args = {'input_dim': 11} if i == 0 else {}
        model.add(layers.Dense(units=hp.Int('units_' + str(i),
                                            min_value=8,
                                            max_value=128,
                                            step=8),
                               activation='relu', kernel_initializer='he_uniform',
                               **input_args))
        model.add(layers.Dropout(hp.Choice('dropout' + str(i),
                                           values=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])))

    # Adding the output layer
    model.add(layers.Dense(1, activation='sigmoid', kernel_initializer='glorot_uniform'))
    # Compiling the ANN
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss='binary_crossentropy',
        metrics=['accuracy'])
    return model

In [None]:
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    seed=42,
    max_trials=5,
    executions_per_trial=3,
    directory='project4',
    project_name='Churn modelling4')

In [None]:
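# The test set doubles as the validation set here, so the accuracy measured
# on it afterwards is an optimistic estimate for the selected model
tuner.search(X_train, y_train, epochs=5, batch_size=128,
             validation_data=(X_test, y_test))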

In [None]:
tuner.get_best_hyperparameters()[0].values

In [None]:
model = tuner.get_best_models(num_models=1)[0]

In [None]:
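# The tuner trained each model for 5 epochs (indices 0-4), so the epoch counter continues at index 5
model.fit(X_train, y_train, epochs=200, initial_epoch=5, validation_data=(X_test,y_test), verbose = 0)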

In [None]:
model.summary()

In [None]:
# Evaluate the best model.
loss, accuracy = model.evaluate(X_test, y_test)

> After hyperparameter tuning with Keras Tuner, accuracy increased somewhat, from **0.8449** to **0.8669**. Overall, the Keras Tuner library is a nice, easy-to-learn option for hyperparameter tuning of Keras and TensorFlow 2.0 models.