# Introduction

+ A churn model is a mathematical representation of how churn impacts your business. Churn calculations are built on existing data (the number of customers who left your service during a given time period). A predictive churn model extrapolates from this data to estimate future churn rates.

+ Churn (aka customer attrition) is a scourge on subscription businesses. When your revenue is based on recurring monthly or annual contracts, every customer who leaves puts a dent in your cash flow. High retention rates are vital for your survival. So what if we told you there was a way to predict, at least to some degree, how and when your customers will cancel?

+ Building a predictive churn model helps you make proactive changes to your retention efforts that drive down churn rates. Understanding how churn impacts your current revenue goals and making predictions about how to manage those issues in the future also helps you stem the flow of churned customers. If you don’t take action against your churn now, any company growth you experience simply won’t be sustainable.

+ Comprehensive customer profiles help you see what types of customers are canceling their accounts. The next step is to figure out how and why they churn, by identifying the pain points in your product and customer experience that lead a customer to cancel.

# What is customer churn?

+ Customer churn (or customer attrition) is the tendency of customers to abandon a brand and stop being paying clients of a particular business. The percentage of customers who stop using a company’s products or services during a given time period is called the customer churn (attrition) rate. One way to calculate it is to divide the number of customers lost during the period by the number of customers at the start of that period, and multiply by 100. For example, if you started last month with 150 customers and lost three, your monthly churn rate is 3 / 150 × 100 = 2 percent.
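The churn-rate calculation can be written as a tiny helper. This is a minimal sketch: the function name `churn_rate` is hypothetical, and it assumes the denominator is the number of customers at the start of the period.

```python
def churn_rate(customers_lost: int, customers_at_start: int) -> float:
    """Churn rate in percent for one period (hypothetical helper).

    Assumes the denominator is the customer count at the start of the period.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    # Multiply first so that simple cases such as 3 / 150 give an exact 2.0
    return customers_lost * 100 / customers_at_start


print(churn_rate(3, 150))  # 2.0
```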

+ Churn rate is a key health indicator for businesses whose customers subscribe and pay for services on a recurring basis, because such customers remain open to more interesting or advantageous offers from competitors. Plus, each time their current commitment ends, customers have a chance to reconsider and choose not to continue with the company. Of course, some natural churn is inevitable, and the typical figure differs from industry to industry. But a churn rate well above that industry norm is a clear sign that a business is doing something wrong.

+ There are many things brands may do wrong, from complicated onboarding, when customers aren’t given easy-to-understand information about the product’s usage and capabilities, to poor communication, e.g. a lack of feedback or delayed answers to queries. Another situation: longtime clients may feel unappreciated because they don’t get as many bonuses as new ones.

# Step 1: Importing the Data and Libraries

The first step, as always, is to import the required libraries. Execute the following code to do so:

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
data = pd.read_csv('../input/credit-card-customers/BankChurners.csv')

In [None]:
data.isnull().sum()

LUCKY! There is NO NaN or missing data to handle.
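One caveat: `isnull()` only counts real NaN/None values, while some categorical columns in this data record missing information as the category "Unknown" (see `Marital_Status` in the column list further down). A minimal sketch on a hypothetical toy frame; on the real data the same check would presumably be `(data == "Unknown").sum()`.

```python
import pandas as pd

# Hypothetical toy frame: a categorical column that uses the string "Unknown"
# instead of NaN, plus a numeric column
toy = pd.DataFrame({
    "Marital_Status": ["Married", "Unknown", "Single", "Unknown"],
    "Age": [45, 39, 51, 40],
})

print(toy.isnull().sum().sum())  # isnull() reports no missing values at all

# Elementwise comparison; numeric columns simply compare as False
unknown_counts = (toy == "Unknown").sum()
print(unknown_counts[unknown_counts > 0])
```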

In [None]:
data.columns

In [None]:
data.head(2)

The column names are TOO LONG; let's rename them.

In [None]:
old_names = data.columns
new_names = ['Clientnum', 'Attrition', 'Age', 'Gender', 'Dependent_count', 'Education', 'Marital_Status', 'Income', 
             'Card_Category', 'Months_on_book', 'Total_Relationship_Count', 'Months_Inactive', 'Contacts_Count', 
             'Credit_Limit', 'Total_Revolving_Bal','Avg_Open_To_Buy', 'Total_Amt_Chng_Q4_Q1', 'Total_Trans_Amt',
             'Total_Trans_Ct', 'Total_Ct_Chng_Q4_Q1', 'Avg_Utilization_Ratio','Naive_Bayes_1','Naive_Bayes_2']
data.rename(columns=dict(zip(old_names, new_names)), inplace=True)

# Step 2. Exploratory data analysis

After loading the data, we explore it visually to gather insights about which models could be applied to it, to understand the diversity in the data, and to see the range of every field. We use bar charts, box plots, distribution graphs, etc. to explore how each feature varies and how it relates to the other features, including the target.

Import the basic libraries.

In [None]:
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.style as style
import numpy as np
import pandas as pd
import plotly.express as px
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings("ignore")

In [None]:
data.head(2)

Is the dataset IMBALANCED? Let's check the class counts.

In [None]:
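# Churn vs. normal: value_counts() sorts by frequency, so the majority class comes first
counts = data.Attrition.value_counts()
normal = counts.iloc[0]   # existing customers (majority class)
Churn = counts.iloc[1]    # attrited customers (minority class)
perc_normal = (normal/(normal+Churn))*100
perc_Churn = (Churn/(normal+Churn))*100
print('There were {} non-Churn ({:.3f}%) and {} Churn ({:.3f}%).'.format(normal, perc_normal, Churn, perc_Churn))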

The dataset is IMBALANCED, but less than is typical: for this type of problem I expected something like 98:2 or 99:1, NOT 84:16 :). That's OK, let's discover some illustrations.
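With an 84:16 split, accuracy on its own can be misleading: a model that always predicts the majority class is already about 84% accurate. scikit-learn's `DummyClassifier` gives that baseline; the labels below are hypothetical and only reproduce the class ratio.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical labels with an 84:16 split (1 = stays, 0 = churns)
y = np.array([1] * 84 + [0] * 16)
X = np.zeros((len(y), 1))  # features are ignored by the dummy model

# Always predicts the most frequent class seen during fit
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print(baseline.score(X, y))  # 0.84
```

Any real model should be judged against this baseline rather than against 0%.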

In [None]:
style.use('ggplot')
sns.set_style('whitegrid')
plt.subplots(figsize = (30,30))
# Correlation of the numeric columns only (Attrition and the other text columns are still strings here)
corr = data.corr(numeric_only=True)
## Plotting heatmap. Generate a mask for the upper triangle (taken from seaborn example gallery)
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
sns.heatmap(corr, cmap=sns.diverging_palette(20, 220, n=200), annot=True, mask=mask, center = 0, );
plt.title("Heatmap of all the numeric features of the data set", fontsize = 25);

In [None]:
data.columns

In [None]:
# Visualizing the features with high positive and negative correlation to Attrition
f, axes = plt.subplots(nrows=3, ncols=3, figsize=(25,15))

f.suptitle('Features With High Correlation to Attrition', size=35)
sns.boxplot(x="Attrition", y="Total_Relationship_Count", data=data, ax=axes[0,0])
sns.boxplot(x="Attrition", y="Total_Revolving_Bal", data=data, ax=axes[0,1])
sns.boxplot(x="Attrition", y="Total_Amt_Chng_Q4_Q1", data=data, ax=axes[0,2])
sns.boxplot(x="Attrition", y="Total_Trans_Amt", data=data, ax=axes[1,0])
sns.boxplot(x="Attrition", y="Total_Trans_Ct", data=data, ax=axes[1,1])
sns.boxplot(x="Attrition", y="Total_Ct_Chng_Q4_Q1", data=data, ax=axes[1,2])
sns.boxplot(x="Attrition", y="Avg_Utilization_Ratio", data=data, ax=axes[2,0])
sns.boxplot(x="Attrition", y="Months_Inactive", data=data, ax=axes[2,1])
sns.boxplot(x="Attrition", y="Contacts_Count", data=data, ax=axes[2,2])

In [None]:
yprop = 'Age'
xprop = 'Months_Inactive'
h= 'Attrition'
px.scatter(data, x=xprop, y=yprop, color=h, marginal_y="violin", marginal_x="box", trendline="ols", template="simple_white")

In [None]:
yprop = 'Months_on_book'
xprop = 'Dependent_count'
h= 'Attrition'
px.scatter(data, x=xprop, y=yprop, color=h, marginal_y="violin", marginal_x="box", trendline="ols", template="simple_white")

In [None]:
fig = px.box(data, x='Gender',y='Avg_Utilization_Ratio', color='Attrition', notched=True)
fig.update_layout(legend=dict(orientation="h",yanchor="bottom",y=1.02,xanchor="right",x=1))
fig.show()

In [None]:
yprop = 'Total_Amt_Chng_Q4_Q1'
xprop = 'Total_Trans_Amt'
h= 'Attrition'
px.scatter(data, x=xprop, y=yprop, color=h, marginal_y="violin", marginal_x="box", trendline="ols", template="simple_white")

In [None]:
plt.figure(figsize=(15,10))
xprop = 'Gender'
yprop = 'Total_Trans_Amt'
sns.boxplot(data=data, x=xprop, y=yprop, hue='Attrition')
plt.xlabel('{} range'.format(xprop), size=14)
plt.ylabel('{}'.format(yprop), size=14)
plt.title('Boxplot of {}'.format(yprop), size=20)
plt.show()

In [None]:
yprop = 'Avg_Utilization_Ratio'
xprop = 'Total_Trans_Ct'
h= 'Attrition'
px.scatter(data, x=xprop, y=yprop, color=h, marginal_y="violin", marginal_x="box", trendline="ols", template="simple_white")

In [None]:
yprop = 'Total_Revolving_Bal'
xprop = 'Credit_Limit'
h= 'Attrition'
px.scatter(data, x=xprop, y=yprop, color=h, marginal_y="violin", marginal_x="box", trendline="ols", template="simple_white")

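Overall, it's NOT EASY to draw conclusions from these illustrations, OR the analysis is not deep ENOUGH!

But YES, most features are not strongly correlated with each other. A notable exception is Credit_Limit and Avg_Open_To_Buy, since open-to-buy credit is essentially the credit limit minus the balance.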
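Rather than reading the heatmap by eye, strongly correlated pairs can be listed programmatically. A minimal sketch: the helper name `strongly_correlated_pairs` and the 0.7 threshold are hypothetical choices, and it is demonstrated on synthetic data; on this notebook it could be called as `strongly_correlated_pairs(data)`.

```python
import numpy as np
import pandas as pd


def strongly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """Feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr(numeric_only=True).abs()
    cols = corr.columns
    # Walk the upper triangle only (j > i) so each pair appears once
    pairs = {
        (cols[i], cols[j]): corr.iloc[i, j]
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if corr.iloc[i, j] > threshold
    }
    return pd.Series(pairs, dtype=float).sort_values(ascending=False)


# Hypothetical data: "b" is almost a linear function of "a", "c" is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})
print(strongly_correlated_pairs(toy))
```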

# Step 3: Converting Categorical Columns to Numeric Columns

Most machine learning algorithms, including the scikit-learn models used here, require numerical input. However, our dataset has some categorical columns that contain text, so we need to convert them to numeric columns.

First, remove the ID column (Clientnum) and rearrange the columns.

In [None]:
data = pd.DataFrame(data, columns = ['Age', 'Gender', 'Dependent_count','Education', 'Marital_Status', 
                                     'Income', 'Card_Category','Months_on_book', 'Total_Relationship_Count', 
                                     'Months_Inactive','Contacts_Count', 'Credit_Limit', 'Total_Revolving_Bal',
                                     'Avg_Open_To_Buy', 'Total_Amt_Chng_Q4_Q1', 'Total_Trans_Amt',
                                     'Total_Trans_Ct', 'Total_Ct_Chng_Q4_Q1', 'Avg_Utilization_Ratio',
                                     'Naive_Bayes_1', 'Naive_Bayes_2','Attrition'])

Let's copy the data, keep only the categorical (object) columns, and cast them to strings.

In [None]:
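import copy
df_train=copy.deepcopy(data)
# Numeric (non-object) columns: they don't need encoding
cols=np.array(data.columns[data.dtypes != object])
# Cast every categorical (object) column to str
for i in df_train.columns:
    if i not in cols:
        df_train[i]=df_train[i].map(str)
# Keep only the categorical columns for encoding
df_train.drop(columns=cols,inplace=True)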

Then, encode the categorical columns using LabelEncoder.
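Keep in mind that `LabelEncoder` assigns integer codes in sorted (alphabetical) order of the labels. Assuming the target holds the strings "Attrited Customer" and "Existing Customer" (the values in the Kaggle BankChurners data), churned customers become 0 and existing customers become 1, the opposite of the common "churn = 1" convention. A quick check:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["Existing Customer", "Attrited Customer", "Existing Customer"])

# classes_ is sorted, and each label's code is its position in classes_
print(le.classes_.tolist())  # ['Attrited Customer', 'Existing Customer']
print(codes.tolist())        # [1, 0, 1]
```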

In [None]:
from sklearn.preprocessing import LabelEncoder
from collections import defaultdict

# One LabelEncoder per column, created on first access
cols=np.array(data.columns[data.dtypes != object])
d = defaultdict(LabelEncoder)

# Encode each categorical column with its own encoder (kept in d for inverse_transform),
# then add the numeric columns back unchanged
df_train = df_train.apply(lambda x: d[x.name].fit_transform(x))
df_train[cols] = data[cols]

And check the conversion...

In [None]:
df_train.head(2)

In [None]:
df_train.columns

# Step 4: Feature Selection

Our data is now ready, and we can train our machine learning model. But first, we need to isolate the variable that we're predicting from the dataset.

# The columns/features are described below:

+ 'Clientnum': Unique identifier for the customer holding the account: we remove this column.
+ 'Attrition': Internal event (customer activity) variable: whether the account is closed. The raw column holds 'Attrited Customer' / 'Existing Customer'; after label encoding (alphabetical order), closed accounts are 0 and open accounts are 1. This is our target.
+ 'Age': Customer's age in years.
+ 'Gender': M=Male, F=Female.
+ 'Dependent_count': Number of dependents.
+ 'Education': Educational qualification of the account holder.
+ 'Marital_Status': Married, Single, Divorced, Unknown.
+ 'Income': Annual income category of the account holder.
+ 'Card_Category': Type of card (Blue, Silver, Gold, Platinum).
+ 'Months_on_book': Period of relationship with the bank (months).
+ 'Total_Relationship_Count': Total no. of products held by the customer.
+ 'Months_Inactive': No. of months inactive in the last 12 months.
+ 'Contacts_Count': No. of contacts in the last 12 months.
+ 'Credit_Limit': Credit limit on the credit card.
+ 'Total_Revolving_Bal': Total revolving balance on the credit card.
+ 'Avg_Open_To_Buy': Open-to-buy credit line (average of the last 12 months).
+ 'Total_Amt_Chng_Q4_Q1': Change in transaction amount (Q4 over Q1).
+ 'Total_Trans_Amt': Total transaction amount (last 12 months).
+ 'Total_Trans_Ct': Total transaction count (last 12 months).
+ 'Total_Ct_Chng_Q4_Q1': Change in transaction count (Q4 over Q1).
+ 'Avg_Utilization_Ratio': Average card utilization ratio.

AS NOTICED:

+ Naive_Bayes_1 and Naive_Bayes_2: the dataset description says that these Naive Bayes classifier columns should be disregarded, so we remove these 2 columns.

In [None]:
df_train = pd.DataFrame(df_train, columns = ['Age', 'Gender', 'Dependent_count','Education', 'Marital_Status', 'Income', 
                                             'Card_Category','Months_on_book', 'Total_Relationship_Count', 'Months_Inactive',
                                             'Contacts_Count', 'Credit_Limit', 'Total_Revolving_Bal','Avg_Open_To_Buy', 
                                             'Total_Amt_Chng_Q4_Q1', 'Total_Trans_Amt','Total_Trans_Ct', 'Total_Ct_Chng_Q4_Q1',
                                             'Avg_Utilization_Ratio','Attrition'])

In [None]:
len(df_train.columns)

In [None]:
# Find the most important features relative to the target (Attrition)
print("Find most important features relative to Attrition-target")
corr = df_train.corr()
corr.sort_values(["Attrition"], ascending = False, inplace = True)
print(corr.Attrition)

In [None]:
df_train.describe()

In [None]:
style.use('ggplot')
sns.set_style('whitegrid')
plt.subplots(figsize = (30,30))
## Plotting heatmap. Generate a mask for the upper triangle (taken from seaborn example gallery)
mask = np.zeros_like(df_train.corr(), dtype=bool)
mask[np.triu_indices_from(mask)] = True
sns.heatmap(df_train.corr(), cmap=sns.diverging_palette(20, 220, n=200), annot=True, mask=mask, center = 0, );
plt.title("Heatmap of all the Features of Train data set", fontsize = 25);

OK, let's remove the column with very low correlation to the target: "Avg_Open_To_Buy".

In [None]:
features = ['Age', 'Gender', 'Dependent_count', 'Education','Marital_Status', 'Income', 'Card_Category','Months_on_book',
            'Total_Relationship_Count','Months_Inactive','Contacts_Count', 'Credit_Limit', 'Total_Revolving_Bal','Total_Amt_Chng_Q4_Q1', 
            'Total_Trans_Amt','Total_Trans_Ct','Total_Ct_Chng_Q4_Q1','Avg_Utilization_Ratio','Attrition']
df_train = pd.DataFrame(df_train, columns = features)

In [None]:
df_train.columns

# Step 5: Churn Prediction Model

In this section we train several models (mostly classifiers) for churn prediction and compare their relative performance in terms of standard metrics such as accuracy, precision, recall, specificity, and AUC. This also lets us compare two oversampling techniques, SMOTE and ADASYN, for handling the class imbalance.
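In this notebook `LabelEncoder` maps "Attrited Customer" to 0 (labels are encoded alphabetically), so churn is class 0, while scikit-learn scorers such as `recall_score` treat label 1 as positive by default. A minimal sketch, on hypothetical labels, of churn recall (with `pos_label=0`) and specificity read off the confusion matrix:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels and predictions, encoded as in this notebook: 0 = churn, 1 = stays
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0])

# Rows are actual classes, columns are predicted classes, both ordered (churn, stays)
(tp, fn), (fp, tn) = confusion_matrix(y_true, y_pred, labels=[0, 1])

recall_churn = tp / (tp + fn)   # share of churners that were caught
specificity = tn / (tn + fp)    # share of loyal customers correctly left alone
print(recall_churn, specificity)

# Same recall through the scorer, once churn is declared the positive class
print(recall_score(y_true, y_pred, pos_label=0))
```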

Now, import the ML library packages.

In [None]:
from collections import Counter
from imblearn.over_sampling import ADASYN
from imblearn.over_sampling import SMOTE
from lightgbm import LGBMClassifier
from numpy import where
from sklearn import metrics
from sklearn import model_selection
from sklearn import preprocessing
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble.gradient_boosting import GradientBoostingClassifier
from sklearn.ensemble.forest import ExtraTreesClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score, precision_score, recall_score, roc_auc_score, roc_curve
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.metrics import plot_confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler, Normalizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBClassifier

# A. Sampling

As is common for churn data, the number of churners is much smaller than the number of non-churners, which leads to an imbalanced training set. This issue is also present in our analysis: churners make up about 16% of the customers, so the imbalance is only moderate. Techniques that simply replicate rare-class examples risk over-fitting the rare class, and undersampling the majority class is a common alternative. In this notebook we instead use SMOTE and ADASYN, two oversampling techniques that generate new synthetic minority-class samples rather than copying existing ones.
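The SMOTE idea can be sketched without imbalanced-learn: each synthetic row is a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. This is a simplified sketch (the helper name `smote_like_oversample` and all data are hypothetical, and it is not the imblearn implementation). It also shows the ordering that avoids leakage: split first, then oversample only the training part. The `SMOTE()` and `ADASYN()` helpers in this notebook resample before `train_test_split`, so their test scores are likely optimistic.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X, y, minority_label, k=5, seed=0):
    """Add synthetic minority-class rows until both classes have the same size."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum()) - len(X_min)
    if n_needed <= 0:
        return X, y
    # Ask for k + 1 neighbours because each point is (usually) its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_needed)             # seed samples
    neighbour = idx[base, rng.integers(1, k + 1, size=n_needed)]  # skip column 0
    gap = rng.random((n_needed, 1))                               # position on the segment
    synthetic = X_min[base] + gap * (X_min[neighbour] - X_min[base])
    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_res, y_res


# Hypothetical data: 500 rows, 3 features, roughly 16% minority class 0
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = np.where(rng.random(500) < 0.16, 0, 1)

# Split FIRST, then oversample only the training part, so no synthetic row
# is built from a test-set sample
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=25, stratify=y)
X_train_res, y_train_res = smote_like_oversample(X_train, y_train, minority_label=0)
print(np.bincount(y_train), np.bincount(y_train_res))
```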

We define the input data as follows:

In [None]:
def Definedata():
    # define dataset
    X=df_train.drop(columns=['Attrition']).values
    y=df_train['Attrition'].values
    return X, y

Here, X is our feature set; it contains all the columns except the one that we have to predict (Attrition). The label set, y, contains only the Attrition column. So that we can later evaluate the performance of our machine learning model, let's also divide the data into a training set and a test set. The training set contains the data used to train the model; the test set is used to evaluate how good the model is.

We'll use 33% of the data for the test set and the remaining 67% for the training set.
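Because the classes are imbalanced, passing `stratify=y` to `train_test_split` keeps the churn share the same in the training and test parts; the splits in this notebook do not stratify. A small sketch on hypothetical 84:16 labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical 84:16 labels (0 = churn), as in this dataset
y = np.array([1] * 840 + [0] * 160)
X = np.arange(len(y)).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=25, stratify=y)

# stratify=y keeps the churn share (about 16%) nearly identical in both parts
print((y_train == 0).mean(), (y_test == 0).mean())
```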

In [None]:
from sklearn.metrics import classification_report
def Models_NO(models, graph):
    
    model = models
    X, y = Definedata()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 25)
    model.fit(X_train,y_train)
    y_pred = model.predict(X_test)
    y_total = model.predict(X)
    
    if graph:
        train_matrix = pd.crosstab(y_train, model.predict(X_train), rownames=['Actual'], colnames=['Predicted'])    
        test_matrix = pd.crosstab(y_test, model.predict(X_test), rownames=['Actual'], colnames=['Predicted'])
        matrix = pd.crosstab(y, model.predict(X), rownames=['Actual'], colnames=['Predicted'])
    
        f,(ax1,ax2,ax3) = plt.subplots(1,3,sharey=True, figsize=(15, 2))
    
        # Attrition is label-encoded: 0 = Attrited (churn), 1 = Existing
        g1 = sns.heatmap(train_matrix, annot=True, fmt=".1f", cbar=False,annot_kws={"size": 18},ax=ax1)
        g1.set_title("{}/train set".format(model))
        g1.set_ylabel('Total Churn = {}'.format((y_train == 0).sum()), fontsize=14, rotation=90)
        g1.set_xlabel('Accuracy for TrainSet: {}'.format(accuracy_score(y_train, model.predict(X_train))))
        g1.set_xticklabels(['Churn','Not Churn'],fontsize=12)

        g2 = sns.heatmap(test_matrix, annot=True, fmt=".1f",cbar=False,annot_kws={"size": 18},ax=ax2)
        g2.set_title("{}/test set".format(model))
        g2.set_ylabel('Total Churn = {}'.format((y_test == 0).sum()), fontsize=14, rotation=90)
        g2.set_xlabel('Accuracy for TestSet: {}'.format(accuracy_score(y_test, y_pred)))
        g2.set_xticklabels(['Churn','Not Churn'],fontsize=12)

        g3 = sns.heatmap(matrix, annot=True, fmt=".1f",cbar=False,annot_kws={"size": 18},ax=ax3)
        g3.set_title("{}/total set".format(model))
        g3.set_ylabel('Total Churn = {}'.format((y == 0).sum()), fontsize=14, rotation=90)
        g3.set_xlabel('Accuracy for TotalSet: {}'.format(accuracy_score(y, y_total)))
        g3.set_xticklabels(['Churn','Not Churn'],fontsize=12)
    
        plt.show()
        print ("")
        print ("Classification Report: ")
        print (classification_report(y, y_total))
    else:
        print("\t\tError Table")
        print('Mean Absolute Error      : ', metrics.mean_absolute_error(y_test, (y_pred)))
        print('Mean Squared  Error      : ', metrics.mean_squared_error(y_test, (y_pred) ))
        print('Root Mean Squared  Error : ', np.sqrt(metrics.mean_squared_error(y_test, (y_pred) )))
        # model.score returns accuracy for classifiers but R^2 for regressors
        print('Score on Training set    : ', model.score(X_train,y_train))
        print('Score on Testing set     : ', model.score(X_test,y_test))
        print('AUC score                :', roc_auc_score(y, y_total)*100,'%')        
    return y_total, y

# B. Machine Learning Algorithm Training

Now, we'll use a machine learning algorithm that will identify patterns or trends in the training data. This step is known as algorithm training. We'll feed the features and correct output to the algorithm; based on that data, the algorithm will learn to find associations between the features and outputs. After training the algorithm, you'll be able to use it to make predictions on new data.

There are several machine learning algorithms that can be used to make such predictions. In this work, we'll try several ALGORITHMS and select the most powerful one for this classification problem.

To train this algorithm, we call the fit method and pass in the feature set (X) and the corresponding label set (y). You can then use the predict method to make predictions on the test set. Look at the following script:

In [None]:
y_predict, y_test = Models_NO(DecisionTreeRegressor(), True)

We can confirm that the data is IMBALANCED and that the regressor cannot handle this data set well. Let's start the second step.
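The ROC cells in this notebook pass hard 0/1 predictions to `roc_curve`, so each curve has a single operating point and the AUC reflects only that one threshold. A classifier's `predict_proba` scores give the full curve. A minimal sketch on synthetic data from `make_classification`, with `DecisionTreeClassifier` swapped in for the regressor (the depth and seeds are arbitrary assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical imbalanced data (about 84:16), standing in for df_train
X, y = make_classification(n_samples=2000, weights=[0.84], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=25, stratify=y)

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

# Hard labels: a single threshold; probabilities: the whole ranking of customers
auc_from_labels = roc_auc_score(y_test, clf.predict(X_test))
auc_from_scores = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(round(auc_from_labels, 3), round(auc_from_scores, 3))
```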

In [None]:
from sklearn.metrics import confusion_matrix,auc,roc_curve

y_predicted, y_actual = Models_NO(DecisionTreeRegressor(), False)
fpr, tpr, thresholds = roc_curve(y_actual, y_predicted)
roc_auc = auc(fpr, tpr)

plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b',label='AUC = %0.3f'% roc_auc)
plt.legend(loc='lower right')
plt.plot([0,1],[0,1],'r--')
plt.xlim([-0.1,1.0])
plt.ylim([-0.1,1.01])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

# C. SMOTE sampling

In [None]:
def SMOTE():
    # SMOTE oversampling for the imbalanced dataset.
    # The imports are local because this function shadows the module-level SMOTE class.
    from collections import Counter
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import SMOTE
    from matplotlib import pyplot
    from numpy import where

    X, y = Definedata()

    # summarize class distribution
    counter = Counter(y)
    print(counter)
    # transform the dataset; resampling before the split lets synthetic samples
    # derived from future test rows end up in the training set
    smt = SMOTE(random_state=0)
    X, y = smt.fit_resample(X, y)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=2)
    # summarize the new class distribution
    counter = Counter(y)
    print(counter)
    # scatter plot of examples by class label (first two features)
    for label, _ in counter.items():
        row_ix = where(y == label)[0]
        pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
    pyplot.legend()
    pyplot.show()
    return X_train, X_test, y_train, y_test

In [None]:
def ADASYN():
    from collections import Counter
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import ADASYN
    from matplotlib import pyplot
    from numpy import where

    X, y = Definedata()

    # summarize class distribution
    counter = Counter(y)
    print(counter)
    # transform the dataset
    X, y = ADASYN().fit_resample(X, y)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=2)
    # summarize the new class distribution
    counter = Counter(y)
    print(counter)
    # scatter plot of examples by class label (first two features)
    for label, _ in counter.items():
        row_ix = where(y == label)[0]
        pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
    pyplot.legend()
    pyplot.show()
    return X_train, X_test, y_train, y_test

In [None]:
%time X_train1, X_test1, y_train1, y_test1 = SMOTE()
%time X_train4, X_test4, y_train4, y_test4 = ADASYN()

In [None]:
def Models(models, X_train, X_test, y_train, y_test, title, graph):
    model = models
    model.fit(X_train,y_train)

    X, y = Definedata()
    # Predict once per set and reuse the results
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    pred_total = model.predict(X)

    train_matrix = pd.crosstab(y_train, pred_train, rownames=['Actual'], colnames=['Predicted'])
    test_matrix = pd.crosstab(y_test, pred_test, rownames=['Actual'], colnames=['Predicted'])
    matrix = pd.crosstab(y, pred_total, rownames=['Actual'], colnames=['Predicted'])

    if graph:
        f,(ax1,ax2,ax3) = plt.subplots(1,3,sharey=True, figsize=(15, 2))

        # Attrition is label-encoded: 0 = Attrited (churn), 1 = Existing
        g1 = sns.heatmap(train_matrix, annot=True, fmt=".1f", cbar=False,annot_kws={"size": 18},ax=ax1)
        g1.set_title(title)
        g1.set_ylabel('Total Churn = {}'.format((y_train == 0).sum()), fontsize=14, rotation=90)
        g1.set_xlabel('Accuracy score (TrainSet): {}'.format(accuracy_score(y_train, pred_train)))
        g1.set_xticklabels(['Churn','Not Churn'],fontsize=12)

        g2 = sns.heatmap(test_matrix, annot=True, fmt=".1f",cbar=False,annot_kws={"size": 18},ax=ax2)
        g2.set_title(title)
        g2.set_ylabel('Total Churn = {}'.format((y_test == 0).sum()), fontsize=14, rotation=90)
        g2.set_xlabel('Accuracy score (TestSet): {}'.format(accuracy_score(y_test, pred_test)))
        g2.set_xticklabels(['Churn','Not Churn'],fontsize=12)

        g3 = sns.heatmap(matrix, annot=True, fmt=".1f",cbar=False,annot_kws={"size": 18},ax=ax3)
        g3.set_title(title)
        g3.set_ylabel('Total Churn = {}'.format((y == 0).sum()), fontsize=14, rotation=90)
        g3.set_xlabel('Accuracy score (Total): {}'.format(accuracy_score(y, pred_total)))
        g3.set_xticklabels(['Churn','Not Churn'],fontsize=12)

        plt.show()

    print("\t\tError Table")
    print('Accuracy on Training set : ', accuracy_score(y_train, pred_train))
    print('Accuracy on Testing set  : ', accuracy_score(y_test, pred_test))
    print('Overall Accuracy_Score   :', accuracy_score(y, pred_total)*100,'%')
    # Recall of the churn class (label 0), not of the default positive label 1
    print('Recall ratio (churn)     :', metrics.recall_score(y, pred_total, pos_label=0)*100,'%')
    print('AUC score                :', roc_auc_score(y, pred_total)*100,'%')

    return y, pred_total

In [None]:
title = 'KNeighborsClassifier/SMOTE'
%time y_actual,y_predicted = Models(KNeighborsClassifier(n_neighbors=1),X_train1, X_test1, y_train1, y_test1, title, True)

In [None]:
title = 'KNeighborsClassifier/ ADASYN'
%time Models(KNeighborsClassifier(n_neighbors=1),X_train4, X_test4, y_train4, y_test4, title, True)

In [None]:
title = 'DecisionTreeClassifier/SMOTE'
%time y_actual,y_predicted = Models(DecisionTreeClassifier(max_depth=14),X_train1, X_test1, y_train1, y_test1, title, True)

In [None]:
title = 'DecisionTreeClassifier/ADASYN'
%time Models(DecisionTreeClassifier(max_depth=14),X_train4, X_test4, y_train4, y_test4, title, True)

In [None]:
title = 'RandomForestClassifier/SMOTE'
%time Models(RandomForestClassifier(),X_train1, X_test1, y_train1, y_test1, title, True)

In [None]:
title = 'RandomForestClassifier/ADASYN'
%time y_actual,y_predicted = Models(RandomForestClassifier(),X_train4, X_test4, y_train4, y_test4, title, True)

In [None]:
title = 'GradientBoostingClassifier/SMOTE'
%time Models(GradientBoostingClassifier(n_estimators=1500, learning_rate=1, max_features=10, max_depth=2, random_state=0),X_train1, X_test1, y_train1, y_test1, title, True)

In [None]:
title = 'GradientBoostingClassifier/ADASYN'
%time Models(GradientBoostingClassifier(n_estimators=500, learning_rate=1, max_features=2, max_depth=2, random_state=0),X_train4, X_test4, y_train4, y_test4, title, True)

In [None]:
title = 'XGBClassifier/SMOTE'
%time Models(XGBClassifier(colsample_bytree=0.9, learning_rate=0.2, max_depth=7),X_train1, X_test1, y_train1, y_test1, title, True)

In [None]:
title = 'XGBClassifier/ADASYN'
%time Models(XGBClassifier(colsample_bytree=0.9, learning_rate=0.2, max_depth=7),X_train4, X_test4, y_train4, y_test4, title, True)

In [None]:
title = 'LGBMClassifier/SMOTE'
%time Models(LGBMClassifier(max_depth=-1, random_state=20, verbose=-1, metric='None', n_jobs=5, n_estimators=1000),X_train1, X_test1, y_train1, y_test1, title, True)

In [None]:
title = 'LGBMClassifier/ADASYN'
%time Models(LGBMClassifier(),X_train4, X_test4, y_train4, y_test4, title, True)

In [None]:
title = 'LGBMClassifier/ADASYN'
%time Models(LGBMClassifier(max_depth=-1, random_state=310, verbose=-1, metric='None', n_jobs=15, n_estimators=2000),X_train4, X_test4, y_train4, y_test4, title, True)

In [None]:
title = 'GradientBoostingClassifier/SMOTE'
%time Models(GradientBoostingClassifier(),X_train1, X_test1, y_train1, y_test1, title, True)

In [None]:
title = 'GradientBoostingClassifier/ADASYN'
%time Models(GradientBoostingClassifier(),X_train4, X_test4, y_train4, y_test4, title, True)

In [None]:
title = 'GradientBoostingClassifier/ADASYN'
%time Models(GradientBoostingClassifier(learning_rate=0.005, n_estimators=1500,max_depth=20, min_samples_split=300, min_samples_leaf=60, subsample=0.85, random_state=10, max_features=7,warm_start=True),X_train4, X_test4, y_train4, y_test4, title, True)

In [None]:
title = 'ExtraTreesClassifier/SMOTE'
%time Models(ExtraTreesClassifier(),X_train1, X_test1, y_train1, y_test1, title, True)

In [None]:
title = 'ExtraTreesClassifier/ADASYN'
%time Models(ExtraTreesClassifier(),X_train4, X_test4, y_train4, y_test4, title, True)

In [None]:
title = 'ExtraTreesClassifier/ADASYN'
%time Models(ExtraTreesClassifier(n_estimators=305),X_train4, X_test4, y_train4, y_test4, title, True)

# In conclusion, ExtraTreesClassifier is the recommended model for this topic.

XGBClassifier and LGBMClassifier reach practically the same accuracy. The accuracy scores obtained in the runs above are:

| Model | Training set | Testing set | Overall |
|---|---|---|---|
| KNeighborsClassifier | 100% | 91.1% | 96.3% |
| DecisionTreeClassifier | 99.6% | 95.3% | 97.6% |
| RandomForestClassifier | 100% | 97.9% | 98.9% |
| GradientBoostingClassifier | 100% | 98.4% | 99.1% |
| XGBClassifier | 100% | 98.5% | 99.2% |
| LGBMClassifier | 100% | 98.5% | 99.2% |
| GradientBoostingClassifier | 99.7% | 98.2% | 98.7% |
| **ExtraTreesClassifier (recommended)** | 100% | 98.4% | 99.1% |

In [None]:
from sklearn.metrics import confusion_matrix,auc,roc_curve

title = 'RandomForestClassifier using the SMOTE'
y, ypred =  Models(RandomForestClassifier(),X_train1, X_test1, y_train1, y_test1, title, False)

fpr, tpr, thresholds = roc_curve(y, ypred)
roc_auc = auc(fpr, tpr)

plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b',label='AUC = %0.3f'% roc_auc)
plt.legend(loc='lower right')
plt.plot([0,1],[0,1],'r--')
plt.xlim([-0.1,1.0])
plt.ylim([-0.1,1.01])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

In [None]:
X=df_train.drop(columns=['Attrition'])
y=df_train['Attrition'].values
    
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 25)
    
model = ExtraTreesClassifier()
model.fit(X_train,y_train)
resultmymodel = permutation_importance(model, X_train, y_train, n_repeats=10,random_state=42, n_jobs=2)
sorted_idx = resultmymodel.importances_mean.argsort()

fig, ax = plt.subplots(figsize=(10,7))
ax.boxplot(resultmymodel.importances[sorted_idx].T,vert=False, labels=X_train.columns[sorted_idx])
ax.set_title("Permutation Importances (train set)")
fig.tight_layout()
plt.show()

In [None]:
# Reuse the ExtraTreesClassifier fitted on the training set above and
# measure permutation importance on the held-out test set
resultmymodel = permutation_importance(model, X_test, y_test, n_repeats=10,random_state=42, n_jobs=2)
sorted_idx = resultmymodel.importances_mean.argsort()

fig, ax = plt.subplots(figsize=(10,7))
ax.boxplot(resultmymodel.importances[sorted_idx].T,vert=False, labels=X_test.columns[sorted_idx])
ax.set_title("Permutation Importances (test set)")
fig.tight_layout()
plt.show()

# Conclusion

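Customer churn prediction is crucial to the long-term financial stability of a company. In this Notebook, we created machine learning models that predict customer churn with an accuracy higher than 98% on our test sets. Keep in mind that the SMOTE and ADASYN data were resampled before the train/test split, so these scores are likely optimistic; resampling only the training set would give a more reliable estimate.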