# Introduction

### Problem context
*A bank manager is concerned that more and more customers are leaving the bank's credit card services. They would really appreciate being able to predict which customers are going to churn, so they can proactively reach out to them, offer better services and change the customers' decisions.*

### Goal
This notebook aims to predict which customers are likely to churn using Machine Learning algorithms: Random Forest, AdaBoost and Gradient Boosting.

### Dataset
This dataset contains about 10,000 customers, described by features such as age, income, marital status, credit card limit and credit card category; there are nearly 18 features in total. It is an unbalanced dataset: only about 16% of the customers have churned.
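Because only about 16% of the customers churned, accuracy on its own can look good even for a useless model. The sketch below illustrates this with a hypothetical label vector (not the real data): a "model" that predicts every customer as an Existing Customer reaches 84% accuracy while catching none of the churners, which is why recall matters later in the notebook.

```python
# Sketch: why accuracy alone is misleading on an imbalanced target.
# Assumption: a toy label vector with the ~16% churn rate quoted above
# (1 = Attrited Customer, 0 = Existing Customer); these are not the real labels.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Share of actual positives (churners) that the model catches."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

# 100 hypothetical customers: 16 churners, 84 existing customers.
y_true = [1] * 16 + [0] * 84

# A "model" that always predicts Existing Customer.
y_always_stay = [0] * len(y_true)

print(accuracy(y_true, y_always_stay))  # 0.84
print(recall(y_true, y_always_stay))    # 0.0
```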



## Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns 
import matplotlib.ticker as mtick
import matplotlib.pyplot as plt
import plotly.graph_objs as go
import plotly.express as ex
import plotly.figure_factory as ff
import plotly.offline as offline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from imblearn.over_sampling import SMOTE 
from sklearn.metrics import accuracy_score
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier

## Data Loading

We load the dataset and have a first look at it.

In [None]:
bank_data = pd.read_csv("../input/credit-card-customers/BankChurners.csv")

# Drop columns we do not need: the two Naive Bayes columns (as advised in the dataset's description on kaggle.com) and the client ID
bank_data = bank_data.drop(['Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2', 'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1',"CLIENTNUM"], axis = 1)

bank_data.head()

In [None]:
bank_data.columns.values

In [None]:
bank_data.isnull().sum()

There are no null values, which is good!

## Converting Categorical Features

Next, we convert the categorical features to numerical ones. For most of them I decided to use .map() instead of pd.get_dummies().
I also grouped some categories together: for example, Marital_Status is reduced to either Married (1) or not married (0). Education_Level and Income_Category are mapped to ordered levels, with "Unknown" placed in the lowest level.


In [None]:
# Make a copy of the original data to use the original one later
num_data = bank_data.copy()

num_data['Attrition_Flag'] = num_data['Attrition_Flag'].map({'Existing Customer':0, 'Attrited Customer':1})
num_data['Income_Category'] = num_data['Income_Category'].map({'Unknown':0, 'Less than $40K':0, '$40K - $60K':1, '$60K - $80K':2, '$80K - $120K':3, '$120K +':4})
num_data['Marital_Status'] = num_data['Marital_Status'].map({'Divorced':0, 'Unknown':0, 'Single':0, 'Married':1})
num_data['Education_Level'] = num_data['Education_Level'].map({'Unknown':0, 'Uneducated':0, 'High School':1, 'College':2,'Graduate':3,'Post-Graduate':4,'Doctorate':5})
num_data['Card_Category'] = num_data['Card_Category'].map({'Blue':0, 'Silver':1, 'Gold':2, 'Platinum':3})

# One-hot encode the remaining categorical column (Gender) with pd.get_dummies, as an alternative to .map()
num_data = pd.get_dummies(num_data)
num_data.head()
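The .map() calls above turn each category into a number through a dictionary. One thing to keep in mind: .map() returns NaN for any value that is missing from the dictionary, so a misspelled or unexpected category silently becomes a missing value. The following is a small sketch on a hypothetical toy frame, reusing two of the notebook's mappings, with a NaN check after encoding.

```python
import pandas as pd

# Hypothetical toy frame using the dataset's category spellings.
df = pd.DataFrame({
    "Card_Category": ["Blue", "Gold", "Silver", "Platinum", "Blue"],
    "Marital_Status": ["Married", "Single", "Divorced", "Unknown", "Married"],
})

# Same dictionaries as in the notebook cell above.
card_map = {"Blue": 0, "Silver": 1, "Gold": 2, "Platinum": 3}
married_map = {"Divorced": 0, "Unknown": 0, "Single": 0, "Married": 1}

encoded = df.copy()
encoded["Card_Category"] = encoded["Card_Category"].map(card_map)
encoded["Marital_Status"] = encoded["Marital_Status"].map(married_map)

# .map() yields NaN for unmapped categories, so check after encoding.
assert not encoded.isna().any().any(), "unmapped category found"
print(encoded)

# The pitfall: "Bronze" is not in card_map, so it becomes NaN.
unseen = pd.Series(["Blue", "Bronze"]).map(card_map)
print(unseen.isna().tolist())  # [False, True]
```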

## Correlation with the Target

We plot the correlation of the target variable (Attrition_Flag) with every other feature.

In [None]:
plt.figure(figsize=(15,8))
num_data.corr()["Attrition_Flag"].sort_values(ascending=False).plot(kind="bar")
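DataFrame.corr() uses Pearson correlation by default; with a 0/1 target such as Attrition_Flag this is the point-biserial correlation. As a sketch on hypothetical numbers (not taken from the dataset), the formula can be written out and checked against np.corrcoef. A negative value means that churners tend to have lower values of the feature.

```python
import numpy as np

# Hypothetical toy data: a binary churn flag and one numeric feature
# (think of a transaction count). Not taken from the real dataset.
churn = np.array([1, 1, 0, 0, 0, 0, 1, 0])
trans_ct = np.array([20.0, 25.0, 70.0, 80.0, 65.0, 90.0, 30.0, 75.0])

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

r = pearson(trans_ct, churn)
print(round(r, 3))

# np.corrcoef gives the same value.
assert np.isclose(r, np.corrcoef(trans_ct, churn)[0, 1])
```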

## Plotting Attrition_Flag

This plot clearly shows the difference in size between Existing Customers and Attrited Customers, i.e. the class imbalance.

In [None]:
ax = sns.catplot(y="Attrition_Flag", kind="count", data=bank_data, height=2.6, aspect=2.5, orient='h')

## Attrited Customers vs Existing Customers

### By Marital Status

In [None]:
def barplot_percentages(feature, orient='v', axis_name="Percentage of customers"):
    # Count customers per (feature value, Attrition_Flag) pair; renaming the counts
    # avoids a name clash with the Attrition_Flag index level on reset_index()
    g = (bank_data.groupby(feature)["Attrition_Flag"]
         .value_counts()
         .rename(axis_name)
         .reset_index())
    # Express each count as a share of ALL customers in the dataset
    g[axis_name] = g[axis_name] / len(bank_data)
    if orient == 'v':
        ax = sns.barplot(x=feature, y=axis_name, hue='Attrition_Flag', data=g, orient=orient)
        ax.yaxis.set_major_formatter(mtick.PercentFormatter(1.0, decimals=0))
    else:
        ax = sns.barplot(x=axis_name, y=feature, hue='Attrition_Flag', data=g, orient=orient)
        ax.xaxis.set_major_formatter(mtick.PercentFormatter(1.0, decimals=0))
    plt.show()

barplot_percentages("Marital_Status")
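Note that barplot_percentages divides every count by the total number of customers, so each bar shows a share of the whole dataset rather than the churn rate within a group. If the within-group churn rate is what we want to compare, pd.crosstab with normalize="index" gives it directly. This is a sketch on hypothetical toy data using the notebook's column names, not a replacement for the plots in this section.

```python
import pandas as pd

# Hypothetical toy data in the notebook's column names.
toy = pd.DataFrame({
    "Marital_Status": ["Married", "Married", "Single", "Single", "Single", "Divorced"],
    "Attrition_Flag": ["Existing Customer", "Attrited Customer", "Existing Customer",
                       "Existing Customer", "Attrited Customer", "Existing Customer"],
})

# normalize="index" divides each row by its own total, so every row sums to 1
# and the "Attrited Customer" column is the churn rate inside that group.
rates = pd.crosstab(toy["Marital_Status"], toy["Attrition_Flag"], normalize="index")
print(rates)
```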

### By Dependent Count

In [None]:
barplot_percentages("Dependent_count")

### By Education Level

In [None]:
barplot_percentages("Education_Level")

## Correlation Heatmap 

In [None]:
corrs = num_data.corr()
figure = ff.create_annotated_heatmap(
    z=corrs.values,
    x=list(corrs.columns),
    y=list(corrs.index),
    annotation_text=corrs.round(2).values,
    showscale=True)
offline.iplot(figure,filename='corrheatmap')

## Features Importances

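Next, I plot the importance of every feature.

For this, I first remove redundant columns: Gender_F carries the same information as Gender_M (one is always 1 minus the other), so it is dropped together with the target.
Then I set Attrition_Flag as the target variable (y) and use Random Forest to compute the feature importances. The code is left commented out in this version; feature importances of the tuned models are printed further below.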

In [None]:
# # Drop redundant columns and set target variable
# drop = ['Attrition_Flag', 'Gender_F']
# x, y = num_data.drop(drop,axis=1), num_data['Attrition_Flag']

# # Fit RandomForest Classifier
# clf = RandomForestClassifier(n_estimators=350, criterion='entropy', max_features='sqrt', random_state=1)
# clf = clf.fit(x, y)

# # Plot features importances
# imp = pd.Series(data=clf.feature_importances_, index=x.columns).sort_values(ascending=False)
# plt.figure(figsize=(10,12))
# plt.title("Feature importance")
# ax = sns.barplot(y=imp.index, x=imp.values, palette="Blues_d")
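Gender_F is redundant because it is always 1 minus Gender_M. As an alternative to dropping it by hand, pd.get_dummies accepts drop_first=True, which drops the first level of each encoded column. The sketch uses a hypothetical two-column frame.

```python
import pandas as pd

# Hypothetical toy frame with the raw Gender column and one numeric feature.
toy = pd.DataFrame({"Gender": ["M", "F", "F", "M"], "Customer_Age": [45, 49, 51, 40]})

full = pd.get_dummies(toy)                      # creates Gender_F and Gender_M
reduced = pd.get_dummies(toy, drop_first=True)  # keeps only Gender_M

print(full.columns.tolist())     # ['Customer_Age', 'Gender_F', 'Gender_M']
print(reduced.columns.tolist())  # ['Customer_Age', 'Gender_M']

# Gender_F is always 1 - Gender_M, so it adds no new information.
assert (full["Gender_F"].astype(int) == 1 - full["Gender_M"].astype(int)).all()
```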

## Data Preprocessing

### Dividing Dataset
We split the data into features (X) and labels (Y).

In [None]:
X = num_data.drop(['Attrition_Flag'], axis = 1) # Features
Y = num_data['Attrition_Flag'] # Labels
print(type(X))
print(type(Y))
print(X.shape)
print(Y.shape)

### Normalizing
Next, we standardize the features with StandardScaler so that each one has mean 0 and variance 1.

In [None]:
feature_scaler = StandardScaler()
X_scaled = feature_scaler.fit_transform(X)
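StandardScaler computes z = (x - mean) / std for each column. In the cell above the scaler is fit on all of X before the train/test split, so the test rows also contribute to the mean and standard deviation. A leak-free ordering computes the statistics on the training rows only and reuses them on the test rows, sketched here with NumPy on hypothetical data. Tree ensembles are essentially unaffected by this kind of per-feature scaling, but SMOTE's nearest-neighbour step is distance-based, so scaling does influence the oversampling.

```python
import numpy as np

# Hypothetical numeric feature matrix (rows = customers, columns = features).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[40.0, 5000.0], scale=[8.0, 2000.0], size=(100, 2))

# Split first (a simple 70/30 slice, purely for illustration) ...
X_tr, X_te = X_demo[:70], X_demo[70:]

# ... then compute the statistics on the training rows only.
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_tr_z = (X_tr - mu) / sigma
X_te_z = (X_te - mu) / sigma  # test rows reuse the training statistics

print(X_tr_z.mean(axis=0).round(6), X_tr_z.std(axis=0).round(6))
```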

### Dividing Dataset 
Now we split the data into training (70%) and test (30%) sets.

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split( X_scaled, Y, test_size = 0.3, random_state = 100)

print(X_train.shape)
print(X_test.shape)
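With an unbalanced target, a purely random split can leave slightly different churn rates in the training and test sets. train_test_split accepts a stratify argument that keeps the class proportions (almost) identical in both parts. The sketch uses hypothetical labels with the 16% churn rate; the notebook's own split does not use stratify.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels with roughly the 16% churn rate mentioned above.
y_demo = np.array([1] * 160 + [0] * 840)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo preserves the class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=100, stratify=y_demo)

print(y_tr.mean(), y_te.mean())  # both approximately 0.16
```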

## Implementing Oversampling 
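To balance the training set we use SMOTE (Synthetic Minority Oversampling Technique), which creates synthetic samples of the minority class (Attrited Customers). Only the training data is oversampled; the test set keeps its original class distribution.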

In [None]:
print("Number of observations in each class before oversampling (training data): \n", pd.Series(Y_train).value_counts())

smote = SMOTE(random_state = 101)
X_train, Y_train = smote.fit_resample(X_train, Y_train)  # fit_sample was renamed fit_resample in imbalanced-learn

print("Number of observations in each class after oversampling (training data): \n", pd.Series(Y_train).value_counts())
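SMOTE creates each synthetic sample by interpolating between a minority-class point and one of its nearest minority-class neighbours: x_new = x_i + λ (x_nn - x_i), with λ drawn from [0, 1). The function below is a simplified NumPy sketch of that idea on hypothetical points, not imbalanced-learn's implementation (whose SMOTE uses k_neighbors=5 by default).

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Simplified sketch of SMOTE's interpolation step (not imblearn's code).

    For each synthetic point: pick a random minority sample, pick one of its k
    nearest minority neighbours, and place a new point at a random position on
    the segment between them.
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X_min))           # random minority sample
        nn = X_min[rng.choice(neighbours[i])]  # one of its k nearest neighbours
        lam = rng.random()                     # interpolation factor in [0, 1)
        synthetic[j] = X_min[i] + lam * (nn - X_min[i])
    return synthetic

# Hypothetical minority-class points (e.g. scaled features of churned customers).
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                       [1.0, 1.0], [0.5, 0.5], [2.0, 2.0]])
new_points = smote_sketch(X_minority, n_new=10, k=3)
print(new_points.shape)  # (10, 2)
```

Every synthetic point lies on a segment between two existing minority points, so it stays inside the region the minority class already occupies.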

# Random Forest Classifier

### Tuning Random Forest

Now we look for the best Random Forest parameters using Grid Search, which also performs cross-validation (5 folds).

In GridSearchCV we use "recall" as the scoring metric, because in this case I am trying to reduce False Negatives (churners that the model misses).


In [None]:
rfc = RandomForestClassifier(criterion='entropy', max_features='sqrt', random_state=1)  # 'auto' was an alias of 'sqrt' and has been removed
grid_param = {'n_estimators': [200, 250, 300, 350, 400, 450]}

gd_sr = GridSearchCV(estimator=rfc, param_grid=grid_param, scoring='recall', cv=5)

gd_sr.fit(X_train, Y_train)

best_parameters = gd_sr.best_params_
print(best_parameters)

best_result = gd_sr.best_score_ # Mean cross-validated score of the best_estimator
print(best_result)


### Building Random Forest

We now build our Random Forest with the tuned parameter (350 trees in the forest).
We also plot a confusion matrix to better visualize the results.



In [None]:
rfc = RandomForestClassifier(n_estimators=350, criterion='entropy', max_features='sqrt', random_state=1)
rfc.fit(X_train,Y_train)

Y_pred = rfc.predict(X_test)

conf_mat = metrics.confusion_matrix(Y_test, Y_pred)
plt.figure(figsize=(8,6))
sns.heatmap(conf_mat, annot=True, fmt='d')  # fmt='d' shows raw counts instead of scientific notation
plt.title("Confusion matrix")
plt.xlabel("Predicted Class")
plt.ylabel("Actual class")
plt.show()
print('Confusion matrix: \n', conf_mat)
print('TP: ', conf_mat[1,1])
print('TN: ', conf_mat[0,0])
print('FP: ', conf_mat[0,1])
print('FN: ', conf_mat[1,0])
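The TP/TN/FP/FN indexing above relies on scikit-learn's layout: rows are the actual classes and columns are the predicted classes, ordered [0, 1]. For a binary problem, ravel() unpacks the four counts in one line. The labels below are hypothetical.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = Attrited Customer, 0 = Existing Customer.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]

# Rows = actual, columns = predicted, so ravel() gives TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```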

In [None]:
print(metrics.classification_report(Y_test, Y_pred))
print('Accuracy_Score:',accuracy_score(Y_test,Y_pred)*100,'%')
print('Recall:',metrics.recall_score(Y_test,Y_pred)*100,'%')
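For reference, the metrics in the classification report follow directly from the four confusion-matrix counts. The sketch uses hypothetical counts, not the results of this notebook.

```python
def summarize(tp, tn, fp, fn):
    """Classification metrics computed from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # of customers flagged as churners, how many churned
    recall = tp / (tp + fn)     # of actual churners, how many were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, not the notebook's results.
m = summarize(tp=80, tn=850, fp=50, fn=20)
print(m)
```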

### Finding Feature Importances

Now we look for the features with the highest importance, so that we can run a new Random Forest using only some of the most important ones.

In [None]:
featimp = pd.Series(rfc.feature_importances_, index=list(X)).sort_values(ascending=False)
print(featimp)
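The feature subset in the next step is typed by hand. As a sketch, the k most important features can also be taken straight from an importance Series with nlargest; the importance values and k below are hypothetical.

```python
import pandas as pd

# Hypothetical importances (not the notebook's results), indexed by feature name.
featimp_demo = pd.Series({"Total_Trans_Ct": 0.20, "Total_Trans_Amt": 0.18,
                          "Total_Revolving_Bal": 0.12, "Customer_Age": 0.03,
                          "Months_Inactive_12_mon": 0.05})

# Keep the k most important features instead of typing the list by hand.
k = 3
top_features = featimp_demo.nlargest(k).index.tolist()
print(top_features)  # ['Total_Trans_Ct', 'Total_Trans_Amt', 'Total_Revolving_Bal']

# The subset could then be taken from the feature table, e.g. num_data[top_features].
```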

### Selecting the most important features

Now we will create a new subset (X1), using only some features, and run a new Random Forest to compare them.

In [None]:
X1 = num_data[['Total_Trans_Ct', 'Total_Trans_Amt', 'Total_Revolving_Bal', 'Total_Ct_Chng_Q4_Q1', 'Total_Relationship_Count',
                 'Months_Inactive_12_mon','Total_Amt_Chng_Q4_Q1']]

feature_scaler = StandardScaler()
X1_scaled = feature_scaler.fit_transform(X1)

X1_train, X1_test, Y1_train, Y1_test = train_test_split( X1_scaled, Y, test_size = 0.3, random_state = 100)

smote = SMOTE(random_state = 101)
X1_train, Y1_train = smote.fit_resample(X1_train, Y1_train)

rfc = RandomForestClassifier(n_estimators=350, criterion='entropy', max_features='sqrt', random_state=1)
rfc.fit(X1_train,Y1_train)

Y_pred = rfc.predict(X1_test)

conf_mat = metrics.confusion_matrix(Y1_test, Y_pred)
plt.figure(figsize=(8,6))
sns.heatmap(conf_mat, annot=True, fmt='d')
plt.title("Confusion matrix")
plt.xlabel("Predicted Class")
plt.ylabel("Actual class")
plt.show()
print('Confusion matrix: \n', conf_mat)
print('TP: ', conf_mat[1,1])
print('TN: ', conf_mat[0,0])
print('FP: ', conf_mat[0,1])
print('FN: ', conf_mat[1,0])



In [None]:
print(metrics.classification_report(Y1_test,Y_pred))
print('Accuracy_Score:',accuracy_score(Y1_test,Y_pred)*100,'%')
print('Recall:',metrics.recall_score(Y1_test,Y_pred)*100,'%')

# AdaBoost Classifier


### Tuning AdaBoost

As with Random Forest, we now look for the best AdaBoost parameters using Grid Search with cross-validation.

In [None]:
abc = AdaBoostClassifier(random_state=1)
grid_param = {'n_estimators': [5,10,20,30,40,50]}

gd_sr = GridSearchCV(estimator=abc, param_grid=grid_param, scoring='recall', cv=5)

gd_sr.fit(X_train, Y_train)

best_parameters = gd_sr.best_params_
print(best_parameters)

best_result = gd_sr.best_score_ # Mean cross-validated score of the best_estimator
print(best_result)
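AdaBoost trains its weak learners one after another: after each round, misclassified samples get larger weights so that the next learner focuses on them, and each learner votes with a weight alpha that grows as its weighted error shrinks. n_estimators caps the number of rounds. The following is a from-scratch sketch of classic discrete AdaBoost with decision stumps on hypothetical 1-D data; scikit-learn's AdaBoostClassifier (whose default base estimator is a depth-1 decision tree) implements the SAMME variants, so this is not its code.

```python
import numpy as np

def stump_predict(X, f, thr, pol):
    """Decision stump: +1 if pol * (x_f - thr) >= 0, else -1."""
    return np.where(pol * (X[:, f] - thr) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Best single-feature threshold split under sample weights w (y in {-1, +1})."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                err = w[stump_predict(X, f, thr, pol) != y].sum()
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost(X, y, n_estimators=10):
    """Minimal discrete AdaBoost sketch; y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))  # start with uniform sample weights
    ensemble = []
    for _ in range(n_estimators):
        err, f, thr, pol = fit_stump(X, y, w)
        err = max(err, 1e-10)           # avoid division by zero on a perfect stump
        if err >= 0.5:                  # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        w *= np.exp(-alpha * y * stump_predict(X, f, thr, pol))  # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, f, thr, pol))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Hypothetical 1-D data with a + + - - + + pattern that no single stump separates.
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_demo = np.array([1, 1, -1, -1, 1, 1])

model = adaboost(X_demo, y_demo, n_estimators=3)
print(adaboost_predict(model, X_demo))
```

On this toy pattern a single stump cannot separate the classes, while three boosted stumps classify all six points correctly.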


### Building AdaBoost

After running GridSearch, we found that the best value for the maximum number of estimators at which boosting is terminated (n_estimators) is 50. We now build our AdaBoost classifier with this parameter and plot a confusion matrix to better visualize the results.



In [None]:
abc = AdaBoostClassifier(n_estimators=50, random_state=1)
abc.fit(X_train,Y_train)

Y_pred = abc.predict(X_test)

conf_mat = metrics.confusion_matrix(Y_test, Y_pred)
plt.figure(figsize=(8,6))
sns.heatmap(conf_mat, annot=True, fmt='d')
plt.title("Confusion matrix")
plt.xlabel("Predicted Class")
plt.ylabel("Actual class")
plt.show()
print('Confusion matrix: \n', conf_mat)
print('TP: ', conf_mat[1,1])
print('TN: ', conf_mat[0,0])
print('FP: ', conf_mat[0,1])
print('FN: ', conf_mat[1,0])

In [None]:
print('Classification report: \n', metrics.classification_report(Y_test, Y_pred))
print('Accuracy_Score:',accuracy_score(Y_test, Y_pred)*100,'%')
print('Recall:',metrics.recall_score(Y_test, Y_pred)*100,'%')


### Feature Importances

In [None]:
featimp = pd.Series(abc.feature_importances_, index=list(X)).sort_values(ascending=False)
print(featimp)

# Gradient Boosting Classifier

### Tuning Gradient Boosting

Finally, we do the same to find the best Gradient Boosting parameters, again using Grid Search with cross-validation.

In [None]:
gbc = GradientBoostingClassifier(random_state=1)
grid_param = {'n_estimators': [10,20,30,40,50], 'max_depth': [5,6,7,8,9,10,11,12], 'max_leaf_nodes': [8,12,16,20,24,28,32]}

gd_sr = GridSearchCV(estimator=gbc, param_grid=grid_param, scoring='recall', cv=5)

gd_sr.fit(X_train, Y_train)

best_parameters = gd_sr.best_params_
print(best_parameters)

best_result = gd_sr.best_score_ # Mean cross-validated score of the best_estimator
print(best_result)
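The Gradient Boosting grid above is much larger than the previous ones. Its size is the product of the list lengths, 5 × 8 × 7 = 280 candidates, and with cv=5 that means 1,400 model fits (plus one refit of the best candidate). ParameterGrid can confirm the count; RandomizedSearchCV is an alternative that evaluates only a fixed number of sampled candidates. The variable name gb_grid below is illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Same values as the Gradient Boosting grid above.
gb_grid = {'n_estimators': [10, 20, 30, 40, 50],
           'max_depth': [5, 6, 7, 8, 9, 10, 11, 12],
           'max_leaf_nodes': [8, 12, 16, 20, 24, 28, 32]}

n_candidates = len(ParameterGrid(gb_grid))  # 5 * 8 * 7
n_folds = 5
print(n_candidates, n_candidates * n_folds)  # 280 1400
```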


### Building Gradient Boosting

Now we run the Gradient Boosting classifier with the tuned parameters (n_estimators=40, max_depth=9 and max_leaf_nodes=32) and plot a confusion matrix.



In [None]:
# Building Gradient Boost using the tuned parameter
gbc = GradientBoostingClassifier(n_estimators=40, max_depth=9, max_leaf_nodes=32, random_state=1)
gbc.fit(X_train,Y_train)

Y_pred = gbc.predict(X_test)
# print('Classification report: \n', metrics.classification_report(Y_test, Y_pred))

conf_mat = metrics.confusion_matrix(Y_test, Y_pred)
plt.figure(figsize=(8,6))
sns.heatmap(conf_mat, annot=True, fmt='d')
plt.title("Confusion matrix")
plt.xlabel("Predicted Class")
plt.ylabel("Actual class")
plt.show()
print('Confusion matrix: \n', conf_mat)
print('TP: ', conf_mat[1,1])
print('TN: ', conf_mat[0,0])
print('FP: ', conf_mat[0,1])
print('FN: ', conf_mat[1,0])


In [None]:
print('Classification report: \n', metrics.classification_report(Y_test, Y_pred))
print('Accuracy_Score:',accuracy_score(Y_test, Y_pred)*100,'%')
print('Recall:',metrics.recall_score(Y_test, Y_pred)*100,'%')

### Feature Importances

In [None]:
featimp = pd.Series(gbc.feature_importances_, index=list(X)).sort_values(ascending=False)
print(featimp)

# Conclusion

I decided to keep the classifier with the highest possible recall score, in order to get as few False Negatives as possible. False Negatives are customers that our model predicts as "Existing Customers" but who are actually "Attrited Customers".
In other words, these are customers we think will stay with us but who in reality leave, so we miss the chance to reach out to them before they churn. In this situation it is more important to reduce these false predictions than the False Positives: False Positives are customers we expect to leave but who actually stay, and the only cost is contacting them when it was not necessary.
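Recall can also be traded against precision after training, by changing the probability cut-off used to flag a customer as churning. The classifiers used here all provide predict_proba; lowering the threshold flags more customers, which reduces False Negatives at the cost of more False Positives. The probabilities and labels below are hypothetical, and the thresholds are only examples.

```python
import numpy as np

def recall_precision_at(threshold, proba, y_true):
    """Recall and precision when a customer is flagged as churning if proba >= threshold."""
    y_pred = (proba >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Hypothetical churn probabilities (e.g. from model.predict_proba(X)[:, 1]).
proba = np.array([0.95, 0.80, 0.45, 0.30, 0.60, 0.20, 0.10, 0.35])
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])

for t in (0.5, 0.3):
    r, p = recall_precision_at(t, proba, y_true)
    print(f"threshold={t}: recall={r:.2f}, precision={p:.2f}")
```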
__________

We could see that the AdaBoost Classifier performed best for our goal: even though its accuracy score was a little lower than that of the other models, it achieved a recall score of 90.82% (higher than all the other models), which resulted in only 48 False Negatives.

