In this notebook we will try to find the most important reasons a customer churns, and build several models that predict which customers are likely to churn.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

In [None]:
data=pd.read_csv("/kaggle/input/credit-card-customers/BankChurners.csv")
data.head()

In [None]:
data=data.iloc[:,:-2]  # drop the last two columns, as advised in the dataset description
data.head()
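Dropping by position relies on the two unwanted columns always being last. The sketch below drops them by name instead. It assumes, as in the Kaggle version of this dataset, that both column names start with `Naive_Bayes_Classifier`; the tiny DataFrame is made up for illustration.

```python
import pandas as pd

# Toy frame standing in for BankChurners.csv (the real Naive Bayes column names are much longer)
df = pd.DataFrame({
    "CLIENTNUM": [1, 2],
    "Attrition_Flag": ["Existing Customer", "Attrited Customer"],
    "Naive_Bayes_Classifier_col_1": [0.9, 0.1],
    "Naive_Bayes_Classifier_col_2": [0.1, 0.9],
})

# Keep every column whose name does NOT start with the assumed prefix
df = df.loc[:, ~df.columns.str.startswith("Naive_Bayes_Classifier")]
print(list(df.columns))  # ['CLIENTNUM', 'Attrition_Flag']
```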

# Exploratory Data Analysis

In [None]:
data[data.select_dtypes(['object']).columns] = data.select_dtypes(['object']).apply(lambda x: x.astype('category'))

In [None]:
data.info()

In [None]:
data.Attrition_Flag.value_counts()
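`value_counts` shows raw counts; passing `normalize=True` turns them into proportions, which makes the class imbalance explicit. The largest proportion is also the accuracy of a model that always predicts the majority class, which is why accuracy alone is a weak yardstick here. A sketch on a made-up target:

```python
import pandas as pd

# Made-up target with a 1:4 imbalance, standing in for data.Attrition_Flag
y = pd.Series(["Existing Customer"] * 8 + ["Attrited Customer"] * 2, name="Attrition_Flag")

# Proportion of each class instead of raw counts
proportions = y.value_counts(normalize=True)
print(proportions)

# Always predicting the majority class already reaches this accuracy
baseline_accuracy = proportions.max()
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2f}")  # 0.80
```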

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='Attrition_Flag', data=data, hue='Gender')
plt.title("Distribution of Gender Among Attrited and Existing Customers")
plt.show()

In [None]:
sns.countplot(x='Attrition_Flag', data=data, hue='Marital_Status')
plt.title("Distribution of Marital Status Among Attrited and Existing Customers")
plt.show()

In [None]:
sns.countplot(x='Attrition_Flag', data=data, hue='Income_Category')
plt.title("Distribution of Income Category Among Attrited and Existing Customers")
plt.show()

In [None]:
sns.countplot(x='Attrition_Flag', data=data, hue='Card_Category')
plt.title("Distribution of Card Category Among Attrited and Existing Customers")
plt.show()
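Because there are far more existing than attrited customers, the raw counts in these countplots are hard to compare between the two bars. A sketch using `pd.crosstab` with `normalize="index"`, which gives, for each `Attrition_Flag` value, the share of each category; the toy data is made up:

```python
import pandas as pd

# Toy data standing in for the real columns
df = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer"] * 4 + ["Attrited Customer"] * 2,
    "Gender": ["M", "F", "M", "F", "F", "F"],
})

# Each row sums to 1: the share of each gender within existing vs. attrited customers
shares = pd.crosstab(df["Attrition_Flag"], df["Gender"], normalize="index")
print(shares)
```

Swapping `"Gender"` for `"Marital_Status"`, `"Income_Category"` or `"Card_Category"` on the real `data` gives the proportional version of each plot.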

# Model Building

**Preprocessing**

Here the most important metric is recall on the attrited class: we want to identify most, if not all, of the customers who are about to churn, so that the bank manager can propose plans to minimise the churn rate.
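Recall for the churn class is TP / (TP + FN): the fraction of customers who actually churned that the model flags. `classification_report` prints recall for every class, so the row to read below is `Attrited Customer`. A small worked example with made-up labels:

```python
from sklearn.metrics import recall_score

# Made-up ground truth and predictions: 4 churners, 6 existing customers
y_true = ["Attrited Customer"] * 4 + ["Existing Customer"] * 6
y_pred = (["Attrited Customer", "Attrited Customer", "Attrited Customer", "Existing Customer"]
          + ["Existing Customer"] * 5 + ["Attrited Customer"])

# 3 of the 4 churners were caught (TP = 3, FN = 1); the false alarm on an
# existing customer lowers precision but not churn recall
manual = 3 / (3 + 1)
churn_recall = recall_score(y_true, y_pred, pos_label="Attrited Customer")
print(churn_recall, manual)  # 0.75 0.75
```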

In [None]:
from sklearn.model_selection import train_test_split
X=data.drop("Attrition_Flag",axis=1)
y=data.Attrition_Flag
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
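Churners are the minority class, so a plain random split can leave train and test with slightly different churn rates. An optional variant is to pass `stratify=y` to `train_test_split`, which preserves the class proportions in both parts. A sketch on made-up data:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, 16 of them in the minority class
X = pd.DataFrame({"feature": np.arange(100)})
y = pd.Series(["Attrited Customer"] * 16 + ["Existing Customer"] * 84)

# stratify=y keeps roughly the same minority share in both splits
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
train_rate = (y_tr == "Attrited Customer").mean()
test_rate = (y_te == "Attrited Customer").mean()
print(train_rate, test_rate)
```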

In [None]:
X_train.shape

In [None]:
X_train_num=X_train[X_train.select_dtypes(['int64',"float64"]).columns]
X_test_num=X_test[X_test.select_dtypes(['int64',"float64"]).columns]

In [None]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train_num)
X_train_scaled=pd.DataFrame(scaler.transform(X_train_num),columns=X_train_num.columns,index=X_train_num.index)
X_test_scaled=pd.DataFrame(scaler.transform(X_test_num),columns=X_test_num.columns,index=X_test_num.index)
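`MinMaxScaler` maps each column with x' = (x - min_train) / (max_train - min_train), using the minimum and maximum learned on the training set only. Test values outside the training range therefore land outside [0, 1], which matters for MultinomialNB later because it rejects negative inputs. A sketch with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[5.0], [25.0]])

scaler = MinMaxScaler().fit(X_train)
scaled = scaler.transform(X_test)

# Same result by hand, using the training min (10) and max (30)
manual = (X_test - 10.0) / (30.0 - 10.0)
print(scaled.ravel())  # values: -0.25 and 0.75 (5.0 falls below the training minimum)
```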

In [None]:
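# dtype=int keeps the dummies numeric (recent pandas versions return bool columns by default)
X_train_dum=pd.get_dummies(X_train[X_train.select_dtypes(['category']).columns],drop_first=True,dtype=int)
X_test_dum=pd.get_dummies(X_test[X_test.select_dtypes(['category']).columns],drop_first=True,dtype=int)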
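Encoding train and test separately is safe here only because the object columns were converted to `category` before the split: for categorical data, `pd.get_dummies` creates one column per category even if that category never appears in a given split, so both dummy frames end up with the same columns. A sketch:

```python
import pandas as pd

# Categories are fixed on the full column before splitting, as in the notebook
col = pd.Series(["Blue", "Silver", "Gold", "Blue"], dtype="category")
train, test = col.iloc[:3], col.iloc[3:]  # the test part only contains "Blue"

train_dum = pd.get_dummies(train, drop_first=True)
test_dum = pd.get_dummies(test, drop_first=True)
print(list(train_dum.columns), list(test_dum.columns))  # ['Gold', 'Silver'] ['Gold', 'Silver']
```

With plain object (string) columns, the test frame above would have no `Gold` or `Silver` column at all, and the two feature matrices would no longer line up.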

In [None]:
X_train_pd=pd.concat([X_train_scaled, X_train_dum], axis=1)
X_test_pd=pd.concat([X_test_scaled, X_test_dum], axis=1)

In [None]:
X_train_pd.drop("CLIENTNUM",axis=1,inplace=True)
X_test_pd.drop("CLIENTNUM",axis=1,inplace=True)
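The manual steps above (scale numeric columns, one-hot encode categoricals, drop the ID) can also be expressed as a single scikit-learn `ColumnTransformer` inside a `Pipeline`, which fits the scaler and encoder on the training data only and applies them consistently to any new data. A sketch on a made-up frame with one numeric and one categorical column:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy frame; CLIENTNUM is listed in no transformer, so it is dropped
X = pd.DataFrame({
    "CLIENTNUM": [101, 102, 103, 104, 105, 106],
    "Customer_Age": [30, 45, 50, 28, 60, 41],
    "Gender": ["M", "F", "F", "M", "F", "M"],
})
y = ["Existing Customer", "Attrited Customer", "Existing Customer",
     "Existing Customer", "Attrited Customer", "Existing Customer"]

preprocess = ColumnTransformer([
    ("num", MinMaxScaler(), ["Customer_Age"]),
    ("cat", OneHotEncoder(drop="first"), ["Gender"]),
])  # columns not listed are dropped by default (remainder="drop")

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)
preds = model.predict(X)
```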

# **1. Logistic Regression**

In [None]:
from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression()
logmodel.fit(X_train_pd,y_train)
lrpredictions = logmodel.predict(X_test_pd)

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test,lrpredictions))
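Since recall on the churn class is the priority, one common lever (not used in this notebook) is `class_weight="balanced"`, which weights each class inversely to its frequency so that mistakes on the minority class cost more during training. A sketch comparing minority-class recall with and without it on synthetic imbalanced data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 10% of samples in class 1
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# Minority-class recall typically rises with balanced weights, usually at some cost in precision
recall_plain = recall_score(y_te, plain.predict(X_te))
recall_balanced = recall_score(y_te, balanced.predict(X_te))
print("plain   :", recall_plain)
print("balanced:", recall_balanced)
```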

# **2. Naive Bayes**

In [None]:
from sklearn.naive_bayes import MultinomialNB

nbmodel = MultinomialNB().fit(X_train_pd, y_train)
nbpredictions=nbmodel.predict(X_test_pd)

print(classification_report(y_test,nbpredictions))
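MultinomialNB is designed for count-like, non-negative features and raises a `ValueError` on negative input. Test rows scaled with a scaler fitted on the training set can dip below 0, so one option is `MinMaxScaler(clip=True)` (available in scikit-learn 0.24 and later), which clips transformed values into the feature range; `GaussianNB` is another option for continuous features. A sketch of the clipping approach with made-up numbers:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[10.0, 1.0], [20.0, 3.0], [30.0, 2.0], [15.0, 5.0]])
y_train = np.array([0, 1, 0, 1])
X_test = np.array([[5.0, 4.0]])  # 5.0 is below the training minimum of 10.0

scaler = MinMaxScaler(clip=True).fit(X_train)
X_test_scaled = scaler.transform(X_test)  # first value clipped to 0.0 instead of -0.25
print(X_test_scaled)

nb = MultinomialNB().fit(scaler.transform(X_train), y_train)
print(nb.predict(X_test_scaled))
```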



# **3. KNN Classifier**

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knnmodel = KNeighborsClassifier(n_neighbors=3)
knnmodel.fit(X_train_pd,y_train)
knnpredictions=knnmodel.predict(X_test_pd)
print(classification_report(y_test,knnpredictions))
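`n_neighbors=3` is fixed by hand here. A sketch of choosing it with cross-validated grid search instead, scoring on recall of the churn class via `make_scorer(recall_score, pos_label=...)`. The data is synthetic and the grid of k values is arbitrary; on the real notebook this search would run on the training set only (`X_train_pd`, `y_train`).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y_int = make_classification(n_samples=600, n_features=8, weights=[0.85, 0.15], random_state=0)
# Use the notebook's string labels so pos_label can name the churn class
y = np.where(y_int == 1, "Attrited Customer", "Existing Customer")

churn_recall = make_scorer(recall_score, pos_label="Attrited Customer")
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9, 11]},
    scoring=churn_recall,
    cv=5,  # stratified 5-fold CV for classifiers
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```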

# **4. Decision Tree Classifier**

In [None]:
from sklearn.tree import DecisionTreeClassifier
dtmodel = DecisionTreeClassifier(random_state=0)
dtmodel.fit(X_train_pd,y_train)
dtpredictions=dtmodel.predict(X_test_pd)
print(classification_report(y_test,dtpredictions))

# **5. Random Forest Classifier**

In [None]:
from sklearn.ensemble import RandomForestClassifier
rfmodel = RandomForestClassifier(random_state=0,max_depth=100,n_estimators=50)
rfmodel.fit(X_train_pd,y_train)
rfpredictions=rfmodel.predict(X_test_pd)
print(classification_report(y_test,rfpredictions))

# **6. Gradient Boosting Classifier**

In [None]:
from sklearn.ensemble import GradientBoostingClassifier
gbmodel = GradientBoostingClassifier(random_state=0)
gbmodel.fit(X_train_pd,y_train)
gbpredictions=gbmodel.predict(X_test_pd)
print(classification_report(y_test,gbpredictions))

# **7. Extreme Gradient Boosting Classifier (XGBoost)**

In [None]:
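import xgboost as xgb
from sklearn.preprocessing import LabelEncoder
# recent XGBoost versions expect integer class labels, so encode the string target first
le = LabelEncoder().fit(y_train)
xgbmodel = xgb.XGBClassifier(random_state=0)
xgbmodel.fit(X_train_pd, le.transform(y_train))
# map the integer predictions back to the original labels for the report
xgbpredictions = le.inverse_transform(xgbmodel.predict(X_test_pd))
print(classification_report(y_test,xgbpredictions))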

# 8. ANN

In [None]:
# encode the target as 0/1, since the neural network needs a numeric target;
# with drop_first=True the remaining column is "Existing Customer", so 1 = existing and 0 = attrited
y_train_dum=pd.get_dummies(y_train,drop_first=True,dtype=int)
y_test_dum=pd.get_dummies(y_test,drop_first=True,dtype=int)

from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 42)  # oversample the minority class in the training set only, to address class imbalance
X_train_oversampled, y_train_oversampled = sm.fit_resample(X_train_pd, y_train_dum)
X_train_dumsmote = pd.DataFrame(X_train_oversampled, columns=X_train_pd.columns)
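SMOTE creates each synthetic minority sample by picking a minority point, picking one of its k nearest minority neighbours, and interpolating between them: x_new = x + λ·(x_nn − x), with λ drawn uniformly from [0, 1). The sketch below is a simplified, hand-written illustration of that idea using NumPy and scikit-learn's `NearestNeighbors`; it is not imblearn's implementation, and `smote_like` is a hypothetical helper name.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=3, seed=0):
    """Hypothetical helper: generate n_new synthetic rows from minority-class rows X_min."""
    rng = np.random.default_rng(seed)
    # k + 1 neighbours because each point's nearest neighbour is itself
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    new_rows = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))          # random minority sample
        j = idx[i, rng.integers(1, k + 1)]    # one of its k neighbours (column 0 is the point itself)
        lam = rng.random()                    # interpolation factor in [0, 1)
        new_rows.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new_rows)

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
synthetic = smote_like(X_min, n_new=10)
print(synthetic.shape)  # (10, 2)
```

Every synthetic row lies on a segment between two real minority rows, which is why SMOTE is applied to the training data only: synthetic points in the test set would inflate the scores.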

In [None]:
from keras.models import Sequential
from keras.layers import Dense,Dropout
from keras import regularizers
import keras

annmodel = Sequential()
annmodel.add(Dense(64, input_dim=X_train_pd.shape[1], activation='relu'))  # one input per feature column (32 in this notebook)
annmodel.add(Dense(32,kernel_regularizer=regularizers.l2(0.01), activation='relu'))
annmodel.add(Dropout(0.1))
annmodel.add(Dense(1, activation='sigmoid'))

# compile the keras model
annmodel.compile(loss='binary_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.001), metrics=['accuracy'])
# fit the keras model on the dataset
annmodel.fit(X_train_dumsmote, y_train_oversampled, epochs=150, batch_size=32,verbose=0)

In [None]:
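annpredictions=annmodel.predict(X_test_pd)
# the output is P(Existing Customer); a cut-off of 0.7 instead of 0.5 labels more customers as attrited (0),
# trading some precision for higher churn recall
annpredictions=[1 if x>0.7 else 0 for x in annpredictions]
print(classification_report(y_test_dum,annpredictions))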
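The 0.7 cut-off is a decision threshold on the network's output. Because the single output column is `Existing Customer` (1 = existing, 0 = attrited), raising the threshold labels more customers as attrited, which raises churn recall at the cost of precision. A sketch of that trade-off with made-up probabilities:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up labels (1 = existing, 0 = attrited) and predicted P(existing)
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
p_existing = np.array([0.2, 0.55, 0.65, 0.6, 0.75, 0.8, 0.9, 0.95])

results = {}
for threshold in [0.5, 0.7]:
    y_pred = (p_existing > threshold).astype(int)
    # pos_label=0 scores the attrited class
    rec = recall_score(y_true, y_pred, pos_label=0)
    prec = precision_score(y_true, y_pred, pos_label=0)
    results[threshold] = (rec, prec)
    print(f"threshold={threshold}: attrited recall={rec:.2f}, precision={prec:.2f}")
# threshold=0.5: attrited recall=0.33, precision=1.00
# threshold=0.7: attrited recall=1.00, precision=0.75
```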

Even after various preprocessing steps and hyperparameter adjustments, the best recall and accuracy the ANN achieves are only 86% and 90%, respectively. After many adjustments, I have concluded that there isn't enough data here for deep learning to outperform the XGBoost model, so there seems to be little point in going further. If anyone has a better approach, please share it in the comments and I'll try it out.

# Feature Importances
Now let's look at which features contributed most to the XGBoost classifier.

In [None]:
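# importance_type='weight' counts how many times each feature is used to split a tree
feature_important = xgbmodel.get_booster().get_score(importance_type='weight')
keys = list(feature_important.keys())
values = list(feature_important.values())

# use a new name so the original `data` DataFrame is not overwritten
importance_df = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by = "score", ascending=False)
importance_df.plot(kind='barh',figsize=(5,10))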

From the graph above, the most important feature (by split count) for separating churning customers from existing customers is Total_Trans_Amt. This makes sense: a customer who is planning to churn will likely try out another bank's services, and so transact less with this one, before closing the current account.
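The `'weight'` importance only counts how often a feature is used in a split, which can favour continuous features with many distinct values. A model-agnostic cross-check is scikit-learn's `permutation_importance`, which measures how much a score (here recall) drops when one feature's values are shuffled on held-out data. A sketch on synthetic data, with a random forest standing in for the XGBoost model; it should also accept the notebook's fitted models, assuming they follow the scikit-learn estimator API.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Synthetic frame: only "signal" drives the label; "noise" is irrelevant
X = pd.DataFrame({"signal": rng.normal(size=n), "noise": rng.normal(size=n)})
y = (X["signal"] + 0.3 * rng.normal(size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each column 10 times on the held-out set and record the mean drop in recall
result = permutation_importance(model, X_te, y_te, scoring="recall", n_repeats=10, random_state=0)
importances = pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)
print(importances)
```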