# Credit Card Customers: Why Are They Churning?

This notebook visualizes the factors associated with credit card customer churn and considers how those factors could be tuned to improve retention.

The notebook follows these steps:

* Data Import and Basic Check
* Data types and Null values check
* One hot encoding for categorical data
* Decision Tree classifier for Feature Importance
* Data Visualization for high importance features for Churned vs Non Churned Population
* Data Visualization for Positive and Negative Correlation

In [None]:
#Importing required packages
import pandas as pd
import numpy as np

In [None]:
#Loading the dataset
df_cc=pd.read_csv('../input/credit-card-customers/BankChurners.csv')

In [None]:
#First 5 rows in the dataset
df_cc.head(5)

In [None]:
#Deleting the last 2 columns as suggested in the data description
naive_bayes_cols = [
    'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1',
    'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2',
]
df_cc = df_cc.drop(columns=naive_bayes_cols)

In [None]:
#First 5 rows after dropping the two columns
df_cc.head(5)

In [None]:
#Information about columns and datatype
df_cc.info()

The output shows that there are no null values anywhere in the dataframe.
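
The `info()` summary can be cross-checked with an explicit count of missing values per column. The sketch below runs on a small hypothetical frame (the `toy` name and its values are made up for illustration); on the real data the same calls would be applied to `df_cc`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df_cc; values are illustrative only
toy = pd.DataFrame({
    "Customer_Age": [45, 49, np.nan, 40],
    "Gender": ["M", "F", "M", None],
    "Total_Trans_Ct": [42, 33, 20, 28],
})

# Number of missing values in each column
null_counts = toy.isnull().sum()
print(null_counts)

# True only if no cell in the frame is missing
has_no_nulls = not toy.isnull().values.any()
print(has_no_nulls)
```

A column-wise count makes it easy to see which column is affected when the check fails.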

In [None]:
#Checking unique values of Attrition_Flag
df_cc['Attrition_Flag'].unique()

In [None]:
#Changing Attrition_Flag to numeric (0 = existing customer, 1 = attrited customer)
flag_map = {'Existing Customer': 0, 'Attrited Customer': 1}
df_cc['Attrition_Flag'] = df_cc['Attrition_Flag'].map(flag_map).astype(int)

In [None]:
#One hot encoding for categorical variables
df_cc_encoded = pd.get_dummies(df_cc)

#Dropping client ID, which is an identifier rather than a feature
df_cc_encoded = df_cc_encoded.drop(columns=['CLIENTNUM'])
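
To illustrate what one hot encoding does, here is a minimal sketch on a hypothetical three-row frame (the `toy` frame and its values are assumptions for illustration, not rows from the dataset). Each text column is expanded into one indicator column per category, while numeric columns pass through unchanged.

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative only
toy = pd.DataFrame({
    "Gender": ["M", "F", "F"],
    "Card_Category": ["Blue", "Gold", "Blue"],
    "Total_Trans_Ct": [42, 33, 20],
})

# One indicator column per category, named <column>_<category>
encoded = pd.get_dummies(toy)
print(encoded.columns.tolist())

# Optional variant: drop the first category of each column to avoid
# perfectly redundant indicators
encoded_reduced = pd.get_dummies(toy, drop_first=True)
print(encoded_reduced.columns.tolist())
```

Tree-based models such as the one used later in this notebook can work with the full set of indicators, so `drop_first` is shown only as an option.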

In [None]:
#Defining X and Y
df_cc_X = df_cc_encoded.loc[ : , df_cc_encoded.columns != 'Attrition_Flag']
df_cc_y = df_cc_encoded['Attrition_Flag']

**Feature Importance**

In [None]:
#Feature importance based on a Decision Tree classifier
from sklearn.tree import DecisionTreeClassifier

#Fixing random_state so the importances are reproducible between runs
clf = DecisionTreeClassifier(random_state=0)
clf.fit(df_cc_X, df_cc_y)

pd.Series(clf.feature_importances_, index=df_cc_X.columns).plot.bar(color='steelblue', figsize=(12, 6))
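
With many encoded columns, a bar chart in column order can be hard to read, so it may help to sort the importances first. The sketch below uses synthetic data (the `signal` and `noise_*` columns are hypothetical, built so that only `signal` determines the label) to show the pattern; on the real data, `X` and `y` would be `df_cc_X` and `df_cc_y`.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: the label depends only on the "signal" column (assumption for illustration)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "signal": rng.normal(size=200),
    "noise_a": rng.normal(size=200),
    "noise_b": rng.normal(size=200),
})
y = (X["signal"] > 0).astype(int)

# A fixed random_state keeps the fitted tree the same across runs
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Sort so the most important features come first
importances = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

Impurity-based importances from a single tree are only a rough ranking, so they are best read alongside the correlation and group-wise plots rather than as a final answer.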

**Visualization for important features: Churned vs Non-Churned Customers**

In [None]:
import matplotlib.pyplot as plt

#Correlation of each numeric feature with the churn flag
#(text columns are excluded because Pearson correlation needs numeric data)
numeric_cols = df_cc.select_dtypes(include='number')
correlations = numeric_cols.drop(columns=['Attrition_Flag']).corrwith(numeric_cols['Attrition_Flag'])
correlations = correlations[correlations != 1]
correlations.plot.bar(
    figsize=(18, 10),
    fontsize=15,
    color='#ec838a',
    rot=45, grid=True)
plt.title('Correlation with Churn Rate \n',
          horizontalalignment="center", fontstyle="normal",
          fontsize=22, fontfamily="sans-serif")

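The plot suggests that more inactivity and more relationships are associated with a higher churn rate, and that lower activity (transaction count, utilization ratio, etc.) is likewise associated with higher churn. These are correlations, so they indicate association rather than cause.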
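
Sorting the correlations makes the positive and negative groups easier to separate than reading them off a bar chart. This is a minimal sketch on a hypothetical six-row frame (the `toy` values are invented for illustration); on the real data the same steps would be applied to `df_cc`.

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative only
toy = pd.DataFrame({
    "Attrition_Flag": [0, 0, 0, 1, 1, 1],
    "Total_Trans_Ct": [80, 75, 90, 40, 35, 45],
    "Months_Inactive_12_mon": [1, 2, 1, 3, 4, 3],
    "Gender": ["M", "F", "F", "M", "F", "M"],
})

# Keep numeric columns only, then correlate each feature with the churn flag
numeric = toy.select_dtypes(include="number")
corr = (
    numeric.drop(columns="Attrition_Flag")
    .corrwith(numeric["Attrition_Flag"])
    .sort_values()
)
print(corr)

# Split into features that move against churn and features that move with it
negative = corr[corr < 0]
positive = corr[corr > 0]
```

In this toy frame, `Total_Trans_Ct` lands in the negative group and `Months_Inactive_12_mon` in the positive group, by construction.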

In [None]:
import plotly.express as px
from plotly.subplots import make_subplots

In [None]:
#Median transaction count for existing (0) vs attrited (1) customers
df_txn_count = df_cc_encoded.groupby("Attrition_Flag").agg({"Total_Trans_Ct": "median"}).reset_index()

In [None]:
fig = px.bar(df_txn_count,
             y='Total_Trans_Ct',
             x='Attrition_Flag',
             color='Attrition_Flag')
fig.update_layout(autosize=False, width=800, height=400, title='Median Txn Counts of Attrition vs Non Attrition')
fig.show()

Customers with a higher number of transactions are expected to be retained longer. Let's look at the attrition rate across buckets.