# **CUSTOMER CHURN ANALYSIS**

<img src ="https://miro.medium.com/max/800/0*dzmm3qresODlScte">

IN THIS NOTEBOOK WE WILL PERFORM THE FOLLOWING ACTIVITIES:

1. DATA CLEANING

2. DATA VISUALIZATION

3. MODEL CREATION

4. TASKS GIVEN IN THE CONTEXT OF THE PROBLEM

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
df=pd.read_csv("/kaggle/input/credit-card-customers/BankChurners.csv")

In [None]:
df.head()

In [None]:
df.drop(["CLIENTNUM","Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2",
        "Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1"],
       axis=1,inplace=True)

## DATA VISUALIZATION

In [None]:
sns.countplot(x=df["Attrition_Flag"])

ATTRITION_FLAG IS OUR TARGET VARIABLE. FROM THE ABOVE PLOT WE CAN SEE THAT OUR DATA IS HIGHLY IMBALANCED, SO WE HAVE TO ACCOUNT FOR THIS WHEN TRAINING AND EVALUATING THE MODEL.
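
A MINIMAL SKETCH OF HOW THE IMBALANCE COULD BE QUANTIFIED INSTEAD OF READ OFF THE PLOT. THE `flags` SERIES IS MADE-UP TOY DATA; IN THIS NOTEBOOK IT WOULD BE `df["Attrition_Flag"]`.

```python
import pandas as pd

# Toy stand-in for df["Attrition_Flag"]; the values are illustrative, not the real data.
flags = pd.Series(["Existing Customer"] * 8 + ["Attrited Customer"] * 2)

# normalize=True returns the share of each class instead of raw counts.
class_share = flags.value_counts(normalize=True)
print(class_share)

# Ratio of majority to minority class as a single imbalance figure.
imbalance_ratio = class_share.max() / class_share.min()
print("Imbalance ratio:", imbalance_ratio)
```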

In [None]:
sns.histplot(x=df["Customer_Age"],kde=True)

FROM THE ABOVE PLOT WE CAN SEE THAT CUSTOMER AGE IS APPROXIMATELY NORMALLY DISTRIBUTED.

In [None]:
print("Mean value of age is:",df["Customer_Age"].mean())
print("Variance of age is:",df["Customer_Age"].var())
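
THE VISUAL IMPRESSION OF NORMALITY COULD BE BACKED UP WITH SKEWNESS AND KURTOSIS, AS IN THIS ROUGH SKETCH. THE `ages` SERIES IS SYNTHETIC (ITS MEAN AND SPREAD ARE ASSUMED VALUES); IN THIS NOTEBOOK IT WOULD BE `df["Customer_Age"]`.

```python
import numpy as np
import pandas as pd

# Synthetic ages drawn from a normal distribution; loc and scale are made-up values.
rng = np.random.default_rng(0)
ages = pd.Series(rng.normal(loc=46, scale=8, size=5000))

# For normally distributed data, skewness and pandas' excess kurtosis are both close to 0.
age_skew = ages.skew()
age_kurt = ages.kurt()
print("Skewness:", age_skew)
print("Excess kurtosis:", age_kurt)
```

VALUES FAR FROM 0 WOULD SUGGEST THE DISTRIBUTION IS NOT WELL DESCRIBED AS NORMAL.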

In [None]:
sns.countplot(x=df["Gender"])

THERE ARE MORE FEMALE CUSTOMERS THAN MALE CUSTOMERS.

In [None]:
sns.countplot(x=df["Gender"],hue=df["Attrition_Flag"])

ATTRITION COMPARISON OF MALE & FEMALE CUSTOMERS.

In [None]:
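sns.countplot(x=df["Card_Category"],hue=df["Attrition_Flag"])

CUSTOMERS WITH PREMIUM CARDS APPEAR TO HAVE LOWER ATTRITION, WHICH SUGGESTS THAT PREMIUM CUSTOMERS ARE MORE LOYAL. NOTE THAT THE PLOT SHOWS COUNTS, NOT RATES, SO THIS SHOULD BE CONFIRMED WITH THE ATTRITION RATE PER CARD CATEGORY.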
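
ONE WAY TO TURN COUNTS INTO RATES IS A ROW-NORMALIZED CROSS-TABULATION. THIS IS ONLY A SKETCH ON INVENTED TOY DATA; IN THIS NOTEBOOK THE TWO COLUMNS WOULD COME FROM `df`.

```python
import pandas as pd

# Toy data, invented for illustration; not the real customer records.
toy = pd.DataFrame({
    "Card_Category": ["Blue"] * 6 + ["Silver"] * 4,
    "Attrition_Flag": ["Attrited Customer"] * 3 + ["Existing Customer"] * 3
                      + ["Attrited Customer"] * 1 + ["Existing Customer"] * 3,
})

# normalize="index" makes each row sum to 1, giving the attrition rate per card category.
rates = pd.crosstab(toy["Card_Category"], toy["Attrition_Flag"], normalize="index")
print(rates)
```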

In [None]:
sns.countplot(x=df["Education_Level"])
plt.xticks(rotation=90)

GRADUATE IS THE MOST COMMON EDUCATION LEVEL & DOCTORATE IS THE LEAST COMMON.

In [None]:
sns.countplot(x=df["Marital_Status"])

MARRIED IS THE MOST COMMON MARITAL STATUS IN OUR DATASET.

In [None]:
sns.countplot(x=df["Income_Category"])
plt.xticks(rotation=90)

THE MOST COMMON INCOME CATEGORY IS LESS THAN $40K.

## DATA CLEANING

*WE WILL CONVERT THE STRING VALUES TO INT/FLOAT VALUES, SO THAT WE CAN FEED OUR
DATA TO THE ML MODEL*
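
AS AN ALTERNATIVE TO REPLACING ONE VALUE AT A TIME, A WHOLE COLUMN CAN BE ENCODED WITH A SINGLE DICTIONARY. THE SKETCH BELOW USES A TOY `income` SERIES IN PLACE OF `df["Income_Category"]`; THE NAME `income_codes` IS HYPOTHETICAL.

```python
import numpy as np
import pandas as pd

# Toy column standing in for df["Income_Category"].
income = pd.Series(["Less than $40K", "$120K +", "Unknown", "$40K - $60K"])

# Ordered mapping from category label to integer code.
income_codes = {"Less than $40K": 1, "$40K - $60K": 2, "$60K - $80K": 3,
                "$80K - $120K": 4, "$120K +": 5}

# map() returns NaN for labels missing from the dict, so "Unknown" becomes NaN.
encoded = income.map(income_codes)
print(encoded)
```

BECAUSE UNMAPPED LABELS BECOME NaN, ANY UNEXPECTED CATEGORY SHOWS UP AS A MISSING VALUE INSTEAD OF STAYING A STRING.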

In [None]:
#assign the result back; inplace replace on a selected column is unreliable in recent pandas
df["Attrition_Flag"]=df["Attrition_Flag"].replace({"Attrited Customer":1,"Existing Customer":0})

In [None]:
df.rename(columns={"Attrition_Flag":"Churn"},inplace=True)

In [None]:
df["Gender"]=df["Gender"].replace({"M":1,"F":0})

In [None]:
df["Marital_Status"]=df["Marital_Status"].replace({'Single':1,'Married':2,'Divorced':3,'Unknown':4})

In [None]:
df["Card_Category"]=df["Card_Category"].replace({'Blue':1,'Silver':2,'Gold':3,'Platinum':4})

In [None]:
df["Education_Level"]=df["Education_Level"].replace({'Uneducated':1,'High School':2,'College':3,'Graduate':4,
                                                     'Post-Graduate':5,'Doctorate':6,'Unknown':7})

In [None]:
df["Income_Category"].unique()

In [None]:
#ordinal codes for the income brackets; "Unknown" becomes NaN so those rows can be dropped below
df["Income_Category"]=df["Income_Category"].replace({"Less than $40K":1,"$40K - $60K":2,"$60K - $80K":3,
                                                     "$80K - $120K":4,"$120K +":5,"Unknown":np.nan})

In [None]:
df.dropna(axis=0,inplace=True)

In [None]:
df=df.astype(float)

In [None]:
df.head()

In [None]:
plt.figure(figsize=(20,8))
sns.heatmap(df.corr(),annot=True,cmap="summer")

WE WILL DROP THE COLUMNS WHICH HAVE LOW CORRELATION WITH OUR TARGET COLUMN (CHURN)
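
THE LOW-CORRELATION COLUMNS COULD ALSO BE PICKED PROGRAMMATICALLY INSTEAD OF BY EYE FROM THE HEATMAP. THIS IS A SKETCH ON SYNTHETIC DATA; THE COLUMN NAMES `signal` AND `noise` AND THE 0.1 CUT-OFF ARE ASSUMPTIONS MADE FOR ILLUSTRATION, NOT VALUES USED IN THIS NOTEBOOK.

```python
import numpy as np
import pandas as pd

# Synthetic data: "signal" is built to correlate with Churn, "noise" is independent of it.
rng = np.random.default_rng(0)
churn = rng.integers(0, 2, size=5000)
toy = pd.DataFrame({
    "Churn": churn,
    "signal": churn + rng.normal(scale=0.5, size=5000),
    "noise": rng.normal(size=5000),
})

# Absolute Pearson correlation of every feature with the target.
target_corr = toy.corr()["Churn"].drop("Churn").abs()

# The 0.1 cut-off is an arbitrary, hypothetical threshold chosen for this sketch.
low_corr_cols = target_corr[target_corr < 0.1].index.tolist()
print(low_corr_cols)
```

PEARSON CORRELATION ONLY CAPTURES LINEAR RELATIONSHIPS, SO A FEATURE WITH LOW CORRELATION CAN STILL BE USEFUL TO A TREE-BASED MODEL.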

In [None]:
#dropping the columns
df.drop(['Gender','Customer_Age','Dependent_count','Education_Level','Marital_Status',
         'Income_Category','Card_Category','Months_on_book','Credit_Limit'],axis=1,inplace=True)

In [None]:
df.head()

## MODEL PRE-PROCESSING

In [None]:
X=df.drop(["Churn"],axis=1)
y=df["Churn"]

TASK: RELATION OF THE INPUT FEATURES TO OUR TARGET FEATURE

In [None]:
from sklearn.ensemble import ExtraTreesClassifier
#fixed seed so that the feature importances are reproducible between runs
model=ExtraTreesClassifier(random_state=0)
model.fit(X,y)

In [None]:
#sort so that the bar chart is actually ranked by importance
ranked_features=pd.Series(model.feature_importances_,index=X.columns).sort_values()
ranked_features.plot(kind='barh')

THE ABOVE PLOT SHOWS THE IMPORTANCE OF EACH INPUT FEATURE FOR PREDICTING OUR TARGET FEATURE (CHURN).

In [None]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=0)
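
BECAUSE THE TARGET IS IMBALANCED, A STRATIFIED SPLIT IS ONE OPTION FOR KEEPING THE CHURN RATE EQUAL IN THE TRAIN AND TEST SETS. THIS IS A SKETCH ON TOY ARRAYS (`X_toy`, `y_toy` ARE INVENTED); IT IS NOT WHAT THE CELL ABOVE DOES.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced target with 20% positives, invented for illustration.
y_toy = np.array([1] * 20 + [0] * 80)
X_toy = np.arange(100).reshape(-1, 1)

# stratify=y_toy keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=0, stratify=y_toy
)
print("Train churn rate:", y_tr.mean())
print("Test churn rate:", y_te.mean())
```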

## MODEL BUILDING

In [None]:
from sklearn.ensemble import RandomForestClassifier
model=RandomForestClassifier()
model.fit(X_train,y_train)

In [None]:
y_pred=model.predict(X_test)

from sklearn.metrics import confusion_matrix,classification_report,accuracy_score
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print("Model accuracy score is:",accuracy_score(y_test,y_pred))
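
ON IMBALANCED DATA THE ACCURACY SCORE ALONE CAN LOOK GOOD EVEN FOR A USELESS MODEL, WHICH IS WHY THE RECALL ON THE CHURN CLASS IN THE CLASSIFICATION REPORT MATTERS. A SMALL SKETCH WITH INVENTED LABELS (`y_true` AND `y_majority` ARE NOT FROM THIS NOTEBOOK):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Toy labels with 10% churners (1 = churn); invented for illustration.
y_true = np.array([1] * 10 + [0] * 90)

# A "model" that always predicts the majority class (no churn).
y_majority = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_majority)
baseline_recall = recall_score(y_true, y_majority)
print("Accuracy:", baseline_accuracy)
print("Recall on churners:", baseline_recall)
```

THE BASELINE REACHES 90% ACCURACY WHILE FINDING NONE OF THE CHURNERS.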

### HYPERPARAMETER TUNING

In [None]:
from sklearn.model_selection import GridSearchCV

#number of trees in the forest
n_estimators=[200,400,600,800,1000]

#maximum depth of each tree: 5, 12 and 20
max_depth=[int(x) for x in np.linspace(5,20,num=3)]

#minimum number of samples required at each leaf node
min_samples_leaf=[1,2,5,10]

#create the parameter grid
param_grid={'n_estimators':n_estimators,
           'max_depth':max_depth,
           'min_samples_leaf':min_samples_leaf
           }

#grid search with 8-fold cross-validation; GridSearchCV clones the estimator before fitting
model2=GridSearchCV(estimator=model,param_grid=param_grid,cv=8,n_jobs=-1)
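
TO GET A FEEL FOR THE COST OF THIS SEARCH, THE NUMBER OF MODEL FITS CAN BE COUNTED BEFORE RUNNING IT. THE SKETCH BELOW REUSES THE GRID VALUES FROM THE CELL ABOVE; `cv_folds`, `combinations` AND `total_fits` ARE HELPER NAMES INTRODUCED HERE.

```python
import numpy as np
from itertools import product

# Same grid values as in the grid-search cell.
n_estimators = [200, 400, 600, 800, 1000]
max_depth = [int(x) for x in np.linspace(5, 20, num=3)]
min_samples_leaf = [1, 2, 5, 10]
cv_folds = 8

# Every combination of the three parameter lists is trained once per fold.
combinations = list(product(n_estimators, max_depth, min_samples_leaf))
total_fits = len(combinations) * cv_folds
print("max_depth values:", max_depth)
print("Parameter combinations:", len(combinations))
print("Total model fits (excluding the final refit):", total_fits)
```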

In [None]:
model2.fit(X_train,y_train)

In [None]:
model2.best_estimator_

In [None]:
model3=RandomForestClassifier(max_depth=20, min_samples_leaf=2, n_estimators=200)

In [None]:
model3.fit(X_train,y_train)

In [None]:
y_pred=model3.predict(X_test)

from sklearn.metrics import confusion_matrix,classification_report,accuracy_score
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print("Model accuracy score is:",accuracy_score(y_test,y_pred))

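AFTER HYPERPARAMETER TUNING, OUR TEST ACCURACY DECREASED. THIS IS LESS STRANGE THAN IT LOOKS: GRID SEARCH PICKS THE PARAMETERS WITH THE BEST CROSS-VALIDATED SCORE ON THE TRAINING DATA, WHICH DOES NOT GUARANTEE A BETTER SCORE ON THE TEST SET, AND NEITHER RANDOM FOREST USES A FIXED RANDOM_STATE, SO PART OF THE DIFFERENCE MAY BE RANDOM VARIATION.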
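
ONE WAY TO JUDGE WHETHER SUCH A DIFFERENCE IS MEANINGFUL IS TO SEE HOW MUCH THE TEST ACCURACY OF A SINGLE CONFIGURATION MOVES WHEN ONLY THE SEED CHANGES. THIS IS A ROUGH SKETCH ON SYNTHETIC DATA FROM `make_classification`, NOT ON THE CHURN DATA; THE SAMPLE SIZE AND `n_estimators=20` ARE ARBITRARY CHOICES TO KEEP IT FAST.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, not the churn dataset.
X_syn, y_syn = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3, random_state=0)

# Train the same configuration with different seeds and record the test accuracy.
scores = []
for seed in range(5):
    forest = RandomForestClassifier(n_estimators=20, random_state=seed)
    forest.fit(X_tr, y_tr)
    scores.append(forest.score(X_te, y_te))

print("Accuracy per seed:", scores)
print("Spread (max - min):", max(scores) - min(scores))
```

IF THE GAP BETWEEN THE TUNED AND UNTUNED MODELS IS NO LARGER THAN THIS SEED-TO-SEED SPREAD, IT IS PROBABLY NOISE.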
