<a href="https://colab.research.google.com/github/silversilencee/Customer_Churn/blob/main/Customer_Churn_ML_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Importing Libraries**

In [None]:
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier
from imblearn.combine import SMOTEENN

## **Reading csv**

In [None]:
df=pd.read_csv("tel_churn.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,OnlineSecurity,OnlineBackup,...,InternetService_DSL,InternetService_Fiber_optic,InternetService_No,Contract_Month_to_month,Contract_One_year,Contract_Two_year,PaymentMethod_Bank_transfer_automatic,PaymentMethod_Credit_card_automatic,PaymentMethod_Electronic_check,PaymentMethod_Mailed_check
0,0,0,0,1,0,1,0,0,0,1,...,1,0,0,1,0,0,0,0,1,0
1,1,1,0,0,0,34,1,0,1,0,...,1,0,0,0,1,0,0,0,0,1
2,2,1,0,0,0,2,1,0,1,1,...,1,0,0,1,0,0,0,0,0,1
3,3,1,0,0,0,45,0,0,1,0,...,1,0,0,0,1,0,1,0,0,0
4,4,0,0,0,0,2,1,0,0,0,...,0,1,0,1,0,0,0,0,1,0


In [None]:
df=df.drop('Unnamed: 0',axis=1)

In [None]:
x=df.drop('Churn',axis=1)
x

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,OnlineSecurity,OnlineBackup,DeviceProtection,...,InternetService_DSL,InternetService_Fiber_optic,InternetService_No,Contract_Month_to_month,Contract_One_year,Contract_Two_year,PaymentMethod_Bank_transfer_automatic,PaymentMethod_Credit_card_automatic,PaymentMethod_Electronic_check,PaymentMethod_Mailed_check
0,0,0,1,0,1,0,0,0,1,0,...,1,0,0,1,0,0,0,0,1,0
1,1,0,0,0,34,1,0,1,0,1,...,1,0,0,0,1,0,0,0,0,1
2,1,0,0,0,2,1,0,1,1,0,...,1,0,0,1,0,0,0,0,0,1
3,1,0,0,0,45,0,0,1,0,1,...,1,0,0,0,1,0,1,0,0,0
4,0,0,0,0,2,1,0,0,0,0,...,0,1,0,1,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7027,1,0,1,1,24,1,1,1,0,1,...,1,0,0,0,1,0,0,0,0,1
7028,0,0,1,1,72,1,1,0,1,1,...,0,1,0,0,1,0,0,1,0,0
7029,0,0,1,1,11,0,0,1,0,0,...,1,0,0,1,0,0,0,0,1,0
7030,1,1,1,0,4,1,1,0,0,0,...,0,1,0,1,0,0,0,0,0,1


In [None]:
y=df['Churn']
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
y

array([0, 0, 1, ..., 0, 1, 0])

## **Train Test Split**

In [None]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)
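
The split above is unseeded, so each run draws a different partition and the scores reported below will shift slightly. A minimal sketch of a seeded, stratified split follows. It runs on synthetic stand-in data; `X_demo`, `y_demo` and the seed value are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data: 1000 rows, roughly 27% positives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 5))
y_demo = (rng.random(1000) < 0.27).astype(int)

# stratify keeps the class ratio (almost) identical in both parts;
# random_state makes the partition repeatable across runs.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

print("positive rate overall:", y_demo.mean())
print("positive rate in train:", y_tr.mean())
print("positive rate in test:", y_te.mean())
```

With stratification, the minority-class share in the test set stays close to the share in the full data, which makes per-class metrics more stable on imbalanced data.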

## **Decision Tree Classifier**

In [None]:
model_dt=DecisionTreeClassifier(criterion = "gini",random_state = 100,max_depth=6, min_samples_leaf=8)

In [None]:
model_dt.fit(x_train,y_train)

DecisionTreeClassifier(max_depth=6, min_samples_leaf=8, random_state=100)

In [None]:
y_pred=model_dt.predict(x_test)
y_pred

array([0, 0, 0, ..., 1, 1, 1])

In [None]:
model_dt.score(x_test,y_test)

0.7818052594171997

In [None]:
print(classification_report(y_test, y_pred, labels=[0,1]))

              precision    recall  f1-score   support

           0       0.82      0.89      0.85      1008
           1       0.65      0.51      0.57       399

    accuracy                           0.78      1407
   macro avg       0.73      0.70      0.71      1407
weighted avg       0.77      0.78      0.77      1407



Overall accuracy is about 78%, but because the dataset is imbalanced, accuracy is a misleading metric here: a model can score well simply by favouring the majority class.
We therefore look at precision, recall and F1 score for the minority class, and all three are low for Class 1 (churned customers): 0.65, 0.51 and 0.57.
To address this, we resample the data with SMOTEENN, which combines SMOTE oversampling with Edited Nearest Neighbours (ENN) cleaning.
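
To make the point about accuracy concrete, here is a small worked example. It uses only the support counts from the classification report above and shows what a trivial "always predict class 0" baseline would score on the same test set. The variable names are illustrative.

```python
# Support counts copied from the classification report above.
support = {0: 1008, 1: 399}
total = sum(support.values())

# A trivial baseline that always predicts the majority class (0, not churned).
baseline_accuracy = support[0] / total

# It never flags a churner, so it has zero true positives for class 1.
baseline_recall_churn = 0 / support[1]

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"baseline recall for class 1: {baseline_recall_churn:.3f}")
```

The baseline reaches roughly 72% accuracy while catching no churners at all, so the tree's 78% accuracy is only a modest gain over doing nothing.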

In [None]:
sm = SMOTEENN()
x_resampled, y_resampled = sm.fit_resample(x,y)

In [None]:
xr_train,xr_test,yr_train,yr_test=train_test_split(x_resampled, y_resampled,test_size=0.2)

In [None]:
model_dt_smote=DecisionTreeClassifier(criterion = "gini",random_state = 100,max_depth=6, min_samples_leaf=8)

In [None]:
model_dt_smote.fit(xr_train,yr_train)
yr_predict = model_dt_smote.predict(xr_test)
model_score_r = model_dt_smote.score(xr_test, yr_test)
print(model_score_r)
print(metrics.classification_report(yr_test, yr_predict))

0.9292493528904228
              precision    recall  f1-score   support

           0       0.92      0.93      0.92       539
           1       0.94      0.93      0.93       620

    accuracy                           0.93      1159
   macro avg       0.93      0.93      0.93      1159
weighted avg       0.93      0.93      0.93      1159



In [None]:
print(metrics.confusion_matrix(yr_test, yr_predict))

[[499  40]
 [ 42 578]]
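
The figures in the classification report can be recomputed by hand from this confusion matrix. The sketch below does that arithmetic for Class 1, using the four counts printed above (rows are true labels, columns are predicted labels).

```python
# Counts copied from the confusion matrix above.
tn, fp = 499, 40   # true class 0: predicted 0, predicted 1
fn, tp = 42, 578   # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted churners, how many really churned
recall = tp / (tp + fn)      # of the real churners, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.2f}")
print(f"recall:    {recall:.2f}")
print(f"f1:        {f1:.2f}")
```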


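The results are now much better: accuracy is about 93%, with precision, recall and F1 score all around 0.93 for Class 1. Both the training and the test split here come from the resampled data, so these scores are not directly comparable with the earlier ones, which were measured on the original class distribution.
Let's try another classifier.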

## **Random Forest Classifier**

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
model_rf=RandomForestClassifier(n_estimators=100, criterion='gini', random_state = 100,max_depth=6, min_samples_leaf=8)

In [None]:
model_rf.fit(x_train,y_train)

RandomForestClassifier(max_depth=6, min_samples_leaf=8, random_state=100)

In [None]:
y_pred=model_rf.predict(x_test)

In [None]:
model_rf.score(x_test,y_test)

0.7882018479033405

In [None]:
print(classification_report(y_test, y_pred, labels=[0,1]))

              precision    recall  f1-score   support

           0       0.82      0.90      0.86      1008
           1       0.67      0.50      0.57       399

    accuracy                           0.79      1407
   macro avg       0.74      0.70      0.72      1407
weighted avg       0.78      0.79      0.78      1407



In [None]:
sm = SMOTEENN()
x_resampled1, y_resampled1 = sm.fit_resample(x,y)

In [None]:
xr_train1,xr_test1,yr_train1,yr_test1=train_test_split(x_resampled1, y_resampled1,test_size=0.2)

In [None]:
model_rf_smote=RandomForestClassifier(n_estimators=100, criterion='gini', random_state = 100,max_depth=6, min_samples_leaf=8)

In [None]:
model_rf_smote.fit(xr_train1,yr_train1)

RandomForestClassifier(max_depth=6, min_samples_leaf=8, random_state=100)

In [None]:
yr_predict1 = model_rf_smote.predict(xr_test1)

In [None]:
model_score_r1 = model_rf_smote.score(xr_test1, yr_test1)

In [None]:
print(model_score_r1)
print(metrics.classification_report(yr_test1, yr_predict1))

0.9452173913043478
              precision    recall  f1-score   support

           0       0.95      0.92      0.94       508
           1       0.94      0.96      0.95       642

    accuracy                           0.95      1150
   macro avg       0.95      0.94      0.94      1150
weighted avg       0.95      0.95      0.95      1150



In [None]:
print(metrics.confusion_matrix(yr_test1, yr_predict1))

[[468  40]
 [ 23 619]]


The Random Forest classifier also gives good results with SMOTEENN, in fact slightly better than the Decision Tree (about 95% accuracy versus 93%).
We could go further and compare more classifiers, but that is not covered here.
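
For readers who want to try such a comparison, a minimal sketch using cross-validation is shown below. The candidate list, the synthetic data and the choice of F1 as the score are all assumptions for illustration; none of them come from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the churn table.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.73, 0.27], random_state=0
)

# Hypothetical set of candidates to compare.
candidates = {
    "decision_tree": DecisionTreeClassifier(max_depth=6, min_samples_leaf=8, random_state=100),
    "random_forest": RandomForestClassifier(n_estimators=100, max_depth=6, random_state=100),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Mean 5-fold F1 score for the positive (minority) class.
results = {
    name: cross_val_score(estimator, X_demo, y_demo, cv=5, scoring="f1").mean()
    for name, estimator in candidates.items()
}

for name, score in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```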

## **Performing PCA**

In [None]:
# Applying PCA
from sklearn.decomposition import PCA
pca = PCA(0.9)
xr_train_pca = pca.fit_transform(xr_train1)
xr_test_pca = pca.transform(xr_test1)
explained_variance = pca.explained_variance_ratio_

In [None]:
model=RandomForestClassifier(n_estimators=100, criterion='gini', random_state = 100,max_depth=6, min_samples_leaf=8)

In [None]:
model.fit(xr_train_pca,yr_train1)

RandomForestClassifier(max_depth=6, min_samples_leaf=8, random_state=100)

In [None]:
yr_predict_pca = model.predict(xr_test_pca)

In [None]:
model_score_r_pca = model.score(xr_test_pca, yr_test1)

In [None]:
print(model_score_r_pca)
print(metrics.classification_report(yr_test1, yr_predict_pca))

0.717391304347826
              precision    recall  f1-score   support

           0       0.71      0.61      0.66       508
           1       0.72      0.80      0.76       642

    accuracy                           0.72      1150
   macro avg       0.72      0.71      0.71      1150
weighted avg       0.72      0.72      0.71      1150



With PCA the results are worse (accuracy drops from about 95% to about 72%), so we finalise the Random Forest model trained on the resampled data and save it for later use.
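
A note on what `PCA(0.9)` does: it keeps the smallest number of components whose cumulative explained variance reaches 90%. Since PCA is driven by variance, a column on a much larger scale than the others can dominate that selection. The sketch below shows the effect on synthetic data and how standardising first changes it. Whether this contributes to the drop seen above is an assumption, and the data and names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Six independent columns; the first is put on a much larger scale.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 6))
X_demo[:, 0] *= 1000.0

# PCA on the raw values: the large-scale column absorbs nearly all variance.
raw_pca = PCA(0.9).fit(X_demo)

# Standardise first, then apply the same 90% variance threshold.
scaled = make_pipeline(StandardScaler(), PCA(0.9)).fit(X_demo)
scaled_pca = scaled.named_steps["pca"]

print("components kept without scaling:", raw_pca.n_components_)
print("components kept with scaling:", scaled_pca.n_components_)
print("cumulative variance (scaled):", scaled_pca.explained_variance_ratio_.cumsum())
```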

## **Pickling the model**


In [None]:
import pickle

In [None]:
filename = 'model.sav'

In [None]:
# Use a context manager so the file is flushed and closed after writing.
with open(filename, 'wb') as f:
    pickle.dump(model_rf_smote, f)

In [None]:
# Use a context manager so the file handle is closed after reading.
with open(filename, 'rb') as f:
    load_model = pickle.load(f)

In [None]:
model_score_r1 = load_model.score(xr_test1, yr_test1)

In [None]:
model_score_r1

0.9452173913043478

Our final model, the RF Classifier with SMOTEENN, is now saved in model.sav. We will use it to build APIs so that the model can be accessed from a UI.
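
As a rough sketch of what the serving side might do with such a file, the example below saves a small stand-in model, loads it back, and scores a single row with `predict_proba`. The model, the data and the temporary file path are placeholders and do not represent the notebook's `model.sav`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small stand-in model trained on synthetic data.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=100).fit(X_demo, y_demo)

# Save and reload through a file, closing the handles explicitly.
path = os.path.join(tempfile.mkdtemp(), "model_demo.sav")
with open(path, "wb") as fh:
    pickle.dump(clf, fh)
with open(path, "rb") as fh:
    restored = pickle.load(fh)

# Score one row: column 1 of predict_proba is the probability of class 1.
row = X_demo[:1]
churn_probability = restored.predict_proba(row)[0, 1]
print(f"predicted probability of class 1: {churn_probability:.3f}")
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.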

In [None]:
df.columns.values

array(['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure',
       'PhoneService', 'MultipleLines', 'OnlineSecurity', 'OnlineBackup',
       'DeviceProtection', 'TechSupport', 'StreamingMovies',
       'PaperlessBilling', 'MonthlyCharges', 'TotalCharges', 'Churn',
       'no_internet_service', 'StreamingTV', 'InternetService_DSL',
       'InternetService_Fiber_optic', 'InternetService_No',
       'Contract_Month_to_month', 'Contract_One_year',
       'Contract_Two_year', 'PaymentMethod_Bank_transfer_automatic',
       'PaymentMethod_Credit_card_automatic',
       'PaymentMethod_Electronic_check', 'PaymentMethod_Mailed_check'],
      dtype=object)
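
The column list matters at inference time: the model expects the same features, in the same order, as during training (everything above except `Churn`). A minimal sketch of aligning an incoming record to that layout follows. The payload and the shortened column list are hypothetical.

```python
import pandas as pd

# A few of the feature names listed above; a real list would hold
# every training column except 'Churn', in training order.
expected_columns = [
    "tenure",
    "MonthlyCharges",
    "Contract_Month_to_month",
    "Contract_One_year",
    "Contract_Two_year",
]

# Hypothetical payload from a UI: keys in arbitrary order, some dummies missing.
payload = {"MonthlyCharges": 70.5, "tenure": 12, "Contract_One_year": 1}

# reindex orders the columns and fills absent one-hot columns with 0.
row = pd.DataFrame([payload]).reindex(columns=expected_columns, fill_value=0)
print(row)
```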