# Model Evaluation

In this notebook we evaluate the models built for the `Telco Customer Churn` dataset. We compare four models: a baseline Logistic Regression, a Logistic Regression with a tuned regularization hyperparameter `C`, a baseline Decision Tree, and a Decision Tree with tuned hyperparameters.

By the end of this notebook, we will identify the strongest model for predicting customer churn for Telco.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_validate, cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.metrics import log_loss, roc_curve, roc_auc_score
import pickle
import warnings
warnings.filterwarnings('ignore')
plt.style.use('ggplot')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:,.2f}'.format)

df = pd.read_csv('../data/encoded_telco_churn.csv')
df

Unnamed: 0,Male,Partner,Dependents,SeniorCitizen,DurationMonths,PhoneService,MultipleLines,NoInternet,DSLInternet,FiberOpticInternet,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,MonthlyContract,AnnualContract,BiannualContract,MonthlyCharges,Churn
0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,0,1,0,0,29.85,0
1,1,0,0,0,34,1,0,0,1,0,1,0,1,0,0,0,0,1,0,56.95,0
2,1,0,0,0,2,1,0,0,1,0,1,1,0,0,0,0,1,0,0,53.85,1
3,1,0,0,0,45,0,0,0,1,0,1,0,1,1,0,0,0,1,0,42.30,0
4,0,0,0,0,2,1,0,0,0,1,0,0,0,0,0,0,1,0,0,70.70,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,1,1,1,0,24,1,1,0,1,0,1,0,1,1,1,1,0,1,0,84.80,0
7039,0,1,1,0,72,1,1,0,0,1,0,1,1,0,1,1,0,1,0,103.20,0
7040,0,1,1,0,11,0,0,0,1,0,1,0,0,0,0,0,1,0,0,29.60,0
7041,1,1,0,1,4,1,1,0,0,1,0,0,0,0,0,0,1,0,0,74.40,1


### Model Builds from `model_development.ipynb`

In [2]:
# Features and target. Note: X still contains the `Unnamed: 0` index column from the CSV.
X = df.drop('Churn', axis=1)
y = df['Churn']
# Hold out 30% of the rows for testing; fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Baseline logistic regression with default settings (C=1.0).
logreg_base = LogisticRegression()
logreg_base.fit(X_train, y_train)
logreg_base_ypred = logreg_base.predict(X_test)
logreg_base_ypred_proba = logreg_base.predict_proba(X_test)

# Tuned logistic regression: smaller C means stronger regularization.
logreg_tune = LogisticRegression(C=1e-2)
logreg_tune.fit(X_train, y_train)
logreg_tune_ypred = logreg_tune.predict(X_test)
logreg_tune_ypred_proba = logreg_tune.predict_proba(X_test)

# Baseline decision tree: no depth limit, so it grows until the leaves are pure.
dtree_base = DecisionTreeClassifier()
dtree_base.fit(X_train, y_train)
dtree_base_ypred = dtree_base.predict(X_test)
dtree_base_ypred_proba = dtree_base.predict_proba(X_test)

# Tuned decision tree: depth and leaf-size limits constrain its growth.
dtree_tune = DecisionTreeClassifier(max_depth=10, min_samples_split=100, min_samples_leaf=75, criterion='entropy')
dtree_tune.fit(X_train, y_train)
dtree_tune_ypred = dtree_tune.predict(X_test)
dtree_tune_ypred_proba = dtree_tune.predict_proba(X_test)

6. Model evaluation
- Explore the evaluation reports for each model.
- Examine feature importance and connect it back to the EDA (what we inferred there and whether the model confirms it).
- Connect the results back to the problem: how does the model address the issue at hand? Try and test the model, and show its prediction on a sample.
- Use visualizations derived from the model, and go beyond basic models.
- Create three recommendations, if relevant, based on what the model predicts, and use them to address the problem.
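
The feature-importance and sample-prediction items in the list above could be approached along these lines. This is a minimal sketch on synthetic data from `make_classification`, not the Telco data; the feature names `f0` to `f5` and the model settings are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded churn features (hypothetical names).
X_arr, y_arr = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=42)
X_demo = pd.DataFrame(X_arr, columns=[f'f{i}' for i in range(6)])

tree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_demo, y_arr)
logreg = LogisticRegression(max_iter=1000).fit(X_demo, y_arr)

# Impurity-based importances from the tree; they sum to 1.
tree_importance = pd.Series(tree.feature_importances_, index=X_demo.columns).sort_values(ascending=False)

# Logistic regression coefficients: the sign gives the direction of the effect,
# the magnitude depends on the scale of each feature.
logreg_coefs = pd.Series(logreg.coef_[0], index=X_demo.columns).sort_values(key=np.abs, ascending=False)

# Predicted probability of the positive class for a single sample row.
sample = X_demo.iloc[[0]]
sample_proba = logreg.predict_proba(sample)[0, 1]

print(tree_importance)
print(logreg_coefs)
print(f'P(class 1) for sample 0: {sample_proba:.2f}')
```

In the notebook itself, `dtree_tune`, `logreg_tune`, and a row of `X_test` would take the place of the synthetic objects.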

## Model Evaluation `Scores`

### Simple Scores

Logistic Regression Base Model

In [3]:
logreg_base_accuracy = accuracy_score(y_test, logreg_base_ypred)
logreg_base_logloss = log_loss(y_test, logreg_base_ypred_proba)
print(logreg_base_accuracy)
print(logreg_base_logloss)

0.8035967818267865
0.41196871806881724


Logistic Regression Tuned C Model

In [4]:
logreg_tune_accuracy = accuracy_score(y_test, logreg_tune_ypred)
logreg_tune_logloss = log_loss(y_test, logreg_tune_ypred_proba)
print(logreg_tune_accuracy)
print(logreg_tune_logloss)

0.8054898248935163
0.41815537267319935


Decision Tree Base Model

In [5]:
dtree_base_accuracy = accuracy_score(y_test, dtree_base_ypred)
dtree_base_logloss = log_loss(y_test, dtree_base_ypred_proba)
print(dtree_base_accuracy)
print(dtree_base_logloss)

0.7231424514907714
9.866192915319965
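
The base decision tree's log loss is far larger than the other models' even though its accuracy is only about eight points lower. A likely explanation (an assumption, not verified in this notebook) is that an unconstrained tree ends in pure leaves, so `predict_proba` returns hard 0/1 probabilities and every misclassified row is penalized at the clipping limit. The toy arrays below are made up to illustrate the effect.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 0, 1, 1])

# Soft probabilities: the last row is misclassified, but not with full confidence.
soft = np.array([[0.8, 0.2], [0.7, 0.3], [0.3, 0.7], [0.6, 0.4]])

# Hard probabilities: same predicted labels, but fully confident on every row.
hard = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

soft_loss = log_loss(y_true, soft)
hard_loss = log_loss(y_true, hard)
print(f'soft: {soft_loss:.3f}  hard: {hard_loss:.3f}')
```

Both sets of predictions have the same accuracy (3 of 4 correct), yet the hard version scores much worse because log loss punishes confident mistakes.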


Decision Tree Tuned Model

In [6]:
dtree_tune_accuracy = accuracy_score(y_test, dtree_tune_ypred)
dtree_tune_logloss = log_loss(y_test, dtree_tune_ypred_proba)
print(dtree_tune_accuracy)
print(dtree_tune_logloss)

0.7931850449597728
0.4767381856865943
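
To compare the four models at a glance, the scores printed above can be collected into one table. The sketch below hard-codes the printed values, rounded to four decimals; in the notebook the `*_accuracy` and `*_logloss` variables could be used directly.

```python
import pandas as pd

# Accuracy and log loss copied from the printed outputs above (rounded to 4 decimals).
scores = pd.DataFrame(
    {
        'accuracy': [0.8036, 0.8055, 0.7231, 0.7932],
        'log_loss': [0.4120, 0.4182, 9.8662, 0.4767],
    },
    index=['logreg_base', 'logreg_tune', 'dtree_base', 'dtree_tune'],
)

# Higher accuracy is better; lower log loss is better.
best_accuracy = scores['accuracy'].idxmax()
best_logloss = scores['log_loss'].idxmin()
print(scores.sort_values('log_loss'))
print(best_accuracy, best_logloss)
```

On these numbers the tuned logistic regression has the highest accuracy and the base logistic regression has the lowest log loss, with only a small gap between the two on either metric.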


### Classification Metrics

Logistic Regression Base Model

In [7]:
print('Logistic Regression Base Model')
print('----------')
logreg_base_report = classification_report(y_test, logreg_base_ypred)
print(logreg_base_report)

Logistic Regression Base Model
----------
              precision    recall  f1-score   support

           0       0.84      0.90      0.87      1539
           1       0.67      0.55      0.61       574

    accuracy                           0.80      2113
   macro avg       0.76      0.73      0.74      2113
weighted avg       0.80      0.80      0.80      2113
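
To see where the precision and recall figures in a report like this come from, they can be recomputed from a confusion matrix. The sketch below uses a small made-up pair of label arrays, not the notebook's test set.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 1 = churn, 0 = no churn.
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision_1 = tp / (tp + fp)  # of the predicted churners, the share that actually churned
recall_1 = tp / (tp + fn)     # of the actual churners, the share that was caught
print(tn, fp, fn, tp)
print(f'precision={precision_1:.2f} recall={recall_1:.2f}')
```

For churn prediction, recall on class 1 is often the figure to watch, since a missed churner (a false negative) is a customer nobody tried to retain.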



Logistic Regression Tuned C Model

In [8]:
print('Logistic Regression Tuned Model')
print('----------')
logreg_tune_report = classification_report(y_test, logreg_tune_ypred)
print(logreg_tune_report)

Logistic Regression Tuned Model
----------
              precision    recall  f1-score   support

           0       0.83      0.92      0.87      1539
           1       0.69      0.51      0.59       574

    accuracy                           0.81      2113
   macro avg       0.76      0.71      0.73      2113
weighted avg       0.80      0.81      0.80      2113



Decision Tree Base Model

In [9]:
print('Decision Tree Base Model')
print('----------')
dtree_base_report = classification_report(y_test, dtree_base_ypred)
print(dtree_base_report)

Decision Tree Base Model
----------
              precision    recall  f1-score   support

           0       0.81      0.81      0.81      1539
           1       0.49      0.48      0.49       574

    accuracy                           0.72      2113
   macro avg       0.65      0.65      0.65      2113
weighted avg       0.72      0.72      0.72      2113



Decision Tree Tuned Model

In [10]:
print('Decision Tree Tuned Model')
print('----------')
dtree_tune_report = classification_report(y_test, dtree_tune_ypred)
print(dtree_tune_report)

Decision Tree Tuned Model
----------
              precision    recall  f1-score   support

           0       0.83      0.90      0.86      1539
           1       0.65      0.51      0.57       574

    accuracy                           0.79      2113
   macro avg       0.74      0.71      0.72      2113
weighted avg       0.78      0.79      0.78      2113
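
Comparing the churn class (label 1) across four printed reports is easier in a single table. One option is `classification_report(..., output_dict=True)`, which returns the same numbers as a nested dict. The sketch below uses made-up predictions for two hypothetical models (`model_a`, `model_b`); in the notebook the `*_ypred` arrays would take their place.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report

# Made-up labels and predictions for illustration only.
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
preds = {
    'model_a': np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0]),
    'model_b': np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0]),
}

rows = {}
for name, y_pred in preds.items():
    report = classification_report(y_true_demo, y_pred, output_dict=True)
    # Keys for integer labels are their string form, so class 1 is '1'.
    rows[name] = {
        'precision_1': report['1']['precision'],
        'recall_1': report['1']['recall'],
        'f1_1': report['1']['f1-score'],
        'accuracy': report['accuracy'],
    }

churn_summary = pd.DataFrame.from_dict(rows, orient='index')
print(churn_summary.round(2))
```

The two toy models have the same accuracy but different recall on class 1, which is the kind of difference a single accuracy score hides.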



### Plots
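
The imports at the top of the notebook include `roc_curve` and `roc_auc_score`, so one natural plot for this section is an ROC comparison. The following is a minimal sketch on synthetic data from `make_classification` (an assumption; it does not use the Telco data), with model settings chosen only for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the churn features.
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

models = {
    'logreg': LogisticRegression(max_iter=1000),
    'dtree': DecisionTreeClassifier(max_depth=5, random_state=42),
}

fig, ax = plt.subplots(figsize=(6, 5))
auc_scores = {}
for name, model in models.items():
    # ROC needs scores for the positive class, i.e. column 1 of predict_proba.
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, proba)
    auc_scores[name] = roc_auc_score(y_te, proba)
    ax.plot(fpr, tpr, label=f'{name} (AUC = {auc_scores[name]:.2f})')

# Diagonal reference line: a classifier that guesses at random.
ax.plot([0, 1], [0, 1], linestyle='--', color='grey', label='chance')
ax.set_xlabel('False positive rate')
ax.set_ylabel('True positive rate')
ax.set_title('ROC curves (synthetic data)')
ax.legend()
```

In the notebook, column 1 of each `*_ypred_proba` array together with `y_test` would replace the synthetic scores.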