
# Import Packages

In [3450]:
import sys
import os
import gdown
import logging
logging.getLogger('matplotlib.font_manager').setLevel(level=logging.CRITICAL)

import warnings
warnings.filterwarnings('ignore')

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

random_state = 1234 # get reproducible trees


# Prepare Data

**Restartnet** is a telecommunications company that has been the market leader in Wakanda since 1990 and was the first to offer high-speed mobile internet that integrates satellite and ground cable.

Over the last 5 years, competition has become fiercer as new competitors have emerged. Many Restartnet customers are moving to those competitors, and the Restartnet CEO is concerned about it.

After digging into the data, the Restartnet CEO realized that the churn rate is high, at 25%.



As **CEO Analysts**, we take the initiative to find which customers are likely to churn by building a **customer churn model**, so that we can offer engagement packages to the right customers.

After we provide the list of customers, we calculate the impact for the company.

With these assumptions:

* For each customer who churns, we lose $500.

* The engagement program costs $100 per customer, and

* All customers who receive the engagement program will stay.


The **data** is provided at this [link](https://drive.google.com/file/d/1jAFn03vk055D9gZrrzM70_cdPyUDg-bv/view) and consists of a sample of **unique customers** who bought an internet package from Restartnet between 2010 and 2020. The customer data consists of demographic data and a summary of each customer's transactions with Restartnet. The detailed data definitions are listed below.

Data Definition:

| Field           | Description                                     |
|-----------------|-------------------------------------------------|
| customerID      | Customer's unique identifier                     |
| gender          | Whether the customer is a male or a female      |
| SeniorCitizen   | Whether the customer is a senior citizen or not |
| Partner         | Whether the customer has a partner or not       |
| Dependents      | Whether the customer has dependents or not      |
| tenure          | Number of months the customer has stayed        |
| PhoneService    | Whether the customer has a phone service or not |
| MultipleLines   | Whether the customer has multiple lines or not  |
| InternetService | Customer's internet service provider            |
| OnlineSecurity  | Whether the customer has online security or not |
| OnlineBackup    | Whether the customer has online backup or not   |
| DeviceProtection| Whether the customer has device protection or not |
| TechSupport     | Whether the customer has tech support or not    |
| StreamingTV     | Whether the customer has streaming TV or not    |
| StreamingMovies | Whether the customer has streaming movies or not|
| Contract        | The contract term of the customer               |
| PaperlessBilling| Whether the customer has paperless billing or not |
| PaymentMethod   | The customer's payment method                   |
| MonthlyCharges  | The amount charged to the customer monthly      |
| TotalCharges    | The total amount charged to the customer        |
| Churn           | Whether the customer churned or not              |



In [3451]:
# Download Data
gdrive_url = "https://drive.google.com/file/d/1jAFn03vk055D9gZrrzM70_cdPyUDg-bv/view"
file_name = 'Customer Churn - data.csv'
gdown.download(gdrive_url, file_name, fuzzy=True)


In [3452]:
df = pd.read_csv('Customer Churn - data.csv')

In [3453]:
numeric_features = ['tenure', 'MonthlyCharges', 'TotalCharges']
categorical_features = ['gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod']

features = numeric_features + categorical_features
target = 'Churn'

print("numeric_features : ", numeric_features)
print("categorical_features : ", categorical_features)
print("features: ", features)
print("target: ", target)
print("columns used: ", features + [target])


numeric_features :  ['tenure', 'MonthlyCharges', 'TotalCharges']
categorical_features :  ['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod']
features:  ['tenure', 'MonthlyCharges', 'TotalCharges', 'gender', 'SeniorCitizen', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod']
target:  Churn
columns used:  ['tenure', 'MonthlyCharges', 'TotalCharges', 'gender', 'SeniorCitizen', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'Churn']


In [3454]:
df = df[ features + [target] ]


In [3455]:
# Fill missing values in TotalCharges with 0
df['TotalCharges'] = df['TotalCharges'].fillna(0)

# Handle categorical data
## One-hot encode each categorical feature into separate indicator columns
df = pd.get_dummies(df, columns = categorical_features)
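To illustrate what `pd.get_dummies` does here, the following minimal sketch applies it to a tiny made-up frame (the rows and the name `toy` are illustrative, not taken from the dataset). Each category value becomes its own indicator column, which is how the 19 original features expand into 46 feature columns (47 with `Churn`).

```python
import pandas as pd

# Toy data: hypothetical rows, only to show the shape of the transformation
toy = pd.DataFrame({
    "tenure": [1, 34, 2],
    "Contract": ["Month-to-month", "One year", "Month-to-month"],
})

# One indicator column is created per distinct value of each listed column
encoded = pd.get_dummies(toy, columns=["Contract"])

# Recent pandas versions return booleans; cast to int if 0/1 is preferred
encoded_int = encoded.astype(int)
print(list(encoded_int.columns))
print(encoded_int.values.tolist())
```

Tree-based models accept either the boolean or the integer form, so the cast is optional.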


In [3456]:
# transform target to 1 if Yes, 0 if No
df[target] = (df[target] == 'Yes').astype(int)

In [3457]:
#df.replace({False:0, True:1}, inplace=True)

In [3458]:
df

Unnamed: 0,tenure,MonthlyCharges,TotalCharges,Churn,gender_Female,gender_Male,SeniorCitizen_0,SeniorCitizen_1,Partner_No,Partner_Yes,...,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaperlessBilling_No,PaperlessBilling_Yes,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
0,1,29.85,29.85,0,True,False,True,False,False,True,...,False,True,False,False,False,True,False,False,True,False
1,34,56.95,1889.50,0,False,True,True,False,True,False,...,False,False,True,False,True,False,False,False,False,True
2,2,53.85,108.15,1,False,True,True,False,True,False,...,False,True,False,False,False,True,False,False,False,True
3,45,42.30,1840.75,0,False,True,True,False,True,False,...,False,False,True,False,True,False,True,False,False,False
4,2,70.70,151.65,1,True,False,True,False,True,False,...,False,True,False,False,False,True,False,False,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,24,84.80,1990.50,0,False,True,True,False,False,True,...,True,False,True,False,False,True,False,False,False,True
7039,72,103.20,7362.90,0,True,False,True,False,False,True,...,True,False,True,False,False,True,False,True,False,False
7040,11,29.60,346.45,0,True,False,True,False,False,True,...,False,True,False,False,False,True,False,False,True,False
7041,4,74.40,306.60,1,False,True,False,True,False,True,...,False,True,False,False,False,True,False,False,False,True


In [3459]:
# Split data
## Assume df_test is new, unseen data
df_train, df_test = train_test_split(df, test_size=0.33, random_state=random_state)

In [3460]:
df_train.head()

Unnamed: 0,tenure,MonthlyCharges,TotalCharges,Churn,gender_Female,gender_Male,SeniorCitizen_0,SeniorCitizen_1,Partner_No,Partner_Yes,...,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaperlessBilling_No,PaperlessBilling_Yes,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
2632,55,64.75,3617.1,0,True,False,True,False,True,False,...,True,False,False,True,False,True,False,False,False,True
1210,17,69.0,1149.65,1,False,True,True,False,False,True,...,False,True,False,False,False,True,False,False,True,False
5018,72,19.7,1379.8,0,True,False,True,False,False,True,...,False,False,False,True,True,False,False,True,False,False
4891,4,65.6,250.1,0,False,True,True,False,False,True,...,False,True,False,False,True,False,False,False,True,False
3794,8,54.75,445.85,0,False,True,True,False,False,True,...,False,True,False,False,False,True,False,False,False,True


In [3461]:
df_train.columns

Index(['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn', 'gender_Female',
       'gender_Male', 'SeniorCitizen_0', 'SeniorCitizen_1', 'Partner_No',
       'Partner_Yes', 'Dependents_No', 'Dependents_Yes', 'PhoneService_No',
       'PhoneService_Yes', 'MultipleLines_No',
       'MultipleLines_No phone service', 'MultipleLines_Yes',
       'InternetService_DSL', 'InternetService_Fiber optic',
       'InternetService_No', 'OnlineSecurity_No',
       'OnlineSecurity_No internet service', 'OnlineSecurity_Yes',
       'OnlineBackup_No', 'OnlineBackup_No internet service',
       'OnlineBackup_Yes', 'DeviceProtection_No',
       'DeviceProtection_No internet service', 'DeviceProtection_Yes',
       'TechSupport_No', 'TechSupport_No internet service', 'TechSupport_Yes',
       'StreamingTV_No', 'StreamingTV_No internet service', 'StreamingTV_Yes',
       'StreamingMovies_No', 'StreamingMovies_No internet service',
       'StreamingMovies_Yes', 'Contract_Month-to-month', 'Contract_One year',

In [3462]:
features = list(df_test.columns)
features.remove(target)

features

['tenure',
 'MonthlyCharges',
 'TotalCharges',
 'gender_Female',
 'gender_Male',
 'SeniorCitizen_0',
 'SeniorCitizen_1',
 'Partner_No',
 'Partner_Yes',
 'Dependents_No',
 'Dependents_Yes',
 'PhoneService_No',
 'PhoneService_Yes',
 'MultipleLines_No',
 'MultipleLines_No phone service',
 'MultipleLines_Yes',
 'InternetService_DSL',
 'InternetService_Fiber optic',
 'InternetService_No',
 'OnlineSecurity_No',
 'OnlineSecurity_No internet service',
 'OnlineSecurity_Yes',
 'OnlineBackup_No',
 'OnlineBackup_No internet service',
 'OnlineBackup_Yes',
 'DeviceProtection_No',
 'DeviceProtection_No internet service',
 'DeviceProtection_Yes',
 'TechSupport_No',
 'TechSupport_No internet service',
 'TechSupport_Yes',
 'StreamingTV_No',
 'StreamingTV_No internet service',
 'StreamingTV_Yes',
 'StreamingMovies_No',
 'StreamingMovies_No internet service',
 'StreamingMovies_Yes',
 'Contract_Month-to-month',
 'Contract_One year',
 'Contract_Two year',
 'PaperlessBilling_No',
 'PaperlessBilling_Yes',
 'P

# Evaluation metrics comparison from several models

## Train & Evaluate Decision Tree Classifier

with specs
```
max depth = 7
class weight = balanced
random state = 1234
```
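As a rough illustration of what `class weight = balanced` means, scikit-learn documents the heuristic as `n_samples / (n_classes * count_of_class)`. The sketch below reproduces that formula in plain Python on a made-up label list (the helper name `balanced_class_weights` and the labels are assumptions for illustration). With one churner per three non-churners, the churn class receives the larger weight.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weights per class following n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical labels: 3 customers stay (0), 1 churns (1)
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)
```

The rarer class is weighted up so that mistakes on churners count more during training, which tends to raise recall at the expense of precision.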

In [3463]:
df_train.shape

(4718, 47)

In [3464]:
df_test.shape

(2325, 47)

In [3465]:
# import model

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# initiate model
model_tree = DecisionTreeClassifier(max_depth=7, random_state=random_state, class_weight='balanced')

# Train model
model_tree.fit(df_train[features].values, df_train[target].values)


In [3466]:
# Evaluate Precision, Recall, and F1 using Test Data
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix



In [3468]:
target

'Churn'

In [3469]:
prediction_tree = model_tree.predict(df_test[features].values) # evaluate on df_test, not the full df
label_tree = df_test[target].values # true labels from df_test

In [3470]:
prediction_tree

array([0, 0, 0, ..., 0, 0, 0])

In [3471]:
label_tree

array([0, 0, 0, ..., 0, 0, 0])

In [3472]:
print("accuracy_score \t:" ,accuracy_score(label_tree, prediction_tree))
print("precision_score\t:" ,precision_score(label_tree, prediction_tree))
print("recall_score \t:" ,recall_score(label_tree, prediction_tree))
print("f1_score \t:" ,f1_score(label_tree, prediction_tree))

print("confusion_matrix:")
confusion_matrix(label_tree, prediction_tree)

accuracy_score 	: 0.7118279569892473
precision_score	: 0.46303901437371664
recall_score 	: 0.754180602006689
f1_score 	: 0.573791348600509
confusion_matrix:


array([[1204,  523],
       [ 147,  451]])
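The four scores can be recomputed by hand from the confusion matrix printed above, which is a useful sanity check. This is a plain-Python sketch with illustrative variable names; it relies on scikit-learn's layout, where rows are actual labels and columns are predicted labels.

```python
# Entries copied from the decision tree confusion matrix: [[TN, FP], [FN, TP]]
tn, fp = 1204, 523
fn, tp = 147, 451

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of customers flagged as churn, how many really churn
recall = tp / (tp + fn)      # of real churners, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
```

The rounded values agree with the scores printed by `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`.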

## Train & Evaluate Random Forest

with specs
```
n estimators = 10
max_depth = 3
random_state=random_state
class_weight = 'balanced'
```


In [3474]:
target

'Churn'

In [3475]:
df.head()

Unnamed: 0,tenure,MonthlyCharges,TotalCharges,Churn,gender_Female,gender_Male,SeniorCitizen_0,SeniorCitizen_1,Partner_No,Partner_Yes,...,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaperlessBilling_No,PaperlessBilling_Yes,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
0,1,29.85,29.85,0,True,False,True,False,False,True,...,False,True,False,False,False,True,False,False,True,False
1,34,56.95,1889.5,0,False,True,True,False,True,False,...,False,False,True,False,True,False,False,False,False,True
2,2,53.85,108.15,1,False,True,True,False,True,False,...,False,True,False,False,False,True,False,False,False,True
3,45,42.3,1840.75,0,False,True,True,False,True,False,...,False,False,True,False,True,False,True,False,False,False
4,2,70.7,151.65,1,True,False,True,False,True,False,...,False,True,False,False,False,True,False,False,True,False


In [3476]:
# import model

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# initiate model
model_rf = RandomForestClassifier(max_depth=3, random_state=random_state, class_weight='balanced', n_estimators=10)

# Train model
model_rf.fit(df_train[features].values, df_train[target].values)

In [3477]:
prediction_rf = model_rf.predict(df_test[features].values) # evaluate on df_test, not the full df
label_rf = df_test[target].values # true labels from df_test

In [3478]:
# Evaluate Precision, Recall, and F1 using Test Data
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

print("precision \t:" ,round(precision_score(label_rf, prediction_rf), 4))
print("recall \t\t:" ,round(recall_score(label_rf, prediction_rf), 4))
print("f1_score \t:" ,round(f1_score(label_rf, prediction_rf), 4))


precision 	: 0.4733
recall 		: 0.801
f1_score 	: 0.595


## Train & Evaluate Your own model

Feel free to pick any classification model in https://scikit-learn.org/stable/supervised_learning.html

However, you are required to reach an f1_score higher than `0.61`.


In [3479]:
# initiate model
model_rf_improvement = RandomForestClassifier(max_depth=8, random_state=random_state, class_weight='balanced', n_estimators=100)

# Train model
model_rf_improvement.fit(df_train[features].values, df_train[target].values)



In [3480]:
# Evaluate Precision, Recall, and F1 using Test Data

prediction_rfi = model_rf_improvement.predict(df_test[features].values) # evaluate on df_test, not the full df
label_rfi = df_test[target].values # true labels from df_test

print('Results after changing max_depth=8 and n_estimators=100 \t')
print("precision \t:" ,round(precision_score(label_rfi, prediction_rfi), 4))
print("recall \t\t:" ,round(recall_score(label_rfi, prediction_rfi), 4))
print("f1_score \t:" ,round(f1_score(label_rfi, prediction_rfi), 4))



Results after changing max_depth=8 and n_estimators=100 	
precision 	: 0.5169
recall 		: 0.7659
f1_score 	: 0.6173


In [3481]:
import xgboost as xgb

# Initialize the XGBoost model
model_xgb = xgb.XGBClassifier(
    max_depth=5,
    learning_rate=0.1,
    n_estimators=100,
    objective='binary:logistic',
    booster='gbtree',
    reg_alpha=0.1,
    reg_lambda=0.1,
    min_child_weight=2,
    gamma=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=random_state,
    scale_pos_weight=1,
)

# Train the XGBoost model
model_xgb.fit(df_train[features], df_train[target])

# Predict with the XGBoost model
prediction_xgb = model_xgb.predict(df_test[features])
label_xgb = df_test[target].values

# Evaluate the XGBoost model
from sklearn.metrics import precision_score, recall_score, f1_score

print('Using XGBoost \t')
print("precision:", round(precision_score(label_xgb, prediction_xgb), 4))
print("recall:", round(recall_score(label_xgb, prediction_xgb), 4))
print("f1_score:", round(f1_score(label_xgb, prediction_xgb), 4))


Using XGBoost 	
precision: 0.6228
recall: 0.5217
f1_score: 0.5678


In [3482]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# Split the data into training and test sets
# Note: this creates a new 80/20 split with random_state=42, so the scores below
# are not directly comparable with the models above, which use df_train/df_test
X_train, X_test, y_train, y_test = train_test_split(df[features], df[target], test_size=0.2, random_state=42)

# Initialize the Gradient Boosting Classifier
model_gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

# Train the model
model_gb.fit(X_train, y_train)

# Predict on the test set
predictions = model_gb.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, predictions)
precision = precision_score(y_test, predictions)
recall = recall_score(y_test, predictions)
f1 = f1_score(y_test, predictions)

print('Using Gradient Boosting \t')
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)


Using Gradient Boosting 	
Accuracy: 0.8090844570617459
Precision: 0.6721854304635762
Recall: 0.5442359249329759
F1 Score: 0.6014814814814815
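Because the gradient boosting cell creates its own 80/20 split, its scores are measured on different rows than the other models. One way to keep comparisons on the same held-out data is a small helper that every model goes through. The sketch below is only an illustration: `evaluate_model` is a hypothetical name, and the toy arrays stand in for the notebook's training and test features.

```python
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier

def evaluate_model(model, X_train, y_train, X_test, y_test):
    """Fit on the training split and score on the shared test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "precision": round(precision_score(y_test, pred), 4),
        "recall": round(recall_score(y_test, pred), 4),
        "f1_score": round(f1_score(y_test, pred), 4),
    }

# Toy, perfectly separable data (hypothetical), just to show the call pattern
X_train, y_train = [[0], [1], [0], [1]], [0, 1, 0, 1]
X_test, y_test = [[0], [1]], [0, 1]

scores = evaluate_model(DecisionTreeClassifier(random_state=1234),
                        X_train, y_train, X_test, y_test)
print(scores)
```

In the notebook, the same helper could be called with `df_train[features].values`, `df_train[target].values`, and the matching `df_test` arrays for each candidate model.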


# Business impact comparison from several models

Recall the assumptions:

* For each customer who churns, we lose $500.

* The engagement program costs $100 per customer, and

* All customers who receive the engagement program will stay.

----
We want to compare the business impact of:
* Case 1: no engagement program
* Case 2: sending the engagement program to all users
* Case 3: sending the engagement program based on the decision tree above (`model_tree`)
* Case 4: sending the engagement program based on the random forest above (`model_rf`)
* Case 5: sending the engagement program based on the best model above (`model_rf_improvement`)

----

First, we count the total customers and the churned customers in the test dataset.

In [3483]:
total_customer = len(df_test)
real_churn = len(df_test.loc[df_test[target] == 1])

print("Total customer \t:", total_customer)
print("Total churn \t:", real_churn)

Total customer 	: 2325
Total churn 	: 598


Store the assumptions in variables.

In [3484]:
churn_value_lost_per_customer = 500
engagement_cost_per_customer = 100

print("Churn Value Lost per customer\t:", churn_value_lost_per_customer)
print("Engagement Cost per customer\t:", engagement_cost_per_customer)

Churn Value Lost per customer	: 500
Engagement Cost per customer	: 100


## Case 1: if no engagement program

In [3485]:
print("CASE 1: If no engagement program")

value_lost_case1 = real_churn * churn_value_lost_per_customer
engagement_cost_case1 = 0 # because no engagement
total_cost_case1 = value_lost_case1 + engagement_cost_case1
print("\t Value Lost \t: $", value_lost_case1)
print("\t Engagement cost: $", engagement_cost_case1)
print("\t Total cost \t: $",  total_cost_case1)


CASE 1: If no engagement program
	 Value Lost 	: $ 299000
	 Engagement cost: $ 0
	 Total cost 	: $ 299000


## Case 2: if we send the engagement program to all users

In [3486]:
print("Case 2: if we send the engagement program to all users")

value_lost_case2 = 0 # because no customer is lost
engagement_cost_case2 = total_customer * engagement_cost_per_customer
total_cost_case2 = value_lost_case2 + engagement_cost_case2
print("\t Value Lost \t: $", value_lost_case2)
print("\t Engagement cost: $", engagement_cost_case2)
print("\t Total cost \t: $",  total_cost_case2)


Case 2: if we send the engagement program to all users
	 Value Lost 	: $ 0
	 Engagement cost: $ 232500
	 Total cost 	: $ 232500


It looks like sending the engagement program to all customers is more beneficial for the company than doing nothing (232500 < 299000).

But let's see how the models perform.
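The Case 1 versus Case 2 comparison comes down to a break-even churn rate: under the stated assumptions, engaging everyone costs $100 per customer while doing nothing costs $500 per churner, so blanket engagement is cheaper whenever the churn rate exceeds 100/500 = 20%. A short sketch of that arithmetic, using the test-set counts printed earlier (variable names are illustrative):

```python
churn_loss = 500        # assumed loss per churned customer
engagement_cost = 100   # assumed cost per engaged customer
total_customer = 2325   # test-set size from the notebook output
real_churn = 598        # churners in the test set

break_even_rate = engagement_cost / churn_loss
observed_rate = real_churn / total_customer

cost_no_program = real_churn * churn_loss
cost_engage_all = total_customer * engagement_cost

print(f"break-even churn rate: {break_even_rate:.2%}")
print(f"observed churn rate:   {observed_rate:.2%}")
print("engage-all is cheaper:", cost_engage_all < cost_no_program)
```

A model only adds value if it beats the cheaper of these two baselines, which here is the engage-all cost.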

## Case 3: if we send the engagement program based on the decision tree above (`model_tree`)

Tip: you need to find
* how many customers are predicted to churn (`predict_churn`)
* how many customers actually churn **but** are predicted to stay (`real_churn_predict_stay`)

Hint: you can use the confusion matrix
```python
confusion_matrix(y_true_test, y_pred_test)
```
Explore the indexing of `confusion_matrix`, for example `[0,0]` to get a single entry of the confusion matrix
```python
confusion_matrix(y_true_test, y_pred_test)[0,0]
```

Note that scikit-learn puts actual labels on rows and predicted labels on columns, so for labels 0/1 the entries are `[[TN, FP], [FN, TP]]`.

As a reminder, this is the content of a confusion matrix
![Confusion matrix](https://miro.medium.com/v2/resize:fit:974/1*H_XIN0mknyo0Maw4pKdQhw.png)
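Putting the tips together, a minimal sketch of the cost calculation might look like the following. The helper name `campaign_cost` is hypothetical, and the matrix is typed in as a nested list so the example runs without scikit-learn; in the notebook the same indexing applies to the array returned by `confusion_matrix`.

```python
def campaign_cost(cm, churn_loss=500, engagement_cost=100):
    """Cost of engaging only customers predicted to churn.

    cm follows scikit-learn's layout: [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = cm
    predict_churn = fp + tp            # everyone who receives the program
    real_churn_predict_stay = fn       # churners the model missed
    value_lost = real_churn_predict_stay * churn_loss
    program_cost = predict_churn * engagement_cost
    return value_lost, program_cost, value_lost + program_cost

# Decision tree confusion matrix from the evaluation section
value_lost, program_cost, total = campaign_cost([[1204, 523], [147, 451]])
print(value_lost, program_cost, total)
```

Using one helper for every model avoids mixing entries from different confusion matrices when the calculation is repeated per case.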

In [3487]:
confusion_matrix(label_tree, prediction_tree)

array([[1204,  523],
       [ 147,  451]])

In [3488]:
print("CASE 3: if we send the engagement program based on the decision tree above (model_tree)")
cm_tree = confusion_matrix(label_tree, prediction_tree) # rows: actual, columns: predicted
value_lost_case3 = cm_tree[1,0] * churn_value_lost_per_customer # churners predicted to stay
engagement_cost_case3 = (cm_tree[0,1] + cm_tree[1,1]) * engagement_cost_per_customer # everyone predicted to churn
total_cost_case3 = value_lost_case3 + engagement_cost_case3

print("\t Value Lost \t: $", value_lost_case3)
print("\t Engagement cost: $", engagement_cost_case3)
print("\t Total cost \t: $",  total_cost_case3)


CASE 3: if we send the engagement program based on the decision tree above (model_tree)
	 Value Lost 	: $ 73500
	 Engagement cost: $ 97400
	 Total cost 	: $ 170900


## Case 4: if we send the engagement program based on the random forest above (`model_rf`)

In [3489]:
print("Case 4: if we send the engagement program based on the random forest above (model_rf)")

cm_rf = confusion_matrix(label_rf, prediction_rf) # rows: actual, columns: predicted
value_lost_case4 = cm_rf[1,0] * churn_value_lost_per_customer # churners predicted to stay
engagement_cost_case4 = (cm_rf[0,1] + cm_rf[1,1]) * engagement_cost_per_customer # everyone predicted to churn
total_cost_case4 = value_lost_case4 + engagement_cost_case4


print("\t Value Lost \t: $", value_lost_case4)
print("\t Engagement cost: $", engagement_cost_case4)
print("\t Total cost \t: $",  total_cost_case4)


Case 4: if we send the engagement program based on the random forest above (model_rf)
	 Value Lost 	: $ 59500
	 Engagement cost: $ 101200
	 Total cost 	: $ 160700


## Case 5: if we send the engagement program based on the best model above (`model_rf_improvement`)

In [3490]:
print("Case 5: if we send the engagement program based on the best model above (model_rf_improvement)")

# Use the improved random forest's own confusion matrix for every term
cm_rfi = confusion_matrix(label_rfi, prediction_rfi) # rows: actual, columns: predicted
value_lost_case5 = cm_rfi[1,0] * churn_value_lost_per_customer # churners predicted to stay
engagement_cost_case5 = (cm_rfi[0,1] + cm_rfi[1,1]) * engagement_cost_per_customer # everyone predicted to churn
total_cost_case5 = value_lost_case5 + engagement_cost_case5

print("\t Value Lost \t: $", value_lost_case5)
print("\t Engagement cost: $", engagement_cost_case5)
print("\t Total cost \t: $",  total_cost_case5)


Case 5: if we send the engagement program based on the best model above (model_rf_improvement)
	 Value Lost 	: $ 70000
	 Engagement cost: $ 88600
	 Total cost 	: $ 158600
