### **Importing Related Notebooks** 

In [2]:
import import_ipynb
import Model_Benchmarking

gb = Model_Benchmarking.gb
X_trainval = Model_Benchmarking.X_trainval
y_trainval = Model_Benchmarking.y_trainval
X_test = Model_Benchmarking.X_test
y_test = Model_Benchmarking.y_test
skfold = Model_Benchmarking.skfold

### **Imbalanced Dataset**

As previously known, the dataset as a whole has an unequal proportion of target classes where in this case study, the number of customers who churn is far less than subscribers who are still subscribed. This can cause several disadvantages, including model performance that tends to be biased, poor generalization, non-representative measurement evaluation results, and wrong interpretation of existing problems. This happens because the amount of data for the minority class, in this case churn customers, is insufficient when the training process is carried out on the algorithm model used. As a result, the algorithm model will produce poor performance in detecting churn customers even though overall it has fairly good accuracy. These deficiencies can be anticipated by manipulating the data so that the number of positive and negative classes is equal.

##### **Imbalanced Cross Validation Score**

The strategy that can be done is to use several class that are already available by **imblearn**, including `RandomOverSampler()`, `SMOTE()`, `RandomUnderSampler()`, and ` NearMiss()`. Each class has its own advantages and disadvantages in handling imbalanced data cases so a cross validation score process will be carried out to find the average performance of each class against the dataset used for the training process and validation, namely `X_trainval` and `y_trainval`. Then, a testing process will be carried out on dataset `X_test` and `y_test`.

This will be easier by creating an artificial class that is capable of carrying out the process. This class consists of three methods which have their respective designations, namely `imbalance_validation()`, `imbalance_test_score()`, and `imbalance_classification_report()`.

In [3]:
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler, NearMiss
from sklearn.metrics import fbeta_score, average_precision_score, precision_score, recall_score, classification_report

class imbalance_val_score:
    def __init__(self,estimator):
        self.estimator = estimator
        self.methods = ['1','2','3','4','5']
    
    def imbalance_validation(self,X_trainval,y_trainval):
        ap_scores, f2_scores, pr_scores, re_scores = [], [], [], []
        result_scores = [ap_scores, f2_scores, pr_scores, re_scores]
        
        for method in self.methods:
            ap_method, f2_method, pr_method, re_method = [], [], [], []
            method_scores = [ap_method, f2_method, pr_method, re_method]
            
            for train, validation in skfold.split(X_trainval,y_trainval):
                X_train = X_trainval.iloc[train]
                y_train = y_trainval.iloc[train]
                X_val = X_trainval.iloc[validation]
                y_val = y_trainval.iloc[validation]
                scores = (self.method(
                    estimator=self.estimator,
                    X_train=X_train,
                    y_train=y_train,
                    X_test=X_val,
                    y_test=y_val,
                    method=method
                ).method_score())
                
                for i in zip(method_scores,scores):
                    i[0].append(i[1])
            
            for i in zip(result_scores,method_scores):
                i[0].append(i[1])
        
        return ap_scores, f2_scores, pr_scores, re_scores
    
    def imbalance_test_score(self,X_trainval,y_trainval,X_test,y_test):
        ap_scores, f2_scores, pr_scores, re_scores = [], [], [], []
        result_scores = [ap_scores, f2_scores, pr_scores, re_scores]
        
        for method in self.methods:
            scores = (self.method(
                estimator=self.estimator,
                X_train=X_trainval,
                y_train=y_trainval,
                X_test=X_test,
                y_test=y_test,
                method=method
            ).method_score())
            
            for i in zip(result_scores,scores):
                i[0].append(i[1])
        
        return ap_scores, f2_scores, pr_scores, re_scores
    
    def imbalance_classification_report(self,X_trainval,y_trainval,X_test,y_test):
        reports = []

        for method in self.methods:
            reports.append(self.method(
                estimator=self.estimator,
                X_train=X_trainval,
                y_train=y_trainval,
                X_test=X_test,
                y_test=y_test,
                method=method
            ).method_report())
        
        return reports

    class method:
        def __init__(self,estimator,X_train,y_train,X_test,y_test,method):
            self.X_train = X_train
            self.y_train = y_train
            self.X_test = X_test
            self.y_test = y_test
            self.estimator = estimator 
            self.method = method

        def method_score(self):
            X_resampled, y_resampled = self.selection(
                X_train=self.X_train,
                y_train=self.y_train,
                method=self.method
            ).get_samples()

            scores = (self.calculate(
                estimator=self.estimator,
                X_train=X_resampled,
                y_train=y_resampled
            ).get_results(
                X_test=X_test,
                y_test=y_test
            ))

            return scores
        
        def method_report(self):
            X_resampled, y_resampled = self.selection(
                X_train=self.X_train,
                y_train=self.y_train,
                method=self.method
            ).get_samples()

            report = self.calculate(
                estimator=self.estimator,
                X_train=X_resampled,
                y_train=y_resampled
            ).get_report(
                X_test=X_test,
                y_test=y_test
            )

            return report

        class selection:
            def __init__(self,X_train,y_train,method):
                self.X_train = X_train
                self.y_train = y_train
                self.method = method

            def get_samples(self):
                if self.method in ['1','2','3']:
                    method_dict = {
                        '1':RandomOverSampler,
                        '2':RandomUnderSampler,
                        '3':SMOTE
                    }

                    X_resampled, y_resampled = method_dict.setdefault(self.method)\
                        (random_state=1995).fit_resample(
                              X=self.X_train, 
                              y=self.y_train
                        )

                elif self.method == '4':
                    X_resampled, y_resampled = NearMiss().fit_resample(
                        X=self.X_train,
                        y=self.y_train
                    )

                else:
                    X_resampled = self.X_train
                    y_resampled = self.y_train

                return X_resampled, y_resampled

        class calculate:
            def __init__(self,estimator,X_train,y_train):
                self.estimator = estimator.fit(
                    X=X_train,
                    y=y_train
                )
            
            def get_results(self,X_test,y_test):
                ap_score = average_precision_score(
                    y_true=y_test,
                    y_score=self.estimator.predict_proba(X_test)[:,1]
                )

                f2_score = fbeta_score(
                    y_true=y_test,
                    y_pred=self.estimator.predict(X_test),
                    beta=2
                )

                pr_score = precision_score(
                    y_true=y_test,
                    y_pred=self.estimator.predict(X_test)
                )
                re_score = recall_score(
                    y_true=y_test,
                    y_pred=self.estimator.predict(X_test)
                )
                                    
                return [ap_score,f2_score,pr_score,re_score]
            
            def get_report(self,X_test,y_test):
                return classification_report(
                    y_true=y_test,
                    y_pred=self.estimator.predict(X_test)
                )

First of all, we will process cross validation score using method `imbalance_validation()` and display it in dataframe using training and validation data from `X_trainval` and `y_trainval`.

In [4]:
import pandas as pd

estimator = gb
sampling_score = imbalance_val_score(estimator).imbalance_validation(
    X_trainval=X_trainval,
    y_trainval=y_trainval
)

ap_scores, f2_scores, pr_scores, re_scores = [], [], [], []

for i,var in enumerate([ap_scores,f2_scores,pr_scores,re_scores]):
    for val in sampling_score[i]:
        var.append(val)

sampling_comparison = pd.DataFrame(data={
    ('Over Sampling','AP Score'):ap_scores[0],
    ('Over Sampling','F2 Score'):f2_scores[0],
    ('Over Sampling','Precision'):pr_scores[0],
    ('Over Sampling','Recall'):re_scores[0],
    ('Under Sampling','AP Score'):ap_scores[1],
    ('Under Sampling','F2 Score'):f2_scores[1],
    ('Under Sampling','Precision'):pr_scores[1],
    ('Under Sampling','Recall'):re_scores[1],
    ('SMOTE','AP Score'):ap_scores[2],
    ('SMOTE','F2 Score'):f2_scores[2],
    ('SMOTE','Precision'):pr_scores[2],
    ('SMOTE','Recall'):re_scores[2],
    ('Near Miss','AP Score'):ap_scores[3],
    ('Near Miss','F2 Score'):f2_scores[3],
    ('Near Miss','Precision'):pr_scores[3],
    ('Near Miss','Recall'):re_scores[3],
    ('Normal Sampling','AP Score'):ap_scores[4],
    ('Normal Sampling','F2 Score'):f2_scores[4],
    ('Normal Sampling','Precision'):pr_scores[4],
    ('Normal Sampling','Recall'):re_scores[4]
})

for function in [lambda x: x.mean(),lambda x: x.std()]:
    sampling_comparison.loc[len(sampling_comparison)] = sampling_comparison.apply(
        func=function,
        axis=0
    )
    
sampling_comparison = sampling_comparison.rename(index={
    0:'Fold 1',
    1:'Fold 2',
    2:'Fold 3',
    3:'Fold 4',
    4:'Fold 5',
    5:'Average',
    6:'STD'
})

sampling_comparison.apply(
    func=lambda x: round(
        number=x*100,
        ndigits=2
    )
).transpose()

Unnamed: 0,Unnamed: 1,Fold 1,Fold 2,Fold 3,Fold 4,Fold 5,Average,STD
Over Sampling,AP Score,69.22,69.4,70.43,68.88,70.54,69.69,0.67
Over Sampling,F2 Score,75.75,76.29,75.76,77.83,75.88,76.3,0.79
Over Sampling,Precision,56.48,55.52,54.9,55.69,55.22,55.56,0.53
Over Sampling,Recall,82.81,84.16,83.71,86.43,83.71,84.16,1.21
Under Sampling,AP Score,68.99,68.89,68.13,69.0,67.54,68.51,0.58
Under Sampling,F2 Score,75.4,76.7,76.61,76.8,75.75,76.25,0.57
Under Sampling,Precision,52.53,54.31,54.81,52.46,52.66,53.35,1.0
Under Sampling,Recall,84.62,85.52,85.07,86.88,85.07,85.43,0.78
SMOTE,AP Score,68.78,68.84,68.92,70.16,70.59,69.46,0.76
SMOTE,F2 Score,70.4,72.87,73.89,74.01,72.27,72.69,1.31


It can be seen that the resampling method using SMOTE, over sampling, and under sampling has an Average Precision average result that is relatively equally good, with the best level of stability owned by `RandomUnderSampling()` while the NearMiss produced the worst performance. On the other hand, based on the F2 Score measurement results, the over sampling and under sampling methods produce relatively equally good performance. For normal sampling, the Average Precision is quite good but has a much worse F2 Score because there is no resampling process to balance the amount of data for each target class.

F2 Score measurement can be used as an additional reference in assessing the performance of a resampling method in which metrics combine precision and recall values ​​with more emphasis on recall values. In the case of this study, the recall value indicates how many actual occurrences of churn customers can be predicted by the algorithm model. Therefore, it is very useful when the losses incurred due to false negatives or failure to predict cases of churn customers are relatively greater than the losses incurred from false positives or wrong predictions of churn customers.

##### **Imbalanced Testing**

Next, a testing process will be carried out using data test for all resampling methods tested. This can be done using method `imbalance_test_score()`.

In [5]:
testing_score = imbalance_val_score(estimator).imbalance_test_score(
    X_trainval=X_trainval,
    y_trainval=y_trainval,
    X_test=X_test,
    y_test=y_test
)

testing_comparison = pd.DataFrame(testing_score).apply(
    func=lambda x: round(
        number=x*100,
        ndigits=2
    )
).transpose()

testing_comparison.rename(
    index={
        0:'Over Sampling',
        1:'Under Sampling',
        2:'SMOTE',
        3:'Near Miss',
        4:'Normal Sampling'
    },
    columns={
        0:'AP Score',
        1:'F2 Score',
        2:'Precision',
        3:'Recall'
    }
)

Unnamed: 0,AP Score,F2 Score,Precision,Recall
Over Sampling,69.25,75.37,55.45,82.81
Under Sampling,67.75,76.42,54.34,85.07
SMOTE,69.34,73.23,57.24,78.73
Near Miss,30.31,59.57,34.86,72.4
Normal Sampling,70.24,61.04,69.31,59.28


It can be seen that SMOTE method is able to outperform other methods in terms of Average Precision measurement but has no better performance in terms of F2 Score measurement. This is similar to the results of the previous cross validation score. On the other hand, the over sampling and under sampling methods have relatively the same performance in terms of Average Precision and F2 Score. However, based on the results of the cross validation score, the under sampling method has a better level of stability in predicting the probability of each possible recall value. Based on overall performance, the under sampling method will be used in the process of developing the necessary algorithm model according to the existing business problem.

Next, a classification report for each resampling method will be displayed as additional information using method `imbalance_classification_report()`.

In [6]:
reports = imbalance_val_score(estimator).imbalance_classification_report(
    X_trainval=X_trainval,
    y_trainval=y_trainval,
    X_test=X_test,
    y_test=y_test
)

for report,j in zip(reports,['Over Sampling','Under Sampling','SMOTE','Near Miss','Normal Sampling']):
    print(
        j.center(55,'='),
        '\n\n'+report+'\n'+('=').center(55,'='),
        '\n'
    )


              precision    recall  f1-score   support

           0       0.94      0.80      0.86       723
           1       0.55      0.83      0.66       221

    accuracy                           0.80       944
   macro avg       0.75      0.81      0.76       944
weighted avg       0.85      0.80      0.82       944



              precision    recall  f1-score   support

           0       0.94      0.78      0.86       723
           1       0.54      0.85      0.66       221

    accuracy                           0.80       944
   macro avg       0.74      0.82      0.76       944
weighted avg       0.85      0.80      0.81       944



              precision    recall  f1-score   support

           0       0.93      0.82      0.87       723
           1       0.57      0.79      0.66       221

    accuracy                           0.81       944
   macro avg       0.75      0.80      0.77       944
weighted avg       0.84      0.81      0.82       944



            