### **Importing Related Notebooks** 

In [2]:
import import_ipynb
import Imbalanced_Dataset

estimator = Imbalanced_Dataset.estimator
X_trainval = Imbalanced_Dataset.X_trainval
y_trainval = Imbalanced_Dataset.y_trainval
X_test = Imbalanced_Dataset.X_test
y_test = Imbalanced_Dataset.y_test

### **Hyperparameter Tuning**

A model can often perform better once its hyperparameters are adjusted. The goal of hyperparameter tuning is to find the combination of hyperparameter values that yields the best model performance according to a chosen evaluation metric, such as the recall score.

##### **Best Estimator Searching**

The search can be carried out with `GridSearchCV()` over a predefined set of hyperparameter values. It evaluates every possible combination of those values using cross-validation and then selects the combination with the best mean score. Hyperparameter tuning matters because the selected hyperparameters can significantly affect how well the model learns from the training data and how well it generalizes to unseen data.
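
To make "evaluates every possible combination" concrete, the following is a minimal sketch of an exhaustive search in plain Python. It is not how `GridSearchCV()` is implemented: `exhaustive_search`, `toy_grid`, and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for a mean cross-validated recall.

```python
from itertools import product


def exhaustive_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product() yields the Cartesian product of all candidate value lists
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score; a real search would cross-validate a model here."""
    return (
        1.0
        - abs(params["learning_rate"] - 0.1)
        - 0.01 * abs(params["max_depth"] - 3)
    )


toy_grid = {"learning_rate": [0.001, 0.1, 1], "max_depth": [3, 5, 7]}
best_params, best_score = exhaustive_search(toy_grid, toy_score)
print(best_params, best_score)
```

The cost grows multiplicatively with every hyperparameter added to the grid, which is why the candidate lists are usually kept short.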

Based on the previous findings, the model used is `GradientBoostingClassifier()`, with `RandomUnderSampler()` applied as the resampling step. The hyperparameter space follows the **Scikit-learn** documentation for [`GradientBoostingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html). Recall is the main evaluation metric because it directly measures how many of the customers who actually churned are identified by the model.

In [3]:
from sklearn.model_selection import GridSearchCV
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler

model_pipeline = Pipeline([
    (
        'under_sampler',
        RandomUnderSampler(random_state=1995)
    ),
    (
        'model',
        estimator
    )
])

hyperparam_space = {
    'model__loss':['log_loss','exponential'],
    'model__learning_rate':[0.001,0.1,1],
    'model__n_estimators':[50,100,150],
    'model__max_depth':[3,5,7],
    'model__max_features':['sqrt','log2',None],
    'model__warm_start':[True,False],
    'model__validation_fraction':[0.1,0.2,0.3],
    'model__n_iter_no_change':[5,10,20]
}

grid_search = GridSearchCV(
    estimator=model_pipeline,
    param_grid=hyperparam_space,
    scoring='recall',
    cv=5,
    n_jobs=-1
)

grid_search.fit(
    X=X_trainval,
    y=y_trainval
)
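
As a rough guide to the cost of this search, the sketch below counts the candidates in the grid above. The dictionary is copied from the cell above; `n_candidates`, `n_folds`, and `n_fits` are names introduced here for illustration.

```python
from math import prod

# Same grid as in the cell above
hyperparam_space = {
    'model__loss': ['log_loss', 'exponential'],
    'model__learning_rate': [0.001, 0.1, 1],
    'model__n_estimators': [50, 100, 150],
    'model__max_depth': [3, 5, 7],
    'model__max_features': ['sqrt', 'log2', None],
    'model__warm_start': [True, False],
    'model__validation_fraction': [0.1, 0.2, 0.3],
    'model__n_iter_no_change': [5, 10, 20],
}

# One candidate per element of the Cartesian product of all value lists
n_candidates = prod(len(values) for values in hyperparam_space.values())

# Each candidate is fit once per cross-validation fold (cv=5)
n_folds = 5
n_fits = n_candidates * n_folds

print(f"{n_candidates} candidates, {n_fits} cross-validation fits")
```

With the default `refit=True`, one more fit on the full training data follows the search. Because each candidate is fit on a fresh clone of the pipeline, `warm_start` probably has no effect on a single fit, so dropping it from the grid would likely halve the count; this is worth verifying before changing the search.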

After the hyperparameter tuning process, the next step is to identify the hyperparameter combination that gives the best model performance based on the recall score.

In [4]:
import pandas as pd

grid_results = pd.DataFrame(data=grid_search.cv_results_)
best_result = grid_results[grid_results['rank_test_score']<=1]
display(best_result['params'].values[0])

{'model__learning_rate': 0.001,
 'model__loss': 'log_loss',
 'model__max_depth': 3,
 'model__max_features': None,
 'model__n_estimators': 50,
 'model__n_iter_no_change': 5,
 'model__validation_fraction': 0.2,
 'model__warm_start': True}

The hyperparameter combination above governs how the model learns from the training data, so its performance will differ from that of the model with default hyperparameters.

##### **Best Estimator Testing and Comparison**

Next, both models are evaluated on the test data to measure how well the tuned model predicts new data and to compare it with the model using default hyperparameters.

In [5]:
from sklearn.metrics import average_precision_score, fbeta_score, precision_score, recall_score, classification_report

best_model = grid_search.best_estimator_
# GridSearchCV fits clones, so model_pipeline still holds the untuned estimator
default_model = model_pipeline
ap_scores, f2_scores, pr_scores, re_scores = [], [], [], []

for i, model in enumerate([default_model,best_model]):
    model.fit(
        X=X_trainval,
        y=y_trainval
    )

    # Predict once per model and reuse the results for every metric
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:,1]

    ap_scores.append(average_precision_score(
        y_true=y_test,
        y_score=y_score
    ))

    f2_scores.append(fbeta_score(
        y_true=y_test,
        y_pred=y_pred,
        beta=2
    ))

    pr_scores.append(precision_score(
        y_true=y_test,
        y_pred=y_pred
    ))

    re_scores.append(recall_score(
        y_true=y_test,
        y_pred=y_pred
    ))

    info = 'Tuned Classification Report' if i == 1 else 'Default Classification Report'
    report = classification_report(
        y_true=y_test,
        y_pred=y_pred
    )

    print(
        info.center(55,'='),
        '\n\n'+report+'\n\n',
        ('=').center(55,'='),'\n'
    )


=============Default Classification Report=============

              precision    recall  f1-score   support

           0       0.94      0.78      0.86       723
           1       0.54      0.85      0.66       221

    accuracy                           0.80       944
   macro avg       0.74      0.82      0.76       944
weighted avg       0.85      0.80      0.81       944

==============Tuned Classification Report==============

              precision    recall  f1-score   support

           0       0.97      0.62      0.75       723
           1       0.43      0.93      0.58       221

    accuracy                           0.69       944
   macro avg       0.70      0.77      0.67       944
weighted avg       0.84      0.69      0.71       944

Based on the classification reports above, the model with tuned hyperparameters achieves a higher recall on the churn class (0.93 versus 0.85) than the model with default hyperparameters. On the other metrics the tuned model performs worse: precision on the churn class falls from 0.54 to 0.43 and accuracy from 0.80 to 0.69, a consequence of the precision-recall trade-off. Because recall is the primary metric for this business problem, the tuned model is the one that better serves the business objective.
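
The trade-off can be expressed in approximate customer counts. The sketch below uses only the per-class recall and support values printed in the two reports, and it assumes that class 1 means "churned" and class 0 means "stayed". Because the reports round to two decimals, the counts are estimates rather than the exact confusion-matrix entries; `approximate_errors` and the dictionary keys are names introduced here.

```python
# Per-class recall read from the two classification reports (rounded values)
reports = {
    "Default": {"recall_churn": 0.85, "recall_stay": 0.78},
    "Tuned": {"recall_churn": 0.93, "recall_stay": 0.62},
}

# Support of class 1 (assumed: churned) and class 0 (assumed: stayed)
N_CHURN, N_STAY = 221, 723


def approximate_errors(recall_churn, recall_stay):
    """Return (missed churners, false alarms) implied by the two recalls."""
    missed = round((1 - recall_churn) * N_CHURN)
    false_alarms = round((1 - recall_stay) * N_STAY)
    return missed, false_alarms


for name, recalls in reports.items():
    missed, false_alarms = approximate_errors(**recalls)
    print(f"{name}: ~{missed} missed churners, ~{false_alarms} false alarms")
```

Under these assumptions, tuning roughly halves the number of churners the model misses while adding on the order of a hundred extra false alarms, which is the price paid for optimizing recall alone.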

The following table can be used as an additional reference to see the performance of the two algorithm models.

In [6]:
# One row per model, in the order they were evaluated above
comparison_result = pd.DataFrame(
    data={
        'AP Score':ap_scores,
        'F2 Score':f2_scores,
        'Precision':pr_scores,
        'Recall':re_scores
    },
    index=['Default','Tuned']
)

# Express every score as a percentage with two decimals
(comparison_result*100).round(2)

| | AP Score | F2 Score | Precision | Recall |
|---|---|---|---|---|
| Default | 67.75 | 76.42 | 54.34 | 85.07 |
| Tuned | 54.90 | 75.09 | 42.62 | 92.76 |

All values are percentages.
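
The F2 column can be cross-checked against the Precision and Recall columns with the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below is a plain-Python check, not the `fbeta_score` implementation; `fbeta` and `reported` are names introduced here, and the inputs are the rounded table values, so small differences in the last digit are expected.

```python
def fbeta(precision, recall, beta=2.0):
    """F-beta score from precision and recall; beta > 1 favours recall."""
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall copied from the comparison table (percent)
reported = {
    "Default": {"precision": 54.34, "recall": 85.07},
    "Tuned": {"precision": 42.62, "recall": 92.76},
}

f2_check = {
    name: 100 * fbeta(m["precision"] / 100, m["recall"] / 100)
    for name, m in reported.items()
}

for name, value in f2_check.items():
    print(f"{name}: F2 = {value:.2f}")
```

The two F2 values stay close because the recall gained by the tuned model is almost exactly offset by the precision it loses, even with recall weighted more heavily.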
