# Machine Learning Models for the Prepared Significant-Variables Dataset

**In this notebook I apply five machine learning algorithms and one deep learning algorithm to the initially cleaned dataset, using RandomizedSearchCV and GridSearchCV in pipelines to find the best hyperparameters for the ML algorithms. I also perform some additional dataset preparation: splitting into train and test subsets, encoding categorical values as numbers with pandas get_dummies and OrdinalEncoder, scaling with StandardScaler, balancing the classes with SMOTEENN, and reducing the number of variables with PCA.**

Imports:

In [352]:
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost.sklearn import XGBClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

from sklearn.preprocessing import (StandardScaler, 
                                   OrdinalEncoder, 
                                   MinMaxScaler)

from sklearn.model_selection import (train_test_split, 
                                     GridSearchCV, 
                                     StratifiedKFold, 
                                     RandomizedSearchCV)

from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline as imbpipeline
from sklearn.pipeline import Pipeline
from sklearn.metrics import (classification_report, 
                             roc_auc_score, 
                             make_scorer, 
                             recall_score, 
                             confusion_matrix, 
                             accuracy_score,
                             get_scorer_names)
from sklearn.decomposition import PCA

Loading the dataset:

In [243]:
data_clean = pd.read_pickle("data/data_clear.pkl")

Dividing into predictor variables X and target y ("is_canceled"):

In [245]:
X = data_clean.drop("is_canceled", axis=1)
y = data_clean.is_canceled

Splitting the dataset into training (70%) and test (30%) subsets, stratified by the target:

In [246]:
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    stratify=y,
                                                    random_state=42
                                                   )

Shapes after the split:

In [247]:
X_train.shape

(83573, 27)

In [248]:
X_test.shape

(35817, 27)
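
As a small illustration of what `stratify=y` does in the split above, here is a sketch on synthetic data. The arrays `X_demo` and `y_demo` and the 63/37 class ratio are made up for the example and are not taken from the booking dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced target: 630 negatives and 370 positives (assumed ratio).
rng = np.random.default_rng(0)
y_demo = np.array([0] * 630 + [1] * 370)
X_demo = rng.normal(size=(1000, 3))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

# With stratification both subsets keep the class proportion of the full data.
print("positive share, full: ", y_demo.mean())
print("positive share, train:", y_tr.mean())
print("positive share, test: ", y_te.mean())
```

Without `stratify`, the positive share in each subset would vary with the random seed, which matters more the rarer the positive class is.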

The outlier value of the adr column (5400), found in the "Data_Preparations" file, is now replaced with the mean of the adr column.

In [249]:
(X_train["adr"]==5400).sum()

1

In [250]:
(X_test["adr"]==5400).sum()

0

In [251]:
# Replace the abnormal adr value (5400) with the mean of the training-set adr column.
# The mean always comes from the training subset, so no test information leaks in.
adr_mean_train = np.round(X_train["adr"].mean(), 2)
if (X_train["adr"] == 5400).sum() > 0:
    # Restrict the replacement to the adr column; other columns stay untouched
    X_train.loc[X_train["adr"] == 5400, "adr"] = adr_mean_train
    print("Abnormal observations in train subset = ", (X_train["adr"] == 5400).sum())
if (X_test["adr"] == 5400).sum() > 0:
    X_test.loc[X_test["adr"] == 5400, "adr"] = adr_mean_train
    print("Abnormal observations in test subset = ", (X_test["adr"] == 5400).sum())

Abnormal observations in train subset =  0
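
The replacement step above can be sketched on a toy frame. The frame `toy_train` and its `lead_time` column are hypothetical. As a variation on the notebook, the fill value here is computed from the non-outlier rows only, which matters on a tiny sample but is negligible with a single outlier among tens of thousands of rows.

```python
import pandas as pd

# Toy training frame; "lead_time" happens to contain 5400 as a legitimate value.
toy_train = pd.DataFrame({
    "adr": [80.0, 120.0, 5400.0, 100.0],
    "lead_time": [5400, 10, 3, 7],
})

# Boolean mask limited to the adr column.
is_outlier = toy_train["adr"] == 5400

# Assumption: the mean is taken over the remaining (non-outlier) rows.
fill_value = round(toy_train.loc[~is_outlier, "adr"].mean(), 2)

# Only adr is modified; lead_time keeps its 5400.
toy_train.loc[is_outlier, "adr"] = fill_value
print(toy_train)
```

Selecting the column explicitly with `.loc` keeps the fix from touching other columns that might contain the same number.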


Encoding the columns with the most distinct values with OrdinalEncoder:

In [252]:
data_label_train = X_train[["agent", "company", "country", "reservation_status_date", "arrival_date"]]
data_label_test = X_test[["agent", "company", "country", "reservation_status_date", "arrival_date"]]

In [253]:
ode = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
ode.fit(data_label_train)
data_label_train_ode = pd.DataFrame(ode.transform(data_label_train),
                                    columns=["agent", "company", "country", "reservation_status_date", "arrival_date"])
data_label_test_ode = pd.DataFrame(ode.transform(data_label_test), 
                                   columns=["agent", "company", "country", "reservation_status_date", "arrival_date"])

In [254]:
data_label_train_ode

Unnamed: 0,agent,company,country,reservation_status_date,arrival_date
0,288.0,323.0,125.0,400.0,562.0
1,98.0,323.0,125.0,375.0,258.0
2,316.0,92.0,125.0,886.0,770.0
3,316.0,76.0,56.0,449.0,330.0
4,316.0,323.0,125.0,714.0,597.0
...,...,...,...,...,...
83568,0.0,323.0,12.0,339.0,220.0
83569,143.0,323.0,125.0,640.0,660.0
83570,99.0,323.0,31.0,817.0,699.0
83571,193.0,323.0,125.0,304.0,231.0
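
To show what `handle_unknown='use_encoded_value'` with `unknown_value=-1` does for categories that appear only in the test subset, here is a minimal sketch. The frames `train_demo` and `test_demo` and their country codes are invented for the example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: "USA" never occurs in the training subset.
train_demo = pd.DataFrame({"country": ["PRT", "GBR", "FRA", "PRT"]})
test_demo = pd.DataFrame({"country": ["GBR", "USA"]})

enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(train_demo)  # categories are learned from the training data only

encoded_test = enc.transform(test_demo)
print(enc.categories_)  # sorted categories seen during fit
print(encoded_test)     # the unseen category is mapped to -1
```

Fitting on the training subset only, as the notebook does, keeps the test subset from influencing the mapping.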


Replacing the original columns with their encoded versions:

In [255]:
X_train.drop(["agent", "company", "country", "reservation_status_date", "arrival_date"], axis=1, inplace=True)
X_test.drop(["agent", "company", "country", "reservation_status_date", "arrival_date"], axis=1, inplace=True)

In [256]:
X_train = pd.concat([X_train.reset_index(drop=True), data_label_train_ode.reset_index(drop=True)], axis=1)
X_test = pd.concat([X_test.reset_index(drop=True), data_label_test_ode.reset_index(drop=True)], axis=1)

In [257]:
X_train.shape

(83573, 27)

Encoding training and test subsets with get_dummies:

In [258]:
X_train = pd.get_dummies(X_train, drop_first=True)

In [259]:
X_test = pd.get_dummies(X_test, drop_first=True)
X_test = X_test.reindex(columns = X_train.columns, fill_value=0)

In [260]:
X_train.shape

(83573, 59)
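
The reason for the `reindex` call on the test subset can be seen on a toy example. The column names `hotel` and `meal` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical categorical columns; the test frame lacks "HB" and "SC"
# and contains "FB", which the training frame never saw.
train_demo = pd.DataFrame({"hotel": ["City", "Resort", "City"],
                           "meal": ["BB", "HB", "SC"]})
test_demo = pd.DataFrame({"hotel": ["Resort", "City"],
                          "meal": ["BB", "FB"]})

train_enc = pd.get_dummies(train_demo, drop_first=True)
test_enc = pd.get_dummies(test_demo, drop_first=True)

# Align the test columns to the training layout: missing dummies are
# filled with 0 and dummies unknown to the training data are dropped.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(list(train_enc.columns))
print(list(test_enc.columns))
```

After the alignment both frames have the same columns in the same order, which the scaler, PCA and classifiers downstream require.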

Instantiating StandardScaler for later data scaling:

In [261]:
scaler = StandardScaler()

Instantiating PCA to reduce the data to ten principal components:

In [262]:
pca = PCA(n_components=10)
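
One way to judge whether ten components are enough is to look at the cumulative explained variance. The sketch below uses synthetic data from `make_classification`, so the numbers it prints say nothing about the booking dataset. All `*_demo` names are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded feature matrix (30 columns, assumed).
X_demo, _ = make_classification(n_samples=500, n_features=30,
                                n_informative=8, random_state=0)

# PCA is sensitive to feature scale, so scale first, as the pipelines do.
X_scaled = StandardScaler().fit_transform(X_demo)

pca_demo = PCA(n_components=10).fit(X_scaled)
X_reduced = pca_demo.transform(X_scaled)

# Share of total variance kept by the first k components, k = 1..10.
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)
print(X_reduced.shape)
print(cumulative.round(3))
```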

Instantiating SMOTEENN, the algorithm used to balance the imbalanced classes:

In [265]:
SMOTEEN = SMOTEENN()

RandomForestClassifier algorithm with RandomizedSearchCV in a pipeline with scaling, dimensionality reduction and balancing:

In [266]:
stratified_kfold = StratifiedKFold(n_splits=5,
                                       shuffle=True,
                                       random_state=11)
#imbpipeline
pipeline_rf = imbpipeline(steps=[
    ['scaler', scaler],
    ['pca', pca],
    ['smote', SMOTEEN],
    ['rf', RandomForestClassifier()]])
    
param_distributions_rf = {
    'rf__n_estimators': [20, 100, 300],
    'rf__max_depth': [10, 20],
    'rf__min_samples_split': [5, 10],
    'pca__n_components': [5, 10, 20]
}

search_rf = RandomizedSearchCV(pipeline_rf, 
                               param_distributions_rf, 
                               n_iter=10, 
                               cv=stratified_kfold, 
                               scoring='roc_auc',
                               verbose=3
                              )

search_rf.fit(X_train, y_train)
y_pred_rf = search_rf.best_estimator_.predict(X_test)
print("Random Forest:")
print(search_rf.best_params_)
# Note: best_estimator_.score returns accuracy, while the search itself was scored with ROC AUC
print(f'Results on test: {search_rf.best_estimator_.score(X_test, y_test)}')
print(f'Results on train: {search_rf.best_estimator_.score(X_train, y_train)}')

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END pca__n_components=20, rf__max_depth=10, rf__min_samples_split=5, rf__n_estimators=300;, score=0.891 total time= 1.4min
[CV 2/5] END pca__n_components=20, rf__max_depth=10, rf__min_samples_split=5, rf__n_estimators=300;, score=0.887 total time= 1.4min
[CV 3/5] END pca__n_components=20, rf__max_depth=10, rf__min_samples_split=5, rf__n_estimators=300;, score=0.894 total time= 1.3min
[CV 4/5] END pca__n_components=20, rf__max_depth=10, rf__min_samples_split=5, rf__n_estimators=300;, score=0.891 total time= 1.3min
[CV 5/5] END pca__n_components=20, rf__max_depth=10, rf__min_samples_split=5, rf__n_estimators=300;, score=0.895 total time= 1.3min
[CV 1/5] END pca__n_components=10, rf__max_depth=20, rf__min_samples_split=5, rf__n_estimators=300;, score=0.898 total time= 1.1min
[CV 2/5] END pca__n_components=10, rf__max_depth=20, rf__min_samples_split=5, rf__n_estimators=300;, score=0.897 total time= 1.1min
[CV 3/5] END pc
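
The "10 candidates, totalling 50 fits" line in the log follows from the search settings. The sketch below reproduces the arithmetic with scikit-learn's `ParameterGrid` and `ParameterSampler`. The `random_state=0` passed to the sampler is an assumption added for reproducibility and is not set in the notebook.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_distributions_rf = {
    'rf__n_estimators': [20, 100, 300],
    'rf__max_depth': [10, 20],
    'rf__min_samples_split': [5, 10],
    'pca__n_components': [5, 10, 20],
}

# Size of the full grid: 3 * 2 * 2 * 3 combinations.
n_total = len(ParameterGrid(param_distributions_rf))

# RandomizedSearchCV evaluates only n_iter of them; with list-valued
# parameters the candidates are drawn without replacement.
sampled = list(ParameterSampler(param_distributions_rf, n_iter=10, random_state=0))

# Each candidate is fitted once per fold (5 folds in the notebook).
n_fits = len(sampled) * 5
print(n_total, len(sampled), n_fits)
```

So the randomized search covers 10 of the 36 possible combinations, which is why a different run may report different best hyperparameters.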

Computing classification scores and saving precision, recall, F1 score and accuracy in a data frame:

Best hyperparameters:

In [None]:
search_rf.best_params_

In [267]:
#print(get_scorer_names())

In [268]:
y_pred_rf

array([0, 1, 1, ..., 0, 0, 0])

In [269]:
print(classification_report(y_test, y_pred_rf))

              precision    recall  f1-score   support

           0       0.80      0.91      0.85     22550
           1       0.79      0.61      0.69     13267

    accuracy                           0.80     35817
   macro avg       0.80      0.76      0.77     35817
weighted avg       0.80      0.80      0.79     35817



In [270]:
A_report_rf = pd.DataFrame(classification_report(y_test, y_pred_rf, output_dict=True))

In [324]:
# Prefix every column with "RF_" so the reports of different models can be told apart
A_report_rf = A_report_rf.add_prefix('RF_')

In [325]:
A_report_rf

Unnamed: 0,RF_0,RF_1,RF_accuracy,RF_macro avg,RF_weighted avg
precision,0.798638,0.791443,0.796577,0.795041,0.795973
recall,0.9051,0.61212,0.796577,0.75861,0.796577
f1-score,0.848543,0.690326,0.796577,0.769435,0.789938
support,22550.0,13267.0,0.796577,35817.0,35817.0
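
The "macro avg" and "weighted avg" columns can be recomputed from the per-class values. The sketch below does this for recall, using the numbers from the Random Forest report above. The variable names are arbitrary.

```python
# Per-class recall and support copied from the Random Forest report.
recall = {"0": 0.9051, "1": 0.61212}
support = {"0": 22550, "1": 13267}

# Macro average: plain mean over classes, ignoring class sizes.
macro_recall = sum(recall.values()) / len(recall)

# Weighted average: mean over classes weighted by support.
weighted_recall = (
    sum(recall[c] * support[c] for c in recall) / sum(support.values())
)

print(round(macro_recall, 5))     # compare with RF_macro avg
print(round(weighted_recall, 5))  # compare with RF_weighted avg
```

The support-weighted recall equals the overall accuracy, which is why both show 0.796577 in the table. The lower macro value reflects the weaker recall on the minority class 1.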


DecisionTreeClassifier algorithm with GridSearchCV in a pipeline with scaling, dimensionality reduction and balancing:

In [271]:
stratified_kfold = StratifiedKFold(n_splits=5,
                                       shuffle=True,
                                       random_state=13)

pipeline = imbpipeline(steps = [['scaler', scaler],
                                ['pca', pca],
                                ['smote', SMOTEEN],
                                ['dtc', DecisionTreeClassifier()]])

    
param_grid = {'dtc__max_leaf_nodes' : [2, 5, 10, 30], 
             'dtc__max_depth': [4, 10, 20, 40],
             'dtc__random_state' : [23],
             'pca__n_components': [5, 10, 20]
             }

search_dtc = GridSearchCV(estimator=pipeline,
                           param_grid=param_grid,
                           scoring='roc_auc',
                           cv=stratified_kfold,                           
                          verbose=3,
                           #n_jobs=3
                         )

search_dtc.fit(X_train, y_train)
y_pred_dtc = search_dtc.best_estimator_.predict(X_test)
cv_score = search_dtc.best_score_
test_score = search_dtc.score(X_test, y_test)
print(f'Cross-validation score: {cv_score}\nTest score: {test_score}')
print("Decision Tree:")
print(search_dtc.best_params_)
print(f'Results on test: {search_dtc.best_estimator_.score(X_test, y_test)}')
print(f'Results on train: {search_dtc.best_estimator_.score(X_train, y_train)}')

Fitting 5 folds for each of 144 candidates, totalling 720 fits
[CV 1/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=5;, score=0.644 total time=   2.6s
[CV 2/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=5;, score=0.638 total time=   2.5s
[CV 3/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=5;, score=0.641 total time=   2.5s
[CV 4/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=5;, score=0.646 total time=   2.7s
[CV 5/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=5;, score=0.644 total time=   2.6s
[CV 1/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=10;, score=0.643 total time=   7.3s
[CV 2/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=10;, score=0.632 total time=   6.6s
[CV 3/5] END dtc__max_depth=4, d

[CV 4/5] END dtc__max_depth=4, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_components=5;, score=0.714 total time=   2.7s
[CV 5/5] END dtc__max_depth=4, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_components=5;, score=0.720 total time=   2.8s
[CV 1/5] END dtc__max_depth=4, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_components=10;, score=0.717 total time=   7.4s
[CV 2/5] END dtc__max_depth=4, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_components=10;, score=0.713 total time=   6.7s
[CV 3/5] END dtc__max_depth=4, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_components=10;, score=0.718 total time=   7.0s
[CV 4/5] END dtc__max_depth=4, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_components=10;, score=0.718 total time=   6.8s
[CV 5/5] END dtc__max_depth=4, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_components=10;, score=0.724 total time=   7.6s
[CV 1/5] END dtc__max_depth=4, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_component

[CV 2/5] END dtc__max_depth=4, dtc__max_leaf_nodes=10, dtc__random_state=23, pca__n_components=10;, score=0.762 total time=   7.1s
[CV 3/5] END dtc__max_depth=4, dtc__max_leaf_nodes=10, dtc__random_state=23, pca__n_components=10;, score=0.760 total time=   6.7s
[CV 4/5] END dtc__max_depth=4, dtc__max_leaf_nodes=10, dtc__random_state=23, pca__n_components=10;, score=0.761 total time=   6.8s
[CV 5/5] END dtc__max_depth=4, dtc__max_leaf_nodes=10, dtc__random_state=23, pca__n_components=10;, score=0.761 total time=   6.6s
[CV 1/5] END dtc__max_depth=4, dtc__max_leaf_nodes=10, dtc__random_state=23, pca__n_components=20;, score=0.757 total time=  20.0s
[CV 2/5] END dtc__max_depth=4, dtc__max_leaf_nodes=10, dtc__random_state=23, pca__n_components=20;, score=0.759 total time=  19.9s
[CV 3/5] END dtc__max_depth=4, dtc__max_leaf_nodes=10, dtc__random_state=23, pca__n_components=20;, score=0.765 total time=  19.9s
[CV 4/5] END dtc__max_depth=4, dtc__max_leaf_nodes=10, dtc__random_state=23, pca__n

[CV 5/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=10;, score=0.646 total time=   6.8s
[CV 1/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=20;, score=0.643 total time=  20.4s
[CV 2/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=20;, score=0.638 total time=  22.3s
[CV 3/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=20;, score=0.641 total time=  22.7s
[CV 4/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=20;, score=0.646 total time=  22.1s
[CV 5/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=43, pca__n_components=20;, score=0.643 total time=  23.8s
[CV 1/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=5;, score=0.643 total time=   3.2s
[CV 2/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_

[CV 3/5] END dtc__max_depth=10, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_components=20;, score=0.713 total time=  20.0s
[CV 4/5] END dtc__max_depth=10, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_components=20;, score=0.713 total time=  20.0s
[CV 5/5] END dtc__max_depth=10, dtc__max_leaf_nodes=5, dtc__random_state=11, pca__n_components=20;, score=0.714 total time=  20.1s
[CV 1/5] END dtc__max_depth=10, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=5;, score=0.713 total time=   2.6s
[CV 2/5] END dtc__max_depth=10, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=5;, score=0.714 total time=   2.7s
[CV 3/5] END dtc__max_depth=10, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=5;, score=0.720 total time=   2.6s
[CV 4/5] END dtc__max_depth=10, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=5;, score=0.710 total time=   2.6s
[CV 5/5] END dtc__max_depth=10, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_com

[CV 1/5] END dtc__max_depth=10, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=5;, score=0.787 total time=   2.8s
[CV 2/5] END dtc__max_depth=10, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=5;, score=0.795 total time=   2.7s
[CV 3/5] END dtc__max_depth=10, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=5;, score=0.791 total time=   2.9s
[CV 4/5] END dtc__max_depth=10, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=5;, score=0.794 total time=   2.8s
[CV 5/5] END dtc__max_depth=10, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=5;, score=0.800 total time=   2.7s
[CV 1/5] END dtc__max_depth=10, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=10;, score=0.797 total time=   7.3s
[CV 2/5] END dtc__max_depth=10, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=10;, score=0.795 total time=   6.9s
[CV 3/5] END dtc__max_depth=10, dtc__max_leaf_nodes=30, dtc__random_state=43, pca

[CV 4/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=5;, score=0.647 total time=   2.6s
[CV 5/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=5;, score=0.644 total time=   2.7s
[CV 1/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=10;, score=0.645 total time=   7.4s
[CV 2/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=10;, score=0.636 total time=   6.4s
[CV 3/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=10;, score=0.642 total time=   7.1s
[CV 4/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=10;, score=0.647 total time=   6.8s
[CV 5/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=10;, score=0.641 total time=   7.3s
[CV 1/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_c

[CV 2/5] END dtc__max_depth=20, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=10;, score=0.714 total time=   7.7s
[CV 3/5] END dtc__max_depth=20, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=10;, score=0.706 total time=   8.3s
[CV 4/5] END dtc__max_depth=20, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=10;, score=0.724 total time=   7.5s
[CV 5/5] END dtc__max_depth=20, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=10;, score=0.715 total time=   7.3s
[CV 1/5] END dtc__max_depth=20, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=20;, score=0.709 total time=  21.8s
[CV 2/5] END dtc__max_depth=20, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=20;, score=0.704 total time=  21.7s
[CV 3/5] END dtc__max_depth=20, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=20;, score=0.714 total time=  21.8s
[CV 4/5] END dtc__max_depth=20, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n

[CV 5/5] END dtc__max_depth=20, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=10;, score=0.809 total time=   7.8s
[CV 1/5] END dtc__max_depth=20, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=20;, score=0.808 total time=  20.4s
[CV 2/5] END dtc__max_depth=20, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=20;, score=0.810 total time=  20.4s
[CV 3/5] END dtc__max_depth=20, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=20;, score=0.821 total time=  20.5s
[CV 4/5] END dtc__max_depth=20, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=20;, score=0.816 total time=  20.5s
[CV 5/5] END dtc__max_depth=20, dtc__max_leaf_nodes=30, dtc__random_state=43, pca__n_components=20;, score=0.818 total time=  20.3s
[CV 1/5] END dtc__max_depth=20, dtc__max_leaf_nodes=30, dtc__random_state=11, pca__n_components=5;, score=0.788 total time=   2.9s
[CV 2/5] END dtc__max_depth=20, dtc__max_leaf_nodes=30, dtc__random_state=11,

[CV 3/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=20;, score=0.641 total time=  22.1s
[CV 4/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=20;, score=0.646 total time=  22.1s
[CV 5/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=11, pca__n_components=20;, score=0.644 total time=  20.9s
[CV 1/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.644 total time=   2.7s
[CV 2/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.638 total time=   2.7s
[CV 3/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.640 total time=   2.7s
[CV 4/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.647 total time=   2.7s
[CV 5/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_com

[CV 1/5] END dtc__max_depth=40, dtc__max_leaf_nodes=10, dtc__random_state=43, pca__n_components=5;, score=0.750 total time=   2.8s
[CV 2/5] END dtc__max_depth=40, dtc__max_leaf_nodes=10, dtc__random_state=43, pca__n_components=5;, score=0.754 total time=   2.8s
[CV 3/5] END dtc__max_depth=40, dtc__max_leaf_nodes=10, dtc__random_state=43, pca__n_components=5;, score=0.761 total time=   3.0s
[CV 4/5] END dtc__max_depth=40, dtc__max_leaf_nodes=10, dtc__random_state=43, pca__n_components=5;, score=0.764 total time=   2.9s
[CV 5/5] END dtc__max_depth=40, dtc__max_leaf_nodes=10, dtc__random_state=43, pca__n_components=5;, score=0.763 total time=   2.9s
[CV 1/5] END dtc__max_depth=40, dtc__max_leaf_nodes=10, dtc__random_state=43, pca__n_components=10;, score=0.755 total time=   7.3s
[CV 2/5] END dtc__max_depth=40, dtc__max_leaf_nodes=10, dtc__random_state=43, pca__n_components=10;, score=0.750 total time=   7.1s
[CV 3/5] END dtc__max_depth=40, dtc__max_leaf_nodes=10, dtc__random_state=43, pca

[CV 4/5] END dtc__max_depth=40, dtc__max_leaf_nodes=30, dtc__random_state=11, pca__n_components=5;, score=0.784 total time=   2.9s
[CV 5/5] END dtc__max_depth=40, dtc__max_leaf_nodes=30, dtc__random_state=11, pca__n_components=5;, score=0.793 total time=   3.1s
[CV 1/5] END dtc__max_depth=40, dtc__max_leaf_nodes=30, dtc__random_state=11, pca__n_components=10;, score=0.799 total time=   7.3s
[CV 2/5] END dtc__max_depth=40, dtc__max_leaf_nodes=30, dtc__random_state=11, pca__n_components=10;, score=0.805 total time=   7.8s
[CV 3/5] END dtc__max_depth=40, dtc__max_leaf_nodes=30, dtc__random_state=11, pca__n_components=10;, score=0.803 total time=   8.0s
[CV 4/5] END dtc__max_depth=40, dtc__max_leaf_nodes=30, dtc__random_state=11, pca__n_components=10;, score=0.806 total time=   7.5s
[CV 5/5] END dtc__max_depth=40, dtc__max_leaf_nodes=30, dtc__random_state=11, pca__n_components=10;, score=0.806 total time=   7.4s
[CV 1/5] END dtc__max_depth=40, dtc__max_leaf_nodes=30, dtc__random_state=11, 
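
The cell above prints two kinds of score: `search.score(...)`, which uses the `scoring` metric of the search (ROC AUC here), and `best_estimator_.score(...)`, which for a classifier is plain accuracy. A minimal sketch of the difference on synthetic data follows. It leaves out the scaling, PCA and SMOTEENN steps, and every `*_demo` name is a placeholder.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced data (assumed 70/30 class weights).
X_demo, y_demo = make_classification(n_samples=400, n_features=8,
                                     weights=[0.7, 0.3], random_state=0)

search_demo = GridSearchCV(DecisionTreeClassifier(random_state=0),
                           {"max_depth": [2, 4]},
                           scoring="roc_auc", cv=3)
search_demo.fit(X_demo, y_demo)

# Score through the search object: ROC AUC, because of scoring="roc_auc".
auc_from_search = search_demo.score(X_demo, y_demo)
auc_manual = roc_auc_score(y_demo, search_demo.predict_proba(X_demo)[:, 1])

# Score through the refitted estimator: mean accuracy.
acc_from_estimator = search_demo.best_estimator_.score(X_demo, y_demo)
acc_manual = accuracy_score(y_demo, search_demo.predict(X_demo))

print(auc_from_search, acc_from_estimator)
```

When comparing the "Test score" and "Results on test" lines of a model, it helps to keep in mind that they measure different things.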

Computing classification scores and saving precision, recall, F1 score and accuracy in a data frame:

Best hyperparameters:

In [None]:
search_dtc.best_params_

In [272]:
y_pred_dtc

array([0, 1, 1, ..., 1, 1, 0])

In [326]:
print(classification_report(y_test, y_pred_dtc))
A_report_dtc = pd.DataFrame(classification_report(y_test, y_pred_dtc, output_dict=True))

              precision    recall  f1-score   support

           0       0.80      0.78      0.79     22550
           1       0.64      0.67      0.66     13267

    accuracy                           0.74     35817
   macro avg       0.72      0.73      0.72     35817
weighted avg       0.74      0.74      0.74     35817



In [327]:
# Prefix every column with "DTC_" so the reports of different models can be told apart
A_report_dtc = A_report_dtc.add_prefix('DTC_')


In [328]:
A_report_dtc

Unnamed: 0,DTC_0,DTC_1,DTC_accuracy,DTC_macro avg,DTC_weighted avg
precision,0.801758,0.639499,0.738448,0.720629,0.741656
recall,0.776585,0.673626,0.738448,0.725106,0.738448
f1-score,0.788971,0.656119,0.738448,0.722545,0.739761
support,22550.0,13267.0,0.738448,35817.0,35817.0


Support Vector Classifier algorithm with GridSearchCV in a pipeline with scaling, dimensionality reduction and balancing:

In [274]:
stratified_kfold = StratifiedKFold(n_splits=5,
                                       shuffle=True,
                                       random_state=23)

pipeline_SVC = imbpipeline([('scaler', scaler),
                            ('pca', pca),
                            ('SMOTE', SMOTEEN),
                            ('SVC', SVC())])
    
params_SVC = {
              'SVC__gamma': ['auto'],
              'SVC__max_iter': [150, 300, 500],
              'SVC__decision_function_shape': ['ovo'],
              'SVC__degree': [1],
              'SVC__kernel': ['rbf'],
              'SVC__random_state': [11],
              'pca__n_components': [5, 10, 20]
             }

search_SVC = GridSearchCV(pipeline_SVC,
                             params_SVC,
                             scoring='roc_auc',
                             cv=stratified_kfold,
                            verbose=3,
                            #n_jobs=3
                         )

search_SVC.fit(X_train, y_train)

cv_score = search_SVC.best_score_
test_score = search_SVC.score(X_test, y_test)
print(f'Cross-validation score: {cv_score}\nTest score: {test_score}')
print("Support Vector:")
print(search_SVC.best_params_)
print(f'Results on test: {search_SVC.best_estimator_.score(X_test, y_test)}')
print(f'Results on train: {search_SVC.best_estimator_.score(X_train, y_train)}')


Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=5;, score=0.613 total time=   3.8s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=5;, score=0.603 total time=   3.7s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=5;, score=0.540 total time=   3.8s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=5;, score=0.583 total time=   3.7s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=5;, score=0.501 total time=   3.7s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=10;, score=0.592 total time=   8.8s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=10;, score=0.675 total time=   8.3s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=10;, score=0.462 total time=   8.3s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=10;, score=0.647 total time=   7.9s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=10;, score=0.586 total time=   8.1s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=20;, score=0.682 total time=  21.2s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=20;, score=0.629 total time=  21.2s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=20;, score=0.652 total time=  21.0s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=20;, score=0.572 total time=  21.4s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=20;, score=0.634 total time=  21.3s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=5;, score=0.580 total time=   5.0s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=5;, score=0.577 total time=   4.9s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=5;, score=0.570 total time=   5.0s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=5;, score=0.606 total time=   4.9s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=5;, score=0.530 total time=   4.9s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=10;, score=0.598 total time=  10.0s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=10;, score=0.467 total time=   9.8s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=10;, score=0.534 total time=  10.0s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=10;, score=0.600 total time=   9.4s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=10;, score=0.489 total time=   9.6s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=20;, score=0.688 total time=  22.7s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=20;, score=0.673 total time=  22.8s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=20;, score=0.622 total time=  23.0s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=20;, score=0.694 total time=  22.8s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=20;, score=0.663 total time=  22.7s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=5;, score=0.532 total time=   6.4s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=5;, score=0.588 total time=   6.4s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=5;, score=0.645 total time=   6.5s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=5;, score=0.454 total time=   6.4s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=5;, score=0.606 total time=   6.5s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=10;, score=0.692 total time=  11.2s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=10;, score=0.629 total time=  11.6s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=10;, score=0.553 total time=  11.6s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=10;, score=0.561 total time=  12.9s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=10;, score=0.590 total time=  11.5s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=20;, score=0.656 total time=  27.3s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=20;, score=0.664 total time=  26.4s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=20;, score=0.672 total time=  26.3s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=20;, score=0.701 total time=  27.0s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=20;, score=0.692 total time=  27.5s
Cross-validation score: 0.6772759494225262
Test score: 0.6486881860314934
Support Vector:
{'SVC__decision_function_shape': 'ovo', 'SVC__degree': 1, 'SVC__gamma': 'auto', 'SVC__kernel': 'rbf', 'SVC__max_iter': 500, 'SVC__random_state': 11, 'pca__n_components': 20}
Results on test: 0.6097663120864394
Results on train: 0.6038553121223361


Computing the classification scores and saving accuracy, precision, recall, and F1 score in a DataFrame:

In [275]:
y_pred_SVC_train = search_SVC.best_estimator_.predict(X_train)

In [276]:
y_pred_svc_test = search_SVC.best_estimator_.predict(X_test)

In [277]:
y_pred_SVC = search_SVC.predict(X_test)

Best hyperparameters:

In [278]:
search_SVC.best_params_

{'SVC__decision_function_shape': 'ovo',
 'SVC__degree': 1,
 'SVC__gamma': 'auto',
 'SVC__kernel': 'rbf',
 'SVC__max_iter': 500,
 'SVC__random_state': 11,
 'pca__n_components': 20}

In [316]:
print(classification_report(y_test, y_pred_SVC))
A_report_svc = pd.DataFrame(classification_report(y_test, y_pred_SVC, output_dict=True))

              precision    recall  f1-score   support

           0       0.66      0.78      0.72     22550
           1       0.46      0.32      0.38     13267

    accuracy                           0.61     35817
   macro avg       0.56      0.55      0.55     35817
weighted avg       0.59      0.61      0.59     35817



In [280]:
A_report_svc

Unnamed: 0,0,1,accuracy,macro avg,weighted avg
precision,0.661566,0.46177,0.609766,0.561668,0.587559
recall,0.778359,0.323208,0.609766,0.550784,0.609766
f1-score,0.715226,0.38026,0.609766,0.547743,0.591151
support,22550.0,13267.0,0.609766,35817.0,35817.0


In [317]:
# Prefix every column with the model name so this report stays distinguishable from the other models' reports
A_report_svc = A_report_svc.add_prefix('SVC_')


In [318]:
A_report_svc

Unnamed: 0,SVC_0,SVC_1,SVC_accuracy,SVC_macro avg,SVC_weighted avg
precision,0.661566,0.46177,0.609766,0.561668,0.587559
recall,0.778359,0.323208,0.609766,0.550784,0.609766
f1-score,0.715226,0.38026,0.609766,0.547743,0.591151
support,22550.0,13267.0,0.609766,35817.0,35817.0


XGBClassifier algorithm with GridSearchCV in a pipeline with scaling, dimensionality reduction (PCA), and class balancing:

In [284]:
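# 3-fold stratified CV keeps the class ratio the same in every fold
stratified_kfold = StratifiedKFold(n_splits=3,
                                   shuffle=True,
                                   random_state=77)

# scaler, pca and SMOTEEN are the objects defined in earlier cells;
# the imblearn pipeline applies the resampling step only while fitting
pipeline = imbpipeline(steps=[('scaler', scaler),
                              ('pca', pca),
                              ('smote', SMOTEEN),
                              ('XGB', XGBClassifier())])

# 3 * 3 * 2 * 3 = 54 candidate combinations
params = {
    'XGB__n_estimators': [100, 500, 800],
    'XGB__max_depth': [3, 5, 10],
    'XGB__learning_rate': [0.1, 0.5],
    'pca__n_components': [5, 10, 20]
}

# Candidates are ranked by ROC AUC, not by accuracy
search_XGB = GridSearchCV(pipeline,
                          params,
                          scoring='roc_auc',
                          cv=stratified_kfold,
                          verbose=3,
                          # n_jobs=3
                          )

search_XGB.fit(X_train, y_train)
accuracy_score(y_test, search_XGB.predict(X_test))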

Fitting 3 folds for each of 54 candidates, totalling 162 fits
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=5;, score=0.813 total time=   3.8s
[CV 2/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=5;, score=0.824 total time=   3.6s
[CV 3/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=5;, score=0.818 total time=   3.6s
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=10;, score=0.838 total time=   8.0s
[CV 2/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=10;, score=0.849 total time=   7.7s
[CV 3/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=10;, score=0.843 total time=   7.5s
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=20;, score=0.864 total time=  18.1s
[CV 2/3] END XGB_

[CV 3/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=100, pca__n_components=20;, score=0.912 total time= 1.4min
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=5;, score=0.871 total time=  31.0s
[CV 2/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=5;, score=0.872 total time=  31.8s
[CV 3/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=5;, score=0.870 total time=  31.1s
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=10;, score=0.899 total time=  52.0s
[CV 2/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=10;, score=0.901 total time=  50.9s
[CV 3/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=10;, score=0.900 total time=  50.5s
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimator

[CV 3/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=500, pca__n_components=20;, score=0.913 total time= 1.0min
[CV 1/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=5;, score=0.856 total time=  30.0s
[CV 2/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=5;, score=0.868 total time=  27.0s
[CV 3/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=5;, score=0.864 total time=  27.1s
[CV 1/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=10;, score=0.890 total time=  50.1s
[CV 2/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=10;, score=0.893 total time=  46.3s
[CV 3/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=10;, score=0.894 total time=  49.1s
[CV 1/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, p

0.8155903621185471
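
The log header "Fitting 3 folds for each of 54 candidates, totalling 162 fits" follows directly from the size of the grid. As a minimal sketch, scikit-learn's `ParameterGrid` can reproduce that arithmetic on a copy of the same grid before starting a long search:

```python
from sklearn.model_selection import ParameterGrid

# Copy of the grid from the XGBClassifier cell
params = {
    'XGB__n_estimators': [100, 500, 800],
    'XGB__max_depth': [3, 5, 10],
    'XGB__learning_rate': [0.1, 0.5],
    'pca__n_components': [5, 10, 20],
}

n_candidates = len(ParameterGrid(params))  # 3 * 3 * 2 * 3
n_splits = 3                               # folds used by stratified_kfold
n_fits = n_candidates * n_splits

print(n_candidates, n_fits)  # 54 162
```

Checking the count up front helps estimate the run time, since some single fits in the log take more than a minute.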

Computing the classification scores and saving accuracy, precision, recall, and F1 score in a DataFrame:

Best hyperparameters:

In [285]:
search_XGB.best_params_

{'XGB__learning_rate': 0.1,
 'XGB__max_depth': 10,
 'XGB__n_estimators': 500,
 'pca__n_components': 20}

In [286]:
#XGBClassifier().get_params().keys()

In [287]:
search_XGB.cv_results_["mean_test_score"]

array([0.81814827, 0.8433423 , 0.8663882 , 0.8423512 , 0.87293739,
       0.89460945, 0.84606465, 0.87772284, 0.90008461, 0.84032295,
       0.87218937, 0.89291861, 0.85804679, 0.88999013, 0.90896097,
       0.86083837, 0.89276161, 0.9122498 , 0.86490965, 0.8947273 ,
       0.91238036, 0.87090103, 0.90015782, 0.91644698, 0.87122343,
       0.89984058, 0.91609685, 0.84118567, 0.8721394 , 0.89191442,
       0.85291425, 0.88541865, 0.90549588, 0.85539647, 0.88527059,
       0.90759859, 0.85603816, 0.88531402, 0.90509017, 0.86314625,
       0.891598  , 0.91146266, 0.86276763, 0.89220987, 0.91328466,
       0.86857464, 0.89596866, 0.91276622, 0.86598245, 0.89888359,
       0.91459415, 0.8673686 , 0.89776334, 0.91266936])
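
The raw `mean_test_score` array is hard to match to the hyperparameters that produced each value. One possible way to read it, sketched here on toy data with a hypothetical `demo_search` object (not the notebook's `search_XGB`), is to load `cv_results_` into a DataFrame and sort it by rank:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy data and a toy grid, used only to illustrate the cv_results_ layout
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
demo_search = GridSearchCV(LogisticRegression(max_iter=1000),
                           {'C': [0.01, 0.1, 1.0, 10.0]},
                           scoring='roc_auc',
                           cv=3)
demo_search.fit(X_demo, y_demo)

# One row per candidate, best candidate first
results = (pd.DataFrame(demo_search.cv_results_)
             [['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]
             .sort_values('rank_test_score')
             .reset_index(drop=True))
print(results)
```

The same pattern should apply to `search_XGB.cv_results_`, where the `params` column would show which combination of depth, learning rate, estimators and PCA components each score belongs to.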

In [288]:
accuracy_score(y_test, search_XGB.predict(X_test))

0.8155903621185471

In [289]:
y_pred_XGB = search_XGB.best_estimator_.predict(X_test)
test_score = search_XGB.score(X_test, y_test)
cv_score = search_XGB.best_score_

In [290]:
print(f'Cross-validation score: {cv_score}\nTest score: {test_score}')
print("XGBClassifier:")
print(search_XGB.best_params_)
print(f'Results on test: {search_XGB.best_estimator_.score(X_test, y_test)}')
print(f'Results on train: {search_XGB.best_estimator_.score(X_train, y_train)}')

Cross-validation score: 0.9164469813848809
Test score: 0.8862855455335973
XGBClassifier:
{'XGB__learning_rate': 0.1, 'XGB__max_depth': 10, 'XGB__n_estimators': 500, 'pca__n_components': 20}
Results on test: 0.8155903621185471
Results on train: 0.8821389683270913
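
The two pairs of numbers printed above measure different things. Because the search was built with `scoring='roc_auc'`, `best_score_` and `search_XGB.score(...)` report ROC AUC, while `best_estimator_.score(...)` uses the pipeline's default metric, accuracy (which is why 0.8156 matches the earlier `accuracy_score` call). A small sketch on toy data, with a hypothetical `demo_search` standing in for the notebook's search object, shows the same behaviour:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy data, for illustration only
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          stratify=y_demo, random_state=0)

demo_search = GridSearchCV(LogisticRegression(max_iter=1000),
                           {'C': [0.1, 1.0]},
                           scoring='roc_auc',
                           cv=3)
demo_search.fit(X_tr, y_tr)

search_level = demo_search.score(X_te, y_te)                     # ROC AUC
estimator_level = demo_search.best_estimator_.score(X_te, y_te)  # accuracy

# Recompute both metrics explicitly to confirm which is which
auc = roc_auc_score(y_te, demo_search.decision_function(X_te))
acc = accuracy_score(y_te, demo_search.predict(X_te))
print(search_level, auc)
print(estimator_level, acc)
```

When comparing models, the cross-validation score should therefore be set against "Test score" (both ROC AUC), not against "Results on test".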


In [364]:
print(classification_report(y_test, y_pred_XGB))
A_report_xgb = pd.DataFrame(classification_report(y_test, y_pred_XGB, output_dict=True))

              precision    recall  f1-score   support

           0       0.82      0.90      0.86     22550
           1       0.80      0.67      0.73     13267

    accuracy                           0.82     35817
   macro avg       0.81      0.79      0.79     35817
weighted avg       0.81      0.82      0.81     35817



In [365]:
# Prefix every column with the model name so this report stays distinguishable from the other models' reports
A_report_xgb = A_report_xgb.add_prefix('XGB_')


In [366]:
A_report_xgb

Unnamed: 0,XGB_0,XGB_1,XGB_accuracy,XGB_macro avg,XGB_weighted avg
precision,0.822499,0.800198,0.81559,0.811349,0.814239
recall,0.901685,0.669255,0.81559,0.78547,0.81559
f1-score,0.860274,0.728892,0.81559,0.794583,0.811609
support,22550.0,13267.0,0.81559,35817.0,35817.0
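
Prefixing the columns makes it possible to place several reports side by side. The sketch below assumes that is the intended use; the labels, predictions and the `prefixed_report` helper are made up for illustration and are not taken from the notebook:

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels and predictions from two imaginary models "A" and "B"
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
pred_a = [0, 0, 1, 1, 1, 0, 0, 1]
pred_b = [0, 1, 0, 1, 0, 1, 0, 1]

def prefixed_report(y_true, y_pred, prefix):
    """Return the classification report as a DataFrame with prefixed column names."""
    report = pd.DataFrame(classification_report(y_true, y_pred, output_dict=True))
    return report.add_prefix(prefix)

# Side-by-side comparison: one block of columns per model
comparison = pd.concat([prefixed_report(y_true, pred_a, 'A_'),
                        prefixed_report(y_true, pred_b, 'B_')], axis=1)

# Recall and F1 for the positive class of each model
print(comparison.loc[['recall', 'f1-score'], ['A_1', 'B_1']])
```

With the notebook's own frames, the equivalent call would be `pd.concat([A_report_svc, A_report_xgb], axis=1)`.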


LogisticRegression algorithm with GridSearchCV in a pipeline with scaling, dimensionality reduction (PCA), and class balancing:

In [293]:
# scaler, pca and SMOTEEN are the objects defined in earlier cells
pipeline = imbpipeline(steps=[('scaler', scaler),
                              ('pca', pca),
                              ('smote', SMOTEEN),
                              ('LR', LogisticRegression())])

stratified_kfold = StratifiedKFold(n_splits=5,
                                   shuffle=True,
                                   random_state=13)

# 3 * 3 * 2 * 3 = 54 candidate combinations (the single-value entries do not add candidates)
param_grid = {'LR__C': [20, 50, 70],
              'LR__random_state': [11],
              'LR__multi_class': ['auto'],
              'LR__max_iter': [100, 200, 500],
              'LR__solver': ['saga'],
              'LR__penalty': ['l2', 'l1'],
              'pca__n_components': [5, 10, 20]
              }

search_LR = GridSearchCV(estimator=pipeline,
                         param_grid=param_grid,
                         scoring='roc_auc',
                         cv=stratified_kfold,
                         verbose=3,
                         # n_jobs=3
                         )

search_LR.fit(X_train, y_train)
cv_score = search_LR.best_score_
test_score = search_LR.score(X_test, y_test)
print(f'Cross-validation score: {cv_score}\nTest score: {test_score}')
print("LogisticRegression:")
print(search_LR.best_params_)
print(f'Results on test: {search_LR.best_estimator_.score(X_test, y_test)}')
print(f'Results on train: {search_LR.best_estimator_.score(X_train, y_train)}')

Fitting 5 folds for each of 54 candidates, totalling 270 fits
[CV 1/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.5s
[CV 2/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.7s
[CV 3/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.8s
[CV 4/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.7s
[CV 5/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.773 total time=   2.8s
[CV 1/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__rando



[CV 1/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.832 total time=  23.6s




[CV 2/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  23.3s




[CV 3/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.841 total time=  23.4s




[CV 4/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  23.2s




[CV 5/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  23.5s
[CV 1/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.5s
[CV 2/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 3/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.770 total time=   2.5s
[CV 4/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.5s
[CV 5/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  24.8s




[CV 2/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  24.2s




[CV 3/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  24.1s




[CV 4/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  23.9s




[CV 5/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  24.0s
[CV 1/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.762 total time=   2.5s
[CV 2/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.6s
[CV 3/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.5s
[CV 4/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.5s
[CV 5/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  27.2s




[CV 2/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  26.9s




[CV 3/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  26.8s




[CV 4/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  26.6s




[CV 5/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  26.8s
[CV 1/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.5s
[CV 2/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 3/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.5s
[CV 4/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.5s
[CV 5/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  28.0s




[CV 2/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  27.9s




[CV 3/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  27.8s




[CV 4/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  28.0s




[CV 5/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  27.9s
[CV 1/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.5s
[CV 2/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 3/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.6s
[CV 4/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.5s
[CV 5/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.831 total time=  36.5s




[CV 2/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  36.6s




[CV 3/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.841 total time=  36.7s




[CV 4/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.842 total time=  36.6s




[CV 5/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  36.4s
[CV 1/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.760 total time=   2.5s
[CV 2/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 3/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.5s
[CV 4/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.5s
[CV 5/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.76



[CV 1/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  40.1s




[CV 2/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  39.7s




[CV 3/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.840 total time=  40.7s




[CV 4/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  39.9s




[CV 5/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  39.6s
[CV 1/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.4s
[CV 2/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.5s
[CV 3/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.4s
[CV 4/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.5s
[CV 5/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  23.5s




[CV 2/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  24.7s




[CV 3/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  23.8s




[CV 4/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  23.3s




[CV 5/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  22.8s
[CV 1/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.5s
[CV 2/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 3/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.5s
[CV 4/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.4s
[CV 5/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.832 total time=  24.3s




[CV 2/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  24.1s




[CV 3/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  24.2s




[CV 4/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  23.8s




[CV 5/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  24.1s
[CV 1/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.4s
[CV 2/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.5s
[CV 3/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.5s
[CV 4/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.5s
[CV 5/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  26.4s




[CV 2/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  27.4s




[CV 3/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  27.2s




[CV 4/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  27.0s




[CV 5/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.832 total time=  26.1s
[CV 1/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.4s
[CV 2/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.4s
[CV 3/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.770 total time=   2.6s
[CV 4/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.5s
[CV 5/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.830 total time=  28.1s




[CV 2/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.840 total time=  27.8s
[CV 3/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  21.5s




[CV 4/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  28.3s




[CV 5/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  27.9s
[CV 1/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.5s
[CV 2/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 3/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.5s
[CV 4/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.5s
[CV 5/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.832 total time=  39.1s




[CV 2/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.832 total time=  38.8s




[CV 3/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  38.8s




[CV 4/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  36.1s




[CV 5/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  38.8s
[CV 1/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 2/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.5s
[CV 3/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.6s
[CV 4/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.5s
[CV 5/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  39.6s




[CV 2/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  40.1s




[CV 3/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  39.8s




[CV 4/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  39.9s




[CV 5/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  39.7s
[CV 1/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.4s
[CV 2/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 3/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.5s
[CV 4/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.5s
[CV 5/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.832 total time=  24.8s




[CV 2/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.831 total time=  25.4s




[CV 3/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  24.2s




[CV 4/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  23.4s




[CV 5/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  23.5s
[CV 1/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 2/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 3/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.5s
[CV 4/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.5s
[CV 5/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  23.8s




[CV 2/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  23.8s




[CV 3/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  24.3s




[CV 4/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  24.8s




[CV 5/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  24.3s
[CV 1/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.5s
[CV 2/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 3/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.5s
[CV 4/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.5s
[CV 5/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.76



[CV 3/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=10;, score=0.785 total time=  10.6s
[CV 4/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=10;, score=0.783 total time=   7.2s
[CV 5/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=10;, score=0.785 total time=   7.8s




[CV 1/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  26.8s




[CV 2/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  27.0s




[CV 3/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  26.6s




[CV 4/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  26.6s




[CV 5/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  26.6s
[CV 1/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.5s
[CV 2/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.6s
[CV 3/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.5s
[CV 4/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.5s
[CV 5/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  28.3s




[CV 2/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  27.5s




[CV 3/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  28.1s




[CV 4/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.840 total time=  28.5s




[CV 5/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  28.6s
[CV 1/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.6s
[CV 2/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.5s
[CV 3/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.4s
[CV 4/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.5s
[CV 5/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.76



[CV 1/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.831 total time=  36.3s




[CV 2/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  40.2s




[CV 3/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  37.1s




[CV 4/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.840 total time=  36.9s




[CV 5/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  38.2s
[CV 1/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.761 total time=   2.7s
[CV 2/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.8s
[CV 3/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.7s
[CV 4/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.8s
[CV 5/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.831 total time=  40.4s




[CV 2/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  40.2s




[CV 3/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  40.2s




[CV 4/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.840 total time=  40.5s




[CV 5/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  40.0s




Cross-validation score: 0.8375790928036766
Test score: 0.8352140691514565
XGBClassifier:
{'LR__C': 70, 'LR__max_iter': 200, 'LR__multi_class': 'auto', 'LR__penalty': 'l1', 'LR__random_state': 11, 'LR__solver': 'saga', 'pca__n_components': 20}
Results on test: 0.7558980372448837
Results on train: 0.7490696756129372


Computing the classification scores and saving accuracy, precision, recall and F1 score in a data frame:

Best hyperparameters:

In [294]:
search_LR.best_params_

{'LR__C': 70,
 'LR__max_iter': 200,
 'LR__multi_class': 'auto',
 'LR__penalty': 'l1',
 'LR__random_state': 11,
 'LR__solver': 'saga',
 'pca__n_components': 20}

In [295]:
y_pred_lr = search_LR.best_estimator_.predict(X_test)

In [345]:
print(classification_report(y_test, y_pred_lr))
A_report_lr = pd.DataFrame(classification_report(y_test, y_pred_lr, output_dict=True))

              precision    recall  f1-score   support

           0       0.83      0.77      0.80     22550
           1       0.65      0.73      0.69     13267

    accuracy                           0.76     35817
   macro avg       0.74      0.75      0.74     35817
weighted avg       0.76      0.76      0.76     35817



In [346]:
# Prefix every column name with 'LR_' so the columns stay distinguishable after concatenation
A_report_lr = A_report_lr.add_prefix('LR_')


In [347]:
A_report_lr

Unnamed: 0,LR_0,LR_1,LR_accuracy,LR_macro avg,LR_weighted avg
precision,0.830833,0.651304,0.755898,0.741069,0.764334
recall,0.768825,0.733926,0.755898,0.751376,0.755898
f1-score,0.798627,0.690151,0.755898,0.744389,0.758447
support,22550.0,13267.0,0.755898,35817.0,35817.0


Using the Multi-Layer Perceptron algorithm with GridSearchCV in a pipeline that handles scaling, dimensionality reduction (PCA) and class balancing:

In [297]:
# scaler, pca and SMOTEEN are the objects created in earlier cells of this notebook
pipeline = imbpipeline(steps=[['scaler', scaler],
                              ['pca', pca],
                              ['smote', SMOTEEN],
                              ['MLP', MLPClassifier()]])

stratified_kfold = StratifiedKFold(n_splits=5,
                                   shuffle=True,
                                   random_state=13)

# An integer in hidden_layer_sizes means one hidden layer with that many units
param_grid = {'MLP__hidden_layer_sizes': [8, 4, 16],
              'MLP__activation': ['relu'],
              'MLP__solver': ['adam'],
              'MLP__random_state': [42],
              'MLP__max_iter': [1000],
              'MLP__batch_size': [32],
              'pca__n_components': [5, 10, 20]
              }

# 3 layer sizes x 3 PCA settings = 9 candidates, each fitted on 5 folds
search_MLP = GridSearchCV(estimator=pipeline,
                          param_grid=param_grid,
                          scoring='roc_auc',
                          cv=stratified_kfold,
                          verbose=3,
                          # n_jobs=3
                          )

search_MLP.fit(X_train, y_train)
cv_score = search_MLP.best_score_
# score() uses the 'roc_auc' scorer set above, so this is ROC AUC on the test set
test_score = search_MLP.score(X_test, y_test)
print(f'Cross-validation score: {cv_score}\nTest score: {test_score}')

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=8, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=5;, score=0.810 total time=  40.5s
[CV 2/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=8, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=5;, score=0.814 total time=  36.6s
[CV 3/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=8, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=5;, score=0.816 total time=  37.0s
[CV 4/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=8, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=5;, score=0.813 total time=  51.3s
[CV 5/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=8, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=

[CV 3/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=16, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=20;, score=0.900 total time= 2.1min
[CV 4/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=16, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=20;, score=0.895 total time= 2.5min
[CV 5/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=16, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=20;, score=0.894 total time= 2.2min
Cross-validation score: 0.8951873891650045
Test score: 0.8771899568423862


Computing the classification scores and saving accuracy, precision, recall and F1 score in a data frame:

Best hyperparameters:

In [298]:
search_MLP.best_params_

{'MLP__activation': 'relu',
 'MLP__batch_size': 32,
 'MLP__hidden_layer_sizes': 16,
 'MLP__max_iter': 1000,
 'MLP__random_state': 42,
 'MLP__solver': 'adam',
 'pca__n_components': 20}

In [338]:
y_pred_mlp = search_MLP.predict(X_test)
print(classification_report(y_test, y_pred_mlp))
A_report_mlp = pd.DataFrame(classification_report(y_test, y_pred_mlp, output_dict=True))

              precision    recall  f1-score   support

           0       0.83      0.88      0.85     22550
           1       0.78      0.68      0.73     13267

    accuracy                           0.81     35817
   macro avg       0.80      0.78      0.79     35817
weighted avg       0.81      0.81      0.81     35817



In [339]:
# Prefix every column name with 'MLP_' so the columns stay distinguishable after concatenation
A_report_mlp = A_report_mlp.add_prefix('MLP_')


In [340]:
A_report_mlp

Unnamed: 0,MLP_0,MLP_1,MLP_accuracy,MLP_macro avg,MLP_weighted avg
precision,0.826294,0.77639,0.810006,0.801342,0.807809
recall,0.88408,0.684103,0.810006,0.784092,0.810006
f1-score,0.854211,0.727331,0.810006,0.790771,0.807213
support,22550.0,13267.0,0.810006,35817.0,35817.0


Creating a data frame containing the results of all six classifiers:

In [367]:
A_results = pd.concat([A_report_rf, 
                       A_report_dtc, 
                       A_report_svc, 
                       A_report_xgb, 
                       A_report_lr, 
                       A_report_mlp], 
                      axis=1)

In [368]:
A_results

Unnamed: 0,RF_0,RF_1,RF_accuracy,RF_macro avg,RF_weighted avg,DTC_0,DTC_1,DTC_accuracy,DTC_macro avg,DTC_weighted avg,...,LR_0,LR_1,LR_accuracy,LR_macro avg,LR_weighted avg,MLP_0,MLP_1,MLP_accuracy,MLP_macro avg,MLP_weighted avg
precision,0.798638,0.791443,0.796577,0.795041,0.795973,0.801758,0.639499,0.738448,0.720629,0.741656,...,0.830833,0.651304,0.755898,0.741069,0.764334,0.826294,0.77639,0.810006,0.801342,0.807809
recall,0.9051,0.61212,0.796577,0.75861,0.796577,0.776585,0.673626,0.738448,0.725106,0.738448,...,0.768825,0.733926,0.755898,0.751376,0.755898,0.88408,0.684103,0.810006,0.784092,0.810006
f1-score,0.848543,0.690326,0.796577,0.769435,0.789938,0.788971,0.656119,0.738448,0.722545,0.739761,...,0.798627,0.690151,0.755898,0.744389,0.758447,0.854211,0.727331,0.810006,0.790771,0.807213
support,22550.0,13267.0,0.796577,35817.0,35817.0,22550.0,13267.0,0.738448,35817.0,35817.0,...,22550.0,13267.0,0.755898,35817.0,35817.0,22550.0,13267.0,0.810006,35817.0,35817.0


Saving results in a file:

In [369]:
A_results.to_pickle("data/A_dataset_results.pkl")

In [355]:
A_results = pd.read_pickle("data/A_dataset_results.pkl")

In [356]:
A_results

Unnamed: 0,RF_0,RF_1,RF_accuracy,RF_macro avg,RF_weighted avg,DTC_0,DTC_1,DTC_accuracy,DTC_macro avg,DTC_weighted avg,...,LR_0,LR_1,LR_accuracy,LR_macro avg,LR_weighted avg,MLP_0,MLP_1,MLP_accuracy,MLP_macro avg,MLP_weighted avg
precision,0.798638,0.791443,0.796577,0.795041,0.795973,0.801758,0.639499,0.738448,0.720629,0.741656,...,0.830833,0.651304,0.755898,0.741069,0.764334,0.826294,0.77639,0.810006,0.801342,0.807809
recall,0.9051,0.61212,0.796577,0.75861,0.796577,0.776585,0.673626,0.738448,0.725106,0.738448,...,0.768825,0.733926,0.755898,0.751376,0.755898,0.88408,0.684103,0.810006,0.784092,0.810006
f1-score,0.848543,0.690326,0.796577,0.769435,0.789938,0.788971,0.656119,0.738448,0.722545,0.739761,...,0.798627,0.690151,0.755898,0.744389,0.758447,0.854211,0.727331,0.810006,0.790771,0.807213
support,22550.0,13267.0,0.796577,35817.0,35817.0,22550.0,13267.0,0.738448,35817.0,35817.0,...,22550.0,13267.0,0.755898,35817.0,35817.0,22550.0,13267.0,0.810006,35817.0,35817.0


In [371]:
c_list = ["RF_1", "DTC_1", "SVC_1", "XGB_1", "LR_1", "MLP_1"]

Precision is the share of predicted cancellations that were actually canceled.
Recall is the share of actual cancellations that the model correctly identified.
F1 score combines precision and recall into a single value, their harmonic mean.
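
The three definitions above can be checked by hand. The following is a minimal sketch in plain Python; the toy labels `y_true` and `y_pred` are hypothetical and are not taken from the booking dataset. The last lines apply the same F1 formula to the precision and recall reported in the `LR_1` column of this notebook.

```python
# Toy labels (hypothetical, for illustration only): 1 = canceled, 0 = not canceled
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # share of predicted cancellations that are real
recall = tp / (tp + fn)     # share of real cancellations that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# Same formula applied to the LR_1 precision and recall shown in this notebook
lr_precision, lr_recall = 0.651304, 0.733926
lr_f1 = 2 * lr_precision * lr_recall / (lr_precision + lr_recall)
print(f"LR_1 f1-score recomputed: {lr_f1:.6f}")
```

The recomputed value agrees with the `f1-score` reported for `LR_1` up to rounding, which is a quick sanity check that the report columns are being read correctly.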

In [375]:
A_results[c_list]#.loc["recall"]

Unnamed: 0,RF_1,DTC_1,SVC_1,XGB_1,LR_1,MLP_1
precision,0.791443,0.639499,0.46177,0.800198,0.651304,0.77639
recall,0.61212,0.673626,0.323208,0.669255,0.733926,0.684103
f1-score,0.690326,0.656119,0.38026,0.728892,0.690151,0.727331
support,13267.0,13267.0,13267.0,13267.0,13267.0,13267.0
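
To read the comparison above more quickly, one option is to ask pandas which classifier scores highest on each metric. This is only a sketch: the frame `class1` is rebuilt here from the values printed in the table so the block runs on its own, and the name `class1` is an assumption, not a variable from the notebook. In the notebook itself the same call could be applied to `A_results[c_list]` after dropping the `support` row.

```python
import pandas as pd

# Class-1 (canceled) metrics copied from the table above
class1 = pd.DataFrame(
    {
        "RF_1": [0.791443, 0.61212, 0.690326],
        "DTC_1": [0.639499, 0.673626, 0.656119],
        "SVC_1": [0.46177, 0.323208, 0.38026],
        "XGB_1": [0.800198, 0.669255, 0.728892],
        "LR_1": [0.651304, 0.733926, 0.690151],
        "MLP_1": [0.77639, 0.684103, 0.727331],
    },
    index=["precision", "recall", "f1-score"],
)

# For every metric (row), the column label holding the largest value
best_model = class1.idxmax(axis=1)
print(best_model)
```

On these numbers no single classifier wins everywhere, so the choice depends on whether missed cancellations (recall) or false alarms (precision) matter more.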


In [None]:
B_results = pd.read_pickle("data/B_results.pkl")

In [None]:
B_results

In [None]:
B_results[c_list]