# Machine Learning Models for the Prepared Dataset of Significant Variables

**In this notebook I apply five Machine Learning algorithms and one Deep Learning algorithm to the initially cleaned dataset, using RandomizedSearchCV and GridSearchCV in pipelines to find the best hyperparameters for the ML algorithms. I also perform some additional preparation of the dataset: splitting it into train and test subsets, encoding categorical columns as numbers with pandas get_dummies and OrdinalEncoder, scaling with StandardScaler, balancing the classes with SMOTEENN, and reducing the number of variables with PCA.**

Imports:

In [1]:
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost.sklearn import XGBClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

from sklearn.preprocessing import (StandardScaler, 
                                   OrdinalEncoder, 
                                   MinMaxScaler)

from sklearn.model_selection import (train_test_split, 
                                     GridSearchCV, 
                                     StratifiedKFold, 
                                     RandomizedSearchCV)

from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline as imbpipeline
from sklearn.pipeline import Pipeline
from sklearn.metrics import (classification_report, 
                             roc_auc_score, 
                             make_scorer, 
                             recall_score, 
                             confusion_matrix, 
                             accuracy_score,
                             get_scorer_names)
from sklearn.decomposition import PCA

Loading dataset:

In [2]:
data_clean = pd.read_pickle("data/data_clear.pkl")

Dividing into predictor variables X and target y ("is_canceled"):

In [3]:
X = data_clean.drop("is_canceled", axis=1)
y = data_clean.is_canceled

Splitting the dataset into train (70%) and test (30%) subsets, stratified by the target:

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    stratify=y,
                                                    random_state=42
                                                   )

Shapes after the split:

In [5]:
X_train.shape

(83573, 27)

In [6]:
X_test.shape

(35817, 27)
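
As a quick sanity check on what `stratify=y` does, the sketch below uses a small synthetic target (the data and the roughly 37% positive rate are assumptions made for illustration, not values read from this dataset) and compares the share of positives in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary target (hypothetical data, roughly 37% positives).
rng = np.random.default_rng(42)
y_demo = (rng.random(1000) < 0.37).astype(int)
X_demo = rng.normal(size=(1000, 3))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

# With stratification, the positive rate is almost identical in both subsets.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"train positives: {train_rate:.3f}, test positives: {test_rate:.3f}")
```

Keeping the class ratio the same in both subsets makes the test metrics comparable with the cross-validation scores obtained on the training subset.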

Imputing NaNs in the country column with the most frequent value of the train subset, in both train and test:

In [7]:
# idxmax() returns the most frequent country itself; max() would return only its count
country_input = X_train["country"].value_counts().idxmax()

In [8]:
X_train["country"] = X_train["country"].fillna(country_input)

In [9]:
X_test["country"] = X_test["country"].fillna(country_input)

Imputing NaNs in the agent column with the most frequent value of the train subset, in both train and test:

In [10]:
# idxmax() returns the most frequent agent itself; max() would return only its count
agent_input = X_train["agent"].value_counts().idxmax()

In [11]:
X_train["agent"] = X_train["agent"].fillna(agent_input)

In [12]:
X_test["agent"] = X_test["agent"].fillna(agent_input)
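
The imputation idea used for both columns can be sketched on toy data as follows. The frames and country codes are hypothetical; the point is that the fill value is learned from the training subset only and then reused for the test subset.

```python
import numpy as np
import pandas as pd

# Toy train/test frames (hypothetical values) with missing countries.
train_demo = pd.DataFrame({"country": ["PRT", "PRT", "GBR", np.nan, "PRT", "ESP"]})
test_demo = pd.DataFrame({"country": ["FRA", np.nan, "GBR"]})

# Most frequent value, learned from the training subset only
# (value_counts ignores NaN by default).
most_frequent = train_demo["country"].value_counts().idxmax()

# Assign the result back instead of chaining inplace=True on a column.
train_demo["country"] = train_demo["country"].fillna(most_frequent)
test_demo["country"] = test_demo["country"].fillna(most_frequent)

print(most_frequent)
print(test_demo["country"].tolist())
```

Reusing the training-set value for the test subset avoids leaking test information into the preprocessing.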

The outlier value in the adr column (5400), found in the "Data_Preparations" file, is now replaced with the mean of the adr column:

In [13]:
(X_train["adr"]==5400).sum()

1

In [14]:
(X_test["adr"]==5400).sum()

0

In [15]:
# Mean of the training-set adr column, used as the replacement value in both subsets
adr_mean = np.round(X_train["adr"].mean(), 2)
# Check train and test independently and change only the adr column
if (X_train["adr"] == 5400).sum() > 0:
    X_train.loc[X_train["adr"] == 5400, "adr"] = adr_mean
    print("Outlier observations in train subset = ", (X_train["adr"] == 5400).sum())
if (X_test["adr"] == 5400).sum() > 0:
    X_test.loc[X_test["adr"] == 5400, "adr"] = adr_mean
    print("Outlier observations in test subset = ", (X_test["adr"] == 5400).sum())

Outlier observations in train subset =  0
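
One detail worth keeping in mind when replacing a single outlier: `DataFrame.replace` with a `{old: new}` dict applies to every column, while a boolean mask with `.loc` limits the change to one column. The toy frame below is hypothetical (including the `lead_time` column and all values) and only illustrates the difference.

```python
import pandas as pd

# Toy frame (hypothetical values): 5400 appears in "adr" and, by coincidence, in "lead_time".
df_demo = pd.DataFrame({"adr": [80.0, 5400.0, 120.0], "lead_time": [5400.0, 10.0, 3.0]})

# DataFrame.replace with a {old: new} dict touches every column.
replaced_everywhere = df_demo.replace({5400.0: 100.0})

# A boolean mask with .loc limits the change to the adr column.
scoped = df_demo.copy()
scoped.loc[scoped["adr"] == 5400, "adr"] = 100.0

print(replaced_everywhere)
print(scoped)
```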


Encoding the columns with the most numerous classes (high cardinality) with OrdinalEncoder:

In [16]:
data_label_train = X_train[["agent", "company", "country", "reservation_status_date", "arrival_date"]]
data_label_test = X_test[["agent", "company", "country", "reservation_status_date", "arrival_date"]]

In [17]:
ode = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
ode.fit(data_label_train)
data_label_train_ode = pd.DataFrame(ode.transform(data_label_train),
                                    columns=["agent", "company", "country", "reservation_status_date", "arrival_date"])
data_label_test_ode = pd.DataFrame(ode.transform(data_label_test), 
                                   columns=["agent", "company", "country", "reservation_status_date", "arrival_date"])

In [18]:
data_label_train_ode

index,agent,company,country,reservation_status_date,arrival_date
0,288.0,323.0,125.0,400.0,562.0
1,98.0,323.0,125.0,375.0,258.0
2,316.0,92.0,125.0,886.0,770.0
3,316.0,76.0,56.0,449.0,330.0
4,316.0,323.0,125.0,714.0,597.0
...,...,...,...,...,...
83568,0.0,323.0,12.0,339.0,220.0
83569,143.0,323.0,125.0,640.0,660.0
83570,99.0,323.0,31.0,817.0,699.0
83571,193.0,323.0,125.0,304.0,231.0
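
The `handle_unknown='use_encoded_value', unknown_value=-1` setting matters for categories that occur only in the test subset. A minimal sketch with hypothetical toy categories:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy categories; "FRA" never appears in the training frame.
train_cat = pd.DataFrame({"country": ["ESP", "GBR", "PRT", "PRT"]})
test_cat = pd.DataFrame({"country": ["PRT", "FRA"]})

encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit(train_cat)

# Known categories get their position in the sorted category list; unseen ones get -1.
encoded_test = encoder.transform(test_cat)
print(encoder.categories_[0])
print(encoded_test.ravel())
```

Note that the integer codes follow the sorted order of the categories, so they carry no meaning of their own. Tree-based models tend to cope with that better than distance-based or linear ones.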


Replacing the original columns with their encoded versions:

In [19]:
X_train.drop(["agent", "company", "country", "reservation_status_date", "arrival_date"], axis=1, inplace=True)
X_test.drop(["agent", "company", "country", "reservation_status_date", "arrival_date"], axis=1, inplace=True)

In [20]:
X_train = pd.concat([X_train.reset_index(drop=True), data_label_train_ode.reset_index(drop=True)], axis=1)
X_test = pd.concat([X_test.reset_index(drop=True), data_label_test_ode.reset_index(drop=True)], axis=1)

In [21]:
X_train.shape

(83573, 27)
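
The `reset_index(drop=True)` calls are what keep the row count unchanged here. After a shuffled split the remaining columns still carry their original row labels, while the encoded frame has a fresh 0..n-1 index, and `pd.concat(axis=1)` aligns on labels. A toy illustration with hypothetical values:

```python
import pandas as pd

# After a shuffled split, the remaining columns keep their original row labels.
left = pd.DataFrame({"adr": [80.0, 120.0, 95.0]}, index=[7, 2, 5])
# A frame built from a NumPy array gets a fresh 0..n-1 index.
right = pd.DataFrame({"agent": [288.0, 98.0, 316.0]})

# Without resetting, concat aligns on labels and produces extra rows filled with NaN.
misaligned = pd.concat([left, right], axis=1)

# Resetting both indexes pairs the rows by position.
aligned = pd.concat([left.reset_index(drop=True), right.reset_index(drop=True)], axis=1)

print(misaligned.shape, aligned.shape)
```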

One-hot encoding the training and test subsets with get_dummies, then aligning the test columns to the training columns:

In [22]:
X_train = pd.get_dummies(X_train, drop_first=True)

In [23]:
X_test = pd.get_dummies(X_test, drop_first=True)
X_test = X_test.reindex(columns = X_train.columns, fill_value=0)

In [24]:
X_train.shape

(83573, 59)
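
The `reindex` step guarantees that the test frame has exactly the training columns. The sketch below uses a hypothetical `meal` column to show what happens to categories seen in only one of the subsets.

```python
import pandas as pd

# Hypothetical toy column; "Undefined" occurs only in test, "SC" only in train.
train_raw = pd.DataFrame({"meal": ["BB", "HB", "SC", "BB"]})
test_raw = pd.DataFrame({"meal": ["BB", "Undefined", "HB"]})

train_enc = pd.get_dummies(train_raw, drop_first=True)
test_enc = pd.get_dummies(test_raw, drop_first=True)

# Align test to the training columns: dummies unseen in train are dropped,
# missing ones are added and filled with 0.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

# Variant without drop_first on the test side; reindex already removes the baseline column.
test_enc_safe = pd.get_dummies(test_raw).reindex(columns=train_enc.columns, fill_value=0)

print(list(test_enc.columns))
```

One caveat: `drop_first=True` is applied to each frame separately, so a test frame that happens to lack the training baseline category would drop a different level. Encoding the test frame without `drop_first` and letting `reindex` select the columns, as in the last line, sidesteps that.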

Instantiating StandardScaler for later data scaling:

In [25]:
scaler = StandardScaler()

Instantiating PCA with ten components to reduce the dimensionality (the searches below override this number via pca__n_components):

In [26]:
pca = PCA(n_components=10)
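
One way to judge a choice of `n_components` is the cumulative explained variance. The sketch below runs on synthetic data (an assumption made for illustration; no numbers here come from the notebook's dataset) and scales the features first, as the pipelines below do.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): 20 features driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
X_demo = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(500, 20))

X_scaled = StandardScaler().fit_transform(X_demo)
pca_demo = PCA(n_components=10).fit(X_scaled)

# Cumulative share of variance retained by the first k components.
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)
print(np.round(cumulative, 3))
```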

Instantiating SMOTEENN, the algorithm used to balance the imbalanced classes:

In [27]:
SMOTEEN = SMOTEENN()

RandomForestClassifier algorithm with RandomizedSearchCV in a pipeline (scaling, dimensionality reduction, balancing):

In [28]:
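stratified_kfold = StratifiedKFold(n_splits=5,
                                   shuffle=True,
                                   random_state=11)
# imblearn pipeline: the resampling step (SMOTEENN) runs only while fitting,
# so validation folds and test data are never resampled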
pipeline_rf = imbpipeline(steps=[
    ['scaler', scaler],
    ['pca', pca],
    ['smote', SMOTEEN],
    ['rf', RandomForestClassifier()]])
    
param_distributions_rf = {
    'rf__n_estimators': [20, 100, 300],
    'rf__max_depth': [10, 20],
    'rf__min_samples_split': [5, 10],
    'pca__n_components': [5, 10, 20]
}

search_rf = RandomizedSearchCV(pipeline_rf, 
                               param_distributions_rf, 
                               n_iter=10, 
                               cv=stratified_kfold, 
                               scoring='roc_auc',
                               verbose=3
                              )

search_rf.fit(X_train, y_train)
y_pred_rf = search_rf.best_estimator_.predict(X_test)
print("Random Forest:")
print(search_rf.best_params_)
print(f'Results on test: {search_rf.best_estimator_.score(X_test, y_test)}')
print(f'Results on train: {search_rf.best_estimator_.score(X_train, y_train)}')

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END pca__n_components=10, rf__max_depth=10, rf__min_samples_split=10, rf__n_estimators=100;, score=0.871 total time=  27.0s
[CV 2/5] END pca__n_components=10, rf__max_depth=10, rf__min_samples_split=10, rf__n_estimators=100;, score=0.865 total time=  27.0s
[CV 3/5] END pca__n_components=10, rf__max_depth=10, rf__min_samples_split=10, rf__n_estimators=100;, score=0.878 total time=  26.5s
[CV 4/5] END pca__n_components=10, rf__max_depth=10, rf__min_samples_split=10, rf__n_estimators=100;, score=0.872 total time=  26.6s
[CV 5/5] END pca__n_components=10, rf__max_depth=10, rf__min_samples_split=10, rf__n_estimators=100;, score=0.877 total time=  26.5s
[CV 1/5] END pca__n_components=5, rf__max_depth=20, rf__min_samples_split=5, rf__n_estimators=300;, score=0.881 total time=  49.9s
[CV 2/5] END pca__n_components=5, rf__max_depth=20, rf__min_samples_split=5, rf__n_estimators=300;, score=0.876 total time=  50.8s
[CV 3/5] END
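
The "10 candidates, 50 fits" line in the log follows from the size of the search space and `n_iter`. A small sketch of that arithmetic, using scikit-learn's `ParameterGrid` and `ParameterSampler` with the same space as `param_distributions_rf` (the `random_state=0` is an assumption added only to make the sketch repeatable):

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same search space as param_distributions_rf above.
space = {
    "rf__n_estimators": [20, 100, 300],
    "rf__max_depth": [10, 20],
    "rf__min_samples_split": [5, 10],
    "pca__n_components": [5, 10, 20],
}

n_grid = len(ParameterGrid(space))  # every combination an exhaustive grid search would try
sampled = list(ParameterSampler(space, n_iter=10, random_state=0))  # candidates a randomized search draws
n_fits = len(sampled) * 5  # 5 CV folds per candidate

print(n_grid, len(sampled), n_fits)
```

So the randomized search evaluates 10 of the 36 possible combinations, which is why its best parameters may differ between runs unless `random_state` is fixed.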

Computing the classification scores and saving accuracy, precision, recall and F1 score in a data frame:

Best hyperparameters:

In [29]:
search_rf.best_params_

{'rf__n_estimators': 20,
 'rf__min_samples_split': 5,
 'rf__max_depth': 20,
 'pca__n_components': 20}

In [30]:
#print(get_scorer_names())

In [31]:
y_pred_rf

array([0, 1, 1, ..., 0, 0, 0])

In [32]:
print(classification_report(y_test, y_pred_rf))

              precision    recall  f1-score   support

           0       0.82      0.90      0.86     22550
           1       0.79      0.66      0.72     13267

    accuracy                           0.81     35817
   macro avg       0.81      0.78      0.79     35817
weighted avg       0.81      0.81      0.81     35817



In [33]:
A_report_rf = pd.DataFrame(classification_report(y_test, y_pred_rf, output_dict=True))

In [35]:
A_report_rf = A_report_rf.add_prefix('RF_')  # prefix every column with the model name

In [36]:
A_report_rf

metric,RF_0,RF_1,RF_accuracy,RF_macro avg,RF_weighted avg
precision,0.817547,0.792963,0.809979,0.805255,0.808441
recall,0.898758,0.659079,0.809979,0.778919,0.809979
f1-score,0.856232,0.719849,0.809979,0.78804,0.805714
support,22550.0,13267.0,0.809979,35817.0,35817.0
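
Prefixing the report columns with the model name makes it possible to place several reports side by side later. A minimal sketch of that idea, with hypothetical labels and predictions and a hypothetical helper `report_frame`:

```python
import pandas as pd
from sklearn.metrics import classification_report

# Hypothetical labels and predictions from two models.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
pred_a = [0, 0, 1, 1, 1, 0, 0, 1]
pred_b = [0, 1, 1, 1, 1, 1, 0, 0]

def report_frame(y_true, y_pred, prefix):
    """Return the classification report as a DataFrame with prefixed column names."""
    report = classification_report(y_true, y_pred, output_dict=True)
    return pd.DataFrame(report).add_prefix(prefix)

# One wide table: rows are metrics, columns are <model>_<class or average>.
comparison = pd.concat(
    [report_frame(y_true, pred_a, "A_"), report_frame(y_true, pred_b, "B_")], axis=1
)
print(comparison.loc[["recall", "f1-score"], ["A_1", "B_1"]])
```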


DecisionTreeClassifier algorithm with GridSearchCV in a pipeline (scaling, dimensionality reduction, balancing):

In [37]:
stratified_kfold = StratifiedKFold(n_splits=5,
                                       shuffle=True,
                                       random_state=13)

pipeline = imbpipeline(steps = [['scaler', scaler],
                                ['pca', pca],
                                ['smote', SMOTEEN],
                                ['dtc', DecisionTreeClassifier()]])

    
param_grid = {'dtc__max_leaf_nodes' : [2, 5, 10, 30], 
             'dtc__max_depth': [4, 10, 20, 40],
             'dtc__random_state' : [23],
             'pca__n_components': [5, 10, 20]
             }

search_dtc = GridSearchCV(estimator=pipeline,
                           param_grid=param_grid,
                           scoring='roc_auc',
                           cv=stratified_kfold,                           
                          verbose=3,
                           #n_jobs=3
                         )

search_dtc.fit(X_train, y_train)
y_pred_dtc = search_dtc.best_estimator_.predict(X_test)
cv_score = search_dtc.best_score_
test_score = search_dtc.score(X_test, y_test)
print(f'Cross-validation score: {cv_score}\nTest score: {test_score}')
print("Decision Tree:")
print(search_dtc.best_params_)
print(f'Results on test: {search_dtc.best_estimator_.score(X_test, y_test)}')
print(f'Results on train: {search_dtc.best_estimator_.score(X_train, y_train)}')

Fitting 5 folds for each of 48 candidates, totalling 240 fits
[CV 1/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.644 total time=   2.8s
[CV 2/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.639 total time=   2.8s
[CV 3/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.641 total time=   2.9s
[CV 4/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.646 total time=   2.9s
[CV 5/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.644 total time=   2.8s
[CV 1/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.644 total time=   7.9s
[CV 2/5] END dtc__max_depth=4, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.636 total time=   7.6s
[CV 3/5] END dtc__max_depth=4, dt

[CV 4/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.646 total time=   2.7s
[CV 5/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=5;, score=0.643 total time=   2.6s
[CV 1/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.644 total time=   7.9s
[CV 2/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.639 total time=   7.4s
[CV 3/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.641 total time=   7.4s
[CV 4/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.647 total time=   7.4s
[CV 5/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.643 total time=   7.6s
[CV 1/5] END dtc__max_depth=10, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_c

[CV 2/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.638 total time=   6.7s
[CV 3/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.641 total time=   7.1s
[CV 4/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.646 total time=   6.8s
[CV 5/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.643 total time=   7.6s
[CV 1/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=20;, score=0.643 total time=  19.6s
[CV 2/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=20;, score=0.638 total time=  19.7s
[CV 3/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=20;, score=0.640 total time=  19.8s
[CV 4/5] END dtc__max_depth=20, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n

[CV 5/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=10;, score=0.641 total time=   7.3s
[CV 1/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=20;, score=0.644 total time=  20.5s
[CV 2/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=20;, score=0.636 total time=  20.3s
[CV 3/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=20;, score=0.641 total time=  19.7s
[CV 4/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=20;, score=0.647 total time=  20.0s
[CV 5/5] END dtc__max_depth=40, dtc__max_leaf_nodes=2, dtc__random_state=23, pca__n_components=20;, score=0.644 total time=  19.6s
[CV 1/5] END dtc__max_depth=40, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_components=5;, score=0.715 total time=   2.6s
[CV 2/5] END dtc__max_depth=40, dtc__max_leaf_nodes=5, dtc__random_state=23, pca__n_
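
The near-identical scores (about 0.64) for every `dtc__max_leaf_nodes=2` candidate in the log are expected: a tree limited to two leaves makes a single split, so `max_depth` has no effect. The sketch below checks this on synthetic data (an assumption made for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (hypothetical), only used to inspect the fitted tree shapes.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=23)

shapes = {}
for max_depth in [4, 10, 20, 40]:
    tree = DecisionTreeClassifier(max_leaf_nodes=2, max_depth=max_depth, random_state=23)
    tree.fit(X_demo, y_demo)
    # Record (depth, number of leaves) of the fitted tree.
    shapes[max_depth] = (tree.get_depth(), tree.get_n_leaves())

print(shapes)
```

In a grid like the one above, those combinations are therefore redundant fits; dropping `max_leaf_nodes=2` or tuning only one of the two limits would likely shorten the search.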

Computing the classification scores and saving accuracy, precision, recall and F1 score in a data frame:

Best hyperparameters:

In [38]:
search_dtc.best_params_

{'dtc__max_depth': 20,
 'dtc__max_leaf_nodes': 30,
 'dtc__random_state': 23,
 'pca__n_components': 20}

In [39]:
y_pred_dtc

array([0, 1, 0, ..., 0, 0, 0])

In [40]:
print(classification_report(y_test, y_pred_dtc))
A_report_dtc = pd.DataFrame(classification_report(y_test, y_pred_dtc, output_dict=True))

              precision    recall  f1-score   support

           0       0.80      0.79      0.79     22550
           1       0.65      0.66      0.65     13267

    accuracy                           0.74     35817
   macro avg       0.72      0.72      0.72     35817
weighted avg       0.74      0.74      0.74     35817



In [41]:
A_report_dtc = A_report_dtc.add_prefix('DTC_')  # prefix every column with the model name


In [42]:
A_report_dtc

metric,DTC_0,DTC_1,DTC_accuracy,DTC_macro avg,DTC_weighted avg
precision,0.797431,0.648731,0.741491,0.723081,0.742351
recall,0.790111,0.658853,0.741491,0.724482,0.741491
f1-score,0.793754,0.653753,0.741491,0.723753,0.741896
support,22550.0,13267.0,0.741491,35817.0,35817.0


Support Vector Classifier algorithm with GridSearchCV in a pipeline (scaling, dimensionality reduction, balancing):

In [43]:
stratified_kfold = StratifiedKFold(n_splits=5,
                                       shuffle=True,
                                       random_state=23)

pipeline_SVC = imbpipeline([('scaler', scaler),
                            ('pca', pca),
                            ('SMOTE', SMOTEEN),
                            ('SVC', SVC())])
    
params_SVC = {
              'SVC__gamma': ['auto'],
              'SVC__max_iter': [150, 300, 500],
              'SVC__decision_function_shape': ['ovo'],
              'SVC__degree': [1],
              'SVC__kernel': ['rbf'],
              'SVC__random_state': [11],
              'pca__n_components': [5, 10, 20]
             }

search_SVC = GridSearchCV(pipeline_SVC,
                             params_SVC,
                             scoring='roc_auc',
                             cv=stratified_kfold,
                            verbose=3,
                            #n_jobs=3
                         )

search_SVC.fit(X_train, y_train)

cv_score = search_SVC.best_score_
test_score = search_SVC.score(X_test, y_test)
print(f'Cross-validation score: {cv_score}\nTest score: {test_score}')
print("Support Vector:")
print(search_SVC.best_params_)
print(f'Results on test: {search_SVC.best_estimator_.score(X_test, y_test)}')
print(f'Results on train: {search_SVC.best_estimator_.score(X_train, y_train)}')


Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=5;, score=0.588 total time=   3.9s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=5;, score=0.649 total time=   3.9s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=5;, score=0.448 total time=   4.2s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=5;, score=0.371 total time=   4.0s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=5;, score=0.596 total time=   4.0s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=10;, score=0.578 total time=   8.8s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=10;, score=0.498 total time=   9.2s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=10;, score=0.659 total time=   8.1s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=10;, score=0.582 total time=   9.0s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=10;, score=0.587 total time=   9.2s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=20;, score=0.586 total time=  21.8s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=20;, score=0.648 total time=  21.7s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=20;, score=0.632 total time=  21.5s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=20;, score=0.673 total time=  22.0s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=150, SVC__random_state=11, pca__n_components=20;, score=0.682 total time=  23.4s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=5;, score=0.625 total time=   5.5s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=5;, score=0.592 total time=   5.3s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=5;, score=0.505 total time=   5.0s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=5;, score=0.627 total time=   5.1s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=5;, score=0.610 total time=   5.1s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=10;, score=0.588 total time=  10.7s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=10;, score=0.662 total time=  10.8s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=10;, score=0.650 total time=  10.5s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=10;, score=0.652 total time=  10.0s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=10;, score=0.524 total time=  10.0s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=20;, score=0.670 total time=  23.1s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=20;, score=0.709 total time=  23.1s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=20;, score=0.740 total time=  23.3s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=20;, score=0.637 total time=  23.2s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=300, SVC__random_state=11, pca__n_components=20;, score=0.657 total time=  23.2s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=5;, score=0.556 total time=   6.6s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=5;, score=0.633 total time=   6.6s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=5;, score=0.515 total time=   6.5s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=5;, score=0.600 total time=   6.5s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=5;, score=0.590 total time=   6.7s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=10;, score=0.625 total time=  11.7s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=10;, score=0.641 total time=  11.6s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=10;, score=0.590 total time=  11.7s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=10;, score=0.576 total time=  11.7s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=10;, score=0.572 total time=  11.0s
[CV 1/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=20;, score=0.732 total time=  25.7s
[CV 2/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=20;, score=0.657 total time=  25.4s
[CV 3/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=20;, score=0.740 total time=  25.6s
[CV 4/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=20;, score=0.587 total time=  25.7s
[CV 5/5] END SVC__decision_function_shape=ovo, SVC__degree=1, SVC__gamma=auto, SVC__kernel=rbf, SVC__max_iter=500, SVC__random_state=11, pca__n_components=20;, score=0.724 total time=  25.8s
Cross-validation score: 0.6881313221150505
Test score: 0.6736255454032369
Support Vector:
{'SVC__decision_function_shape': 'ovo', 'SVC__degree': 1, 'SVC__gamma': 'auto', 'SVC__kernel': 'rbf', 'SVC__max_iter': 500, 'SVC__random_state': 11, 'pca__n_components': 20}
Results on test: 0.6442192255074406
Results on train: 0.6253814030847283
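
The "Test score" and "Results on test" values above differ because they are different metrics: `search.score` applies the `scoring` passed to the search (ROC AUC here), while `best_estimator_.score` falls back to the classifier's default, which is accuracy. The sketch below illustrates this on synthetic data with a stand-in estimator and grid (all assumptions chosen so the example runs quickly).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV

# Synthetic data and a small grid (both hypothetical), only to compare the two score calls.
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
search_demo = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}, scoring="roc_auc", cv=3
)
search_demo.fit(X_demo, y_demo)

best = search_demo.best_estimator_
auc_from_search = search_demo.score(X_demo, y_demo)  # uses scoring="roc_auc"
acc_from_estimator = best.score(X_demo, y_demo)      # classifier default: accuracy

# The same two numbers computed explicitly.
auc_manual = roc_auc_score(y_demo, best.decision_function(X_demo))
acc_manual = accuracy_score(y_demo, best.predict(X_demo))

print(f"ROC AUC: {auc_from_search:.3f}, accuracy: {acc_from_estimator:.3f}")
```

This is consistent with the output above, where "Results on test" (0.644) matches the accuracy in the classification report that follows.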


Computing the classification scores and saving accuracy, precision, recall and F1 score in a data frame:

In [44]:
y_pred_SVC_train = search_SVC.best_estimator_.predict(X_train)

In [45]:
y_pred_svc_test = search_SVC.best_estimator_.predict(X_test)

In [46]:
y_pred_SVC = search_SVC.predict(X_test)

Best hyperparameters:

In [47]:
search_SVC.best_params_

{'SVC__decision_function_shape': 'ovo',
 'SVC__degree': 1,
 'SVC__gamma': 'auto',
 'SVC__kernel': 'rbf',
 'SVC__max_iter': 500,
 'SVC__random_state': 11,
 'pca__n_components': 20}

In [48]:
print(classification_report(y_test, y_pred_SVC))
A_report_svc = pd.DataFrame(classification_report(y_test, y_pred_SVC, output_dict=True))

              precision    recall  f1-score   support

           0       0.74      0.67      0.70     22550
           1       0.52      0.60      0.55     13267

    accuracy                           0.64     35817
   macro avg       0.63      0.63      0.63     35817
weighted avg       0.66      0.64      0.65     35817



In [49]:
A_report_svc

metric,0,1,accuracy,macro avg,weighted avg
precision,0.738904,0.517133,0.644219,0.628018,0.656758
recall,0.67255,0.596065,0.644219,0.634308,0.644219
f1-score,0.704167,0.553801,0.644219,0.628984,0.64847
support,22550.0,13267.0,0.644219,35817.0,35817.0


In [50]:
A_report_svc = A_report_svc.add_prefix('SVC_')  # prefix every column with the model name


In [51]:
A_report_svc

metric,SVC_0,SVC_1,SVC_accuracy,SVC_macro avg,SVC_weighted avg
precision,0.738904,0.517133,0.644219,0.628018,0.656758
recall,0.67255,0.596065,0.644219,0.634308,0.644219
f1-score,0.704167,0.553801,0.644219,0.628984,0.64847
support,22550.0,13267.0,0.644219,35817.0,35817.0


XGBClassifier algorithm with GridSearchCV in a pipeline (scaling, dimensionality reduction, balancing):

In [52]:
stratified_kfold = StratifiedKFold(n_splits=3,
                                       shuffle=True,
                                       random_state=77)

pipeline = imbpipeline(steps=[('scaler', scaler),
                              ('pca', pca),
                              ('smote', SMOTEEN),
                              ('XGB', XGBClassifier())])

params = {
    'XGB__n_estimators': [100, 500, 800],
    'XGB__max_depth': [3, 5, 10],
    'XGB__learning_rate': [0.1, 0.5],
    'pca__n_components': [5, 10, 20]
    }

search_XGB = GridSearchCV(pipeline, 
                          params, 
                          scoring='roc_auc', 
                          cv=stratified_kfold, 
                          verbose=3,
                        #n_jobs=3
                         ) 

search_XGB.fit(X_train, y_train) 
accuracy_score(y_test, search_XGB.predict(X_test))

Fitting 3 folds for each of 54 candidates, totalling 162 fits
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=5;, score=0.816 total time=   3.9s
[CV 2/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=5;, score=0.825 total time=   3.8s
[CV 3/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=5;, score=0.818 total time=   3.8s
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=10;, score=0.838 total time=   8.9s
[CV 2/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=10;, score=0.845 total time=   8.3s
[CV 3/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=10;, score=0.843 total time=   8.1s
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=3, XGB__n_estimators=100, pca__n_components=20;, score=0.865 total time=  18.4s
[CV 2/3] END XGB_

[CV 3/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=100, pca__n_components=20;, score=0.911 total time=  30.7s
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=5;, score=0.869 total time=  39.1s
[CV 2/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=5;, score=0.872 total time=  48.3s
[CV 3/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=5;, score=0.871 total time=  46.7s
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=10;, score=0.896 total time=  52.7s
[CV 2/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=10;, score=0.898 total time=  50.5s
[CV 3/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimators=500, pca__n_components=10;, score=0.900 total time= 1.0min
[CV 1/3] END XGB__learning_rate=0.1, XGB__max_depth=10, XGB__n_estimator

[CV 3/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=500, pca__n_components=20;, score=0.914 total time=  46.2s
[CV 1/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=5;, score=0.860 total time=  25.5s
[CV 2/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=5;, score=0.865 total time=  22.3s
[CV 3/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=5;, score=0.865 total time=  22.6s
[CV 1/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=10;, score=0.894 total time=  37.5s
[CV 2/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=10;, score=0.896 total time=  36.7s
[CV 3/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, pca__n_components=10;, score=0.890 total time=  36.1s
[CV 1/3] END XGB__learning_rate=0.5, XGB__max_depth=5, XGB__n_estimators=800, p

0.8172655442946087

Computing the classification scores and saving accuracy, recall, and F1 score in a data frame:

Best hyperparameters:

In [53]:
search_XGB.best_params_

{'XGB__learning_rate': 0.1,
 'XGB__max_depth': 10,
 'XGB__n_estimators': 500,
 'pca__n_components': 20}

In [54]:
#XGBClassifier().get_params().keys()

In [55]:
search_XGB.cv_results_["mean_test_score"]

array([0.81939343, 0.84198873, 0.86810284, 0.84133213, 0.87234095,
       0.89394349, 0.84676692, 0.87798078, 0.90200968, 0.84108852,
       0.87000128, 0.89243752, 0.85772887, 0.88866136, 0.90994317,
       0.86056807, 0.89245273, 0.9112972 , 0.86539714, 0.89386922,
       0.91108879, 0.87069482, 0.89797723, 0.91655473, 0.87007474,
       0.90141347, 0.91524123, 0.84066092, 0.86967626, 0.89195881,
       0.85173053, 0.88466061, 0.90729342, 0.85576429, 0.8835808 ,
       0.90713081, 0.85621813, 0.88655996, 0.90513468, 0.86160186,
       0.89166158, 0.91247717, 0.86313988, 0.89354342, 0.91165163,
       0.86607574, 0.89716081, 0.91220168, 0.86857286, 0.89897748,
       0.91366623, 0.8655969 , 0.89876878, 0.91309304])
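The raw `mean_test_score` array above is hard to read because the scores are detached from the hyperparameters that produced them. A minimal sketch of one way to inspect it: load `cv_results_` into a data frame and sort by `rank_test_score`. The data, pipeline, and grid below (`X_demo`, `demo_pipeline`, `demo_grid`) are small synthetic stand-ins, not the objects used in this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption: not the booking dataset from this notebook).
X_demo, y_demo = make_classification(n_samples=300, n_features=12, random_state=0)

# Hypothetical two-step pipeline, only to produce a cv_results_ dictionary.
demo_pipeline = Pipeline(steps=[("pca", PCA()),
                                ("clf", LogisticRegression(max_iter=1000))])
demo_grid = {"pca__n_components": [2, 5], "clf__C": [0.1, 1.0]}

demo_search = GridSearchCV(demo_pipeline, demo_grid, scoring="roc_auc", cv=3)
demo_search.fit(X_demo, y_demo)

# One row per candidate: keep the parameter columns and the summary scores.
results = pd.DataFrame(demo_search.cv_results_)
param_cols = [c for c in results.columns if c.startswith("param_")]
ranked = (results[param_cols + ["mean_test_score", "std_test_score", "rank_test_score"]]
          .sort_values("rank_test_score"))
print(ranked.to_string(index=False))
```

The same three lines at the end should work on `search_XGB.cv_results_`, since every scikit-learn search object exposes the same dictionary layout.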

In [56]:
accuracy_score(y_test, search_XGB.predict(X_test))

0.8172655442946087

In [57]:
y_pred_XGB = search_XGB.best_estimator_.predict(X_test)
test_score = search_XGB.score(X_test, y_test)
cv_score = search_XGB.best_score_

In [58]:
print(f'Cross-validation score: {cv_score}\nTest score: {test_score}')
print("XGBClassifier:")
print(search_XGB.best_params_)
print(f'Results on test: {search_XGB.best_estimator_.score(X_test, y_test)}')
print(f'Results on train: {search_XGB.best_estimator_.score(X_train, y_train)}')

Cross-validation score: 0.9165547250279212
Test score: 0.8861460667040255
XGBClassifier:
{'XGB__learning_rate': 0.1, 'XGB__max_depth': 10, 'XGB__n_estimators': 500, 'pca__n_components': 20}
Results on test: 0.8172655442946087
Results on train: 0.8834551828939968
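The two test numbers printed above differ (0.886 vs. 0.817), and the second one equals the `accuracy_score` from cell 56. A likely explanation is that they are different metrics: a search object's `score` method uses the `scoring` it was built with, while `best_estimator_.score` uses the classifier's default, which is accuracy. The definition of `search_XGB` is outside this section, so `scoring='roc_auc'` is an assumption here (it is what `search_LR` uses). The sketch below reproduces the effect on synthetic data with a plain `LogisticRegression`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, mildly imbalanced stand-in data (assumption: not the notebook's dataset).
X_demo, y_demo = make_classification(n_samples=400, n_features=10,
                                     weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          stratify=y_demo, random_state=0)

demo_search = GridSearchCV(LogisticRegression(max_iter=1000),
                           {"C": [0.1, 1.0]}, scoring="roc_auc", cv=3)
demo_search.fit(X_tr, y_tr)

# Search object: evaluates with the metric passed as `scoring` (ROC AUC here).
search_score = demo_search.score(X_te, y_te)
# Refitted best estimator: evaluates with the classifier default (accuracy).
estimator_score = demo_search.best_estimator_.score(X_te, y_te)

# Recompute both metrics explicitly to confirm which is which.
auc = roc_auc_score(y_te, demo_search.decision_function(X_te))
acc = accuracy_score(y_te, demo_search.predict(X_te))

print(f"search.score          = {search_score:.4f} (ROC AUC = {auc:.4f})")
print(f"best_estimator_.score = {estimator_score:.4f} (accuracy = {acc:.4f})")
```

If that is what happens here, the cross-validation score (0.917) should be compared with the 0.886 test score, not with the 0.817 accuracy.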


In [59]:
print(classification_report(y_test, y_pred_XGB))
A_report_xgb = pd.DataFrame(classification_report(y_test, y_pred_XGB, output_dict=True))

              precision    recall  f1-score   support

           0       0.83      0.90      0.86     22550
           1       0.80      0.68      0.73     13267

    accuracy                           0.82     35817
   macro avg       0.81      0.79      0.80     35817
weighted avg       0.82      0.82      0.81     35817



In [60]:
# Prefix every report column with the model name (e.g. "0" -> "XGB_0")
A_report_xgb = A_report_xgb.add_prefix('XGB_')


In [61]:
A_report_xgb

,XGB_0,XGB_1,XGB_accuracy,XGB_macro avg,XGB_weighted avg
precision,0.826087,0.798067,0.817266,0.812077,0.815708
recall,0.899024,0.6783,0.817266,0.788662,0.817266
f1-score,0.861014,0.733325,0.817266,0.797169,0.813717
support,22550.0,13267.0,0.817266,35817.0,35817.0
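In the table above the `XGB_accuracy` column holds the same value in every row, including `support`. That comes from how `classification_report(..., output_dict=True)` is laid out: per-class entries are dictionaries, while `accuracy` is a single float that pandas repeats down the column. A small sketch with made-up labels (`y_true`, `y_pred`, and the `demo_` prefix are illustrative only) shows the layout and how to pull out the individual scores:

```python
import pandas as pd
from sklearn.metrics import classification_report

# Made-up labels for illustration (not predictions from this notebook).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

# Same construction as A_report_xgb, with a hypothetical "demo_" prefix.
report = pd.DataFrame(
    classification_report(y_true, y_pred, output_dict=True)
).add_prefix("demo_")

# "accuracy" is one scalar, so any row of that column gives the same number.
accuracy = report.loc["precision", "demo_accuracy"]
# Recall and F1 for the positive class (label 1).
recall_pos = report.loc["recall", "demo_1"]
f1_pos = report.loc["f1-score", "demo_1"]

summary = pd.Series({"accuracy": accuracy, "recall_1": recall_pos, "f1_1": f1_pos})
print(report)
print(summary)
```

Reading the accuracy from any single row avoids mistaking the repeated value in the `support` row for a sample count.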


LogisticRegression algorithm with GridSearchCV in a pipeline, with scaling, dimensionality reduction (PCA), and class balancing:

In [62]:
pipeline = imbpipeline(steps = [['scaler', scaler],
                                ['pca', pca],
                                ['smote', SMOTEEN],
                                ['LR', LogisticRegression()]])

stratified_kfold = StratifiedKFold(n_splits=5,
                                       shuffle=True,
                                       random_state=13)
    
param_grid = {'LR__C':[20, 50, 70],
             'LR__random_state': [11],
             'LR__multi_class': ['auto'],
             'LR__max_iter': [100, 200, 500],
             'LR__solver': ['saga'],
             'LR__penalty': ['l2', 'l1'],
             'pca__n_components': [5, 10, 20]
             }
                                                                 
search_LR = GridSearchCV(estimator=pipeline,
                           param_grid=param_grid,
                           scoring='roc_auc',
                           cv=stratified_kfold,
                           verbose=3,
                           #n_jobs=3
                        )

search_LR.fit(X_train, y_train)
cv_score = search_LR.best_score_
test_score = search_LR.score(X_test, y_test)
print(f'Cross-validation score: {cv_score}\nTest score: {test_score}')
print("LogisticRegression:")
print(search_LR.best_params_)
print(f'Results on test: {search_LR.best_estimator_.score(X_test, y_test)}')
print(f'Results on train: {search_LR.best_estimator_.score(X_train, y_train)}')

Fitting 5 folds for each of 54 candidates, totalling 270 fits
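The candidate count in the line above follows directly from the grid: 3 values of `C` × 3 of `max_iter` × 2 penalties × 3 PCA sizes (the other keys have one value each) gives 54 combinations, and 5 folds each gives 270 fits. As a quick check before launching a long search, the grid can be counted with scikit-learn's `ParameterGrid`; `param_grid` below is copied from the cell above.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the LogisticRegression cell above.
param_grid = {'LR__C': [20, 50, 70],
              'LR__random_state': [11],
              'LR__multi_class': ['auto'],
              'LR__max_iter': [100, 200, 500],
              'LR__solver': ['saga'],
              'LR__penalty': ['l2', 'l1'],
              'pca__n_components': [5, 10, 20]}

n_splits = 5  # matches StratifiedKFold(n_splits=5)
n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * n_splits

print(f"{n_candidates} candidates, {n_fits} fits")  # 54 candidates, 270 fits
```

With the slowest fits in the log taking roughly 35 to 40 seconds each, counting fits first gives a rough idea of the total runtime when `n_jobs` is left at its default.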
[CV 1/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.8s
[CV 2/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.8s
[CV 3/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.8s
[CV 4/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.771 total time=   2.8s
[CV 5/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.770 total time=   2.8s
[CV 1/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__rando



[CV 1/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.831 total time=  22.8s




[CV 2/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  22.7s




[CV 3/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  22.7s




[CV 4/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  23.0s




[CV 5/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  22.8s
[CV 1/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.754 total time=   2.9s
[CV 2/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.8s
[CV 3/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.8s
[CV 4/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   3.2s
[CV 5/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  24.9s




[CV 2/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  25.0s




[CV 3/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  25.8s




[CV 4/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  26.0s




[CV 5/5] END LR__C=20, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  23.6s
[CV 1/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.8s
[CV 2/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.9s
[CV 3/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.7s
[CV 4/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.7s
[CV 5/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.829 total time=  26.1s




[CV 2/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  26.1s




[CV 3/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  26.1s




[CV 4/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.842 total time=  26.0s




[CV 5/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  26.0s
[CV 1/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.771 total time=   2.7s
[CV 2/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.7s
[CV 3/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.8s
[CV 4/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.7s
[CV 5/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  27.3s




[CV 2/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  27.5s




[CV 3/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  27.3s




[CV 4/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  27.5s




[CV 5/5] END LR__C=20, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  27.3s
[CV 1/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.8s
[CV 2/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.7s
[CV 3/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.6s
[CV 4/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.7s
[CV 5/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  35.9s




[CV 2/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  35.8s




[CV 3/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  35.9s
[CV 4/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  34.2s




[CV 5/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  36.0s
[CV 1/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.9s
[CV 2/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.8s
[CV 3/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.7s
[CV 4/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.7s
[CV 5/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 2/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  39.6s




[CV 3/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  39.2s




[CV 4/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.842 total time=  39.1s




[CV 5/5] END LR__C=20, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  38.9s
[CV 1/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.7s
[CV 2/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.7s
[CV 3/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.770 total time=   2.8s
[CV 4/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.8s
[CV 5/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.831 total time=  22.7s




[CV 2/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  22.6s




[CV 3/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  22.6s




[CV 4/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  22.7s




[CV 5/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.832 total time=  22.7s
[CV 1/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.7s
[CV 2/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.7s
[CV 3/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.7s
[CV 4/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.7s
[CV 5/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  23.3s




[CV 2/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  23.3s




[CV 3/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  23.6s




[CV 4/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.829 total time=  23.4s




[CV 5/5] END LR__C=50, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  23.4s
[CV 1/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.9s
[CV 2/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.762 total time=   2.8s
[CV 3/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.8s
[CV 4/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.7s
[CV 5/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.76



[CV 1/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  26.0s




[CV 2/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  26.0s




[CV 3/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  26.0s




[CV 4/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.840 total time=  26.0s




[CV 5/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  26.3s
[CV 1/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.8s
[CV 2/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.9s
[CV 3/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.8s
[CV 4/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.769 total time=   2.7s
[CV 5/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.76



[CV 1/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  27.4s




[CV 2/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  27.5s




[CV 3/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  27.5s




[CV 4/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  27.4s




[CV 5/5] END LR__C=50, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  27.5s
[CV 1/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.8s
[CV 2/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.7s
[CV 3/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.6s
[CV 4/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.7s
[CV 5/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.76



[CV 1/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  36.0s




[CV 2/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  36.0s




[CV 3/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  36.1s




[CV 4/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  36.1s




[CV 5/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.840 total time=  36.0s
[CV 1/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.8s
[CV 2/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.7s
[CV 3/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.772 total time=   2.8s
[CV 4/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.8s
[CV 5/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 2/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  39.0s




[CV 3/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  39.1s
[CV 4/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  37.5s
[CV 5/5] END LR__C=50, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  35.2s
[CV 1/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.8s
[CV 2/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.762 total time=   2.8s
[CV 3/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.



[CV 1/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=10;, score=0.777 total time=   9.6s
[CV 2/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=10;, score=0.782 total time=   8.9s
[CV 3/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=10;, score=0.782 total time=   7.5s
[CV 4/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=10;, score=0.787 total time=   6.5s
[CV 5/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=10;, score=0.788 total time=   7.3s




[CV 1/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.830 total time=  22.7s




[CV 2/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  22.7s




[CV 3/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  22.8s




[CV 4/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  22.7s




[CV 5/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  22.9s
[CV 1/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.8s
[CV 2/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.8s
[CV 3/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.8s
[CV 4/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.8s
[CV 5/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.834 total time=  23.4s




[CV 2/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  23.5s




[CV 3/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  23.4s




[CV 4/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  23.6s




[CV 5/5] END LR__C=70, LR__max_iter=100, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  23.4s
[CV 1/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.7s
[CV 2/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.762 total time=   2.8s
[CV 3/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.770 total time=   2.8s
[CV 4/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.768 total time=   2.9s
[CV 5/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.836 total time=  26.2s




[CV 2/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  26.2s




[CV 3/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  26.1s




[CV 4/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.841 total time=  26.0s




[CV 5/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.833 total time=  26.1s
[CV 1/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.764 total time=   2.8s
[CV 2/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.8s
[CV 3/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.8s
[CV 4/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.8s
[CV 5/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.832 total time=  27.3s




[CV 2/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  27.5s




[CV 3/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  27.7s




[CV 4/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  27.4s




[CV 5/5] END LR__C=70, LR__max_iter=200, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  27.4s
[CV 1/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.8s
[CV 2/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.765 total time=   2.7s
[CV 3/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.8s
[CV 4/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.767 total time=   2.6s
[CV 5/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 1/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.832 total time=  35.6s




[CV 2/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.838 total time=  35.2s




[CV 3/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  35.4s
[CV 4/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.835 total time=  34.6s




[CV 5/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l2, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.840 total time=  35.5s
[CV 1/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.756 total time=   2.8s
[CV 2/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.763 total time=   2.7s
[CV 3/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.770 total time=   2.7s
[CV 4/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.766 total time=   2.7s
[CV 5/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=5;, score=0.77



[CV 2/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.842 total time=  39.6s




[CV 3/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.840 total time=  39.4s




[CV 4/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.839 total time=  40.2s




[CV 5/5] END LR__C=70, LR__max_iter=500, LR__multi_class=auto, LR__penalty=l1, LR__random_state=11, LR__solver=saga, pca__n_components=20;, score=0.837 total time=  39.5s




Cross-validation score: 0.8376502319653645
Test score: 0.8378557202347755
LogisticRegression:
{'LR__C': 70, 'LR__max_iter': 500, 'LR__multi_class': 'auto', 'LR__penalty': 'l1', 'LR__random_state': 11, 'LR__solver': 'saga', 'pca__n_components': 20}
Results on test: 0.7603651897143814
Results on train: 0.7537482201189379
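
The `Test score` and `Results on test` lines above differ because they are different metrics. `Results on test` equals the accuracy in the classification report (0.760365), while `Test score` comes from the scorer of the search, which I assume is ROC AUC as in the MLP search later in this notebook. A minimal sketch of how the two can diverge, on made-up labels and scores (the data and the helper names `roc_auc` and `accuracy` are illustrative, not part of the notebook):

```python
from itertools import product

def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def accuracy(y_true, scores, threshold=0.5):
    """Share of correct labels after cutting the scores at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

# Toy data: every positive is scored above every negative,
# but two positives fall below the 0.5 threshold.
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.90]

auc = roc_auc(y_true, scores)
acc = accuracy(y_true, scores)
print(f"ROC AUC: {auc:.3f}, accuracy: {acc:.3f}")
```

ROC AUC only looks at how well the scores rank the two classes, while accuracy depends on where the decision threshold sits, so a model can have a higher AUC than accuracy.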


Getting the classification scores and saving precision, recall, F1 score and accuracy in a data frame:

Best hyperparameters:

In [63]:
search_LR.best_params_

{'LR__C': 70,
 'LR__max_iter': 500,
 'LR__multi_class': 'auto',
 'LR__penalty': 'l1',
 'LR__random_state': 11,
 'LR__solver': 'saga',
 'pca__n_components': 20}

In [64]:
y_pred_lr = search_LR.best_estimator_.predict(X_test)

In [65]:
print(classification_report(y_test, y_pred_lr))
A_report_lr = pd.DataFrame(classification_report(y_test, y_pred_lr, output_dict=True))

              precision    recall  f1-score   support

           0       0.83      0.78      0.80     22550
           1       0.66      0.73      0.69     13267

    accuracy                           0.76     35817
   macro avg       0.75      0.76      0.75     35817
weighted avg       0.77      0.76      0.76     35817



In [66]:
# Prefix every column with the model name so the reports can be concatenated later
A_report_lr = A_report_lr.add_prefix('LR_')


In [67]:
A_report_lr

                   LR_0          LR_1  LR_accuracy  LR_macro avg  LR_weighted avg
precision      0.832342      0.658200     0.760365      0.745271         0.767838
recall         0.775610      0.734454     0.760365      0.755032         0.760365
f1-score       0.802975      0.694240     0.760365      0.748607         0.762698
support    22550.000000  13267.000000     0.760365  35817.000000     35817.000000
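
As a cross-check, the Logistic Regression figures above can be reproduced from four confusion-matrix counts. The counts are not printed anywhere in this notebook; here they are inferred by rounding recall × support for each class, so treat them as an assumption. The sketch also shows how the macro and weighted averages differ:

```python
# Counts inferred from the report (recall x support, rounded); an assumption.
support = {0: 22550, 1: 13267}
tn = round(0.775610 * support[0])   # class 0 predicted as 0
tp = round(0.734454 * support[1])   # class 1 predicted as 1
fp = support[0] - tn                # class 0 predicted as 1
fn = support[1] - tp                # class 1 predicted as 0

precision = {0: tn / (tn + fn), 1: tp / (tp + fp)}
recall = {0: tn / support[0], 1: tp / support[1]}
f1 = {c: 2 * precision[c] * recall[c] / (precision[c] + recall[c]) for c in (0, 1)}

total = sum(support.values())
accuracy = (tn + tp) / total

# Macro average: plain mean over classes. Weighted average: mean weighted by support.
macro_precision = sum(precision.values()) / 2
weighted_precision = sum(precision[c] * support[c] for c in (0, 1)) / total

print(f"precision: {precision[0]:.6f} {precision[1]:.6f}")
print(f"f1-score:  {f1[0]:.6f} {f1[1]:.6f}")
print(f"accuracy:  {accuracy:.6f}")
print(f"macro / weighted precision: {macro_precision:.6f} / {weighted_precision:.6f}")
```

Because class 0 is the larger class and has the higher precision, the weighted average lands above the macro average.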


Using the Multi-Layer Perceptron algorithm with GridSearchCV in a pipeline that handles scaling, dimensionality reduction (PCA) and class balancing (SMOTEENN):

In [68]:
# Pipeline: scale -> reduce with PCA -> rebalance with SMOTEENN -> classify with an MLP.
# `scaler`, `pca` and `SMOTEEN` are the objects defined earlier in the notebook.
pipeline = imbpipeline(steps=[['scaler', scaler],
                              ['pca', pca],
                              ['smote', SMOTEEN],
                              ['MLP', MLPClassifier()]])

stratified_kfold = StratifiedKFold(n_splits=5,
                                   shuffle=True,
                                   random_state=13)

# Only the hidden layer size and the number of PCA components vary (3 x 3 = 9 candidates)
param_grid = {'MLP__hidden_layer_sizes': [8, 4, 16],
              'MLP__activation': ['relu'],
              'MLP__solver': ['adam'],
              'MLP__random_state': [42],
              'MLP__max_iter': [1000],
              'MLP__batch_size': [32],
              'pca__n_components': [5, 10, 20]}

search_MLP = GridSearchCV(estimator=pipeline,
                          param_grid=param_grid,
                          scoring='roc_auc',
                          cv=stratified_kfold,
                          verbose=3,
                          # n_jobs=3
                          )

search_MLP.fit(X_train, y_train)
cv_score = search_MLP.best_score_
test_score = search_MLP.score(X_test, y_test)  # ROC AUC on the held-out test set
print(f'Cross-validation score: {cv_score}\nTest score: {test_score}')

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=8, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=5;, score=0.808 total time=  28.2s
[CV 2/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=8, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=5;, score=0.813 total time=  32.4s
[CV 3/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=8, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=5;, score=0.817 total time= 1.1min
[CV 4/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=8, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=5;, score=0.818 total time= 1.2min
[CV 5/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=8, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=

[CV 3/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=16, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=20;, score=0.902 total time= 2.2min
[CV 4/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=16, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=20;, score=0.896 total time= 2.6min
[CV 5/5] END MLP__activation=relu, MLP__batch_size=32, MLP__hidden_layer_sizes=16, MLP__max_iter=1000, MLP__random_state=42, MLP__solver=adam, pca__n_components=20;, score=0.895 total time= 2.7min
Cross-validation score: 0.8979029718091383
Test score: 0.8888246899723018
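
The first line of the log ("Fitting 5 folds for each of 9 candidates, totalling 45 fits") follows directly from the grid: only `MLP__hidden_layer_sizes` and `pca__n_components` hold more than one value. A small sketch that reproduces the count with the standard library only (the dictionary mirrors the `param_grid` above; scikit-learn's `ParameterGrid` performs an equivalent enumeration):

```python
from itertools import product

param_grid = {
    "MLP__hidden_layer_sizes": [8, 4, 16],
    "MLP__activation": ["relu"],
    "MLP__solver": ["adam"],
    "MLP__random_state": [42],
    "MLP__max_iter": [1000],
    "MLP__batch_size": [32],
    "pca__n_components": [5, 10, 20],
}
n_splits = 5  # folds of the StratifiedKFold splitter

# Every combination of one value per parameter is one candidate.
keys = sorted(param_grid)
candidates = [dict(zip(keys, values))
              for values in product(*(param_grid[k] for k in keys))]

# Each candidate is fitted once per fold.
n_fits = len(candidates) * n_splits
print(len(candidates), n_fits)  # prints: 9 45
```

Adding a second value to any other parameter would double the number of fits, which matters here because single fits already take up to a few minutes.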


Getting the classification scores and saving precision, recall, F1 score and accuracy in a data frame:

Best hyperparameters:

In [69]:
search_MLP.best_params_

{'MLP__activation': 'relu',
 'MLP__batch_size': 32,
 'MLP__hidden_layer_sizes': 16,
 'MLP__max_iter': 1000,
 'MLP__random_state': 42,
 'MLP__solver': 'adam',
 'pca__n_components': 20}
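
To get a feeling for how small the selected network is, the number of trainable parameters can be counted by hand. This is only a sketch: the helper `mlp_param_count` is hypothetical, and it assumes a single output unit, which is how I understand scikit-learn's `MLPClassifier` handles a binary target.

```python
def mlp_param_count(n_inputs, hidden_layer_sizes, n_outputs=1):
    """Weights plus biases of a fully connected network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # Each layer has fan_in * fan_out weights and fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))

# Best candidate: 20 PCA components feeding one hidden layer of 16 units.
best = mlp_param_count(20, (16,))
# Smallest candidate in the grid: 5 components and 4 hidden units.
smallest = mlp_param_count(5, (4,))
print(best, smallest)
```

Under these assumptions the best model has a few hundred parameters, so most of the gain over the smaller candidates seems to come from the extra PCA components rather than from network depth.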

In [70]:
y_pred_mlp = search_MLP.predict(X_test)
print(classification_report(y_test, y_pred_mlp))
A_report_mlp = pd.DataFrame(classification_report(y_test, y_pred_mlp, output_dict=True))

              precision    recall  f1-score   support

           0       0.84      0.86      0.85     22550
           1       0.76      0.73      0.74     13267

    accuracy                           0.81     35817
   macro avg       0.80      0.80      0.80     35817
weighted avg       0.81      0.81      0.81     35817



In [71]:
# Prefix every column with the model name so the reports can be concatenated later
A_report_mlp = A_report_mlp.add_prefix('MLP_')


In [72]:
A_report_mlp

                  MLP_0         MLP_1  MLP_accuracy  MLP_macro avg  MLP_weighted avg
precision      0.843845      0.755920      0.812435       0.799883          0.811277
recall         0.861508      0.729027      0.812435       0.795267          0.812435
f1-score       0.852585      0.742230      0.812435       0.797407          0.811708
support    22550.000000  13267.000000      0.812435   35817.000000      35817.000000


Creating a data frame containing the results of all six classifiers:

In [73]:
A_results = pd.concat([A_report_rf, 
                       A_report_dtc, 
                       A_report_svc, 
                       A_report_xgb, 
                       A_report_lr, 
                       A_report_mlp], 
                      axis=1)

In [74]:
A_results

Unnamed: 0,RF_0,RF_1,RF_accuracy,RF_macro avg,RF_weighted avg,DTC_0,DTC_1,DTC_accuracy,DTC_macro avg,DTC_weighted avg,...,LR_0,LR_1,LR_accuracy,LR_macro avg,LR_weighted avg,MLP_0,MLP_1,MLP_accuracy,MLP_macro avg,MLP_weighted avg
precision,0.817547,0.792963,0.809979,0.805255,0.808441,0.797431,0.648731,0.741491,0.723081,0.742351,...,0.832342,0.6582,0.760365,0.745271,0.767838,0.843845,0.75592,0.812435,0.799883,0.811277
recall,0.898758,0.659079,0.809979,0.778919,0.809979,0.790111,0.658853,0.741491,0.724482,0.741491,...,0.77561,0.734454,0.760365,0.755032,0.760365,0.861508,0.729027,0.812435,0.795267,0.812435
f1-score,0.856232,0.719849,0.809979,0.78804,0.805714,0.793754,0.653753,0.741491,0.723753,0.741896,...,0.802975,0.69424,0.760365,0.748607,0.762698,0.852585,0.74223,0.812435,0.797407,0.811708
support,22550.0,13267.0,0.809979,35817.0,35817.0,22550.0,13267.0,0.741491,35817.0,35817.0,...,22550.0,13267.0,0.760365,35817.0,35817.0,22550.0,13267.0,0.812435,35817.0,35817.0
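
With every report in one frame, the models can be compared on a single metric. A minimal sketch, assuming the positive class (label 1, a cancelled booking) is the one of interest. It rebuilds only the four `_1` columns visible in the truncated display above; the SVC and XGB columns are hidden behind `...` and are left out, so the ranking covers the visible models only. The variable names are illustrative.

```python
import pandas as pd

# Positive-class columns copied from the displayed table (SVC and XGB not shown there).
visible = pd.DataFrame(
    {
        "RF_1": [0.792963, 0.659079, 0.719849],
        "DTC_1": [0.648731, 0.658853, 0.653753],
        "LR_1": [0.658200, 0.734454, 0.694240],
        "MLP_1": [0.755920, 0.729027, 0.742230],
    },
    index=["precision", "recall", "f1-score"],
)

# Keep the columns that describe class 1, then rank the models by F1 score.
positive = visible.filter(regex=r"_1$")
ranking = positive.loc["f1-score"].sort_values(ascending=False)
print(ranking)

best_f1 = ranking.index[0]
best_recall = positive.loc["recall"].idxmax()
print(f"Best F1: {best_f1}, best recall: {best_recall}")
```

On the full frame the same `filter` and `loc` calls should work on `A_results` directly.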


Saving the results to a file and reading them back to check:

In [75]:
A_results.to_pickle("data/A_dataset_results.pkl")

In [76]:
A_results = pd.read_pickle("data/A_dataset_results.pkl")

In [77]:
A_results

Unnamed: 0,RF_0,RF_1,RF_accuracy,RF_macro avg,RF_weighted avg,DTC_0,DTC_1,DTC_accuracy,DTC_macro avg,DTC_weighted avg,...,LR_0,LR_1,LR_accuracy,LR_macro avg,LR_weighted avg,MLP_0,MLP_1,MLP_accuracy,MLP_macro avg,MLP_weighted avg
precision,0.817547,0.792963,0.809979,0.805255,0.808441,0.797431,0.648731,0.741491,0.723081,0.742351,...,0.832342,0.6582,0.760365,0.745271,0.767838,0.843845,0.75592,0.812435,0.799883,0.811277
recall,0.898758,0.659079,0.809979,0.778919,0.809979,0.790111,0.658853,0.741491,0.724482,0.741491,...,0.77561,0.734454,0.760365,0.755032,0.760365,0.861508,0.729027,0.812435,0.795267,0.812435
f1-score,0.856232,0.719849,0.809979,0.78804,0.805714,0.793754,0.653753,0.741491,0.723753,0.741896,...,0.802975,0.69424,0.760365,0.748607,0.762698,0.852585,0.74223,0.812435,0.797407,0.811708
support,22550.0,13267.0,0.809979,35817.0,35817.0,22550.0,13267.0,0.741491,35817.0,35817.0,...,22550.0,13267.0,0.760365,35817.0,35817.0,22550.0,13267.0,0.812435,35817.0,35817.0
