# Getting Started with the Kaggle TPS Challenge Using Tabular ML Toolkit

> A tutorial showcasing the tabular_ml_toolkit (tmlt) library on the Kaggle TPS Challenge, Nov 2021.

> tabular_ml_toolkit is a helper library for jumpstarting machine learning projects on tabular (structured) data.

> It comes with model and data parallelism and cutting-edge hyperparameter search techniques.

> Under the hood, TMLT uses modin, optuna, xgboost, and scikit-learn pipelines.

## Install

`pip install -U tabular_ml_toolkit`

### How to Best Use tabular_ml_toolkit

Start with your favorite model, then create a **tmlt** instance with a single API call.

*Here we use XGBClassifier on the [Kaggle TPS Challenge (Nov 2021) data](https://www.kaggle.com/c/tabular-playground-series-nov-2021/data).*

In [None]:
from tabular_ml_toolkit.tmlt import *
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier
import numpy as np

# for visualizing pipeline
from sklearn import set_config
set_config(display="diagram")

# just to measure fit performance
import time

In [None]:
from sklearn.metrics import roc_auc_score, accuracy_score

In [None]:
# Dataset file names and Paths
DIRECTORY_PATH = "/Users/pamathur/kaggle_datasets/tps_nov_2021/"
TRAIN_FILE = "train.csv"
TEST_FILE = "test.csv"
SAMPLE_SUB_FILE = "sample_submission.csv"
OUTPUT_PATH = "kaggle_tps_output/"

In [None]:
# TRY THIS using LOGISTIC Regression
# https://www.kaggle.com/maximkazantsev/tps-11-21-eda-xgboost-optuna

# ALSO TAKE OUT MODIN OR USE SOME FUNCTIONALITY TO USE BOTH

#### Create a base XGBoost classifier with your best-guess params

In [None]:
xgb_params = {
    # your best-guess params
    'learning_rate': 0.01,
    'eval_metric': 'auc',
    # needed for XGBClassifier, otherwise a warning is shown
    'use_label_encoder': False,
    # because 42 is the answer to all the randomness of this universe
    'random_state': 42,
    # for GPU
    # 'tree_method': 'gpu_hist',
    # 'predictor': 'gpu_predictor',
}

xgb_model = XGBClassifier(**xgb_params)

In [None]:
# create tmlt for the xgb model
tmlt = TMLT().prepare_data_for_training(
    train_file_path=DIRECTORY_PATH + TRAIN_FILE,
    test_file_path=DIRECTORY_PATH + TEST_FILE,
    # make sure to use the right index and target columns
    idx_col="id",
    target="target",
    model=xgb_model,
    random_state=42,
    problem_type="binary_classification",
    # load only 4000 rows to keep this walkthrough fast
    nrows=4000)


# supported problem_type values:
# "binary_classification"
# "multi_label_classification"
# "multi_class_classification"
# "regression"

2021-11-24 11:53:15,358 INFO 12 cores found, model and data parallel processing should worked!
2021-11-24 11:53:15,477 INFO DataFrame Memory usage decreased to 0.80 Mb (74.4% reduction)
2021-11-24 11:53:15,601 INFO DataFrame Memory usage decreased to 0.79 Mb (74.3% reduction)
2021-11-24 11:53:15,655 INFO categorical columns are None, Preprocessing will done accordingly!


In [None]:
tmlt.spl

#### Let's do a quick round of training

In [None]:
tmlt.dfl.create_train_valid(valid_size=0.2)

In [None]:
# Quick check on dataframe shapes
print(f"X_train shape is {tmlt.dfl.X_train.shape}" )
print(f"X_valid shape is {tmlt.dfl.X_valid.shape}" )
print(f"y_train shape is {tmlt.dfl.y_train.shape}")
print(f"y_valid shape is {tmlt.dfl.y_valid.shape}")

X_train shape is (3200, 100)
X_valid shape is (800, 100)
y_train shape is (3200,)
y_valid shape is (800,)
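
The shapes above follow directly from `valid_size=0.2` applied to the 4000 loaded rows. As a rough illustration only, a seeded hold-out split can be sketched in plain Python. The helper `train_valid_indices` is hypothetical and not part of tmlt; whether tmlt shuffles or stratifies the split internally is not shown in this tutorial.

```python
import random


def train_valid_indices(n_rows, valid_size=0.2, seed=42):
    """Shuffle row indices and hold out the first `valid_size` fraction for validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_valid = int(round(n_rows * valid_size))
    return indices[n_valid:], indices[:n_valid]


train_idx, valid_idx = train_valid_indices(4000, valid_size=0.2)
print(len(train_idx), len(valid_idx))  # 3200 800
```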


In [None]:
# Fit the pipeline and time it
start = time.time()
tmlt.spl.fit(tmlt.dfl.X_train, tmlt.dfl.y_train)
end = time.time()
print("Fit Time:", end - start)

# Predict hard labels and positive-class probabilities
preds = tmlt.spl.predict(tmlt.dfl.X_valid)
preds_probs = tmlt.spl.predict_proba(tmlt.dfl.X_valid)[:, 1]

# Metrics
auc = roc_auc_score(tmlt.dfl.y_valid, preds_probs)
acc = accuracy_score(tmlt.dfl.y_valid, preds)

print(f"AUC is : {auc} while Accuracy is : {acc} ")

Fit Time: 1.045884132385254
AUC is : 0.6137947418435223 while Accuracy is : 0.6175 
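
AUC is computed from the predicted probabilities, while accuracy is computed from the hard labels, which is why the cell above keeps both `preds_probs` and `preds`. The following plain-Python sketch shows what the two metrics measure on a toy example. The functions `roc_auc` and `accuracy` here are simplified stand-ins written for illustration, not the scikit-learn implementations.

```python
def accuracy(y_true, y_pred):
    """Fraction of hard label predictions that match the truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]             # predicted probability of class 1
labels = [int(s >= 0.5) for s in scores]   # hard labels at a 0.5 threshold

auc_toy = roc_auc(y_true, scores)
acc_toy = accuracy(y_true, labels)
auc_constant = roc_auc(y_true, [0.3] * 4)  # a constant score cannot rank anything

print(auc_toy, acc_toy, auc_constant)  # 0.75 0.75 0.5
```

A model that outputs the same score for every row gets an AUC of exactly 0.5 regardless of how imbalanced the labels are, whereas its accuracy simply equals the share of the predicted class.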


#### Base Model for the Meta Ensemble Model

In [None]:
# Out-of-fold (OOF) training and prediction on both the train and test datasets with a given model

linear_oof_model = LinearSVC(tol=1e-7, penalty='l2', dual=False, max_iter=2000, random_state=42)

linear_oof_model_preds, linear_oof_model_test_preds = tmlt.do_oof_kfold_train_preds(
    n_splits=5, oof_model=linear_oof_model)

if linear_oof_model_preds is not None:
    print(linear_oof_model_preds.shape)

if linear_oof_model_test_preds is not None:
    print(linear_oof_model_test_preds.shape)

2021-11-24 11:55:33,649 INFO fold: 1 OOF Model ROC AUC: 0.7259767891682785!
2021-11-24 11:55:34,094 INFO fold: 2 OOF Model ROC AUC: 0.6958091553836234!
2021-11-24 11:55:34,543 INFO fold: 3 OOF Model ROC AUC: 0.6614764667956157!
2021-11-24 11:55:35,027 INFO fold: 4 OOF Model ROC AUC: 0.7080050760440353!
2021-11-24 11:55:35,446 INFO fold: 5 OOF Model ROC AUC: 0.7223571396363027!
2021-11-24 11:55:35,451 INFO Mean OOF Model ROC AUC: 0.7027249254055712!


(4000,)
(4000,)
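
To make the idea of out-of-fold prediction concrete, here is a minimal plain-Python sketch. It is not the tmlt implementation: `kfold_indices`, `oof_predictions`, and the toy mean-predicting model are hypothetical, the folds are contiguous and unshuffled, and averaging the test predictions across folds is an assumption about how a single test vector could be produced.

```python
def kfold_indices(n_rows, n_splits):
    """Yield (train_idx, valid_idx) pairs for contiguous, unshuffled folds."""
    sizes = [n_rows // n_splits + (1 if i < n_rows % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        stop = start + size
        valid = list(range(start, stop))
        train = [i for i in range(n_rows) if i < start or i >= stop]
        yield train, valid
        start = stop


def oof_predictions(X, y, X_test, fit, predict, n_splits=5):
    """Return one prediction per training row (from a model that never saw it)
    and fold-averaged predictions for the test rows."""
    oof = [None] * len(X)
    test_avg = [0.0] * len(X_test)
    for train_idx, valid_idx in kfold_indices(len(X), n_splits):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_preds = predict(model, [X[i] for i in valid_idx])
        for i, p in zip(valid_idx, fold_preds):
            oof[i] = p
        for j, p in enumerate(predict(model, X_test)):
            test_avg[j] += p / n_splits
    return oof, test_avg


# Toy "model" (illustrative only): always predicts the mean training target.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, X_rows):
    return [model] * len(X_rows)


X = [[float(i)] for i in range(10)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
X_test = [[0.5], [1.5], [2.5]]

oof, test_preds = oof_predictions(X, y, X_test, fit_mean, predict_mean, n_splits=5)
print(len(oof), len(test_preds))  # 10 3
```

Because every training row receives a prediction from a model fitted without that row, the resulting column can be used as an input feature for a second-stage model without leaking the target.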


In [None]:
# add base model OOF predictions back to X and X_test before meta model training
tmlt.dfl.X["linear_preds"] = linear_oof_model_preds
tmlt.dfl.X_test["linear_preds"] = linear_oof_model_test_preds

In [None]:
print(tmlt.dfl.X.shape)
print(tmlt.dfl.X_test.shape)

(4000, 101)
(4000, 101)


#### For the Meta Model, Run an Optuna-Based Hyperparameter Search to Find the Best Fit Params
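
In the log output of this section, the optimization direction is `minimize` and each trial's value equals its logged `log_loss`, so the search is minimizing log loss. A useful reference point is the log loss of a model that predicts the same probability for every row. The sketch below computes that baseline in plain Python. The class counts (308 positives, 492 negatives out of 800 validation rows) are inferred from the logged accuracy and recall values, not stated by the library, and `log_loss` here is a simplified stand-in for the scikit-learn function.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Assumed validation labels: counts inferred from the logged metrics.
y_valid = [1] * 308 + [0] * 492
base_rate = sum(y_valid) / len(y_valid)

# Predicting the base rate for every row is the best constant prediction.
baseline = log_loss(y_valid, [base_rate] * len(y_valid))
print(round(base_rate, 3), round(baseline, 4))
```

Under these assumptions the constant baseline comes out at roughly 0.67, so trials with a log loss near 0.60 are better than a constant guess, while trials reporting values around 0.65 to 0.67 with an F1 score near zero have barely moved away from it.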

In [None]:
study = tmlt.do_xgb_optuna_optimization(optuna_db_path=OUTPUT_PATH, opt_timeout=360)

2021-11-24 11:56:05,611 INFO Optimization Direction is: minimize
[I 2021-11-24 11:56:05,674] Using an existing study with name 'tmlt_autoxgb' instead of creating a new one.
2021-11-24 11:56:05,948 INFO Training Started!


Parameters: { "colsample_bytree", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 11:56:13,976 INFO Training Ended!
2021-11-24 11:56:14,038 INFO log_loss: 0.6036267434991897
2021-11-24 11:56:14,039 INFO roc_auc_score: 0.7167207792207793
2021-11-24 11:56:14,039 INFO accuracy_score: 0.70125
2021-11-24 11:56:14,040 INFO f1_score: 0.5886402753872634
2021-11-24 11:56:14,041 INFO precision_score: 0.6263736263736264
2021-11-24 11:56:14,041 INFO recall_score: 0.5551948051948052
[I 2021-11-24 11:56:14,078] Trial 40 finished with value: 0.6036267434991897 and parameters: {'learning_rate': 0.0467094492253725, 'n_estimators': 15000, 'reg_lambda': 6.538273265404555e-05, 'reg_alpha': 0.0013117821568648178, 'subsample': 0.5290631340507302, 'colsample_bytree': 0.7582355779106297, 'max_depth': 4, 'early_stopping_rounds': 426, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:56:14,250 INFO Training Started!


2021-11-24 11:56:17,358 INFO Training Ended!
  _warn_prf(average, modifier, msg_start, len(result))
2021-11-24 11:56:17,419 INFO log_loss: 0.6689949867129326
2021-11-24 11:56:17,420 INFO roc_auc_score: 0.5
2021-11-24 11:56:17,420 INFO accuracy_score: 0.615
2021-11-24 11:56:17,421 INFO f1_score: 0.0
2021-11-24 11:56:17,422 INFO precision_score: 0.0
2021-11-24 11:56:17,422 INFO recall_score: 0.0
[I 2021-11-24 11:56:17,451] Trial 41 finished with value: 0.6689949867129326 and parameters: {'learning_rate': 0.018704968140364132, 'n_estimators': 15000, 'reg_lambda': 3.5773118385391564e-06, 'reg_alpha': 0.3568074678915505, 'subsample': 0.5658420728617757, 'colsample_bytree': 0.6677211087049771, 'max_depth': 7, 'early_stopping_rounds': 221, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:56:17,645 INFO Training Started!
2021-11-24 11:56:41,801 INFO Training Ended!
2021-11-24 11:56:41,994 INFO log_loss: 1.05072499800998

2021-11-24 11:56:47,668 INFO Training Ended!
  _warn_prf(average, modifier, msg_start, len(result))
2021-11-24 11:56:47,728 INFO log_loss: 0.6668759831786155
2021-11-24 11:56:47,729 INFO roc_auc_score: 0.6999458874458875
2021-11-24 11:56:47,729 INFO accuracy_score: 0.615
2021-11-24 11:56:47,730 INFO f1_score: 0.0
2021-11-24 11:56:47,730 INFO precision_score: 0.0
2021-11-24 11:56:47,731 INFO recall_score: 0.0
[I 2021-11-24 11:56:47,762] Trial 43 finished with value: 0.6668759831786155 and parameters: {'learning_rate': 0.010772643652489347, 'n_estimators': 15000, 'reg_lambda': 7.224808161071666, 'reg_alpha': 0.016376625092934763, 'subsample': 0.7488614807842593, 'colsample_bytree': 0.5452525070955537, 'max_depth': 5, 'early_stopping_rounds': 458, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:56:47,928 INFO Training Started!


2021-11-24 11:56:57,753 INFO Training Ended!
2021-11-24 11:56:57,817 INFO log_loss: 0.603700662679039
2021-11-24 11:56:57,818 INFO roc_auc_score: 0.7165624010136205
2021-11-24 11:56:57,818 INFO accuracy_score: 0.70125
2021-11-24 11:56:57,819 INFO f1_score: 0.5886402753872634
2021-11-24 11:56:57,820 INFO precision_score: 0.6263736263736264
2021-11-24 11:56:57,820 INFO recall_score: 0.5551948051948052
[I 2021-11-24 11:56:57,850] Trial 44 finished with value: 0.603700662679039 and parameters: {'learning_rate': 0.04764946344564492, 'n_estimators': 15000, 'reg_lambda': 9.509972809846152e-05, 'reg_alpha': 0.0011806166035534902, 'subsample': 0.535456718198006, 'colsample_bytree': 0.7563055146997821, 'max_depth': 4, 'early_stopping_rounds': 427, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:56:58,033 INFO Training Started!


2021-11-24 11:57:06,884 INFO Training Ended!
2021-11-24 11:57:06,947 INFO log_loss: 0.6050223457673565
2021-11-24 11:57:06,947 INFO roc_auc_score: 0.7159684827367754
2021-11-24 11:57:06,948 INFO accuracy_score: 0.695
2021-11-24 11:57:06,949 INFO f1_score: 0.5836177474402731
2021-11-24 11:57:06,949 INFO precision_score: 0.6151079136690647
2021-11-24 11:57:06,950 INFO recall_score: 0.5551948051948052
[I 2021-11-24 11:57:06,976] Trial 45 finished with value: 0.6050223457673565 and parameters: {'learning_rate': 0.06732275507944363, 'n_estimators': 15000, 'reg_lambda': 3.723340173877417e-05, 'reg_alpha': 8.659625516561364e-05, 'subsample': 0.6791973826476483, 'colsample_bytree': 0.7886046588497224, 'max_depth': 3, 'early_stopping_rounds': 328, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:57:07,147 INFO Training Started!


2021-11-24 11:57:16,694 INFO Training Ended!
2021-11-24 11:57:16,755 INFO log_loss: 0.6034033725084736
2021-11-24 11:57:16,756 INFO roc_auc_score: 0.7169055537957977
2021-11-24 11:57:16,757 INFO accuracy_score: 0.7
2021-11-24 11:57:16,757 INFO f1_score: 0.5847750865051904
2021-11-24 11:57:16,758 INFO precision_score: 0.6259259259259259
2021-11-24 11:57:16,759 INFO recall_score: 0.5487012987012987
[I 2021-11-24 11:57:16,785] Trial 46 finished with value: 0.6034033725084736 and parameters: {'learning_rate': 0.0529560143480222, 'n_estimators': 15000, 'reg_lambda': 0.0015065052556385249, 'reg_alpha': 0.0015237224036502737, 'subsample': 0.5217391893832076, 'colsample_bytree': 0.9032805892664175, 'max_depth': 4, 'early_stopping_rounds': 471, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:57:16,956 INFO Training Started!


2021-11-24 11:57:26,324 INFO Training Ended!
2021-11-24 11:57:26,395 INFO log_loss: 0.6047576112858951
2021-11-24 11:57:26,396 INFO roc_auc_score: 0.71601467638053
2021-11-24 11:57:26,396 INFO accuracy_score: 0.695
2021-11-24 11:57:26,397 INFO f1_score: 0.5836177474402731
2021-11-24 11:57:26,398 INFO precision_score: 0.6151079136690647
2021-11-24 11:57:26,398 INFO recall_score: 0.5551948051948052
[I 2021-11-24 11:57:26,431] Trial 47 finished with value: 0.6047576112858951 and parameters: {'learning_rate': 0.07947070726527587, 'n_estimators': 15000, 'reg_lambda': 0.0016366890028134459, 'reg_alpha': 2.119564883222415e-07, 'subsample': 0.6443051608844637, 'colsample_bytree': 0.9167973472127608, 'max_depth': 6, 'early_stopping_rounds': 473, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:57:26,614 INFO Training Started!


2021-11-24 11:57:35,304 INFO Training Ended!
2021-11-24 11:57:35,367 INFO log_loss: 0.6035707989195361
2021-11-24 11:57:35,368 INFO roc_auc_score: 0.7166547883011297
2021-11-24 11:57:35,368 INFO accuracy_score: 0.6975
2021-11-24 11:57:35,369 INFO f1_score: 0.5827586206896551
2021-11-24 11:57:35,370 INFO precision_score: 0.6213235294117647
2021-11-24 11:57:35,370 INFO recall_score: 0.5487012987012987
[I 2021-11-24 11:57:35,396] Trial 48 finished with value: 0.6035707989195361 and parameters: {'learning_rate': 0.03614354681363991, 'n_estimators': 15000, 'reg_lambda': 0.006854957302066931, 'reg_alpha': 0.00019017956747167315, 'subsample': 0.7940986572285533, 'colsample_bytree': 0.8738975570432537, 'max_depth': 5, 'early_stopping_rounds': 261, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:57:35,560 INFO Training Started!


2021-11-24 11:57:46,493 INFO Training Ended!
2021-11-24 11:57:46,553 INFO log_loss: 0.6082258519902826
2021-11-24 11:57:46,554 INFO roc_auc_score: 0.7157573117938971
2021-11-24 11:57:46,554 INFO accuracy_score: 0.6975
2021-11-24 11:57:46,555 INFO f1_score: 0.5631768953068592
2021-11-24 11:57:46,555 INFO precision_score: 0.6341463414634146
2021-11-24 11:57:46,556 INFO recall_score: 0.5064935064935064
[I 2021-11-24 11:57:46,581] Trial 49 finished with value: 0.6082258519902826 and parameters: {'learning_rate': 0.023182620810535053, 'n_estimators': 20000, 'reg_lambda': 0.00032241286371737795, 'reg_alpha': 0.007504659157311994, 'subsample': 0.7334248177616951, 'colsample_bytree': 0.814394630383725, 'max_depth': 6, 'early_stopping_rounds': 475, 'tree_method': 'hist', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:57:46,773 INFO Training Started!
2021-11-24 11:58:29,785 INFO Training Ended!
2021-11-24 11:58:29,987 INFO log_loss: 1.06980902

2021-11-24 11:58:38,204 INFO Training Ended!
2021-11-24 11:58:38,264 INFO log_loss: 0.650817776657641
2021-11-24 11:58:38,265 INFO roc_auc_score: 0.7190271618625277
2021-11-24 11:58:38,265 INFO accuracy_score: 0.6175
2021-11-24 11:58:38,266 INFO f1_score: 0.012903225806451613
2021-11-24 11:58:38,267 INFO precision_score: 1.0
2021-11-24 11:58:38,267 INFO recall_score: 0.006493506493506494
[I 2021-11-24 11:58:38,292] Trial 51 finished with value: 0.650817776657641 and parameters: {'learning_rate': 0.027657223103193497, 'n_estimators': 15000, 'reg_lambda': 1.508834114571624, 'reg_alpha': 3.4286062415028234e-05, 'subsample': 0.5947354543019293, 'colsample_bytree': 0.9528866136619958, 'max_depth': 6, 'early_stopping_rounds': 402, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:58:38,464 INFO Training Started!


2021-11-24 11:58:50,718 INFO Training Ended!
2021-11-24 11:58:50,779 INFO log_loss: 0.6160027698986232
2021-11-24 11:58:50,780 INFO roc_auc_score: 0.7201589061345159
2021-11-24 11:58:50,780 INFO accuracy_score: 0.70125
2021-11-24 11:58:50,781 INFO f1_score: 0.5031185031185031
2021-11-24 11:58:50,781 INFO precision_score: 0.6994219653179191
2021-11-24 11:58:50,782 INFO recall_score: 0.39285714285714285
[I 2021-11-24 11:58:50,808] Trial 52 finished with value: 0.6160027698986232 and parameters: {'learning_rate': 0.040427227428741906, 'n_estimators': 20000, 'reg_lambda': 0.20535422413974883, 'reg_alpha': 0.0007742171711545045, 'subsample': 0.9306781384312386, 'colsample_bytree': 0.704801485650981, 'max_depth': 3, 'early_stopping_rounds': 373, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:58:50,977 INFO Training Started!


2021-11-24 11:58:59,456 INFO Training Ended!
2021-11-24 11:58:59,518 INFO log_loss: 0.6043546871934086
2021-11-24 11:58:59,519 INFO roc_auc_score: 0.7161664554957239
2021-11-24 11:58:59,519 INFO accuracy_score: 0.6975
2021-11-24 11:58:59,520 INFO f1_score: 0.5856164383561645
2021-11-24 11:58:59,521 INFO precision_score: 0.6195652173913043
2021-11-24 11:58:59,521 INFO recall_score: 0.5551948051948052
[I 2021-11-24 11:58:59,546] Trial 53 finished with value: 0.6043546871934086 and parameters: {'learning_rate': 0.01320709570376475, 'n_estimators': 15000, 'reg_lambda': 0.0012834363916561155, 'reg_alpha': 0.00032737946638013767, 'subsample': 0.46474662299651637, 'colsample_bytree': 0.5888590434117638, 'max_depth': 5, 'early_stopping_rounds': 483, 'tree_method': 'approx', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:58:59,711 INFO Training Started!


2021-11-24 11:59:09,200 INFO Training Ended!
2021-11-24 11:59:09,263 INFO log_loss: 0.603551508737728
2021-11-24 11:59:09,264 INFO roc_auc_score: 0.7166811846689896
2021-11-24 11:59:09,265 INFO accuracy_score: 0.6975
2021-11-24 11:59:09,265 INFO f1_score: 0.5827586206896551
2021-11-24 11:59:09,266 INFO precision_score: 0.6213235294117647
2021-11-24 11:59:09,267 INFO recall_score: 0.5487012987012987
[I 2021-11-24 11:59:09,294] Trial 54 finished with value: 0.603551508737728 and parameters: {'learning_rate': 0.03561780446032311, 'n_estimators': 15000, 'reg_lambda': 0.006387074213889569, 'reg_alpha': 0.00028530228249217134, 'subsample': 0.7873559850797378, 'colsample_bytree': 0.8761284055015803, 'max_depth': 5, 'early_stopping_rounds': 259, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:59:09,470 INFO Training Started!


2021-11-24 11:59:20,340 INFO Training Ended!
2021-11-24 11:59:20,400 INFO log_loss: 0.6032781556388364
2021-11-24 11:59:20,401 INFO roc_auc_score: 0.717354292049414
2021-11-24 11:59:20,401 INFO accuracy_score: 0.7025
2021-11-24 11:59:20,402 INFO f1_score: 0.5839160839160839
2021-11-24 11:59:20,403 INFO precision_score: 0.6325757575757576
2021-11-24 11:59:20,403 INFO recall_score: 0.5422077922077922
[I 2021-11-24 11:59:20,428] Trial 55 finished with value: 0.6032781556388364 and parameters: {'learning_rate': 0.03052022768504554, 'n_estimators': 15000, 'reg_lambda': 0.004699701717716409, 'reg_alpha': 0.002530036833028838, 'subsample': 0.7714795451742406, 'colsample_bytree': 0.8871066704803128, 'max_depth': 5, 'early_stopping_rounds': 285, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:59:20,596 INFO Training Started!


2021-11-24 11:59:31,240 INFO Training Ended!
2021-11-24 11:59:31,300 INFO log_loss: 0.6033758248761296
2021-11-24 11:59:31,301 INFO roc_auc_score: 0.7179944039700137
2021-11-24 11:59:31,301 INFO accuracy_score: 0.7075
2021-11-24 11:59:31,302 INFO f1_score: 0.5776173285198555
2021-11-24 11:59:31,303 INFO precision_score: 0.6504065040650406
2021-11-24 11:59:31,304 INFO recall_score: 0.5194805194805194
[I 2021-11-24 11:59:31,328] Trial 56 finished with value: 0.6033758248761296 and parameters: {'learning_rate': 0.24582923032168152, 'n_estimators': 15000, 'reg_lambda': 0.02798322204929562, 'reg_alpha': 0.001996383531663444, 'subsample': 0.8521368438304793, 'colsample_bytree': 0.831616878495048, 'max_depth': 6, 'early_stopping_rounds': 235, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:59:31,494 INFO Training Started!


2021-11-24 11:59:41,075 INFO Training Ended!
2021-11-24 11:59:41,134 INFO log_loss: 0.6026766263414174
2021-11-24 11:59:41,135 INFO roc_auc_score: 0.7188456868334918
2021-11-24 11:59:41,136 INFO accuracy_score: 0.69625
2021-11-24 11:59:41,136 INFO f1_score: 0.563734290843806
2021-11-24 11:59:41,137 INFO precision_score: 0.6305220883534136
2021-11-24 11:59:41,137 INFO recall_score: 0.5097402597402597
[I 2021-11-24 11:59:41,162] Trial 57 finished with value: 0.6026766263414174 and parameters: {'learning_rate': 0.21874272284306231, 'n_estimators': 15000, 'reg_lambda': 0.048222909125164665, 'reg_alpha': 1.0900267872563438e-08, 'subsample': 0.8517865539316619, 'colsample_bytree': 0.811352597477725, 'max_depth': 6, 'early_stopping_rounds': 276, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 20 with value: 0.6023122700396926.
2021-11-24 11:59:41,331 INFO Training Started!


2021-11-24 11:59:50,815 INFO Training Ended!
2021-11-24 11:59:50,874 INFO log_loss: 0.6023109186254442
2021-11-24 11:59:50,875 INFO roc_auc_score: 0.7179614085101891
2021-11-24 11:59:50,876 INFO accuracy_score: 0.6975
2021-11-24 11:59:50,877 INFO f1_score: 0.5709219858156028
2021-11-24 11:59:50,878 INFO precision_score: 0.62890625
2021-11-24 11:59:50,879 INFO recall_score: 0.5227272727272727
[I 2021-11-24 11:59:50,906] Trial 58 finished with value: 0.6023109186254442 and parameters: {'learning_rate': 0.21692758717644564, 'n_estimators': 15000, 'reg_lambda': 0.03574312032188266, 'reg_alpha': 1.460032654750016e-08, 'subsample': 0.8548663474176669, 'colsample_bytree': 0.8376473731229855, 'max_depth': 6, 'early_stopping_rounds': 285, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 58 with value: 0.6023109186254442.
2021-11-24 11:59:51,079 INFO Training Started!


2021-11-24 12:00:00,140 INFO Training Ended!
2021-11-24 12:00:00,202 INFO log_loss: 0.6372257913276553
2021-11-24 12:00:00,202 INFO roc_auc_score: 0.7198883433639531
2021-11-24 12:00:00,203 INFO accuracy_score: 0.635
2021-11-24 12:00:00,204 INFO f1_score: 0.17045454545454547
2021-11-24 12:00:00,204 INFO precision_score: 0.6818181818181818
2021-11-24 12:00:00,205 INFO recall_score: 0.09740259740259741
[I 2021-11-24 12:00:00,233] Trial 59 finished with value: 0.6372257913276553 and parameters: {'learning_rate': 0.1621944022986845, 'n_estimators': 15000, 'reg_lambda': 0.6793826372516385, 'reg_alpha': 1.2840417064710336e-08, 'subsample': 0.8282155214223299, 'colsample_bytree': 0.9577702648852262, 'max_depth': 7, 'early_stopping_rounds': 283, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 58 with value: 0.6023109186254442.
2021-11-24 12:00:00,395 INFO Training Started!


2021-11-24 12:00:10,041 INFO Training Ended!
2021-11-24 12:00:10,103 INFO log_loss: 0.6044605529028922
2021-11-24 12:00:10,104 INFO roc_auc_score: 0.719545190581776
2021-11-24 12:00:10,105 INFO accuracy_score: 0.6975
2021-11-24 12:00:10,105 INFO f1_score: 0.5535055350553506
2021-11-24 12:00:10,106 INFO precision_score: 0.6410256410256411
2021-11-24 12:00:10,107 INFO recall_score: 0.487012987012987
[I 2021-11-24 12:00:10,134] Trial 60 finished with value: 0.6044605529028922 and parameters: {'learning_rate': 0.19279729928507114, 'n_estimators': 15000, 'reg_lambda': 0.0777448369646114, 'reg_alpha': 4.255513418616447e-08, 'subsample': 0.9151708034700325, 'colsample_bytree': 0.7827978468790364, 'max_depth': 6, 'early_stopping_rounds': 292, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 58 with value: 0.6023109186254442.
2021-11-24 12:00:10,327 INFO Training Started!
2021-11-24 12:00:44,139 INFO Training Ended!
2021-11-24 12:00:44,344 INFO log_loss: 1.013946076872

2021-11-24 12:00:53,121 INFO Training Ended!
2021-11-24 12:00:53,183 INFO log_loss: 0.6093048080988228
2021-11-24 12:00:53,183 INFO roc_auc_score: 0.7204822616407982
2021-11-24 12:00:53,184 INFO accuracy_score: 0.695
2021-11-24 12:00:53,185 INFO f1_score: 0.5196850393700788
2021-11-24 12:00:53,185 INFO precision_score: 0.66
2021-11-24 12:00:53,186 INFO recall_score: 0.42857142857142855
[I 2021-11-24 12:00:53,211] Trial 62 finished with value: 0.6093048080988228 and parameters: {'learning_rate': 0.18605366069192236, 'n_estimators': 15000, 'reg_lambda': 0.13730331363752588, 'reg_alpha': 1.2097579233420072e-08, 'subsample': 0.7245815935454666, 'colsample_bytree': 0.8373210128330251, 'max_depth': 8, 'early_stopping_rounds': 280, 'tree_method': 'approx', 'booster': 'gblinear'}. Best is trial 58 with value: 0.6023109186254442.
2021-11-24 12:00:53,381 INFO Training Started!


2021-11-24 12:01:03,115 INFO Training Ended!
2021-11-24 12:01:03,176 INFO log_loss: 0.6041115773329512
2021-11-24 12:01:03,177 INFO roc_auc_score: 0.7161532573117939
2021-11-24 12:01:03,177 INFO accuracy_score: 0.69375
2021-11-24 12:01:03,178 INFO f1_score: 0.5811965811965812
2021-11-24 12:01:03,179 INFO precision_score: 0.6137184115523465
2021-11-24 12:01:03,179 INFO recall_score: 0.551948051948052
[I 2021-11-24 12:01:03,205] Trial 63 finished with value: 0.6041115773329512 and parameters: {'learning_rate': 0.1327708902916091, 'n_estimators': 15000, 'reg_lambda': 0.004722373106294932, 'reg_alpha': 2.984760758637526e-07, 'subsample': 0.8698983738777526, 'colsample_bytree': 0.9254610813429943, 'max_depth': 6, 'early_stopping_rounds': 226, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 58 with value: 0.6023109186254442.
2021-11-24 12:01:03,378 INFO Training Started!


Parameters: { "colsample_bytree", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:01:11,935 INFO Training Ended!
2021-11-24 12:01:11,992 INFO log_loss: 0.602295038830489
2021-11-24 12:01:11,992 INFO roc_auc_score: 0.7176380530039067
2021-11-24 12:01:11,993 INFO accuracy_score: 0.7
2021-11-24 12:01:11,994 INFO f1_score: 0.5789473684210527
2021-11-24 12:01:11,994 INFO precision_score: 0.6297709923664122
2021-11-24 12:01:11,995 INFO recall_score: 0.5357142857142857
[I 2021-11-24 12:01:12,020] Trial 64 finished with value: 0.602295038830489 and parameters: {'learning_rate': 0.23448628117471854, 'n_estimators': 15000, 'reg_lambda': 0.027865211791046604, 'reg_alpha': 3.172600149850677e-08, 'subsample': 0.8286217057166831, 'colsample_bytree': 0.8432116146719373, 'max_depth': 6, 'early_stopping_rounds': 196, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 64 with value: 0.602295038830489.
2021-11-24 12:01:12,192 INFO Training Started!


Parameters: { "colsample_bytree", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:01:20,680 INFO Training Ended!
2021-11-24 12:01:20,740 INFO log_loss: 0.6229229770414532
2021-11-24 12:01:20,741 INFO roc_auc_score: 0.7203964734452539
2021-11-24 12:01:20,742 INFO accuracy_score: 0.68
2021-11-24 12:01:20,742 INFO f1_score: 0.3990610328638498
2021-11-24 12:01:20,743 INFO precision_score: 0.7203389830508474
2021-11-24 12:01:20,743 INFO recall_score: 0.275974025974026
[I 2021-11-24 12:01:20,768] Trial 65 finished with value: 0.6229229770414532 and parameters: {'learning_rate': 0.20598585186906015, 'n_estimators': 15000, 'reg_lambda': 0.3278004500249784, 'reg_alpha': 5.7043759713664325e-08, 'subsample': 0.7684286637805158, 'colsample_bytree': 0.8915934405410837, 'max_depth': 6, 'early_stopping_rounds': 191, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 64 with value: 0.602295038830489.
2021-11-24 12:01:20,936 INFO Training Started!


Parameters: { "colsample_bytree", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:01:30,198 INFO Training Ended!
2021-11-24 12:01:30,258 INFO log_loss: 0.6024340576445684
2021-11-24 12:01:30,259 INFO roc_auc_score: 0.7175060711646077
2021-11-24 12:01:30,259 INFO accuracy_score: 0.6975
2021-11-24 12:01:30,260 INFO f1_score: 0.578397212543554
2021-11-24 12:01:30,260 INFO precision_score: 0.6240601503759399
2021-11-24 12:01:30,261 INFO recall_score: 0.538961038961039
[I 2021-11-24 12:01:30,286] Trial 66 finished with value: 0.6024340576445684 and parameters: {'learning_rate': 0.22501750315420058, 'n_estimators': 15000, 'reg_lambda': 0.022053556989799886, 'reg_alpha': 2.1093543363044277e-08, 'subsample': 0.8228148089289461, 'colsample_bytree': 0.7865649533128805, 'max_depth': 6, 'early_stopping_rounds': 188, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 64 with value: 0.602295038830489.
2021-11-24 12:01:30,455 INFO Training Started!


Parameters: { "colsample_bytree", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:01:39,070 INFO Training Ended!
2021-11-24 12:01:39,131 INFO log_loss: 0.6024603482894599
2021-11-24 12:01:39,131 INFO roc_auc_score: 0.7174466793369233
2021-11-24 12:01:39,132 INFO accuracy_score: 0.69625
2021-11-24 12:01:39,133 INFO f1_score: 0.577391304347826
2021-11-24 12:01:39,133 INFO precision_score: 0.6217228464419475
2021-11-24 12:01:39,134 INFO recall_score: 0.538961038961039
[I 2021-11-24 12:01:39,160] Trial 67 finished with value: 0.6024603482894599 and parameters: {'learning_rate': 0.23711085047118874, 'n_estimators': 15000, 'reg_lambda': 0.021378490262174213, 'reg_alpha': 2.3213765777731845e-08, 'subsample': 0.8247856949056165, 'colsample_bytree': 0.7257348305668477, 'max_depth': 7, 'early_stopping_rounds': 188, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 64 with value: 0.602295038830489.
2021-11-24 12:01:39,326 INFO Training Started!


Parameters: { "colsample_bytree", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:01:47,788 INFO Training Ended!
2021-11-24 12:01:47,848 INFO log_loss: 0.6034830134734511
2021-11-24 12:01:47,849 INFO roc_auc_score: 0.7191888396156689
2021-11-24 12:01:47,850 INFO accuracy_score: 0.6975
2021-11-24 12:01:47,850 INFO f1_score: 0.5583941605839416
2021-11-24 12:01:47,851 INFO precision_score: 0.6375
2021-11-24 12:01:47,852 INFO recall_score: 0.4967532467532468
[I 2021-11-24 12:01:47,880] Trial 68 finished with value: 0.6034830134734511 and parameters: {'learning_rate': 0.24018034186076134, 'n_estimators': 15000, 'reg_lambda': 0.06341025893382701, 'reg_alpha': 2.2796329413542294e-08, 'subsample': 0.9599350543076591, 'colsample_bytree': 0.7355242470499104, 'max_depth': 7, 'early_stopping_rounds': 166, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 64 with value: 0.602295038830489.
2021-11-24 12:01:48,044 INFO Training Started!


Parameters: { "colsample_bytree", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:01:59,873 INFO Training Ended!
2021-11-24 12:01:59,942 INFO log_loss: 0.6022779347747564
2021-11-24 12:01:59,942 INFO roc_auc_score: 0.7178690212226798
2021-11-24 12:01:59,943 INFO accuracy_score: 0.69875
2021-11-24 12:01:59,944 INFO f1_score: 0.5734513274336284
2021-11-24 12:01:59,944 INFO precision_score: 0.6303501945525292
2021-11-24 12:01:59,945 INFO recall_score: 0.525974025974026
[I 2021-11-24 12:01:59,974] Trial 69 finished with value: 0.6022779347747564 and parameters: {'learning_rate': 0.152746085461703, 'n_estimators': 15000, 'reg_lambda': 0.031987234461410045, 'reg_alpha': 1.3693071141813216e-07, 'subsample': 0.8214620713008292, 'colsample_bytree': 0.8000514536230441, 'max_depth': 7, 'early_stopping_rounds': 147, 'tree_method': 'exact', 'booster': 'gblinear'}. Best is trial 69 with value: 0.6022779347747564.
2021-11-24 12:02:00,262 INFO Training Started!


Parameters: { "colsample_bytree", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:02:11,425 INFO Training Ended!
2021-11-24 12:02:11,485 INFO log_loss: 0.6026194911077618
2021-11-24 12:02:11,486 INFO roc_auc_score: 0.7173410938654842
2021-11-24 12:02:11,486 INFO accuracy_score: 0.69875
2021-11-24 12:02:11,487 INFO f1_score: 0.5823223570190641
2021-11-24 12:02:11,488 INFO precision_score: 0.6245353159851301
2021-11-24 12:02:11,488 INFO recall_score: 0.5454545454545454
[I 2021-11-24 12:02:11,514] Trial 70 finished with value: 0.6026194911077618 and parameters: {'learning_rate': 0.1630843141366229, 'n_estimators': 20000, 'reg_lambda': 0.018155430643457735, 'reg_alpha': 6.924580799000354e-08, 'subsample': 0.8331594340685916, 'colsample_bytree': 0.7983840960926235, 'max_depth': 7, 'early_stopping_rounds': 131, 'tree_method': 'hist', 'booster': 'gblinear'}. Best is trial 69 with value: 0.6022779347747564.


In [None]:
print(study.best_trial)

FrozenTrial(number=69, values=[0.6022779347747564], datetime_start=datetime.datetime(2021, 11, 24, 12, 1, 47, 891300), datetime_complete=datetime.datetime(2021, 11, 24, 12, 1, 59, 946156), params={'booster': 'gblinear', 'colsample_bytree': 0.8000514536230441, 'early_stopping_rounds': 147, 'learning_rate': 0.152746085461703, 'max_depth': 7, 'n_estimators': 15000, 'reg_alpha': 1.3693071141813216e-07, 'reg_lambda': 0.031987234461410045, 'subsample': 0.8214620713008292, 'tree_method': 'exact'}, distributions={'booster': CategoricalDistribution(choices=('gbtree', 'gblinear')), 'colsample_bytree': UniformDistribution(high=1.0, low=0.1), 'early_stopping_rounds': IntUniformDistribution(high=500, low=100, step=1), 'learning_rate': LogUniformDistribution(high=0.25, low=0.01), 'max_depth': IntUniformDistribution(high=9, low=1, step=1), 'n_estimators': CategoricalDistribution(choices=(7000, 15000, 20000)), 'reg_alpha': LogUniformDistribution(high=100.0, low=1e-08), 'reg_lambda': LogUniformDistribu

##### Now update the model with the best params from the study, then update the sklearn pipeline with this new model

In [None]:
xgb_params.update(study.best_trial.params)
print("Final xgb_params:", xgb_params)
xgb_model = XGBClassifier(**xgb_params)
tmlt.update_model(xgb_model)
tmlt.spl

Final xgb_params: {'learning_rate': 0.152746085461703, 'eval_metric': 'auc', 'use_label_encoder': False, 'random_state': 42, 'booster': 'gblinear', 'colsample_bytree': 0.8000514536230441, 'early_stopping_rounds': 147, 'max_depth': 7, 'n_estimators': 15000, 'reg_alpha': 1.3693071141813216e-07, 'reg_lambda': 0.031987234461410045, 'subsample': 0.8214620713008292, 'tree_method': 'exact'}
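
The best trial selected `booster='gblinear'`, which explains the repeated "might not be used" warnings in the logs: the linear booster has no trees, so `colsample_bytree`, `max_depth`, `subsample` and `tree_method` have no effect. A minimal sketch of pruning those keys before building the model is shown below. The helper `prune_unused_params` and the `TREE_ONLY_PARAMS` set are hypothetical (not part of tmlt or xgboost), and the set only covers the parameters named in the warning, not every tree parameter.

```python
# Parameters that the warning reports as unused when booster == 'gblinear'.
# Assumption: this set mirrors the warning text, it is not an exhaustive list.
TREE_ONLY_PARAMS = {"colsample_bytree", "max_depth", "subsample", "tree_method"}


def prune_unused_params(params, flagged_extra=("early_stopping_rounds",)):
    """Return a copy of params without keys flagged as unused in this run.

    flagged_extra holds keys that the k-fold warning in this run also
    reported as unused when passed to the constructor.
    """
    pruned = dict(params)
    for key in flagged_extra:
        pruned.pop(key, None)
    if pruned.get("booster") == "gblinear":
        for key in TREE_ONLY_PARAMS:
            pruned.pop(key, None)
    return pruned


# Values copied from the "Final xgb_params" output above.
final_xgb_params = {
    "learning_rate": 0.152746085461703,
    "eval_metric": "auc",
    "use_label_encoder": False,
    "random_state": 42,
    "booster": "gblinear",
    "colsample_bytree": 0.8000514536230441,
    "early_stopping_rounds": 147,
    "max_depth": 7,
    "n_estimators": 15000,
    "reg_alpha": 1.3693071141813216e-07,
    "reg_lambda": 0.031987234461410045,
    "subsample": 0.8214620713008292,
    "tree_method": "exact",
}

clean_params = prune_unused_params(final_xgb_params)
print(sorted(clean_params))
```

The pruned dictionary could then be passed to `XGBClassifier(**clean_params)`. If the search should explore tree parameters at all, restricting the `booster` choice to `gbtree` in the study is the other option.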


#### Let's use K-Fold training with the best params

In [None]:
# K-Fold fit and predict on test dataset
xgb_model_mean_metrics_results, xgb_model_test_preds = tmlt.do_kfold_training(
    n_splits=5,
    test_preds_metric=roc_auc_score,
)
if xgb_model_test_preds is not None:
    print(xgb_model_test_preds.shape)

Parameters: { "colsample_bytree", "early_stopping_rounds", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:02:21,265 INFO fold: 1 log_loss : 0.6044875546731054
2021-11-24 12:02:21,266 INFO fold: 1 roc_auc_score : 0.7267569310122501
2021-11-24 12:02:21,266 INFO fold: 1 accuracy_score : 0.69125
2021-11-24 12:02:21,267 INFO fold: 1 f1_score : 0.5612788632326821
2021-11-24 12:02:21,267 INFO fold: 1 precision_score : 0.6781115879828327
2021-11-24 12:02:21,268 INFO fold: 1 recall_score : 0.47878787878787876
2021-11-24 12:02:21,269 INFO Predicting Test Preds Probablities!


Parameters: { "colsample_bytree", "early_stopping_rounds", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:02:30,239 INFO fold: 2 log_loss : 0.6239040436223149
2021-11-24 12:02:30,240 INFO fold: 2 roc_auc_score : 0.6955383623468729
2021-11-24 12:02:30,240 INFO fold: 2 accuracy_score : 0.68875
2021-11-24 12:02:30,241 INFO fold: 2 f1_score : 0.5815126050420169
2021-11-24 12:02:30,242 INFO fold: 2 precision_score : 0.6528301886792452
2021-11-24 12:02:30,243 INFO fold: 2 recall_score : 0.5242424242424243
2021-11-24 12:02:30,243 INFO Predicting Test Preds Probablities!


Parameters: { "colsample_bytree", "early_stopping_rounds", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:02:39,207 INFO fold: 3 log_loss : 0.6432310473406687
2021-11-24 12:02:39,208 INFO fold: 3 roc_auc_score : 0.662849774339136
2021-11-24 12:02:39,209 INFO fold: 3 accuracy_score : 0.64625
2021-11-24 12:02:39,209 INFO fold: 3 f1_score : 0.4991150442477876
2021-11-24 12:02:39,210 INFO fold: 3 precision_score : 0.6
2021-11-24 12:02:39,211 INFO fold: 3 recall_score : 0.42727272727272725
2021-11-24 12:02:39,211 INFO Predicting Test Preds Probablities!


Parameters: { "colsample_bytree", "early_stopping_rounds", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:02:49,262 INFO fold: 4 log_loss : 0.617499795970507
2021-11-24 12:02:49,262 INFO fold: 4 roc_auc_score : 0.7093964789775765
2021-11-24 12:02:49,263 INFO fold: 4 accuracy_score : 0.69375
2021-11-24 12:02:49,264 INFO fold: 4 f1_score : 0.5840407470288624
2021-11-24 12:02:49,264 INFO fold: 4 precision_score : 0.6666666666666666
2021-11-24 12:02:49,265 INFO fold: 4 recall_score : 0.5196374622356495
2021-11-24 12:02:49,265 INFO Predicting Test Preds Probablities!


Parameters: { "colsample_bytree", "early_stopping_rounds", "max_depth", "subsample", "tree_method" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.




2021-11-24 12:02:58,122 INFO fold: 5 log_loss : 0.6016531882900744
2021-11-24 12:02:58,123 INFO fold: 5 roc_auc_score : 0.7217580633732503
2021-11-24 12:02:58,124 INFO fold: 5 accuracy_score : 0.6775
2021-11-24 12:02:58,124 INFO fold: 5 f1_score : 0.5582191780821918
2021-11-24 12:02:58,125 INFO fold: 5 precision_score : 0.6442687747035574
2021-11-24 12:02:58,125 INFO fold: 5 recall_score : 0.49244712990936557
2021-11-24 12:02:58,126 INFO Predicting Test Preds Probablities!
2021-11-24 12:02:58,155 INFO  Mean Metrics Results from all Folds are: {'log_loss': 0.6181551259793341, 'roc_auc_score': 0.7032599220098172, 'accuracy_score': 0.6795, 'f1_score': 0.5568332875267081, 'precision_score': 0.6483754436064604, 'recall_score': 0.48847752448960907}


(4000,)
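
The "Mean Metrics Results" line is consistent with a plain arithmetic mean of the per-fold values logged above. The sketch below recomputes three of the metrics from the fold logs as a sanity check. It assumes simple averaging (the toolkit's internals are not shown here), and `mean_metrics` is a hypothetical helper.

```python
from statistics import fmean

# Per-fold values copied from the k-fold log output above.
fold_metrics = [
    {"log_loss": 0.6044875546731054, "roc_auc_score": 0.7267569310122501, "accuracy_score": 0.69125},
    {"log_loss": 0.6239040436223149, "roc_auc_score": 0.6955383623468729, "accuracy_score": 0.68875},
    {"log_loss": 0.6432310473406687, "roc_auc_score": 0.662849774339136, "accuracy_score": 0.64625},
    {"log_loss": 0.617499795970507, "roc_auc_score": 0.7093964789775765, "accuracy_score": 0.69375},
    {"log_loss": 0.6016531882900744, "roc_auc_score": 0.7217580633732503, "accuracy_score": 0.6775},
]


def mean_metrics(per_fold):
    """Average each metric over all folds (assumes identical keys per fold)."""
    return {name: fmean(fold[name] for fold in per_fold) for name in per_fold[0]}


mean_results = mean_metrics(fold_metrics)
for name, value in mean_results.items():
    print(f"{name}: {value:.6f}")
```

Fold 3 is noticeably weaker than the others (ROC AUC about 0.66 versus 0.70 to 0.73), so the spread across folds is worth reading alongside the mean.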


In [None]:
# # take a weighted average of both k-fold models' predictions
# # (the weights already sum to 1, so no further division is needed)
# final_preds = (0.45 * sci_model_preds) + (0.55 * xgb_model_test_preds)
# print(final_preds.shape)
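
A weighted blend of two prediction vectors can be sketched with `np.average`, which divides by the sum of the weights so the result stays on the probability scale. The arrays below are toy stand-ins for `sci_model_preds` and `xgb_model_test_preds`, and `blend_predictions` is a hypothetical helper, not a tmlt function.

```python
import numpy as np


def blend_predictions(preds_list, weights):
    """Weighted average of several equally shaped prediction arrays."""
    preds = np.asarray(preds_list, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if preds.shape[0] != weights.shape[0]:
        raise ValueError("need exactly one weight per prediction array")
    # np.average normalises by the sum of the weights.
    return np.average(preds, axis=0, weights=weights)


# Toy values for illustration only; real arrays would have shape (4000,).
sci_model_preds = np.array([0.2, 0.8, 0.5])
xgb_model_test_preds = np.array([0.4, 0.6, 0.5])

final_preds = blend_predictions(
    [sci_model_preds, xgb_model_test_preds], weights=[0.45, 0.55]
)
print(final_preds.shape)
```

Because the weights are normalised, dividing the blended result again by the number of models would shrink every probability and is not needed.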

#### Create Kaggle Predictions

In [None]:
# sub = pd.read_csv(DIRECTORY_PATH + SAMPLE_SUB_FILE)
# sub['target'] = final_preds
# sub.to_csv('submission.csv', index=False)
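
Before writing the submission file, it can help to check that the number of predictions matches the number of rows in the sample submission. The sketch below does that on a toy frame. `build_submission` is a hypothetical helper, and the `id` values are placeholders rather than real competition ids.

```python
import numpy as np
import pandas as pd


def build_submission(sample_sub, preds, target_col="target"):
    """Return a copy of sample_sub with target_col replaced by preds."""
    preds = np.asarray(preds)
    if len(preds) != len(sample_sub):
        raise ValueError(
            f"got {len(preds)} predictions for {len(sample_sub)} rows"
        )
    sub = sample_sub.copy()
    sub[target_col] = preds
    return sub


# Toy stand-in for pd.read_csv(DIRECTORY_PATH + SAMPLE_SUB_FILE).
sample_sub = pd.DataFrame({"id": [0, 1, 2], "target": [0.5, 0.5, 0.5]})
toy_preds = [0.31, 0.69, 0.5]

sub = build_submission(sample_sub, toy_preds)
print(sub)
# sub.to_csv("submission.csv", index=False)  # uncomment to write the file
```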

In [None]:
# hide
# run the script to build 

from nbdev.export import notebook2script; notebook2script()