# Introduction

In this notebook, we train and tune scikit-learn's HistGradientBoostingClassifier, using the Bayesian optimization package skopt (scikit-optimize) to search for good hyperparameters.

For beginners who have always used GridSearchCV and RandomizedSearchCV, skopt's BayesSearchCV offers Bayesian hyperparameter optimization with very few code changes, since it derives from the same sklearn base class.

Feel free to upvote and fork if you feel like it. Enjoy!

In [None]:
!pip install -U scikit-learn --progress-bar off >> z_pip.log

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
train_df = pd.read_csv('../input/tabular-playground-series-nov-2021/train.csv')
test_df = pd.read_csv('../input/tabular-playground-series-nov-2021/test.csv')
ss = pd.read_csv('../input/tabular-playground-series-nov-2021/sample_submission.csv')

X = train_df.drop(['target', 'id'], axis = 1).values
y = train_df['target'].values
X_test = test_df.drop('id', axis = 1).values

del train_df, test_df

# Find the best hyperparameters

Note: for demonstration purposes, I tune most of the hyperparameters, which includes turning off early stopping and tuning max_iter instead. I'm not sure whether this is a good approach in practice.

In [None]:
from sklearn.ensemble import HistGradientBoostingClassifier
from skopt import BayesSearchCV, plots, space

# Intentionally turn early stopping off, tune max_iter instead
model = HistGradientBoostingClassifier(early_stopping = False)

# Define parameter spaces using classes provided in skopt.space
params = {
    'learning_rate': space.Real(1e-3, 1, prior = 'log-uniform'),
    'max_iter': space.Integer(25, 1_000),
    
    'max_leaf_nodes': space.Integer(4, 64),
    'max_depth': space.Integer(3, 15),
    'min_samples_leaf': space.Integer(2, 60_000, prior = 'log-uniform'),
    
    'l2_regularization': space.Real(1e-3, 1e3, prior = 'log-uniform'),
    'max_bins': space.Integer(31, 255)
}

bs = BayesSearchCV(model, params, n_iter = 100, cv = 3, scoring = 'roc_auc',
                   refit = False)

# Fit the search, i.e. begin finding the best hyperparameters
bs.fit(X, y)

# Set the best hyperparameters onto our model
model.set_params(**bs.best_params_)
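
The `prior = 'log-uniform'` setting used for `learning_rate`, `min_samples_leaf` and `l2_regularization` above means that values are sampled uniformly on a log scale, so every order of magnitude gets roughly equal attention. The sketch below illustrates the idea with plain numpy; it is not skopt's internal implementation, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
low, high = 1e-3, 1.0  # same bounds as the learning_rate dimension

# Plain uniform sampling: most draws land in the top decade [0.1, 1]
uniform_draws = rng.uniform(low, high, size=10_000)

# Log-uniform sampling: draw uniformly in log space, then exponentiate
log_uniform_draws = np.exp(rng.uniform(np.log(low), np.log(high), size=10_000))

# Fraction of draws below 0.1, i.e. in the two lower decades
frac_uniform = np.mean(uniform_draws < 0.1)
frac_log = np.mean(log_uniform_draws < 0.1)
print(frac_uniform, frac_log)
```

With plain uniform sampling only about a tenth of the draws fall below 0.1, while log-uniform sampling puts about two thirds of them there, since the range spans three decades. This matters for parameters such as the learning rate, where small values are often the interesting ones.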

# Visualize

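The following plot shows the partial dependence of the objective on each hyperparameter (diagonal panels) and on each pair of hyperparameters (off-diagonal panels), as estimated by the optimizer's surrogate model. The pairwise panels indicate whether hyperparameters interact or not. Since skopt minimizes, the objective here is the negated cross-validated ROC AUC, so lower is better.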

In [None]:
plots.plot_objective(bs.optimizer_results_[0],
                     n_minimum_search=int(1e8))
plt.show()

# Fit and submit

In [None]:
submit = model.fit(X, y).predict_proba(X_test)[:, 1]

ss['target'] = submit
ss.to_csv('submission.csv', index = False)