# Tunning XGBoost, Catboost and Lightgbm Classifiers Hyperparameters with Hyperopt

**Abstract**. This notebook is an example on how to use `hyperopt` to train `XGBoost`, `CatBoost` and `LightGBM` classifiers. A cross-validation object is also defined.

# The Dataset

For this notebook, we wil use the breast cancer dataset

In [7]:
from sklearn.datasets import load_breast_cancer
import pandas as pd

data = load_breast_cancer()
X = data['data']
y = data['target']

To prepare for training and validation, we will split this dataset into training and test

In [8]:
from sklearn.model_selection import StratifiedShuffleSplit

tran_size = 0.75
n_splits = 1

sss = StratifiedShuffleSplit(n_splits=n_splits, train_size=tran_size)
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

# A note about the hyperparameters of `XGBClassifier` (X), `LGBMClassifier` (L), and `CatBoostClassifier` (C)

Some basic parameters:

* `learning_rate` [X/L/C]: learning rate (alias: `eta` )
* `max_depth` [X/L/C]: maximum depth of trees
* `n_estimators` [X/L/C]: no. of boosting iterations
* `min_child_weight` [X/L]: minimum sum of instance (hessian) weight needed in a child
* `min_child_samples` [L/C]: minimum no. of data in one leaf
* `subsample` [X/L/C]: subsample ratio of the training instances (note that for CatBoost this parameter can be used only if either Poisson or Bernoulli bootstrap_type is * `selected`)
* `colsample_bytree` [X/L]: subsample ratio of columns in tree building
* `colsample_bylevel` [X/C]: subsample ratio of columns for each level in tree building
* `colsample_bynode` [X]: subsample ratio of columns for each node
* `tree_method` [X]: tree construction method
* `boosting` [L]: tree construction method
* `boosting_type` [C]: Ordered for ordered boosting or Plain for classic
* `early_stopping_rounds` [X/L/C]: parameter for fit() — stop the training if one metric of a validation data does not improve in last early_stopping_rounds rounds
* `eval_metric` [X/L/C]: evaluation metrics for validation data

# A note on `hyperopt`

There two options for the search algortihm: 
* `hyperopt.tpe.suggest` 
* `hyperopt.rand.suggest`

The search space is defined using uniform, loguniform, normal distribution along with its quantized variations. For instance, `uniform('x', -1, 1)` defines a search space with label `x` that will be sampled uniformly between -1 and 1. The expressions currently recognized by hyperopt’s optimization algorithms are:

* `hp.choice(label, options)`: index of an option
* `hp.randint(label, upper)`: random integer within $[0, \text{upper})$
* `hp.uniform(label, low, high)`: uniform value between low/high
* `hp.quniform(label, low, high, q)`: `round(uniform(.)/q)*q` (note that the value is giving a float instead of integer)
* `hp.loguniform(label, low, high)`: `exp(uniform(low, high)/q)*q`
* `hp.qloguniform(label, low, high, q)`: `round(loguniform())`
* `hp.normal(label, mu, sigma)`: sampling from normal distribution
* `hp.qnormal(label, mu, sigma, q)`: `round(normal(nu, sigma)/q)*q`
* `hp.lognormal(label, mu, sigma)`: `exp(normal(mu, sigma)`
* `hp.qlognormal(label, mu, sigma, q)` : `round(exp(normal(.))/q)*q`

# Finding optimal hyperparameters of `XGBClassifier` using `hyperopt`

The search space for the hyperparameters is defined by the following distributions

In [4]:
from hyperopt import hp

space = {
        'max_depth' : hp.randint('max_depth', 10),
        'learning_rate' : hp.quniform('learning_rate', 0.01, 0.5, 0.01),
        'n_estimators' : hp.randint('n_estimators', 250),
        'gamma' : hp.quniform('gamma', 0, 0.50, 0.01),
        'min_child_weight' : hp.quniform('min_child_weight', 1, 10, 1),
        'subsample' : hp.quniform('subsample', 0.1, 1, 0.01),
        'colsample_bytree' : hp.quniform('colsample_bytree', 0.1, 1.0, 0.01)
        }

A first cost function is defined for the `XGBoostClassifier` and it is compose of three steps
1. Instatiate the estimator with a selected set of hyperparameters and fit
2. Evaluate the estimator according to chosen performance metrics
3. Calculate the loss (ie, how close the chosen performance metrics are close to their optimal value) and return this value

In [5]:
import warnings

import numpy as np
from hyperopt import STATUS_FAIL, STATUS_OK, Trials, fmin, tpe
from sklearn.metrics import f1_score, log_loss, roc_auc_score
from xgboost import XGBClassifier

def xgb_cost(space, X_train, y_train):

    warnings.filterwarnings(action='ignore')

    # 1. Instantiate estimator with selected hyperparamters and fit
    classifier = XGBClassifier(n_estimators = space['n_estimators']
                            ,max_depth = int(space['max_depth'])
                            ,learning_rate = space['learning_rate']
                            ,gamma = space['gamma']
                            ,min_child_weight = space['min_child_weight']
                            ,subsample = space['subsample']
                            ,colsample_bytree = space['colsample_bytree']
                            ,use_label_encoder=False
                            ,eval_metric="logloss"
                            )
    classifier.fit(X_train, y_train)

    # 2. Evaluate the estimator according to chose performance metrics
    y_proba_train = classifier.predict_proba(X_train)[:, 1]
    y_class_train = classifier.predict(X_train)

    rocauc_train = roc_auc_score(y_train, y_proba_train)
    f1_train = f1_score(y_train, y_class_train)
    logloss_train = log_loss(y_train, y_class_train)

    # 3. Calculate the loss    
    loss = ((1.0 - rocauc_train)**2 + (1.0 - f1_train)**2 + logloss_train**2)

    return {'loss': np.sqrt(loss), 'status': STATUS_OK }


In the next step, the objective function is defined from the cost function and the minization problem is solved. The output are the optimal hyperparameters found in the search space.

In [6]:
objective = lambda x: xgb_cost(x, X_train=X_train, y_train=y_train)
trials = Trials()

best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=100,
            trials=trials)

print("Optimal hyperparameters: ", best)

100%|██████████| 100/100 [00:08<00:00, 11.28trial/s, best loss: 9.992007221626413e-16]
Best:  {'colsample_bytree': 0.68, 'gamma': 0.24, 'learning_rate': 0.2, 'max_depth': 3, 'min_child_weight': 1.0, 'n_estimators': 38, 'subsample': 0.9400000000000001}


# Using `hyperopt` for Cross Validation

Here, a cross validation class is defined

In [25]:
import sys
import traceback
from collections.abc import Iterable
from typing import Union

from hyperopt import STATUS_FAIL, STATUS_OK, Trials, fmin, hp, tpe
import pandas as pd
import numpy as np
import scipy as sp
from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.metrics import get_scorer

class BayesSearchCV:

    def __init__(self, estimator, param_distributions: dict, scoring: dict
        ,n_iter: int=10, weights_matrix: np.ndarray=None, cv: Union[int, Iterable]=5, random_state: int=None
        ,algo=tpe.suggest, trials: Trials=Trials()) -> None:
        """Use Bayesian optimisation to search for hyperpameters and selects best estimator based on validation sets

        Args:
            estimator: Estimator object
            param_distributions (dict): Search space containing hyperparameters
            scoring (dict): {metric: opt_value} Dict of performance metrics to measure the estimator performance and their corresponding optimal values.
                                Select one from sklearn.metrics.SCORERS.keys()
            n_iter (int, optional): Max number of iterations. Defaults to 10.
            weights_matrix (np.ndarray, optional): Symmetric positive definite matrix used to calculate the quadratic loss function
            cv (int or Iterable, optional): int, cross-validation generator or an iterable. Defaults to 5.
            random_state (int, optional): Pseudo random number generator state used for random uniform sampling. Defaults to None.
            algo (optional): Algorithm to for distribution search. Defaults to tpe.suggest.
            trials (Trials, optional): [description]. Defaults to Trials().
        """
        self.estimator = estimator
        self.param_distributions = param_distributions
        self.n_iter = n_iter
        self.random_state = random_state
        self.weights_matrix = weights_matrix or np.identity(len(scoring))
        self.cv = cv
        self.algo = algo
        self.trials = trials
        self.scoring = scoring

    def fit(self, X: pd.DataFrame, y=None) -> None:
        """Find optimal hyperparameters and fit estimator

        Args:
            X (pd.DataFrame): Predictors
            y (pd.DataFrame): Target
        """
        self.cv_results_ = pd.DataFrame()
        self.min_loss = np.inf

        for iteration, (train_index, val_index) in enumerate(self._get_splits(X, y)):
            X_train, X_val = X[train_index], X[val_index]
            y_train, y_val = y[train_index], y[val_index]

            objective = lambda space: self._cost(X=X_train, y=y_train, hyperparameters=space)

            try:
                hyperparameters = fmin(fn=objective, space=self.param_distributions
                            ,algo=self.algo, max_evals=self.n_iter
                            ,trials=self.trials)

            except KeyError:
                exc_info = sys.exc_info()
                traceback.print_exception(*exc_info)
                return {'status': STATUS_FAIL,
                        'exception': str(sys.exc_info())}

            estimator = self._instantiate_estimator(X_train, y_train, hyperarameters=hyperparameters)
            loss_df, current_loss = self._cost(X_val, y_val, hyperparameters, estimator=estimator, return_loss_df=True)
            loss_df["cv_iteration"] = iteration

            if current_loss < self.min_loss:
                self.min_loss = current_loss
                self.best_estimator_ = estimator
                self.best_hyperparameters_ = hyperparameters

            self.cv_results_ = pd.concat([self.cv_results_ , loss_df.copy()])

        self.cv_results_.rename(columns={col: f"{col}_loss" for col in self.cv_results_.columns if col != "loss"}, inplace=True)
        self.cv_results_.sort_values(by="loss", inplace=True)
        self.cv_results_.reset_index(inplace=True, drop=True)


    def _get_splits(self, X: pd.DataFrame, y=None):
        """Instantiate and/or get training and validation datasets

        Args:
            X (pd.DataFrame): Predictor
            y (pd.DataFrame): Target

        Yields:
            [type]: Train and test indices

        Raises:
            NotImplementedError: Only KFold and StratifiedKFold are implemented
        """

        if isinstance(self.cv, int): 
            self.cv = KFold(n_splits=self.cv, random_state=self.random_state)

        elif isinstance(self.cv, StratifiedKFold):
            pass

        else:
            msg = f"Cross validation not yet implemented for type {type(self.cv)}"
            NotImplementedError(msg)
            
        for train_index, test_index in self.cv.split(X, y):
            yield train_index, test_index


    def _cost(self, X: pd.DataFrame, y: pd.DataFrame, hyperparameters: dict
            ,estimator=None, return_loss_df: bool=False) -> dict:
        """Evaluates the cost function for the trained estimator using a quadratic loss function

        Args:
            X (pd.DataFrame): Predictor
            y (pd.DataFrame): Target
            hyperarameters (dict): Estimator hyperparameters
            return_loss_df (bool): Returns fit loss data frame

        Returns:
            (dict)
        """
        loss_dict = {metric_name: [] for metric_name in self.scoring}

        if not estimator:
            estimator = self._instantiate_estimator(X, y, hyperparameters)

        for p_metric, opt_value in self.scoring.items():
            scorer = get_scorer(p_metric)
            loss_dict[p_metric].append((opt_value - scorer(estimator, X, y))**2)

        self._check_pd(self.weights_matrix)

        loss_df = pd.DataFrame.from_dict(loss_dict)
        loss = loss_df.values.dot(self.weights_matrix.dot(loss_df.T.values))
        loss_df["loss"] = loss

        if return_loss_df:
            return loss_df, loss

        return {'loss': np.sqrt(loss), 'status': STATUS_OK}


    def _instantiate_estimator(self, X: pd.DataFrame, y: pd.DataFrame
                            ,hyperarameters: dict):
        """Instantiate estimator with selected hyperparameters

        Args:
            X (pd.DataFrame): Predictors
            y (pd.DataFrame): Target
            hyperarameters (dict): Estimator hyperparameters

        Returns:
            [type]: Estimator
        """
        estimator_cls = self.estimator.__class__
        estimator = estimator_cls(**hyperarameters)
        estimator.fit(X, y)
        return estimator


    @staticmethod
    def _check_spd(m: np.ndarray, tol: float=1e-8) -> bool:
        """Checks if a matrix is symmetric positive definite

        Args:
            m (np.ndarray): Matrix

        Returns:
            (bool): True if matrix is symmetric positive definite
        """

        try:
            # Check if matrix is positive definite
            np.linalg.cholesky(m)

        except np.linalg.linalg.LinAlgError as err:
            if 'Matrix is not positive definite' in err.message:
                return False

            else:
                raise 

        else:
            # Now that m is positive definite, check if it is symmetric
            return sp.sparse.linalg.norm(m-m.T, sp.Inf) < tol

## Example with `XGBClassifier`

In [26]:
from xgboost import XGBClassifier

space = {
        'max_depth' : hp.randint('max_depth', 10),
        'learning_rate' : hp.quniform('learning_rate', 0.01, 0.75, 0.01),
        'n_estimators' : hp.randint('n_estimators', 500),
        'gamma' : hp.quniform('gamma', 0, 0.50, 0.01),
        'min_child_weight' : hp.quniform('min_child_weight', 1, 10, 1),
        'subsample' : hp.quniform('subsample', 0.1, 1, 0.01),
        'colsample_bytree' : hp.quniform('colsample_bytree', 0.1, 1.0, 0.01),
        'scale_pos_weight': hp.normal("scale_pos_weight", np.mean(y_train), 0.01*np.mean(y_train)),
        'eval_metric': "logloss",
        "use_label_encoder": False
        }

cv = BayesSearchCV(XGBClassifier(learning_rate=0.1), param_distributions=space, scoring={"roc_auc": 1, "f1": 1, "neg_log_loss": 0}, cv=StratifiedKFold(), n_iter=250)
cv.fit(X_train, y_train)
print("Loss value according to each performance metric:\n", cv.cv_results_)
print("Optimal hyperparameters: ", cv.best_hyperparameters_)

100%|██████████| 250/250 [00:30<00:00,  8.11trial/s, best loss: 7.045292227633089e-05]
100%|██████████| 250/250 [00:00<?, ?trial/s, best loss=?]




100%|██████████| 250/250 [00:00<?, ?trial/s, best loss=?]




100%|██████████| 250/250 [00:00<?, ?trial/s, best loss=?]




100%|██████████| 250/250 [00:00<?, ?trial/s, best loss=?]
Loss value according to each performance metric:
    roc_auc_loss   f1_loss  neg_log_loss_loss          loss  cv_iteration_loss
0  0.000000e+00  0.000000           0.000284  8.078521e-08                  0
1  3.476549e-07  0.000091           0.002391  5.724499e-06                  4
2  1.703509e-05  0.000816           0.009109  8.364266e-05                  3
3  1.284670e-05  0.000786           0.014083  1.989465e-04                  1
4  5.562478e-04  0.000786           0.030698  9.432685e-04                  2
Optimal hyperparameters:  {'colsample_bytree': 0.41000000000000003, 'gamma': 0.01, 'learning_rate': 0.43, 'max_depth': 7, 'min_child_weight': 1.0, 'n_estimators': 490, 'scale_pos_weight': 0.6369590112899874, 'subsample': 0.9500000000000001}




## `CatBoostClassifier`

In [55]:
from catboost import CatBoostClassifier
space = {
        'learning_rate':     hp.choice('learning_rate',     np.arange(0.05, 0.31, 0.05)),
        'max_depth':         hp.choice('max_depth',         np.arange(5, 16, 1, dtype=int)),
        'n_estimators':      100,
        'verbose': False
    }

cv = BayesSearchCV(CatBoostClassifier(), param_distributions=space, scoring={"roc_auc": 1, "f1": 1}, cv=StratifiedKFold())
cv.fit(X_train, y_train)

100%|██████████| 10/10 [05:35<00:00, 33.53s/trial, best loss: 0.0]
0:	learn: 0.4540327	total: 2.34ms	remaining: 2.34s
1:	learn: 0.6530328	total: 4.58ms	remaining: 2.29s
2:	learn: 0.3165582	total: 10.5ms	remaining: 3.49s
3:	learn: 0.1416187	total: 13.9ms	remaining: 3.45s
4:	learn: 0.0798963	total: 17.8ms	remaining: 3.54s
5:	learn: 0.0447153	total: 20.6ms	remaining: 3.41s
6:	learn: 0.0158235	total: 23.1ms	remaining: 3.28s
7:	learn: 0.0033076	total: 24.9ms	remaining: 3.08s
8:	learn: 0.0010312	total: 27ms	remaining: 2.98s
9:	learn: 0.0010300	total: 29.9ms	remaining: 2.96s


learning rate is greater than 1. You probably need to decrease learning rate.
learning rate is greater than 1. You probably need to decrease learning rate.
learning rate is greater than 1. You probably need to decrease learning rate.


10:	learn: 0.0010295	total: 33.8ms	remaining: 3.04s
11:	learn: 0.0010292	total: 39.9ms	remaining: 3.29s
12:	learn: 0.0010286	total: 51.4ms	remaining: 3.9s
13:	learn: 0.0010285	total: 55.3ms	remaining: 3.89s
14:	learn: 0.0010283	total: 60.7ms	remaining: 3.98s
15:	learn: 0.0010279	total: 64.5ms	remaining: 3.97s
16:	learn: 0.0010270	total: 70.6ms	remaining: 4.08s
17:	learn: 0.0010270	total: 79.7ms	remaining: 4.35s
18:	learn: 0.0010264	total: 84.6ms	remaining: 4.37s
19:	learn: 0.0010261	total: 98.3ms	remaining: 4.82s
20:	learn: 0.0010260	total: 107ms	remaining: 4.97s
21:	learn: 0.0010258	total: 109ms	remaining: 4.86s
22:	learn: 0.0010250	total: 111ms	remaining: 4.72s
23:	learn: 0.0010250	total: 115ms	remaining: 4.67s
24:	learn: 0.0010246	total: 122ms	remaining: 4.75s
25:	learn: 0.0010207	total: 132ms	remaining: 4.94s
26:	learn: 0.0010189	total: 137ms	remaining: 4.94s
27:	learn: 0.0010188	total: 145ms	remaining: 5.03s
28:	learn: 0.0010179	total: 154ms	remaining: 5.17s
29:	learn: 0.0010170	t

learning rate is greater than 1. You probably need to decrease learning rate.
learning rate is greater than 1. You probably need to decrease learning rate.
learning rate is greater than 1. You probably need to decrease learning rate.


2:	learn: 0.2895749	total: 45.1ms	remaining: 15s
3:	learn: 0.1699666	total: 72.7ms	remaining: 18.1s
4:	learn: 0.2183264	total: 77.8ms	remaining: 15.5s
5:	learn: 0.3419597	total: 83.6ms	remaining: 13.8s
6:	learn: 0.1205548	total: 90.5ms	remaining: 12.8s
7:	learn: 0.0412798	total: 97.9ms	remaining: 12.1s
8:	learn: 0.0176418	total: 137ms	remaining: 15.1s
9:	learn: 0.0023512	total: 146ms	remaining: 14.5s
10:	learn: 0.0008351	total: 157ms	remaining: 14.1s
11:	learn: 0.0008345	total: 162ms	remaining: 13.3s
12:	learn: 0.0008339	total: 165ms	remaining: 12.5s
13:	learn: 0.0008328	total: 167ms	remaining: 11.8s
14:	learn: 0.0008268	total: 170ms	remaining: 11.2s
15:	learn: 0.0008265	total: 172ms	remaining: 10.6s
16:	learn: 0.0008254	total: 174ms	remaining: 10.1s
17:	learn: 0.0006860	total: 177ms	remaining: 9.68s
18:	learn: 0.0006859	total: 183ms	remaining: 9.47s
19:	learn: 0.0006858	total: 187ms	remaining: 9.18s
20:	learn: 0.0006857	total: 192ms	remaining: 8.95s
21:	learn: 0.0006855	total: 196ms	r

learning rate is greater than 1. You probably need to decrease learning rate.
learning rate is greater than 1. You probably need to decrease learning rate.
learning rate is greater than 1. You probably need to decrease learning rate.


5:	learn: 0.1767043	total: 31.3ms	remaining: 5.19s
6:	learn: 0.0049168	total: 39.1ms	remaining: 5.55s
7:	learn: 0.0022693	total: 42.5ms	remaining: 5.27s
8:	learn: 0.0014721	total: 45.5ms	remaining: 5.01s
9:	learn: 0.0011948	total: 48.9ms	remaining: 4.84s
10:	learn: 0.0011873	total: 52.7ms	remaining: 4.73s
11:	learn: 0.0011820	total: 59ms	remaining: 4.85s
12:	learn: 0.0011782	total: 61.7ms	remaining: 4.68s
13:	learn: 0.0011760	total: 66.4ms	remaining: 4.67s
14:	learn: 0.0011713	total: 74.4ms	remaining: 4.89s
15:	learn: 0.0011170	total: 78.6ms	remaining: 4.83s
16:	learn: 0.0010755	total: 90.3ms	remaining: 5.22s
17:	learn: 0.0010369	total: 100ms	remaining: 5.47s
18:	learn: 0.0010355	total: 107ms	remaining: 5.54s
19:	learn: 0.0010346	total: 123ms	remaining: 6.05s
20:	learn: 0.0010246	total: 129ms	remaining: 5.99s
21:	learn: 0.0010232	total: 140ms	remaining: 6.23s
22:	learn: 0.0010231	total: 144ms	remaining: 6.12s
23:	learn: 0.0010225	total: 151ms	remaining: 6.13s
24:	learn: 0.0010052	total

learning rate is greater than 1. You probably need to decrease learning rate.
learning rate is greater than 1. You probably need to decrease learning rate.
learning rate is greater than 1. You probably need to decrease learning rate.


7:	learn: 0.0281713	total: 48.7ms	remaining: 6.03s
8:	learn: 0.0255218	total: 64.1ms	remaining: 7.05s
9:	learn: 0.0218161	total: 67.9ms	remaining: 6.72s
10:	learn: 0.0052551	total: 71.6ms	remaining: 6.44s
11:	learn: 0.0018230	total: 74.4ms	remaining: 6.12s
12:	learn: 0.0018211	total: 76.4ms	remaining: 5.8s
13:	learn: 0.0018207	total: 79.3ms	remaining: 5.59s
14:	learn: 0.0018194	total: 81.5ms	remaining: 5.35s
15:	learn: 0.0016223	total: 83.5ms	remaining: 5.13s
16:	learn: 0.0013918	total: 89.1ms	remaining: 5.15s
17:	learn: 0.0013916	total: 91.1ms	remaining: 4.97s
18:	learn: 0.0013915	total: 97.1ms	remaining: 5.01s
19:	learn: 0.0013907	total: 107ms	remaining: 5.25s
20:	learn: 0.0013907	total: 112ms	remaining: 5.22s
21:	learn: 0.0013906	total: 123ms	remaining: 5.48s
22:	learn: 0.0013905	total: 131ms	remaining: 5.57s
23:	learn: 0.0013904	total: 148ms	remaining: 6.01s
24:	learn: 0.0013900	total: 152ms	remaining: 5.94s
25:	learn: 0.0013896	total: 157ms	remaining: 5.87s
26:	learn: 0.0013892	to

learning rate is greater than 1. You probably need to decrease learning rate.
learning rate is greater than 1. You probably need to decrease learning rate.
learning rate is greater than 1. You probably need to decrease learning rate.


4:	learn: 0.1944775	total: 58.3ms	remaining: 11.6s
5:	learn: 0.0835758	total: 63ms	remaining: 10.4s
6:	learn: 0.0363021	total: 80.3ms	remaining: 11.4s
7:	learn: 0.0324716	total: 85.4ms	remaining: 10.6s
8:	learn: 0.0006994	total: 104ms	remaining: 11.4s
9:	learn: 0.0006991	total: 109ms	remaining: 10.8s
10:	learn: 0.0006989	total: 118ms	remaining: 10.6s
11:	learn: 0.0006355	total: 128ms	remaining: 10.5s
12:	learn: 0.0006348	total: 136ms	remaining: 10.3s
13:	learn: 0.0006345	total: 142ms	remaining: 10s
14:	learn: 0.0005826	total: 147ms	remaining: 9.65s
15:	learn: 0.0005826	total: 153ms	remaining: 9.44s
16:	learn: 0.0005825	total: 157ms	remaining: 9.1s
17:	learn: 0.0005815	total: 164ms	remaining: 8.97s
18:	learn: 0.0005806	total: 175ms	remaining: 9.02s
19:	learn: 0.0005805	total: 184ms	remaining: 9s
20:	learn: 0.0005803	total: 187ms	remaining: 8.71s
21:	learn: 0.0005802	total: 190ms	remaining: 8.47s
22:	learn: 0.0005799	total: 194ms	remaining: 8.25s
23:	learn: 0.0005798	total: 199ms	remaini

In [56]:
print("Loss value according to each performance metric:\n", cv.cv_results_)
print("Optimal hyperparameters: ", cv.best_hyperparameters_)

Loss value according to each performance metric:
    roc_auc_loss   f1_loss      loss  cv_iteration_loss
0      0.000301  0.001322  0.001624                  0
1      0.000184  0.004619  0.004803                  4
2      0.000660  0.005917  0.006577                  1
3      0.001252  0.006818  0.008069                  3
4      0.001956  0.009612  0.011567                  2
Optimal hyperparameters:  {'learning_rate': 3, 'max_depth': 4}


## `LGBMClassifier`

In [50]:
from lightgbm import LGBMClassifier

lgb_reg_params = {
    'learning_rate':    hp.choice('learning_rate',    np.arange(0.05, 0.31, 0.05)),
    'max_depth':        hp.choice('max_depth',        np.arange(5, 16, 1, dtype=int)),
    'min_child_weight': hp.choice('min_child_weight', np.arange(1, 8, 1, dtype=int)),
    'colsample_bytree': hp.choice('colsample_bytree', np.arange(0.3, 0.8, 0.1)),
    'subsample':        hp.uniform('subsample', 0.8, 1),
    'n_estimators':     100,
}

space = {
        'learning_rate':    hp.choice('learning_rate',    np.arange(0.05, 0.31, 0.05)),
        'max_depth':        hp.choice('max_depth',        np.arange(5, 16, 1, dtype=int)),
        'min_child_weight': hp.choice('min_child_weight', np.arange(1, 8, 1, dtype=int)),
        'colsample_bytree': hp.choice('colsample_bytree', np.arange(0.3, 0.8, 0.1)),
        'subsample':        hp.uniform('subsample', 0.8, 1),
        'n_estimators':     100,
    }
cv = BayesSearchCV(LGBMClassifier(), param_distributions=space, scoring={"roc_auc": 1, "f1": 1}, cv=StratifiedKFold())
cv.fit(X_train, y_train)
print(cv.cv_results_)
print(cv.best_hyperparameters_)
print(cv.min_loss)

100%|██████████| 10/10 [00:00<00:00, 20.63trial/s, best loss: 0.0]


[LightGBM] [Fatal] Check failed: (feature_fraction) > (0.0) at /__w/1/s/python-package/compile/src/io/config_auto.cpp, line 362 .



LightGBMError: Check failed: (feature_fraction) > (0.0) at /__w/1/s/python-package/compile/src/io/config_auto.cpp, line 362 .


# References

https://towardsdatascience.com/an-example-of-hyperparameter-optimization-on-xgboost-lightgbm-and-catboost-using-hyperopt-12bc41a271e
