## <font color='darkblue'>You Are Missing Out on LightGBM. It Crushes XGBoost in Every Aspect</font>
([article source](https://towardsdatascience.com/how-to-beat-the-heck-out-of-xgboost-with-lightgbm-comprehensive-tutorial-5eba52195997)) <font size='3ptx'><b>Not anymore, XGBoost, not anymore</b></font>

Learn how to crush <b><a href='https://xgboost.readthedocs.io/en/latest/python/python_intro.html'>XGBoost</a></b> in this comprehensive <b><a href='https://lightgbm.readthedocs.io/en/latest/Python-Intro.html'>LightGBM</a></b> tutorial.

So many people are drawn to XGBoost like a moth to a flame. Yes, it has seen some glorious days in prestigious competitions, and it’s still the most widely-used ML library.

But it has been 4 years since XGBoost lost its top spot in terms of performance. In 2017, Microsoft open-sourced <b><a href='https://lightgbm.readthedocs.io/en/latest/Python-Intro.html'>LightGBM</a></b> (<font color='brown'>Light Gradient Boosting Machine</font>), which gives equally high accuracy while training 2–10 times faster.

This is a game-changing advantage considering the ubiquity of massive, million-row datasets. There are other distinctions that tip the scales towards LightGBM and give it an edge over XGBoost.

By the end of this post, you will learn about these advantages, including:
* How to develop LightGBM models for classification and regression tasks
* Structural differences between XGBoost and LGBM
* How to use early stopping and evaluation sets
* Enabling powerful categorical feature support for up to an 8x speed increase
* Implementing successful cross-validation with LGBM
* Hyperparameter tuning with <b><a href='https://optuna.org/'>Optuna</a></b> (Part II)

### <font color='darkgreen'>Agenda</font>
* <font size='3ptx'><b><a href='#sect1'>XGBoost vs. LightGBM</a></b></font>
* <font size='3ptx'><b><a href='#sect2'>Model initialization and objectives</a></b></font>
* <font size='3ptx'><b><a href='#sect3'>Controlling the number of decision trees</a></b></font>
* <font size='3ptx'><b><a href='#sect4'>Early stopping</a></b></font>
* <font size='3ptx'><b><a href='#sect5'>Eval sets and metrics</a></b></font>
* <font size='3ptx'><b><a href='#sect6'>Establish a baseline</a></b></font>
* <font size='3ptx'><b><a href='#sect7'>Categorical and missing values support</a></b></font>
* <font size='3ptx'><b><a href='#sect8'>Cross-validation with LightGBM</a></b></font>

<a id='sect1'></a>
## <font color='darkblue'>XGBoost vs. LightGBM</font>
<font size='3ptx'><b>When LGBM got released, it came with ground-breaking changes to the way it grows decision trees.</b></font>
> Both XGBoost and LightGBM are ensemble algorithms. They use a special type of decision tree, also called a weak learner, to capture complex, non-linear patterns.

<br/>

In XGBoost (<font color='brown'>and many other libraries</font>), decision trees were built one level at a time:
![How XGBoost built trees](images/1.png)
<br/>

<b>This type of structure tends to result in unnecessary nodes and leaves because the trees continue to grow until `max_depth` is reached</b>. This leads to higher model complexity and longer training time.

In contrast, <b><a href='https://lightgbm.readthedocs.io/en/latest/Python-Intro.html'>LightGBM</a></b> takes a leaf-wise approach:
![How LightGBM built trees](images/2.png)
<br/>

The structure continues to grow with the most promising branches and leaves (<font color='brown'>nodes with the most delta loss</font>), holding the number of the decision leaves constant. (<font color='brown'>If this doesn’t make sense to you, don’t sweat. This won’t prevent you from effectively using LGBM</font>).
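
If you want to see the difference between the two growth strategies in code, here is a toy sketch. It does not use LightGBM at all: the `GAINS` table is made up for illustration, and `level_wise` / `leaf_wise` are hypothetical helper names. The leaf budget plays the role of LightGBM's `num_leaves` parameter.

```python
import heapq

# Hypothetical loss reduction ("gain") obtained by splitting each node.
# Nodes use heap numbering: the root is 1, children of node i are 2*i and 2*i + 1.
GAINS = {1: 10.0, 2: 6.0, 3: 0.5, 4: 4.0, 5: 0.2, 6: 0.1, 7: 0.1}


def level_wise(max_depth):
    """Split every node on each level until max_depth is reached."""
    leaves, splits = [1], []
    for _ in range(max_depth):
        splits.extend(leaves)
        leaves = [child for node in leaves for child in (2 * node, 2 * node + 1)]
    return splits, leaves


def leaf_wise(num_leaves):
    """Always split the current leaf with the largest gain."""
    heap = [(-GAINS.get(1, 0.0), 1)]  # max-heap via negated gains
    splits = []
    while len(heap) < num_leaves:
        _, node = heapq.heappop(heap)
        splits.append(node)
        for child in (2 * node, 2 * node + 1):
            heapq.heappush(heap, (-GAINS.get(child, 0.0), child))
    return splits, sorted(node for _, node in heap)


def total_gain(splits):
    return sum(GAINS.get(node, 0.0) for node in splits)


lw_splits, lw_leaves = level_wise(max_depth=2)
lf_splits, lf_leaves = leaf_wise(num_leaves=4)
print("level-wise splits:", lw_splits, "total gain:", total_gain(lw_splits))
print("leaf-wise splits: ", lf_splits, "total gain:", total_gain(lf_splits))
```

Both trees end up with four leaves, but the leaf-wise tree spends its budget on the splits with the largest gains instead of splitting a low-gain node just because it sits on the current level.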

This is one of the main reasons LGBM crushed XGBoost in terms of speed when it first came out.
![Training time comparison between LGBM and XGBoost](images/3.png)
<br/>

Above is a benchmark comparison of XGBoost with traditional decision trees and LGBM with leaf-wise structure (<font color='brown'>first and last columns</font>) on datasets with ~500k-13M samples. It shows that <b>LGBM is orders of magnitude faster than XGB</b>.

<b>LGBM also uses <a href='https://lightgbm.readthedocs.io/en/latest/Features.html#optimization-in-speed-and-memory-usage'>histogram binning</a> of continuous features, which provides even more speed-up than traditional gradient boosting</b>. Binning numeric values significantly decreases the number of split points to consider in decision trees, and it removes the need for sorting algorithms, which are always computation-heavy (<font color='brown'>check last two columns</font>).
![Training time comparison between LGBM and XGBoost using histogram-binning](images/4.png)
<br/>
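
Here is a rough, pure-Python sketch of why binning cuts the number of split candidates. It is a simplification under stated assumptions: the data is random, the buckets are equal-width (LightGBM builds its bins differently), and the only thing borrowed from LightGBM is 255, the documented default of its `max_bin` parameter.

```python
import random

random.seed(0)
values = [random.random() for _ in range(10_000)]  # one continuous feature

# Exact split finding: every gap between distinct values is a candidate split.
exact_candidates = len(set(values)) - 1

# Histogram-based split finding: bucket the values in a single pass (no sorting),
# then only consider the boundaries between non-empty buckets.
MAX_BIN = 255  # LightGBM's documented default for max_bin
lo, hi = min(values), max(values)
width = (hi - lo) / MAX_BIN


def to_bin(value):
    # The maximum value would land in bucket MAX_BIN, so clamp it to the last one.
    return min(int((value - lo) / width), MAX_BIN - 1)


histogram = [0] * MAX_BIN
for value in values:
    histogram[to_bin(value)] += 1

binned_candidates = sum(1 for count in histogram if count > 0) - 1
print(f"exact split candidates:  {exact_candidates}")
print(f"binned split candidates: {binned_candidates}")
```

The exact approach has to evaluate a candidate for almost every row, while the binned version never looks at more than `MAX_BIN - 1` boundaries no matter how many rows there are.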

We will continue exploring the differences in the coming sections.

<a id='sect2'></a>
## <font color='darkblue'>Model initialization and objectives</font>
<font size='3ptx'><b>Like XGBoost, LGBM has two APIs — core learning API and Sklearn-compatible one. You know I am a big fan of <a href='https://scikit-learn.org/stable/'>Sklearn</a>, so this tutorial will focus on that version.</b></font>
> Sklearn-compatible API of XGBoost and LGBM allows you to integrate their models in the Sklearn ecosystem so that you can use them inside pipelines in combination with other transformers.

<br/>

Sklearn API exposes <b><a href='https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html'>LGBMRegressor</a></b> and <b><a href='https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier'>LGBMClassifier</a></b>, with the familiar `fit/predict/predict_proba` pattern:

In [20]:
#!!pip install lightgbm
#!!pip install --upgrade scikit-learn

In [21]:
import lightgbm as lgbm  # standard alias

clf = lgbm.LGBMClassifier(objective="binary")  # or 'multiclass'
reg = lgbm.LGBMRegressor()  # default - 'regression'

<b>`objective` specifies the type of learning task</b>. Besides the common ones like `binary`, `multiclass` and `regression`, there are others such as `poisson` and `tweedie` regression. See <a href='https://lightgbm.readthedocs.io/en/latest/Parameters.html#core-parameters'>this section</a> of the documentation for the full list of objectives.

<a id='sect3'></a>
## <font color='darkblue'>Controlling the number of decision trees</font>
* <b><a href='#sect3_1'>Loading dataset</a></b>
* <b><a href='#sect3_2'>Train Classifier</a></b>

<font size='3ptx'><b>The number of decision trees inside the ensemble significantly affects the results</b></font>. You can control it using the `n_estimators` parameter in both the classifier and regressor.

In [36]:
import numpy as np
import pandas as pd
import time
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

RANDOM_STATE = 1121218

<a id='sect3_1'></a>
### <font color='darkgreen'>Loading dataset</font>
Below, we will fit an <b><a href='https://lightgbm.readthedocs.io/en/latest/index.html'>LGBM</a></b> binary classifier on the <a href='https://www.kaggle.com/c/tabular-playground-series-mar-2021/data'>Kaggle TPS March dataset</a> with 1000 decision trees:

In [6]:
tps_march = pd.read_csv("../../datas/kaggle_tabular_playground_series_mar_2021/train.csv")
tps_march.head()

Unnamed: 0,id,cat0,cat1,cat2,cat3,cat4,cat5,cat6,cat7,cat8,...,cont2,cont3,cont4,cont5,cont6,cont7,cont8,cont9,cont10,target
0,0,A,I,A,B,B,BI,A,S,Q,...,0.759439,0.795549,0.681917,0.621672,0.592184,0.791921,0.815254,0.965006,0.665915,0
1,1,A,I,A,A,E,BI,K,W,AD,...,0.386385,0.541366,0.388982,0.357778,0.600044,0.408701,0.399353,0.927406,0.493729,0
2,2,A,K,A,A,E,BI,A,E,BM,...,0.343255,0.616352,0.793687,0.552877,0.352113,0.388835,0.412303,0.292696,0.549452,0
3,3,A,K,A,C,E,BI,A,Y,AD,...,0.831147,0.807807,0.800032,0.619147,0.221789,0.897617,0.633669,0.760318,0.934242,0
4,4,A,I,G,B,E,BI,C,G,Q,...,0.338818,0.277308,0.610578,0.128291,0.578764,0.279167,0.351103,0.357084,0.32896,1


In [7]:
tps_march.shape

(300000, 32)

In [8]:
tps_march.dtypes.value_counts()

object     19
float64    11
int64       2
dtype: int64

In [9]:
X, y = tps_march.drop("target", axis=1), tps_march[["target"]].values.flatten()

In [11]:
# Encode categoricals
X_enc = pd.get_dummies(X)
X_enc.head()

Unnamed: 0,id,cont0,cont1,cont2,cont3,cont4,cont5,cont6,cont7,cont8,...,cat16_C,cat16_D,cat17_A,cat17_B,cat17_C,cat17_D,cat18_A,cat18_B,cat18_C,cat18_D
0,0,0.629858,0.855349,0.759439,0.795549,0.681917,0.621672,0.592184,0.791921,0.815254,...,0,1,0,0,0,1,0,1,0,0
1,1,0.370727,0.328929,0.386385,0.541366,0.388982,0.357778,0.600044,0.408701,0.399353,...,0,0,0,0,0,1,0,1,0,0
2,2,0.502272,0.322749,0.343255,0.616352,0.793687,0.552877,0.352113,0.388835,0.412303,...,0,1,0,0,0,1,0,1,0,0
3,3,0.934242,0.707663,0.831147,0.807807,0.800032,0.619147,0.221789,0.897617,0.633669,...,0,1,0,0,0,1,0,1,0,0
4,4,0.254427,0.274514,0.338818,0.277308,0.610578,0.128291,0.578764,0.279167,0.351103,...,0,0,0,0,0,1,0,1,0,0


<a id='sect3_2'></a>
### <font color='darkgreen'>Train Classifier</font>

In [29]:
%%time
clf = lgbm.LGBMClassifier(objective="binary", n_estimators=1000, random_state=RANDOM_STATE)
clf.fit(X_enc, y)

Wall time: 10.2 s


LGBMClassifier(n_estimators=1000, objective='binary', random_state=1121218)

<b>Adding more trees can improve accuracy but increases the risk of overfitting</b>. To combat this, you can create many trees (<font color='brown'>2000+</font>) and choose a smaller `learning_rate` (<font color='brown'>more on this later</font>). Note that `learning_rate=0.1` in the cell below is LightGBM's default, so lower it to apply this advice.

In [15]:
%%time
clf = lgbm.LGBMClassifier(objective="binary", n_estimators=3000, learning_rate=0.1, random_state=RANDOM_STATE)
clf.fit(X_enc, y)

Wall time: 21.2 s


LGBMClassifier(n_estimators=3000, objective='binary', random_state=1121218)

Like in XGBoost, fitting a single decision tree to the data is called a <b><font color='darkblue'>boosting round</font></b>.
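
To make boosting rounds and the learning rate less abstract, here is a toy gradient-boosting loop in pure Python. It is only a sketch under simplifying assumptions: squared-error loss, one-split decision stumps, and a made-up 1-D dataset. `fit_stump` and `boost` are hypothetical helpers, not LightGBM functions.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_estimators, learning_rate):
    """Return the training MSE after each boosting round."""
    preds = [0.0] * len(xs)
    history = []
    for _ in range(n_estimators):  # one boosting round = one new tree
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        # Each tree only contributes a fraction (learning_rate) of its prediction.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return history


xs = [i / 20 for i in range(40)]
ys = [x * x for x in xs]
fast = boost(xs, ys, n_estimators=50, learning_rate=0.5)
slow = boost(xs, ys, n_estimators=50, learning_rate=0.05)
print(f"MSE after 10 rounds: lr=0.5 -> {fast[9]:.4f}, lr=0.05 -> {slow[9]:.4f}")
print(f"MSE after 50 rounds: lr=0.5 -> {fast[-1]:.4f}, lr=0.05 -> {slow[-1]:.4f}")
```

The smaller learning rate takes smaller steps per round, which is why it is usually paired with a larger number of trees.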

<a id='sect4'></a>
## <font color='darkblue'>Early stopping</font>
<font size='3ptx'><b>Each tree in the ensemble builds on the predictions of the last tree — i.e., each boosting round is an improvement of the last</b></font>.

If the predictions don’t improve after a sequence of rounds, it is sensible to stop the training of the ensemble even if we have not yet reached the hard stop set by <font color='violet'>n_estimators</font>. To achieve this, LGBM provides the <font color='violet'>early_stopping_rounds</font> parameter inside the `fit` function. For example, setting it to 100 means we stop the training if the predictions have not improved for the last 100 rounds.

Before looking at a code example, we should learn a couple of concepts connected to early stopping.

<a id='sect5'></a>
## <font color='darkblue'>Eval sets and metrics</font>
<b>Early stopping is only enabled when you pass one or more evaluation sets to the <font color='violet'>eval_set</font> parameter of the `fit` method</b>. These evaluation sets are used to keep track of the quality of the predictions from one boosting round to the next:

In [24]:
X_train, X_eval, y_train, y_eval = train_test_split(X_enc, y, test_size=0.1)

clf = lgbm.LGBMClassifier(objective="binary", n_estimators=10000)
eval_set = [(X_eval, y_eval)]

clf.fit(
    X_train,
    y_train,
    eval_set=eval_set,
    early_stopping_rounds=100,
    eval_metric="binary_logloss",
)

[1]	valid_0's binary_logloss: 0.542166
Training until validation scores don't improve for 100 rounds
[2]	valid_0's binary_logloss: 0.51505
[3]	valid_0's binary_logloss: 0.493516
[4]	valid_0's binary_logloss: 0.475673
[5]	valid_0's binary_logloss: 0.461006
[6]	valid_0's binary_logloss: 0.448741
[7]	valid_0's binary_logloss: 0.4381
[8]	valid_0's binary_logloss: 0.42905
[9]	valid_0's binary_logloss: 0.421275
[10]	valid_0's binary_logloss: 0.414703
[11]	valid_0's binary_logloss: 0.408893
[12]	valid_0's binary_logloss: 0.403806
[13]	valid_0's binary_logloss: 0.399383
[14]	valid_0's binary_logloss: 0.395376
[15]	valid_0's binary_logloss: 0.39188
[16]	valid_0's binary_logloss: 0.388631
[17]	valid_0's binary_logloss: 0.385757
[18]	valid_0's binary_logloss: 0.383222
[19]	valid_0's binary_logloss: 0.380936
[20]	valid_0's binary_logloss: 0.378975
[21]	valid_0's binary_logloss: 0.377094
[22]	valid_0's binary_logloss: 0.375491
[23]	valid_0's binary_logloss: 0.373987
[24]	valid_0's binary_logloss: 0

[208]	valid_0's binary_logloss: 0.346423
[209]	valid_0's binary_logloss: 0.346443
[210]	valid_0's binary_logloss: 0.346438
[211]	valid_0's binary_logloss: 0.346452
[212]	valid_0's binary_logloss: 0.346457
[213]	valid_0's binary_logloss: 0.34646
[214]	valid_0's binary_logloss: 0.346465
[215]	valid_0's binary_logloss: 0.346433
[216]	valid_0's binary_logloss: 0.346392
[217]	valid_0's binary_logloss: 0.346358
[218]	valid_0's binary_logloss: 0.346362
[219]	valid_0's binary_logloss: 0.346328
[220]	valid_0's binary_logloss: 0.346308
[221]	valid_0's binary_logloss: 0.346298
[222]	valid_0's binary_logloss: 0.346297
[223]	valid_0's binary_logloss: 0.346284
[224]	valid_0's binary_logloss: 0.346296
[225]	valid_0's binary_logloss: 0.346292
[226]	valid_0's binary_logloss: 0.346282
[227]	valid_0's binary_logloss: 0.346303
[228]	valid_0's binary_logloss: 0.34628
[229]	valid_0's binary_logloss: 0.346267
[230]	valid_0's binary_logloss: 0.346281
[231]	valid_0's binary_logloss: 0.346279
[232]	valid_0's bi

LGBMClassifier(n_estimators=10000, objective='binary')

In each round of <font color='violet'>n_estimators</font>, a single decision tree is fit to `(X_train, y_train)` and predictions are made on the passed evaluation set `(X_eval, y_eval)`. The quality of predictions is measured with a passed metric in <font color='violet'>eval_metric</font>.
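
If you are wondering what the `binary_logloss` numbers in the log actually measure, here is a minimal pure-Python version of the metric. `binary_logloss` below is my own helper written for illustration, not LightGBM's implementation, and the labels and probabilities are made up.

```python
import math


def binary_logloss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true labels under predicted P(y=1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y_true = [0, 1, 1, 0]
uninformed = binary_logloss(y_true, [0.5, 0.5, 0.5, 0.5])  # always predict 50/50
confident = binary_logloss(y_true, [0.1, 0.9, 0.8, 0.2])   # mostly right and confident
print(f"uninformed: {uninformed:.4f}, confident: {confident:.4f}")
```

Lower is better: always predicting 0.5 gives a loss of ln(2), and confident correct predictions push the loss towards zero.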

The training stops at the 386th iteration because the validation score has not improved since the 286th one — early stopping of 100 rounds. Now, we have the luxury of creating as many trees as we want and <font color='violet'>early_stopping_rounds</font> can discard the unnecessary ones.
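
The stopping rule itself is simple enough to sketch in a few lines. The function below is a hypothetical helper, not LightGBM code, and the validation curve is synthetic: I built it to bottom out at round 286 so that it mirrors the run described above.

```python
def early_stopping_round(scores, patience):
    """Return (best_round, stop_round), 1-indexed, for a lower-is-better metric.

    stop_round is None if the scores run out before the patience is exhausted.
    """
    best_score, best_round = float("inf"), 0
    for rnd, score in enumerate(scores, start=1):
        if score < best_score:
            best_score, best_round = score, rnd
        elif rnd - best_round >= patience:
            return best_round, rnd
    return best_round, None


# Synthetic curve: improves through round 286, then plateaus slightly worse.
scores = [1.0 / r for r in range(1, 287)] + [1.0 / 286 + 1e-6] * 500
best, stop = early_stopping_round(scores, patience=100)
print(f"best round: {best}, stopped at round: {stop}")
```

With a patience of 100, training ends exactly 100 rounds after the last improvement, and the trees built after the best round are the ones that get discarded.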

<a id='sect6'></a>
## <font color='darkblue'>Establish a baseline</font>
<font size='3ptx'><b>Let’s establish a baseline score with what we know so far</b></font>. We will do the same for XGBoost so that we can compare the results:
* <a href='#sect6_1'><b>Loading dataset</b></a>
* <a href='#sect6_2'><b>Train XGBoost</b></a>
* <a href='#sect6_3'><b>Train LGBM</b></a>

<a id='sect6_1'></a>
### <font color='darkgreen'>Loading dataset</font>

In [26]:
X, y = tps_march.drop("target", axis=1), tps_march[["target"]].values.flatten()

# Encode categoricals
X_enc = pd.get_dummies(X)
X_train, X_eval, y_train, y_eval = train_test_split(
    X_enc, y, test_size=0.2, stratify=y)

<a id='sect6_2'></a>
### <font color='darkgreen'>Train XGBoost</font>

In [27]:
%%time

xgb_clf = xgb.XGBClassifier(
    objective="binary:logistic",
    random_state=RANDOM_STATE,
    n_estimators=10000,
    tree_method="hist",  # enable histogram binning in XGB
)

xgb_clf.fit(
    X_train,
    y_train,
    eval_set=[(X_eval, y_eval)],
    eval_metric="logloss",
    early_stopping_rounds=150,
    verbose=False,  # Disable logs
)

preds = xgb_clf.predict_proba(X_eval)
print(f"XGBoost logloss on the evaluation set: {log_loss(y_eval, preds):.5f}")



XGBoost logloss on the evaluation set: 0.34906
Wall time: 34.6 s


<a id='sect6_3'></a>
### <font color='darkgreen'>Train LGBM</font>

In [30]:
%%time

lgbm_clf = lgbm.LGBMClassifier(
    objective="binary",
    random_state=1121218,
    n_estimators=10000,
    boosting="gbdt",  # the default boosting type; LGBM always trains on histogram-binned features
    #     device='gpu'  # uncomment to use GPU training
)

lgbm_clf.fit(
    X_train,
    y_train,
    eval_set=[(X_eval, y_eval)],
    eval_metric="binary_logloss",
    early_stopping_rounds=150,
    verbose=False,  # Disable logs
)

preds = lgbm_clf.predict_proba(X_eval)
print(f"LightGBM logloss on the evaluation set: {log_loss(y_eval, preds):.5f}")

LightGBM logloss on the evaluation set: 0.34671
Wall time: 7 s


LGBM achieved a smaller loss in roughly one-fifth of the runtime (7 s vs. 34.6 s). Let’s see a final LGBM trick before we move on to cross-validation.

<a id='sect7'></a>
## <font color='darkblue'>Categorical and missing values support</font>
<b><font size='3ptx'>Histogram binning in LGBM comes with built-in support for handling missing values and categorical features</font></b>. TPS March dataset contains 19 categoricals, and we have been using <b><a href='https://en.wikipedia.org/wiki/One-hot'>one-hot encoding</a></b> up to this point.

This time, we will let LGBM deal with categoricals and compare the results with XGBoost once again:

In [39]:
X, y = tps_march.drop("target", axis=1), tps_march[["target"]].values.flatten()

# Extract categoricals and their indices
cat_features = X.select_dtypes(exclude=np.number).columns.to_list()
cat_idx = [X.columns.get_loc(col) for col in cat_features]

# Convert cat_features to pd.Categorical dtype
for col in cat_features:
    X[col] = pd.Categorical(X[col])

# Unencoded train/test sets
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=4, stratify=y
)

# Model initialization is the same
eval_set = [(X_eval, y_eval)]

# Used for comparing with [score, time]
lgbm_cat_handling = []
lgbm_one_hot = []
xgb_one_hot = []

To specify the categorical features, pass a list of their indices to the <font color='violet'>categorical_feature</font> parameter in the `fit` method:

### <font color='darkgreen'>LGBM (cat-handling)</font>

In [45]:
%%time
st = time.monotonic()
lgbm_clf.fit(
    X_train,
    y_train,
    categorical_feature=cat_idx,  # Specify the categoricals
    eval_set=eval_set,
    early_stopping_rounds=150,
    eval_metric="logloss",
    verbose=False,
)
rt = time.monotonic() - st

preds = lgbm_clf.predict_proba(X_eval)
loss = log_loss(y_eval, preds)
print(f"LGBM logloss with default categorical feature handling: {loss:.5f}")
lgbm_cat_handling.extend([loss, rt])

New categorical_feature is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

LGBM logloss with default categorical feature handling: 0.35861
Wall time: 5.43 s


### <font color='darkgreen'>LGBM (one-hot)</font>

In [40]:
X, y = tps_march.drop("target", axis=1), tps_march[["target"]].values.flatten()

# Encode categoricals
X_enc = pd.get_dummies(X)
X_train, X_eval, y_train, y_eval = train_test_split(
    X_enc, y, test_size=0.2, stratify=y)

# Model initialization is the same
eval_set = [(X_eval, y_eval)]

In [41]:
%%time
st = time.monotonic()
lgbm_clf.fit(
    X_train,
    y_train,
    eval_set=eval_set,
    early_stopping_rounds=150,
    eval_metric="logloss",
    verbose=False,
)
rt = time.monotonic() - st

preds = lgbm_clf.predict_proba(X_eval)
loss = log_loss(y_eval, preds)
print(f"LGBM logloss with one-hot encoding: {loss:.5f}")
lgbm_one_hot.extend([loss, rt])

LGBM logloss with one-hot encoding: 0.34994
Wall time: 7.31 s


### <font color='darkgreen'>XGB (one-hot)</font>

In [42]:
%%time
st = time.monotonic()
xgb_clf.fit(
    X_train,
    y_train,
    eval_set=[(X_eval, y_eval)],
    eval_metric="logloss",
    early_stopping_rounds=150,
    verbose=False,  # Disable logs
)
rt = time.monotonic() - st

preds = xgb_clf.predict_proba(X_eval)
loss = log_loss(y_eval, preds)
print(f"XGBoost logloss with one-hot encoding: {loss:.5f}")
xgb_one_hot.extend([loss, rt])

XGBoost logloss with one-hot encoding: 0.35241
Wall time: 35.6 s


You can achieve up to an 8x speed-up if you use the <b><a href='https://pandas.pydata.org/docs/reference/api/pandas.Categorical.html'>pandas.Categorical</a></b> data type with LGBM.

In [46]:
lgbm_cat_handling

[0.3586085698088037, 4.922000000005937]

In [52]:
# assign data of lists.  
comparing_data = {
    'LGBM (cat-handling)': lgbm_cat_handling,
    'LGBM (one-hot)': lgbm_one_hot,
    'XGB (one-hot)': xgb_one_hot,
}  

comparing_df = pd.DataFrame(
    comparing_data, 
    index =['score(log loss)', 'time(s)'])

comparing_df.iloc[0].apply(float)

LGBM (cat-handling)    0.358609
LGBM (one-hot)         0.349942
XGB (one-hot)          0.352410
Name: score(log loss), dtype: float64

In [53]:
comparing_df

| | LGBM (cat-handling) | LGBM (one-hot) | XGB (one-hot) |
|---|---|---|---|
| score (log loss) | 0.358609 | 0.349942 | 0.35241 |
| time (s) | 4.922 | 6.687 | 35.172 |


The table shows the final scores and runtimes of all three models. As you can see, the version with default categorical handling needs less time than the other two while keeping a fairly similar score, though its log loss is slightly higher (0.3586 vs. 0.3499 for one-hot LGBM).
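
One likely contributor to the runtime gap is simply the width of the training matrix. One-hot encoding adds a column for every category level, while `pd.Categorical` keeps one column per feature and stores integer codes. The tiny frame below is made up for illustration (its column names only imitate the TPS March ones), and it shows the difference in shape:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cat0": ["A", "B", "A", "C", "B", "D"],
        "cat1": ["x", "y", "z", "x", "y", "w"],
        "cont0": [0.1, 0.5, 0.3, 0.9, 0.7, 0.2],
    }
)

# One-hot: one new column per distinct level of each categorical feature.
one_hot = pd.get_dummies(df)

# Native handling: same number of columns, values stored as integer codes.
native = df.copy()
for col in ["cat0", "cat1"]:
    native[col] = pd.Categorical(native[col])

print("one-hot shape:", one_hot.shape)
print("categorical shape:", native.shape)
print("codes for cat0:", native["cat0"].cat.codes.tolist())
```

On a dataset with 19 categorical columns, some of them with many levels, the one-hot matrix grows much wider than the original, so every boosting round has more features to scan.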

<a id='sect8'></a>
## <font color='darkblue'>Cross-validation with LightGBM</font>
<font size='3ptx'><b>The most common way of doing CV with LGBM is to use Sklearn CV splitters.</b></font>

I am not talking about utility functions like <a href='https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html'>cross_validate</a> or <a href='https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html'>cross_val_score</a> but splitters like <b><a href='https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html'>KFold</a></b> or <b><a href='https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html'>StratifiedKFold</a></b> with their split method. Doing CV in this way gives you more control over the whole process.
> I have talked many times about the importance of cross-validation. You can read this <a href='https://towardsdatascience.com/6-sklearn-mistakes-that-silently-tell-you-are-a-rookie-84fa55f2b9dd?source=your_stories_page-------------------------------------'>post</a> for more details.

<br/>

Also, it enables you to use early stopping during cross-validation in a hassle-free manner. Here is what this looks like for the TPS March data:

In [54]:
from sklearn.model_selection import StratifiedKFold

In [59]:
X, y = tps_march.drop("target", axis=1), tps_march[["target"]].values.flatten()

# Extract categoricals and their indices
cat_features = X.select_dtypes(exclude=np.number).columns.to_list()
cat_idx = [X.columns.get_loc(col) for col in cat_features]

# Convert cat_features to pd.Categorical dtype
for col in cat_features:
    X[col] = pd.Categorical(X[col])

In [63]:
import warnings
warnings.filterwarnings('ignore')

N_SPLITS = 7
N_ESTIMATORS = 10000
strat_kf = StratifiedKFold(
    n_splits=N_SPLITS, shuffle=True, random_state=RANDOM_STATE)

scores = np.empty(N_SPLITS)
for idx, (train_idx, test_idx) in enumerate(strat_kf.split(X, y)):
    print("=" * 12 + f"Training fold {idx}" + 12 * "=")
    start = time.time()

    X_train, X_val = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_val = y[train_idx], y[test_idx]
    eval_set = [(X_val, y_val)]

    lgbm_clf = lgbm.LGBMClassifier(n_estimators=N_ESTIMATORS)
    lgbm_clf.fit(
        X_train,
        y_train,
        eval_set=eval_set,
        categorical_feature=cat_idx,
        early_stopping_rounds=200,
        eval_metric="binary_logloss",
        verbose=False,
    )

    preds = lgbm_clf.predict_proba(X_val)
    loss = log_loss(y_val, preds)
    scores[idx] = loss
    runtime = time.time() - start
    print(f"Fold {idx} finished with score: {loss:.5f} in {runtime:.2f} seconds.\n")

Fold 0 finished with score: 0.34557 in 4.14 seconds.

Fold 1 finished with score: 0.34932 in 4.63 seconds.

Fold 2 finished with score: 0.34614 in 5.09 seconds.

Fold 3 finished with score: 0.35044 in 4.52 seconds.

Fold 4 finished with score: 0.34595 in 4.26 seconds.

Fold 5 finished with score: 0.34820 in 5.73 seconds.

Fold 6 finished with score: 0.35118 in 4.37 seconds.



First, create a CV splitter. We are choosing <b><a href='https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html'>StratifiedKFold</a></b> because it is a classification problem. Then, loop through each train/test split using the `split` method. In each fold, initialize and train a new LGBM model and optionally report the score and runtime. That's it! That's how most people do CV, including on Kaggle.
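
A natural last step is to collapse the per-fold scores into a single CV estimate. The sketch below copies the seven log losses printed above into a plain list, which stands in for the `scores` array from the loop, and reports their mean and standard deviation using only the standard library:

```python
import statistics

# Fold scores copied from the output above.
fold_scores = [0.34557, 0.34932, 0.34614, 0.35044, 0.34595, 0.34820, 0.35118]

mean_loss = statistics.mean(fold_scores)
std_loss = statistics.stdev(fold_scores)  # sample standard deviation across folds
print(f"CV log loss: {mean_loss:.5f} +/- {std_loss:.5f}")
```

Inside the notebook, `scores.mean()` and `scores.std()` on the NumPy array give you the same kind of summary. Keep in mind that NumPy's `std` defaults to the population formula, so it will be slightly smaller than `statistics.stdev`.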

## <font color='darkblue'>Conclusion</font>
In this post, we learned pure modeling techniques with <b><a href='https://lightgbm.readthedocs.io/en/latest/index.html'>LightGBM</a></b>. <b>Next up, we will explore how to squeeze every bit of performance out of LGBM models using <a href='https://optuna.org/'>Optuna</a>.</b>

Specifically, Part II of this article will include a detailed overview of the most important LGBM hyperparameters and introduce a well-tested hyperparameter tuning workflow. It is already out — read it <a href='https://towardsdatascience.com/kagglers-guide-to-lightgbm-hyperparameter-tuning-with-optuna-in-2021-ed048d9838b5?source=your_stories_page-------------------------------------'>here</a>.