
LGBMClassifier gives non-deterministic outputs with very low AUC score compared to xgboost and catboost #6411

Closed
sktin opened this issue Apr 9, 2024 · 2 comments

sktin commented Apr 9, 2024

Description

This issue arises from a Kaggle competition. To replicate it, you would need to get the dataset (train.csv):
https://www.kaggle.com/competitions/playground-series-s4e4/data

The dataset has a target that consists of 28 distinct integers. Treating the problem as a classification problem and running LGBMClassifier on a train/test split gives a very low AUC score (~0.5), whereas xgboost and catboost give an AUC score >0.8. Furthermore, running twice on the same train/test split, LGBMClassifier gives a different result each time.

Reproducible example

import numpy as np
import pandas as pd
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.metrics import roc_auc_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

train = pd.read_csv('train.csv', index_col='id')
X = pd.get_dummies(train.drop(['Rings'], axis=1), dtype=int).values
y = LabelEncoder().fit_transform(train.Rings)
kfold = RepeatedStratifiedKFold(n_splits=10, n_repeats=1, random_state=42)
for train_index, test_index in kfold.split(X, y):
    X_t, y_t = X[train_index], y[train_index]
    X_v, y_v = X[test_index], y[test_index]
    break

def test_classifier(clf):
    scores = []
    for _ in range(2):
        model = clone(clf).fit(X_t, y_t)
        scores.append(roc_auc_score(y_v, model.predict_proba(X_v), multi_class='ovr'))
    print(f'Testing {type(clf).__name__}... AUCs from 2 runs: {scores[0]}, {scores[1]}.')
    if scores[0] != scores[1]:
        print('Determinacy failed!')

test_classifier(XGBClassifier(n_jobs=4,random_state=0))
test_classifier(LGBMClassifier(n_jobs=4,random_state=0,verbose=-1))
test_classifier(CatBoostClassifier(n_estimators=100,eta=0.3,thread_count=4,random_state=0,verbose=0))

Output:

Testing XGBClassifier... AUCs from 2 runs: 0.842522129740256, 0.842522129740256.
Testing LGBMClassifier... AUCs from 2 runs: 0.512616125703207, 0.563532578150841.
Determinacy failed!
Testing CatBoostClassifier... AUCs from 2 runs: 0.8774742932728967, 0.8774742932728967.

Environment info

LightGBM version or commit hash: 4.3.0

Command(s) you used to install LightGBM

pip install lightgbm==4.3.0

Additional Comments

I replicated this on both the Kaggle and Google Colab platforms. The specific versions of the libraries are:
numpy: 1.25.2
pandas: 2.0.3
sklearn: 1.4.1.post1
lightgbm: 4.3.0
xgboost: 2.0.3
catboost: 1.2.3

jmoralez (Collaborator) commented Apr 11, 2024

Hey @sktin, thanks for using LightGBM. I took a quick look and it seems that the model starts overfitting very quickly with the default settings. XGBoost sets regularization by default

lambda [default=1, alias: reg_lambda] (ref)

and LightGBM doesn't, so setting reg_lambda=1 improves the score (~0.83 for me).

To make the results reproducible, there's a deterministic parameter (docs)

deterministic, default = false, type = bool
used only with cpu device type
setting this to true should ensure the stable results when using the same data and the same parameters (and different num_threads)

In summary, you should be able to get a similar, consistent score as with the other models with the following:

test_classifier(LGBMClassifier(n_jobs=4,random_state=0,verbose=-1,reg_lambda=1,deterministic=True))


sktin commented Apr 11, 2024

@jmoralez Thank you for the quick response.

I confirm that with reg_lambda=1, even without using the deterministic parameter, I am able to get "normal" and consistent AUC scores.

Out of curiosity, I set reg_lambda=0 in XGBClassifier and it returns a consistent score > 0.8, albeit lower than with the default setting of reg_lambda=1. As far as the original issue for LGBMClassifier is concerned, I think it can be closed.
