
LGBMClassifier gives non-deterministic outputs with very low AUC score compared to xgboost and catboost #6411

Closed
sktin opened this issue Apr 9, 2024 · 2 comments

sktin commented Apr 9, 2024

Description

This issue arises from a Kaggle competition. To replicate it, you would need to get the dataset (train.csv):
https://www.kaggle.com/competitions/playground-series-s4e4/data

The dataset has a target that consists of 28 distinct integers. Treating the problem as a classification problem and running LGBMClassifier on a train/test split gives a very low AUC score (~0.5), whereas xgboost and catboost give an AUC score >0.8. Furthermore, running twice on the same train/test split, LGBMClassifier gives a different result each time.

Reproducible example

import numpy as np
import pandas as pd
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.metrics import roc_auc_score
from sklearn.base import clone
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

train = pd.read_csv('train.csv', index_col='id')
X = pd.get_dummies(train.drop(['Rings'], axis=1), dtype=int).values
y = LabelEncoder().fit_transform(train.Rings)
kfold = RepeatedStratifiedKFold(n_splits=10, n_repeats=1, random_state=42)
for train_index, test_index in kfold.split(X, y):
    X_t, y_t = X[train_index], y[train_index]
    X_v, y_v = X[test_index], y[test_index]
    break

def test_classifier(clf):
    scores = []
    for _ in range(2):
        model = clone(clf).fit(X_t, y_t)
        scores.append(roc_auc_score(y_v, model.predict_proba(X_v), multi_class='ovr'))
    print(f'Testing {type(clf).__name__}... AUCs from 2 runs: {scores[0]}, {scores[1]}.')
    if scores[0] != scores[1]:
        print('Determinacy failed!')

test_classifier(XGBClassifier(n_jobs=4,random_state=0))
test_classifier(LGBMClassifier(n_jobs=4,random_state=0,verbose=-1))
test_classifier(CatBoostClassifier(n_estimators=100,eta=0.3,thread_count=4,random_state=0,verbose=0))

Output:

Testing XGBClassifier... AUCs from 2 runs: 0.842522129740256, 0.842522129740256.
Testing LGBMClassifier... AUCs from 2 runs: 0.512616125703207, 0.563532578150841.
Determinacy failed!
Testing CatBoostClassifier... AUCs from 2 runs: 0.8774742932728967, 0.8774742932728967.

Environment info

LightGBM version or commit hash: 4.3.0

Command(s) you used to install LightGBM

pip install lightgbm==4.3.0

Additional Comments

I replicated this on both the Kaggle and Google Colab platforms. The specific versions of the libraries are:
numpy: 1.25.2
pandas: 2.0.3
sklearn: 1.4.1.post1
lightgbm: 4.3.0
xgboost: 2.0.3
catboost: 1.2.3

jmoralez (Collaborator) commented Apr 11, 2024

Hey @sktin, thanks for using LightGBM. I took a quick look and it seems that the model starts overfitting very quickly with the default settings. XGBoost sets regularization by default

lambda [default=1, alias: reg_lambda] (ref)

and LightGBM doesn't, so setting reg_lambda=1 improves the score (~0.83 for me).

To make the results reproducible, there's a deterministic parameter (docs)

deterministic, default = false, type = bool
used only with cpu device type
setting this to true should ensure the stable results when using the same data and the same parameters (and different num_threads)

In summary, you should be able to get a similar, consistent score as with the other models with the following:

test_classifier(LGBMClassifier(n_jobs=4,random_state=0,verbose=-1,reg_lambda=1,deterministic=True))


sktin commented Apr 11, 2024

@jmoralez Thank you for the quick response.

I confirm that with reg_lambda=1, even without using the deterministic parameter, I am able to get "normal" and consistent AUC scores.

Out of curiosity, I set reg_lambda=0 in XGBClassifier and it returns a consistent score > 0.8, albeit lower than with the default setting of reg_lambda=1. As far as the original issue for LGBMClassifier is concerned, I think it can be closed.
