LightGBM does not comply with sklearn's check_is_fitted #3014

romanlutz · 2020-04-22T18:26:53Z

Environment info:

Operating System: Windows 10

CPU/GPU model: Processor Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz, 2112 Mhz, 4 Core(s), 8 Logical Processor(s)

C++/Python/R version: 3.7.4

LightGBM version or commit hash: 2.3.1

Error message

sklearn.exceptions.NotFittedError: This LGBMClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Reproducible examples

see below

Steps to reproduce

import lightgbm
lgbmc = lightgbm.LGBMClassifier()
lgbmc.fit([[1,2,3], [4,5,6]], [0, 1])
from sklearn.utils.validation import check_is_fitted
check_is_fitted(lgbmc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\rolutz\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 967, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This LGBMClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.


StrikerRUS · 2020-04-22T18:59:45Z

@romanlutz
Can you please clarify which package versions you are using?

Mine are LightGBM 2.3.1 and scikit-learn 0.21.3, and I get the following error running your code:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-e108069c5d34> in <module>
----> 1 check_is_fitted(lgbmc)
TypeError: check_is_fitted() missing 1 required positional argument: 'attributes'

This indicates that the function is being called incorrectly.

After adding the required argument, I get no errors:

check_is_fitted(lgbmc, ['classes_'])

UPD: You can use n_features_ instead to make the same check work for the base LGBMModel class.

StrikerRUS · 2020-04-22T19:12:22Z

OK, it seems the problem is that scikit-learn silently changed the API again in version 0.22: the attributes argument is now optional (attributes=None).

Anyway, the fix above works fine with the newer versions (2.3.2 and 0.22.1) as well.

romanlutz · 2020-04-22T19:12:32Z

Wow, thanks for your quick response!
I used scikit-learn==0.22.1, but even with the last released scikit-learn version (0.22.2.post1) I still get the same error
sklearn.exceptions.NotFittedError: This LGBMClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

I can confirm that I get the same TypeError you got when I downgrade to 0.21.3.

It looks like it became optional with this change about 4 months ago: https://github.com/scikit-learn/scikit-learn/pull/15947/files

StrikerRUS · 2020-04-22T19:17:36Z

I used scikit-learn==0.22.1, but even with the last released scikit-learn version (0.22.2.post1) I still get the same error

Weird!

I just installed scikit-learn from conda and lightgbm from nightly artifacts and have no problems...

import lightgbm
print(lightgbm.__version__)
lgbmc = lightgbm.LGBMClassifier()
lgbmc.fit([[1,2,3], [4,5,6]], [0, 1])
from sklearn.utils.validation import check_is_fitted
check_is_fitted(lgbmc, ['n_features_'])
from sklearn import __version__ as sk_ver
print(sk_ver)

2.3.2
0.22.1

romanlutz · 2020-04-22T19:21:33Z

I don't see the issue with

check_is_fitted(lgbmc, ['n_features_'])

either (regardless of the version I'm using), but if you remove the second argument it shows up. I guess I'm not familiar enough with LGBM to tell, but shouldn't it work without that extra arg?

From what I understand, check_is_fitted checks whether all attributes with a trailing underscore are set, which they should be after fit. @adrinjalali can perhaps correct me on this.
Is that not the case in LGBM?

adrinjalali · 2020-04-22T21:44:25Z

The part which does the check in sklearn is:

    if attributes is not None:
        if not isinstance(attributes, (list, tuple)):
            attributes = [attributes]
        attrs = all_or_any([hasattr(estimator, attr) for attr in attributes])
    else:
        attrs = [v for v in vars(estimator)
                 if v.endswith("_") and not v.startswith("__")]

    if not attrs:
        raise NotFittedError(msg % {'name': type(estimator).__name__})

The issue is that LGBMClassifier has opted to store the actual values in _attribute (_n_features, for instance) and defines every attribute_ as a @property, and properties are not listed in vars(obj). Properties can be very expensive to check and call, so I'm not sure it's a good idea for sklearn to include them in the check.
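A minimal pure-Python sketch of the mechanism described above. PropertyOnlyModel is a hypothetical stand-in for the LightGBM pattern (value stored in _n_features, exposed through an n_features_ property), not LightGBM's actual class; the list comprehension copies the filter from the sklearn excerpt quoted above.

```python
class PropertyOnlyModel:
    """Hypothetical stand-in: stores the value privately, exposes it via a property."""

    def fit(self, X):
        self._n_features = len(X[0])  # stored under a leading underscore only
        return self

    @property
    def n_features_(self):
        return self._n_features


model = PropertyOnlyModel().fit([[1, 2, 3], [4, 5, 6]])

# vars() only returns the instance __dict__, so the property is invisible to it.
print(vars(model))  # {'_n_features': 3}

# Same filter as the attributes=None branch of the sklearn excerpt.
trailing = [v for v in vars(model) if v.endswith("_") and not v.startswith("__")]
print(trailing)  # [] -> that branch would raise NotFittedError

# hasattr() uses normal attribute lookup, which does resolve properties.
print(hasattr(model, "n_features_"))  # True
```

This is why passing the property name explicitly (the hasattr branch) succeeds while the default vars() branch fails.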

Also, these properties in LGBMClassifier raise an LGBMNotFittedError instead of an AttributeError, and I remember that some versions of Python would raise, instead of returning False, on hasattr(obj, 'attr') if accessing that attribute raised something other than an AttributeError.
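The hasattr concern can be checked directly: in Python 3, hasattr only swallows AttributeError (and its subclasses), so any other exception raised inside a property propagates to the caller. The classes below are hypothetical; NotFitted mirrors how scikit-learn's NotFittedError subclasses both ValueError and AttributeError, which is what lets hasattr return False.

```python
class NotFitted(ValueError, AttributeError):
    """Hypothetical error that is also an AttributeError (like sklearn's NotFittedError)."""


class RaisesValueError:
    @property
    def n_features_(self):
        raise ValueError("not fitted yet")  # NOT an AttributeError


class RaisesNotFitted:
    @property
    def n_features_(self):
        raise NotFitted("not fitted yet")  # also an AttributeError


# hasattr() catches the AttributeError subclass and returns False.
print(hasattr(RaisesNotFitted(), "n_features_"))  # False

try:
    hasattr(RaisesValueError(), "n_features_")
except ValueError as exc:
    # Any other exception escapes hasattr() and reaches the caller.
    print("hasattr raised:", exc)  # hasattr raised: not fitted yet
```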

@ogrisel @rth @NicolasHug WDYT?

mirekphd · 2020-05-19T07:13:01Z

I used scikit-learn==0.22.1, but even with the last released scikit-learn version (0.22.2.post1) I still get the same error

Weird!

On the contrary: rather than upgrading the environment, you need to pin all packages used by LightGBM, including sklearn, to the old versions from 6 months ago, when the "current" (latest) LightGBM release was built. There are more and more API discrepancies with recent versions of sklearn...

@StrikerRUS: a release is long overdue...

StrikerRUS · 2020-05-19T19:34:10Z

@mirekphd

On the contrary - rather than upgrading the environment, you need to pin all packages used by LightGBM

Please refer to #2987 for the discussion about pinning dependencies.

a release is long overdue...

We are preparing v3 release: #3071.

StrikerRUS · 2020-08-20T00:24:35Z

Hello @adrinjalali !

Has the scikit-learn team made any decision about using properties as indicators of fitted estimators: #3014 (comment)?

Also, it looks like some scikit-learn estimators internally use properties for some attributes (coef_, intercept_, n_iter_, for instance). Does that mean they are incompatible with scikit-learn tools that use check_is_fitted?

https://github.com/scikit-learn/scikit-learn/blob/e770715c434739647ddbb645ff0fcd40c64ba1fd/sklearn/svm/_base.py#L489-L505

https://github.com/scikit-learn/scikit-learn/blob/3561802bdc9e3a32492c3ce9d8943e9a85519a7f/sklearn/naive_bayes.py#L652-L663

https://github.com/scikit-learn/scikit-learn/blob/0cfe98b9f81463143675793e5b4b11268b2cf857/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L770-L773

adrinjalali · 2020-08-21T13:02:45Z

The sklearn estimators you point to have other attributes with a trailing underscore, which makes them pass the API requirement. We only use properties in cases where the computation is trivial, based on other values stored in the object instance.

But we have also had discussions about moving away from those properties and using methods instead wherever there is an implicit computation behind the property.

If y'all really don't want to move away from the current pattern, a workaround for lightgbm is to introduce an attribute such as fitted_=True, which would then make check_is_fitted pass.
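A sketch of the suggested fitted_ workaround, assuming only the vars()-based branch of check_is_fitted quoted earlier in this thread. FlaggedModel and has_trailing_underscore_attrs are hypothetical names; the helper mirrors that single branch, not the full sklearn function, and this is not LightGBM's actual fix.

```python
def has_trailing_underscore_attrs(estimator):
    """Hypothetical helper mirroring the attributes=None branch of check_is_fitted."""
    return bool([v for v in vars(estimator)
                 if v.endswith("_") and not v.startswith("__")])


class FlaggedModel:
    """Hypothetical estimator: keeps the property pattern, plus a plain fitted_ flag."""

    def fit(self, X):
        self._n_features = len(X[0])
        self.fitted_ = True  # real instance attribute, so vars() can see it
        return self

    @property
    def n_features_(self):
        return self._n_features


model = FlaggedModel()
print(has_trailing_underscore_attrs(model))  # False before fit
model.fit([[1, 2, 3], [4, 5, 6]])
print(has_trailing_underscore_attrs(model))  # True after fit
```

The flag leaves the existing properties untouched, so it can serve as a stopgap until a deeper refactor.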

I haven't seen a discussion happening around accepting properties in check_is_fitted, but I may be missing something.

StrikerRUS · 2020-08-21T22:24:41Z

@adrinjalali Thanks a lot for your reply and proposed workaround! I think we can go with it for now, while deeper refactoring will be done under #2966.

I'm just confused by the following inconsistency. When you call check_is_fitted without the attributes argument (which was not possible before version 0.22, if I'm not mistaken), you get an exception. But when you pass a property name in the attributes argument, everything works fine.

I believe the behavior of check_is_fitted should be consistent: either it completely ignores properties with a trailing underscore, or it respects them.

from sklearn.datasets import load_digits
from sklearn.utils.validation import check_is_fitted

import lightgbm as lgb

X, y = load_digits(n_class=2, return_X_y=True)
clf = lgb.LGBMClassifier(n_estimators=5).fit(X, y)

check_is_fitted(clf)  # raises error

check_is_fitted(clf, "classes_")  # OK

GlorianY · 2020-09-17T14:55:26Z

Hi,

Is this problem already fixed?

I use LightGBM version 3.0.0, and scikit-learn version 0.23.2.
When I run the code from @romanlutz's "Steps to reproduce" section, I still get the NotFittedError (sklearn.exceptions.NotFittedError: This LGBMClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator).

Or, to avoid this error, should we keep using the workaround check_is_fitted(lgbmc, ['n_features_'])?

StrikerRUS · 2020-09-18T00:39:26Z

Hello @GlorianY !

The problem is fixed in the master branch. Unfortunately, the fix was not included in the 3.0.0 release; it will be in the next one. For now, you can download a wheel file from the nightly build and install it: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html.

github-actions · 2023-08-23T20:48:58Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

romanlutz mentioned this issue Apr 22, 2020

Add new constraints to ThresholdOptimizer and add InterpolatedThresholder fairlearn/fairlearn#381

Merged

StrikerRUS mentioned this issue Aug 19, 2020

lightgbm.sklearn estimators pass sklearn.utils.check_is_fitted() when fitted #3325

Closed

StrikerRUS mentioned this issue Aug 21, 2020

[python][sklearn] be compatible with check_is_fitted sklearn function #3329

Merged

guolinke closed this as completed in #3329 Sep 2, 2020

StrikerRUS mentioned this issue Sep 21, 2020

fix two warnings from sklearn RGF-team/rgf#327

Merged

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
