Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LightGBM does not comply with sklearn's check_is_fitted #3014

Closed
romanlutz opened this issue Apr 22, 2020 · 14 comments · Fixed by #3329
Closed

LightGBM does not comply with sklearn's check_is_fitted #3014

romanlutz opened this issue Apr 22, 2020 · 14 comments · Fixed by #3329

Comments

@romanlutz
Copy link

Environment info:

Operating System: Windows 10

CPU/GPU model: Processor Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz, 2112 Mhz, 4 Core(s), 8 Logical Processor(s)

C++/Python/R version: 3.7.4

LightGBM version or commit hash: 2.3.1

Error message

sklearn.exceptions.NotFittedError: This LGBMClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Reproducible examples

see below

Steps to reproduce

import lightgbm
lgbmc = lightgbm.LGBMClassifier()
lgbmc.fit([[1,2,3], [4,5,6]], [0, 1])
from sklearn.utils.validation import check_is_fitted
check_is_fitted(lgbmc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\rolutz\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 967, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This LGBMClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Apr 22, 2020

@romanlutz
Can you please clarify the versions of packages?

My ones are 2.3.1 and 0.21.3 and I get the following error runnig your code:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-e108069c5d34> in <module>
----> 1 check_is_fitted(lgbmc)
TypeError: check_is_fitted() missing 1 required positional argument: 'attributes'

Which indicates the wrong way of usage.

Adding required argument I have no errors:

check_is_fitted(lgbmc, ['classes_'])

UPD: You can use n_features_ to generalize functionality to LGBMModel class.

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Apr 22, 2020

OK, seems the problem is in that in 0.22 version scikit-learn again changed API silently. Now attributes=None argument is optional.

Anyway, a fix above works fine with new versions (2.3.2 and 0.22.1) as well.

@romanlutz
Copy link
Author

Wow, thanks for your quick response!
I used scikit-learn==0.22.1, but even with the last released scikit-learn version (0.22.2.post1) I still get the same error
sklearn.exceptions.NotFittedError: This LGBMClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

I can confirm that I'm getting the same TypeError that you got when I downgrade to 0.21.3

Looks like it became optional with this change about 4 months ago: https://github.com/scikit-learn/scikit-learn/pull/15947/files

@StrikerRUS
Copy link
Collaborator

I used scikit-learn==0.22.1, but even with the last released scikit-learn version (0.22.2.post1) I still get the same error

Weird!

I just installed scikit-learn from conda and lightgbm from nightly artifacts and have no problems...

import lightgbm
print(lightgbm.__version__)
lgbmc = lightgbm.LGBMClassifier()
lgbmc.fit([[1,2,3], [4,5,6]], [0, 1])
from sklearn.utils.validation import check_is_fitted
check_is_fitted(lgbmc, ['n_features_'])
from sklearn import __version__ as sk_ver
print(sk_ver)
2.3.2
0.22.1

@romanlutz
Copy link
Author

I don't see the issue with

check_is_fitted(lgbmc, ['n_features_'])

either (regardless of the version I'm using), but if you remove the second argument it shows up. I guess I'm not familiar enough with LGBM to tell, but shouldn't it work without that extra arg?

From what I understand about check_is_fitted is checks whether all attributes with trailing underscore are set, which they should be after fit. @adrinjalali can perhaps correct me on this.
Is that not the case in LGBM?

@adrinjalali
Copy link

The part which does the check in sklearn is:

    if attributes is not None:
        if not isinstance(attributes, (list, tuple)):
            attributes = [attributes]
        attrs = all_or_any([hasattr(estimator, attr) for attr in attributes])
    else:
        attrs = [v for v in vars(estimator)
                 if v.endswith("_") and not v.startswith("__")]

    if not attrs:
        raise NotFittedError(msg % {'name': type(estimator).__name__})

And the issue is that LGBMClassifier has opted for storing the actual values in _attribute (_n_feautres for instance), and define all attribute_ as a @property, and properties are not listed in vars(obj). Properties can be very expensive to check and call, so I'm not sure if it's a good idea for sklearn to include them in the check.

Also, these @propertys in LGMBClassifier raise a LGBMNotFittedError instead of an AttributeError, and I remember some versions of python would raise instead of returning False on a hasattr(obj, 'attr') if that attribute would not raise an AttributeError.

@ogrisel @rth @NicolasHug WDYT?

@mirekphd
Copy link

I used scikit-learn==0.22.1, but even with the last released scikit-learn version (0.22.2.post1) I still get the same error

Weird!

On the contrary - rather than upgrading the environment, you need to pin all packages used by LightGBM, including sklearn to old versions from 6 months ago when the "current" (latest) LightGBM release was built. There are more and more API discrepancies with the recent versions of sklearn...

@StrikerRUS : a release is long overdue...

@StrikerRUS
Copy link
Collaborator

@mirekphd

On the contrary - rather than upgrading the environment, you need to pin all packages used by LightGBM

Please refer to #2987 for the discussion about pinning dependencies.

a release is long overdue...

We are preparing v3 release: #3071.

@StrikerRUS
Copy link
Collaborator

Hello @adrinjalali !

Have scikit-learn team made any decision for using properties as indicators of fitted estimators: #3014 (comment)?

Also, it looks like internally some scikit-learn estimators uses properties for some attributes (coef_, intercept_, n_iter_, for instance). Does it mean that they are incompatible with scikit-learn tools which use check_is_fitted?

image

https://github.com/scikit-learn/scikit-learn/blob/e770715c434739647ddbb645ff0fcd40c64ba1fd/sklearn/svm/_base.py#L489-L505

https://github.com/scikit-learn/scikit-learn/blob/3561802bdc9e3a32492c3ce9d8943e9a85519a7f/sklearn/naive_bayes.py#L652-L663

https://github.com/scikit-learn/scikit-learn/blob/0cfe98b9f81463143675793e5b4b11268b2cf857/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L770-L773

@adrinjalali
Copy link

The sklearn estimators you point to, have other attributes with a trailing underscore which makes them pass the API requirement. We only have the properties in cases where the computation is trivial based on the other properties stored in the object instance.

But we also have had discussions regarding moving away from those properties and using methods instead where there is an implicit computation behind the property.

A workaround for lightgbm if y'all really don't want to move away from the current pattern, is to introduce an attribute such as fitted_=True, which would then make the check_is_fitted to pass.

I haven't seen a discussion happening around accepting properties in check_is_fitted, but I may be missing something.

@StrikerRUS
Copy link
Collaborator

@adrinjalali Thanks a lot for your reply and proposed workaround! I think we can go with it for now, while deeper refactoring will be done under #2966.

I'm just confused by the following inconsistency. When you call check_is_fitted without attributes argument (was not possible before 0.22 version, if I'm not mistaken), you get exception. But when you pass property name in attributes argument, everything is working fine.

I believe behavior of check_is_fitted should be consistent: either it completely ignores properties with trailing underscore, or respects them.

from sklearn.datasets import load_digits
from sklearn.utils.validation import check_is_fitted

import lightgbm as lgb

X, y = load_digits(n_class=2, return_X_y=True)
clf = lgb.LGBMClassifier(n_estimators=5).fit(X, y)

check_is_fitted(clf)  # raises error

check_is_fitted(clf, "classes_")  # OK

@GlorianY
Copy link

Hi,

Is this problem already fixed?

I use LightGBM version 3.0.0, and scikit-learn version 0.23.2.
When I try to run the code in the section "Steps to reproduce", from @romanlutz, I still get the NotFittedError error (sklearn.exceptions.NotFittedError: This LGBMClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator).

Or, in order to fix this error, should we keep using the workaround check_is_fitted(lgbmc, ['n_features_']) ?

@StrikerRUS
Copy link
Collaborator

Hello @GlorianY !

The problem is fixed in the master branch. Unfortunately, the fix was not included in the 3.0.0 release. It will be in the next one. For now you can download wheel file from nightly build and install it: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html.

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants