[python][sklearn] add `__sklearn_is_fitted__()` method to be better compatible with scikit-learn API #4636

StrikerRUS · 2021-09-30T15:02:49Z

Refer to scikit-learn 1.0 release notes and corresponding PR:

Enhancement utils.validation.check_is_fitted now uses __sklearn_is_fitted__ if available, instead of checking for attributes ending with an underscore.
https://scikit-learn.org/stable/whats_new/v1.0.html#sklearn-utils
scikit-learn/scikit-learn#20657
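
To make the release note concrete, here is a simplified stand-in for the order of checks it describes. `is_fitted`, `WithHook`, and `WithoutHook` are hypothetical names used only for illustration; this is not scikit-learn's actual `check_is_fitted` implementation.

```python
class WithHook:
    """Estimator that reports its own fitted state."""

    def __init__(self):
        self._model = None

    def fit(self):
        self._model = "trained"
        return self

    def __sklearn_is_fitted__(self) -> bool:
        return self._model is not None


class WithoutHook:
    """Estimator that relies on the trailing-underscore convention."""

    def fit(self):
        self.coef_ = [0.0]
        return self


def is_fitted(estimator) -> bool:
    # Prefer the estimator's own hook when it exists.
    if hasattr(estimator, "__sklearn_is_fitted__"):
        return estimator.__sklearn_is_fitted__()
    # Otherwise look for instance attributes ending with an underscore.
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(estimator)
    )
```

With the hook in place, an estimator no longer depends on which trailing-underscore attributes happen to exist.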

After searching GitHub repos, it looks like only XGBoost has adopted the new API so far:
dmlc/xgboost#7230

StrikerRUS · 2021-09-30T15:03:59Z

python-package/lightgbm/sklearn.py

@@ -529,6 +529,9 @@ def _more_tags(self):
            }
        }

+    def __sklearn_is_fitted__(self) -> bool:
+        return getattr(self, "fitted_", False)


We set the fitted_ attribute at the end of the fit() method:

LightGBM/python-package/lightgbm/sklearn.py

Line 770 in a77260f

self.fitted_ = True
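
A minimal sketch of why the flag is set last, assuming a hypothetical `TinyEstimator` (not LightGBM code): if fit() raises partway through, fitted_ is never set and the method keeps returning False.

```python
class TinyEstimator:
    """Hypothetical estimator used only to illustrate the pattern."""

    def fit(self, X, y):
        if len(X) == 0:
            raise ValueError("X is empty")
        # ... training would happen here ...
        self.n_features_ = len(X[0])
        # Mark as fitted only once everything above has succeeded.
        self.fitted_ = True
        return self

    def __sklearn_is_fitted__(self) -> bool:
        # getattr with a default avoids AttributeError before fit() is called.
        return getattr(self, "fitted_", False)


ok = TinyEstimator().fit([[1.0, 2.0]], [0])

failed = TinyEstimator()
try:
    failed.fit([], [])
except ValueError:
    pass  # fit() aborted before fitted_ was set
```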

I noticed that in dmlc/xgboost#7230 they also replaced some other internal checks of estimator attributes with this method call.

I think that's a good idea, so only this method needs to know about the fitted_ property. What do you think?

That would mean changing the following to if not self.__sklearn_is_fitted__():

LightGBM/python-package/lightgbm/dask.py

Lines 1004 to 1005 in a77260f

    if not getattr(self, "fitted_", False):
        raise LGBMNotFittedError('Cannot access property client_ before calling fit().')
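
For illustration, the proposed change could look roughly like the sketch below. `DaskLikeEstimator` is hypothetical, and `LGBMNotFittedError` is redefined here as a plain stand-in so the snippet runs without LightGBM installed.

```python
class LGBMNotFittedError(ValueError):
    """Stand-in for LightGBM's exception of the same name."""


class DaskLikeEstimator:
    """Hypothetical estimator; only the guard pattern matters here."""

    def fit(self, client="dummy-client"):
        self._client = client
        self.fitted_ = True
        return self

    def __sklearn_is_fitted__(self) -> bool:
        return getattr(self, "fitted_", False)

    @property
    def client_(self):
        # The property no longer needs to know about fitted_ directly.
        if not self.__sklearn_is_fitted__():
            raise LGBMNotFittedError('Cannot access property client_ before calling fit().')
        return self._client
```

Only `__sklearn_is_fitted__()` touches fitted_, so a later change to how the fitted state is tracked would need to happen in one place.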

I agree. I remember you asking why we use if self._n_features is None everywhere to check that the estimator is fitted when we have self.fitted_.

LightGBM/python-package/lightgbm/sklearn.py

Line 857 in a77260f

if self._n_features is None:

#3883 (comment)
My answer was that it's because self.fitted_ was introduced much later. Now we can replace everything with self.__sklearn_is_fitted__(). And I think it will be clearer.

But I'd prefer to do it in a follow-up PR, if you do not mind.

ok yep, I'm fine with it being a followup

StrikerRUS · 2021-09-30T15:25:05Z

Are we facing the following issue at our CI services?
https://news.yahoo.com/millions-old-phones-laptops-smart-210600565.html

jameslamb · 2021-10-02T03:46:22Z

Are we facing the following issue at our CI services? https://news.yahoo.com/millions-old-phones-laptops-smart-210600565.html

oh wow! maybe!

Err:10 https://apt.kitware.com/ubuntu bionic Release
Certificate verification failed: The certificate is NOT trusted. The certificate chain uses expired certificate. Could not handshake: Error in the certificate verification. [IP: 66.194.253.25 443]

jameslamb

thanks for catching this in the scikit-learn release notes! Totally agree.

Just left one comment for your consideration.

jameslamb · 2021-10-03T18:17:35Z

@StrikerRUS I've opened #4646 to consolidate the discussion about the certificate issues for CUDA CI jobs (since they aren't specific to this PR).

jk7ss · 2021-11-05T11:24:33Z

This change

    def __sklearn_is_fitted__(self) -> bool:
        return getattr(self, "fitted_", False)

that defaults to False, together with this test:

    def predict(...):
        if not self.__sklearn_is_fitted__():
            raise LGBMNotFittedError("Estimator not fitted, call fit before exploiting the model.")

causes all models that were trained and serialized prior to v3.3.0 to fail prediction after deserialization, because they are missing the fitted_ attribute.
Shouldn't it either
a) raise AttributeError if the attribute is missing, meaning that the model object is not code compatible, or
b) silently assume the model is fitted, to be backward compatible?

Isn't it a reasonable assumption that a deserialized model was already fitted?
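
The failure mode and one possible shape of option (b) can be sketched as follows. Everything here is hypothetical (`Model`, `_booster`, `is_fitted_compat`); it is not LightGBM's behavior, only an illustration of falling back to another sign of training when fitted_ is absent.

```python
class Model:
    """Hypothetical estimator standing in for a serialized model."""

    def __sklearn_is_fitted__(self) -> bool:
        return getattr(self, "fitted_", False)


# Attribute dict as an older release might have serialized it: no fitted_ key.
old_state = {"_booster": "trained-booster"}

# Default unpickling works roughly like this: create the instance without
# calling __init__, then restore its __dict__ from the saved state.
restored = Model.__new__(Model)
restored.__dict__.update(old_state)

# The trained model is reported as not fitted because fitted_ is missing.
strict_result = restored.__sklearn_is_fitted__()


def is_fitted_compat(model) -> bool:
    # Trust fitted_ when present; otherwise infer from the trained booster.
    if hasattr(model, "fitted_"):
        return bool(model.fitted_)
    return getattr(model, "_booster", None) is not None


compat_result = is_fitted_compat(restored)
```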

StrikerRUS · 2021-11-05T14:17:40Z

@jk7ss Thank you very much for your report! It is indeed important to stay backward compatible with serialized models fitted in previous versions of LightGBM. But the fitted_ attribute was added in #3329, and that PR was included in the v3.1.0 release (16 Nov 2020). So I guess everything should be OK for models fitted and serialized with versions starting from v3.1.0. Do you think we should do something to be backward compatible even for models trained and serialized prior to v3.1.0?

github-actions · 2023-08-23T14:41:45Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

add __sklearn_is_fitted__() method to be better compatible with scikit-learn API (618d026)

StrikerRUS added the maintenance label Sep 30, 2021

StrikerRUS marked this pull request as ready for review September 30, 2021 15:22

StrikerRUS requested review from chivee, henry0312, jameslamb and shiyu1994 as code owners September 30, 2021 15:22

jameslamb approved these changes Oct 2, 2021

jameslamb mentioned this pull request Oct 3, 2021

[ci] CUDA CI jobs failing: "Certificate verification failed" #4646

Closed

Merge branch 'master' into sk_is_fitted

59314ba

StrikerRUS merged commit 4b140bc into master Oct 5, 2021

StrikerRUS deleted the sk_is_fitted branch October 5, 2021 21:16

StrikerRUS mentioned this pull request Oct 7, 2021

[python][sklearn] use __sklearn_is_fitted__() in all estimator fitness checks #4654

Merged

This was referenced Apr 13, 2022

plot_partial_dependence() API does not work with LightGBM regression model scikit-learn/scikit-learn#16878

Closed

[MRG] check_is_fitted() should handle getter/setter attributes scikit-learn/scikit-learn#16879

Closed

jameslamb mentioned this pull request Feb 15, 2023

[python-package] fix mypy errors about missing annotations and incompatible types #5672

Merged

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
