[python][sklearn] add __sklearn_is_fitted__() method to be better compatible with scikit-learn API (#4636)

Conversation
```diff
@@ -529,6 +529,9 @@ def _more_tags(self):
         }
     }

+    def __sklearn_is_fitted__(self) -> bool:
+        return getattr(self, "fitted_", False)
```
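To make the protocol concrete, here is a minimal sketch of what this diff enables: when an estimator defines `__sklearn_is_fitted__()`, scikit-learn's `check_is_fitted()` delegates to it instead of scanning for trailing-underscore attributes. The names `DummyEstimator` and `check_is_fitted_simplified` below are illustrative stand-ins, not LightGBM's or scikit-learn's actual code.

```python
class DummyEstimator:
    def fit(self):
        # mirrors the pattern discussed in this PR: fit() sets a
        # fitted_ flag as its last step
        self.fitted_ = True
        return self

    def __sklearn_is_fitted__(self) -> bool:
        # getattr with a default avoids AttributeError on instances
        # where fit() was never called
        return getattr(self, "fitted_", False)


def check_is_fitted_simplified(estimator):
    # simplified stand-in for sklearn.utils.validation.check_is_fitted:
    # prefer the estimator's own answer when the hook is defined
    if hasattr(estimator, "__sklearn_is_fitted__"):
        if not estimator.__sklearn_is_fitted__():
            raise RuntimeError("This estimator is not fitted yet.")
```

With this hook in place, an unfitted instance is reported as such without any attribute scanning, and calling `fit()` flips the answer.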
We set the `fitted_` attribute at the end of the `fit()` method:

LightGBM/python-package/lightgbm/sklearn.py, line 770 in a77260f:

```python
self.fitted_ = True
```
I noticed that in dmlc/xgboost#7230 they also replaced some other internal checks of estimator attributes with this method call. I think that's a good idea, so only this method needs to know about the `fitted_` property. What do you think?

That would mean changing the following to `if not self.__sklearn_is_fitted__():`

LightGBM/python-package/lightgbm/dask.py, lines 1004 to 1005 in a77260f:

```python
if not getattr(self, "fitted_", False):
    raise LGBMNotFittedError('Cannot access property client_ before calling fit().')
```
I agree. I remember you were asking why we use `if self._n_features is None` everywhere to check that the estimator is fitted, when we have `self.fitted_`.

LightGBM/python-package/lightgbm/sklearn.py, line 857 in a77260f:

```python
if self._n_features is None:
```

#3883 (comment)

My answer was that it's because `self.fitted_` was introduced much later. Now we can replace everything with `self.__sklearn_is_fitted__()`, and I think that will be clearer.
But I'd prefer to do it in a follow-up PR, if you do not mind.
OK, yep, I'm fine with it being a follow-up.

Are we facing the following issue at our CI services?

oh wow! maybe!
Thanks for catching this in the scikit-learn release notes! Totally agree.

Just left one comment for your consideration.

@StrikerRUS I've opened #4646 to consolidate the discussion about the certificate issues for CUDA CI jobs (since they aren't specific to this PR).
This change, which defaults to:

```python
def predict(...):
    if not self.__sklearn_is_fitted__():
        raise LGBMNotFittedError("Estimator not fitted, call fit before exploiting the model.")
```

renders all models that were trained and serialized prior to this change unusable. Is it a reasonable assumption that a deserialized model was already fitted?
@jk7ss Thank you very much for your report! It is indeed important to be backward compatible with models serialized in previous versions of LightGBM. But …
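The backward-compatibility concern can be demonstrated in a few lines: a model pickled by an older version never had a `fitted_` attribute in its `__dict__`, so after unpickling, `getattr(self, "fitted_", False)` returns `False` even though the model was genuinely trained. `OldModel` below is a hypothetical stand-in for such an estimator, not real LightGBM code.

```python
import pickle


class OldModel:
    def __init__(self):
        # older versions tracked fitted-ness only via _n_features;
        # note that fitted_ is never set on this object
        self._n_features = 10

    def __sklearn_is_fitted__(self) -> bool:
        return getattr(self, "fitted_", False)


# simulate serializing with an old version and loading with a new one:
# pickle restores __dict__ as-is, so fitted_ is still absent
restored = pickle.loads(pickle.dumps(OldModel()))
assert restored.__sklearn_is_fitted__() is False
```

A check based on an attribute that old pickles already carry (such as `_n_features`), or a fallback in `__sklearn_is_fitted__()` itself, would avoid rejecting these models.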
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Refer to the scikit-learn 1.0 release notes and corresponding PR:

After searching over GitHub repos, it looks like only XGBoost has already adopted this new API:
dmlc/xgboost#7230