ENH Introduces the __sklearn_clone__ protocol #24568
Conversation
We should also add it to the "how to write your estimator" guide.
Needs docs, otherwise looking great!
sklearn/tests/test_base.py (outdated)
```python
def fit(self, *args, **kwargs):
    return self

@available_if(lambda self: hasattr(self.estimator, "transform"))
```
Not sure we need to use `available_if` here!
I agree. The `available_if` is not needed here. If we introduce a generic "FrozenEstimator", then the `available_if` would be useful. That being said, the test does not require `available_if`, so I'll remove it.
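For context, a minimal sketch of where `available_if` would earn its keep: a generic frozen wrapper that should only expose `transform` when the wrapped estimator has one. The `FrozenWrapper` name is hypothetical, not the PR's test code.

```python
from sklearn.base import BaseEstimator
from sklearn.utils.metaestimators import available_if


class FrozenWrapper(BaseEstimator):
    """Hypothetical generic wrapper; fitting is a no-op."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, *args, **kwargs):
        # Freezing: the inner estimator stays exactly as it is.
        return self

    # transform only exists if the wrapped estimator provides one
    @available_if(lambda self: hasattr(self.estimator, "transform"))
    def transform(self, *args, **kwargs):
        return self.estimator.transform(*args, **kwargs)
```

With this, `hasattr(FrozenWrapper(LogisticRegression()), "transform")` would be `False`, while wrapping a transformer keeps `transform` available.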
```python
    return _clone_parametrized(estimator, safe=safe)


def _clone_parametrized(estimator, *, safe=True):
```
Will this benefit from being public? Or will those who need it be able to use `super`?
I'm thinking third party estimators would call `super` in `__sklearn_clone__`, or call `clone` directly.

API-wise, I think it would be a little strange to have two `clone` methods.
It's not safe for a third party estimator to call `clone` from its `__sklearn_clone__`. Yes, it could call `BaseEstimator.__sklearn_clone__`, but then it needs to import `BaseEstimator` (or copy-paste).

Having a public `clone_default_implementation` or `clone_parametrized` wouldn't hurt...? But I don't know how important it is to support non-use of `BaseEstimator`.
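To make the two options concrete, a hedged sketch assuming the `__sklearn_clone__` hook this PR introduces; both class names are made up:

```python
from sklearn.base import BaseEstimator


class ThirdPartyEstimator(BaseEstimator):
    """Made-up estimator that inherits from BaseEstimator."""

    def __sklearn_clone__(self):
        # Option 1: delegate to the default implementation via super().
        return super().__sklearn_clone__()


class StandaloneEstimator:
    """Made-up estimator that does not inherit from BaseEstimator."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=True):
        return {"alpha": self.alpha}

    def set_params(self, **params):
        self.alpha = params.get("alpha", self.alpha)
        return self

    def __sklearn_clone__(self):
        # Option 2: import BaseEstimator (or copy-paste its logic).
        # Calling clone() here instead would recurse back into this method.
        return BaseEstimator.__sklearn_clone__(self)
```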
Ah I see. Yeah, it comes down to how much we want to support the non-use of `BaseEstimator`.

What do you think about adding a `use_sklearn_clone_protocol` parameter to `clone`? If `use_sklearn_clone_protocol=False`, then the default implementation is used.
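A purely hypothetical sketch of that suggestion — this parameter does not exist, and `_clone_parametrized` refers to the default implementation shown in the diff above:

```python
def clone(estimator, *, safe=True, use_sklearn_clone_protocol=True):
    # Hypothetical opt-out flag (not an actual parameter of sklearn.clone):
    # when False, skip the protocol and always use the default behaviour.
    if use_sklearn_clone_protocol and hasattr(estimator, "__sklearn_clone__"):
        return estimator.__sklearn_clone__()
    return _clone_parametrized(estimator, safe=safe)
```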
I thought we no longer care as much about not having to inherit from `BaseEstimator`?
Maybe let's keep it like this and see if someone complains that they can't use it?
I agree with keeping this PR as is and developing a solution if a third-party estimator developer requires it.
```python
def fit_transform(self, *args, **kwargs):
    return self.fitted_transformer.transform(*args, **kwargs)
```
Might be worth noting that most implementations would call `super().__sklearn_clone__()`.
For me, most estimators would not touch `__sklearn_clone__` at all. In what use case would most implementations call `super().__sklearn_clone__()`?
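One conceivable answer, sketched under the assumption that an estimator wants the default parameter-based clone plus a small tweak; the class and its cache attribute are hypothetical:

```python
from sklearn.base import BaseEstimator


class CachedEstimator(BaseEstimator):
    """Hypothetical: carries extra state that should survive cloning."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def __sklearn_clone__(self):
        new = super().__sklearn_clone__()  # default clone of the params
        # Carry over a non-parameter attribute that clone() would drop.
        new._shared_cache = getattr(self, "_shared_cache", None)
        return new
```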
I think the documentation would be improved with a paragraph that explains what that call does. It is probably pretty simple once you know it, and that is why it never comes up in the SLEP or in the docs here. But as someone who hasn't got the first idea, I feel lost.
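For what it's worth, the default behaviour amounts to roughly the following — a simplified, hypothetical rendering of `_clone_parametrized`, not the actual code:

```python
from sklearn.base import clone


def default_clone(estimator):
    # Build a fresh, unfitted instance of the same class from its
    # constructor parameters, recursively cloning nested estimators.
    klass = estimator.__class__
    new_params = {
        name: clone(value, safe=False)
        for name, value in estimator.get_params(deep=False).items()
    }
    return klass(**new_params)
```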
Fix merge conflicts?
Excited!
Reference Issues/PRs
Related to #8370
Implementation of SLEP017: scikit-learn/enhancement_proposals#67
What does this implement/fix? Explain your changes.
This PR introduces the `__sklearn_clone__` protocol. It contains a test with a possible implementation of freezing. The idea is mostly like #9464, but it uses `__getattr__` to pass all the non-overridden methods to the inner estimator. This means attributes and methods are available from the `FrozenEstimator` object.
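A hedged sketch of that freezing idea — the PR's actual test code may differ; the attribute name follows the snippets quoted above:

```python
from sklearn.base import BaseEstimator


class FrozenEstimator(BaseEstimator):
    def __init__(self, fitted_transformer):
        self.fitted_transformer = fitted_transformer

    def __getattr__(self, name):
        # Only called when normal lookup fails, so non-overridden methods
        # and fitted attributes resolve on the inner estimator.
        return getattr(self.fitted_transformer, name)

    def __sklearn_clone__(self):
        # Freezing: clone() returns the same, already-fitted object.
        return self

    def fit(self, *args, **kwargs):
        # No-op: the inner estimator is already fitted.
        return self

    def fit_transform(self, *args, **kwargs):
        return self.fitted_transformer.transform(*args, **kwargs)
```

With this, `clone(FrozenEstimator(est))` hands back the frozen object itself instead of an unfitted copy, so the wrapped estimator keeps its fitted state inside pipelines and meta-estimators.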