FEA Add EngineAwareMixin to factor the common logic of estimators which are plugin-extendable #13
Conversation
Force-pushed from a899776 to dfb3fe0.
Sorry I wasn't aware of this PR until now. Will take a look!
This looks great! Just a few minor suggestions below:
Implemented the suggestions almost exactly as proposed. Made a change to the docstring suggestion and renamed the conversion function (not sure I like the name, but it seems better than "convert to numpy", when really it is "to sklearn types").
@ogrisel what do you think about the changes in the last commit? This makes it possible to perform input validation in the "acceptance" function. Ideas on making the return value better (I dislike the …)?
sklearn/cluster/_kmeans.py (outdated)

        Should fail as quickly as possible.
        """
        # The default engine accepts everything and does not convert inputs
        return True, {"X": X, "y": y, "sample_weight": sample_weight}
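To illustrate the `(accepted, converted_inputs)` convention shown in the excerpt above, here is a hedged sketch of what a third-party engine's acceptance hook could look like. The class and method names (`DenseOnlyEngine`, `accepts`) are illustrative assumptions, not the actual plugin API:

```python
# Hypothetical sketch of an engine "acceptance" hook following the
# (accepted, converted_inputs) convention above. Names are illustrative.
import numpy as np


class DenseOnlyEngine:
    """Toy engine that only accepts dense float64 input."""

    def accepts(self, X, y=None, sample_weight=None):
        X = np.asarray(X)
        if X.dtype != np.float64:
            # Reject quickly so engine resolution can fall back to
            # another candidate engine (e.g. the default one).
            return False, None
        # Accept, returning (possibly converted) inputs for later steps.
        return True, {"X": X, "y": y, "sample_weight": sample_weight}


engine = DenseOnlyEngine()
ok, inputs = engine.accepts(np.ones((2, 2)))        # ok is True
rejected, _ = engine.accepts(np.ones((2, 2), int))  # rejected is False
```

Returning the converted inputs together with the acceptance flag lets the engine avoid re-validating or re-converting them later in the fit path.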
I have the feeling that we should factor in the calls to self.estimator._validate_data, remove the _check_test_data method, and simplify the _prepare_fit method. Otherwise, the default engine would not behave consistently with non-default engines and therefore would not serve as an educational implementation template for third-party engine implementers.
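The suggestion above is that the default engine should do its own input validation so it can serve as a template. A minimal sketch of that idea, with a plain-NumPy stand-in for `self.estimator._validate_data` (all names here are illustrative, not the actual code):

```python
# Illustrative sketch: a default engine that validates inputs itself,
# so third-party engines can mirror the same structure.
import numpy as np


class DefaultEngine:
    """Toy default engine holding a reference to its estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def prepare_fit(self, X, y=None, sample_weight=None):
        # Stand-in for self.estimator._validate_data: coerce to a 2D
        # float64 array and fail as quickly as possible on bad input.
        X = np.asarray(X, dtype=np.float64)
        if X.ndim != 2:
            raise ValueError(f"Expected 2D input, got {X.ndim}D")
        return X, y, sample_weight


engine = DefaultEngine(estimator=None)
X, y, sw = engine.prepare_fit([[1, 2], [3, 4]])
```

With validation living inside the engine's own prepare step, there is no need for a separate `_check_test_data`-style helper on the estimator side.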
        # XXX Maybe a bit useless as it should never get called, but it
        # does demonstrate the API
        return value
LGTM for now. I think this is a no-op generally.
@@ -112,3 +112,37 @@ def get_engine_classes(engine_name, default, verbose=False):
             f"trying engine {engine_class.__module__}.{engine_class.__qualname__} ."
         )
         yield provider, engine_class


 def convert_attributes(method):
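The diff above introduces a `convert_attributes` decorator. As a rough sketch of what such a decorator could do (this toy version always converts to sklearn-native NumPy types; the real one dispatches on the active engine, and the estimator/attribute names are assumptions):

```python
# Minimal sketch of a convert_attributes-style decorator: after the
# wrapped fit returns, fitted attributes (trailing-underscore names)
# are converted to NumPy arrays.
import functools

import numpy as np


def convert_attributes(method):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        # Fitted attributes follow scikit-learn's trailing-underscore
        # naming convention.
        for name, value in list(vars(self).items()):
            if name.endswith("_") and not name.startswith("_"):
                setattr(self, name, np.asarray(value))
        return result

    return wrapper


class ToyEstimator:
    @convert_attributes
    def fit(self, X):
        self.cluster_centers_ = [[0.0, 0.0], [1.0, 1.0]]  # plain list
        return self


est = ToyEstimator().fit(X=None)  # cluster_centers_ is now an ndarray
```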
Note for future iteration: could methods decorated by convert_attributes be called concurrently? If so, should we add some handling to make the attribute conversion thread-safe?
I think we can assume that calling fit concurrently on the same estimator instance is not thread-safe. However, calling fit on different clones of the same estimator should always be thread-safe.
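The contract described above can be sketched with a toy estimator: fitting independent clones from separate threads is fine, because each thread writes fitted attributes on its own instance. (In scikit-learn you would use `sklearn.base.clone`; `copy.deepcopy` stands in here to keep the sketch self-contained.)

```python
# Sketch: concurrent fit on independent clones is safe, because each
# clone owns its fitted attributes. Concurrent fit on the *same*
# instance would race on those attribute writes.
import copy
import threading


class ToyEstimator:
    def __init__(self, shift=0.0):
        self.shift = shift

    def fit(self, X):
        # The attribute write below is the shared state that makes
        # same-instance concurrent fits unsafe.
        self.mean_ = sum(X) / len(X) + self.shift
        return self


base = ToyEstimator(shift=1.0)
clones = [copy.deepcopy(base) for _ in range(4)]
threads = [threading.Thread(target=c.fit, args=([0.0, 2.0],)) for c in clones]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Every clone independently computed mean_ == 2.0
```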
Thanks for the work @betatim! I also caught up with the PR. I like the overall approach and don't have anything to add besides what has already been suggested, in particular what @ogrisel suggested. When it's merged I'll open a PR on our GPU plugin to see how it goes!
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Just to make sure that this is visible, I replied in the previous thread:
I've updated the code so that … To test it I also added a first draft of being able to pass an engine class to …
Thanks, this looks good to me. I would just like to better define the case when reset=False with a previously resolved engine class, as explained below.
So we've come full circle 😁 It would help if you could highlight how it's intended to be used, with a few examples of inputs that could reasonably be passed by a user, and limit use-cases to the range of supported inputs (list of lists, input dtype that might require casting depending on the engine, ...). What would you recommend to plugin developers regarding the behavior of the engine in each case (~does …)? If it's back to the same … If you consider it bad practice, I'd suggest throwing the …
I would rather not store the attribute. I think in the case of … Later we might refine the …
👍 I'm fine with that. 👍
Small interjection: do you think the following would be a better title for this PR?
Quoting this because I like it as an answer to the question of how … In my mind … To use an analogy: … This means input validation, like shape and such, should happen in the … I think it is good that, say, a "cupy plugin" …
FEA Add EngineAwareMixin to factor the common logic of estimators which are plugin-extendable
That is indeed a better title. Switched to it. However, I think this PR is becoming a collection of several ideas (e.g. it "accidentally" also implements the idea of having ad-hoc engines that aren't registered via entrypoints), so IMHO we should try to wrap this one up and start new PRs for new ideas, instead of putting them all here.
LGTM with the previous and following suggestions for class filtering by engine_name.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Adds a mixin for estimators to use when they support having an engine. Also adds a decorator that can be used on fit to convert the estimator's attributes to either engine_native or sklearn_native types. WDYT?

Notes: I considered using something like __init_subclass__ to automagically wrap fit, but decided that "explicit is better than implicit", which is how we got to the decorator approach. After building something based on __init_subclass__, which seemed cool because it means all you have to do is add the mixin, it seemed a bit too magical that this would wrap fit. So I switched to having a decorator.
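For context, the rejected `__init_subclass__` alternative mentioned above can be sketched as follows: the mixin silently wraps any `fit` defined on a subclass, which works but hides the wrapping, which is why an explicit decorator was preferred. All names here are illustrative, not the actual implementation:

```python
# Sketch of the rejected __init_subclass__ approach: subclassing the
# mixin automatically wraps fit, with no visible decorator.
import functools


def _wrap_fit(fit):
    @functools.wraps(fit)
    def wrapper(self, *args, **kwargs):
        result = fit(self, *args, **kwargs)
        self._attributes_converted = True  # stand-in for the conversion
        return result

    return wrapper


class AutoEngineMixin:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Magically wrap any fit defined directly on the subclass.
        if "fit" in cls.__dict__:
            cls.fit = _wrap_fit(cls.__dict__["fit"])


class Estimator(AutoEngineMixin):
    def fit(self, X):
        return self


est = Estimator().fit([[1.0]])  # est._attributes_converted is True
```

The magic is exactly the objection raised in the description: nothing at the definition of `Estimator.fit` signals that it will be wrapped, whereas an explicit `@convert_attributes` decorator makes the behavior visible at the call site.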