Add custom imputation strategy to SimpleImputer #28053
Branch updated from 2c960f9 to 64eb8bf.
Thanks for the quick turnaround. What I'd imagined was `strategy=np.maximum`, in which case we apply `self.strategy(masked_X_column.compressed())` for each column, or, as an optimisation, identify that `self.strategy` is a ufunc and apply the function vectorized across columns. I think it's okay, initially, to expect that the callable takes a dense 1d array as input.
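A minimal sketch of the per-column approach described in this comment, assuming missing values are encoded as NaN. The helper name `columnwise_statistics` is hypothetical and this is not scikit-learn's implementation. Because `np.maximum` is a binary ufunc, calling it on a single 1d array raises a `TypeError`, so the sketch routes ufuncs through their `.reduce` method (still one column at a time, not the vectorized optimisation); any other callable receives the dense 1d array of observed values directly.

```python
import numpy as np


def columnwise_statistics(X, strategy):
    """Return one statistic per column, computed on observed values only.

    Hypothetical helper for illustration, not scikit-learn's code.
    Assumes missing values are encoded as NaN.
    """
    X = np.asarray(X, dtype=float)
    # Mask only NaNs (np.ma.masked_invalid would also mask +/-inf).
    masked_X = np.ma.masked_array(X, mask=np.isnan(X))
    # Ufuncs are assumed to be binary (e.g. np.maximum), so reduce with them;
    # any other callable is assumed to map a dense 1d array to a scalar.
    func = strategy.reduce if isinstance(strategy, np.ufunc) else strategy
    stats = []
    for j in range(masked_X.shape[1]):
        # .compressed() drops masked entries and returns a dense 1d ndarray.
        observed = masked_X[:, j].compressed()
        # An all-missing column has no statistic; mark it with NaN.
        stats.append(func(observed) if observed.size else np.nan)
    return np.array(stats, dtype=float)


X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 2.0]])
print(columnwise_statistics(X, np.maximum))  # column maxima
print(columnwise_statistics(X, np.median))   # column medians
```

A one-argument callable such as `np.max` or `np.median` fits the "dense 1d array in, scalar out" contract without any special-casing.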
Aside from changing the validator to `callable`, LGTM.
Co-authored-by: Joel Nothman <joeln@canva.com>
sklearn/impute/_base.py (outdated)

    else:
        raise RuntimeError(f"Unknown strategy {strategy}")
I don't think this can ever happen since we validate parameters before getting here and people shouldn't be calling these private methods themselves.
I was unable to get code coverage to pass without adding the final condition and this test.
If you remove this line, codecov won't flag it as uncovered in the first place.
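A minimal sketch of the structure this thread argues for, under stated assumptions: the strategy is validated once up front (the comment above notes that parameters are validated before these private methods run), so the final `else` can handle the callable case and no unreachable `RuntimeError` branch remains for coverage tools to flag. The names `validate_strategy`, `compute_statistic`, and `STRING_STRATEGIES` are hypothetical and only cover a subset of strategies; this is not scikit-learn's code.

```python
import numpy as np

# Hypothetical subset of string strategies for this sketch.
STRING_STRATEGIES = {"mean", "median", "most_frequent"}


def validate_strategy(strategy):
    """Reject bad input once, before any dispatch happens."""
    if isinstance(strategy, str):
        if strategy not in STRING_STRATEGIES:
            raise ValueError(f"Unknown strategy {strategy!r}")
    elif not callable(strategy):
        raise TypeError("strategy must be a string or a callable")


def compute_statistic(column, strategy):
    """Dispatch on a strategy that has already been validated."""
    if strategy == "mean":
        return np.mean(column)
    elif strategy == "median":
        return np.median(column)
    elif strategy == "most_frequent":
        # np.unique returns sorted values, so ties resolve to the smallest one.
        values, counts = np.unique(column, return_counts=True)
        return values[np.argmax(counts)]
    else:
        # Validation leaves only callables here, so there is no
        # "Unknown strategy" branch that tests could never reach.
        return strategy(column)


column = np.array([1.0, 2.0, 2.0, 5.0])
validate_strategy(np.max)
print(compute_statistic(column, np.max))
```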
sklearn/impute/_base.py (outdated)

    else:
        raise RuntimeError(f"Unknown strategy {strategy}")
same here
LGTM. Thanks for the clean solution @mark-thm
Reference Issues/PRs
#27986
What does this implement/fix? Explain your changes.
Adds a 'custom' strategy to `SimpleImputer` that lets you supply your own statistic to produce the imputation value. In my experience, it's useful to be able to compute, for instance, the minimum and maximum of the inputs in addition to the mean; this unifies the imputation logic and provides a single place to manage all imputations.
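To illustrate the "single place to manage all imputations" point, here is a minimal NumPy-only sketch in which one code path fills missing values with the minimum, maximum, or mean depending only on the callable passed in. The helper `impute` is hypothetical, assumes NaN marks missing values, and assumes every column has at least one observed value (e.g. `np.min` raises on an empty array); it is not the `SimpleImputer` implementation.

```python
import numpy as np


def impute(X, statistic):
    """Fill NaNs in each column with `statistic` of that column's observed values.

    Hypothetical helper; `statistic` is any callable mapping a dense 1d
    array to a scalar (np.min, np.max, np.mean, ...).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # One fill value per column, computed from that column's non-missing entries.
    fill = np.array(
        [statistic(X[~missing[:, j], j]) for j in range(X.shape[1])],
        dtype=float,
    )
    # Broadcast the per-column fill values across rows, only where missing.
    return np.where(missing, fill, X)


X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 2.0]])
for stat in (np.min, np.max, np.mean):
    print(stat.__name__, impute(X, stat).tolist())
```

Swapping the statistic changes only the argument, not the imputation logic, which is the unification the description is after.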
Any other comments?
I proposed a similar change in #27986, and @adrinjalali and @jnothman asked to see a variation that accepts a callable instead of explicitly supporting new statistics.