Add custom imputation strategy to SimpleImputer #28053

mark-thm · 2024-01-03T01:02:28Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Adds a 'custom' strategy to SimpleImputer that lets users supply their own statistic to produce an imputation value.

In my experience, it's useful to be able to impute with statistics such as the minimum or maximum of the inputs, in addition to built-in ones such as the mean. This change unifies the imputation logic and gives a single place to manage all imputations.

Any other comments?

I proposed a similar change in #27986 and @adrinjalali and @jnothman requested to see a variation that accepts a callable instead of explicitly supporting new statistics.

github-actions · 2024-01-03T01:06:02Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0c21689. Link to the linter CI: here

sklearn/impute/tests/test_impute.py

jnothman

Thanks for the quick turnaround. What I'd imagined was strategy=np.maximum in which case we apply self.strategy(masked_X_column.compressed()) for each column, or, in an optimisation, identify that self.strategy is a ufunc and apply the function vectorized across columns. I think it's okay, initially, to expect that the callable take a dense 1d array as input.
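The per-column application described in this comment can be sketched with NumPy masked arrays. This is a minimal illustration, not the code in sklearn/impute/_base.py: `column_statistics` is a hypothetical helper, and NaN is assumed to be the missing-value marker. The example uses `np.max`, a reduction over one array, because `np.maximum` is an elementwise binary ufunc and cannot be called on a single array.

```python
import numpy as np


def column_statistics(X, strategy):
    """Hypothetical helper: apply `strategy` to the non-missing values of each column."""
    # Mask NaN (and inf) entries so they can be dropped per column.
    masked_X = np.ma.masked_invalid(X)
    stats = np.empty(masked_X.shape[1])
    for j in range(masked_X.shape[1]):
        # compressed() returns the unmasked entries as a dense 1d ndarray,
        # which is the input the callable is expected to accept.
        stats[j] = strategy(masked_X[:, j].compressed())
    return stats


X = np.array([
    [1.0, np.nan],
    [3.0, 4.0],
    [np.nan, 8.0],
])
stats = column_statistics(X, np.max)
```

The ufunc fast path mentioned in the comment is left out of this sketch; the loop above is the simple baseline it would optimise.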

sklearn/impute/_base.py

jnothman

Aside from changing the validator to callable, LGTM.

sklearn/impute/_base.py

adrinjalali · 2024-01-04T09:32:20Z

sklearn/impute/_base.py

+                    else:
+                        raise RuntimeError(f"Unknown strategy {strategy}")


I don't think this can ever happen since we validate parameters before getting here and people shouldn't be calling these private methods themselves.

mark-thm

I was unable to get code coverage to pass without adding the final condition and this test.

adrinjalali

If you remove these lines, codecov has nothing there that it would want covered in the first place.
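The point of this thread can be shown with a small standalone sketch: once parameters are validated up front, the dispatch code needs no catch-all branch, so no unreachable line is left for a coverage tool to flag. All names here (`VALID_STRATEGIES`, `validate_strategy`, `compute_statistic`) are hypothetical and deliberately simplified; this is not scikit-learn's validation machinery.

```python
import statistics

# Hypothetical, reduced set of built-in strategy names.
VALID_STRATEGIES = {"mean", "median"}


def validate_strategy(strategy):
    """Reject anything that is neither a known name nor a callable."""
    if callable(strategy):
        return
    if isinstance(strategy, str) and strategy in VALID_STRATEGIES:
        return
    raise ValueError(f"Unknown strategy {strategy!r}")


def compute_statistic(values, strategy):
    """Private-style helper; assumes validate_strategy() has already run."""
    if strategy == "mean":
        return statistics.fmean(values)
    if strategy == "median":
        return statistics.median(values)
    # No catch-all `else: raise` here: after validation, the only
    # remaining possibility is a callable.
    return strategy(values)


validate_strategy(max)
result = compute_statistic([1.0, 4.0, 2.0], max)
```

Invalid input is rejected by `validate_strategy`, which is the one place a test needs to exercise the error path.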

adrinjalali · 2024-01-04T09:32:28Z

sklearn/impute/_base.py

+        else:
+            raise RuntimeError(f"Unknown strategy {strategy}")


adrinjalali

LGTM. Thanks for the clean solution @mark-thm

mark-thm mentioned this pull request Jan 3, 2024

[MRG] ENH Add 'minimum' and 'maximum' strategies to SimpleImputer #27986

Closed

github-actions bot added the module:impute label Jan 3, 2024

Add custom imputation strategy to SimpleImputer

64eb8bf

mark-thm force-pushed the me/custom-imputation2 branch from 2c960f9 to 64eb8bf on January 3, 2024 01:10

mark-thm commented Jan 3, 2024

sklearn/impute/tests/test_impute.py

jnothman reviewed Jan 3, 2024

sklearn/impute/_base.py

sklearn/impute/_base.py

CR

ca2c6a9

mark-thm commented Jan 3, 2024

sklearn/impute/_base.py

mark-thm added 4 commits January 3, 2024 11:31

ruff

d4396a7

Update v1.5.rst

259e1e4

remove

fca9a14

docs

75e8c7b

mark-thm requested a review from jnothman January 3, 2024 19:47

jnothman approved these changes Jan 4, 2024

Update sklearn/impute/_base.py

51a8d33

Co-authored-by: Joel Nothman <joeln@canva.com>

adrinjalali reviewed Jan 4, 2024

mark-thm added 2 commits January 4, 2024 08:47

docs

89229b9

Remove catch-all per CR

0c21689

adrinjalali approved these changes Jan 4, 2024

adrinjalali merged commit e2b3785 into scikit-learn:main Jan 4, 2024
26 checks passed

mark-thm deleted the me/custom-imputation2 branch January 4, 2024 15:15

thomasjpfan mentioned this pull request Jan 4, 2024

Allow Imputer to accept strategy=some_callable #2896

Closed

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024

FEAT Add custom imputation strategy to SimpleImputer (scikit-learn#28053)

8e5b81b

Co-authored-by: Joel Nothman <joeln@canva.com>
