Describe the workflow you want to enable
In the Notes of the SimpleImputer, it says that:
Columns which only contained missing values at fit are discarded upon transform if strategy is not "constant".
It'd be nice to be able to set this to a default value in transform instead of dropping those columns if there is no valid summary statistic in fit.
I recognize that it may only be a couple of lines, but it seems useful to have the logic captured within the class instead of checking each column for all np.nan values beforehand.
The use case I came across was when creating a ByGroupImputer that would eventually concatenate all the groups together. Each group is relatively small and likely to contain different columns with all NaNs, so the final sizes of the imputed arrays differ. This would be solved by being able to fill in NaNs with 0 (or any fill value) instead of dropping those columns.
More generally, it seems useful for the output of transform to have the same shape as the input.
Describe your proposed solution
Near lines 512 to 525 of scikit-learn/sklearn/impute/_base.py (at 74bf394), we could add something like
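For illustration only, here is a minimal NumPy sketch of the requested behavior, written as standalone functions rather than as a change to _base.py. The names fit_column_means and transform_keep_columns and the fill_value argument are hypothetical, and only the mean strategy is covered.

```python
import numpy as np


def fit_column_means(X, fill_value=0.0):
    """Per-column mean ignoring NaNs.

    Columns with no observed value have no valid statistic, so they
    get `fill_value` instead (hypothetical behavior, not sklearn's).
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    counts = observed.sum(axis=0)
    sums = np.where(observed, X, 0.0).sum(axis=0)
    # Divide by at least 1 to avoid 0/0 on empty columns, then override them.
    means = sums / np.maximum(counts, 1)
    return np.where(counts > 0, means, fill_value)


def transform_keep_columns(X, statistics):
    """Replace NaNs with the per-column statistic; no column is dropped."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), statistics, X)


X = np.array([[1.0, np.nan, 3.0],
              [np.nan, np.nan, 5.0]])  # the middle column is all NaN

stats = fit_column_means(X, fill_value=0.0)
X_imputed = transform_keep_columns(X, stats)
```

With this approach the all-NaN column is filled with fill_value and X_imputed keeps the shape of X.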
Describe alternatives you've considered, if relevant
An alternative is to handle the all-NaN columns manually before running the SimpleImputer.
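A rough sketch of that workaround for the by-group use case, assuming the mean strategy and a hypothetical helper named impute_group: each group's all-NaN columns are filled with a constant first, so SimpleImputer has nothing to drop and the groups can be stacked afterwards.

```python
import numpy as np
from sklearn.impute import SimpleImputer


def impute_group(X, fill_value=0.0):
    """Fill all-NaN columns up front, then mean-impute the remaining NaNs."""
    X = np.array(X, dtype=float)           # copy so the caller's data is untouched
    all_nan = np.isnan(X).all(axis=0)      # columns with no observed value
    X[:, all_nan] = fill_value
    return SimpleImputer(strategy="mean").fit_transform(X)


# Two small groups whose all-NaN columns differ.
group_a = np.array([[1.0, np.nan, 3.0],
                    [np.nan, np.nan, 5.0]])
group_b = np.array([[np.nan, 2.0, 7.0],
                    [np.nan, 4.0, np.nan]])

# Every imputed group keeps all three columns, so stacking works.
combined = np.vstack([impute_group(group_a), impute_group(group_b)])
```

This keeps the shapes consistent, but the all-NaN check has to be repeated wherever the imputer is used, which is the duplication the request is trying to avoid.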
Additional context
No response