FIX make sure IterativeImputer does not skip iterative process when keep_empty_features=True#29779
Conversation
|
We will need some non-regression tests to check that we get the expected results. On the top of the head, we should test:
So from the code, I think that the diff is not what I would expect. I would only expect a change in the initial imputation with something like: if not self.keep_empty_features:
# drop empty features
valid_mask = np.flatnonzero(
np.logical_not(np.isnan(self.initial_imputer_.statistics_))
)
Xt = X[:, valid_mask]
mask_missing_values = mask_missing_values[:, valid_mask]
else:
# mark empty features as not missing and keep the original
# imputation
is_empty_feature = np.all(mask_missing_values, axis=0)
mask_missing_values[:, is_empty_feature] = False
Xt = X
Xt[:, is_empty_feature] = X_filled[:, is_empty_feature]Basically, |
|
In addition to the tests and the right fix, we need an entry in the changelog: Please add an entry to the change log at |
keep_empty_features is Set to True
Thanks for the feedback @glemaitre . I have added two more tests covering various
I have updated the changes as suggested and apparently there is one case that is not covered. It is when the Additionally, I have added one new entry to the 1.5.2 changelogs. |
I don't think that the It means that the line: valid_mask = np.flatnonzero(
np.logical_not(np.isnan(self.initial_imputer_.statistics_))
)is not doing what we think: for most imputation, we expect to have is_empty_feature = np.all(mask_missing_values, axis=0)instead |
glemaitre
left a comment
There was a problem hiding this comment.
A couple of additional small changes. But otherwise it looks good.
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
|
Last nitpick otherwise LGTM. We will need a second reviewer. |
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
|
Let's merge and hopefully we won't forget to clean it up when we fix the inconsistency in the SimpleImputer directly 😄 |
…eep_empty_features=True (scikit-learn#29779) Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Reference Issues
Fixes #29375
Empty Feature = Feature that consist exclusively of missing values.
What does this implement/fix? Explain your changes.
This PR fixes 2 cases mentioned on this issue #29375:
IterativeImputer, whenkeep_empty_featuresis set toTrue, the iterative imputation step is skipped, and the result is constructed solely from the initial imputation (SimpleImputer). This occurs becausemask_missing_valuesis set toTruefor all non-empty feature columns. As a result, the outcomes differ depending on whetherkeep_empty_featuresis set toFalseorTrue. The proposed solution is to remove this step and retain the originalmask_missing_values.ValueError: Found array with 0 sample(s) ...exception is raised because the estimator attempts to fit data with 0 rows. The proposed solution is to use the imputed values from the initial imputation result when the estimator encounters empty features. These values are constant across all rows.Any other comments?