DEP Adding a warning in the SimpleImputer when strategy mode is constant and keep_empty_features is False#29950
Conversation
…ant and keep_empty_features is False
|
I'll start to make a review with a couple of changes to do. |
|
The current docstring of |
|
So right now, the CI is failing because we raise the warning in some tests: So we should look at each case, and check if it makes sense and catch the warning / change the test or if this is a false positive and a bug in the code. I'll review the code and propose the changes. |
|
So luckily, for the tests However, could you add the following on the top of each test: For the remaining test |
glemaitre
left a comment
There was a problem hiding this comment.
OK here is a round of review.
Please add an entry to the change log at doc/whats_new/v1.6.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
This will resolve the remaining CI failure related to checking the changelog.
|
Thank you, I will start working on all those reviews. |
…onstant, changing some tests, removing a useless test and updating the changelog
…le/scikit-learn into simple_imputer_add_warning
|
I've made the requested modifications based on your reviews. Sorry that I didn't have much time to focus on it earlier. Everything should be up to date now, and please let me know if there's anything else I need to adjust. I'm waiting for the CI results. |
No issue. Apparently the CI is not happy. I'll have a look at what is going on. |
|
The remaining failure is in another estimator called So here, you would need to catch the warning. The diff will look like: diff --git a/sklearn/impute/_iterative.py b/sklearn/impute/_iterative.py
index d2d47bae3f..a90a659980 100644
--- a/sklearn/impute/_iterative.py
+++ b/sklearn/impute/_iterative.py
@@ -635,6 +635,13 @@ class IterativeImputer(_BaseImputer):
X_missing_mask = _get_mask(X, self.missing_values)
mask_missing_values = X_missing_mask.copy()
+
+ # TODO (1.8): remove this once the deprecation is removed. In the meantime,
+ # we need to catch the warning to avoid false positives.
+ catch_warning = (
+ self.initial_strategy == "constant" and not self.keep_empty_features
+ )
+
if self.initial_imputer_ is None:
self.initial_imputer_ = SimpleImputer(
missing_values=self.missing_values,
@@ -642,9 +649,19 @@ class IterativeImputer(_BaseImputer):
fill_value=self.fill_value,
keep_empty_features=self.keep_empty_features,
).set_output(transform="default")
- X_filled = self.initial_imputer_.fit_transform(X)
+ if catch_warning:
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", FutureWarning)
+ X_filled = self.initial_imputer_.fit_transform(X)
+ else:
+ X_filled = self.initial_imputer_.fit_transform(X)
else:
- X_filled = self.initial_imputer_.transform(X)
+ if catch_warning:
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", FutureWarning)
+ X_filled = self.initial_imputer_.transform(X)
+ else:
+ X_filled = self.initial_imputer_.transform(X)
if in_fit:
self._is_empty_feature = np.all(mask_missing_values, axis=0)
@@ -658,7 +675,7 @@ class IterativeImputer(_BaseImputer):
# The constant strategy has a specific behavior and preserve empty
# features even with ``keep_empty_features=False``. We need to drop
# the column for consistency.
- # TODO: remove this `if` branch once the following issue is addressed:
+ # TODO (1.8): remove this `if` branch once the following issue is addressed:
# https://github.com/scikit-learn/scikit-learn/issues/29827
X_filled = X_filled[:, ~self._is_empty_feature]
Applying it will make the test pass. |
| .. versionchanged:: 1.6 | ||
| Currently, when `keep_empty_feature=False` and `strategy="constant"`, | ||
| empty features are not dropped. This behaviour will change in version | ||
| 1.8. Set `keep_empty_feature=True` to preserve this behaviour. |
There was a problem hiding this comment.
This should solve the documentation failure, sphinx is quite stubborn about those :).
| .. versionchanged:: 1.6 | |
| Currently, when `keep_empty_feature=False` and `strategy="constant"`, | |
| empty features are not dropped. This behaviour will change in version | |
| 1.8. Set `keep_empty_feature=True` to preserve this behaviour. | |
| .. versionchanged:: 1.6 | |
| Currently, when `keep_empty_feature=False` and `strategy="constant"`, | |
| empty features are not dropped. This behaviour will change in version | |
| 1.8. Set `keep_empty_feature=True` to preserve this behaviour. |
|
Thank you ! I have corrected what you told me, however I saw when fetching the main upstream that many lines have been deleted in v.1.6.rst file. I think it is the reason why the check changelog ci is failing. Do I need to modify another file for the changelog ? |
|
Indeed, we change the way we create changelog entry to have a more automated way. Now, we need to create a single file in the module folder. I push a commit to create the file in the last commit. |
glemaitre
left a comment
There was a problem hiding this comment.
LGTM. Thanks @ArthurCourselle
We will need a second reviewer.
Maybe @adrinjalali can have a look such that we get this one in to not delay the deprecation.
|
Ok thank you ! I'm looking forward for the other review then. I will look for other issues that I could work on in the future. Thanks @glemaitre for your help :) |
|
Nice. Thanks @ArthurCourselle for you contribution ;) |
Reference Issues/PRs
Fixes #29827
What does this implement/fix? Explain your changes.
Adding a warning in the SimpleImputer to prevent the user about an upcoming deprecation. This shows when the user set the keep_empty_features to False and the strategy mode is constant and the user have a column full of np.nan because it is not removed.