FIX Improves feature names support for SelectFromModel + Est w/o names #21991
Conversation
@@ -428,3 +428,34 @@ def test_importance_getter(estimator, importance_getter):
    )
    selector.fit(data, y)
    assert selector.transform(data).shape[1] == 1

class RandomForestNoFeatureNames(RandomForestClassifier):
I think that we should be using `MinimalClassifier` and `MinimalRegressor`.
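To make the suggestion concrete, here is a hedged, dependency-free sketch in the spirit of scikit-learn's minimal test estimators: a classifier that implements just enough of the estimator API to be cloned and fitted by a meta-estimator, while never performing input validation, so it never sets `n_features_in_` or `feature_names_in_`. The class and parameter names here are hypothetical, not the real `MinimalClassifier` implementation.

```python
class BareClassifier:
    """Bare-bones classifier with no input validation and no feature-name tracking."""

    def __init__(self, param=None):
        self.param = param

    # get_params / set_params make the estimator cloneable by meta-estimators.
    def get_params(self, deep=True):
        return {"param": self.param}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X, y):
        # No call to check_array / _validate_data, so no feature-name
        # bookkeeping ever happens here.
        self.coef_ = [1.0] * len(X[0])  # constant "importances"
        self.classes_ = sorted(set(y))
        return self

    def predict(self, X):
        # Degenerate predictions: always the first class.
        return [self.classes_[0]] * len(X)

clf = BareClassifier().fit([[0.0, 1.0], [1.0, 0.0]], [0, 1])
print(hasattr(clf, "feature_names_in_"))  # False: nothing was recorded
```

Fitting a meta-estimator such as `SelectFromModel` around an estimator like this is exactly the situation the PR has to handle: the delegate exposes `coef_` but no feature-name attributes.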
    return self

def test_estimator_does_not_support_feature_names():
Could we make a more general test where we iterate over all possible estimators that inherit from `MetaEstimatorMixin`, and create a classifier or a regressor that ensures the behaviour with the minimal classifier or regressor?
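The discovery part of such a test can be sketched without any scikit-learn dependency; this toy version (all class names are stand-ins, not real scikit-learn classes) walks a mixin's subclass tree the way a common test could enumerate every `MetaEstimatorMixin` subclass before parametrizing over them.

```python
class MetaEstimatorMixinStub:
    """Stand-in for sklearn.base.MetaEstimatorMixin."""

class PipelineStub(MetaEstimatorMixinStub):
    pass

class SelectFromModelStub(MetaEstimatorMixinStub):
    pass

def iter_subclasses(cls):
    """Yield every (transitive) subclass of `cls`."""
    for sub in cls.__subclasses__():
        yield sub
        yield from iter_subclasses(sub)

meta_estimators = sorted(c.__name__ for c in iter_subclasses(MetaEstimatorMixinStub))
print(meta_estimators)  # ['PipelineStub', 'SelectFromModelStub']
```

In scikit-learn itself the enumeration would more likely go through `sklearn.utils.all_estimators()` plus an `issubclass(..., MetaEstimatorMixin)` filter, which is what a common test can parametrize over.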
Updated the test in `sklearn/tests/test_common.py` to generate meta-estimators with MinimalEstimators. I left this one here to test `get_feature_names_out`.
Scope of PR increased to cover all subclasses of `MetaEstimatorMixin`.
The scope of this PR increased quite a bit. Meta-estimators are now validating inputs when the delegate does not set the required attribute. Should we consider this behavior change a bug fix?
doc/whats_new/v1.0.rst
Outdated
@@ -28,8 +28,7 @@ Version 1.0.2
    :class:`multioutput.MultiOutputRegressor`,
    :class:`multiclass.OneVsRestClassifier`,
    :class:`multiclass.OutputCodeClassifier`,
    :class:`multiclass.OutputCodeClassifier`,
    :class:`pipeline.Pipeline`, and :class:`pipeline.FeatureUnion`
    :class:`multiclass.OutputCodeClassifier`.
It looks like we have twice the same classifier here :)
sklearn/multiclass.py
Outdated
@@ -158,10 +158,9 @@ def predict_proba(self, X):
    force_all_finite=False,
    dtype=None,
    accept_sparse=True,
-   ensure_2d=False,
+   ensure_2d=True,
Should this change come with a test?
I think that we could limit to only the bug fix in
sklearn/base.py
Outdated
@@ -602,6 +602,42 @@ def _validate_data(

    return out

def _check_features_support(self, X, *, delegate=None, reset=True):
    """Set or check both `n_features_in_` and `feature_names_in_` based on delegate.
Since `_check_features_support` might be a bit generic for people not aware of what we intend to do, would it be good to also reference the SLEPs for `n_features_in_` and `feature_names_in_` (in the long description of the docstring)?
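For readers following along, here is a rough, hypothetical sketch of what a delegate-aware check like this could do, written as a free function for illustration only; the PR's real helper is a `BaseEstimator` method with different internals. (`n_features_in_` was specified in SLEP010 and `feature_names_in_` in SLEP007; see the scikit-learn SLEP index for details.)

```python
def check_features_support(meta_estimator, delegate, reset=True):
    """Mirror `n_features_in_` / `feature_names_in_` from a fitted delegate.

    Illustrative sketch: the meta-estimator only exposes the attributes the
    delegate actually set, so a delegate with no feature-name support leaves
    the meta-estimator without `feature_names_in_` as well.
    """
    for attr in ("n_features_in_", "feature_names_in_"):
        if hasattr(delegate, attr):
            setattr(meta_estimator, attr, getattr(delegate, attr))
        elif reset and hasattr(meta_estimator, attr):
            # Stale attribute left over from a previous fit: drop it.
            delattr(meta_estimator, attr)
    return meta_estimator
```

The key design point being debated is exactly this asymmetry: the meta-estimator defers to whatever the delegate recorded instead of validating `X` a second time itself.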
I have the same feeling. But the PR looks nice otherwise.
Force-pushed from c044c75 to 8cbfa1e.
Updated PR to reduce the scope back to focusing on [...]. The alternative is to pass the delegate to [...]
The better behavior is for the meta-estimator to learn the feature names if the delegate does not, which will be a 1.1 feature. Edit: to pass the common tests, [...]
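A dependency-free sketch of that proposed 1.1-style behavior (every name below is a toy stand-in, not scikit-learn code): the meta-estimator mirrors the delegate's feature names when they exist, and otherwise learns them from its own input.

```python
class Delegate:
    """Stand-in for an inner estimator that never tracks feature names."""
    def fit(self, X, y=None):
        return self

class Meta:
    """Toy meta-estimator illustrating the proposed fallback."""
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None, feature_names=None):
        self.estimator.fit(X, y)
        if hasattr(self.estimator, "feature_names_in_"):
            # Delegate tracked the names: simply mirror them.
            self.feature_names_in_ = self.estimator.feature_names_in_
        elif feature_names is not None:
            # Delegate did not: learn the names from our own input.
            self.feature_names_in_ = list(feature_names)
        return self

meta = Meta(Delegate()).fit([[0.0, 1.0]], feature_names=["a", "b"])
print(meta.feature_names_in_)  # ['a', 'b']
```

In real scikit-learn the names would come from a pandas DataFrame's columns rather than an explicit `feature_names` argument; the argument here just keeps the sketch self-contained.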
scikit-learn#21991) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Fixes #21949
What does this implement/fix? Explain your changes.
`transform` will validate twice if the inner estimator supports `feature_names_in_` and also validates. This double validation already happens on `main`.

Any other comments?

In a future PR, I think we need a way to configure `feature_names_in_` validation depending on if the delegated estimator supports `feature_names_in_`.
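The double-validation issue described above can be demonstrated with a dependency-free sketch (all names hypothetical): both the meta-estimator and its delegate run the same check on `X`, so the work happens twice.

```python
# Record every validation call so the duplication is visible.
validation_calls = []

def validate(X, who):
    validation_calls.append(who)  # stand-in for check_array / _validate_data
    return X

class Inner:
    """Delegate that validates its input on its own."""
    def fit(self, X, y=None):
        X = validate(X, "inner")
        return self

class Meta:
    """Meta-estimator that also validates before delegating."""
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        X = validate(X, "meta")    # first validation
        self.estimator.fit(X, y)   # ...delegate validates the same X again
        return self

Meta(Inner()).fit([[1.0], [2.0]])
print(validation_calls)  # ['meta', 'inner'] -> X was validated twice
```

A configuration hook like the one suggested would let the meta-estimator skip its own call when the delegate is known to validate, leaving a single entry in the log.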