
FIX Improves feature names support for SelectFromModel + Est w/o names #21991

Merged: 11 commits merged into scikit-learn:main on Dec 24, 2021

Conversation

thomasjpfan (Member):

Reference Issues/PRs

Fixes #21949

What does this implement/fix? Explain your changes.

transform will validate twice if the inner estimator supports feature_names_in_ and performs its own validation. This double validation already happens on main.

Any other comments?

In a future PR, I think we need a way to configure feature_names_in_ validation depending on if the delegated estimator supports feature_names_in_.

@@ -428,3 +428,34 @@ def test_importance_getter(estimator, importance_getter):
)
selector.fit(data, y)
assert selector.transform(data).shape[1] == 1


class RandomForestNoFeatureNames(RandomForestClassifier):
Review comment (Member):

I think that we should be using MinimalClassifier and MinimalRegressor.
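GitHub collapses the body of RandomForestNoFeatureNames in this diff; a plausible sketch of such a helper (a hypothetical reconstruction, not the PR's exact code) is a forest that casts X to an ndarray before fitting, so it never records feature_names_in_:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


class RandomForestNoFeatureNames(RandomForestClassifier):
    def fit(self, X, y):
        # Casting to an ndarray discards the DataFrame's column names,
        # so the fitted forest never sets feature_names_in_.
        return super().fit(np.asarray(X), y)


X_df = pd.DataFrame({"a": [0, 1, 0, 1], "b": [1, 0, 1, 0]})
y = [0, 1, 0, 1]
est = RandomForestNoFeatureNames(n_estimators=5, random_state=0).fit(X_df, y)
print(hasattr(est, "feature_names_in_"))
```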

return self


def test_estimator_does_not_support_feature_names():
Review comment (Member):

Could we make a more general test that iterates over all estimators inheriting from MetaEstimatorMixin, creating a classifier or a regressor as appropriate, to ensure the behaviour with the minimal classifier or regressor?

thomasjpfan (Member Author):

Updated test in sklearn/tests/test_common.py to generate meta-estimators with MinimalEstimators.

I left this one here to test get_feature_names_out.

@thomasjpfan thomasjpfan changed the title FIX Fixes feature names support for SelectFromModel + Est w/o names FIX Fixes feature attributes for meta estimators + inner estimator that do not support feature attributes Dec 20, 2021
thomasjpfan (Member Author) left a comment:

Scope of PR increased to cover all subclasses of MetaEstimatorMixin.


thomasjpfan commented Dec 20, 2021:

The scope of this PR increased quite a bit. Meta-estimators are validating inputs when the delegate does not set the required feature_names_in_ or n_features_in_.

Should we consider this behavior change a bug fix for 1.0.2 or something for 1.1? I can reduce this PR back down to just fixing SelectFromModel if we think the scope is too big.

@@ -28,8 +28,7 @@ Version 1.0.2
     :class:`multioutput.MultiOutputRegressor`,
     :class:`multiclass.OneVsRestClassifier`,
     :class:`multiclass.OutputCodeClassifier`,
-    :class:`multiclass.OutputCodeClassifier`,
-    :class:`pipeline.Pipeline`, and :class:`pipeline.FeatureUnion`
+    :class:`multiclass.OutputCodeClassifier`.
Review comment (Member):

It looks like we have twice the same classifier here :)

@@ -158,10 +158,9 @@ def predict_proba(self, X):
         force_all_finite=False,
         dtype=None,
         accept_sparse=True,
-        ensure_2d=False,
+        ensure_2d=True,
Review comment (Member):

Should this change come with a test?
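For context on the flag itself (a general illustration of check_array, not the test the reviewer is asking for): with ensure_2d=True, validation rejects 1-D input instead of letting it pass through.

```python
import numpy as np
from sklearn.utils import check_array

X_1d = np.array([1.0, 2.0, 3.0])

# With ensure_2d=True, a 1-D array fails validation with a ValueError.
try:
    check_array(X_1d, ensure_2d=True)
    raised = False
except ValueError:
    raised = True
print(raised)

# With ensure_2d=False, the same input is accepted as-is.
X_ok = check_array(X_1d, ensure_2d=False)
print(X_ok.shape)
```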

glemaitre (Member):

I think that we could limit to only the bug fix in SelectFromModel for 1.0.2
The additional change will not break backward compatibility and will add support for feature names in cases where we were not doing it before, so we could postpone it to 1.1.

sklearn/base.py Outdated
@@ -602,6 +602,42 @@ def _validate_data(

return out

def _check_features_support(self, X, *, delegate=None, reset=True):
"""Set or check both `n_features_in_` and `feature_names_in_` based on delegate.
Review comment (Member):

Since _check_features_support might be a bit generic for people not aware of what we intend to do, would it be good to reference the SLEPs for n_features_in_ and feature_names_in_ as well (in the long description of the docstring)?

ogrisel commented Dec 21, 2021:

> I think that we could limit to only the bug fix in SelectFromModel for 1.0.2

I have the same feeling. But the PR looks nice otherwise.

@thomasjpfan thomasjpfan changed the title FIX Fixes feature attributes for meta estimators + inner estimator that do not support feature attributes FIX Improves feature names support for SelectFromModel + Est w/o names Dec 21, 2021
thomasjpfan commented Dec 21, 2021:

Updated PR to reduce the scope back to focusing on SelectFromModel. Technically this PR changes the behavior of SelectFromModel by setting feature_names_in_ if the delegate does not set it.

The alternative is to pass the delegate to _validate_data and not raise the warning when the delegate does not have feature_names_in_.

thomasjpfan commented Dec 22, 2021:

I updated the PR with the bare minimal change to be a bug fix. SelectFromModel no longer validates feature names and delegates validation to the base estimator.

That get_feature_names_out gives incorrect feature names is expected behavior, since SelectFromModel does not have feature names when the base estimator does not define them. One would need to pass the original feature names to get_feature_names_out to get the expected results.

The better behavior is for the meta-estimator to learn the feature names when the delegate does not, which will be a 1.1 feature.

Edit: To pass the common tests, SelectFromModel needs to validate. I updated the PR to have the better behavior described above.

glemaitre self-assigned this Dec 24, 2021
glemaitre merged commit 6db0e2c into scikit-learn:main Dec 24, 2021
glemaitre added a commit to glemaitre/scikit-learn referencing this pull request (scikit-learn#21991) Dec 24, 2021, co-authored by Guillaume Lemaitre <g.lemaitre58@gmail.com>
glemaitre added a commit referencing this pull request (#21991) Dec 25, 2021
venkyyuvy pushed a commit to venkyyuvy/scikit-learn referencing this pull request Jan 1, 2022
mathijs02 pushed a commit to mathijs02/scikit-learn referencing this pull request (scikit-learn#21991) Dec 27, 2022
Successfully merging this pull request may close this issue: "SelectFromModel function in scikit 1.0.1 does not work properly with catboost and caching"