FIX Improves feature names support for SelectFromModel + Est w/o names #21991
@@ -428,3 +428,34 @@ def test_importance_getter(estimator, importance_getter):
     )
     selector.fit(data, y)
     assert selector.transform(data).shape[1] == 1
+
+
+class RandomForestNoFeatureNames(RandomForestClassifier):
+    def fit(self, X, y):
+        super().fit(X, y)
+        # Remove feature names
+        del self.feature_names_in_
+        return self
+
+
+def test_estimator_does_not_support_feature_names():
Could we make a more general test where we iterate over all possible estimators that are inheriting from [...]

Updated test in [...] I left this one here to test [...]
+    """SelectFromModel works with estimators that do not support feature_names_in_.
+
+    Non-regression test for #21949.
+    """
+    pytest.importorskip("pandas")
+    X, y = datasets.load_iris(as_frame=True, return_X_y=True)
+    all_feature_names = set(X.columns)
+
+    rf = RandomForestNoFeatureNames()
+    selector = SelectFromModel(rf).fit(X, y)
+
+    # selector learns the feature names itself
+    assert_array_equal(selector.feature_names_in_, X.columns)
+
+    feature_names_out = set(selector.get_feature_names_out())
+    assert feature_names_out < all_feature_names
+
+    with pytest.warns(None) as records:
+        selector.transform(X.iloc[1:3])
+        assert not [str(record.message) for record in records]
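A note on the last check in the test: `pytest.warns(None)` was deprecated in pytest 7 and is rejected by later releases, so the "no warning is raised" assertion may need a different spelling on newer toolchains. The following is a minimal sketch of one alternative using only the standard library `warnings` module. The helper name `collect_warnings` and the two toy functions are hypothetical and are not part of this PR or of scikit-learn.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Call func and return the messages of all warnings it emitted.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as records:
        # Record every warning, including repeated ones.
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return [str(record.message) for record in records]


def quiet():
    # Stand-in for a call that should not warn.
    return 1


def noisy():
    # Stand-in for a call that emits a feature-name warning.
    warnings.warn("X does not have valid feature names", UserWarning)


quiet_messages = collect_warnings(quiet)
noisy_messages = collect_warnings(noisy)
```

Under these assumptions, the check in the test would read roughly `assert not collect_warnings(selector.transform, X.iloc[1:3])`.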
I think that we should be using MinimalClassifier and MinimalRegressor.
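To illustrate the idea behind this suggestion, here is a rough, dependency-free sketch of the behaviour under test: a wrapped estimator that never records feature names, and a meta-estimator that records them itself. All class names below are hypothetical stand-ins. They are not scikit-learn's `MinimalClassifier`, `MinimalRegressor`, or `SelectFromModel`, and the name-recording logic is an assumption made for illustration.

```python
class MinimalEstimatorSketch:
    """Hypothetical estimator that fits without recording feature names."""

    def fit(self, X, y):
        self.is_fitted_ = True
        return self


class NameAwareWrapperSketch:
    """Hypothetical meta-estimator that records column names on its own."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Record names from the input if it exposes any (DataFrame-like).
        columns = getattr(X, "columns", None)
        if columns is not None:
            self.feature_names_in_ = list(columns)
        self.estimator_ = self.estimator.fit(X, y)
        return self


class FrameSketch:
    """Tiny DataFrame-like container with a `columns` attribute."""

    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows


X = FrameSketch(["a", "b"], [[1, 2], [3, 4]])
wrapper = NameAwareWrapperSketch(MinimalEstimatorSketch()).fit(X, [0, 1])
```

With such an estimator there is no need to delete `feature_names_in_` after fitting, as `RandomForestNoFeatureNames` does, because the attribute is never set in the first place.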