New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH add multioutput support for RFE #16103
Conversation
Checked the type of the target variable, if the target is equal to multioutput then multioutput is set to zero before check_X_y
Thanks for the pull request! Please fix the pep8 error, and please add a test |
You need to add a test. Could you test that the multioutput is supported in RFECV as well. Since RFECV inherit from RFE, it should work as well quite easily |
Sure @glemaitre I will do that :) |
sklearn/feature_selection/_rfe.py
Outdated
X, y = check_X_y(X, y, "csc", ensure_min_features=2, | ||
force_all_finite=not tags.get('allow_nan', True)) | ||
force_all_finite=not tags.get('allow_nan', True), | ||
multi_output=multioutput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we just use multi_output=True in all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! that's actually possible! I thought we may doing extra computation by calling check_array function in validation.py. I will change that now :)
Sorry accidentally closed this PR reopening it again. @rth Can you please check this and let me know what you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @divyaprabha123 ! LTGM, aside for a minor comment.
Please add an entry to the change log at doc/whats_new/v0.23.rst
. Like the other entries there, please reference this pull request with :pr:
and credit yourself with :user:
.
@@ -395,3 +395,12 @@ def test_rfe_allow_nan_inf_in_x(cv): | |||
rfe = RFE(estimator=clf) | |||
rfe.fit(X, y) | |||
rfe.transform(X) | |||
|
|||
|
|||
def test_multioutput(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine the same should work with RFECV? If so please parametrize this test with,
@pytest.mark.parametrize('ClsRFE', [RFE, RFECV])
def test_multioutput(ClsRFE):
...
rfe_test = ClsRFE(clf)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this will work with REFCV I will also add this in the test. Thank you so much @rth : )
@@ -156,7 +156,8 @@ def _fit(self, X, y, step_score=None): | |||
|
|||
tags = self._get_tags() | |||
X, y = check_X_y(X, y, "csc", ensure_min_features=2, | |||
force_all_finite=not tags.get('allow_nan', True)) | |||
force_all_finite=not tags.get('allow_nan', True), | |||
multi_output=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a side comment, I think it is indeed best to assume that multi_output is supported here, pass it to the underlying estimator and let it error if it doesn't. As opposed to relying on multioutput
estimator tag, which can potentially be not accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending @rth's testing suggestion. Thanks @divyaprabha123
Hi @rth and @jnothman I have added the test function as suggested, please go through and let me know if I have change or improve anything! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last comment below, also you need to resolve conflicts (can be done in Github UI).
doc/whats_new/v0.23.rst
Outdated
:class:`ensemble.GradientBoostingClassifier` as well as ``predict`` method of | ||
:class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeRegressor`, and | ||
:class:`ensemble.GradientBoostingRegressor`. | ||
:pr:`16331` by :user:`Alexandre Batisse <batalex>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The merge conflict resolution went wrong somehow for the what's new. There should be no diff here aside for your changes. Will push a commit to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rth Thanks for accepting the pull request :)
What does this implement/fix? Explain your changes.
Regressor like random forest natively supports multi-output regression but when we use RFE bad input shape error is thrown, making it insufficient to utilize the multi-output facility.
This implementation checks the type of the target variable using utils.multiclass.type_of_target, if the type is equal to multioutput then "multi_output" argument of check_X_y is set to True.