Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH add multioutput support for RFE #16103

Merged
merged 20 commits into from Feb 4, 2020
Merged

ENH add multioutput support for RFE #16103

merged 20 commits into from Feb 4, 2020

Conversation

divyaprabha123
Copy link
Contributor

What does this implement/fix? Explain your changes.

Regressor like random forest natively supports multi-output regression but when we use RFE bad input shape error is thrown, making it insufficient to utilize the multi-output facility.

This implementation checks the type of the target variable using utils.multiclass.type_of_target, if the type is equal to multioutput then "multi_output" argument of check_X_y is set to True.

Checked the type of the target variable, if the target is equal to multioutput then multioutput is set to zero before check_X_y
@divyaprabha123 divyaprabha123 changed the title Mulioutput support for REF Mulioutput support for RFE Jan 12, 2020
@jnothman
Copy link
Member

Thanks for the pull request! Please fix the pep8 error, and please add a test

@glemaitre glemaitre changed the title Mulioutput support for RFE ENH add multioutput support for RFE Jan 13, 2020
@glemaitre
Copy link
Contributor

You need to add a test. Could you test that the multioutput is supported in RFECV as well. Since RFECV inherit from RFE, it should work as well quite easily

@divyaprabha123
Copy link
Contributor Author

You need to add a test. Could you test that the multioutput is supported in RFECV as well. Since RFECV inherit from RFE, it should work as well quite easily

Sure @glemaitre I will do that :)

X, y = check_X_y(X, y, "csc", ensure_min_features=2,
force_all_finite=not tags.get('allow_nan', True))
force_all_finite=not tags.get('allow_nan', True),
multi_output=multioutput)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we just use multi_output=True in all cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! that's actually possible! I thought we may doing extra computation by calling check_array function in validation.py. I will change that now :)

@divyaprabha123
Copy link
Contributor Author

Sorry accidentally closed this PR reopening it again. @rth Can you please check this and let me know what you think?

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @divyaprabha123 ! LTGM, aside for a minor comment.

Please add an entry to the change log at doc/whats_new/v0.23.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself with :user:.

@@ -395,3 +395,12 @@ def test_rfe_allow_nan_inf_in_x(cv):
rfe = RFE(estimator=clf)
rfe.fit(X, y)
rfe.transform(X)


def test_multioutput():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine the same should work with RFECV? If so please parametrize this test with,

@pytest.mark.parametrize('ClsRFE', [RFE, RFECV])
def test_multioutput(ClsRFE):
    ...
    rfe_test = ClsRFE(clf)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this will work with REFCV I will also add this in the test. Thank you so much @rth : )

@@ -156,7 +156,8 @@ def _fit(self, X, y, step_score=None):

tags = self._get_tags()
X, y = check_X_y(X, y, "csc", ensure_min_features=2,
force_all_finite=not tags.get('allow_nan', True))
force_all_finite=not tags.get('allow_nan', True),
multi_output=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side comment, I think it is indeed best to assume that multi_output is supported here, pass it to the underlying estimator and let it error if it doesn't. As opposed to relying on multioutput estimator tag, which can potentially be not accurate.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending @rth's testing suggestion. Thanks @divyaprabha123

@divyaprabha123
Copy link
Contributor Author

LGTM pending @rth's testing suggestion. Thanks @divyaprabha123

Hi @rth and @jnothman I have added the test function as suggested, please go through and let me know if I have change or improve anything!

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment below, also you need to resolve conflicts (can be done in Github UI).

doc/whats_new/v0.23.rst Outdated Show resolved Hide resolved
:class:`ensemble.GradientBoostingClassifier` as well as ``predict`` method of
:class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeRegressor`, and
:class:`ensemble.GradientBoostingRegressor`.
:pr:`16331` by :user:`Alexandre Batisse <batalex>`.
Copy link
Member

@rth rth Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The merge conflict resolution went wrong somehow for the what's new. There should be no diff here aside for your changes. Will push a commit to fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rth Thanks for accepting the pull request :)

@rth rth merged commit 63cfc5f into scikit-learn:master Feb 4, 2020
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants