Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Enable percentage for n_features_to_select in RFE(Fix PR #16228) #17090

Merged
merged 30 commits into from Jun 27, 2020

Conversation

lschwetlick
Copy link
Contributor

@lschwetlick lschwetlick commented Apr 30, 2020

This brings PR #16228 up to date, adds tests and fixes some issues with it.

That PR was meant to Fix #14567, close #14627 and close #15529 by adding the option to select n features by percentage instead of by absolute number.

The changes PR #16228 made to isotronic.py have since been added already and are not included in this PR. Here I fix some bugs from PR #16228 and add tests for the new functionality.

Ping @noatamir

@lschwetlick lschwetlick changed the title [WIP] Enable percentage for n_features_to_select in RFE(Fix PR #16228) [MRG] Enable percentage for n_features_to_select in RFE(Fix PR #16228) Apr 30, 2020
@glemaitre glemaitre added this to TO REVIEW in Guillaume's pet May 27, 2020
@cmarmo
Copy link
Member

cmarmo commented Jun 9, 2020

Hi @lschwetlick, do you mind fixing the conflicts? Your PR will close a number of stalled PR, I think we should push it to be reviewed. Thanks!

@lschwetlick
Copy link
Contributor Author

Yes, I hope it's fixed it now!

@cmarmo cmarmo requested a review from rth June 9, 2020 09:06
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments otherwise LGTM. Thanks @lschwetlick !

Please add an entry to the change log at doc/whats_new/v0.24.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

sklearn/feature_selection/tests/test_rfe.py Outdated Show resolved Hide resolved
sklearn/feature_selection/_rfe.py Outdated Show resolved Hide resolved
@lschwetlick
Copy link
Contributor Author

Alright, I integrated your suggestions @rth . Looks good?

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @lschwetlick !

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @lschwetlick !

sklearn/feature_selection/_rfe.py Outdated Show resolved Hide resolved
doc/whats_new/v0.24.rst Outdated Show resolved Hide resolved
doc/whats_new/v0.24.rst Outdated Show resolved Hide resolved
sklearn/feature_selection/_rfe.py Outdated Show resolved Hide resolved
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@rth
Copy link
Member

rth commented Jun 17, 2020

Hopefully @thomasjpfan is there to catch all the things I miss :)

@lschwetlick
Copy link
Contributor Author

@thomasjpfan did I get everything?

sklearn/feature_selection/_rfe.py Outdated Show resolved Hide resolved
sklearn/feature_selection/_rfe.py Outdated Show resolved Hide resolved
sklearn/feature_selection/tests/test_rfe.py Outdated Show resolved Hide resolved
doc/whats_new/v0.24.rst Show resolved Hide resolved
lschwetlick and others added 4 commits June 18, 2020 00:12
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
def test_rfe_negative_n_features():
clf = SVC(kernel="linear")
with pytest.raises(ValueError, match=r"n_features_to_select must be *"):
RFE(estimator=clf, n_features_to_select=-1, step=0.1).fit(...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fit(...) suggestion means that ... should be replaced with something meaningful. In this case:

iris = load_iris()
rfe = RFE(estimator=clf, n_features_to_select=-1, step=0.1)
with pytest.raises(ValueError, match=r"n_features_to_select must be *"):
    rfe.fit(iris.data, iris.target)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes of course, sorry, I should have seen that!

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few small updates to this PR. I have one more thought regarding error when n_features_to_select is a float and > 1.0.

sklearn/feature_selection/_rfe.py Outdated Show resolved Hide resolved
sklearn/feature_selection/_rfe.py Outdated Show resolved Hide resolved
sklearn/feature_selection/_rfe.py Show resolved Hide resolved
@lschwetlick
Copy link
Contributor Author

After taking a look at the trees code, I made the error message more explicit and added the case you asked for!

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @lschwetlick !

LGTM

@thomasjpfan thomasjpfan merged commit 911137f into scikit-learn:master Jun 27, 2020
@glemaitre glemaitre moved this from TO REVIEW to REVIEWED AND WAITING FOR CHANGES in Guillaume's pet Jul 7, 2020
@glemaitre glemaitre moved this from REVIEWED AND WAITING FOR CHANGES to MERGED in Guillaume's pet Jul 7, 2020
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Jul 17, 2020
…earn#16228) (scikit-learn#17090)

* added code from PR scikit-learn#14627

* fixed error handling None as n_features_to_select

* added test for error message and percentage passing

* linting

* more linting

* even more linting

* sklearn/feature_selection/_rfe.py make int casting simpler

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

* exchange redundant test case with small float

* add what's new

* removed useless import

* lint

* Fix whats_new/v0.24.rst

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* move whats new section

* disambiguate choice between 1 feature and 100% of features

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* move checking for negatives to _fit

* update error to include nones

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* markdown style in what's new

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* Update sklearn/feature_selection/tests/test_rfe.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* forgot to delete line

* fix test for negative n_features error

* BUG Fixes issues

* added case for float >1 and more detailed error messages

* lint

* more lint

* more lint

* CLN Minor adjustments

* BLD Force build on ci

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…earn#16228) (scikit-learn#17090)

* added code from PR scikit-learn#14627

* fixed error handling None as n_features_to_select

* added test for error message and percentage passing

* linting

* more linting

* even more linting

* sklearn/feature_selection/_rfe.py make int casting simpler

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

* exchange redundant test case with small float

* add what's new

* removed useless import

* lint

* Fix whats_new/v0.24.rst

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* move whats new section

* disambiguate choice between 1 feature and 100% of features

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* move checking for negatives to _fit

* update error to include nones

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* markdown style in what's new

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* Update sklearn/feature_selection/tests/test_rfe.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* forgot to delete line

* fix test for negative n_features error

* BUG Fixes issues

* added case for float >1 and more detailed error messages

* lint

* more lint

* more lint

* CLN Minor adjustments

* BLD Force build on ci

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

RFI: Setting the number of features to select 'n_features_to_select' to be percentage
4 participants