Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] FEA Add sequential feature selection transformer #17159

Merged
merged 123 commits into from Aug 21, 2020

Conversation

NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented May 7, 2020

This PR adds a SequentialFeatureSelector transformer for forward and backward sequential selection.

Reboots and Closes #8684 @rasbt
Closes #6545

  • Docs and User Guide
  • Example
  • Tests

To minimize reviewing efforts, I made this a minimal working version. In particular there is no support for setting n_features_to_select into a range. This should be reasonably easy to implement but I want to leave that for subsequent PRs.

@NicolasHug
Copy link
Member Author

TLDR: Are we OK to change the linux 32bits CI from using atlas to using blas + lapack?


check_estimators_dtypes() had been failing for a while with the new SequentialFeatureSelector, on the linux 32bits CI. While I don't fully understand the issue, it's related to the call to svd in LinearDiscriminantAnalysis which eventually leads to:

On entry to DGESDD parameter number 10 had an illegal value

Switching from atlas to blas + lapack has solved the issue. It required changing the tolerance of a recently introduced (and unrelated) check in RBM.

CC @ogrisel @thomasjpfan @jeremiedbb @rth

@thomasjpfan
Copy link
Member

check_estimators_dtypes() had been failing for a while with the new SequentialFeatureSelector, on the linux 32bits CI. While I don't fully understand the issue, it's related to the call to svd in LinearDiscriminantAnalysis which eventually leads to:

Can we extract out the dataset that causes LinearDiscriminantAnalysis to segfault?

@NicolasHug
Copy link
Member Author

Can we extract out the dataset that causes LinearDiscriminantAnalysis to segfault?

IDK, can we SSH into the azure instances?

@thomasjpfan
Copy link
Member

thomasjpfan commented Aug 7, 2020

I do not know how much we want to support support atlas on 32 bit. I think we would need some historical context to answer that question.

Edit: The "hack" would be to change the seed in

bias=5.0, noise=20, random_state=42,
and see what happens.

@NicolasHug
Copy link
Member Author

Edit: The "hack" would be to change the seed

were you able to reproduce @thomasjpfan, or is this a wild guess?

@thomasjpfan
Copy link
Member

Wild guess.

@thomasjpfan
Copy link
Member

thomasjpfan commented Aug 8, 2020

I ran

python -m pytest --showlocals --durations=20 --junitxml=test-data.xml \
--pyargs sklearn.tests.test_common -k check_estimators_dtypes -v

with altas here and it passed. This is a very strange bug.

@NicolasHug
Copy link
Member Author

Is there any difference between our CI and the instance you ran @thomasjpfan ? (apart from the -k part)?

This is a very strange bug.

Indeed... not to mention that the 32bits thing started failing after a commit that was just merging with master.

@NicolasHug
Copy link
Member Author

? @thomasjpfan

@thomasjpfan
Copy link
Member

32 bit is not failing anymore with the atlas.

OSX just timeouted. (This has been happening from time to time)

@thomasjpfan
Copy link
Member

Just reran the osx test and it did not hang. All the tests pass now.

@NicolasHug
Copy link
Member Author

what happened? The package version didn't even change, did it?

@NicolasHug
Copy link
Member Author

Looks like the atlas issue was randomly fixed...

I addressed all you comments @glemaitre, ready to merge?

@thomasjpfan
Copy link
Member

Looks like the atlas issue was randomly fixed...

I wasn't able to reproduce the bug when I was debugging it the other day. The "biggest" change was that pytest-xdist is not installed anymore on the 32bit instance.

@glemaitre glemaitre merged commit 5bf37e8 into scikit-learn:master Aug 21, 2020
1 check was pending
@glemaitre
Copy link
Contributor

Only made a numpydoc change. Thanks @rasbt and @NicolasHug

@glemaitre glemaitre moved this from TO REVIEW to TO BE MERGED in Guillaume's pet Aug 24, 2020
@glemaitre glemaitre moved this from TO BE MERGED to MERGED in Guillaume's pet Aug 24, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Co-authored-by: rasbt <mail@sebastianraschka.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

Add forward selection to scikit-learn
7 participants