Deprecate randomized_l1 module #8995

Closed
amueller opened this Issue Jun 6, 2017 · 25 comments

amueller (Member) commented Jun 6, 2017

This module should be deprecated and scheduled for removal in 0.21 (or 0.22?). Ping @GaelVaroquaux @agramfort
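
For context, a deprecation of this sort typically amounts to wrapping the class with the 0.19-era sklearn.utils.deprecated helper, so that instantiation emits a warning carrying the removal schedule. A minimal sketch (the subclassing and message below are illustrative, not the actual patch):

```python
import warnings

from sklearn.linear_model import Lasso
from sklearn.utils import deprecated


# Illustrative only: the decorator makes instantiating the class emit a
# DeprecationWarning with the given message appended.
@deprecated("RandomizedLasso was deprecated in 0.19 and will be removed in 0.21.")
class RandomizedLasso(Lasso):
    pass


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    RandomizedLasso()

print(caught[0].message)  # the deprecation message above
```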

agramfort (Member) commented Jun 6, 2017

no strong feeling.

@GaelVaroquaux any feelings?

Sentient07 (Contributor) commented Jun 6, 2017

Can I go ahead with this?

amueller (Member) commented Jun 6, 2017

@Sentient07 sure, but pick one

agramfort (Member) commented Jun 6, 2017

@Sentient07 wait for @GaelVaroquaux to proceed here

Sentient07 (Contributor) commented Jun 6, 2017

I was about to push. I will wait.

GaelVaroquaux (Member) commented Jun 6, 2017

+1 to deprecate!

If you have the energy, it might be good to move them out to a separate package, or even a gist, so that the code is not lost.


jnothman (Member) commented Jun 7, 2017

@kmike, here will be one to test out eli5's handling of deprecated and disappearing estimators...

sebp (Contributor) commented Aug 20, 2017

Hi, I noticed that RandomizedLogisticRegression was deprecated in version 0.19, but couldn't find an explanation of the reasoning behind it. What is the preferred way to perform stability selection now?

jnothman (Member) commented Aug 22, 2017

I'd appreciate it if someone who advocated for deprecating this module (@amueller, @GaelVaroquaux? I know you're both not very available for the next while...) could put the justification on record here. (See also #9603.)

ghost commented Aug 22, 2017

Thanks @jnothman for linking to my question about it. I have been using the randomized versions of both Lasso and LogisticRegression and wonder why they are both being removed. If it is easy to implement them using other scikit-learn components, it would be great to have a code example. Thanks!
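
Picking up the request above: the core of stability selection can be approximated with scikit-learn primitives by refitting an L1-penalized model on random subsamples and counting how often each feature survives. A minimal sketch (the function name, parameter values, and threshold are illustrative, not a drop-in replacement for the deprecated estimators):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def stability_selection(X, y, n_resampling=200, sample_fraction=0.75, C=1.0,
                        random_state=0):
    """Return, per feature, the fraction of subsample fits in which its
    L1-penalized coefficient is nonzero (illustrative sketch)."""
    rng = np.random.RandomState(random_state)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_resampling):
        # Fit on a random subsample, as in Meinshausen and Buhlmann (2010).
        idx = rng.choice(n_samples, int(sample_fraction * n_samples),
                         replace=False)
        clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        clf.fit(X[idx], y[idx])
        counts += np.abs(clf.coef_).ravel() > 1e-10
    return counts / n_resampling


# Usage: keep features selected in at least 25% of the resamples.
# scores = stability_selection(X, y)
# support = scores >= 0.25
```

The original procedure also randomly rescales the per-feature penalty (the "randomized" part of randomized lasso); that refinement is omitted here for brevity.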



ghost commented Aug 23, 2017

+1 to moving it into feature_selection. Not having it there is probably the main reason people don't know about it - I only found it via a blog post about feature selection in scikit-learn. It would be a waste to remove it given that the code is already there.

milroy commented Aug 30, 2017

Hello, I'm still confused about the future of RandomizedLogisticRegression. Is it being removed entirely or just moved within sklearn? If it's being removed, I support @sebp, @jnothman and @JGH1000 in their request for an explanation. I've been using it successfully in some recent work, but will need to look for other algorithms or implementations if RLR is going away.

amueller (Member) commented Aug 30, 2017

My motivation for removing it was that in talking with @GaelVaroquaux and @agramfort, the original authors, they felt there were some maintenance issues and that it was not as helpful in practice as thought.

I don't really have an opinion on the matter. If many people find it useful, we can undo the deprecation. We could move it, too.

milroy commented Aug 30, 2017

Thanks for the additional information, @amueller. While my initial reaction is to support retaining or moving the module, I'm open to alternatives.

Without getting too far afield, @GaelVaroquaux and @agramfort, can you point to practical limitations of the algorithm from Meinshausen and Bühlmann, 2010? If so, were the maintenance issues related to the algorithm's limitations? Is there a more popular alternative for feature selection with high-dimensional data offered by sklearn? I'd be grateful for any insights you have.

GaelVaroquaux (Member) commented Aug 31, 2017

Finicky to hyperparameters.

I don't think that there are feature-selection methods that work in high dimension outside of easy cases (very sparse case, with uncorrelated design). I know that people will claim the opposite, but it's my experience. And a detailed read of publications will reveal limitations.

milroy commented Aug 31, 2017

> Finicky to hyperparameters.

In my experience the selected set of variables is unstable, particularly when varying n_resampling. I don't believe this parameter is mentioned in Meinshausen and Bühlmann, 2010. @GaelVaroquaux, in your experience, which other parameters produce varied results?

> I don't think that there are feature-selection methods that work in high dimension outside of easy cases (very sparse case, with uncorrelated design). I know that people will claim the opposite, but it's my experience. And a detailed read of publications will reveal limitations.

Can you list a few papers? That would be really helpful for my research.
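
The sensitivity to n_resampling described above is easy to probe directly with the 0.19-era estimator, while it still exists. A hedged sketch on synthetic data (dataset and parameter values are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RandomizedLogisticRegression  # deprecated in 0.19

X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

supports = []
for n in (20, 200):
    rlr = RandomizedLogisticRegression(n_resampling=n, random_state=0)
    rlr.fit(X, y)
    supports.append(set(np.flatnonzero(rlr.get_support())))

# Features selected under one n_resampling setting but not the other.
print(supports[0] ^ supports[1])
```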

yasserfarouk commented Oct 6, 2017

In my experience, RandomizedLinearRegression() was slightly unstable in its selection (small random variations of the inputs may lead to widely different selected features), yet if its scores are combined with those of other selectors, it improves the quality of the selected features (as measured by the generalization accuracy of a classifier trained on them). I believe it is still useful. Maybe it should live in feature_selection, though.
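
A sketch of the kind of combination described above, averaging the randomized selector's selection frequencies with rescaled univariate F-scores (RandomizedLogisticRegression is used for concreteness; the choice of second selector and the equal weighting are arbitrary):

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.linear_model import RandomizedLogisticRegression  # deprecated in 0.19


def combined_scores(X, y):
    # Selection frequencies from the randomized selector, already in [0, 1].
    randomized = RandomizedLogisticRegression(random_state=0).fit(X, y).scores_

    # Univariate F-scores, rescaled to [0, 1] so the two are comparable.
    f_scores, _ = f_classif(X, y)
    univariate = f_scores / f_scores.max()

    # Unweighted average of the two rankings; the weighting is a free choice.
    return (randomized + univariate) / 2.0
```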

prubbens commented Oct 9, 2017

I have also been using this method recently and have seen that, as @yasserfarouk says, the final performance of a classifier can be increased by selecting features according to the score the method produces. And indeed, the final score can be quite sensitive to certain parameters and initialization values, but this is not (necessarily) bad? But I agree that maybe it should be in a different package concerning feature selection (but maybe then the univariate feature selection methods should be too?)

On a side note, the paper from Meinshausen and Bühlmann has been cited 1171 times on Google Scholar (of which 52 papers contain the term 'scikit').

rosscleung commented Oct 13, 2017

I have also used this method in past projects, comparing the frequently selected features against feature_importances_ rankings from a random forest. While there are 1-2 features that differ between the two models' rankings, the top selected features are comparable if not the same. I use randomized lasso as another check for discrepancies with RandomForest's feature importance scores. I find it useful.

The code is there already, and I don't think there's much maintenance needed unless the Pandas library changes (correct me if I'm wrong). Let's keep it but move it over to feature selection :)

If anything, it is better than univariate feature selection methods, is it not?
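
A sketch of that cross-check, intersecting the top-k features by randomized-lasso selection frequency with the top-k by random-forest importance (the cutoff k and estimator settings are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import RandomizedLasso  # deprecated in 0.19


def top_k_overlap(X, y, k=10):
    # Rank features by randomized-lasso selection frequency...
    lasso_scores = RandomizedLasso(random_state=0).fit(X, y).scores_
    lasso_top = set(np.argsort(lasso_scores)[-k:])

    # ...and by random-forest impurity-based importances.
    forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    forest_top = set(np.argsort(forest.feature_importances_)[-k:])

    # Features the two rankings agree on.
    return lasso_top & forest_top
```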

michcio1234 commented Apr 5, 2018

I find it uncool that

  • there's doubt about whether this should be deprecated, yet
  • at the same time you force showing DeprecationWarnings.
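
For what it's worth, while the question is unresolved the warning can be silenced locally with the standard library. A minimal sketch (the data here is synthetic):

```python
import warnings

from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, random_state=0)

# The warning is raised when the estimator is instantiated, so an ignore
# filter scoped to this block suppresses it without hiding other warnings.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=DeprecationWarning)
    from sklearn.linear_model import RandomizedLasso
    RandomizedLasso().fit(X, y)
```
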
agramfort (Member) commented Apr 5, 2018

prubbens commented Apr 5, 2018

This sounds like a great idea!
