Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Implemented SelectFromModel meta-transformer #4242

Merged
merged 11 commits into from
Oct 11, 2015

Conversation

MechCoder
Copy link
Member

Continuation of #3011

@MechCoder MechCoder changed the title Implemented SelectFromModel meta-transformer [WIP] Implemented SelectFromModel meta-transformer Feb 12, 2015
@MechCoder
Copy link
Member Author

@amueller @agramfort Do you agree with the general direction in which this PR is going?

Basically a meta-transformer is coupled with an estimator to provide the transform methods, instead of inheriting from _LearntSelectorMixin.

@agramfort
Copy link
Member

hum. What's the vision? too fuzzy for me now

@ogrisel
Copy link
Member

ogrisel commented Feb 13, 2015

Can you please summarize the discussion of #3011 in the description of this PR?

@MechCoder
Copy link
Member Author

Sure, just give me some time.

@jnothman
Copy link
Member

#3011 was meant to pilot the idea of having selection based on coefficients/importances performed by a metaestimator rather than a mixin to predictors. Part of the point is not to clutter the individual estimators' APIs with parameters only used in rare cases. Instead, a metaestimator should:

  • make code much more explicit as to the meaning of the transformation (i.e. constructing a metaestimator provides intrinsic documentation, while using a predictor as a transformer without comment is a bit awkward);
  • provide a focal point for documenting what has become a fairly sophisticated threshold argument, and;
  • potentially make it easier to play with thresholds over a pre-fitted model (although the interface for this is new territory without my magic freeze_model wrapper, without non-standardly adding a parameter to transform.

By way of Zen tradeoffs, it prefers "explicit is better than implicit" over "flat is better than nested". Is the vision clearer, @agramfort?

Also, it's possible @maheshakya had no intention to continue working on this, but @MechCoder it may be wise to check before simply taking it over, even if it's stale.

@jnothman
Copy link
Member

The main immediate benefits of this are: there's no need to move the threshold parameter from transform to the class to make it gridsearchable; there's no need to add the mixin to all classes that have feature_importances_ or coef_ for the sake of consistency.

@jnothman
Copy link
Member

@ogrisel, I think most of the discussion on #3011 was implementation detail. A surprising amount needs to be touched to make this complete as a deprecation, including examples, tests, etc.

@maheshakya
Copy link
Contributor

Thank you @MechCoder for bringing this up again. You can continue this :)
I was not able to complete it since we did not have enough opinion whether to change every example and test according to this and so. So, as @jnothman mentioned the amount of work that needs to be done on those is quite large. I think we should wait to hear what others have to say and for a final resolution about changes.

@agramfort
Copy link
Member

@jnothman I think I get the design decision but my question was more the end user point of view. Somehing along the line of "Users with be able to do XXX with estimators A, B and C by only adding this piece of code in ...".

@MechCoder
Copy link
Member Author

@jnothman @maheshakya Sorry for taking this PR over without saying. I incorrectly assumed that the lack of activity for around 6 months that the Pull Request is stale.

@agramfort A transform can meet two different things now.
1] Implementing the transform method in a transformer means transforming the sample points to a centroid space using the euclidean distances.
2] Inheriting from _LearntSelectorMixin, which means extracting the features of X which are below a certain threshold.

Though both are indeed transforming, inheriting from _LearntSelectorMixin classifies all models as transformers, which transform into different dimensional space. By using this metatransformer, such ambiguity is resolved.

So it goes like "Users will be able to extract the n most important features by wrapping around this Metatransformer with all estimators, without having to subclass from _LearntSelectorMixin"

@agramfort
Copy link
Member

agramfort commented Feb 16, 2015 via email

@jnothman
Copy link
Member

(It's more that stale doesn't strictly imply ripe for adoption. There may
be reasons why something is stale.)

On 17 February 2015 at 04:00, Alexandre Gramfort notifications@github.com
wrote:

ok got it. I like the idea. I guess we need to document this new
SelectFromModel class and explain its usage. It should also appear in
one of the examples.


Reply to this email directly or view it on GitHub
#4242 (comment)
.

@amueller
Copy link
Member

I agree with the direction, thought I didn't have too close a look. Is the idea to remove transform from the models that already have the mixin?

@MechCoder
Copy link
Member Author

Sorry, for the huge delay. I just had a look at the pull request today and found out all the hard work has already been done by @maheshakya . I've updated the pull request description to list out todo's about what all is left to be done.

I will not be able to work on this in the near future. I've updated the label as easy as this should be a very good issue for a new developer.

@amueller
Copy link
Member

amueller commented Sep 2, 2015

@MechCoder maybe this would be a good project for you to finish? It would be very helpful.

@MechCoder
Copy link
Member Author

OK, if you insist.

@MechCoder MechCoder force-pushed the select_from_model branch 4 times, most recently from 5dff9a9 to 22bac70 Compare September 3, 2015 03:39
@MechCoder
Copy link
Member Author

Thinking about it, is it necessary to test each and every model that has a coef_ and feature_importances_ attribute after fitting. I don't think so because the implementation in SelectFromModel is independent of the implementation of the underlying base estimators.

@amueller could you verify that the tests in test_from_model are enough?


:class:`SelectFromModel` is a meta-transformer that can be used along with any
estimator that has a ``coef_`` or ``feature_importances_`` attribute after fitting.
It should be given a ``threshold`` parameter below which the features are considered
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe remove "below"?

@amueller
Copy link
Member

amueller commented Oct 9, 2015

doctests raised some Function transform is deprecated warnings.

@amueller
Copy link
Member

amueller commented Oct 9, 2015

sklearn.utils.tests.test_estimator_checks.test_check_estimator also has one. the rest seems good. [you could raise instead to see if it happens in the tests]

@MechCoder MechCoder force-pushed the select_from_model branch 2 times, most recently from 822560e to 4f09146 Compare October 10, 2015 01:56
@MechCoder
Copy link
Member Author

@amueller thanks for your comments. The last commit should address all of them.

This can be both a fitted (if ``prefit`` is set to True)
or a non-fitted estimator.

threshold : string, float, optional default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is one of those cases where default None is not very helpful, but the description should say "By default, [this is its behaviour]..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mentioned that below. (and might be too long to write here)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that "default None" could be dropped, but say "By default,..." rather than "When None,..." below. But it's a really minor nitpick.

@jnothman
Copy link
Member

Once those are fixed up, it's good for merge (LGTM).

@MechCoder
Copy link
Member Author

@jnothman I'll merge after Travis passes?

@jnothman jnothman changed the title [MRG+1] Implemented SelectFromModel meta-transformer [MRG+2] Implemented SelectFromModel meta-transformer Oct 11, 2015
@jnothman
Copy link
Member

Yes, with two whats_new entries: one under API changes to tell people their transforms will be disappearing, and one under new features. thank you.

MechCoder added a commit that referenced this pull request Oct 11, 2015
[MRG+1] Implemented SelectFromModel meta-transformer
@MechCoder MechCoder merged commit 652b950 into scikit-learn:master Oct 11, 2015
@MechCoder MechCoder deleted the select_from_model branch October 11, 2015 03:15
@MechCoder
Copy link
Member Author

Thanks a lot! @maheshakya @glouppe @amueller @jnothman

@amueller
Copy link
Member

🍻 was lange währt währt endlich gut. Thanks @MechCoder :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants