Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 #6181

Merged
merged 12 commits into from Oct 24, 2016

Conversation

@antoinewdg
Copy link
Contributor

@antoinewdg antoinewdg commented Jan 18, 2016

As @jnothman mentionned in the mailing list it would be nice to also add this option for RFE. The unit tests for RFE cover the case of a sparse coeff matrix, which I have trouble to handle, so I would need a little help if this were to be done.

@jnothman
Copy link
Member

@jnothman jnothman commented Jan 18, 2016

Thanks! Unfortunately, norm along an axis is not supported in the earliest version of numpy we support. You could add an axis-supporting variant to sklearn.utils.fixes. Or reuse sklearn.preprocessing.normalize which supports fewer ord options.

Also, please add tests.

@MechCoder MechCoder changed the title Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 [MRG] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 Jan 20, 2016
@@ -22,7 +22,8 @@ def _get_feature_importances(estimator):
importances = np.abs(estimator.coef_)

else:
importances = np.sum(np.abs(estimator.coef_), axis=0)
importances = np.linalg.norm(estimator.coef_,

This comment has been minimized.

@MechCoder

MechCoder Jan 20, 2016
Member

As @jnothman says, you can use sklearn.preprocessing.normalize to handle ord not being supported in older versions of NumPy

@@ -173,6 +174,10 @@ class SelectFromModel(BaseEstimator, SelectorMixin):
Otherwise train the model using ``fit`` and then ``transform`` to do
feature selection.
norm_order : non-zero int, inf, -inf, default 1
Order of the norm used to compare the vectors of coefficients in the
case where the coeff_ attribute of the estimator is of dimension 2.

This comment has been minimized.

@MechCoder

MechCoder Jan 20, 2016
Member

I think we can just use the word features here since there can also be feature_importances_

This comment has been minimized.

@antoinewdg

antoinewdg Jan 25, 2016
Author Contributor

Since there is no checking of the dimension of feature_importances_ in the code, I assumed this would always be 1 dimensional and that the norm_order parameter would always be irrelevant when feature_importances_ is used.

This comment has been minimized.

@MechCoder

MechCoder Jan 25, 2016
Member

Oh yes, that is true, sorry for the noise

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Jan 20, 2016

For the test, I would just say to

  1. Fit SelectFromModel on multi-target data, transform the data
  2. Fit the same base estimator on the same data, manually compute the mask by comparing norm(coefficients) < threshold, mask the data
    Check 1] and 2] are same. Similar to (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/tests/test_from_model.py#L73)
@jnothman
Copy link
Member

@jnothman jnothman commented Jan 20, 2016

(I think this is more about multiclass than multi-target.) You could
alternatively create a dummy estimator where fit() stores a fixed coef_.

On 21 January 2016 at 08:24, Manoj Kumar notifications@github.com wrote:

For the test, I would just say to

  1. Fit SelectFromModel on multi-target data, transform the data
  2. Fit the same base estimator on the same data, manually compute the mask
    by comparing norm(coefficients) < threshold, mask the data
    Check 1] and 2] are same. Similar to (
    https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/tests/test_from_model.py#L73
    )


Reply to this email directly or view it on GitHub
#6181 (comment)
.

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Jan 20, 2016

Were you referring to the common use-case or to the test that I linked to?

@jnothman
Copy link
Member

@jnothman jnothman commented Jan 20, 2016

Probably taking your comments out of context.

On 21 January 2016 at 08:57, Manoj Kumar notifications@github.com wrote:

Were you referring to the common use-case or to the test that I linked to?


Reply to this email directly or view it on GitHub
#6181 (comment)
.

@jnothman
Copy link
Member

@jnothman jnothman commented Jan 20, 2016

Is multi-target supported here?

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Jan 20, 2016

It seems so, although untested.

@jnothman
Copy link
Member

@jnothman jnothman commented Jan 21, 2016

Remind me what the shape of multitarget multiclass coef_ is?

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Jan 21, 2016

Oh I meant multitarget/multioutput in a regression setting, not "multitarget multiclass"

@jnothman
Copy link
Member

@jnothman jnothman commented Jan 21, 2016

Oh, okay.

On 21 January 2016 at 14:21, Manoj Kumar notifications@github.com wrote:

Oh I meant multitarget in a regression setting, not "multitarget
multiclass"


Reply to this email directly or view it on GitHub
#6181 (comment)
.

@antoinewdg
Copy link
Contributor Author

@antoinewdg antoinewdg commented Jan 25, 2016

Thank you for all these pieces of advice, if all goes well I will update the pull request within this week.

@antoinewdg
Copy link
Contributor Author

@antoinewdg antoinewdg commented Jan 26, 2016

I chose to implement a fix for the norm function in sklearn.utils.fixes inspired by sklearn.preprocessing.normalize The code in normalize is consequently not really DRY, is it OK to refractor it to use the fix ?

@antoinewdg
Copy link
Contributor Author

@antoinewdg antoinewdg commented Jan 26, 2016

It seems I got confused with the numpy version to check when implementing the fix (keepdims only exists in numpy 1.10), I'll fix this as soon as possible.


if np_version < (1, 10):

def norm(X, ord=None, axis=None, keepdims=False):

This comment has been minimized.

@MechCoder

MechCoder Feb 2, 2016
Member

The keepdims argument seems to be YAGNI to me atleast in this context.

This comment has been minimized.

@antoinewdg

antoinewdg Feb 4, 2016
Author Contributor

Now that you mention it, it seems silly to have bothered with it !

"""
Handles the axis and keepdims parameters for the norm function in
old versions of numpy.
WARNING: Only tested for 2 dimensions, do not use with 1 dimensional arrays

This comment has been minimized.

@MechCoder

MechCoder Feb 2, 2016
Member

Why not just test it? :P

This comment has been minimized.

@antoinewdg

antoinewdg Feb 4, 2016
Author Contributor

Good point, I'll change that

X = X.T

norm = np.zeros(X.shape[0])
for i in range(len(norm)):

This comment has been minimized.

@MechCoder

MechCoder Feb 2, 2016
Member

nitpick: You can enumerate over X

This comment has been minimized.

@antoinewdg

antoinewdg Feb 4, 2016
Author Contributor

I'm sorry I don't get it..

This comment has been minimized.

@maniteja123

maniteja123 Feb 4, 2016
Contributor

I am not an expert but I suppose what was implied is using enumerate to iterate X. Hope it helps. Please correct me if I am missing something.

This comment has been minimized.

@antoinewdg

antoinewdg Feb 4, 2016
Author Contributor

Yes but why and where ? Is the way I chose to implement it inneficient ? I chose to use np.linalg.norm on each line so that I don't have to bother with the ord parameter, I don't see how I could use enumerate without having to write one case for each possible ord.

@@ -173,6 +174,10 @@ class SelectFromModel(BaseEstimator, SelectorMixin):
Otherwise train the model using ``fit`` and then ``transform`` to do
feature selection.
norm_order : non-zero int, inf, -inf, default 1
Order of the norm used to compare the vectors of coefficients in the

This comment has been minimized.

@MechCoder

MechCoder Feb 2, 2016
Member

compare the vectors of coefficients -> filter below threshold?

if axis == 1:
norm = norm.T

return norm

This comment has been minimized.

@MechCoder

MechCoder Feb 2, 2016
Member

norm is also the name of the function

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Feb 2, 2016

LGTM pending comments

@MechCoder
Copy link
Member

@MechCoder MechCoder commented Feb 5, 2016

[MRG] -> [MRG+1]

@MechCoder MechCoder changed the title [MRG] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 [MRG+1] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 Feb 5, 2016
@MechCoder
Copy link
Member

@MechCoder MechCoder commented Apr 13, 2016

@jnothman Merge?

@@ -173,6 +174,11 @@ class SelectFromModel(BaseEstimator, SelectorMixin):
Otherwise train the model using ``fit`` and then ``transform`` to do
feature selection.
norm_order : non-zero int, inf, -inf, default 1
Order of the norm used to filter the vectors of coefficients below
``threshold`` in the case where the ``coeff_`` attribute of the

This comment has been minimized.

@jnothman

jnothman Apr 14, 2016
Member

*coef_ (one f)

result = np.linalg.norm(X, ord=ord)
return result

if axis == 0:

This comment has been minimized.

@jnothman

jnothman Apr 14, 2016
Member

Should we implement the more general case for a nd-array with integer axis using rollaxis to move axis to the last slot, then reshape so that it's 2d? Alternatively, and more simply, we should explicitly raise NotImplementedError for axis not in (0, 1)

@jnothman
Copy link
Member

@jnothman jnothman commented Apr 14, 2016

  1. Please handle that not (0, 1) case
  2. Fix that typo
  3. Add an entry in doc/whats_new.rst
  4. LGTM; Merge!
@amueller
Copy link
Member

@amueller amueller commented Oct 11, 2016

any updates on this?

antoinewdg added 2 commits Oct 22, 2016
# Conflicts:
#	sklearn/feature_selection/tests/test_from_model.py
#	sklearn/utils/fixes.py
#	sklearn/utils/tests/test_fixes.py
@antoinewdg
Copy link
Contributor Author

@antoinewdg antoinewdg commented Oct 23, 2016

Here are the last fixes, sorry about this ridiculous delay.

@amueller
Copy link
Member

@amueller amueller commented Oct 24, 2016

@antoinewdg don't sweat it, recently found an issue from 2013 that I forgot to reply to ;)

@@ -419,3 +419,33 @@ def __getstate__(self):
self._fill_value)
else:
from numpy.ma import MaskedArray # noqa

if 'axis' not in signature(np.linalg.norm).parameters:

This comment has been minimized.

@amueller

amueller Oct 24, 2016
Member

Can you maybe add the numpy version this was added in a comment?

@amueller
Copy link
Member

@amueller amueller commented Oct 24, 2016

Other than that, LGTM, too

@amueller amueller changed the title [MRG+1] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 [MRG+2] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 Oct 24, 2016
@antoinewdg antoinewdg force-pushed the antoinewdg:master branch from e5593a6 to b4e0127 Oct 24, 2016
@antoinewdg
Copy link
Contributor Author

@antoinewdg antoinewdg commented Oct 24, 2016

Sure! I'm not really sure what version that is though. The docs start mentioning it from 1.8 so I'll go with that, can't find any patch note about that.

@antoinewdg
Copy link
Contributor Author

@antoinewdg antoinewdg commented Oct 24, 2016

While looking for patch notes I stumbled upon this stackoverflow thread.
This makes me kinda sad, but maybe using einsum directly instead of implementing a whole fix is simpler ?

EDIT: no this is stupid, we would have to make a big 'switch' over the order then.

@amueller
Copy link
Member

@amueller amueller commented Oct 24, 2016

Yeah you should go with when the docs mention it. I'm confused on why you would do einsum instead of the transpose but whatever. The patch looks good :)

@amueller amueller merged commit 74a9756 into scikit-learn:master Oct 24, 2016
2 of 3 checks passed
2 of 3 checks passed
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
…ntSelectorMixin) scikit-learn#2121 (scikit-learn#6181)

* Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) scikit-learn#2121

* safe_pwr utility

* Norm fix

* Removed safe_pwr

* 1D arrays support for norm fix

* Test case for 2d coef in SelectFromModel

* Fix numpy version requirement for norm fix

* Implement fixes suggested by @jnothman

* Add numpy version requiring the fix.
Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…ntSelectorMixin) scikit-learn#2121 (scikit-learn#6181)

* Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) scikit-learn#2121

* safe_pwr utility

* Norm fix

* Removed safe_pwr

* 1D arrays support for norm fix

* Test case for 2d coef in SelectFromModel

* Fix numpy version requirement for norm fix

* Implement fixes suggested by @jnothman

* Add numpy version requiring the fix.
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…ntSelectorMixin) scikit-learn#2121 (scikit-learn#6181)

* Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) scikit-learn#2121

* safe_pwr utility

* Norm fix

* Removed safe_pwr

* 1D arrays support for norm fix

* Test case for 2d coef in SelectFromModel

* Fix numpy version requirement for norm fix

* Implement fixes suggested by @jnothman

* Add numpy version requiring the fix.
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…ntSelectorMixin) scikit-learn#2121 (scikit-learn#6181)

* Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) scikit-learn#2121

* safe_pwr utility

* Norm fix

* Removed safe_pwr

* 1D arrays support for norm fix

* Test case for 2d coef in SelectFromModel

* Fix numpy version requirement for norm fix

* Implement fixes suggested by @jnothman

* Add numpy version requiring the fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.