
[MRG] Issue #6673: Make a wrapper around functions that score an individual feature #8038

Closed · 38 commits

Conversation

@amanp10 (Contributor) commented Dec 11, 2016

Fixes #6673

It adds a wrapper around scoring functions like `mutual_info_classif`, `f_classif`, etc. It takes (X, y) as input, where y can be None.

I have worked on Issue 2, as I felt that Issue 1 has already been taken care of (please correct me if I am wrong). I have not added any extra tests for this wrapper function. Also, its usage differs slightly from the example in Issue 2.

@jnothman (Member)

I don't get the point of this.

@amanp10 (Contributor, Author) commented Dec 12, 2016

Well, I am looking forward to your instructions on what to do next.

@jnothman (Member)

But what's it for? It doesn't relate to the wrapper described in #6673's second issue, as far as I can tell.

@amanp10 (Contributor, Author) commented Dec 12, 2016

I have made this function take three inputs: a scoring function, X, and y (default None). Feature selectors like SelectKBest take the scoring function as a parameter, and the base class then calls the scoring function through the wrapper.
The usage differs from the Issue 2 example in that SelectKBest takes the scoring function, not the wrapper function, as its parameter; the base class uses the wrapper internally.
Also, negative score values returned by scoring functions are changed to 0 in the wrapper function.

I feel I am missing something. Please guide me on what exactly Issue 2 expects. Thanks a lot for your help.

@jnothman (Member)

> I have made this function which takes 3 inputs, scoring function, X and y(=None). Now the feature selectors like SelectKBest take as parameter the scoring function and then the base class calls the scoring function through the wrapper.

Yes, but can you give me an example of where this is useful?

I think the proposal in issue 2 was that it should allow you to translate a function that operates over a feature vector into one that operates over a matrix. I'm not sure it's entirely necessary, personally.

@amanp10 (Contributor, Author) commented Dec 14, 2016

I am not sure about the necessity of this function myself; I only tried to achieve what was required in the issue, and I think it went wrong. If you say so, I will start working on the issue's proposal as you described above.

@jnothman (Member)

@hlin117 your input would be welcome

@hlin117 (Contributor) commented Dec 17, 2016

> I have made this function which takes 3 inputs, scoring function, X and y(=None). Now the feature selectors like SelectKBest take as parameter the scoring function and then the base class calls the scoring function through the wrapper.

> Yes, but can you give me some example of where this is useful??

> I think the proposal in issue 2 was that it should allow you to translate a function that operates over a feature vector into one that operates over a matrix. I'm not sure it's entirely necessary, personally.

Yes, you're correct about the objective of the issue. As for the necessity: it's not strictly necessary, but it does act as a convenience function.

Otherwise, say my scoring function is scipy.stats.pearsonr and I want to use SelectKBest with this scoring function. With the current scikit-learn framework, I have no way of doing this.
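The situation described here can be sketched concretely (hypothetical helper code, not scikit-learn API): scipy.stats.pearsonr compares two 1-D vectors, so it cannot be passed to SelectKBest directly, but a small wrapper that applies it column by column and returns (scores, pvalues) can. The name `featurewise_scorer` follows the naming settled on later in this thread.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest


def featurewise_scorer(score_func, **kwargs):
    """Hypothetical wrapper: apply a per-feature statistic to each column."""
    def scorer(X, y):
        X = np.asarray(X)
        # Each call scores one column against y; pearsonr returns (r, p).
        scores, pvalues = zip(*(score_func(X[:, i], y, **kwargs)
                                for i in range(X.shape[1])))
        # Absolute values so strong negative correlations also rank highly.
        return np.abs(scores), np.asarray(pvalues)
    return scorer


X, y = make_classification(n_features=20, random_state=0)
skb = SelectKBest(featurewise_scorer(pearsonr), k=10)
X_new = skb.fit_transform(X, y)
print(X_new.shape)  # (100, 10)
```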

@hlin117 (Contributor) left a review:

Got any test cases?

@@ -11,7 +11,7 @@
from scipy.sparse import issparse, csc_matrix

from ..base import TransformerMixin
from ..utils import check_array, safe_mask
from ..utils import check_array, safe_mask, check_X_y
Reviewer comment (Contributor):

Nit: Make this alphabetical, check_X_y should come before safe_mask.



def wrapper_scorer(score_func, X, y=None):
""" A wrapper function around score functions. This function takes as
Reviewer comment (Contributor):

The first line of a docstring should be a one sentence summary. See PEP 257

The target values (class labels in classification, real numbers in
regression).

Notes
Reviewer comment (Contributor):

Reading this docstring, maybe there's a disconnect of how you think this function wrapper would be useful. It would help you - and people using your code - to provide an example of how they should expect to use it.

"""

if not callable(score_func):
raise TypeError("The score function should be a callable, %s (%s) "
Reviewer comment (Contributor):

Scikit-learn typically raises ValueErrors for these kinds of things.

if y is None:
X = check_array(X, ('csr', 'csc'))
else:
X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)
Reviewer comment (Contributor):

You can change ['csr', 'csc'] to ('csr', 'csc'). Tuples have less overhead than lists.

Also, in my opinion, you should add in the sparsity test cases later. Try getting this PR approved for dense matrices before you move on to adapt it to sparse matrices.

if y is None:
score_func_ret = score_func(X)
else:
score_func_ret = score_func(X, y)
Reviewer comment (Contributor):

Reviewing this code, it looks like you have a lot of special casing for when the return values of the functions is a pair rather than a single value. I'd suggest separating the logic out into two helper functions.

y : array-like or None, shape = [n_samples]
The target values (class labels in classification, real numbers in
regression).

Reviewer comment (Contributor):

You should add docs of what this function returns.

if pvalues is None:
return scores
else:
return scores, pvalues
Reviewer comment (Contributor):

Hmm, looking at these return values, I don't think you're understanding what issue 2 of #6673 is describing. We're just looking for a way to wrap this scoring function; the output of this function should be another callable, and that callable can then be passed, for example, to SelectKBest.
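In rough form, the design described here is a higher-order function: the wrapper takes a per-feature score function and returns a new callable with the (X, y) signature SelectKBest expects. A minimal sketch (the name `per_feature` is invented for illustration):

```python
import numpy as np
from scipy.stats import spearmanr


def per_feature(score_func):
    # Return a callable rather than computing scores directly; the returned
    # function has the (X, y) -> (scores, pvalues) shape SelectKBest expects.
    def scorer(X, y):
        X = np.asarray(X)
        results = [score_func(X[:, i], y) for i in range(X.shape[1])]
        scores, pvalues = map(np.asarray, zip(*results))
        return scores, pvalues
    return scorer


rng = np.random.RandomState(0)
X, y = rng.rand(50, 3), rng.rand(50)
scorer = per_feature(spearmanr)   # a callable, not a result
scores, pvalues = scorer(X, y)
print(scores.shape, pvalues.shape)  # (3,) (3,)
```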

@@ -17,7 +17,7 @@
safe_mask)
from ..utils.extmath import norm, safe_sparse_dot, row_norms
from ..utils.validation import check_is_fitted
from .base import SelectorMixin
from .base import SelectorMixin, wrapper_scorer
Reviewer comment (Contributor):

What are you doing in this file? I think you're making this PR far too big, and it's diverging in scope.

@amanp10 (Contributor, Author) commented Jan 9, 2017

The work seems to be done; the example mentioned in the issue is working fine. However, I have not used another callable for the output as suggested by @hlin117, as it didn't feel necessary. Please correct me if I am wrong.
Also, should I add tests as well, @jnothman?

@jnothman (Member) commented Jan 9, 2017

Maybe the issue was incorrect. I don't think there's any need for a wrapper scorer. The code already handles the case where the return value is not a tuple, so a wrapper is not needed. See the documentation for score_func in SelectKBest and SelectPercentile.

@amanp10 (Contributor, Author) commented Jan 10, 2017

Yes, @jnothman, but the second part of the issue is the actual one. That is what I have tried to solve here: a wrapper for scipy.stats scoring functions, since they cannot be used directly by SelectKBest and the other selectors.

@jnothman (Member)

Sorry. Got lost over the last few weeks.

@jnothman (Member) commented Jan 10, 2017

I must admit, the name wrapper_scorer vastly diminishes the clarity of what's going on. How about make_per_feature(pearsonr)(X, y)? I'm not coming up with something perfect yet:

  • make_per_feature(pearsonr)(X, y)
  • make_per_column(pearsonr)(X, y)
  • make_columnwise(pearsonr)(X, y)
  • make_featurewise(pearsonr)(X, y)
  • per_feature(pearsonr)(X, y)
  • featurewise(pearsonr)(X, y)
  • stat_per_feature(pearsonr)(X, y)

@amanp10 (Contributor, Author) commented Jan 10, 2017

What about feature_wise_scorer(pearsonr)(X, y) or feature_wise_stat_scorer(pearsonr)(X, y)?

@jnothman (Member) commented Jan 10, 2017 via email

@hlin117 (Contributor) commented Jan 10, 2017

I'm eager to see this issue addressed! Thank you for working on it, @amanp10 .

@jnothman (Member) commented Jan 10, 2017 via email

@amanp10 (Contributor, Author) commented Jan 11, 2017

Should I finally name it feature_wise_scorer(pearsonr)(X, y)? Also, should I add a test, or make the final changes by updating whats_new.rst and committing?

@jnothman (Member) commented Jan 11, 2017 via email

@amanp10 (Contributor, Author) commented Oct 8, 2017

I need some help here. I am unable to figure out why the ci/circleci tests are failing. It reports a problem somewhere in an example.

@amanp10 (Contributor, Author) commented Dec 3, 2017

@jnothman @hlin117 I have removed the option for y=None, as I couldn't find any scoring functions that take only X as input. I wanted your views on it.

@jnothman (Member) left a review:

I'm still not convinced that users will often be helped by this or know to look for it... Some narrative docs under doc/modules/feature_selection.rst giving an example with spearmanr, for example, would help.

if isinstance(score_func_ret, (list, tuple)):
score, p_val = score_func_ret
else:
score = score_func_ret
Reviewer comment (Member):

Never run in tests

Reply (Contributor, Author):

I think I will just scrap the else part, since I couldn't get a function that returns only scores.

Reply (Contributor, Author):

What should I do about this? I am not sure how to test the else part. Also, is it necessary to test it, since ScoreFunction is just a dummy function created for testing purposes?

Reviewer comment (Member):

Can't you just use lambda *args, **kwargs: spearmanr(*args, **kwargs)[0]?
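The suggestion above, spelled out: wrapping spearmanr in a lambda that keeps only element 0 of its result yields a score function with no p-values, which is enough to exercise the scores-only branch in a test.

```python
import numpy as np
from scipy.stats import spearmanr

# Drop the p-value so the wrapped function returns scores only.
scores_only = lambda *args, **kwargs: spearmanr(*args, **kwargs)[0]

rng = np.random.RandomState(0)
x, y = rng.rand(30), rng.rand(30)
r = scores_only(x, y)          # a single correlation, no p-value
r_full, p = spearmanr(x, y)    # the original pair for comparison
print(np.isclose(r, r_full))  # True
```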

for i in six.moves.range(X.shape[1]):
score_func_ret = score_func(X[:, i], y, **kwargs)

if isinstance(score_func_ret, (list, tuple)):
Reviewer comment (Member):

I suspect we should only support tuples here. Lists should have different meaning.

@jnothman (Member) left a review:

Also, add it to doc/modules/classes.rst

@jnothman (Member) commented Jan 9, 2018 via email

:func:`featurewise_scorer` is a wrapper function which wraps around scoring
functions like `spearmanr`, `pearsonr` etc. from the `scipy.stats` module and
makes it usable for feature selection algorithms like :class:`SelectKBest`,
:class:`SelectPercentile` etc.
Reviewer comment (Member):

Clarify that it compares each column of X to y

makes it usable for feature selection algorithms like :class:`SelectKBest`,
:class:`SelectPercentile` etc.

The following example illustrates it's usage:
Reviewer comment (Member):

Drop the apostrophe

SelectKBest(k=10, score_func=...)
>>> new_X = skb.transform(X)

This wrapper function returns the absolute value of the scores i.e. a score of
Reviewer comment (Member):

I'm not sure we should do this without an option

Reply (Contributor, Author):

I think in SelectKBest we are supposed to choose the features having maximum correlation with the target vector. In that case, the magnitude of the scores (since they may be negative) should serve our purpose.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html

Reviewer comment (Member):

Yes, but that may not be true of all appropriate score functions

Reply (Contributor, Author):

In that case, should we add a parameter like absolute_score which takes True or False: if it's True (the default), absolute scores would be considered?
What do you say?
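The absolute_score switch proposed here could look roughly like this (a sketch only; `featurewise_scorer` and the parameter name follow this thread, not a released scikit-learn API):

```python
import numpy as np
from scipy.stats import spearmanr


def featurewise_scorer(score_func, absolute_score=True, **kwargs):
    def scorer(X, y):
        X = np.asarray(X)
        # Score each column; each call returns (correlation, pvalue).
        scores, pvalues = map(np.asarray, zip(*(
            score_func(X[:, i], y, **kwargs) for i in range(X.shape[1]))))
        if absolute_score:
            scores = np.abs(scores)
        return scores, pvalues
    return scorer


rng = np.random.RandomState(0)
X = rng.rand(40, 2)
y = -X[:, 0] + 0.1 * rng.rand(40)   # strongly anti-correlated with column 0
signed, _ = featurewise_scorer(spearmanr, absolute_score=False)(X, y)
absolute, _ = featurewise_scorer(spearmanr)(X, y)
print(signed[0] < 0, absolute[0] > 0)  # True True
```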

@jnothman (Member) commented Jan 10, 2018 via email

@amanp10 (Contributor, Author) commented Jan 10, 2018

abs is a built-in Python function, so I went with absolute_score.

@jnothman (Member) commented Jan 10, 2018 via email

@@ -131,6 +131,8 @@ def featurewise_scorer(score_func, **kwargs):
Function taking arrays X and y, and returning a pair of arrays
(scores, pvalues) or a single array with scores. This function is also
allowed to take other parameters as input.
absolute_score : bool
If True (default), the absolute value of the scores are returned.
Reviewer comment (Member):

Add that "this is useful when using correlation coefficients"

@jnothman (Member) left a review:

I remain +0.5 on this. I understand how it is helpful, but I suspect this kind of helper should not be in the library, but should merely be present as an example. Note that the case in the example here can be implemented as:

def featurewise_spearmanr(X, y):
    scores, pvals = zip(*(spearmanr(x, y) for x in X.T))
    return np.abs(scores), pvals
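Trying the inline helper sketched above end to end (using the `featurewise_spearmanr` spelling; assumes nothing beyond numpy, scipy, and scikit-learn):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest


def featurewise_spearmanr(X, y):
    # Score each column of X against y; spearmanr returns (correlation, pvalue).
    scores, pvals = zip(*(spearmanr(x, y) for x in X.T))
    return np.abs(scores), np.asarray(pvals)


X, y = make_classification(n_features=20, random_state=0)
X_new = SelectKBest(featurewise_spearmanr, k=5).fit_transform(X, y)
print(X_new.shape)  # (100, 5)
```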

-------
scores : array-like, shape (n_features,)
Score values returned by the scoring function.
p_vals : array-like, shape (n_features,)
Reviewer comment (Member):

Mark this dependent on the score function

Parameters
----------
score_func : callable
Function taking arrays X and y, and returning a pair of arrays
Reviewer comment (Member):

Doesn't it just return a pair of numbers?

Reply (Contributor, Author):

I will change the entire description appropriately.

allowed to take other parameters as input.
absolute_score : bool
If True (default), the absolute value of the scores are returned,
which is useful when using correlation coefficients.
Reviewer comment (Member):

Document kwargs also

X, y = make_classification(random_state=0)

# spearmanr from scipy.stats
skb = SelectKBest(featurewise_scorer(spearmanr, axis=0), k=10)
Reviewer comment (Member):

You should really be testing the new function on its own. We have already checked that SelectKBest works.

Reply (Contributor, Author):

I meant to test whether the wrapper works as it is supposed to be used. I will try to change the tests to test the function alone.

@amanp10 (Contributor, Author) commented Jan 14, 2018

@jnothman I am not able to comment on the necessity of this feature or its importance in the library. I think the other core devs might be able to help.

@agramfort (Member)

I would close this one. It adds code to the core for something I would expect a user to be able to write themselves.

I agree with @jnothman here. Feel free to reopen if you disagree.


4 participants