[MRG] Issue #6673: Make a wrapper around functions that score an individual feature #8038
Conversation
I don't get the point of this.
Well, I am looking forward to your instructions on what to do next.
But what's it for? It isn't related to the wrapper described in #6673's second issue, as far as I can tell.
I have made this function, which takes 3 inputs: a scoring function, X, and y (=None). Now feature selectors like SelectKBest take the scoring function as a parameter, and the base class calls the scoring function through the wrapper. I feel I am missing something. Please guide me on what exactly is expected by Issue 2. Thanks a lot for your help.
Yes, but can you give me some example of where this is useful? I think the proposal in issue 2 was that it should allow you to translate a function that operates over a feature vector into one that operates over a matrix. I'm not sure it's entirely necessary, personally.
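The translation described here — lifting a function that scores one feature vector into one that scores every column of a matrix — can be sketched as follows (the name `apply_columnwise` and the use of `pearsonr` are illustrative, not code from the PR):

```python
import numpy as np
from scipy.stats import pearsonr

def apply_columnwise(score_func, X, y):
    # Call a per-feature scoring function once per column of X and
    # collect the (score, p_value) pair returned for each feature.
    pairs = [score_func(X[:, i], y) for i in range(X.shape[1])]
    scores, pvalues = zip(*pairs)
    return np.asarray(scores), np.asarray(pvalues)

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
scores, pvalues = apply_columnwise(pearsonr, X, y)
# scores is approximately [1.0, 0.6]
```

This is the "matrix in, per-feature scores out" shape that the feature selectors consume.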
I am not sure about the necessity of this function myself; I have only tried to achieve what was required in the issue, and I think it went wrong. If you say so, I will start working on the issue's proposal as you described above.
@hlin117 your input would be welcome
Yes, you're correct about the objective of the issue. As for the necessity, it's not necessary; however, it does act as a convenience function. Otherwise, say my scoring function was
Got any test cases?
sklearn/feature_selection/base.py
Outdated
@@ -11,7 +11,7 @@
from scipy.sparse import issparse, csc_matrix

from ..base import TransformerMixin
from ..utils import check_array, safe_mask
from ..utils import check_array, safe_mask, check_X_y
Nit: Make this alphabetical; check_X_y should come before safe_mask.
sklearn/feature_selection/base.py
Outdated
def wrapper_scorer(score_func, X, y=None):
    """ A wrapper function around score functions. This function takes as
The first line of a docstring should be a one sentence summary. See PEP 257
        The target values (class labels in classification, real numbers in
        regression).
Notes |
Reading this docstring, maybe there's a disconnect of how you think this function wrapper would be useful. It would help you - and people using your code - to provide an example of how they should expect to use it.
sklearn/feature_selection/base.py
Outdated
    """
    if not callable(score_func):
        raise TypeError("The score function should be a callable, %s (%s) "
Scikit-learn typically raises ValueErrors for these kinds of things.
sklearn/feature_selection/base.py
Outdated
    if y is None:
        X = check_array(X, ('csr', 'csc'))
    else:
        X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)
You can change ['csr', 'csc'] to ('csr', 'csc'). Tuples have less overhead than lists.
Also, in my opinion, you should add the sparsity test cases later. Try getting this PR approved for dense matrices before you move on to adapt it to sparse matrices.
sklearn/feature_selection/base.py
Outdated
    if y is None:
        score_func_ret = score_func(X)
    else:
        score_func_ret = score_func(X, y)
Reviewing this code, it looks like you have a lot of special-casing for when the return value of the function is a pair rather than a single value. I'd suggest separating the logic out into two helper functions.
    y : array-like or None, shape = [n_samples]
        The target values (class labels in classification, real numbers in
        regression).
You should add docs of what this function returns.
sklearn/feature_selection/base.py
Outdated
    if pvalues is None:
        return scores
    else:
        return scores, pvalues
Hmm, looking at these return values, I don't think you're understanding what issue 2 of #6673 is describing. We're just looking for a way to wrap this scoring function; the output of this function should be another callable. And this callable can then be passed, for example, to SelectKBest.
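The factory shape being asked for — a wrapper whose output is itself a callable with the (X, y) -> (scores, pvalues) signature that SelectKBest expects — might look like this sketch (`make_featurewise_scorer` and `toy_score` are hypothetical names, not code from the PR):

```python
import numpy as np

def make_featurewise_scorer(score_func):
    # Wrap a per-feature scoring function into a new callable that
    # applies it to each column of X and stacks the results.
    def scorer(X, y):
        X = np.asarray(X)
        pairs = [score_func(X[:, i], y) for i in range(X.shape[1])]
        scores, pvalues = zip(*pairs)
        return np.asarray(scores), np.asarray(pvalues)
    return scorer

# Toy per-feature score: mean absolute difference from y, dummy p-value.
def toy_score(column, y):
    return float(np.abs(column - y).mean()), 1.0

scorer = make_featurewise_scorer(toy_score)
X = np.array([[1.0, 5.0], [2.0, 6.0]])
y = np.array([1.0, 2.0])
scores, pvalues = scorer(X, y)
# scores: [0.0, 4.0]
```

Because the result is a plain `(X, y) -> (scores, pvalues)` callable, it can be handed directly to SelectKBest as its score_func.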
@@ -17,7 +17,7 @@
                     safe_mask)
from ..utils.extmath import norm, safe_sparse_dot, row_norms
from ..utils.validation import check_is_fitted
from .base import SelectorMixin
from .base import SelectorMixin, wrapper_scorer
What are you doing in this file? I think you're making this PR far too big, and it's diverging in scope.
Maybe the issue was incorrect. I don't think there's any need for a wrapper scorer. The code already handles the case where the return value is not a tuple; a wrapper is not needed. See documentation for
Yes @jnothman. But the second part of the issue is the actual one. That is what I have tried to solve here, a wrapper for
Sorry. Got lost over the last few weeks.
I must admit, the name
What about feature_wise_scorer(pearsonr)(X, y) or feature_wise_stat_scorer(pearsonr)(X, y)?
I don't mind feature_wise_scorer. Perhaps even feature_scorer or make_feature_scorer.
I'm eager to see this issue addressed! Thank you for working on it, @amanp10.
Do you have a suggestion for a name?
Should I finally name it feature_wise_scorer(pearsonr)(X, y)? Also, should I add a test or make the final changes by changing the whats_new.rst and commit?
You can change the name, but the name may yet be disputed before this is
merged! Your work will be approved by two core devs before it is merged.
You should add a test in any case, and you are welcome to add an entry in
what's new, though we usually do that closer to merge to ensure that things
(the target version, the scope of functionality) don't change too much.
I need some help here. I am unable to figure out why the ci/circleci tests are failing. It reports a problem somewhere in an example.
I'm still not convinced that users will often be helped by this or know to look for it... Some narrative docs under doc/modules/feature_selection.rst giving an example with spearmanr, for example, would help.
    if isinstance(score_func_ret, (list, tuple)):
        score, p_val = score_func_ret
    else:
        score = score_func_ret
Never run in tests
I think I will just scrap the else part, since I couldn't get a function returning only scores.
What should I do about this? I am not sure how to test the else part. Also, is it necessary to test it, since ScoreFunction is just a dummy function created for testing purposes?
Can't you just use lambda *args, **kwargs: spearmanr(*args, **kwargs)[0]?
sklearn/feature_selection/base.py
Outdated
for i in six.moves.range(X.shape[1]):
    score_func_ret = score_func(X[:, i], y, **kwargs)

    if isinstance(score_func_ret, (list, tuple)):
I suspect we should only support tuples here. Lists should have different meaning.
Also, add it to doc/modules/classes.rst
I think it's fair enough to keep the else.
…On 9 Jan 2018 7:50 pm, Aman Pratik commented on this pull request, in sklearn/feature_selection/tests/test_base.py:

def ScoreFunction(X, y, **kwargs):
    # Custom score function, using 'spearmanr' for returning 'scores' only.

    score = []
    p_val = []

    score_func_ret = spearmanr(X, y, **kwargs)

    if isinstance(score_func_ret, (list, tuple)):
        score, p_val = score_func_ret
    else:
        score = score_func_ret

I think I will just scrap the else part, since I couldn't get a function returning only scores.
:func:`featurewise_scorer` is a wrapper function which wraps around scoring
functions like `spearmanr`, `pearsonr` etc. from the `scipy.stats` module and
makes it usable for feature selection algorithms like :class:`SelectKBest`,
:class:`SelectPercentile` etc.
Clarify that it compares each column of X to y
doc/modules/feature_selection.rst
Outdated
makes it usable for feature selection algorithms like :class:`SelectKBest`,
:class:`SelectPercentile` etc.

The following example illustrates it's usage:
Drop the apostrophe
doc/modules/feature_selection.rst
Outdated
SelectKBest(k=10, score_func=...)
>>> new_X = skb.transform(X)
This wrapper function returns the absolute value of the scores i.e. a score of |
I'm not sure we should do this without an option
I think, in SelectKBest we are supposed to choose the features having maximum correlation with the target vector. In that case the magnitude of the scores returned (since they may be negative) should serve our purpose.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html
Yes, but that may not be true of all appropriate score functions
In that case, should we add a parameter like absolute_score which takes values True and False? If it's True (default), absolute scores would be considered. What do you say?
Sure, just call it abs.
'abs' is a Python built-in function, so I went with absolute_score.
Nothing wrong with using a built-in function name as a kwarg, but okay.
sklearn/feature_selection/base.py
Outdated
@@ -131,6 +131,8 @@ def featurewise_scorer(score_func, **kwargs):
        Function taking arrays X and y, and returning a pair of arrays
        (scores, pvalues) or a single array with scores. This function is also
        allowed to take other parameters as input.
    absolute_score : bool
        If True (default), the absolute value of the scores are returned.
Add that "this is useful when using correlation coefficients"
I remain +0.5 on this. I understand how it is helpful, but I suspect this kind of helper should not be in the library, but merely be present as an example. Note that the case in the example here can be implemented as:
def featurewise_spearmanr(X, y):
    scores, pvals = zip(*(spearmanr(x, y) for x in X.T))
    return np.abs(scores), pvals
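Filled out with imports and a usage line, that suggestion becomes self-contained (a sketch: the SelectKBest usage assumes the standard scikit-learn API, and the helper name is the one from the comment above with the spelling normalized):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest

def featurewise_spearmanr(X, y):
    # Score each column of X against y with spearmanr; take absolute
    # correlations so strongly negative features also rank highly.
    scores, pvals = zip(*(spearmanr(x, y) for x in X.T))
    return np.abs(scores), np.asarray(pvals)

X, y = make_classification(random_state=0)  # 100 samples, 20 features
X_new = SelectKBest(featurewise_spearmanr, k=10).fit_transform(X, y)
```

X_new keeps the 10 features with the largest absolute Spearman correlation, which is the behavior the proposed absolute_score=True default would give.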
Returns
-------
scores : array-like, shape (n_features,)
    Score values returned by the scoring function.
p_vals : array-like, shape (n_features,)
Mark this dependent on the score function
sklearn/feature_selection/base.py
Outdated
Parameters
----------
score_func : callable
    Function taking arrays X and y, and returning a pair of arrays
Doesn't it just return a pair of numbers?
I will change the entire description appropriately.
    allowed to take other parameters as input.
absolute_score : bool
    If True (default), the absolute value of the scores are returned,
    which is useful when using correlation coefficients.
Document kwargs also
X, y = make_classification(random_state=0)

# spearmanr from scipy.stats
skb = SelectKBest(featurewise_scorer(spearmanr, axis=0), k=10)
You should really be testing the new function alone. We already have checked that selectkbest works.
I meant to test whether the wrapper works as it is supposed to be used. I will try to change the tests to test the function alone.
@jnothman I am not able to comment on the necessity of this feature or its importance in the library. I think the other core devs might be able to help.
I would close this one. It adds code to core for something I would expect a user to be able to write. I agree with @jnothman here. Feel free to reopen if you disagree.
Fixes #6673
It adds a wrapper around scoring functions like 'mutual_info_classif', 'f_classif', etc. It takes as input (X, y), where y can be None.
I have worked on Issue 2, as I felt that Issue 1 has already been taken care of (please correct me if I am wrong). I have not added any extra tests for this wrapper function. Also, its usage is a little different from that mentioned in the Issue 2 example.