[MRG] Issue#6673:Make a wrapper around functions that score an individual feature #8038

Closed
wants to merge 38 commits into from
38 commits
a491eb1
Issue#6673
amanp10 Dec 11, 2016
d5bcc43
Issue#6673
amanp10 Dec 12, 2016
29c62d6
Merge branch 'master' of https://github.com/amanp10/scikit-learn into…
amanp10 Dec 12, 2016
3c6ce4d
Merge branch 'master' of https://github.com/amanp10/scikit-learn into…
amanp10 Dec 13, 2016
9ebd8eb
Merge branch 'master' of https://github.com/amanp10/scikit-learn into…
amanp10 Dec 14, 2016
08a459c
Merge branch 'master' of https://github.com/amanp10/scikit-learn into…
amanp10 Dec 18, 2016
d402b9c
Issue#6673 Removed some errors in previous PR
amanp10 Dec 24, 2016
08129f7
Merge branch 'master' of https://github.com/amanp10/scikit-learn into…
amanp10 Dec 24, 2016
c7341ed
Wrapper for scoring function added
amanp10 Jan 9, 2017
dec0bfd
Merge branch 'master' of https://github.com/amanp10/scikit-learn into…
amanp10 Jan 9, 2017
aef38db
Unit test added, whats_new.rst updated
amanp10 Jan 11, 2017
92660d5
Doctests and flake8 errors corrected
amanp10 Jan 11, 2017
6becbfe
Build errors corrected
amanp10 Jan 11, 2017
00b93e7
Suggested Corrections
amanp10 Jan 15, 2017
b57f9fe
Merge remote-tracking branch 'upstream/master' into newbranch
amanp10 Jan 26, 2017
f642178
Suggested changes
amanp10 Jan 26, 2017
30363f2
Build error corrections
amanp10 Jan 27, 2017
efd93d5
Merge branch 'master' of https://github.com/amanp10/scikit-learn into…
amanp10 Jan 27, 2017
3ed84e1
Merge branch 'master' of https://github.com/amanp10/scikit-learn into…
amanp10 Jan 29, 2017
08d3373
Merge branch 'master' into newbranch
amanp10 Feb 10, 2017
5ff8aae
Merge branch 'master' into newbranch
amanp10 Apr 6, 2017
5651f75
Resolving Conflicts
amanp10 May 25, 2017
30ebf96
Merge branch 'master' into newbranch
amanp10 Jun 14, 2017
4a2f098
Resolved_Conflicts_22072017
amanp10 Jun 22, 2017
7222458
Resolving_Conflicts2
amanp10 Jun 22, 2017
d23d9fd
Resolving_Conflicts3
amanp10 Jun 22, 2017
5f4ccbf
Resolve Conflict, Build error correction
amanp10 Sep 17, 2017
ea7e362
circleci error correction: plot_stock_market.py
amanp10 Dec 1, 2017
4ea8a8d
Merge branch 'master' into newbranch
amanp10 Dec 2, 2017
0fef58c
Travisci error correction
amanp10 Dec 2, 2017
becef97
Improve Coverage
amanp10 Dec 3, 2017
ea794a9
Merge branch 'newbranch' of https://github.com/amanp10/scikit-learn i…
amanp10 Dec 3, 2017
28cff94
Review Changes
amanp10 Jan 9, 2018
b91a5b7
Review Changes
amanp10 Jan 9, 2018
79b9418
Optional absolute score values
amanp10 Jan 10, 2018
8dd85d7
absolute_score description
amanp10 Jan 11, 2018
100649c
Review Changes
amanp10 Jan 12, 2018
9520d69
Merge branch 'master' into newbranch
amanp10 Dec 21, 2018
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -567,6 +567,7 @@ From text
   feature_selection.chi2
   feature_selection.f_classif
   feature_selection.f_regression
   feature_selection.featurewise_scorer
   feature_selection.mutual_info_classif
   feature_selection.mutual_info_regression
29 changes: 29 additions & 0 deletions doc/modules/feature_selection.rst
@@ -114,6 +114,35 @@ samples for accurate estimation.

* :ref:`sphx_glr_auto_examples_feature_selection_plot_f_test_vs_mi.py`

Wrapper for using SciPy score functions
---------------------------------------

The score functions in `scipy.stats` compare two 1-d vectors, for example a
single column of the input samples ``X`` with the target vector ``y``. The
feature selectors here, however, expect a scoring function that takes the
whole matrix ``X`` together with ``y`` and returns one score per feature. As a
result, score functions from `scipy.stats` cannot be passed directly to
selectors such as :class:`SelectKBest`.
:func:`featurewise_scorer` wraps scoring functions such as `spearmanr` and
`pearsonr` from the `scipy.stats` module: it applies the wrapped function to
each column of ``X`` against ``y`` and collects the results, which makes these
functions usable with :class:`SelectKBest`, :class:`SelectPercentile` etc.

Review comment (Member): Clarify that it compares each column of X to y


The following example illustrates its usage:

>>> from sklearn.feature_selection import featurewise_scorer, SelectKBest
>>> from scipy.stats import spearmanr
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(random_state=0)
>>> skb = SelectKBest(featurewise_scorer(spearmanr, axis=0), k=10)
>>> skb.fit(X, y) #doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
SelectKBest(k=10, score_func=...)
>>> new_X = skb.transform(X)

By default, this wrapper function returns the absolute value of each score,
i.e. a score of -1 is treated the same as +1. To keep the signed scores, set
``absolute_score=False``.
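
To illustrate why the absolute value matters for ranking, here is a minimal, dependency-free sketch. It is not the code proposed in this PR: `pearson`, `columnwise_scores` and `top_k` are hypothetical helper names introduced only for this illustration, and the data is made up so that one column is strongly anti-correlated with ``y`` while the other is only weakly correlated.

```python
import math


def pearson(u, v):
    """Plain-Python Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def columnwise_scores(score_func, X, y, absolute_score=True):
    """Score every column of X (a list of rows) against y (hypothetical helper)."""
    scores = []
    for column in zip(*X):  # zip(*X) iterates over the columns of X
        score = score_func(column, y)
        scores.append(abs(score) if absolute_score else score)
    return scores


def top_k(scores, k):
    """Indices of the k largest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Column 0 decreases exactly as y increases; column 1 only loosely follows y.
X = [[4.0, 1.0],
     [3.0, 3.0],
     [2.0, 2.0],
     [1.0, 2.5]]
y = [1.0, 2.0, 3.0, 4.0]

signed = columnwise_scores(pearson, X, y, absolute_score=False)
absolute = columnwise_scores(pearson, X, y, absolute_score=True)

print(top_k(signed, 1))    # the signed ranking prefers the weakly correlated column
print(top_k(absolute, 1))  # the absolute ranking prefers the anti-correlated column
```

With signed scores, a selector that keeps the largest values would discard the most informative column in this toy data, which is presumably why the wrapper takes absolute values by default.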

.. _rfe:

Recursive feature elimination
3 changes: 3 additions & 0 deletions sklearn/feature_selection/__init__.py
@@ -24,6 +24,8 @@

from .mutual_info_ import mutual_info_regression, mutual_info_classif

from .base import featurewise_scorer


__all__ = ['GenericUnivariateSelect',
           'RFE',
@@ -39,5 +41,6 @@
           'f_classif',
           'f_oneway',
           'f_regression',
           'featurewise_scorer',
           'mutual_info_classif',
           'mutual_info_regression']
75 changes: 74 additions & 1 deletion sklearn/feature_selection/base.py
@@ -11,7 +11,7 @@
 from scipy.sparse import issparse, csc_matrix
 
 from ..base import TransformerMixin
-from ..utils import check_array, safe_mask
+from ..utils import check_array, check_X_y, safe_mask
 from ..externals import six


@@ -120,3 +120,76 @@ def inverse_transform(self, X):
        Xt = np.zeros((X.shape[0], support.size), dtype=X.dtype)
        Xt[:, support] = X
        return Xt


def featurewise_scorer(score_func, absolute_score=True, **kwargs):
    """ A wrapper function around score functions.

    Parameters
    ----------
    score_func : callable
        Function taking two 1-d arrays (feature vector and target vector) and
        returning a pair of values (score, p-value) or just a score.
    absolute_score : bool
        If True (default), the absolute value of each score is returned,
        which is useful when using correlation coefficients.

Review comment (Member): Document kwargs also

    kwargs : keyword arguments
        Keyword arguments passed to the score function `score_func`.

    Returns
    -------
    scores : array-like, shape (n_features,)
        Score values returned by the scoring function.
    p_vals : array-like, shape (n_features,)

Review comment (Member): Mark this dependent on the score function

        The p-values returned by the scoring function. Only returned when
        `score_func` returns a (score, p-value) pair; otherwise only
        `scores` is returned.

Review comment (Contributor): You should add docs of what this function returns.

    Notes

Review comment (Contributor): Reading this docstring, maybe there's a disconnect of how you think this function wrapper would be useful. It would help you - and people using your code - to provide an example of how they should expect to use it.

    -----
    This function wraps scoring functions such as `spearmanr` and `pearsonr`
    from the `scipy.stats` module so that they can be used with feature
    selection algorithms like `SelectKBest`. By default it returns the
    absolute value of each score, i.e. a score of -1 is treated the same as
    +1. To keep the signed scores, set `absolute_score=False`.

    Examples
    --------
    >>> from sklearn.feature_selection import featurewise_scorer, SelectKBest
    >>> from scipy.stats import spearmanr
    >>> from sklearn.datasets import make_classification
    >>> X, y = make_classification(random_state=0)
    >>> skb = SelectKBest(featurewise_scorer(spearmanr, axis=0), k=10)
    >>> skb.fit(X, y) #doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
    SelectKBest(k=10, score_func=...)
    >>> new_X = skb.transform(X)

    """
    def call_scorer(X, y):
        X, y = check_X_y(X, y, ('csr', 'csc'), multi_output=True)

        scores = []
        p_vals = []

        for i in six.moves.range(X.shape[1]):
            score_func_ret = score_func(X[:, i], y, **kwargs)

            if isinstance(score_func_ret, tuple):
                score, p_val = score_func_ret
                p_vals.append(p_val)
            else:
                score = score_func_ret

            if absolute_score:
                score = abs(score)
            scores.append(score)

        scores = np.asarray(scores)
        if len(p_vals) > 0:
            p_vals = np.asarray(p_vals)
            return (scores, p_vals)
        else:
            return scores

    return call_scorer
39 changes: 38 additions & 1 deletion sklearn/feature_selection/tests/test_base.py
@@ -1,10 +1,13 @@
 import numpy as np
 from scipy import sparse as sp
+from scipy.stats import spearmanr
 
 from numpy.testing import assert_array_equal
 
+from sklearn.datasets import make_classification
 from sklearn.base import BaseEstimator
-from sklearn.feature_selection.base import SelectorMixin
+from sklearn.feature_selection.base import featurewise_scorer, SelectorMixin
+from sklearn.feature_selection import SelectKBest
 from sklearn.utils import check_array
 from sklearn.utils.testing import assert_raises, assert_equal

@@ -113,3 +116,37 @@ def test_get_support():
    sel.fit(X, y)
    assert_array_equal(support, sel.get_support())
    assert_array_equal(support_inds, sel.get_support(indices=True))


def test_featurewise_scorer():
    X, y = make_classification(random_state=0)

    # spearmanr from scipy.stats with SelectKBest
    skb = SelectKBest(featurewise_scorer(spearmanr, axis=0), k=10)

Review comment (Member): You should really be testing the new function alone. We already have checked that selectkbest works.

Reply (Contributor Author): I meant to test if the wrapper is working as it is supposed to be used. I will try to change the tests, testing the function alone.

    skb.fit(X, y)
    new_X = skb.transform(X)
    assert_equal(new_X.shape[1], 10)

    # Using custom score function returning only scores
    score1 = featurewise_scorer(lambda *args, **kwargs:
                                spearmanr(*args, **kwargs)[0], axis=0)(X, y)
    score2, pval = featurewise_scorer(spearmanr, axis=0)(X, y)
    assert_array_equal(score1, score2)

    # Test keyword argument absolute_score
    score_integer, pval = featurewise_scorer(spearmanr, absolute_score=False,
                                             axis=0)(X, y)
    assert_array_equal(abs(score_integer), score2)


def test_featurewise_scorer_list_input():
    # Test featurewise_scorer for input X and y as lists.
    X, y = make_classification(random_state=0)
    score_arr, pval_arr = featurewise_scorer(spearmanr, axis=0)(X, y)

    X = X.tolist()  # convert X from array to list
    y = y.tolist()  # convert y from array to list
    score_list, pval_list = featurewise_scorer(spearmanr, axis=0)(X, y)

    assert_array_equal(score_arr, score_list)
    assert_array_equal(pval_arr, pval_list)
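
Following the review suggestion to test the wrapper on its own, one possible shape for such a test is sketched below. This is only an illustration under stated assumptions: `make_featurewise` is a hypothetical, list-based stand-in for the PR's `featurewise_scorer` (it skips `check_X_y` and NumPy so that it runs with the standard library alone), and the two stub score functions are invented so that every expected value can be worked out by hand.

```python
def make_featurewise(score_func, absolute_score=True, **kwargs):
    """Hypothetical stand-in for the wrapper, operating on lists of rows."""
    def call_scorer(X, y):
        scores, p_vals = [], []
        for column in zip(*X):  # one call of score_func per column of X
            result = score_func(list(column), y, **kwargs)
            if isinstance(result, tuple):
                score, p_val = result
                p_vals.append(p_val)
            else:
                score = result
            scores.append(abs(score) if absolute_score else score)
        # Mirror the PR's behaviour: a pair only if p-values were produced.
        return (scores, p_vals) if p_vals else scores
    return call_scorer


def sum_difference(x, y, scale=1):
    """Stub score: scaled difference of the sums, returns a bare score."""
    return scale * (sum(x) - sum(y))


def sum_difference_with_p(x, y, scale=1):
    """Stub score returning a (score, p-value) pair with a fixed p-value."""
    return sum_difference(x, y, scale), 0.5


def test_wrapper_alone():
    X = [[1, 5],
         [2, 6]]
    y = [3, 4]

    # Column sums are 3 and 11, sum(y) is 7, so the signed scores are -4 and 4.
    assert make_featurewise(sum_difference)(X, y) == [4, 4]
    assert make_featurewise(sum_difference,
                            absolute_score=False)(X, y) == [-4, 4]

    # Keyword arguments are forwarded and p-values are collected per column.
    scores, p_vals = make_featurewise(sum_difference_with_p,
                                      absolute_score=False, scale=2)(X, y)
    assert scores == [-8, 8]
    assert p_vals == [0.5, 0.5]


test_wrapper_alone()
```

Using stub score functions keeps the test independent of `SelectKBest` and of `scipy.stats`, so a failure points at the wrapper logic itself (column iteration, keyword forwarding, absolute values, and the scalar versus pair return).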