
SelectFdr has serious thresholding bug #2771

Closed
depristo opened this Issue Jan 19, 2014 · 5 comments

4 participants

@depristo

The current code reads like:

def _get_support_mask(self):
    alpha = self.alpha
    sv = np.sort(self.pvalues_)
    threshold = sv[sv < alpha * np.arange(len(self.pvalues_))].max()
    return self.pvalues_ <= threshold

But this doesn't actually control the FDR at all; the correct implementation should have:

    bf_alpha = alpha / len(self.pvalues_)
    threshold = sv[sv < bf_alpha * np.arange(len(self.pvalues_))].max()

Note the k / m term in the equation at:
http://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure
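For reference, a standalone sketch of the Benjamini-Hochberg selection as described on that page (the name bh_support_mask is made up for illustration and is not the scikit-learn method; it uses 1-based ranks and <=, following the Wikipedia formulation rather than the exact snippet above):

import numpy as np

def bh_support_mask(pvalues, alpha):
    # Benjamini-Hochberg: reject the hypotheses with p-value <= p_(k),
    # where k is the largest rank such that p_(k) <= (k / m) * alpha
    # and m is the total number of hypotheses.
    pvalues = np.asarray(pvalues)
    m = len(pvalues)
    sv = np.sort(pvalues)
    passing = sv <= alpha * np.arange(1, m + 1) / m
    if not passing.any():
        return np.zeros(m, dtype=bool)  # nothing survives the threshold
    threshold = sv[passing].max()
    return pvalues <= threshold

For example, bh_support_mask([0.001, 0.008, 0.039, 0.041, 0.09, 0.7], alpha=0.05) selects only the first two p-values, since 0.039 > (3 / 6) * 0.05.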

@agramfort
scikit-learn member
@depristo

Sorry, I don't have a concrete test for it. But that implementation is just not correct given that the docstring says it does Benjamini-Hochberg correction: the term m (== len(self.pvalues_) here) is completely missing from the equation.

@ajtulloch ajtulloch added a commit to ajtulloch/scikit-learn that referenced this issue Mar 4, 2014
@ajtulloch ajtulloch [Feature Selection] Fix SelectFDR thresholding bug (#2771)
From scikit-learn#2771, we were not correctly scaling the alpha parameter
(http://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure)
with the number of features (== hypotheses). Thus, the alpha parameter was
not invariant with respect to the number of features.

The correction is as suggested in the original issue, and a test has been
added that verifies that, for various numbers of features, an appropriate
false discovery rate is achieved when using the selector.
321fbcc
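The test added in the commit isn't reproduced here; a rough standalone illustration of the property it checks (all names and numbers hypothetical, not the actual scikit-learn test) is that with purely uniform, all-null p-values every selection is a false discovery, so the number of selected features should stay small for any number of features m:

import numpy as np

rng = np.random.RandomState(0)
alpha = 0.1

for m in (10, 100, 1000):
    pvalues = rng.uniform(size=m)  # all hypotheses are null
    sv = np.sort(pvalues)
    passing = sv <= alpha * np.arange(1, m + 1) / m
    # any feature selected here is a false discovery
    n_selected = int((pvalues <= sv[passing].max()).sum()) if passing.any() else 0
    print(m, n_selected)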
@amueller amueller added this to the 0.15.1 milestone Jul 18, 2014
@amueller amueller closed this in #4146 Feb 24, 2015