
LinearDiscriminantAnalysis predict probability bug #6848

Closed
agamemnonc opened this issue Jun 1, 2016 · 10 comments · Fixed by #11796

@agamemnonc
Contributor

I am fairly confident that a bug was introduced in commit
7c1101d

Concretely, line 518 of the current version of
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/discriminant_analysis.py
should be removed, as it yields incorrect results.

There is no reason why the constant 1 should be added to the computed probability after exponentiation and before inversion.
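
For context, the computation around that line looks roughly like the following — a minimal sketch rather than the actual source, with the final per-row normalisation being my assumption of what the multiclass branch does:

```python
import numpy as np

def predict_proba_sketch(decision):
    """Rough sketch of the computation under discussion (not the exact
    scikit-learn source). `decision` is the output of decision_function(X),
    one column per class."""
    prob = np.array(decision, dtype=float)
    prob *= -1
    np.exp(prob, prob)          # prob = exp(-decision)
    prob += 1                   # the line in question (line 518)
    np.reciprocal(prob, prob)   # prob = 1 / (1 + exp(-decision)), a sigmoid
    return prob / prob.sum(axis=1, keepdims=True)  # rows sum to 1
```

Dropping the prob += 1 line turns each column into exp(decision) rather than a sigmoid, which is what the MATLAB comparison below is about.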

To verify this, I ran a one-to-one comparison between the output of the method and MATLAB's built-in LDA classifier on the Iris dataset. Only after removing line 518 do the results match (up to a tolerance).

If everyone agrees on that, I am happy to submit a PR.

@amueller amueller added the Bug label Jul 15, 2016
@amueller amueller added this to the 0.18 milestone Jul 15, 2016
@ghost

ghost commented Jul 17, 2016

Hi Agamemnon,

I am currently looking at this issue. I agree with you that there is a bug, but I do not think it is related to adding the constant 1. I think the code for LDA works fine for datasets with two classes, but it uses an incorrect formula for three or more classes. Do you mind sharing your MATLAB results for the Iris dataset? A Gist would be great.

Thanks.

@agamemnonc
Contributor Author

agamemnonc commented Jul 18, 2016

MATLAB snippet (version R2016a)
https://gist.github.com/agamemnonc/4c9ddb472cd7c3ac84718cee1d2f6ce6

Output:
0.0000 0.9993 0.0007

Python/sklearn snippet (sklearn version 0.17.1)
https://gist.github.com/agamemnonc/0d668ae2552188b73958be7961a7e9e3

Output (currently):
[ 0.33333333 0.33333333 0.33333333]

Output (with the line in question removed):
[ 4.69392059e-20 9.99390132e-01 6.09867606e-04]
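
In case the gists go away, here is a minimal sketch of the Python side of the comparison (assumed setup, not the gist itself; the exact rows inspected are whatever the gists use):

```python
# Minimal reproduction sketch (assumed setup, not the gist itself).
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
clf = LinearDiscriminantAnalysis().fit(X, y)

row = 101  # data point 102 in MATLAB's 1-based indexing
print(clf.predict_proba(X[row:row + 1]))
```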

@ghost

ghost commented Jul 18, 2016

Great, I appreciate the info — this is very helpful. Can you send me your output for a couple more values, in particular 23 and 101? Thanks a lot.

@agamemnonc
Contributor Author

agamemnonc commented Jul 19, 2016

Sure, no problem.

Data point 24 (please note MATLAB uses 1-based indexing, so this is 23 in Python)
[1.0000 0.0000 0.0000]

Data point 102 (101 in Python)
[0.0000 0.0011 0.9989]

From a theoretical point of view, I still believe that the line prob += 1 is the one causing the trouble. decision_function(x) returns the log-probability up to an additive constant, so we only need to exponentiate to get the unnormalized probability distribution (i.e. the probability distribution up to a multiplicative constant introduced by the exponentiation).

I believe prob *= -1 followed by np.reciprocal(prob, prob) is used to avoid overflow issues.

In mathematical terms (all equalities holding up to the normalization constant):

log(p(y=k|x)) = δ_k(x)
-log(p(y=k|x)) = -δ_k(x)
e^(-log(p(y=k|x))) = e^(-δ_k(x))
e^(log(1/p(y=k|x))) = e^(-δ_k(x))
1/p(y=k|x) = e^(-δ_k(x))
p(y=k|x) = 1 / e^(-δ_k(x))    (unnormalized)

As you can hopefully see from the formulae above, adding a constant to the posterior probability is not required, and indeed it changes the results.
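
In code, the computation described above amounts to a softmax over the decision values — a sketch, not the scikit-learn implementation (the row maximum is subtracted before exponentiating purely for numerical stability; the additive constant cancels in the normalisation):

```python
import numpy as np

def posteriors_from_decision(decision):
    """Exponentiate the (shifted) log-posteriors returned by
    decision_function and normalise each row to sum to 1."""
    decision = np.asarray(decision, dtype=float)
    shifted = decision - decision.max(axis=1, keepdims=True)  # avoid overflow
    unnormalised = np.exp(shifted)
    return unnormalised / unnormalised.sum(axis=1, keepdims=True)
```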

@ghost

ghost commented Jul 21, 2016

Hi, thanks again.

In your notation, what do you mean by π and δ? Isn't π the ratio of the two posterior probabilities? If so, where does its dependency on the classes come in?
Have you looked at the documentation?

I do have a solution, which does not involve removing the +1 but instead modifies the formula for the case of three or more classes. I am busy right now, but I will work on it as soon as I have time. Thanks.

@agamemnonc
Contributor Author

Sorry, I should have made this more explicit. I follow the notation of the textbook "The Elements of Statistical Learning" by Hastie, Tibshirani and Friedman, which is freely available for download (PDF) here:
https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf

δ (it should really be δ_k) is the discriminant function for class k, i.e. the element of the vector returned by the decision_function() method which corresponds to class k. Please refer to Eq. 4.10 (p. 109) of the textbook.

By π_k I meant the posterior probability for class k (it is not a fraction). However, in the textbook π_k is used to denote the prior probability, so apologies for the confusion. From now on, I denote the prior by π_k and the posterior by p(y=k|x). (I've also edited my comment above to reflect this.)

In the multiclass case the posterior probabilities are given by

p(y=k|x) = e^{δ_k(x)} / Σ_l e^{δ_l(x)}.

In the binary classification case, you can replace p(y=1|x) with 1 - p(y=0|x) and substitute into the log-ratio (Eq. 4.9 of the textbook), which only makes sense with two classes; I suspect this is where the +1 comes from. In the multiclass case we cannot do that, and since the expression above generalises the binary case, I think it is more elegant anyway.
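
A quick numerical check of that claim — in the two-class case the softmax collapses to a sigmoid of the difference of the two discriminants, and the 1 in the denominator is exactly the +1 under discussion (illustrative numbers only):

```python
import numpy as np

d0, d1 = 1.3, 2.7  # arbitrary discriminant values for the two classes
softmax_p1 = np.exp(d1) / (np.exp(d0) + np.exp(d1))
sigmoid_p1 = 1.0 / (1.0 + np.exp(-(d1 - d0)))  # the "+1" shows up here
print(np.isclose(softmax_p1, sigmoid_p1))      # True
```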

@ogrisel
Member

ogrisel commented Sep 6, 2016

The += 1 (alone) is not the problem: if you remove that line, the tests fail with negative probability values...

The problem likely stems from the implementation of decision_function in LinearClassifierMixin (which depends on the standard coef_ and intercept_ attributes), which is probably not directly equivalent to the traditional LDA parameterization of the log-likelihood ratio of the positive class w.r.t. another fixed class. More work is needed to work out the correct normalization while keeping the LinearClassifierMixin parameterization, which is useful for getting LDA models to behave consistently like other sklearn linear models.
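
For reference, that decision function is essentially a per-class affine score — a sketch of the idea, not the actual LinearClassifierMixin source:

```python
import numpy as np

def linear_decision_function(X, coef, intercept):
    """scores[i, k] = X[i] . coef[k] + intercept[k]; scikit-learn also
    collapses this to a single column in the binary case, omitted here."""
    return np.dot(X, np.asarray(coef).T) + np.asarray(intercept)
```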

@amueller amueller modified the milestones: 0.18, 0.19 Sep 22, 2016
@amueller amueller modified the milestone: 0.19 Sep 29, 2016
@jnothman jnothman modified the milestones: 0.20, 0.19 Jun 13, 2017
@jnothman jnothman changed the title LDA predict probability bug LinearDiscriminantAnalysis predict probability bug Jun 14, 2017
@glemaitre glemaitre modified the milestones: 0.20, 0.21 Jun 13, 2018
@xavierbourretsicotte

Hi all, is this bug still open? I can confirm that I get odd results using LDA with 3 classes on the Iris dataset (X1 = sepal_length on the horizontal axis, X2 = sepal_width on the vertical axis).

In particular, the covariance matrix seems to be off. For example, lda.covariance_ gives
array([[0.259708 , 0.09086667], [0.09086667, 0.11308 ]])

while covariance.empirical_covariance(X) gives
array([[ 0.68112222, -0.04215111], [-0.04215111, 0.18871289]]),

which is also what I get manually and with pandas' cov().

Since the covariance is off, the Gaussian estimate is also off, and hence the predictions? All the best
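
For what it's worth, a minimal sketch of that comparison (assumed setup: the first two Iris features, with store_covariance=True passed so that covariance_ is populated under the default solver; note the docs describe covariance_ as a weighted within-class covariance, whereas empirical_covariance ignores the class labels):

```python
# Sketch of the comparison above (assumed setup, not the original code).
from sklearn.datasets import load_iris
from sklearn.covariance import empirical_covariance
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
X2 = X[:, :2]  # sepal_length, sepal_width

lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X2, y)
print(lda.covariance_)           # LDA's estimate (within-class, per the docs)
print(empirical_covariance(X2))  # plain covariance of the features, labels ignored
```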

@agramfort
Member

agramfort commented Jun 18, 2018 via email

@agamemnonc
Contributor Author

agamemnonc commented Aug 11, 2018

I have now submitted a PR which I think solves the issue: #11796

Do we perhaps want to include additional tests checking the output of predict_proba for LDA and QDA, both for the binary and multi-class cases?
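
Something along these lines might do as a basic sanity check (a sketch only — it checks that predict_proba returns valid, normalised probabilities, not that the values are correct; the actual tests in the PR may look different):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

def check_predict_proba(clf, X, y):
    proba = clf.fit(X, y).predict_proba(X)
    assert np.all(proba >= 0) and np.all(proba <= 1)
    np.testing.assert_allclose(proba.sum(axis=1), 1.0)

X, y = load_iris(return_X_y=True)
check_predict_proba(LinearDiscriminantAnalysis(), X, y)     # multi-class
check_predict_proba(QuadraticDiscriminantAnalysis(), X, y)  # multi-class
mask = y < 2                                                # binary subset
check_predict_proba(LinearDiscriminantAnalysis(), X[mask], y[mask])
check_predict_proba(QuadraticDiscriminantAnalysis(), X[mask], y[mask])
```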
