Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Incorrect implementation of explained_variance_ in PCA #9105

Merged
merged 9 commits into from Jun 21, 2017
Merged

[MRG+2] Incorrect implementation of explained_variance_ in PCA #9105

merged 9 commits into from Jun 21, 2017

Conversation

@qinhanmin2014
Copy link
Member

@qinhanmin2014 qinhanmin2014 commented Jun 11, 2017

Reference Issue

Fixes #7699
Mentioned in #8541 (the 5th comment by IainStrachan on 11 May)
Mentioned in #8544 (the 6th comment by IainStrachan on 11 May)

What does this implement/fix? Explain your changes.

PCA.explained_variance_ is incorrectly implemented. The test is also wrong so that the mistake is not detected.

The result of explained_variance_ is wrong.

result from python:
04
result from R:
01
result according to the definition:
02

Reason for the incorrect implementation.

(1)In the code, explained_variance_ = (S ** 2) / n_samples should be explained_variance_ = (S ** 2) / (n_samples - 1). Because when using SVD to calculate PCA, we get the eigenvalue of A'A(A' means the transpose of A), but what we need is the eigenvalue of A'A/(n_samples - 1) , which is equivalent to the covariance matrix of A.
(2)In the test, we simply use np.var to calculate the variance. That's incorrect. We should use np.cor and pick the elements on the diagonal or set ddof=1 .

Any other comments?

I provide two ways for the new test, one based on current version, another based on the definition.
Please take some time to consider it. Thanks.

@qinhanmin2014 qinhanmin2014 changed the title [WIP] Incorrect implementation of explained_variance_ in PCA [MRG] Incorrect implementation of explained_variance_ in PCA Jun 11, 2017

# X_rpca = rpca.transform(X)
# assert_array_almost_equal(rpca.explained_variance_,
# expected_result, decimal=1)

This comment has been minimized.

@agramfort

agramfort Jun 12, 2017
Member

uncomment or remove

This comment has been minimized.

@qinhanmin2014

qinhanmin2014 Jun 12, 2017
Author Member

@agramfort Thanks. From my perspective, I think the test is necessary because it is based on the original definition, so I uncomment it, but I would like to know your advise.

@agramfort
Copy link
Member

@agramfort agramfort commented Jun 12, 2017

+1 for MRG when CIs are happy

@agramfort agramfort changed the title [MRG] Incorrect implementation of explained_variance_ in PCA [MRG+1] Incorrect implementation of explained_variance_ in PCA Jun 12, 2017
@jnothman jnothman added this to the 0.19 milestone Jun 18, 2017
@amueller
Copy link
Member

@amueller amueller commented Jun 19, 2017

This is issue #7699 right? I thought we were ok with the current implementation? I guess we changed our opinion? I don't have a strong opinion.

@amueller
Copy link
Member

@amueller amueller commented Jun 19, 2017

@qinhanmin2014 I think when looking at #7699 I checked the definition of PCA, and I don't think any of the books I looked at mentioned the bessel correction. In particular Elements of Statistical Learning, which is my standard reference. Can you provide a reference with your definition, other than prcomp?

@qinhanmin2014
Copy link
Member Author

@qinhanmin2014 qinhanmin2014 commented Jun 19, 2017

@amueller Thanks for your instruction, this issue is indeed related to #7699

(1)I think the problem is not whether to use bessel correction. The main point is that the variance is supposed to be equal to the eigenvalue of the covariance matrix. I propose an additional test to prove my implementation.
(2)It seems that Elements of Statistical Learning simply use SVD to calculate PCA and say nothing about the covariance matrix and the calculation of variance.
(3)R, Matlab(princomp) implemented in the same way as the pull request.
(4)Many papers and books also implemented in the same way as the pull request.(e.g. https://arxiv.org/abs/1404.1100 (P11-P12), Machine Learning in Action (https://github.com/pbharrin/machinelearninginaction/blob/master/Ch13/pca.py)). but I currently can't find one which implemented in the same way as sklearn.

What's more, from my perspective, even if we dont't find any strong evidence, it may be better to ensure that sklearn behave the same as others.

@amueller
Copy link
Member

@amueller amueller commented Jun 20, 2017

Oh wow, I misread the covariance test. Never mind, that test is clearly correct.
LGTM. Could you please add an entry to whats_new.rst in the bugfix section?

@amueller amueller changed the title [MRG+1] Incorrect implementation of explained_variance_ in PCA [MRG+2] Incorrect implementation of explained_variance_ in PCA Jun 20, 2017
@agramfort
Copy link
Member

@agramfort agramfort commented Jun 20, 2017

@qinhanmin2014 please update what's new and let's merge

@qinhanmin2014
Copy link
Member Author

@qinhanmin2014 qinhanmin2014 commented Jun 21, 2017

@amueller @agramfort Finished. I also revert #7843 since these statements in the document are no longer needed. Thanks.

@amueller amueller merged commit 2a36ff1 into scikit-learn:master Jun 21, 2017
5 checks passed
5 checks passed
ci/circleci Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 96.27%)
Details
codecov/project 96.3% (+0.03%) compared to 689f412
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@amueller
Copy link
Member

@amueller amueller commented Jun 21, 2017

thanks :)

@qinhanmin2014 qinhanmin2014 deleted the qinhanmin2014:my-feature-1 branch Jun 21, 2017
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants