[MRG+1] SPCA centering and fixing scaling issue #11585

Merged
merged 10 commits into from Jul 20, 2018

7 participants
@FollowKenny
Contributor

FollowKenny commented Jul 17, 2018

Reference Issues/PRs

fixes #9394

What does this implement/fix? Explain your changes.

@andrewww reported a scaling issue with SPCA: the scaling of the transform output depended on the number of samples. While investigating this issue, we found several disturbing things:

  • The last lines of the SPCA transform consisted of a peculiar normalization sequence. These lines were the cause of the issue.
  • SPCA with alpha=0 and ridge_alpha=0 and standard PCA gave inconsistent results.
  • The data were not centred during the fit.
  • The data were not centred during the transform either.
  • The components were not normalized. I'm still not sure that they should be. From what I saw in this paper I'm inclined to say yes, but I have not read it thoroughly yet and I have not cross-checked other sources.

This PR aims to fix all of that. With all the fixes implemented, the scaling issue disappears and the transforms from PCA and SPCA(alpha=0, ridge_alpha=0) give exactly the same results.
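
For readers following along, the equivalence claimed above can be checked with a minimal NumPy sketch. It is an illustration under stated assumptions, not the code from this PR: the `transform` helper and the toy data are hypothetical, the components are taken from an SVD of the centred training data (so they are orthonormal), and an unpenalized least-squares solve is assumed to stand in for a ridge step with zero penalty.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=3.0, size=(50, 4))
X_test = rng.normal(loc=3.0, size=(10, 4))

# "Fit": centre with the training mean and keep two unit-norm components.
mean_ = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean_, full_matrices=False)
components = Vt[:2]  # shape (2, 4), rows are orthonormal


def transform(X):
    """Hypothetical transform: centre with the TRAIN mean, then solve
    components.T @ U = (X - mean_).T by least squares (no penalty)."""
    U, *_ = np.linalg.lstsq(components.T, (X - mean_).T, rcond=None)
    return U.T


# Plain PCA scores for the same components.
pca_scores = (X_test - mean_) @ components.T

# With centring and orthonormal components the two transforms agree.
assert np.allclose(transform(X_test), pca_scores)

# The scores of a sample do not depend on the size of the batch.
assert np.allclose(transform(X_test)[:3], transform(X_test[:3]))
```

Both checks pass because the training mean is subtracted before projecting and no per-batch rescaling is applied afterwards.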

Any other comments?

TODO:

  • Decide on the normalization strategy (read more thoroughly)
  • Implement the fixes inside a deprecation path
  • Implement the fixes for MiniBatchSparsePCA
  • Implement scaling test (from the bug report)
  • Implement PCA vs SPCA(0, 0) test (to check the new behaviour is correct)

Ivan PANICO added some commits Jul 17, 2018

Contributor

FollowKenny commented Jul 17, 2018

@agramfort Rdy for first review (if tests pass)

@FollowKenny FollowKenny changed the title from [WIP] SPCA centering and fixing scaling issue to [MRG] SPCA centering and fixing scaling issue Jul 17, 2018

Ivan PANICO
@massich

Would normalize_components be removed in version 0.22, with the behaviour kept as True?

Or would that be a second deprecation cycle, with removal in 0.24?

If it is removed in 0.22, that should be stated in the deprecation messages.

(One inline review comment on sklearn/decomposition/sparse_pca.py, now outdated.)
@GaelVaroquaux

Aside from the small comments I made (including fixing travis), there needs to be an entry added to whats_new. This is an important change.

Contributor

glemaitre commented Jul 17, 2018

I am confused here. Do we actually want to keep the buggy behaviour?

Member

GaelVaroquaux commented Jul 17, 2018

(Six inline review comments on sklearn/decomposition/sparse_pca.py, now outdated.)

Contributor

FollowKenny commented Jul 17, 2018

All right. I’m off for tonight, but I’ll take care of it tomorrow evening if that’s all right. I’ll take all the comments into account and update the what’s new. Are we still OK with the double deprecation path we chose initially (0.20: deprecate the default value False; 0.22: deprecate the parameter and change the default to True; 0.24: delete the parameter)?
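
As a rough illustration of the first step of such a path (warning when the old default is still in use), here is a minimal sketch. `ToyTransformer` and the warning text are hypothetical and are not the scikit-learn implementation; only the parameter name `normalize_components` comes from this thread.

```python
import warnings


class ToyTransformer:
    """Hypothetical estimator used only to illustrate a deprecation step."""

    def __init__(self, normalize_components=False):
        self.normalize_components = normalize_components

    def fit(self, X):
        if not self.normalize_components:
            # Step 1 of the assumed path: the old default still works,
            # but using it emits a DeprecationWarning.
            warnings.warn(
                "normalize_components=False keeps the old behaviour and is "
                "deprecated; set normalize_components=True.",
                DeprecationWarning,
            )
        return self


# Record the warnings raised by the old default and by the new setting.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ToyTransformer().fit([[0.0]])
    ToyTransformer(normalize_components=True).fit([[0.0]])

n_deprecations = sum(
    issubclass(w.category, DeprecationWarning) for w in caught
)
print(n_deprecations)
```

Only the call that relies on the old default warns; passing `normalize_components=True` explicitly stays silent.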

Ivan PANICO
Contributor

FollowKenny commented Jul 18, 2018

@GaelVaroquaux @glemaitre @massich Is travis supposed to fail on deprecation warnings? Because I think that, apart from a few last changes, that's the last step.

Member

GaelVaroquaux commented Jul 19, 2018

Yes, travis is supposed to fail on uncaught deprecation warnings. You should catch them in the test, using warnings.simplefilter or @pytest.mark.filterwarnings.
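
A minimal sketch of the first option, using only the standard library. `legacy_transform` and the test name are hypothetical stand-ins for a call that emits the deprecation warning; the `"error"` filter is assumed to mimic a CI configuration that fails on uncaught warnings.

```python
import warnings


def legacy_transform(x):
    # Hypothetical function standing in for a call that warns.
    warnings.warn("old behaviour is deprecated", DeprecationWarning)
    return 2 * x


def test_legacy_transform():
    with warnings.catch_warnings():
        # Turn every warning into an error, as a strict CI run might...
        warnings.simplefilter("error")
        # ...then explicitly ignore the deprecation expected in this test.
        # The most recently added filter is checked first.
        warnings.simplefilter("ignore", DeprecationWarning)
        assert legacy_transform(3) == 6


test_legacy_transform()
```

With pytest, the same effect can be obtained by decorating the test with `@pytest.mark.filterwarnings("ignore::DeprecationWarning")` instead of managing the filters by hand.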

Member

agramfort commented Jul 19, 2018

The maths are correct (I think).

@FollowKenny you need to make sure now that normalize_components is set to True in all examples and documentation pages so no deprecation warning pops up.

Ivan PANICO added some commits Jul 19, 2018

Contributor

FollowKenny commented Jul 19, 2018

I used git grep "SparsePCA(" and it only showed examples/decomposition/plot_faces_decomposition.py. Am I missing something, or is it safe to assume this is the only doc update?
Tests running

Ivan PANICO

@GaelVaroquaux GaelVaroquaux changed the title from [MRG] SPCA centering and fixing scaling issue to [MRG+1] SPCA centering and fixing scaling issue Jul 20, 2018

@GaelVaroquaux

LGTM.

+1 for merge

@GaelVaroquaux GaelVaroquaux added the Bug label Jul 20, 2018

@GaelVaroquaux GaelVaroquaux added this to the 0.20 milestone Jul 20, 2018

@glemaitre

Couple of nitpicks before merging

(Ten inline review comments, five on sklearn/decomposition/sparse_pca.py and five on sklearn/decomposition/tests/test_sparse_pca.py, now outdated.)

Ivan PANICO
Contributor

glemaitre commented Jul 20, 2018

Waiting for the CI to be green

@glemaitre glemaitre merged commit 6eb1983 into scikit-learn:master Jul 20, 2018

7 checks passed

ci/circleci: deploy Your tests passed on CircleCI!
ci/circleci: python2 Your tests passed on CircleCI!
ci/circleci: python3 Your tests passed on CircleCI!
codecov/patch 95.12% of diff hit (within 1% threshold of 95.37%)
codecov/project 95.31% (-0.06%) compared to 58fa28e
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

Contributor

glemaitre commented Jul 20, 2018

@FollowKenny Thanks a lot!!!!

Contributor

FollowKenny commented Jul 20, 2018

Awesome! @agramfort @GaelVaroquaux @glemaitre @massich Thanks for your help!

@@ -182,6 +183,14 @@ Decomposition, manifold learning and clustering
This applies to the dictionary and sparse code.
:issue:`6374` by :user:`John Kirkham <jakirkham>`.
- :class:`decomposition.SparsePCA` now exposes ``normalize_components``. When
set to True, the train and test data are centered with the train mean
repsectively during the fit phase and the transform phase. This fixes the

@amueller

amueller Jul 21, 2018

Member

respectively

Member

amueller commented Jul 21, 2018

I'm surprised we do a backward deprecation for a bug. We haven't really done that in the past and I'm -0 on it. @jnothman, do you have an opinion?
I'd rather remove the deprecation (does that count as talking you out of it, @GaelVaroquaux?).
My main argument would be consistency with how we usually do things, which is to break behavior if we consider it a bug. I haven't looked at the issue, though from the description it sounds like a bug.

Member

GaelVaroquaux commented Jul 21, 2018

Sure, if you want to remove the deprecation, it does count as talking me out of it.

It's really a change in behavior, but the previous behavior had no statistical meaning whatsoever.

Member

GaelVaroquaux commented Jul 21, 2018

@FollowKenny : thanks a lot for this work. It was hard and important.

@FollowKenny FollowKenny deleted the FollowKenny:spca branch Jul 22, 2018
