New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] BUG Avoid unexpected error in PCA when n_components='mle' #9886

Merged
merged 5 commits into from Oct 9, 2017

Conversation

Projects
None yet
3 participants
@qinhanmin2014
Member

qinhanmin2014 commented Oct 8, 2017

Reference Issue

Fixes #9884

What does this implement/fix? Explain your changes.

Currently, in python3.X, when we call PCA with large dataset (max(X.shape) > 500) and n_components='mle' without explicitly setting svd_solver='full', we will get an error.

Any other comments?

@jnothman

In the docstring for n_components it seems to suggest 'mle' only applies when 'full'. Please clarify it.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 8, 2017

Member

Last comment rescinded.

Member

jnothman commented Oct 8, 2017

Last comment rescinded.

@jnothman

Otherwise LGTM

Show outdated Hide outdated sklearn/decomposition/tests/test_pca.py
@qinhanmin2014

This comment has been minimized.

Show comment
Hide comment
@qinhanmin2014

qinhanmin2014 Oct 9, 2017

Member

@jnothman Thanks for the instant review :)

In the docstring for n_components it seems to suggest 'mle' only applies when 'full'. Please clarify it.

I have updated the document. Since the paragraph seems to be too long, I have splitted it into several parts.

2017-10-09_084300

I would also consider checking that the mle is correctly found.

We already have a test below it (test_pca_dim) to test the correctness of n_components_ set by mle with a simple case. It seems hard to calculate n_components_ set by mle in difficult case like current test. So it you still think we need to further test n_components_ set by mle, please give me some suggestions.

def test_pca_dim():
    # Check automated dimensionality setting
    rng = np.random.RandomState(0)
    n, p = 100, 5
    X = rng.randn(n, p) * .1
    X[:10] += np.array([3, 4, 5, 1, 2])
    pca = PCA(n_components='mle', svd_solver='full').fit(X)
    assert_equal(pca.n_components, 'mle')
    assert_equal(pca.n_components_, 1)
Member

qinhanmin2014 commented Oct 9, 2017

@jnothman Thanks for the instant review :)

In the docstring for n_components it seems to suggest 'mle' only applies when 'full'. Please clarify it.

I have updated the document. Since the paragraph seems to be too long, I have splitted it into several parts.

2017-10-09_084300

I would also consider checking that the mle is correctly found.

We already have a test below it (test_pca_dim) to test the correctness of n_components_ set by mle with a simple case. It seems hard to calculate n_components_ set by mle in difficult case like current test. So it you still think we need to further test n_components_ set by mle, please give me some suggestions.

def test_pca_dim():
    # Check automated dimensionality setting
    rng = np.random.RandomState(0)
    n, p = 100, 5
    X = rng.randn(n, p) * .1
    X[:10] += np.array([3, 4, 5, 1, 2])
    pca = PCA(n_components='mle', svd_solver='full').fit(X)
    assert_equal(pca.n_components, 'mle')
    assert_equal(pca.n_components_, 1)
@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 9, 2017

Member

I only meant that it seems silly not to check that the number of components selected by 'full' and 'auto' in the mle case is identical.

Member

jnothman commented Oct 9, 2017

I only meant that it seems silly not to check that the number of components selected by 'full' and 'auto' in the mle case is identical.

@qinhanmin2014

This comment has been minimized.

Show comment
Hide comment
@qinhanmin2014

qinhanmin2014 Oct 9, 2017

Member

@jnothman Thanks. Comments addressed :)
Sorry for misunderstanding your previous comment.

Member

qinhanmin2014 commented Oct 9, 2017

@jnothman Thanks. Comments addressed :)
Sorry for misunderstanding your previous comment.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 9, 2017

Member

LGTM

Member

jnothman commented Oct 9, 2017

LGTM

@jnothman jnothman changed the title from [MRG] BUG Avoid unexpected error in PCA when n_components='mle' to [MRG+1] BUG Avoid unexpected error in PCA when n_components='mle' Oct 9, 2017

qinhanmin2014 added some commits Oct 9, 2017

@ogrisel

This comment has been minimized.

Show comment
Hide comment
@ogrisel

ogrisel Oct 9, 2017

Member

LGTM as well. Thanks for the fix!

Member

ogrisel commented Oct 9, 2017

LGTM as well. Thanks for the fix!

@ogrisel ogrisel merged commit 60e7e15 into scikit-learn:master Oct 9, 2017

6 checks passed

ci/circleci Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 96.16%)
Details
codecov/project 96.16% (+<.01%) compared to a0dfa30
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
lgtm analysis: Python No alert changes
Details

@ogrisel ogrisel added this to the 0.19.1 milestone Oct 9, 2017

@qinhanmin2014 qinhanmin2014 deleted the qinhanmin2014:pca_mle branch Oct 9, 2017

@qinhanmin2014

This comment has been minimized.

Show comment
Hide comment
@qinhanmin2014

qinhanmin2014 Oct 9, 2017

Member

@jnothman @ogrisel Thanks a lot :)
Note : I have already provided an entry in v0.20 what's new

Member

qinhanmin2014 commented Oct 9, 2017

@jnothman @ogrisel Thanks a lot :)
Note : I have already provided an entry in v0.20 what's new

@ogrisel

This comment has been minimized.

Show comment
Hide comment
@ogrisel

ogrisel Oct 9, 2017

Member

Yes, if we backport to 0.19.1 we will adjust the location of the what's new entry.

Member

ogrisel commented Oct 9, 2017

Yes, if we backport to 0.19.1 we will adjust the location of the what's new entry.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 9, 2017

Member
Member

jnothman commented Oct 9, 2017

@ogrisel

This comment has been minimized.

Show comment
Hide comment
@ogrisel

ogrisel Oct 9, 2017

Member

ok so no backport for this one.

Member

ogrisel commented Oct 9, 2017

ok so no backport for this one.

@ogrisel ogrisel removed this from the 0.19.1 milestone Oct 9, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+1] BUG Avoid unexpected error in PCA when n_components='mle' (#9886
)

* n_components mle

* update doc

* improve

* update what's new

* update what's new

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG+1] BUG Avoid unexpected error in PCA when n_components='mle' (#9886
)

* n_components mle

* update doc

* improve

* update what's new

* update what's new
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment