Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Allow sparse input to incremental PCA #13960

Conversation

@scottgigante
Copy link
Contributor

scottgigante commented May 27, 2019

Reference Issues/PRs

Fixes #13957.

What does this implement/fix? Explain your changes.

Implements sparse input for IncrementalPCA. IncrementalPCA is by design suited to accepting sparse input; this allows the input to be sparse, and if it is so, converts the data to dense on a batch-wise basis.

@scottgigante scottgigante force-pushed the scottgigante:feature/incremental_pca_sparse_input branch 4 times, most recently from 880a7a3 to 17d7248 May 27, 2019
@abhinavsagar

This comment has been minimized.

Copy link
Contributor

abhinavsagar commented May 28, 2019

Looks good to me.

@scottgigante scottgigante changed the title Allow sparse input to incremental PCA [MRG] Allow sparse input to incremental PCA May 28, 2019
doc/modules/decomposition.rst Outdated Show resolved Hide resolved
sklearn/decomposition/incremental_pca.py Show resolved Hide resolved
sklearn/decomposition/incremental_pca.py Outdated Show resolved Hide resolved
@scottgigante scottgigante force-pushed the scottgigante:feature/incremental_pca_sparse_input branch from 17d7248 to dc276bf May 28, 2019
@scottgigante scottgigante changed the title [MRG] Allow sparse input to incremental PCA [WIP] Allow sparse input to incremental PCA May 28, 2019
Copy link
Contributor

NicolasHug left a comment

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

sklearn/decomposition/incremental_pca.py Show resolved Hide resolved
sklearn/decomposition/tests/test_incremental_pca.py Outdated Show resolved Hide resolved
sklearn/decomposition/tests/test_incremental_pca.py Outdated Show resolved Hide resolved
@scottgigante scottgigante force-pushed the scottgigante:feature/incremental_pca_sparse_input branch 6 times, most recently from 3afe748 to 9e119f3 May 29, 2019
@scottgigante scottgigante changed the title [WIP] Allow sparse input to incremental PCA [MRG] Allow sparse input to incremental PCA May 29, 2019
@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented May 30, 2019

If I want to apply partial_fit batchwise to multiple input matrices, I would not be able to run step 2 without step 1.

yes, but if the way you treat sparse matrices is to densify them, you might as well get the user to do that...

@scottgigante

This comment has been minimized.

Copy link
Contributor Author

scottgigante commented Jun 1, 2019

If I want to apply partial_fit batchwise to multiple input matrices, I would not be able to run step 2 without step 1.

yes, but if the way you treat sparse matrices is to densify them, you might as well get the user to do that...

A clarification: if the user wants to batchwise densify and fit multiple sparse matrices, then this is not currently possible. Example scenario: you want to fit a single estimator to multiple large mtx files. Alternative solution would be to load all of the files, concatenate the matrices with scipy.sparse.vstack and then apply fit.

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jun 2, 2019

@scottgigante

This comment has been minimized.

Copy link
Contributor Author

scottgigante commented Jun 2, 2019

I would presume that turning the entire sparse matrix to dense at once would be undesirable for memory reasons.

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jun 3, 2019

@scottgigante scottgigante force-pushed the scottgigante:feature/incremental_pca_sparse_input branch from 9e119f3 to 5b45e71 Jun 3, 2019
@scottgigante

This comment has been minimized.

Copy link
Contributor Author

scottgigante commented Jun 3, 2019

Note: it's failing tests due to some other unrelated change, I think - failing tests in grid search CV.

@scottgigante scottgigante reopened this Jun 6, 2019
@scottgigante scottgigante force-pushed the scottgigante:feature/incremental_pca_sparse_input branch 2 times, most recently from 28c78a0 to 7ea580c Jun 6, 2019
@scottgigante

This comment has been minimized.

Copy link
Contributor Author

scottgigante commented Jun 6, 2019

@NicolasHug @jnothman any further comments?

Copy link
Member

jnothman left a comment

Sorry my review time has been limited. Please add a check that partial_fit still raises an appropriate error when passed sparse X. Otherwise lgtm, thanks!

Copy link
Contributor

NicolasHug left a comment

As Joel mentioned please test the error in partial_fit for dense input.

Also all the methods that accept/return a sparse X should be changed to array-like or sparse matrix

Otherwise LGTM too!

@scottgigante scottgigante force-pushed the scottgigante:feature/incremental_pca_sparse_input branch from 7ea580c to bf978a8 Jun 12, 2019
@scottgigante

This comment has been minimized.

Copy link
Contributor Author

scottgigante commented Jun 12, 2019

@jnothman @NicolasHug should be good to go now.

@scottgigante scottgigante force-pushed the scottgigante:feature/incremental_pca_sparse_input branch from bf978a8 to cb5a185 Jun 13, 2019
Copy link
Contributor

NicolasHug left a comment

Thanks @scottgigante !

Copy link
Contributor

NicolasHug left a comment

Thanks @scottgigante !

@scottgigante scottgigante changed the title [MRG] Allow sparse input to incremental PCA [MRG+1] Allow sparse input to incremental PCA Jun 13, 2019
@jnothman jnothman merged commit df7dd83 into scikit-learn:master Jun 14, 2019
16 checks passed
16 checks passed
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python No new or fixed alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 96.81%)
Details
codecov/project 96.82% (+<.01%) compared to 4a6264d
Details
scikit-learn.scikit-learn Build #20190613.31 succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_np_atlas) Linux py35_np_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_conda) Linux pylatest_conda succeeded
Details
scikit-learn.scikit-learn (Windows py35_32) Windows py35_32 succeeded
Details
scikit-learn.scikit-learn (Windows py37_64) Windows py37_64 succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda) macOS pylatest_conda succeeded
Details
@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jun 14, 2019

Thanks Scott!

@scottgigante

This comment has been minimized.

Copy link
Contributor Author

scottgigante commented Jun 14, 2019

Thanks @jnothman and @NicolasHug for the detailed reviews!

@scottgigante scottgigante deleted the scottgigante:feature/incremental_pca_sparse_input branch Jun 14, 2019
koenvandevelde added a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.