Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Allow sparse input to incremental PCA #13960

Conversation

scottgigante
Copy link
Contributor

@scottgigante scottgigante commented May 27, 2019

Reference Issues/PRs

Fixes #13957.

What does this implement/fix? Explain your changes.

Implements sparse input for IncrementalPCA. IncrementalPCA is by design suited to accepting sparse input; this allows the input to be sparse, and if it is so, converts the data to dense on a batch-wise basis.

@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch 4 times, most recently from 880a7a3 to 17d7248 Compare May 27, 2019
@abhinavsagar
Copy link
Contributor

@abhinavsagar abhinavsagar commented May 28, 2019

Looks good to me.

@scottgigante scottgigante changed the title Allow sparse input to incremental PCA [MRG] Allow sparse input to incremental PCA May 28, 2019
doc/modules/decomposition.rst Outdated Show resolved Hide resolved
sklearn/decomposition/incremental_pca.py Show resolved Hide resolved
sklearn/decomposition/incremental_pca.py Outdated Show resolved Hide resolved
@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch from 17d7248 to dc276bf Compare May 28, 2019
@scottgigante scottgigante changed the title [MRG] Allow sparse input to incremental PCA [WIP] Allow sparse input to incremental PCA May 28, 2019
Copy link
Member

@NicolasHug NicolasHug left a comment

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

sklearn/decomposition/incremental_pca.py Show resolved Hide resolved
sklearn/decomposition/tests/test_incremental_pca.py Outdated Show resolved Hide resolved
sklearn/decomposition/tests/test_incremental_pca.py Outdated Show resolved Hide resolved
@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch 6 times, most recently from 3afe748 to 9e119f3 Compare May 29, 2019
@scottgigante scottgigante changed the title [WIP] Allow sparse input to incremental PCA [MRG] Allow sparse input to incremental PCA May 29, 2019
@jnothman
Copy link
Member

@jnothman jnothman commented May 30, 2019

If I want to apply partial_fit batchwise to multiple input matrices, I would not be able to run step 2 without step 1.

yes, but if the way you treat sparse matrices is to densify them, you might as well get the user to do that...

@scottgigante
Copy link
Contributor Author

@scottgigante scottgigante commented Jun 1, 2019

If I want to apply partial_fit batchwise to multiple input matrices, I would not be able to run step 2 without step 1.

yes, but if the way you treat sparse matrices is to densify them, you might as well get the user to do that...

A clarification: if the user wants to batchwise densify and fit multiple sparse matrices, then this is not currently possible. Example scenario: you want to fit a single estimator to multiple large mtx files. Alternative solution would be to load all of the files, concatenate the matrices with scipy.sparse.vstack and then apply fit.

@jnothman
Copy link
Member

@jnothman jnothman commented Jun 2, 2019

@scottgigante
Copy link
Contributor Author

@scottgigante scottgigante commented Jun 2, 2019

I would presume that turning the entire sparse matrix to dense at once would be undesirable for memory reasons.

@jnothman
Copy link
Member

@jnothman jnothman commented Jun 3, 2019

@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch from 9e119f3 to 5b45e71 Compare Jun 3, 2019
@scottgigante
Copy link
Contributor Author

@scottgigante scottgigante commented Jun 3, 2019

Note: it's failing tests due to some other unrelated change, I think - failing tests in grid search CV.

@scottgigante scottgigante reopened this Jun 6, 2019
@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch 2 times, most recently from 28c78a0 to 7ea580c Compare Jun 6, 2019
@scottgigante
Copy link
Contributor Author

@scottgigante scottgigante commented Jun 6, 2019

@NicolasHug @jnothman any further comments?

Copy link
Member

@jnothman jnothman left a comment

Sorry my review time has been limited. Please add a check that partial_fit still raises an appropriate error when passed sparse X. Otherwise lgtm, thanks!

Copy link
Member

@NicolasHug NicolasHug left a comment

As Joel mentioned please test the error in partial_fit for dense input.

Also all the methods that accept/return a sparse X should be changed to array-like or sparse matrix

Otherwise LGTM too!

@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch from 7ea580c to bf978a8 Compare Jun 12, 2019
@scottgigante
Copy link
Contributor Author

@scottgigante scottgigante commented Jun 12, 2019

@jnothman @NicolasHug should be good to go now.

@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch from bf978a8 to cb5a185 Compare Jun 13, 2019
Copy link
Member

@NicolasHug NicolasHug left a comment

Thanks @scottgigante !

Copy link
Member

@NicolasHug NicolasHug left a comment

Thanks @scottgigante !

@scottgigante scottgigante changed the title [MRG] Allow sparse input to incremental PCA [MRG+1] Allow sparse input to incremental PCA Jun 13, 2019
@jnothman jnothman merged commit df7dd83 into scikit-learn:master Jun 14, 2019
16 checks passed
@jnothman
Copy link
Member

@jnothman jnothman commented Jun 14, 2019

Thanks Scott!

@scottgigante
Copy link
Contributor Author

@scottgigante scottgigante commented Jun 14, 2019

Thanks @jnothman and @NicolasHug for the detailed reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants