enable PCA on `data` to exploit sparseness #2333

jan-glx · 2019-11-18T15:37:36Z

This PR

- adds a `slot` parameter to `RunPCA.Seurat` and `RunPCA.Assay`
  - this allows running `RunPCA` on raw (`counts`) or normalized (`data`) data without running `ScaleData` first
- adds `scale` and `center` parameters to simplify the use of implicit scaling/centering in `RunPCA`
  - performing PCA with implicit scaling on a sparse matrix gives a significant speedup (~6x) over explicit scaling through `ScaleData`, which converts the data to a dense matrix (see comment below for details)
- is based on "Fix weight.by.var for approx=FALSE" #2330, which should be merged first
- is not yet ready for merge because:
  1. Should the `center`, `scale` and `slot` parameter values be saved in the `DimReduc` object? Where?
  2. To be more consistent with `ScaleData`, `scale` and `center` might be better named `do.scale` and `do.center`. The values of any `scale`, `scale.` and `center` arguments should then be used as defaults if supplied, so that existing code passing these (through `...`) keeps working.
  3. For `approx=FALSE` only, `RunPCA` has performed centering by default. This is only a problem if the user set `do.center=FALSE` in `ScaleData`, but it is inconsistent and should be changed.
  4. The implementation could be simplified by switching from `irlba` to `prcomp_irlba` if "simplify and optimize prcomp_irlba" bwlewis/irlba#52 gets merged.
  5. ?

Adds an additional argument `slot` that can be used to select the sparse `'data'` matrix instead of the dense `'scale.data'` matrix. Sparseness can be exploited by passing the centering and scaling factors to the `irlba` package as additional arguments, so the scaled matrix never has to be built explicitly. This gives a ~6x speedup.
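The idea behind implicit centering and scaling can be illustrated with a small sketch. This is not the code in this PR and not `irlba`'s implementation; it is a stdlib-only Python toy, with hypothetical helper names (`column_stats`, `implicit_matvec`, `explicit_matvec`) and an assumed cells-by-features orientation. It shows that a product with the centered, scaled matrix Z = (X - 1 mu^T) D^-1 can be computed from the nonzeros of X alone, since Z v = X (D^-1 v) - 1 (mu^T D^-1 v). It assumes no column is constant (nonzero standard deviation).

```python
from math import sqrt

def column_stats(rows, n_cols):
    """Column means and sample standard deviations of a sparse matrix.

    `rows` is a list of {column_index: value} dicts; absent entries are zero.
    """
    n = len(rows)
    sums = [0.0] * n_cols
    sq_sums = [0.0] * n_cols
    for row in rows:
        for j, x in row.items():
            sums[j] += x
            sq_sums[j] += x * x
    means = [s / n for s in sums]
    # Sample variance: (sum of squares - n * mean^2) / (n - 1)
    sds = [sqrt((sq - n * m * m) / (n - 1)) for sq, m in zip(sq_sums, means)]
    return means, sds

def implicit_matvec(rows, means, sds, v):
    """Compute ((X - 1 mu^T) D^-1) v touching only the nonzeros of X."""
    w = [vj / sd for vj, sd in zip(v, sds)]          # D^-1 v
    shift = sum(m * wj for m, wj in zip(means, w))   # mu^T D^-1 v (a scalar)
    return [sum(x * w[j] for j, x in row.items()) - shift for row in rows]

def explicit_matvec(rows, means, sds, v):
    """Reference: build the dense scaled matrix first, then multiply."""
    n_cols = len(v)
    dense = [[(row.get(j, 0.0) - means[j]) / sds[j] for j in range(n_cols)]
             for row in rows]
    return [sum(z * vj for z, vj in zip(zrow, v)) for zrow in dense]

# Toy 4 x 3 sparse matrix; the values are made up for illustration.
X = [{0: 1.0}, {1: 2.0, 2: 1.0}, {0: 3.0}, {2: 4.0}]
means, sds = column_stats(X, 3)
v = [0.5, -1.0, 2.0]
fast = implicit_matvec(X, means, sds, v)
slow = explicit_matvec(X, means, sds, v)
```

Both routes give the same vector up to floating-point error, but the implicit one never materializes the dense matrix. An iterative SVD solver only needs such matrix-vector products, which is why supplying the centering and scaling vectors separately lets the sparse matrix be used directly.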

jan-glx added 5 commits November 17, 2019 16:20

tests for correct weighting/scaling of PCs

1f81a35

test for downward compatible behavior of RunPCA

852e2d1

fix scaling of PCs for approx=FALSE

0730374

control behavior through option, default: old behavior, warning simplify implementation

Merge branch 'develop' into PCA_on_sparse

fb13fe9

dcollins15 force-pushed the develop branch from a87fd5f to 41d19a8 Compare November 28, 2023 20:43
