Support for large matrices #4380

Closed
rhodesch opened this issue Apr 16, 2021 · 5 comments
Labels
enhancement (New feature or request)

Comments

rhodesch commented Apr 16, 2021

As mentioned in issue #1644, the sparse matrices used for count data cannot hold more than 2^31-1 non-zero elements. Is there any support in Seurat v4 for interacting with on-disk objects (e.g., loom)?

Using SeuratDisk and/or loomR this would seem doable, but all of the options I've seen for HDF5 support on the Seurat and SeuratDisk websites suggest that data are loaded from an HDF5 object and converted to a sparse matrix (not viable for matrices with more than 2^31-1 non-zero elements) before the Seurat object is created.
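
For context, here is a minimal sketch of the limit I'm referring to (not Seurat-specific, and using a small placeholder matrix rather than real count data): a dgCMatrix cannot index more than .Machine$integer.max = 2^31-1 non-zero entries, so checking the non-zero count up front shows whether the conversion can work at all.

library(Matrix)

# Tiny placeholder standing in for a real count matrix read from HDF5/loom
counts <- rsparsematrix(nrow = 1e4, ncol = 1e3, density = 0.05)

# A dgCMatrix can hold at most .Machine$integer.max (2^31 - 1) non-zero elements;
# if this is FALSE, converting the on-disk data to an in-memory sparse matrix will fail
nnzero(counts) <= .Machine$integer.max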

Lastly, as mentioned in issue #1868, the loom branch of Seurat that had functionality for loom objects (e.g. NormalizeData(object = 'path/to/loom.loom', overwrite = TRUE, display.progress = FALSE)) appears to be outdated, and at least one of its dependencies is no longer on CRAN.

If there is documentation for this type of functionality, could you please point me to it?

rhodesch added the enhancement label Apr 16, 2021

timoast commented Apr 23, 2021 (Collaborator)

This is a longer-term goal that we are working on, but we do not currently support on-disk computation or other sparse matrix formats in Seurat.

timoast closed this as completed Apr 23, 2021

sscien commented May 19, 2022

Has this issue still not been fixed?

siefejo1 commented Oct 6, 2022

This is still a limitation:

combined <- IntegrateData(anchorset = anchors, normalization.method = "SCT", features.to.integrate = all_features)
sct.model: model1
Setting min_variance to:  -Inf
Calculating residuals of type pearson for 14776 genes
  |====================================================================================================================| 100%
sct.model: model1
Setting min_variance to:  -Inf
Calculating residuals of type pearson for 14776 genes
  |====================================================================================================================| 100%
Error in .m2sparse(from, paste0(kind, "g", repr), NULL, NULL) : 
  attempt to construct sparse matrix with more than 2^31-1 nonzero elements

Is there any way around this? I need gene-level integrated values, ideally for all detected genes across two large datasets.

Cutting it down to 10,000 genes results in a different error:

Error in validityMethod(as(object, superClass)) : 
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:537

10,000 is really the minimum I need.
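
For what it's worth, a rough back-of-the-envelope check (my own sketch, not anything from the Seurat code; object.list is a placeholder for the per-dataset objects that went into FindIntegrationAnchors(), and all_features is the vector from the call above) shows when this error is expected, since the integrated SCT residuals are essentially dense:

# Placeholder names: 'object.list' = the per-dataset Seurat objects, 'all_features' as above
n_cells    <- sum(sapply(object.list, ncol))  # ncol() on a Seurat object returns the number of cells
n_features <- length(all_features)

# The integrated matrix is close to dense, so it needs roughly n_features * n_cells entries;
# if that exceeds 2^31 - 1, the .m2sparse() error above is expected
as.numeric(n_features) * n_cells > .Machine$integer.max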

zhanghao-njmu commented

You can try the modified version at https://github.com/zhanghao-njmu/seurat, for which I have created pull request #6527.

I changed the integration matrix format to "spam" or "spam64" so that matrices with more than 2^31-1 non-zero elements can be processed.

However, note that processing large data requires sufficient memory: I tested the modified version on a dataset with more than 200,000 cells, and the maximum memory used during the calculation was 1 TB.
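
For anyone curious, the rough idea looks like the sketch below (a simplified illustration with placeholder data, not the actual code from the pull request): the spam class, with spam64 loaded, can index more than 2^31-1 non-zero elements, which dgCMatrix cannot.

library(spam)
library(spam64)   # enables 64-bit indexing for the 'spam' class

# Force the 64-bit representation even for small matrices (option name as documented by spam)
options(spam.force64 = TRUE)

# Tiny dense placeholder standing in for the integration matrix
m <- matrix(rnorm(9), nrow = 3)
s <- as.spam(m)   # 'spam' counterpart of the sparse matrix Seurat would otherwise build
class(s)          # "spam"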

shahrozeabbas commented

Hi @zhanghao-njmu, would you be able to make the same changes to Signac for scATAC-seq? It might be a huge ask, but it would be super helpful. Thanks!
