Seurat3.0 Finding integration vectors: long vectors not supported yet #1029
Thanks for the question - we've explored this, and the cause is that there are so many anchors that they create a sparse matrix with >2^31 elements in R, which can throw an error. This is not simply a function of cell or dataset number (we have performed much larger alignments), but we are implementing a fix that will not affect results. Our apologies for the delay in the meantime.
Greetings, I am running into the same issue with my dataset. Is there any fix for this yet? My dataset is actually quite similar to the OP's in that I have ~120k cells across 36 samples. Thanks for your time!
Just to highlight that I also have this issue: 24 samples with (downsampled to) ~3k cells each.
Has anyone tried using …? Thanks in advance.
Ok, I tried …
I solved it: I was integrating a bunch of features, not just those used for anchoring.
This works:
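The code block that followed here was lost in extraction. Judging from the surrounding comments (the fix was to stop passing a large custom feature set to IntegrateData), the working call was presumably something like this sketch; `obj.list` and `dims = 1:30` are placeholder assumptions, not quotes from the thread:

```r
library(Seurat)

# Sketch only: the object list and dims are placeholders.
anchors <- FindIntegrationAnchors(object.list = obj.list, dims = 1:30)

# Works: let IntegrateData() default to the anchor features.
integrated <- IntegrateData(anchorset = anchors, dims = 1:30)

# The failing variant (per this thread) asked to integrate far more
# features, inflating the sparse integration matrix past 2^31 non-zeros:
# integrated <- IntegrateData(anchorset = anchors, dims = 1:30,
#                             features.to.integrate = rownames(obj.list[[1]]))
```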
I've also encountered this error during integration. Is there a recommended workaround while we await a fix? Thanks!
@jeffjjohnston you can try using Mark's workaround, which should work in most cases. We are working on further improvements to the integration procedure that will address this issue more fully; these will be made available soon.
Hi, any news about this? I am facing the same problem. Is the workaround just running IntegrateData without specifying features? That doesn't work in my case.
I am as yet unsure why, but this worked for me:
That is, not specifying anything for … The very puzzling part is that I was specifying …
I could get around the issue by integrating my samples as two separate sets (using the SCT assay), then integrating the two resulting objects (using the integrated RNA assay of each object). However, now I would like to use the new method for directly integrating the SCT-normalized data based on the Pearson residuals (https://satijalab.org/seurat/v3.0/pancreas_integration_label_transfer.html), and I am not sure whether it is sound to use the same two-set workaround in this case. Any feedback?
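For concreteness, the two-round approach described above might look like the following sketch (all object names, the split into halves, and `dims = 1:30` are placeholder assumptions):

```r
library(Seurat)

# Round 1: integrate each subset separately (placeholder split).
anch1 <- FindIntegrationAnchors(object.list = obj.list[1:12], dims = 1:30)
set1  <- IntegrateData(anchorset = anch1, dims = 1:30)

anch2 <- FindIntegrationAnchors(object.list = obj.list[13:24], dims = 1:30)
set2  <- IntegrateData(anchorset = anch2, dims = 1:30)

# Round 2: integrate the two integrated objects.
DefaultAssay(set1) <- "integrated"
DefaultAssay(set2) <- "integrated"
anch  <- FindIntegrationAnchors(object.list = list(set1, set2), dims = 1:30)
final <- IntegrateData(anchorset = anch, dims = 1:30)
```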
We have now introduced multiple ways to scale up the integration, using either reference-based integration or reciprocal PCA rather than CCA. Please see the integration vignette for examples.
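As a rough guide, the reference-based and reciprocal-PCA paths mentioned here combine as in the sketch below (the object list, dims, and the choice of dataset 1 as reference are assumptions; see the vignette for the authoritative version):

```r
library(Seurat)

# Shared variable features, then per-dataset PCA (required for RPCA).
features <- SelectIntegrationFeatures(object.list = obj.list)
obj.list <- lapply(obj.list, function(x) {
  x <- ScaleData(x, features = features)
  RunPCA(x, features = features)
})

# Reciprocal PCA instead of CCA, anchored to one reference dataset.
anchors <- FindIntegrationAnchors(object.list = obj.list,
                                  reference   = 1,
                                  reduction   = "rpca",
                                  dims        = 1:30)
integrated <- IntegrateData(anchorset = anchors, dims = 1:30)
```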
I looked at the source code, and this is the line that is the culprit: line 2129 in commit c9f2660.
The culprit is Matrix, which still doesn't support vectors with more than 2^31 elements; it's just that a sparse matrix with too many non-zero elements is produced. This can be worked around by using the sparse matrix package spam64, but that will require changes to Seurat's source code. Actually supporting long vectors is on the to-do list of the Matrix developers, but somehow they still haven't implemented it.
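The limit being hit is R's long-vector boundary for integer-indexed sparse matrices: Matrix's dgCMatrix stores its non-zeros with integer indices, so more than .Machine$integer.max = 2^31 - 1 non-zero entries cannot be represented. A back-of-envelope check in plain R (the cell and feature counts below are hypothetical, chosen to mirror the dataset sizes in this thread, and assume a worst-case fully dense matrix):

```r
int_max <- .Machine$integer.max              # 2^31 - 1 = 2147483647

n_cells    <- 120000                         # hypothetical, ~OP's dataset
n_features <- 2000                           # typical anchor-feature count
as.numeric(n_features) * n_cells > int_max   # FALSE: 2.4e8 entries fit

n_features <- 20000                          # integrating all genes instead
as.numeric(n_features) * n_cells > int_max   # TRUE: 2.4e9 exceeds the limit
```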
Also, do you think Seurat should use …?
I added …
Hi, I've also encountered this "long vector not available" error and tried adding "reference=1", but I was wondering what it means: which dataset is used as the reference during the integration? I'm quite new to bioinformatics and would very much appreciate any answer. Thank you!
That means using the first dataset in your list as the reference. With a reference, each of the other datasets is integrated with the reference, so fewer pairwise integrations are done, which makes the code faster.
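The speed-up is easy to see with a little counting: without a reference, anchors are found between every pair of datasets; with one reference, only between each dataset and the reference. A tiny illustration in R (36 samples, matching an earlier comment in this thread):

```r
n <- 36
all_pairs <- n * (n - 1) / 2   # pairwise anchor runs without a reference: 630
with_ref  <- n - 1             # runs with a single reference dataset: 35
```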
I see. So it would give me different integration results when using another dataset in the list as the reference. I was also wondering how to decide which one is best as a reference; do I need to try all the datasets in my list to get the best results? Thank you.
@Chenmengpin I was wondering if you got any replies to your question about how the choice of reference affects the integration, or how to determine what a good reference is, as I am new to bioinformatics and have the same question!
@Chenmengpin I was also wondering whether you found the answer to that question.
Sorry for the late reply. I don't really have a definitive answer for this; I tried several different datasets as the reference and found the results were quite similar in my case. I guess you can try it this way and see how it goes with your own data.
I tried different datasets as the reference; the integration results look similar. Trying a dataset with good quality would give you a good result, I guess.
Alright, thanks @Chenmengpin!
Hi all, I also have the same error. @markrobinsonuzh, I have a question: what does `1:pa$dims` mean in `seurat <- IntegrateData(anchorset = anchors, …`?
You can try the modified version at https://github.com/zhanghao-njmu/seurat, for which I have created pull request #6527. I changed the integration matrix format to "spam" or "spam64" to process matrices with more than 2^31-1 non-zero elements. However, it is important to note that processing large data requires sufficient memory.
Hi guys,
When I applied Seurat 3.0's IntegrateData to a large dataset consisting of about 100k cells from 50 samples, I got an error:
I guess it has to do with the data size, any workaround with this?
PS: when I group these samples into 7 sets and rerun the integration on the 7 datasets, it runs successfully.
Thanks!
Lyu