
Seurat3.0 Finding integration vectors: long vectors not supported yet #1029

Closed
ruqianl opened this issue Dec 29, 2018 · 23 comments

Comments

ruqianl commented Dec 29, 2018

Hi guys,

When I applied Seurat 3.0's IntegrateData to a large dataset consisting of about 100k cells from 50 samples, I got an error:

Integrating data
Merging dataset 46 3 34 2 1 47 51 45 49 37 50 into 19 42 36 44 20 30 25 33 26 43 12 14 35 39 21 24 28 31 8 32 29 27 18 23 11 15 17 10 16 7 13 9 38 22 6 41 40 4 5 48
Extracting anchors for merged samples
Finding integration vectors
Error in validityMethod(as(object, superClass)) : 
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:138
Calls: IntegrateData ... validObject -> anyStrings -> validityMethod -> .Call
Execution halted

I guess it has to do with the data size; is there any workaround for this?

PS: when I group these samples into 7 sets and rerun the integration on the 7 datasets, it runs successfully.

Thanks!
Lyu

@satijalab (Collaborator)

Thanks for the question - we've explored this, and the cause is that there are so many anchors that it creates a sparse matrix with >2^31 elements in R, which throws an error.

This is not simply a function of cell or dataset number (we have performed much larger alignments), but we are implementing a fix that will not affect results. Our apologies for the delay in the meantime.
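[Editor's note: a minimal sketch, not Seurat code, illustrating the limit described above. The Matrix package's sparse classes use 32-bit integer indexing for their slots, so a matrix whose number of stored entries exceeds .Machine$integer.max (2^31 - 1) can trigger the "long vectors not supported yet" error. The matrix here is a toy stand-in.]

library(Matrix)

# Toy sparse matrix standing in for the anchor/integration matrix
m <- rsparsematrix(nrow = 1000, ncol = 1000, density = 0.1)

# Number of explicitly stored (non-zero) entries
n_nonzero <- length(m@x)

# 2^31 - 1 is the largest count that 32-bit integer indexing can address
if (n_nonzero > .Machine$integer.max) {
  message("Too many stored entries for 32-bit sparse matrix indexing")
}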

gouinK commented Apr 1, 2019

Greetings, I am running into the same issue with my dataset. Is there any fix for this yet? My dataset is actually quite similar to the OP's in that I have ~120k cells across 36 samples. Thanks for your time!

@markrobinsonuzh

Just to highlight that I also have this issue: 24 samples with (downsampled to) ~3k cells for each sample.

[snip]
Integrating data
Merging dataset 24 16 18 5 23 20 into 3 13 2 8 14 9 1 7 4 10 22
Extracting anchors for merged samples
Finding integration vectors
Error in validityMethod(as(object, superClass)) : 
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:519
> traceback()
15: validityMethod(as(object, superClass))
14: isTRUE(x)
13: anyStrings(validityMethod(as(object, superClass)))
12: validObject(.Object)
11: .nextMethod(.Object = .Object, ... = ...)
10: callNextMethod()
9: initialize(value, ...)
8: initialize(value, ...)
7: new("dgTMatrix", Dim = d, Dimnames = dn, i = i, j = j, x = x)
6: newTMat(i = c(ij1[, 1], ij2[, 1]), j = c(ij1[, 2], ij2[, 2]), 
       x = if (Generic == "+") c(e1@x, e2@x) else c(e1@x, -e2@x))
5: .Arith.Csparse(e1, e2, .Generic, class. = "dgCMatrix")
4: data.use2 - data.use1
3: data.use2 - data.use1
2: FindIntegrationMatrix(object = merged.obj, integration.name = integration.name, 
       features.integrate = features.to.integrate, verbose = verbose)
1: IntegrateData(anchorset = anchors, dims = 1:pa$dims, features.to.integrate = row.names(sl[[1]]))

Has anyone tried using anchor.features = 1000 (or something lower than the default of 2000)? I mean, how much does this affect the integration?

Thanks in advance,
Mark Robinson

markrobinsonuzh commented Apr 2, 2019

OK, I tried FindIntegrationAnchors(.., anchor.features = 1000) and no luck; I got the same error.

markrobinsonuzh commented Apr 3, 2019

I solved it: I was integrating a bunch of features, not just those used for anchoring. This was the failing call:

seurat <- IntegrateData( anchorset=anchors, 
                         dims = 1:pa$dims,
                         features.to.integrate=row.names(sl[[1]]) )

This works:

seurat <- IntegrateData( anchorset=anchors, 
                         dims = 1:pa$dims)

@jeffjjohnston

I've also encountered this error during integration. Is there a recommended workaround while we await a fix? Thanks!

timoast (Collaborator) commented May 17, 2019

@jeffjjohnston you can try using Mark's workaround, which should work in most cases. We are working on further improvements to the integration procedure that will address this issue more fully; these will be made available soon.

MaxKman commented Jun 22, 2019

Hi, any news about this? I am facing the same problem. Is the workaround just running IntegrateData without specifying features? That doesn't work in my case.

@kevinblighe

I am as yet unsure why, but this worked for me:

IntegrateData( anchorset = scData.Anchors )

That is, not specifying anything for dims or features.to.integrate.

The very puzzling part is that when I specified dims = 1:20, I got the error stated in the original question (above). The default for dims, however, is 1:30, so 1:30 is presumably used when nothing is set for dims (?).

MaxKman commented Jul 10, 2019

I could get around the issue by integrating my samples as two separate sets (using the SCT assay) and then integrating the two resulting objects (using the integrated RNA assay of each object). However, I would now like to use the new method for directly integrating the SCT-normalized data based on the Pearson residuals (https://satijalab.org/seurat/v3.0/pancreas_integration_label_transfer.html), and I am not sure whether it is sound to use the same two-set workaround in that case. Any feedback?

timoast (Collaborator) commented Aug 2, 2019

We have now introduced multiple ways to scale up the integration, using either reference-based integration or reciprocal PCA rather than CCA. Please see the integration vignette for examples.
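[Editor's note: a rough sketch of the reference-based / reciprocal-PCA route mentioned above. Object names such as obj.list are placeholders, and exact arguments may differ between Seurat versions; the integration vignette is the authoritative source.]

# obj.list: placeholder list of per-sample Seurat objects
features <- SelectIntegrationFeatures(object.list = obj.list)
obj.list <- lapply(obj.list, function(x) {
  x <- ScaleData(x, features = features, verbose = FALSE)
  RunPCA(x, features = features, verbose = FALSE)
})

# Reference-based anchors with reciprocal PCA instead of CCA
anchors <- FindIntegrationAnchors(object.list = obj.list,
                                  reference = 1,
                                  reduction = "rpca",
                                  anchor.features = features,
                                  dims = 1:30)
integrated <- IntegrateData(anchorset = anchors, dims = 1:30)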

timoast closed this as completed Aug 2, 2019
lambdamoses commented Sep 25, 2019

I looked at the source code and this is the line of code that is the culprit:

integration.matrix <- data.use2 - data.use1

I don't think this is Seurat's problem but rather a problem with Matrix, which still doesn't support vectors with more than 2^31 elements. It's just that a sparse matrix with too many non-zero elements is produced. This could be worked around by using the sparse matrix package spam64, but that would require changes to Seurat's source code. Actually supporting long vectors is on the to-do list of the Matrix developers, but somehow they still haven't implemented it.
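[Editor's note: a toy sketch of the spam/spam64 idea, not Seurat or proposed code. The option name and conversion details are assumptions and may differ between spam versions; the point is that spam64 extends spam's sparse matrices to 64-bit indexing, so element-wise arithmetic is not capped at 2^31 - 1 stored entries.]

library(spam)
library(spam64)
options(spam.force64 = TRUE)  # assumed switch to 64-bit index structures

# Tiny dense stand-ins for data.use1 / data.use2
data.use1 <- as.spam(matrix(rnorm(20), nrow = 4))
data.use2 <- as.spam(matrix(rnorm(20), nrow = 4))

# The element-wise subtraction that overflows Matrix's 32-bit indices
integration.matrix <- data.use2 - data.use1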

Also, do you think Seurat should use HDF5Array from Bioconductor for data that doesn't fit into memory?

@lambdamoses

I added reference = 1, i.e. anchors <- FindIntegrationAnchors(seus, normalization.method = "SCT", anchor.features = features_use, reference = 1), and this issue was avoided. I still got decent results after integrating 21 datasets with over 180k cells in total.

@Chenmengpin

I added reference = 1, i.e. anchors <- FindIntegrationAnchors(seus, normalization.method = "SCT", anchor.features = features_use, reference = 1), and this issue was avoided. I still got decent results after integrating 21 datasets with over 180k cells in total.

Hi, I've also encountered this "long vectors not supported" error and am trying to add reference = 1, but I was wondering what it means - which dataset is used as the reference during the integration? I'm quite new to bioinformatics and would very much appreciate any answer. Thank you!

@lambdamoses

That means using the first dataset in your list as the reference. With a reference, each of the other datasets is integrated with the reference, so fewer pairwise integrations are done, which makes the code faster.
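[Editor's note: a quick illustration of the speed-up. Without a reference, anchors are found between every pair of datasets; with a single reference, only reference-versus-query pairs are computed.]

k <- 21          # number of datasets, as in the example above
choose(k, 2)     # 210 pairwise anchor searches without a reference
k - 1            # 20 searches when one dataset is the reference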

@Chenmengpin

That means using the first dataset in your list as the reference. With a reference, each of the other datasets is integrated with the reference, so fewer pairwise integrations are done, which makes the code faster.

I see. So it would give me different integration results when using another dataset in the list as the reference, but I was also wondering how to decide which one is best as a reference - do I need to try all the datasets in my list to get the best results? Thank you.

@rebeccawuu

@Chenmengpin I was wondering if you got any replies to your question about how referencing would affect the integration or how to determine what a good reference is, as I am new to bioinformatics and have the same question!

gabsax commented Feb 23, 2021

@Chenmengpin I was also wondering whether you found the answer to that question.

@Chenmengpin

@Chenmengpin I was wondering if you got any replies to your question about how referencing would affect the integration or how to determine what a good reference is, as I am new to bioinformatics and have the same question!

Sorry for the late reply. I don't really have the right answer for this issue; I tried several different datasets as the reference and found the results were quite similar in my case. I guess you can try it this way and see how it goes with your own data.

@Chenmengpin

@Chenmengpin I was also wondering whether you found the answer to that question.

I tried different datasets as the reference and the integration results look similar. Trying a dataset with good quality would give you a good result, I guess.

gabsax commented Feb 24, 2021

Alright thanks @Chenmengpin

@Pingxu0101

Hi all, I also have the same error. I have a question: what does 1:pa$dims mean? @markrobinsonuzh, I saw you already solved this error, but I didn't understand the pa here. Could you explain a little bit? Thanks in advance.

seurat <- IntegrateData( anchorset=anchors,
dims = 1:pa$dims)

@zhanghao-njmu

You can try the modified version at https://github.com/zhanghao-njmu/seurat, for which I have created pull request #6527.

I changed the integration matrix format to "spam" or "spam64" to process matrices with more than 2^31-1 non-zero elements.

However, it is important to note that processing large data requires sufficient memory. I tested the modified version on data with more than 200,000 cells; the maximum memory used during the calculation was 1 TB.
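[Editor's note: if you want to try the fork, installing it like any other GitHub R package should work; this is a sketch, and branch/installation details are not confirmed here.]

# install.packages("remotes")
remotes::install_github("zhanghao-njmu/seurat")
library(Seurat)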
