IntegrateData error: Error in idx[i, ] <- res[[i]][[1]] : #3930

Closed · NicolaasVanRenne opened this issue Jan 13, 2021 · 7 comments
Labels: bug (Something isn't working)
@NicolaasVanRenne

I often re-integrate data, meaning I take a subset and re-integrate it based on orig.ident. This works really well and yields biologically meaningful information. However, when the cell count of a sample (orig.ident) is below 200 I have to lower k.filter, otherwise it won't run. I understand this makes the integration less reliable, but it seems to work.
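For reference, the workflow looks roughly like this (a minimal sketch; the object names and the k.filter value are placeholders from my own analysis):

# Sketch of the subset-and-re-integrate workflow (Seurat v3-style; names are placeholders)
sub <- subset(seurat_obj, idents = "cluster_of_interest")  # take the subset
obj_list <- SplitObject(sub, split.by = "orig.ident")      # split by sample
obj_list <- lapply(obj_list, NormalizeData)
obj_list <- lapply(obj_list, FindVariableFeatures)
anchors <- FindIntegrationAnchors(object.list = obj_list,
                                  k.filter = 150)          # lowered when a sample has < 200 cells
reintegrated <- IntegrateData(anchorset = anchors)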

Now here is my problem: if some of the samples (orig.ident) have fewer than 100 cells, IntegrateData throws this error:

Merging dataset 2 into 1
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
Error in idx[i, ] <- res[[i]][[1]] :
number of items to replace is not a multiple of replacement length

I can "solve" this by decreasing the k.weight to the number of cells of the sample with the lowest cell count. However; I'm very unsure about what this k.weight actually does.

So I'm not even sure whether the error is really a bug, or just a sign that I'm using a sample with too low a cell count.

Is it OK for me to lower k.weight and k.filter?
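The workaround I'm using looks like this (a sketch; seurat_obj and anchors are placeholder names from the workflow above):

# Workaround sketch: cap k.weight at the smallest sample's cell count.
smallest <- min(table(seurat_obj$orig.ident))  # cell count of the smallest sample
integrated <- IntegrateData(anchorset = anchors, k.weight = smallest)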

thank you
Nicolaas

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Belgium.1252 LC_CTYPE=Dutch_Belgium.1252
[3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] patchwork_1.1.1 RColorBrewer_1.1-2 ggplot2_3.3.3
[4] Seurat_3.9.9.9024 dplyr_1.0.2

loaded via a namespace (and not attached):
[1] nlme_3.1-149 matrixStats_0.57.0 RcppAnnoy_0.0.18
[4] httr_1.4.2 sctransform_0.3.2 tools_4.0.3
[7] R6_2.5.0 irlba_2.3.3 rpart_4.1-15
[10] KernSmooth_2.23-17 uwot_0.1.10 lazyeval_0.2.2
[13] mgcv_1.8-33 colorspace_2.0-0 withr_2.3.0
[16] tidyselect_1.1.0 gridExtra_2.3 compiler_4.0.3
[19] plotly_4.9.2.2 labeling_0.4.2 scales_1.1.1
[22] lmtest_0.9-38 spatstat.data_1.7-0 ggridges_0.5.2
[25] pbapply_1.4-3 spatstat_1.64-1 goftest_1.2-2
[28] stringr_1.4.0 digest_0.6.27 spatstat.utils_1.17-0
[31] pkgconfig_2.0.3 htmltools_0.5.0 parallelly_1.23.0
[34] fastmap_1.0.1 htmlwidgets_1.5.3 rlang_0.4.10
[37] shiny_1.5.0 farver_2.0.3 generics_0.1.0
[40] zoo_1.8-8 jsonlite_1.7.2 ica_1.0-2
[43] magrittr_2.0.1 Matrix_1.2-18 Rcpp_1.0.5
[46] munsell_0.5.0 abind_1.4-5 reticulate_1.18
[49] lifecycle_0.2.0 stringi_1.5.3 MASS_7.3-53
[52] Rtsne_0.15 plyr_1.8.6 grid_4.0.3
[55] parallel_4.0.3 listenv_0.8.0 promises_1.1.1
[58] ggrepel_0.9.0 crayon_1.3.4 miniUI_0.1.1.1
[61] deldir_0.2-3 lattice_0.20-41 cowplot_1.1.1
[64] splines_4.0.3 tensor_1.5 pillar_1.4.7
[67] igraph_1.2.6 future.apply_1.7.0 reshape2_1.4.4
[70] codetools_0.2-16 leiden_0.3.6 glue_1.4.2
[73] data.table_1.13.6 png_0.1-7 vctrs_0.3.6
[76] httpuv_1.5.4 gtable_0.3.0 RANN_2.6.1
[79] purrr_0.3.4 polyclip_1.10-0 tidyr_1.1.2
[82] scattermore_0.7 future_1.21.0 rsvd_1.0.3
[85] mime_0.9 xtable_1.8-4 RSpectra_0.16-0
[88] later_1.1.0.1 survival_3.2-7 viridisLite_0.3.0
[91] tibble_3.0.4 cluster_2.1.0 globals_0.14.0
[94] fitdistrplus_1.1-3 ellipsis_0.3.1 ROCR_1.0-11

NicolaasVanRenne added the bug label on Jan 13, 2021
@jaisonj708 (Collaborator)

It is generally okay to decrease those parameters if you have to. You can think of k.weight as a smoothing parameter: a value of 100 means each cell will be transformed by a weighted combination of its 100 nearest anchors. The assumption is that there is some amount of randomness/sparsity in the data, which makes it desirable to combine anchors in the same neighborhood. If you set k.weight very low, you get less smoothing and are effectively assuming the information in each of your cells is more reliable/complete. That might be the case for your data, but most single-cell RNA assays have a good deal of sparsity.
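To make the smoothing intuition concrete, here is a toy illustration (not Seurat's actual internals): each cell's correction is a weighted average of the vectors attached to its k nearest anchors.

# Toy illustration of anchor smoothing (NOT Seurat's internal code):
# correct one cell with a weighted average of its k nearest anchors.
smooth_correction <- function(cell, anchor_pos, anchor_vec, k = 100) {
  d  <- sqrt(rowSums(sweep(anchor_pos, 2, cell)^2))  # distance to each anchor
  nn <- order(d)[seq_len(min(k, nrow(anchor_pos)))]  # k nearest anchors
  w  <- exp(-d[nn]); w <- w / sum(w)                 # closer anchors weigh more
  colSums(anchor_vec[nn, , drop = FALSE] * w)        # smoothed correction vector
}

With a smaller k, fewer anchors enter the average, so the correction tracks local (and possibly noisy) structure more closely.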

@Rahul1711arora

Hi,

I am getting the same error message upon integration, and I understand from this post that the smoothing parameter can be modified. But I would like to understand it better: what is the recommended range over which the smoothing parameter can vary, and how should k.weight be set? Does it need to be adjusted depending on the datasets being integrated?

Thanks in advance for your help!

@andrea-de-micheli

Hello,
I have the same question and concern when re-integrating datasets with very different cell numbers (some in the hundreds, others in the thousands). I'm not sure I understand the following error:

Error in idx[i, ] <- res[[i]][[1]] : 
  number of items to replace is not a multiple of replacement length

and how it is related to the smoothing parameter. Thank you.

@keggleigh

Hi, I'm having a similar issue. When I subset and re-integrate, I get the same error after running IntegrateData():

Merging dataset 9 into 14
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
Error in idx[i, ] <- res[[i]][[1]] :
number of items to replace is not a multiple of replacement length

My smallest dataset to be integrated contains 169 cells. After running FindIntegrationAnchors(), I checked back through the output and the lowest number of anchors found was 90, so I set k.weight to 89, as suggested in #4427.

This is my code:
# Read in the SCT-transformed object list
split_seurat <- readRDS("/mainfs/scratch/kjb1g22/ResearchProject/DoubletFinder/Integration/Cluster_Analysis/Microglia/Microglia_Macrophage_SCT.rds")

# Select integration features and prepare for SCT integration
features <- SelectIntegrationFeatures(object.list = split_seurat, nfeatures = 3000)
split_seurat <- PrepSCTIntegration(object.list = split_seurat, anchor.features = features)

# Run PCA on each object
split_seurat <- lapply(X = split_seurat, FUN = RunPCA, features = features)

# Integration
seurat.anchors <- FindIntegrationAnchors(object.list = split_seurat, normalization.method = "SCT",
                                         anchor.features = features, dims = 1:30,
                                         reduction = "rpca", k.filter = 168)
seurat.combined.sct <- IntegrateData(anchorset = seurat.anchors, normalization.method = "SCT",
                                     dims = 1:30, k.weight = 89)

Everything runs fine until the IntegrateData() step. Can anyone suggest what may be wrong? The original workflow comes from the Faster Integration Using RPCA vignette (https://satijalab.org/seurat/articles/integration_rpca.html).
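For anyone else hitting this: the anchor counts can be checked programmatically instead of scrolling back through the console output. A sketch, assuming the pairwise anchor table is stored in the anchorset's anchors slot with dataset1/dataset2 columns (as in recent Seurat versions):

# Count anchors per dataset pair and find the smallest count (sketch):
anchor_df <- as.data.frame(slot(seurat.anchors, "anchors"))
pair_counts <- table(anchor_df$dataset1, anchor_df$dataset2)
min(pair_counts[pair_counts > 0])  # lowest anchor count across pairs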

Thank you!

@Gesmira (Contributor) commented Jul 7, 2023

Hi @keggleigh,
Are you able to share this dataset or a reproducible example?

@Gesmira (Contributor) commented Jul 7, 2023

Hi,
We are currently adding a more informative error message for this issue in the development branch. It seems to occur when the number of anchor cells is less than the k.weight parameter, which is the number of neighbors used when weighting anchors. For now, I would recommend reducing k.weight, combining samples if certain samples have few cells, or adjusting the parameters of FindIntegrationAnchors (which can be provided as inputs to IntegrateLayers) to increase the number of cells that act as anchors. Decreasing k.weight will reduce the power of your integration, so if you change these parameters, we recommend checking the results to make sure they are satisfactory.
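As an example of the "combining samples" option, small samples could be pooled into one group before splitting. A sketch; the 100-cell cutoff and the int_group column name are arbitrary:

# Sketch: pool samples with very few cells into one group before splitting.
counts <- table(obj$orig.ident)        # cells per sample
small  <- names(counts)[counts < 100]  # arbitrary cutoff
ident  <- as.character(obj$orig.ident)
obj$int_group <- ifelse(ident %in% small, "pooled_small_samples", ident)
obj_list <- SplitObject(obj, split.by = "int_group")  # then integrate as usual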

@chiwwong commented Aug 9, 2023

I also experienced the same issue as mentioned by @keggleigh.

In my case, the minimum sample size is 69, and I set k.weight to 50, but the error persists. I wonder whether it is caused by normalization.method = "SCT", which both of us used.
