IntegrateData error: Error in idx[i, ] <- res[[i]][[1]] : #3930

Closed · NicolaasVanRenne opened this issue Jan 13, 2021 · 7 comments
Labels: bug (Something isn't working)
@NicolaasVanRenne

I often re-integrate data, meaning I take a subset and re-integrate it based on orig.ident. This works really well and yields biologically meaningful information. However, when the cell count of a sample (orig.ident) is below 200 I have to lower k.filter, otherwise it won't run. I understand this makes the integration less reliable, but it seems to work.
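For reference, the workflow looks roughly like this (a minimal sketch; the object names and the k.filter value are placeholders from my own analysis):

# Sketch of the subset-and-re-integrate workflow (Seurat v3-style; names are placeholders)
sub <- subset(seurat_obj, idents = "cluster_of_interest")  # take the subset
obj_list <- SplitObject(sub, split.by = "orig.ident")      # split by sample
obj_list <- lapply(obj_list, NormalizeData)
obj_list <- lapply(obj_list, FindVariableFeatures)
anchors <- FindIntegrationAnchors(object.list = obj_list,
                                  k.filter = 150)          # lowered when a sample has < 200 cells
reintegrated <- IntegrateData(anchorset = anchors)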

Now here is my problem: if some of the samples (orig.ident) have fewer than 100 cells, IntegrateData throws this error:

Merging dataset 2 into 1
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
Error in idx[i, ] <- res[[i]][[1]] :
number of items to replace is not a multiple of replacement length

I can "solve" this by decreasing the k.weight to the number of cells of the sample with the lowest cell count. However; I'm very unsure about what this k.weight actually does.

So I'm not even sure whether the error is really a bug, or just a sign that I'm using a sample with too low a cell count.

Is it OK for me to lower k.weight and k.filter?
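The workaround I'm using looks like this (a sketch; seurat_obj and anchors are placeholder names from the workflow above):

# Workaround sketch: cap k.weight at the smallest sample's cell count.
smallest <- min(table(seurat_obj$orig.ident))  # cell count of the smallest sample
integrated <- IntegrateData(anchorset = anchors, k.weight = smallest)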

thank you
Nicolaas

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Belgium.1252 LC_CTYPE=Dutch_Belgium.1252
[3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] patchwork_1.1.1 RColorBrewer_1.1-2 ggplot2_3.3.3
[4] Seurat_3.9.9.9024 dplyr_1.0.2

loaded via a namespace (and not attached):
[1] nlme_3.1-149 matrixStats_0.57.0 RcppAnnoy_0.0.18
[4] httr_1.4.2 sctransform_0.3.2 tools_4.0.3
[7] R6_2.5.0 irlba_2.3.3 rpart_4.1-15
[10] KernSmooth_2.23-17 uwot_0.1.10 lazyeval_0.2.2
[13] mgcv_1.8-33 colorspace_2.0-0 withr_2.3.0
[16] tidyselect_1.1.0 gridExtra_2.3 compiler_4.0.3
[19] plotly_4.9.2.2 labeling_0.4.2 scales_1.1.1
[22] lmtest_0.9-38 spatstat.data_1.7-0 ggridges_0.5.2
[25] pbapply_1.4-3 spatstat_1.64-1 goftest_1.2-2
[28] stringr_1.4.0 digest_0.6.27 spatstat.utils_1.17-0
[31] pkgconfig_2.0.3 htmltools_0.5.0 parallelly_1.23.0
[34] fastmap_1.0.1 htmlwidgets_1.5.3 rlang_0.4.10
[37] shiny_1.5.0 farver_2.0.3 generics_0.1.0
[40] zoo_1.8-8 jsonlite_1.7.2 ica_1.0-2
[43] magrittr_2.0.1 Matrix_1.2-18 Rcpp_1.0.5
[46] munsell_0.5.0 abind_1.4-5 reticulate_1.18
[49] lifecycle_0.2.0 stringi_1.5.3 MASS_7.3-53
[52] Rtsne_0.15 plyr_1.8.6 grid_4.0.3
[55] parallel_4.0.3 listenv_0.8.0 promises_1.1.1
[58] ggrepel_0.9.0 crayon_1.3.4 miniUI_0.1.1.1
[61] deldir_0.2-3 lattice_0.20-41 cowplot_1.1.1
[64] splines_4.0.3 tensor_1.5 pillar_1.4.7
[67] igraph_1.2.6 future.apply_1.7.0 reshape2_1.4.4
[70] codetools_0.2-16 leiden_0.3.6 glue_1.4.2
[73] data.table_1.13.6 png_0.1-7 vctrs_0.3.6
[76] httpuv_1.5.4 gtable_0.3.0 RANN_2.6.1
[79] purrr_0.3.4 polyclip_1.10-0 tidyr_1.1.2
[82] scattermore_0.7 future_1.21.0 rsvd_1.0.3
[85] mime_0.9 xtable_1.8-4 RSpectra_0.16-0
[88] later_1.1.0.1 survival_3.2-7 viridisLite_0.3.0
[91] tibble_3.0.4 cluster_2.1.0 globals_0.14.0
[94] fitdistrplus_1.1-3 ellipsis_0.3.1 ROCR_1.0-11

NicolaasVanRenne added the bug label on Jan 13, 2021
@jaisonj708 (Collaborator)

It is generally okay to decrease those parameters if you have to. You can think of k.weight as a smoothing parameter: a value of 100 means each cell will be transformed by a weighted combination of its 100 nearest anchors. The assumption is that there is some amount of randomness/sparsity in the data, which makes it desirable to combine anchors in the same neighborhood. If you set k.weight very low, you get less smoothing and are effectively assuming the information in each of your cells is more reliable/complete. That might be the case for your data, but most single-cell RNA assays have a good deal of sparsity.
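To make the smoothing intuition concrete, here is a toy illustration (not Seurat's actual internals): each cell's correction is a weighted average of the vectors attached to its k nearest anchors.

# Toy illustration of anchor smoothing (NOT Seurat's internal code):
# correct one cell with a weighted average of its k nearest anchors.
smooth_correction <- function(cell, anchor_pos, anchor_vec, k = 100) {
  d  <- sqrt(rowSums(sweep(anchor_pos, 2, cell)^2))  # distance to each anchor
  nn <- order(d)[seq_len(min(k, nrow(anchor_pos)))]  # k nearest anchors
  w  <- exp(-d[nn]); w <- w / sum(w)                 # closer anchors weigh more
  colSums(anchor_vec[nn, , drop = FALSE] * w)        # smoothed correction vector
}

With a smaller k, fewer anchors enter the average, so the correction tracks local (and possibly noisy) structure more closely.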

@Rahul1711arora

Hi,

I am getting the same error message upon integration, and I understand from this post that the smoothing parameter can be modified. But I would like to understand it better: what is the recommended range over which the smoothing parameter can vary, and how should k.weight be set? Does it need to be adjusted depending on the datasets being integrated?

Thanks in advance for your help!

@andrea-de-micheli

Hello,
I have the same question and concern when re-integrating datasets with very different cell numbers (some in the hundreds, others in the thousands). I'm not sure I understand the following error:

Error in idx[i, ] <- res[[i]][[1]] : 
  number of items to replace is not a multiple of replacement length

and how it is related to the smoothing parameter. Thank you.

@keggleigh

Hi, I'm having a similar issue. When I subset and re-integrate, I get the same error after running IntegrateData():

Merging dataset 9 into 14
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
Error in idx[i, ] <- res[[i]][[1]] :
number of items to replace is not a multiple of replacement length

My smallest dataset to be integrated contains 169 cells. After running FindIntegrationAnchors(), I checked back through the output and the lowest number of anchors found was 90, so I set k.weight to 89, as suggested in #4427.

This is my code:
# Read in the SCT-transformed object list
split_seurat <- readRDS("/mainfs/scratch/kjb1g22/ResearchProject/DoubletFinder/Integration/Cluster_Analysis/Microglia/Microglia_Macrophage_SCT.rds")

# Select integration features and prepare for SCT integration
features <- SelectIntegrationFeatures(object.list = split_seurat, nfeatures = 3000)
split_seurat <- PrepSCTIntegration(object.list = split_seurat, anchor.features = features)

# Run PCA on each object
split_seurat <- lapply(X = split_seurat, FUN = RunPCA, features = features)

# Integration
seurat.anchors <- FindIntegrationAnchors(object.list = split_seurat, normalization.method = "SCT",
                                         anchor.features = features, dims = 1:30,
                                         reduction = "rpca", k.filter = 168)
seurat.combined.sct <- IntegrateData(anchorset = seurat.anchors, normalization.method = "SCT",
                                     dims = 1:30, k.weight = 89)

Everything runs fine until the IntegrateData() step. Can anyone suggest what may be wrong? The original workflow comes from the Faster Integration Using RPCA vignette (https://satijalab.org/seurat/articles/integration_rpca.html).
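For anyone else hitting this: the anchor counts can be checked programmatically instead of scrolling back through the console output. A sketch, assuming the pairwise anchor table is stored in the anchorset's anchors slot with dataset1/dataset2 columns (as in recent Seurat versions):

# Count anchors per dataset pair and find the smallest count (sketch):
anchor_df <- as.data.frame(slot(seurat.anchors, "anchors"))
pair_counts <- table(anchor_df$dataset1, anchor_df$dataset2)
min(pair_counts[pair_counts > 0])  # lowest anchor count across pairs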

Thank you!

@Gesmira (Contributor) commented Jul 7, 2023

Hi @keggleigh,
Are you able to share this dataset or a reproducible example?

@Gesmira (Contributor) commented Jul 7, 2023

Hi,
We are currently adding a more informative error message for this issue in the development branch. It seems to occur when the number of anchor cells is less than the k.weight parameter, which is the number of neighbors used when weighting anchors. For now, I would recommend reducing k.weight, combining samples if certain samples have few cells, or adjusting the parameters of FindIntegrationAnchors (which can be provided as inputs to IntegrateLayers) to increase the number of cells that act as anchors. Decreasing k.weight will reduce the power of your integration, so if you change these parameters, we recommend checking the results to make sure they are satisfactory.
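As an example of the "combining samples" option, small samples could be pooled into one group before splitting. A sketch; the 100-cell cutoff and the int_group column name are arbitrary:

# Sketch: pool samples with very few cells into one group before splitting.
counts <- table(obj$orig.ident)        # cells per sample
small  <- names(counts)[counts < 100]  # arbitrary cutoff
ident  <- as.character(obj$orig.ident)
obj$int_group <- ifelse(ident %in% small, "pooled_small_samples", ident)
obj_list <- SplitObject(obj, split.by = "int_group")  # then integrate as usual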

@chiwwong commented Aug 9, 2023

I also experienced the same issue as mentioned by @keggleigh.

In my case, the minimum sample size is 69, and I set k.weight to 50, but the error persists. I wonder whether it is caused by normalization.method = "SCT", which both of us used.
