Sparse to dense coercion when running merge on two Seurat objects. #9125

Open
joshuak94 opened this issue Jul 19, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@joshuak94

joshuak94 commented Jul 19, 2024

I was trying to create a reproducible example of another issue I'm having with JoinLayers() taking an indefinite amount of time (killed manually after ~12 hours).

The dataset I used is from here. I used the gene_count_cleaned_sampled_100k.rds file, along with the cell_annotation.csv file for the metadata.

I split the gene matrix into two groups: E11.5 cells and E13.5 cells. When merging, I get the following warnings, and then eventually an error:

Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 6.6 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 3.6 GiB”

Error: cannot allocate vector of size 5.1 Gb
Traceback:

1. merge(data_115, data_135, add.cell.ids = c("115", "135"))
2. merge(data_115, data_135, add.cell.ids = c("115", "135"))
3. merge.default(data_115, data_135, add.cell.ids = c("115", "135"))
4. merge(as.data.frame(x), as.data.frame(y), ...)
5. merge.data.frame(as.data.frame(x), as.data.frame(y), ...)
6. cbind(x[ij[, 1L], , drop = FALSE], y[ij[, 2L], , drop = FALSE])
7. x[ij[, 1L], , drop = FALSE]
8. `[.data.frame`(x, ij[, 1L], , drop = FALSE)

My memory usage also skyrockets to 400+ GB.
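The traceback shows merge() ending up in merge.default(), which calls as.data.frame() on both inputs. For a sparse count matrix that means storing every zero explicitly, which matches the `sparse->dense coercion` warnings. A minimal sketch using a small random matrix from the Matrix package (toy data, not the MOCA matrix) to show how much larger the dense copy is:

```r
library(Matrix)

# Toy sparse matrix with ~1% non-zero entries (random values, illustration only)
set.seed(1)
m <- rsparsematrix(2000, 500, density = 0.01)

# Memory of the sparse (dgCMatrix) representation vs. its dense copy
sparse_bytes <- as.numeric(object.size(m))
dense_bytes <- as.numeric(object.size(as.matrix(m)))  # every zero stored as a double

ratio <- dense_bytes / sparse_bytes
print(ratio)
```

A dense double matrix needs 8 bytes per entry regardless of sparsity, so the cost scales with genes × cells rather than with the number of non-zero counts.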

Source code:

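library(Seurat)

# Load the sampled count matrix and the per-cell annotation table
data = readRDS("/project/moca/gene_count_cleaned_sampled_100k.RDS")
metadata = read.csv("/project/moca/cell_annotate.csv")
# Drop trailing ".<digits>" version suffixes from the gene IDs
rownames(data) = gsub("\\.\\d+$", "", rownames(data))

# Metadata rows for cells present in the matrix, split by developmental stage
metadata_subset115 = metadata[which(metadata$sample %in% colnames(data) & metadata$development_stage == 11.5), ]
metadata_subset135 = metadata[which(metadata$sample %in% colnames(data) & metadata$development_stage == 13.5), ]

# Subset the count matrix to each stage and build one Seurat object per stage
data_115 = data[, which(colnames(data) %in% metadata_subset115$sample)]
data_seurat_115 = CreateSeuratObject(data_115, meta.data = metadata_subset115)

data_135 = data[, which(colnames(data) %in% metadata_subset135$sample)]
data_seurat_135 = CreateSeuratObject(data_135, meta.data = metadata_subset135)

# Intended: merge the two stages with stage-prefixed cell names. As written,
# data_115 and data_135 are the count matrices rather than the Seurat objects,
# so this call dispatches to merge.default() (see traceback above)
merged_data = merge(data_115, data_135, add.cell.ids=c("115", "135"))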
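The traceback shows merge() dispatching to merge.default() and then merge.data.frame(), which is what happens when the inputs are not Seurat objects: in the code above, data_115 and data_135 are the subsetted count matrices, while the Seurat objects are data_seurat_115 and data_seurat_135. A minimal sketch of a guard (the helper name `merge_seurat_checked` is hypothetical, not part of Seurat) that fails fast instead of silently falling through to the data-frame merge:

```r
# Hypothetical helper (not part of Seurat): refuse to merge anything that is not
# a Seurat object, so a raw count matrix cannot fall through to merge.default()
merge_seurat_checked <- function(x, y, ...) {
  for (obj in list(x, y)) {
    if (!inherits(obj, "Seurat")) {
      stop("expected a Seurat object, got: ", paste(class(obj), collapse = "/"))
    }
  }
  merge(x, y, ...)
}

# Passing plain matrices now raises an error instead of densifying them
res <- tryCatch(
  merge_seurat_checked(matrix(1), matrix(2)),
  error = function(e) conditionMessage(e)
)
print(res)
```

With the objects from the report, the intended call would presumably be merge(data_seurat_115, data_seurat_135, add.cell.ids = c("115", "135")), or the same arguments passed to the helper.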

sessionInfo():

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: MarIuX64 2.0 GNU/Linux

Matrix products: default
BLAS:   /pkg/R-4.4.0-0/lib/R/lib/libRblas.so 
LAPACK: /usr/lib/liblapack.so.3.10.1

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Seurat_5.1.0       SeuratObject_5.0.2 sp_2.1-4          

loaded via a namespace (and not attached):
  [1] deldir_2.0-4           pbapply_1.7-2          gridExtra_2.3         
  [4] rlang_1.1.4            magrittr_2.0.3         RcppAnnoy_0.0.22      
  [7] spatstat.geom_3.3-2    matrixStats_1.3.0      ggridges_0.5.6        
 [10] compiler_4.4.0         png_0.1-8              vctrs_0.6.5           
 [13] reshape2_1.4.4         stringr_1.5.1          pkgconfig_2.0.3       
 [16] crayon_1.5.3           fastmap_1.2.0          utf8_1.2.4            
 [19] promises_1.3.0         purrr_1.0.2            jsonlite_1.8.8        
 [22] goftest_1.2-3          later_1.3.2            uuid_1.1-1            
 [25] spatstat.utils_3.0-5   irlba_2.3.5.1          parallel_4.4.0        
 [28] cluster_2.1.6          R6_2.5.1               ica_1.0-3             
 [31] stringi_1.8.4          RColorBrewer_1.1-3     spatstat.data_3.1-2   
 [34] reticulate_1.38.0      spatstat.univar_3.0-0  parallelly_1.37.1     
 [37] lmtest_0.9-40          scattermore_1.2        Rcpp_1.0.12           
 [40] IRkernel_1.3.2         tensor_1.5             future.apply_1.11.2   
 [43] zoo_1.8-12             base64enc_0.1-3        sctransform_0.4.1     
 [46] httpuv_1.6.15          Matrix_1.7-0           splines_4.4.0         
 [49] igraph_2.0.3           tidyselect_1.2.1       abind_1.4-5           
 [52] spatstat.random_3.3-1  codetools_0.2-20       miniUI_0.1.1.1        
 [55] spatstat.explore_3.3-1 listenv_0.9.1          lattice_0.22-6        
 [58] tibble_3.2.1           plyr_1.8.9             shiny_1.8.1.1         
 [61] ROCR_1.0-11            evaluate_0.24.0        Rtsne_0.17            
 [64] future_1.33.2          fastDummies_1.7.3      survival_3.5-8        
 [67] polyclip_1.10-6        fitdistrplus_1.2-1     pillar_1.9.0          
 [70] KernSmooth_2.23-22     plotly_4.10.4          generics_0.1.3        
 [73] RcppHNSW_0.6.0         IRdisplay_1.1          ggplot2_3.5.1         
 [76] munsell_0.5.1          scales_1.3.0           globals_0.16.3        
 [79] xtable_1.8-4           glue_1.7.0             lazyeval_0.2.2        
 [82] tools_4.4.0            data.table_1.15.4      RSpectra_0.16-1       
 [85] pbdZMQ_0.3-10          RANN_2.6.1             leiden_0.4.3.1        
 [88] dotCall64_1.1-1        cowplot_1.1.3          grid_4.4.0            
 [91] tidyr_1.3.1            colorspace_2.1-0       nlme_3.1-164          
 [94] patchwork_1.2.0        repr_1.1.6             cli_3.6.3             
 [97] spatstat.sparse_3.1-0  spam_2.10-0            fansi_1.0.6           
[100] viridisLite_0.4.2      dplyr_1.1.4            uwot_0.2.2            
[103] gtable_0.3.5           digest_0.6.36          progressr_0.14.0      
[106] ggrepel_0.9.5          htmlwidgets_1.6.4      htmltools_0.5.8.1     
[109] lifecycle_1.0.4        httr_1.4.7             mime_0.12             
[112] MASS_7.3-60.2         
@joshuak94 joshuak94 added the bug Something isn't working label Jul 19, 2024
@rsatija
Collaborator

rsatija commented Jul 19, 2024

Thank you for sending this, which is very helpful for us to debug.

Can you check whether the row names of your metadata match the column names of your object? i.e., all(rownames(object@meta.data)==colnames(object)), if your object is called object.

This relates to #9125

Let us know, and we will take a look early next week and get back to you ASAP.
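A related check can be done before CreateSeuratObject(): making sure each metadata row lines up with a column of the count matrix and is keyed by that cell's ID. read.csv() gives the metadata default row names ("1", "2", ...), so they will not match the cell barcodes unless set explicitly. A minimal sketch, assuming the `sample` column holds the cell barcodes (as the subsetting code in the report implies); `align_metadata` is a hypothetical helper, not a Seurat function:

```r
# Hypothetical helper: reorder a metadata data frame so its rows follow `cells`,
# and use the cell IDs as row names
align_metadata <- function(metadata, cells, id_col = "sample") {
  stopifnot(id_col %in% colnames(metadata))
  idx <- match(cells, metadata[[id_col]])
  if (anyNA(idx)) {
    stop(sum(is.na(idx)), " cell(s) have no matching metadata row")
  }
  aligned <- metadata[idx, , drop = FALSE]
  rownames(aligned) <- cells
  aligned
}

# Toy example: metadata rows in a different order from the cells
meta <- data.frame(
  sample = c("c3", "c1", "c2"),
  development_stage = c(13.5, 11.5, 11.5)
)
cells <- c("c1", "c2", "c3")
aligned <- align_metadata(meta, cells)
print(aligned)
```

In the report's code this would be used as CreateSeuratObject(data_115, meta.data = align_metadata(metadata_subset115, colnames(data_115))).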

@joshuak94
Author

all(rownames(data_seurat_115@meta.data)==colnames(data_seurat_115)) 
all(rownames(data_seurat_135@meta.data)==colnames(data_seurat_135)) 

Both yield TRUE.

@joshuak94
Author

Hi @rsatija, I was wondering if there was an update regarding this issue?

@xlucpu

xlucpu commented Oct 18, 2024

Same issue here, but with Xenium data. I have no idea why it happens or how to resolve it.

xenium.obj <- SCTransform(xenium.obj, assay = "Xenium")
Running SCTransform on assay: Xenium
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 377 by 376392
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 376 genes, 5000 cells
Found 2 outliers - those will be ignored in fitting/regularization step

Second step: Get residuals using fitted parameters for 377 genes
Error in asMethod(object) :
(converted from warning) sparse->dense coercion: allocating vector of size 1.1 GiB
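The "(converted from warning)" prefix is what base R produces when options(warn) is 2 or higher: the Matrix package's sparse->dense coercion warning is then raised as an error and stops SCTransform(). A minimal sketch for checking and temporarily lowering that level, assuming the ~1.1 GiB dense allocation itself is acceptable on the machine; `with_warn_level` is a hypothetical helper:

```r
# Hypothetical helper: evaluate `expr` with a temporary value of options(warn),
# restoring the previous value afterwards (even if `expr` errors)
with_warn_level <- function(level, expr) {
  old <- options(warn = level)
  on.exit(options(old), add = TRUE)
  expr  # lazily evaluated here, after the option has been changed
}

# A value of 2 here would explain the "(converted from warning)" error
print(getOption("warn"))

# With warn = 2, a warning is raised as an error carrying that prefix
msg <- with_warn_level(2, tryCatch(
  { warning("sparse->dense coercion"); "no error" },
  error = function(e) conditionMessage(e)
))
print(msg)

# Possible usage with the object from the comment above:
# xenium.obj <- with_warn_level(0, SCTransform(xenium.obj, assay = "Xenium"))
```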

Projects
None yet
Development

No branches or pull requests

3 participants