"did not converge" Error on cellbender3 #97

Closed
IrinaVKuznetsova opened this issue Feb 15, 2024 · 8 comments

IrinaVKuznetsova commented Feb 15, 2024

Dear scDblFinder developer,

This is the first time I am trying to use your tool. Unfortunately, I am getting an error and am not sure how to fix it.
Running on Linux (Ubuntu) with 250 RAM, 64 CPUs, 3T free space.

# 1.0 Validate assay version of the Seurat object -  Assay-v5
cell_bender_seurat[["RNA"]]  # Assay (v5) data with 36601 features for 77863 cell

# 1.1 Convert v5 to v3.
cell_bender_seurat[["RNA3"]] <- as(object = cell_bender_seurat[["RNA"]], Class = "Assay")
cell_bender_seurat
cell_bender_seurat[["RNA3"]]  # Assay  data with 36601 features for 77863 cells

# 1.2 Convert to sce
sce = as.SingleCellExperiment(cell_bender_seurat, assay ="RNA3")
sce

class: SingleCellExperiment
dim: 36601 75331
metadata(0):
assays(2): counts logcounts
rownames(36601): MIR1302-2HG FAM138A ... AC007325.4 AC007325.2
rowData names(0):
colnames(75331): L25_ACGCAGCCAAACAACA-1 L25_CGACCTTTCGATCCCT-1 ...
  S55_GTTAAGCGTCTAGGTT-1 S55_TGCTACCGTCGCGTGT-1
colData names(23): orig.ident nCount_RNA ... clonotype_id ident
reducedDimNames(5): PCA INTEGRATED.CCA INTEGRATED.RPCA UMAP.CCA
  UMAP.SCVI
mainExpName: RNA3
altExpNames(0):


# 1.3 Find doublets (multiple samples x8)
sce.standard <- scDblFinder(sce, samples = "orig.ident", BPPARAM=MulticoreParam(20))   # fails, error message above

Error in manager$availability[[as.character(result$node)]] <- TRUE :
  wrong args for environment subassignment
Error in serialize(data, node$con, xdr = FALSE) :
  error writing to connection

Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  did not converge--results might be invalid!; try increasing work or maxit
Stop worker failed with the error: wrong args for environment subassignment

I'd appreciate any suggestions.
Thank you

sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /data/bin/conda_env_location/PDX_manuscript_2023_v2/lib/libopenblasp-r0.3.26.so; LAPACK version 3.12.0

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] BiocParallel_1.36.0 scDblFinder_1.16.0
[3] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
[5] Biobase_2.62.0 GenomicRanges_1.54.1
[7] GenomeInfoDb_1.38.1 IRanges_2.36.0
[9] S4Vectors_0.40.2 BiocGenerics_0.48.1
[11] MatrixGenerics_1.14.0 matrixStats_1.2.0
[13] Seurat_5.0.1 SeuratObject_5.0.0
[15] sp_2.1-3

loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.3.2
[3] later_1.3.2 BiocIO_1.12.0
[5] bitops_1.0-7 tibble_3.2.1
[7] polyclip_1.10-6 XML_3.99-0.16.1
[9] fastDummies_1.7.3 lifecycle_1.0.4
[11] edgeR_4.0.2 globals_0.16.2
[13] lattice_0.22-5 MASS_7.3-60
[15] magrittr_2.0.3 limma_3.58.1
[17] plotly_4.10.4 yaml_2.3.8
[19] metapod_1.10.0 httpuv_1.6.14
[21] sctransform_0.4.1 spam_2.10-0
[23] spatstat.sparse_3.0-3 reticulate_1.35.0
[25] cowplot_1.1.3 pbapply_1.7-2
[27] RColorBrewer_1.1-3 abind_1.4-5
[29] zlibbioc_1.48.0 Rtsne_0.17
[31] purrr_1.0.2 RCurl_1.98-1.14
[33] GenomeInfoDbData_1.2.11 ggrepel_0.9.5
[35] irlba_2.3.5.1 listenv_0.9.1
[37] spatstat.utils_3.0-4 goftest_1.2-3
[39] RSpectra_0.16-1 dqrng_0.3.2
[41] spatstat.random_3.2-2 fitdistrplus_1.1-11
[43] parallelly_1.36.0 DelayedMatrixStats_1.24.0
[45] leiden_0.4.3.1 codetools_0.2-19
[47] DelayedArray_0.28.0 scuttle_1.12.0
[49] tidyselect_1.2.0 ScaledMatrix_1.10.0
[51] viridis_0.6.5 spatstat.explore_3.2-6
[53] GenomicAlignments_1.38.0 jsonlite_1.8.8
[55] BiocNeighbors_1.20.0 ellipsis_0.3.2
[57] progressr_0.14.0 ggridges_0.5.6
[59] survival_3.5-7 scater_1.30.1
[61] tools_4.3.2 ica_1.0-3
[63] Rcpp_1.0.12 glue_1.7.0
[65] gridExtra_2.3 SparseArray_1.2.2
[67] dplyr_1.1.4 fastmap_1.1.1
[69] bluster_1.12.0 fansi_1.0.6
[71] digest_0.6.34 rsvd_1.0.5
[73] R6_2.5.1 mime_0.12
[75] colorspace_2.1-0 scattermore_1.2
[77] tensor_1.5 spatstat.data_3.0-4
[79] utf8_1.2.4 tidyr_1.3.1
[81] generics_0.1.3 data.table_1.14.10
[83] rtracklayer_1.62.0 httr_1.4.7
[85] htmlwidgets_1.6.4 S4Arrays_1.2.0
[87] uwot_0.1.16 pkgconfig_2.0.3
[89] gtable_0.3.4 lmtest_0.9-40
[91] XVector_0.42.0 htmltools_0.5.7
[93] dotCall64_1.1-1 scales_1.3.0
[95] png_0.1-8 scran_1.30.0
[97] reshape2_1.4.4 rjson_0.2.21
[99] nlme_3.1-164 zoo_1.8-12
[101] stringr_1.5.1 KernSmooth_2.23-22
[103] parallel_4.3.2 miniUI_0.1.1.1
[105] vipor_0.4.7 restfulr_0.0.15
[107] pillar_1.9.0 grid_4.3.2
[109] vctrs_0.6.5 RANN_2.6.1
[111] promises_1.2.1 BiocSingular_1.18.0
[113] beachmat_2.18.0 xtable_1.8-4
[115] cluster_2.1.6 beeswarm_0.4.0
[117] locfit_1.5-9.8 cli_3.6.2
[119] compiler_4.3.2 Rsamtools_2.18.0
[121] rlang_1.1.3 crayon_1.5.2
[123] future.apply_1.11.1 plyr_1.8.9
[125] ggbeeswarm_0.7.2 stringi_1.8.3
[127] viridisLite_0.4.2 deldir_2.0-2
[129] munsell_0.5.0 Biostrings_2.70.1
[131] lazyeval_0.2.2 spatstat.geom_3.2-8
[133] Matrix_1.6-1.1 RcppHNSW_0.6.0
[135] patchwork_1.2.0 sparseMatrixStats_1.14.0
[137] future_1.33.1 ggplot2_3.4.4
[139] statmod_1.5.0 shiny_1.8.0
[141] ROCR_1.0-11 igraph_1.6.0
[143] xgboost_2.0.3.1

@plger
Owner

plger commented Feb 15, 2024

Hi,
I've never seen this error, but it could be a memory and/or multithreading issue.
I'd recommend checking the following:

  1. Monitor your RAM usage when running scDblFinder (e.g. using htop).
  2. The package per se is not very memory hungry (it has been run on much larger datasets), but the object itself can be; in particular, earlier versions of as.SingleCellExperiment had a bug that made the object huge (although this should be solved in the version you're using). So check the size (e.g. using format(object.size(x), units="Gb")) of both cell_bender_seurat and sce. If you see that sce is much bigger, you can always skip the conversion and run scDblFinder with something like:
sce <- scDblFinder(GetAssayData(cell_bender_seurat, slot="counts", assay="RNA3"), 
                   samples=cell_bender_seurat$orig.ident)
  3. If htop suggests that the problem is memory-related, try reducing the number of threads (or even using a single one).

@IrinaVKuznetsova
Author

Thank you for the prompt response.

  1. It looks normal (below 1%)
  2. seems ok
format(object.size(cell_bender_seurat), units="Gb") #  "8 Gb"
format(object.size(sce), units="Gb")   #  "2.9 Gb"

A) Could it have something to do with how Seurat v5 uses layers (8 samples, so 8 layers for counts, for example), and that when I convert it to Assay v3 it becomes one 36601 x 75331 matrix?

B) Tried to run without threads:

sce.standard <- scDblFinder(sce, samples = "orig.ident")

Warning messages:
1: In rpois(nrow(x) * length(wAd), as.numeric(as.matrix(x[, wAd]))) :
  NAs produced
2: In value[[3L]](cond) :
  Error in calculating norm factors:Error in .local(x, ...): size factors should be positive

C) Tried this too

sce <- scDblFinder(GetAssayData(cell_bender_seurat, slot="counts", assay="RNA3"),
                   samples=cell_bender_seurat$orig.ident)
Error in .checkSCE(sce) :
  `sce` should be a SingleCellExperiment, a SummarizedExperiment, or an array (i.e. matrix, sparse matric, etc.) of counts.
In addition: Warning message:
The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
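
A side note on the `NAs produced` warning from `rpois()` in attempt B: in base R, `rpois()` returns `NA` (with exactly that warning) whenever a rate is negative or missing, so the warning is a hint to inspect the values of the count matrix itself. A minimal base-R illustration, using made-up rates rather than anything from this dataset:

```r
# Hypothetical Poisson rates; the second one is negative on purpose.
lambda <- c(2, -1, 5)

# rpois() cannot draw from a negative rate: it returns NA for that entry
# and emits the "NAs produced" warning (suppressed here to keep output clean).
draws <- suppressWarnings(rpois(length(lambda), lambda))

is.na(draws)
# [1] FALSE  TRUE FALSE
```

This only shows how the warning arises in general; it does not by itself prove what is wrong with a particular matrix.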

@plger
Owner

plger commented Feb 15, 2024

Not sure I understand your question A, the original Seurat object also has dimensions 36601 x 75331...

  • Can you check class(GetAssayData(cell_bender_seurat, layer="counts", assay="RNA3"))?
  • Can you check quantile(colSums(counts(sce)))
  • Can you try this:
    sce.standard <- scDblFinder(sce[VariableFeatures(cell_bender_seurat),], samples = "orig.ident")

@IrinaVKuznetsova
Author

class(GetAssayData(cell_bender_seurat, layer="counts", assay="RNA3"))
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
quantile(colSums(counts(sce)))
   0%   25%   50%   75%  100%
  201   650  2209  5732 81977
  1. It has been running for 1 hr; I hope that is a good sign
    sce.standard <- scDblFinder(sce[VariableFeatures(cell_bender_seurat),], samples = "orig.ident")

@plger
Owner

plger commented Feb 15, 2024

I'm unsure what the issue is here, but it appears to be related to 1) the fact that you have cells with a very low library size (your 201 is crap; personally I'd have filtered out many) and 2) the feature selection internal to scDblFinder, which might have resulted in some cells not having reads in those features. This appears to have been solved by using the VariableFeatures (which is a perfectly decent way of doing things), and would most likely also be solved by filtering out cells with a low library size (e.g. keeping those with >=400-500).

If you want, you can try again with multithreading, using either of these 2 solutions.
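
The library-size filter suggested above could look roughly like the following. This is a minimal sketch on a toy sparse matrix; the threshold `min_lib`, the toy values, and the object names are illustrative assumptions, not settings recommended by the package.

```r
library(Matrix)

# Toy counts: 4 genes x 5 cells (hypothetical values chosen so that the
# per-cell totals are 201, 650, 2209, 5732 and 450).
dense <- matrix(
  c(100,  101,    0, 0,
    300,  350,    0, 0,
    1000, 1209,   0, 0,
    2000, 3000, 732, 0,
    200,  250,    0, 0),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)
counts <- Matrix(dense, sparse = TRUE)

# Library size = total counts per cell (column).
lib_size <- Matrix::colSums(counts)

# Hypothetical cutoff in the 400-500 range mentioned above.
min_lib <- 500
keep <- lib_size >= min_lib

# Keep only cells that pass the cutoff.
filtered <- counts[, keep, drop = FALSE]
colnames(filtered)
# [1] "cell2" "cell3" "cell4"

# On a SingleCellExperiment the same idea would be, as a sketch:
#   sce <- sce[, colSums(counts(sce)) >= min_lib]
```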

@IrinaVKuznetsova
Author

How long, on average, does it take to run scDblFinder?

  1. It's been ~5 hrs.
  2. I filtered the data and ran it again, and it eventually crashed:
quantile(colSums(counts(sce)))
   0%   25%   50%   75%  100%
  451  1189  3332  6480 81977
sce.standard <- scDblFinder(sce, samples = "orig.ident", BPPARAM=MulticoreParam(8))
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  convergence criterion below machine epsilon
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  did not converge--results might be invalid!; try increasing work or maxit

Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  convergence criterion below machine epsilon
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  did not converge--results might be invalid!; try increasing work or maxit

Stop worker failed with the error: wrong args for environment subassignment

@IrinaVKuznetsova
Author

I figured out why I was getting that error; the cause was a few steps back in my analysis:

I removed ambient RNA with CellBender v3, which generated negative values in the count matrix; that is why scDblFinder() was not able to process my data. The issue of CellBender generating negative counts is discussed at https://github.com/broadinstitute/CellBender/issues/306. To fix it, run CellBender v2 and then re-run scDblFinder().

Everything works now, and quite quickly.
Cheers.
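
For anyone hitting the same symptoms, a quick sanity check of the count matrix before running scDblFinder() might look like the sketch below. The helper `count_problems()` is hypothetical (it is not part of scDblFinder or CellBender), and the toy matrix is made up; with real data one would pass the counts matrix extracted from the object instead.

```r
library(Matrix)

# Hypothetical helper: tallies values that a raw count matrix should not contain.
count_problems <- function(m) {
  # For a dgCMatrix, the @x slot holds the stored (non-zero) values;
  # otherwise fall back to a dense vector of all values.
  vals <- if (is(m, "dgCMatrix")) m@x else as.vector(as.matrix(m))
  c(
    negative    = sum(vals < 0, na.rm = TRUE),
    non_integer = sum(vals != round(vals), na.rm = TRUE),
    missing     = sum(is.na(vals))
  )
}

# Toy 3 x 3 sparse matrix with one negative and one fractional entry.
toy <- sparseMatrix(
  i = c(1, 2, 3, 1),
  j = c(1, 1, 2, 3),
  x = c(5, -2, 3, 1.5),
  dims = c(3, 3)
)

res <- count_problems(toy)
res
#    negative non_integer     missing
#           1           1           0
```

Any non-zero `negative` tally would point to the same kind of problem described in this comment.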

@plger
Owner

plger commented Feb 29, 2024

Hi,
Great that we have an explanation, thanks for coming back on this.
I've now added a check for that in the devel version, so that a more useful error message is provided.
Best,
Pierre-Luc

@plger plger changed the title Error in serialize(data, node$con, xdr = FALSE) "did not converge" Error on cellbender3 Mar 12, 2024
@plger plger closed this as completed Mar 12, 2024