memory crash with CreateFragmentObject #723

Closed
strawberry789 opened this issue Jul 20, 2021 · 5 comments

Comments

@strawberry789

Hi,

When I run CreateFragmentObject in a Jupyter notebook, the kernel dies, even after I raise the limit with:
options(future.globals.maxSize = 150000 * 1024^2) # for 150 Gb RAM

Then I tried R in MobaXterm without Jupyter, and got:
*** caught segfault *** address (nil), cause 'memory not mapped'

How can I resolve this issue?

Thank you.

@timoast
Collaborator

timoast commented Jul 21, 2021

Please include the full code and the output of sessionInfo().

@strawberry789
Author

strawberry789 commented Jul 22, 2021

Here's the session info:

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] future_1.20.1 GenomicRanges_1.42.0 GenomeInfoDb_1.26.1
[4] IRanges_2.24.0 S4Vectors_0.28.0 BiocGenerics_0.36.0
[7] SeuratObject_4.0.2 Seurat_4.0.3 Signac_1.1.1

loaded via a namespace (and not attached):
[1] reticulate_1.18 tidyselect_1.1.0
[3] RSQLite_2.2.1 AnnotationDbi_1.52.0
[5] htmlwidgets_1.5.2 grid_4.0.2
[7] BiocParallel_1.24.1 Rtsne_0.15
[9] munsell_0.5.0 codetools_0.2-16
[11] ica_1.0-2 pbdZMQ_0.3-3.1
[13] miniUI_0.1.1.1 colorspace_2.0-0
[15] OrganismDbi_1.32.0 Biobase_2.50.0
[17] knitr_1.30 uuid_0.1-4
[19] rstudioapi_0.13 ROCR_1.0-11
[21] tensor_1.5 listenv_0.8.0
[23] MatrixGenerics_1.2.0 repr_1.1.0
[25] GenomeInfoDbData_1.2.4 polyclip_1.10-0
[27] farver_2.0.3 bit64_4.0.5
[29] parallelly_1.21.0 vctrs_0.3.5
[31] generics_0.1.0 xfun_0.19
[33] biovizBase_1.38.0 BiocFileCache_1.14.0
[35] lsa_0.73.2 ggseqlogo_0.1
[37] R6_2.5.0 AnnotationFilter_1.14.0
[39] reshape_0.8.8 bitops_1.0-6
[41] spatstat.utils_2.2-0 DelayedArray_0.16.0
[43] assertthat_0.2.1 promises_1.1.1
[45] scales_1.1.1 nnet_7.3-14
[47] gtable_0.3.0 globals_0.14.0
[49] goftest_1.2-2 ggbio_1.38.0
[51] ensembldb_2.14.0 rlang_0.4.9
[53] RcppRoll_0.3.0 splines_4.0.2
[55] rtracklayer_1.50.0 lazyeval_0.2.2
[57] dichromat_2.0-0 spatstat.geom_2.2-0
[59] checkmate_2.0.0 BiocManager_1.30.16
[61] reshape2_1.4.4 abind_1.4-5
[63] GenomicFeatures_1.42.1 backports_1.2.0
[65] httpuv_1.5.4 Hmisc_4.4-2
[67] RBGL_1.66.0 tools_4.0.2
[69] ggplot2_3.3.2 ellipsis_0.3.1
[71] spatstat.core_2.2-0 RColorBrewer_1.1-2
[73] ggridges_0.5.2 Rcpp_1.0.5
[75] plyr_1.8.6 base64enc_0.1-3
[77] progress_1.2.2 zlibbioc_1.36.0
[79] purrr_0.3.4 RCurl_1.98-1.2
[81] prettyunits_1.1.1 rpart_4.1-15
[83] openssl_1.4.3 deldir_0.2-3
[85] pbapply_1.4-3 cowplot_1.1.0
[87] zoo_1.8-9 SummarizedExperiment_1.20.0
[89] ggrepel_0.8.2 cluster_2.1.0
[91] magrittr_2.0.1 data.table_1.13.2
[93] scattermore_0.7 lmtest_0.9-38
[95] RANN_2.6.1 SnowballC_0.7.0
[97] ProtGenerics_1.22.0 fitdistrplus_1.1-1
[99] matrixStats_0.57.0 hms_0.5.3
[101] patchwork_1.1.0 mime_0.9
[103] evaluate_0.14 xtable_1.8-4
[105] XML_3.99-0.5 jpeg_0.1-8.1
[107] gridExtra_2.3 compiler_4.0.2
[109] biomaRt_2.46.0 tibble_3.0.4
[111] KernSmooth_2.23-17 crayon_1.3.4
[113] htmltools_0.5.1.1 mgcv_1.8-31
[115] later_1.1.0.1 Formula_1.2-4
[117] tidyr_1.1.2 DBI_1.1.0
[119] tweenr_1.0.1 dbplyr_2.0.0
[121] MASS_7.3-51.6 rappdirs_0.3.1
[123] Matrix_1.3-4 igraph_1.2.6
[125] pkgconfig_2.0.3 GenomicAlignments_1.26.0
[127] foreign_0.8-80 IRdisplay_0.7.0
[129] plotly_4.9.2.1 spatstat.sparse_2.0-0
[131] xml2_1.3.2 XVector_0.30.0
[133] stringr_1.4.0 VariantAnnotation_1.36.0
[135] digest_0.6.27 sctransform_0.3.2
[137] RcppAnnoy_0.0.18 graph_1.68.0
[139] spatstat.data_2.1-0 Biostrings_2.58.0
[141] leiden_0.3.5 fastmatch_1.1-0
[143] htmlTable_2.1.0 uwot_0.1.9
[145] curl_4.3 shiny_1.5.0
[147] Rsamtools_2.6.0 lifecycle_0.2.0
[149] nlme_3.1-148 jsonlite_1.7.1
[151] viridisLite_0.3.0 askpass_1.1
[153] BSgenome_1.58.0 pillar_1.4.7
[155] GGally_2.0.0 lattice_0.20-41
[157] fastmap_1.0.1 httr_1.4.2
[159] survival_3.1-12 glue_1.4.2
[161] png_0.1-7 bit_4.0.4
[163] ggforce_0.3.2 stringi_1.5.3
[165] blob_1.2.1 latticeExtra_0.6-29
[167] memoise_1.1.0 IRkernel_1.1.1
[169] dplyr_1.0.2 irlba_2.3.3
[171] future.apply_1.6.0

And here's the code:

frags.x <- CreateFragmentObject( path = "path/fragments.tsv.gz", cells = rownames(md.x) )

where md.x is:

md.x <- read.table( file = "path/singlecell.csv", stringsAsFactors = FALSE, sep = ",", header = TRUE, row.names = 1 )[-1, ]

@timoast
Collaborator

timoast commented Jul 22, 2021

This should be fixed in the latest Signac release. Please update the package, and reopen this issue if the problem persists with the latest version.

timoast closed this as completed Jul 22, 2021
@ghuls

ghuls commented Sep 13, 2022

We found another memory crash.

Our fragment files contain merged cell barcodes (CBs) in CB1_CB2_CB3_..._CBn format, which leads to much longer lines than the code in validate.cpp assumes.

atac_tmp <- CreateSeuratObject(atac_tmp_data_subset, assay='ATAC')


    # annotation=annotations)tmp, slot = "counts", assay='ATAC'),)),

Attaching SeuratObject
Attaching sp
Registered S3 method overwritten by 'SeuratDisk':
  method            from
  as.sparse.H5Group Seurat
Class: loom
Filename: BIO_ddseq_1.FIXEDCELLS__cto.scrublet0-4.fmx.singlets.loom
Access type: H5F_ACC_RDONLY
Attributes: last_modified
Listing:
       name    obj_type   dataset.dims dataset.type_class
      attrs   H5I_GROUP           <NA>               <NA>
  col_attrs   H5I_GROUP           <NA>               <NA>
 col_graphs   H5I_GROUP           <NA>               <NA>
     layers   H5I_GROUP           <NA>               <NA>
     matrix H5I_DATASET 5942 x 1706245        H5T_INTEGER
  row_attrs   H5I_GROUP           <NA>               <NA>
 row_graphs   H5I_GROUP           <NA>               <NA>
Reading in /matrix

Storing /matrix as counts
Saving /matrix to assay 'ATAC'
Computing hash
Checking for 5942 cell barcodes

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: validateCells(fragments = filepath, cells = cells, find_n = find_n,     max_lines = max.lines, verbose = verbose)
 2: ValidateCells(object = frags, verbose = verbose, ...)
 3: CreateFragmentObject(path = fragments, cells = cells, validate.fragments = validate.fragments,     verbose = verbose, ...)
 4: CreateChromatinAssay(counts = GetAssayData(atac_tmp, slot = "counts",     assay = "ATAC"), fragments = f_frag, ranges = regions, verbose = TRUE)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
Selection:
Selection:

I suspect that somewhere in validate.cpp stale variables are still used after the vectors are reallocated when they turn out to be too small.

The following patch solves the problem for the files we had, but I believe it does not address the core issue.

diff --git a/src/validate.cpp b/src/validate.cpp
index ea31b25..4907d26 100644
--- a/src/validate.cpp
+++ b/src/validate.cpp
@@ -29,7 +29,7 @@ bool validateCells(
   char* cb_char;
   size_t line_counter {1};
   size_t total_seen {0};
-  uint32_t buffer_length = 256;
+  uint32_t buffer_length = 4096;
   char *buffer = new char[buffer_length];

   // Hash Map storing the barcodes to look for
@@ -47,7 +47,7 @@ bool validateCells(

   // char * to string extraction
   std::string cb_seq, line_seq;
-  cb_seq.reserve(32);
+  cb_seq.reserve(4096);
   line_seq.reserve(buffer_length);

   // skip header if present
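
For illustration only (this is not the code in validate.cpp): one way to avoid depending on any fixed line or barcode length is to let std::string grow as needed and to copy the barcode out of the line instead of keeping a pointer into a buffer that may later be reallocated. The helper names barcode_field and count_barcode below are hypothetical, and the sketch assumes the usual tab-separated fragment layout with the cell barcode in the fourth column and header lines starting with '#'.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Signac): return the fourth
// tab-separated field of `line`, or an empty string if the line
// has fewer than four fields.
std::string barcode_field(const std::string& line) {
  std::size_t start = 0;
  for (int field = 0; field < 3; ++field) {
    std::size_t tab = line.find('\t', start);
    if (tab == std::string::npos) return std::string();
    start = tab + 1;
  }
  std::size_t end = line.find('\t', start);
  std::size_t len = (end == std::string::npos) ? std::string::npos : end - start;
  return line.substr(start, len);
}

// Hypothetical helper: count the lines whose barcode equals `wanted`.
// std::getline resizes `line` on demand, so no maximum line length is
// assumed, and the barcode is returned by value, so nothing points
// into storage that a later read could reallocate.
std::size_t count_barcode(std::istream& in, const std::string& wanted) {
  assert(!wanted.empty());
  std::string line;
  std::size_t hits = 0;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#') continue;  // skip header lines
    if (barcode_field(line) == wanted) ++hits;
  }
  return hits;
}
```

The trade-off is an extra string copy per line, which a fixed char buffer avoids; whether that matters for very large fragment files would need to be measured.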

timoast added a commit that referenced this issue Sep 13, 2022
@timoast
Collaborator

timoast commented Sep 13, 2022

Hi @ghuls, thanks for reporting. This should now be fixed on the develop branch, and I added some additional error checking to avoid a crash if the buffer size is exceeded.
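
The commit itself is not shown in this thread. As a rough sketch of the kind of check described, assuming an fgets-style read into a fixed-size buffer, a line that did not fit can be detected and reported instead of being processed as if it were complete. The function names line_was_truncated and read_line_checked are hypothetical and are not taken from Signac.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <stdexcept>

// Hypothetical check: after an fgets-style read into `buffer` (capacity
// `buffer_length`, including the terminating NUL), report whether the
// line did not fit. A final line that exactly fills the buffer and has
// no trailing newline is also reported as truncated (a conservative
// false positive).
bool line_was_truncated(const char* buffer, std::size_t buffer_length) {
  assert(buffer != nullptr && buffer_length >= 2);
  std::size_t n = std::strlen(buffer);
  return n == buffer_length - 1 && buffer[n - 1] != '\n';
}

// Read one line with std::fgets. Returns false at end of file and
// throws rather than continuing with a partial line.
bool read_line_checked(std::FILE* fp, char* buffer, std::size_t buffer_length) {
  if (std::fgets(buffer, static_cast<int>(buffer_length), fp) == nullptr) {
    return false;
  }
  if (line_was_truncated(buffer, buffer_length)) {
    throw std::runtime_error("line exceeds buffer length");
  }
  return true;
}
```

Failing loudly here turns a segfault into an error message that names the cause; growing the buffer and retrying would be an alternative design.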
