
crash problem #1

Open
rson920 opened this issue Feb 18, 2018 · 18 comments

@rson920

rson920 commented Feb 18, 2018

When I ran the function counts <- readCounts(features, bam, targets, readLength = 100L, maxISize = 50000), after a few minutes the R session was terminated by an unexpected crash. Do you know how to solve this problem?

sessionInfo():
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.4.2 tools_3.4.2

@estepi
Owner

estepi commented Feb 19, 2018

Hi!
How much memory do you have? How many BAMs are you overlapping with your "features" object?
Did you try with the "toy" dataset, as explained in the vignette/help?
Thanks in advance!

@rson920
Author

rson920 commented Feb 28, 2018 via email

@estepi
Owner

estepi commented Feb 28, 2018

Hi Renhua,
Great. The toy dataset is small: you are overlapping few reads against a small portion of the genome.
I guess it is something related to memory.
Which organism are you using and how big are your original bams?
It could be useful if you try to overlap only 1 bam file. Subset your bam object (it is a list) like this: bams[[1]]
let me know how it goes
Estefi

@bellenger-l

bellenger-l commented Feb 13, 2019

Hello,

I encounter the same issue. I originally worked on my local machine, but when I ran readCounts I got:

> counts <- readCounts(features, bams, targets, cores = 1, readLength = 50, maxISize = 50000)
Read summarization by gene completed
Read summarization by bin completed
Read summarization by ei1 region completed
Read summarization by ie2 region completed
Error: vector memory exhausted (limit reached?)

I have 4 bam files and I am currently working on the mm10 genome.
So I tried on a server, and this time it never finishes (the last message printed is "Read summarization by ie2 region completed") and I eventually get a broken pipe.

When I tried to overlap only 1 bam file (as suggested) I got these errors:
On my local machine :

> counts <- readCounts(features, bams[[1]], cores = 1, readLength = 50, maxISize = 50000)
Error: vector memory exhausted (limit reached?)

And on the Ubuntu server :

> counts <- readCounts(features, bams[[1]], cores = 1, readLength = 50, maxISize = 50000)
Error: C stack usage  7969844 is too close to the limit

I am using : ASpli_1.8.1 and GenomicFeatures_1.34.3.

What can I do to overcome this issue?

Best,
Lea

@estepi
Owner

estepi commented Feb 14, 2019

Hi all, thanks for the comments and really sorry for the inconvenience.
The problem arises at the junction-summarization step. When your genome is big and has a lot of introns (and thus junctions), it is a highly demanding step. At this point, known and novel junctions are summarized and compared against the annotation. We aim to fix the bug and include the fix in the next release (April).

@CriticalPeriod

Hi estepi,
I also wanted to run your program!
I get the same error as bellenger-l above. Do you know if there is something I can do to overcome it?
I'm totally new to R, so maybe there's some setup I did wrong? I have a Mac mini with 16 GB of RAM (2-core i7, 500 GB disk, macOS Mojave), and I want to run with 4 bam files (39.4 GB in total). Other programs work fine. Maybe there is a setting to give R more memory?

bamFiles <- c("DATA_RNA_SEQ_P30/filtered_mapper_results_p30+--1.bam",
              "DATA_RNA_SEQ_P30/filtered_mapper_results_p30+--2.bam",
              "DATA_RNA_SEQ_P30/filtered_mapper_results_p30++-1.bam",
              "DATA_RNA_SEQ_P30/filtered_mapper_results_p30++-2.bam")

targets <- data.frame(row.names = c("hetero_rep1", "hetero_rep2", "homo_rep1", "homo_rep2"),
                      bam = bamFiles,
                      genotype = c("hetero", "hetero", "homo", "homo"),
                      stringsAsFactors = FALSE)

> targets
                                                              bam genotype
hetero_rep1 DATA_RNA_SEQ_P30/filtered_mapper_results_p30+--1.bam   hetero
hetero_rep2 DATA_RNA_SEQ_P30/filtered_mapper_results_p30+--2.bam   hetero
homo_rep1   DATA_RNA_SEQ_P30/filtered_mapper_results_p30++-1.bam     homo
homo_rep2   DATA_RNA_SEQ_P30/filtered_mapper_results_p30++-2.bam     homo

bam <- loadBAM(targets)
counts <- readCounts(features,
                     bam,
                     targets,
                     cores = 1,
                     readLength = 50L,
                     maxISize = 50000,
                     minAnchor = 10)

Read summarization by gene completed
Read summarization by bin completed
Read summarization by ei1 region completed
Read summarization by ie2 region completed
Error: vector memory exhausted (limit reached?)
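On macOS, this "vector memory exhausted (limit reached?)" error often reflects R's vector-memory cap rather than physical RAM running out. A possible workaround, documented in R's macOS memory notes, is to raise R_MAX_VSIZE in ~/.Renviron and restart R (16Gb below is only an example value matching this machine's RAM, not a recommendation):

```shell
# Raise R's vector-memory ceiling on macOS; takes effect on the next R session.
# 16Gb is an example value -- set it no higher than your physical RAM.
echo 'R_MAX_VSIZE=16Gb' >> ~/.Renviron
```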
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] ASpli_1.8.1 edgeR_3.24.3 limma_3.38.3 GenomicFeatures_1.34.6
[5] AnnotationDbi_1.44.0 Biobase_2.42.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[9] IRanges_2.16.0 S4Vectors_0.20.1 BiocGenerics_0.28.0

loaded via a namespace (and not attached):
[1] ProtGenerics_1.14.0 bitops_1.0-6 matrixStats_0.54.0
[4] bit64_0.9-7 RColorBrewer_1.1-2 progress_1.2.0
[7] httr_1.4.0 tools_3.5.3 backports_1.1.3
[10] R6_2.4.0 rpart_4.1-13 Hmisc_4.2-0
[13] DBI_1.0.0 lazyeval_0.2.2 Gviz_1.26.5
[16] colorspace_1.4-1 nnet_7.3-12 gridExtra_2.3
[19] prettyunits_1.0.2 curl_3.3 bit_1.1-14
[22] compiler_3.5.3 htmlTable_1.13.1 DelayedArray_0.8.0
[25] rtracklayer_1.42.2 scales_1.0.0 checkmate_1.9.1
[28] stringr_1.4.0 digest_0.6.18 Rsamtools_1.34.1
[31] foreign_0.8-71 rmarkdown_1.12 XVector_0.22.0
[34] base64enc_0.1-3 dichromat_2.0-0 pkgconfig_2.0.2
[37] htmltools_0.3.6 ensembldb_2.6.7 BSgenome_1.50.0
[40] htmlwidgets_1.3 rlang_0.3.2 rstudioapi_0.10
[43] RSQLite_2.1.1 BiocParallel_1.16.6 acepack_1.4.1
[46] VariantAnnotation_1.28.13 RCurl_1.95-4.12 magrittr_1.5
[49] GenomeInfoDbData_1.2.0 Formula_1.2-3 Matrix_1.2-17
[52] Rcpp_1.0.1 munsell_0.5.0 stringi_1.4.3
[55] yaml_2.2.0 SummarizedExperiment_1.12.0 zlibbioc_1.28.0
[58] plyr_1.8.4 grid_3.5.3 blob_1.1.1
[61] crayon_1.3.4 lattice_0.20-38 Biostrings_2.50.2
[64] splines_3.5.3 hms_0.4.2 locfit_1.5-9.1
[67] knitr_1.22 pillar_1.3.1 biomaRt_2.38.0
[70] XML_3.98-1.19 evaluate_0.13 biovizBase_1.30.1
[73] latticeExtra_0.6-28 data.table_1.12.0 BiocManager_1.30.4
[76] gtable_0.2.0 assertthat_0.2.1 ggplot2_3.1.0
[79] xfun_0.5 AnnotationFilter_1.6.0 survival_2.43-3
[82] tibble_2.1.1 GenomicAlignments_1.18.1 memoise_1.1.0
[85] cluster_2.0.7-1 BiocStyle_2.10.0

Thanks a lot !
David

@estepi
Owner

estepi commented Mar 27, 2019

Hi, thanks for your comment. Maybe the memory is not enough; I usually work with more than 64 GB.
The problem arises at the step of overall summarization of junctions, which is highly demanding. We know about it and we are trying to fix it.
Are you able to run readCounts on 1 bam?

@CriticalPeriod

CriticalPeriod commented Mar 27, 2019 via email

@CriticalPeriod

CriticalPeriod commented Mar 27, 2019 via email

@estepi
Owner

estepi commented Mar 27, 2019

Hi! Great! (at least 1 issue solved :P) In the meantime you can export your counts and start analyzing them with many tools.
I'm curious which aligner you used. Is it a splice-aware aligner? Do you have junctions in your BAM? You can check with samtools or visually with a sashimi plot. Let me know.

Thanks!
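Besides a sashimi plot, a quick command-line check for junction reads is to count alignments whose CIGAR string (SAM column 6) contains an 'N' operation, which marks an intron spanned by a spliced read. A minimal sketch, assuming samtools is on the PATH and sample.bam is a placeholder file name:

```shell
# Count spliced alignments: 'N' in the CIGAR string means the read
# skips a reference region, i.e. it spans a junction.
samtools view sample.bam | awk '$6 ~ /N/ {n++} END {print n+0}'
```

If this prints 0, the BAM has no junction reads and the aligner (or its settings) was not splice-aware.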

@CriticalPeriod

CriticalPeriod commented Mar 27, 2019 via email

@estepi
Owner

estepi commented Mar 27, 2019

First, just check whether you have junctions in your BAMs. Make a simple sashimi plot of one of your BAMs at your preferred gene. If you don't have junctions, then yes, for the analysis of alternative splicing you should re-align.
I use STAR.

@CriticalPeriod

CriticalPeriod commented Apr 1, 2019 via email

@estepi
Owner

estepi commented Apr 4, 2019

Hi, I'm sorry for the delay.
Here are my parameters; I put OK where they match yours, otherwise I put my value:

--outFilterType BySJout OK
--outFilterMultimapNmax 20 OK
--alignSJoverhangMin 8 OK
--alignSJDBoverhangMin 1 -> 5
--alignIntronMin 20 OK
--alignIntronMax 1000000 OK
--alignMatesGapMax 1000000 Ok
--outSAMattributes Standard OK
--outSAMunmapped Within Ok
--outSAMtype BAM SortedByCoordinate OK
cheers
Estefi
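Assembled into a single invocation, those parameters would look something like the sketch below. The index and FASTQ paths are placeholders, and the threading/input options are additions not listed above:

```shell
STAR --runThreadN 4 \
     --genomeDir star_index \
     --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz \
     --readFilesCommand zcat \
     --outFilterType BySJout \
     --outFilterMultimapNmax 20 \
     --alignSJoverhangMin 8 \
     --alignSJDBoverhangMin 5 \
     --alignIntronMin 20 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000 \
     --outSAMattributes Standard \
     --outSAMunmapped Within \
     --outSAMtype BAM SortedByCoordinate
```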

@CriticalPeriod

CriticalPeriod commented Apr 12, 2019 via email

@qicaibiology


Hi:
I also found that mm10.gtf takes a lot of memory and did not finish (I ran it on a server with 400 GB). Later I tried using "TxDb.Mmusculus.UCSC.mm10.knownGene" from Bioconductor as the TxDb object, and it went through.

Hope it will help,
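For reference, a sketch of that TxDb route in R, following the ASpli vignette (binGenome builds the features object; the readLength in the comment is only illustrative):

```r
library(ASpli)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)

# Use the Bioconductor TxDb directly instead of building one from mm10.gtf
# with GenomicFeatures, which was the memory-hungry step reported above.
genomeTxDb <- TxDb.Mmusculus.UCSC.mm10.knownGene
features   <- binGenome(genomeTxDb)

# Then proceed as before, e.g.:
# counts <- readCounts(features, bam, targets, readLength = 75L, maxISize = 50000)
```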

@estepi
Owner

estepi commented Aug 9, 2019

Hi all.
Can you confirm your reads are 50 bp long?
From my experience, I recently ran ASpli with the standard Mmu annotation and 12 samples using less than 100 GB RAM, and it was possible.
I need these 2 pieces of info:

- Can you confirm you have junctions in your bam? I see your reads are 50 bp; I never did an alignment with reads shorter than 100. Which aligner did you use, and against which file?

- Can you repeat this command, using this notation:

counts <- readCounts(features, bams[1], cores = 1, readLength = 50, maxISize = 50000)

(I changed bams[[1]] to bams[1])

Did you run the example code?

Thanks in advance

@qicaibiology

The example code works well. In my case, I did 75 bp paired-end sequencing, so in my command readLength = 75. I used STAR to align my fastq files against the mm10 reference genome from the Illumina website (https://support.illumina.com/sequencing/sequencing_software/igenome.html).

Thanks,
