
strange segfault #6

Closed
ekg opened this issue Mar 15, 2018 · 12 comments

ekg commented Mar 15, 2018

I'm testing minia with ~400 input fastq.gz files. I observed a strange segfault and was immediately curious whether something might depend on the number of input files. The input is ~60G or so, around what might be normal for a lower-coverage human assembly, but derived from a reduced representation of the genome (this is for Capsicum, and we're using "genotyping-by-sequencing" data).

Here's the error log:

Minia 3, git commit efef7c7                                                                                                                                    [907/1955]
bglue_algo params, prefix:dummy.unitigs.fa k:5 threads:32
debug: not deleting glue files
setting storage type to hdf5
[Approximating frequencies of minimizers ]  100  %   elapsed:   1 min 34 sec   remaining:   0 min 0  sec   cpu:  99.8 %   mem: [  36,   36,  123] MB
[DSK: nb solid kmers found : 84023707    ]  100  %   elapsed:  49 min 34 sec   remaining:   0 min 0  sec   cpu: 330.7 %   mem: [1866, 6918, 6952] MB
bcalm_algo params, prefix:pepper_pangenome_k51_m3.unitigs.fa k:51 a:3 minsize:10 threads:32 mintype:1
DSK used 1 passes and 608 partitions
prior to queues allocation                      14:29:18     memory [current, maxRSS]: [1863, 6952] MB
Starting BCALM2                                 14:29:18     memory [current, maxRSS]: [1863, 6952] MB
[Iterating DSK partitions                ]  0    %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec
Iterated 711514 kmers, among them 47212 were doubled

In this superbucket (containing 2872 active minimizers),
                  sum of time spent in lambda's: 5177.6 msecs
                                 longest lambda: 21.5 msecs
         tot time of best scheduling of lambdas: 5177.6 msecs
                       best theoretical speedup: 240.6x
Done with partition 0                           14:29:19     memory [current, maxRSS]: [1926, 6952] MB
[Iterating DSK partitions                ]  9.87 %   elapsed:   0 min 35 sec   remaining:   5 min 17 sec
Iterated 332394 kmers, among them 19685 were doubled
Loaded 14123 doubled kmers for partition 61

In this superbucket (containing 539 active minimizers),
                  sum of time spent in lambda's: 2474.9 msecs
                                 longest lambda: 32.3 msecs
         tot time of best scheduling of lambdas: 2474.9 msecs
                       best theoretical speedup: 76.6x
Done with partition 61                          14:29:53     memory [current, maxRSS]: [2018, 6952] MB
[Iterating DSK partitions                ]  19.7 %   elapsed:   0 min 51 sec   remaining:   3 min 29 sec
Iterated 221450 kmers, among them 15343 were doubled
Loaded 13248 doubled kmers for partition 122

In this superbucket (containing 603 active minimizers),
                  sum of time spent in lambda's: 1531.1 msecs
                                 longest lambda: 21.7 msecs
         tot time of best scheduling of lambdas: 1531.1 msecs
                       best theoretical speedup: 70.6x
Done with partition 122                         14:30:10     memory [current, maxRSS]: [2075, 6952] MB
[2]    20638 segmentation fault  minia -in kept_fastqs.txt -kmer-size 51 -abundance-min 3 -out
minia -in kept_fastqs.txt -kmer-size 51 -abundance-min 3 -out   10587.55s user 172.51s system 314% cpu 56:59.59 total
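For segfaults like this one, a backtrace is usually the most useful artifact to attach to a report. Below is a minimal sketch of a wrapper that enables core dumps and prints a backtrace if the wrapped command dies on a signal; it assumes gdb is installed, and the wrapper name and the example minia arguments are illustrative, not part of minia itself.

```shell
# Sketch: run a command with core dumps enabled; if it is killed by a signal
# (exit status >= 128, e.g. 139 for SIGSEGV), print a backtrace from the core.
run_with_backtrace() {
    ulimit -c unlimited 2>/dev/null || true   # allow core files in this shell
    "$@"
    status=$?
    if [ "$status" -ge 128 ]; then
        core=$(ls -t core* 2>/dev/null | head -n 1)
        # gdb in batch mode prints the stack trace and exits.
        [ -n "$core" ] && gdb -batch -ex bt "$1" "$core"
    fi
    return "$status"
}

# Intended use (arguments mirror the failing run above):
# run_with_backtrace minia -in kept_fastqs.txt -kmer-size 51 -abundance-min 3 -out pepper_k51
```

Core file naming and location depend on the system's core_pattern setting, so the `ls core*` step may need adjusting.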

ekg commented Mar 15, 2018

I should note that this machine has 256G of RAM, and I don't appear to be hitting any disk-space limitations.


rchikhi commented Mar 15, 2018

Hi Erik!

Curious. That's definitely not tied to the number of fastq files: at this stage minia only considers the counted k-mers and no longer reads the input files. This is the first time I've seen this stage fail; it could be due to some special unhandled case in the graph structure.

I'd be happy to assist with debugging.

  1. Can the data be sent by any chance?
  2. Does it complete with a different kmer size?
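A systematic way to answer question 2 is a small sweep over k-mer sizes. The sketch below only prints the command for each k (so the sweep can be inspected before running it); the file names and parameters mirror the failing command in the report and are illustrative.

```shell
# Print one minia command per k-mer size; pipe to sh (or run lines manually)
# to execute the sweep and see which k values complete.
kmer_sweep() {
    for k in 31 41 51 61; do
        printf 'minia -in kept_fastqs.txt -kmer-size %s -abundance-min 3 -out pepper_k%s\n' "$k" "$k"
    done
}
kmer_sweep
# e.g. run them in order, stopping at the first failure:
# kmer_sweep | while read -r cmd; do $cmd || break; done
```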


ekg commented Mar 16, 2018

It does complete with a shorter k-mer size (41) and the same abundance. I would love to share the data, but I will need to ask my collaborators; I will send you an email if I can. I definitely understand that's the only way to really resolve this problem. Interestingly, the k=41 assembly has very few contigs (on the order of a few hundred kb) while the unitig set is several hundred Mb. Also, this data is a little strange: it consists of many small fragments by library design (GBS/RADseq).


rchikhi commented Mar 16, 2018

I see. Well, I'll keep an eye out for that email; otherwise, just let me know if you encounter this bug again on another dataset. I'd like to get a sense of whether this is a one-of-a-kind occurrence.

@Mozart1776

Hello,
I am also experiencing a segmentation fault from minia at the "Iterating DSK partitions" step at k=41 (see the log below). I am using a computer cluster with nearly 2 TB of RAM and ample disk space. Minia runs without issue on the GATB test dataset. Any help?
Thanks!

(2018-08-13 22:18:53) GATB-pipeline starting
(2018-08-13 22:18:53) Command line: /home/pavel/bin/gatb-minia-pipeline/gatb -1 /home/pavel/Run_1837_Tyrophagus_putrescentiae_Mex/Sample_79070/2a-removeDuplicates_clumpify/79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_Results/79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.fastq.gz -2 /home/pavel/Run_1837_Tyrophagus_putrescentiae_Mex/Sample_79070/2a-removeDuplicates_clumpify/79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_Results/79070_TGACCA_S1_L003_R2_001_bbduk_no_adapters_deduplicated.fastq.gz -o 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly

(2018-08-13 22:18:53) Setting maximum kmer length to: 151 bp
(2018-08-13 22:18:53) Multi-k values and cutoffs: [(21, 2), (41, 2), (61, 2), (81, 2), (101, 2), (121, 2), (141, 2)]

(2018-08-13 22:18:53) Minia assembling at k=21 min_abundance=2
(2018-08-13 22:18:53) Execution of 'minia/minia'. Command line:
/home/pavel/bin/gatb-minia-pipeline/tools/memused /home/pavel/bin/gatb-minia-pipeline/minia/minia -in 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly.list_reads -kmer-size 21 -abundance-min 2 -out 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly_k21
(2018-08-14 03:36:01) Finished Minia k=21

(2018-08-14 03:36:01) Minia assembling at k=41 min_abundance=2
(2018-08-14 03:36:01) Execution of 'minia/minia'. Command line:
/home/pavel/bin/gatb-minia-pipeline/tools/memused /home/pavel/bin/gatb-minia-pipeline/minia/minia -in 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly.list_reads -kmer-size 41 -abundance-min 2 -out 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly_k41
(2018-08-14 05:06:33) Execution of 'minia/minia' failed. Command line:
/home/pavel/bin/gatb-minia-pipeline/tools/memused /home/pavel/bin/gatb-minia-pipeline/minia/minia -in 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly.list_reads -kmer-size 41 -abundance-min 2 -out 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly_k41
pavel@sagarana:~/Run_1837_Tyrophagus_putrescentiae_Mex/Sample_79070/5d-gatb-pipeline/79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated> cat 0-gatb_79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.pbs.e108512
[Approximating frequencies of minimizers ] 100 % elapsed: 0 min 48 sec remaining: 0 min 0 sec cpu: 99.9 % mem: [ 20, 20, 20] MB
[DSK: nb solid kmers found : 595537654 ] 212 % elapsed: 71 min 3 sec remaining: 0 min 0 sec cpu: 1813.0 % mem: [7845, 8032, 8050] MB
[Iterating DSK partitions ] 99.7 % elapsed: 21 min 6 sec remaining: 0 min 4 sec
[Building BooPHF] 100 % elapsed: 0 min 11 sec remaining: 0 min 0 sec
[removing tips, pass 1 ] 100 % elapsed: 8 min 49 sec remaining: 0 min 0 sec cpu: 25282.3 % mem: [28646, 28646, 33965] MB
[removing tips, pass 2 ] 100 % elapsed: 0 min 39 sec remaining: 0 min 0 sec cpu: 24498.1 % mem: [24143, 28675, 33965] MB
[removing tips, pass 3 ] 100 % elapsed: 0 min 21 sec remaining: 0 min 0 sec cpu: 7850.8 % mem: [24082, 24134, 33965] MB
[removing tips, pass 4 ] 100 % elapsed: 0 min 22 sec remaining: 0 min 0 sec cpu: 8396.4 % mem: [24082, 24082, 33965] MB
[removing tips, pass 5 ] 100 % elapsed: 0 min 21 sec remaining: 0 min 0 sec cpu: 7820.0 % mem: [24082, 24082, 33965] MB
[removing bulges, pass 1 ] 100 % elapsed: 10 min 29 sec remaining: 0 min 0 sec cpu: 22194.7 % mem: [24194, 24194, 33965] MB
[removing bulges, pass 2 ] 100 % elapsed: 9 min 7 sec remaining: 0 min 0 sec cpu: 23147.3 % mem: [23158, 24203, 33965] MB
[removing bulges, pass 3 ] 100 % elapsed: 9 min 10 sec remaining: 0 min 0 sec cpu: 23453.0 % mem: [22105, 23163, 33965] MB
[removing bulges, pass 4 ] 100 % elapsed: 9 min 12 sec remaining: 0 min 0 sec cpu: 23583.9 % mem: [21399, 22098, 33965] MB
[removing bulges, pass 5 ] 100 % elapsed: 9 min 44 sec remaining: 0 min 0 sec cpu: 23581.3 % mem: [20617, 21390, 33965] MB
[removing ec, pass 1 ] 100 % elapsed: 5 min 10 sec remaining: 0 min 0 sec cpu: 24824.4 % mem: [21343, 21343, 33965] MB
[removing ec, pass 2 ] 100 % elapsed: 1 min 4 sec remaining: 0 min 0 sec cpu: 20397.3 % mem: [20668, 21375, 33965] MB
[removing ec, pass 3 ] 100 % elapsed: 0 min 27 sec remaining: 0 min 0 sec cpu: 22962.5 % mem: [20167, 20698, 33965] MB
[removing ec, pass 4 ] 100 % elapsed: 0 min 27 sec remaining: 0 min 0 sec cpu: 23310.6 % mem: [19924, 20170, 33965] MB
[removing ec, pass 5 ] 100 % elapsed: 0 min 28 sec remaining: 0 min 0 sec cpu: 23301.5 % mem: [19860, 19925, 33965] MB
[removing tips, pass 6 ] 100 % elapsed: 0 min 55 sec remaining: 0 min 0 sec cpu: 25142.7 % mem: [19974, 19974, 33965] MB
[removing bulges, pass 6 ] 100 % elapsed: 1 min 36 sec remaining: 0 min 0 sec cpu: 22626.6 % mem: [19948, 20002, 33965] MB
[removing ec, pass 6 ] 100 % elapsed: 0 min 27 sec remaining: 0 min 0 sec cpu: 22708.7 % mem: [19920, 19947, 33965] MB
[removing tips, pass 7 ] 100 % elapsed: 0 min 21 sec remaining: 0 min 0 sec cpu: 24399.3 % mem: [19909, 19921, 33965] MB
[removing bulges, pass 7 ] 100 % elapsed: 1 min 37 sec remaining: 0 min 0 sec cpu: 23080.5 % mem: [19853, 19911, 33965] MB
[removing ec, pass 7 ] 100 % elapsed: 0 min 26 sec remaining: 0 min 0 sec cpu: 23476.6 % mem: [19837, 19848, 33965] MB
[removing tips, pass 8 ] 100 % elapsed: 0 min 16 sec remaining: 0 min 0 sec cpu: 8285.7 % mem: [19837, 19841, 33965] MB
[removing bulges, pass 8 ] 100 % elapsed: 1 min 37 sec remaining: 0 min 0 sec cpu: 23144.2 % mem: [19833, 19837, 33965] MB
[removing ec, pass 8 ] 100 % elapsed: 0 min 26 sec remaining: 0 min 0 sec cpu: 23530.7 % mem: [19827, 19828, 33965] MB
[removing tips, pass 9 ] 100 % elapsed: 0 min 15 sec remaining: 0 min 0 sec cpu: 8222.7 % mem: [19826, 19827, 33965] MB
[removing bulges, pass 9 ] 100 % elapsed: 1 min 36 sec remaining: 0 min 0 sec cpu: 23186.3 % mem: [19831, 19831, 33965] MB
[removing ec, pass 9 ] 100 % elapsed: 0 min 27 sec remaining: 0 min 0 sec cpu: 23680.5 % mem: [19825, 19826, 33965] MB
[removing tips, pass 10 ] 100 % elapsed: 0 min 16 sec remaining: 0 min 0 sec cpu: 8274.7 % mem: [19825, 19826, 33965] MB
[removing bulges, pass 10 ] 100 % elapsed: 1 min 36 sec remaining: 0 min 0 sec cpu: 23208.0 % mem: [19829, 19829, 33965] MB
[removing ec, pass 10 ] 100 % elapsed: 0 min 28 sec remaining: 0 min 0 sec cpu: 23634.8 % mem: [19825, 19825, 33965] MB
[Minia : assembly ] 100 % elapsed: 8 min 6 sec remaining: 0 min 0 sec cpu: 100.2 % mem: [19798, 19798, 33965] MB
[Approximating frequencies of minimizers ] 100 % elapsed: 1 min 17 sec remaining: 0 min 0 sec cpu: 99.8 % mem: [ 28, 28, 28] MB
[DSK: nb solid kmers found : 678092331 ] 201 % elapsed: 88 min 9 sec remaining: 0 min 0 sec cpu: 1630.6 % mem: [7642, 7848, 7871] MB
[Iterating DSK partitions ] 0 % elapsed: 0 min 0 sec remaining: 0 min 0 sec/home/pavel/bin/gatb-minia-pipeline/tools/memused: line 18: 249751 Segmentation fault "$@"


rchikhi commented Aug 26, 2018

Hi Mozart, thanks for reporting it. I'm assuming you cannot share the data either, so I'd be curious to see whether the problem occurs with a different k-mer combination. Could you please try the following command line? ./gatb --kmer-sizes 31,51,71 -1 [..] -2 [..] -o [..]


Mozart1776 commented Aug 26, 2018 via email


rchikhi commented Aug 30, 2018

Hi Pavel,
Very nice! Please email me at rayan.chikhi@univ-lille.fr. Any way to get the data is fine, if you can share it on your end. Otherwise I'll send a shared dropbox link.
Rayan

@adigenova

Hi Rayan,
I have a similar problem with DSK (core dumped); here is the log:

Minia 3, git commit 4b32fec
setting storage type to hdf5
[Approximating frequencies of minimizers ] 100 % elapsed: 2 min 45 sec remaining: 0 min 0 sec cpu: 99.9 % mem: [ 25, 25, 25] MB
[DSK: Pass 1/1, Step 2: counting kmers ] 73 % elapsed: 145 min 33 sec remaining: 53 min 49 sec cpu: 553.2 % mem: [36580, 36580, 36580] MB /home/adigenova/binaries/gatb-minia-pipeline/tools/memused: line 18: 33019 Segmentation fault (core dumped) "$@"
maximal memory used: 41023 MB
(2018-09-18 19:12:07) Execution of 'minia/minia' failed. Command line:
/home/adigenova/binaries/gatb-minia-pipeline/tools/memused /home/adigenova/binaries/gatb-minia-pipeline/minia/minia -in MMK.list_reads -kmer-size 61 -abundance-min 2 -out MMK_k61 -max-memory 35000

The command line that I used was the following:
~/binaries/gatb-minia-pipeline/gatb -l sequences.txt --nb-cores 20 --max-memory 35000 -o MMK --kmer-sizes 31,61,91 --abundance-mins 2,2,2

and the sequences.txt file contains the following reads:
stLFR.split_read.1.fq.gz
stLFR.split_read.2.fq.gz

the reads are from GIAB:
ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/stLFR/stLFR.split_read.1.fq.gz
ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/stLFR/stLFR.split_read.2.fq.gz

Let me know if you need more information to reproduce the error.
Thanks in advance.

Best,
Alex


rchikhi commented Oct 29, 2018

Hi Alex,
Thanks for the very detailed bug report, and sorry for the delayed reply. Can you reproduce this problem on another machine? Unfortunately I cannot: the pipeline finished without crashing on my server using the command line you provided, on the master branch of the gatb-minia-pipeline repo.


adigenova commented Oct 29, 2018 via email


rchikhi commented Nov 25, 2018

After offline discussion with Alex, it seems the problem he reported was caused by memory usage exceeding what was available on the system.
I'm closing this thread, as it's aggregating a bunch of unrelated segfault problems.
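Since the root cause here was a memory limit, a quick pre-flight check can catch this class of failure before launching a long run. This is a Linux-only sketch reading /proc/meminfo (which reports kB); the 35000 MB threshold mirrors the --max-memory value from Alex's command line and is illustrative.

```shell
# Warn if the memory requested via --max-memory (in MB) exceeds what the
# kernel currently reports as available.
requested_mb=35000
avail_mb=$(awk '/^MemAvailable:/ {print int($2/1024)}' /proc/meminfo)
if [ "$avail_mb" -lt "$requested_mb" ]; then
    echo "warning: ${avail_mb} MB available but ${requested_mb} MB requested"
fi
```

Note that DSK's peak RSS can exceed -max-memory somewhat (the log above shows ~41 GB used with -max-memory 35000), so leaving headroom is prudent.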

@rchikhi rchikhi closed this as completed Nov 25, 2018