Metabuli v1.0.0 segfault #10

Closed
sjaenick opened this issue Jun 5, 2023 · 8 comments

@sjaenick
Contributor

sjaenick commented Jun 5, 2023

Hi there,

Metabuli looks very interesting, but I'm encountering segmentation faults at two different stages (see below).
Interestingly, I'm seeing three different outcomes for the same input data, and even just changing the location
of the input file seems to be enough to produce a different result (one of: 1. it just works, 2. segfault in metamer
extraction, 3. segfault in metamer comparison).

Metabuli v1.0.0 compiled from source (-DCMAKE_BUILD_TYPE=Debug), using your pre-built database obtained with
metabuli databases RefSeq refseq tmp.

/vol/mgx-sw/bin/metabuli classify --seq-mode 1 xxxy.in /vol/biodb/local_databases/MGX/metabuli/refseq/ outdir jobid --threads 1 --max-ram 64

The system is an Intel(R) Xeon(R) CPU E5-4627 with 1TB of memory, running Ubuntu 20.04.6 LTS.

  1. Segfault during metamer extraction
Number of threads: 1
Query file: xxxy.in
Database directory: /vol/biodb/local_databases/MGX/metabuli/refseq/
Output directory: outdir
Job ID: jobid
Loading nodes file ... Done, got 2497728 nodes
Loading merged file ... Done, added 71172 merged nodes.
Loading names file ... Done
Init RMQ ...Done
The rest RAM: 68585259008
Indexing query file ...Done
Total number of sequences: 100000
Total read length: 16830253nt
Extracting query metamers ... 
Segmentation fault
(gdb) bt
#0  0x000055d9a8ce76ce in kseq_buffer_reader (inBuffer=0x7ffc8d6bf8c0, outBuffer=0x55d9b3b01f00 "p\343Ko\340\177", nbyte=16384)
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/lib/mmseqs/src/commons/KSeqBufferReader.h:30
#1  0x000055d9a8d22ac0 in ks_getc (ks=0x55d9aaf6d650) at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/SeqIterator.h:25
#2  0x000055d9a8d231af in kseq_read (seq=0x55d9aaf6d5c0) at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/SeqIterator.h:25
#3  0x000055d9a8d30cf9 in Classifier::_ZN10Classifier27fillQueryKmerBufferParallelER15QueryKmerBufferR10MmapedDataIcERKSt6vectorI13SequenceBlockSaIS6_EERS5_I5QuerySaISB_EERKSt4pairImmERK15LocalParameters._omp_fn.0(void) ()
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/Classifier.cpp:354
#4  0x00007fe06f5158e6 in GOMP_parallel () from /lib/x86_64-linux-gnu/libgomp.so.1
#5  0x000055d9a8d25d43 in Classifier::fillQueryKmerBufferParallel (this=0x55d9aaf6c340, kmerBuffer=..., seqFile=..., 
    seqs=std::vector of length 100000, capacity 131072 = {...}, queryList=std::vector of length 100001, capacity 100001 = {...}, 
    currentSplit={...}, par=...) at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/Classifier.cpp:345
#6  0x000055d9a8d25585 in Classifier::startClassify (this=0x55d9aaf6c340, par=...)
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/Classifier.cpp:260
#7  0x000055d9a8dab1a7 in classify (argc=10, argv=0x7ffc8d6c1958, command=...)
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/workflow/classify.cpp:47
#8  0x000055d9a8db8ea4 in runCommand (p=0x55d9aaf571e0, argc=10, argv=0x7ffc8d6c1958)
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/lib/mmseqs/src/commons/Application.cpp:40
#9  0x000055d9a8db9ecc in main (argc=12, argv=0x7ffc8d6c1948) at /vol/mgx-sw/src/tools/Metabuli-1.0.0/lib/mmseqs/src/commons/Application.cpp:203
(gdb) p i
$1 = 0
(gdb) p inBuffer
$2 = (kseq_buffer_t *) 0x7ffc8d6bf8c0
(gdb) p index
$3 = 0
(gdb)
  2. Segfault during the 'Comparing qeury and reference metamers' stage
Number of threads: 1
Query file: /vol/sge-tmp/metab_test/xxxy.in
Database directory: /vol/biodb/local_databases/MGX/metabuli/refseq/
Output directory: outdir
Job ID: jobid
Loading nodes file ... Done, got 2497728 nodes
Loading merged file ... Done, added 71172 merged nodes.
Loading names file ... Done
Init RMQ ...Done
The rest RAM: 68585259008
Indexing query file ...Done
Total number of sequences: 100000
Total read length: 16830253nt
Extracting query metamers ... 
Time spent for metamer extraction: 1
Sorting query metamer list ...
Time spent for sorting query metamer list: 6
Comparing qeury and reference metamers...
Segmentation fault (core dumped)
(gdb) bt
#0  Classifier::getNextTargetKmer (lookingTarget=589166591814214271, diffIdxBuffer=0x7f77f313b010, diffBufferIdx=@0x7ffdcc68b000: 80091129, 
    totalPos=@0x7ffdcc68b010: 25564673655) at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/Classifier.h:394
#1  0x0000557e854dc587 in Classifier::_ZN10Classifier20linearSearchParallelEP9QueryKmerRmRNS_6BufferI5MatchEERK15LocalParameters._omp_fn.0(void)
    () at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/Classifier.cpp:723
#2  0x00007f77fdcfe8e6 in GOMP_parallel () from /lib/x86_64-linux-gnu/libgomp.so.1
#3  0x0000557e854d0a35 in Classifier::linearSearchParallel (this=0x557e87dd4320, queryKmerList=0x7f77ad3a2010, 
    queryKmerCnt=@0x7ffdcc68cb38: 28860420, matchBuffer=..., par=...) at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/Classifier.cpp:570
#4  0x0000557e854cf72f in Classifier::startClassify (this=0x557e87dd4320, par=...)
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/Classifier.cpp:281
#5  0x0000557e855551a7 in classify (argc=10, argv=0x7ffdcc68d318, command=...)
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/workflow/classify.cpp:47
#6  0x0000557e85562ea4 in runCommand (p=0x557e87dbf1e0, argc=10, argv=0x7ffdcc68d318)
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/lib/mmseqs/src/commons/Application.cpp:40
#7  0x0000557e85563ecc in main (argc=12, argv=0x7ffdcc68d308) at /vol/mgx-sw/src/tools/Metabuli-1.0.0/lib/mmseqs/src/commons/Application.cpp:203
(gdb)

I'm attaching the sample input file I've been using for this.
input.fas.txt

@sjaenick
Contributor Author

sjaenick commented Jun 5, 2023

With the official AVX2 binaries:

$ md5sum /vol/sge-tmp/xxxy.in /vol/sge-tmp/in.fas 
27fe50b9a8a5829aefc66168ec5a783f  /vol/sge-tmp/xxxy.in
27fe50b9a8a5829aefc66168ec5a783f  /vol/sge-tmp/in.fas

Although both files are identical (same md5sum), this works as intended:
~/metabuli_test/metabuli/bin/metabuli classify --seq-mode 1 /vol/sge-tmp/in.fas /vol/biodb/local_databases/MGX/metabuli/refseq/ outdir jobid
while this crashes (reproducibly):
~/metabuli_test/metabuli/bin/metabuli classify --seq-mode 1 /vol/sge-tmp/xxxy.in /vol/biodb/local_databases/MGX/metabuli/refseq/ outdir jobid

Core was generated by `/homes/sjaenick/metabuli_test/metabuli/bin/metabuli classify --seq-mode 1 /vol/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000044d843 in kseq_buffer_reader (nbyte=16384, outBuffer=0xb0f9b90 "\320N\340\001", inBuffer=0x7ffdf3ef8140)
    at /home/vsts/work/1/s/lib/mmseqs/src/commons/KSeqBufferReader.h:30
30      /home/vsts/work/1/s/lib/mmseqs/src/commons/KSeqBufferReader.h: No such file or directory.
[Current thread is 1 (Thread 0x2597940 (LWP 3753118))]
(gdb) bt
#0  0x000000000044d843 in kseq_buffer_reader (nbyte=16384, outBuffer=0xb0f9b90 "\320N\340\001", inBuffer=0x7ffdf3ef8140)
    at /home/vsts/work/1/s/lib/mmseqs/src/commons/KSeqBufferReader.h:30
#1  ks_getc (ks=0xb0f9b50) at /home/vsts/work/1/s/src/commons/SeqIterator.h:25
#2  kseq_read (seq=seq@entry=0xb0f9ac0) at /home/vsts/work/1/s/src/commons/SeqIterator.h:25
#3  0x000000000044efbf in Classifier::_ZN10Classifier27fillQueryKmerBufferParallelER15QueryKmerBufferR10MmapedDataIcERKSt6vectorI13SequenceBlockSaIS6_EERS5_I5QuerySaISB_EERKSt4pairImmERK15LocalParameters._omp_fn.0(void) () at /home/vsts/work/1/s/src/commons/Classifier.cpp:354
#4  0x0000000000a6ff06 in GOMP_parallel ()
#5  0x000000000045b619 in Classifier::fillQueryKmerBufferParallel (par=..., currentSplit=..., queryList=..., seqs=..., seqFile=..., kmerBuffer=..., this=0x25aecf0)
    at /home/vsts/work/1/s/src/commons/Classifier.cpp:345
#6  Classifier::startClassify (this=this@entry=0x25aecf0, par=...) at /home/vsts/work/1/s/src/commons/Classifier.cpp:260
#7  0x0000000000494c7c in classify (argc=<optimized out>, argv=<optimized out>, command=...) at /home/vsts/work/1/s/src/workflow/classify.cpp:47
#8  0x000000000049d62a in runCommand (p=0x25b2f60, argc=argc@entry=6, argv=argv@entry=0x7ffdf3ef97b8) at /home/vsts/work/1/s/lib/mmseqs/src/commons/Application.cpp:40
#9  0x0000000000414c26 in main (argc=8, argv=0x7ffdf3ef97a8) at /home/vsts/work/1/s/lib/mmseqs/src/commons/Application.cpp:203
(gdb)

@sjaenick
Contributor Author

sjaenick commented Jun 5, 2023

Even stranger:

$ id -a
uid=1096(sjaenick) gid=1000(cb) groups=1000(cb),1022(mgx),....

Running as myself (sjaenick), this works for a file I own:
-rw-r--r-- 1 sjaenick cb 17619148 Jun 5 19:00 /vol/sge-tmp/xxxy.in

... while this segfaults, even though the file is readable via its group (and world) permissions:
-rw-r--r-- 1 mgxserv mgx 17619148 Jun 5 19:02 /vol/sge-tmp/xxxy.in

Performing exactly the same steps under the mgxserv account, I get segfaults for
both input files (even though one of them is owned by that account).
To be honest, I have no idea what might be going on...
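The permission bits above explain the per-account difference once you know (from the later comments in this thread) that Metabuli 1.0.0 mapped its input for writing: with `-rw-r--r--`, only the owner may write. Below is a minimal sketch of the standard owner/group/other check as a hypothetical helper, `mayWrite`, that ignores root, ACLs and read-only mounts. The uid 1096 (sjaenick) and the gids 1000 (cb) and 1022 (mgx) come from `id -a` above; the mgxserv uid in the usage comment is made up.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: simplified POSIX write-permission check for a regular
// file. It ignores root (CAP_DAC_OVERRIDE), ACLs and read-only mounts.
bool mayWrite(mode_t mode, uid_t fileUid, gid_t fileGid,
              uid_t uid, const std::vector<gid_t>& groups) {
    if (uid == fileUid) {
        return (mode & S_IWUSR) != 0;  // the owner bits decide for the owner
    }
    if (std::find(groups.begin(), groups.end(), fileGid) != groups.end()) {
        return (mode & S_IWGRP) != 0;  // otherwise the group bits, if a member
    }
    return (mode & S_IWOTH) != 0;      // otherwise the "other" bits
}

// Usage with the values from this thread (the mgxserv uid 2000 is made up):
//   const std::vector<gid_t> groups{1000, 1022};   // cb, mgx
//   mayWrite(0644, 1096, 1000, 1096, groups);      // own file: true
//   mayWrite(0644, 2000, 1022, 1096, groups);      // mgxserv's file: false
```

Read permission is granted in every case above, which is why the file looks perfectly accessible from the shell.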

@jaebeom-kim
Collaborator

jaebeom-kim commented Jun 6, 2023

Thank you for sharing the issue with great details! It will help us provide more stable software.

  1. Segfault during the 'Comparing qeury and reference metamers' stage
    Could you first try using multiple threads?
    I could reproduce the segfault during the 'Comparing qeury and reference metamers' stage only with --threads 1.
    Using more than one thread didn't cause any problems.
    It would be very helpful if you could try with multiple threads and share the result.

  2. Segfault during metamer extraction stage
    I couldn't reproduce the segfault during metamer extraction yet, so it may take more time.
    Please use Metabuli with a working setting until we solve this issue.

@martin-steinegger
Collaborator

Thank you so much for the detailed issue description and experiments. I also tried to reproduce the issue but failed. Just to clarify: if you run metabuli under your own account, it does not crash (at all), but if you use another user, it crashes?

@sjaenick
Contributor Author

sjaenick commented Jun 6, 2023

> Thank you so much for the detailed issue description and experiments. I also tried to reproduce the issue as well but failed. Just to clarify, if you run metabuli under your own account it does not crash (at all), but if you use another user it crashes?

It crashes under both my own account and another one; the strange thing here is that for an input file readable by both accounts, the outcome reproducibly differs. Also, copying the input file to a different location makes a difference (and I have no idea what might cause this).

-rw-r--r-- 1 mgxserv mgx 17619148 Jun 5 19:02 /vol/sge-tmp/xxxy.in
/vol/mgx-sw/bin/metabuli classify --seq-mode 1 /vol/sge-tmp/xxxy.in /vol/biodb/local_databases/MGX/metabuli/refseq/ outdir jobid

When executed under my own account, this segfaults in the metamer extraction step; when run under the mgxserv account, it segfaults in the 'Comparing qeury and reference metamers' step.

@sjaenick
Contributor Author

sjaenick commented Jun 6, 2023

> 1. **Segfault during `Comparing qeury and reference metamers` stage**
>    Could you first try using multiple threads?
>    I could reproduce the _Segfault during `Comparing qeury and reference metamers` stage_ only with `--threads 1`.
>    Using more than one thread didn't make any problems.
>    It will be very helpful if you try with multiple threads and share the result.

I just tested this; indeed, single-threaded usage leads to the segfault, while specifying more than one thread works fine.

EDIT: It still segfaults, even with more than one thread:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  Classifier::linearSearchParallel (this=0x557195984800, queryKmerList=0x154714aee010, queryKmerCnt=@0x7ffff48b4b18: 28860420, matchBuffer=..., par=...)
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/Classifier.cpp:501
501             if (diffIdxSplits.data[i].ADkmer == 0 || diffIdxSplits.data[i].ADkmer == UINT64_MAX) {
(gdb) bt
#0  Classifier::linearSearchParallel (this=0x557195984800, queryKmerList=0x154714aee010, queryKmerCnt=@0x7ffff48b4b18: 28860420, matchBuffer=..., par=...)
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/Classifier.cpp:501
#1  0x0000557192d0272f in Classifier::startClassify (this=0x557195984800, par=...) at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/commons/Classifier.cpp:281
#2  0x0000557192d881a7 in classify (argc=8, argv=0x7ffff48b52f8, command=...) at /vol/mgx-sw/src/tools/Metabuli-1.0.0/src/workflow/classify.cpp:47
#3  0x0000557192d95ea4 in runCommand (p=0x5571959701e0, argc=8, argv=0x7ffff48b52f8)
    at /vol/mgx-sw/src/tools/Metabuli-1.0.0/lib/mmseqs/src/commons/Application.cpp:40
#4  0x0000557192d96ecc in main (argc=10, argv=0x7ffff48b52e8) at /vol/mgx-sw/src/tools/Metabuli-1.0.0/lib/mmseqs/src/commons/Application.cpp:203
(gdb) p i
$1 = 1
(gdb) p diffIdxSplits.data[i]
Cannot access memory at address 0x17
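The last gdb line is a useful clue. If `DiffIdxSplit` is 24 bytes (three 64-bit values; an assumption, since only `ADkmer` is visible in the backtrace), then `data[1]` with `data == MAP_FAILED` (`(void *) -1`) lands exactly on `0x17`. That points to a failed `mmap()` whose result was used as a pointer, rather than to corrupt data. Here is a sketch of the arithmetic with a hypothetical stand-in struct:

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for DiffIdxSplit, assumed to hold three 64-bit values
// (24 bytes). Only ADkmer appears in the backtrace; offset1/offset2 are placeholders.
struct DiffIdxSplitLike {
    std::uint64_t ADkmer;
    std::uint64_t offset1;
    std::uint64_t offset2;
};
static_assert(sizeof(DiffIdxSplitLike) == 24, "expected a 24-byte element");

// Address that data[i] refers to when data == MAP_FAILED, i.e. (void *) -1.
// Unsigned integer arithmetic makes the wrap-around past zero well defined.
std::uintptr_t addressIfMapFailed(std::size_t i) {
    const auto base = reinterpret_cast<std::uintptr_t>(MAP_FAILED);
    return base + i * sizeof(DiffIdxSplitLike);
}
```

`addressIfMapFailed(1)` evaluates to `0x17`, matching `p i` = 1 in the session above.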

@sjaenick
Contributor Author

sjaenick commented Jun 6, 2023

> 2. **Segfault during metamer extraction stage**
>    I couldn't reproduce _Segfault during metamer extraction_ yet, so it may take more time.
>    Please use Metabuli with working setting until we solve this issue.

I have a test case: if the input file is writable by the user running Metabuli (e.g. permissions 644 on your own file), it works. If I change it to
read-only (chmod 444 file.fas), the segfault occurs.

https://github.com/steineggerlab/Metabuli/blob/master/src/commons/Classifier.cpp#L161 creates
an R/W mapping of the input file (see https://github.com/steineggerlab/Metabuli/blob/master/src/commons/Mmap.h#L30),
since no value for mode is specified. If you change this to

queryFile = mmapData<char>(queryPath_1.c_str(), 2); // mmap readonly

the segfault during metamer extraction seems to be fixed.
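Here is a minimal sketch of a read-only file mapping with plain POSIX calls, assuming the caller only reads the data. `mapReadOnly` is a hypothetical helper, not Metabuli's `mmapData`. Besides using `PROT_READ`, it checks for `MAP_FAILED` so that a file that cannot be mapped becomes an error instead of a later segfault in the parser.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not Metabuli's mmapData): maps a file read-only.
// Returns nullptr on any failure, so MAP_FAILED never escapes as a data pointer.
template <typename T>
const T* mapReadOnly(const char* path, std::size_t& fileSize) {
    fileSize = 0;
    int fd = open(path, O_RDONLY);           // read permission is all that is needed
    if (fd == -1) return nullptr;
    struct stat st{};
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);                           // empty files cannot be mapped (EINVAL)
        return nullptr;
    }
    void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                   PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                               // the mapping stays valid after close()
    if (p == MAP_FAILED) return nullptr;
    fileSize = static_cast<std::size_t>(st.st_size);
    return static_cast<const T*>(p);
}
```

Opening with `O_RDONLY` needs only read permission, so this covers the `chmod 444` test case as well as input files owned by another account.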

@sjaenick
Contributor Author

sjaenick commented Jun 6, 2023

I think the second segfault (during 'Comparing qeury and reference metamers') might be due to a similar
problem:

https://github.com/steineggerlab/Metabuli/blob/master/src/commons/Classifier.cpp#L485 creates an R/W mapping
of one of the database files and later attempts to modify its data (e.g. https://github.com/steineggerlab/Metabuli/blob/master/src/commons/Classifier.cpp#L502); in my case, the database files are not writable by the mgxserv
account, hence the segfault.

A read-only mapping, i.e. mmapData<DiffIdxSplit>(diffIdxSplitFileName.c_str(), 2);, won't work here (since the data is
modified in memory). I'll see if I can add an R/W mode with MAP_PRIVATE.
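Here is a sketch of the `MAP_PRIVATE` idea, assuming the in-memory edits never need to reach the file. With `MAP_PRIVATE`, writes land in copy-on-write pages, so `PROT_READ | PROT_WRITE` is accepted on a descriptor opened `O_RDONLY`. By contrast, `MAP_SHARED` with `PROT_WRITE` fails with `EACCES` unless the descriptor was opened `O_RDWR`. `mapPrivateWritable` is a hypothetical name, not necessarily what the eventual fix added.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: maps a file copy-on-write. Pages may be modified in
// memory, but the changes are never written back to the file, so an O_RDONLY
// descriptor suffices (unlike PROT_WRITE + MAP_SHARED, which needs O_RDWR).
template <typename T>
T* mapPrivateWritable(const char* path, std::size_t& fileSize) {
    fileSize = 0;
    int fd = open(path, O_RDONLY);
    if (fd == -1) return nullptr;
    struct stat st{};
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                   PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                               // the mapping outlives the descriptor
    if (p == MAP_FAILED) return nullptr;     // report failure instead of a bogus pointer
    fileSize = static_cast<std::size_t>(st.st_size);
    return static_cast<T*>(p);
}
```

The trade-off is memory. Each modified page is copied into private memory for that process, while untouched pages stay shared with the page cache.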

jaebeom-kim pushed a commit that referenced this issue Jun 12, 2023
* add mmap mode with MAP_PRIVATE

* fix mmap() for input and database files

* minor typo fixes