
Segmentation fault when running in docker container #104

Closed
darcyabjones opened this issue Oct 26, 2018 · 7 comments

@darcyabjones

Hi there,

I've been trying to run hhblits in a container against the precompiled uniclust30 database.
I find that it often segfaults in the first iteration some time after the first "Alternative alignment" log shows up.
The same issue occurs with hhsearch.

I've used your docker container (https://hub.docker.com/r/soedinglab/hh-suite/) and also my own, which is heavily based on your alpine container (https://github.com/darcyabjones/pclust/blob/master/Dockerfiles/hhblits.Dockerfile), using hhblits v3.0-beta.3.

The VM I'm running the containers on has Ubuntu 18.04, 16 vCPUs, and 48 GB of RAM.

The search completes when searching the precompiled scop90 database.
The search against uniclust30 will also complete if I compile & run on the VM directly.

Here's a small example:

ubuntu@darcy-pclust:~/pclust$ cat test.faa 
>Sn15.NS.00005 
MRFVLVVLLGLLLSVRSDVSAHHVDAAIPDSSQISNLIFPAHVARPGGENSTVISHKRRW
NGPPPAPAADDVWEKMKCKGRKFMAQMSYSDFDAGQMLPVPQNTAQSPWYLAHLYSWAYV
ISSVGEVYRSLGPGGYWGVSDFFRHISISDKCVEEGGKWIAAVITHYQQGTLVDGQRYTS
PNGEVKRASGAYFYMAVNPQGGIIVQNTLGPREAANKVYPGNYPDTELPALQKLSDMMWM
MWEYYVPAAQRTNLDFVMSLSISNPTSLSIIRRAFDSQGQVLTATPYKFDPNSDGGLALL
GSPNGARVAHFLIQRKPQVGLKTVIGIYGFESQAKSRAPCLMFKLGNLAAATPRPPVQRS
ELGPSSGAEQNMPVEETSVKRVLEQRNFVRTHIFRFDGNVTLPSEYM
ubuntu@darcy-pclust:~/pclust$ docker run --rm -it -v $(pwd):/data:rw soedinglab/hh-suite hhblits -i /data/test.faa -d /data/databases/hhuniref/uniclust30_2018_08 -o /data/test.hhr
- 06:44:25.547 INFO: Searching 15161831 column state sequences.
- 06:44:25.726 INFO: /data/test.faa is in A2M, A3M or FASTA format
- 06:44:25.728 INFO: Iteration 1
- 06:44:26.959 INFO: Prefiltering database
- 06:46:26.638 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 294315
- 06:46:31.326 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 294
- 06:46:31.326 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 294
- 06:46:31.326 INFO: Scoring 294 HMMs using HMM-HMM Viterbi alignment
- 06:46:31.498 INFO: Alternative alignment: 0

ubuntu@darcy-pclust:~/pclust$ docker run --rm -it -v $(pwd):/data:rw soedinglab/hh-suite hhsearch -i /data/test.faa -d /data/databases/hhuniref/uniclust30_2018_08 -o /data/test.hhr
- 06:49:07.435 INFO: /data/test.faa is in A2M, A3M or FASTA format
- 06:49:10.530 INFO: Searching 15161831 database HHMs without prefiltering
- 06:49:46.817 INFO: Iteration 1
- 06:49:48.316 WARNING: database contains sequences that exceeds maximum allowed size (maxres = 20001). Maxres can be increased with parameter -maxres.
- 06:49:48.432 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 15161831
- 06:49:48.433 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 15161831
- 06:49:48.433 INFO: Scoring 15161831 HMMs using HMM-HMM Viterbi alignment
- 06:49:49.462 INFO: Alternative alignment: 0

There is no output file.
Unfortunately the error doesn't propagate out of the docker container but this is the error message:

Segmentation fault (core dumped)

Any thoughts why this would be?

Cheers, Darcy

PS. Thanks so much for your work.
I've been evangelising mmseqs and hhblits to my colleagues.

@narsapuramvijaykumar

narsapuramvijaykumar commented Oct 29, 2018

Hello Team,

I'm facing the same issue as @darcyabjones when trying to run using the docker image.
Below is the stdout from the program.
Available resources: 4 CPUs; 16 GB memory; database (Pfam) size: 5 GB.
hhblits -cpu 2 -i data/query/query.a3m -d data/pfam -o data/outputs/query.hhr

- 10:31:57.668 INFO: Searching 17929 column state sequences.
- 10:31:57.749 INFO: data/query/query.a3m is in A2M, A3M or FASTA format
- 10:31:57.770 INFO: Iteration 1
- 10:31:58.125 INFO: Prefiltering database
- 10:31:58.527 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 784
- 10:31:58.536 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 272
- 10:31:58.536 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 272
- 10:31:58.536 INFO: Scoring 272 HMMs using HMM-HMM Viterbi alignment
- 10:31:58.640 INFO: Alternative alignment: 0
- 10:32:01.738 INFO: 272 alignments done
- 10:32:01.739 INFO: Alternative alignment: 1
- 10:32:01.771 INFO: 48 alignments done
- 10:32:01.771 INFO: Alternative alignment: 2
- 10:32:01.781 INFO: 3 alignments done
- 10:32:01.781 INFO: Alternative alignment: 3
- 10:32:01.790 INFO: 1 alignments done
- 10:32:01.846 INFO: Realigning 10 HMM-HMM alignments using Maximum Accuracy algorithm

Segmentation fault (core dumped)

And hhsearch as below

hhsearch -cpu 4 -i data/query/query.a3m -d data/pfam -o data/outputs/query.hhr

- 10:25:17.320 INFO: data/query/query.a3m is in A2M, A3M or FASTA format
- 10:25:17.343 INFO: Searching 17929 database HHMs without prefiltering
- 10:25:17.356 INFO: Iteration 1
- 10:25:17.543 WARNING: database contains sequences that exceeds maximum allowed size (maxres = 20001). Maxres can be increased with parameter -maxres.
- 10:25:17.583 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 17929
- 10:25:17.583 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 17929
- 10:25:17.583 INFO: Scoring 17929 HMMs using HMM-HMM Viterbi alignment
- 10:25:17.793 INFO: Alternative alignment: 0

Segmentation fault (core dumped)

Thanks in advance.

Regards,
Vijay N

@milot-mirdita

Can you check if this issue still happens in the new release?

If so, please check on the host machine with sysctl vm.overcommit_memory whether memory overcommitment is enabled, and if it isn't, set the value to 1 with sysctl vm.overcommit_memory=1.
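For reference, a quick way to inspect the overcommit policy on the host (a generic sketch, not specific to hh-suite; the change commands are shown as comments since they need root on the host, not inside the container):

```shell
# Read the current policy: 0 = heuristic overcommit (default),
# 1 = always overcommit, 2 = never overcommit.
cat /proc/sys/vm/overcommit_memory
# To enable overcommit until the next reboot, one would run:
#   sudo sysctl vm.overcommit_memory=1
# To persist it across reboots (file name is a common convention):
#   echo 'vm.overcommit_memory = 1' | sudo tee /etc/sysctl.d/99-overcommit.conf
```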

@sabyUWO

sabyUWO commented Feb 28, 2019

- 21:25:59.268 INFO: Iteration 1
- 21:25:59.506 INFO: Prefiltering database
- 21:27:31.160 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 171339
- 21:27:31.962 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 185
- 21:27:31.962 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 185
- 21:27:31.962 INFO: Scoring 185 HMMs using HMM-HMM Viterbi alignment
- 21:27:32.093 INFO: Alternative alignment: 0

Segmentation fault

I am still having this error

@sabyUWO
Copy link

sabyUWO commented Feb 28, 2019

I can't set sysctl vm.overcommit_memory=1 as I don't have root privileges. Is there any other way?

@darcyabjones

Sorry for my delayed response and thanks for getting back to me.

Running hhblits and hhsearch with the same commands I sent originally works fine now.
The overcommit_memory option was set to 0 (thanks for that tip!), but it worked with both 1 and 0.

The current docker image i used was from git commit a0ca99d62d57.

$ sudo docker run --rm -it -v $(pwd):/data:rw soedinglab/hh-suite /usr/bin/time -v hhblits -i /data/test.faa -d /data/data/uniclust30_2018_08/uniclust30_2018_08 -o /data/test.hhr -v 1
	Command being timed: "hhblits -i /data/test.faa -d /data/data/uniclust30_2018_08/uniclust30_2018_08 -o /data/test.hhr -v 1"
	User time (seconds): 504.82
	System time (seconds): 6.07
	Percent of CPU this job got: 188%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 4m 30.83s
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 27683024
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 222
	Minor (reclaiming a frame) page faults: 891229
	Voluntary context switches: 3442
	Involuntary context switches: 10899
	Swaps: 0
	File system inputs: 299040
	File system outputs: 1400
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

I have found that specifying -cpu > 2 still gives me a segfault once it hits the alignment stage.

$ sudo docker run --rm -it -v $(pwd):/data:rw soedinglab/hh-suite /usr/bin/time -v hhblits -i /data/test.faa -d /data/data/uniclust30_2018_08/uniclust30_2018_08 -o /data/test.hhr -cpu 3
- 06:16:35.906 INFO: Searching 15161831 column state sequences.
- 06:16:36.030 INFO: /data/test.faa is in A2M, A3M or FASTA format
- 06:16:36.031 INFO: Iteration 1
- 06:16:36.664 INFO: Prefiltering database
- 06:17:52.568 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 294315
- 06:17:56.102 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 294
- 06:17:56.102 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 294
- 06:17:56.102 INFO: Scoring 294 HMMs using HMM-HMM Viterbi alignment
- 06:17:56.371 INFO: Alternative alignment: 0

Command terminated by signal 11
	Command being timed: "hhblits -i /data/test.faa -d /data/data/uniclust30_2018_08/uniclust30_2018_08 -o /data/test.hhr -cpu 3"
	User time (seconds): 240.64
	System time (seconds): 4.72
	Percent of CPU this job got: 264%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1m 32.79s
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 27460848
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 728488
	Voluntary context switches: 68
	Involuntary context switches: 803
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

The same thing happens with hhsearch; it just skips the prefiltering and segfaults.
Running directly on the host VM happily uses the -cpu option, and shows a much smaller maximum resident set size in the time output.

$ /usr/bin/time -v hhblits -i test.faa -d data/uniclust30_2018_08/uniclust30_2018_08 -o test.hhr -cpu 3 -v 1
	Command being timed: "hhblits -i test.faa -d data/uniclust30_2018_08/uniclust30_2018_08 -o test.hhr -cpu 3 -v 1"
	User time (seconds): 358.94
	System time (seconds): 4.88
	Percent of CPU this job got: 261%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:18.93
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 6976848
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 446
	Minor (reclaiming a frame) page faults: 806549
	Voluntary context switches: 828
	Involuntary context switches: 2481
	Swaps: 0
	File system inputs: 191472
	File system outputs: 1400
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

I had a hunch that the difference was due to musl or some other Alpine peculiarity, so I made a quick Ubuntu version, and it seems to work like the native build.

$ cat Dockerfile
FROM ubuntu:latest as builder

RUN apt-get update \
    && apt-get install -y gcc g++ cmake vim build-essential \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /opt/hh-suite
ADD . .

WORKDIR /opt/hh-suite/build
RUN cmake -DHAVE_SSE2=1 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local/hh-suite .. \
    && make \
    && make install

FROM ubuntu:latest
RUN apt-get update \
    && apt-get install -y libstdc++6 libgomp1 time \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder /usr/local/hh-suite /usr/local/hh-suite

ENV HHLIB=/usr/local/hh-suite
ENV PATH="/usr/local/hh-suite/bin:/usr/local/hh-suite/scripts:${PATH}"

CMD ["hhblits"]


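The image above can be built like this (a sketch; it assumes the Dockerfile sits at the root of an hh-suite source checkout, since its "ADD . ." copies the whole build context into /opt/hh-suite):

```shell
# Build the Ubuntu-based image from inside the hh-suite checkout.
docker build -t myhhsuite -f Dockerfile .
```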

$ sudo docker run --rm -it -v $(pwd):/data:rw myhhsuite /usr/bin/time -v hhblits -i /data/test.faa -d /data/data/uniclust30_2018_08/uniclust30_2018_08 -o /data/test.hhr -cpu 3 -v 1
	Command being timed: "hhblits -i /data/test.faa -d /data/data/uniclust30_2018_08/uniclust30_2018_08 -o /data/test.hhr -cpu 3 -v 1"
	User time (seconds): 484.41
	System time (seconds): 5.93
	Percent of CPU this job got: 270%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 3:01.16
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 7040696
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 3374
	Minor (reclaiming a frame) page faults: 803010
	Voluntary context switches: 3797
	Involuntary context switches: 1431
	Swaps: 0
	File system inputs: 819640
	File system outputs: 1400
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

I understand that time isn't necessarily the best way to measure memory use.
But from this it seems like something about the Alpine compilation or runtime libraries causes hhblits/hhsearch to chew through memory or do strange things with allocations/frees.
I could definitely be wrong; I'm not a C or Docker expert.

Anyway, thanks for looking at this and sorry for this monstrously long comment.

Version 3 looks really nice so far :).

@milot-mirdita

Thanks for the thorough testing, I replaced the alpine base image with debian stable-slim. Would you mind trying it out again?

@darcyabjones

The new version seems to run perfectly.
It happily ran with 16 CPUs and used about 3 GB of RAM on average for uniclust30.

Thanks :)

Hopefully this fixes the issues faced by others in this thread and is helpful.
Closing it for now.

Thanks again, Darcy
