
Segmentation fault (core dumped) #21

Open
jaclyn-taroni opened this issue Jun 5, 2019 · 10 comments

@jaclyn-taroni

For context: I am attempting to create an augmented FASTA file to add decoy sequence to a Salmon index as noted in the release notes in the most recent version of Salmon (0.14.0): https://github.com/COMBINE-lab/salmon/releases/tag/v0.14.0

The authors provide a script that makes use of MashMap to do so here: https://github.com/COMBINE-lab/SalmonTools/blob/master/scripts/generateDecoyTranscriptome.sh

I get Segmentation fault (core dumped) when the script reaches the MashMap step at this line https://github.com/COMBINE-lab/SalmonTools/blob/23eac847decf601c345abd8527eed5dc1b382573/scripts/generateDecoyTranscriptome.sh#L105

This can be reproduced from the command line:

mashmap -r reference.masked.genome.fa -q Homo_sapiens.GRCh38.cdna.all.fa -t 8 --pi 80 -s 500
>>>>>>>>>>>>>>>>>>
Reference = [reference.masked.genome.fa]
Query = [Homo_sapiens.GRCh38.cdna.all.fa]
Kmer size = 16
Window size = 5
Segment length = 500 (read split allowed)
Alphabet = DNA
Percentage identity threshold = 80%
Mapping output file = mashmap.out
Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
Execution threads  = 8
>>>>>>>>>>>>>>>>>>
INFO, skch::Sketch::build, minimizers picked from reference = 985533927
Segmentation fault (core dumped)

The relevant inputs to generateDecoyTranscriptome.sh, used to generate reference.masked.genome.fa and the transcript FASTA, are:

Input File          Download
GTF                 ftp://ftp.ensembl.org/pub/release-96/gtf/homo_sapiens/Homo_sapiens.GRCh38.96.gtf.gz
Genome FASTA        ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
Transcript FASTA    ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz

I'm using a Docker image with the v2.0 release of MashMap. (It can be pulled from jtaroni/2019-chi-training and MashMap is installed like so: https://github.com/AlexsLemonade/RNA-Seq-Exercises/blob/d6e5f8627c75e55e572e9061f0498388ebb7d212/Dockerfile#L91).

This also occurs running on my Ubuntu 18.04 machine w/ 64GB RAM outside the container.

Any ideas about what may be happening would be appreciated. Thank you!

@cjain7
Contributor

cjain7 commented Jun 6, 2019

Would it be possible to re-run mashmap with the /usr/bin/time utility to report its memory usage? Comparing the peak memory usage with the RAM size would help. My first guess is that it's running out of memory with the parameters --pi 80 -s 500.
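If GNU time isn't installed (e.g. in a slim container image), a rough Python equivalent of its peak-memory report can be sketched with the standard library. This is a hypothetical helper, not part of MashMap or SalmonTools, and the kilobyte unit for ru_maxrss assumes Linux:

```python
import resource
import shlex
import subprocess

def run_and_report(cmd):
    """Run a command and report the peak resident set size of child
    processes, similar to what `/usr/bin/time --verbose` prints as
    'Maximum resident set size'."""
    proc = subprocess.run(shlex.split(cmd))
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # On Linux, ru_maxrss is reported in kilobytes.
    print(f"exit status: {proc.returncode}")
    print(f"peak RSS: {usage.ru_maxrss / 1024:.1f} MiB")
    return proc.returncode, usage.ru_maxrss
```

For the case above, the command string would be the full mashmap invocation; a crash by signal shows up as a negative return code in subprocess.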

@lpantano

lpantano commented Jun 6, 2019

Hi,

I got the same error when running on a cluster: the job was killed by the scheduler because of memory, and it showed the same error.

@cjain7, do you know how much memory it needs to run this kind of alignment? It would be the transcriptome against the genome.

I set up the limit to 200GB and it wasn't enough.

Thanks!

@k3yavi

k3yavi commented Jun 6, 2019

I've just finished running it on human GENCODE data and annotation. It took ~80 GB of memory for me to finish.

@lpantano

lpantano commented Jun 7, 2019 via email

@k3yavi

k3yavi commented Jun 7, 2019

No problem @lpantano.
Not to swarm the issue with Salmon-related files, but gentrome.fa for GENCODE human comes out to around 477 MB, while the Ensembl one is around 431 MB. If you are looking for human Ensembl decoys, we have uploaded them here. You can also follow up or raise a request for creating decoys for a non-model organism at COMBINE-lab/SalmonTools#5; we would be happy to create that for you.

@jaclyn-taroni
Author

Thanks all for the replies. I am out of the office today, but I will run this with GNU time when I get back in early next week and see if that gives us any additional insight.

@lpantano

lpantano commented Jun 7, 2019

@k3yavi, thanks. All good: 100 GB was enough. I messed up the configuration, sorry about that, but it's good to know about the resources. Thanks so much for your time, I really appreciate the help!

@jaclyn-taroni
Author

Hi @cjain7,

When I run /usr/bin/time with --verbose, the output is:

Command terminated by signal 11
    Command being timed: "mashmap -r reference.masked.genome.fa -q Homo_sapiens.GRCh38.cdna.all.fa -t 8 --pi 80 -s 500"
    User time (seconds): 1269.39
    System time (seconds): 64.50
    Percent of CPU this job got: 273%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 8:07.51
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 48309816
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 1
    Minor (reclaiming a frame) page faults: 56777549
    Voluntary context switches: 195530
    Involuntary context switches: 332377
    Swaps: 0
    File system inputs: 106068536
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Thank you!
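Converting the reported peak RSS to gibibytes shows the process was already near the 64 GB of available RAM when it was killed, consistent with the out-of-memory guess. The arithmetic (using the value from the output above) is:

```python
# GNU time reports "Maximum resident set size" in kilobytes,
# so the value above converts to GiB like this:
peak_rss_kb = 48309816  # from the /usr/bin/time output above
peak_rss_gib = peak_rss_kb / (1024 * 1024)
print(f"peak RSS ≈ {peak_rss_gib:.1f} GiB")  # ≈ 46.1 GiB on a 64 GB machine
```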

@antonkulaga

Guys, you claim "MashMap can map a human genome assembly to the human reference genome in about one minute total execution time and < 4 GB memory using just 8 CPU threads". Why, then, does it take > 64 GB of RAM to run the human alignment in the SalmonTools decoy script with MashMap?

@cjain7
Contributor

cjain7 commented Jul 7, 2019

The performance is highly dependent on the length [-s] and identity [--pi] requirements provided to MashMap.
When looking for long approximate matches that are highly similar, the algorithm can use a sparse LSH sketch for the computation. This was the case when comparing two human genome assemblies (--pi 95 -s 5000).

When looking for short, divergent matches (--pi 80 -s 500, i.e., segment length 500 and a 20% error rate here in your application), it needs a dense sketch to identify them, hence the large memory use and runtime in your specific case. (The MashMap paper is a good reference for a detailed discussion of this.)

One possible suggestion is to see whether relaxing (i.e., increasing) the minimum identity/length requirements makes sense for the application. If that is doable, the algorithm will execute much faster, with much less memory.

The other way around this problem would be to partition the reference into smaller chunks and run those independently, but this pipeline will require a bit more engineering to aggregate the results.
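The partitioning idea could be sketched as follows. This is illustrative only: read_fasta and split_fasta are hypothetical helpers, not part of MashMap or SalmonTools, and the split happens on record boundaries, so a single very large chromosome still ends up whole in one chunk.

```python
def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            else:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def split_fasta(path, n_chunks, prefix="chunk"):
    """Greedily distribute FASTA records into n_chunks files of roughly
    equal total sequence length, so each chunk can be mapped separately."""
    sizes = [0] * n_chunks
    outs = [open(f"{prefix}_{i}.fa", "w") for i in range(n_chunks)]
    try:
        for header, seq in read_fasta(path):
            i = sizes.index(min(sizes))  # smallest chunk so far
            outs[i].write(f"{header}\n{seq}\n")
            sizes[i] += len(seq)
    finally:
        for fh in outs:
            fh.close()
    return sizes
```

Each chunk file would then be passed as -r to a separate mashmap run. Since MashMap's output filtering happens per run, the concatenated per-chunk outputs would still need a joint filtering pass, which is the extra engineering to aggregate results mentioned above.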
