
freebayes-parallel not fully utilising CPUs #479

Closed · MarkDunne opened this issue Jun 21, 2018 · 11 comments

@MarkDunne commented Jun 21, 2018

I'm using the freebayes-parallel script with freebayes version 1.2.0 and GNU Parallel version 20180522, and the CPU utilisation seems to randomly flip between all or almost none of the available cores.

[Screenshot: htop showing CPU utilisation, 2018-06-21 10:46]

In the above image, after about 5 hours of running, the number of running processes (checked via htop) dropped from 30 to just 1. From other runs where the same thing has happened, I know that after another day or two this will randomly change again and run with all 30 cores for another few hours. This is on a 32-core machine, with freebayes-parallel given ncpus=30.

The only indicative error I see is from the output of GNU Parallel

parallel: Warning: No more file handles.
parallel: Warning: Raising ulimit -n or /etc/security/limits.conf may help.

But I'm only using 2 (albeit large) BAM files, so running out of file handles seems like a strange failure mode. I tried raising the ulimit while the script was running, but it didn't appear to have any effect. If it's relevant, the input for GNU Parallel is the output of fasta_generate_regions.py with region size 100000.

Any help appreciated.

@ekg (Collaborator) commented Jun 21, 2018

How are you choosing the regions to run? You may want to ensure that you have approximately the same number of alignments in each region, otherwise you will find your runtime dominated by the longest-running segment.

This could be scripted out pretty easily with bedtools coverage or something similar. Unfortunately, I don't think I ever pushed a script to do this to this repo, but contributions are more than welcome.
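
For example, a quick way to see how skewed the per-region alignment counts are (an untested sketch; regions.bed and aln.bam are placeholder names):

    # append the count of aln.bam alignments overlapping each region
    bedtools coverage -a regions.bed -b aln.bam -counts > region_counts.txt
    # the most read-dense regions will dominate the runtime
    sort -k4,4nr region_counts.txt | head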

The warning about file handles is concerning. You should find out what is holding the files open. It doesn't make sense to me that it's freebayes or the parallel script itself, but it could be that the ncpus setting is not being read and parallel is trying to run all the jobs at once. That would be a strange failure, but an unusual error like this often points at the real cause. Without access to the system it will be impossible for me to debug, so hopefully you can get some idea of what's going on.

@MarkDunne (Author) commented Jun 21, 2018

Hi Erik,

Thanks for getting back to me so quickly.

To address your points:

You may want to ensure that you have approximately the same number of alignments in each region, otherwise you will find your runtime dominated by the longest-running segment.

I thought about this one, and at first skewed coverage made a lot of sense as the cause. My understanding is that running parallel with the -k (--keep-order) flag, as the freebayes-parallel script does, forces a process to buffer until it is its turn to flush to stdout. It is easy to imagine that one region with a particularly high number of reads would block all of the processes that have to follow it. However, under this scenario I would expect to see 30 processes open in htop, and I don't: I only see one or two running during the slowdown period, and none of them appeared to be blocked.

it could be that the way the number of CPUs is specified is not being read and it's trying to run all the jobs at once.

Also unlikely, if htop is to be believed. As I said, I'm running freebayes-parallel with ncpus=30 on a 32-core machine, and when it is running at the correct utilisation level I do see 30 processes open. If it were trying to run all the jobs at once, I would expect to see far more than 30 processes in htop.

@ekg (Collaborator) commented Jun 22, 2018

However, under this scenario I would expect to see 30 processes open in htop, and I don't: I only see one or two running during the slowdown period, and none of them appeared to be blocked.

Actually, you'll only see the parallel process and whichever freebayes processes haven't finished yet. parallel has already gathered the output of the jobs that completed, so they aren't running anymore.

I think this is exactly what we're seeing.

Use the coverage_to_regions.py script to generate target regions with an even number of reads. You can do so like this:

    samtools depth aln.bam | python2.7 ./coverage_to_regions.py ref.fa.fai 10000 >targets.bed

You can then convert targets.bed to the "region" format using vcflib's script bed2region.
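
Assuming bed2region follows the usual vcflib script convention of reading BED on stdin and printing chrom:start-end regions, that conversion is a one-liner:

    bed2region < targets.bed > targets.regions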

Please let me know if this resolves the issue. You will need to choose an appropriate number of chunks; 10k may be too many.

@ekg (Collaborator) commented Jun 22, 2018

As a caveat, note that the coverage_to_regions.py script is going to use a lot of RAM. One solution is to downsample the BAM by piping it through awk or breaking it up by chromosome.
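
A rough sketch of the awk approach (untested; keeps the header and a random ~10% of reads, which is plenty for estimating depth, and assumes a samtools version that autodetects SAM on stdin):

    # keep header lines, randomly keep ~10% of reads
    samtools view -h aln.bam \
      | awk 'BEGIN { srand(42) } /^@/ { print; next } rand() < 0.10' \
      | samtools view -b - > aln.sub.bam
    samtools depth aln.sub.bam | python2.7 ./coverage_to_regions.py ref.fa.fai 10000 >targets.bed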

This should be more deeply integrated into freebayes, but I haven't had the time to do it.

@MarkDunne (Author) commented

Thanks Erik. Very interesting that parallel doesn't start new processes while waiting for one to complete.

I might also try running fasta_generate_regions.py with a larger region size as that should harmonise the runtime a bit.

@ekg (Collaborator) commented Jun 22, 2018

@MarkDunne well, it's specifically because of the -k flag to parallel; it seems to block the process. We can double-check in the code if you'd like. Another way to check is to run everything without -k and then sort and deduplicate the records afterwards. Does this make sense?
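
Concretely, that check could look something like this (a sketch; it assumes the vcflib tools from the stock freebayes-parallel pipeline, and that vcfstreamsort's -a flag sorts the whole stream rather than a window):

    # no -k: job output arrives in completion order, so do a full sort afterwards
    cat regions.txt \
      | parallel -j 30 "freebayes -f ref.fa --region {} aln.bam" \
      | vcffirstheader | vcfstreamsort -a | vcfuniq > out.vcf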

Picking regions with even numbers of reads in them will absolutely help in harmonizing runtimes. The same could be said for raw variant density (it's a good proxy for read depth and freebayes runtime), and you can use vcfevenregions from vcflib to generate regions that way. I'd be cautious about increasing the region size; that is more likely to make things worse, because the longest-running region will take even longer. It'd be better to make the regions much smaller: your total runtime is bounded by the longest-running chunk, unless that chunk's runtime is much shorter than the total.

@tseemann (Contributor) commented Jun 22, 2018

@ekg if you want to subsample the BAM file, modern samtools view has an option:

  -s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the
           fraction of templates/read pairs to keep; INT part sets seed)
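
For example, to keep ~10% of read pairs with seed 42:

    samtools view -b -s 42.10 aln.bam > aln.sub.bam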

Also, it might be worth considering mosdepth - it uses the BAM index for approximate depth, so it is very fast: https://github.com/brentp/mosdepth
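
For example, mean depth in 100 kb windows without per-base output (the prefix and window size are placeholders):

    mosdepth --no-per-base --by 100000 sample aln.bam
    # writes sample.regions.bed.gz with the mean depth of each window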

@MarkDunne (Author) commented Jun 28, 2018

We've tackled this problem without needing to calculate genome coverage. The trick was to split the freebayes-parallel script in two.

Step one runs freebayes under parallel while dumping the many small VCFs into a temporary directory; this frees us from the --keep-order flag, so nothing blocks anymore. Step two is to cat *.vcf on that directory and feed the result through the same vcflib pipeline. As long as the filenames sort in region order, the output works out exactly the same. This solves the issue for us.
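
Roughly, it looks like this (a sketch, not our exact script; file names are placeholders and the final pipeline is the one from stock freebayes-parallel):

    mkdir -p tmp_vcfs
    # step 1: no -k; one VCF per region, zero-padded so the filenames sort in region order
    cat regions.txt | parallel -j 30 \
        'freebayes -f ref.fa --region {} aln.bam > tmp_vcfs/$(printf "%08d" {#}).vcf'
    # step 2: concatenate in order and run the usual vcflib cleanup
    cat tmp_vcfs/*.vcf | vcffirstheader | vcfstreamsort -w 1000 | vcfuniq > out.vcf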

@ekg (Collaborator) commented Jun 28, 2018

@MarkDunne thanks for reporting, this is very helpful to know. I'll close the issue.

@sagitaninta commented

Hi @MarkDunne, is there an exact implementation of this split somewhere in the repo? I'm currently hitting this problem with freebayes.

@marta-antunes commented

Hi @MarkDunne. How did you make the split?
