
freebayes-parallel not fully utilising CPUs #479

Closed · MarkDunne opened this issue Jun 21, 2018 · 11 comments

@MarkDunne commented Jun 21, 2018

I'm using the freebayes-parallel script with freebayes version 1.2.0 and GNU Parallel version 20180522, and the CPU utilisation seems to randomly flip between all or almost none of the available cores.

[Screenshot: htop showing CPU utilisation, 2018-06-21 10:46]

In the above image, after about 5 hours of running, the number of running processes (checked via htop) dropped from 30 to just 1. From other runs where the same thing has happened, I know that after another day or two this will randomly change again and run with all 30 cores for another few hours. This is on a 32-core machine, with freebayes-parallel given ncpus=30.

The only indicative error I see is from the output of GNU Parallel

parallel: Warning: No more file handles.
parallel: Warning: Raising ulimit -n or /etc/security/limits.conf may help.

But I'm only using 2 (albeit large) BAM files, so running out of file handles seems like a strange failure mode. I tried raising the ulimit while the script was running, but it didn't appear to have any effect. If it's relevant, the input for GNU Parallel is the output of fasta_generate_regions.py with region size 100000.

Any help appreciated.

@ekg (Collaborator) commented Jun 21, 2018

How are you choosing the regions to run? You may want to ensure that you have approximately the same number of alignments in each region, otherwise you will find your runtime dominated by the longest-running segment.

This could be scripted out pretty easily with bedtools coverage or something similar. Unfortunately, I don't think I ever pushed a script to do this to this repo, but contributions are more than welcome.
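
For example, a quick way to see how skewed the per-region alignment counts are (an untested sketch; regions.bed and aln.bam are placeholder names):

    # append the count of aln.bam alignments overlapping each region
    bedtools coverage -a regions.bed -b aln.bam -counts > region_counts.txt
    # the most read-dense regions will dominate the runtime
    sort -k4,4nr region_counts.txt | head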

The warning about file handles is concerning. You should find out what is holding the files open. It doesn't make sense to me that it's freebayes or the parallel script itself, but it could be that the ncpus setting is not being read and parallel is trying to run all the jobs at once. That would be a strange failure, but an unusual error like this often points at the real cause. Without access to the system it will be impossible for me to debug, so hopefully you can get some idea of what's going on.

@MarkDunne (Author) commented Jun 21, 2018

Hi Erik,

Thanks for getting back to me so quickly.

To address your points:

You may want to ensure that you have approximately the same number of alignments in each region, otherwise you will find your runtime dominated by the longest-running segment.

I thought about this one, and at first skewed coverage made a lot of sense as the cause. My understanding is that running parallel with the -k (--keep-order) flag, as the freebayes-parallel script does, forces a process to buffer until it is its turn to flush to stdout. It is easy to imagine that one region with a particularly high number of reads would block all of the processes that have to follow it. However, under this scenario I would expect to see 30 processes open in htop, and I don't: I only see one or two running during the slowdown period, and none of them appeared to be blocked.

it could be that the way the number of CPUs is specified is not being read and it's trying to run all the jobs at once.

Also unlikely, if htop is to be believed. As I said, I'm running freebayes-parallel with ncpus=30 on a 32-core machine, and when it is running at the correct utilisation level I do see 30 processes open. If it were trying to run all the jobs at once, I would expect to see far more than 30 processes in htop.

@ekg (Collaborator) commented Jun 22, 2018

However, under this scenario I would expect to see 30 processes open in htop, and I don't: I only see one or two running during the slowdown period, and none of them appeared to be blocked.

Actually, you'll only see the parallel process and whichever freebayes processes haven't finished yet. parallel has already gathered the output of the jobs that completed, so they aren't running anymore.

I think this is exactly what we're seeing.

Use the coverage_to_regions.py script to generate target regions with an even number of reads. You can do so like this:

    samtools depth aln.bam | python2.7 ./coverage_to_regions.py ref.fa.fai 10000 >targets.bed

You can then convert targets.bed to the "region" format using vcflib's script bed2region.
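
Assuming bed2region follows the usual vcflib script convention of reading BED on stdin and printing chrom:start-end regions, that conversion is a one-liner:

    bed2region < targets.bed > targets.regions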

Please let me know if this resolves the issue. You will need to choose an appropriate number of chunks; 10k may be too many.

@ekg (Collaborator) commented Jun 22, 2018

As a caveat, note that the coverage_to_regions.py script is going to use a lot of RAM. One solution is to downsample the BAM by piping it through awk or breaking it up by chromosome.
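
A rough sketch of the awk approach (untested; keeps the header and a random ~10% of reads, which is plenty for estimating depth, and assumes a samtools version that autodetects SAM on stdin):

    # keep header lines, randomly keep ~10% of reads
    samtools view -h aln.bam \
      | awk 'BEGIN { srand(42) } /^@/ { print; next } rand() < 0.10' \
      | samtools view -b - > aln.sub.bam
    samtools depth aln.sub.bam | python2.7 ./coverage_to_regions.py ref.fa.fai 10000 >targets.bed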

This should be more deeply integrated into freebayes, but I haven't had the time to do it.

@MarkDunne (Author) commented

Thanks Erik. Very interesting that parallel doesn't start new processes while waiting for one to complete.

I might also try running fasta_generate_regions.py with a larger region size as that should harmonise the runtime a bit.

@ekg (Collaborator) commented Jun 22, 2018

@MarkDunne well, it's specifically because of the -k flag to parallel; it seems to block the process. We can double-check in the code if you'd like. Another way to check is to run everything without -k and then sort and deduplicate the records afterwards. Does this make sense?
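
Concretely, that check could look something like this (a sketch; it assumes the vcflib tools from the stock freebayes-parallel pipeline, and that vcfstreamsort's -a flag sorts the whole stream rather than a window):

    # no -k: job output arrives in completion order, so do a full sort afterwards
    cat regions.txt \
      | parallel -j 30 "freebayes -f ref.fa --region {} aln.bam" \
      | vcffirstheader | vcfstreamsort -a | vcfuniq > out.vcf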

Picking regions with even numbers of reads in them will absolutely help in harmonizing runtimes. The same could be said for raw variant density (it's a good proxy for read depth and freebayes runtime), and you can use vcfevenregions from vcflib to generate regions that way. I'd be cautious about increasing the region size; that is more likely to make things worse, because the longest-running region will take even longer. It'd be better to make the regions much smaller: your total runtime is bounded by the longest-running chunk, unless that chunk's runtime is much shorter than the total.

@tseemann (Contributor) commented Jun 22, 2018

@ekg if you want to subsample the BAM file, modern samtools view has an option:

  -s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the
           fraction of templates/read pairs to keep; INT part sets seed)
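
For example, to keep ~10% of read pairs with seed 42:

    samtools view -b -s 42.10 aln.bam > aln.sub.bam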

Also, it might be worth considering mosdepth - it uses the BAM index for approximate depth, so it is very fast: https://github.com/brentp/mosdepth
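
For example, mean depth in 100 kb windows without per-base output (the prefix and window size are placeholders):

    mosdepth --no-per-base --by 100000 sample aln.bam
    # writes sample.regions.bed.gz with the mean depth of each window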

@MarkDunne (Author) commented Jun 28, 2018

We've tackled this problem without needing to calculate genome coverage. The trick was to split the freebayes-parallel script in two.

Step one runs freebayes under parallel while dumping the many small VCFs into a temporary directory; this frees us from the --keep-order flag, so nothing blocks anymore. Step two is to cat *.vcf on that directory and feed the result through the same vcflib pipeline. As long as the filenames sort in region order, the output works out exactly the same. This solves the issue for us.
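
Roughly, it looks like this (a sketch, not our exact script; file names are placeholders and the final pipeline is the one from stock freebayes-parallel):

    mkdir -p tmp_vcfs
    # step 1: no -k; one VCF per region, zero-padded so the filenames sort in region order
    cat regions.txt | parallel -j 30 \
        'freebayes -f ref.fa --region {} aln.bam > tmp_vcfs/$(printf "%08d" {#}).vcf'
    # step 2: concatenate in order and run the usual vcflib cleanup
    cat tmp_vcfs/*.vcf | vcffirstheader | vcfstreamsort -w 1000 | vcfuniq > out.vcf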

@ekg (Collaborator) commented Jun 28, 2018

@MarkDunne thanks for reporting, this is very helpful to know. I'll close the issue.

@sagitaninta commented

Hi @MarkDunne, is there an exact implementation of this split somewhere in the repo? I'm currently hitting this problem with freebayes.

@marta-antunes commented

Hi @MarkDunne. How did you make the split?
