Kevlar novel multithreading #384

Open
ghost opened this issue Jul 6, 2020 · 2 comments

Comments

ghost commented Jul 6, 2020

Hello,
I launched kevlar as follows:

kevlar novel --out output.augfastq --case ../H5A3.cleanup.reads.fa --control ../ARC_ancestor.cleanup.reads.fa -t 48 --max-fpr 1.2 -M 10G

It has been running for 10 hours now, but I never see more than one thread in use. Is this normal? Is this step not multithreaded?
Thank you :)

[kevlar::novel] Case samples loaded in 2203.54 sec
[kevlar::novel] All samples loaded in 4405.89 sec
[kevlar::novel] Iterating over reads from 1 case sample(s)
[kevlar::novel]     processed 1000000 reads (149.16 seconds elapsed)
[kevlar::novel]     processed 2000000 reads (155.06 seconds elapsed)
[kevlar::novel]     processed 3000000 reads (161.09 seconds elapsed)
[kevlar::novel]     processed 4000000 reads (167.57 seconds elapsed)
[kevlar::novel]     processed 5000000 reads (173.63 seconds elapsed)
[kevlar::novel]     processed 6000000 reads (377.72 seconds elapsed)
[kevlar::novel]     processed 7000000 reads (794.66 seconds elapsed)
[kevlar::novel]     processed 8000000 reads (1272.67 seconds elapsed)
[kevlar::novel]     processed 9000000 reads (1741.69 seconds elapsed)
[kevlar::novel]     processed 10000000 reads (2070.23 seconds elapsed)
[kevlar::novel]     processed 20000000 reads (7007.82 seconds elapsed)
[kevlar::novel]     processed 30000000 reads (13155.10 seconds elapsed)
[kevlar::novel]     processed 40000000 reads (19243.20 seconds elapsed)
[kevlar::novel]     processed 50000000 reads (24312.45 seconds elapsed)
[kevlar::novel]     processed 60000000 reads (28102.23 seconds elapsed)
[kevlar::novel]     processed 70000000 reads (34284.96 seconds elapsed)
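
To put numbers on how the read-processing rate changes over the run, the progress lines above can be turned into per-interval throughput. The following is a minimal sketch and not part of kevlar: the regular expression only assumes the `processed N reads (T seconds elapsed)` format shown in this log, the helper names `parse_progress` and `interval_rates` are made up for illustration, and only four lines of the log are embedded to keep it short.

```python
import re

# A few progress lines copied from the log above (a subset, for brevity).
LOG = """\
[kevlar::novel]     processed 5000000 reads (173.63 seconds elapsed)
[kevlar::novel]     processed 10000000 reads (2070.23 seconds elapsed)
[kevlar::novel]     processed 40000000 reads (19243.20 seconds elapsed)
[kevlar::novel]     processed 70000000 reads (34284.96 seconds elapsed)
"""

# Assumed line format: "processed <reads> reads (<seconds> seconds elapsed)".
PATTERN = re.compile(r"processed (\d+) reads \(([\d.]+) seconds elapsed\)")


def parse_progress(text):
    """Return a list of (reads_processed, seconds_elapsed) tuples."""
    return [(int(n), float(t)) for n, t in PATTERN.findall(text)]


def interval_rates(points):
    """Return (reads_at_end_of_interval, reads_per_second) for each
    pair of consecutive progress points."""
    rates = []
    for (n0, t0), (n1, t1) in zip(points, points[1:]):
        rates.append((n1, (n1 - n0) / (t1 - t0)))
    return rates


points = parse_progress(LOG)
for reads, rate in interval_rates(points):
    print(f"up to {reads:>9d} reads: {rate:8.0f} reads/sec")
```

Pasting the full log into `LOG` gives the rate for every reported interval, which makes it easier to see where the slowdown begins.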

standage (Collaborator) commented Jul 8, 2020

Hi @aderzelle! The initial k-mer counting steps should be multithreaded, but the main procedure to identify novel k-mers is not multithreaded.

mpinese commented Jul 21, 2021

Hi @standage, thanks for a great tool which we've been working to integrate into our human disease trio work. We've hit an issue with the kevlar novel step though, which I think is similar to the issue @aderzelle raised. Basically, for some samples kevlar novel takes > 48h to complete. This is our HPC walltime limit, so effectively we can't process these samples with Kevlar right now. This is for ~40X human WGS.

Is there some way to parallelise the novel step, perhaps by splitting the case k-mer input? Alternatively, can we tweak the config to improve speed without too much effect on sensitivity? Any ideas you have would be much appreciated. Thanks!
