Merging SV from Oxford Nanopore - expected runtime #41
Comments
Hi @agolicz, I had a quick scan of the code, and it looks like there might be a scaling issue if there is a complex region of the genome with lots of SVs that overlap each other, or lots of diversity. This situation is common near centromeric regions in humans, for example. Merging will essentially be an all-vs-all comparison in these regions, which might give rise to the high run time. However, dysgu usually gives these types of rearrangements low probability, so it's possible you have filtered those out with the flt_vcf.py script?
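To illustrate the scaling concern above, here is a toy sketch (not dysgu's actual merging code; the interval data and the `count_overlap_pairs` helper are made up for illustration). It counts overlapping pairs by brute force, showing how a region where every SV overlaps every other yields n*(n-1)/2 candidate comparisons, while a sparse region yields none.

```python
from itertools import combinations


def count_overlap_pairs(intervals):
    """Count pairs of (start, end) intervals that overlap, by brute force."""
    return sum(
        1
        for (s1, e1), (s2, e2) in combinations(intervals, 2)
        if s1 < e2 and s2 < e1
    )


# Hypothetical sparse region: 100 well-separated SVs, no overlaps.
sparse = [(i * 1000, i * 1000 + 100) for i in range(100)]

# Hypothetical dense region: 100 SVs that all overlap each other.
dense = [(i, i + 1000) for i in range(100)]

sparse_pairs = count_overlap_pairs(sparse)
dense_pairs = count_overlap_pairs(dense)
print(sparse_pairs, dense_pairs)
```

With the same number of SVs, the dense region produces 4950 overlapping pairs against 0 for the sparse one, so a few complex regions could plausibly dominate the total run time.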
Hi,
Yes, flt_vcf.py only keeps the variants with PASS.
11 hrs is not too bad (we're used to that in plants :)). I was just surprised because merging from 100 short-read samples was much quicker. If you are interested in testing merging for long reads, I am planning to run SVJedi to genotype and can report if there are any issues, sites with too many missing genotypes, etc. minimap2+dysgu have done very well in our in-house comparisons for Brassica napus! :)
Glad it finished! I think the runtime is probably caused by high genome complexity in that case. Would be very interested to hear how you get on; feedback from users is very valuable! If you have not come across it already, jasmine could also be a useful tool for merging: https://github.com/mkirsche/Jasmine
Hi,
I am trying to merge SVs discovered from Oxford Nanopore data (plant species, 60 samples, 20,000-50,000 SVs/sample). It's been running for 10 hours now. Is that an expected runtime? Would it make sense to try to merge stepwise? For example, find the most closely related samples, merge those first (say, in groups of 5-10), and then merge the merged files to get the final non-redundant SVs?
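The stepwise idea in the question could be planned roughly as follows. This is only a sketch under assumptions: the file names and the `plan_stepwise_merge` helper are hypothetical, the sample list is assumed to be already ordered so that closely related samples sit next to each other, and the actual merge command is left out. Whether a two-stage merge gives the same calls as a single merge is not established in this thread.

```python
def plan_stepwise_merge(samples, group_size):
    """Split an ordered list of per-sample VCFs into consecutive groups."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [
        samples[i:i + group_size]
        for i in range(0, len(samples), group_size)
    ]


# Hypothetical inputs: 60 per-sample VCFs, ordered by relatedness.
samples = [f"sample_{i:02d}.vcf" for i in range(1, 61)]

# Stage 1: merge each group of 10 into one intermediate file.
groups = plan_stepwise_merge(samples, 10)
intermediates = [f"group_{n:02d}.merged.vcf" for n in range(1, len(groups) + 1)]

# Stage 2: merge the intermediate files into the final call set.
for group, out in zip(groups, intermediates):
    print(f"{out} <- {len(group)} samples ({group[0]} .. {group[-1]})")
```

Each stage then compares far fewer files at once, at the cost of a second merge pass over the intermediate outputs.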