Long unitigging overlapper step #770
Comments
This is the same as the trimming jobs running slowly; see issue #521 for suggestions. Your best option is probably to run the assembly with
Did it run fine through the trimming step?
Can you share the *.histogram files in the trimming/0-mercounts and unitigging/0-mercounts directories? What is the number in *.ms22.estMerThresh.out? Kmers with count above this are not used for seeding overlaps. It could be slow because it's aligning a loooooong overlap, or it could be slow because it's looking too hard for repeat overlaps. And if not, just close the issue.
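To illustrate the point about the k-mer count threshold, here is a minimal Python sketch. It is not canu's implementation: the function names `count_kmers` and `seed_kmers`, the toy reads, and the values `k=4` and `threshold=2` are all made up for illustration. It only shows the general idea that k-mers occurring more often than a threshold are excluded from the set used to seed overlaps.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only) across a list of read sequences."""
    counts = Counter()
    for seq in reads:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def seed_kmers(counts, threshold):
    """Keep only k-mers whose count does not exceed the threshold (hypothetical helper)."""
    return {kmer for kmer, n in counts.items() if n <= threshold}


# Toy data; real reads and a real threshold would come from your own run.
reads = ["ACGTACGTAC", "TTACGTAGGA", "ACACACACAC"]
counts = count_kmers(reads, k=4)
seeds = seed_kmers(counts, threshold=2)
```

A lower threshold leaves fewer k-mers available for seeding, which generally means less work in repeats at the cost of possibly missing some repeat overlaps.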
Hi Brian,
The number I got for trimming/0-mercounts/basmati334.ms22.estMerThresh.out: The histogram files are:
It's been a while, but I thought I would share my results with the community on using the
But when I ran canu with default parameters it took almost 10 times as long, i.e. about a month, and the assembly stats look as follows:
It seems that for my genome I can't cut corners and will just have to do it the default way.
Thanks for the update. When you ran the fast option, did you use just
Hello
Sorry to ask a non-issue-related question, but I wanted to ask a general question about using canu for plant genome assembly with nanopore data.
We have ~100X nanopore coverage for our plant, and I ran canu with the following (mostly default) parameters:
canu -p plant -d plant_default_param genomeSize=380m maxMemory=400g maxThreads=48 corMaxEvidenceErate=0.15 -nanopore-raw plant.fastq.gz
I haven't had problems so far. Right now it's on the unitigging overlapper step, and a couple of jobs have been basically hanging for close to 2 weeks. I'm hoping they will eventually finish, but I have another 100X of nanopore data from a different individual and was wondering how to avoid the long hang in the overlapper step.
I was thinking that low-quality reads may be the problem, or that nonsense reads, i.e. tandem repeats of short k-mers (close to 10 kb in length) that are not biological but noise generated by the sequencing reaction, may be causing the hang. Are there any recommended quality-control steps one should run on the nanopore reads before assembling them with canu?
Thank you
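For reference, the kind of pre-filter described in the question could be sketched in Python as below. This is only an illustration under stated assumptions, not a canu feature and not a recommendation from the maintainers: the function names, `k=11`, the 1000-base minimum length, and the 0.5 cutoff are arbitrary choices. The idea is that a read made of a short tandem repeat contains very few distinct k-mers relative to its length, while an ordinary read contains mostly distinct ones.

```python
import random


def distinct_kmer_fraction(seq, k=11):
    """Fraction of k-mer positions in seq that hold a distinct k-mer (near 1.0 = little repetition)."""
    total = len(seq) - k + 1
    if total <= 0:
        return 0.0
    return len({seq[i:i + k] for i in range(total)}) / total


def keep_read(seq, min_length=1000, min_fraction=0.5, k=11):
    """Hypothetical filter: drop short reads and reads dominated by a short tandem repeat."""
    return len(seq) >= min_length and distinct_kmer_fraction(seq, k) >= min_fraction


# Toy examples: a 2 kb tandem repeat of a 5-base unit, and a 2 kb pseudo-random sequence.
tandem_read = "ACGTT" * 400
rng = random.Random(0)
random_read = "".join(rng.choice("ACGT") for _ in range(2000))

tandem_kept = keep_read(tandem_read)
random_kept = keep_read(random_read)
```

Any cutoff like this would need checking against real data first, since genuine biological repeats (common in plant genomes) also lower the distinct k-mer fraction.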