assemble genome with high heterozygosity using canu #88
Comments
The first step would be to check your average read length and coverage, both for the input reads and after correction. Assuming you have sufficient coverage of corrected reads (25X+) and the length distribution is similar before and after correction, you can try to optimize the assembly. The bubbles caused by heterozygosity can be confused with repeats, which causes unnecessary splitting in the assembly. Generally this means you will end up with a lot of unassembled contigs (the split parts). We're working on algorithm improvements, but in the meantime you can run a parameter sweep to optimize the assembly. In the 4-unitigger directory there is a unitigger.sh script. You can take that file and turn it into a loop. For example, I have one that says:
So what you’d want to do is (where the max ovl is up to 1/2 average corrected read length)
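A minimal sketch of the read-length and coverage check mentioned above, not the loop script from this comment. It assumes the reads are in FASTA format on standard input and that you supply an estimated genome size in bases; the function name `read_stats` and the output field names are made up for illustration. The last field is half the mean read length, which is the suggested upper bound for the max ovl value when run on corrected reads.

```shell
#!/bin/sh
# read_stats GENOME_SIZE_IN_BASES < reads.fasta
# Hypothetical helper: prints read count, total bases, mean read length,
# coverage (total bases / genome size) and half the mean read length.
read_stats() {
    awk -v genome="$1" '
        /^>/ { n++; next }            # a header line starts a new read
        { bases += length($0) }       # sequence lines (may be wrapped)
        END {
            if (n == 0 || genome <= 0) {
                print "no reads found or bad genome size" > "/dev/stderr"
                exit 1
            }
            mean = bases / n
            printf "reads=%d bases=%.0f mean_len=%.1f coverage=%.1fX half_mean_len=%d\n",
                   n, bases, mean, bases / genome, int(mean / 2)
        }'
}
```

Running it once on the raw reads and once on the corrected reads (for example `zcat corrected.fasta.gz | read_stats 500000000`, where both the file name and the genome size are placeholders) lets you compare coverage and length distribution before and after correction.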
Hi Sergey,
Did you try corOutCoverage=80 in the Canu command? This may help get more than 22x after trimming.
Also, what is the exact Canu command you used? If you set errorRate=0.025 in the Canu command, it might work out better to remove that parameter and let Canu use its defaults.
Hi John,
In the spec.txt:
I did use errorRate=0.025 for the initial assembly.
Another question: should I use corOutCoverage=80 in the correction step or
Thanks!
For now, I would not break Canu up into steps manually by specifying [-correct | -trim | -assemble]. If you leave those off, it will run the entire pipeline. You also do not need a spec file for Canu - you can just include it all in the command (unless you like it better in the spec file). For example, try:
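A sketch of what such a single-command run could look like, not the command from the original comment. It assumes the Canu 1.x command-line syntax (`-p`, `-d`, `genomeSize=`, `-pacbio-raw`); the prefix, output directory, genome size, and read file are placeholders, and only `corOutCoverage=80` comes from this thread. The helper `canu_cmd` is hypothetical and only prints the command so you can inspect it before running.

```shell
#!/bin/sh
# canu_cmd PREFIX GENOME_SIZE READS
# Hypothetical helper: builds and prints a full-pipeline Canu command line.
canu_cmd() {
    prefix=$1
    genome_size=$2
    reads=$3
    # No -correct / -trim / -assemble flag, so Canu runs the entire pipeline.
    # errorRate is left out on purpose so Canu falls back to its defaults.
    echo "canu -p $prefix -d $prefix-out genomeSize=$genome_size corOutCoverage=80 -pacbio-raw $reads"
}

# Placeholder values, not taken from the thread.
canu_cmd plant 500m reads.fastq.gz
```

Printing the command first makes it easy to confirm that no stage flag or errorRate setting has crept in before starting a long run.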
Thanks, John!
I see that
Does parameter sweeping still apply now? If so, what parameters can be tweaked? I am also assembling a genome with high heterozygosity. Thanks!
I've hopefully addressed the original issue and parameter sweeping in the FAQ.
Hi,
I am assembling a plant genome with high heterozygosity, and I have 70X coverage of raw PacBio reads.
I tried Canu with an error rate of 0.025, but got a very poor N50.
Are there any suggestions for improving this assembly?
Thanks for any comments and suggestions.