The majority of reads were uncorrected after running canu -correct #2109
Comments
The default is to correct only the longest 40x of data and, in your case, the 44 longest reads already reach 40x, so no more are corrected. If you want to correct more, you can use corOutCoverage. However, the assembly shouldn't complain about too few reads if you have 40x, so can you post the output from your assembly run? Last, I wanted to check what your input data is; the command doesn't look correct, as Canu doesn't support a pipe (|).
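To make the "longest 40x" filter concrete, here is a minimal sketch of that selection logic in Python. This is an illustrative, hypothetical helper, not Canu's actual implementation; the function name, the tie-breaking, and the example read lengths are all assumptions.

```python
# Sketch of "correct only the longest Nx of data" (hypothetical helper,
# not Canu's actual code): sort reads longest-first and keep reads until
# the coverage budget (target_coverage * genome_size) would be exceeded.
def select_longest_coverage(read_lengths, genome_size, target_coverage=40):
    """Return the lengths of the longest reads fitting in the coverage budget."""
    budget = target_coverage * genome_size
    selected, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if total + length > budget:
            break
        selected.append(length)
        total += length
    return selected

# Example: a ~16 kb mitochondrial genome with 50 long reads and many short ones.
reads = [15000] * 50 + [2000] * 500
kept = select_longest_coverage(reads, genome_size=16000, target_coverage=40)
```

With these example numbers, only a few dozen of the longest reads fill the 40x budget and everything shorter is left uncorrected, which is the behavior described above; raising target_coverage (analogous to corOutCoverage) keeps more reads.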
Thank you for the clarifications! I have confirmed my data input is a FASTQ file and the pipeline was run using -pacbio (I didn't copy-paste correctly, my apologies). There was no output generated from the assembly run, with the error "Abort: partitioning failed; increase redMemory".
That error message aside: the categories mean what the names imply. Reads with overlaps are those which have overlaps to other reads, and those w/o overlaps are those that don't. They shouldn't match the input read distribution exactly, as the reads w/o overlaps are typically short or noisy reads and a small subset of your total data, which is why they have lower coverage and shorter length.
Thank you for your comment.
Those correction stats use the estimated corrected read length. A corrected read length counts the bases that have sufficient support from other reads. Since those reads have no overlaps (or, more accurately, insufficient overlaps to be corrected), they have very short corrected-read estimates.
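The idea of "bases with sufficient support" can be sketched as follows. This is an illustrative model only: the function name, the support threshold, and the use of a single longest supported run are assumptions, not Canu's actual estimator.

```python
# Hypothetical illustration of an "estimated corrected read length":
# only bases covered by at least `min_support` overlapping reads count,
# and we take the longest contiguous run of such bases.
def estimated_corrected_length(coverage_per_base, min_support=4):
    """Length of the longest run of bases with sufficient overlap support."""
    best = run = 0
    for depth in coverage_per_base:
        run = run + 1 if depth >= min_support else 0
        best = max(best, run)
    return best

well_supported = [10] * 8000 + [1] * 200 + [9] * 500  # read with good overlaps
orphan = [0] * 9000                                   # read with no overlaps
```

Under this toy model, a read with no overlaps gets an estimated corrected length of 0, which is why the uncorrected reads show up in the histogram with very short estimates.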
Hi Sergey, I'm not fully understanding the answer regarding the histogram. So there are >7k reads that passed the initial filters (one of them being >= 7 kbp, a semi-arbitrary number). Are we misunderstanding something?
They should, but any dataset has some noisy reads and sequencing artifacts. You had 298 reads input; of these, 282 look like the above and are at least 8 kb estimated after correction. The other 16 reads are split up into >7k pieces, where the shortest is 0 bp and the mean is 19 bp. Essentially, there are 16 bad reads in the input. The corrected reads maintain their IDs, so you can pull up the input reads (in the seqStore) that were not corrected and try to align them to a mitochondrial reference. I expect they will not align well, or at all.
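Since corrected reads keep their input IDs, finding the uncorrected ones is a set difference between the two ID lists. A minimal sketch, assuming you have already extracted the headers from the raw and corrected FASTA/FASTQ files (the function name and example IDs are hypothetical):

```python
# Hypothetical helper: because Canu preserves read IDs through correction,
# the reads that failed correction are simply the input IDs that are
# missing from the corrected output.
def uncorrected_ids(input_ids, corrected_ids):
    """Return sorted IDs present in the raw input but absent after correction."""
    return sorted(set(input_ids) - set(corrected_ids))

raw = ["read1", "read2", "read3", "read4"]
corrected = ["read1", "read3"]
missing = uncorrected_ids(raw, corrected)
```

The IDs in `missing` are the reads to dump from the seqStore and align against the mitochondrial reference to check whether they are genuine sequence or artifacts.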
I am trying to assemble human mitochondrial reads (genome length ~16 kb). After running 'canu -correct' on 7519 mitochondrial reads, 7425 reads were uncorrected and only 44 were corrected. This led to 'canu -assemble' failing, since too few reads were passed. I'm surprised that so many reads were filtered/uncorrected and curious to know the reason behind this.
Thank you!
Parameters are the following:
Please see the following for the log: