Invalid Overlap while running canu correction on RNASeq files with minimap2 #30
The display code from
It seems a bit odd that it's |
This does seem like a minimap2 bug. However, I need your help to move further. Could you add the following lines to
if (!(r->rs < r->re && r->re <= mi->seq[r->rid].len)) {
fprintf(stderr, "%s <=> %s\n", t->name, mi->seq[r->rid].name);
abort();
}
This essentially raises an assertion failure and prints out the violating sequence pairs if the coordinates are incorrect. By the way, there is no Thanks! |
Output from canu (from a different failed run):
Edit: More interesting is that this modification of minimap2 allows canu to continue because the offending lines don't appear in the output (even though the output is incomplete...). |
Could you extract the sequence pair "370985" and "192001", or could you send me the two blocks? I have run minimap2 on some pacbio reads, but could not reproduce the issue on my dataset. Without the sequences that trigger the bug, it is quite difficult to fix it. Thanks. |
This is weird. It's not failing when I run with just those two sequences, but it is for the whole block (both indexed and non-indexed). Anyway, here are the two sequences:
The entire indexed block can be found temporarily in my Dropbox code_scratch folder here. This index causes the issue when paired with the above highly-repetitive sequence. |
I have fixed the bug via 95eb1de. Thanks a lot for your example. It would have been almost impossible to fix without a proper test case. By the way, I usually wouldn't recommend performing alignment (option -c) in the overlapping mode. First, it is much slower. Second, it introduces false overlaps. On pacbio C. elegans reads, doing alignment leads to a few more misassemblies. I am closing this issue. If you see the same problem again, feel free to reopen it. |
Looking at the fix, does this mean that all reverse mappings were given the wrong ID? I'm just wondering if I should remap my sequences with the fixed minimap2. |
It is inversion, not reverse. |
I'm getting 'INVALID OVERLAP' messages from canu when running read correction on RNASeq files with minimap2. Here's one of the outputs:
The offending minimap2 line for this overlap step seems to be the following (headers from canu source code, minimap/mmapConvert.C, have been added):
I can't easily work out if this is a minimap2 issue or a canu issue. It seems a bit odd that the start and end of a match region exceed the sequence length (i.e. blen 1596; bgn 4806; end 5049), but that could just be an unexpected file format change between minimap and minimap2.
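For anyone who wants to check an overlap file for this kind of problem before handing it to canu, here is a minimal Python sketch. It assumes the minimap2 output is in the standard 12-column, tab-separated PAF layout (query name, length, start, end, strand, then target name, length, start, end, ...). The function name `invalid_paf_records` is made up for illustration and is not part of minimap2 or canu.

```python
from typing import Iterable, Iterator, Tuple


def invalid_paf_records(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, reason) for PAF records whose coordinates are out of range.

    Assumes the standard PAF column order:
    0 qname, 1 qlen, 2 qstart, 3 qend, 4 strand,
    5 tname, 6 tlen, 7 tstart, 8 tend, 9 matches, 10 block length, 11 mapq.
    """
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            yield lineno, "fewer than 12 PAF columns"
            continue
        try:
            qlen, qstart, qend = (int(x) for x in fields[1:4])
            tlen, tstart, tend = (int(x) for x in fields[6:9])
        except ValueError:
            yield lineno, "non-integer coordinate"
            continue
        # A valid interval satisfies 0 <= start < end <= sequence length.
        if not (0 <= qstart < qend <= qlen):
            yield lineno, f"query {fields[0]}: {qstart}-{qend} outside length {qlen}"
        if not (0 <= tstart < tend <= tlen):
            yield lineno, f"target {fields[5]}: {tstart}-{tend} outside length {tlen}"
```

Passing an open file handle (or any iterable of lines) to `invalid_paf_records` and printing what it yields would flag a record like the one above, where the target interval 4806-5049 lies beyond a target length of 1596.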
Metadata
Canu v1.6 command:
The minimap2 executable was copied into the canu directory:
Running on Debian Linux:
Linux elegans 4.9.0-3-amd64 #1 SMP Debian 4.9.25-1 (2017-05-02) x86_64 GNU/Linux