-
Notifications
You must be signed in to change notification settings - Fork 405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ultra long reads mapping to 1 nt #43
Comments
What htsbox version you are using and what tool you used to sort the bam? For 700kb reads, minimap2 is likely to produce a cigar string with >65535 operators. In this case, the cigar is written to a tag. It seems that the tag has been lost during processing. Long cigar is something I am actively working on to convince others. For now, please use precompiled samtools-1.6 here. It works with long cigar. Let me know if that works. |
Download link to the modified samtools-1.6: http://lh3lh3.users.sourceforge.net/data/cigar-64k/samtools-1.6+CG_x64-linux.tar.bz2 |
Upgrading my rusty apps solved it, thanks! (forgot to mention I used minimap2's -L option) |
For the time being, only the latest htsbox and the samtools at the above link fully work with extra long reads. For older tools, sorting, merging and flagstat all work, but depth/coverage and retrieval by region will silently ignore these extra long reads. I am actively pushing the same solution to all tools such that they all work properly. |
Forgot to mention: adding |
I've mapped some ONT reads using -k15 and discovered that some of the (very long) reads don't align well (see below). minimap2 aligns them to the right place, but with a score of 'inf' and only at one nucleotide (!). Also, the read positions start and stop at 0 (just like when the reads don't map).
Manual inspection (i.e. BLAST) suggests that these reads should map to the chromosomes (albeit at around 82% identity). Other ultra long reads form the same run with similar base calling scores map correctly.
Is this normal/expected behaviour? I tried changing kmer and chaining parameters to no avail.
Sample output:
htsbox samview -p minimap2.srtd.bam | grep inf
9cb8f5d7-ab9c-48ea-88bd-2477e02d7b60 702394 0 0 - chr20 64444167 17799575 17799575 inf 60 mm:125885;ins:0,0;del:0,0;score:718346
eb5d9390-96a6-42ce-ae87-b99e649e6d80 658312 658312 658312 + chr4 190214555 94057298 94057298 inf 60 mm:115078;ins:0,0;del:0,0;score:695835
The text was updated successfully, but these errors were encountered: