pileup outputs incorrect bases for insertions following deletions #59
Should it really be:
Yes, my mistake. s4 is *G, so *+1G would be correct.
If we replace resolve_cigar2() in sam.c with:
the results for the example are:
(and ~35 failures in samtools test/mpileup)
This looks promising, and I think it is definitely along the right lines. It may also need an s->k > 0 check so that s->k-1 doesn't index before the first CIGAR operation. That would prevent problems when a CIGAR string starts with an insertion; admittedly this is odd, but it can be valid in a multiple alignment. I also wonder whether it solves issues with 10M1D1P1I10M, i.e. a pad operator between the D and the I. I haven't checked yet whether this gives incorrect pileup, but it most likely does. Perhaps it therefore just needs a tracked flag, say "last_base_is_del", that is set on D and unset on M, =, X, I, etc. (but not on P). This would also handle reads that start with an insertion. I need to think of more corner cases for testing.
Alas, I see more problems that I didn't spot originally. Look at s5 in isolation:
The CIGAR string shows s5 having an insertion followed by a deletion followed by an insertion. Position 2 should therefore probably read T+3GGG-5AATAA, and position 7 is *+3GGG (correct).
We need another int in the bam_pileup1_t struct (sam.h), say mismatchdellen,
with:
and augment the p->indel condition in pileup_seq() in bam_plcmd.c with:
the results for the example are:
P.S. Corrections to the previous proposal:
Fixes #496. Also fixed the tests for c1#pad1.bam and c1#pad2.bam; they should have been known failures instead of known passes (see issues samtools/htslib#57 and samtools/htslib#59).
It may be annoying to manually search binary expansions of decimal integers to get the corresponding flags. Hat tip @karel-brinda. Add footnote to SAM example noting the flag meanings of each bit in the example, as this is the first time the reader sees the bitwise FLAGs. Add footnote link to Wikipedia discussion of bitwise manipulation. Add decimal bit values to flag table: non-programmers can construct bit masks by adding powers of two together, but they don't know hexadecimal. Fix 0x20 description: the mate is reverse-COMPLEMENTED. Add hex equivalents to multi-segment annotation example, and reformat as a table for space and to emphasize the interesting parts. Fix typo in GenBank entry. Closes samtools#59.
More dumb examples, this time with "N".
Reported as:
Some oddities. We can even dump out chunks of internal memory from before the start of a read, e.g. with a single read with CIGAR 50000N1M. I have a fix for this and the above problems, which is being tested now.
Fixed by samtools/samtools#847
I think this is an HTSlib issue rather than a samtools issue. It was initially reported in 2010, but has only been partially fixed:
https://sourceforge.net/mailarchive/forum.php?thread_name=424A790A-3AC0-4FC2-8981-5E3F6671C0DD%40sanger.ac.uk&forum_name=samtools-devel
For example, the following SAM file:
displays the following for position 5:
It should be: