-
Notifications
You must be signed in to change notification settings - Fork 448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
P CIGAR operator causes mpileup to stutter #57
Comments
Ugh I give up. uuencode broke too. Much hate for web 2.0! I've no idea how to attach raw data here so just email me if you need it, but this happens to already be a file in the test suite so no need for this time. |
Putting a line containing ``` before and after puts a paragraph into |
We may wish to see if more recent |
It still fails. Mpileup and tview demonstrate the problem. Tview shows this:
That penultimate sequence was s5 with cigar 4M1D2P2I2P1D4M and seq AACCTAGGTT. The insertion TA has become AG instead. This also demonstrates that the P operator still isn't supported by samtools either (so the example in the original SAM paper is still not displayed correctly). The last sequence is also displayed incorrectly. The insertion is GTTAAC not TTAACT. Basically it's off by one if the previous base was a deletion as, I think, it's incremented the sequence offset too early. |
Fixes #496. Also fixed the tests for c1#pad1.bam and c1#pad2.bam; they should have been known failures instead of know passes (see issues samtools/htslib#57 and samtools/htslib#59).
When given a BAM file using the P operator, mpileup can sometimes skip bases and duplicate others. Additionally there is no way to actually use the P operator properly, to know which bases align with which in an insertion region, but that is a different fault and a fundamental flaw in the output format.
An example:
@sq SN:c1 LN:10
s0a 0 c1 1 0 10M * 0 0 AACCGCGGTT *
s0b 0 c1 1 0 10M * 0 0 AACCGCGGTT *
s0c 0 c1 1 0 10M * 0 0 AACCGCGGTT *
s1 0 c1 1 0 5M6I5M * 0 0 AACCGGTTAACCGGTT *
s2 0 c1 1 0 5M1P4I1P5M * 0 0 AACCGTTAACGGTT *
s3 0 c1 1 0 5M3I3P5M * 0 0 AACCGGTTCGGTT *
s4 0 c1 1 0 5M3P3I5M * 0 0 AACCGAACCGGTT *
s5 0 c1 1 0 4M1D2P2I2P1D4M * 0 0 AACCTAGGTT *
s6 0 c1 1 0 2M3D6I3D2M * 0 0 AAGTTAACTT *
jkb@seq3a[work/samtools...] ./samtools mpileup c1#pad1.bam
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
c1 1 N 9 ^!A^!A^!A^!A^!A^!A^!A^!A^!A ~~~~~~~~~
c1 2 N 9 AAAAAAAAA-3NNN ~~~~~~~~~
c1 3 N 9 CCCCCCCC* ~~~~~~~~~
c1 4 N 9 CCCCCCCC-1N* ~~~~~~~~~
c1 5 N 9 GGGG+6GTTAACG+4TTAAG+3GTTG+3AAC_+2AG_+6TTAACT ~~~~~~~~~
c1 6 N 9 CCCCCCC** ~~~~~~~~~
c1 7 N 9 GGGGGGGG* ~~~~~~~~~
c1 8 N 9 GGGGGGGG* ~~~~~~~~~
c1 9 N 9 TTTTTTTTT ~~~~~~~~~
c1 10 N 9 T$T$T$T$T$T$T$T$T$ ~~~~~~~~~
Note that position 5 should end +2TA+6GTTAAC
Uuencoded SAM file to demonstrate the problem. (It's also c1#pad1.sam)
The text was updated successfully, but these errors were encountered: