Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

P CIGAR operator causes mpileup to stutter #57

Open
jkbonfield opened this issue Feb 6, 2014 · 4 comments
Open

P CIGAR operator causes mpileup to stutter #57

jkbonfield opened this issue Feb 6, 2014 · 4 comments

Comments

@jkbonfield
Copy link
Contributor

When given a BAM file using the P operator, mpileup can sometimes skip bases and duplicate others. Additionally there is no way to actually use the P operator properly, to know which bases align with which in an insertion region, but that is a different fault and a fundamental flaw in the output format.

An example:

@sq SN:c1 LN:10
s0a 0 c1 1 0 10M * 0 0 AACCGCGGTT *
s0b 0 c1 1 0 10M * 0 0 AACCGCGGTT *
s0c 0 c1 1 0 10M * 0 0 AACCGCGGTT *
s1 0 c1 1 0 5M6I5M * 0 0 AACCGGTTAACCGGTT *
s2 0 c1 1 0 5M1P4I1P5M * 0 0 AACCGTTAACGGTT *
s3 0 c1 1 0 5M3I3P5M * 0 0 AACCGGTTCGGTT *
s4 0 c1 1 0 5M3P3I5M * 0 0 AACCGAACCGGTT *
s5 0 c1 1 0 4M1D2P2I2P1D4M * 0 0 AACCTAGGTT *
s6 0 c1 1 0 2M3D6I3D2M * 0 0 AAGTTAACTT *

jkb@seq3a[work/samtools...] ./samtools mpileup c1#pad1.bam
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
c1 1 N 9 ^!A^!A^!A^!A^!A^!A^!A^!A^!A ~~~~~~~~~
c1 2 N 9 AAAAAAAAA-3NNN ~~~~~~~~~
c1 3 N 9 CCCCCCCC* ~~~~~~~~~
c1 4 N 9 CCCCCCCC-1N* ~~~~~~~~~
c1 5 N 9 GGGG+6GTTAACG+4TTAAG+3GTTG+3AAC_+2AG_+6TTAACT ~~~~~~~~~
c1 6 N 9 CCCCCCC** ~~~~~~~~~
c1 7 N 9 GGGGGGGG* ~~~~~~~~~
c1 8 N 9 GGGGGGGG* ~~~~~~~~~
c1 9 N 9 TTTTTTTTT ~~~~~~~~~
c1 10 N 9 T$T$T$T$T$T$T$T$T$ ~~~~~~~~~

Note that position 5 should end +2TA+6GTTAAC

Uuencoded SAM file to demonstrate the problem. (It's also c1#pad1.sam)

begin 644 -
M0%-1"5-..F,Q"4Q..C$P"G,P80DP"6,Q"3$),`DQ,$T)*@DP"3`)04%#0T=#
M1T=45`DJ"G,P8@DP"6,Q"3$),`DQ,$T)*@DP"3`)04%#0T=#1T=45`DJ"G,P
M8PDP"6,Q"3$),`DQ,$T)*@DP"3`)04%#0T=#1T=45`DJ"G,Q"3`)8S$),0DP
M"35--DDU30DJ"3`),`E!04-#1T=45$%!0T-'1U14"2H*<S(),`EC,0DQ"3`)
M-4TQ4#1),5`U30DJ"3`),`E!04-#1U1404%#1T=45`DJ"G,S"3`)8S$),0DP
M"35-,TDS4#5-"2H),`DP"4%!0T-'1U140T='5%0)*@IS-`DP"6,Q"3$),`DU
M33-0,TDU30DJ"3`),`E!04-#1T%!0T-'1U14"2H*<S4),`EC,0DQ"3`)-$TQ
M1#)0,DDR4#%$-$T)*@DP"3`)04%#0U1!1T=45`DJ"G,V"3`)8S$),0DP"3)-
<,T0V23-$,DT)*@DP"3`)04%'5%1!04-45`DJ"@``
`
end
@jkbonfield
Copy link
Contributor Author

Ugh I give up. uuencode broke too. Much hate for web 2.0! I've no idea how to attach raw data here so just email me if you need it, but this happens to already be a file in the test suite so no need for this time.

@jmarshall
Copy link
Member

Putting a line containing ``` before and after puts a paragraph into <pre> mode. But with this file already in the repository, that's even better...

@jmarshall
Copy link
Member

We may wish to see if more recent P-related PRs have an effect on this…

@jkbonfield
Copy link
Contributor Author

It still fails. Mpileup and tview demonstrate the problem. Tview shows this:

1               11        21 
NNNNN******NNNNNNNNNNNNN
AACCG      CGGTT
AACCG******CGGTT
AACCG******CGGTT
AACCG******CGGTT
AACCGGTTAACCGGTT
AACCGTTAA**CGGTT
AACCGGTT***CGGTT
AACCGAAC***CGGTT
AACC*AG*****GGTT
AA***TTAACT***TT

That penultimate sequence was s5 with cigar 4M1D2P2I2P1D4M and seq AACCTAGGTT. The insertion TA has become AG instead. This also demonstrates that the P operator still isn't supported by samtools either (so the example in the original SAM paper is still not displayed correctly).

The last sequence is also displayed incorrectly. The insertion is GTTAAC not TTAACT. Basically it's off by one if the previous base was a deletion as, I think, it's incremented the sequence offset too early.

jmarshall pushed a commit to samtools/samtools that referenced this issue May 17, 2016
Fixes #496.  Also fixed the tests for c1#pad1.bam and c1#pad2.bam;
they should have been known failures instead of know passes (see
issues samtools/htslib#57 and samtools/htslib#59).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants