This repository has been archived by the owner on Mar 19, 2019. It is now read-only.

MC:Z: tag #24

Closed
drmjc opened this issue Jul 5, 2016 · 4 comments

Comments

drmjc commented Jul 5, 2016

Hi,
I've got a situation where bwa mem | bamsormadup can write reads with an empty "MC:Z:" tag. The resulting BAM file passes biobambam2 bamvalidate, but fails Picard ValidateSamFile, and thus likely GATK tools as well. Reading the SAM spec, it seems that an empty "MC:Z:" is invalid, so can you please clarify? I'm using the latest version, 2.0.49. Here's some output:

Picard ValidateSamFile output:

ERROR: Record 551, Read name ST-E00118:53:H02GVALXX:1:1113:3172:2135, Mate CIGAR String (MC Attribute) present for a read whose mate is unmapped
ERROR: Record 552, Read name ST-E00118:53:H02GVALXX:1:1113:3172:2135, Mate CIGAR String (MC Attribute) present for a read whose mate is unmapped
ERROR: Record 551, Read name ST-E00118:53:H02GVALXX:1:1113:3172:2135, Mate CIGAR string does not match CIGAR string of mate
ERROR: Record 552, Read name ST-E00118:53:H02GVALXX:1:1113:3172:2135, Mate CIGAR string does not match CIGAR string of mate
ERROR: Record 553, Read name ST-E00118:53:H02GVALXX:1:1206:29541:52380, Mate CIGAR String (MC Attribute) present for a read whose mate is unmapped
ERROR: Record 554, Read name ST-E00118:53:H02GVALXX:1:1206:29541:52380, Mate CIGAR String (MC Attribute) present for a read whose mate is unmapped
ERROR: Record 553, Read name ST-E00118:53:H02GVALXX:1:1206:29541:52380, Mate CIGAR string does not match CIGAR string of mate
ERROR: Record 554, Read name ST-E00118:53:H02GVALXX:1:1206:29541:52380, Mate CIGAR string does not match CIGAR string of mate

bamvalidate output

$ cat NA12878.SPRR.R1.bam | bamvalidate 
NULL

the bad reads

after bwa mem | bamsormadup

$ samtools view NA12878.SPRR.R1.bam | grep "\tMC:Z:$"
ST-E00118:53:H02GVALXX:1:1113:3172:2135 77  *   0   0   *   *   0   0   TGGTGTCCGTGCCCGGTTTCCTTTAGGCTCAACTGTTGTTAGAGTGATGTTTTCGGAGGGGGAGCAGCGGTGGAAGCAGGAGTGGCTACGATAGAGGGATGAGGGGAAGGGAGTGAAGGAGGTTTGTGAGCAAGTAAGTGNNNNNTGTTAN ><=>??-9-<<+--5<===-4><=5==+>--,*7<?=(+36933+/,5==+9=0#(23'(4(,-*-4+*8*).,+6+.6)-89=)+76*#5*+59>>>-+)-7)):--1)(62<@>-;).7*,46?@*.8-/06-,.+::#####<=:-/#AS:i:0   XS:i:0  RG:Z:NA12878.SPRR   ms:i:1872   mc:i:0  MC:Z:
ST-E00118:53:H02GVALXX:1:1113:3172:2135 141 *   0   0   *   *   0   0   TGGTGTCCGTGCCCGGTTTCCTTTAGGCTCAACTGTTGTTAGAGTGATGTTTTCGGAGGGGGAGCAGCGGTGGAAGCAGGAGTGGCTACGATAGAGGGATGAGGGGAAGGGAGTGAAGGAGGTTTGTGAGCAAGTAAGTGNNNNNTGTTAN ><=>??-9-<<+--5<===-4><=5==+>--,*7<?=(+36933+/,5==+9=0#(23'(4(,-*-4+*8*).,+6+.6)-89=)+76*#5*+59>>>-+)-7)):--1)(62<@>-;).7*,46?@*.8-/06-,.+::#####<=:-/#AS:i:0   XS:i:0  RG:Z:NA12878.SPRR   ms:i:1872   mc:i:0  MC:Z:
ST-E00118:53:H02GVALXX:1:1206:29541:52380   77  *   0   0   *   *   0   0   CTTTGAACATCCTCCTGACATCCGTTGGCTCCACTCATCTACTTCGCTGGCCCGCGCGCTTCCCAGGTCTTTGTCCGGGGCTCGAGCCACTCTCCTGTCGCCACCTACCACTTGCCTTCTCCTCCCAGCGTTATNNNNNNNNNCNNCNGNG >>.>+-;>;<??8?>.),8-5=>59:')?.?<,*--,*>,?+?4,#@<?(>:>":#)#=-*=9<48(,>4):?++-$*))+.-$5)55,=.:.@>=)<-6A56>-.++-->:36A??,9/-:.7,@-*,&.-..#########+##.#5#- AS:i:0  XS:i:0  RG:Z:NA12878.SPRR   ms:i:1807   mc:i:0  MC:Z:
ST-E00118:53:H02GVALXX:1:1206:29541:52380   141 *   0   0   *   *   0   0   CTTTGAACATCCTCCTGACATCCGTTGGCTCCACTCATCTACTTCGCTGGCCCGCGCGCTTCCCAGGTCTTTGTCCGGGGCTCGAGCCACTCTCCTGTCGCCACCTACCACTTGCCTTCTCCTCCCAGCGTTATNNNNNNNNNCNNCNGNG >>.>+-;>;<??8?>.),8-5=>59:')?.?<,*--,*>,?+?4,#@<?(>:>":#)#=-*=9<48(,>4):?++-$*))+.-$5)55,=.:.@>=)<-6A56>-.++-->:36A??,9/-:.7,@-*,&.-..#########+##.#5#- AS:i:0  XS:i:0  RG:Z:NA12878.SPRR   ms:i:1807   mc:i:0  MC:Z:

cheers,
Mark
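
A minimal sketch, in Python, of a check for two of the conditions reported above: an "MC:Z:" tag with no value, and an MC tag on a record whose FLAG has the mate-unmapped bit (0x8) set. The function name `mc_problems` and the message strings are illustrative assumptions, not part of biobambam2, Picard, or samtools. It assumes tab-separated SAM text, one record per line, as printed by `samtools view`.

```python
MATE_UNMAPPED = 0x8  # SAM FLAG bit: the mate (next segment) is unmapped


def mc_problems(sam_line):
    """Return a list of MC-related problems found in one SAM text line."""
    if sam_line.startswith("@"):
        return []  # header line, not an alignment record
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return []  # fewer than the 11 mandatory SAM columns
    flag = int(fields[1])
    problems = []
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("MC:Z:"):
            if tag == "MC:Z:":
                problems.append("empty MC value")
            if flag & MATE_UNMAPPED:
                problems.append("MC present but mate is unmapped")
    return problems


# Hypothetical record modelled on the reads above (flag 77, empty MC tag).
sample = "read1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\t####\tAS:i:0\tMC:Z:\n"
print(mc_problems(sample))
```

Splitting on tabs rather than pattern-matching the whole line keeps the check independent of where the MC tag sits among the optional fields.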

Owner

gt1 commented Jul 5, 2016

Hi,

yes, these empty MC aux fields should not be there. Could you retry with the latest release?

Best,
German

Author

drmjc commented Jul 6, 2016

perfect, thanks German. This is passing picard ValidateSamFile now.
cheers, Mark

drmjc closed this as completed Jul 6, 2016
@keiranmraine

Hi German,

Is this something that affects all previous releases? This has just reared its head in our system as users manipulate files. Our core sequencing pipeline looks to be using v2.0.37.

Regards,
Keiran

Owner

gt1 commented Jul 12, 2016

Hi Keiran,

the code for inserting MC tags in bamsormadup was added in October 2015; I think it was in version 2.0.21 (libmaus2 version 2.0.94). bamsort still copies empty CIGAR data to mates; I will fix this by the end of this week.

Best,
German
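
The rule implied by this thread can be sketched as follows: write an MC tag only when the mate is mapped and actually has a CIGAR string. This is a hedged illustration in Python, not the bamsort or bamsormadup code; the function name `mc_tag_for_mate` and its arguments are hypothetical.

```python
UNMAPPED = 0x4  # SAM FLAG bit: this segment is unmapped


def mc_tag_for_mate(mate_flag, mate_cigar):
    """Return the MC tag to attach to a read, or None if none should be written.

    mate_flag and mate_cigar are the FLAG and CIGAR fields of the read's mate.
    """
    if mate_flag & UNMAPPED:
        return None  # mate is unmapped, so there is no mate CIGAR to record
    if mate_cigar in ("", "*"):
        return None  # CIGAR unavailable; writing it would give an empty "MC:Z:"
    return "MC:Z:" + mate_cigar


# Flag 141 (unmapped mate, as in the reads above) yields no tag;
# a mapped mate with CIGAR 151M yields a populated one.
print(mc_tag_for_mate(141, "*"))
print(mc_tag_for_mate(147, "151M"))
```

Returning None instead of an empty string makes the "no tag" case explicit, so a caller cannot accidentally append a bare "MC:Z:".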
