MD tags not processed unless within first four flags #782
I'm trying to use JBrowse to examine BAM files that were aligned with IonTorrent's TMAP aligner. INDEL positions show up, but SNP positions are not marked when I use the Alignments2 or SNPCoverage display methods.
To investigate further, I re-mapped the same reads with bwa-mem and loaded the resulting BAM into JBrowse. With that file, the SNPs are highlighted along with the INDELs. A previous issue with similar symptoms (#573) traced the problem to a missing MD tag in the BAM file. However, I checked both the TMAP and bwa-mem files, and both have MD tags, with nearly identical values for the handful of reads I examined. Here are the SAM lines for two reads from both the TMAP-derived BAM and the bwa-mem version. The reference is chromosome 1 of hg19, for anyone who wants to test them.
For the two reads above in the TMAP-generated BAM, I moved the MD tag to immediately after the quality string. JBrowse then parses the tag and shows the SNPs correctly. If I move the MD tag to the 5th tag position in the SAM line, it is no longer parsed and the SNPs aren't displayed.
I think the main issue is connected to #568.
Basically, the parser seems to stop reading any further BAM tags after it emits the message "Unknown BAM tag type 'B', tags may be incomplete". That is why moving MD earlier in the tag list fixed it.
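The following minimal sketch illustrates why the position of MD matters. It uses only Node's built-in `Buffer` readers to implement a BAM aux-tag loop that gives up at the first value type it doesn't recognize. `parseTagsStopOnUnknown`, `readScalar` and `int32` are hypothetical names for illustration, not JBrowse's actual code. The sketch assumes the real parser breaks out of its loop in a similar way, and the tag values are made up.

```javascript
// Minimal sketch, NOT JBrowse's actual code: walk the auxiliary-tag section
// of a BAM record and give up at the first value type it does not recognize.
const SCALAR_SIZES = { A: 1, c: 1, C: 1, s: 2, S: 2, i: 4, I: 4, f: 4 };

function readScalar(buf, p, type) {
  switch (type) {
    case "A": return String.fromCharCode(buf[p]);
    case "c": return buf.readInt8(p);
    case "C": return buf.readUInt8(p);
    case "s": return buf.readInt16LE(p);
    case "S": return buf.readUInt16LE(p);
    case "i": return buf.readInt32LE(p);
    case "I": return buf.readUInt32LE(p);
    case "f": return buf.readFloatLE(p);
  }
}

function parseTagsStopOnUnknown(buf) {
  const tags = {};
  let p = 0;
  while (p + 3 <= buf.length) {
    const tag = buf.toString("ascii", p, p + 2); // two-character tag name
    const type = String.fromCharCode(buf[p + 2]); // one-character value type
    p += 3;
    if (SCALAR_SIZES[type] !== undefined) {
      tags[tag] = readScalar(buf, p, type);
      p += SCALAR_SIZES[type];
    } else if (type === "Z" || type === "H") {
      const end = buf.indexOf(0, p); // NUL-terminated string
      tags[tag] = buf.toString("ascii", p, end);
      p = end + 1;
    } else {
      // The length of an unknown value is not known, so the start of the
      // next tag can't be found: every tag after this one (e.g. MD) is lost.
      console.warn(`Unknown BAM tag type '${type}', tags may be incomplete`);
      break;
    }
  }
  return tags;
}

// Hypothetical helper: little-endian int32 bytes for the example records.
function int32(n) {
  const b = Buffer.alloc(4);
  b.writeInt32LE(n, 0);
  return b;
}

// Made-up tags: ZC:B:i,7,9 (binary form) and MD:Z:10A5
const zcArray = Buffer.concat([Buffer.from("ZCBi", "ascii"), int32(2), int32(7), int32(9)]);
const md = Buffer.from("MDZ10A5\0", "ascii");

// MD after the B-typed ZC tag: never reached.
console.log(parseTagsStopOnUnknown(Buffer.concat([Buffer.from("NMi", "ascii"), int32(1), zcArray, md])));
// -> { NM: 1 }

// MD before the ZC tag: parsed.
console.log(parseTagsStopOnUnknown(Buffer.concat([md, zcArray])));
// -> { MD: '10A5' }
```

This matches what was seen above: moving MD ahead of the array-typed tag is enough for it to be read.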
Commenting out this line seems to fix the issue.
Alternatively, parsing the B tag correctly, as in #568, would help.
This is definitely a parsing issue, and it does appear to be related to #568. The ZC tag appears to cause the problem. Its value type is B, which indicates an integer or numeric array, and that is followed by an i (the element type) and then the values. The parser seems to hit the B and choke instead of skipping the tag and moving on.
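To skip a B tag safely, the parser needs its length, and the BAM layout makes that computable. After the `B` type byte come a one-byte element subtype (one of `cCsSiIf`), a 32-bit little-endian element count, and then the elements. Below is a sketch of that calculation using Node's `Buffer`. `bArrayByteLength` is a hypothetical helper name, and the example bytes are made up.

```javascript
// Sketch (hypothetical helper): how many bytes a BAM 'B' (array) value
// occupies, so a parser can step over it and keep reading later tags.
// Layout after the 'B' type byte:
//   1 byte   element subtype, one of c C s S i I f
//   4 bytes  element count (little-endian 32-bit)
//   count * sizeof(subtype) bytes of element values
const ELEMENT_SIZES = { c: 1, C: 1, s: 2, S: 2, i: 4, I: 4, f: 4 };

function bArrayByteLength(buf, p) {
  const subtype = String.fromCharCode(buf[p]);
  const size = ELEMENT_SIZES[subtype];
  if (size === undefined) {
    throw new Error(`Invalid 'B' array subtype '${subtype}'`);
  }
  const count = buf.readUInt32LE(p + 1);
  return 1 + 4 + count * size;
}

// Value of a made-up ZC:B:i,7,8,9 tag, followed by a MD:Z:10A5 tag.
const field = Buffer.alloc(1 + 4 + 3 * 4);
field[0] = "i".charCodeAt(0);
field.writeUInt32LE(3, 1);
[7, 8, 9].forEach((v, k) => field.writeInt32LE(v, 5 + 4 * k));
const rest = Buffer.concat([field, Buffer.from("MDZ10A5\0", "ascii")]);

const skip = bArrayByteLength(rest, 0);
console.log(skip); // 17
console.log(rest.toString("ascii", skip, skip + 2)); // MD (next tag lines up)
```

If the offset is advanced by less than this (for example, by ignoring the tag without consuming the array), the array bytes get interpreted as tag names and types on the following iterations.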
Yeah, it seems to deliberately stop when it sees a tag type it can't parse. Simply ignoring the tag as I illustrated doesn't seem like a good idea. The array's bytes are never consumed, so the next iterations read junk. For the sake of actually parsing the 'B' field, here is an example that handles the integer-array case of the B tag.
I imagine the 'numeric' (float) case would be similar, but with type == 'f' and readFloat instead of readInt.
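The sketch below decodes every `B` subtype, covering both the integer case and the float case via `readFloatLE`. `readBArray` is a hypothetical name. This is not the example from this thread or JBrowse's implementation, just one way to do it with Node's `Buffer` readers. The function returns the offset just past the array, so a tag loop can continue on to MD afterwards.

```javascript
// Sketch (hypothetical helper, not JBrowse's code): decode the value of a
// BAM 'B' array tag. `p` points at the subtype byte that follows 'B'.
// Returns the decoded values and the offset where the next tag begins.
function readBArray(buf, p) {
  const subtype = String.fromCharCode(buf[p]);
  const count = buf.readUInt32LE(p + 1);
  // Each subtype maps to a Buffer reader and an element width in bytes.
  const readers = {
    c: [(o) => buf.readInt8(o), 1],
    C: [(o) => buf.readUInt8(o), 1],
    s: [(o) => buf.readInt16LE(o), 2],
    S: [(o) => buf.readUInt16LE(o), 2],
    i: [(o) => buf.readInt32LE(o), 4],
    I: [(o) => buf.readUInt32LE(o), 4],
    f: [(o) => buf.readFloatLE(o), 4], // the "numeric" (float) case
  };
  const entry = readers[subtype];
  if (!entry) {
    throw new Error(`Invalid 'B' array subtype '${subtype}'`);
  }
  const [read, size] = entry;
  const values = [];
  let q = p + 5; // skip subtype byte and 4-byte count
  for (let k = 0; k < count; k++, q += size) {
    values.push(read(q));
  }
  return { values, next: q };
}

// Integer case, like a ZC:B:i,... tag: subtype 'i', count 2, values 7 and -3.
const ints = Buffer.alloc(5 + 2 * 4);
ints[0] = "i".charCodeAt(0);
ints.writeUInt32LE(2, 1);
ints.writeInt32LE(7, 5);
ints.writeInt32LE(-3, 9);
console.log(readBArray(ints, 0)); // { values: [ 7, -3 ], next: 13 }

// Float case: subtype 'f', count 1, value 0.5 (exactly representable).
const floats = Buffer.alloc(5 + 4);
floats[0] = "f".charCodeAt(0);
floats.writeUInt32LE(1, 1);
floats.writeFloatLE(0.5, 5);
console.log(readBArray(floats, 0)); // { values: [ 0.5 ], next: 9 }
```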
"The letter can be one of ‘cCsSiIf’, corresponding to int8_t (signed 8-bit integer),