MD tags not processed unless within first four flags #782

Closed
mcnelsonphd opened this Issue Jul 7, 2016 · 5 comments

Comments

Projects
None yet
2 participants
@mcnelsonphd

I'm trying to use JBrowse to examine BAM files that were aligned with IonTorrent's TMAP aligner, and while INDEL positions show up I'm not seeing SNP positions noted when using the Alignments2 or SNPCoverage display methods.

To investigate this further, I took the reads and re-mapped them with bwa-mem and loaded the resulting BAM into JBrowse, and I do see the SNPs highlighted along with the INDELs. Previous issue #573 that had similar problems pointed to the lack of an MD tag in the BAM file, but I checked both the TMAP and bwa-mem generated files and they both have MD tags with nearly identical values for the handful of reads that I examined. Here are the lines for two reads from both the TMAP derived BAM and the bwa-mem version. The reference is chromosome 1 of hg19 for those that want to test them.

TMAP Alignment
ENG7L:02601:06989   0   chr1    981051  60  44M1I10M2I32M1I11M1I38M2D20M1D31M18S    *   0   0   GGAGGAGGTGGGGAAGCCCGTCCAGGTGCGGACATCGCGTTCCTCCCCCGATTTTTCCCCCAAAGCCTGCAGCTGTGATCCCCAAGGCGCCCGTGCGGGATGGACTGTGAGCAGATGACGGGGCTGTGCTCGTGTACGCCGGGTGGCTGGACCCAAGTGTGGCAGTGTCCAGACGGCCGTGCCCTGGGCCCGCGGCTGTGAGCTGGTGG   ;658/5565777)/27991://,/85;89:8=;798;;78076:9;;;0;<<<==1==>>+88+7:4::<;;<><<<<9899*84:0////)//.,,,(,//)/6666::6333766333::2<<;;;<;8777/////)--)-9)//730//)/27;;995:::555848:855056::555/58=3>>4:88/55555558773//)   ZA:i:209    ZG:i:364    ZB:i:30 ZC:B:i,364,359,2,0  ZF:i:25 RG:Z:ENG7L.IonXpress_071    MD:Z:36A94A3^CG20^G31   NM:i:10 AS:i:132    XM:i:189    XA:Z:map4-1 XS:i:-2147483647
ENG7L:08673:09086   16  chr1    981095  91  148M1I17M   *   0   0   CCCCGATTTTCCCCAAAGCCTGCAGCTGTGATCCCCAAGGCGCCGTGCGGGATGACTGTGAGCAGATGACGGGGCTGTGCTCGTGTAAGCCCGGGGTGGCTGGACCCAAGTGTGGGCAGTGTCCAGACGGCCGTGCCCTGGGCCCCGCGGGGCTGTGAAGCTGGTG  &333360::3'333(663.336--...57555.88527089:6877;727787959:=9998<<::::=<2<<;;;;;;<<<<;883<=5=;0;;;;6;;;3882=6076<<<7<;;;<<<<9==>=>9>9<;93.4443:9088:65)77779:;;7<;;?7:::  ZA:i:214    ZG:i:364    ZB:i:30 ZC:B:i,364,359,1,0  ZF:i:28 RG:Z:ENG7L.IonXpress_071    MD:Z:165    NM:i:1  AS:i:158    XM:i:165    XA:Z:map4-1 XS:i:-2147483647

bwa-mem Alignment
ENG7L:02601:06989   0   chr1    981051  60  44M1I10M2I32M1I11M1I38M2D20M1D31M18S    *   0   0   GGAGGAGGTGGGGAAGCCCGTCCAGGTGCGGACATCGCGTTCCTCCCCCGATTTTTCCCCCAAAGCCTGCAGCTGTGATCCCCAAGGCGCCCGTGCGGGATGGACTGTGAGCAGATGACGGGGCTGTGCTCGTGTACGCCGGGTGGCTGGACCCAAGTGTGGCAGTGTCCAGACGGCCGTGCCCTGGGCCCGCGGCTGTGAGCTGGTGG   ;658/5565777)/27991://,/85;89:8=;798;;78076:9;;;0;<<<==1==>>+88+7:4::<;;<><<<<9899*84:0////)//.,,,(,//)/6666::6333766333::2<<;;;<;8777/////)--)-9)//730//)/27;;995:::555848:855056::555/58=3>>4:88/55555558773//)   NM:i:10 MD:Z:36A94A3^CG20^G31   AS:i:132    XS:i:20
ENG7L:08673:09086   16  chr1    981095  60  148M1I17M   *   0   0   CCCCGATTTTCCCCAAAGCCTGCAGCTGTGATCCCCAAGGCGCCGTGCGGGATGACTGTGAGCAGATGACGGGGCTGTGCTCGTGTAAGCCCGGGGTGGCTGGACCCAAGTGTGGGCAGTGTCCAGACGGCCGTGCCCTGGGCCCCGCGGGGCTGTGAAGCTGGTG  &333360::3'333(663.336--...57555.88527089:6877;727787959:=9998<<::::=<2<<;;;;;;<<<<;883<=5=;0;;;;6;;;3882=6076<<<7<;;;<<<<9==>=>9>9<;93.4443:9088:65)77779:;;7<;;?7:::  NM:i:1  MD:Z:165    AS:i:158    XS:i:0

I changed the position of the MD tag for the TMAP generated BAM for the two reads above to be immediately after the quality string and JBrowse does then parse the tag and show SNPs correctly. If I move the MD tag to be the 5th tag of the SAM string, then it's no longer parsed and SNPs aren't displayed.

I'm not experienced at all with javascript so I can't really figure out where this problem is originating from other than the BAM parser isn't checking all tags to see if it finds the MD tag.

@cmdcolin

This comment has been minimized.

Show comment
Hide comment
@cmdcolin

cmdcolin Jul 7, 2016

Contributor

I think the main issue is connected to this #568
In this the IonTorrent is also used, but the specific issue about MD tags that you raise is interesting

Basically it seems to stop parsing other BAM tags after it emits this message "Unknown BAM tag type 'B', tags may be incomplete", hence why you saw that putting it earlier in the list fixed it

It seems like commenting out this line can fix the issue

diff --git a/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js b/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
index 772736c..6220a91 100644
--- a/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
+++ b/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
@@ -334,7 +334,7 @@ var Feature = Util.fastDeclare(
                               +"', tags may be incomplete"
                             );
                 value = undefined;
-                p = blockEnd; // stop parsing tags
+                //p = blockEnd; // stop parsing tags
             }

Alternatively, parsing the B tag correctly would help as in #568

Contributor

cmdcolin commented Jul 7, 2016

I think the main issue is connected to this #568
In this the IonTorrent is also used, but the specific issue about MD tags that you raise is interesting

Basically it seems to stop parsing other BAM tags after it emits this message "Unknown BAM tag type 'B', tags may be incomplete", hence why you saw that putting it earlier in the list fixed it

It seems like commenting out this line can fix the issue

diff --git a/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js b/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
index 772736c..6220a91 100644
--- a/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
+++ b/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
@@ -334,7 +334,7 @@ var Feature = Util.fastDeclare(
                               +"', tags may be incomplete"
                             );
                 value = undefined;
-                p = blockEnd; // stop parsing tags
+                //p = blockEnd; // stop parsing tags
             }

Alternatively, parsing the B tag correctly would help as in #568

@mcnelsonphd

This comment has been minimized.

Show comment
Hide comment
@mcnelsonphd

mcnelsonphd Jul 7, 2016

This is definitely a parsing issue, and does appear to be related to #568. What appears to be occurring is that the ZC tag is what is causing the issue as it has a value flag of B, which indicates that the string is an integer or numeric array, but this is then followed by an i and then the value string. It looks like the parser is reading the B and choking instead of disregarding the tag and moving on.

This is definitely a parsing issue, and does appear to be related to #568. What appears to be occurring is that the ZC tag is what is causing the issue as it has a value flag of B, which indicates that the string is an integer or numeric array, but this is then followed by an i and then the value string. It looks like the parser is reading the B and choking instead of disregarding the tag and moving on.

@cmdcolin

This comment has been minimized.

Show comment
Hide comment
@cmdcolin

cmdcolin Jul 7, 2016

Contributor

Ya, it seems to intentionally choke when it doesn't see a tag it can parse. Simply ignoring it as I illustrated doesn't seem like a good idea because then it gets some junk in the next iterations. Just for the sake of actually parsing the 'B' field, here is an example that handles the integer array case of the B tag.

diff --git a/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js b/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
index 772736c..a88c5ee 100644
--- a/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
+++ b/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
@@ -329,6 +329,21 @@ var Feature = Util.fastDeclare(
                     }
                 }
                 break;
+            case 'b':
+                value = '';
+                var cc = byteArray[p++];
+                var type = String.fromCharCode(cc);
+                if( type == 'i' ) {
+                    var limit = readInt( byteArray, p )
+                    p += 4;
+                    for( var k = 0; k < limit; k++ ) {
+                        value += readInt( byteArray, p );
+                        if( k+1 < limit ) value += ',';
+                        p += 4;
+                    }
+                }
+                break;

I imagine the 'numeric' case would be similar, but with type == 'f' and readFloat instead of readInt or something.

"The letter can be one of ‘cCsSiIf’, corresponding to int8_t (signed 8-bit integer),
uint8_t (unsigned 8-bit integer), int16_t, uint16_t, int32_t, uint32_t and float, respectively."

Contributor

cmdcolin commented Jul 7, 2016

Ya, it seems to intentionally choke when it doesn't see a tag it can parse. Simply ignoring it as I illustrated doesn't seem like a good idea because then it gets some junk in the next iterations. Just for the sake of actually parsing the 'B' field, here is an example that handles the integer array case of the B tag.

diff --git a/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js b/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
index 772736c..a88c5ee 100644
--- a/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
+++ b/src/JBrowse/Store/SeqFeature/BAM/LazyFeature.js
@@ -329,6 +329,21 @@ var Feature = Util.fastDeclare(
                     }
                 }
                 break;
+            case 'b':
+                value = '';
+                var cc = byteArray[p++];
+                var type = String.fromCharCode(cc);
+                if( type == 'i' ) {
+                    var limit = readInt( byteArray, p )
+                    p += 4;
+                    for( var k = 0; k < limit; k++ ) {
+                        value += readInt( byteArray, p );
+                        if( k+1 < limit ) value += ',';
+                        p += 4;
+                    }
+                }
+                break;

I imagine the 'numeric' case would be similar, but with type == 'f' and readFloat instead of readInt or something.

"The letter can be one of ‘cCsSiIf’, corresponding to int8_t (signed 8-bit integer),
uint8_t (unsigned 8-bit integer), int16_t, uint16_t, int32_t, uint32_t and float, respectively."

@cmdcolin

This comment has been minimized.

Show comment
Hide comment
@cmdcolin

cmdcolin Jul 7, 2016

Contributor

Posted a potential fix :) #783

Contributor

cmdcolin commented Jul 7, 2016

Posted a potential fix :) #783

@cmdcolin

This comment has been minimized.

Show comment
Hide comment
@cmdcolin

cmdcolin Jul 7, 2016

Contributor

Ref #783 fix

Contributor

cmdcolin commented Jul 7, 2016

Ref #783 fix

@cmdcolin cmdcolin closed this Jul 7, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment