Genotypes wrongly displayed in VCF tracks (JBrowse version 1.11.3) #488

Closed
gitcruz opened this Issue Jun 10, 2014 · 6 comments

Comments

Projects
None yet
3 participants
@gitcruz

gitcruz commented Jun 10, 2014

Hi,

We are currently using JBrowse 1.11.3 to display different features of newly assembled genomes. The current version we use (1.11.3) do not display heterozygote genotypes properly.

This is the first line of the VCF:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT D_A007 S_A005 S_A002 D_A010 S_A001 S_A006 S_A008 D_A003 S_B992 S_B991 D_A009

lp23s00001 1254 . G A 5710.69 PASS AC=14;AF=0.636;AN=22;BaseQRankSum=-2.405;DP=315;Dels=0.00;FS=4.126;HaplotypeScore=0.8621;InbreedingCoeff=0.2139;MLEAC=14;MLEAF=0.636;MQ=51.05;MQ0=23;MQRankSum=1.793;QD=21.88;ReadPosRankSum=0.086;EFF=INTERGENIC(MODIFIER||||||||) GT:AD:DP:GQ:PL 1/1:0,20:21:54:693,54,0 0/1:14,16:30:99:545,0,411 0/0:25,0:25:69:0,69,925 1/1:2,23:26:26:768,26,0 0/1:13,8:21:99:198,0,252 0/0:28,0:28:75:0,75,972 0/1:17,18:35:99:483,0,522 1/1:1,33:34:96:1152,96,0 0/1:17,15:32:99:485,0,481 1/1:0,20:21:48:622,48,0 1/1:2,28:30:69:832,69,0

But in the browser the genotype table says (I added pipes to separate columns here):

variant 9 81.82%
homozygous 5 45.45%
A variant 5 45.45%
heterozygous 4 36.36%
non-variant 2 18.18%
homozygous for reference 2 18.18%
Total 11 100%
Name | GT | AD | DP | GQ | PL
S_A001 | ref (G) | 13 8 | 21 | 99 | 198 0 252
S_A002 | ref (G) | 25 0| 25 | 69 | 0 69 925
S_A005 | ref (G) | 14 16 | 30 | 99 | 545 0 411

Thus, JBrowse counts 4 heterozygous sites but then shows same genotype in the table. The only way to distinguish them is the coverage support of each allele (DP), S_A001 and S_A005 having support for both in the example. However, the three genotypes are labeled as ref(G), instead of G/A.

Do you know how to fix this?

Just in case, gives some more clue:

  • I compressed the VCF using "bgzip" and then was indexed with "tabix" version 0.2.5 (r1005).
  • the track in trackList.json looks like this:
    {
    "tbiUrlTemplate" : "tracks/VCFs/Rubioseq.eff.names.clean_header.vcf.gz
    .tbi",
    "storeClass" : "JBrowse/Store/SeqFeature/VCFTabix",
    "urlTemplate" : "tracks/VCFs/Rubioseq.eff.names.clean_header.vcf.gz",
    "type" : "JBrowse/View/Track/HTMLVariants",
    "label" : "variation_rubioseq",
    "key" : "Variation: SNPs"
    },

Thanks in advance,
Nando

@cmdcolin

This comment has been minimized.

Show comment
Hide comment
@cmdcolin

cmdcolin Jun 11, 2014

Contributor

I see what you mean after looking into it (it's just that the full genotype table doesn't show the alternative alleles correctly if i understand correctly)

Apparently this code actually worked in JBrowse 1.10.6 but has since been changed to a non-working code here
b76357b

This commit I think is made to be account for rendering "multipart" divs inside the genotype table, such as shown in this screenshot

screenshot-localhost 2014-06-11 01-31-13

However, the genotype field is of course the real issue. The code for "_mungeGenotypeValue" was introduced in this version but it loses functionality from older code that it was refactoring from. I propose the following fix

diff --git a/src/JBrowse/View/Track/_VariantDetailMixin.js b/src/JBrowse/View/Track/_VariantDetailMixin.js
index a56186f..c0f9137 100644
--- a/src/JBrowse/View/Track/_VariantDetailMixin.js
+++ b/src/JBrowse/View/Track/_VariantDetailMixin.js
@@ -137,11 +137,14 @@ return declare( [FeatureDetailMixin, NamedFeatureFiltersMixin], {
 _mungeGenotypeVal: function( value, fieldname, alt, underlyingRefSeq ) {
     if( fieldname == 'GT' ) {
         // handle the GT field specially, translating the genotype indexes into the actual ALT strings
+        var value_parse = value.values[0];
+
+        var splitter = value_parse.match(/\D/g)[0];
         var refseq = underlyingRefSeq ? 'ref ('+underlyingRefSeq+')' : 'ref';
-        value = array.map( value.values, function( gtIndex ) {
+        value = array.map( value_parse.split(splitter), function( gtIndex ) {
                                gtIndex = parseInt( gtIndex );
                                return gtIndex ? ( alt ? alt[gtIndex-1] : gtIndex ) : refseq;
-                           });
+                           }).join( ' '+splitter+' ' );
     }
     return value;
 },

Given that I don't think that there are going to be "multipart" genotypes (no need to draw little boxes like in my screenshot), then i think using "var value_parse = value.values[0];" is safe. I'm not actually sure what the code was doing before, because it was receiving data such as array.map(["0|0"],...) for example, and then the line "gtIndex = parseInt( gtIndex );" would receive a string like "0|0" which would just return 0 and thus we lose the genotype information.

I think we need to do the "splitting" and "joining" of the string, and i'm not sure why this disappeared from the code, so that's what my proposed fix does!

Contributor

cmdcolin commented Jun 11, 2014

I see what you mean after looking into it (it's just that the full genotype table doesn't show the alternative alleles correctly if i understand correctly)

Apparently this code actually worked in JBrowse 1.10.6 but has since been changed to a non-working code here
b76357b

This commit I think is made to be account for rendering "multipart" divs inside the genotype table, such as shown in this screenshot

screenshot-localhost 2014-06-11 01-31-13

However, the genotype field is of course the real issue. The code for "_mungeGenotypeValue" was introduced in this version but it loses functionality from older code that it was refactoring from. I propose the following fix

diff --git a/src/JBrowse/View/Track/_VariantDetailMixin.js b/src/JBrowse/View/Track/_VariantDetailMixin.js
index a56186f..c0f9137 100644
--- a/src/JBrowse/View/Track/_VariantDetailMixin.js
+++ b/src/JBrowse/View/Track/_VariantDetailMixin.js
@@ -137,11 +137,14 @@ return declare( [FeatureDetailMixin, NamedFeatureFiltersMixin], {
 _mungeGenotypeVal: function( value, fieldname, alt, underlyingRefSeq ) {
     if( fieldname == 'GT' ) {
         // handle the GT field specially, translating the genotype indexes into the actual ALT strings
+        var value_parse = value.values[0];
+
+        var splitter = value_parse.match(/\D/g)[0];
         var refseq = underlyingRefSeq ? 'ref ('+underlyingRefSeq+')' : 'ref';
-        value = array.map( value.values, function( gtIndex ) {
+        value = array.map( value_parse.split(splitter), function( gtIndex ) {
                                gtIndex = parseInt( gtIndex );
                                return gtIndex ? ( alt ? alt[gtIndex-1] : gtIndex ) : refseq;
-                           });
+                           }).join( ' '+splitter+' ' );
     }
     return value;
 },

Given that I don't think that there are going to be "multipart" genotypes (no need to draw little boxes like in my screenshot), then i think using "var value_parse = value.values[0];" is safe. I'm not actually sure what the code was doing before, because it was receiving data such as array.map(["0|0"],...) for example, and then the line "gtIndex = parseInt( gtIndex );" would receive a string like "0|0" which would just return 0 and thus we lose the genotype information.

I think we need to do the "splitting" and "joining" of the string, and i'm not sure why this disappeared from the code, so that's what my proposed fix does!

@cmdcolin cmdcolin added this to the 1.11.5 milestone Jun 11, 2014

@gitcruz

This comment has been minimized.

Show comment
Hide comment
@gitcruz

gitcruz Jun 11, 2014

Thanks for your quick reponse Colin,

we applied the patch and it works. Now genotypes look fine, in the example:
S_A005 |
ref (G) / A

S_A006
| ref (G) / ref (G)

S_A008
| ref (G) / A

Cheers,

On 11 June 2014 08:41, Colin notifications@github.com wrote:

I see what you mean after looking into it (it's just that the full
genotype table doesn't show the alternative alleles correctly if i
understand correctly)

Apparently this code actually worked in JBrowse 1.10.6 but has since been
changed to a non-working code here
b76357b
b76357b

This commit I think is made to be account for rendering "multipart" divs
inside the genotype table, such as shown in this screenshot

[image: screenshot-localhost 2014-06-11 01-31-13]
https://cloud.githubusercontent.com/assets/6511937/3240769/4ee6f792-f132-11e3-9d0d-8e79f207b864.png

However, the genotype field is of course the real issue. The code was
introduced to "_mungeGenotypeValue" here, it loses functionality from older
code that it was refactoring from. I propose the following fix

diff --git a/src/JBrowse/View/Track/_VariantDetailMixin.js
b/src/JBrowse/View/Track/_VariantDetailMixin.js
index a56186f..c0f9137 100644
--- a/src/JBrowse/View/Track/_VariantDetailMixin.js
+++ b/src/JBrowse/View/Track/_VariantDetailMixin.js
@@ -137,11 +137,14 @@ return declare( [FeatureDetailMixin,
NamedFeatureFiltersMixin], {
_mungeGenotypeVal: function( value, fieldname, alt, underlyingRefSeq ) {
if( fieldname == 'GT' ) {
// handle the GT field specially, translating the genotype indexes into
the actual ALT strings

  • var value_parse = value.values[0]; +
  • var splitter = value_parse.match(/\D/g)[0]; var refseq =
    underlyingRefSeq ? 'ref ('+underlyingRefSeq+')' : 'ref';
  • value = array.map( value.values, function( gtIndex ) {
  • value = array.map( value_parse.split(splitter), function( gtIndex )
    { gtIndex = parseInt( gtIndex ); return gtIndex ? ( alt ? alt[gtIndex-1] :
    gtIndex ) : refseq;
  • });
  • }).join( ' '+splitter+' ' ); } return value; },

Given that I don't think that there are going to be "multipart" genotypes
(no need to draw little boxes like in my screenshot), then i think using
"var value_parse = value.values[0];" is safe. I'm not actually sure what
the code was doing before, because it was receiving data such as
array.map(["0|0"],...) for example, and then the line "gtIndex = parseInt(
gtIndex );" would receive a string like "0|0" which would just return 0 and
thus we lose the genotype information.

I think we need to do the "splitting" and "joining" of the string, and i'm
not sure why this disappeared from the code, so that's what my proposed fix
does!


Reply to this email directly or view it on GitHub
#488 (comment).

Nando

gitcruz commented Jun 11, 2014

Thanks for your quick reponse Colin,

we applied the patch and it works. Now genotypes look fine, in the example:
S_A005 |
ref (G) / A

S_A006
| ref (G) / ref (G)

S_A008
| ref (G) / A

Cheers,

On 11 June 2014 08:41, Colin notifications@github.com wrote:

I see what you mean after looking into it (it's just that the full
genotype table doesn't show the alternative alleles correctly if i
understand correctly)

Apparently this code actually worked in JBrowse 1.10.6 but has since been
changed to a non-working code here
b76357b
b76357b

This commit I think is made to be account for rendering "multipart" divs
inside the genotype table, such as shown in this screenshot

[image: screenshot-localhost 2014-06-11 01-31-13]
https://cloud.githubusercontent.com/assets/6511937/3240769/4ee6f792-f132-11e3-9d0d-8e79f207b864.png

However, the genotype field is of course the real issue. The code was
introduced to "_mungeGenotypeValue" here, it loses functionality from older
code that it was refactoring from. I propose the following fix

diff --git a/src/JBrowse/View/Track/_VariantDetailMixin.js
b/src/JBrowse/View/Track/_VariantDetailMixin.js
index a56186f..c0f9137 100644
--- a/src/JBrowse/View/Track/_VariantDetailMixin.js
+++ b/src/JBrowse/View/Track/_VariantDetailMixin.js
@@ -137,11 +137,14 @@ return declare( [FeatureDetailMixin,
NamedFeatureFiltersMixin], {
_mungeGenotypeVal: function( value, fieldname, alt, underlyingRefSeq ) {
if( fieldname == 'GT' ) {
// handle the GT field specially, translating the genotype indexes into
the actual ALT strings

  • var value_parse = value.values[0]; +
  • var splitter = value_parse.match(/\D/g)[0]; var refseq =
    underlyingRefSeq ? 'ref ('+underlyingRefSeq+')' : 'ref';
  • value = array.map( value.values, function( gtIndex ) {
  • value = array.map( value_parse.split(splitter), function( gtIndex )
    { gtIndex = parseInt( gtIndex ); return gtIndex ? ( alt ? alt[gtIndex-1] :
    gtIndex ) : refseq;
  • });
  • }).join( ' '+splitter+' ' ); } return value; },

Given that I don't think that there are going to be "multipart" genotypes
(no need to draw little boxes like in my screenshot), then i think using
"var value_parse = value.values[0];" is safe. I'm not actually sure what
the code was doing before, because it was receiving data such as
array.map(["0|0"],...) for example, and then the line "gtIndex = parseInt(
gtIndex );" would receive a string like "0|0" which would just return 0 and
thus we lose the genotype information.

I think we need to do the "splitting" and "joining" of the string, and i'm
not sure why this disappeared from the code, so that's what my proposed fix
does!


Reply to this email directly or view it on GitHub
#488 (comment).

Nando

@cmdcolin

This comment has been minimized.

Show comment
Hide comment
@cmdcolin

cmdcolin Jun 11, 2014

Contributor

Just to be clear, does the table now display all of the genotypes correctly? You only posted 3 genotypes here but your original post mentioned that there were 11

Contributor

cmdcolin commented Jun 11, 2014

Just to be clear, does the table now display all of the genotypes correctly? You only posted 3 genotypes here but your original post mentioned that there were 11

@gitcruz

This comment has been minimized.

Show comment
Hide comment
@gitcruz

gitcruz Jun 11, 2014

Yes, it does. tTe 11 genotypes are properly reported now. Sorry, I just
reported three in my example to pinpoint the problem.

Cheers,

On 11 June 2014 15:05, Colin notifications@github.com wrote:

Just to be clear, does the table now display all of the genotypes
correctly? You only posted 3 genotypes here but your original post
mentioned that there were 11


Reply to this email directly or view it on GitHub
#488 (comment).

Nando

gitcruz commented Jun 11, 2014

Yes, it does. tTe 11 genotypes are properly reported now. Sorry, I just
reported three in my example to pinpoint the problem.

Cheers,

On 11 June 2014 15:05, Colin notifications@github.com wrote:

Just to be clear, does the table now display all of the genotypes
correctly? You only posted 3 genotypes here but your original post
mentioned that there were 11


Reply to this email directly or view it on GitHub
#488 (comment).

Nando

@cmdcolin cmdcolin closed this in 126bdd9 Jun 11, 2014

@biojiangke

This comment has been minimized.

Show comment
Hide comment
@biojiangke

biojiangke Aug 6, 2014

After commit this change, it seems that I still have problems showing the correct genotypes: I don't get the genotypes with bases separated by a slash (/). I wonder whether this is the same problem raised here or a different one. The multi-sample VCF I'm trying to visualize looks like this on JBrowse:

Name GT
sbh.rmdup.bam ref (A)
sla.rmdup.bam T
srs.rmdup.bam ref (A)

Shouldn't those be ref(A)/ref(A), T/T, and ref(A)/ref(A)?

Thanks!

Ke

After commit this change, it seems that I still have problems showing the correct genotypes: I don't get the genotypes with bases separated by a slash (/). I wonder whether this is the same problem raised here or a different one. The multi-sample VCF I'm trying to visualize looks like this on JBrowse:

Name GT
sbh.rmdup.bam ref (A)
sla.rmdup.bam T
srs.rmdup.bam ref (A)

Shouldn't those be ref(A)/ref(A), T/T, and ref(A)/ref(A)?

Thanks!

Ke

@biojiangke

This comment has been minimized.

Show comment
Hide comment
@biojiangke

biojiangke Aug 7, 2014

Apparently, restarting the browser and clearing the caches fixed this problem. Thanks!

Apparently, restarting the browser and clearing the caches fixed this problem. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment