Visualization of genome track #441
Changes from 11 commits
1dc7529
f11b82a
cdd8226
c288d42
b7e0d6a
95f6810
e3d7ba4
9627645
637d9f9
cf381ae
f32698c
1746654
8f50dd4
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,6 +16,8 @@ export type Variant = { | |
position: number; | ||
ref: string; | ||
alt: string; | ||
id: string; | ||
significantFrequency: ?number; | ||
vcfLine: string; | ||
} | ||
|
||
|
@@ -41,12 +43,28 @@ function extractLocusLine(vcfLine: string): LocusLine { | |
|
||
function extractVariant(vcfLine: string): Variant { | ||
var parts = vcfLine.split('\t'); | ||
var frequency = null; | ||
if (parts.length>=8){ | ||
What if this is a multi-sample VCF and the first sample is not relevant for the user? I am asking because I know that MuTect can sometimes order the samples so that the normal sample comes before the tumor one.

As far as I know, a VCF can contain multiple variants for a single nucleotide. It can be achieved in two ways:
Regarding the first issue: there is nothing you can do in terms of visualization apart from providing information in the popup. We could think about coloring or some fancy way of highlighting the situation, but in my opinion it's not intuitive. When you have two (or more) overlapping variants, I would also suggest putting that in the popup. I will fix my PR to handle this situation (right now the popup shows only the first element that matches the click). If the behaviour is not what the user expected, I think the best way to go is to suggest that the user filter the data from the VCF file and provide the filtered results. Anyway, this reminds me of another issue: when you present a gene variant, it should span the whole modified region. Right now every variant is visualized by a rectangle on the first nucleotide only.

Gotcha! I think your solution regarding multiple variants makes sense for the first pass, as long as we pass the additional information to the callback function so users have a way to know when this is the case. Another option is to parse the variant header and give the developer a way to use information from a particular column only. For example the

I wouldn't go that far. In most cases users know what data they visualize, and when they know that there are two overlapping data sets they should separate them into different tracks (at least this is what I would do). |
||
var params = parts[7].split(';'); | ||
for (var i=0;i<params.length;i++) { | ||
var param = params[i]; | ||
if (param.startsWith("AF=")) { | ||
I think

I took it from the standard definition: http://samtools.github.io/hts-specs/VCFv4.3.pdf

Cool, thanks for checking that! I was just curious whether we should be more inclusive, but it looks like not 👍 |
||
frequency = 0.0; | ||
var frequenciesStrings = param.substr(3).split(","); | ||
for (var j=0;j<frequenciesStrings.length;j++) { | ||
frequency = Math.max(frequency,parseFloat(frequenciesStrings[j])); | ||
} | ||
} | ||
} | ||
} | ||
|
||
return { | ||
contig: parts[0], | ||
position: Number(parts[1]), | ||
id: parts[2], | ||
ref: parts[3], | ||
alt: parts[4], | ||
significantFrequency: frequency, | ||
Why do we call this

The reason is that a single VCF entry can contain information about more than one gene variant:
The alternative to

That makes sense. Can we name those two alternatives minor vs. major allele frequency, just to be in line with the common population biology terminology? It would also be great to mention why we pick either of these in a comment right at the top of the code block so that future developers won't get confused. |
||
vcfLine | ||
}; | ||
} | ||
|
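The review thread on `significantFrequency` explains that one VCF record can carry several `AF` values (one per ALT allele) and suggests naming the two candidates major and minor allele frequency. The following is only a sketch of that idea, not the code merged in this PR. The helper name `extractAlleleFrequencies` and the fields `majorFrequency` and `minorFrequency` are hypothetical, and the sketch assumes INFO is the eighth tab-separated column, as in the VCF specification. Here "major" and "minor" simply mean the largest and smallest listed `AF` value.

```javascript
'use strict';

// Hypothetical helper (not part of pileup.js): reads the AF entry from the
// INFO column and reports the largest and smallest listed frequency.
function extractAlleleFrequencies(vcfLine) {
  var parts = vcfLine.split('\t');
  var result = {majorFrequency: null, minorFrequency: null};
  // INFO is the 8th column (index 7), so at least 8 fields are required.
  if (parts.length < 8) return result;

  var params = parts[7].split(';');
  for (var i = 0; i < params.length; i++) {
    if (params[i].indexOf('AF=') !== 0) continue;
    // One AF value per ALT allele, comma-separated; drop unparsable values.
    var values = params[i].slice(3).split(',')
        .map(function(s) { return parseFloat(s); })
        .filter(function(v) { return !isNaN(v); });
    if (values.length > 0) {
      result.majorFrequency = Math.max.apply(null, values);
      result.minorFrequency = Math.min.apply(null, values);
    }
  }
  return result;
}

// Record with two ALT alleles: majorFrequency is 0.5, minorFrequency is 0.4.
var line = ['20', '62731', '.', 'C', 'A,G', '.', 'PASS', 'DP=68;AF=0.4,0.5'].join('\t');
console.log(extractAlleleFrequencies(line));
```

Returning both values leaves the choice of which one to display to the track, which is one way to address the naming question raised in the review.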
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
/** | ||
* @flow | ||
*/ | ||
'use strict'; | ||
|
||
import {expect} from 'chai'; | ||
|
||
import pileup from '../../main/pileup'; | ||
import dataCanvas from 'data-canvas'; | ||
import {waitFor} from '../async'; | ||
|
||
import ReactTestUtils from 'react-addons-test-utils'; | ||
|
||
describe('VariantTrack', function() { | ||
var testDiv = document.getElementById('testdiv'); | ||
|
||
beforeEach(() => { | ||
testDiv.style.width = '700px'; | ||
dataCanvas.RecordingContext.recordAll(); | ||
}); | ||
|
||
afterEach(() => { | ||
dataCanvas.RecordingContext.reset(); | ||
// avoid pollution between tests. | ||
testDiv.innerHTML = ''; | ||
}); | ||
var {drawnObjects} = dataCanvas.RecordingContext; | ||
|
||
function ready() { | ||
return testDiv.getElementsByTagName('canvas').length > 0 && | ||
drawnObjects(testDiv, '.variants').length > 0; | ||
} | ||
|
||
it('should render variants', function() { | ||
var popupId = null; | ||
var getPopupTitle = function (id) { | ||
popupId = id; | ||
return "hello world, "+id; | ||
}; | ||
var p = pileup.create(testDiv, { | ||
range: {contig: '17', start: 9386380, stop: 9537390}, | ||
tracks: [ | ||
{ | ||
viz: pileup.viz.genome(), | ||
data: pileup.formats.twoBit({ | ||
url: '/test-data/test.2bit' | ||
}), | ||
isReference: true | ||
}, | ||
{ | ||
data: pileup.formats.vcf({ | ||
url: '/test-data/test.vcf' | ||
}), | ||
viz: pileup.viz.variants(), | ||
options: {getPopupTitleByVariantId: getPopupTitle} | ||
} | ||
] | ||
}); | ||
|
||
return waitFor(ready, 2000) | ||
.then(() => { | ||
var variants = drawnObjects(testDiv, '.variants'); | ||
expect(variants.length).to.be.equal(1); | ||
var canvasList = testDiv.getElementsByTagName('canvas'); | ||
var canvas = canvasList[1]; | ||
expect(popupId).to.be.null; | ||
|
||
//check clicking on variant | ||
ReactTestUtils.Simulate.click(canvas,{nativeEvent: {offsetX: -0.5, offsetY: -15.5}}); | ||
|
||
expect(popupId).to.not.be.null; | ||
p.destroy(); | ||
}); | ||
}); | ||
|
||
}); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
##fileformat=VCFv4.1 | ||
##source=VarScan2 | ||
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases"> | ||
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation"> | ||
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)"> | ||
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value"> | ||
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls"> | ||
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls"> | ||
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand"> | ||
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position"> | ||
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"> | ||
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth"> | ||
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)"> | ||
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)"> | ||
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency"> | ||
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev"> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR | ||
20 61795 . G T . PASS DP=81;SS=1;SSC=2;GPV=4.6768E-16;SPV=5.4057E-1;AF=0.7 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:44:22:22:50%:16,6,9,13 0/1:.:37:18:19:51.35%:10,8,10,9 | ||
20 62731 . C A,G . PASS DP=68;SS=1;SSC=1;GPV=1.4855E-11;SPV=7.5053E-1;AF=0.4,0.5 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:32:17:15:46.88%:9,8,9,6 0/1:.:36:21:15:41.67%:8,13,8,7 | ||
20 61731 . C A,G,T . PASS DP=68;SS=1;SSC=1;GPV=1.4855E-11;SPV=7.5053E-1;AF=0.4,0.6,0.3 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:32:17:15:46.88%:9,8,9,6 0/1:.:36:21:15:41.67%:8,13,8,7 |
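One reviewer asked what happens with a multi-sample VCF whose first sample is not the relevant one, and suggested parsing the header so that a developer can pick a particular column. A rough sketch of that option follows, under stated assumptions: `sampleFieldByName` is a hypothetical helper, not a pileup.js API, and both the `#CHROM` header and the record are tab-separated, with sample columns following `FORMAT` as in the test file above.

```javascript
'use strict';

// Hypothetical helper: returns the value of one FORMAT key (e.g. 'FREQ')
// for a named sample, or null if the sample or the key is missing.
function sampleFieldByName(headerLine, vcfLine, sampleName, key) {
  var columns = headerLine.replace(/^#/, '').split('\t');
  var parts = vcfLine.split('\t');
  var formatIdx = columns.indexOf('FORMAT');
  var sampleIdx = columns.indexOf(sampleName);
  // Sample columns only exist after the FORMAT column.
  if (formatIdx < 0 || sampleIdx <= formatIdx || sampleIdx >= parts.length) {
    return null;
  }
  // FORMAT lists the keys; each sample column lists values in the same order.
  var keyIdx = parts[formatIdx].split(':').indexOf(key);
  if (keyIdx < 0) return null;
  var values = parts[sampleIdx].split(':');
  return keyIdx < values.length ? values[keyIdx] : null;
}

var header = '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR';
var record = '20\t61795\t.\tG\tT\t.\tPASS\tDP=81;AF=0.7\tGT:GQ:DP:RD:AD:FREQ:DP4\t' +
    '0/1:.:44:22:22:50%:16,6,9,13\t0/1:.:37:18:19:51.35%:10,8,10,9';
console.log(sampleFieldByName(header, record, 'TUMOR', 'FREQ'));  // 51.35%
```

Looking the sample up by name rather than by position avoids depending on the order in which a caller such as MuTect writes the normal and tumor columns.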
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
##fileformat=VCFv4.1 | ||
##source=VarScan2 | ||
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases"> | ||
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation"> | ||
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)"> | ||
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value"> | ||
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls"> | ||
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls"> | ||
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand"> | ||
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position"> | ||
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> | ||
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"> | ||
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth"> | ||
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)"> | ||
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)"> | ||
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency"> | ||
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev"> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR | ||
17 9386385 . G T . PASS DP=81;SS=1;SSC=2;GPV=4.6768E-16;SPV=5.4057E-1;AF=0.7 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:44:22:22:50%:16,6,9,13 0/1:.:37:18:19:51.35%:10,8,10,9 |
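The review thread also notes that a variant should span the whole modified region, whereas it is currently drawn as a rectangle on the first nucleotide only. A minimal sketch of how such a span could be derived is shown here. `variantSpan` is a hypothetical helper, and the sketch assumes that `position` is the coordinate of the first base of `ref` and that the result uses the same coordinate system as `position`.

```javascript
'use strict';

// Hypothetical helper: the inclusive reference interval covered by a variant,
// derived from its position and the length of the REF allele.
function variantSpan(variant) {
  var length = Math.max(1, variant.ref.length);
  return {start: variant.position, stop: variant.position + length - 1};
}

// The single-base substitution in the test file above covers one base.
console.log(variantSpan({position: 9386385, ref: 'G', alt: 'T'}));
// A three-base deletion written as REF=GTAC, ALT=G covers four reference bases.
console.log(variantSpan({position: 100, ref: 'GTAC', alt: 'G'}));
```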
can you remove this stylesheet since we are not making use of it anymore?
Never mind - let me do that to save some time.