-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Visualization of genome track #441
Conversation
…o attach dynamically generated data (instead from files it will be taken from a string source that represents content of the file)
* fix on popup when clicking on variant (up to now there was a console error) * popup is a bootstrap modal window (not just old alert function) * if possible (and requested by user), gene variant height depends on the allel frequency of specific variant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this PR, @piotr-gawron! This is a great improvement over the standard variant track and the variant details dialog.
However, pulling in bootstrap seems like an overkill just to show a dialog for the variants. Although it provides nice UI elements, this will come with the need to always include bootstrap CSS in all pileup.js applications, which might not be desired by the developer for the particular tool.
We were originally trying to keep the size of the pileup.js library as small as possible (hence the current custom/barebones UI elements) and bootstrap-react
is not the tiniest UI library.
I know that we should have probably done this earlier, but I think a better way to approach this issue would be to allow users to register callback functions on key events (such as clicks). This way, pileup.js could be more environment-agnostic and more customizable from the developer perspective. If we had this before, you could have just register your own functions that render bootstrap modals with variant info in them without the need to change the pileup.js code.
I am more than happy to implement this if you are not planning to do it.
The custom variant box height is a nice addition, so we can always keep that part. I have a few minor comments regarding the code that deals with the parsing.
Let me know what you think and sorry that it took me so long to get to code review.
Keep these PRs coming, please! We do appreciate them.
ref: parts[3], | ||
alt: parts[4], | ||
significantFrequency: frequency, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we call this significantFrequency
again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that single vcf entry can contain information about more than one gene variant:
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB
In such situation we need to decide which allel frequency value use for the visualization.
I know about two strategies, but there might be more:
- take the smalles
- take the biggest (this one is currently implemented)
The alternative to significantFrequency
field is to store all the 'allel frequency' values and allow user to decide. But again we have to agree about the possible strategies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Can we name those two alternatives as minor vs major allele frequency just to be in line with the common population biology terminology? It would also be great to mention why pick either of these in the comments right at the top of the code block so that future developers won't get confused.
var params = parts[7].split(';'); | ||
for (var i=0;i<params.length;i++) { | ||
var param = params[i]; | ||
if (param.startsWith("AF=")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think AF
is an almost standard annotation for a majority of the VCF variants out there, but do you know if there are other abbreviations that people use as a synonym? If so, it might worth looking for those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took it from standard definition: http://samtools.github.io/hts-specs/VCFv4.3.pdf
I'm not aware of any other abbreviation that might refer to the frequency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool - thanks for checking that! Was just curious whether we should be more inclusive, but looks like not 👍
@@ -41,12 +43,28 @@ function extractLocusLine(vcfLine: string): LocusLine { | |||
|
|||
function extractVariant(vcfLine: string): Variant { | |||
var parts = vcfLine.split('\t'); | |||
var frequency = null; | |||
if (parts.length>=7){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if this is a multi-sample VCF and the first sample is not relevant for the user? I am asking this; because I know that mutect can sometimes order the samples in a way that the normal sample comes before the tumor one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know vcf can contain multiple variants regarding single nucleotide. It can be achieved in two ways:
- one entry in vcf can correspond to more than one variant, take a look at this entry:
20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
- two entries can refer to the same region in the genome (I think this is what you mentioned in the comment)
Regarding the first issue - there is nothing you can do in term of visualization apart from providing information in the popup. We could think about coloring or some fancy way of highlighting the situation, but in my opinion it's not intuitive.
When you have two (or more) overlaying variants I would also suggest to put it in the popup. I will fix my PR to handle this situation (right now you have only first element that match click in the popup).
If the behaviour is not the one user expected I think the best way to go is suggest user to filter data from vcf file and provide filtered results.
Anyway, this reminds me about one other issue: when you present gene variant it should span through the whole modified region. Right now every variant is visualized by rectangle on the first nucleotide only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha! I think your solution regarding multiple variants makes sense for the first pass as long as we provide the additional information back to the callback function so users will have a way to know if such is the case.
Another option is to parse the variant header and provide the developer a way to only use information from a particular column. For example the NORMAL TUMOR
column names in the example VCF files. AFAIK, those column names are arbitrary, so we can let the developer determine which one we pick for visualization. Do you think this makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't go that far. In most cases users know what data they visualize and when they know that there are two overlapping data sets they should separate them in different tracks (at least this is what I would do).
I understand - when library is light it's much easier to maintain (and it's more flexible). I will reimplement it accordingly - will remove bootstrap from dependencies and correct example with the registered callback function. However, there is still a question of passing options to pileup. Do you agree with the methodology that I used? (passing configuration options from top classes to bottom classes). |
thanks for the clarifications, @piotr-gawron! I think the way the way you implemented things for passing options to visualization tracks look reasonable. If this proves not that useful in the future, we can change the things to better accommodate other use cases. |
* bootstrap library removed * when user click on gene variant instead of poup user defined function will be called (if such method is not defined log information will appear in the console) * allel frequency is parsed in a way that maximum and minimum values are stored in vcf variant entry * user can define strategy that should be used when hight of the gene variant bars should depend on the alle frequency, two defined strategies are: - take max value of allel frequency for vcf entry - take min value of allel frequency for vcf entry
I reimplemented the fucntionality as disscussed. Below you can find my original comment which now is a little obsolete :). Due to some issues with nested default properties I had to modify, and However, I still think that you missuse My original comment:Let me first explain why travis failed: I used get method because I needed to specify default prop value for allel frequency strategy and this is the right way to do. Alternatively you can do something like: http://stackoverflow.com/a/36778923/1127920. However this extracts specificiation of the prop default outside of the class and I don't like the approach. I noticed also one more issue with your code: |
@armish I see that my PR has some pending requestes changes. Can you tell me what is it, or point to the point on github, because I cannot find it... |
@@ -3,6 +3,7 @@ | |||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | |||
<link rel="stylesheet" href="../style/pileup.css" /> | |||
<link rel="stylesheet" href="demo.css" /> | |||
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/latest/css/bootstrap.min.css"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove this stylesheet since we are not making use of it anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind - let me do that to save some time.
@armish |
This PR contains 3 main things:
Few comments regaring changes:
I decided to use bootstrap as a popup implementation because it was the easiest chocie for me. If you don't like the solution then I can use any other if you provide me with links to some examples/documentation. One of the reasons why you might not like it is the fact that it requires bootstrap css to be included (if not the modal popup is useless). Of course this can be included in pileup css if you want to keep the whole css stuff in one file, but I'm not sure what are you rstandards here.
user (programmer) can decide if he wants to use allel frequency as a hight of variant bar by providng options parameter when creating gene variant view. This requires a lot of passing config parameters: pileup->Root->VisualizationWrapper->VariantTrack. I couldn't find better solution, so if you know one please let me know and I will refactor.
This change is![Reviewable](https://camo.githubusercontent.com/23b05f5fb48215c989e92cc44cf6512512d083132bd3daf689867c8d9d386888/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)