flatfile-to-json.pl - insufficient data yields incomplete histograms #612
When loading flat genome data through “flatfile-to-json.pl”, JBrowse uses some form of heuristic method to generate histograms. Unfortunately, in the absence of sufficient data, the method outputs either incomplete or non-existent histograms. In the worst-case scenario, users of the application will wind up at an “infinite” loading box for feature histograms or may even encounter large blank spaces on a given chromosome within the JBrowse genome viewer, which [falsely] suggests that a given track does not have data.
Submitted by Mary Shimoyama
I discussed this bug via email with Aurash a long time ago. Essentially, I think they just had a bad track configuration.
They said in their email that they also had this error:
Therefore, I think the histogram issue is just a red herring, and the real problem is the bad track config.
I do not think this issue stems from an allegedly faulty track configuration, as that would [only] hamper actual display of track features within the genome browser. Even when JBrowse attempts to write the actual histogram files following track insertion via "flatfile-to-json.pl", it fails to do so--see below:
Please take special note of how chromosomes 3 through 5 lack histograms entirely--as our team suspects, this may be stemming from a logic issue (or set of issues) within the "flatfile-to-json.pl" script--it is somehow failing to generate histograms on those chromosomes, despite actual features being present.
Further, "flatfile-to-json.pl" is not throwing any warnings or errors while loading these tracks--it makes every indication of a successful track loading process.
So, what does this actually look like in our development instance of JBrowse?
The top-most track is the genes and transcripts track for the rn5 genome, at full feature density. The "DEBUG_10/25/50" tracks represent a 10x, 25x, and 50x reduction in feature density relative to the top-most track, respectively, so the bottom-most track retains only 2% of the original feature density.
As can be seen, the histograms predictably grow sparser and coarser as feature density diminishes, until they drop off completely in "DEBUG_50". But is that track truly devoid of gene features?
No, it is not empty, and ideally JBrowse should still generate histograms for this track.
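One way to confirm independently that a sparse track still has features on every chromosome is to count the GFF3 records per sequence ID. The following is a minimal sketch, not part of JBrowse; the function names are hypothetical, and it assumes standard nine-column, tab-separated GFF3 input (optionally gzipped).

```python
import gzip
from collections import Counter
from typing import Iterable


def count_features_per_seqid(lines: Iterable[str]) -> Counter:
    """Count GFF3 feature lines per sequence ID (column 1)."""
    counts: Counter = Counter()
    for line in lines:
        # An embedded FASTA section ends the feature records.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        counts[cols[0]] += 1
    return counts


def count_features_in_file(path: str) -> Counter:
    """Same as above, reading a plain or gzipped GFF3 file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_features_per_seqid(handle)
```

A chromosome that shows a non-zero count here but no histogram in the browser would point to the histogram generation step rather than to missing data.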
GZipped copies of the GFF3s used in this test can be found below:
The genome assembly used as the reference sequence for this test was Rnor v5.0 (i.e., Rat 5) from NCBI.
Finally, please note that this test was performed on the latest public release of JBrowse, version 1.11.6.
I've been spending some time trying to work around this histogram problem, and I generated BigWigs from the GFF3s in question--but the NCList storeClass of JBrowse does not appear (from my use cases in 1.11.6) to support custom histogram specification.
It is well known that the BAM storeClass allows for custom user specification of histograms, such as BigWigs, and to my delight, the GFF3 storeClass (which enables direct reading of GFF3s without needing to run "flatfile-to-json.pl" a priori) also supports custom histogram specification!
Unfortunately, when I attempted to load a production-grade GFF3 (on the order of hundreds of megabytes), the genome browser crashed even more rapidly than it does with poorly configured BAM data. In other words, the GFF3 parser of JBrowse is not ideal: it is slow, and even the source comments on its main method state that it requires significant refactoring.
In any case, here's an idea that could enable a workaround solution without needing too much effort from the JBrowse team: why not enable custom histogram specification for the NCList storeClass? This sort of code already exists for BAM and GFF3 storeClasses, so why not port this code over to the NCList storeClass as an optional user definition?
That way, users can generate their own Bedgraphs and Wiggles; they can define and take responsibility for their own histograms without requiring an extensive examination and/or re-write of the existing histogram generation method(s) in "flatfile-to-json.pl".
This suggestion is made with the knowledge in mind that JBrowse development resources are limited. What do you think?
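To illustrate the "generate your own bedGraph" step mentioned above, here is a minimal sketch that bins GFF3 features into fixed-width windows and emits bedGraph lines. It is not the script used for the BigWigs in this thread; the function name, the default bin size, and the optional `chrom_sizes` mapping are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional


def gff3_to_bedgraph(
    lines: Iterable[str],
    bin_size: int = 10_000,
    chrom_sizes: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Return bedGraph lines giving the number of features overlapping each bin."""
    bins: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        # GFF3 is 1-based inclusive; bedGraph is 0-based half-open.
        start, end = int(cols[3]) - 1, int(cols[4])
        for index in range(start // bin_size, (end - 1) // bin_size + 1):
            bins[cols[0]][index] += 1

    out: List[str] = []
    for seqid in sorted(bins):
        for index in sorted(bins[seqid]):
            bin_start = index * bin_size
            bin_end = bin_start + bin_size
            # Clamp the last bin to the chromosome length when it is known.
            if chrom_sizes and seqid in chrom_sizes:
                bin_end = min(bin_end, chrom_sizes[seqid])
            out.append(f"{seqid}\t{bin_start}\t{bin_end}\t{bins[seqid][index]}")
    return out
```

The resulting file could then be converted with a tool such as UCSC's `bedGraphToBigWig`, which also needs a chromosome sizes file for the assembly.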
On Fri, Jul 24, 2015 at 8:36 AM, halfwayBraindead email@example.com
added a commit
Jul 27, 2015
I think I found a patch that makes the histogram section of the config file usable with tracks that were loaded with flatfile-to-json.pl. You can check it out on the master jbrowse branch.
Checked out and tested the "Master" branch, and the histogram storeClass works well now within NCList tracks! A definite plus.
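For readers following along, a track definition of the kind being tested might look roughly like the output of the sketch below. This is an assumption-laden illustration, not the configuration actually used in this thread: the helper name, track label, and file paths are hypothetical, and the `histograms` keys follow the form the configuration guide documents for BAM tracks.

```python
import json
from typing import Any, Dict


def ncl_track_with_histograms(label: str, bigwig_url: str) -> Dict[str, Any]:
    """Build a trackList.json-style entry for an NCList track with a BigWig histogram store."""
    return {
        "label": label,
        "type": "CanvasFeatures",
        "storeClass": "JBrowse/Store/SeqFeature/NCList",
        # Hypothetical path; "{refseq}" is left as a literal placeholder.
        "urlTemplate": f"tracks/{label}/{{refseq}}/trackData.json",
        "histograms": {
            "storeClass": "JBrowse/Store/SeqFeature/BigWig",
            "urlTemplate": bigwig_url,  # hypothetical BigWig location
        },
    }


# Print the entry as JSON so it can be pasted into a track list.
print(json.dumps(ncl_track_with_histograms("DEBUG_50", "bigwig/DEBUG_50.bw"), indent=2))
```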
Another significant issue with this approach cropped up, though:
As can be seen, the scaling of BigWig histogram bars seems "off", and even when inserting a basic BigWig track (whole track dedicated to BigWig file) from the configuration guide, the same issue appears:
It turns out that the "autoscale" parameter resolves this scaling issue for BigWig tracks. The configuration guide claims that it defaults to "local", but in the "Master" branch the default appears to be "global":
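To show why the two settings can look so different, here is a conceptual sketch, not JBrowse's actual implementation. It assumes "global" scales bars against the score range of the whole file while "local" uses only the scores in the visible region; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def score_range(scores: Sequence[float], visible: slice, mode: str) -> Tuple[float, float]:
    """Return the (min, max) that bar heights are scaled against."""
    if mode == "global":
        window = scores            # every score in the file
    elif mode == "local":
        window = scores[visible]   # only the scores currently on screen
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return min(window), max(window)


def bar_height(score: float, lo: float, hi: float) -> float:
    """Fraction of the track height that a bar with this score occupies."""
    if hi == lo:
        return 1.0
    return (score - lo) / (hi - lo)
```

With scores `[1, 2, 3, 100]` and only the first three bins on screen, the tallest visible bar fills the track under "local" but only about 2% of it under "global", since the single outlier of 100 elsewhere in the file sets the scale.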
Unfortunately, no counterpart method appears to exist for the histogram storeClass within CanvasFeatures tracks such as GFF3/BAM/NCList - would be great to port that sort of code over!
(It would also be great to expand the feature capability of the histogram storeClass within CanvasFeatures tracks: as Rob wrote previously, "all you can really change about how it looks is its color". Such a venture would likely tie in closely with #624.)