GFF3Tabix issues #780

Closed
cmdcolin opened this Issue Jul 6, 2016 · 9 comments

Contributor

cmdcolin commented Jul 6, 2016

There are some potential issues with the semi-new GFF3Tabix parser that, depending on your use case, may make it unusable:

  1. The exons/CDS of a given feature can be missing from its "View details" popup because Tabix data is downloaded block by block, so an exon that falls outside the fetched block is missed.
  2. Features may not render correctly with NeatCanvasFeatures for the same reason as 1: NeatCanvasFeatures needs complete exon information to calculate intron hats.
  3. The way features are sorted in a tabix GFF3 file can make the GFF3Tabix parser miss the first exon of a gene. When creating the GFF3 tabix file, you would generally sort the GFF3 by coordinate, but this can place subfeatures before the parent feature (e.g. if the exon and gene share a start coordinate, sorting programs can arbitrarily place the exon before the gene line). This creates a valid tabix file, but the GFF3Tabix parser fails to find that exon. For reference, the standard GFF3 parser in JBrowse does not allow subfeatures to occur before their parent feature line either.
  4. The features are not indexed by generate-names.pl. VCFTabix tracks are already indexed by generate-names.pl, so it might not be too big of a stretch to index GFF3Tabix as well.
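
To make issue 1 concrete, here is a minimal Python sketch of a block-restricted range query. It is not JBrowse code: `fetch_block` is a hypothetical stand-in for a tabix range query, and the feature IDs and coordinates are made up for illustration.

```python
# Toy model of a tabix-style range query: only lines that overlap the
# requested block come back. All names and coordinates are illustrative.
GFF_LINES = [
    # (type, start, end, attributes)
    ("gene", 100, 5000, "ID=gene1"),
    ("mRNA", 100, 5000, "ID=mrna1;Parent=gene1"),
    ("exon", 100, 200, "ID=exon1;Parent=mrna1"),
    ("exon", 4800, 5000, "ID=exon2;Parent=mrna1"),
]


def fetch_block(lines, start, end):
    """Return the lines whose [start, end] interval overlaps the block."""
    return [line for line in lines if line[1] <= end and line[2] >= start]


# The gene and mRNA overlap block 1-1000, so they are returned, but the
# second exon lies entirely outside the block and is never seen.
block = fetch_block(GFF_LINES, 1, 1000)
exons_seen = [line[3] for line in block if line[0] == "exon"]
print(exons_seen)
```

A details popup built only from `block` would show the gene with one of its two exons, which is the symptom described in 1 and 2.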

The workarounds, IMO, would be:
(1) making a custom "View details" box for this case
(2) not using NeatCanvasFeatures; however, NeatCanvasFeatures is enabled on the sample browser, and there is no way to disable it for a specific track
(3) sorting the GFF3 file carefully so that subfeatures don't occur before the parent feature, or putting all information about a feature on a single line, which requires more preprocessing of the original GFF file (see #785)
(4) adding support to generate-names.pl for GFF tabix
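
As a rough illustration of workaround (3), the Python sketch below sorts GFF3 lines by sequence and start coordinate and breaks ties by depth in the `Parent` hierarchy, so a gene sorts before an exon that shares its start. This is only a sketch under assumptions: `sort_gff3_lines` and its helpers are hypothetical names, every non-comment line is assumed to have nine tab-separated columns, only the first `Parent` value is followed, and children are assumed to lie within their parent's coordinates.

```python
def parse_attrs(col9):
    """Parse the GFF3 attributes column into a dict (no URL-decoding)."""
    attrs = {}
    for part in col9.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def sort_gff3_lines(lines):
    """Sort by (seqid, start), placing parents before their subfeatures."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        records.append((cols, parse_attrs(cols[8])))

    # Map each feature ID to its (first) parent ID, or "" for top level.
    parent_of = {
        attrs["ID"]: attrs.get("Parent", "").split(",")[0]
        for _, attrs in records
        if "ID" in attrs
    }

    def depth(attrs):
        """Count ancestors by walking up the Parent chain."""
        count, seen = 0, set()
        parent = attrs.get("Parent", "").split(",")[0]
        while parent and parent not in seen:  # guard against cycles
            seen.add(parent)
            count += 1
            parent = parent_of.get(parent, "")
        return count

    records.sort(key=lambda rec: (rec[0][0], int(rec[0][3]), depth(rec[1])))
    return ["\t".join(cols) for cols, _ in records]


unsorted = [
    "chr1\t.\texon\t100\t200\t.\t+\t.\tID=e1;Parent=m1",
    "chr1\t.\tmRNA\t100\t500\t.\t+\t.\tID=m1;Parent=g1",
    "chr1\t.\tgene\t100\t500\t.\t+\t.\tID=g1",
    "chr1\t.\texon\t300\t500\t.\t+\t.\tID=e2;Parent=m1",
]
sorted_lines = sort_gff3_lines(unsorted)
print([line.split("\t")[2] for line in sorted_lines])
```

The output is still ordered by start coordinate within each sequence, so it remains suitable for bgzip and tabix indexing, while the tie-break keeps each parent line ahead of its subfeatures.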

Given the drawbacks, we could remove GFF3 tabix support entirely or address these issues over time.

Feedback welcome. Note: BEDTabix, or GFF3Tabix on a file with only single-level features, wouldn't suffer from any of these problems.

Contributor

cmdcolin commented Jul 11, 2016

The third issue can possibly be fixed using this technique:

https://github.com/GMOD/jbrowse/tree/update_tabix_sort

Contributor

cmdcolin commented Jul 12, 2016

The first issue could potentially be solved by addressing #559 (i.e. if we don't load all subfeatures until they are needed, it is not a problem).

Contributor

billzt commented Jul 13, 2016

Well, I still want GFF3 tabix support, even if it requires a lot of workarounds. The traditional flatfile-to-json.pl script generates a huge number of small files, which makes backing up JBrowse data extremely difficult.

Contributor

billzt commented May 9, 2017

Currently the third issue can be resolved by a Perl script: https://github.com/billzt/gff3sort

Contributor

cmdcolin commented May 10, 2017

@billzt awesome, I'll check that out! I had used genometools with the linesort option to prepare GFF3Tabix files before, but the result was not perfect, so I will try your script.

billzt added a commit to billzt/jbrowse that referenced this issue Jun 16, 2017

add support to index names for gff tabix (GMOD#780)
add support to generate-names.pl for gff tabix

@nathandunn nathandunn referenced this issue Jan 19, 2018

Merged

Fix tabix histogram for CanvasFeatuers #956

9 of 9 tasks complete

@nathandunn nathandunn added the ready label Feb 1, 2018

@nathandunn nathandunn self-assigned this Feb 1, 2018

@nathandunn nathandunn changed the title from Potential GFF3Tabix issues to GFF3Tabix issues Feb 1, 2018

@nathandunn nathandunn added this to the 1.12.4 milestone Feb 1, 2018

Contributor

nathandunn commented Feb 1, 2018

I have a fix for most of these issues, done on an Apollo projection branch, that I can integrate into a PR. It fixes the missing exons (by precomputing appropriate block ranges) and seemingly renders the subfeatures. I haven't tested this with NeatFeatures, but it does fix HTMLFeatures.

https://github.com/nathandunn/Apollo/blob/project-gff3/client/apollo/js/View/Track/DraggableProjectedHTMLFeatures.js#L1031

The other "workaround" is to remove the gene entries from the GFF3 file, but I like seeing the gene in the details section, and I think this will be a tractable solution.
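
For readers who want a feel for the "precompute block ranges" idea, here is a minimal Python sketch. It is not the linked Apollo code: `overlapping` and `fetch_with_full_parents` are hypothetical names, the records are made-up dicts rather than parsed tabix lines, and the approach shown (widen the query to the span of the top-level features found in the block, then query again) is just one plausible reading of the idea.

```python
def overlapping(lines, start, end):
    """Stand-in for a tabix range query: keep lines overlapping [start, end]."""
    return [l for l in lines if l["start"] <= end and l["end"] >= start]


def fetch_with_full_parents(lines, start, end):
    """Query a block, then re-query over the full span of its top-level features."""
    first_pass = overlapping(lines, start, end)
    tops = [l for l in first_pass if l["parent"] is None]
    if not tops:
        return first_pass
    # Widen the range so every subfeature of those parents is covered.
    lo = min([start] + [l["start"] for l in tops])
    hi = max([end] + [l["end"] for l in tops])
    # Note: the wider query may also return unrelated features in [lo, hi].
    return overlapping(lines, lo, hi)


FEATURES = [
    {"type": "gene", "start": 100, "end": 5000, "id": "gene1", "parent": None},
    {"type": "exon", "start": 100, "end": 200, "id": "exon1", "parent": "gene1"},
    {"type": "exon", "start": 4800, "end": 5000, "id": "exon2", "parent": "gene1"},
]

narrow = [l["id"] for l in overlapping(FEATURES, 1, 1000)]
widened = [l["id"] for l in fetch_with_full_parents(FEATURES, 1, 1000)]
print(narrow, widened)
```

The trade-off is an extra query and potentially more data per block, in exchange for each parent feature arriving with all of its subfeatures.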

Contributor

nathandunn commented Feb 1, 2018

If anyone else wants to take a shot, that's also fine. I just have some ideas about what the fixes are.

Collaborator

rbuels commented Apr 7, 2018

I think all of this is taken care of in the dev branch now, with the GFF3Tabix overhaul I just did, plus the new topLevelFeatures config. Could you guys have a look to confirm, and reopen this if there are still problems?

@rbuels rbuels closed this Apr 7, 2018

Contributor

cmdcolin commented Apr 8, 2018

These all look great! I am experiencing a little bit of slowness when scrolling around, but I think that overall the data is correct and the bugfixes are working.