Support bigRmsk #1236

Open
jrobinso opened this issue Oct 11, 2022 · 5 comments

Comments

@jrobinso
Contributor

jrobinso commented Oct 11, 2022

"bigRmsk" files currently do not work in IGV because of negative block starts. See issue #1130. The format is valid BED9, but we need the extra fields. The fix is to interpret the negative block starts properly.

Autosql: http://genome.ucsc.edu/goldenPath/help/examples/bigRmskBed.as
Also see: http://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlignBed.as

@maximilianh

maximilianh commented Oct 11, 2022

bigRmsk is a perfectly valid BED9. Does IGV not read the BED Type from the bigBed file? The bigBed header stores the number of defined fields (9 here) and so IGV should not even try to parse fields 10-12…
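
For reference, a minimal sketch of how the two field counts can be read from the fixed 64-byte bigBed header. This is not IGV's code; the function name `read_field_counts` is hypothetical, and the counts in the synthetic header (14 total, 9 defined) are illustrative values, not read from a real bigRmsk file.

```python
import struct

BIGBED_MAGIC = 0x8789F2EB  # bigBed file signature

# Header layout: magic, version, zoomLevels, chromosomeTreeOffset,
# fullDataOffset, fullIndexOffset, fieldCount, definedFieldCount,
# autoSqlOffset, totalSummaryOffset, uncompressBufSize, extension offset.
HEADER_FORMAT = "IHHQQQHHQQIQ"  # 64 bytes when packed without padding


def read_field_counts(header: bytes):
    """Return (field_count, defined_field_count) from a bigBed header."""
    if len(header) < 64:
        raise ValueError("bigBed header must be at least 64 bytes")
    # The file may be written in either byte order; try both.
    for byte_order in ("<", ">"):
        fields = struct.unpack(byte_order + HEADER_FORMAT, header[:64])
        if fields[0] == BIGBED_MAGIC:
            return fields[6], fields[7]
    raise ValueError("not a bigBed file (magic number mismatch)")


# Synthetic little-endian header for demonstration only.
demo_header = struct.pack(
    "<" + HEADER_FORMAT, BIGBED_MAGIC, 4, 0, 0, 0, 0, 14, 9, 0, 0, 0, 0
)
field_count, defined_field_count = read_field_counts(demo_header)
```

A reader that honors only `defined_field_count` would treat columns past that index as opaque extra fields.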

@jrobinso
Contributor Author

jrobinso commented Oct 12, 2022

@maximilianh Perhaps I didn't phrase this correctly; I will correct it. Yes, it's perfectly valid, but it is not really useful for display without properly treating the extra fields.

In general we read the number of defined fields and don't try to parse extra fields, but for certain special fields we do. This came up previously, and you had indicated (I think) that the defined names (e.g. blockstarts in this case) are never re-defined and if found in an extra field should be safe to use. So when we see "blockstarts" and "blocksizes" in extra fields we use them. This came up, I think, in "bigCat" format but perhaps for different fields.

In the bigRmsk case, fields 10-12 are necessary to do a reasonable display. Now that I know that blockstarts can be negative for table name bigRmsk, I will just make a special case.

So yes, I could strictly just look at the defined fields, but for bigGenePred, bigCat, and bigRmsk this would yield suboptimal displays, e.g. without the blocks. The extra fields are not just comments.
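
A rough sketch of what such a special case might look like. This is not IGV's implementation: the helper names are hypothetical, and since the thread does not say how negative values should be interpreted, the sketch only sets those blocks aside rather than assigning them a meaning.

```python
def parse_int_list(text):
    """Parse a BED-style comma-separated integer list; a trailing comma is allowed."""
    return [int(token) for token in text.split(",") if token.strip()]


def split_blocks(chrom_start, block_sizes, block_starts):
    """Separate ordinary blocks from blocks with a negative start or size.

    Returns (regular, special):
      regular: list of (start, end) in chromosome coordinates
      special: list of raw (block_start, block_size) pairs, left uninterpreted
    """
    sizes = parse_int_list(block_sizes)
    starts = parse_int_list(block_starts)
    if len(sizes) != len(starts):
        raise ValueError("blockSizes and blockStarts differ in length")

    regular, special = [], []
    for size, start in zip(sizes, starts):
        if start < 0 or size < 0:
            # Assumption: negative values need format-specific handling,
            # so they are kept raw instead of being added to chrom_start.
            special.append((start, size))
        else:
            begin = chrom_start + start
            regular.append((begin, begin + size))
    return regular, special


# Made-up record values for illustration.
regular, special = split_blocks(1000, "10,-1,20,", "0,-1,50,")
```

Keeping the negative entries separate means a generic BED renderer never computes a block position from them, which is the failure the ticket describes.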

@jrobinso
Contributor Author

jrobinso commented Oct 12, 2022

Another one we look for is exonFrames. So the complete set of "extra" fields currently parsed is as follows:

blockCount 
blockSizes
blockStarts
exonFrames

If we see these attribute names we assume they are as defined for BED format or, in the case of exonFrames, for genePredExt format. This has worked so far except for this special case. I need to dig more deeply into this to determine what IGV will do with the negative blockStarts, so I opened this ticket for myself. There's no implication that anything is incorrect in the files; the values are just unexpected.
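
To make the name-based approach concrete, here is a minimal sketch. It is an illustration, not the IGV source: `select_extra_fields` is a hypothetical helper, and the column names and row values in the example are made up rather than copied from the autoSql file.

```python
# The four extra-field names listed above.
RECOGNIZED_EXTRA_FIELDS = {"blockCount", "blockSizes", "blockStarts", "exonFrames"}


def select_extra_fields(field_names, values, defined_field_count):
    """Return the recognized extra fields of one record as a name -> value dict.

    Columns before defined_field_count are standard BED fields and are skipped;
    columns after it are kept only if their autoSql name is recognized.
    """
    extras = {}
    columns = list(zip(field_names, values))
    for name, value in columns[defined_field_count:]:
        if name in RECOGNIZED_EXTRA_FIELDS:
            extras[name] = value
    return extras


# Illustrative column names for a BED9 + extras layout.
names = ["chrom", "chromStart", "chromEnd", "name", "score", "strand",
         "thickStart", "thickEnd", "reserved",
         "blockCount", "blockSizes", "blockStarts", "id", "description"]
row = ["chr1", "1000", "1100", "L1", "0", "+", "1000", "1100", "0",
       "2", "10,20,", "0,50,", "7", "example"]

extras = select_extra_fields(names, row, defined_field_count=9)
```

Unrecognized extra columns are ignored here, so a format with other extra fields still parses as plain BED.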

@maximilianh

maximilianh commented Oct 12, 2022 via email

@jrobinso
Contributor Author

jrobinso commented Oct 12, 2022

@maximilianh Yes, I understand, and fieldCount is used. However, it's necessary to parse some of the extra fields to support IGV tracks; otherwise the "big" versions of formats such as genePredExt/bigGenePred and rmsk/bigRmsk will be a regression from the user's point of view. For example, for rmsk we will not be able to draw the blocks, and for bigGenePred we will not be able to do protein translation without "exonFrames". This is not a problem in general; this special case of unexpected negative blockStarts just needs to be handled. It is not a major problem.

@jrobinso jrobinso self-assigned this Oct 5, 2023
@jrobinso jrobinso added this to the 2.17.0 milestone Oct 5, 2023
@jrobinso jrobinso removed this from the 3.0.0 milestone Nov 15, 2023

2 participants