-
Notifications
You must be signed in to change notification settings - Fork 384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support bigRmsk #1236
Comments
bigRmsk is a perfectly valid BED9. Does IGV not read the BED Type from the bigBed file? The bigBed header stores the number of defined fields (9 here) and so IGV should not even try to parse fields 10-12… |
@maximilianh Perhaps I didn't phrase this correctly, will correct it, yes its perfectly valid but not really useful for display without properly treating the extra fields. In general we read the number of defined fields and don't try to parse extra fields, but for certain special fields we do. This came up previously, and you had indicated (I think) that the defined names (e.g. blockstarts in this case) are never re-defined and if found in an extra field should be safe to use. So when we see "blockstarts" and "blocksizes" in extra fields we use them. This came up, I think, in "bigCat" format but perhaps for different fields. In the bigRmsk case fields 10-12 are neccessary to do a reasonable display. Now that I know that blockstarts can be negative to table name bigRmsk I will just make a special case. So yes I could strictly just look at defined fields but for bigGenePred, bigCat, and bigRmsk this would yield suboptimal displays, e.g. without the blocks. The extra fields are not just comments. |
Another one we look for is
If we see these attribute names we assume they are as defined for bed format, or in the case of exonFrames for genePredExt format. This has worked so far except for this special case, I need to dig more deeply into this to determine what IGV will do with the negative blockStarts, thus I opened this ticket for myself, there's no implication that anything is incorrect in the files. Just unexpected. |
Maybe I didn't explain this well: the header of all bigBed files contains
the number of fields that are "parsable".
It's here in the C code:
https://github.com/ucscGenomeBrowser/kent/blob/34d24ed91f2d7a04467840c6763532345e4c0529/src/lib/bbiRead.c#L116
/* Read rest of defined bits of header, byte swapping as needed. */
bbi->version = udcReadBits16(udc, isSwapped);
bbi->zoomLevels = udcReadBits16(udc, isSwapped);
bbi->chromTreeOffset = udcReadBits64(udc, isSwapped);
bbi->unzoomedDataOffset = udcReadBits64(udc, isSwapped);
bbi->unzoomedIndexOffset = udcReadBits64(udc, isSwapped);
bbi->fieldCount = udcReadBits16(udc, isSwapped);
bbi->fieldCode should be "9" here. And IGV should not parse more than 9
fields as a consequence. This is independent of any field names in the
autoSql definition.
I imagine the javascript parser also reads this field?
…On Wed, Oct 12, 2022 at 6:38 AM Jim Robinson ***@***.***> wrote:
Another one we look for is exonFrames. So the complete set of "extra"
fields current parsed are as follows
blockCount
blockSizes
blockStarts
exonFrames
If we see these attribute names we assume they are as defined for bed
format, or in the case of exonFrames for genePredExt format. This has
worked so far except for this special case, I need to dig more deeply into
this to determine what IGV will do with the negative blockStarts, thus I
opened this ticket for myself, there's no implication that anything is
incorrect in the files. Just unexpected.
—
Reply to this email directly, view it on GitHub
<#1236 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACL4TNZZMIKIYZYZ4QNINTWCY6DXANCNFSM6AAAAAARCSW6T4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@maximilianh Yes I understand, and fieldCount is used. However its necessary to parse some of the extra fields to support IGV tracks, otherwise the "big" versions of formats such as genePredExt/bigGenePred and rmsk/bigRmsk will be a regression from the point of view of the user, for example for rmsk we will not be able to draw the blocks, for bigGenePred we will not be able to do protein translation without "exonFrames" . This is not a problem in general, this special case of unexpected negative blockStarts just needs to be handled. This is not a major problem. |
"bigRmsk" files will currently not work in IGV due to negative block starts. See issue #1130. The format is valid BED9, but we need the extra fields. Fix is to properly interpret the negative block starts.
Autosql: http://genome.ucsc.edu/goldenPath/help/examples/bigRmskBed.as
Also see: http://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlignBed.as
The text was updated successfully, but these errors were encountered: