Store global attributes as HDF5 Datasets #51
Comments
👍
Also, straighten out the spec for datatypes of global attributes (especially strings and arrays of strings).
Hi @slinnarsson, with respect to scverse/anndata#116, is the feature mentioned in this issue available through some API? Even in an unreleased form it would be very useful, as I could start porting my conversion layer to use it. Many thanks!
Not yet, but I'll work on it. I think it will soon be time for a loompy 3 release, which will make it possible to change the file spec.
Fixed in loompy3.0 branch
Global (file-level) attributes are currently stored as HDF5 Attributes. However, such attributes are meant to be small (there is no hard limit, but the spec says 16 kB) and cannot be sliced.
It would be useful to be able to store arbitrarily large amounts of data at the global level, such as pickled objects, images, or other supporting data.
Two options:

1. Add a new API for large global objects (say, `LoomConnection.blobs`) and store them as Datasets (e.g. under `/global`) in the file. This would retain backwards compatibility but would require maintaining two different APIs that do almost the same thing. New files will use a mixture of old-style and new-style attributes indefinitely. Only new-style global attributes in new files would be invisible when opened using an older library implementation.
2. Keep the current API but change the Loom file format spec to store global attributes as Datasets (e.g. under `/global`). Implementors would still need to look for attributes both as HDF5 Attributes and as Datasets, to ensure old files would still be readable. New files will use a consistent API and consistent file format. For backwards compatibility, implementors should write global attributes as HDF5 Attributes (in addition to writing them as Datasets) if they are smaller than 16 kB. Larger global attributes in new files would be invisible when opened using an older library implementation.

I think option 2 is nicer and should be compatible enough.
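To make the read and write rules of option 2 concrete, here is a minimal sketch of the dual-write logic. It is not loompy code: `FakeLoomFile`, `write_global`, and `read_global` are hypothetical names, and plain dicts stand in for the HDF5 Attributes and for the Datasets under `/global`, so the example runs without an HDF5 library. Payloads are assumed to be raw bytes for simplicity.

```python
from dataclasses import dataclass, field

# 16 kB threshold taken from the proposal above.
LEGACY_LIMIT = 16 * 1024


@dataclass
class FakeLoomFile:
    """Hypothetical stand-in for an open Loom (HDF5) file."""
    attrs: dict = field(default_factory=dict)     # models root-level HDF5 Attributes
    datasets: dict = field(default_factory=dict)  # models Datasets under /global


def write_global(f: FakeLoomFile, name: str, payload: bytes) -> None:
    # New-style storage: every global attribute is written as a Dataset.
    f.datasets[name] = payload
    if len(payload) < LEGACY_LIMIT:
        # Small values are mirrored as Attributes so older readers still see them.
        f.attrs[name] = payload
    else:
        # Too large for an Attribute: drop any stale legacy copy.
        f.attrs.pop(name, None)


def read_global(f: FakeLoomFile, name: str) -> bytes:
    # Prefer the Dataset; fall back to the Attribute for old-style files.
    if name in f.datasets:
        return f.datasets[name]
    return f.attrs[name]


# A new-style file with one small and one large global attribute.
f = FakeLoomFile()
write_global(f, "title", b"my dataset")
write_global(f, "image", b"\x00" * (64 * 1024))

# An old-style file that only has HDF5 Attributes.
legacy = FakeLoomFile(attrs={"description": b"old file"})

print(sorted(f.attrs))                     # names visible to an older reader
print(read_global(legacy, "description"))  # old files stay readable
```

In this sketch an older reader, which only looks at `attrs`, would see `title` but not `image`, which matches the compatibility trade-off described for option 2.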