Store global attributes as HDF5 Datasets #51

Closed
slinnarsson opened this issue May 1, 2018 · 5 comments

@slinnarsson
Contributor

Global (file-level) attributes are currently stored as HDF5 Attributes. However, such attributes are meant to be small (there is no hard limit, but the spec says 16 kB) and cannot be sliced.

It would be useful to be able to store arbitrarily large amounts of data at the global level, such as pickled objects, images, or other supporting data.

Two options

  1. Add a new API for large global objects (say, LoomConnection.blobs) and store them as Datasets (e.g. under /global) in the file. This would retain backwards compatibility but would require maintaining two different APIs that do almost the same thing. New files will use a mixture of old-style and new-style attributes indefinitely. Only new-style global attributes in new files would be invisible when opened using an older library implementation.

  2. Keep the current API but change the Loom file format spec to store global attributes as Datasets (e.g. under /global). Implementors would still need to look for attributes both as HDF5 attributes and as Datasets, to ensure old files would still be readable. New files will use a consistent API and consistent file format. For backwards-compatibility, implementors should write global attributes as HDF5 Attributes (in addition to writing them as Datasets) if they are smaller than 16 kB. Larger global attributes in new files would be invisible when opened using an older library implementation.

I think option 2 is nicer and should be compatible enough.
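
To make option 2 concrete, here is a minimal sketch of the dual read/write logic using h5py. It is only an illustration under stated assumptions, not loompy's implementation: the helper names `write_global` and `read_global`, the constants, and the restriction to numeric values are all hypothetical (string handling is left out, since the datatype spec for strings is still open).

```python
import h5py
import numpy as np

# Assumptions for this sketch (not part of any released loompy API):
GLOBAL_GROUP = "global"      # group name taken from the "/global" suggestion above
MAX_ATTR_BYTES = 16 * 1024   # the 16 kB threshold mentioned above


def write_global(f: h5py.File, name: str, value) -> None:
    """Write a numeric global attribute as a Dataset under /global.

    Values smaller than MAX_ATTR_BYTES are also mirrored as an HDF5
    Attribute so that older readers can still see them.
    """
    arr = np.asarray(value)
    grp = f.require_group(GLOBAL_GROUP)
    if name in grp:
        del grp[name]  # replace any previous Dataset of the same name
    grp.create_dataset(name, data=arr)
    if arr.nbytes < MAX_ATTR_BYTES:
        f.attrs[name] = arr
    elif name in f.attrs:
        del f.attrs[name]  # drop a stale mirror that no longer matches


def read_global(f: h5py.File, name: str):
    """Read a global attribute, preferring the Dataset under /global.

    Falls back to the HDF5 Attribute so that old files remain readable.
    """
    if GLOBAL_GROUP in f and name in f[GLOBAL_GROUP]:
        return f[GLOBAL_GROUP][name][()]
    return f.attrs[name]
```

With these helpers, a small value ends up both in `f.attrs` and under `/global`, while a large one exists only as a Dataset and would therefore be invisible to an older reader, which matches the trade-off described in option 2.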

@mschilli87

@slinnarsson

> I think option 2 is nicer and should be compatible enough.

👍

@slinnarsson
Contributor Author

Also, straighten out the spec for datatypes of global attributes (especially strings and arrays of strings).

@nh3

nh3 commented Apr 12, 2019

Hi @slinnarsson, with respect to scverse/anndata#116, is the feature mentioned in this issue available as some API? Even in some unreleased form it would be very useful, as I could start porting my conversion layer to use it. Many thanks!

@slinnarsson
Contributor Author

Not yet, but I'll work on it. I think it will soon be time for a loompy 3 release, which will make it possible to change the file spec.

@slinnarsson
Contributor Author

Fixed in loompy3.0 branch
