Zarr Metadata for Data Reading / Writing #5

Closed
camFoltz opened this issue Feb 1, 2021 · 0 comments

camFoltz commented Feb 1, 2021

As we discussed in the deployment meeting, we are moving towards storing data in zarr or ome-zarr format. As we start to think about optimizing storage, we should also begin thinking about how to structure the subgroups of zarr datastores so that a data reader/writer module can load data into the correct place.

For example, Ivan has different Stokes data stored as subgroups in a larger datastore that corresponds to all of the data for a single FOV. These subgroups can have names such as "Stokes_1_FOV1", and to load this array you have to specify the name of the subgroup. For a data reader, we would need a way to standardize the naming format of these subgroups, or to assign subgroup attributes, so that we can scan through a datastore and find the correct subgroups to load.

I suggest that we come up with a standardized way of assigning attributes to these subgroups, so that we do not necessarily need to adhere to strict naming conventions. For example, if we are saving computed Stokes channels as different subgroups in a datastore, where the structure is store/Stokes/Stokes0_zstack and the array is Stokes0_zstack, we can assign attributes in the following way:

import zarr

# Open the existing datastore for a single FOV
store_path = '/home/camfoltz2/Stokes_FOV1.zarr'
store = zarr.open(store_path)

# Label the array with its Stokes type
store['Stokes']['Stokes0_zstack'].attrs['Type'] = 'S0'

Calling .attrs['Type'] = 'S0' creates an attribute on the array named 'Type' and sets it to 'S0'. You can then search through this dictionary of attributes when loading data. Attribute values do not have to be strings; they can also be lists and numbers (zarr stores attributes as JSON, so anything JSON-serializable works).
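To make the "search through attributes when loading" idea concrete, here is a minimal sketch of what a reader-side lookup could look like. The function name `find_arrays_by_attr` is hypothetical and not part of zarr. It only assumes the group behaves like a zarr group, meaning `arrays()` and `groups()` yield (name, object) pairs and each array exposes a dict-like `.attrs`.

```python
def find_arrays_by_attr(group, key, value, prefix=""):
    """Recursively collect (path, array) pairs whose attrs[key] == value.

    `group` is assumed to behave like a zarr group: `arrays()` and `groups()`
    yield (name, object) pairs, and every array has a dict-like `.attrs`.
    """
    matches = []
    # Check the arrays stored directly in this group.
    for name, array in group.arrays():
        if key in array.attrs and array.attrs[key] == value:
            matches.append((prefix + name, array))
    # Descend into subgroups, extending the path prefix as we go.
    for name, subgroup in group.groups():
        matches.extend(
            find_arrays_by_attr(subgroup, key, value, prefix + name + "/")
        )
    return matches
```

With the store from the example above, `find_arrays_by_attr(store, 'Type', 'S0')` would be expected to return the `Stokes/Stokes0_zstack` array regardless of how the subgroup or array happens to be named.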

I think a good idea here would be to standardize the attributes of all of our data and have the data io module assign these standardized attributes to arrays upon saving. @bryantChhun and others, what do you think? I think we will need some type of data labeling as we move to the zarr storage format.
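As a starting point for discussion, a writer-side helper might validate attributes against a shared schema before attaching them. Everything below is an assumption for illustration: the schema contents (`Type`, `FOV`, `Channels`), the `REQUIRED_ATTRS` set, and the names `check_attrs` and `tag_array` are placeholders, not an agreed standard or an existing API.

```python
import json

# Hypothetical schema: attribute name -> expected Python type.
STANDARD_ATTRS = {"Type": str, "FOV": int, "Channels": list}
# Hypothetical set of attributes every saved array must carry.
REQUIRED_ATTRS = {"Type"}


def check_attrs(attrs):
    """Raise if `attrs` does not follow the (hypothetical) standard schema."""
    missing = REQUIRED_ATTRS - attrs.keys()
    if missing:
        raise KeyError(f"missing required attributes: {sorted(missing)}")
    for key, value in attrs.items():
        if key not in STANDARD_ATTRS:
            raise KeyError(f"unknown attribute: {key!r}")
        if not isinstance(value, STANDARD_ATTRS[key]):
            raise TypeError(f"{key!r} should be {STANDARD_ATTRS[key].__name__}")
    # zarr stores attributes as JSON, so the values must be serializable.
    json.dumps(attrs)


def tag_array(array, **attrs):
    """Validate `attrs`, then write them to `array.attrs` (dict-like, as in zarr)."""
    check_attrs(attrs)
    array.attrs.update(attrs)
    return array
```

The io module could then call something like `tag_array(store['Stokes']['Stokes0_zstack'], Type='S0', FOV=1)` right after writing an array, so a typo in an attribute name fails at save time instead of silently producing data the reader cannot find.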
