Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add spec metadata #140

Closed
wants to merge 4 commits into from
Closed

Add spec metadata #140

wants to merge 4 commits into from

Conversation

sbesson
Copy link
Member

@sbesson sbesson commented Nov 16, 2021

While trying to audit the existing NGFF samples, I tried to extend the ome_zarr info command to start displaying additional information such as version or axes.

As it stands, the ome_zarr.reader module iterates through nodes, discovers specifications and register part of the metadata under the Node.metadata dictionary. My understanding is that the specification of this metadata dictionary has been primarily driven by the napari use case (as the napari plugin used to be part of this repository). The metadata is appending , sometimes as key/value pairs(

node.metadata["axes"] = axes
), sometimes as a nested dictionary (
node.metadata.update({"metadata": {"plate": self.plate_data}})
), sometimes as a combination of both (
node.metadata.update(
{
"visible": node.visible,
"name": name,
"color": colors,
"metadata": {"image": self.lookup("image", {}), "path": name},
)

This dictionary cannot be used in its current form for a generic introspection of OME-Zarr dataset e.g. by ome_zarr info as the keys are not defined and the relationship to the specification is lost.

857f4fb proposes one approach to solve this problem i.e. defining metadata at the Spec level rather than at the Node level. Together with 857f4fb, a proof of concept can be run against the existing samples:

(base) sbesson@ls30630:ome-zarr-py (spec_metadata) $ venv/bin/ome_zarr info https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/9836842.zarr
WARNING:ome_zarr.io:version mismatch: detected:FormatV01, requested:FormatV03
WARNING:ome_zarr.io:version mismatch: detected:FormatV03, requested:FormatV01
https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/9836842.zarr/ [zgroup]
 - metadata
   - Multiscales
     - version: 0.1
     - axes: ('t', 'c', 'z', 'y', 'x')
   - OMERO
     - version: 0.1
 - data
   - (1, 4, 1, 1920, 1920)
   - (1, 4, 1, 960, 960)
   - (1, 4, 1, 480, 480)
   - (1, 4, 1, 240, 240)

An alternative approach would be to keep using the Node.metadata dictionary with a self-describing structure e.g.

>>> metadata
{'multiscales': {'version': '0.1', 'axes': "('t', 'c', 'z', 'y', 'x')"}, 'omero': {'version': '0.1'}}

Independently of the chosen solution, I think we want to ensure some consistent metadata storage and migrate the logic of translating this metadata into the structure expected by napari into the napari-ome-zarr plugin.

@codecov
Copy link

codecov bot commented Nov 16, 2021

Codecov Report

Merging #140 (42fb4a7) into master (9a77877) will increase coverage by 0.02%.
The diff coverage is 80.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #140      +/-   ##
==========================================
+ Coverage   71.04%   71.07%   +0.02%     
==========================================
  Files          11       11              
  Lines        1126     1134       +8     
==========================================
+ Hits          800      806       +6     
- Misses        326      328       +2     
Impacted Files Coverage Δ
ome_zarr/reader.py 59.02% <75.00%> (+0.11%) ⬆️
ome_zarr/utils.py 91.01% <100.00%> (+0.20%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9a77877...42fb4a7. Read the comment docs.

"version", "0.1"
) # should this be matched with Format.version?
datasets = multiscales[0]["datasets"]
# axes field was introduced in 0.3, before all data was 5d
axes = tuple(multiscales[0].get("axes", ["t", "c", "z", "y", "x"]))
if len(set(axes) - axes_values) > 0:
raise RuntimeError(f"Invalid axes names: {set(axes) - axes_values}")
self.metadata["axes"] = axes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the same metadata to self and node feels a bit odd, like it may make other uses downstream less clear. It might point to the internals needing to be simplified/refactored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're correct. Re-reading this I am unsure of what is supposed to be written there. I'll close and we can get back to this when we need it.

@sbesson sbesson closed this Jan 10, 2022
@sbesson sbesson deleted the spec_metadata branch January 12, 2022 10:03
@sbesson sbesson mentioned this pull request Sep 16, 2022
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants