bioformats2raw metadata support #174

joshmoore · 2022-03-02T19:25:33Z

Add Implicit spec to loop over metadata-less "collections"
Add Leaf & Root specs
Support entrypoint-based specs ("ome_zarr.spec")
Use entrypoint to add support for bioformats2raw.layout ngff#112
add tests for SHOULD/MAY portions of the spec

Part of the investigation of metadata in ome/ngff#104. This "implicit" group is the cheapest form of collection imaginable.

Currently, only groups within the given group (and not arrays or explicit files) will be further parsed.
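As a rough illustration of that rule (not the code in this PR), the sketch below lists only the immediate child directories that look like Zarr v2 groups, i.e. that contain a `.zgroup` file, and skips arrays and plain files. The helper name `child_groups` is hypothetical.

```python
import tempfile
from pathlib import Path


def child_groups(group_dir):
    """Return the names of immediate subdirectories that are Zarr v2 groups.

    Hypothetical helper: a directory counts as a group if it holds a
    `.zgroup` file. Arrays (`.zarray`) and loose files are ignored.
    """
    group_dir = Path(group_dir)
    return sorted(
        child.name
        for child in group_dir.iterdir()
        if child.is_dir() and (child / ".zgroup").exists()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "images").mkdir()
        (root / "images" / ".zgroup").write_text("{}")  # a group: kept
        (root / "raw").mkdir()
        (root / "raw" / ".zarray").write_text("{}")  # an array: skipped
        (root / "notes.txt").write_text("not zarr")  # a file: skipped
        print(child_groups(root))
```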

codecov · 2022-03-02T19:31:55Z

Codecov Report

Patch coverage: 77.41% and project coverage change: -0.89 ⚠️

Comparison is base (8964374) 84.79% compared to head (28155d5) 83.90%.

❗ Current head 28155d5 differs from pull request most recent head 836dfd2. Consider uploading reports for the commit 836dfd2 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #174      +/-   ##
==========================================
- Coverage   84.79%   83.90%   -0.89%     
==========================================
  Files          13       14       +1     
  Lines        1473     1591     +118     
==========================================
+ Hits         1249     1335      +86     
- Misses        224      256      +32

Impacted Files	Coverage Δ
ome_zarr/reader.py	`83.52% <65.38%> (-3.18%)`	⬇️
ome_zarr/bioformats2raw.py	`86.11% <86.11%> (ø)`

... and 9 files with indirect coverage changes

Pointing ome_zarr at a non-root node will now walk-up the hierarchy until the root node is discovered.
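A minimal sketch of what "walking up" could look like on a local filesystem, assuming Zarr v2 layout where every group directory contains a `.zgroup` file. The function `find_root_group` is hypothetical and is not the implementation in this PR (the upward parsing was later reverted, see the 2022-04-13 comment).

```python
import tempfile
from pathlib import Path


def find_root_group(start):
    """Climb parent directories for as long as the parent is itself a group.

    Hypothetical helper: the "root" is the highest ancestor that still
    contains a `.zgroup` file.
    """
    current = Path(start).resolve()
    while current.parent != current and (current.parent / ".zgroup").exists():
        current = current.parent
    return current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "a.ome.zarr"
        leaf = root / "0" / "test"
        leaf.mkdir(parents=True)
        for group in (root, root / "0", leaf):
            (group / ".zgroup").write_text("{}")
        print(find_root_group(leaf))
```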

This allows the definition of node specs outside of this repository. Primary driver is for experimenting with specs which depend on ome_types, etc. These may eventually be pulled into the mainline.

joshmoore · 2022-03-03T21:04:31Z

See https://github.com/ome/ome-zarr-metadata/releases/tag/0.1.0 for an example of an entrypoint. After creating a fake .zgroup under the output of bioformats2raw a.fake /tmp/a.ome.zarr, the info command prints:

$ ome_zarr info /tmp/a.ome.zarr/0/test/
/private/tmp/a.ome.zarr/0/test [zgroup]
 - metadata
   - Implicit (1)
   - Leaf (2)
 - data
/private/tmp/a.ome.zarr/0 [zgroup]
 - metadata
   - Multiscales
   - Leaf (2)
 - data
   - (1, 1, 1, 512, 512)
   - (1, 1, 1, 256, 256)
/private/tmp/a.ome.zarr [zgroup]
 - metadata
   - bioformats2raw (3)
   - Root (2)
 - data

Notice:

the Implicit spec scans groups that have no other metadata
Leaf/Root work their way up and back down a hierarchy
bioformats2raw reads OME/METADATA.ome.xml
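To make the last point concrete, here is a stdlib-only sketch of pairing series directories with Image entries in an OME-XML document. It assumes the 2016-06 OME schema namespace and that series directories are named 0, 1, ... in the same order as the Image elements. The PR itself relies on ome-types rather than this hand-rolled parsing, and `series_names` and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the 2016-06 OME-XML schema.
OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}

# Tiny stand-in for the contents of OME/METADATA.ome.xml.
SAMPLE_XML = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="first"/>
  <Image ID="Image:1" Name="second"/>
</OME>"""


def series_names(ome_xml_text):
    """Map assumed series directory names ("0", "1", ...) to Image names."""
    root = ET.fromstring(ome_xml_text)
    images = root.findall("ome:Image", OME_NS)
    return {str(idx): image.get("Name") for idx, image in enumerate(images)}


if __name__ == "__main__":
    for series, name in series_names(SAMPLE_XML).items():
        print(series, name)
```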

will-moore · 2022-03-23T10:23:54Z

ome_zarr/reader.py

+            found.append(Well(self))
+            self.specs.append(found[-1])
+
+        for key, value in entrypoints.get_group_named("ome_zarr.spec").items():


What does this do? I see no file named "ome_zarr.spec" and I've not used entrypoints before. I don't get much idea from https://github.com/takluyver/entrypoints as to what it does, only that "This package is in maintenance-only mode. New code should use the importlib.metadata module".

What does this do?

It's essentially a namespace for the particular entrypoint. It's used at
https://github.com/ome/ome-zarr-metadata/blob/main/setup.cfg#L81
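For readers new to entry points: a package declares a named group in its packaging metadata, and any installed package can register objects under that group for others to discover at runtime. Below is a hedged sketch of the lookup using the standard-library `importlib.metadata` (the replacement the entrypoints README recommends) instead of the `entrypoints` package used in the diff. The helper `load_specs` is hypothetical, and the group will simply be empty unless a package registering `ome_zarr.spec` is installed.

```python
from importlib.metadata import entry_points


def load_specs(group="ome_zarr.spec"):
    """Return {entry point name: loaded object} for one entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable collection of entry points.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: plain dict mapping group name to entry points.
        selected = eps.get(group, [])
    # ep.load() imports the module and returns the referenced object.
    return {ep.name: ep.load() for ep in selected}


if __name__ == "__main__":
    for name, spec in load_specs().items():
        print(name, spec)
```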

will-moore · 2022-03-23T10:53:55Z

For an Image in a Plate (12 Wells A-C, 1-4, all wells with labels), without this PR I get:

$ ome_zarr info 251.zarr/A/1/0/
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0 [zgroup]
 - metadata
   - Multiscales
   - OMERO
 - data
   - (3, 1024, 1344)
   - (3, 512, 672)
   - (3, 256, 336)
   - (3, 128, 168)
   - (3, 64, 84)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels [zgroup] (hidden)
 - metadata
   - Labels
 - data
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels/0 [zgroup] (hidden)
 - metadata
   - Label
   - Multiscales
 - data
   - (1, 1024, 1344)
   - (1, 512, 672)
   - (1, 256, 336)
   - (1, 128, 168)
   - (1, 64, 84)
   - (1, 32, 42)

and with this PR I get all the sibling A Wells A2, A3, A4, but not B1-B4 or C1-C4. And I don't get labels for those Wells.

$ ome_zarr info 251.zarr/A/1/0/
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0 [zgroup]
 - metadata
   - Multiscales
   - OMERO
   - Leaf
 - data
   - (3, 1024, 1344)
   - (3, 512, 672)
   - (3, 256, 336)
   - (3, 128, 168)
   - (3, 64, 84)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels [zgroup] (hidden)
 - metadata
   - Labels
   - Leaf
 - data
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels/0 [zgroup] (hidden)
 - metadata
   - Label
   - Multiscales
   - Leaf
 - data
   - (1, 1024, 1344)
   - (1, 512, 672)
   - (1, 256, 336)
   - (1, 128, 168)
   - (1, 64, 84)
   - (1, 32, 42)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A [zgroup]
 - metadata
   - Implicit
   - Leaf
 - data
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/2 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/3 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/4 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr [zgroup]
 - metadata
   - Plate
   - Root
 - data
   - (3, 768, 1344)

Without this PR, napari 251.zarr/A/1/0/ gives me just the 1 image + labels:

With this PR, I get everything as for 'info' above: All the A wells, but only labels for A1:

joshmoore · 2022-04-13T10:56:16Z

and with this PR I get all the sibling A Wells A2, A3, A4, but not B1-B4 or C1-C4. And I don't get labels for those Wells.

@will-moore, do you approve of getting the siblings (assuming napari can be fixed)? If so, I'll look into why it's only for the one row.

Also, where do you get the labels for this plate? Is this the one you had a script for?

joshmoore · 2022-04-13T13:00:05Z

@will-moore, I've reverted the upwards parsing. It seemed like a good strategy but there are currently too many edge cases. I don't have labels on plates for testing at the moment, but I think with the current state along with ome/ome-zarr-metadata@08e12f7#diff-0bb17e0ecb4ac83835ee3800a1af71a12f644b0ce782c623ba97f8917916250eR54 all the following should be true:

| | non-bf2raw | bf2raw |
|---|---|---|
| HCS | unchanged | unchanged |
| non-HCS | unchanged | now loads all images |

The only other change I can think of is if you pass a group that previously did nothing, it will likely try to load the contents.

joshmoore · 2022-04-18T13:35:54Z

In discussing today with @dgault, @sbesson, @jburel and @melissalinkert, there was a case made for at least adding the flag (Leaf) to make it possible for clients to detect that there is more information that needs loading. Additional methods or parameters should then allow that loading.

joshmoore · 2022-04-18T15:14:43Z

To improve the codecov results, see https://github.com/zarr-developers/numcodecs/pull/300/files#diff-bc37cd9860eec1facdc18a47798e8a1a2c0ef5dabd999deee049de4a48a5d35fR1 for an option of in-repo testing of entrypoints.

will-moore · 2022-04-27T15:23:08Z

@joshmoore To help address the "don't have labels on plates for testing", I created https://gist.github.com/will-moore/0f4cb6b1fdd60a255fcbb956a54a645e which adds labels to a plate (currently assumes images axes are cyx) by segmenting one of the channels.

I don't know if I'm missing something, maybe not using ome_zarr properly, but it feels quite manual to e.g. iterate through Wells on a Plate - manually parsing JSON, joining paths etc and parse_url() for every Well and every Image.
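For context, the manual traversal described above looks roughly like the sketch below. It reads the `.zattrs` JSON directly from a local directory and assumes the usual HCS layout, where the plate lists `wells` by `path` and each well lists `images` by `path`. The helper names and the toy plate are hypothetical and stand in for real data. No ome_zarr calls are used.

```python
import json
import tempfile
from pathlib import Path


def read_attrs(group_dir):
    """Load the .zattrs JSON of a Zarr v2 group stored on local disk."""
    return json.loads((Path(group_dir) / ".zattrs").read_text())


def iter_plate_images(plate_dir):
    """Yield (well path, image path) pairs by walking plate -> well -> image."""
    plate_dir = Path(plate_dir)
    for well in read_attrs(plate_dir)["plate"]["wells"]:
        well_dir = plate_dir / well["path"]
        for image in read_attrs(well_dir)["well"]["images"]:
            yield well["path"], image["path"]


def make_toy_plate(plate_dir, well_paths=("A/1", "A/2")):
    """Write just enough metadata to exercise iter_plate_images (no pixels)."""
    plate_dir = Path(plate_dir)
    plate_dir.mkdir(parents=True, exist_ok=True)
    wells = [{"path": path} for path in well_paths]
    (plate_dir / ".zattrs").write_text(json.dumps({"plate": {"wells": wells}}))
    for path in well_paths:
        well_dir = plate_dir / path
        well_dir.mkdir(parents=True)
        well_attrs = {"well": {"images": [{"path": "0"}]}}
        (well_dir / ".zattrs").write_text(json.dumps(well_attrs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plate = Path(tmp) / "plate.zarr"
        make_toy_plate(plate)
        for well_path, image_path in iter_plate_images(plate):
            print(well_path, image_path)
```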

joshmoore · 2022-05-04T09:22:35Z

see a quick use of this functionality:

joshmoore · 2022-09-15T10:03:01Z

Migrated the bf2raw implementation from https://github.com/ome/ome-zarr-metadata :

$ bioformats2raw-0.5.0-SNAPSHOT/bin/bioformats2raw 'my&series=2.fake' test_output
$ ome_zarr info test_output/
/opt/ome-zarr-py/test_output [zgroup]
 - metadata
   - bioformats2raw
 - data
/opt/ome-zarr-py/test_output/0 [zgroup]
 - metadata
   - Multiscales
 - data
   - (1, 1, 1, 512, 512)
   - (1, 1, 1, 256, 256)
/opt/ome-zarr-py/test_output/1 [zgroup]
 - metadata
   - Multiscales
 - data
   - (1, 1, 1, 512, 512)
   - (1, 1, 1, 256, 256)

Initially prepared as a plugin in https://github.com/ome/ome-zarr-metadata, this is being moved to the main repository since it is a part of the main spec.

joshmoore · 2022-09-15T12:41:57Z

ome_zarr/bioformats2raw.py

+                    _logger.info("found %s", series)
+                    subnode = node.add(series)
+                    if subnode:
+                        subnode.metadata["ome-xml:index"] = idx


This is probably the biggest lingering question I have since this basically becomes public API e.g. used in napari-ome-zarr. Thoughts welcome. cc: @will-moore @sbesson et al.

I wonder where this public API is documented?
This new module should be added to https://github.com/ome/ome-zarr-py/blob/master/docs/source/api.rst but I think these changes in parsing also need to be described elsewhere, maybe at https://ome-zarr.readthedocs.io/en/stable/python.html#reading-ome-ngff-images

Big 👍 but first do you think what's there is usable, understandable, extensible, etc.?

will-moore · 2022-09-15T14:29:55Z

ome_zarr/bioformats2raw.py

+                    subnode = node.add(series)
+                    if subnode:
+                        subnode.metadata["ome-xml:index"] = idx
+                        subnode.metadata["ome-xml:image"] = image


If you add the following here, then you don't need ome/napari-ome-zarr#47

subnode.metadata["name"] = image.name

But then plugins would/could overwrite the main metadata, right? Which means it becomes a question of the order that the plugins run in. Would:

subnode.plugins["bioformats2raw"].metadata["name"]

be better?

As probably transpired from #140, I am also unclear on the contract and expectation for the node.metadata field. My impression was that this API has been largely driven by the napari use-case. Reading this, it looks like another goal is to introduce some form of metadata consolidation so that a consumer does not have to go up and down the hierarchy to assemble pieces of metadata, is that correct?

If so, I agree what is missing is some form of namespace to differentiate the metadata injected at different levels in the hierarchy. Taking advantage of the fact this is a Python dictionary, another proposal would be to group all the metadata under a top-level key e.g. subnode.metadata["bioformats2raw"] = {"index": idx, "image": image, "name": image.name}
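The ordering concern can be shown with plain dictionaries. This is only a sketch: the writer names and values are invented, and neither helper exists in ome_zarr. With flat keys the last writer wins, so the result depends on run order. With one top-level key per writer, both values survive regardless of order.

```python
def apply_flat(metadata, updates_in_order):
    """Every writer updates the same flat dict: later writers overwrite."""
    for _writer, values in updates_in_order:
        metadata.update(values)
    return metadata


def apply_namespaced(metadata, updates_in_order):
    """Every writer gets its own top-level key, so nothing is overwritten."""
    for writer, values in updates_in_order:
        metadata.setdefault(writer, {}).update(values)
    return metadata


# Hypothetical writers that both want to set "name" on the same node.
UPDATES = [
    ("multiscales", {"name": "0"}),
    ("bioformats2raw", {"name": "my series", "index": 0}),
]

if __name__ == "__main__":
    print(apply_flat({}, UPDATES)["name"])
    print(apply_flat({}, list(reversed(UPDATES)))["name"])
    print(apply_namespaced({}, UPDATES))
```

The cost of the namespaced form is the one raised in this thread: a consumer has to know which key to look under to find the image name.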

Reading this, it looks like another goal is to introduce some form of metadata consolidation so that a consumer does not have to go up and down the hierarchy to assemble pieces of metadata, is that correct?

Maybe. But now that we have the json-schema, I could also see just attaching objects at the right node level to prevent re-reading the file. That's essentially what would happen if the ome-types OME object becomes part of the bioformats2raw plugin's public API.

In my mind I might expect something like Josh's original idea or subnode.metadata["ome.xml"] = {"index": idx, "image": image, "name": image.name} since this is metadata coming from the ome.xml?

The reason I like subnode.metadata["name"] = image.name is that any consumer doesn't have to know about bioformats2raw or xml etc to have the correct image name.
But if it's just napari-ome-zarr and we already have ome/napari-ome-zarr#47 then 👍

is that any consumer doesn't have to know about bioformats2raw or xml etc to have the correct image name.

This possibly gets back to the question of whether or not this bf2raw parsing is core or not. If it is, then you're probably right that we should just encode the rules for what "wins" directly. If this is a plugin, though, then how would we handle misbehavior in the plugins?

What is a plugin here? If it's a plugin, then it's not installed by default and you get to choose if you want to add it?

This possibly gets back to the question of whether or not this bf2raw parsing is core or not

Trying to answer this, this is where I would draw the line:

if a specification is published in https://ngff.openmicroscopy.org/, I consider it as core (even if transitional) and my expectation is that this library should support it

a contrario, any specification not defined in the upstream specification should be seen as an extension and should be handled by some plugin/third-party mechanism. This might drive the discussions on the extensibility of this library which is a good thing.

joshmoore · 2022-09-22T06:44:03Z

Should we also discuss the name of the module itself?

will-moore · 2022-09-22T09:29:05Z

We added an omero block of channel & rendering metadata to the multiscale .zattrs (because it came from omero) but we actually want other tools to read and write this metadata, which may be discouraged by the naming.
In the same way, bioformats2raw.layout is a spec that just happens to be produced originally by bioformats2raw, but it's really a spec that ALL tools should read/write.
I don't know if it's too late to think about a different name there, or if the name has already stuck?

joshmoore · 2022-09-22T10:20:59Z

Other than the string bioformats2raw.layout we're pretty free to change things here. (I'd say we definitely don't want to reproduce what we did with omero and we actually need to think about how to make that "transitional" as well)

will-moore · 2022-09-22T10:46:01Z

Ah - yes, too late to change the "bioformats2raw.layout" key because data generated with this already exists.

imagesc-bot · 2022-09-28T18:25:03Z

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/intermission-ome-ngff-0-4-1-bioformats2raw-0-5-0-et-al/72214/1

imagesc-bot · 2023-08-30T10:44:10Z

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/saving-volumetric-data-with-voxel-size-colormap-annotations/85537/24

Add Implicit spec to loop over metadata-less "collections"

d132636

joshmoore mentioned this pull request Mar 2, 2022

OME Metadata Support ome/ngff#104

Open

2 tasks

joshmoore added 2 commits March 3, 2022 14:40

Add Leaf & Root specs

7470587

Pointing ome_zarr at a non-root node will now walk-up the hierarchy until the root node is discovered.

Add support for entrypoint ome_zarr.spec

8a4603e

This allows the definition of node specs outside of this repository. Primary driver is for experimenting with specs which depend on ome_types, etc. These may eventually be pulled into the mainline.

joshmoore changed the title ~~Add Implicit spec to loop over metadata-less "collections"~~ New spec parsers for metadata support Mar 3, 2022

Also add entrypoints to requirements-dev.txt

02f88c9

will-moore reviewed Mar 23, 2022

joshmoore added 2 commits April 13, 2022 13:36

Minor doc improvements

9824efe

Disable Root/Leaf specifiations

6ae6355

Load all requirements for the zarr-dev build

9ad2e5a

joshmoore force-pushed the implicit branch 2 times, most recently from fd64b1d to 9ad2e5a Compare May 3, 2022 11:01

joshmoore mentioned this pull request Sep 15, 2022

bioformats2raw.layout ome/ngff#112

Merged

joshmoore added 2 commits September 15, 2022 12:03

Migrate bioformats2raw spec

42065e5

Initially prepared as a plugin in https://github.com/ome/ome-zarr-metadata this is being moved to the main repository since it is a part of the main spec.

Add ome-types dependency

ed464ac

joshmoore commented Sep 15, 2022

joshmoore mentioned this pull request Sep 15, 2022

Store metadata for use in napari ome/ome-zarr-metadata#1

Closed

joshmoore added 2 commits September 15, 2022 14:52

Remove traces of ome_zarr_metadata

2344751

Add ome-types for pre.yml

8735307

will-moore reviewed Sep 15, 2022

joshmoore added 3 commits September 22, 2022 08:42

Store series information

d42b530

Add tests

4a98669

Merge 'origin/master' into implicit

e3d7364

joshmoore and others added 5 commits September 22, 2022 08:48

Add placeholder for bf2raw reading in docs

5c86762

[pre-commit.ci] auto fixes from pre-commit.com hooks

d954ba2

for more information, see https://pre-commit.ci

Fix build issues

5105916

Update documentation

28155d5

Attempt to fix pre.yml

eb12b2b

joshmoore changed the title ~~New spec parsers for metadata support~~ bioformats2raw metadata support Oct 7, 2022

will-moore mentioned this pull request Nov 10, 2022

Support bioformats2raw.layout ome/napari-ome-zarr#71

Open

joshmoore and others added 3 commits December 21, 2022 09:01

TMP implicit

6962d49

Merge remote-tracking branch 'origin/master' into implicit

08b14b8

[pre-commit.ci] auto fixes from pre-commit.com hooks

836dfd2

for more information, see https://pre-commit.ci

joshmoore added a commit to joshmoore/ome-zarr-py that referenced this pull request May 3, 2023

Update zarr-dev.yml from ome#174

fa5021c

joshmoore mentioned this pull request May 3, 2023

Zarr dev build #275

Open

joshmoore added this to the 0.8.0 milestone May 3, 2023
