
use declarative representation of the spec #258

Open
d-v-b opened this issue Mar 7, 2023 · 14 comments

@d-v-b

d-v-b commented Mar 7, 2023

I ran into issues viewing my ome-ngff-formatted zarr data via the napari plugin. Napari was exiting with an error emitted from this line of this package, which is a try...except block that catches ANY exception and doesn't handle the content of the exception at all.

Unfortunately, swallowing exceptions without a useful error message made it very hard to debug any problems with my data. I suspect this style of coding is a side-effect of taking a procedural approach to the OME-NGFF metadata (i.e., "valid metadata" means "a bunch of code ran without errors").
I think a much better approach would be to define a python class that represents the structure of the metadata and handles parsing / validation of the input. Such a declarative approach (i.e., "valid metadata" means "the program created an object structurally equivalent to the metadata") can produce useful error messages when a single field fails to validate (in my case, the problem was with the version string, but I had to figure that out manually).

For examples of declarative implementations of the OME-NGFF spec, see iohub and pydantic-ome-ngff. As the author of the second project, I would love to see this functionality under the umbrella of the official OME organization.
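To illustrate the declarative idea, here is a minimal sketch with pydantic. The `Dataset` and `Multiscale` classes below are hypothetical, stripped-down stand-ins for the full models in pydantic-ome-ngff / iohub; the point is only that a bad field produces a targeted error instead of a swallowed exception.

```python
from typing import List
from pydantic import BaseModel, ValidationError

# Hypothetical, simplified models for illustration -- the real classes
# in pydantic-ome-ngff / iohub cover the full OME-NGFF spec.
class Dataset(BaseModel):
    path: str

class Multiscale(BaseModel):
    version: str
    name: str
    datasets: List[Dataset]

good = {"version": "0.4", "name": "image", "datasets": [{"path": "0"}]}
bad = {"version": "0.4", "name": "image", "datasets": "not-a-list"}

Multiscale(**good)  # validates silently

try:
    Multiscale(**bad)
except ValidationError as e:
    # The error names the exact offending field ('datasets') instead of
    # being discarded by a bare try/except.
    print(e)
```

With this style, "valid metadata" literally means "the model object was constructed", and every failure carries the field path that caused it.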

@joshmoore
Member

joshmoore commented Mar 9, 2023

Hi @d-v-b. Thanks for opening this! I'll reiterate the 👍 from my gitter conversation with you as well as the 👍❤️ from the Zarr community meeting last night, and additionally a 💯 to everyone in the community that we find ways to work together on fewer libraries rather than everyone needing to roll their own. (Maybe we need a community meeting ... per language? ... just on that topic.)

I haven't gone through the code in detail but I know we need it. The Java crew that has met at several hackathons similarly started pushing to https://github.com/ome/ome-ngff-java, so without putting too much thought into the completeness of it, here's a list of TODOs that I'd suggest:

@imagesc-bot

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/napari-napari-ome-zarr-plugin-is-not-registered/78482/3

@ziw-liu

ziw-liu commented Mar 20, 2023

Late to the discussion, but happy to move this code upstream for v0.4 too!

@will-moore
Member

Just starting to look at this again, thinking how ome-zarr-py might use pydantic-ome-ngff...

Not being familiar with pydantic yet... it looks like the easiest way to validate when reading an NGFF image is to parse the json?

```python
from ome_zarr.io import parse_url
from ome_zarr.reader import Reader

from pydantic_ome_ngff.v04.multiscales import Multiscale
from pydantic import ValidationError

reader = Reader(parse_url("6001240.zarr"))
nodes = list(reader())
# first node will be the image pixel data
image_node = nodes[0]
ngff_json = image_node.zarr.root_attrs
try:
    Multiscale.parse_obj(ngff_json["multiscales"][0])
except ValidationError as e:
    print("Error found:", e.json())
```

This approach keeps all the pydantic models separate from the ome-zarr-py classes.

But I'm wondering whether we should be thinking about combining them. E.g.

`class Multiscales(Spec):`
and https://github.com/JaneliaSciComp/pydantic-ome-ngff/blob/176e4521acb5b0e4e9d19f3e15e72fedf87789e4/src/pydantic_ome_ngff/v04/multiscales.py#L67 seem like they're kinda equivalent, but I've not looked too closely at how that might work...?

@ziw-liu

ziw-liu commented Apr 21, 2023

> This approach keeps all the pydantic models separate from the ome-zarr-py classes.
>
> But I'm wondering whether we should be thinking about combining them. E.g.

If combining them means that the class for array I/O is a child of a pydantic model class, I think it would generally make things cumbersome, since pydantic models are not designed to be stateful.


@giovp
Contributor

giovp commented Apr 27, 2023

+1 for consolidating models and schema for ome-ngff with pydantic, both https://github.com/JaneliaSciComp/pydantic-ome-ngff and https://github.com/czbiohub/iohub/blob/main/iohub/ngff_meta.py are really great efforts!

@clbarnes

clbarnes commented Jul 9, 2023

> If combining them means that the class for array I/O is a child of a pydantic model class, I think it would generally make things cumbersome, since pydantic models are not designed to be stateful.

The reader/writer class could compose over the pydantic class (e.g. at a .metadata instance variable), or pydantic could be used for validating IO and then the user-facing class could be constructed from pydantic models under the hood.
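The composition pattern could look roughly like this. `MultiscaleMeta` and `ImageReader` are hypothetical names for illustration, not part of ome-zarr-py or pydantic-ome-ngff; the sketch only shows a stateful I/O class holding a validated model at a `.metadata` attribute rather than inheriting from it.

```python
from typing import Any, Dict
from pydantic import BaseModel

# Hypothetical minimal metadata model; real models would cover the spec.
class MultiscaleMeta(BaseModel):
    version: str
    name: str = ""

class ImageReader:
    """Stateful I/O class that composes over a pydantic model
    instead of subclassing it."""
    def __init__(self, attrs: Dict[str, Any]):
        # Validate once at the boundary; runtime state lives alongside,
        # outside the immutable-ish model.
        self.metadata = MultiscaleMeta(**attrs["multiscales"][0])
        self._chunk_cache: Dict[str, Any] = {}

reader = ImageReader({"multiscales": [{"version": "0.4", "name": "demo"}]})
print(reader.metadata.name)
```

This keeps validation at the I/O boundary while leaving the reader free to carry caches, handles, and other state that a pydantic model is not designed to hold.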

@yarikoptic
Contributor

cool kidz are also drinking some https://linkml.io Kool-Aid these days, as an even higher-level semantic description of the model than pydantic, and then producing pydantic and whatnot serializations/exports.

@d-v-b
Author

d-v-b commented Feb 21, 2024

@yarikoptic can you link to a repo of said cool kidz doing this? Because while I have heard a lot about linkml from a "this looks cool" perspective, I have seen very little code, and the code I have seen looked like it was doing quite a bit more than we need.

@joshmoore
Member

@d-v-b, that's one use of linkml as opposed to anything core. Calling out some names that I know of:

I think we've discussed it elsewhere (Zürich?) but in general I don't mind the starting point of our pipeline being pydantic, but I very much think we need to be careful not to overfit to it because we will need to transform/translate out and into other representations. LinkML does that quite well. And for one of the representations I'm personally interested in (JSON-LD), it's likely one of the best, though I admit that's less critical for the core NGFF types than for additional metadata that I would like to record with/in the NGFF.

@d-v-b
Author

d-v-b commented Feb 21, 2024

> I think we've discussed it elsewhere (Zürich?) but in general I don't mind the starting point of our pipeline being pydantic, but I very much think we need to be careful not to overfit to it because we will need to transform/translate out and into other representations.

To be clear, the spec is JSON, so modeling the spec can be done by any python library that makes it easy to define JSON-serializable classes. With that constraint in mind, I think there's no risk that modeling the spec with pydantic, or dataclasses, or attrs, or marshmallow, would lead to any overfitting. On the other hand, not modeling the spec with one of the above libraries (or something functionally equivalent) in the reference python implementation of the spec generates friction for users and developers.
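To make the "any library works" point concrete, here is the same minimal slice of metadata modeled with stdlib dataclasses alone. The classes are illustrative stand-ins, not spec-complete; the point is that round-tripping to JSON requires nothing beyond the standard library.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Illustrative only: a tiny fragment of a multiscales entry modeled with
# stdlib dataclasses, to show the modeling library is interchangeable.
@dataclass
class Dataset:
    path: str

@dataclass
class Multiscale:
    version: str
    datasets: List[Dataset] = field(default_factory=list)

m = Multiscale(version="0.4", datasets=[Dataset(path="0")])
print(json.dumps(asdict(m)))  # round-trips cleanly to JSON
```

Pydantic or attrs add validation and error reporting on top, but the JSON-serializable-class shape is the same in all of them.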

@joshmoore
Member

Sorry, I likely have mixed a few discussions. 👍 for a declarative representation at the core, I think we're clearly agreeing there, and then we can see how things shake out with pydantic from there.

@satra
Contributor

satra commented Feb 21, 2024

@joshmoore - just a note there was more work done during the workshop which brings us closer to defining the entire spec as linkml (https://github.com/linkml/linkml-arrays), which can in turn generate json, jsonld, etc.

@melonora

> @joshmoore - just a note there was more work done during the workshop which brings us closer to defining the entire spec as linkml (https://github.com/linkml/linkml-arrays), which can in turn generate json, jsonld, etc.

and work is ongoing on named arrays :)
