Managing image segmentation data (mutability of ome.zarr) #42

tischi · 2021-04-23T16:19:43Z

We (ping @cgirardot) have been thinking a bit about data management with ome.ngff and have a question/concern.

Let's say you start with an ome.zarr container that only contains the raw data and then you compute a segmentation (label mask image).

If you add this label mask image into the original ome.zarr, you sort of mutate its identity, because its content is changing, which may not be ideal from a data management perspective.

If you instead were to create a new ome.zarr containing both the raw data and the segmentation, you would have to copy the raw data, which may be prohibitive.

So we were wondering if the idea is to create a new ome.zarr container that only contains the label mask data and a link to the raw data, such that viewers would still open it as if it contained both the raw and segmentation data.
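
A minimal sketch of that last option, assuming a Zarr v2 directory layout where group attributes live in a `.zattrs` JSON file. The attribute name `raw_source` and both helper functions are hypothetical and are not part of the ome.ngff specification; the point is only that the raw container is never written to, and a viewer could follow the link to open both.

```python
import json
import os
from pathlib import Path

# Hypothetical attribute name, used only for this sketch.
LINK_KEY = "raw_source"


def create_labels_container(labels_root: Path, raw_root: Path) -> None:
    """Create a labels-only container that points at the raw data."""
    labels_root.mkdir(parents=True, exist_ok=True)
    # A Zarr v2 group is marked by a ".zgroup" file.
    (labels_root / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    # Store the link relative to the labels container, so the two
    # directories can be moved together without breaking it.
    rel = os.path.relpath(raw_root, labels_root)
    (labels_root / ".zattrs").write_text(json.dumps({LINK_KEY: rel}, indent=2))


def resolve_raw(labels_root: Path) -> Path:
    """Follow the link, as a viewer might do before opening both images."""
    attrs = json.loads((labels_root / ".zattrs").read_text())
    return (labels_root / attrs[LINK_KEY]).resolve()
```

With this layout the raw container keeps its identity, and the cost of the segmentation container is the label data plus one small attribute.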

Any thoughts on this?

joshmoore · 2021-04-24T09:08:06Z

#13 should enable that. However, from my point-of-view, there will still be mutation use cases as well, so I would hope we could define an "internal identity" so the community would feel comfortable adding after the fact.

tischi · 2021-04-28T05:22:23Z

@joshmoore Could you elaborate on the idea of an "internal identity"? Do you already have a vision of how that could work in practice?
Let's say I have an ome.zarr with only raw data, let's call this image A (only raw). Then I add a label mask to this ome.zarr, let's call this image B (raw and labels). From a data management point-of-view: would image A still exist or does it disappear during the creation of image B?
I think it would be good if we could come up with a solution such that image A does in fact still exist, because provenance-wise A is the origin of B and it is good to keep track of this. It is also good to be able to go back to A in case one needs to recompute B.

joshmoore · 2021-04-28T06:57:39Z

> Could you elaborate on the idea of an "internal identity"?

my_experiment.zarr/
├── analysis
│   └── segmentation
└── image_data

One of the keys of linked data is the ability to reference entities by name. So here I would think image_data and its graph of data & metadata would have an identifier (e.g. urn:uuid:d9dfa7ca-a7ee-11eb-a679-5f0cec9f8212). The segmentation would as well. The segmentation would talk about the image_data (likely not the other way around) forming a graph. You could refer to either of them externally as well and assert that they are independent, e.g. for defining a DOI. To some extent, this isn't much different than having:

my_experiment_image_data.zarr
my_experiment_segmentation.zarr

and having metadata at yet another level that ties them together, except it would provide a consistent framework for doing so in one fileset if you wanted.
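
To make the identifier idea concrete, here is a rough sketch that builds the single-fileset layout above with plain files, assuming Zarr v2 conventions (`.zgroup` and `.zattrs`). The attribute keys `@id` and `derived_from`, and the helper names, are assumptions made for illustration only; nothing here is defined by the specification.

```python
import json
import uuid
from pathlib import Path
from typing import Tuple

# Hypothetical attribute keys, used only for this sketch.
ID_KEY = "@id"
SOURCE_KEY = "derived_from"


def make_group(path: Path) -> None:
    """Create a directory marked as a Zarr v2 group."""
    path.mkdir(parents=True, exist_ok=True)
    (path / ".zgroup").write_text(json.dumps({"zarr_format": 2}))


def assign_identity(path: Path, **extra: str) -> str:
    """Write a fresh URN identifier (plus extra attributes) to .zattrs."""
    identifier = uuid.uuid4().urn  # a string of the form "urn:uuid:..."
    attrs = {ID_KEY: identifier, **extra}
    (path / ".zattrs").write_text(json.dumps(attrs, indent=2))
    return identifier


def read_attrs(path: Path) -> dict:
    return json.loads((path / ".zattrs").read_text())


def build_example(root: Path) -> Tuple[str, str]:
    """Build my_experiment.zarr with image_data and analysis/segmentation."""
    for group in (root, root / "image_data", root / "analysis",
                  root / "analysis" / "segmentation"):
        make_group(group)
    image_id = assign_identity(root / "image_data")
    # The segmentation refers to the image, not the other way around.
    seg_id = assign_identity(root / "analysis" / "segmentation",
                             **{SOURCE_KEY: image_id})
    return image_id, seg_id
```

Because the reference goes only from the segmentation to the image, adding the segmentation leaves the attributes of `image_data` unchanged, so its identifier stays valid for external references.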
