Managing image segmentation data (mutability of ome.zarr) #42

Open
tischi opened this issue Apr 23, 2021 · 3 comments

@tischi

tischi commented Apr 23, 2021

@joshmoore @constantinpape

We (ping @cgirardot) have been thinking a bit about data management with ome.ngff and have a question/concern.

Let's say you start with an ome.zarr container that only contains the raw data and then you compute a segmentation (label mask image).

If you add this label mask image into the original ome.zarr, you sort of mutate its identity, because its content is changing, which may not be ideal from a data management perspective.

If you instead were to create a new ome.zarr containing both the raw data and the segmentation, you would have to copy the raw data, which may be prohibitive.

So we were wondering if the idea is to create a new ome.zarr container that contains only the label mask data and a link to the raw data, such that viewers would still open it as if it contained both the raw and the segmentation data.
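
To make the "labels plus a link" idea concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical `raw_source` metadata key and hypothetical helper names; none of these are defined by the ome.ngff specification, and the actual linking mechanism may look quite different.

```python
import json
import os
from pathlib import Path


def write_labels_only_container(labels_root: Path, raw_root: Path) -> dict:
    """Create a labels-only container whose metadata points at the raw data.

    The raw container is never modified; only a relative path to it is stored.
    """
    labels_root.mkdir(parents=True, exist_ok=True)
    attrs = {
        # Hypothetical key, NOT part of the ome.ngff specification.
        "raw_source": {"path": os.path.relpath(raw_root, labels_root)},
    }
    (labels_root / ".zattrs").write_text(json.dumps(attrs, indent=2))
    return attrs


def resolve_raw(labels_root: Path) -> Path:
    """Follow the stored link from the labels container back to the raw data."""
    attrs = json.loads((labels_root / ".zattrs").read_text())
    return (labels_root / attrs["raw_source"]["path"]).resolve()
```

A viewer could call something like `resolve_raw` when opening the labels container and then display both datasets together, while the raw container keeps its original content and identity.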

Any thoughts on this?

@joshmoore
Member

#13 should enable that. However, from my point of view, there will still be use cases for mutation as well, so I hope we can define an "internal identity" that makes the community comfortable adding data after the fact.

@tischi
Author

tischi commented Apr 28, 2021

@joshmoore Could you elaborate on the idea of an "internal identity"? Do you already have a vision of how that could work in practice?
Let's say I have an ome.zarr with only raw data; let's call this image A (only raw). Then I add a label mask to this ome.zarr; let's call this image B (raw and labels). From a data management point of view: would image A still exist, or does it disappear during the creation of image B?
I think we would be fine if we could come up with a solution in which image A does in fact still exist. In terms of data provenance, A is the origin of B, and it is good to keep track of this. It is also good to be able to go back to A in case one needs to recompute B.

@joshmoore
Member

> Could you elaborate on the idea of an "internal identity"?

my_experiment.zarr/
├── analysis
│   └── segmentation
└── image_data

One of the keys of linked data is the ability to reference entities by name. So here I would think image_data and its graph of data & metadata would have an identifier (e.g. urn:uuid:d9dfa7ca-a7ee-11eb-a679-5f0cec9f8212). The segmentation would as well. The segmentation would refer to the image_data (likely not the other way around), forming a graph. You could refer to either of them externally as well and assert that they are independent, e.g. for defining a DOI. To some extent, this isn't much different from having:

my_experiment_image_data.zarr
my_experiment_segmentation.zarr

and having metadata at yet another level that ties them together, except it would provide a consistent framework for doing so in one fileset if you wanted.
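
As a rough illustration of the identifier idea, the sketch below gives each entity its own `urn:uuid:` name and lets the segmentation point at the image data, not the other way around. The `derived_from` key, the `new_entity` and `origin_of` helpers, and the in-memory `registry` are all assumptions made for this example; nothing here is defined by ome.ngff.

```python
import uuid


def new_entity(name: str) -> dict:
    """Create an entity with a stable, globally unique identifier."""
    return {"id": uuid.uuid4().urn, "name": name}


image_data = new_entity("image_data")
segmentation = new_entity("segmentation")

# Hypothetical provenance key: the segmentation refers to the image data.
# The image data itself stays untouched, so its identity does not change.
segmentation["derived_from"] = image_data["id"]

# A simple lookup table standing in for whatever resolves identifiers.
registry = {entity["id"]: entity for entity in (image_data, segmentation)}


def origin_of(entity: dict, registry: dict):
    """Return the entity this one was derived from, or None if it has no origin."""
    return registry.get(entity.get("derived_from"))
```

Because the link is carried by the identifier and not by the directory layout, the same graph could describe either a single fileset with `image_data` and `analysis/segmentation` or two separate containers.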
