
Preserve OMERO rendering metadata #82

Conversation

melissalinkert
Member

See glencoesoftware/bioformats2raw#160

Discussed with @mgheirat, @muhanadz, and @sbesson that it would be nice to keep any calculated min/max values when converting to OME-TIFF. Since there isn't really a place in the OME schema to do that, this PR uses a MapAnnotation linked to each Image. The MapAnnotation has the namespace glencoesoftware.com/ngff/rendering, and a single key (omero) whose value is the JSON described here: https://ngff.openmicroscopy.org/latest/#omero-md.
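For reference, the JSON stored under the omero key follows the shape of the example in the NGFF spec linked above (abridged sketch, values illustrative); the calculated min/max values end up in each channel's window:

```json
{
  "channels": [
    {
      "active": true,
      "coefficient": 1,
      "color": "0000FF",
      "family": "linear",
      "inverted": false,
      "label": "LaminB1",
      "window": {
        "min": 0,
        "max": 65535,
        "start": 0,
        "end": 1500
      }
    }
  ],
  "rdefs": {
    "defaultT": 0,
    "defaultZ": 118,
    "model": "color"
  }
}
```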

Namespace, annotation type, etc. all up for debate if anyone has better ideas. Not necessarily expecting this to be merged as-is, it's just a place to start. I thought about using an XMLAnnotation instead, like we do for the JSON annotation described here: https://github.com/glencoesoftware/ome-omero-roitool/blob/master/src/main/java/com/glencoesoftware/roitool/OMEOMEROConverter.java#L199. MapAnnotation seemed less likely to break things that might try to parse an XMLAnnotation value.

It should be possible to convert an input dataset to OME-TIFF with glencoesoftware/bioformats2raw#160 and this PR, and see the linked annotations with showinf -nopix -noflat -omexml on the output OME-TIFF. Importing the OME-TIFF to OMERO should allow the annotations to be queried - get the Image first, then look for linked annotations with the glencoesoftware.com/ngff/rendering namespace.
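For example, something along these lines (an untested omero-py sketch with placeholder connection details and image ID) should pull the JSON back out after import:

```python
import json

from omero.gateway import BlitzGateway

NS = "glencoesoftware.com/ngff/rendering"

# placeholder credentials, host, and image ID
conn = BlitzGateway("user", "password", host="omero.example.org", port=4064)
conn.connect()
try:
    image = conn.getObject("Image", 123)
    for ann in image.listAnnotations(ns=NS):
        # the MapAnnotation has a single "omero" key whose value is the JSON string
        for key, value in ann.getValue():
            if key == "omero":
                rendering = json.loads(value)
                print(rendering["channels"][0]["window"])
finally:
    conn.close()
```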

No tests (yet) because we'd need a released version of bioformats2raw that generates OMERO metadata.

@muhanadz
Member

muhanadz commented Oct 7, 2022

@sbesson @melissalinkert Is it possible to move the annotation from a key-value pair to an XML annotation, like the examples below?
https://v2-demo-dev.glencoesoftware.com/webclient/?show=image-254153 (under "Others"): [screenshot]
https://v2-demo-dev.glencoesoftware.com/webclient/?show=image-156067 (under "Others"): [screenshot]

@sbesson
Member

sbesson commented Oct 10, 2022

@muhanadz a priori, I think the writing code can use any OME structured annotation that supports storing a string value - see the full list here. So MapAnnotation, XmlAnnotation or even CommentAnnotation are viable options.

Since this is not a native XML string, from your side, what is the primary benefit of using XmlAnnotation rather than key/value pairs?

@melissalinkert marked this pull request as draft October 10, 2022 16:23
@melissalinkert
Member Author

Discussed with @sbesson and @muhanadz earlier today. 9938304 now writes the rendering metadata into several different annotations, so that we can evaluate which is best suited to balancing the three main considerations here:

  • correctness/appropriateness with respect to OME schema
  • ease of use with an OMERO script
  • ability to ignore the annotation from a user's perspective in OMERO UIs

This PR has been switched to draft status to reflect the fact that it should not be merged as-is.

The only annotation type I added here that we didn't discuss is to have two DoubleAnnotations linked to each Channel - I forgot earlier that Channel is now annotatable. One DoubleAnnotation is for the min, and one is for the max, with the Description indicating which is which.

The next step is to convert a few files to OME-TIFF using bioformats2raw 0.5.0 and this PR, and then import the OME-TIFFs to OMERO. Test cases should include brightfield, fluorescence with 4-6 or so channels, and multiplex with a large number of channels (60-80?).
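For reference, the conversion/inspection steps would look roughly like this (paths are placeholders; make sure min/max calculation is enabled in the bioformats2raw step):

```bash
bioformats2raw input.czi /tmp/intermediate.zarr
raw2ometiff /tmp/intermediate.zarr output.ome.tiff
showinf -nopix -noflat -omexml output.ome.tiff   # check the linked annotations
omero import output.ome.tiff                     # then query them in OMERO
```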

Once we decide which annotation strategy to use, this PR should also bump to bioformats2raw 0.5.0 and add test cases.

@sbesson
Member

sbesson commented Oct 11, 2022

The only annotation type I added here that we didn't discuss is to have two DoubleAnnotations linked to each Channel - I forgot earlier that Channel is now annotatable. One DoubleAnnotation is for the min, and one is for the max, with the Description indicating which is which.

Nice finding. At least from my side, this is definitely the most elegant solution with regard to respecting all the constraints laid out in #82 (comment).

Assuming channel annotation is the preferred solution, my main caveat is that relying exclusively on Description to differentiate min/max values is potentially brittle.

  • if we stick with DoubleAnnotation, I could see using different namespaces e.g. glencoesoftware.com/ngff/rendering/{min,max} which would also facilitate the client-side query
  • alternatively, a namespaced Channel.MapAnnotation using min/max as the keys would be a suitable approach
  • the latter would have the benefit of extensibility e.g. if we decided to store additional "hidden" rendering metadata at the Channel level

@melissalinkert
Member Author

94eea04 now gives just two options:

  • Single MapAnnotation linked to each channel, with namespace glencoesoftware.com/ngff/rendering and keys min and max
  • Two DoubleAnnotations linked to each channel, with namespaces glencoesoftware.com/ngff/rendering/min and glencoesoftware.com/ngff/rendering/max. The Description value is as it was before, but could be removed if the namespace is enough to distinguish between min and max.

Definitely going to be important to try this out with larger channel counts and an HCS dataset before making a final decision.
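As a rough illustration of how the two options above would be consumed client-side (untested omero-py sketch; function and variable names are mine, not part of this PR):

```python
from omero.gateway import BlitzGateway

BASE_NS = "glencoesoftware.com/ngff/rendering"

def print_channel_minmax(conn, image_id):
    image = conn.getObject("Image", image_id)
    for index, channel in enumerate(image.getChannels()):
        for ann in channel.listAnnotations():
            ns = ann.getNs() or ""
            if ns == BASE_NS:
                # option 1: single MapAnnotation with "min" and "max" keys
                values = dict(ann.getValue())
                print(index, values.get("min"), values.get("max"))
            elif ns.startswith(BASE_NS + "/"):
                # option 2: DoubleAnnotations with .../min and .../max namespaces
                print(index, ns.rsplit("/", 1)[-1], ann.getValue())
```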

@sbesson
Member

sbesson commented Oct 14, 2022

Tested using the representative OME-TIFF plate from BBBC017, both in its original form and converted using bioformats2raw followed by raw2ometiff with this PR included.

The data was imported into OMERO in both forms, both with the default options and with the min/max calculation and thumbnail generation steps skipped.

The table below summarizes the time spent in the different steps of the import process (as reported by omero fs importtime) as well as the number of INSERT statements (as grepped from the log files).

|  | NIRHTa+001.ome.tiff | NIRHTa+001.ome.tiff (no min/max, no thumbnails) | NIRHTa+001_converted.ome.tiff | NIRHTa+001_converted.ome.tiff (no min/max, no thumbnails) |
| --- | --- | --- | --- | --- |
| upload time | 32.01s | 13.19s | 46.46s | 21.87s |
| setId time | 11.29s | 9.34s | 18.13s | 17.05s |
| metadata time | 673.88s | 682.63s | 2624.29s | 2527.59s |
| pixels time | 719.68s | 637.64s | 2037.80s | 1893.49s |
| rdefs time | 1065.52s | – | 1478.75s | – |
| thumbnail time | 442.69s | – | 852.58s | – |
| Number of INSERTs | 28049 | 14223 | 69521 | 55695 |

A few comments:

  • as shown above, there is a non-negligible impact to including these annotations, especially in the case of HCS data. For the example 384-well plate with 9 fields of view per well and 2 channels per image, the current proposal results in ~40K additional inserts to the database, which certainly accounts for the extra metadata time (and possibly the extra pixels time?)
  • arguably, the current state of the proposal exaggerates the impact, as it results in the creation of 1 MapAnnotation, 2 DoubleAnnotations and their associated channel annotation links for each channel (6 x 2 x 9 x 384 = 41472 extra objects)
  • from a database perspective, the MapAnnotation proposal is probably the most efficient one as it would lead to the generation of ~14K objects
  • in all cases, these metrics should definitely be supplemented by the time spent in post-processing steps when min/max and thumbnails are skipped at import time. This is the case where we would expect the presence of the annotations to make the biggest difference

All in all, it is very possible there is no universal workflow that will work for all modalities, and we will need to make decisions based on the domain. For instance, it is very likely that preserving the rendering metadata when converting single large floating-point images will be valuable, but this might be something that we don't want to do for HCS data simply because some of the costs outweigh the benefits.

@melissalinkert marked this pull request as ready for review November 3, 2022 21:25

@sbesson
Member

sbesson left a comment

Tested in the context of 3 imaging modalities:

In all three cases, the original OME-TIFF files were converted using bioformats2raw 0.5.0 (including the min/max statistics) followed by raw2ometiff with this PR included.
Both original and converted OME-TIFF datasets were imported into OMERO and tested with post-import scripts comparing the speed of recomputing the min/max values and regenerating the thumbnails with and without the new channel annotations.

Testing reveals the outcome largely depends on the ratio between the number of images and the volume of pixel data per image. More specifically:

  • HCS is the most adverse scenario, as the per-channel annotations represent a significant increase in the number of objects in the database, as mentioned above. For typical HCS acquisitions where individual well samples are only multi-channel, the new annotations do not significantly speed up import and instead increase it by a factor of 1.5x-2x
  • for all other modalities, the new functionality is a game changer as it avoids iterating over all the tiles to compute the values either at import time or post-import. The bigger the volume of pixel data per channel, the more advantageous this feature will be
  • for the large floating-point example, this feature is particularly valuable if the image is imported into an OMERO.server 5.6.5 with https://omero.readthedocs.io/en/stable/sysadmins/config.html#omero.pixeldata.max_plane_float_override set to false, as the pixel type's dynamic range makes the data almost impossible to render sensibly without the calculated min/max values

It is worth noting that the domain where the annotations are the most disadvantageous is also not the one which drove the development of these conversion utilities. For multi-acquisition scenarios like HCS, I would argue that large single-file OME-TIFFs offer more disadvantages than advantages, and I would not recommend using this whole pipeline with the current output.

In summary, I would be inclined to move forward with this new feature but try to be defensive about its scope. One idea would be to disable the channel annotation generation for now when HCS metadata is present in the source NGFF file. I know (and understand) that Melissa is not particularly keen on special-casing, so I am happy to hear others' feedback.

@melissalinkert
Member Author

I think I'm OK with HCS vs non-HCS being handled separately, since those are already two distinct output formats. We were at one point discussing checking channel counts etc., which is where I'm less comfortable.

Is the idea then to write the channel annotations when any of the following are true:

  • HCS metadata not present in original data
  • existing --no-hcs option is used
  • existing --minmax option is used (maybe needed for plates with stitched floating point wells?)

and so omit the channel annotations when any of the following are true:

  • HCS metadata present in original data AND --no-hcs not used
  • existing --no-minmax option is used

@sbesson
Member

sbesson commented Nov 14, 2022

I think I'm OK with HCS vs non-HCS being handled separately, since those are already two distinct output formats. We were at one point discussing checking channel counts etc., which is where I'm less comfortable.

I agree. Anything that would force us to define a hardcoded arbitrary maximum channel limit is worth avoiding.

Is the idea then to write the channel annotations when any of the following are true:

  • HCS metadata not present in original data
  • existing --no-hcs option is used
  • existing --minmax option is used

and so omit the channel annotations when any of the following are true:

  • HCS metadata present in original data AND --no-hcs not used
  • existing --no-minmax option is used

The second condition makes sense to me but I think this means the channel annotations should be written when:

  • HCS metadata not present in original data OR existing --no-hcs option is used
  • AND existing --minmax option is used

(maybe needed for plates with stitched floating point wells?)

Interesting, I could definitely imagine some HCS scenarios which could benefit from storing these annotations. Floating-point wells are one example; long timelapse HCS acquisitions such as https://idr.openmicroscopy.org/webclient/?show=well-1351622 (182 timepoints per well sample) could be another. Any thoughts on adding an opt-in flag allowing the user to override the omission conditions described above and unconditionally write the channel annotations if --minmax has been used in the first step?

@melissalinkert
Member Author

My thinking was --minmax == force the annotations to be written no matter what. --no-minmax == force the annotations to be omitted. Neither --minmax nor --no-minmax == use HCS/no-HCS logic to decide whether to write annotations. I think we are saying the same thing just in a different way?

@sbesson
Member

sbesson commented Nov 14, 2022

My thinking was --minmax == force the annotations to be written no matter what. --no-minmax == force the annotations to be omitted. Neither --minmax nor --no-minmax == use HCS/no-HCS logic to decide whether to write annotations. I think we are saying the same thing just in a different way?

Just to clarify, are you talking about a new --[no-]minmax option defined at the raw2ometiff level? I was (probably wrongly) assuming you were referring to the --[no-]minmax option of the bioformats2raw utility.

@melissalinkert
Member Author

Sorry, ignore me. My whole line of thinking yesterday was around turning off the potentially expensive min/max calculation at the bioformats2raw level. No min/max calculation in bioformats2raw then means the channel annotations can't be written here.
That's probably not what we actually want though.

63b5cf4 is one way to do this. If there is plate metadata, there will be no channel annotations by default. If you add the new --force-channel-annotations option, channel annotations will be written even if there is a plate.
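In other words, the decision roughly reduces to the following (pseudocode with illustrative names, not the actual implementation):

```python
def write_channel_annotations(has_plate_metadata, force_channel_annotations, has_minmax):
    """Sketch of the 63b5cf4 behavior: skip channel annotations for plates
    unless explicitly forced, and only when min/max values are available."""
    if not has_minmax:
        # nothing to write if bioformats2raw did not record min/max values
        return False
    if has_plate_metadata and not force_channel_annotations:
        # HCS/plate data: omit by default
        return False
    return True
```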

@joshmoore
Contributor

This is a tricky one. Nice issue! 😉

Trying to capture my brainstorming from the formats meeting: although it doesn't help with the lack of an API for retrieving min/max values from arbitrary readers, another possible route for injecting this information is to treat everything that doesn't get converted into OME-XML from the intermediate OME-Zarr as a file annotation in the OME-TIFF. (That leads us to the secondary issue of having file attachments for OME-TIFF, but that's a problem that would be good to solve regardless.)

@melissalinkert
Member Author

After discussion with @sbesson / @muhanadz / @erindiel / @emilroz / @DavidStirling, closing this in favor of encouraging OME-Zarr usage in cases where the min/max data needs to be used.

Agreed it would be nice to explore the file annotation concept, but I don't think we're realistically going to be able to work on that any time soon.

@sbesson mentioned this pull request Feb 23, 2023