
Bioformat image server reader pool extraction #1287

Closed

@Rylern (Contributor) commented Aug 9, 2023

Initial refactoring of the Bio-Formats image server to extract code that could be reused in the OMERO extension.
The Bio-Formats image server uses a pool of readers to retrieve pixel values in parallel; this pull request extracts that behavior from the Bio-Formats image server.

Four classes were created:

  • ReaderPool: abstract class that can read pixel values of an image using several readers in parallel. A class extending it must define how to fetch pixel values; the rest is handled by ReaderPool.
  • BioFormatReaderPool: implementation of ReaderPool with code specific to opening Bio-Formats images.
  • ReaderWrapper: interface that wraps an image reader. It is suited to readers that return arrays of bytes when reading pixel values.
  • BioFormatReaderWrapper: implementation of ReaderWrapper with code specific to opening Bio-Formats images.
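The relationship between these classes can be sketched roughly as follows. This is a simplified illustration of the design described above, not the actual QuPath code; the method names and signatures are assumptions:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A reader knows how to fetch raw pixel bytes for a region of one series
interface ReaderWrapper {
    byte[][] readBytes(int series, int x, int y, int w, int h);
}

// The pool hands out readers so several tiles can be read in parallel;
// subclasses only define how a new reader is created
abstract class ReaderPool {
    private final BlockingQueue<ReaderWrapper> readers = new ArrayBlockingQueue<>(4);

    // The single piece of format-specific logic (e.g. Bio-Formats specific)
    protected abstract ReaderWrapper createReaderWrapper();

    byte[][] read(int series, int x, int y, int w, int h) {
        ReaderWrapper reader = readers.poll();
        if (reader == null)
            reader = createReaderWrapper();   // lazily grow the pool
        try {
            return reader.readBytes(series, x, y, w, h);
        } finally {
            readers.offer(reader);            // return the reader for reuse
        }
    }
}
```

In this shape, everything except `createReaderWrapper()` is shared between the Bio-Formats and OMERO implementations.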

The BioFormatImageServer and BioFormatServerBuilder were slightly changed to adapt to the new classes.

The unit tests passed with these changes.

This pull request should not be merged yet because I still have to address a few things:

  • Where should we place the ReaderPool and ReaderWrapper classes? Currently they are in the servers.bioformats package, but they are not specific to Bio-Formats.
  • Should I refactor the BioFormatImageServer to use the best practices we have been discussing? I see this file has a few warnings and the constructor spans 500 lines.
  • I will now try to use ReaderPool and ReaderWrapper in the OMERO extension, so I may have to change a few things if I realize that these classes are not completely generic.

I also have a question:

  • When is the getAssociatedImage(String) function of qupath.lib.images.servers.ImageServer used? I don't think I was able to test it

@petebankhead (Member)

Thanks!

  • Where should we place the ReaderPool and ReaderWrapper classes? Currently they are in the servers.bioformats package, but they are not specific to Bio-Formats.

ReaderPool currently seems to be specific to Bio-Formats - it still has quite a lot of loci.* imports, which would prevent it from moving to a more core QuPath module.

I think that's fine because it makes sense for the OMERO extension to depend upon the Bio-Formats one - at least for raw pixel access via ICE, since many other dependencies are shared. And if we follow the advice of accessing pixels by Zarr then we might still have a Bio-Formats dependency via OMEZarrReader as described here.

  • Should I refactor the BioFormatImageServer to use the best practices we have been discussing? I see this file has a few warnings and the constructor spans 500 lines.

Yes, that would be good. But we can merge sooner if it helps.

  • I will now try to use ReaderPool and ReaderWrapper in the OMERO extension, so I may have to change a few things if I realize that these classes are not completely generic.

I don't think you need to worry too much about making them very generic - just to work well enough for Bio-Formats and OMERO. They both have a quite different way of returning pixel arrays that I haven't seen elsewhere.

Based on the recent forum discussion, I have the impression that the current working Zarr support for Java uses n5-zarr, which in turn relates to (I think...) imglib2.

Since we already plan to explore imglib2, there's a chance that a lot of QuPath's ImageServer and image reading code may be replaced if we find better approaches with imglib2.

@petebankhead (Member)

When is the getAssociatedImage(String) function of qupath.lib.images.servers.ImageServer used? I don't think I was able to test it

It is used with View → Show slide label - but is really only relevant for some file formats (although useful when relevant).

It's inspired by the 'associated images' provided by OpenSlide here - since otherwise QuPath would have had no way to provide access to the label etc. But it doesn't map so easily to images from other readers, including Bio-Formats, which doesn't identify label images as being different.

@Rylern (Contributor Author) commented Sep 5, 2023

When is the getAssociatedImage(String) function of qupath.lib.images.servers.ImageServer used? I don't think I was able to test it

It is used with View → Show slide label - but is really only relevant for some file formats (although useful when relevant).

It's inspired by the 'associated images' provided by OpenSlide here - since otherwise QuPath would have had no way to provide access to the label etc. But it doesn't map so easily to images from other readers, including Bio-Formats, which doesn't identify label images as being different.

Do you know a way to test it? The Show slide label window always indicates "No label available" with the images I have.

Apart from that, I think this pull request can be merged. The Bio-Formats and OMERO ICE image servers seem to be working with these new changes. I may still have to clean the code a bit, but I think having the OMERO extension working properly is more important for now.

@petebankhead (Member) left a comment

Thanks, I've added some comments.

The main question (maybe to discuss with @alanocallaghan and @finglis) is whether we should use Optional instead of just returning null. I am slightly in favor of using it sometimes - especially when the return really is optional - but here it seems to be used where throwing an exception would be preferable.

Returning null can be informative, inasmuch as it suggests we have a sparse image without pixels for every location - and shouldn't happen whenever there has been an exception.

Apart from that, I'm not sure if the ReaderWrapper interface and ReaderPool abstract classes have quite the right design for ease of interpretation and reuse.

ReaderWrapper looks very tied to the 'OME' way of doing things (Bio-Formats and OMERO); I'd expect a general image reader to return something more informative than a byte[][], which can only be interpreted with a lot of other return values and Bio-Formats logic. So it isn't very easy to use in a standalone way.

That isn't in itself a problem, but if writing a general image reader for use with the OMERO web API or an IIPImage server (for example) I imagine it would be far harder to return a byte[][] than a BufferedImage.

So I think it should either 1) embrace being Bio-Formats/OMERO-specific, and prioritise simplicity, or 2) incorporate more of the processing logic that converts the byte[][] into a BufferedImage, and prioritise reusability. If the goal is for ReaderWrapper and ReaderPool to be reusable, it needs to be easy to generate and work with their return values.

ReaderPool then is abstract, but has very few abstract methods. One is to create a ReaderWrapper. This probably isn't helpful outside the class. It could also be handled with composition rather than inheritance by passing a Supplier<ReaderWrapper> as an argument to its constructor if really necessary, like when creating a ThreadLocal. If this change is made, ReaderPool could still be subclassed, but wouldn't have to be subclassed.
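The composition-based alternative described above might look something like this. The class and method names here are illustrative, not the PR's actual code:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Supplier;

// Sketch: the pool is concrete and is given a factory for creating readers
// when none are available, similar to how ThreadLocal.withInitial(...)
// takes a Supplier - so subclassing remains possible but optional.
class SupplierBackedPool<T> {
    private final Supplier<T> readerSupplier;
    private final Queue<T> available = new ArrayDeque<>();

    SupplierBackedPool(Supplier<T> readerSupplier) {
        this.readerSupplier = readerSupplier;
    }

    synchronized T borrowReader() {
        T reader = available.poll();
        // Create a new reader only when the pool is empty
        return reader != null ? reader : readerSupplier.get();
    }

    synchronized void returnReader(T reader) {
        available.add(reader);
    }
}
```

A caller would construct the pool as `new SupplierBackedPool<>(() -> createBioFormatsReader(...))`, keeping the format-specific creation logic outside the pool entirely.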

Elsewhere ReaderPool contains a lot of logic for image reading, which feels like it belongs in the reader itself - not the pool for managing readers. And it's also quite Bio-Formats-focussed, since the idea of a series within an image is quite Bio-Formats-specific.

So overall I don't have a clear idea of the logical separation between ReaderWrapper and ReaderPool. It feels like the logic of image reading is now more split across more classes + Bio-Formats itself, and it's quite hard to trace what is happening.

Finally, reading this reminds me that there were two important limitations in the original design:

  • It only supports returning all pixels for all channels simultaneously. In preparation for the future, it would be beneficial to have an API that optionally supports returning individual channels.
    • This isn't needed if the refactoring is minor. But any major refactoring has a chance of regression (in terms of some obscure images failing), so we should try to avoid doing it multiple times.
  • Associated images can sometimes be very big - even pyramidal or with multiple channels. So the logic for reading them doesn't have to be fundamentally different to the logic for reading other images. From a Bio-Formats perspective, you might just request the image for a different series.

What do you think?

* <p>A {@link ReaderPool} to use with Bio-Format images.</p>
* <p>It uses the {@link BioFormatReaderWrapper}.</p>
*/
class BioFormatReaderPool extends ReaderPool {
Member:

Note it should be BioFormats and not BioFormat (also elsewhere).

Also: please leave a blank line after the class definition and before the first private fields are defined.

Usually there should be a logger first as well - here, I think it would make sense to log at the debug level when createReaderWrapper is called.

(I realise that's not always the case in other classes, but we have a better log viewer now :) )

Contributor Author:

Usually there should be a logger first as well - here, I think it would make sense to log at the debug level when createReaderWrapper is called.

Until now I've never used logger.debug - when should I use it?

Member:

I'd say whenever there's info that could be useful to track down an error, but we wouldn't want logged routinely (since info level is the default).

Then logger.trace helps for really fine-grained information (e.g. it might be used to record rendering times in the viewer). Logs at trace level should almost never be turned on, because they will result in enormous logs.

Logging levels aren't used consistently enough in QuPath, but the new log viewer makes them far more useful.
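QuPath uses SLF4J, but the same level hierarchy can be illustrated with only the standard library: java.util.logging's FINE and FINEST correspond roughly to SLF4J's debug and trace. A small sketch (class and method names are hypothetical):

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingLevelsDemo {
    private static final Logger logger = Logger.getLogger(LoggingLevelsDemo.class.getName());

    static void createReaderWrapper(String id) {
        // Debug-level (FINE): useful when tracking down an error, but too
        // noisy to log routinely at the default (INFO) level
        logger.fine(() -> "Creating reader wrapper for " + id);
        // Trace-level (FINEST): extremely fine-grained, e.g. per-tile
        // timings; almost never enabled because logs become enormous
        logger.finest(() -> "Reader creation started at " + System.nanoTime());
    }

    public static void main(String[] args) {
        logger.setLevel(Level.FINE);   // enable debug-level output
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);
        logger.addHandler(handler);
        createReaderWrapper("image.tiff");
    }
}
```

Note the use of `Supplier<String>` overloads, so the message string is only built if the level is actually enabled.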

}
}

/**
Member:

This logic seems substantially different from the original implementation - are the changes required for any specific purpose?

I'd like to understand the changes, since the original version evolved over many years, with bugs fixed in response to user reports for individual images that failed. But it could still be changed if there is a better way to do things. The code here worked for all images I tried.

Contributor Author:

What logic is different from the original implementation? I moved code from BioFormatsImageServer but I tried to keep the same logic. Maybe it's clearer now with the latest version, which removes the confusion between ReaderPool and ReaderWrapper.

}

/**
* Create an image from the supplied parameters (adapted from
Member:

Could you use AWTImageTools directly, rather than needing to adapt code from it?

I imagine it's not reusable from OMERO, but might it be possible to create a minimal IFormatReader that wraps around your code for requesting images from OMERO?

(Just an idea, may not be workable or wise...)

Contributor Author:

I found it easier like this, because creating an IFormatReader seemed complicated to me (judging by the constructor of BioFormatsReaderWrapper). I removed some parameters of the function in the latest version.

@Rylern (Contributor Author) commented Sep 7, 2023

Thanks for the feedback!

The main question (maybe to discuss with @alanocallaghan and @finglis) is whether we should use Optional instead of just returning null. I am slightly in favor of using it sometimes - especially when the return really is optional - but here it seems to be used where throwing an exception would be preferable.

Returning null can be informative, inasmuch as it suggests we have a sparse image without pixels for every location - and shouldn't happen whenever there has been an exception.

You're right, I was overusing Optional. Now, each time an error occurs, an exception is thrown. I kept Optional only when a function is not guaranteed to return a result AND no error occurred during its execution.
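The convention described here can be sketched as follows. The class and method names are hypothetical, chosen only to illustrate the distinction:

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.Map;
import java.util.Optional;

// Sketch: errors are reported by throwing, while Optional is reserved for
// "no result, and no error occurred" - e.g. a sparse image with no pixels
// at the requested location, or a file with no label image.
class TileSource {
    private final Map<String, BufferedImage> images;

    TileSource(Map<String, BufferedImage> images) {
        this.images = images;
    }

    // Reading is expected to succeed; failure is exceptional, so throw
    BufferedImage readTile(String key) throws IOException {
        BufferedImage img = images.get(key);
        if (img == null)
            throw new IOException("Failed to read tile " + key);
        return img;
    }

    // Absence is a valid outcome here, so Optional communicates it cleanly
    Optional<BufferedImage> findAssociatedImage(String name) {
        return Optional.ofNullable(images.get(name));
    }
}
```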

ReaderWrapper looks very tied to the 'OME' way of doing things (Bio-Formats and OMERO); I'd expect a general image reader to return something more informative than a byte[][], which can only be interpreted with a lot of other return values and Bio-Formats logic. So it isn't very easy to use in a standalone way.

That isn't in itself a problem, but if writing a general image reader for use with the OMERO web API or IIIPImage Server (for example) I imagine it would be far harder to return a byte[][] than a BufferedImage.

So I think it should either 1) embrace being Bio-Formats/OMERO-specific, and prioritise simplicity, or 2) incorporate more of the processing logic that converts the byte[][] into a BufferedImage, and prioritise reusability. If the goal is for ReaderWrapper and ReaderPool to be reusable, it needs to be easy to generate and work with their return values.

I changed ReaderWrapper to be as generic as possible (its read function now returns a BufferedImage). However, I wanted a ReaderWrapper class common to Bio-Formats and OMERO (because they have a lot of code in common), so I created the OMEReaderWrapper class (not sure of the name) that is a child of ReaderWrapper, and a parent of BioFormatsReaderWrapper and IceOmeroReaderWrapper.

ReaderPool then is abstract, but has very few abstract methods. One is to create a ReaderWrapper. This probably isn't helpful outside the class. It could also be handled with composition rather than inheritance by passing a Supplier as an argument to its constructor if really necessary, like when creating a ThreadLocal. If this change is made, ReaderPool could still be subclassed, but wouldn't have to be subclassed.

I made ReaderPool a concrete class by using a Supplier like you suggested (well, not exactly a Supplier but a Callable because I needed to throw exceptions). I removed the child classes of ReaderPool.

Elsewhere ReaderPool contains a lot of logic for image reading, which feels like it belongs in the reader itself - not the pool for managing readers. And it's also quite Bio-Formats-focussed, since the idea of a series within an image is quite Bio-Formats-specific.

So overall I don't have a clear idea of the logical separation between ReaderWrapper and ReaderPool. It feels like the logic of image reading is now more split across more classes + Bio-Formats itself, and it's quite hard to trace what is happening.

I moved the image reading logic from ReaderPool to ReaderWrapper.

It only supports returning all pixels for all channels simultaneously. In preparation for the future, it would be beneficial to have an API that optionally supports returning individual channels.

  • This isn't needed if the refactoring is minor. But any major refactoring has a chance of regression (in terms of some obscure images failing), so we should try to avoid doing it multiple times.

Should I add a openImage(TileRequest tileRequest, int series, int channel, boolean isRGB, ColorModel colorModel) function to ReaderPool?

Associated images can sometimes be very big - even pyramidal or with multiple channels. So the logic for reading them doesn't have to be fundamentally different to the logic for reading other images. From a Bio-Formats perspective, you might just request the image for a different series.

I'm not sure I understood this point.

@petebankhead (Member) left a comment

Thanks! I just looked quickly - first impressions are good, I'll try to check out the code in more detail and run it later.

I changed ReaderWrapper to be as generic as possible (its read function now returns a BufferedImage).

That sounds better for now. You could go a step further and define ReaderWrapper<T> where <T> is a generic parameter that may be BufferedImage.

That's the case with QuPath's ImageServer implementations. In practice it was probably not needed, since we always use BufferedImage. In the distant future, we might get rid of that entirely to avoid BufferedImage (which I think isn't available on android, for example) - but that would be a massive change.

Therefore keeping BufferedImage is fine, but switching to <T> would give you flexibility if you thought that something other than BufferedImage would be meaningful for Bio-Formats or OMERO.
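The parameterised variant might look like this (the interface name here is illustrative, to avoid confusion with the PR's actual ReaderWrapper):

```java
import java.awt.image.BufferedImage;

// Sketch: the image type becomes a type parameter <T>, mirroring how
// QuPath's ImageServer<T> is almost always ImageServer<BufferedImage>
// in practice, but leaves room for other image representations.
interface GenericReaderWrapper<T> {
    T read(int x, int y, int width, int height);
}

class BufferedImageReaderWrapper implements GenericReaderWrapper<BufferedImage> {
    @Override
    public BufferedImage read(int x, int y, int width, int height) {
        // Placeholder: a real implementation would decode pixel data here
        return new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
    }
}
```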

* @return the image corresponding to these parameters
* @throws IOException when a reading error occurs
*/
BufferedImage getImage(TileRequest tileRequest, int series, int numberOfChannels, boolean isRGB, ColorModel colorModel) throws IOException;
Member:

Rather than int numberOfChannels, we probably want either int channel (if we support only all channels for -1 or individual channels to be returned), or int[] channels (if we support arbitrary combinations of channels).

Either option would give more flexibility than the current API, but must support a parameter value that will return all channels (since that's the current default behavior, which should remain the default).

The javadoc for series should specify that it's ignored if not relevant for the image reader (since most readers only support one image).
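One way to express the `int[] channels` option with an all-channels default is sketched below. The method name and the null/empty convention are assumptions for illustration, not QuPath's API:

```java
import java.util.stream.IntStream;

// Sketch: a null (or empty) array preserves the current default behavior
// of returning every channel; otherwise only the listed channels are read.
class ChannelSelection {
    static int[] resolveChannels(int[] requested, int totalChannels) {
        if (requested == null || requested.length == 0)
            return IntStream.range(0, totalChannels).toArray();
        return requested.clone();
    }
}
```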

Contributor Author:

Do you remember the logic behind the OMEReaderWrapper.createSampleModel() function? Supporting arbitrary combinations of channels will require editing this function, and I'm a bit confused by this part:

int[] offsets = new int[numberOfChannels];
int[] bandInds = new int[numberOfChannels];
int ind = 0;

int channelCount = getChannelCount(series);
for (int cInd = 0; cInd < channelCount; cInd++) {
    int nSamples = getChannelSamplesPerPixel(series, cInd);
    for (int s = 0; s < nSamples; s++) {
        bandInds[ind] = cInd;
        if (isInterleaved()) {
            offsets[ind] = s;
        } else {
            offsets[ind] = s * tileRequest.getTileWidth() * tileRequest.getTileHeight();
        }
        ind++;
    }
}
// TODO: Check this! It works for the only test image I have... (2 channels with 3 samples each)
// I would guess it fails if pixelStride does not equal nSamples, and if nSamples is different for different 'channels' -
// but I don't know if this occurs in practice.
// If it does, I don't see a way to use a ComponentSampleModel... which could complicate things quite a bit
int pixelStride = numberOfChannels / effectiveC;
int scanlineStride = pixelStride * tileRequest.getTileWidth();
return new ComponentSampleModel(
        dataBuffer.getDataType(),
        tileRequest.getTileWidth(),
        tileRequest.getTileHeight(),
        pixelStride,
        scanlineStride,
        bandInds,
        offsets
);

In particular, what is the difference between numberOfChannels, effectiveC, and channelCount?

Member:

I struggle to remember the logic exactly - only that it was really difficult to get it right...

The general idea is that there are three confusingly inter-related terms that I will attempt to define (at least according to my best understanding):

  • Channels are essentially as described here; different 2D images that are usually viewed using different colors that are mixed together (e.g. red, green & blue)
  • Bands are arrays of pixels, which can correspond to channels - i.e. each channel has a band - but don't have to, because
  • Samples provide a way to store multiple channels in a single band

In the end, if you store your channels in a single band then the values are interleaved (e.g. RGBRGBRGB... - this would be 1 band with 3 samples per pixel).

Alternatively, you might store your channels in separate bands, which can be referred to as planar storage (RRRR.... GGGG.... BBB... - 3 bands with 1 sample per pixel).

So to construct a BufferedImage, you need a SampleModel in order to be able to make sense of what is actually contained within the bands.

As far as I recall, the scanlineStride is generally the width of the image - but might not be. The main time when it isn't is if you have a subimage that is backed by the same data as the original, larger image.

QuPath refers to channels as the 'intuitive' concept, but the actual technical storage relies upon bands and samples.

You don't necessarily have to bother with this now - especially if there is a chance of regression, since it seems to be working ok. If the API supports arbitrary channels, I think it's ok to throw an UnsupportedOperationException if it's actually called (and document this) - since it can just be a placeholder for the future. QuPath itself doesn't have an API to request fewer channels in ImageServer anyway, but it will likely need one in the future because it's inefficient to request 50 channels all at once if they aren't needed.

If you do change this logic at all, one way to test it is working is to export multichannel images using code like this, with channelsInterleaved() turned on and off.
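The interleaved vs planar distinction described above maps directly onto Java's SampleModel subclasses. A small stdlib-only illustration of the two storage layouts (the helper class is hypothetical):

```java
import java.awt.image.BandedSampleModel;
import java.awt.image.DataBuffer;
import java.awt.image.PixelInterleavedSampleModel;
import java.awt.image.SampleModel;

class SampleModelDemo {
    // Interleaved storage: RGBRGBRGB... - one data array, samples mixed,
    // so the pixel stride is the number of channels
    static SampleModel interleaved(int width, int height, int nChannels) {
        int pixelStride = nChannels;
        int scanlineStride = width * nChannels;
        int[] bandOffsets = new int[nChannels];
        for (int i = 0; i < nChannels; i++)
            bandOffsets[i] = i;
        return new PixelInterleavedSampleModel(
                DataBuffer.TYPE_BYTE, width, height,
                pixelStride, scanlineStride, bandOffsets);
    }

    // Planar storage: RRRR... GGGG... BBBB... - one band per channel
    static SampleModel planar(int width, int height, int nChannels) {
        return new BandedSampleModel(DataBuffer.TYPE_BYTE, width, height, nChannels);
    }
}
```

Either model can back a WritableRaster and hence a BufferedImage; the hard part, as discussed above, is choosing strides and offsets correctly when samples per channel vary.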

Member:

Just to add - you'll see bands and samples in the Java documentation, but I think Bio-Formats tends to refer to channels and samples. Unfortunately, in this case channels doesn't necessarily match QuPath's idea of channels.

I think you should probably consider a Bio-Formats channel to be like a BufferedImage band.

Contributor Author:

OK, thanks for the explanation.

If that's fine with you, I can change only the public API for now (and throw UnsupportedOperationException like you mentioned) and come back to this later (e.g. when a first version of the OMERO extension is ready)

@Rylern (Contributor Author) commented Sep 7, 2023

I added a commit with the changes you requested.

Since ReaderPool and ReaderWrapper are now generic, shouldn't they be placed outside the Bio-Formats extension?

@petebankhead (Member) left a comment

I think this design is better, but I think it's worth trying to simplify even further.

The Bio-Formats API is very complex - justifiably, since it has to handle so many complex formats - and not thread-safe, so there is a high risk of bugs when we try to use it efficiently and in a parallelized way.

One of the biggest dangers is that an ImageReader can have its ID set (effectively choosing a different input file), and also its series set (effectively choosing a different image within a file). Any metadata we request from an ImageReader is only consistent if the ID and series have not been changed, which makes lazily requesting metadata risky - and is where I think problems could occur here.

Therefore an important question is where exactly the metadata should be parsed - and ideally this happens just once.

I think it would be useful to define the preferred public API to support Bio-Formats and OMERO first - identifying where code reuse makes sense.

One idea might be to decouple the conversion of byte arrays into BufferedImage objects. Here's a rough sketch of a class, which could be reused across all readers for a single image and threadsafe (caching color and sample models for reuse).

class OMEPixelParser<WritableRaster> {

   // Method to convert byte arrays according to the known metadata
   BufferedImage parse(byte[][] pixels, int width, int height) {
       // ...
   }

   // We might want to treat this as a special case, because if we create a BufferedImage 
   // using a ColorModel, Raster and SampleModel, then it may not have the correct 
   // RGB image type set
   BufferedImage parseRGB(byte[][] pixels, int width, int height) {
      // ...
   }

   // Optional method that makes it possible to handle arbitrary channel combinations, 
   // at the cost of requiring color and sample models to also be created
   WritableRaster parseRaster(byte[][] pixels, int width, int height) {
       // ...
   }

   // Create a sample model for the specified number of channels
   SampleModel createSampleModel(int nChannels) { ... }

   // Create a color model for arbitrary channel combination (if empty, assume all channels)
   ColorModel createColorModel(int... channels) { ... }

   // Builder to set all necessary metadata values once - 
   // and these cannot be changed
   static class Builder {
  
       Builder isInterleaved(boolean isInterleaved) { ... }

       Builder isRGB(boolean isRGB) { ... }

       // We will need the default channel names & colors for anything other than RGB
       Builder channels(List<ImageChannel> channels) { ... }

       Builder pixelType(PixelType pixelType) { ... }

       OMEPixelParser build() {
            return new OMEPixelParser(...);
       }

   }

}

I don't know if this is a workable design - it's really just a suggestion for consideration and discussion.

* Wrapper suited for readers that return arrays of bytes when reading pixel values.
* </p>
*/
public abstract class OMEReaderWrapper implements ReaderWrapper<BufferedImage> {
Member:

I can see the logic in this class for supporting both Bio-Formats and OMERO, but it means that it must go beyond the ReaderWrapper implementation.

I have the impression that the metadata parsing then becomes fragmented.

It is already slightly fragmented within Bio-Formats itself, since you can request some overlapping information from the ImageReader/IFormatReader or from the OMEPyramidStore.

QuPath needs metadata values to be parsed into the ImageServerMetadata object used by the ImageServer, and the logic for this is inside the BioFormatsImageServer class. But then information is also provided from OMEReaderWrapper directly, independent of whatever was parsed by BioFormatsImageServer. It seems that they could be inconsistent.

And, in fact, the metadata probably will be inconsistent sometimes in BioFormatsReaderWrapper - at least if this class is used differently from how it currently is - because of an awkward fact of image files that contain multiple series: they don't necessarily have the same metadata. For example, they can be different sizes, have different channel counts etc.

So when this returns a value for getPixelType() (for example), then that may only be correct for one particular series - and the API doesn't tell us which one. In fact, the returned value might change if there are calls to reader.setSeries(int).

Often this may be unnoticed because different series happen to share enough similar metadata, but it isn't guaranteed.

If returning metadata is necessary in this class, then a way to overcome this is to forbid taking the series as a parameter to getImage, and instead passing it during construction. This means that a new instance would need to be created to access a different series, but it would help reduce the risk of subtle bugs in the future.

Sidenote: this replicates some of IFormatReader, but with different naming (e.g. getEffectiveChannelCount() rather than getEffectiveSizeC()) - but you still need to know the Bio-Formats API to use it, and to make sense of the name shift from series to imageIndex in some method parameters.
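The "pass the series during construction" suggestion can be sketched as below. All names and fields here are illustrative, not part of the PR:

```java
// Sketch: with the series fixed at construction and no setSeries(),
// metadata parsed once at construction can never go stale, and every
// getImage()/getPixelType() call unambiguously refers to one series.
class FixedSeriesReader {
    private final String path;
    private final int series;

    FixedSeriesReader(String path, int series) {
        this.path = path;
        this.series = series;
        // A real implementation would parse this series' metadata once, here
    }

    int getSeries() {
        return series;   // a different series requires a new instance
    }

    String getPath() {
        return path;
    }
}
```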

*/
T getImage(TileRequest tileRequest, int[] channels, boolean isRGB, ColorModel colorModel, int series) throws IOException;

/**
Member:

It may be possible to remove this method, and rely entirely on the other getImage() method - but it would require some thought.

Basically, getAssociatedImage() makes complete sense when using OpenSlide: an associated image is guaranteed to be 'small' and RGB - at least, that's what the official Python API seems to assume here, and OpenSlide is inherently limited to RGB.

It also makes sense at the level of the ImageServer API, because users will often want to see the label or overview image associated with their whole slide image. Initially, ImageServer was also designed with OpenSlide in mind.

I'm not sure it makes sense at the Bio-Formats API level, because the concept doesn't exist there: every image is a series. QuPath attempts to figure out which series in a file are likely to be labels or macro images based upon their names, but (annoyingly) these 'probably associated' images are not guaranteed to be small (see here) or RGB (see here). They could be pyramidal and non-RGB. Which is largely a consequence of Bio-Formats supporting different file formats to OpenSlide, and some of them do more awkward things.

This means that the pixel requests don't have to be different from requesting any other series. The logic for converting them into a manageable size could exist within the ImageServer, and doesn't have to exist here.

Relying on getImage(int series) for associated images effectively means that we have no way to request a low-resolution version if the image happens to be large and pyramidal, because there's no way to pass TileRequest info.

* @return the image corresponding to these parameters
* @throws IOException when a reading error occurs
*/
T getImage(TileRequest tileRequest, int[] channels, boolean isRGB, ColorModel colorModel, int series) throws IOException;
Member:

Extending some of my other comments, I guess there are two options:

  1. this interface assumes that any implementing reader can handle multiple images, and so requires a series parameter. This assumes that the images are accessed by index (rather than name).
  2. this interface assumes that the internal reader already knows exactly which image it refers to (i.e. if a series is required, it is stored internally as a field)

The ImageServer interface takes the second approach: you don't need to specify a series when requesting pixels or metadata, because the server returns pixels and metadata only for one specific series. If you want a different series, you need a different ImageServer.

The argument for the first approach is that you can then reuse the same Bio-Formats ImageReader multiple times: setting the series and/or ID, and not creating a new reader every time. These readers would then be managed by the ReaderPool.

Care would need to be taken because Bio-Formats isn't threadsafe, and it's especially important to avoid setting the ID or series on a reader when it is still in use by another thread.
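To make the contrast concrete, the two options might be sketched as minimal interfaces. This is only an illustration: the names (SeriesAwareReader, SingleSeriesReader, FixedSeriesAdapter) are hypothetical and not part of QuPath's API.

```java
// Option 1: the caller passes the series index on every request.
interface SeriesAwareReader {
    byte[] readTile(int series, int x, int y, int w, int h);
}

// Option 2: the series is fixed when the reader is created.
interface SingleSeriesReader {
    byte[] readTile(int x, int y, int w, int h);
}

// Option 2 can be layered on top of option 1 by capturing the series
// in a field, mirroring how ImageServer fixes one series per instance.
class FixedSeriesAdapter implements SingleSeriesReader {
    private final SeriesAwareReader reader;
    private final int series;

    FixedSeriesAdapter(SeriesAwareReader reader, int series) {
        this.reader = reader;
        this.series = series;
    }

    @Override
    public byte[] readTile(int x, int y, int w, int h) {
        return reader.readTile(series, x, y, w, h);
    }
}
```

With option 1, one underlying reader can serve several images, but every implementation must guard against the series changing under a concurrent caller; with option 2, that hazard is pushed to reader creation time.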

@Rylern commented Sep 11, 2023

I marked the comments that I completely understood as resolved.

About the rest, am I correct that a recap of what you're asking is:

  • Metadata parsing is fragmented between BioFormatsImageServer and OMEReaderWrapper / OMETileReader. This should not happen. Metadata should be requested only once and not lazily.

  • One tile reader should support accessing only one series.

  • The T getImage(int series); function should be removed, and the T getImage(TileRequest tileRequest, int[] channels, boolean isRGB, ColorModel colorModel, int series) function should be used instead.

However I didn't understand where the OMEPixelParser class would be in all of this.

@petebankhead

About the rest, am I correct that a recap of what you're asking is:

  • Metadata parsing is fragmented between BioFormatsImageServer and OMEReaderWrapper / OMETileReader. This should not happen. Metadata should be requested only once and not lazily.

Yes. See https://downloads.openmicroscopy.org/images/Vectra-QPTIFF/perkinelmer/PKI_fields/ and HnE_3_1x1component_data.tif for an example where it is a problem.

This contains a 32-bit float image, along with an 8-bit thumbnail.

If I try to open the thumbnail with this PR, it fails; I believe this is because it is using a mixture of metadata (i.e. assuming that it has enough bytes for 32-bit data, and failing with an ArrayIndexOutOfBoundsException).

  • One tile reader should support accessing only one series.

Possibly - it is one option to overcome the issue.

Currently, the implementation of BioFormatsReaderWrapper in this PR has two getPixelValues() methods. One of them sets the series and then resets it back to its original value, the other sets it but doesn't reset it.

Without the reset, the reader has changed into a different state, and any call that requests metadata from the reader may then give different results (example at the end of this post).

Additionally, both methods are potentially broken in a multithreading context because there is no synchronization done on the reader.

Excessive synchronization could harm performance. Forbidding the series and ID to be changed anywhere inside the class - and forbidding the reader from being accessed outside (i.e. not providing a getReader() option) - would reduce the need for synchronization, but probably not eliminate it because I am not sure that Bio-Formats guarantees that pixels can be accessed simultaneously from different threads even if the series and ID aren't changed.

The alternative is to synchronize everything that uses the reader, and then take care to design the class in such a way that it's not possible to get around the synchronization. To do that, the getReader() option should again be removed.

A third option is to make the class really minimal and keep the getReader() option - but document that it is entirely up to the caller what they do with the reader, and they must take care of synchronization etc.

The third option puts much more responsibility on the caller, but has the advantage of allowing the same reader to be reused for different images / series. This might have some small improvements in performance (especially if initializing a reader is slow), but could be brittle and easy to get wrong.
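As a rough illustration of the second option, the sketch below funnels every reader access through one lock and never exposes the reader itself, so the series is always set and used within the same critical section. BioFormatsLikeReader is a stand-in interface for illustration, not the real Bio-Formats IFormatReader API.

```java
// Stand-in for an IFormatReader-like object (hypothetical, simplified).
interface BioFormatsLikeReader {
    void setSeries(int series);
    byte[] openBytes(int plane);
}

final class GuardedReader {
    private final BioFormatsLikeReader reader;  // never exposed to callers

    GuardedReader(BioFormatsLikeReader reader) {
        this.reader = reader;
    }

    // Every access goes through this one lock; because the series is set
    // and used inside the same synchronized block, no thread can observe
    // a reader whose series was changed by another thread mid-read.
    synchronized byte[] openBytes(int series, int plane) {
        reader.setSeries(series);
        return reader.openBytes(plane);
    }
}
```

The cost is that all reads on one reader are serialized; parallelism then has to come from the pool holding several GuardedReader instances rather than from concurrent access to a single reader.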

  • The T getImage(int series); function should be removed, and the T getImage(TileRequest tileRequest, int[] channels, boolean isRGB, ColorModel colorModel, int series) function should be used instead.

Ideally yes. As the HnE_3_1x1component_data.tif example shows, we don't know what kind of image will be returned by getImage(int series), so having a separate API that assumes a single-resolution, non-pyramidal, 2D image seems to add (rather than reduce) complexity.

  • However I didn't understand where the OMEPixelParser class would be in all of this.

I think we should go back to thinking about the ideal design here, based upon what needs to be reusable - and also what are the simplest and safest changes that can be made before the v0.5.0 release.

My understanding of the original requirements is

  1. Essential The OMERO Gateway returns byte arrays in a format very similar to Bio-Formats, and the logic to convert these into a BufferedImage (with a suitable ColorModel etc.) is complex. This should be extracted from BioFormatsImageServer for reuse.
  2. Nice to have The BioFormatsImageServer also has a reader pool concept, which might be beneficial for the OMERO image server as well.

Achieving 1. requires a class to do the parsing, but doesn't necessarily require reader wrappers and reader pools at all.

The reader wrappers and pools seem to be where the main dangers lie, because Bio-Formats is complex to use in a multithreaded context.

On the other hand, the parsing doesn't need to know anything about an IFormatReader - it just needs the minimal, immutable info required to convert bytes-to-BufferedImage.

If you can extract the bytes-to-BufferedImage logic in an entirely threadsafe way, and leave as much as possible of BioFormatsImageServer as it is, then that could be the safest way to make a v0.5.0 update and help with the OMERO extension.
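The point that the parsing needs only minimal, immutable information could look something like the sketch below. PixelLayout and BytesToImage are hypothetical names for illustration, not the actual OMEPixelParser design, and only the 8-bit RGB case is handled here.

```java
import java.awt.image.BufferedImage;

// Immutable description of how the bytes are laid out - no reader involved.
record PixelLayout(int width, int height, boolean interleaved) { }

final class BytesToImage {
    // Parse raw 8-bit RGB bytes into a BufferedImage. Thread-safe because
    // the layout is immutable and no shared reader state is touched.
    static BufferedImage parse(byte[] bytes, PixelLayout layout) {
        var img = new BufferedImage(layout.width(), layout.height(),
                BufferedImage.TYPE_INT_RGB);
        int n = layout.width() * layout.height();
        for (int i = 0; i < n; i++) {
            int r, g, b;
            if (layout.interleaved()) {          // RGBRGB...
                r = bytes[3 * i] & 0xFF;
                g = bytes[3 * i + 1] & 0xFF;
                b = bytes[3 * i + 2] & 0xFF;
            } else {                             // RRR...GGG...BBB...
                r = bytes[i] & 0xFF;
                g = bytes[n + i] & 0xFF;
                b = bytes[2 * n + i] & 0xFF;
            }
            img.setRGB(i % layout.width(), i / layout.width(),
                    (r << 16) | (g << 8) | b);
        }
        return img;
    }
}
```

Because the layout is captured per call rather than read from a shared reader, the interleaved-flag problem seen with the .czi example below cannot silently leak from one series to another.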

In general, I am cautious about extensive refactoring of BioFormatsImageServer because I've written so many subtly broken versions of it myself throughout QuPath's history :) It's really hard to get 'right'; the previous version was messy, but the code seemed to work pretty reliably (well, except for this...).


I realise it's incredibly hard (/ impossible) to write this without failing examples, and most public examples won't fail because we get lucky with the different series types.

My guess is that .czi is one of the more awkward formats. Based on that, I found another failing example here: https://zenodo.org/record/7149674

Specifically, check out the label and macro images with the PR vs. in QuPath v0.4.4. In this case, the problem is related to the 'interleaved' interpretation: there's no exception thrown, but the image obtained via the PR is incorrect.

If you open the image in QuPath, this Groovy script shows that the isInterleaved() status changes for different series, which I think is the explanation:

def wrapper = getCurrentServer().readerPool.getDedicatedReaderWrapper()

println "Original interleaved: " + wrapper.isInterleaved()
wrapper.getReader().setSeries(0)
println "Series 0 interleaved: " + wrapper.isInterleaved()
wrapper.getReader().setSeries(1)
println "Series 1 interleaved: " + wrapper.isInterleaved()
wrapper.getReader().setSeries(2)
println "Series 2 interleaved: " + wrapper.isInterleaved()

@Rylern commented Sep 12, 2023

OK, I just created another pull request only with the OMEPixelParser class (see #1327)
