Support arrays of mixed-dimensionality #150

Open
tcompa opened this issue Oct 25, 2022 · 3 comments
tcompa (Collaborator) commented Oct 25, 2022

At the moment all our image arrays are 4D (CZYX) and each of our label arrays is 3D (ZYX). This is visible both in the .zarray files and in the folder structure. When the Z dimension is dummy (a single Z plane), we still use the 4D/3D structure, with shapes like (num_channels, 1, num_y, num_x) or (1, num_y, num_x). ROIs are defined in the same way: they are always 3D shapes (defined by 6 numbers), and in some cases the Z part is dummy (starting at 0 and ending at pixel_size_z, i.e. covering a single pixel).
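The current convention can be illustrated with plain NumPy shapes (the sizes below are made up for illustration; the ROI layout follows the "6 numbers" description above):

```python
import numpy as np

# Image array: always 4D (CZYX), even when there is only a single Z plane.
num_channels, num_z, num_y, num_x = 3, 1, 2160, 2560
image = np.zeros((num_channels, num_z, num_y, num_x), dtype=np.uint16)

# Label array: always 3D (ZYX), with the same dummy-Z convention.
labels = np.zeros((num_z, num_y, num_x), dtype=np.uint32)

# A ROI is six numbers: start and length along Z, Y, X (physical units).
# With a dummy Z, the ROI spans from 0 to pixel_size_z (a single pixel).
pixel_size_z = 1.0
roi = dict(
    z=0.0, len_z=pixel_size_z,
    y=0.0, len_y=100.0,
    x=0.0, len_x=100.0,
)
```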

The perspective is that we will handle arrays of mixed dimensionality, which can be up to 5D (TCZYX) but may also lack some of the intermediate axes (like TCYX); see #149 (comment):

Also, while we are not tackling time-data yet, maybe we should start thinking about this topic for such design decisions. Eventually, we will also process time-resolved data, so data may be 2D, 3D, 4D or e.g. 2D + time (=> 3 actual dimensions, but maybe saved as 4D with Z dimension = 1)

Broadly speaking, a possible (preliminary!) plan to support this general case would be to:

  1. Add custom handling of the dimensionality in the zarr-creation tasks.
  2. Consistently use named axes in all other tasks.
  3. Make sure that the relevant functions/tasks can handle arrays of different shapes.

Re: point 1
This means that create_zarr_structure and yokogawa_to_zarr would include more logic to choose the right structure for the target zarr array. This could rely on explicit user-provided parameters describing the expected structure, or on inference from the metadata, if that is sufficiently robust. As always, the simplest approach is to have a couple of small test folders covering the different cases (e.g. CZYX, TCZYX, TCYX, and YX?).
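A minimal sketch of what such explicit handling could look like (the `axes`/`sizes` parameters and the helper below are hypothetical, not existing task arguments):

```python
def infer_zarr_shape(axes: str, sizes: dict) -> tuple:
    """Hypothetical helper: build a target zarr shape from named axes.

    axes: e.g. "czyx", "tczyx", "tcyx", or "yx".
    sizes: per-axis sizes, e.g. {"c": 3, "z": 10, "y": 2160, "x": 2560}.
    """
    return tuple(sizes[ax] for ax in axes)

# The cases mentioned above, each producing a shape of matching rank:
shape_czyx = infer_zarr_shape("czyx", {"c": 3, "z": 10, "y": 2160, "x": 2560})
shape_tcyx = infer_zarr_shape("tcyx", {"t": 5, "c": 3, "y": 2160, "x": 2560})
```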

Re: point 2
This may be a bit complex, but the nice advantage is that we would be moving even closer to OME-NGFF specs.
Note that sometimes we already have to specify named axes in the OME-NGFF metadata, e.g. in

```python
label_group.attrs["multiscales"] = [
    {
        "name": label_name,
        "version": __OME_NGFF_VERSION__,
        "axes": [
            ax
            for ax in multiscales[0]["axes"]
            if ax["type"] != "channel"
        ],
        "datasets": new_datasets,
    }
]
```

Re: point 3
It should not be too challenging for functions with numpy arrays as inputs/outputs, thanks to broadcasting rules. It could be a bit trickier with dask arrays, but my feeling is that we are currently moving in a direction where dask is mostly used to lazily load arrays and to organize the processing of several small parts. (Note that this could change, e.g. if we push towards in-task ROI parallelization, in which case we may need to depend more heavily on dask arrays; to be assessed.)
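As an example of why broadcasting helps here, the same per-plane code can run unchanged on ZYX and CZYX arrays as long as it only assumes the trailing YX axes (a sketch, not code from the repository):

```python
import numpy as np

def normalize_yx(img):
    # Normalize each YX plane by its own maximum. Works for any number
    # of leading axes (C, Z, T, ...) because the reduction targets the
    # last two axes and keepdims makes the result broadcast back.
    maxima = img.max(axis=(-2, -1), keepdims=True)
    return img / np.maximum(maxima, 1)

out_zyx = normalize_yx(np.ones((10, 64, 64)))      # ZYX input
out_czyx = normalize_yx(np.ones((3, 10, 64, 64)))  # CZYX input
```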

jluethi (Collaborator) commented Nov 1, 2022

Very good overview @tcompa

The perspective is that we will handle arrays of mixed dimensionality, which can be up to 5D (TCZYX) but may also lack some of the intermediate axes (like TCYX)

Actually, arrays can be n-dimensional. We always expect YX to be there; anything else is optional. There will often be a Z axis (though not always; we'll need to make the 2D-only case work as well, see #124). There will often be multiple channels (those can typically just be looped over), and there may be time information (sometimes to be looped over, i.e. processing timepoint by timepoint, e.g. for segmentation; other times we'll need to process the whole time series at once, e.g. for tracking).
And users may come up with extra dimensions at some point. We don't need to support processing those as long as we don't have clear use cases for them, but in an optimal case we should fail cleanly when we receive such OME-Zarr files, and it should be easy to adapt a task to them.

  1. Have some custom handling of the dimensionality in the zarr-creation tasks.

That seems good to me. We can be somewhat conservative in adding dimensions. Let's make sure 2D-only (#124) and time data (#169) can be parsed, but hold off on more complex logic.

create_zarr_structure and yokogawa_to_zarr would include more logic

=> Sounds good to me. Let's add complexity where needed for the two issues above. I'll work on small test sets: the 2D one is ready, and I will need to look into the time one.

Consistently use named axes in all other tasks.

This seems like a very good approach to make sure we stay robust when users start introducing dimensions beyond the specific ones we currently handle.

Make sure that the relevant functions/tasks are capable of handling arrays of different shapes

Let's:
a) Find a good way to define what input a task can handle, e.g. in its docstring.
b) Make sure the tasks then actually run on the different shapes they are supposed to support, and explicitly test loading them.
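Points (a) and (b) could meet in a small runtime check that makes a task's supported axes explicit and fails early otherwise (entirely a sketch; no such helper exists in fractal-tasks-core):

```python
# Declared per task, mirroring what its docstring promises to support.
SUPPORTED_AXES = ("zyx", "czyx", "tczyx")

def check_axes(axes: str) -> None:
    """Fail early on OME-Zarr inputs a task was not written for."""
    if axes not in SUPPORTED_AXES:
        raise ValueError(
            f"Task supports axes {SUPPORTED_AXES}, got {axes!r}"
        )

check_axes("czyx")   # passes silently
# check_axes("tyx")  # would raise ValueError
```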

It could be a bit trickier with dask arrays

Good point. But our current approach should scale for quite a while, I hope. Let's re-assess this if it becomes necessary.

tcompa (Collaborator, Author) commented Oct 12, 2023

Adding to this issue: work in https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/557/files introduces the functions get_single_image_ROI and get_image_grid_ROIs, which (in their current versions) do require a set of ZYX pixel sizes.
These are obtained through the NgffImageMeta.pixel_sizes_zyx property, which sets the Z pixel size to 1 if the corresponding axis is missing; for this reason the import-ome-zarr task remains flexible.

In the future, these new functions will also need to be made more flexible (that is, they should not always require the Z pixel size).
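One direction for that flexibility would be to treat the Z pixel size as optional rather than defaulting it to 1 (a sketch only; this is not the current NgffImageMeta behaviour, and the helper name is made up):

```python
from typing import Optional

def pixel_sizes(axes: str, scale: list) -> dict:
    """Illustrative: map OME-NGFF scale entries onto named axes, leaving
    the pixel size of an absent axis as None instead of a dummy 1.0."""
    sizes = dict(zip(axes, scale))
    return {ax: sizes.get(ax) for ax in "zyx"}

sizes_3d = pixel_sizes("zyx", [1.0, 0.1625, 0.1625])
sizes_2d = pixel_sizes("yx", [0.1625, 0.1625])  # "z" maps to None
```

Callers would then have to handle the None case explicitly, which is exactly the flexibility the comment above asks for.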
