Two questions about converting larger than memory ND data into ome-zarr #255
Comments
The Scaler class only has one way of downsampling for dask arrays, which uses the code at ome-zarr-py/ome_zarr/writer.py line 496 (commit 2c4d489).
There was some discussion of the logic for that on the PR: #192 (comment). There is a PR currently open at #244 to fix a bug with the resizing of the edge tiles in a dask array, and a related issue raised at #237. No, there are no channel constructor helper methods, just the example at https://ngff.openmicroscopy.org/latest/#omero-md. Apologies for the minimal docs there.
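Since there is no constructor helper for the channel metadata linked above, the "omero" block has to be assembled by hand. A minimal sketch, following the field names in the linked spec example; the labels, colors, and window limits here are placeholders for 17 uint16 channels, and `make_omero_metadata` is a hypothetical helper name:

```python
# Minimal sketch of the "omero" channel metadata described at
# https://ngff.openmicroscopy.org/latest/#omero-md.
# Labels, colors, and window bounds are placeholders.
def make_omero_metadata(n_channels, dtype_max=65535):
    channels = [
        {
            "label": f"channel-{i}",   # placeholder channel name
            "color": "FFFFFF",         # hex RGB string, per the spec
            "active": True,
            "window": {"start": 0, "end": dtype_max,
                       "min": 0, "max": dtype_max},
        }
        for i in range(n_channels)
    ]
    return {"channels": channels}

omero = make_omero_metadata(17)
# this dict would be stored as group.attrs["omero"] on the image group
```

The dict is ordinary JSON-serializable metadata, so it can be attached to the zarr group's attributes after the arrays are written.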
Coincidentally, I have an immediate need to parameterize the
Hi all, Thank you both for the info. We are trying with some smaller data first and hit a few technical snags. We'll work through them on our own and come back with more questions. Thanks again!
Hi all, We ended up writing lazy downsampling code for these large datasets, since the current state of this project attempts to load the entire full-resolution array into memory to calculate the downsamples. Because we generate the data from our own microscopes and are now doing the downsampling ourselves, it makes more sense to re-arrange the existing zarr store and then add the various OME format attributes; otherwise, we are needlessly copying data between two zarr stores. On that note, addressing issue #258 would help us a lot, because we could then validate. Thanks for the guidance! Once everything is working, I'll try to find a place to host the completed ome-zarr to see how viewing such a large dataset remotely performs.
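The lazy downsampling this comment describes can be sketched with `dask.array.coarsen`, which builds each pyramid level as a deferred block-mean reduction instead of loading the full-resolution array. This is only an illustration of the approach, not the commenter's actual code, and the shapes are small toy placeholders for the real volumes:

```python
# Sketch: lazy 2x down-sampling of the z/y/x axes of a CZYX dask array.
# Nothing is computed until a level is actually written out.
import numpy as np
import dask.array as da

# toy stand-in for the full-resolution CZYX array (real data is far larger)
full = da.zeros((2, 16, 256, 256), chunks=(1, 8, 64, 64), dtype=np.uint16)

pyramid = [full]
for _ in range(3):
    prev = pyramid[-1]
    # coarsen by 2 along axes 1..3 (z, y, x); trim_excess drops odd edge pixels
    level = da.coarsen(np.mean, prev, {1: 2, 2: 2, 3: 2}, trim_excess=True)
    pyramid.append(level.astype(prev.dtype))

# every level is still a lazy dask array; spatial shapes halve per level
shapes = [lvl.shape for lvl in pyramid]
```

Each level stays lazy until written (e.g. with `da.to_zarr`), so peak memory is bounded by the chunk size rather than the full-resolution array.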
Hi all,
Thanks for the hard work on this package and on ome-ngff overall. We are very excited to learn that Dask arrays are now supported!
We have 4D data of shape CZYX, where typically c=17 and dtype=np.uint16. The data is generated by iterative multiplexed light-sheet imaging. The zyx dimensions are the same for each channel and are usually large, ranging from [256, 50000, 50000] to [1000, 100000, 100000]. The full-resolution data for each channel is stored as a Zarr array on disk, and the channels can be stacked together using Dask.
Two questions regarding converting this data to ome-zarr:
1. Does the Scaler() function in ome-zarr-py perform lazy down-sampling?
2. Are there helper methods for constructing channel metadata?
Thanks!
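The per-channel stacking step described above can be sketched with `da.stack`. In practice each channel would be opened lazily with `da.from_zarr(path)`; here small numpy stand-ins keep the example self-contained, and the shapes are toy placeholders for the real [256, 50000, 50000]-and-up volumes:

```python
# Sketch: combine one zarr array per channel into a single lazy CZYX array.
import numpy as np
import dask.array as da

# in practice: per_channel = [da.from_zarr(path) for path in channel_paths]
per_channel = [
    da.from_array(np.zeros((4, 32, 32), dtype=np.uint16), chunks=(2, 16, 16))
    for _ in range(3)
]

# stack lazily along a new leading channel axis -> CZYX
czyx = da.stack(per_channel, axis=0)
```

Because `da.stack` only records the graph, no channel data is read until the combined array is consumed downstream.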