
Optimize writing performance for MultiscaleSpatialImage #577

Open
LucaMarconato opened this issue Jun 11, 2024 · 1 comment

@LucaMarconato
Member

As observed by @ArneDefauw, unnecessary loading operations are performed when write_multiscale() is called on the list of lazy tensors derived from to_multiscale().

Optimizing the order in which the data is computed and written to disk, so as to avoid loading the same chunks more than once, would probably lead to a drastic performance improvement, potentially up to 10-fold.
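A minimal sketch of one possible approach, assuming plain dask + zarr (write_pyramid and the scale0/scale1/... Zarr layout are hypothetical, and mean pooling is a stand-in reduction, not necessarily what spatialdata uses): write the full-resolution level to disk first, then derive each coarser level from the previously written one, so the upstream lazy graph runs exactly once per chunk.

import dask.array as da
import numpy as np

def write_pyramid(arr, store_path, scale_factors=(2, 2, 2, 2)):
    # Write scale 0: the only place where the (possibly expensive) lazy
    # graph of `arr` is computed.
    da.to_zarr(arr, store_path, component="scale0", overwrite=True)
    previous = da.from_zarr(store_path, component="scale0")
    for i, factor in enumerate(scale_factors, start=1):
        # Downscale from the previously *written* level (assumes (c, y, x)
        # axes; mean pooling over y and x).
        down = da.coarsen(np.mean, previous, {1: factor, 2: factor}, trim_excess=True)
        da.to_zarr(down, store_path, component=f"scale{i}", overwrite=True)
        previous = da.from_zarr(store_path, component=f"scale{i}")

With this ordering, each chunk of the source array is computed once (for scale 0), and every further level only reads the already materialized previous level from disk.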

@ArneDefauw
Contributor

I include a minimal example to reproduce the observed behaviour.

If arr.persist() is called, the code completes in ~10 s; if the arr.persist() line is commented out, it completes in ~50 s (i.e. 10 s per scale: _some_function is called 5 times, once per scale level). As an alternative to .persist(), writing the array to a Zarr store and then loading it back evidently 'solves' the problem in a similar way.


import os
import tempfile
import time

import dask.array as da
import numpy as np
import spatialdata
from spatialdata.datasets import blobs

sdata = blobs()

start = time.time()

with tempfile.TemporaryDirectory() as temp_dir:
    sdata.write(os.path.join(temp_dir, "sdata_blobs_dummy.zarr"))

    def _some_function(arr):
        # Stand-in for an expensive per-chunk computation.
        arr = arr * 2
        time.sleep(10)
        return arr

    arr = sdata["blobs_image"].data

    arr = da.map_blocks(_some_function, arr, dtype=float, meta=np.array((), dtype=float))

    # Materialize the mapped array once; without this line, every scale
    # level produced during writing re-executes _some_function on the
    # same chunks.
    arr = arr.persist()

    # Alternative to persist: write to an intermediate Zarr store and load it back.
    # dask_zarr_path = os.path.join(temp_dir, "dask_array.zarr")
    # arr.to_zarr(dask_zarr_path, overwrite=True)
    # arr = da.from_zarr(dask_zarr_path)

    se = spatialdata.models.Image2DModel.parse(
        arr,
        scale_factors=[2, 2, 2, 2],
    )

    sdata["blobs_image_processed"] = se

    sdata.write_element("blobs_image_processed")


print(time.time() - start)
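A small standalone check of the recomputation, as a sketch assuming plain dask with no spatialdata involved (the array shape, chunking, and the five slicing-based "scales" are arbitrary):

import itertools

import dask.array as da

counter = itertools.count()

def _expensive(block):
    next(counter)  # record one execution per chunk (atomic enough for a demo)
    return block * 2

src = da.map_blocks(_expensive, da.ones((4, 64, 64), chunks=(4, 32, 32)), dtype=float)

# Compute five coarser views independently, as a pyramid writer might:
# each .compute() re-runs _expensive on all 4 chunks.
for factor in (1, 2, 4, 8, 16):
    src[:, ::factor, ::factor].compute()
print("executions without persist:", next(counter))  # 20 = 5 scales x 4 chunks

counter = itertools.count()
src = da.map_blocks(_expensive, da.ones((4, 64, 64), chunks=(4, 32, 32)), dtype=float).persist()
for factor in (1, 2, 4, 8, 16):
    src[:, ::factor, ::factor].compute()
print("executions with persist:", next(counter))  # 4 = once per chunk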
