
Big images #134

Merged (17 commits, Mar 9, 2023)

Conversation

@will-moore (Member) commented Jan 23, 2023

Fixes #111

This adds support for exporting big images using a tile-based approach instead of a plane-based approach.
As suggested by @joshmoore, this tile-based approach is now used for ALL images (instead of using the whole plane as a single chunk, as before).

First we write a full-size array at "0", a tile at a time.
Then we resize via ome_zarr.dask_utils.resize (see the bug-fix at ome/ome-zarr-py#244) to create a pyramid.
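A minimal sketch of that flow (assuming an omero.gateway ImageWrapper image and a zarr group root; the helper names, 1024-pixel tiles and 5-level pyramid are illustrative, not the exact code in this PR):

import dask.array as da
from ome_zarr.dask_utils import resize

def write_full_size_level(image, root, tile_w=1024, tile_h=1024):
    # Write the full-size "0" array one tile at a time from OMERO
    pixels = image.getPrimaryPixels()
    size_x, size_y = image.getSizeX(), image.getSizeY()
    shape = (image.getSizeT(), image.getSizeC(), image.getSizeZ(), size_y, size_x)
    dtype = pixels.getPixelsType().getValue()  # e.g. "uint16"
    arr = root.require_dataset(
        "0", shape=shape, chunks=(1, 1, 1, tile_h, tile_w), dtype=dtype
    )
    for t in range(shape[0]):
        for c in range(shape[1]):
            for z in range(shape[2]):
                for y in range(0, size_y, tile_h):
                    for x in range(0, size_x, tile_w):
                        w = min(tile_w, size_x - x)
                        h = min(tile_h, size_y - y)
                        tile = pixels.getTile(z, c, t, (x, y, w, h))
                        arr[t, c, z, y:y + h, x:x + w] = tile
    return arr

def downsample_on_disk(root, levels=5):
    # Halve X and Y for each resolution level, writing "1", "2", ... to the same store
    for level in range(1, levels):
        source = da.from_zarr(root.store, component=str(level - 1))
        shape = list(source.shape)
        shape[-2], shape[-1] = shape[-2] // 2, shape[-1] // 2
        output = resize(source, tuple(shape))
        da.to_zarr(arr=output, url=root.store, component=str(level))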

Other changes:

The --cache_numpy functionality has been replaced.
Previously, if you anticipated connection issues with OMERO, you could use the --cache_numpy option to cache chunks to disk at the same time as writing them to zarr. The problem with this was that if you lost the connection unexpectedly, you had nothing cached. Caching also doubled the disk usage.

Now, if the connection fails, you can simply rerun the omero zarr export Image:ID command and we pick up writing to the existing, partially-generated array.

This is nicer, as you don't have to delete the partially-exported image as before, and it's much faster, since you don't have to copy the cached chunks and re-write them to zarr.
Exporting the big image below required many re-runs of the export command - maybe 20-30 - before completion!

The --tile_width and --tile_height options, which previously only applied to bioformats2raw usage, now also apply to API-based export.

Image exported with this PR is at:
https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846152.zarr/

I also fixed Plate export to work with the new export logic and to support interrupting a Plate export (partial export) and continuing from where you left off.

@joshmoore (Member) left a comment

I'll export some sample data...

Especially if we need to split the logic, 👍 for having tests of each.

)
# if big image...
if image.requiresPixelsPyramid():
    paths = add_big_image(image, parent, level_count)
@joshmoore (Member) commented:

Is the big image method so much slower that it's worth splitting the logic?

@will-moore (Member Author) replied:

Ah - you mean always use a tile-based approach, even if it's not a big image?
One question then: should the chunk size match the whole plane (as it currently does for non-big images), or would it be better to cap it at e.g. 1024 or 512 (which would be a ~breaking change)?

Actually, that reminds me that I should update to get a preferred tile-size from the rawPixelsStore, instead of hard-coding it at 512 x 512....

@will-moore (Member Author) commented:

@joshmoore Updated as I think you suggested: removed the code duplication, so we now use tile sizes from rfs.getTileSize() for ALL images, even if they're not big images.
Maybe not quite as fast for non-big images as previously, but speed is not so critical for export, and supporting a single way of exporting certainly means less code.
Also, this now adds support for the --cache_numpy behaviour for ALL images (including big images).
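For reference, a sketch of reading that preferred tile size (rfs.getTileSize() is the call mentioned above; conn and image are an assumed BlitzGateway connection and ImageWrapper):

# Query the server's preferred tile size from the raw pixels store
rfs = conn.createRawPixelsStore()
try:
    rfs.setPixelsId(image.getPixelsId(), False)
    tile_w, tile_h = rfs.getTileSize()
finally:
    rfs.close()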

@will-moore (Member Author) commented:

Looking to test this PR on the export of https://idr.openmicroscopy.org/webclient/?show=image-9846152, the preferred tile-size for that Image is tile_size_x: 19120, tile_size_y: 27.
This seems like a sub-optimal tile size for viewing an OME-NGFF image, so I wonder whether, instead of using the preferred tile-size from rawPixelsStore, we should simply choose a good default, e.g. 512 x 512 (which is what iviewer uses for all Images).
This might mean slightly slower export, which is OK, but the exported NGFF would be better.

I'll go ahead and make that change, unless there's any objection?

@sbesson (Member) commented Jan 25, 2023

> I'll go ahead and make that change, unless there's any objection?

I think the tile size given by the tile service is retrieved from the relevant Bio-Formats reader and effectively reflects how the data is stored on disk in its original file format.

For comparison, bioformats2raw completely ignores this value when converting the data and sets a chunk size of 1024x1024. This is largely in agreement with the proposal you are making here.

@will-moore (Member Author) commented:

Since we already support --tile_width and --tile_height for bioformats2raw, I've used them for CLI export too.

@will-moore (Member Author) commented Feb 1, 2023

Running the current version against https://idr.openmicroscopy.org/webclient/?show=image-9846152:
Loading all tiles for resolution 0 completed after many re-runs, losing the connection after ~12 Z planes each time.

Downsampling then failed at the point below, during the downsample from 1 -> 2:

$ ls -alh ./9846152.zarr
total 4.0K
drwxrwxr-x. 5 wmoore wmoore 68 Feb  1 20:18 .
drwxrwxr-x. 4 wmoore wmoore 53 Feb  1 06:04 ..
drwxrwxr-x. 5 wmoore wmoore 68 Feb  1 12:03 0
drwxrwxr-x. 5 wmoore wmoore 68 Feb  1 20:10 1
drwxrwxr-x. 5 wmoore wmoore 68 Feb  1 20:19 2
-rw-rw-r--. 1 wmoore wmoore 24 Feb  1 06:04 .zgroup

$ find ./9846152.zarr/0 -type f | wc -l
72619
$ find ./9846152.zarr/1 -type f | wc -l
19111
$ find ./9846152.zarr/2 -type f | wc -l
Downsample stack trace:
Traceback (most recent call last):
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/bin/omero", line 11, in <module>
    sys.exit(main())
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/main.py", line 125, in main
    rv = omero.cli.argv()
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1784, in argv
    cli.invoke(args[1:])
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1222, in invoke
    stop = self.onecmd(line, previous_args)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1299, in onecmd
    self.execute(line, previous_args)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1381, in execute
    args.func(args)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/cli.py", line 107, in _wrapper
    return func(self, *args, **kwargs)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/cli.py", line 316, in export
    image_to_zarr(image, args)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/raw_pixels.py", line 36, in image_to_zarr
    add_image(image, root, tile_width=tile_width, tile_height=tile_height)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/raw_pixels.py", line 63, in add_image
    paths = add_raw_image(image, parent, level_count, tile_width, tile_height)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/raw_pixels.py", line 164, in add_raw_image
    downsample_pyramid_on_disk(parent, paths)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/raw_pixels.py", line 188, in downsample_pyramid_on_disk
    da.to_zarr(arr=output, url=parent.store, component=path)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/array/core.py", line 3694, in to_zarr
    return arr.store(z, lock=False, compute=compute, return_stored=return_stored)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/array/core.py", line 1767, in store
    r = store([self], [target], **kwargs)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/array/core.py", line 1235, in store
    compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/base.py", line 341, in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/threaded.py", line 89, in get
    results = get_async(
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/local.py", line 511, in get_async
    raise_exception(exc, tb)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/local.py", line 319, in reraise
    raise exc
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/array/core.py", line 4366, in store_chunk
    return load_store_chunk(x, out, index, lock, return_stored, False)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/dask/array/core.py", line 4348, in load_store_chunk
    out[index] = x
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/zarr/core.py", line 1373, in __setitem__
    self.set_basic_selection(pure_selection, value, fields=fields)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/zarr/core.py", line 1468, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/zarr/core.py", line 1772, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/zarr/core.py", line 1800, in _set_selection
    check_array_shape('value', value, sel_shape)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/zarr/util.py", line 546, in check_array_shape
    raise ValueError('parameter {!r}: expected array with shape {!r}, got {!r}'
ValueError: parameter 'value': expected array with shape (1, 1, 265, 1024), got (1, 1, 259, 1024)

@imagesc-bot commented:

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/converting-other-idr-images-into-public-zarr/44025/20

@will-moore (Member Author) commented:

The resize error above was fixed by ome/ome-zarr-py@e1fdd12

The branch at this point was used to generate the image at https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846152.zarr/

I've also updated the description to cover all the changes in this PR.

@will-moore (Member Author) commented:

The pre-commit.ci check is failing with:

              RuntimeError: The Poetry configuration is invalid:
                - [extras.pipfile_deprecated_finder.2] 'pip-shims<=0.3.4' does not match '^[a-zA-Z-_.0-9]+$'

Same error at ome/ome-zarr-py#244

The last poetry release (https://pypi.org/project/poetry/) was 10th Jan 2023, but that doesn't appear to be the cause, since the check above has passed since that date.

I don't see any mention of Poetry itself in this repo's config.

@joshmoore (Member) left a comment

Still could use some functional testing (to convince ourselves that no chunk could be unintentionally missed), but from reading the code:

  • No objection to the new strategy, but the dropping of an argument will require a bigger version bump, since pipelines could break. Additionally we might want to say, "need to delete existing datasets if you don't want them unused".
  • Generally I could see moving to logging rather than print everywhere, but we can follow up on this, since I assume there are still a good number of prints now.
  • I could see adding dask into the mix in the future to download/write tiles in parallel.


# create "0" array if it doesn't exist
path = "0"
@joshmoore (Member) commented:

Unrelated heads up that we might want to encapsulate these names in case we ever want to move to a SHOULD recommendation for the naming of the arrays that includes a prefix.
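For example, a hypothetical helper along those lines (the prefix parameter is an assumption, not anything specified by this PR):

def dataset_name(level: int, prefix: str = "") -> str:
    # Centralise how resolution-level arrays are named, so a future
    # prefixed-naming recommendation only needs a change in one place
    return f"{prefix}{level}"

"0" would then become dataset_name(0) at the call sites.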

@will-moore (Member Author) commented:

Re logging, I tried using logging but couldn't work out how to actually see the logging statements.

If I do:

import logging
LOGGER = logging.getLogger("omero_zarr.raw_pixels")
...
LOGGER.warn("t, c, z, chk_x, chk_y %s %s %s %s %s" % (t, c, z, chk_x, chk_y))

I see this e.g:

WARNING:omero_zarr.raw_pixels:t, c, z, chk_x, chk_y 0 2 0 0 0

which is not ideal (I really just want t, c, z, chk_x, chk_y 0 2 0 0 0).
If I use LOGGER.info() I see nothing at all.

@joshmoore (Member) commented:

That comes down to the config. The idea is that someone else can also influence how the output appears. If you are doing this in your own code, use logging.basicConfig to set the format and the log level. Just make sure that isn't in the released code, so that the CLI, for example, can set its own preferences.
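A minimal sketch of that suggestion (for your own debugging entry point, not the released code):

import logging

# Configure once: bare message format, INFO level
logging.basicConfig(level=logging.INFO, format="%(message)s")

logger = logging.getLogger("omero_zarr.raw_pixels")
logger.info("t, c, z, chk_x, chk_y %s %s %s %s %s", 0, 2, 0, 0, 0)
# prints: t, c, z, chk_x, chk_y 0 2 0 0 0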

@will-moore (Member Author) commented:

Testing the export of a Plate with this PR, I see an issue with the logic for testing whether a chunk has been previously written:

chunk_keys = list(zarray.chunk_store.keys())
separator = zarray.chunk_store.key_separator

chunk_key = separator.join([str(k) for k in key_dims])
print("chunk_key", chunk_key)
print("keys", chunk_keys)
if chunk_key not in chunk_keys:
    # ... load from OMERO ...

This is failing to detect existing chunks and prints out...

chunk_key 0/4/0/0
keys ['.zgroup', '1/.zarray', '1/0/0/0', '1/1/0/0', '1/2/0/0', '1/3/0/0', '1/4/0/0', 'L/.zgroup', 'L/9/.zattrs', 'L/9/.zgroup', 'L/9/0/.zattrs', 'L/9/0/.zgroup', 'L/9/0/0/.zarray', 'L/9/0/0/0/0/0', 'L/9/0/0/1/0/0', 'L/9/0/0/2/0/0', 'L/9/0/0/3/0/0', 'L/9/0/0/4/0/0', 'L/9/0/1/.zarray', 'L/9/0/1/0.0.0', 'L/9/0/1/1.0.0', 'L/9/0/1/2.0.0', 'L/9/0/1/3.0.0', 'L/9/0/1/4.0.0', 'L/9/0/2/.zarray', 'L/9/0/2/0.0.0', 'L/9/0/2/1.0.0', 'L/9/0/2/2.0.0', 'L/9/0/2/3.0.0', 'L/9/0/2/4.0.0', 'L/9/0/3/.zarray', 'L/9/0/3/0.0.0', 'L/9/0/3/1.0.0', 'L/9/0/3/2.0.0', 'L/9/0/3/3.0.0', 'L/9/0/3/4.0.0', 'L/9/1/.zattrs', 'L/9/1/.zgroup', 'L/9/1/0/.zarray', 'L/9/1/0/0/0/0', 'L/9/1/0/1/0/0', 'L/9/1/0/2/0/0', 'L/9/1/0/3/0/0', 'L/9/1/0/4/0/0', 'L/9/1/1/.zarray', 'L/9/1/1/0.0.0', 'L/9/1/1/1.0.0', 'L/9/1/1/2.0.0', 'L/9/1/1/3.0.0', 'L/9/1/1/4.0.0', 'L/9/1/2/.zarray', 'L/9/1/2/0.0.0', 'L/9/1/2/1.0.0', 'L/9/1/2/2.0.0', 'L/9/1/2/3.0.0', 'L/9/1/2/4.0.0', 'L/9/1/3/.zarray', 'L/9/1/3/0.0.0', 'L/9/1/3/1.0.0', 'L/9/1/3/2.0.0', 'L/9/1/3/3.0.0', 'L/9/1/3/4.0.0', 'L/9/2/.zattrs', 'L/9/2/.zgroup', 'L/9/2/0/.zarray', 'L/9/2/0/0/0/0', 'L/9/2/0/1/0/0', 'L/9/2/0/2/0/0', 'L/9/2/0/3/0/0', 'L/9/2/0/4/0/0', 'L/9/2/1/.zarray', 'L/9/2/1/0.0.0', 'L/9/2/1/1.0.0', 'L/9/2/1/2.0.0', 'L/9/2/1/3.0.0', 'L/9/2/1/4.0.0', 'L/9/2/2/.zarray', 'L/9/2/2/0.0.0', 'L/9/2/2/1.0.0', 'L/9/2/2/2.0.0', 'L/9/2/2/3.0.0', 'L/9/2/2/4.0.0', 'L/9/2/3/.zarray', 'L/9/2/3/0.0.0', 'L/9/2/3/1.0.0', 'L/9/2/3/2.0.0', 'L/9/2/3/3.0.0', 'L/9/2/3/4.0.0', 'L/9/3/.zattrs', 'L/9/3/.zgroup', 'L/9/3/0/.zarray', 'L/9/3/0/0/0/0', 'L/9/3/0/1/0/0', 'L/9/3/0/2/0/0', 'L/9/3/0/3/0/0', 'L/9/3/0/4/0/0', 'L/9/3/1/.zarray', 'L/9/3/1/0.0.0', 'L/9/3/1/1.0.0', 'L/9/3/1/2.0.0', 'L/9/3/1/3.0.0', 'L/9/3/1/4.0.0', 'L/9/3/2/.zarray', 'L/9/3/2/0.0.0', 'L/9/3/2/1.0.0', 'L/9/3/2/2.0.0', 'L/9/3/2/3.0.0', 'L/9/3/2/4.0.0', 'L/9/3/3/.zarray', 'L/9/3/3/0.0.0', 'L/9/3/3/1.0.0', 'L/9/3/3/2.0.0', 'L/9/3/3/3.0.0', 'L/9/3/3/4.0.0', 'L/9/4/.zgroup', 'L/9/4/0/.zarray']

It looks like a different path separator is being used for the path to the Image (/) and for the Image dims (.). Trying to match this behaviour - and have it behave differently for Images vs Plates - seems too fragile.
So I think I'll revert to my previous approach (changed in 6af4061) to check for existing chunks.
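One possible shape for such a check, assuming a local DirectoryStore (the helper name is illustrative and not necessarily what 6af4061 does):

import os

def chunk_exists(zarray, key_dims):
    # Look for the chunk file on disk; try both chunk-key separators,
    # since arrays here may use either "." or "/" between chunk indices
    array_dir = os.path.join(zarray.store.path, zarray.path)
    for sep in (".", "/"):
        if os.path.exists(os.path.join(array_dir, sep.join(str(k) for k in key_dims))):
            return True
    return False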

@joshmoore (Member) left a comment

In general, looking good. I do note the lack of automated testing around this stack, which we probably need to spend a bit of time on.

@@ -126,12 +126,6 @@ def _configure(self, parser: Parser) -> None:
            "--output", type=str, default="", help="The output directory"
        )

        parser.add_argument(
            "--cache_numpy",
@joshmoore (Member) commented:

I assume this takes us to 0.6.0 unless we want to tolerate it being set and print a warning message.

@will-moore (Member Author) replied:

I think 0.6.0 makes sense, since we also change the default chunking behaviour

@joshmoore (Member) commented:

👍 Merging.

Anything else to roll into a 0.6.0, @will-moore? @dominikl?

Successfully merging this pull request may close these issues:

zarr export doesn't support big images