
ome zarr chunks #1092

Open

ArneDefauw wants to merge 8 commits into scverse:main from ArneDefauw:fix/ome_zarr_chunks

Conversation

@ArneDefauw (Contributor)

Fix for #1090.

Note that unit tests still fail for ome-zarr == 0.14.0 due to #1091.

@codecov bot commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 74.54545% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.85%. Comparing base (6a3eef7) to head (2450bd4).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
src/spatialdata/_io/io_raster.py 74.07% 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1092      +/-   ##
==========================================
- Coverage   91.96%   91.85%   -0.11%     
==========================================
  Files          51       51              
  Lines        7729     7772      +43     
==========================================
+ Hits         7108     7139      +31     
- Misses        621      633      +12     
Files with missing lines Coverage Δ
src/spatialdata/__init__.py 95.65% <100.00%> (-0.51%) ⬇️
src/spatialdata/_io/io_raster.py 88.70% <74.07%> (-5.20%) ⬇️

@ArneDefauw ArneDefauw marked this pull request as ready for review March 12, 2026 09:04
@ArneDefauw (Contributor, Author)

also added fix for #1091.

On Linux, unit tests are failing; I will try to reproduce it.

@ArneDefauw (Contributor, Author) commented Mar 12, 2026

> also added fix for #1091.
>
> On Linux, unit tests are failing; I will try to reproduce it.

I used S0 instead of s0 for pyramid scale 0, which is why the unit tests were failing on Linux.

@LucaMarconato (Member) left a comment

Thanks for the PR! I suggest the changes described below.

Comment on lines +93 to +103

```python
def _prepare_single_scale_storage_options(
    storage_options: JSONDict | list[JSONDict] | None,
) -> JSONDict | list[JSONDict] | None:
    if storage_options is None:
        return None
    if isinstance(storage_options, dict):
        prepared = dict(storage_options)
        if "chunks" in prepared:
            prepared["chunks"] = _normalize_explicit_chunks(prepared["chunks"])
        return prepared
    return [dict(options) for options in storage_options]
```
Member:

This function behaves like _prepare_multiscale_storage_options(), except that it does not normalize the list-of-storage-options case. Can we remove it and just use _prepare_multiscale_storage_options() (after renaming it to _prepare_storage_options())?

Member:

I unified the two functions in 20696b5. Happy to hear what you think (in case we need two, we can revert, but I think we can proceed with one function).

Comment on lines +43 to +44

```python
if isinstance(value, str | bytes):
    return False
```
Member:

Why do we need this? Please either remove or document.

Member:

Removed here: 2450bd4

Comment on lines +51 to +52

```python
if isinstance(value, str | bytes):
    return False
```
Member:

Same for this check.

Member:

Removed here: 2450bd4

Comment on lines +58 to +67

```python
def _is_regular_dask_chunk_grid(chunk_grid: Sequence[Sequence[int]]) -> bool:
    # Match Dask's private _check_regular_chunks() logic without depending on its internal API.
    for axis_chunks in chunk_grid:
        if len(axis_chunks) <= 1:
            continue
        if len(set(axis_chunks[:-1])) > 1:
            return False
        if axis_chunks[-1] > axis_chunks[0]:
            return False
    return True
```
Member:

I'd add a docstring with examples (or examples inline with the code) to show what passes and what fails.

I would add the following:

triggers the continue in the first if:

  • [(4,)]
  • [()]

triggers the first return False:

  • [(4, 4, 3, 4)]

triggers the second return False:

  • [(4, 4, 4, 5)]

exits with the final return True:

  • [(4, 4, 4, 4)], succeeds, all chunks equal
  • [(4, 4, 4, 1)], succeeds, the final chunk is smaller than the initial one

Member:

Also: I would add all the examples above to a test for the function _is_regular_dask_chunk_grid().

Member:

Added docstring and tests here: 2450bd4

Comment on lines +629 to +634

```python
def test_write_irregular_dask_chunks_without_explicit_storage_options(tmp_path: Path) -> None:
    data = da.from_array(RNG.random((3, 800, 1000)), chunks=((3,), (300, 200, 300), (512, 488)))
    image = Image2DModel.parse(data, dims=("c", "y", "x"))
    sdata = SpatialData(images={"image": image})

    sdata.write(tmp_path / "data.zarr")
```
Member:

I would find it more natural if this test failed. Now that chunks = raster_data.chunks has been removed, writing doesn't fail, but it ignores the chunks in the data. I think the natural behavior is: if the storage options specify chunks, those are used; otherwise the ones from the data are used; and if the chunks from the data are irregular and no storage options are specified, an error is raised.

Member:

I implemented this in 20696b5, changing the test so that it now expects a failure.

@LucaMarconato (Member) commented Mar 19, 2026

I'll go ahead and implement a fix for the code review points.

@LucaMarconato (Member) commented Mar 19, 2026

I implemented the changes mentioned in the code review. Please let me know if you agree with the changes. If yes, I'll merge and work on a release.


Labels: none yet
Projects: none yet
2 participants