`xr.open_zarr` is 3x slower than `zarr.open`, even at scale #9111

Comments
Thanks for opening this issue @max-sixty. Would you mind splitting the timing into the following two steps?

```python
%timeit ds = xr.open_zarr("test.zarr")["foo"]
%timeit ds.compute()

%timeit arr = zarr.open("test.zarr")["foo"]
%timeit arr[:]
```

I would like to know whether the difference is in the metadata loading step or in the fetching/decoding of chunks. I may also suggest avoiding dask chunks in the xarray case, just to narrow things down.
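For the "avoiding dask chunks" part, one way (a sketch; passing `chunks=None` skips dask and returns a lazily-indexed backend array, as used further down in this thread) would be:

```python
# Sketch: open without creating dask arrays, then time the eager load.
da_nodask = xr.open_zarr("test.zarr", chunks=None)["foo"]
%timeit da_nodask.compute()
```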
---

OK, no great updates with those, unfortunately. Slightly adjusting to:

```python
da = xr.open_zarr("test.zarr")["foo"]
arr = zarr.open("test.zarr")["foo"]

%timeit xr.open_zarr("test.zarr")["foo"]
%timeit da.compute()

%timeit zarr.open("test.zarr")["foo"]
%timeit arr[:]
```

...shows there's basically nothing in the metadata step — the array is hopefully big enough to amortize that.

If we adjust to avoid dask + chunks, we get the same result (assuming this is what you meant, lmk if not, probably this isn't the simplest way?):

```python
da = xr.DataArray(np.random.rand(10000, 10000), name="foo")
da.encoding = dict(
    chunks=(10000, 10000), preferred_chunks={"dim_0": 10000, "dim_1": 10000}
)
da.to_dataset().to_zarr("test.zarr", mode="w")

%timeit xr.open_zarr("test.zarr")["foo"].compute()
%timeit zarr.open("test.zarr")["foo"][:]
```
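An alternative that might be slightly simpler, sketched under the assumption that the goal is just a single on-disk chunk: pass the chunking via `to_zarr`'s `encoding` argument instead of mutating `da.encoding`.

```python
import numpy as np
import xarray as xr

# Sketch: write the whole array as one zarr chunk by passing encoding to to_zarr,
# rather than assigning da.encoding by hand.
da = xr.DataArray(np.random.rand(10000, 10000), name="foo")
da.to_dataset().to_zarr(
    "test.zarr",
    mode="w",
    encoding={"foo": {"chunks": (10000, 10000)}},
)
```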
---

Curious, I'm seeing quite similar results (comparable between xarray and zarr), but with slightly different versions. I'll update this post when I've used your exact set of versions.

UPDATE: I've run this with a bunch of xarray/zarr/dask versions and I'm not seeing the differences you were seeing above 🤷.

```python
da = xr.open_zarr("test.zarr", chunks=None)["foo"]  # no dask, just a lazy backend array
arr = zarr.open("test.zarr")["foo"]

%timeit xr.open_zarr("test.zarr")["foo"]
%timeit da.compute()

%timeit zarr.open("test.zarr")["foo"]
%timeit arr[:]
```
---

OK, so it depends on how we do the chunks:

```python
import numpy as np
import zarr
import xarray as xr
import dask

print(zarr.__version__, xr.__version__, dask.__version__)

da = xr.DataArray(np.random.rand(10000, 10000), name="foo")
da.to_dataset().to_zarr("default-chunks.zarr", mode="w")
da.to_dataset().chunk(None).to_zarr("no-chunks.zarr", mode="w")

for chunk_type in ["no-chunks", "default-chunks"]:
    print(f"\n# {chunk_type}")

    print("zarr read")
    %timeit zarr.open(f"{chunk_type}.zarr")["foo"][:]

    print("xarray default read")
    %timeit xr.open_zarr(f"{chunk_type}.zarr")["foo"].compute()

    print("xarray chunks=None read")
    %timeit xr.open_zarr(f"{chunk_type}.zarr", chunks=None)["foo"].compute()
```

Conclusion:
---

Yes, this is very likely. But this seems like a lot of overhead — ~2x even on a huge array. Here's an array 10x the size, with 10 chunks; it's 6.25s vs 3.12s. And this is with 10 chunks, so it should get some credit for the parallelization.

```python
import numpy as np
import zarr
import xarray as xr
import dask

print(zarr.__version__, xr.__version__, dask.__version__)

(
    xr.DataArray(np.random.rand(100000, 10000), name="foo")
    .to_dataset()
    .chunk(10000)
    .to_zarr("test.zarr", mode="w")
)

%timeit -n 1 -r 1 xr.open_zarr("test.zarr", chunks={}).compute()
%timeit -n 1 -r 1 xr.open_zarr("test.zarr", chunks=None).compute()
%timeit -n 1 -r 1 xr.open_zarr("test.zarr").compute()
%timeit -n 1 -r 1 zarr.open("test.zarr")["foo"][:]
```

...that said, possibly this is a dask issue?
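One way to poke at the dask hypothesis (a sketch, not something run here) is to force the single-threaded scheduler and see whether the gap vs. plain zarr changes:

```python
import dask
import xarray as xr

# If contention in the default threaded scheduler is part of the overhead,
# the synchronous scheduler should shift the timings noticeably.
with dask.config.set(scheduler="synchronous"):
    %timeit -n 1 -r 1 xr.open_zarr("test.zarr", chunks={}).compute()
```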
---

* Improve zarr chunks docs. Makes them more structured and consistent. I think it removes a mistake re the default chunks arg in `open_zarr` (it's not `None`, it's `auto`). Adds a comment re performance with `chunks=None`, closing #9111
Updated docs, so will close.
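For anyone landing here later, a rough sketch of how the `chunks` argument changes what `open_zarr` returns (based on the discussion above; the updated docs are the authoritative reference):

```python
import xarray as xr

# Default ("auto"): dask-backed variables, with dask's auto-chunking guided by
# the on-disk zarr chunks.
ds_auto = xr.open_zarr("test.zarr")

# chunks={}: dask-backed variables with one dask chunk per on-disk zarr chunk.
ds_dask = xr.open_zarr("test.zarr", chunks={})

# chunks=None: no dask at all; variables are lazily-indexed backend arrays,
# which avoids the overhead discussed in this issue.
ds_lazy = xr.open_zarr("test.zarr", chunks=None)
```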
---

What is your issue?

I'm doing some benchmarks on Xarray + Zarr vs. some other formats, and I get quite a surprising result — on a very simple array, xarray is adding a lot of overhead to reading a Zarr array.

Here's a quick script — super simple, just a single chunk. It's 800MB of data, so not some tiny array where reading a metadata JSON file or allocating an index is going to throw off the results.
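Roughly (a sketch reconstructed from the snippets later in the thread; the exact write path here is an assumption), the setup was along these lines:

```python
# Reconstruction, not the original script: write ~800MB as a single zarr chunk,
# then compare a full eager read through xarray vs. plain zarr.
import numpy as np
import xarray as xr
import zarr

da = xr.DataArray(np.random.rand(10000, 10000), name="foo")  # 10_000**2 * 8 bytes ≈ 800MB
da.to_dataset().chunk(None).to_zarr("test.zarr", mode="w")  # single chunk on disk

%timeit xr.open_zarr("test.zarr")["foo"].compute()
%timeit zarr.open("test.zarr")["foo"][:]
```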
So:
Having a quick look with `py-spy` suggests there might be some thread contention, but I'm not sure how much is really contention vs. idle threads waiting.

Making the array 10x bigger (with 10 chunks) reduces the relative difference, but it's still fairly large:
Any thoughts on what might be happening? Is the benchmark at least correct?