expose zarr caching from xarray #2812

Open
rabernat opened this issue Mar 14, 2019 · 12 comments · May be fixed by #2813 or #2814
Labels
topic-documentation topic-zarr Related to zarr storage library

Comments

@rabernat
Contributor

Zarr has its own internal mechanism for caching, described here:

However, this capability is currently inaccessible from xarray.

I propose adding a new keyword, cache=True/False, to open_zarr that wraps the store in an LRUStoreCache.
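A minimal sketch of what the proposed wrapping does, assuming (as in zarr 2.x) that a store behaves like a `MutableMapping` from chunk keys to bytes. `LRUMappingCache` is a hypothetical class written only for illustration. It is not zarr's `LRUStoreCache` implementation, and its simple byte-budget eviction is an assumption.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class LRUMappingCache(MutableMapping):
    """Hypothetical read-through LRU cache around a mapping-style store.

    Illustrative only; this is not zarr's LRUStoreCache implementation.
    """

    def __init__(self, store, max_size):
        self._store = store            # underlying (possibly remote) store
        self._max_size = max_size      # byte budget for cached values
        self._values = OrderedDict()   # key -> bytes, least recently used first
        self._current_size = 0
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key):
        if key in self._values:
            self._values.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._values[key]
        value = self._store[key]  # cache miss: fetch from the backend
        self.misses += 1
        self._cache_value(key, value)
        return value

    def _cache_value(self, key, value):
        size = len(value)
        if size > self._max_size:
            return  # larger than the whole budget: don't cache it
        # Evict least recently used entries until the new value fits.
        while self._current_size + size > self._max_size:
            _, evicted = self._values.popitem(last=False)
            self._current_size -= len(evicted)
        self._values[key] = value
        self._current_size += size

    def _invalidate(self, key):
        if key in self._values:
            self._current_size -= len(self._values.pop(key))

    def __setitem__(self, key, value):
        self._store[key] = value  # write through, then drop any stale copy
        self._invalidate(key)

    def __delitem__(self, key):
        del self._store[key]
        self._invalidate(key)

    def __contains__(self, key):
        return key in self._values or key in self._store

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


# Usage: a plain dict stands in for a remote store holding two 100-byte chunks.
backend = {"temp/0.0": b"x" * 100, "temp/0.1": b"y" * 100}
cached = LRUMappingCache(backend, max_size=150)
cached["temp/0.0"]  # miss: fetched from the backend
cached["temp/0.0"]  # hit: served from memory
cached["temp/0.1"]  # miss: evicts temp/0.0 to stay under 150 bytes
print(cached.hits, cached.misses)  # 1 2
```

Because the wrapper is itself a mapping, any code that accepts a mapping-style store can use it unchanged; that is the idea behind wrapping the store before handing it to the library.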

@rabernat
Contributor Author

Or should we use xarray's own caching mechanism?

@rabernat
Contributor Author

I have created two PRs that attempt to provide zarr caching in different ways. I would welcome some advice on which one is the better approach.

@dcherian added the topic-zarr label (Related to zarr storage library) on Sep 7, 2019
@tasansal

Hi @rabernat, I looked at your PRs, and they don't seem to have gotten much attention.

I tried passing a store wrapped in an LRU cache to open_zarr, but it appears to ignore the cache.

For our use cases in https://github.com/TGSAI/mdio-python, we usually want an LRU cache of some form (it doesn't necessarily have to be Zarr's).

  • Do you know of a hack to make this work?
  • What can we do to help and start working on this?

@rabernat
Contributor Author

I have successfully used the Zarr LRU cache with Xarray. You just have to initialize the Store object outside of Xarray and then pass it to open_zarr or open_dataset(store, engine="zarr").

Have you tried that?

@dcherian
Contributor

> You just have to initialize the Store object outside of Xarray and then pass it to open_zarr or open_dataset(store, engine="zarr").

This would be good to document!

@tasansal

tasansal commented Sep 13, 2022

@rabernat, yes, I have tried that like this:

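```python
from zarr.storage import FSStore, LRUStoreCache  # zarr 2.x store API
import xarray as xr

path = "gs://prefix/object.zarr"

# Remote store backed by fsspec (gs:// URLs need gcsfs installed)
store_nocache = FSStore(path)
# Wrap it in an in-memory LRU cache of up to 2**30 bytes (1 GiB)
store_cached = LRUStoreCache(store_nocache, max_size=2**30)

# Pass the wrapped store object (not the path) so reads go through the cache
ds = xr.open_zarr(store_cached)
```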

When I read the same data twice, it still downloads. Am I doing something wrong?

While I wait for a response, I will try it again and update if it works, but the last time I checked, it didn't.

Note to self: I also need to check it with the Zarr backend and the Dask backend.
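One way to see whether a second read really goes back to storage is to put a counting wrapper *underneath* the cache, e.g. `LRUStoreCache(CountingStore(FSStore(path)), max_size=2**30)` with the zarr 2.x classes above (assuming the cache accepts any mapping-like store), then read the same data twice and look for keys fetched more than once. `CountingStore` is a hypothetical helper, not part of zarr or xarray; zarr 2.x's `LRUStoreCache` also keeps its own `hits` and `misses` counters.

```python
from collections import Counter
from collections.abc import MutableMapping


class CountingStore(MutableMapping):
    """Hypothetical diagnostic wrapper that counts reads reaching the wrapped store."""

    def __init__(self, store):
        self._store = store
        self.reads = Counter()  # key -> number of times it was fetched

    def __getitem__(self, key):
        self.reads[key] += 1
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __contains__(self, key):
        return key in self._store  # membership checks are not counted as reads

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


# With no cache in between, every read of the same key reaches the backend.
counted = CountingStore({"temp/0.0": b"\x00" * 8})
counted["temp/0.0"]
counted["temp/0.0"]
repeated = {key: n for key, n in counted.reads.items() if n > 1}
print(repeated)  # {'temp/0.0': 2}
```

If the cache is being used, keys read repeatedly through it should show a count of 1 here.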

@tasansal

@rabernat

Following up on my previous comment: yes, it does work with the Zarr backend! I agree with @dcherian that we should add this to the docs.

However, the behavior in Dask is strange. I think it is making each worker have its own cache and blowing up memory if I ask for a large cache.

@dcherian
Contributor

@tasansal a PR would be very welcome!

@rabernat
Contributor Author

Glad you got it working! So you're saying it does not work with open_zarr and does work with open_dataset(...engine='zarr')? Weird. We should deprecate open_zarr.

> However, the behavior in Dask is strange. I think it is making each worker have its own cache and blowing up memory if I ask for a large cache.

Yes, I think I experienced that as well. My guess is that the entire cache is serialized and passed around between workers.
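The serialization hypothesis can be illustrated with toy objects: if a cache keeps its contents in its pickled state, shipping it to workers copies whatever it already holds, and each deserialized copy then fills up independently, so memory grows with the number of copies. `FakeRemoteStore` and `SimpleReadCache` are hypothetical stand-ins; this sketch does not show how zarr or Dask actually serialize stores.

```python
import pickle


class FakeRemoteStore:
    """Stand-in for a remote store: pickles as a tiny object, serves chunks on demand."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size

    def __getitem__(self, key):
        return bytes(self.chunk_size)  # pretend this is a network download


class SimpleReadCache:
    """Hypothetical read-through cache that keeps every chunk it has read."""

    def __init__(self, store):
        self.store = store
        self.values = {}  # cached bytes; part of the object's pickled state

    def __getitem__(self, key):
        if key not in self.values:
            self.values[key] = self.store[key]  # fetch once, then reuse
        return self.values[key]


cache = SimpleReadCache(FakeRemoteStore(chunk_size=1_000_000))
cold_size = len(pickle.dumps(cache))
cache["chunk/0"]  # warm the cache with one 1 MB chunk
warm_size = len(pickle.dumps(cache))
print(warm_size - cold_size >= 1_000_000)  # True: cached bytes travel with the object

# Each deserialized copy (e.g. one per task or worker) has its own independent cache.
copy_a = pickle.loads(pickle.dumps(cache))
copy_b = pickle.loads(pickle.dumps(cache))
copy_a["chunk/1"]
print("chunk/1" in copy_a.values, "chunk/1" in copy_b.values)  # True False
```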

@tasansal

I couldn't get open_zarr to open the data without turning the arrays into Dask arrays. open_dataset(..., engine="zarr") does open them without Dask when you don't pass chunks.

@tasansal

@dcherian, I will start a PR. Where do you think this belongs in the docs? Some places I can think of:

@dcherian
Contributor

docs.xarray.dev/en/stable/user-guide/io.html seems great to me.

3 participants