to_dask() not lazy when simplecache:: in urlpath #73

Open
aaronspring opened this issue Aug 3, 2020 · 1 comment

Comments

@aaronspring
Collaborator

When loading with to_dask() and caching as in pangeo-data/pangeo-datastore#113, fsspec.open_local first downloads the whole dataset and only then opens the data in xarray. The result is still chunked, but only after the time has been spent on downloading.

Is there a way to circumvent this in intake-xarray, or is this a consequence of fsspec caching that intake-xarray cannot change?

It would be great to call to_dask() without spending the time to download, and to cache only when xarray runs compute.

@martindurant
Member

Whilst this may be possible, it would be tricky. Dask wants to open the file to assess the chunking; it could be done on the original file, but only cache it when actually loading, in theory. There is a block-wise cacher in fsspec, which only downloads the parts of a file that are accessed, as they are accessed, but that only works with a library expecting to work with python file-like objects (i.e., there's a reason to call open_local: the library wants a real local file). You could do something with FUSE, where the file looks real to the OS, but uses block-wise chunking internally - this kind of thing I'm pretty sure has never been tried.
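
To make the block-wise idea concrete, here is a toy sketch in plain Python. It is not fsspec's implementation: `CountingRemote` and `BlockCachedReader` are hypothetical names invented for illustration, and the "remote" is just an in-memory byte string that counts how much it has served. It only shows why such a cache needs the consumer to go through a Python file-like object: the cache sees each `read()` and can fetch only the blocks that call touches.

```python
class CountingRemote:
    """Hypothetical stand-in for a remote store; counts bytes served."""

    def __init__(self, data: bytes):
        self._data = data
        self.bytes_served = 0

    def size(self) -> int:
        return len(self._data)

    def fetch(self, start: int, end: int) -> bytes:
        chunk = self._data[start:end]
        self.bytes_served += len(chunk)
        return chunk


class BlockCachedReader:
    """Hypothetical read-only file-like object that fetches fixed-size
    blocks on first access and keeps them in memory afterwards."""

    def __init__(self, remote: CountingRemote, block_size: int = 1024):
        self.remote = remote
        self.block_size = block_size
        self.size = remote.size()
        self._blocks = {}  # block index -> bytes
        self._pos = 0

    def _block(self, index: int) -> bytes:
        # Fetch a block only if it has not been requested before.
        if index not in self._blocks:
            start = index * self.block_size
            end = min(start + self.block_size, self.size)
            self._blocks[index] = self.remote.fetch(start, end)
        return self._blocks[index]

    def seek(self, offset: int, whence: int = 0) -> int:
        if whence == 0:
            pos = offset
        elif whence == 1:
            pos = self._pos + offset
        elif whence == 2:
            pos = self.size + offset
        else:
            raise ValueError("invalid whence")
        self._pos = max(0, min(pos, self.size))
        return self._pos

    def tell(self) -> int:
        return self._pos

    def read(self, length: int = -1) -> bytes:
        end = self.size if length < 0 else min(self._pos + length, self.size)
        if end <= self._pos:
            return b""
        first = self._pos // self.block_size
        last = (end - 1) // self.block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        offset = self._pos - first * self.block_size
        out = data[offset:offset + (end - self._pos)]
        self._pos = end
        return out


# A 102,400-byte "remote file"; only two small regions are read.
payload = bytes(range(256)) * 400
remote = CountingRemote(payload)
reader = BlockCachedReader(remote, block_size=1024)

header = reader.read(16)   # e.g. a library inspecting metadata
reader.seek(50_000)
middle = reader.read(100)  # a small slice from the middle of the file

print(remote.bytes_served, "of", remote.size(), "bytes transferred")
```

A library that insists on a real path on disk never calls this `read()`, so there is nothing for the cache to intercept; that is the situation open_local addresses by downloading the file in full first.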
