Control chunking of non-spatial dimensions #81

Closed
maawoo opened this issue Jul 27, 2022 · 3 comments · Fixed by #126

Comments


maawoo commented Jul 27, 2022

Hi all,
as far as I can see it is currently not possible to control the chunking of non-spatial dimensions when loading a dataset, right?
So in order to have chunks that, for example, extend through the time dimension, I'd need an extra step like this:

import odc.stac

ds = odc.stac.load(items=items, bands=bands, chunks={'x': 2048, 'y': 2048})
ds = ds.chunk(chunks={'time': -1})

I've used ODC in the past and expected chunks={'x': 2048, 'y': 2048, 'time':-1} to work. If this is only going to be added sometime in the future, I think there should be a note in the documentation that currently only chunking of spatial dimensions is supported.
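Until non-spatial chunking is supported directly, the two-step workaround above could be wrapped in a small helper. This is only a sketch: `split_chunks` and `load_with_chunks` are hypothetical names, not part of odc-stac, and the code assumes the spatial dimensions are called `x` and `y` and that the loader returns an object with an xarray-style `.chunk(chunks=...)` method.

```python
from typing import Any, Callable, Dict, Tuple

# Assumption: spatial dimensions are named "x" and "y".
SPATIAL_DIMS = ("x", "y")


def split_chunks(chunks: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Separate spatial chunk sizes from those of all other dimensions."""
    spatial = {k: v for k, v in chunks.items() if k in SPATIAL_DIMS}
    other = {k: v for k, v in chunks.items() if k not in SPATIAL_DIMS}
    return spatial, other


def load_with_chunks(
    load: Callable[..., Any], chunks: Dict[str, int], **kwargs: Any
) -> Any:
    """Call `load` with spatial chunks only, then rechunk the remaining dims."""
    spatial, other = split_chunks(chunks)
    ds = load(chunks=spatial, **kwargs)
    # Rechunk only when non-spatial chunking was actually requested.
    return ds.chunk(chunks=other) if other else ds
```

With this in place, a call such as `load_with_chunks(odc.stac.load, {'x': 2048, 'y': 2048, 'time': -1}, items=items, bands=bands)` would behave like the two lines in the workaround.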

Member

Kirill888 commented Jul 28, 2022

Your assessment is correct: currently the temporal dimension always has a chunk size of 1. I'm pretty sure datacube also starts with time=1 and then re-chunks just before returning, rather than actually populating multiple time slices in one Dask task.

This looks like an omission to me. I think proper support for loading multiple temporal slices in one Dask task is a useful thing to have, as it can reduce Dask graph size and overhead by a lot. It's trivial enough to detect time chunk requests and rechunk after construction, but I think a proper implementation is not too hard either...
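To get a rough feel for the graph-size argument, one can count how many chunks a single band is split into under different chunking schemes. The sketch below is plain arithmetic, not odc-stac code: `count_chunks` is a hypothetical helper, the array shape is invented for illustration, and chunk count is only a lower bound on the number of Dask tasks.

```python
import math
from typing import Dict


def count_chunks(shape: Dict[str, int], chunks: Dict[str, int]) -> int:
    """Number of chunks for one band; -1 means one chunk spanning the dim."""
    total = 1
    for dim, size in shape.items():
        # A dimension missing from `chunks` is treated as a single chunk.
        step = chunks.get(dim, size)
        if step == -1:
            step = size
        total *= math.ceil(size / step)
    return total


# Hypothetical dataset: 100 time slices of 8192 x 8192 pixels.
shape = {"time": 100, "y": 8192, "x": 8192}

per_slice = count_chunks(shape, {"time": 1, "y": 2048, "x": 2048})
full_time = count_chunks(shape, {"time": -1, "y": 2048, "x": 2048})
print(per_slice, full_time)  # 1600 16
```

Note that rechunking after construction still has to build the per-slice chunks first, so the saving shown here would only be realised in full by loading several time slices within one task.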

@woodcockr
Member

@SpacemanPaul and I have this refactor on the radar for datacube-core so it can better handle temporal chunking in a Dask context and exploit any temporal chunking in the storage. I'm making this comment because datacube was mentioned, not because it relates to development in odc-stac.

Member

Kirill888 commented Jul 28, 2022

> exploit any temporal chunking in the storage.

I wonder how well "multi-band" reads are supported in gdal/rasterio now (netcdf timeslices are treated as "bands").

Probably best to use native libs for things like netcdf, zarr, tiledb rather than jamming it through gdal.

3 participants