Dimension order produced by Rechunk is opaque and not controllable; mismatch can cause errors from subsequent ChunksToZarr. #94

Open
mjwillson opened this issue Feb 27, 2024 · 0 comments

In a pipeline in which Rechunk is followed by ChunksToZarr, the dimension order of the variables output by Rechunk may not match that of the template you pass to ChunksToZarr, resulting in errors like:

ValueError: variable 'geopotential_quantiles' already exists with different dimension names ('hour', 'dayofyear', 'level', 'latitude', 'longitude', 'quantile') != ('level', 'hour', 'dayofyear', 'latitude', 'longitude', 'quantile'), but changing variable dimensions is not supported by to_zarr().

As far as I can tell, Rechunk doesn't allow you to control the output dimension order (at least, not on a per-variable basis, which may be necessary to match a given template). An alternative could be to transpose the output template to match whatever Rechunk is going to produce, but it's also hard to know in advance what that order will be.

As another way around this, it'd be nice if ChunksToZarr could just do the transpose rather than complain if it finds this kind of dimension mismatch (same dimensions in a different order).
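
To make the "same dimensions in a different order" check concrete, here is a minimal sketch in plain Python. The helper name `axes_to_match` and the variable names are hypothetical and not part of xarray-beam. Treating the first tuple in the error message as the template's order and the second as the chunk's order is an assumption on my part. With xarray objects, the same effect could presumably be had by passing the template variable's `dims` to `DataArray.transpose`.

```python
from typing import Sequence, Tuple


def axes_to_match(chunk_dims: Sequence[str],
                  template_dims: Sequence[str]) -> Tuple[int, ...]:
    """Return the axis permutation that reorders chunk_dims into template_dims.

    Hypothetical helper: raises ValueError when the two sequences do not
    contain exactly the same dimension names, because that is a genuine
    mismatch rather than a reordering that a transpose could fix.
    """
    if len(set(chunk_dims)) != len(chunk_dims):
        raise ValueError(f"duplicate dimension names in {tuple(chunk_dims)}")
    if sorted(chunk_dims) != sorted(template_dims):
        raise ValueError(
            f"dimensions differ: {tuple(chunk_dims)} vs {tuple(template_dims)}"
        )
    # For each dimension in the template, find its axis in the chunk.
    return tuple(chunk_dims.index(dim) for dim in template_dims)


# Dimension orders taken from the error message above (which tuple belongs
# to the template and which to the chunk is an assumption).
template_dims = ("hour", "dayofyear", "level", "latitude", "longitude", "quantile")
chunk_dims = ("level", "hour", "dayofyear", "latitude", "longitude", "quantile")

axes = axes_to_match(chunk_dims, template_dims)
reordered = tuple(chunk_dims[i] for i in axes)
print(axes)        # (1, 2, 0, 3, 4, 5)
print(reordered == template_dims)  # True
```

The permutation is what you would hand to a NumPy-style transpose of the chunk's data; a real fix inside ChunksToZarr would also need to apply it per variable, since different variables can need different orders.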
