Harmony 992 #11

Merged
merged 11 commits into from
Oct 15, 2021

Conversation

hailiangzhang
Contributor

This PR introduces rechunking during the netcdf-to-zarr step.

The destination chunk size along each dimension will be the largest multiple of the original chunk size that does not exceed 3000, or 3000 if the original chunk size is 3000 or more. In either case it is capped at the length of the dimension.

In case that is unclear, here is the formula from convert.py:

    new_chunks = map(
        lambda x: min(x[0], int(3000 / x[1]) * x[1] if x[1] < 3000 else 3000),
        zip(shape, chunks),
    )
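
For anyone who wants to try the formula outside convert.py, here is a minimal standalone sketch of it. The function name `compute_new_chunks`, the `target` parameter and the sample shapes are made up for illustration and are not part of the PR. It assumes every chunk size is a positive integer.

```python
TARGET = 3000  # cap taken from the PR description


def compute_new_chunks(shape, chunks, target=TARGET):
    """Return per-dimension destination chunk sizes.

    Hypothetical helper mirroring the map/lambda expression quoted above;
    `shape` and `chunks` are sequences of positive integers of equal length.
    """
    new_chunks = []
    for dim_len, chunk in zip(shape, chunks):
        if chunk < target:
            # Largest whole multiple of the source chunk that fits under the
            # target (equivalent to int(target / chunk) * chunk for these sizes).
            aligned = (target // chunk) * chunk
        else:
            # Source chunk is already at or above the target: use the target.
            aligned = target
        # Never exceed the length of the dimension itself.
        new_chunks.append(min(dim_len, aligned))
    return tuple(new_chunks)


# Large array: chunk sizes 1, 700 and 4000 along dimensions of 10000, 3600 and 5000.
print(compute_new_chunks((10000, 3600, 5000), (1, 700, 4000)))  # (3000, 2800, 3000)

# Small array: every dimension is shorter than the aligned size, so the
# dimension length wins.
print(compute_new_chunks((365, 180, 360), (1, 180, 360)))  # (365, 180, 360)
```

In the first example the middle dimension gets 2800 rather than 3000 because 2800 is the largest multiple of 700 that stays at or below 3000.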

The idea is to keep the destination chunk size "aligned" with the original chunk size, for performance reasons. The 3000 cap is chosen to avoid memory issues with the current netcdf-to-zarr container size.
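
To illustrate what alignment buys, here is a rough sketch that counts, along a single dimension, how many source chunks straddle a destination chunk boundary. The function `source_chunks_split` and the sample numbers are hypothetical and not part of the PR. The working assumption is that a source chunk spanning two destination chunks has to be read (and decompressed) more than once, which is presumably the performance concern.

```python
def source_chunks_split(dim_len, src_chunk, dst_chunk):
    """Count source chunks that span more than one destination chunk.

    Hypothetical helper for illustration; all arguments are positive
    integers describing a single dimension.
    """
    split = 0
    for start in range(0, dim_len, src_chunk):
        end = min(start + src_chunk, dim_len)  # exclusive end of this source chunk
        # Compare the destination chunk index of the first and last element.
        if start // dst_chunk != (end - 1) // dst_chunk:
            split += 1
    return split


# Destination chunk is a multiple of the source chunk (2800 = 4 * 700):
print(source_chunks_split(10000, 700, 2800))  # 0

# Destination chunk of 3000 is not a multiple of 700:
print(source_chunks_split(10000, 700, 3000))  # 3
```

With an aligned destination size no source chunk is split, whereas the unaligned size of 3000 cuts through one source chunk at each of the boundaries 3000, 6000 and 9000.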

We may not see much performance improvement from this change alone; performance will be the focus of subsequent related stories.

@hailiangzhang hailiangzhang merged commit 4040407 into main Oct 15, 2021
@owenlittlejohns owenlittlejohns deleted the HARMONY-992 branch March 28, 2022 13:57