@thuydotm thuydotm commented Nov 10, 2021

This PR uses the same approach as #568 to improve the performance of zonal stats when the input data arrays are dask-backed. It computes stats chunk by chunk, then combines the per-chunk results and returns the output as a dask DataFrame.
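
The chunk-by-chunk idea can be sketched in plain Python. This is only an illustration, not the code in this PR: lists stand in for dask chunks, and the helper names `chunk_partials` and `combine` are hypothetical. Each chunk yields per-zone partial aggregates, which are then merged into the final stats.

```python
def chunk_partials(zones_chunk, values_chunk):
    """Per-zone partial aggregates (count, sum, min, max) for one chunk."""
    partials = {}
    for zone, value in zip(zones_chunk, values_chunk):
        if zone not in partials:
            partials[zone] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            p = partials[zone]
            p["count"] += 1
            p["sum"] += value
            p["min"] = min(p["min"], value)
            p["max"] = max(p["max"], value)
    return partials


def combine(partials_list):
    """Merge per-chunk partials into final per-zone stats."""
    combined = {}
    for partials in partials_list:
        for zone, p in partials.items():
            if zone not in combined:
                combined[zone] = dict(p)
            else:
                c = combined[zone]
                c["count"] += p["count"]
                c["sum"] += p["sum"]
                c["min"] = min(c["min"], p["min"])
                c["max"] = max(c["max"], p["max"])
    # The mean is derived last, from the merged sum and count.
    for c in combined.values():
        c["mean"] = c["sum"] / c["count"]
    return combined


# Two hypothetical chunks of a flattened zones raster and values raster.
zones_chunks = [[1, 1, 2], [2, 2, 1]]
values_chunks = [[1.0, 3.0, 10.0], [20.0, 30.0, 5.0]]

stats = combine(chunk_partials(z, v) for z, v in zip(zones_chunks, values_chunks))
```

Here zone 1 is spread across both chunks, yet its mean is still computed from the merged sum and count rather than from per-chunk means.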

It also limits the stats supported in the dask case to a subset of the default stats. This is safer, since a custom statistic is not always element-wise and can therefore produce unexpected results.
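
One way to read this restriction, shown as a small standard-library sketch with made-up numbers: a sum can be rebuilt exactly from per-chunk sums, but a statistic such as the median cannot be rebuilt from per-chunk medians. The median here is only an example of a statistic that does not combine across chunks.

```python
from statistics import median

# Hypothetical values for one zone, split across two chunks.
chunks = [[1, 2, 3, 4, 100], [200, 300]]
all_values = [v for chunk in chunks for v in chunk]

# Sum: combining per-chunk results gives the exact answer.
sum_of_chunk_sums = sum(sum(chunk) for chunk in chunks)
total_sum = sum(all_values)

# Median: combining per-chunk results gives a different answer.
median_of_chunk_medians = median(median(chunk) for chunk in chunks)
true_median = median(all_values)

print(sum_of_chunk_sums == total_sum)        # True
print(median_of_chunk_medians, true_median)  # 126.5 4
```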

nodata_zones is removed because we already support zone_ids and exclude invalid values (nan, inf) from our calculations.
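
A minimal sketch of that filtering, assuming flattened zones and values: the `select_valid` helper is hypothetical and is not the library's API. It keeps only the cells whose zone is listed in `zone_ids` and whose value is finite.

```python
import math


def select_valid(zones, values, zone_ids=None):
    """Keep (zone, value) pairs with a finite value in a requested zone.

    Hypothetical helper for illustration; zone_ids=None keeps every zone.
    """
    wanted = None if zone_ids is None else set(zone_ids)
    return [
        (zone, value)
        for zone, value in zip(zones, values)
        if math.isfinite(value) and (wanted is None or zone in wanted)
    ]


zones = [0, 1, 1, 2, 2]
values = [7.0, 1.0, float("nan"), float("inf"), 4.0]

# Leaving zone 0 out of zone_ids plays the role a nodata zone would have played.
kept = select_valid(zones, values, zone_ids=[1, 2])
```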

@thuydotm thuydotm requested a review from ianthomas23 November 15, 2021 08:31
@thuydotm thuydotm added the ready to merge label Nov 15, 2021
@ianthomas23
Contributor

Just a few minor comments, otherwise it looks good to merge.

@thuydotm
Contributor Author

Thanks Ian, I just updated the code. I'll merge into master once all the tests have passed.

@thuydotm thuydotm merged commit 9d2ee7c into master Nov 16, 2021
@thuydotm thuydotm deleted the zonal_stats_dask_speedup branch December 23, 2021 06:03