Is there a way to do reductions over chunked data? I am working with a large dataset and cannot afford to rechunk along the time dimension. I am looking for something similar to the reduce function, but operating on a Dataset and applying a function across multiple data variables at the same time. Discussion #5774 has a nice example of where the current map_blocks falls short:
import xarray as xr
import numpy as np

nt = 4
da = xr.DataArray(np.arange(nt), coords={"t": np.arange(nt)}, dims=["t"])


def getsum_da(da, sumdims):
    sumda = da.sum(dim=sumdims, skipna=True)
    return sumda


da.sum(dim="t").compute()  # prints 6 = 0 + 1 + 2 + 3

result = xr.map_blocks(
    getsum_da, da.chunk(chunks={"t": -1}), args=["t"]
)  # no chunking along summation index
print(result.compute())  # prints 6

result = xr.map_blocks(
    getsum_da, da.chunk(chunks={"t": 1}), args=["t"]
)  # with chunking along summation index
print(result.compute())  # prints 3, the value for the last of 4 chunks
In my case I'm doing more intricate stuff: I was hoping this would sum the histograms over time, but in reality it simply returns the last chunk's bins. It would be nice to have a function like map_blocks_reduce, where a global object would accumulate the result.
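To make the request concrete, here is a rough sketch of the accumulation I have in mind, written in plain NumPy rather than against any xarray or dask API. Everything in it is an assumption for illustration: `chunk_histogram` and `histogram_over_chunks` are made-up names, the bin edges are fixed up front so that per-chunk counts can simply be added, and `np.array_split` stands in for the chunks along the time dimension.

```python
from functools import reduce

import numpy as np


def chunk_histogram(block, bin_edges):
    """Histogram of a single chunk, using bin edges shared by all chunks."""
    counts, _ = np.histogram(block, bins=bin_edges)
    return counts


def histogram_over_chunks(blocks, bin_edges):
    """Map step: one partial histogram per chunk. Reduce step: add the counts."""
    partials = (chunk_histogram(block, bin_edges) for block in blocks)
    empty = np.zeros(len(bin_edges) - 1, dtype=np.int64)  # the accumulator
    return reduce(np.add, partials, empty)


rng = np.random.default_rng(0)
data = rng.normal(size=1000)
bin_edges = np.linspace(-4.0, 4.0, 9)  # fixed edges, so counts are additive

blocks = np.array_split(data, 4)  # stand-in for 4 chunks along time
total = histogram_over_chunks(blocks, bin_edges)

# The accumulated result matches a histogram of the unchunked data.
expected, _ = np.histogram(data, bins=bin_edges)
assert np.array_equal(total, expected)
```

The per-chunk function and the combine step are separate here, and that split is what a single map_blocks call does not give me: it has no place to say how the per-block results should be merged.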
Thanks!
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!