Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_rolling_wrapped_dask is failing with dask-master #2940

Closed
shoyer opened this issue May 3, 2019 · 5 comments · Fixed by #3040
Closed

test_rolling_wrapped_dask is failing with dask-master #2940

shoyer opened this issue May 3, 2019 · 5 comments · Fixed by #3040
Labels

Comments

@shoyer
Copy link
Member

shoyer commented May 3, 2019

The test_rolling_wrapped_dask tests in test_dataarray.py are failing with dask master, e.g., as seen here:
https://travis-ci.org/pydata/xarray/jobs/527936531

I reproduced this locally. git bisect identified the culprit as dask/dask#4756.

The source of this issue on the xarray side appears to be these lines:

if isinstance(padded.data, dask_array_type):
values = dask_rolling_wrapper(func, padded,
window=self.window,
min_count=min_count,
axis=axis)

In particular, we are currently padded as an xarray.DataArray object, not a dask array. Changing this to padded.data shows that passing an actual dask array to dask_array_ops.rolling_window results in failing tests.

@fujiisoup @jhamman any idea what's going on here?

@shoyer shoyer added the bug label May 4, 2019
@fujiisoup
Copy link
Member

@shoyer

I have no idea yet...
I will look inside later today.

@fujiisoup
Copy link
Member

Sent a fix

@shoyer
Copy link
Member Author

shoyer commented Jun 20, 2019

Looking into this in more detail, the fix looks somewhat non-trivial. For example, we definitely are not padding by enough inside dask_rolling_wrapper if center=False:

depth[axis] = (window + 1) // 2

I'm not sure if this was ever tested properly, at least for dask arrays with multiple chunks. My guess is that this previously worked by consolidating dask arrays into a single chunk, which would simply fail for arrays that are larger than fit into memory.

For now, I think it would be safest to issue a new release that simply disables rolling() methods on dask arrays (by raising an error), and to save a full fix for later. I am concerned about letting the current behavior stick around, which silently calculates the wrong result.

@zbarry
Copy link

zbarry commented Jun 20, 2019

@shoyer - yeah, I'd tend to agree. I had to do a bit of digging when I encountered this problem, because I was getting serialization errors (since it was doing away with the chunks on massive arrays) that I then eventually connected to this rolling problem. An explicit message that says "can't do this yet" would be good in light of this kind of situation.

@shoyer
Copy link
Member Author

shoyer commented Jun 27, 2019

#3040 is my preferred fix for the rolling window issue. I plan to merge it sometime tomorrow unless I get any comments....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants