Very large Dask graph size with high resolution grids #127
Comments
Thanks @ScottWales for the detailed report and examples. @aulemahal, any clue on what could be done about this?
@ScottWales thanks for the detailed issue; this is definitely problematic and we need to look into it asap!
Here is a breakdown of the dask tasks:
My - limited - understanding is that dask chunks the sparse weights matrix as if it were dense.
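The breakdown itself was not captured above; below is a small sketch (my own illustration, not commands from this thread) of how such a per-operation task breakdown can be produced for any dask-backed object, e.g. the regridded DataArray's underlying dask array.

```python
# Hedged sketch: group the tasks of a dask graph by operation-name prefix.
from collections import Counter
from dask.utils import key_split

def task_breakdown(dask_obj):
    """Return (operation name, task count) pairs, most frequent first."""
    graph = dict(dask_obj.__dask_graph__())  # materialize the high-level graph
    return Counter(key_split(key) for key in graph).most_common()

# Hypothetical usage: task_breakdown(regridded.data)
```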
Argh... Sorry guys! Ok so I did see that problem in the original PR. @ScottWales: the workaround is to configure dask to accept large chunks. Since the chunks are sparse, there is no memory explosion, e.g. from dask import config; config.set({'array.chunk-size': '500GiB'}) (see the sketch after this comment).

I am currently digging through dask's code and docs and found the following: the weights are transformed to a dask array inside xESMF, and there dask doesn't recognize the sparse array as anything special; it converts it to a dask array the same way as if it were a numpy array. Thus, the chunk size is computed from the shape and dtype, as if the matrix were dense. (It didn't happen before because we were passing the weights differently.) The solution is to have dask measure the size of the array from the number of non-zero elements, not from its shape. I see a few ways forward: [numbered list of options not captured here]
I personally believe (2) is the cleanest way, but we have to ask the people upstream.
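For reference, a sketch of the large-chunks workaround mentioned above (the config call comes from the comment; everything around it is assumed usage):

```python
# Hedged sketch of the chunk-size workaround: let dask accept huge chunks so
# the sparse weight matrix stays in a single chunk. Its dense size is enormous,
# but its real memory footprint is small, so there is no memory explosion.
from dask import config

config.set({'array.chunk-size': '500GiB'})

# The same call also works as a context manager, so the setting does not leak
# into the rest of the workflow:
# with config.set({'array.chunk-size': '500GiB'}):
#     ...build and apply the regridder here...
```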
@aulemahal, since the issue seems to be coming from the transition from ...
If it's simpler like that, I think it makes no sense to use ... Going back to ...
If you're going down the sparse route, I would try using ... That said, this issue is pretty clearly a dask bug when it autochunks inputs to ...
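To make the autochunking point concrete, here is a minimal illustration (my own example, not code from this thread) of dask splitting a sparse matrix as if it were dense unless an explicit single chunk is requested:

```python
# Hedged illustration: dask's automatic chunking sizes chunks from shape and
# dtype, so a sparse matrix gets split into many blocks as if it were dense.
import dask.array as da
import sparse

weights = sparse.random((100_000, 100_000), density=1e-6)  # ~80 GB if dense

auto = da.from_array(weights)               # auto-chunked as if dense -> many blocks
single = da.from_array(weights, chunks=-1)  # one explicit chunk -> tiny graph

print(auto.numblocks, len(dict(auto.__dask_graph__())))
print(single.numblocks, len(dict(single.__dask_graph__())))
```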
@dcherian thanks very much for this valuable feedback. @aulemahal, do you have time to look into this?
@dcherian @raphaeldussin This is indeed very promising! I am currently testing this new approach and am able to remove the need to call ... I am continuing my tests, but this is interesting! I expect a larger task graph than before (0.6.0) for DataArrays, since ...
Regridding from 0.1 to 0.25 degree global grids is creating a very large task graph with xesmf 0.6.1, causing problems when evaluating the result.
Sample code:
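The code block itself was not captured here; the following is a minimal reconstruction of the pattern described in this issue (assumed, not the author's exact notebook code):

```python
# Hedged reconstruction: regrid a dask-backed field from a 0.1-degree to a
# 0.25-degree global grid and inspect the size of the resulting task graph.
import numpy as np
import xarray as xr
import xesmf as xe

grid_in = xe.util.grid_global(0.1, 0.1)     # 0.1-degree source grid
grid_out = xe.util.grid_global(0.25, 0.25)  # 0.25-degree target grid

# Dummy dask-backed field on the source grid
data = xr.DataArray(
    np.zeros((grid_in.sizes['y'], grid_in.sizes['x'])),
    dims=('y', 'x'),
).chunk({'y': 450, 'x': 900})

regridder = xe.Regridder(grid_in, grid_out, 'bilinear', periodic=True)
result = regridder(data)

print(len(result.__dask_graph__()))  # number of dask tasks in the regridded result
```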
Evaluated versions at https://nbviewer.org/gist/ScottWales/4fe0e9a5725b5a3bf07ab94e80778846
With 0.6.0, the regridded data has 240 Dask tasks. With 0.6.1 the number has exploded to 409,841 tasks and appears to scale with the grid resolution; low-resolution regrid task graphs have a reasonable size.
Expected behaviour is that the Dask task graph created by xesmf is a reasonably small multiple of the number of chunks in the input data, and that it is independent of grid resolution. An overly large task graph slows down Dask as it prepares the tasks to run and causes excess memory use when the result is evaluated.
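That expectation can be written as a quick check (my sketch; data and result follow the reconstruction above):

```python
# Hedged sketch: the task count should stay within a small constant factor of
# the number of input chunks, independently of grid resolution. The factor 20
# is an arbitrary illustrative bound, not a value from the issue.
import math

n_chunks = math.prod(data.data.numblocks)  # chunks in the dask-backed input
n_tasks = len(result.__dask_graph__())     # tasks in the regridded result

assert n_tasks <= 20 * n_chunks, (
    f"task graph too large: {n_tasks} tasks for {n_chunks} input chunks"
)
```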