Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix map_blocks HLG layering #3598

Merged
merged 4 commits into from
Dec 7, 2019
Merged

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Dec 5, 2019

[x] closes #3599

This fixes an issue with the HighLevelGraph noted in
#3584, and exposed by a recent
change in Dask to do more HLG fusion.

cc @dcherian.

This fixes an issue with the HighLevelGraph noted in
pydata#3584, and exposed by a recent
change in Dask to do more HLG fusion.
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this so quickly @TomAugspurger

So it was a graph construction issue.

Can you add some comments to make it clear why the graph needs to be constructed this way? You can also give yourself credit with a note in whats-new.rst

xarray/core/parallel.py Show resolved Hide resolved
xarray/core/parallel.py Show resolved Hide resolved
xarray/tests/test_dask.py Outdated Show resolved Hide resolved
xarray/tests/test_dask.py Outdated Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor Author

Hopefully the new comments make sense. I'm struggling a bit to explain things since I don't fully understand them myself :)

So it was a graph construction issue.

I think so. Dask doesn't actually validate arguments passed to HighLevelGraph. But I believe we assume that when all the values in dependencies are themselves keys of layers. We didn't have that before with things like

(Pdb) pp collections[0].dask.dependencies
{'all-84bc51ac43a9275b3662b0089710eab9': {'or_-64f95b81b2f8001b4c61f2023ac4c223'},
 ...
 'eq-abac622d95ce5055d3e7b7dea944ec37': {'lambda-e79de3edfa267f41111057d26471bce3-x',
                                         'ones-c4a83f4b990021618d55e0fa61a351d6'},
...
}

The 'lambda-e79de3edfa267f41111057d26471bce3-x' wasn't a layer of the graph. It was previously nested under the single new layer we were creating gname or lambda-e79de3edfa267f41111057d26471bce3 in this case.

@dcherian
Copy link
Contributor

dcherian commented Dec 5, 2019

But I believe we assume that when all the values in dependencies are themselves keys of layers.

I came to the same conclusion when I looked at it but didn't know if this was by accident or design. The docs don't say anything.

anyway the fix looks good. Thanks for tackling this.

I'll merge tomorrow if there are no other comments.

@dcherian dcherian merged commit cafcaee into pydata:master Dec 7, 2019
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 7, 2019
* upstream/master:
  Fix map_blocks HLG layering (pydata#3598)
  Silence sphinx warnings: Round 2 (pydata#3592)
  2x~5x speed up for isel() in most cases (pydata#3533)
  remove xarray again (pydata#3591)
  fix plotting with transposed nondim coords. (pydata#3441)
  make coarsen reductions consistent with reductions on other classes (pydata#3500)
  Resolve the version issues on RTD (pydata#3589)
  Add bottleneck & rasterio git tip to upstream-dev CI (pydata#3585)
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 30, 2019
…equiv

* 'master' of github.com:pydata/xarray: (28 commits)
  Add nanmedian for dask arrays (pydata#3604)
  added pyinterp to related projects (pydata#3655)
  Allow incomplete hypercubes in combine_by_coords (pydata#3649)
  concat keeps attrs from first variable. (pydata#3637)
  Extend DatetimeAccessor properties and support `.dt` accessor for Timedelta (pydata#3612)
  update readthedocs.yml (pydata#3639)
  silence sphinx warnings round 3 (pydata#3602)
  Fix/quantile wrong errmsg (pydata#3635)
  Provide shape info in shape mismatch error. (pydata#3619)
  Minor doc fixes (pydata#3615)
  Respect user-specified coordinates attribute. (pydata#3487)
  Add Facetgrid.row_labels & Facetgrid.col_labels (pydata#3597)
  Fix pint integration tests (pydata#3600)
  Minor fix to combine_by_coords to allow for the combination of CFTimeIndexes separated by large time intervals (pydata#3543)
  Fix map_blocks HLG layering (pydata#3598)
  Silence sphinx warnings: Round 2 (pydata#3592)
  2x~5x speed up for isel() in most cases (pydata#3533)
  remove xarray again (pydata#3591)
  fix plotting with transposed nondim coords. (pydata#3441)
  make coarsen reductions consistent with reductions on other classes (pydata#3500)
  ...
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

map_blocks graph construction bug
2 participants