
Integration with Dask (add tests; implement the Dask collection interface on Quantity) #883

Closed
jthielen opened this issue Sep 15, 2019 · 7 comments · Fixed by #1129

@jthielen (Contributor)

Based on #878 and pydata/xarray#525, it would be helpful for interoperability between xarray, pint, and dask for pint to implement the dask collection interface for when a pint Quantity wraps a dask array. This should allow a Quantity-wrapped dask array to still behave in a dask-array-like way (i.e., as a "duck dask array"). There could also be convenience methods like compute(), persist(), and chunk(), following xarray's example.
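To illustrate what the collection interface involves, here is a minimal sketch of a wrapper that delegates the dask collection protocol to the wrapped magnitude. Names like `DaskQuantity` and `_magnitude` are illustrative only, not Pint's actual implementation, and a complete collection would also need `__dask_scheduler__`, `__dask_optimize__`, `__dask_tokenize__`, and `__dask_postpersist__`:

```python
class DaskQuantity:
    """Illustrative sketch: a Quantity-like wrapper around a (possibly
    dask-backed) array that delegates the dask collection protocol to
    the wrapped magnitude, so units survive compute()."""

    def __init__(self, magnitude, units):
        self._magnitude = magnitude
        self._units = units

    # --- dask collection protocol: delegate to the wrapped array ---
    def __dask_graph__(self):
        # Returning None signals "not a dask collection" for plain arrays.
        graph = getattr(self._magnitude, "__dask_graph__", None)
        return graph() if graph is not None else None

    def __dask_keys__(self):
        return self._magnitude.__dask_keys__()

    def __dask_postcompute__(self):
        # dask calls finalize(results, *extra_args); re-wrap the computed
        # result so the units are preserved.
        func, args = self._magnitude.__dask_postcompute__()
        return self._re_wrap, (func, args)

    def _re_wrap(self, results, func, args):
        return DaskQuantity(func(results, *args), self._units)

    # --- convenience method, following xarray's example ---
    def compute(self, **kwargs):
        import dask

        (result,) = dask.compute(self, **kwargs)
        return result
```

With this shape, `dask.compute(quantity)` can traverse into the wrapped graph, and `quantity.compute()` returns a new unit-carrying wrapper around the concrete result.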

Implementation of this could likely follow, or come along with, the changes discussed in #878 and #845. Based on @hgrecco's comment (#878 (comment)), I would guess this also depends on decisions being made about #875 and #764 before we know when it should be implemented.

Also, ping @crusaderky, since you've been working a lot with xarray, pint, and dask together, and I'd want to hear your thoughts on this.

@crusaderky (Contributor)

The only caveat I see is that xarray wraps specifically around a dask.array, not a generic dask collection.

The obvious example of another dask collection that should be treated differently is dask.dataframe.DataFrame, although from a quick test it looks like xarray.Dataset.from_dataframe doesn't work, and the constructor of DataArray casts the dataframe into a dask.array, losing the index and columns along the way. @shoyer, did I miss anything?

So isinstance(obj, dask_array_type) scattered around the xarray codebase should be changed with something that duck-tests the object to have the __dask_* magic methods and to be 'numpy-like'. It doesn't help that numpy-like is not something well defined; e.g. dask.dataframe.DataFrame has a __array__ method, but we don't want to treat it as numpy-like.
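A minimal sketch of the kind of protocol-based check being suggested in place of `isinstance(obj, dask_array_type)`; the helper name here is hypothetical, though `dask.base.is_dask_collection` applies essentially the same test:

```python
def is_dask_collection(obj):
    """Duck-test sketch: True if obj implements the dask collection
    protocol, i.e. it has a __dask_graph__ method that returns a
    non-None task graph."""
    graph_method = getattr(obj, "__dask_graph__", None)
    return callable(graph_method) and graph_method() is not None
```

Note this only answers "is it a dask collection?", not "is it numpy-like?", which is the harder half of the problem discussed here.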

@jthielen (Contributor Author) commented Sep 16, 2019

> So isinstance(obj, dask_array_type) scattered around the xarray codebase should be changed with something that duck-tests the object to have the __dask_* magic methods and to be 'numpy-like'. It doesn't help that numpy-like is not something well defined; e.g. dask.dataframe.DataFrame has a __array__ method, but we don't want to treat it as numpy-like.

With NEP 18, I would think "numpy-like" is now well-defined, i.e., I would hope that checking for __array_function__ in addition to the __dask_* methods is sufficient. This would successfully exclude dask.dataframe.DataFrame, and match how xarray currently checks for general array type compatibility. If some data object defines __array_function__ but isn't sufficiently "numpy-like" for xarray...that could be problematic. 😬 (EDIT: see commentary below and in #950 about how having __array_function__ does not equate to being a duck array)
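A sketch of the combined check described above: an object counts as a "duck dask array" if it both speaks the dask collection protocol and opts into NEP 18 dispatch. The helper name is illustrative, and as the EDIT above notes, the `__array_function__` half of the test is not airtight on its own:

```python
def is_duck_dask_array(obj):
    """Illustrative combined duck test: dask collection protocol plus
    NEP 18 dispatch."""
    # Speaks the dask collection protocol?
    is_collection = getattr(obj, "__dask_graph__", None) is not None
    # Opts into NEP 18? Defining __array_function__ does not by itself
    # guarantee full duck-array behavior (see #950).
    is_numpy_like = hasattr(obj, "__array_function__")
    return is_collection and is_numpy_like
```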

Though, if I'm missing something that makes a combined __array_function__ and __dask_* method check insufficient, do let me know.

(This also presumes having a recent enough dask version to ensure __array_function__ is implemented for dask arrays, but based on your comments in pydata/xarray#3222, xarray seems to be relying on features of recent versions of dask anyways.)

@crusaderky (Contributor)

> This would successfully exclude dask.dataframe.DataFrame

Today, yes. But I would expect that in the future, DataFrame objects (both pandas and dask) will start defining __array_function__, since it makes perfect sense to call np.concatenate(DataFrame, DataFrame) and similar functions.

@crusaderky (Contributor)

> (This also presumes having a recent enough dask version to ensure __array_function__ is implemented for dask arrays, but based on your comments in pydata/xarray#3222, xarray seems to be relying on features of recent versions of dask anyways.)

That's fine - if you want to bask in the light of NEP18, you'll need very recent versions of your whole numeric stack anyway.

@jthielen jthielen changed the title Implement the dask collection interface on pint Quantity Integration with Dask (add tests; implement the Dask collection interface on Quantity) Dec 30, 2019
bors bot added a commit that referenced this issue Dec 30, 2019
963: Add tests and documentation with improvement of downcast type compatibility (part of #845) r=hgrecco a=jthielen

As a part of #845, this PR adds tests for downcast type compatibility with Sparse's `COO` and NumPy's `MaskedArray`, along with more careful handling of downcast types throughout the library. Also included is new documentation on array type compatibility, including the type casting hierarchy digraph by @shoyer and @crusaderky.

While this PR doesn't fully bring Pint's downcast type compatibility to a completed state, I think this gets it "good enough" for the upcoming release, and the remaining issues are fairly well defined:

- MaskedArray non-commutativity (#633 / numpy/numpy#15200)
- Dask compatibility (#883)
- Addition of CuPy tests (no issue on issue tracker yet)

Because of that, I think this can close #845, but if @hgrecco you want that kept open until the above items are resolved, let me know.

- [x] Closes #37; Closes #845
- [x] Executed ``black -t py36 . && isort -rc . && flake8`` with no errors
- [x] The change is fully covered by automated unit tests
- [x] Documented in docs/ as appropriate
- [x] Added an entry to the CHANGES file


Co-authored-by: Jon Thielen <github@jont.cc>
@jthielen (Contributor Author)

With #845 / #963 deferring on tests with Dask, I've updated this issue to also cover adding tests with Dask (in part due to the hold-up of dask/dask#4583). I'd be glad to continue working on it, but since I likely won't get the chance to do so until later in January, if someone else wants to take this on in the meantime, feel free!

In any case, hopefully this can be included in the release after the upcoming one (so Pint 0.11)?

@hgrecco hgrecco added this to the 0.11 milestone Jan 7, 2020
@hgrecco hgrecco modified the milestones: 0.11, 0.12 Feb 10, 2020
@jthielen (Contributor Author) commented Mar 5, 2020

Sorry I have to keep putting this off, but given some more urgent projects that have come up, I do not think I will get the chance to work on this again until April. So, if 0.12 is due out in mid-to-late March, this may need to be bumped again from 0.12 to 0.13, or just un-milestoned until I or someone else is able to work on this.

@dschien commented Apr 6, 2020

@hgrecco I am eagerly waiting for a new release so that I can stop using GitHub links in my dependencies. I had a look at the 0.12 release plan and then came across this. Bumping @jthielen's comment: should this be postponed to 0.13?
