
Pint support for DataArray #3643

Merged (33 commits) on Apr 29, 2020

Conversation

keewis
Collaborator

@keewis keewis commented Dec 18, 2019

This is part of the effort to add support for pint (see #3594) on DataArray. This depends on #3611, so we should merge that first.

  • Tests added
  • Passes black . && mypy . && flake8
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

list of failing tests from #3594:

  • __init__: Needs unit support in IndexVariable, blocked by Explicit indexes in xarray's data-model (Future of MultiIndex) #1603
  • aggregation:
    • pint currently does not implement prod
    • calling np.median returns a numpy scalar instead of a DataArray because DataArray does not implement __array_function__
    • everything else works
  • bivariate ufuncs: fixed by ufunc calls with mixed args and upcast types hgrecco/pint#951 (but the unit won't be correct if the array is not passed first. See Make commutative operations commutative in both magnitude and units hgrecco/pint#1019)
  • numpy methods: rank only works with numpy.ndarray (uses bottleneck)
  • ffill, bfill: uses bottleneck, which does not support NEP-18?
  • where: works (but the choice of treating array(nan) and array(0) as special needs to be discussed)
  • interpolate_na: enforces the use of numpy.vectorize, which does not support NEP-18
  • combine_first: works, but does not test indexing (which should fail)
  • equals, identical: works (right now identical returns the same as equals)
  • drop_sel, sel, loc: indexes strip units (Explicit indexes in xarray's data-model (Future of MultiIndex) #1603)
  • interp: uses scipy.interpolate.interp1d which strips units
  • interp_like: same as interp
  • to_unstacked_dataset: blocked by IndexVariable
  • quantile: works (but needs a new pint version since it uses numpy.nanquantile)
  • rolling: uses numpy.lib.stride_tricks.as_strided, which is not supported by NEP-18
  • rolling_exp: uses numbagg

@keewis keewis mentioned this pull request Dec 21, 2019
@keewis
Collaborator Author

keewis commented Dec 24, 2019

While trying to get the tests to pass, I encountered a few issues:

  • the implementation of __array_ufunc__:

        for x in inputs + out:
            if not isinstance(x, self._HANDLED_TYPES + (SupportsArithmetic,)):
                return NotImplemented

    checks whether it handles all input types. However, the way it is implemented causes it to return NotImplemented for duck arrays, even if it would be fine to treat them the same as ndarray. Adding and not hasattr(x, "__array_function__") will make my tests pass, but I thought I'd ask first.
  • I've been testing if the aggregation functions work, both as method and as function. This works fine for mean but not for median: calling np.median(da) returns a numpy scalar instead of the data array wrapping a 0d array returned by da.median(). I can't seem to find the reason why this happens, mostly because I don't know where to look.
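
The check in the first bullet can be sketched in isolation. This is a simplified stand-in (SupportsArithmetic and _HANDLED_TYPES are stubs here, not xarray's real definitions) showing the effect of the proposed and not hasattr(x, "__array_function__") clause:

```python
# Simplified stand-in for the __array_ufunc__ guard; SupportsArithmetic
# and _HANDLED_TYPES are stubs, not xarray's real definitions.
import numbers

class SupportsArithmetic:
    _HANDLED_TYPES = (numbers.Number, bytes, str)

def rejects(inputs):
    """Mimic the guard loop, including the proposed duck-array clause."""
    for x in inputs:
        if not isinstance(
            x, SupportsArithmetic._HANDLED_TYPES + (SupportsArithmetic,)
        ) and not hasattr(x, "__array_function__"):
            return True  # the real code returns NotImplemented here
    return False

class DuckArray:
    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented

print(rejects([1.0, DuckArray()]))  # False: both inputs are accepted
print(rejects([object()]))          # True: unknown type is rejected
```

With the extra clause, truly unknown types still fall through to NotImplemented while NEP-18 duck arrays are accepted.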

@keewis
Collaborator Author

keewis commented Dec 25, 2019

The aggregation tests fail because numpy.median needs __array_function__ to be implemented, but I think that's out of scope for this PR. I marked it as xfailing.

The bivariate ufunc test is fixed by hgrecco/pint#951.

shift could be fixed by using

    filler = np.full_like(np.resize(trimmed_data, shape), fill_value, dtype=dtype)

instead of this block in xarray/core/variable.py (lines 1105 to 1112 at 651f27f):

    if isinstance(trimmed_data, dask_array_type):
        chunks = list(trimmed_data.chunks)
        chunks[axis] = (shape[axis],)
        full = functools.partial(da.full, chunks=chunks)
    else:
        full = np.full
    filler = full(shape, fill_value, dtype=dtype)

but I didn't check whether that would be bad for performance (and pint would have to implement np.resize). However, it would not work for dask arrays, so I'm not sure what to do.
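
As a standalone sketch of that one-liner (plain numpy standing in for the duck array; on a NEP-18 array such as pint's Quantity, np.resize and np.full_like would dispatch through __array_function__ and keep the units on the filler):

```python
# Plain numpy standing in for a duck array; on a NEP-18 array the two
# calls below would dispatch through __array_function__, preserving units.
import numpy as np

trimmed_data = np.arange(4.0).reshape(2, 2)  # data left after trimming
shape = (2, 1)                               # shape the filler must have
fill_value, dtype = np.nan, np.float64

filler = np.full_like(np.resize(trimmed_data, shape), fill_value, dtype=dtype)
print(filler.shape)  # (2, 1)
```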

rolling uses bottleneck, which does not play nice with pint yet. I'd suggest using

    if (
        bottleneck_move_func is not None
        and not isinstance(self.obj.data, dask_array_type)
        and not hasattr(self.obj.data, "__array_function__")
    ):

instead of

    if bottleneck_move_func is not None and not isinstance(
        self.obj.data, dask_array_type
    ):

If we go with that, we'd need pint to implement np.pad.

Edit: I added it, but this still requires np.pad. However, it makes a lot of other, unrelated tests fail, so I removed the commit again.

Unless I missed something, these should be the only issues left for pint support in DataArray (not counting the IndexVariable issue).

@keewis
Collaborator Author

keewis commented Dec 26, 2019

@jthielen, it looks like this needs numpy.pad and numpy.resize in pint. I tried to implement it myself, but while numpy.resize is easy (add an entry to the implement_consistent_units_by_argument loop), that is not the case for numpy.pad. Regardless of whether my proposed fixes are actually the best way, numpy.pad is used in other places of the code base so we'd probably need both, anyway. What do you think?

@jthielen
Contributor

> @jthielen, it looks like this needs numpy.pad and numpy.resize in pint. I tried to implement it myself, but while numpy.resize is easy (add an entry to the implement_consistent_units_by_argument loop), that is not the case for numpy.pad. Regardless of whether my proposed fixes are actually the best way, numpy.pad is used in other places of the code base so we'd probably need both, anyway. What do you think?

No problem at all to add these! See hgrecco/pint#956.
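
For illustration, a toy wrapper (not pint's actual implementation) shows the shape of such an np.pad override: intercept the call in __array_function__, pad the wrapped magnitude, and reattach the "units":

```python
# Toy NEP-18 wrapper (not pint's actual code): intercept np.pad via
# __array_function__, pad the wrapped magnitude, reattach the "units".
import numpy as np

class Tagged:
    def __init__(self, magnitude, tag):
        self.magnitude = np.asarray(magnitude)
        self.tag = tag

    def __array_function__(self, func, types, args, kwargs):
        if func is np.pad:
            array, *rest = args
            padded = np.pad(array.magnitude, *rest, **kwargs)
            return Tagged(padded, array.tag)
        return NotImplemented

q = Tagged([1, 2, 3], "meter")
padded = np.pad(q, 1, constant_values=0)
print(padded.magnitude.tolist(), padded.tag)  # [0, 1, 2, 3, 0] meter
```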

bors bot added a commit to hgrecco/pint that referenced this pull request Dec 27, 2019
956: Add np.pad and np.resize implementations r=hgrecco a=jthielen

As requested in pydata/xarray#3643 (comment), this PR adds implementations for `np.pad` and `np.resize`, with accompanying tests.

- ~~Closes (insert issue number)~~ (#905 follow-up)
- [x] Executed ``black -t py36 . && isort -rc . && flake8`` with no errors
- [x] The change is fully covered by automated unit tests
- [x] Documented in docs/ as appropriate
- ~~Added an entry to the CHANGES file~~ (#905 follow-up)


Co-authored-by: Jon Thielen <github@jont.cc>
@keewis keewis mentioned this pull request Dec 28, 2019
@crusaderky
Contributor

If you merge from master, you should get all green on the upstream-dev test now

@keewis
Collaborator Author

keewis commented Jan 30, 2020

Most of these failing tests are related to this PR, but it is on hold until #3706 and #3611 (in that order) have been merged.

@keewis
Collaborator Author

keewis commented Mar 22, 2020

A few updates on the progress here:

The implementation of interpolate_na uses apply_ufunc with vectorize=True, which in turn uses numpy.vectorize. numpy.vectorize is explicitly listed as deferred in NEP-18 (i.e. not supported), so I guess we can't support interpolate_na unless someone knows a way around that? Passing vectorize=False to apply_ufunc does make it work, though.
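
The coercion can be seen without xarray or pint at all; a toy duck array (hypothetical Tagged class, standing in for pint.Quantity) goes through np.vectorize and comes out as a plain ndarray:

```python
# A toy duck array (hypothetical Tagged class, standing in for
# pint.Quantity): np.vectorize coerces its inputs to plain arrays,
# so the tag/unit is silently dropped from the result.
import numpy as np

class Tagged:
    def __init__(self, magnitude, tag):
        self.magnitude = np.asarray(magnitude)
        self.tag = tag

    def __array__(self, dtype=None, copy=None):
        # this coercion path is what np.vectorize ends up using
        return np.asarray(self.magnitude, dtype=dtype)

f = np.vectorize(lambda x: x + 1)
result = f(Tagged([1, 2, 3], "meter"))
print(type(result))  # plain numpy.ndarray: no Tagged, no "meter"
```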

The failure of test_bivariate_ufunc is due to the way pint infers units, but whether or not that should matter was discussed in #3706 (but we didn't really reach a consensus there, I think?).

interp does work up until it calls scipy.interpolate.interp1d, which seems to strip the units?

reindex works for data, but the tests need a rewrite so they also test coords / dims with units (the same for the interp tests).

@max-sixty
Collaborator

LGTM! Thanks as ever @keewis

@max-sixty
Collaborator

Not sure what the bot is checking: xarray/core/arithmetic.py doesn't have 1934 lines (the last line is 107), and black doesn't complain about indentation on any line

I put in an issue

@keewis
Collaborator Author

keewis commented Apr 2, 2020

> I put in an issue

thanks, @max-sixty.

@keewis
Collaborator Author

keewis commented Apr 2, 2020

it seems we can't use __array_function__ to check if the type should be handled or not (using __array_function__ works, but that's by accident, I think, and stops working once/if we implement __array_function__ on xarray objects).

I'd say it's pretty rare that the other type knows what to do with an xarray object, so I think we should retry using the wrapped object. I don't know the reason for the current implementation, though, so I could be missing something. For reference, this is only a problem with functions that accept multiple values, like numpy.maximum. Any thoughts on this, @shoyer?

Edit: actually, that smells like a type hierarchy issue, as discussed in hgrecco/pint#845 and dask/dask#5329

@keewis
Collaborator Author

keewis commented Apr 6, 2020

Ugh. Trying to add pint.Quantity to the list of handled types does not work since pint tries to add xarray to its known non-handled types, which means we have a circular import. I guess I'll xfail the bivariate_ufuncs tests, too.

To resolve this, the type registration mentioned in #1617 and

# TODO: allow extending this with some sort of registration system

might help?

@dcherian
Contributor

dcherian commented Apr 6, 2020

> pint tries to add xarray to its known non-handled types

I don't know much about this stuff, but can't pint just have a list of handled types and return NotImplemented for anything else?

@keewis
Collaborator Author

keewis commented Apr 6, 2020

see dask/dask#5329 (comment). Basically, pint is high enough in the type hierarchy that there are too many types below it and it is easier for it to list those above it.

Actually, pint and xarray using different (conflicting) ways to check the support was the reason why I asked for advice in that issue.
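
The "list the types above you" idea can be sketched with fully qualified class names, which also sidesteps the circular import (illustrative only, not pint's actual code):

```python
# Defer to "upcast" wrapper types by fully qualified class name, so the
# wrapper library never has to be imported (illustrative sketch).
UPCAST_TYPE_NAMES = {
    "xarray.core.dataarray.DataArray",
    "xarray.core.dataset.Dataset",
    "xarray.core.variable.Variable",
}

def fullname(obj):
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"

def should_defer(other):
    """True if `other` sits above us in the type hierarchy."""
    return fullname(other) in UPCAST_TYPE_NAMES

# A stand-in class pretending to live in xarray's namespace:
FakeDataArray = type("DataArray", (), {"__module__": "xarray.core.dataarray"})

print(should_defer(FakeDataArray()))  # True: return NotImplemented, defer
print(should_defer([1, 2, 3]))        # False: handle it ourselves
```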

@TomNicholas TomNicholas added the topic-arrays related to flexible array support label Apr 6, 2020
@TomNicholas TomNicholas added this to In progress in Duck Array Wrapping via automation Apr 6, 2020
@jthielen
Contributor

jthielen commented Apr 7, 2020

@keewis Is part of the specific problem that Pint has a guarded import of xarray when defining its upcast types? Would it help if it checked for the fully qualified class name instead?

@jthielen
Contributor

jthielen commented Apr 7, 2020

Also, after glancing through all this, it seems like xarray is dealing inconsistently with the type casting hierarchy:

  • Construction/wrapping allows __array_function__-defining duck arrays among other explicitly listed additions (pandas.Index, dask arrays, etc.; see as_compatible_data(data, fastpath=False))
  • Binary ops only deny based on xarray's internal hierarchy (Dataset, DataArray, Variable), otherwise deferring to the underlying data
  • __array_ufunc__ allows a list of supported types,

        _HANDLED_TYPES = (
            np.ndarray,
            np.generic,
            numbers.Number,
            bytes,
            str,
        ) + dask_array_type

    along with SupportsArithmetic
  • (__array_function__ is yet to be defined)

Should this be its own issue?

@keewis
Collaborator Author

keewis commented Apr 7, 2020

> Is part of the specific problem that Pint has a guarded import of xarray when defining its upcast types? Would it help if it checked for the fully qualified class name instead?

SupportsArithmetic is a base class of Variable, DataArray and Dataset, so if I try to import pint for type checking, pint can't import these classes in turn; otherwise we'd have a circular import. Comparing fully qualified class names would work, but it might make sense to wait for a recommendation regarding my question.

> Should this be its own issue?

Definitely, even if it's just to clarify why we accept different types in those functions: I think __init__ may convert its arguments so it supports more types than __array_ufunc__ (I don't know about the others, though). For the missing __array_function__ see #3917.

@dcherian
Contributor

IMO @keewis you should merge this if you think it's ready.

> even if it's just to clarify why we accept different types in those functions: I think __init__ may convert its arguments so it supports more types than __array_ufunc__ (I don't know about the others, though).

open an issue for this?

@keewis
Collaborator Author

keewis commented Apr 29, 2020

see #3950

Merging: sure, I just wanted to avoid merging my own PRs without a final review

@keewis keewis merged commit 3820fb7 into pydata:master Apr 29, 2020
Duck Array Wrapping automation moved this from In progress to Done Apr 29, 2020
@keewis keewis deleted the pint-support-dataarray branch April 29, 2020 16:12
dcherian added a commit to dcherian/xarray that referenced this pull request May 1, 2020
* upstream/master: (39 commits)
  Pint support for DataArray (pydata#3643)
  Apply blackdoc to the documentation (pydata#4012)
  ensure Variable._repr_html_ works (pydata#3973)
  Fix handling of abbreviated units like msec (pydata#3998)
  full_like: error on non-scalar fill_value (pydata#3979)
  Fix some code quality and bug-risk issues (pydata#3999)
  DOC: add pandas.DataFrame.to_xarray (pydata#3994)
  Better chunking error messages for zarr backend (pydata#3983)
  Silence sphinx warnings (pydata#3990)
  Fix distributed tests on upstream-dev (pydata#3989)
  Add multi-dimensional extrapolation example and mention different behavior of kwargs in interp (pydata#3956)
  keep attrs in interpolate_na (pydata#3970)
  actually use preformatted text in the details summary (pydata#3978)
  facetgrid: Ensure that colormap params are only determined once. (pydata#3915)
  RasterioDeprecationWarning (pydata#3964)
  Empty line missing for DataArray.assign_coords doc (pydata#3963)
  New coords to existing dim (doc) (pydata#3958)
  implement a more threadsafe call to colorbar (pydata#3944)
  Fix wrong order of coordinate converted from pd.series with MultiIndex (pydata#3953)
  Updated list of core developers (pydata#3943)
  ...
dcherian added a commit to kmuehlbauer/xarray that referenced this pull request May 9, 2020
…k-issues

* upstream/master: (22 commits)
  support darkmode (pydata#4036)
  Use literal syntax instead of function calls to create the data structure (pydata#4038)
  Add template xarray object kwarg to map_blocks (pydata#3816)
  Transpose coords by default (pydata#3824)
  Remove broken test for Panel with to_pandas() (pydata#4028)
  Allow warning with cartopy in docs plotting build (pydata#4032)
  Support overriding existing variables in to_zarr() without appending (pydata#4029)
  chore: Remove unnecessary comprehension (pydata#4026)
  fix to_netcdf docstring typo (pydata#4021)
  Pint support for DataArray (pydata#3643)
  Apply blackdoc to the documentation (pydata#4012)
  ensure Variable._repr_html_ works (pydata#3973)
  Fix handling of abbreviated units like msec (pydata#3998)
  full_like: error on non-scalar fill_value (pydata#3979)
  Fix some code quality and bug-risk issues (pydata#3999)
  DOC: add pandas.DataFrame.to_xarray (pydata#3994)
  Better chunking error messages for zarr backend (pydata#3983)
  Silence sphinx warnings (pydata#3990)
  Fix distributed tests on upstream-dev (pydata#3989)
  Add multi-dimensional extrapolation example and mention different behavior of kwargs in interp (pydata#3956)
  ...