ENH: Implementing NEP 18's __array_function__ #26380

jakirkham · 2019-05-14T00:37:27Z

It would be useful to have __array_function__ support as described in NEP 18 implemented for Pandas objects. This would allow users to run NumPy functions on Pandas objects while deferring to Pandas on how those operations should run.
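For readers new to the protocol, here is a minimal sketch of how a container opts into NEP 18 dispatch. It follows the `HANDLED_FUNCTIONS` / `implements` registry pattern from the NEP's own example. `WrappedArray` and `_sum` are hypothetical names standing in for a pandas array object, not pandas API. Functions that are not registered return `NotImplemented`, and NumPy then raises `TypeError`.

```python
import numpy as np

# Registry mapping NumPy functions to overrides for the hypothetical container below.
HANDLED_FUNCTIONS = {}


def implements(np_function):
    """Register an override of ``np_function`` for ``WrappedArray``."""
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator


class WrappedArray:
    """Hypothetical 1-D container standing in for a pandas array object."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this for overridable functions (np.sum, np.concatenate, ...)
        # whenever a WrappedArray is among the dispatched-on arguments.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented  # NumPy turns this into a TypeError
        if not all(issubclass(t, WrappedArray) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.sum)
def _sum(arr, axis=None):
    # The container, not NumPy, decides how the reduction runs.
    return arr._values.sum(axis=axis)
```

With this, `np.sum(WrappedArray([1, 2, 3]))` is routed to `_sum`, while an unregistered function such as `np.mean` raises `TypeError`.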


TomAugspurger · 2019-05-17T17:53:05Z

This would be interesting to explore (in addition to Series.__array_ufunc__: #23293).

jorisvandenbossche · 2019-06-19T17:39:41Z

One thing we might think about: do we want to keep this working exactly the same as the current methods, or would we want to take the opportunity to make it more compatible with numpy?

Not sure if there are other cases, but what I am thinking about is the axis handling for reductions on a DataFrame (a rather specific case, maybe not relevant for many of the functions covered by __array_function__):

In [19]: df = pd.DataFrame(np.random.randn(5, 3), columns=['a', 'b', 'c'] ) 

In [20]: df.sum() 
Out[20]: 
a   -0.823846
b    6.850160
c    0.696525
dtype: float64

In [21]: np.sum(df)
Out[21]: 
a   -0.823846
b    6.850160
c    0.696525
dtype: float64

In [22]: np.sum(df.values) 
Out[22]: 6.7228383609003615

In [24]: np.sum(df.values, axis=0) 
Out[24]: array([-0.82384625,  6.85015992,  0.69652469])

In NumPy, the default is axis=None, which reduces over all dimensions. In pandas we don't have this functionality, and (somewhat unfortunately IMO) axis=None in practice means the default of 0. I think it would be nice to add this axis=None behaviour to pandas (optional, of course; the default would stay the same). But if we do that, the question is whether we want to "respect" the default axis of np.sum (which would probably need to go through a deprecation cycle anyway).
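To make the trade-off concrete, here is a minimal sketch, assuming a hypothetical column-labelled `Table` class (not pandas API), of an `np.sum` override that follows NumPy's `axis=None` default while keeping `axis=0` as the per-column, pandas-style reduction. It illustrates one option for the question above, not current pandas behaviour.

```python
import numpy as np


class Table:
    """Hypothetical column-labelled 2-D container standing in for a DataFrame."""

    def __init__(self, columns):
        self.columns = {name: np.asarray(col) for name, col in columns.items()}

    def to_numpy(self):
        # rows x columns, analogous to DataFrame.to_numpy()
        return np.column_stack(list(self.columns.values()))

    def __array_function__(self, func, types, args, kwargs):
        if func is np.sum:
            # args[0] is the Table itself; only ``axis`` is supported in this sketch.
            return self._sum(*args[1:], **kwargs)
        return NotImplemented

    def _sum(self, axis=None):
        if axis is None:
            # NumPy's default: reduce over every dimension to one scalar.
            return self.to_numpy().sum()
        if axis == 0:
            # pandas' default: one result per column, keyed by column label.
            return {name: col.sum() for name, col in self.columns.items()}
        raise NotImplementedError("only axis=None and axis=0 are sketched")
```

Under this choice `np.sum(table)` returns a scalar (like `np.sum(df.values)` above), and the old result stays one keyword away via `np.sum(table, axis=0)`.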

TomAugspurger · 2019-06-19T17:47:36Z

@jorisvandenbossche note that for any / all, we do interpret axis=None as reduce all dimensions.

In [25]: df = pd.DataFrame({"A": [True, False], "B": [True, True]})

In [26]: df.all(axis=None)
Out[26]: False

IIRC, that was necessary for compatibility with a change in NumPy. I may be wrong, but I think we wanted to extend that interpretation of None to all the reduction methods.

edit: with a change in the default to axis=0, to maintain compatibility.
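The NumPy counterpart of the example above, assuming the frame's values are laid out as rows x columns (column A = [True, False], column B = [True, True]), shows the two interpretations side by side:

```python
import numpy as np

# Same data as pd.DataFrame({"A": [True, False], "B": [True, True]}).
values = np.array([[True, True],
                   [False, True]])

everything = np.all(values)          # reduce all dimensions, like df.all(axis=None)
per_column = np.all(values, axis=0)  # one result per column, like df.all()
```

`everything` is False, matching `df.all(axis=None)`, while `per_column` is `[False, True]`.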

jorisvandenbossche · 2019-06-19T17:52:15Z

Yes, I think it would be good to add that option to all reduction methods. But apart from that, there is still the question of what np.sum(df) should ideally do by default: follow NumPy's default axis=None, or pandas' default axis=0. Since NumPy is the one being called, I personally would find it logical to follow NumPy's default.

TomAugspurger · 2019-06-19T17:56:27Z

Ah, yeah didn't mean to distract from your general point. I'm a bit conflicted on what to do here, but following NumPy's default is probably best. Not sure though.


TomAugspurger · 2020-03-26T21:14:38Z

I think we'll want to implement this for our arrays first (e.g. IntegerArray).

jbrockmendel · 2020-11-20T15:48:42Z

I've got a branch that implements __array_function__ for NDArrayBackedExtensionArray and is now passing the tests. The difficult part is that there is apparently no nice way to implement it for just a handful of np.foo functions (say, just np.delete and np.repeat) without breaking every other np.foo function, many of which are called on EAs in our tests.

TomAugspurger · 2020-11-22T19:48:55Z

Dask handles that with a warning and a fallback:
https://github.com/dask/dask/blob/0ca77043bbbe015dcb69378ece54419332734f40/dask/array/core.py#L1423-L1444.

If we want something similar, we could cast to an ndarray as a fallback.
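A minimal sketch of that idea, loosely modelled on the dask approach rather than copied from it: override only np.delete and np.repeat, and for every other function warn and rerun NumPy on plain ndarrays. `WrappedArray` and `_unwrap` are hypothetical names, and `_unwrap` only looks inside tuples, lists and dicts, so instances hidden in other containers are not converted.

```python
import warnings

import numpy as np


def _unwrap(obj):
    """Recursively replace WrappedArray instances with their backing ndarray.

    Only tuples, lists and dicts are traversed; anything else is returned as-is.
    """
    if isinstance(obj, WrappedArray):
        return obj._ndarray
    if isinstance(obj, tuple):
        return tuple(_unwrap(x) for x in obj)
    if isinstance(obj, list):
        return [_unwrap(x) for x in obj]
    if isinstance(obj, dict):
        return {k: _unwrap(v) for k, v in obj.items()}
    return obj


class WrappedArray:
    """Hypothetical 1-D container standing in for an NDArray-backed extension array."""

    # The handful of NumPy functions this sketch overrides explicitly.
    _HANDLED = (np.delete, np.repeat)

    def __init__(self, values):
        self._ndarray = np.asarray(values)

    def __array_function__(self, func, types, args, kwargs):
        args, kwargs = _unwrap(args), _unwrap(kwargs)
        if func in self._HANDLED:
            # Run NumPy on the backing data, then re-wrap to keep the container type.
            return WrappedArray(func(*args, **kwargs))
        # Fallback: warn, then let NumPy operate on the plain ndarrays.
        warnings.warn(
            f"np.{func.__name__} is not implemented for WrappedArray; "
            "falling back to numpy.ndarray",
            UserWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)
```

The unwrapping has to recurse because functions like `np.concatenate([a, b])` receive the arrays inside a list; converting only the top-level arguments would hand NumPy the same wrapped objects again and dispatch back into `__array_function__`.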

shoyer · 2021-03-20T18:54:40Z

My two cents:

This would definitely make sense for pandas array objects. These objects have the semantics of 1D NumPy arrays.
I'm not sure it makes sense for pandas.Series or pandas.DataFrame. These objects don't work like NumPy arrays, so implementing NumPy functions on them seems a little funny.

jbrockmendel · 2021-03-22T22:15:10Z

> If we want something similar, we could cast to an ndarray as a fallback.

The part I'm having trouble with is reliably identifying where self is in the args/kwargs when it could be e.g. hidden inside a tuple somewhere. I tracked the dask implementation back to base.compute before getting lost.
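The `types` argument of `__array_function__` only says which types are involved, not where their instances sit. One possible way to locate them is a recursive walk over `args`/`kwargs` that records paths; this is a minimal sketch with hypothetical names (`WrappedArray`, `find_instances`), and like any such walk it only sees the containers it knows how to traverse.

```python
import numpy as np


def find_instances(obj, cls, path=()):
    """Return the paths (tuples of keys/indices) at which ``cls`` instances occur.

    Walks tuples, lists and dicts only; instances hidden in other containers
    (sets, generators, custom objects) are not found.
    """
    if isinstance(obj, cls):
        return [path]
    if isinstance(obj, (list, tuple)):
        items = enumerate(obj)
    elif isinstance(obj, dict):
        items = obj.items()
    else:
        return []
    found = []
    for key, value in items:
        found.extend(find_instances(value, cls, path + (key,)))
    return found


class WrappedArray:
    """Hypothetical array-like; here it only reports where its instances were passed."""

    def __init__(self, values):
        self._ndarray = np.asarray(values)

    def __array_function__(self, func, types, args, kwargs):
        # For illustration, return the locations instead of computing anything.
        return find_instances({"args": args, "kwargs": kwargs}, WrappedArray)
```

For example, `np.concatenate([a, b])` reports `[("args", 0, 0), ("args", 0, 1)]`: both instances sit inside the list that is the first positional argument.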

jakirkham · 2021-03-22T22:27:50Z

cc @pentschev @rgommers (for vis)

jakirkham mentioned this issue May 14, 2019

Check for __array_function__ when skipping __array_wrap__ dask/dask#4797

Open

TomAugspurger added the Enhancement label May 17, 2019

ghost mentioned this issue Jun 19, 2019

Tracking issue for EA Series Operations Support #26913

Closed

TomAugspurger mentioned this issue Mar 26, 2020

Int64 Series input to numpy.polyfit() fails #32989

Open

simonjayhawkins mentioned this issue Jun 27, 2020

API: implement __array_function__ for ExtensionArray #35032

Closed

shoyer mentioned this issue Mar 20, 2021

Rename is_arraylike to is_duck_array dask/dask#7327

Closed

dcherian mentioned this issue May 10, 2021

Support for pandas Extension Arrays pydata/xarray#5287

Closed

jakirkham mentioned this issue Aug 18, 2021

KeyError when using distributed scheduler with __array_function__ dask/distributed#5224

Closed

simonjayhawkins mentioned this issue Aug 26, 2021

BUG: np.select function does not work with BooleanDtype #43228

Open

jbrockmendel mentioned this issue Dec 21, 2021

BUG: inconsistent types when applying numpy operations #41756

Closed

jbrockmendel added the Compat pandas objects compatability with Numpy or Python functions label Jan 10, 2022
