Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DataFrame.__array_ufunc__ #36955

Merged
merged 20 commits into from
Nov 25, 2020

Conversation

TomAugspurger
Copy link
Contributor

For some cases, this will preserve extension types of arrays by calling
the ufunc blockwise.

In [1]: import pandas as pd; import numpy as np
In [2]: df = pd.DataFrame({"A": pd.array([0, 1], dtype="Sparse")})

In [3]: np.sin(df).dtypes
Out[3]:
A    Sparse[float64, nan]
dtype: object

Implementation-wise, this was done by moving Series.__array_ufunc__ to NDFrame and making it generic for Series / DataFrame. The DataFrame implementation goes through BlockManager.apply(ufunc).

We don't currently handle the multi-input case well for dataframes (aside from ufuncs that are implemented as dunder ops like np.add). For these, we fall back to the old implementation of converting to an ndarray and wrapping the result. This loses extension types.

We also don't currently handle multi-output ufuncs (like np.modf). This would require a BlockManager.apply that returns a Tuple[BlockManager], nout per input block. Maybe someday, but that's low priority.

closes #23743

@TomAugspurger TomAugspurger added the Numeric Operations Arithmetic, Comparison, and Logical operations label Oct 7, 2020
@TomAugspurger TomAugspurger added this to the 1.2 milestone Oct 7, 2020
For some cases, this will preserve extension types of arrays by calling
the ufunc blockwise.

```python
In [1]: import pandas as pd; import numpy as np
In [2]: df = pd.DataFrame({"A": pd.array([0, 1], dtype="Sparse")})

In [3]: np.sin(df).dtypes
Out[3]:
A    Sparse[float64, nan]
dtype: object
```

We don't currently handle the multi-input case well (aside from ufuncs that
are implemented as dunder ops like `np.add`). For these, we fall back to
the old implementation of converting to an ndarray.
pandas/core/frame.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm some questions.

doc/source/whatsnew/v1.2.0.rst Show resolved Hide resolved
pandas/core/generic.py Show resolved Hide resolved
@jbrockmendel
Copy link
Member

needs rebase, otherwise i think pretty ready

@jreback
Copy link
Contributor

jreback commented Nov 4, 2020

can you merge master

@jbrockmendel
Copy link
Member

@TomAugspurger needs rebase, otherwise i think good to go

pandas/core/arraylike.py Show resolved Hide resolved
)
else:
reconstruct_axes = dict(zip(self._AXIS_ORDERS, self.axes))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally can you split this function up a bit (if easy)

@@ -220,6 +220,8 @@ Other enhancements
- :meth:`DatetimeIndex.searchsorted`, :meth:`TimedeltaIndex.searchsorted`, :meth:`PeriodIndex.searchsorted`, and :meth:`Series.searchsorted` with datetimelike dtypes will now try to cast string arguments (listlike and scalar) to the matching datetimelike type (:issue:`36346`)
-
- Added methods :meth:`IntegerArray.prod`, :meth:`IntegerArray.min`, and :meth:`IntegerArray.max` (:issue:`33790`)
- Calling a NumPy ufunc on a ``DataFrame`` with extension types now presrves the extension types when possible (:issue:`23743`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presrves -> preserves

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update :->

@jbrockmendel
Copy link
Member

if we were to implement __array_function__ (after dropping numpy 1.16 i guess) is that a "instead of" or an "in addition to" __array_ufunc__ thing?

does this make __array_wrap__ unnecessary?

@TomAugspurger
Copy link
Contributor Author

__array_ufunc__ is separate from __array_function__. If there's ever a specialized protocol (like __array_ufunc__) it takes priority over __array_function__.

IIRC I tried removing __array_wrap__ but there may have been a test failure. Apparently it's used by things other than __array_ufunc__.

@jbrockmendel
Copy link
Member

@TomAugspurger can you merge master? i tried doing it myself but am getting some really weird-looking errors

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Nov 22, 2020 via email

@jreback
Copy link
Contributor

jreback commented Nov 24, 2020

ideally we could do this for 1.2 if can merge master

@jbrockmendel
Copy link
Member

@jreback i merged master 2 days ago, this should be OK

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok 2 comments i think should do here (these are typos / doc-string). my comments about splitting up the actual function can be done later. ping on green.

@@ -220,6 +220,8 @@ Other enhancements
- :meth:`DatetimeIndex.searchsorted`, :meth:`TimedeltaIndex.searchsorted`, :meth:`PeriodIndex.searchsorted`, and :meth:`Series.searchsorted` with datetimelike dtypes will now try to cast string arguments (listlike and scalar) to the matching datetimelike type (:issue:`36346`)
-
- Added methods :meth:`IntegerArray.prod`, :meth:`IntegerArray.min`, and :meth:`IntegerArray.max` (:issue:`33790`)
- Calling a NumPy ufunc on a ``DataFrame`` with extension types now presrves the extension types when possible (:issue:`23743`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update :->

pandas/core/arraylike.py Show resolved Hide resolved
@jreback jreback merged commit c6eb725 into pandas-dev:master Nov 25, 2020
@jreback
Copy link
Contributor

jreback commented Nov 25, 2020

thanks @TomAugspurger very nice

@TomAugspurger TomAugspurger deleted the frame-array-ufunc branch November 26, 2020 00:26
@TomAugspurger
Copy link
Contributor Author

Thanks Brock.

@jorisvandenbossche
Copy link
Member

The fact that we now align is a breaking change, but it seems one that could easily be detected (when the operands are not yet aligned) and done with a deprecation cycle?

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Nov 30, 2020

Did we face a similar issue with Series implementing __array_ufunc__ in #23293? I don't see anything in the change log?

I vaguely recall a discussion around np.add(a, b) aligning, but, say, np.logaddexp(a, b) not, being considered a bug.

@jorisvandenbossche
Copy link
Member

Yes, we did a similar change back then for Series I think, and it seems we didn't discuss it much at the time either. And I also don't recall (m)any issues about it that were raised after doing the change. Which is certainly a sign that this might be indeed OK to simply change.

I would say the main differences is that for Series we changed this in 0.25, but now for 1.x, we said to do changes using a deprecation cycle where possible (and this seems a case where it is easily possible)

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Nov 30, 2020

Right, the 1.0 vs. 0.25 distinction is the important part. I'd prefer to see a deprecation warning for the alignment, but I don't know if I'll have time to do it for this release.

One complication just noticed: there isn't an easy way to silence the warning, like with a keyword. We'd need to only warn when the inputs are unaligned, and require users to call .align() ahead of time.

@jorisvandenbossche
Copy link
Member

One complication just noticed: there isn't an easy way to silence the warning, like with a keyword. We'd need to only warn when the inputs are unaligned, and require users to call .align() ahead of time.

Indeed, aligning manually would be the way to get around the warning if you want the new behaviour, otherwise the solution is to first convert to numpy arrays before calling the ufunc, in case you want the old non-aligning behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement DataFrame.__array_ufunc__
4 participants