Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: Series.__array_ufunc__ doesn't handle .outer #27186

Closed
TomAugspurger opened this issue Jul 2, 2019 · 4 comments · Fixed by #27198
Closed

REGR: Series.__array_ufunc__ doesn't handle .outer #27186

TomAugspurger opened this issue Jul 2, 2019 · 4 comments · Fixed by #27198
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@TomAugspurger
Copy link
Contributor

0.24.2

In [4]: s = pd.Series([1, 2, 3])

In [5]: o = np.array([0, 1 ,2])

In [6]: np.subtract.outer(s, o)
Out[6]:
array([[ 1,  0, -1],
       [ 2,  1,  0],
       [ 3,  2,  1]])

master

In [7]: np.subtract.outer(s, o)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-a64bc670e48a> in <module>
----> 1 np.subtract.outer(s, o)

~/sandbox/pandas/pandas/core/series.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    793             return None
    794         else:
--> 795             return construct_return(result)
    796
    797     def __array__(self, dtype=None):

~/sandbox/pandas/pandas/core/series.py in construct_return(result)
    784                                      index=index,
    785                                      name=name,
--> 786                                      copy=False)
    787
    788         if type(result) is tuple:

~/sandbox/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    255             else:
    256                 data = sanitize_array(data, index, dtype, copy,
--> 257                                       raise_cast_failure=True)
    258
    259                 data = SingleBlockManager(data, index, fastpath=True)

~/sandbox/pandas/pandas/core/internals/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    654     elif subarr.ndim > 1:
    655         if isinstance(data, np.ndarray):
--> 656             raise Exception('Data must be 1-dimensional')
    657         else:
    658             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional

Probably not an RC blocker, but I may try to get a fix in quick.

Do we want to return an ndarray here? We don't want a DataFrame where the index & columns match?

cc @jorisvandenbossche @shoyer if you have thoughts on the ideal return type.

@TomAugspurger TomAugspurger added the Numeric Operations Arithmetic, Comparison, and Logical operations label Jul 2, 2019
@TomAugspurger TomAugspurger added this to the 0.25.0 milestone Jul 2, 2019
@jorisvandenbossche
Copy link
Member

if you have thoughts on the ideal return type.

It could indeed be a DataFrame.

(I first wanted to say: that's only possible if both inputs are a Series, otherwise you don't have labels for the columns, but: inputs first get aligned, so you always get the same labels for index / columns.
And that is actually a change in behavior compared to the current one (which is also the case for the plain ufuncs -> should this be mentioned more explicitly in the whatsnew? I don't think we currently have anything about that)

BTW, for outer ufuncs, in numpy you can have different lengths for both inputs, and end up with a non-square array. But since we do alignment in normal ufuncs in pandas, we should do alignment here as well I suppose. Although I would also see usefulness in a non-aligning behaviour, where the resulting index/column labels can be different.

@TomAugspurger
Copy link
Contributor Author

Hmm, apparently the inputs can be n-dimensional. So returning a DataFrame doesn't scale well. Really, outer[Series, Series] and outer[Series, Index] are the only two that could return a DataFrame. outer[Series, DataFrame] is already 3-dimensional, so we would need to return an ndarray.

So I guess we should always return an ndarray...

which is also the case for the plain ufuncs -> should this be mentioned more explicitly in the whatsnew?

Yes... I missed that. I'll make a PR.

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jul 2, 2019
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jul 2, 2019
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jul 2, 2019
@TomAugspurger TomAugspurger added the Regression Functionality that used to work in a prior pandas version label Jul 2, 2019
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jul 3, 2019
@TomAugspurger
Copy link
Contributor Author

For reference, xarray raises

In [22]: a = xr.DataArray([1, 2, 3])

In [23]: np.subtract.outer(a, a)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-23-83caec80c9b6> in <module>
----> 1 np.subtract.outer(a, a)

~/Envs/pandas-dev/lib/python3.7/site-packages/xarray/core/arithmetic.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
     47                 'alternative, consider explicitly converting xarray objects '
     48                 'to NumPy arrays (e.g., with `.values`).'
---> 49                 .format(method, ufunc))
     50
     51         if any(isinstance(o, SupportsArithmetic) for o in out):

NotImplementedError: outer method for ufunc <ufunc 'subtract'> is not implemented on xarray objects, which currently only support the __call__ method. As an alternative, consider explicitly converting xarray objects to NumPy arrays (e.g., with `.values`).

Given the ambiguity about whether .outer(Series, Series) should return a DataFrame or an ndarray, that seems sensible. I'll restore the previous behavior, but with a deprecation.

@jorisvandenbossche
Copy link
Member

Yes, that seems like the safest option (we can later still always enable it again if there is a clear use case)

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jul 3, 2019
TomAugspurger added a commit that referenced this issue Jul 3, 2019
* DEPR: Deprecate outer ufunc in Series.__array_ufunc__

Closes #27186
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants