CLN: Make ufunc works for Index #10638

sinhrks · 2015-07-20T13:16:06Z

I understand these are never used because Index is no longer the subclass of np.array. Correct?

Add tests for ~~all~~ almost ufuncs
- http://docs.scipy.org/doc/numpy/reference/ufuncs.html
~~CategoricalIndex~~: Needs to be done separately to fix Categorical, because number of categories can be changed.

np.sin(pd.Categorical([1, 2, 3]))
array([ 0.84147098,  0.90929743,  0.14112001])

MultiIndex: Raise TypeError or AttributeError, as ufuncs are performed to array of tuples.

jreback · 2015-07-20T13:24:03Z

yep I think you could take them out. as np.array(index) will coerce to a ndarray (and not an index sub-class). This is done by _shallow_copy in any event.

shoyer · 2015-07-20T15:57:44Z

No, we need this. Otherwise np.sin(idx) returns an ndarray.

It's not documented, but __array_finalize__ works even on non-subclasses.

jreback · 2015-07-20T16:00:20Z

ok let's add some tests and clean up the code
I suspect this should be in core/base then

eg to support the example above

jreback · 2015-07-20T16:25:13Z

IIRC there is another issue where for example (#9966 with a PR #9974) :

In [22]: np.sin(Index(range(5)))
Out[22]: Int64Index([0.0, 0.841470984808, 0.909297426826, 0.14112000806, -0.756802495308], dtype='float64')

which is wrong, this needs to be created w/o slightly differently, e.g.

Index(func(...), **self._get_attributes_dict())

so that the attributes get propogates AND the correct type gets inferred (e.g. this would return a FloatI64Index in this case)

maybe you want to supersede #9974

sinhrks · 2015-07-21T12:44:02Z

Thanks. I'll fix Index.__array_wraps__ to make ufunc work.

https://github.com/pydata/pandas/blob/master/pandas/core/index.py#L253

Am I correct that defining __array_finalize__ is not needed if __array_wraps__ is defined? Looks __array_finalize__ is not defined in normal Index.

http://docs.scipy.org/doc/numpy/reference/arrays.classes.html

shoyer · 2015-07-21T15:49:18Z

This is correct -- array_wraps is what we need.

On Tue, Jul 21, 2015 at 5:44 AM, Sinhrks notifications@github.com wrote:

Thanks. I'll fix Index.__array_wraps__ to make ufunc work.
https://github.com/pydata/pandas/blob/master/pandas/core/index.py#L253
Am I correct that defining __array_finalize__ is not needed if __array_wraps__ is defined? Looks __array_finalize__ is not defined in normal Index.

http://docs.scipy.org/doc/numpy/reference/arrays.classes.html

Reply to this email directly or view it on GitHub:
#10638 (comment)

sinhrks · 2015-07-22T13:32:28Z

Let me confirm 2 points:

Invalid ops for datetime-like indexes.

I understand numeric ufuncs (like np.sin) for datetime-like is almost meaningless. PeriodIndex is likely to return unexpected result because of current __array__ definition. One idea is to raise error when PeriodIndex get float (invalid) result.

import pandas as pd

# OK
np.sin(pd.DatetimeIndex(['2015-07', '2015-08']))
# TypeError: ufunc 'sin' not supported for the input types, and the inputs could not be safely coerced to   any supported types according to the casting rule ''safe''

# NG
pidx = pd.PeriodIndex(['2015-07', '2015-08'], freq='M')
np.sin(pidx)
# Float64Index([-0.594884318359, 0.354966539136], dtype='float64')

# ufunc is applied to i8 values
pidx.__array__()
# array([546, 547])

Some ufunc returns bool dtype. In such cases, it should return bool array as it is (like Index.duplicated)
```
np.isnan(pidx)
# Index([False, False], dtype='object')
```

jreback · 2015-07-23T11:14:38Z

@sinhrks both of those points look right

shoyer · 2015-07-23T16:15:24Z

Yes, agreed

On Thu, Jul 23, 2015 at 4:14 AM, jreback notifications@github.com wrote:

@sinhrks both of those points look right

Reply to this email directly or view it on GitHub:
#10638 (comment)

jreback · 2015-07-23T17:05:37Z

pandas/tseries/period.py

@@ -281,6 +281,16 @@ def __contains__(self, key):
            return False
        return key.ordinal in self._engine

+    def __array_wrap__(self, result, context=None):


doesn't this need to go in tseries/base, e.g. ufuncs are not valid for any datetimelike?

timedelta64 and datetime64 are real numpy dtypes, so ufuncs will already choke on them:

In [1]: import pandas as pd In [2]: i = pd.date_range('2000-01-01', periods=5) In [3]: import numpy as np In [4]: np.square(i) TypeError: ufunc 'square' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' In [5]: np.square(i.to_period()) Out[5]: PeriodIndex(['330671-10-02', '330731-10-03', '330791-10-05', '330851-10-09', '330911-10-16'], dtype='int64', freq='D')

I agree that tests for these other index types would be a good idea, though.

Also, I think we should probably prohibit all ufuncs on PeriodIndex, not just those that return float. For example, it's not valid to square periods, even though that returns integers.

Raising TypeError in __array_wrap__ affects to arithmetic using np.array. Currently, it works:

pd.PeriodIndex(['2011', '2012'], freq='A') + np.array([1, 2]) # PeriodIndex(['2012', '2014'], dtype='int64', freq='A-DEC')

So it can't be prohibited?

PeriodIndex + integer ndarray should not be expected to work -- the array
does not have the right type!

On Mon, Aug 17, 2015 at 3:35 PM, Sinhrks notifications@github.com wrote:

In pandas/tseries/period.py
#10638 (comment):

@@ -281,6 +281,16 @@ def contains(self, key):
return False
return key.ordinal in self._engine

def array_wrap(self, result, context=None):

Raising TypeError in array_wrap affects to arithmetic using np.array.
Currently, it works:

pd.PeriodIndex(['2011', '2012'], freq='A') + np.array([1, 2])

PeriodIndex(['2012', '2014'], dtype='int64', freq='A-DEC')

So it can't be prohibited?

—
Reply to this email directly or view it on GitHub
https://github.com/pydata/pandas/pull/10638/files#r37245206.

actually I disagree
add integers to period works and makes sense as its a freq op

Yes, because this is a valid freq shift, np.array should work as the same.

pd.PeriodIndex(['2011', '2012'], freq='A') + 1 # PeriodIndex(['2012', '2013'], dtype='int64', freq='A-DEC')

And #10744 has been done expecting arithmetic using np.array works.

Reconsidered this, and defining _add_ndarray for PeriodIndex may be an alternative. I think options are:

Always raise TypeError in __array_wraps__. Support ndarray ops in another methods (PeriodIndex must be lhs).

Raise TypeError if __array_wraps__ gets non-integers. Some ufunc which return int outputs meaningless results (like square)

NOTE: Maybe not intentional, PeriodIndex.shift works for ndarray. Thus we can use shift for ndarray ops.

@sinhrks remember that ATM PeriodIndex is an odd duck. Its a real int64 dtyped array, that happens to have boxing. So it allows any integer-like ops (as opposed to DatetimeIndex which prohibits a lot more because its M8[ns]).

So its needs handling to error on prohibited ops. So on on raising in __array_wraps__ for prohibited ops, but this might require some dispatching mechanism to the sub-class to determine what is allowed. E.g. we do it similarly like this in the _add_numeric_ops and like routines.

jreback · 2015-08-05T21:36:25Z

@sinhrks you might want to steal tests from #9974

jreback · 2015-08-15T23:11:18Z

@sinhrks how's this coming along?

sinhrks · 2015-09-01T14:43:13Z

pandas/tseries/period.py

+            elif isinstance(func, np.ufunc):
+                if 'M->M' not in func.types:
+                    msg = "ufunc '{0}' not supported for the PeriodIndex"
+                    raise ValueError(msg.format(func.__name__))


Should be TypeError, but cannot be raised from here because numpy catches.

@shoyer you know any better way to override the ufunc calling scheme? (e.g. as __numpy_ufunc__ doesn't exist yet). or is this best way?

I think this sort of thing is the best we can do for now

ok let's add a note that eventually we could replace this with numpy_ufunc

sinhrks · 2015-09-05T12:22:40Z

Updated and now green.

jreback · 2015-09-05T13:17:06Z

pandas/tests/test_index.py

+                         np.arctanh, np.deg2rad, np.rad2deg]:
+                if isinstance(idx, pd.tseries.base.DatetimeIndexOpsMixin):
+                    # raise TypeError or ValueError (PeriodIndex)
+                    # PeriodIndex behavior should be changed in future version


why the different exception behavior?

TypeError raised from __array_wrap__ seems to be catchrd by numpy and return unintended result.

jreback · 2015-09-05T14:22:59Z

this doesn't seeem to close #9966 ?

In [9]: df = pd.DataFrame({'i': np.linspace(0, 5, 6).astype(np.int64), 
               'j': np.linspace(2, 3, 6)}).set_index('i')

In [10]: df
Out[10]: 
     j
i     
0  2.0
1  2.2
2  2.4
3  2.6
4  2.8
5  3.0

In [11]: df.index
Out[11]: Int64Index([0, 1, 2, 3, 4, 5], dtype='int64', name=u'i')

In [12]: df.index = df.index / 10.

In [13]: df.index
Out[13]: Int64Index([0.0, 0.1, 0.2, 0.3, 0.4, 0.5], dtype='float64', name=u'i')

sinhrks · 2015-09-05T21:22:42Z

@jreback Standard arithmetic is handled by _numeric_binop. Now fixed.

df.index / 2.
# Float64Index([0.0, 0.5, 1.0, 1.5, 2.0, 2.5], dtype='float64', name=u'i')

np.divide(df.index, 2.)
# Float64Index([0.0, 0.5, 1.0, 1.5, 2.0, 2.5], dtype='float64', name=u'i')

sinhrks · 2015-09-05T22:12:39Z

Failed in Timedelta ops. On current master, mult and div doesn't affect to existing freq, but it should?

import pandas as pd
idx = pd.timedelta_range('2 days', periods=3, freq='2D')

idx * 2
# TimedeltaIndex(['4 days', '8 days', '12 days'], dtype='timedelta64[ns]', freq='2D')
# -> freq should be "4D"?

idx / 2
# TimedeltaIndex(['1 days', '2 days', '3 days'], dtype='timedelta64[ns]', freq='2D')
# -> freq should be "D"?

jreback · 2015-09-05T22:14:58Z

@sinhrks hmm I think you need to set the freq=None so it get's reinferred (or maybe can et it to 'infer')

CLN: Make ufunc works for Index

jreback · 2015-09-06T16:31:11Z

@sinhrks thanks!

jreback added the Clean label Jul 20, 2015

sinhrks changed the title ~~CLN: Remote Datetime-like Index.__array_finalize__~~ CLN: Remove Datetime-like Index.__array_finalize__ Jul 22, 2015

sinhrks force-pushed the array_finalize branch from 2b47a72 to 247682e Compare July 22, 2015 13:17

jreback added this to the 0.17.0 milestone Jul 23, 2015

jreback added Indexing Related to indexing on series/frames, not to indexes themselves Dtype Conversions Unexpected or buggy dtype conversions labels Jul 23, 2015

sinhrks force-pushed the array_finalize branch from 247682e to 2882fd2 Compare July 23, 2015 12:38

jreback reviewed Jul 23, 2015
View reviewed changes

max-sixty mentioned this pull request Jul 23, 2015

Drop & insert on subtypes of index return their subtypes #10620

Closed

sinhrks changed the title ~~CLN: Remove Datetime-like Index.__array_finalize__~~ CLN: Make ufunc works for Index Jul 23, 2015

sinhrks force-pushed the array_finalize branch 5 times, most recently from 1050f88 to 2757884 Compare July 31, 2015 12:17

sinhrks mentioned this pull request Aug 1, 2015

BUG: PeriodIndex.order doesnt preserve freq #10305

Merged

sinhrks force-pushed the array_finalize branch from 2757884 to 4ae3ff3 Compare August 3, 2015 13:24

jreback mentioned this pull request Aug 5, 2015

Change of index values dtype should be reflected in index type #9974

Closed

sinhrks force-pushed the array_finalize branch from 4ae3ff3 to 0455ccd Compare August 17, 2015 11:18

sinhrks reviewed Sep 1, 2015
View reviewed changes

sinhrks force-pushed the array_finalize branch 10 times, most recently from c511a09 to 27c7e8a Compare September 5, 2015 00:38

jreback reviewed Sep 5, 2015
View reviewed changes

sinhrks force-pushed the array_finalize branch from 27c7e8a to 6a15d78 Compare September 5, 2015 21:13

sinhrks force-pushed the array_finalize branch 2 times, most recently from fb824f0 to e585781 Compare September 5, 2015 21:33

sinhrks force-pushed the array_finalize branch from e585781 to bd7ba08 Compare September 6, 2015 02:43

CLN: Remove __array_finalize__

e244bdd

sinhrks force-pushed the array_finalize branch from bd7ba08 to e244bdd Compare September 6, 2015 03:27

jreback added a commit that referenced this pull request Sep 6, 2015

Merge pull request #10638 from sinhrks/array_finalize

0cacd24

CLN: Make ufunc works for Index

jreback merged commit 0cacd24 into pandas-dev:master Sep 6, 2015

sinhrks deleted the array_finalize branch September 6, 2015 17:31

jreback mentioned this pull request Sep 7, 2015

Period/DatetimeIndex may not be broadcasting correctly #5032

Closed

sinhrks mentioned this pull request Sep 8, 2015

BUG: DatetimeIndex.freq can defer by ufunc #11024

Merged

mroeschke mentioned this pull request Dec 21, 2016

TST: Concat MultiIndex dataframes with deepcopy (#9967) #14936

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLN: Make ufunc works for Index #10638

CLN: Make ufunc works for Index #10638

sinhrks commented Jul 20, 2015

jreback commented Jul 20, 2015

shoyer commented Jul 20, 2015

jreback commented Jul 20, 2015

jreback commented Jul 20, 2015

sinhrks commented Jul 21, 2015

shoyer commented Jul 21, 2015

http://docs.scipy.org/doc/numpy/reference/arrays.classes.html

sinhrks commented Jul 22, 2015

jreback commented Jul 23, 2015

shoyer commented Jul 23, 2015

@sinhrks both of those points look right

jreback Jul 23, 2015

shoyer Jul 23, 2015

shoyer Jul 23, 2015

sinhrks Aug 17, 2015

shoyer Aug 17, 2015

jreback Aug 17, 2015

sinhrks Aug 17, 2015

sinhrks Aug 18, 2015

jreback Aug 18, 2015

jreback commented Aug 5, 2015

jreback commented Aug 15, 2015

sinhrks Sep 1, 2015

jreback Sep 1, 2015

shoyer Sep 1, 2015

jreback Sep 1, 2015

sinhrks commented Sep 5, 2015

jreback Sep 5, 2015

sinhrks Sep 5, 2015

jreback commented Sep 5, 2015

sinhrks commented Sep 5, 2015

sinhrks commented Sep 5, 2015

jreback commented Sep 5, 2015

jreback commented Sep 6, 2015

CLN: Make ufunc works for Index #10638

CLN: Make ufunc works for Index #10638

Conversation

sinhrks commented Jul 20, 2015

jreback commented Jul 20, 2015

shoyer commented Jul 20, 2015

jreback commented Jul 20, 2015

jreback commented Jul 20, 2015

sinhrks commented Jul 21, 2015

shoyer commented Jul 21, 2015

sinhrks commented Jul 22, 2015

jreback commented Jul 23, 2015

shoyer commented Jul 23, 2015

@sinhrks both of those points look right

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PeriodIndex(['2012', '2014'], dtype='int64', freq='A-DEC')

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Aug 5, 2015

jreback commented Aug 15, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sinhrks commented Sep 5, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Sep 5, 2015

sinhrks commented Sep 5, 2015

sinhrks commented Sep 5, 2015

jreback commented Sep 5, 2015

jreback commented Sep 6, 2015