New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TimedeltaIndex +/- NaT inconsistency with Series[timedelta64]+/- NaT #19124

Closed
jbrockmendel opened this Issue Jan 7, 2018 · 4 comments

Comments

Projects
None yet
2 participants
@jbrockmendel
Member

jbrockmendel commented Jan 7, 2018

Analogous to (#18808)

Adapted from tests.indexes.timedeltas.test_arithmetic.TestTimedeltaIndexArithmetic.test_timedelta_ops_with_missing_values

tdi = pd.TimedeltaIndex(['00:00:01'])
ser = pd.Series(tdi)

>>> tdi + pd.NaT
DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)
>>> tdi - pd.NaT
DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)

>>> ser + pd.NaT
0   NaT
dtype: timedelta64[ns]
>>> ser - pd.NaT
0   NaT
dtype: timedelta64[ns]

i.e. Series[timedelta64] treats pd.NaT as a timedelta, while TimedeltaIndex treats pd.NaT as a datetime (in a way that makes sense for addition but not for subtraction).

Expected Behavior
Internal consistency in the sense that:

(tdi - pd.NaT).dtype == (ser - pd.NaT).dtype
(tdi + pd.NaT).dtype == (ser + pd.NaT).dtype

For tdi - pd.NaT it only makes sense for the result to be timedelta64, so the TimedeltaIndex behavior is wrong.

For tdi + pd.NaT a reasonable argument can be made for the result being either timedelta64 or datetime64. Returning timedelta64 maintains consistency with TimedeltaIndex.__sub__, while datetime64 maintains consistency with TimedeltaIndex.__radd__. All in all I think keeping addition commutative is the more important of the two.

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 7, 2018

these should all be timedelta

@jreback jreback added this to the 0.23.0 milestone Jan 7, 2018

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Jan 8, 2018

these should all be timedelta

I'm fine with this, but it merits double-checking because the current behavior is specifically tested:

    tdi = TimedeltaIndex(['1 day', '2 day'], name='x')
    exp = DatetimeIndex([NaT, NaT], name='x')
    for (left, right) in [(NaT, tdi)]:
        tm.assert_index_equal(left + right, exp)
        tm.assert_index_equal(right + left, exp)
        tm.assert_index_equal(left - right, exp)
        tm.assert_index_equal(right - left, exp)

The subtraction is clearly wrong as is, but if the default is to view NaT as a datetime, then pd.NaT + tdi should be a DatetimeIndex, in which case tdi + pd.NaT probably should too.

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 9, 2018

as I said it is inconsistent, so let's change it to make it correct

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Jan 9, 2018

I've implemented the change in #19139, but really want to make sure we're on the same page that there is no way to be entirely internally consistent because pd.NaT is quacks like either a datetime or a timedelta.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment