Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Period - Period return DateOffset instead of int #21314

Merged
merged 12 commits into from
Jun 29, 2018
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 42 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,52 @@ Backwards incompatible API changes

.. _whatsnew_0240.api.datetimelike:

Period Subtraction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a reference tag here

^^^^^^^^^^^^^^^^^^

Subtraction of a ``Period`` from another ``Period`` will give a ``DateOffset``.
instead of an integer (:issue:`21314`)

.. ipython:: python

june = pd.Period('June 2018')
april = pd.Period('April 2018')
june - april

Previous Behavior:

.. code-block:: ipython

In [2]: june = pd.Period('June 2018')

In [3]: april = pd.Period('April 2018')

In [4]: june - april
Out [4]: 2

Similarly, subtraction of a ``Period`` from a ``PeriodIndex`` will not return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will now return

an ``Index`` of ``DateOffset`` objects instead of an ``Int64Index``

.. ipython:: python

pi = pd.period_range('June 2018', freq='M', periods=3)
pi - pi[0]

Previous Behavior:

.. code-block:: ipython

In [2]: pi = pd.period_range('June 2018', freq='M', periods=3)

In [3]: pi - pi[0]
Out[3]: Int64Index([0, 1, 2], dtype='int64')

Datetimelike API Changes
^^^^^^^^^^^^^^^^^^^^^^^^

- For :class:`DatetimeIndex` and :class:`TimedeltaIndex` with non-``None`` ``freq`` attribute, addition or subtraction of integer-dtyped array or ``Index`` will return an object of the same class (:issue:`19959`)
- Subtraction of :class:`Period` from :class:`Period` will return a :class:`DateOffset` object instead of an integer (:issue:`21314`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make a new sub-section and show a previous / new behavior example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "can" part of that is a very good question. I'll try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your subsection looks good, you don't need this note here any longer

- Subtraction of :class:`Period` from :class:`PeriodIndex` (or vice-versa) will return an object-dtype :class:`Index` instead of :class:`Int64Index` (:issue:`21314`)

.. _whatsnew_0240.api.other:

Expand Down
7 changes: 5 additions & 2 deletions pandas/_libs/tslibs/period.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -1120,9 +1120,12 @@ cdef class _Period(object):
if other.freq != self.freq:
msg = _DIFFERENT_FREQ.format(self.freqstr, other.freqstr)
raise IncompatibleFrequency(msg)
return self.ordinal - other.ordinal
return (self.ordinal - other.ordinal) * self.freq
elif getattr(other, '_typ', None) == 'periodindex':
return -other.__sub__(self)
# GH#21314 PeriodIndex - Period returns an object-index
# of DateOffset objects, for which we cannot use __neg__
# directly, so we have to apply it pointwise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a test that hits this explicity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Without this we get test failures in tests.indexes.period.test_arithmetic.TestPeriodIndexSeriesMethods for
test_pi_ops, test_pi_sub_period, test_pi_sub_period_nat

return other.__sub__(self).map(lambda x: -x)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for the object-dtype Index to attempt __neg__ like this?

else: # pragma: no cover
return NotImplemented
elif is_period_object(other):
Expand Down
7 changes: 6 additions & 1 deletion pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -899,7 +899,9 @@ def __add__(self, other):
raise TypeError("cannot add {dtype}-dtype to {cls}"
.format(dtype=other.dtype,
cls=type(self).__name__))

elif is_categorical_dtype(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can make this is_extension_dtype (here and below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work for now, but it isn't obvious to me how it will shake out as more EA subclasses get implemented. Let me give this some thought.

# Categorical op will raise; defer explicitly
return NotImplemented
else: # pragma: no cover
return NotImplemented

Expand Down Expand Up @@ -964,6 +966,9 @@ def __sub__(self, other):
raise TypeError("cannot subtract {dtype}-dtype from {cls}"
.format(dtype=other.dtype,
cls=type(self).__name__))
elif is_categorical_dtype(other):
# Categorical op will raise; defer explicitly
return NotImplemented
else: # pragma: no cover
return NotImplemented

Expand Down
13 changes: 8 additions & 5 deletions pandas/core/indexes/period.py
Original file line number Diff line number Diff line change
Expand Up @@ -551,13 +551,14 @@ def is_all_dates(self):
@property
def is_full(self):
"""
Returns True if there are any missing periods from start to end
Returns True if this PeriodIndex is range-like in that all Periods
between start and end are present, in order.
"""
if len(self) == 0:
return True
if not self.is_monotonic:
raise ValueError('Index is not monotonic')
values = self.values
values = self.asi8
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the docstring for is_full correct? It looks to me like 1) it is flipped True/False and 2) the word "missing" is ambiguous since it can refer to pd.NaT/np.nan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to update, I don't remember why this was added / what is purpose is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I don't remember even why we have this (can follow up later)

return ((values[1:] - values[:-1]) < 2).all()

@property
Expand Down Expand Up @@ -761,17 +762,19 @@ def _sub_datelike(self, other):
return NotImplemented

def _sub_period(self, other):
# If the operation is well-defined, we return an object-Index
# of DateOffsets. Null entries are filled with pd.NaT
if self.freq != other.freq:
msg = _DIFFERENT_FREQ_INDEX.format(self.freqstr, other.freqstr)
raise IncompatibleFrequency(msg)

asi8 = self.asi8
new_data = asi8 - other.ordinal
new_data = np.array([self.freq * x for x in new_data])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to implement DateOffset * arraylike so we don't have to right this broadcasting repeatedly.


if self.hasnans:
new_data = new_data.astype(np.float64)
new_data[self._isnan] = np.nan
# result must be Int64Index or Float64Index
new_data[self._isnan] = tslib.NaT

return Index(new_data)

def shift(self, n):
Expand Down
9 changes: 5 additions & 4 deletions pandas/tests/frame/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -258,9 +258,10 @@ def test_ops_frame_period(self):
assert df['B'].dtype == object

p = pd.Period('2015-03', freq='M')
off = p.freq
# dtype will be object because of original dtype
exp = pd.DataFrame({'A': np.array([2, 1], dtype=object),
'B': np.array([14, 13], dtype=object)})
exp = pd.DataFrame({'A': np.array([2 * off, 1 * off], dtype=object),
'B': np.array([14 * off, 13 * off], dtype=object)})
tm.assert_frame_equal(p - df, exp)
tm.assert_frame_equal(df - p, -1 * exp)

Expand All @@ -271,7 +272,7 @@ def test_ops_frame_period(self):
assert df2['A'].dtype == object
assert df2['B'].dtype == object

exp = pd.DataFrame({'A': np.array([4, 4], dtype=object),
'B': np.array([16, 16], dtype=object)})
exp = pd.DataFrame({'A': np.array([4 * off, 4 * off], dtype=object),
'B': np.array([16 * off, 16 * off], dtype=object)})
tm.assert_frame_equal(df2 - df, exp)
tm.assert_frame_equal(df - df2, -1 * exp)
15 changes: 9 additions & 6 deletions pandas/tests/indexes/period/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -730,11 +730,12 @@ def test_pi_ops(self):

self._check(idx + 2, lambda x: x - 2, idx)
result = idx - Period('2011-01', freq='M')
exp = pd.Index([0, 1, 2, 3], name='idx')
off = idx.freq
exp = pd.Index([0 * off, 1 * off, 2 * off, 3 * off], name='idx')
tm.assert_index_equal(result, exp)

result = Period('2011-01', freq='M') - idx
exp = pd.Index([0, -1, -2, -3], name='idx')
exp = pd.Index([0 * off, -1 * off, -2 * off, -3 * off], name='idx')
tm.assert_index_equal(result, exp)

@pytest.mark.parametrize('ng', ["str", 1.5])
Expand Down Expand Up @@ -864,14 +865,15 @@ def test_pi_sub_period(self):
freq='M', name='idx')

result = idx - pd.Period('2012-01', freq='M')
exp = pd.Index([-12, -11, -10, -9], name='idx')
off = idx.freq
exp = pd.Index([-12 * off, -11 * off, -10 * off, -9 * off], name='idx')
tm.assert_index_equal(result, exp)

result = np.subtract(idx, pd.Period('2012-01', freq='M'))
tm.assert_index_equal(result, exp)

result = pd.Period('2012-01', freq='M') - idx
exp = pd.Index([12, 11, 10, 9], name='idx')
exp = pd.Index([12 * off, 11 * off, 10 * off, 9 * off], name='idx')
tm.assert_index_equal(result, exp)

result = np.subtract(pd.Period('2012-01', freq='M'), idx)
Expand All @@ -898,11 +900,12 @@ def test_pi_sub_period_nat(self):
freq='M', name='idx')

result = idx - pd.Period('2012-01', freq='M')
exp = pd.Index([-12, np.nan, -10, -9], name='idx')
off = idx.freq
exp = pd.Index([-12 * off, pd.NaT, -10 * off, -9 * off], name='idx')
tm.assert_index_equal(result, exp)

result = pd.Period('2012-01', freq='M') - idx
exp = pd.Index([12, np.nan, 10, 9], name='idx')
exp = pd.Index([12 * off, pd.NaT, 10 * off, 9 * off], name='idx')
tm.assert_index_equal(result, exp)

exp = pd.TimedeltaIndex([np.nan, np.nan, np.nan, np.nan], name='idx')
Expand Down
7 changes: 4 additions & 3 deletions pandas/tests/scalar/period/test_period.py
Original file line number Diff line number Diff line change
Expand Up @@ -572,7 +572,7 @@ def test_strftime(self):
def test_sub_delta(self):
left, right = Period('2011', freq='A'), Period('2007', freq='A')
result = left - right
assert result == 4
assert result == 4 * right.freq

with pytest.raises(period.IncompatibleFrequency):
left - Period('2007-01', freq='M')
Expand Down Expand Up @@ -1064,8 +1064,9 @@ def test_sub(self):
dt1 = Period('2011-01-01', freq='D')
dt2 = Period('2011-01-15', freq='D')

assert dt1 - dt2 == -14
assert dt2 - dt1 == 14
off = dt1.freq
assert dt1 - dt2 == -14 * off
assert dt2 - dt1 == 14 * off

msg = r"Input has different freq=M from Period\(freq=D\)"
with tm.assert_raises_regex(period.IncompatibleFrequency, msg):
Expand Down
5 changes: 3 additions & 2 deletions pandas/tests/series/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -517,16 +517,17 @@ def test_ops_series_period(self):
assert ser.dtype == object

per = pd.Period('2015-01-10', freq='D')
off = per.freq
# dtype will be object because of original dtype
expected = pd.Series([9, 8], name='xxx', dtype=object)
expected = pd.Series([9 * off, 8 * off], name='xxx', dtype=object)
tm.assert_series_equal(per - ser, expected)
tm.assert_series_equal(ser - per, -1 * expected)

s2 = pd.Series([pd.Period('2015-01-05', freq='D'),
pd.Period('2015-01-04', freq='D')], name='xxx')
assert s2.dtype == object

expected = pd.Series([4, 2], name='xxx', dtype=object)
expected = pd.Series([4 * off, 2 * off], name='xxx', dtype=object)
tm.assert_series_equal(s2 - ser, expected)
tm.assert_series_equal(ser - s2, -1 * expected)

Expand Down