Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: DatetimeIndex + arraylike of DateOffsets #18849

Merged
merged 13 commits into from
Dec 29, 2017

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Dec 19, 2017

Before:

>>> dti = pd.date_range('2017-01-01', periods=2)
>>> other = np.array([pd.offsets.MonthEnd(), pd.offsets.Day(n=2)])

>>> dti + other
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('O')

# Same for `dti - other`, `dti + pd.Index(other)`, `dti - pd.Index(other)`

>>> dti + pd.Series(other)
0    DatetimeIndex(['2017-01-31', '2017-01-31'], dt...
1    DatetimeIndex(['2017-01-03', '2017-01-04'], dt...
dtype: object

# yikes.

After:

>>> dti + other
pandas/core/indexes/datetimelike.py:677: PerformanceWarning: Adding/subtracting array of DateOffsets to <class 'pandas.core.indexes.datetimes.DatetimeIndex'> not vectorized
  PerformanceWarning)
DatetimeIndex(['2017-01-31', '2017-01-04'], dtype='datetime64[ns]', freq=None)

>>> dti - pd.Index(other)
DatetimeIndex(['2016-12-31', '2016-12-31'], dtype='datetime64[ns]', freq=None)

>>> dti + pd.Series(other)
0   2017-01-31
1   2017-01-04
dtype: datetime64[ns]

Caveat This will need a follow-up to make sure name attribute is propogated correctly.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@gfyoung gfyoung added Datetime Datetime data dtype Bug labels Dec 19, 2017
@gfyoung
Copy link
Member

gfyoung commented Dec 19, 2017

@jbrockmendel : As this PR stands, I'm a little uneasy about it. Besides the name bug that you mention as a "caveat," the fact that you have that PerformanceWarning, while not super problematic, is still a little bothersome.

@pytest.mark.parametrize('box', [np.array, pd.Index])
def test_dti_add_offset_array(self, tz, box):
dti = pd.date_range('2017-01-01', periods=2, tz=tz)
# TODO: check that `name` propogates correctly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference issue number below all of your new tests.

return self - other[0]
else:
from pandas.errors import PerformanceWarning
warnings.warn("Adding/subtracting array of DateOffsets to "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see that you added this yourself. Still, feel a little uneasy about this.

@jbrockmendel
Copy link
Member Author

the fact that you have that PerformanceWarning, while not super problematic, is still a little bothersome.

This is explicitly copying the Series behavior.

@gfyoung
Copy link
Member

gfyoung commented Dec 19, 2017

This is explicitly copying the Series behavior.

Hmmm...I see. Theoretically, it makes sense. Though if we can avoid adding this warning, that would be great. Thus, either you can confirm that there is indeed a performance difference OR the code is rewritten so that we avoid this overhead.

@jbrockmendel
Copy link
Member Author

Thus, either you can confirm that there is indeed a performance difference OR the code is rewritten so that we avoid this overhead

I've been working on offsets a lot recently (and upcoming, see #18854). Perf has improved quite a bit, but I can absolutely confirm that this is not a speedy operation.

if len(other) == 1:
return self + other[0]
else:
from pandas.errors import PerformanceWarning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you are adding code here that already exists in datetimes.py:_add_offset ? is there a reason you are not dispatching to that (which may need to handle array-likes in addition to scalars). Further I am not in love with explicit type checking for DTI here. again why are you not handling that at a higher level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_add_offset is for a single offset which does not have an implementation of apply_index. This is for an array of offsets. It's a case handled by Series but not by DatetimeIndex.

I can make a _add_offset_array and dispatch to that.

@jreback jreback added the Frequency DateOffsets label Dec 20, 2017
@codecov
Copy link

codecov bot commented Dec 23, 2017

Codecov Report

Merging #18849 into master will decrease coverage by 0.02%.
The diff coverage is 90.32%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18849      +/-   ##
==========================================
- Coverage    91.6%   91.58%   -0.03%     
==========================================
  Files         150      150              
  Lines       48939    48966      +27     
==========================================
+ Hits        44833    44845      +12     
- Misses       4106     4121      +15
Flag Coverage Δ
#multiple 89.94% <90.32%> (-0.03%) ⬇️
#single 41.72% <19.35%> (+0.55%) ⬆️
Impacted Files Coverage Δ
pandas/core/ops.py 90.24% <100%> (+0.03%) ⬆️
pandas/core/indexes/datetimes.py 95.45% <86.66%> (-0.13%) ⬇️
pandas/core/indexes/datetimelike.py 97.04% <92.3%> (-0.16%) ⬇️
pandas/core/indexes/interval.py 92.61% <0%> (-1.21%) ⬇️
pandas/core/sparse/array.py 91.82% <0%> (-0.47%) ⬇️
pandas/util/testing.py 84.68% <0%> (-0.22%) ⬇️
pandas/core/dtypes/cast.py 88.42% <0%> (-0.18%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dbec3c9...81c4fbf. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. just some additional testing requested around series + offset series names.

name=dti.name, freq='infer')
tm.assert_index_equal(res, expected)

def test_dti_with_offset_series(self, tz):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parametrize this with name (None, same name, other name) to make sure they are propagated correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Looks like Series.op(Index) was always taking on the name of the Series.

@@ -282,6 +282,7 @@ Conversion
- Bug in :meth:`Index.astype` with a categorical dtype where the resultant index is not converted to a :class:`CategoricalIndex` for all types of index (:issue:`18630`)
- Bug in :meth:`Series.astype` and ``Categorical.astype()`` where an existing categorical data does not get updated (:issue:`10696`, :issue:`18593`)
- Bug in :class:`Series` constructor with an int or float list where specifying ``dtype=str``, ``dtype='str'`` or ``dtype='U'`` failed to convert the data elements to strings (:issue:`16605`)
- Bug in :class:`DatetimeIndex` where adding or subtracting an array-like of ``DateOffset`` objects either raised (``np.array``, ``pd.Index``) or broadcast incorrectly (``pd.Series``) (:issue:`18224`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use this PR number here (as 18224 is a very general reference). Is there any issue for this one specifically? (don't create one, just if there is an open one).

@jreback jreback merged commit 7818d51 into pandas-dev:master Dec 29, 2017
@jreback
Copy link
Contributor

jreback commented Dec 29, 2017

thanks!

hexgnu pushed a commit to hexgnu/pandas that referenced this pull request Jan 1, 2018
@jbrockmendel jbrockmendel deleted the dti_add_offset_array branch January 5, 2018 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Frequency DateOffsets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants