New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: DatetimeIndex + arraylike of DateOffsets #18849

Merged
merged 13 commits into from Dec 29, 2017

Conversation

Projects
None yet
3 participants
@jbrockmendel
Member

jbrockmendel commented Dec 19, 2017

Before:

>>> dti = pd.date_range('2017-01-01', periods=2)
>>> other = np.array([pd.offsets.MonthEnd(), pd.offsets.Day(n=2)])

>>> dti + other
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('O')

# Same for `dti - other`, `dti + pd.Index(other)`, `dti - pd.Index(other)`

>>> dti + pd.Series(other)
0    DatetimeIndex(['2017-01-31', '2017-01-31'], dt...
1    DatetimeIndex(['2017-01-03', '2017-01-04'], dt...
dtype: object

# yikes.

After:

>>> dti + other
pandas/core/indexes/datetimelike.py:677: PerformanceWarning: Adding/subtracting array of DateOffsets to <class 'pandas.core.indexes.datetimes.DatetimeIndex'> not vectorized
  PerformanceWarning)
DatetimeIndex(['2017-01-31', '2017-01-04'], dtype='datetime64[ns]', freq=None)

>>> dti - pd.Index(other)
DatetimeIndex(['2016-12-31', '2016-12-31'], dtype='datetime64[ns]', freq=None)

>>> dti + pd.Series(other)
0   2017-01-31
1   2017-01-04
dtype: datetime64[ns]

Caveat This will need a follow-up to make sure name attribute is propogated correctly.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@gfyoung

This comment has been minimized.

Member

gfyoung commented Dec 19, 2017

@jbrockmendel : As this PR stands, I'm a little uneasy about it. Besides the name bug that you mention as a "caveat," the fact that you have that PerformanceWarning, while not super problematic, is still a little bothersome.

@pytest.mark.parametrize('box', [np.array, pd.Index])
def test_dti_add_offset_array(self, tz, box):
dti = pd.date_range('2017-01-01', periods=2, tz=tz)
# TODO: check that `name` propogates correctly

This comment has been minimized.

@gfyoung

gfyoung Dec 19, 2017

Member

Reference issue number below all of your new tests.

return self - other[0]
else:
from pandas.errors import PerformanceWarning
warnings.warn("Adding/subtracting array of DateOffsets to "

This comment has been minimized.

@gfyoung

gfyoung Dec 19, 2017

Member

Ah, I see that you added this yourself. Still, feel a little uneasy about this.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Dec 19, 2017

the fact that you have that PerformanceWarning, while not super problematic, is still a little bothersome.

This is explicitly copying the Series behavior.

@gfyoung

This comment has been minimized.

Member

gfyoung commented Dec 19, 2017

This is explicitly copying the Series behavior.

Hmmm...I see. Theoretically, it makes sense. Though if we can avoid adding this warning, that would be great. Thus, either you can confirm that there is indeed a performance difference OR the code is rewritten so that we avoid this overhead.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Dec 20, 2017

Thus, either you can confirm that there is indeed a performance difference OR the code is rewritten so that we avoid this overhead

I've been working on offsets a lot recently (and upcoming, see #18854). Perf has improved quite a bit, but I can absolutely confirm that this is not a speedy operation.

if len(other) == 1:
return self + other[0]
else:
from pandas.errors import PerformanceWarning

This comment has been minimized.

@jreback

jreback Dec 20, 2017

Contributor

so you are adding code here that already exists in datetimes.py:_add_offset ? is there a reason you are not dispatching to that (which may need to handle array-likes in addition to scalars). Further I am not in love with explicit type checking for DTI here. again why are you not handling that at a higher level.

This comment has been minimized.

@jbrockmendel

jbrockmendel Dec 20, 2017

Member

_add_offset is for a single offset which does not have an implementation of apply_index. This is for an array of offsets. It's a case handled by Series but not by DatetimeIndex.

I can make a _add_offset_array and dispatch to that.

@jreback jreback added the Frequency label Dec 20, 2017

jbrockmendel added some commits Dec 20, 2017

@codecov

This comment has been minimized.

codecov bot commented Dec 23, 2017

Codecov Report

Merging #18849 into master will decrease coverage by 0.02%.
The diff coverage is 90.32%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18849      +/-   ##
==========================================
- Coverage    91.6%   91.58%   -0.03%     
==========================================
  Files         150      150              
  Lines       48939    48966      +27     
==========================================
+ Hits        44833    44845      +12     
- Misses       4106     4121      +15
Flag Coverage Δ
#multiple 89.94% <90.32%> (-0.03%) ⬇️
#single 41.72% <19.35%> (+0.55%) ⬆️
Impacted Files Coverage Δ
pandas/core/ops.py 90.24% <100%> (+0.03%) ⬆️
pandas/core/indexes/datetimes.py 95.45% <86.66%> (-0.13%) ⬇️
pandas/core/indexes/datetimelike.py 97.04% <92.3%> (-0.16%) ⬇️
pandas/core/indexes/interval.py 92.61% <0%> (-1.21%) ⬇️
pandas/core/sparse/array.py 91.82% <0%> (-0.47%) ⬇️
pandas/util/testing.py 84.68% <0%> (-0.22%) ⬇️
pandas/core/dtypes/cast.py 88.42% <0%> (-0.18%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dbec3c9...81c4fbf. Read the comment docs.

@jreback

lgtm. just some additional testing requested around series + offset series names.

name=dti.name, freq='infer')
tm.assert_index_equal(res, expected)
def test_dti_with_offset_series(self, tz):

This comment has been minimized.

@jreback

jreback Dec 23, 2017

Contributor

can you parametrize this with name (None, same name, other name) to make sure they are propagated correctly.

This comment has been minimized.

@jbrockmendel

jbrockmendel Dec 24, 2017

Member

Good call. Looks like Series.op(Index) was always taking on the name of the Series.

@@ -282,6 +282,7 @@ Conversion
- Bug in :meth:`Index.astype` with a categorical dtype where the resultant index is not converted to a :class:`CategoricalIndex` for all types of index (:issue:`18630`)
- Bug in :meth:`Series.astype` and ``Categorical.astype()`` where an existing categorical data does not get updated (:issue:`10696`, :issue:`18593`)
- Bug in :class:`Series` constructor with an int or float list where specifying ``dtype=str``, ``dtype='str'`` or ``dtype='U'`` failed to convert the data elements to strings (:issue:`16605`)
- Bug in :class:`DatetimeIndex` where adding or subtracting an array-like of ``DateOffset`` objects either raised (``np.array``, ``pd.Index``) or broadcast incorrectly (``pd.Series``) (:issue:`18224`)

This comment has been minimized.

@jreback

jreback Dec 23, 2017

Contributor

use this PR number here (as 18224 is a very general reference). Is there any issue for this one specifically? (don't create one, just if there is an open one).

@jreback jreback added this to the 0.23.0 milestone Dec 29, 2017

@jreback jreback merged commit 7818d51 into pandas-dev:master Dec 29, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Contributor

jreback commented Dec 29, 2017

thanks!

@jbrockmendel jbrockmendel referenced this pull request Dec 29, 2017

Open

DataFrame vs Series vs Index arithmetic Roundup #18824

37 of 59 tasks complete

hexgnu added a commit to hexgnu/pandas that referenced this pull request Jan 1, 2018

@jbrockmendel jbrockmendel referenced this pull request Jan 5, 2018

Merged

Fix TimedeltaIndex +/- offset array #19095

3 of 4 tasks complete

@jbrockmendel jbrockmendel deleted the jbrockmendel:dti_add_offset_array branch Jan 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment