Fix incorrect exception raised by Series[datetime64] + int #19147

Merged
jreback merged 17 commits into pandas-dev:master from jbrockmendel:nullfreq on Jan 17, 2018

2 participants
@jbrockmendel
Member

jbrockmendel commented Jan 9, 2018

Series[datetime64] +/- int is currently raising a ValueError when it should be raising a TypeError. This was introduced in #19024.

In the process of fixing this bug, this also goes most of the way towards fixing #19123 by raising on add/sub with integer arrays.

Setup:

dti = pd.date_range('2016-01-01', periods=2)
ser = pd.Series(dti)

ATM:

>>> ser + 1
[...]
ValueError: Cannot shift with no freq

After this PR (and also in 0.22.0 and anything before #19024):

>>> ser + 1
[...]
TypeError: incompatible type for a datetime/timedelta operation [__add__]
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@@ -703,6 +702,23 @@ def wrapper(left, right, name=name, na_op=na_op):
    return wrapper

def _dispatch_to_index_op(op, left, right, index_class):

@jbrockmendel

jbrockmendel Jan 9, 2018

Member

We're going to end up using this again for TimedeltaIndex a few bugfixes from now.

@jreback

jreback Jan 9, 2018

Contributor

you can de-privatize. rename to 'dispatch_to_index_operation'. needs a doc-string.

@@ -692,6 +693,9 @@ def __add__(self, other):
            return self._add_datelike(other)
        elif isinstance(other, Index):
            return self._add_datelike(other)
        elif is_integer_dtype(other) and self.freq is None:
            # GH#19123
            raise NullFrequencyError("Cannot shift with no freq")

@jreback

jreback Jan 9, 2018

Contributor

just make this a ValueError, we don't want to have custom error messages normally.

@jbrockmendel

jbrockmendel Jan 9, 2018

Member

We need to catch this specifically in core.ops. Want to catch by checking the error message?

@jbrockmendel

jbrockmendel Jan 9, 2018

Member

> don't want to have custom error messages normally

The error message here "Cannot shift with no freq" is the same as the error message raised if adding just an integer when self.freq is None.
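
To illustrate the point about catching this case specifically, here is a minimal sketch in plain Python. The `shift` helper and `failed_for_missing_freq` are hypothetical names invented for this example, and the `NullFrequencyError` below is a toy stand-in, not the pandas class. It shows how a dedicated subclass of `ValueError` can be matched by type rather than by comparing message strings.

```python
class NullFrequencyError(ValueError):
    """Toy stand-in for the exception discussed above (not pandas' class)."""


def shift(values, periods, freq=None):
    """Hypothetical helper: shift integer timestamps by periods * freq."""
    if freq is None:
        raise NullFrequencyError("Cannot shift with no freq")
    if not isinstance(periods, int):
        raise ValueError("periods must be an integer")
    return [v + periods * freq for v in values]


def failed_for_missing_freq(values, periods, freq=None):
    """Return True only when the failure is specifically the missing-freq case."""
    try:
        shift(values, periods, freq)
    except NullFrequencyError:
        # Matched by type, so rewording the message would not break this check.
        return True
    except ValueError:
        # Any other ValueError is a different problem.
        return False
    return False
```

Because the subclass is listed first, the generic `except ValueError` branch only sees errors that are not the missing-frequency case.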

    checking and timezone handling.
    """
    left_idx = index_class(left)
    left_idx.freq = None  # avoid accidentally allowing integer add/sub

@jreback

jreback Jan 9, 2018

Contributor

this is a mutating operation, don't do it

@jbrockmendel

jbrockmendel Jan 9, 2018

Member

This is necessary. If I change this to an assert left_idx.freq is None that assertion fails in a number of tests:

dti = pd.date_range('1999-09-30', periods=10, tz='US/Pacific')
ser = pd.Series(dti)
>>> pd.DatetimeIndex(ser).freq
<Day>

It looks like this only occurs for tzaware cases.

@jbrockmendel

jbrockmendel Jan 9, 2018

Member

> It looks like this only occurs for tzaware cases.

Possibly related to caching of DatetimeIndexes and the fact that Series[datetime64tz]._data.blocks[0].values is itself a DatetimeIndex?

@@ -1226,7 +1226,8 @@ def _get_roll(self, i, before_day_of_month, after_day_of_month):
         return roll
     def _apply_index_days(self, i, roll):
-        i += (roll % 2) * Timedelta(days=self.day_of_month).value
+        nanos = (roll % 2) * Timedelta(days=self.day_of_month).value

@jreback

jreback Jan 9, 2018

Contributor

leftover from another PR (and I had commented on that), remove from here.

@jbrockmendel

jbrockmendel Jan 9, 2018

Member

You’re right that this is duplicated, but it is correct in both places, and this PR breaks without it.

@jreback

jreback Jan 11, 2018

Contributor

see other PR, this needs a doc-string

@jbrockmendel jbrockmendel changed the title from Fix incorrect exception raises by Series[datetime64] + int to Fix incorrect exception raised by Series[datetime64] + int Jan 9, 2018

@codecov

codecov bot commented Jan 9, 2018

Codecov Report

Merging #19147 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #19147      +/-   ##
==========================================
- Coverage   91.56%   91.53%   -0.03%     
==========================================
  Files         148      148              
  Lines       48856    48870      +14     
==========================================
+ Hits        44733    44735       +2     
- Misses       4123     4135      +12
| Flag | Coverage Δ |
|---|---|
| #multiple | 89.91% <100%> (-0.03%) ⬇️ |
| #single | 41.69% <55.55%> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/ops.py | 92.17% <100%> (+0.1%) ⬆️ |
| pandas/core/indexes/datetimelike.py | 97.11% <100%> (+0.03%) ⬆️ |
| pandas/errors/__init__.py | 100% <100%> (ø) ⬆️ |
| pandas/plotting/_converter.py | 65.22% <0%> (-1.74%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c271d4d...2231505.

@@ -680,6 +680,25 @@ def test_timedelta_series_ops(self):
        assert_series_equal(ts - s, expected2)
        assert_series_equal(ts + (-s), expected2)

    def test_td64series_add_intlike(self):
        # GH#19123

@jreback

jreback Jan 11, 2018

Contributor

don't conjoin words

test_td64_series_and_integer_like

@@ -1428,6 +1441,25 @@ def test_dt64series_arith_overflow(self):
        res = dt - ser
        tm.assert_series_equal(res, -expected)

    def test_dt64series_add_intlike(self):

@jreback

jreback Jan 11, 2018

Contributor

same

    left_idx.freq = None
    try:
        result = op(left_idx, right)
    except NullFrequencyError:

@jreback

jreback Jan 11, 2018

Contributor

ok this is fine then. Please update tests where this should be caught (e.g. .shift)
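
The try/except in the hunk above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: `ToyIndex`, `dispatch_to_index_op_sketch`, and the toy `NullFrequencyError` are hypothetical names made up for the example, and the `except` body (re-raising as `TypeError`) is inferred from the PR description rather than copied from the diff.

```python
import operator


class NullFrequencyError(ValueError):
    """Toy stand-in for the exception class in the diff above."""


class ToyIndex:
    """Hypothetical index of integer timestamps with an optional freq."""

    def __init__(self, values, freq=None):
        self.values = list(values)
        self.freq = freq

    def __add__(self, other):
        if isinstance(other, int):
            if self.freq is None:
                raise NullFrequencyError("Cannot shift with no freq")
            shifted = [v + other * self.freq for v in self.values]
            return ToyIndex(shifted, self.freq)
        return NotImplemented


def dispatch_to_index_op_sketch(op, left, right, index_class):
    """Run op on an index built from left; report a missing freq as TypeError."""
    left_idx = index_class(left)
    try:
        return op(left_idx, right)
    except NullFrequencyError:
        # The index-level error is an implementation detail for the caller,
        # so translate it into the exception type the caller expects.
        raise TypeError(
            "incompatible type for a datetime/timedelta operation "
            "[{name}]".format(name=op.__name__)
        )
```

Calling `dispatch_to_index_op_sketch(operator.add, [1, 2], 1, ToyIndex)` raises `TypeError` in this sketch, because `ToyIndex` is built without a frequency.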

    # avoid accidentally allowing integer add/sub. For datetime64[tz] dtypes,
    # left_idx may inherit a freq from a cached DatetimeIndex.
    # See discussion in GH#19147.
    left_idx.freq = None

@jreback

jreback Jan 11, 2018

Contributor

actually I don't like this mutation here. why is it necessary? (e.g. if one of the end-points is None, e.g. an integer series this will still raise, yes?)

@jbrockmendel

jbrockmendel Jan 11, 2018

Member

Because if left is tzaware then pd.DatetimeIndex(left, freq=None) may come back with non-None freq.

dti = pd.date_range('2016-01-01', periods=2, tz='US/Pacific')
ser = pd.Series(dti)

>>> pd.DatetimeIndex(ser, freq=None).freq
<Day>

@jreback

jreback Jan 11, 2018

Contributor

and why is this a problem?

@jbrockmendel

jbrockmendel Jan 11, 2018

Member

Because in this scenario, without the left_idx.freq = None line, ser + 1 fails to raise.

@jbrockmendel

jbrockmendel Jan 11, 2018

Member

Just parametrized test_dt64_series_add_intlike to make sure the tzaware case is covered.

@jreback

jreback Jan 13, 2018

Contributor

so you didn't answer my question here. I don't want mutation.

@jbrockmendel

jbrockmendel Jan 13, 2018

Member

I really, really did. Without setting the frequency to none, integer addition fails to raise.

@jreback

jreback Jan 14, 2018

Contributor

then we need a better way to detect this, we cannot mutate things to just see if they raise
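
One way to read the objection to mutation is sketched below in plain Python. This is only an assumption about what a non-mutating alternative could look like, not what the thread settled on: `ToyIndex` and `without_freq` are hypothetical names, and the frozen dataclass stands in for an index object that may be shared or cached.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyIndex:
    """Hypothetical immutable index; frozen so attribute assignment raises."""

    values: Tuple[int, ...]
    freq: Optional[int] = None


def without_freq(idx):
    """Return idx itself if it has no freq, else a copy with freq=None."""
    if idx.freq is None:
        return idx
    # replace() builds a new object, so a cached original is left untouched.
    return replace(idx, freq=None)


cached = ToyIndex(values=(1, 2, 3), freq=1)
stripped = without_freq(cached)
```

Here `cached` keeps its frequency while `stripped` carries the same values with `freq=None`, so later operations on the copy cannot affect other users of the cached object.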

@jbrockmendel

Member

jbrockmendel commented Jan 12, 2018

Re-push or let sit?

@jreback

Contributor

jreback commented Jan 12, 2018

needs a rebase

jbrockmendel added some commits Jan 12, 2018

@@ -593,6 +594,12 @@ def test_nat_new(self):
        exp = np.array([tslib.iNaT] * 5, dtype=np.int64)
        tm.assert_numpy_array_equal(result, exp)

@jreback

jreback Jan 13, 2018

Contributor

there are very likely tests that are raising ValueError that should now be NullFrequencyError, pls change them.

You can find them very easily by temporarily changing the class that NullFrequencyError inherits from to something else (e.g. KeyError) and then seeing what raises.
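
A small plain-Python sketch of why that trick works is given below. The helper names `make_error` and `legacy_check_passes` are hypothetical and exist only for this illustration. A test that only asserts "raises ValueError" keeps passing while the new exception derives from `ValueError`, and starts failing once the base class is temporarily swapped.

```python
def make_error(base):
    """Build a toy NullFrequencyError class deriving from the given base."""
    return type("NullFrequencyError", (base,), {})


def legacy_check_passes(exc_class):
    """Mimic a test that only asserts that a ValueError is raised."""
    try:
        raise exc_class("Cannot shift with no freq")
    except ValueError:
        return True
    except Exception:
        # The error no longer looks like a ValueError, so the old check fails
        # and reveals a test that depends on this code path.
        return False
```

With `make_error(ValueError)` the legacy check passes; with `make_error(KeyError)` it fails, which is what points to the tests that should be updated.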

@jbrockmendel

jbrockmendel Jan 14, 2018

Member

Found and changed one.

@jreback

jreback Jan 14, 2018

Contributor

ok great

class NullFrequencyError(ValueError):
    """

@jreback

jreback Jan 14, 2018

Contributor

this needs a line in the api changes section

@jreback jreback added this to the 0.23.0 milestone Jan 16, 2018

@jreback

Contributor

jreback commented Jan 16, 2018

needs a rebase, otherwise lgtm. ping on green.

@@ -273,6 +273,7 @@ Other API Changes
- The default ``Timedelta`` constructor now accepts an ``ISO 8601 Duration`` string as an argument (:issue:`19040`)
- ``IntervalDtype`` now returns ``True`` when compared against ``'interval'`` regardless of subtype, and ``IntervalDtype.name`` now returns ``'interval'`` regardless of subtype (:issue:`18980`)
- :func:`Series.to_csv` now accepts a ``compression`` argument that works in the same way as the ``compression`` argument in :func:`DataFrame.to_csv` (:issue:`18958`)
- :func:`DatetimeIndex.shift` and :func:`TimedeltaIndex.shift` will now raise ``NullFrequencyError`` (which subclasses ``ValueError``) when the index object frequency is ``None`` (:issue:`19147`)

@jreback

jreback Jan 16, 2018

Contributor

say something like: which subclasses ValueError, the currently raised exception

@jbrockmendel

Member

jbrockmendel commented Jan 16, 2018

Ping

@jbrockmendel jbrockmendel referenced this pull request Jan 16, 2018

Merged

CLN: Remove unused core.internals methods #19250

2 of 4 tasks complete
@jbrockmendel

Member

jbrockmendel commented Jan 16, 2018

One more bugfix is ready to go once this goes in, plus another PR getting rid of _Op and _TimeOp.

@jreback jreback merged commit 3db6f66 into pandas-dev:master Jan 17, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
@jreback

Contributor

jreback commented Jan 17, 2018

thanks!

@jbrockmendel jbrockmendel deleted the jbrockmendel:nullfreq branch Feb 11, 2018
