BUG: silent overflow in DateTimeArray subtraction #22508

Merged: jreback merged 5 commits into pandas-dev:master on Aug 31, 2018

Conversation

shengpu-tang
Contributor

shengpu-tang commented Aug 26, 2018

@shengpu-tang
Contributor Author

Hi @jbrockmendel , thanks for the follow-up.

  • With the suggested change, it now throws OverflowError: Overflow in int64 addition. Is this desired behavior?
  • Where should I add the tests? Maybe under tests/arrays/test_datetimelike.py?
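
For context on the error message in the first bullet, here is a minimal pure-Python sketch of the difference between a silent wrap-around and a checked int64 addition. The helper names `wrapping_add_int64` and `checked_add_int64` are hypothetical and are not pandas functions; the check pandas actually performs may differ in detail.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def wrapping_add_int64(a: int, b: int) -> int:
    """Model two's-complement int64 addition: out-of-range results wrap silently."""
    return (a + b - INT64_MIN) % 2**64 + INT64_MIN


def checked_add_int64(a: int, b: int) -> int:
    """Add two values and raise instead of wrapping (hypothetical helper)."""
    result = a + b  # Python ints are unbounded, so this sum is exact
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("Overflow in int64 addition")
    return result


# Silent behaviour: the result wraps to the most negative value.
print(wrapping_add_int64(INT64_MAX, 1) == INT64_MIN)  # True

# Checked behaviour: the same operands raise.
try:
    checked_add_int64(INT64_MAX, 1)
except OverflowError as exc:
    print(exc)  # Overflow in int64 addition
```

Subtraction can be modelled the same way by negating the second operand before the check.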

@codecov

codecov bot commented Aug 26, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@fa47b8d). Click here to learn what that means.
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #22508   +/-   ##
=========================================
  Coverage          ?   92.03%           
=========================================
  Files             ?      169           
  Lines             ?    50780           
  Branches          ?        0           
=========================================
  Hits              ?    46737           
  Misses            ?     4043           
  Partials          ?        0
Flag         Coverage Δ
#multiple    90.44% <100%> (?)
#single      42.25% <100%> (?)

Impacted Files                     Coverage Δ
pandas/core/arrays/datetimes.py    95.52% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fa47b8d...cfaf42b.

gfyoung added the Bug and Timedelta (Timedelta data type) labels Aug 26, 2018
@gfyoung
Member

gfyoung commented Aug 26, 2018

@shengpu1126 : You should make the changes so that it gets the expected output that you reported in the original issue. And yes, that does seem like a good place to put the test. Don't forget a whatsnew entry too.

@jbrockmendel
Member

@gfyoung it isn’t that easy. If you look at the scalar op in the original example, it returns a stdlib timedelta, not a pd.Timedelta.

@shengpu1126 I think raising OverflowError is correct; I will double-check.
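
To illustrate why a stdlib timedelta can hold a difference that an int64 nanosecond count cannot, here is a small standard-library sketch. It does not call pandas; the dates and the helper `timedelta_to_ns` are assumptions chosen for illustration, not values from the original issue.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1


def timedelta_to_ns(td: timedelta) -> int:
    """Exact nanosecond count of a stdlib timedelta, as an unbounded Python int."""
    total_us = (td.days * 86_400 + td.seconds) * 1_000_000 + td.microseconds
    return total_us * 1_000


# Both endpoints lie inside the datetime64[ns] range, but they are 550 years apart.
span = datetime(2250, 1, 1) - datetime(1700, 1, 1)

print(type(span).__name__)                # timedelta
print(timedelta_to_ns(span) > INT64_MAX)  # True: too large for int64 nanoseconds
```

The stdlib type stores days, seconds, and microseconds separately, so it has no trouble with this span, while a nanosecond-resolution int64 does.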

@gfyoung
Member

gfyoung commented Aug 26, 2018

> it isn’t that easy. If you look at the scalar op in the original example, it returns a stdlib timedelta, not a pd.Timedelta.

@jbrockmendel : Ah, I see. My response was motivated by the fact that you didn't comment on @shengpu1126's expected output in the original issue, which I took as tacit agreement on the expected behavior for patching this bug.

@jbrockmendel
Member

Test will go in tests.arithmetic.test_datetime64

@pep8speaks

pep8speaks commented Aug 28, 2018

Hello @shengpu1126! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 30, 2018 at 15:56 Hours UTC

@@ -582,6 +582,7 @@ Datetimelike
- Bug in :class:`DataFrame` comparisons against ``Timestamp``-like objects failing to raise ``TypeError`` for inequality checks with mismatched types (:issue:`8932`,:issue:`22163`)
- Bug in :class:`DataFrame` with mixed dtypes including ``datetime64[ns]`` incorrectly raising ``TypeError`` on equality comparisons (:issue:`13128`,:issue:`22163`)
- Bug in :meth:`DataFrame.eq` comparison against ``NaT`` incorrectly returning ``True`` or ``NaN`` (:issue:`15697`,:issue:`22163`)
- Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise ``OverflowError`` (:issue:`22492`, :issue:`22508`)
Contributor

Can you add a little more here, e.g. when this fails to raise?

@@ -1570,6 +1570,27 @@ def test_datetimeindex_sub_timestamp_overflow(self):
            with pytest.raises(OverflowError):
                dtimin - variant

    def test_datetimeindex_sub_datetimeindex_overflow(self):
        dtimax = pd.to_datetime(['now', pd.Timestamp.max])
Contributor

can you add a comment with the issue number

jreback added this to the 0.24.0 milestone Aug 29, 2018
Contributor

jreback left a comment

Minor comments. Ping on green (or at least mostly green: the 3.6/3.5 builds are currently failing due to an unrelated error), so you might want to wait a little before rebasing.

@@ -582,6 +582,7 @@ Datetimelike
- Bug in :class:`DataFrame` comparisons against ``Timestamp``-like objects failing to raise ``TypeError`` for inequality checks with mismatched types (:issue:`8932`,:issue:`22163`)
- Bug in :class:`DataFrame` with mixed dtypes including ``datetime64[ns]`` incorrectly raising ``TypeError`` on equality comparisons (:issue:`13128`,:issue:`22163`)
- Bug in :meth:`DataFrame.eq` comparison against ``NaT`` incorrectly returning ``True`` or ``NaN`` (:issue:`15697`,:issue:`22163`)
- Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise ``OverflowError`` when the difference between :class:`Timestamp` objects (a :class:`Timedelta` object) exceeds the ``int64`` limit (:issue:`22492`, :issue:`22508`)
Contributor

This is now confusing. Let's go back to the prior one.

Contributor Author

Sorry about that!
For the record, it was mentioned here that pd.Timedelta values are represented as nanoseconds using 64-bit integers, and I believe that's the source of the potential overflow.
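
As a rough check of that representation, the sketch below derives the limits implied by a signed 64-bit nanosecond count using only the standard library. It assumes the count is measured from the Unix epoch for timestamps; the variable names are illustrative.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1
NS_PER_DAY = 86_400 * 10**9

# Largest whole number of days that fits in an int64 nanosecond count.
max_whole_days = INT64_MAX // NS_PER_DAY
print(max_whole_days)  # 106751 (roughly 292 years)

# Approximate timestamp bounds, truncated to the stdlib's microsecond resolution.
epoch = datetime(1970, 1, 1)
latest = epoch + timedelta(microseconds=INT64_MAX // 1_000)
earliest = epoch - timedelta(microseconds=INT64_MAX // 1_000)
print(latest.date())    # 2262-04-11
print(earliest.date())  # 1677-09-21
```

Any two timestamps more than about 292 years apart therefore have a difference that cannot be stored as an int64 nanosecond count, even though each timestamp is valid on its own.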

            dtimax - ts_neg

        expected = pd.Timestamp.max.value - ts_pos[1].value
        res = dtimax - ts_pos
Contributor

Use result (rather than res) in all occurrences.

        assert res[1].value == expected

        with pytest.raises(OverflowError):
            dtimin - ts_pos
Contributor

Can you add a comment or two to delineate the various cases?
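
The cases in the test excerpts above can be modelled in plain Python to see which subtractions fit in int64 and which do not. This is a sketch, not the pandas test: it assumes timestamps are int64 nanoseconds since the Unix epoch and that the lowest int64 value is reserved for NaT, and the helpers `to_ns` and `sub_fits_int64` are hypothetical. The 1950 and 1980 dates mirror the roles of `ts_neg` and `ts_pos` (values before and after the epoch) and are assumed for illustration.

```python
from datetime import datetime

INT64_MAX = 2**63 - 1
INT64_MIN = -2**63
EPOCH = datetime(1970, 1, 1)


def to_ns(dt: datetime) -> int:
    """Nanoseconds since the Unix epoch, as an unbounded Python int."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000


def sub_fits_int64(a: int, b: int) -> bool:
    """True if a - b is representable as an int64 value."""
    return INT64_MIN <= a - b <= INT64_MAX


ts_max = INT64_MAX      # latest representable timestamp
ts_min = INT64_MIN + 1  # assumption: INT64_MIN itself is reserved for NaT
ts_neg = to_ns(datetime(1950, 1, 1))  # before the epoch, so negative
ts_pos = to_ns(datetime(1980, 1, 1))  # after the epoch, so positive

cases = {
    "max - pos": (ts_max, ts_pos),
    "max - neg": (ts_max, ts_neg),
    "min - pos": (ts_min, ts_pos),
    "min - neg": (ts_min, ts_neg),
}
for label, (a, b) in cases.items():
    print(label, "fits" if sub_fits_int64(a, b) else "overflows")
```

Under these assumptions, subtracting a pre-epoch value from the maximum, or a post-epoch value from the minimum, pushes the result out of range, which matches the two cases the excerpts wrap in `pytest.raises(OverflowError)`.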

jreback merged commit 8c128a3 into pandas-dev:master Aug 31, 2018
@jreback
Contributor

jreback commented Aug 31, 2018

thanks!

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Labels
Bug Timedelta Timedelta data type
Development

Successfully merging this pull request may close these issues.

Incorrect timedelta computation of pd.Series datetime64[ns] if timedelta is too large
5 participants