Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix near-minimum timestamp handling #57314

Merged
merged 11 commits into from Feb 22, 2024

Conversation

robert-schmidtke
Copy link
Contributor

@robert-schmidtke robert-schmidtke commented Feb 9, 2024

Previously, timestamps 1677-09-21 00:12:43.145224193 thru 1677-09-21 00:12:43.145224999 would raise an OverflowError due to the fact that scaleMicrosecondsToNanoseconds overflows as an intermediate step. After adding back the nanoseconds part, the timestamp would be valid again.

@robert-schmidtke
Copy link
Contributor Author

@WillAyd where would I add tests for this? If I'm not mistaken, then the npy_datetimestruct_to_datetime function is never tested directly. Should I use the reproducible example in the original issue or come up with a narrower test?

@simonjayhawkins simonjayhawkins added this to the 2.2.1 milestone Feb 9, 2024
@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version Timestamp pd.Timestamp and associated methods labels Feb 9, 2024
@robert-schmidtke
Copy link
Contributor Author

@WillAyd where would I add tests for this? If I'm not mistaken, then the npy_datetimestruct_to_datetime function is never tested directly. Should I use the reproducible example in the original issue or come up with a narrower test?

I'll fix the failing Windows tests and found in the documentation some guidance on where and how to add the tests. Will mark ready for review once done.

@WillAyd
Copy link
Member

WillAyd commented Feb 10, 2024

Heads up - I won't be able to review in detail until early next week

@robert-schmidtke
Copy link
Contributor Author

robert-schmidtke commented Feb 16, 2024

I have started over with a failing test, see the results on b4c825d.

To resolve, I went with a minimally invasive approach that just checks for negative extreme before scaling to microseconds, and then doing the calculation backwards from the minimum valid timestamp. I didn't find any constant defined for this (NPY_MIN_INT64 + 1), so instead of polluting the global namespace in np_datetime.h, I went with a local variable. This is probably compile-time optimized, so I expect no overhead by doing it this way.

I only fixed the nanosecond case, because I was unable to devise tests for any other unit that would trigger this overflow, because:

  1. any datetime string is parsed by dateutil, which seems to hiccup when trying to parse dates like -290308-12-21T19:59:05.224193 (the minimum valid microsecond timestamp, for example). So coarser resolutions than ns wouldn't work.
  2. for finer resolutions than ns, I was not sure how to execute the code paths, as the pandas timestamps seem to default to ns always, thus exiting at the branch that this PR fixes. Also, np_datetime.pyx says: # AS, FS, PS versions exist but are not imported because they are not used.

@robert-schmidtke robert-schmidtke marked this pull request as ready for review February 16, 2024 14:39
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool this looks pretty good. Thanks for the great work

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @mroeschke

@mroeschke mroeschke merged commit 8b313d3 into pandas-dev:main Feb 22, 2024
47 checks passed
@mroeschke
Copy link
Member

Thanks @robert-schmidtke

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Feb 22, 2024
mroeschke pushed a commit that referenced this pull request Feb 23, 2024
…andling) (#57573)

Backport PR #57314: BUG: Fix near-minimum timestamp handling

Co-authored-by: Robert Schmidtke <robert-schmidtke@users.noreply.github.com>
@robert-schmidtke robert-schmidtke deleted the bugfix/timestamp-overflow branch February 23, 2024 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Functionality that used to work in a prior pandas version Timestamp pd.Timestamp and associated methods
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Localizing Timestamp.min changes dtype to object
4 participants