Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to_timedelta np.int32 casting bug with NumPy 2 #57984

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

spencerkclark
Copy link
Contributor

This PR proposes a fix for #56996. I'd be happy to make adjustments as needed and eventually add a what's new entry to the appropriate file.

# If ts is an integer then the fractional component will always be
# zero. It helps to set this explicitly following changes to type
# promotion behavior in NEP 50 (GH 56996).
frac = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we can do this outright since pandas will still need to support numpy < 2 behavior. Maybe ts should be cast to int instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look @mroeschke—I thought my fix would preserve NumPy < 2 behavior too, but maybe I am missing something subtle. I switched to your suggested approach. Let me know if that looks OK.

# type promotion changes in NEP 50. If this is not done, for example,
# np.int32 values for ts can lead to np.int32 values for frac, which can
# raise unexpected overflow errors downstream (GH 56996).
if isinstance(ts, np.integer):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not just declare frac to be int64 if that is what is required? Generally we are casting too much in this function. I am also surprised that frac - base doesn't follow the C implicit conversion rules, since base is int64 at this point; I feel like simplifying this and using explicit types would help across the board

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frac is mainly needed for when ts is a floating point value, in which case it also needs to be a floating point value, so we cannot declare it to be a 64-bit integer up front.

In thinking about the floating point case, it occurs to me that NEP 50 also brings slight answer changes in that context as well. E.g. with NumPy < 2:

>>> pd.to_timedelta(np.float32(3.2), "D")
Timedelta('3 days 04:48:00.004147200')

and with NumPy >= 2:

>>> pd.to_timedelta(np.float32(3.2), "D")
Timedelta('3 days 04:48:00.005046272')

Again this is due to changes in type promotion rules for frac = ts - base. Previously np.float32(3.2) - 3 would return a 64-bit float, but with NumPy >= 2 it returns a 32-bit float.

Is that something we would also like to address?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what would help this function is giving frac a type, and maybe making another variable if we need branching. Mixing Python objects into mostly "typed" Cython code like this is tough to follow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks and sorry for the delay in getting back. I pushed an update in 6297494—let me know if that is along the lines of what you were thinking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it seems like declaring frac_float64 to be a float64_t type changes answers in the 32-bit build. Do you have any suggestions for how to address that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work if you just use the float data type? Also can you get rid of the isinstance check with the current design?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried and unfortunately it seems like it does not—I think float locks it to always be a float32 value in Cython.

I have not taken the time to fully understand it yet—I'm admittedly a bit confused—but in my testing in a Docker container the only approach that seems to work on 32-bit platforms is the one where the result of ts - base is assigned to a variable with an undeclared type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am somewhat surprised there is no build warnings to hint us at the issue. @lithomas1 is there something else we need to do to make build warnings visible in CI?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: to_timedelta raises unexpected OutOfBoundsTimedelta error with development version of NumPy
3 participants