ENH: overflow-safe astype for datetime64/timedelta64 unit conversion #16352

Open
jbrockmendel opened this issue May 22, 2020 · 4 comments

Comments

@jbrockmendel
Contributor

import numpy as np
from pandas._libs.tslibs.conversion import ensure_datetime64ns
arr = np.array(["2367-12-31 12:00:00"], dtype="datetime64[h]")

>>> arr.astype("datetime64[ns]")
array(['1783-06-11T12:25:26.290448384'], dtype='datetime64[ns]')

>>> ensure_datetime64ns(arr)
[...]
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2367-12-31 12:00:00

Is there a compelling reason numpy doesn't raise here instead of silently wrapping around?
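For illustration, the kind of check being asked for can be sketched in plain numpy. This is only a rough sketch, not the pandas implementation: the name `astype_checked` is hypothetical, and it assumes fixed-length units (weeks and finer), so `Y` and `M` are not handled.

```python
import numpy as np

def astype_checked(arr, unit):
    """Cast a datetime64 array to `unit`, raising instead of wrapping.

    Hypothetical helper; assumes fixed-length units (no "Y" or "M").
    """
    src_unit, src_count = np.datetime_data(arr.dtype)
    # Number of target-unit ticks in one source tick (0 if the target is coarser).
    mult = int(
        np.timedelta64(src_count, src_unit)
        .astype(f"timedelta64[{unit}]")
        .astype("i8")
    )
    if mult > 1:
        # Largest source value whose product with mult still fits in int64.
        limit = np.iinfo(np.int64).max // mult
        ints = arr.view("i8")
        valid = ~np.isnat(arr)  # NaT is stored as the int64 minimum, so skip it
        out_of_bounds = valid & ((ints > limit) | (ints < -limit))
        if out_of_bounds.any():
            first_bad = arr[out_of_bounds][0]
            raise OverflowError(
                f"{first_bad} cannot be represented as datetime64[{unit}]"
            )
    return arr.astype(f"datetime64[{unit}]")
```

With the array from the report, `astype_checked(arr, "ns")` raises `OverflowError` rather than returning a date in 1783, while in-range values and NaT are cast as usual.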

@seberg
Member

seberg commented May 22, 2020

No, we should raise. It is a bit clumsy to raise errors from within the casting machinery currently; I think that is pretty much the only reason...

EDIT: There could be a case for just returning NaT, although that is not really better IMO.

@jbrockmendel
Contributor Author

> It is a bit clumsy to raise errors from within the casting machinery currently

No idea how to be helpful on that front, but here are links to the relevant pandas code that may be upstream-able:

https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslibs/conversion.pyx#L155
https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslibs/np_datetime.pyx#L95
https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslibs/src/datetime/np_datetime.c#L250

@jbrockmendel
Contributor Author

In addition to astype, it'd be nice if addition/subtraction were overflow-safe. Even with np.seterr(over="raise"), the sum below silently wraps around:

np.seterr(over="raise")

arr = np.arange(10)*10**18
arr2 = arr.view("datetime64[ns]")
arr3 = arr.view("timedelta64[ns]")

>>> arr2 + arr3
array(['1970-01-01T00:00:00.000000000', '2033-05-18T03:33:20.000000000',
       '2096-10-02T07:06:40.000000000', '2160-02-18T10:40:00.000000000',
       '2223-07-06T14:13:20.000000000', '1702-05-02T18:12:06.290448384',
       '1765-09-16T21:45:26.290448384', '1829-02-02T01:18:46.290448384',
       '1892-06-19T04:52:06.290448384', '1955-11-05T08:25:26.290448384'],
      dtype='datetime64[ns]')
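One way to get the overflow-safe behaviour for addition is to do the sum on the int64 views and test for signed overflow by hand. The sketch below is an assumption about how this could look, not numpy or pandas code; `checked_add` is a hypothetical name, and it requires both operands to already share the same unit.

```python
import numpy as np

NAT_INT = np.iinfo(np.int64).min  # int64 value numpy reserves for NaT

def checked_add(dt, td):
    """Add timedelta64 `td` to datetime64 `dt`, raising on int64 overflow.

    Hypothetical helper; assumes both arrays have the same unit.
    """
    if np.datetime_data(dt.dtype) != np.datetime_data(td.dtype):
        raise ValueError("operands must share the same unit")
    a = dt.view("i8")
    b = td.view("i8")
    nat = np.isnat(dt) | np.isnat(td)
    total = a + b  # int64 array addition wraps silently
    # Signed overflow happened where the result's sign differs from both inputs.
    overflowed = ((a ^ total) & (b ^ total)) < 0
    # A result that lands exactly on the NaT sentinel is not representable either.
    overflowed |= total == NAT_INT
    if (overflowed & ~nat).any():
        raise OverflowError("datetime64 + timedelta64 overflows int64")
    total[nat] = NAT_INT  # NaT in either operand propagates to the result
    return total.view(dt.dtype)

arr = np.arange(10, dtype="i8") * 10**18
arr2 = arr.view("datetime64[ns]")
arr3 = arr.view("timedelta64[ns]")
```

Here `checked_add(arr2[:5], arr3[:5])` returns the same values as `arr2[:5] + arr3[:5]`, while `checked_add(arr2, arr3)` raises `OverflowError` because the last five sums exceed the int64 range.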

@jbrockmendel
Contributor Author

Just implemented a version of this in pandas-dev/pandas#46478 that I'm hoping we can upstream to numpy somehow.
