Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Period overflows with dates > pd.Timestamp.max 2262-04-11 #28104

Open
Tracked by #46587
paul-lilley opened this issue Aug 22, 2019 · 13 comments
Open
Tracked by #46587

Period overflows with dates > pd.Timestamp.max 2262-04-11 #28104

paul-lilley opened this issue Aug 22, 2019 · 13 comments
Labels
Bug Datetime Datetime data dtype Non-Nano datetime64/timedelta64 with non-nanosecond resolution Period Period data type

Comments

@paul-lilley
Copy link
Contributor

paul-lilley commented Aug 22, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
g = pd.Period(year=2262, month=4, day=11, freq='D')
print(g.start_time)
g2 = pd.Period(year=2262, month=4, day=12, freq='D')
print(g2.start_time)

Output: (with pandas 0.25.0)
2262-04-11 00:00:00
1677-09-21 00:25:26.290448384

with pandas 0.25.0+216.g1aca08aa6
I get an OutOfBoundsDatetime error

print(g2.start_time)
Traceback (most recent call last):
File "", line 1, in
File "pandas/_libs/tslibs/period.pyx", line 1768, in pandas._libs.tslibs.period._Period.start_time.get
File "pandas/_libs/tslibs/period.pyx", line 1811, in pandas._libs.tslibs.period._Period.to_timestamp
File "pandas/_libs/tslibs/period.pyx", line 1190, in pandas._libs.tslibs.period.period_ordinal_to_dt64
File "pandas/_libs/tslibs/np_datetime.pyx", line 118, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2262-04-12 00:00:00

Problem description

Overflow means that the Period is incorrect, and also hinders use of Period for date/times outside the limits of pd.Timestamp

Expected Output

Output:
2262-04-11 00:00:00
2262-04-12 00:00:00

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit : None python : 3.6.6.final.0 python-bits : 64 OS : Windows OS-release : 7 machine : AMD64 processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : None.None

pandas : 0.25.0
numpy : 1.16.3
pytz : 2018.9
dateutil : 2.7.5
pip : 19.1.1
setuptools : 40.8.0
Cython : None
pytest : 4.2.0
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.3.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.3
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : 1.3.1
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.1.2

@jbrockmendel
Copy link
Member

pd.Timestamp is only implemented within the bounds Timestamp('1677-09-21 00:12:43.145225') to Timestamp('2262-04-11 23:47:16.854775807'). Outside of that range we could have start_time and end_time return datetime objects I guess

@WillAyd WillAyd added the Datetime Datetime data dtype label Aug 25, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone Aug 25, 2019
@paul-lilley
Copy link
Contributor Author

This looks related to https://github.com/pandas-dev/pandas/pull/27916/ but outside my comfort zone to try altering to return python datetime objects for start_time and end_time when the Period is outside the range of pd.Timestamp

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Aug 27, 2019 via email

@paul-lilley
Copy link
Contributor Author

I can't really comment on the advantages of returning a Timestamp, but as Period is the recommended means to handle datetimes outside pd.Timestamp (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#representing-out-of-bounds-spans) it makes more sense (at least to me) that Period methods don't raise exceptions with datetimes outside pd.Timestamp.
My limited use case is trying to resolve https://github.com/pandas-dev/pandas/issues/20927, which currently I've done using datetime objects

@TomAugspurger
Copy link
Contributor

but as Period is the recommended means to handle datetimes outside pd.Timestamp

That's mostly in the context of an array of values inside a Series / DataFrame. When we're returning scalar, there's no need to return a Period (which has different semantics to a datetime).

@paul-lilley
Copy link
Contributor Author

Good point.
I agree that it makes sense to deprecate Period.start_time and Period.end_time
in favor of a property that always returns a datetime

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Aug 27, 2019 via email

@paul-lilley
Copy link
Contributor Author

Sounds good, Period.start_datetime and Period.end_datetime

@jbrockmendel
Copy link
Member

this will still run up against the datetime implementation bounds. do we care about those?

@paul-lilley
Copy link
Contributor Author

Fair point, though at least the datetime.MINYEAR to datetime.MAXYEAR range covers the typical 'max-date' values (9999-12-31 in DB2 and Teradata, 31DEC9999 in SAS) that I've encountered in datawarehousing tables used to implement type 2 slowly changing dimensions with start_date/end_date. Often the currently valid row is marked by end_date = some (arbitrarily high) 'max-date' to avoid nulls in the column.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Aug 30, 2019 via email

@MinKimIP
Copy link

MinKimIP commented Dec 1, 2019

Hi guys, just to add:
when reading SQL results from cx_oracle to pandas, rather than throwing an out of range error, the returned pandas dataframe converts datetimes from
9999-12-31 to a particular timestamp in the year 1822, and
2999-12-31 to a particular timestamp in the year 1830.
Basically restarting from 1677 for dates beyond 2262.
Shouldn't pandas ideally throw an out of bounds error or return NaT for these dates?

@jbrockmendel
Copy link
Member

now that we have non-nano Timestamp support we could handle this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Non-Nano datetime64/timedelta64 with non-nanosecond resolution Period Period data type
Projects
None yet
Development

No branches or pull requests

6 participants