BUG: Out of bounds Timestamp does not raise exception #19382

Closed
cbertinato opened this Issue Jan 24, 2018 · 5 comments

cbertinato commented Jan 24, 2018

An OutOfBoundsDatetime exception is raised if a datetime below the minimum datetime is specified, both as an integer number of nanoseconds:

>>> Timestamp(-9223372036854775000)
Timestamp('1677-09-21 00:12:43.145225')
>>> Timestamp(-9223372036854775001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timestamps.pyx", line 581, in pandas._libs.tslibs.timestamps.Timestamp.__new__
    ts = convert_to_tsobject(ts_input, tz, unit, 0, 0)
  File "pandas/_libs/tslibs/conversion.pyx", line 312, in pandas._libs.tslibs.conversion.convert_to_tsobject
    check_dts_bounds(&obj.dts)
  File "pandas/_libs/tslibs/np_datetime.pyx", line 121, in pandas._libs.tslibs.np_datetime.check_dts_bounds
    raise OutOfBoundsDatetime(
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1677-09-21 00:12:43

and as a string:

>>> Timestamp('1677-09-21 00:12:43.145225')
Timestamp('1677-09-21 00:12:43.145225')
>>> Timestamp('1677-09-21 00:12:43.145224')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timestamps.pyx", line 581, in pandas._libs.tslibs.timestamps.Timestamp.__new__
    ts = convert_to_tsobject(ts_input, tz, unit, 0, 0)
  File "pandas/_libs/tslibs/conversion.pyx", line 274, in pandas._libs.tslibs.conversion.convert_to_tsobject
    return convert_str_to_tsobject(ts, tz, unit, dayfirst, yearfirst)
  File "pandas/_libs/tslibs/conversion.pyx", line 482, in pandas._libs.tslibs.conversion.convert_str_to_tsobject
    return convert_to_tsobject(ts, tz, unit, dayfirst, yearfirst)
  File "pandas/_libs/tslibs/conversion.pyx", line 299, in pandas._libs.tslibs.conversion.convert_to_tsobject
    return convert_datetime_to_tsobject(ts, tz)
  File "pandas/_libs/tslibs/conversion.pyx", line 392, in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject
    check_dts_bounds(&obj.dts)
  File "pandas/_libs/tslibs/np_datetime.pyx", line 121, in pandas._libs.tslibs.np_datetime.check_dts_bounds
    raise OutOfBoundsDatetime(
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1677-09-21 00:12:43

An OverflowError is raised instead when going beyond the maximum datetime while instantiating with an integer number of nanoseconds:

>>> Timestamp(9223372036854775807)
Timestamp('2262-04-11 23:47:16.854775807')
>>> Timestamp(9223372036854775808)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timestamps.pyx", line 581, in pandas._libs.tslibs.timestamps.Timestamp.__new__
    ts = convert_to_tsobject(ts_input, tz, unit, 0, 0)
  File "pandas/_libs/tslibs/conversion.pyx", line 289, in pandas._libs.tslibs.conversion.convert_to_tsobject
    obj.value = ts
OverflowError: Python int too large to convert to C long

but an incorrect Timestamp is returned if an out-of-bounds datetime string is specified:

>>> Timestamp('2262-04-11 23:47:16.854775807')
Timestamp('2262-04-11 23:47:16.854775807')
>>> Timestamp('2262-04-11 23:47:16.854775808')
Timestamp('2262-04-11 23:47:16.854775')

Verified that this is the case for DatetimeIndex as well.

Expected Output

OutOfBoundsDatetime exception is raised

Output of pd.show_versions()

commit: b38dc41

python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+169.gb38dc4105
pytest: 3.3.1
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.7.1
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.6.5
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.15
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None

cbertinato commented Jan 25, 2018

Has anybody seen an instance where a test will not raise an exception, but the same line in the Python interpreter does?

chris-b1 commented Jan 25, 2018

There's a reason (arguable whether it's a good one) for the difference between the OutOfBoundsDatetime and the OverflowError: some portion of the negative int64 space doesn't map to valid times, while in the positive space we use every available int64 value, so a value that is one too big can't even be cast to an int64. (Mostly trivia.)

Your upper string example definitely looks buggy - PR would be welcome!

>>> Timestamp('2262-04-11 23:47:16.854775808')
Timestamp('2262-04-11 23:47:16.854775')
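
For anyone following along, here is a minimal standard-library sketch of the int64 arithmetic behind the two error types. It is not pandas code: the names `ns_to_parts` and `fits_in_int64` are made up for illustration, and the only pandas-specific fact assumed is that the minimum int64 value is reserved as the NaT sentinel.

```python
from datetime import datetime, timedelta

INT64_MIN = -2**63          # reserved by pandas as the NaT sentinel
INT64_MAX = 2**63 - 1       # largest representable nanosecond count
EPOCH = datetime(1970, 1, 1)


def ns_to_parts(ns):
    """Split nanoseconds since the epoch into a datetime (microsecond
    precision) plus the leftover nanoseconds (always 0..999)."""
    us, rem_ns = divmod(ns, 1000)  # floor division keeps rem_ns non-negative
    return EPOCH + timedelta(microseconds=us), rem_ns


def fits_in_int64(ns):
    """True if ns can be stored in a signed 64-bit integer at all."""
    return INT64_MIN <= ns <= INT64_MAX


# Upper edge: every positive int64 value is a valid time.
print(ns_to_parts(INT64_MAX))        # 2262-04-11 23:47:16.854775 plus 807 ns
# One more nanosecond no longer fits, hence the OverflowError on the int path.
print(fits_in_int64(INT64_MAX + 1))  # False
# Lower edge: the value fits in int64 but is not usable as a time.
print(ns_to_parts(INT64_MIN))        # 1677-09-21 00:12:43.145224 plus 192 ns
```

The asymmetry is visible here: above the maximum the integer itself cannot be stored, while below the minimum the integer can be stored but has to be rejected by a separate bounds check.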

chris-b1 added this to the Next Major Release milestone Jan 25, 2018

cbertinato commented Jan 26, 2018

So should we normalize it so that any format generates an OutOfBoundsDatetime? I understand the difference, but is it necessary to indicate overflow in this case?

chris-b1 commented Jan 26, 2018

Yeah, I tend to think it'd be better if all of these raised an OutOfBoundsDatetime
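
To make the "everything raises the same error" idea concrete, here is a rough sketch of a string path that checks the full nanosecond value before returning, instead of silently dropping digits. This is an illustration under stated assumptions, not the pandas implementation: `parse_ns` and `OutOfBoundsSketch` are hypothetical names, only the `YYYY-MM-DD HH:MM:SS[.fraction]` format is handled, and the lower bound excludes the int64 minimum on the assumption that it stays reserved for NaT.

```python
from datetime import datetime, timedelta

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
EPOCH = datetime(1970, 1, 1)


class OutOfBoundsSketch(ValueError):
    """Stand-in for a single out-of-bounds error type (hypothetical)."""


def parse_ns(text):
    """Parse 'YYYY-MM-DD HH:MM:SS[.fraction]' into nanoseconds since the
    epoch, raising OutOfBoundsSketch if the result does not fit."""
    main, _, frac = text.partition(".")
    whole = datetime.strptime(main, "%Y-%m-%d %H:%M:%S")
    seconds = (whole - EPOCH) // timedelta(seconds=1)
    # Pad the fraction to nanoseconds; digits past the ninth are ignored.
    frac_ns = int(frac.ljust(9, "0")[:9]) if frac else 0
    ns = seconds * 10**9 + frac_ns
    # Check the full-precision value, so no digits are dropped silently.
    if not INT64_MIN < ns <= INT64_MAX:
        raise OutOfBoundsSketch(f"Out of bounds nanosecond timestamp: {text}")
    return ns


print(parse_ns("2262-04-11 23:47:16.854775807") == INT64_MAX)  # True
try:
    parse_ns("2262-04-11 23:47:16.854775808")
except OutOfBoundsSketch as exc:
    print("raised:", exc)
```

The point of the sketch is only that the bounds check happens on the nanosecond count itself, so the integer and string inputs would hit the same check and the same error.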

cbertinato commented Jan 28, 2018

There is another variation for DatetimeIndex.

In [3]: pd.DatetimeIndex(np.array(['2262-04-11 23:47:16.854775808'], dtype='datetime64'))
Out[3]: DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)

So it's not exactly silent, but it does differ from what happens for a Timestamp, which is the bug that we're talking about here. But suppose we address the upper bound for the Timestamp and make it, say, raise OutOfBoundsDatetime or OverflowError. The behavior of the DatetimeIndex would still be inconsistent, not only with that, but also with its own behavior at the lower bound.

Can't an argument be made for returning NaT for all cases, Timestamp and DatetimeIndex?

jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Feb 4, 2018

jbrockmendel referenced this issue Feb 4, 2018 in merged pull request #19529: Fix parsing corner case closes #19382

jreback added Error Reporting and removed Bug labels Feb 4, 2018

jreback modified the milestones: Next Major Release, 0.23.0 Feb 4, 2018

jreback added a commit that referenced this issue Feb 6, 2018

harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018
