Incorrect timedelta computation of pd.Series datetime64[ns] if timedelta is too large #22492

shengpu-tang · 2018-08-24T03:04:52Z

Problem description

Below are two ways to compute the time difference between two pd.Timestamp objects. The first subtracts the Timestamps directly, whereas the second wraps each one in a pd.Series before subtracting. They should be equivalent. In Example 1 both results are correct, but in Example 2 the timedelta from the Series subtraction is negative. This seems to happen when the time difference is greater than ~300 years.

Code Sample

Example 1:

import pandas as pd

t0 = pd.Timestamp('1723-08-23 00:00:00')
t1 = pd.Timestamp('2013-08-23 00:00:00')
print( 'Expected', (t1 - t0).total_seconds() )
print( 'Got\t', (pd.Series([t1]) - pd.Series([t0]))[0].total_seconds() )

Output 1:

Expected 9151574400.0
Got	 9151574400.0

Example 2:

t0 = pd.Timestamp('1713-08-23 00:00:00')
t1 = pd.Timestamp('2013-08-23 00:00:00')
print( 'Expected', (t1 - t0).total_seconds() )
print( 'Got\t', (pd.Series([t1]) - pd.Series([t0]))[0].total_seconds() )

Output 2:

Expected 9467107200.0
Got	 -8979636873.709553

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-130-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 40.2.0
Cython: None
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

gfyoung · 2018-08-25T08:23:23Z

I suspect there is some overflow happening here behind the scenes.

cc @jbrockmendel
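
The overflow suspicion can be checked with plain integer arithmetic. The sketch below assumes the Series subtraction operates on int64 nanosecond counts, and simulates 64-bit wraparound in pure Python instead of calling pandas. The helpers `to_ns` and `wrap_int64` are hypothetical names introduced for illustration only.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def to_ns(ts):
    """Nanoseconds since the Unix epoch as an exact Python int (hypothetical helper)."""
    return ((ts - EPOCH) // timedelta(microseconds=1)) * 1000


def wrap_int64(value):
    """Simulate two's-complement wraparound of a signed 64-bit integer."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


t0_ns = to_ns(datetime(1713, 8, 23))
t1_ns = to_ns(datetime(2013, 8, 23))

exact_ns = t1_ns - t0_ns           # exact, because Python ints do not overflow
wrapped_ns = wrap_int64(exact_ns)  # what a silent int64 subtraction would store

print(exact_ns > INT64_MAX)        # the true difference does not fit in int64
print(exact_ns // 10**9)           # whole seconds of the true difference
print(wrapped_ns / 1e9)            # wrapped value in seconds, roughly -8.98e9
```

The wrapped value lands at about -8979636873.71 seconds, which agrees with the "Got" figure in Output 2 up to floating-point rounding. The 290-year difference in Example 1 still fits in int64, which would explain why only Example 2 goes wrong.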

jbrockmendel · 2018-08-25T18:15:59Z

Yep, the issue is in `core.arrays.datetimes._sub_datelike_dti`: instead of `new_values = self_i8 - other_i8`, we should be using `checked_add_with_arr`. @shengpu1126 want to try a PR?
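
As a rough illustration of what a checked subtraction means here, the sketch below raises instead of wrapping when a difference leaves the int64 range. It is not the pandas implementation and does not call `checked_add_with_arr`. `checked_sub_i8` is a hypothetical helper working on plain Python ints, and the nanosecond values are the epoch offsets of the timestamps from the two examples.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def checked_sub_i8(left, right):
    """Element-wise left - right that raises rather than silently wrapping.

    Hypothetical helper for illustration; inputs are sequences of int64-range ints.
    """
    result = []
    for a, b in zip(left, right):
        diff = a - b  # exact, since Python ints have arbitrary precision
        if not INT64_MIN <= diff <= INT64_MAX:
            raise OverflowError(f"{a} - {b} does not fit in int64")
        result.append(diff)
    return result


NS = 10**9
t1_ns = 1377216000 * NS        # 2013-08-23 as nanoseconds since the epoch
t0_ok_ns = -7774358400 * NS    # 1723-08-23 (Example 1)
t0_bad_ns = -8089891200 * NS   # 1713-08-23 (Example 2)

print(checked_sub_i8([t1_ns], [t0_ok_ns]))  # fits: 9151574400 seconds in ns

try:
    checked_sub_i8([t1_ns], [t0_bad_ns])
except OverflowError as exc:
    print("overflow detected:", exc)
```

With a check like this, Example 2 would fail loudly instead of returning a negative timedelta.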

gfyoung added the Timedelta label Aug 25, 2018

shengpu-tang mentioned this issue Aug 26, 2018

BUG: silent overflow in DateTimeArray subtraction #22508 (merged)

gfyoung added the Bug label Aug 26, 2018

jreback added this to the 0.24.0 milestone Aug 29, 2018

jreback closed this as completed in #22508 Aug 31, 2018
