ERR: tz conversions at the min/max boundaries should fail if overflow #12677

Closed
ciamac opened this Issue Mar 20, 2016 · 6 comments

2 participants
@ciamac

ciamac commented Mar 20, 2016

Timestamp comparisons with pd.Timestamp.max don't seem to work correctly when there are timezones, as in the code sample below. This is with Pandas 0.16.2.

Code Sample, a copy-pastable example if possible

import pandas as pd
import datetime

# should print True and does
print(pd.Timestamp(datetime.date(2010, 1, 1)).tz_localize('UTC') < pd.Timestamp.max.tz_localize('UTC'))
# should print True but prints False
print(pd.Timestamp(datetime.date(2010, 1, 1)).tz_localize('US/Eastern') < pd.Timestamp.max.tz_localize('US/Eastern'))

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.2
nose: None
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: None
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2015.7
bottleneck: 1.0.0
numexpr: 2.4.6
matplotlib: 1.5.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
@jreback

Contributor

jreback commented Mar 20, 2016

This works for me in 0.16.2 (i.e. True/False are printed).

In [1]: # should print True and does

In [2]: print pd.Timestamp(datetime.date(2010,1,1)).tz_localize('UTC')<pd.Timestamp.max.tz_localize('UTC')
True

In [3]: # should print False but prints True

In [4]: print pd.Timestamp(datetime.date(2010,1,1)).tz_localize('US/Eastern')<pd.Timestamp.max.tz_localize('US/Eastern')
False

In [5]: pd.__version__
Out[5]: u'0.18.0'

Show an ipython session where you print the version & run the code.

@ciamac

ciamac commented Mar 20, 2016

I've pasted an ipython session below. Sorry, my comments were wrong. Both instances should print True but the latter prints False.

In [1]: import pandas as pd

In [2]: import datetime

In [3]: 

In [3]: # should print True and does

In [4]: print pd.Timestamp(datetime.date(2010,1,1)).tz_localize('UTC')<pd.Timestamp.max.tz_localize('UTC')
True

In [5]: # should print True but prints False

In [6]: print pd.Timestamp(datetime.date(2010,1,1)).tz_localize('US/Eastern')<pd.Timestamp.max.tz_localize('US/Eastern')
False

In [7]: pd.__version__
Out[7]: '0.16.2'
@ciamac

ciamac commented Mar 20, 2016

Based on your snippet, it looks like the bug is still in 0.18.0.

@jreback

Contributor

jreback commented Mar 21, 2016

So this is fine

In [5]: pd.Timestamp(datetime.date(2010,1,1)).tz_localize('US/Eastern')<(pd.Timestamp.max-pd.Timedelta('1d')).tz_localize('US/Eastern')
Out[5]: True

This happens because of wrap-around. In other words, once the underlying value can't be represented at the limits, it wraps around to the other side.

In [6]: pd.Timestamp.max.tz_localize('US/Eastern').tz_convert('UTC').value
Out[6]: -9223354276854775809

In [7]: pd.Timestamp.max.value
Out[7]: 9223372036854775807

I suppose we could add a doc note. But doing anything very near the edge points is bound to hit this issue (it is not restricted to time zones specifically).
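A minimal sketch of the wrap-around described above, in plain Python. It assumes the timestamp is stored as a signed 64-bit nanosecond count (consistent with pd.Timestamp.max.value being 2**63 - 1 above) and that the conversion adds the zone offset with no overflow check. The helper wrap_int64 is hypothetical and only emulates two's-complement int64 overflow; the 17,760 s (4 h 56 min) offset is simply the amount by which the two printed values differ modulo 2**64, not something read out of pandas.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NS_PER_SECOND = 10**9


def wrap_int64(x: int) -> int:
    """Emulate unchecked two's-complement int64 arithmetic (hypothetical helper)."""
    return (x + 2**63) % 2**64 - 2**63


ts_max_value = INT64_MAX  # pd.Timestamp.max.value, as printed above

# Assumption: offset implied by the two printed values, 17,760 s = 4 h 56 min.
offset_ns = 17_760 * NS_PER_SECOND

# Local -> UTC without an overflow check: the sum exceeds INT64_MAX and wraps.
utc_value = wrap_int64(ts_max_value + offset_ns)
print(utc_value)  # -9223354276854775809

# 2010-01-01 00:00 US/Eastern (UTC-5) is 2010-01-01 05:00 UTC.
jan_2010 = 1_262_322_000 * NS_PER_SECOND

# Tz-aware comparisons are on UTC values, so the wrapped "max" looks smaller.
print(jan_2010 < utc_value)  # False
```

Because the wrapped value lands near the bottom of the int64 range, an ordinary 2010 timestamp compares as later than the localized "max", which matches the False in the report.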

@jreback

Contributor

jreback commented Mar 21, 2016

I suppose we could raise an error like the following on pd.Timestamp.max.tz_localize('US/Eastern'):

In [8]: pd.Timestamp.max + pd.Timedelta('1d')
OverflowError: Python int too large to convert to C long

Interested in doing a PR?
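A minimal sketch of the proposed behaviour: check the local-to-UTC arithmetic and raise OverflowError instead of wrapping. It assumes the same int64 nanosecond representation and that UTC = local - offset (e.g. an offset of -5 h for EST); both helper names are hypothetical and are not pandas internals.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def checked_add_int64(a: int, b: int) -> int:
    """Add two int64 nanosecond values, raising instead of wrapping (hypothetical)."""
    result = a + b  # Python ints are unbounded, so the exact sum can be range-checked
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"{a} + {b} is outside the int64 range")
    return result


def local_to_utc_value(local_value: int, utc_offset_ns: int) -> int:
    """Convert a wall-clock nanosecond value to UTC given the zone's UTC offset.

    UTC = local - offset, so a negative offset (west of UTC) moves the value up.
    """
    return checked_add_int64(local_value, -utc_offset_ns)


EST_OFFSET_NS = -5 * 3600 * 10**9  # assumption: fixed EST offset, ignoring DST

try:
    local_to_utc_value(INT64_MAX, EST_OFFSET_NS)
except OverflowError as exc:
    print("refused:", exc)
```

The design choice is to detect the overflow before narrowing to int64, which is easy in Python; in C or Cython the same check would need to be done before the addition (or with an overflow-checking builtin).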

@jreback jreback reopened this Mar 21, 2016

@jreback jreback added this to the Next Major Release milestone Mar 21, 2016

@jreback jreback changed the title from timestamp comparison with timezones to ERR: tz conversions at the min/max boundaries should fail if overflow Mar 21, 2016

@ciamac

ciamac commented Mar 22, 2016

I think raising an error is a good way to handle this. It would also be good to have a function pd.Timestamp.max_tz_localize(tz) which would return the maximal timestamp in that time zone.

I would love to help with the PR, but I don't have enough experience in this code base to fix the problem.
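The suggested pd.Timestamp.max_tz_localize(tz) is not an existing method in this thread. A minimal workaround sketch, assuming "maximal timestamp in that time zone" means the latest instant pandas can represent, displayed in tz: localize Timestamp.max as UTC (zero offset, so nothing overflows) and then convert. The helper name max_timestamp_in_tz is hypothetical; Timestamp.max, tz_localize, and tz_convert are existing pandas APIs.

```python
import pandas as pd


def max_timestamp_in_tz(tz):
    """Latest tz-aware Timestamp pandas can represent, shown in `tz` (hypothetical helper).

    Localizing Timestamp.max as UTC applies no offset, so no overflow occurs;
    tz_convert then changes only the display zone, not the stored UTC value.
    """
    return pd.Timestamp.max.tz_localize("UTC").tz_convert(tz)


max_eastern = max_timestamp_in_tz("US/Eastern")
start = pd.Timestamp("2010-01-01").tz_localize("US/Eastern")

print(max_eastern)
print(start < max_eastern)  # True
```

Note that the wall-clock time shown for max_eastern is a few hours earlier than Timestamp.max itself, because the limit applies to the UTC value rather than to the local reading.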