Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rich comparisons of Timestamps raise TypeError instead of returning NotImplemented #24011

Closed
AlexandreDecan opened this issue Nov 30, 2018 · 8 comments

Comments

@AlexandreDecan
Copy link
Contributor

commented Nov 30, 2018

Hello,

Problem description

The problem is that pandas' Timestamp raises TypeError when compared with unsupported types (from pandas' point of view), where I think it should return NotImplemented so that comparisons are delegated to the other object (if supported). See https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslibs/timestamps.pyx#L247

The current behaviour prevents other objects to deal with comparisons with Timestamp. For instance, I developed a library that provides arithmetic operations for intervals composed of arbitrary comparable objects. In the context of this library, I defined two specific objects corresponding to negative and positive infinities (that are respectively lower and greater than "everything else"). When these objects are compared with pandas' Timestamp, a TypeError is raised, preventing the comparison to be delegated to my "infinities".

Code Sample

# Let's define infinity, or any object whose magic methods support Timestamp
class Inf:
    def __lt__(self, o): return False
    def __le__(self, o): return isinstance(o, Inf)
    def __gt__(self, o): return not isinstance(o, Inf)
    def __ge__(self, o): return True
    def __eq__(self, o): return isinstance(o, Inf)
    def __ne__(self, o): return not self == o  # Required for Python 2
    def __repr__(self): return '+inf'

# Import pandas and create a timestamp
import pandas as pd
timestamp = pd.Timestamp('2018-11-30')

# Comparison works if compared in *that* order, because magic method is called on Inf
assert Inf() > timestamp
assert not (Inf() < timestamp)

# ... but not when magic method is called on Timestamp
assert timestamp < Inf()

... raises the following:

pandas/_libs/tslib.pyx in pandas._libs.tslib._Timestamp.__richcmp__()
TypeError: Cannot compare type 'Timestamp' with type 'Inf'

Expected Behaviour

TypeError not raised, assertion holds.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.17-300.fc29.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_BE.UTF-8
LOCALE: fr_BE.UTF-8

pandas: 0.22.0
pytest: 3.7.4
pip: 9.0.3
setuptools: 40.2.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Also tested on the latest 0.23.4.

@AlexandreDecan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 30, 2018

Related issue in my library: AlexandreDecan/python-intervals/issues/5

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Nov 30, 2018

Thanks for the small example. If you're interested, you could see if returning NotImplemented there causes any of our tests to fail (probably pandas/tests/scalar/timestamp and pandas/tests/indexes/datetimes, maybe other areas).

cc @jbrockmendel

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Nov 30, 2018

This came up recently for Timedelta too, so there is precedent for changing to return NotImplemented

The flip side is that doing this causes inconsistencies in the py2 vs py3 behavior for non-custom types (#23684).

@AlexandreDecan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 30, 2018

@jbrockmendel Support for Python 2 will be dropped at the end of the year if I'm right. It's not a big deal (at least to me) to wait until 2019 to see this behaviour "fixed" ;-)

@TomAugspurger I can give it a try, but I'm not used to Cython nor to deploy/test libraries with C-extensions. I'll keep you informed of my progress on this.

@AlexandreDecan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 30, 2018

With a clean installation of current master branch, two tests errored:

2 failed, 42418 passed, 1091 skipped, 1096 xfailed, 15 xpassed, 47 warnings in 621.51 seconds

I made the change to return NotImplemented instead of rising a TypeError exception, and got:

1 failed, 42419 passed, 1091 skipped, 1096 xfailed, 15 xpassed, 47 warnings in 590.78 seconds

However, I don't know if I have to do something specific after having changed the code (e.g. building pandas again or something else) ?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Nov 30, 2018

Did you recompile the C extension after making the change? python setup.py build_ext -i?

@AlexandreDecan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 30, 2018

No, I didn't. I'm not used to "compile" Python code ;-)

Here are the results of ./test_fast.sh after having recompiled the extension:

1 failed, 42419 passed, 1091 skipped, 1096 xfailed, 15 xpassed, 47 warnings in 582.28 seconds

The one that failed is TestDatetimeIndexTimezones.test_dti_tz_localize_roundtrip[tz_aware_fixture6-idx3]. According to the output, it's a difference of around one second that lead the test to fail, so it's seems to be unrelated to the change I made.

Notice that I simply added return NotImplemented at line 243, to prevent the following code to be executed:

if op == Py_EQ:
    return False
elif op == Py_NE:
    return True
raise TypeError('Cannot compare type %r with type %r' %
                (type(self).__name__, type(other).__name__))

With this change, the piece of code I provided in the first post passes.

@AlexandreDecan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 1, 2018

I created a PR to get test results from CI. Some of the failing tests are (depending on the chosen build process):

  • tests.arithmetic.test_datetime64.TestDatetimeIndexComparisons: test_dti_cmp_object_dtype
  • tests.arithmetic.test_numeric.TestNumericComparisons: test_df_numeric_cmp_dt64_raises
  • tests.frame.test_arithmetic.TestFrameComparisons: test_timestamp_compare
  • tests.indexes.test_base.TestIndex: test_outer_join_sort
  • tests.frame.test_rank.TestRank: test_rank2 (potentially unrelated)
  • tests.indexes.test_base.TestIndex: test_union_dt_as_obj (warning raised)

@gfyoung gfyoung added the API Design label Dec 2, 2018

@jreback jreback added this to the 0.25.0 milestone Mar 3, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.