BUG: Comparisons with non-comparable objects in DataFrame return True #4537

jtratner · 2013-08-11T02:11:03Z

I can't tell if this is expected or not (taken from part of test_where_datetime in tests/test_frame). The change in #3311 (which the test case came from) didn't specify what should happen with non-datetime comparisons. Right now they return True, which doesn't really make sense. They should probably return False instead.

In [16]: df = DataFrame(dict(A = date_range('20130102',periods=5),
                            B = date_range('20130104',periods=5),
                            C = np.random.randn(5)))

In [17]: stamp = datetime(2013,1,3)

In [18]: df
Out[18]:
                    A                   B         C
0 2013-01-02 00:00:00 2013-01-04 00:00:00 -0.321163
1 2013-01-03 00:00:00 2013-01-05 00:00:00  1.219840
2 2013-01-04 00:00:00 2013-01-06 00:00:00 -1.048629
3 2013-01-05 00:00:00 2013-01-07 00:00:00 -0.861459
4 2013-01-06 00:00:00 2013-01-08 00:00:00  1.480088

In [19]: df > stamp
Out[19]:
       A     B     C
0  False  True  True
1  False  True  True
2   True  True  True
3   True  True  True
4   True  True  True

In [20]: df < stamp
Out[20]:
       A      B     C
0   True  False  True
1  False  False  True
2  False  False  True
3  False  False  True
4  False  False  True

In [21]: df == stamp
Out[21]:
       A      B      C
0  False  False  False
1   True  False  False
2  False  False  False
3  False  False  False
4  False  False  False

In [22]: df != stamp
Out[22]:
       A     B     C
0   True  True  True
1  False  True  True
2   True  True  True
3   True  True  True
4   True  True  True

In [23: pd.__version__
Out[23]: '0.12.0-154-g7fd6b20'

Column C probably should be False for every case.

The text was updated successfully, but these errors were encountered:

jreback · 2013-08-11T02:18:21Z

what do we so for the reverse case?
compare a float with a datetime / or an object (string)?
I would actually like it to be nan (but then you would always have to fill it)
I agree False if you have an incomparable across dtypes is prob appropriate

jtratner · 2013-08-11T02:30:33Z

@jreback this is easily fixed actually, fi you agree with this, I have a PR brewing.

jtratner · 2013-08-11T02:45:14Z

@jreback it actually fails pretty spectacularly with integers:

>>> df > 2
Traceback
   ...
AttributeError: 'int' object has no attribute 'view'

jtratner · 2013-08-11T03:39:01Z

more weird behavior:

In [3]: df = DataFrame(dict(A = date_range('20130102',periods=5),
                            B = date_range('20130104',periods=5),
                            C = np.random.randn(5)))

In [4]: df
Out[4]:
                    A                   B         C
0 2013-01-02 00:00:00 2013-01-04 00:00:00  1.029470
1 2013-01-03 00:00:00 2013-01-05 00:00:00 -0.589214
2 2013-01-04 00:00:00 2013-01-06 00:00:00 -1.979968
3 2013-01-05 00:00:00 2013-01-07 00:00:00  0.196116
4 2013-01-06 00:00:00 2013-01-08 00:00:00 -0.190880

In [5]: df > np.array([2])
Out[5]:
      A     B      C
0  True  True  False
1  True  True  False
2  True  True  False
3  True  True  False
4  True  True  False

In [6]: df == np.array([2])
Out[6]:
       A      B      C
0  False  False  False
1  False  False  False
2  False  False  False
3  False  False  False
4  False  False  False

In [7]: df != np.array([2])
Out[7]:
      A     B     C
0  True  True  True
1  True  True  True
2  True  True  True
3  True  True  True
4  True  True  True

In [8]: df < np.array([2])
Out[8]:
       A      B     C
0  False  False  True
1  False  False  True
2  False  False  True
3  False  False  True
4  False  False  True

jtratner · 2013-08-11T03:40:50Z

apparently this is not in the language spec and changes behavior in Python 3 as well - http://stackoverflow.com/questions/3270680/how-does-python-compare-string-and-int

cpcloud · 2013-08-11T03:49:59Z

in your example though they are ultimately comparing integers since that's how datetime64 is represented

jtratner · 2013-08-11T03:50:25Z

@cpcloud ah, good to know.

jtratner · 2013-08-11T03:54:25Z

@cpcloud but example mostly stays the same for things like string dataframes compared to integers, etc... it's kind of an edge case that shouldn't be relied upon.

cpcloud · 2013-08-11T04:03:08Z

i think we should leave comparison (with datetimes) as is...though i agree that's it's weird to compare a datetime64 to an integer (and if someone is writing code mixing abstractions like this then they are probably doing something wrong, better to keep abstractions on the same level)

what were you thinking of doing instead of just letting python's/numpy's comparison behavior do its thing? if two types are not equal then their values are not equal? maybe just for object arrays? seems hard since pandas is really just sort of "numeric", "object", and "datetimes" but lets you retain dtypes if you want. so comparing int to float should be valid (like it is now), but comparing object to not object should never return True? what about in the case of integer object arrays....seems like a mountain of complexity to deal with this edge case, maybe i'm overthinking it a bit though and you have a really elegant solution in mind

jtratner · 2013-08-11T04:16:53Z

@cpcloud honestly, I think we should just leave it, but have its fill value in comparisons be False for <, >, <=, >=, ==, and True for !=. So this would be a very minor change. Comparisons against nan might need to be fixed, but it's somewhat separate.

jtratner · 2013-08-11T04:23:37Z

@cpcloud or maybe no change at all. Just weird that the fill value is True in _comp_method.

Here's what it is now:

def _comp_method(func, name, str_rep):
    @Appender('Wrapper for comparison method %s' % name)
    def f(self, other):
        if isinstance(other, DataFrame):    # Another DataFrame
            return self._compare_frame(other, func, str_rep)
        elif isinstance(other, Series):
            return self._combine_series_infer(other, func)
        else:

            # straight boolean comparisions we want to allow all columns
            # (regardless of dtype to pass thru)
            return self._combine_const(other, func, raise_on_error = False).fillna(True).astype(bool)

    f.__name__ = name

    return f

I think it should be:

def _comp_method(func, name, str_rep):
    @Appender('Wrapper for comparison method %s' % name)
    def f(self, other):
        if isinstance(other, DataFrame):    # Another DataFrame
            return self._compare_frame(other, func, str_rep)
        elif isinstance(other, Series):
            return self._combine_series_infer(other, func)
        else:

            if func == operator.ne:
                masker = True
            else:
                masker = False
            # straight boolean comparisions we want to allow all columns
            # (regardless of dtype to pass thru)
            return self._combine_const(other, func, raise_on_error = False).fillna(masker).astype(bool)

    f.__name__ = name

    return f

Similar to how @jreback set it up in the Series refactor

jreback · 2013-08-11T13:41:55Z

here's an idea

generally numeric operations in a frame are done using the numeric columns only (using _get_numeric_data)

I think for comparisons you could do something similar, only allow a compatible dtype comparator on the rhs (eg say its a datetime then exclude non-datetimes on the lhs), this prevents the integer/datetime comparison issues; booleans prob need some special consideration here (there is a _get_boolean_data); easy to create other of these

so for frames this makes perfect sense
but series prob needs to raise on invalid comparisons by default

jtratner · 2013-08-11T13:53:22Z

@jreback would this still work:

df = DataFrame({"A": [0, 1.2, np.nan], "B": [datetime.now(), datetime.now(), datetime.now()]})
df[ df > 0]

jreback · 2013-08-11T14:04:28Z

this is why the comparisons return True by default (when the comparisons fail, eg fillna(True)), so it passes thru on the selection

so will pass thru except where the A value = 0;

I don't think this should change

jtratner · 2013-08-11T15:29:44Z

okay, then I guess this particularly thing should be left as-is? (since that's already the behavior that's happening).

A little weird for this to happen

In [1]: from pandas import *

In [2]: %paste
row = [datetime(1, 1, 1), datetime(2, 2, 2), datetime(3, 3, 3)]
df = DataFrame({"A": row, "B": row})
df > 3

## -- End pasted text --
Out[2]:
      A     B
0  True  True
1  True  True
2  True  True

jtratner · 2013-08-11T15:30:54Z

but I guess you could chain the comparisons...which probably makes sense, like:

(df > 3) & (df < datetime(2, 2, 3))

jreback · 2013-08-23T11:56:16Z

@jtratner I think we are not changing this, correct, close?

kjordahl mentioned this issue Aug 23, 2013

Now boolean operators work with NDFrames? #4633

Closed

jtratner closed this as completed Aug 31, 2013

ResidentMario mentioned this issue Apr 26, 2017

BUG: Exception to the boolean frame comparison guarantee #16155

Closed

mzwiessele mentioned this issue May 23, 2019

BUG: ordered categorical comparison with missing values evaluates to True #26504

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Comparisons with non-comparable objects in DataFrame return True #4537

BUG: Comparisons with non-comparable objects in DataFrame return True #4537

jtratner commented Aug 11, 2013

jreback commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

cpcloud commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

cpcloud commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

jreback commented Aug 11, 2013

jtratner commented Aug 11, 2013

jreback commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

jreback commented Aug 23, 2013

BUG: Comparisons with non-comparable objects in DataFrame return True #4537

BUG: Comparisons with non-comparable objects in DataFrame return True #4537

Comments

jtratner commented Aug 11, 2013

jreback commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

cpcloud commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

cpcloud commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

jreback commented Aug 11, 2013

jtratner commented Aug 11, 2013

jreback commented Aug 11, 2013

jtratner commented Aug 11, 2013

jtratner commented Aug 11, 2013

jreback commented Aug 23, 2013