pandas.Series.__eq__ is broken for series with different index #1134

lesteve · 2012-04-25T23:59:00Z

Something seems to be wrong with s1 == s2 when s1 and s2 don't have the same index. Here is a snippet example:

import operator
import pandas
s1 = pandas.Series([1,2], ['a','b'])
s2 = pandas.Series([2,3], ['b','c'])
s1 == s2
s2 == s1

with the output:

In [5]: s1 == s2
Out[5]: 
a    False
b    False

In [6]: s2 == s1
Out[6]: 
b    False
c    False

On the other hand using combine works fine:

In [7]: s1.combine(s2, operator.eq)
Out[7]: 
a    0
b    1
c    0

In [8]: s2.combine(s1, operator.eq)
Out[8]: 
a    0
b    1
c    0

I guess you can first align s1 and s2 and then compare them, but is there a good reason why this couldn't work out of the box?
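
The align-then-compare workaround mentioned above can be sketched as follows. This is only an illustration, assuming a pandas version that provides Series.align with an outer join; the names left, right and aligned_eq are made up for the example. Labels missing from one side become NaN after alignment, and NaN never compares equal, so those positions come out False.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([2, 3], index=['b', 'c'])

# Reindex both series onto the union of their labels; missing labels become NaN.
left, right = s1.align(s2, join='outer')

# Both operands now share the index ['a', 'b', 'c'], so == compares label by label.
aligned_eq = left == right
print(aligned_eq)
```

This gives a result over the union of labels, in the same spirit as the combine call shown above, but with a boolean dtype.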

There don't seem to be any tests for pandas.Series.__eq__ for two series with a different index in pandas/pandas/tests/test_series.py. I have a patch lying around to add such a test and I could commit it if that's useful.


wesm · 2012-04-26T01:46:54Z

This is actually a feature / deliberate choice and not a bug-- it's related to #652. Back in January I changed the comparison methods to do auto-alignment, but found that it led to a large amount of bugs / breakage for users and, in particular, many NumPy functions (which regularly do things like arr[1:] == arr[:-1]; example: np.unique) stopped working.
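
The NumPy idiom referred to here compares shifted slices position by position. The sketch below uses made-up values to show it on a plain array and on the values of a Series. The point is that s[1:] and s[:-1] carry different labels, so a comparison that aligned on labels would no longer pair each element with its predecessor. Dropping to plain arrays with to_numpy() (assuming a pandas version that has this method) keeps the comparison positional.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 1, 2, 3, 3])

# Positional comparison of each element with its predecessor.
same_as_previous = arr[1:] == arr[:-1]

# The same idiom on a Series, done on the underlying values so labels play no role.
s = pd.Series(arr, index=list('abcde'))
values = s.to_numpy()
series_same_as_previous = values[1:] == values[:-1]

print(same_as_previous)
print(series_same_as_previous)
```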

This gets back to the issue that Series isn't quite ndarray-like enough and should probably not be a subclass of ndarray.

So, I haven't got a good answer for you except for that; auto-alignment would be ideal but I don't think I can do it unless I make Series not a subclass of ndarray. I think this is probably a good idea but not likely to happen until 0.9 or 0.10 (several months down the road).

lesteve · 2012-04-26T06:35:37Z

Interesting, thanks for the answer. Is s[1:] == s[:-1] the main use case though, where you need to have this not completely intuitive == operator?

Out of interest, is there a way to figure out whether s1 and s2 are a view on the same underlying series and in this case have s1 == s2 do the current comparison. When s1 and s2 don't have anything to do with each other you would do the equivalent of aligning + comparison.

Not sure whether you would want to do that even if it was possible, e.g. s1 == s2.copy() would potentially return something different than s1 == s2.
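
One way to probe the "views on the same underlying series" idea is NumPy's shares_memory. The sketch below is purely illustrative: np.shares_memory is a real NumPy function, but using it to choose between positional and aligned comparison is a hypothetical design, not something pandas is known to do. It assumes a NumPy-backed integer Series whose slices expose views of the same buffer.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, 3])
head = s.iloc[:-1]
tail = s.iloc[1:]

# Two slices of the same Series: their value buffers overlap.
views_share = np.shares_memory(head.to_numpy(), tail.to_numpy())

# A copy owns fresh memory, so the same check no longer detects a relationship.
copy_shares = np.shares_memory(head.to_numpy(), tail.copy().to_numpy())

print(views_share, copy_shares)
```

It also illustrates the concern raised above: the outcome of such a check changes as soon as one operand is copied.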

wesm · 2012-04-26T13:25:05Z

Interesting, that would be a hack around the np.unique problem. I'll take a look into it sooner rather than later to see

jreback · 2013-09-21T12:45:28Z

closing as not a bug

jorisvandenbossche · 2014-04-10T09:43:36Z

@jreback I was answering this SO question: http://stackoverflow.com/questions/22983523/comparing-pandas-series-for-equality-when-they-are-in-different-orders/22983621#22983621. And I was wondering:

I understand that s1 == s2 is not flexible, just like df1 == df2 gives the error message ValueError: Can only compare identically-labeled DataFrame objects.
But for dataframes, you can overcome this with the flexible DataFrame.eq method. There is also a Series.eq method, but it is not flexible (it does not align). Is there a reason that Series.eq is not flexible?

In [154]: x = pd.Series(index=["A", "B", "C"], data=[1,2,3])
In [155]: y = pd.Series(index=["C", "B", "A"], data=[3,2,1])
In [156]: x == y
Out[156]: 
A    False
B     True
C    False
dtype: bool

In [157]: x.eq(y)
Out[157]: 
A    False
B     True
C    False
dtype: bool

In [158]: x.to_frame() == y.to_frame()
Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled DataFrame objects

In [159]: x.to_frame().eq(y.to_frame())
Out[159]: 
      0
A  True
B  True
C  True

jreback · 2014-04-10T10:02:31Z

@jorisvandenbossche interesting, let's reopen and i'll take a look

snth · 2014-05-29T13:36:42Z

+1 for @jorisvandenbossche's suggestion of at least making the .eq, .ne, .lt, .le, .gt, .ge methods flexible, i.e. use alignment.
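
To make "flexible, i.e. use alignment" concrete, here is a small sketch reusing the x and y series from the earlier comment. The manual version reindexes one operand to the other's labels before comparing. The method version assumes a pandas release in which Series.eq aligns on labels, which was not the behaviour at the time of this comment.

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
y = pd.Series([3, 2, 1], index=['C', 'B', 'A'])

# Align y to x's labels by hand, then compare; both operands share one index.
manual = x == y.reindex(x.index)

# Assumption: in a pandas release where Series.eq is flexible, it aligns by itself.
flexible = x.eq(y)

print(manual)
print(flexible)
```

With alignment, every label compares equal, which matches the DataFrame.eq result shown above.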

snth · 2014-05-29T14:01:06Z

From what I am reading above, it sounds like a fix might be more complicated/a while off. Could we in the meantime add something to the documentation about this?

I discovered this issue for myself recently and it took me a long time to figure out what was going on. At some point I did check the documentation to see if my understanding of index alignment was correct and there was no mention there that this only applies to the +, -, *, / operators and not to ==, !=, <, <=, >, >=.

In particular,

the Caveats and Gotchas section of the Pandas documentation has sections on if/truth statements with Pandas and Bitwise boolean operators (I think this would better be called Elementwise boolean operators). However these make no mention of the fact that indices will be ignored and not aligned in elementwise comparisons.
The Basics section of the docs has a section (Flexible Comparisons) on the basics of comparisons. This states that
```
Starting in v0.8, pandas introduced binary comparison methods eq, ne, lt, gt, le, and ge to Series and DataFrame whose behavior is analogous to the binary arithmetic operations described above:
```
However the behaviour for Series.eq is not analogous to the binary arithmetic operations as the arithmetic operations do perform index alignment while the comparison operators do not. Also, Series.eq does not appear to exist in 0.12.0 (i.e. not from 0.8 onwards as claimed) but I do find it in 0.13.1 although the signature there is different from DataFrame.eq (of course axis makes no sense for Series.eq but level and axis could still be included like they are in Series.add).

jreback · 2014-05-29T14:11:50Z

See #6860; the fix is not that 'hard' at all. Though I think that doing it JUST for .eq, .ne, etc. might be better (as @jorisvandenbossche suggests).

(I think the docs really mean __eq__ (and not .eq); agreed that the signatures of .eq et al. need to be updated / integrated with DataFrame.eq; going to create an issue for that.)

jreback · 2014-05-29T14:15:09Z

#7278

hayd · 2014-08-22T05:19:30Z

Also came up here: http://stackoverflow.com/q/25435229/1240268

(update: oh, maybe that's with comparison)

jreback · 2015-05-17T10:05:03Z

reported here again: http://stackoverflow.com/questions/30284415/why-do-pandas-comparison-operators-not-align-on-index/30285686#30285686

wesm · 2016-01-14T21:32:45Z

Bumping this issue

sinhrks · 2016-07-11T04:48:22Z

+1 on adding flexible methods.

Also, there are inconsistencies in normal ops between Series and DataFrame, as @jorisvandenbossche pointed out. I've organized the differences, including arithmetic / bool ops (xref: #4581, #7278, #13538, #13587).

Series

Arithmetic

aligns with labels.

pd.Series([1, 2, 3], index=list('ABC')) + pd.Series([2, 2, 2], index=list('ABD')) 
# A    3.0
# B    4.0
# C    NaN
# D    NaN
# dtype: float64

pd.Series([1, 2, 3], index=list('ABC')) + pd.Series([2, 2, 2, 2], index=list('ABCD'))
# A    3.0
# B    4.0
# C    5.0
# D    NaN
# dtype: float64

Comparison

ignores labels, raises when lengths are different.

pd.Series([1, 2, 3], index=list('ABC')) > pd.Series([2, 2, 2], index=list('ABD')) 
# A    False
# B    False
# C     True
# dtype: bool

pd.Series([1, 2, 3], index=list('ABC')) > pd.Series([2, 2, 2, 2], index=list('ABCD'))
# ValueError: Series lengths must match to compare

Boolean (logical)

ignores labels, ignores length mismatch.

pd.Series([True, False, True], index=list('ABC')) & pd.Series([True, True, True], index=list('ABD'))
# A     True
# B    False
# C    False
# dtype: bool

pd.Series([True, False, True], index=list('ABC')) & pd.Series([True, True, True, True], index=list('ABCD'))
# A     True
# B    False
# C     True
# dtype: bool

DataFrame

Arithmetic

aligns with labels.

pd.DataFrame([1, 2, 3], index=list('ABC')) + pd.DataFrame([2, 2, 2], index=list('ABD'))
#      0
# A  3.0
# B  4.0
# C  NaN
# D  NaN

pd.DataFrame([1, 2, 3], index=list('ABC')) + pd.DataFrame([2, 2, 2, 2], index=list('ABCD')) 
#      0
# A  3.0
# B  4.0
# C  5.0
# D  NaN

Comparison

raises when labels are different.

pd.DataFrame([1, 2, 3], index=list('ABC')) > pd.DataFrame([2, 2, 2], index=list('ABD'))
# ValueError: Can only compare identically-labeled DataFrame objects

pd.DataFrame([1, 2, 3], index=list('ABC')) > pd.DataFrame([2, 2, 2, 2], index=list('ABCD'))
# ValueError: Can only compare identically-labeled DataFrame objects

Boolean (logical)

aligns with labels.

pd.DataFrame([True, False, True], index=list('ABC')) & pd.DataFrame([True, True, True], index=list('ABD')) 
#        0
# A   True
# B  False
# C    NaN
# D    NaN

pd.DataFrame([True, False, True], index=list('ABC')) & pd.DataFrame([True, True, True, True], index=list('ABCD'))
#        0
# A   True
# B  False
# C    NaN
# D    NaN

Based on the above, I think the following rules would be consistent:

arithmetic always aligns with labels
comparison is allowed when labels are identical, otherwise raises
boolean always aligns with labels

If OK, I'd like to make 2 changes:

series comparison to check whether labels are identical
series boolean to align with labels
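
The first proposed change (compare only when labels are identical, otherwise raise) can be sketched with a small wrapper. The helper name compare_gt_strict is hypothetical, and the error text is modelled on the DataFrame message quoted earlier in this thread; neither is claimed to be what pandas implements.

```python
import pandas as pd


def compare_gt_strict(left, right):
    """Hypothetical helper: allow '>' only between identically-labeled Series."""
    if not left.index.equals(right.index):
        # Illustrative message, modelled on the DataFrame error quoted above.
        raise ValueError("Can only compare identically-labeled Series objects")
    # Labels match, so a positional comparison is also a label-wise comparison.
    return pd.Series(left.to_numpy() > right.to_numpy(), index=left.index)


a = pd.Series([1, 2, 3], index=list('ABC'))
b = pd.Series([2, 2, 2], index=list('ABC'))
c = pd.Series([2, 2, 2], index=list('ABD'))

same_labels = compare_gt_strict(a, b)

try:
    compare_gt_strict(a, c)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(same_labels)
print(mismatch_raised)
```

Raising on mismatched labels makes the ambiguity explicit and leaves aligned comparison to the flexible methods such as .eq.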

jreback closed this as completed Sep 21, 2013

jreback reopened this Apr 10, 2014

jreback modified the milestones: 0.14.0, 0.13 Apr 10, 2014

jreback self-assigned this Apr 10, 2014

jreback mentioned this issue Apr 10, 2014

API: allow Series comparison ops to align before comparison (GH1134) #6860

Closed

jreback added API Design labels Apr 10, 2014

jreback modified the milestones: 0.15.0, 0.14.0 Apr 28, 2014

hayd mentioned this issue Apr 29, 2014

Adjust Bool evaluation for empty DataFrames to match PEP8 #6964

Closed

jreback mentioned this issue Sep 12, 2014

compare two series objects ignores index #8257

Closed

jreback modified the milestones: 0.16.0, 0.17.0 Jan 26, 2015

jreback modified the milestones: 0.17.0, Next Major Release May 17, 2015

jreback modified the milestones: Next Major Release, 0.17.0 Aug 15, 2015

jreback mentioned this issue Jun 30, 2016

bug in bool type series logical AND operation #13538

Closed

sinhrks modified the milestones: 0.19.0, Next Major Release Jul 11, 2016

jreback mentioned this issue Jul 12, 2016

Logical comparison operators and mathematical operators are applied inconsistently for series. #13587

Closed

sinhrks mentioned this issue Jul 13, 2016

API: Index/Series/DataFrame op 1-d list-like coercion #13637

Closed

sinhrks mentioned this issue Aug 3, 2016

API/BUG: Fix Series ops inconsistencies #13894

Merged

4 tasks

jorisvandenbossche closed this as completed in #13894 Aug 25, 2016

jreback mentioned this issue Nov 24, 2017

Equality between DataFrames misbehaves if columns contain NaN #18455

Closed

hunterjackson mentioned this issue Feb 23, 2018

Pandas Series.ne operator returning unexpected result against two slices of same Series #19855

Closed
