Binary operators between DataFrame and Series object doesn't seem to work #5284

liori · 2013-10-20T18:38:26Z

Related similar operations:

http://stackoverflow.com/questions/19484344/how-do-i-use-a-specific-columns-value-in-a-pandas-dataframe-where-clause/19494873#19494873

http://stackoverflow.com/questions/19507088/filtering-a-pandas-dataframe-without-removing-rows/19516869#19516869

http://stackoverflow.com/q/21627926/190597

This should be a bit more intuitive:

In [59]: data = """      A    B    C    D
1/1   0    1    0    1
1/2   2    1    1    1
1/3   3    0    1    0 
1/4   1    0    1    2
1/5   1    0    1    1
1/6   2    0    2    1
1/7   3    5    2    3"""

In [60]: df = read_csv(StringIO(data), sep=r'\s+')

In [61]: df
Out[61]: 
     A  B  C  D
1/1  0  1  0  1
1/2  2  1  1  1
1/3  3  0  1  0
1/4  1  0  1  2
1/5  1  0  1  1
1/6  2  0  2  1
1/7  3  5  2  3

In [62]: df.where((df>df.shift(1)).values & DataFrame(df.D==1).values)
Out[62]: 
      A   B   C   D
1/1 NaN NaN NaN NaN
1/2   2 NaN   1 NaN
1/3 NaN NaN NaN NaN
1/4 NaN NaN NaN NaN
1/5 NaN NaN NaN NaN
1/6   2 NaN   2 NaN
1/7 NaN NaN NaN NaN
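For comparison, here is a sketch of the same filter that keeps the row labels instead of dropping to .values. Each column of the boolean frame is combined with the D == 1 Series using Series-vs-Series &, which aligns on the row index. It assumes a recent pandas release and builds the frame directly rather than through read_csv. This is one possible spelling, not an official pandas API for row-wise logical ops.

```python
import pandas as pd

# Same data as the example above, built directly instead of parsed with read_csv.
df = pd.DataFrame(
    {"A": [0, 2, 3, 1, 1, 2, 3],
     "B": [1, 1, 0, 0, 0, 0, 5],
     "C": [0, 1, 1, 1, 1, 2, 2],
     "D": [1, 1, 0, 2, 1, 1, 3]},
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)

increased = df > df.shift(1)  # boolean DataFrame; comparisons against NaN are False
d_is_one = df["D"] == 1       # boolean Series sharing df's row index

# Combine every column with d_is_one; Series & Series aligns on the row index,
# so no NumPy-level broadcasting is needed.
mask = increased.apply(lambda col: col & d_is_one)
result = df.where(mask)
```

Only the cells at 1/2 and 1/6 in columns A and C survive, matching the .values-based output.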

Given that normal binary operators like addition or logical and work well between a pair of Series objects, or between a pair of DataFrame objects (returning an element-wise addition/conjunction), I found it surprising that I cannot do the same between a Series object and a DataFrame object.

Here's a demonstration of what doesn't work now and what would be the expected result: http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/52886258/000-qdoqud/Untitled0.ipynb


jtratner · 2013-10-20T18:43:09Z

Your example is hard to parse because Wakari escapes HTML and JS. Do you mind posting your example somewhere else or making it readable (maybe nbviewer.ipython.org would work?)

liori · 2013-10-20T18:55:03Z

My apologies, I didn't know Wakari does stuff like this. Here's an nbviewer link: http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/52886258/000-qdoqud/Untitled0.ipynb

jtratner · 2013-10-20T19:04:25Z

@liori thanks for re-posting that.

frame & series / series & frame (your #5): this is a known failure that we need to fix. I believe there's an issue open about it (#4615 is sort of related), but this one should definitely be kept open because it's not quite the same.

series + frame: this has been the behavior for a long time, because the Series is aligned on the DataFrame's columns first, not on its index. You can get around it by using frame.add(series, axis=0), but I personally agree that this is unexpected. It doesn't make sense to me that you'd want to broadcast a Series across rows by default, given that a DataFrame yields its columns as Series. Pretty sure that others will disagree.
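A minimal sketch of the difference, assuming a recent pandas version and made-up data. The + operator matches the Series index against the DataFrame's columns, while the flex method with axis=0 (equivalently axis='index') matches it against the row index. Passing axis=1 ('columns', the default) reproduces the operator behavior.

```python
import pandas as pd

frame = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]}, index=["a", "b", "c"])
row_values = pd.Series([100, 200, 300], index=["a", "b", "c"])

# Operator form: row_values' labels are matched against frame.columns.
# {"a", "b", "c"} shares nothing with {"x", "y"}, so every cell becomes NaN
# and the result's columns are the union of both label sets.
by_columns = frame + row_values

# Flex form: axis="index" matches row_values' labels against frame.index,
# so each row gets its own value added to every column.
by_index = frame.add(row_values, axis="index")
```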

jtratner · 2013-10-20T19:28:09Z

For reference, in R this sort of 'works as you expect':

> n = c(2, 3, 5)
> s = c('aa', 'bb', 'cc')
> b = c(TRUE, FALSE, TRUE)
> df = data.frame(n, s, b)
> df
  n  s     b
1 2 aa  TRUE
2 3 bb FALSE
3 5 cc  TRUE
> df * n
   n  s b
1  4 NA 2
2  9 NA 0
3 25 NA 5
Warning message:
In Ops.factor(left, right) : * not meaningful for factors
> df * df$n
   n  s b
1  4 NA 2
2  9 NA 0
3 25 NA 5
Warning message:
In Ops.factor(left, right) : * not meaningful for factors
>

Whereas in pandas it gives very strange errors (this is 0.12.0):

import pandas

n = [2, 3, 5]
s = ['aa', 'bb', 'cc']
b = [True, False, True]
df = pandas.DataFrame({'n': n, 's': s, 'b': b})

df * df['n']  # TypeError: Could not operate [array([ nan])] with block values [too many boolean indices]
df + n        # TypeError: Could not operate [array([5], dtype=int64)] with block values [too many boolean indices]
df * n        # "works", but n is paired with the (sorted) columns b, n, s rather than with the rows:

   b   n           s
0  2   6  aaaaaaaaaa
1  0   9  bbbbbbbbbb
2  2  15  cccccccccc
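For comparison with the R session above, here is how the two broadcasts might be spelled in a recent pandas. Only the numeric/boolean columns are used (the string column s is left out to keep the dtypes simple). The flex method with axis='index' gives the R-style row-wise result, whereas a plain list is paired with the columns.

```python
import pandas as pd

n = [2, 3, 5]
b = [True, False, True]
df = pd.DataFrame({"n": n, "b": b})

# R-style "df * df$n": multiply every column by n, matched row by row.
rowwise = df.mul(df["n"], axis="index")

# A plain list whose length equals the number of columns is paired with the
# columns instead: column n is multiplied by 2, column b by 3.
colwise = df * [2, 3]
```

The rowwise result reproduces the n and b columns of the R output (4, 9, 25 and 2, 0, 5).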

And with the original example, it feels weird that doing arithmetic with a selected column still results in the garbled output:

jtratner · 2013-10-20T19:28:26Z

frame = pandas.DataFrame({'Column': {1: True, 2: False, 3: True, 4: False},
                          'Another': {1: True, 2: True, 3: True, 4: False}})

frame
Out[25]: 
  Another Column
1    True   True
2    True  False
3    True   True
4   False  False

series = pandas.Series({1: True, 2: True, 3: False, 4: False})

frame + series
Out[27]: 
     1    2    3    4 Another Column
1  NaN  NaN  NaN  NaN     NaN    NaN
2  NaN  NaN  NaN  NaN     NaN    NaN
3  NaN  NaN  NaN  NaN     NaN    NaN
4  NaN  NaN  NaN  NaN     NaN    NaN

frame + frame['Another']
Out[28]: 
     1    2    3    4 Another Column
1  NaN  NaN  NaN  NaN     NaN    NaN
2  NaN  NaN  NaN  NaN     NaN    NaN
3  NaN  NaN  NaN  NaN     NaN    NaN
4  NaN  NaN  NaN  NaN     NaN    NaN

jreback · 2013-10-20T20:10:15Z

You need to use mul/add, which provide for explicit alignment; that's what they are for.

liori · 2013-10-20T20:15:06Z

@jreback, there's no equivalent for __and__.

jtratner · 2013-10-20T20:22:01Z

There is in 0.13 - it's called and_.

jreback · 2013-10-20T20:42:09Z

@liori, to be honest, your example is aligning correctly: a DataFrame and a Series align the Series index against the DataFrame columns, so this result is correct.

You can use add/mul to force explicit alignment if you wish, but keep in mind that what you are expecting is not the natural alignment here.

jtratner · 2013-10-20T20:45:17Z

@jreback, is there any way we could improve the error messages to advise using the arithmetic flex methods? Maybe we could also warn when you're going to get something like this (since this is probably never what you want). I'm thinking specifically of the case where you union a Series index with DataFrame columns:

     1    2    3    4 Another Column
1  NaN  NaN  NaN  NaN     NaN    NaN
2  NaN  NaN  NaN  NaN     NaN    NaN
3  NaN  NaN  NaN  NaN     NaN    NaN
4  NaN  NaN  NaN  NaN     NaN    NaN

e.g. warn("Arithmetic with Series and DataFrame align along columns, use %s() method to explicitly align on Index" % name.strip("_"))

I also find it confusing that you can't actually do arithmetic with the whole dataframe when you select out a column.

jtratner · 2013-10-20T20:46:59Z

So, to be clear, this broadcasts:

pd.Series([False, True], index=['Another', 'Column'])
Out[46]: 
Another    False
Column      True
dtype: bool

ser = _

frame * ser
Out[48]: 
  Another Column
1   False   True
2   False  False
3   False   True
4   False  False

jreback · 2013-10-20T20:47:47Z

Yep, it could use a better error message, but it can't be right all the time: imagine a DataFrame whose index and columns are both 1-4; then it's ambiguous. Most of the time, though, if there is a length/index-type mismatch, it is an incorrect alignment.

jtratner · 2013-10-20T20:49:37Z

I agree, there are certainly ambiguous cases. But we could warn whenever you have the case of Series + DataFrame with no elements overlapping between columns and Series index. I think that would've headed that off. If you're playing around with pandas / have loaded from some IO source, I'd assume that your columns will be string-like and index will be integer-like (or at least different than cols) so it would cover majority of cases.
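A rough sketch of the check proposed here. Both function names are hypothetical helpers, not pandas API. The predicate flags the case where no Series label matches any DataFrame column, and the wrapper warns before falling back to the flex method's default column alignment.

```python
import warnings

import pandas as pd


def looks_misaligned(frame, series):
    """Hypothetical check: True when no Series label matches any DataFrame column."""
    return frame.columns.intersection(series.index).empty


def flex_with_warning(frame, series, name):
    """Hypothetical wrapper around a flex method such as 'add' or 'mul'."""
    if looks_misaligned(frame, series):
        warnings.warn(
            "Arithmetic between a Series and a DataFrame aligns on columns; "
            f"use the {name}() method with axis='index' to align on the index",
            stacklevel=2,
        )
    # Default axis='columns', i.e. the same alignment the plain operator uses.
    return getattr(frame, name)(series)
```

As noted above, this cannot catch the ambiguous case where the index and columns share labels; it only targets the common "string columns, integer index" situation.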

liori · 2013-10-20T20:49:50Z

@jreback: I just wanted to reuse my knowledge of R data frames in pandas, especially given that pandas is described as a library bringing the data analysis workflow from “languages like R” to Python. But if pandas doesn't actually work the same way, that's fine with me; just please make the error messages clear.

jtratner · 2013-10-20T20:50:19Z

And I guess you'd want to say this:

warn("Arithmetic between a Series and a DataFrame aligns on columns; " "use the %s() method with axis='index' to explicitly align on the index" % name.strip("_"))

jtratner · 2013-10-20T20:53:00Z

@liori I have little experience with R. How would you broadcast along rows rather than along columns, and vice versa? There are a few sections on comparisons with R; it would probably be helpful to add that there. (And at least in 0.13 you get the relatively comprehensible frame.and_(series, axis='index').)

liori · 2013-10-20T21:11:49Z

@jtratner: It can be done using an apply-type of method, or by using a different data type. For example:

> frame <- data.frame(column=c(TRUE, FALSE, TRUE, FALSE), another=c(TRUE, TRUE, TRUE, FALSE))
> frame
  column another
1   TRUE    TRUE
2  FALSE    TRUE
3   TRUE    TRUE
4  FALSE   FALSE
> lst <- list(column=FALSE, another=TRUE)
> lst
$column
[1] FALSE

$another
[1] TRUE

> data.frame(frame & lst)
  column another
1  FALSE    TRUE
2  FALSE    TRUE
3  FALSE    TRUE
4  FALSE   FALSE

BTW, note that in R a data.frame object is just a specialized list of vectors. So frame & lst is just a typical element-wise (list of vectors) vs. (list of scalars) operation, whereas frame & series is a (list of vectors) vs. vector operation.
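The pandas counterpart of the R list above would be a Series labeled by column name. A sketch, assuming a recent pandas version in which & between a DataFrame and a Series aligns the Series index with the columns, as arithmetic does:

```python
import pandas as pd

frame = pd.DataFrame(
    {"column": [True, False, True, False],
     "another": [True, True, True, False]},
    index=[1, 2, 3, 4],
)

# Like the R list: one scalar per column, labeled by column name
# (same order as frame.columns).
per_column = pd.Series({"column": False, "another": True})

# Each scalar is broadcast down the column with the matching label.
result = frame & per_column
```

This reproduces the R result: column is all FALSE and another is unchanged.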

jtratner · 2013-10-20T22:12:00Z

Thanks, that's helpful! I'll try to put some more comparisons together so they're there for people to reference (and hopefully you can take a look then). My impression is that the usage of both is conceptually similar, even if they default to aligning on different axes. Anything else you notice that's confusing would be helpful to add to the docs on R vs pandas; you can ask here or on the pydata mailing list.

unutbu · 2014-02-07T20:43:21Z

Related use case: http://stackoverflow.com/q/21627926/190597

Broadcasting an equality test between a DataFrame and a Series:

df == rowmax

was replaced with

df.values == rowmax[:,None]
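The flex comparison method DataFrame.eq accepts an axis argument, so the same test can be written without leaving pandas. A sketch with illustrative data, assuming a recent pandas version (to_numpy needs 0.24 or later):

```python
import pandas as pd

# Illustrative data: mark, in each row, which columns hold the row maximum.
df = pd.DataFrame({"cat1": [0, 3, 1], "cat2": [2, 0, 1], "cat3": [2, 1, 0]})
rowmax = df.max(axis=1)

# axis="index" matches rowmax's labels against df's row index,
# so every row is compared with its own maximum.
is_max = df.eq(rowmax, axis="index")

# NumPy equivalent of the workaround above: rowmax as a column vector.
is_max_np = df.to_numpy() == rowmax.to_numpy()[:, None]
```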

jreback · 2014-02-07T22:32:19Z

@unutbu this is a bit tricky....

You will want to emulate something like this:

import numpy as np
from pandas import DataFrame, Series

df = DataFrame(np.random.randn(5,2),columns=list('ab'))
s = Series(np.arange(5))

df.mul(s,axis='index')

What you want to create is a set of functions exactly like mul/add etc., called eq/and/or, which are called in exactly the same way, except that their underlying functions are operator.eq, operator.and_, ...

df.mul calls this (which then uses the arguments to align the series and such), and that is all done already:
https://github.com/pydata/pandas/blob/master/pandas/core/ops.py#L759

You just need to add the functions eq/and/or to the frame in a similar manner to how mul/add are added (it's a 'bit' magical, but not crazy).
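To illustrate the idea only (this is not how pandas/core/ops.py actually wires things up), a hypothetical factory can turn any binary operator function into a flex-style method that honors axis='index' for Series arguments:

```python
import operator

import pandas as pd


def make_flex_method(op):
    """Hypothetical factory: turn a binary operator function into a flex-style method."""
    def flex(frame, other, axis="columns"):
        if isinstance(other, pd.Series) and axis in (0, "index"):
            # Combine every column with `other`; Series-vs-Series ops match on the
            # row index. Comparison ops such as operator.eq additionally require
            # identically labeled indexes.
            return frame.apply(lambda col: op(col, other))
        # Otherwise defer to the operator, which matches other's index
        # against frame.columns.
        return op(frame, other)
    return flex


# Names chosen for illustration only.
eq = make_flex_method(operator.eq)
and_ = make_flex_method(operator.and_)
or_ = make_flex_method(operator.or_)
```

Deferring to the plain operator for axis='columns' keeps the sketch consistent with how the operators already align.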

that's it (plus tests of course)!

lmk

unutbu · 2014-02-07T22:59:48Z

@jreback: Okay, I'll give it a go...

unutbu · 2014-02-10T12:51:56Z

@jreback: Am I missing something, or do eq, __and__, and __or__ already work as desired?

import pandas as pd

df = pd.DataFrame({'cat1':[0,3,1], 'cat2':[2,0,1], 'cat3':[2,1,0]})
rowmax = df.max(axis=1)
df.eq(rowmax, axis='index')
df.__and__(rowmax, axis='index')
df.__or__(rowmax, axis='index')

jreback · 2014-02-10T13:07:56Z

Hmm, maybe we just need to make and/or be the same as those methods, so it's a trivial fix then.

unutbu · 2014-02-10T13:30:41Z

Python syntax prevents the keywords 'and' and 'or' from being used as attribute names.

jreback · 2014-02-18T02:25:11Z

@unutbu, from what I read above, I think and_ and or_ are defined... hmm, not documented though.

unutbu · 2014-02-18T03:02:44Z

@jreback: If I understand correctly, and_ and or_ define the __and__ and __or__ attributes, because of this code. I could add a quick mention of __and__ and __or__ to the docs around here. Perhaps __and__ and __or__ should have their own page? Are they automatically generated? I don't know how that is done.

By the way, I'm still working on fixing the nan-sort PR; it's failing nosetests after rebasing...

jreback · 2016-02-13T14:22:48Z

So this is causing a warning here: https://github.com/pydata/pandas/blob/master/pandas/tests/series/test_operators.py#L1200

because df & s raises a ValueError; it is really doing df.__and__(s, axis='columns'),

whereas the alignment should be df.__and__(s, axis='index'), but of course & is going to align on the columns by default.

jreback modified the milestones: 0.15.0, 0.14.0 Mar 30, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

jreback modified the milestones: Next Major Release, 0.16.0 Mar 3, 2015

jreback added the Indexing label (related to indexing on series/frames, not to indexes themselves) Feb 13, 2016

jbrockmendel mentioned this issue Jan 30, 2018 in Continue de-nesting core.ops #19448 (merged)

jbrockmendel mentioned this issue Aug 6, 2018 in DataFrame vs Series vs Index arithmetic Roundup #18824 (closed)

jbrockmendel mentioned this issue Oct 2, 2019 in BUG: Fix DataFrame logical ops Series inconsistency #28741 (merged)

jreback modified the milestones: Contributions Welcome, 1.0 Oct 2, 2019

jreback closed this as completed in #28741 Oct 5, 2019
