Where method on DataFrames can't seem to take a Series as the condition #9558

Closed
MaximilianR opened this Issue Feb 26, 2015 · 5 comments

Contributor

MaximilianR commented Feb 26, 2015

I want to use .where() to set values across a DataFrame in the rows where a Series meets a condition.

For example:

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

But this

df.where(df['a']==2,0)

returns the DataFrame unchanged, whereas this returns just the middle row, as you'd expect:

df[df['a']==2]

Supplying axis=0 or axis=1 doesn't help. Am I doing something wrong, or is this unintended behavior?

Versions
python: 2.7.6.final.0
pandas: 0.14.1
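A minimal sketch of one way to get the intended result without .where(), assuming a recent pandas release rather than the 0.14.1 reported above: use the boolean Series as a row indexer with .loc and assign 0 to every column of the non-matching rows. This mutates a copy, so the original df is left alone.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Boolean Series aligned on df's index: True only for the row where a == 2.
keep = df['a'] == 2

# Work on a copy so df itself is not modified.
result = df.copy()
# Select every row where the condition is False and set all its columns to 0.
result.loc[~keep] = 0
```

Here result keeps the middle row (2, 5) and zeroes the other two rows.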

Contributor

TomAugspurger commented Feb 26, 2015

I believe the cond argument to df.where has to be the same shape as the original df.
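To illustrate the same-shape requirement, here is a small sketch (assuming a recent pandas) where cond is a boolean DataFrame built elementwise from df itself, so its shape and labels match exactly and .where() applies it cell by cell.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Elementwise condition: a boolean DataFrame with the same shape as df.
cond = df % 2 == 0
assert cond.shape == df.shape

# Keep even values; replace odd values with 0.
result = df.where(cond, 0)
```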

Contributor

MaximilianR commented Feb 26, 2015

@TomAugspurger, that's correct.

Is that the intended behavior, though? I would expect the condition to broadcast across the index, in the same way the boolean-indexing example (df[df['a']==2]) does.
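Until a Series condition is broadcast automatically, a minimal workaround sketch (assuming a recent pandas and Python 3.7+ dict ordering) is to do the broadcasting by hand: repeat the row mask once per column to build a DataFrame-shaped condition, then pass that to .where().

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# One boolean per row, indexed like df.
row_mask = df['a'] == 2

# Broadcast the row mask across every column: each column of cond is the
# same Series, so cond has df's index, df's columns, and df's shape.
cond = pd.DataFrame({col: row_mask for col in df.columns})

# Rows where the mask is False are replaced with 0 in every column.
result = df.where(cond, 0)
```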

Contributor

jreback commented Feb 27, 2015

@MaximilianR the reason is that .where() is shape-preserving: in other words, it puts NaN wherever the condition is not met.

df[df['a']==2], by contrast, can change the shape of the result, e.g. by dropping rows.
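A short sketch of that difference, assuming a recent pandas: .where() returns an object with the same shape as df and fills non-matching cells with NaN, while boolean indexing returns only the matching rows.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Shape-preserving: every cell is kept, failing cells become NaN
# (which upcasts the integer columns to float).
masked = df.where(df > 2)

# Filtering: rows whose condition is False are dropped entirely.
filtered = df[df['a'] == 2]
```

masked still has 3 rows and 2 columns, while filtered has a single row.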

FYI, df.where(df['a']==2) raises (with an odd error message), so we probably need to do more checking for shape compatibility (right now it just does a .reindex()).
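As an illustration of the kind of up-front check being discussed, here is a hypothetical wrapper (where_same_shape is an illustrative name, not a pandas API) that rejects a condition whose shape differs from the DataFrame before calling .where(). It is only a sketch: it compares shapes, not labels, so a same-shape condition with different labels would still be realigned by .where().

```python
import numpy as np
import pandas as pd


def where_same_shape(df, cond, other=np.nan):
    """Hypothetical helper: refuse conditions whose shape differs from df."""
    if cond.shape != df.shape:
        raise ValueError(f"cond has shape {cond.shape}, expected {df.shape}")
    return df.where(cond, other)


df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# A boolean Series has shape (3,), not (3, 2), so it is rejected with a clear message.
try:
    where_same_shape(df, df['a'] == 2, 0)
    series_rejected = False
except ValueError:
    series_rejected = True

# A same-shape boolean DataFrame passes straight through to .where().
ok = where_same_shape(df, df > 2, 0)
```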

jreback added this to the 0.17.0 milestone Feb 27, 2015

Contributor

MaximilianR commented Feb 27, 2015

@jreback Thanks!

Contributor

MaximilianR commented Oct 28, 2015

This was closed by #10283. Closing.
