Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Boolean indexed assignment #4192

Closed
SleepingPills opened this issue Jul 10, 2013 · 9 comments · Fixed by #4195
Closed

BUG: Boolean indexed assignment #4192

SleepingPills opened this issue Jul 10, 2013 · 9 comments · Fixed by #4195
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@SleepingPills
Copy link
Contributor

Take the simple example below:

s = pd.Series([1.0, 2.0], dtype=float)
s[[True, False]] = np.array([2.0])

The expectation would be that the first item in the series will be set to 2. Unfortunately due to a bug in broadcasting, pandas will set the first item in the series to 2*len(s). The culprit appears to be the below code (series.py:757 - where() method definition):

if len(other) == 1:
    other = np.array(other[0]*len(ser))

other in this case is the length 1 array we need to broadcast. I believe the code made the assumption that other will be perhaps a list of lists? As it stands, this code will take the single element out of the array, multiply it by the length of the series and then carry out the update.

I propose that we fix this by using np.repeat which does the correct thing (pending testing).

@SleepingPills
Copy link
Contributor Author

It appears that this issue is a side effect of #2745.

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

this is very tricky...trying to do the right thing in all cases

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

go ahead and try the fix, the existing tests should catch if you change the behavior....

but I am thinking

s = Series([1,2])
s[[True,False]] = [0]

works by accident (as 0 * 2 == 0)....

@SleepingPills
Copy link
Contributor Author

I can't see what was the original code trying to achieve there. At no point before does it ensure that other will be a nested list. It must be a list or tuple, since the logic would break down for numpy arrays as well.

@SleepingPills
Copy link
Contributor Author

Or rather, I see what it was trying to achieve, but I can't see why it was doing it the way it is written :)

@SleepingPills
Copy link
Contributor Author

Ok, I'll do the fix and add a bunch of tests. I'll try to think of some other edge cases as well.

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

great! always nice to have more tests

@SleepingPills
Copy link
Contributor Author

Any chance of shipping this in 0.12? When is the deadline for submissions? This is a pretty nasty one and I'd like to make sure 0.12 isn't affected by it.

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

sure....do it quick! (should be straightforward)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants