Bug in set_index or drop #2101

Closed
jseabold opened this issue Oct 22, 2012 · 7 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

@jseabold
Contributor

I'm not sure whether the bug is in the drop or whether it shouldn't have let me set this MultiIndex, since it's non-unique. var1 is just a combination of var2 and var3. In any event, the result this gives back is garbage. I wanted to drop all the rows where the count of var1 == 1. Since there's not, to my knowledge, a way to drop variables without setting them to an index, I tried to do this without thinking:

df = pandas.DataFrame([["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
                       ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
                       ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
                       ["x-b", "x", "b", 2.2],
                       ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
                      columns=["var1", "var2", "var3", "var4"])

grp_size = df.groupby("var1").size()
drop_idx = grp_size.ix[grp_size == 1]

df.set_index(["var1", "var2", "var3"]).drop(drop_idx.index, level=0).reset_index()
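
For readers on a recent pandas, the same filtering can probably be done without building an index at all. The following is only a sketch under that assumption: it relies on `transform("size")` being available on a grouped column, and the `counts` and `result` names are illustrative, not part of the original report.

```python
import pandas as pd

df = pd.DataFrame(
    [["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
     ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
     ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
     ["x-b", "x", "b", 2.2],
     ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
    columns=["var1", "var2", "var3", "var4"],
)

# Broadcast each group's row count back onto the original rows.
counts = df.groupby("var1")["var1"].transform("size")

# Keep only the rows whose var1 value occurs more than once.
result = df[counts > 1]
print(result)
```

Because the mask is aligned row by row with `df`, duplicate values in var1 are not a problem here.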
@lodagro
Contributor

lodagro commented Oct 22, 2012

Garbage indeed.

MO <--> #2064

df.drop(df.index[df['var1'].isin(drop_idx)])

  var1 var2 var3  var4
0  x-a    x    a   1.5
1  x-a    x    a   1.2
2  z-c    z    c   3.1
3  x-a    x    a   4.1
4  x-b    x    b   5.1
5  x-b    x    b   4.1
6  x-b    x    b   2.2
7  y-a    y    a   1.2
8  z-b    z    b   2.1

@jseabold
Contributor Author

Ah, nice one-liner. Missed your comment on my other issue.

@changhiskhan
Contributor

@lodagro I think it needs to be df.drop(df.index[df['var1'].isin(drop_idx.index)]) right?

@jseabold doesn't seem like a problem in drop (which just calls reindex). I'm checking it out now.

yeah, looks like it should have raised Exception for non-unique here.
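
Spelled out in full, the corrected one-liner might look like the sketch below. It assumes a pandas version where `.ix` is no longer available, so plain boolean indexing is used to build `drop_idx`; everything else follows the snippets in this thread.

```python
import pandas as pd

df = pd.DataFrame(
    [["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
     ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
     ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
     ["x-b", "x", "b", 2.2],
     ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
    columns=["var1", "var2", "var3", "var4"],
)

grp_size = df.groupby("var1").size()
drop_idx = grp_size[grp_size == 1]  # boolean indexing instead of .ix

# isin is given the group labels (drop_idx.index), not the counts.
result = df.drop(df.index[df["var1"].isin(drop_idx.index)])
print(result)
```

With `drop_idx.index` the singleton groups (y-a, z-b, z-c) are actually removed, whereas passing `drop_idx` itself compares var1 against the counts and drops nothing.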

@lodagro
Contributor

lodagro commented Nov 2, 2012

@changhiskhan indeed!
I probably got confused by the fact that membership testing on a Series is dict-like (so the keys matter), whereas for isin testing on a Series the values matter. (And I didn't check the result thoroughly.)

In [55]: drop_idx
Out[55]: 
var1
y-a     1
z-b     1
z-c     1

In [56]: s = pd.Series(['y-a', 1])

In [57]: s.isin(drop_idx)
Out[57]: 
0    False
1     True

In [59]: for x in ['y-a', 1]:
   ....:     print x, x in drop_idx
   ....:     
y-a True
1 False
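
The same comparison as a standalone script, written for Python 3. This is a sketch that rebuilds `drop_idx` by hand from the output shown above; the `label_hits`, `value_hits` and `fixed_hits` names are only for illustration.

```python
import pandas as pd

# Rebuild drop_idx as displayed above: labels are var1 values, values are counts.
drop_idx = pd.Series([1, 1, 1],
                     index=pd.Index(["y-a", "z-b", "z-c"], name="var1"))
s = pd.Series(["y-a", 1])

# `in` on a Series is dict-like: it tests the index labels.
label_hits = [x in drop_idx for x in ["y-a", 1]]

# isin tests against the values of whatever is passed in.
value_hits = s.isin(drop_idx).tolist()

# Passing the index explicitly makes isin compare against the labels.
fixed_hits = s.isin(drop_idx.index).tolist()

print(label_hits, value_hits, fixed_hits)
```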

@ghost ghost assigned wesm Nov 2, 2012
@wesm
Member

wesm commented Nov 2, 2012

I think Skipper's code should work. I'll have a look

@changhiskhan
Contributor

Maybe you can just call take instead of reindex and get around the non-uniqueness problem.
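
A rough illustration of the take idea from user code, not the change that was eventually committed: compute the integer positions of the rows to keep and select them positionally, so the non-unique index never has to be looked up by label. The `keep` and `positions` names are hypothetical, and the sketch assumes a recent pandas and NumPy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
     ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
     ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
     ["x-b", "x", "b", 2.2],
     ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
    columns=["var1", "var2", "var3", "var4"],
)

grp_size = df.groupby("var1").size()
drop_idx = grp_size[grp_size == 1]

indexed = df.set_index(["var1", "var2", "var3"])  # non-unique MultiIndex

# Boolean mask over level 0, then the integer positions of the rows to keep.
keep = ~indexed.index.get_level_values(0).isin(drop_idx.index)
positions = np.flatnonzero(keep)

# take selects by position, so duplicate labels do not matter.
result = indexed.take(positions).reset_index()
print(result)
```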

@wesm wesm closed this as completed in 1b23b6f Nov 3, 2012
@wesm
Member

wesm commented Nov 3, 2012

It's a bit quick-and-dirty, but it gets the job done.
