Bug in set_index or drop #2101

Closed
jseabold opened this issue Oct 22, 2012 · 7 comments

@jseabold
Contributor

commented Oct 22, 2012

I'm not sure whether the bug is in drop, or whether set_index shouldn't have let me set this MultiIndex in the first place, since it's non-unique. var1 is just a combination of var2 and var3. In any event, the result this gives back is garbage. I wanted to drop all the rows where the count of var1 == 1. Since there's not, to my knowledge, a way to drop rows by a column's values without setting that column as an index, I tried this without thinking:

df = pandas.DataFrame([["x-a", "x", "a", 1.5],["x-a", "x", "a", 1.2],
                        ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
                       ["x-b", "x", "b", 5.1],["x-b", "x", "b", 4.1],
                       ["x-b", "x", "b", 2.2],
                       ["y-a", "y", "a", 1.2],["z-b", "z", "b", 2.1]],
                       columns=["var1", "var2", "var3", "var4"])

grp_size = df.groupby("var1").size()
drop_idx = grp_size.ix[grp_size == 1]

df.set_index(["var1", "var2", "var3"]).drop(drop_idx.index, level=0).reset_index()
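For reference, the stated goal (dropping rows whose var1 value occurs only once) can be reached without setting an index at all. The following is a minimal sketch, assuming a recent pandas release; the names `group_size` and `result` are illustrative and do not come from the thread.

```python
import pandas as pd

df = pd.DataFrame(
    [["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
     ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
     ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
     ["x-b", "x", "b", 2.2],
     ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
    columns=["var1", "var2", "var3", "var4"],
)

# Size of each var1 group, broadcast back onto the original rows.
group_size = df.groupby("var1")["var1"].transform("size")

# Keep only the rows whose var1 value occurs more than once.
result = df[group_size > 1]
print(result)
```

Because the filter is a boolean mask aligned on the original row labels, the non-unique MultiIndex never enters the picture.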
@lodagro

Contributor

commented Oct 22, 2012

Garbage indeed.

MO <--> #2064

df.drop(df.index[df['var1'].isin(drop_idx)])

  var1 var2 var3  var4
0  x-a    x    a   1.5
1  x-a    x    a   1.2
2  z-c    z    c   3.1
3  x-a    x    a   4.1
4  x-b    x    b   5.1
5  x-b    x    b   4.1
6  x-b    x    b   2.2
7  y-a    y    a   1.2
8  z-b    z    b   2.1
@jseabold

Contributor Author

commented Oct 22, 2012

Ah, nice one-liner. Missed your comment on my other issue.

@changhiskhan

Contributor

commented Nov 2, 2012

@lodagro I think it needs to be df.drop(df.index[df['var1'].isin(drop_idx.index)]) right?

@jseabold doesn't seem like a problem in drop (which just calls reindex). I'm checking it out now.

Yeah, looks like it should have raised an Exception for the non-unique index here.
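To make the difference between the two one-liners concrete, here is a small sketch, assuming a recent pandas release. It uses plain boolean indexing on `grp_size` in place of `.ix`, which was removed in later pandas versions; the names `unchanged` and `fixed` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
     ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
     ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
     ["x-b", "x", "b", 2.2],
     ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
    columns=["var1", "var2", "var3", "var4"],
)

grp_size = df.groupby("var1").size()
drop_idx = grp_size[grp_size == 1]  # Series: var1 label -> count of 1

# isin() compares against the *values* of drop_idx (the counts),
# so no var1 string matches and nothing is dropped.
unchanged = df.drop(df.index[df["var1"].isin(drop_idx)])

# Passing drop_idx.index compares against the var1 labels instead.
fixed = df.drop(df.index[df["var1"].isin(drop_idx.index)])

print(len(unchanged), len(fixed))
```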

@lodagro

Contributor

commented Nov 2, 2012

@changhiskhan indeed!
I probably got confused by the fact that membership testing (`in`) on a Series is dict-like, so the keys matter, whereas `isin` on a Series tests against the values. (And I didn't check the result thoroughly.)

In [55]: drop_idx
Out[55]: 
var1
y-a     1
z-b     1
z-c     1

In [56]: s = pd.Series(['y-a', 1])

In [57]: s.isin(drop_idx)
Out[57]: 
0    False
1     True

In [59]: for x in ['y-a', 1]:
   ....:     print x, x in drop_idx
   ....:     
y-a True
1 False

@ghost ghost assigned wesm Nov 2, 2012

@wesm

Member

commented Nov 2, 2012

I think Skipper's code should work. I'll have a look.

@changhiskhan

Contributor

commented Nov 2, 2012

Maybe you can just call take instead of reindex and get around the non-uniqueness problem.
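The idea of selecting by position rather than by label can be illustrated from user code. This is only a sketch under stated assumptions (a recent pandas release, illustrative names `level0` and `keep_positions`); it is not a description of what the fix in pandas itself does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["x-a", "x", "a", 1.5], ["x-a", "x", "a", 1.2],
     ["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
     ["x-b", "x", "b", 5.1], ["x-b", "x", "b", 4.1],
     ["x-b", "x", "b", 2.2],
     ["y-a", "y", "a", 1.2], ["z-b", "z", "b", 2.1]],
    columns=["var1", "var2", "var3", "var4"],
)

grp_size = df.groupby("var1").size()
drop_idx = grp_size[grp_size == 1]

indexed = df.set_index(["var1", "var2", "var3"])  # non-unique MultiIndex

# Integer positions of the rows whose level-0 label is NOT being dropped.
level0 = indexed.index.get_level_values(0)
keep_positions = np.flatnonzero(~level0.isin(drop_idx.index))

# take() selects by position, so duplicate labels cause no ambiguity.
result = indexed.take(keep_positions).reset_index()
print(result)
```

Positions are unique even when labels are not, which is why a positional take sidesteps the problem that a label-based reindex runs into.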

@wesm wesm closed this in 1b23b6f Nov 3, 2012

@wesm

Member

commented Nov 3, 2012

It's a bit quick-and-dirty, but it gets the job done.
