get_group sometimes throws an exception when using an index of tuples with different lengths #8121
Comments
|
I'll take a look. While you're probably right that this shouldn't throw an exception, storing containers in DataFrames is usually frowned upon. Something like
In [14]: gr = df.groupby(pd.factorize(df.ids)[0])
In [15]: for i in gr.size().index:
   ....:     print(i)
   ....:     gr.get_group(i)
   ....:
0
1
is usually better (faster and, I think, clearer). pd.factorize also returns the unique labels as its second element, if you need those. |
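A minimal, self-contained sketch of the factorize approach suggested above, assuming a frame shaped like the one in this report (the column name `ids` and the sample tuples are taken from the thread; the variable names `codes` and `uniques` are only illustrative). It groups on the integer codes instead of on the tuples themselves, then maps each code back to its original tuple:

```python
import pandas as pd

# Column of tuples with different lengths, as in the report.
df = pd.DataFrame({"ids": pd.Series([(1,), (1, 2), (1,), (1, 2)])})

# factorize returns (integer codes, unique values); passing the raw
# object array keeps `uniques` as a plain NumPy array of tuples.
codes, uniques = pd.factorize(df["ids"].values)

# Group on the integer codes rather than on the tuples.
gr = df.groupby(codes)

for code in gr.size().index:
    # uniques[code] recovers the original tuple for this group.
    print(uniques[code])
    print(gr.get_group(code))
```

Because the group keys are plain integers here, `get_group` never has to decide whether a tuple key refers to one label or to several index levels.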
|
@dwiel This is what you're expecting, right?
In [1]: good = pd.DataFrame([[1, 1, 1, 1], ['a', 'b', 'a', 'b']]).T
In [2]: bad = pd.DataFrame(pd.Series([(1,), (1, 2), (1,), (1, 2)]), columns=['ids'])
In [3]: gg = good.groupby([0, 1])
In [4]: gb = bad.groupby('ids')
In [5]: good
Out[5]:
0 1
0 1 a
1 1 b
2 1 a
3 1 b
In [6]: bad
Out[6]:
ids
0 (1,)
1 (1, 2)
2 (1,)
3 (1, 2)
In [9]: def run(gr):
   ...:     for i in gr.size().index:
   ...:         print(i)
   ...:         print(gr.get_group(i))
   ...:
In [10]: run(gg)
(1, 'a')
0 1
0 1 a
2 1 a
(1, 'b')
0 1
1 1 b
3 1 b
In [11]: run(gb)
(1,)
ids
0 (1,)
2 (1,)
(1, 2)
ids
1 (1, 2)
3 (1, 2) |
|
The factorize code does appear to do what I want. As for your second comment, yes, that is how I would expect it to work. |
TomAugspurger closed this in #8123 on Aug 28, 2014
|
Should be fixed now. Like I said, you're probably better off with Thanks for the report! |
|
Thanks!
|
dwiel commented Aug 27, 2014
Here is a simple test case that exposes the problem:
The issue is that in _get_index of GroupBy, these lines assume that if there is a tuple in the index, then the index is a multi-index, which isn't true in the above test case. Maybe there is some other way to detect that the values come from a multi-index? Or should pandas explicitly not support tuples in this situation (in the index of a groupby)?
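To illustrate the ambiguity described above, here is a small sketch that builds both kinds of index directly. The helper `keys_are_multiindex` is hypothetical and is not necessarily how the fix in #8123 works; it only shows that checking the index type distinguishes the two cases, whereas checking whether a label is a tuple does not:

```python
import numpy as np
import pandas as pd


def keys_are_multiindex(index):
    """Hypothetical helper: True only for a real MultiIndex."""
    return isinstance(index, pd.MultiIndex)


# A flat index whose labels happen to be tuples of different lengths.
# Filling an object array first avoids any automatic tuple handling.
labels = np.empty(2, dtype=object)
labels[0] = (1,)
labels[1] = (1, 2)
flat = pd.Index(labels)

# A genuine two-level index, as produced by grouping on two keys.
multi = pd.MultiIndex.from_tuples([(1, "a"), (1, "b")])

# Both indexes yield tuples when indexed, so "is the label a tuple?"
# cannot tell them apart; the index type can.
print(isinstance(flat[0], tuple), keys_are_multiindex(flat))
print(isinstance(multi[0], tuple), keys_are_multiindex(multi))
```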