index.levels not being updated by groupby #2655

michaelaye opened this Issue Jan 8, 2013 · 6 comments

4 participants




t1    0
t2    0
t3    1
t4    1
t5    1
Name: D


grouped = df.groupby('D')
for i, j in grouped:
    # i is the group key, j is the sub-DataFrame for that key
    print('D:', i)
    print('Actual index[2]:', j.index[0][2])
    print('First element of levels[2]:', j.index.levels[2][0])


D: 0.0
Actual index[2]: t1
First element of levels[2]: t1
D: 1.0
Actual index[2]: t3
First element of levels[2]: t1
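The construction of df is not shown above, so here is a self-contained reproducer sketch. The data is hypothetical (three index levels, level 2 holding t1 to t5, column D splitting the rows into groups 0 and 1), and it assumes a modern pandas on Python 3. Because D is an integer column here, the group keys print as 0 and 1 rather than 0.0 and 1.0.

```python
import pandas as pd

# Hypothetical data: a three-level MultiIndex whose level 2 holds t1..t5,
# with a column D that splits the rows into two groups.
index = pd.MultiIndex.from_tuples(
    [("a", "x", "t1"), ("a", "x", "t2"), ("a", "y", "t3"),
     ("b", "y", "t4"), ("b", "y", "t5")],
    names=["k0", "k1", "k2"],
)
df = pd.DataFrame({"D": [0, 0, 1, 1, 1]}, index=index)

results = {}
for key, chunk in df.groupby("D"):
    first_actual = chunk.index[0][2]         # label actually present in the chunk
    first_level = chunk.index.levels[2][0]   # first entry of the inherited level table
    results[key] = (first_actual, first_level)
    print("D:", key)
    print("Actual index[2]:", first_actual)
    print("First element of levels[2]:", first_level)
```

For the D == 1 group the first label in the chunk is t3, while levels[2] still starts with t1, which matches the output reported above.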



The workaround is index.get_level_values(2).unique() (I think?), so maybe index.levels is an obsolete API?
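A minimal sketch of that workaround, on hypothetical data and assuming a modern pandas: levels[2] still lists every label of the parent index, while get_level_values(2).unique() returns only the labels that occur in the subset.

```python
import pandas as pd

# Hypothetical three-level index, same shape as in the report above.
index = pd.MultiIndex.from_tuples(
    [("a", "x", "t1"), ("a", "x", "t2"), ("a", "y", "t3"),
     ("b", "y", "t4"), ("b", "y", "t5")],
)
df = pd.DataFrame({"D": [0, 0, 1, 1, 1]}, index=index)

chunk = df[df["D"] == 1]   # keeps only the rows labelled t3, t4, t5

# levels[2] is the full lookup table inherited from the parent index.
all_labels = chunk.index.levels[2].tolist()
# get_level_values(2) maps each row through its code, so it holds only
# the labels that actually occur; unique() drops repeats.
observed = chunk.index.get_level_values(2).unique().tolist()
print(all_labels)   # ['t1', 't2', 't3', 't4', 't5']
print(observed)     # ['t3', 't4', 't5']
```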

Python pandas project member

I'm not convinced that discarding the other reference values for the "categorical variable" by default is correct. Kicking this can down the road; for now I would use the workaround whenever you need to know the level values actually observed in a chunk of the data.


So, the groupby problem in #2770 shows that there is a lurking bug, no? Deleted rows should no longer appear. Does that mean the groupby algorithm uses index.levels instead of working with the index itself or with index.get_level_values?


Is 10 months of can-kicking down the road enough?


Right now, I don't consider this a bug. Can you help me understand why an end user needs to care about what is actually in the levels?

To be clear, if we don't update them, we can share the levels indexes between all the views and copies of this MI, instead of allocating new ndarrays (and hash tables?) for each.
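To illustrate what "not updating the levels" means, here is a sketch assuming pandas 0.24 or later, where the integer position arrays are exposed as MultiIndex.codes (older releases called them labels). Slicing keeps the parent's level tables unchanged and slices only the codes, so unused entries remain in levels. The two-level index is hypothetical.

```python
import pandas as pd

# Hypothetical two-level index.
index = pd.MultiIndex.from_tuples(
    [("a", "t1"), ("a", "t2"), ("b", "t3"), ("b", "t4")],
)
sub = index[2:]   # a slice that keeps only the last two rows

# The slice keeps the parent's level table as-is...
level_labels = sub.levels[1].tolist()
# ...and only the integer codes (positions into that table) are sliced.
level_codes = sub.codes[1].tolist()
# Decoding the codes against the levels recovers the observed labels.
decoded = [sub.levels[1][c] for c in sub.codes[1]]
print(level_labels)   # ['t1', 't2', 't3', 't4']
print(level_codes)    # [2, 3]
print(decoded)        # ['t3', 't4']
```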

I could see adding a method to allow consolidation of a MultiIndex, but you can get the same thing now by doing:

new_index = MultiIndex.from_tuples(index.values)

See #2770 (comment). That's not what levels is for.
Not a bug. Closing.

Edit: It should be

new_index = MultiIndex.from_tuples(index.tolist())

This makes an extra copy (or two).
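A sketch comparing the tuple round-trip with MultiIndex.remove_unused_levels(), which was added in pandas 0.20 and so did not exist when this thread was written. The index is hypothetical. Note that from_tuples does not carry over the level names unless names= is passed.

```python
import pandas as pd

# Hypothetical two-level index with named levels.
index = pd.MultiIndex.from_tuples(
    [("a", "t1"), ("a", "t2"), ("b", "t3"), ("b", "t4")],
    names=["outer", "inner"],
)
sub = index[2:]   # levels still contain 'a', 't1', 't2'

# Round-trip through tuples, as suggested above; names are passed
# explicitly because from_tuples would otherwise drop them.
rebuilt = pd.MultiIndex.from_tuples(sub.tolist(), names=sub.names)

# pandas 0.20+ alternative: prune the level tables without building tuples.
pruned = sub.remove_unused_levels()

print(rebuilt.levels[1].tolist())              # ['t3', 't4']
print(pruned.levels[1].tolist())               # ['t3', 't4']
print(rebuilt.equals(sub), pruned.equals(sub)) # True True
```

Both results hold the same row labels as sub; only the level tables shrink.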

y-p closed this Nov 30, 2013