ENH/BUG: Rename of MultiIndex DataFrames does not work #4160

Open
aschilling opened this Issue Jul 8, 2013 · 21 comments

Comments

Projects
None yet
10 participants

aschilling commented Jul 8, 2013 edited by jreback

xref #14139 for empty MI

Hi everybody,

in the current version renaming of MultiIndex DataFrames does not work. Lets take the following example:

import datetime as DT
import pandas as pd
df = pd.DataFrame({
'Branch' : 'A A A A A B'.split(),
'Buyer': 'Carl Mark Carl Joe Mark Carl'.split(),
'Quantity': [1,3,5,8,9,3],
'Date' : [
    DT.datetime(2013,9,1,13,0),
    DT.datetime(2013,9,1,13,5),
    DT.datetime(2013,10,1,20,0),
    DT.datetime(2013,10,3,10,0),
    DT.datetime(2013,12,2,12,0),                                      
    DT.datetime(2013,12,2,14,0),
    ]})

and the following query:

test_df = df[df['Buyer'].isin(['Carl', 'Mark'])].set_index('Buyer', append=True)[['Date']].unstack(['Buyer'])

Now, the following renaming does not work

test_df.rename(columns={('Date', 'Carl'): 'Carl'}, inplace=True)

Thanks in advance

Andy

Contributor

jreback commented Jul 8, 2013

what are you trying to accomplish, that doesn't make sense, renaming a 2-level to a single-level

You can select out the level however; is this what you are after?

In [8]: test_df['Date']
Out[8]: 
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

Hi everybody,

Sorry, the example ist not the best. Actually, I did some trend generation today and after updating to pandas current branch a lot of my code didn't work anymore because that rename function used to work as described above.
There are basically two scenarios where I used that hierarchical renaming:

  1. When generating stock trends (a short variable name is much easier to use than those tuples)
  2. When doing some complex operations as in the SO article, where the original variable (Date) does not matter any more but only that there is a matching date between Carl and Mark
Contributor

hayd commented Jul 8, 2013

I favour xs here, bit more explicit:

In [11]: test_df.xs('Date', axis=1)
Out[11]:
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

# or maybe
test_df.columns = test_df.columns.droplevel(0)
# or 
test_df.columns =  test_df.columns.get_level_values('Buyer')

The fact that replace was working before smells like a bug, as @jreback says, it doesn't really make any sense to rename like that...

Member

cpcloud commented Jul 8, 2013

renaming doesn't work for multiindexes period, whether it makes sense or not:

In [7]: df
Out[7]:
  Branch Buyer                Date  Quantity
0      A  Carl 2013-09-01 13:00:00         1
1      A  Mark 2013-09-01 13:05:00         3
2      A  Carl 2013-10-01 20:00:00         5
3      A   Joe 2013-10-03 10:00:00         8
4      A  Mark 2013-12-02 12:00:00         9
5      B  Carl 2013-12-02 14:00:00         3

In [8]: test_df = df[df['Buyer'].isin(['Carl', 'Mark'])].set_index('Buyer', append=True)[['Date']].unstack(['Buyer'])

In [9]: test_df
Out[9]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [10]: test_df.rename(columns={('Date', 'Carl'): ('Care')})
Out[10]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [11]: test_df.rename(columns={('Date', 'Carl'): ('Care', "sdf")})
Out[11]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [12]: test_df.rename(columns={('Date', 'Carl'): ('Care', "sdf")})
Contributor

hayd commented Jul 8, 2013

Now that is a bug/feature request :)

Maybe you ought to be able to replace on each level for a MultiIndex, say using

test_df.rename(columns={'Buyer': {'Carl' : 'sdf'}})

not sure...

Contributor

jreback commented Jul 8, 2013

need to add level arg to rename maybe?

Contributor

hayd commented Jul 8, 2013

not sure level argument works/makes sense since rename allows you to change both index and columns at the same time:s

Contributor

hayd commented Jul 8, 2013

Although perhaps my suggestion doesn't either (if want to replace same things as the level name/number)...

@waitingkuo waitingkuo added a commit to waitingkuo/pandas that referenced this issue Aug 5, 2013

@waitingkuo waitingkuo fix issue #4160, rename the MultiIndex 3c69575

jtratner was assigned Sep 9, 2013

@jreback jreback modified the milestone: 0.15.0, 0.14.0 Apr 22, 2014

@jreback jreback modified the milestone: 0.15.0, 0.15.1 Jul 6, 2014

@jreback jreback modified the milestone: 0.15.1, 0.15.0 Sep 8, 2014

jtratner was unassigned by jreback Sep 8, 2014

8one6 commented Nov 18, 2014

I think this is still an open issue. It would be great to be able to treat the column labels as tuples and just use rename in the "natural" (at least natural to me) way. For example:

df.rename(columns={c: (str(c[0]) + 'foo', str(c[1]) + 'bar') for c in df.columns})
Contributor

jreback commented Nov 18, 2014

@8one6 this is an open issue currently.

this still waiting for an API to deal with the multi-level API.

I am not sure I like the stringifying idea. But haven't thought too much about this.

Maybe an actual example would help

8one6 commented Nov 18, 2014

import numpy as np
import pandas as pd

rows = pd.Index(list('abcde'), names=['letter'])
columns = pd.MultiIndex.from_tuples([('px', c) for c in ['red', 'green', 'blue']], 
                                    names=['datum', 'color'])
df = pd.DataFrame(np.random.randn(len(rows), len(columns)), index=rows, columns=columns)

gives

datum        px                    
item        red     green      blue
a     -0.616822 -0.922983  0.148247
b     -0.383122 -0.451940  1.138330
c     -0.744860  2.299611  0.895295
d     -0.159886 -0.832159 -0.205430
e     -0.458384 -1.410207 -0.965780

So now I want to do this:

absdf = df.abs()
absdf.rename(columns={c: ('abspx', c[1]) for c in df.columns}, inplace=True)

but that doesn't do what I expect, it just gives back the unmodified frame. To accomplish what I want here, I would do:

newabsdf = df.abs()
newabsdf.columns = pd.MultiIndex.from_tuples([('abspx', c[1]) for c in df.columns], 
                                             names=df.columns.names)

which gives the desired result:

datum     abspx                    
item        red     green      blue
a      0.616822  0.922983  0.148247
b      0.383122  0.451940  1.138330
c      0.744860  2.299611  0.895295
d      0.159886  0.832159  0.205430
e      0.458384  1.410207  0.965780

Basically, in the multi-index context, I was expecting rename to "be happy" if the passed function/mapper/dictionary returned tuples with the correct number of elements. Am I doing something wrong above? Or would this be a new feature request? Or does this seem ambiguous in some way.

Contributor

jreback commented Nov 18, 2014

a multi-index renam at the moment does not work at all. The issue is how do you rename only part of a level

e.g.
red-> orange, how should I do this?
or
abspx -> foo

df.rename(columns={'red' : 'orange' }, level=1)
df.rename(columns={'abspx' : 'foo'},level=0)

but no way to do this (well it doesn't work), but does make sense

df.rename(columns={('abspx','red) : ('foo','orange')})

8one6 commented Nov 18, 2014

Ah, ok. So that last code block in your comment, is that a reasonable thing to hope will work at some point? I.e. is there a reason that would be a bad API for doing fully general multilevel renaming? (I think that's what I had tried to achieve with my dict comprehension in my absdf.rename... line above)

And the other two lines...

df.rename(columns={'red' : 'orange' }, level=1)
df.rename(columns={'abspx' : 'foo'}, level=0)

is that the current working proposal? Or already implemented? Or up for debate?

Contributor

jreback commented Nov 18, 2014

I think the prior dont work (but prob don't need much). The last is a proposed API.

I think their is a pull-requests somewhere which does most of this but wasn't finished IIRC.

@jreback jreback modified the milestone: 0.16.0, Next Major Release Mar 6, 2015

+1 on this feature, I found very obscure renaming/replacing multi-index labels. What is the current state @jreback ?

Contributor

jreback commented Apr 1, 2016

@denfromufa its open.

@jreback jreback modified the milestone: 0.18.1, Next Major Release Apr 1, 2016

@jreback jreback modified the milestone: 0.18.1, 0.18.2 Apr 26, 2016

@jreback jreback modified the milestone: 0.18.2, 0.19.0 Jul 6, 2016

In the meanwhile, could someone point to a recommended work-around? I'm dealing with some SurveyMonkey data that exports with unnecessary white space in the resulting MultiIndex and there doesn't seem to be an easy way to clean up the DataFrame.

geoffrey-eisenbarth commented Feb 14, 2017 edited

@eronlloyd: Looks like passing level to df.rename() will soon work according to this pull request, but I think the accepted answer in the meantime is something similar to the code below, courtesy of unutbu on StackOverflow:

def map_level(df, dct, level=0):
    index = df.index
    index.set_levels([[dct.get(item, item) for item in names] if i==level else names
                      for i, names in enumerate(index.levels)], inplace=True)

When I need to do something similar, I just drop the index and replace the values using apply. Not sure if that's more expensive than unutbu's solution above though.

@jreback jreback modified the milestone: 0.20.0, Next Major Release Mar 23, 2017

@jreback jreback added Prio-high and removed Prio-medium labels Mar 29, 2017

@jreback jreback modified the milestone: 0.20.0, Next Minor Release Apr 9, 2017

jreback closed this in #15931 Apr 13, 2017

This was closed automatically by github, but that was not the intention (#15931 is not related to this)

Contributor

jreback commented Apr 15, 2017

hahah had a reference with the word fix in it!

@jreback jreback modified the milestone: 0.20.0, Next Minor Release Apr 15, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment