Let's say we have a pandas DataFrame with several columns.

In [13]:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(5,5), columns=['A', 'B', 'C', 'D', 'E'])

df

          A         B         C         D         E
0  0.292812  0.893026  0.386482  0.620794  0.330648
1  0.476695  0.031236  0.139148  0.299926  0.073512
2  0.076082  0.392369  0.484886  0.152317  0.386407
3  0.877472  0.136594  0.256220  0.717604  0.020764
4  0.631452  0.784715  0.461989  0.424976  0.663504


What if we want to rename the columns? There is more than one way to do this, and I'll start with an indirect answer that's not really a rename. Sometimes the desire to rename a column goes along with a change to the data, so you may just end up adding a new column instead. Depending on how much memory you can spare and how many columns you're dealing with, adding another column is a good way to work during ad-hoc exploration, because the intermediate data is still there and you can always step back and repeat a step. You can complete the rename by dropping the old column. This isn't very efficient, but for ad-hoc data exploration it's quite common.

In [4]:
df['e'] = np.maximum(df['E'], .5)
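
To complete the "rename" described above, the old column can be dropped once the new one exists. Here is a minimal sketch, assuming a small hypothetical frame named `demo` with fixed values rather than the random `df` from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's df, with fixed values.
demo = pd.DataFrame({'D': [0.1, 0.2, 0.3], 'E': [0.2, 0.7, 0.4]})

# Add the derived column under the new name (values floored at 0.5).
demo['e'] = np.maximum(demo['E'], 0.5)

# Drop the original column once the intermediate data is no longer needed.
demo = demo.drop(columns='E')

print(list(demo.columns))
```

Until the `drop` call runs, both `E` and `e` are available, which is what makes it easy to step back during exploration.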

But let's say you really do want to just rename the column in place. Here's an easy way, but it requires you to update all the columns at once.

In [5]:
print(type(df.columns))

df.columns = ['A', 'B', 'C', 'D', 'EEEE', 'e']

<class 'pandas.core.indexes.base.Index'>


Now the columns are not just a list of strings, but rather an Index, so under the hood the DataFrame will do some work to ensure you do the right thing here.

In [6]:
try:
    df.columns = ['a', 'b']
except ValueError as ve:
    print(ve)

Length mismatch: Expected axis has 6 elements, new values have 2 elements


Now, having to set the full column list to rename just one column is not convenient, so there are other ways. First, you can use the ```rename``` method. The method takes a mapping of old to new column names, so you can rename as many as you wish. Remember, axis 0 or "index" is the primary index of the DataFrame (aka the rows), and axis 1 or "columns" is the column labels. Note that the default here is the index, so you'll need to pass this argument.

In [7]:
df.rename({'A': 'aaa', 'B': 'bbb', 'EEE': 'EE'}, axis="columns")

        aaa       bbb         C         D      EEEE         e
0  0.472606  0.573878  0.583162  0.129296  0.666291  0.666291
1  0.296342  0.538023  0.931283  0.054995  0.534846  0.534846
2  0.623639  0.882558  0.048769  0.600781  0.046683  0.500000
3  0.002262  0.346915  0.552380  0.031790  0.473855  0.500000
4  0.918801  0.332260  0.816670  0.894351  0.466370  0.500000


Note that by default it doesn't complain about mappings without a match ('EEE' is not a column but 'EEEE' is). You can force it to raise an error by passing in ```errors='raise'```. Also, it returns a new DataFrame, so like many DataFrame methods, you need to either pass ```inplace=True``` or reassign the result to the same variable if you want the change to persist.

In [8]:
df.rename({'A': 'aaa', 'B': 'bbb', 'EEE': 'EE'}, axis=1, inplace=True)
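
The difference between the default behavior and ```errors='raise'``` is easier to see side by side. This is a small sketch on a hypothetical two-column frame called `demo` (not the notebook's `df`); the `raised` flag exists only to record what happened.

```python
import pandas as pd

# Hypothetical frame: 'EEEE' exists, 'EEE' does not.
demo = pd.DataFrame({'A': [1, 2], 'EEEE': [3, 4]})

# Default (errors='ignore'): the unmatched key 'EEE' is silently skipped.
renamed = demo.rename(columns={'A': 'aaa', 'EEE': 'EE'})
print(list(renamed.columns))

# With errors='raise', the unmatched key triggers a KeyError instead.
raised = False
try:
    demo.rename(columns={'A': 'aaa', 'EEE': 'EE'}, errors='raise')
except KeyError as ke:
    raised = True
    print(ke)

# Neither call used inplace=True, so demo still has its original labels.
print(list(demo.columns))
```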

You can also change the columns using the ```set_axis``` method, with the axis set to 1 or ```columns```. Again, ```inplace=True``` will update the DataFrame in place if you don't want to reassign variables (it was the default in older versions of pandas, but defaults to False in 1.0+). In pandas 2.0 and later the ```inplace``` argument was removed from ```set_axis```, so there you have to reassign the result.

In [9]:
df.set_axis(['A', 'B', 'C', 'D', 'E', 'e'], axis="columns")

          A         B         C         D         E         e
0  0.472606  0.573878  0.583162  0.129296  0.666291  0.666291
1  0.296342  0.538023  0.931283  0.054995  0.534846  0.534846
2  0.623639  0.882558  0.048769  0.600781  0.046683  0.500000
3  0.002262  0.346915  0.552380  0.031790  0.473855  0.500000
4  0.918801  0.332260  0.816670  0.894351  0.466370  0.500000
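
Since ```set_axis``` returns a relabeled DataFrame rather than modifying the original, the result has to be captured to be kept. Here is a minimal sketch, assuming a hypothetical frame `demo`; the `length_checked` flag is only there to record the outcome of the second call.

```python
import pandas as pd

# Hypothetical frame with two columns.
demo = pd.DataFrame({'aaa': [1, 2], 'bbb': [3, 4]})

# set_axis returns a relabeled frame; demo itself keeps its old labels.
relabeled = demo.set_axis(['A', 'B'], axis='columns')
print(list(demo.columns))
print(list(relabeled.columns))

# Like assigning to df.columns, the number of labels must match.
try:
    demo.set_axis(['only_one'], axis='columns')
    length_checked = False
except ValueError as ve:
    length_checked = True
    print(ve)
```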


The ```rename``` method will also take a function. If you pass in the function (or dictionary) as the ```index``` or ```columns``` parameter, it will apply to that axis. This allows you to do generic column name cleanup easily, such as removing trailing whitespace.

In [10]:
df.columns = ['A  ', 'B ', 'C  ', 'D ', 'E ', 'e']
df.rename(columns=lambda x: x.strip(), inplace=True)
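
The same idea extends to other kinds of cleanup. As a sketch, assuming a hypothetical frame `demo` with messy labels, one function can strip whitespace, lowercase, and replace inner spaces in a single pass.

```python
import pandas as pd

# Hypothetical frame whose labels have stray whitespace and mixed case.
demo = pd.DataFrame([[1, 2, 3]], columns=[' First Name', 'Last Name ', ' Age '])

# The function is called once per column label and returns the new label.
cleaned = demo.rename(
    columns=lambda name: name.strip().lower().replace(' ', '_')
)

print(list(cleaned.columns))
```

When a single string method is enough, it can be passed directly, as in `rename(columns=str.strip)`.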

I'll also mention that one of the primary reasons for not using ```inplace=True``` is method chaining during DataFrame creation and initial setup. Often, you'll end up doing something like this (contrived, I know).

In [11]:
df = pd.DataFrame(np.random.rand(2,5,), columns=np.random.rand(5)).rename(columns=lambda x: str(x)[0:5])
df

      0.689     0.889     0.929     0.634     0.757
0  0.151959  0.847173  0.537661  0.687509  0.655668
1  0.683260  0.242501  0.011138  0.992909  0.847554


Which you'll hopefully agree is much better than this.

In [12]:
df = pd.DataFrame(np.random.rand(2,5,), columns=np.random.rand(5))
df.columns = [str(x)[0:5] for x in df.columns]
df

      0.214     0.031     0.845     0.014     0.158
0  0.839396  0.727933  0.130416  0.569804  0.520804
1  0.488862  0.722910  0.276028  0.077579  0.226966
