Note: for everything in this notebook to work as illustrated, you need `pandas 1.5.0` or later.

## Changing the index in pandas

`pandas` allows you to promote any column to be the index of a `DataFrame` using the `set_index` method.



In [19]:
import pandas as pd

print(f"Pandas Version: {pd.__version__}")
print()
data = {
  "id" : [11,231,542],
  "name": ["Sally", "Mary", "John"],
  "age": [50, 40, 30],
  "qualified": [True, False, False]
}
idx = ["X", "Y", "Z"]

df = pd.DataFrame(data, index=idx)

print(f'pd.DataFrame(data, index=idx) The idx sequence is the index.  This particular index has name: {df.index.name}\n')
df

Pandas Version: 1.5.0

pd.DataFrame(data, index=idx) The idx sequence is the index.  This particular index has name: None



Unnamed: 0,id,name,age,qualified
X,11,Sally,50,True
Y,231,Mary,40,False
Z,542,John,30,False


To promote `id` to being the index, we use `set_index`.

In [15]:
xdf = df.set_index('id')

print(f'\ndf.set_index("id") The id column is now an index.  This particular index has a name: {xdf.index.name}\n')
xdf


df.set_index("id") The id column is now an index.  This particular index has a name: id



id,name,age,qualified
11,Sally,50,True
231,Mary,40,False
542,John,30,False
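
Note that `set_index` returns a new `DataFrame` and leaves `df` itself untouched. The following is a minimal sketch of that behavior, rebuilding the same example data so it runs on its own; the variable name `by_id` is only illustrative.

```python
import pandas as pd

# Same example data as above, rebuilt so this cell runs standalone.
df = pd.DataFrame(
    {
        "id": [11, 231, 542],
        "name": ["Sally", "Mary", "John"],
        "age": [50, 40, 30],
        "qualified": [True, False, False],
    },
    index=["X", "Y", "Z"],
)

by_id = df.set_index("id")  # returns a new DataFrame

# The original frame is unchanged: "id" is still a column there.
print("id" in df.columns)      # True
print("id" in by_id.columns)   # False
print(by_id.index.name)        # id

# Rows can now be looked up by their id label.
print(by_id.loc[231, "name"])  # Mary
```

Because the result is a new object, it has to be assigned to a name (as with `xdf` above) to be kept.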


Conversely, you can demote the index to a column using
the `reset_index` method.

When we **reset** the index, the index becomes the default integer index, `0` through `n-1`.
The old index is not thrown away; it becomes a column.

In [16]:
newdf = xdf.reset_index()

print()
print(f'\nxdf.reset_index() The id index is now back to being a column with its original name.\n')
newdf



xdf.reset_index() The id index is now back to being a column with its original name.



Unnamed: 0,id,name,age,qualified
0,11,Sally,50,True
1,231,Mary,40,False
2,542,John,30,False
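
For contrast, `reset_index` also takes a `drop` argument for the case where the old index really should be discarded. This is a small sketch on a cut-down version of the example frame; the names `kept` and `dropped` are illustrative.

```python
import pandas as pd

# A cut-down version of xdf: the index is named "id".
xdf = pd.DataFrame(
    {"name": ["Sally", "Mary", "John"], "age": [50, 40, 30]},
    index=pd.Index([11, 231, 542], name="id"),
)

kept = xdf.reset_index()              # old index becomes the "id" column
dropped = xdf.reset_index(drop=True)  # old index is discarded

print(list(kept.columns))     # ['id', 'name', 'age']
print(list(dropped.columns))  # ['name', 'age']
print(list(dropped.index))    # [0, 1, 2]
```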


What happens when the index doesn't have a name?  Remember that our original `df` had an unnamed
index:

In [9]:
df

Unnamed: 0,id,name,age,qualified
X,11,Sally,50,True
Y,231,Mary,40,False
Z,542,John,30,False


A reset makes the old index into a column named `'index'`.

In [8]:
df2 = df.reset_index()
print('\ndf.reset_index() Resetting df, with the original nameless index.'
      ' The index-derived column gets a default name')
print()
df2


df.reset_index() Resetting df, with the original nameless index. The index-derived column gets a default name



Unnamed: 0,index,id,name,age,qualified
0,X,11,Sally,50,True
1,Y,231,Mary,40,False
2,Z,542,John,30,False


As of `pandas 1.5.0`, the `names` argument of `reset_index` lets
you supply a name for the index-derived column:

In [12]:
df.reset_index(names='label')

Unnamed: 0,label,id,name,age,qualified
0,X,11,Sally,50,True
1,Y,231,Mary,40,False
2,Z,542,John,30,False


If the index is a `MultiIndex`, the `names` argument accepts a sequence of names, one per level.
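
As a rough illustration of the multi-level case, the sketch below builds a small two-level index and resets it with a list of names. The frame `mdf` and the names `'label'` and `'number'` are made up for this example, and it assumes `pandas 1.5.0` or later.

```python
import pandas as pd

# Hypothetical two-level index; neither level has a name.
midx = pd.MultiIndex.from_tuples([("X", 1), ("Y", 2), ("Z", 3)])
mdf = pd.DataFrame({"age": [50, 40, 30]}, index=midx)

# Without names, the index-derived columns get default names.
print(list(mdf.reset_index().columns))  # ['level_0', 'level_1', 'age']

# With names, supply one name per index level.
flat = mdf.reset_index(names=["label", "number"])
print(list(flat.columns))               # ['label', 'number', 'age']
```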

If you have an older version of `pandas` with no `names` keyword argument for
`reset_index`, you can accomplish the same thing as 

```
df2 = df.reset_index(names='label')
df2
```

with

In [25]:
df2 = df.reset_index()
df3 = df2.rename(columns={'index': 'label'})
df3

Unnamed: 0,label,id,name,age,qualified
0,X,11,Sally,50,True
1,Y,231,Mary,40,False
2,Z,542,John,30,False
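
Another option that should also work on older versions is to name the index first with `rename_axis` and then reset it, so the new column picks up that name directly. This is a sketch on a cut-down version of the example frame; `df4` is an illustrative name.

```python
import pandas as pd

# Cut-down version of df with its unnamed X/Y/Z index.
df = pd.DataFrame(
    {"id": [11, 231, 542], "name": ["Sally", "Mary", "John"]},
    index=["X", "Y", "Z"],
)

# Give the index a name, then demote it to a column of that name.
df4 = df.rename_axis("label").reset_index()

print(list(df4.columns))   # ['label', 'id', 'name']
print(list(df4["label"]))  # ['X', 'Y', 'Z']
```

This avoids depending on the default column name `'index'` in the `rename` step.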
