# Multiindexing in Pandas
> Notes on working with hierarchical indices in Pandas

- toc: true 
- badges: true
- comments: true
- categories: [Python, Pandas]

In [2]:
import pandas as pd

## Stacking and unstacking

This walkthrough is based on the MultiIndexing section of the Pandas [cookbook](https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#multiindexing).

In [3]:
df = pd.DataFrame({'row': [0, 1, 2],
                   'One_X': [1.1, 1.1, 1.1],
                   'One_Y': [1.2, 1.2, 1.2],
                   'Two_X': [1.11, 1.11, 1.11],
                   'Two_Y': [1.22, 1.22, 1.22]})

df

   row  One_X  One_Y  Two_X  Two_Y
0    0    1.1    1.2   1.11   1.22
1    1    1.1    1.2   1.11   1.22
2    2    1.1    1.2   1.11   1.22


In [4]:
# Set index
df = df.set_index('row')
df

     One_X  One_Y  Two_X  Two_Y
row
0      1.1    1.2   1.11   1.22
1      1.1    1.2   1.11   1.22
2      1.1    1.2   1.11   1.22


In [5]:
# Add hierarchical columns by splitting each label on '_' into (outer, inner)
df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
df

     One        Two
       X    Y     X     Y
row
0    1.1  1.2  1.11  1.22
1    1.1  1.2  1.11  1.22
2    1.1  1.2  1.11  1.22
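
As an aside, the same hierarchical labels can probably be built without the list comprehension. The sketch below assumes the flat labels always contain exactly one `_` separator, and uses `Index.str.split` with `expand=True`, which returns a `MultiIndex` when called on an `Index`. The variable names are illustrative only.

```python
import pandas as pd

# Flat labels with the same naming scheme as the frame above
flat = pd.Index(["One_X", "One_Y", "Two_X", "Two_Y"])

# expand=True on an Index returns a MultiIndex with one level per split part
via_str = flat.str.split("_", expand=True)

# The explicit tuple-based construction, for comparison
via_tuples = pd.MultiIndex.from_tuples([tuple(c.split("_")) for c in flat])

# Both approaches should yield the same sequence of (outer, inner) labels
same_labels = list(via_str) == list(via_tuples)
print(same_labels)
```

Either form could be assigned to `df.columns`; the tuple version is more explicit, while the string accessor is shorter when the separator is consistent.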


In [6]:
# Stack the outer column level into the index, then move it back out into a regular column
df = df.stack(0).reset_index(1)
df

     level_1     X     Y
row
0        One  1.10  1.20
0        Two  1.11  1.22
1        One  1.10  1.20
1        Two  1.11  1.22
2        One  1.10  1.20
2        Two  1.11  1.22
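
The `level_1` column name appears because the stacked column level has no name. A minimal sketch of one way around that, assuming you are free to name the outer level before stacking (the name `sample` and the frame `wide` are chosen here purely for illustration):

```python
import pandas as pd

# Rebuild a small frame with two column levels; only the outer level is named
wide = pd.DataFrame(
    [[1.1, 1.2, 1.11, 1.22]] * 3,
    index=pd.Index([0, 1, 2], name="row"),
    columns=pd.MultiIndex.from_product(
        [["One", "Two"], ["X", "Y"]], names=["sample", None]
    ),
)

# Stack by level name; the new index level inherits the name "sample",
# so reset_index produces a column called "sample" instead of "level_1"
long = wide.stack("sample").reset_index("sample")
print(long.columns.tolist())
```

Referring to levels by name rather than by position also tends to make the intent clearer when a frame has more than two levels.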


In [7]:
# Rename the columns
df.columns = ['sample', 'x', 'y']
df

     sample     x     y
row
0       One  1.10  1.20
0       Two  1.11  1.22
1       One  1.10  1.20
1       Two  1.11  1.22
2       One  1.10  1.20
2       Two  1.11  1.22


In [8]:
# Now reverse the reshaping: index by (row, sample), then unstack sample into the columns
df = df.set_index([df.index, 'sample']).unstack(1)
df

          x          y
sample  One   Two  One   Two
row
0       1.1  1.11  1.2  1.22
1       1.1  1.11  1.2  1.22
2       1.1  1.11  1.2  1.22
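
The `set_index(...).unstack(...)` pattern is closely related to `DataFrame.pivot`. The sketch below rebuilds the long frame from scratch (with `row` as an ordinary column, which is an assumption made so the example is self-contained) and compares the two routes.

```python
import pandas as pd

# Long-format data equivalent to the stacked frame above
long = pd.DataFrame({
    "row": [0, 0, 1, 1, 2, 2],
    "sample": ["One", "Two"] * 3,
    "x": [1.1, 1.11] * 3,
    "y": [1.2, 1.22] * 3,
})

# Route 1: build a two-level index, then move "sample" into the columns
via_unstack = long.set_index(["row", "sample"]).unstack("sample")

# Route 2: let pivot do both steps in one call
via_pivot = long.pivot(index="row", columns="sample", values=["x", "y"])

# Both routes should give the same column labels and the same values
print(list(via_pivot.columns) == list(via_unstack.columns))
print((via_pivot.values == via_unstack.values).all())
```

Note that `pivot` raises an error when an index/column pair occurs more than once, so it is only a drop-in replacement when each `(row, sample)` combination is unique.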


In [9]:
# Swap the column levels, then sort the columns so that the values move with their labels
df = df.swaplevel(axis=1).sort_index(axis=1)
df.columns.names = [None, None]
df

     One        Two
       x    y     x     y
row
0    1.1  1.2  1.11  1.22
1    1.1  1.2  1.11  1.22
2    1.1  1.2  1.11  1.22


## Flattening hierarchical index

In [10]:
df.columns.nlevels

2

In [11]:
# Join the levels of each column label with '_' and lowercase the result
df.columns = ['_'.join(a).lower() for a in df.columns]
df

     one_x  one_y  two_x  two_y
row
0      1.1    1.2   1.11   1.22
1      1.1    1.2   1.11   1.22
2      1.1    1.2   1.11   1.22


To flatten the entire frame, also move the index back into a regular column:

In [13]:
df.reset_index()

   row  one_x  one_y  two_x  two_y
0    0    1.1    1.2   1.11   1.22
1    1    1.1    1.2   1.11   1.22
2    2    1.1    1.2   1.11   1.22
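
The column-joining step can be wrapped in a small helper so it also copes with non-string labels and with frames that are already flat. This is only a sketch: `flatten_columns` is a hypothetical name, not a pandas function, and the lowercase-and-underscore convention simply mirrors the one used in this post.

```python
import pandas as pd

def flatten_columns(frame, sep="_"):
    """Return a copy of frame with multi-level column labels joined into strings."""
    out = frame.copy()
    # Only touch frames that actually have hierarchical columns
    if out.columns.nlevels > 1:
        out.columns = [
            # str() guards against non-string labels such as integers
            sep.join(str(part) for part in col).lower()
            for col in out.columns.to_flat_index()
        ]
    return out

nested = pd.DataFrame(
    [[1.1, 1.2, 1.11, 1.22]],
    columns=pd.MultiIndex.from_product([["One", "Two"], ["x", "y"]]),
)
flat = flatten_columns(nested)
print(flat.columns.tolist())
```

Returning a copy rather than assigning to `frame.columns` in place keeps the original frame intact, which can be convenient when experimenting in a notebook.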


## Sources

- [Python for Data Analysis](https://www.oreilly.com/library/view/python-for-data/9781491957653/)