## Multi-Index and Index Hierarchy

Let's go over how to work with a MultiIndex. First, we'll create a quick example of what a multi-indexed DataFrame looks like:

In [7]:
import numpy as np
import pandas as pd
from numpy.random import randn
np.random.seed(101)

In [8]:
# Index Levels
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)

In [9]:
outside

['G1', 'G1', 'G1', 'G2', 'G2', 'G2']

In [10]:
inside

[1, 2, 3, 1, 2, 3]

In [11]:
hier_index

MultiIndex([('G1', 1),
            ('G1', 2),
            ('G1', 3),
            ('G2', 1),
            ('G2', 2),
            ('G2', 3)],
           )
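
Because every outer label here is paired with every inner label, the same index can probably be built without spelling out each tuple. The sketch below is a minimal illustration that assumes the `outside`/`inside` values from the cell above; `tuple_index` and `product_index` are illustrative names that do not appear elsewhere in this notebook.

```python
import pandas as pd

# Build the index from explicit (outer, inner) tuples, as in the cell above.
tuple_index = pd.MultiIndex.from_tuples(
    list(zip(['G1', 'G1', 'G1', 'G2', 'G2', 'G2'], [1, 2, 3, 1, 2, 3]))
)

# from_product takes the unique labels of each level and forms every combination.
product_index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])

# Both constructions should describe the same six (outer, inner) pairs.
print(product_index.equals(tuple_index))
```

`from_tuples` remains the more general choice when only some combinations of labels exist.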

In [14]:
df = pd.DataFrame(randn(6,2), hier_index, ['A','B'])

In [15]:
df

             A         B
G1 1  0.188695 -0.758872
   2 -0.933237  0.955057
   3  0.190794  1.978757
G2 1  2.605967  0.683509
   2  0.302665  1.693723
   3 -1.706086 -1.159119


In [16]:
# Now let's show how to index this. For an index hierarchy on the rows we use df.loc[];
# if the hierarchy were on the columns axis, normal bracket notation df[] would be enough.
# Selecting one label of the outer level returns the sub-DataFrame:
df.loc['G1']

          A         B
1  0.188695 -0.758872
2 -0.933237  0.955057
3  0.190794  1.978757
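
To illustrate the remark about the columns axis, here is a small sketch with the hierarchy placed on the columns instead of the rows. The frame `wide` and its values are made up for illustration and are not part of the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose hierarchy sits on the columns rather than the rows.
col_index = pd.MultiIndex.from_product([['G1', 'G2'], ['A', 'B']])
wide = pd.DataFrame(np.arange(12).reshape(3, 4), index=[1, 2, 3], columns=col_index)

# Plain brackets select a label of the outer column level and return the sub-frame.
g1_cols = wide['G1']
print(g1_cols)

# After transposing, the hierarchy is on the rows, so .loc is used instead.
g1_rows = wide.T.loc['G1']
print(g1_rows)
```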


In [17]:
df.loc['G1'].loc[1]

A    0.188695
B   -0.758872
Name: 1, dtype: float64
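
Chaining `.loc` performs one lookup per level. As a hedged alternative, a tuple key passed to a single `.loc` call reaches the same row, and adding a column label yields a scalar. The frame `demo` below uses made-up sequential values so the results are easy to check by eye; it is not the random data shown above.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])
demo = pd.DataFrame(np.arange(12.0).reshape(6, 2), index=index, columns=['A', 'B'])

chained = demo.loc['G1'].loc[1]     # two lookups, one per level
direct = demo.loc[('G1', 1)]        # one lookup with a tuple key
value = demo.loc[('G2', 2), 'B']    # row tuple plus column label gives a scalar

print(chained.tolist(), direct.tolist(), value)
```

The single-call form is also the safer one to use when assigning values, since chained indexing may operate on a temporary copy.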

In [18]:
df.index.names

FrozenList([None, None])

In [19]:
df.index.names = ['Groups','Num']

In [20]:
df

                   A         B
Groups Num                    
G1     1    0.188695 -0.758872
       2   -0.933237  0.955057
       3    0.190794  1.978757
G2     1    2.605967  0.683509
       2    0.302665  1.693723
       3   -1.706086 -1.159119
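
The level names can also be supplied when the index is created rather than assigned afterwards. A minimal sketch, using a shortened four-row index (`named_index` is an illustrative name):

```python
import pandas as pd

# Pass names= at construction time instead of setting index.names later.
named_index = pd.MultiIndex.from_tuples(
    [('G1', 1), ('G1', 2), ('G2', 1), ('G2', 2)],
    names=['Groups', 'Num'],
)
print(named_index.names)

# Named levels can then be referred to by name instead of by position.
print(named_index.get_level_values('Groups'))
```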


In [22]:
df.loc['G2'].loc[2]['B']

np.float64(1.693722925204035)

In [24]:
# DataFrame.xs returns a cross-section (rows or columns) of a Series/DataFrame.
# It defaults to a cross-section on the rows (axis=0) and is useful with a multi-level index.
# Evaluating df.xs without parentheses only displays the bound method:
df.xs

<bound method NDFrame.xs of                    A         B
Groups Num                    
G1     1    0.188695 -0.758872
       2   -0.933237  0.955057
       3    0.190794  1.978757
G2     1    2.605967  0.683509
       2    0.302665  1.693723
       3   -1.706086 -1.159119>

In [25]:
df.xs('G1')

            A         B
Num                    
1    0.188695 -0.758872
2   -0.933237  0.955057
3    0.190794  1.978757


In [26]:
df.xs(1, level='Num')  # Selects the Num == 1 row from every group, without choosing a group first

               A         B
Groups                    
G1      0.188695 -0.758872
G2      2.605967  0.683509
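
By default `xs` drops the level that was selected on. The sketch below shows the `drop_level=False` option and what should be an equivalent selection written with `.loc` and `pd.IndexSlice`. The frame `demo` holds made-up sequential values, not the random data above.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Num'])
demo = pd.DataFrame(np.arange(12.0).reshape(6, 2), index=index, columns=['A', 'B'])

dropped = demo.xs(1, level='Num')                   # 'Num' level removed from the result
kept = demo.xs(1, level='Num', drop_level=False)    # both index levels retained

# Same rows via .loc; slicing across levels expects a sorted index, which this one is.
idx = pd.IndexSlice
via_loc = demo.loc[idx[:, 1], :]

print(dropped)
print(kept)
```

Keeping the level is convenient when the result is later concatenated with other cross-sections, because the `Num` label stays attached to each row.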
