# MultiIndex & Index Hierarchy

We start by creating a DataFrame with a multi-level (hierarchical) index.

In [2]:
import numpy as np
import pandas as pd
from numpy.random import randn

### Create two Python lists

In [3]:
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]

Make a list of tuples that pairs each value from the outside list with the corresponding value from the inside list, for example ('G1', 1).

In [4]:
hier_index = list(zip(outside,inside))

In [5]:
hier_index

[('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]

In [6]:
hier_index = pd.MultiIndex.from_tuples(hier_index)

In [7]:
hier_index

MultiIndex(levels=[['G1', 'G2'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
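
The same index can be built without the intermediate list of tuples. The sketch below is a minimal illustration, assuming the same `outside` and `inside` lists as above; the variable names `from_tuples`, `from_arrays`, and `from_product` are chosen here for illustration only. Note that recent pandas releases print a MultiIndex differently from the output above (the `labels` attribute was renamed `codes`), so the repr may not match exactly.

```python
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]

# Same construction as in the text: one tuple per row.
from_tuples = pd.MultiIndex.from_tuples(list(zip(outside, inside)))

# from_arrays takes the two parallel lists directly, skipping the zip step.
from_arrays = pd.MultiIndex.from_arrays([outside, inside])

# from_product builds every (group, id) combination from the unique values.
from_product = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])

print(from_tuples.equals(from_arrays))
print(from_tuples.equals(from_product))
```

`from_product` is only a fit when every outer label is paired with every inner label, as it is here; for irregular pairings, `from_tuples` or `from_arrays` is the safer choice.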

Now we will create our DataFrame. First we create a 6 by 2 matrix of random numbers, that is, 6 rows and 2 columns.

In [8]:
Matrix = randn(6,2)

Display the matrix:

In [9]:
Matrix

array([[ 1.19650222,  0.46357118],
       [ 0.50434069,  0.7117065 ],
       [ 0.89110965,  0.30514715],
       [-0.69257511, -0.52281504],
       [-0.80392693,  1.3270179 ],
       [ 0.32070233,  0.19360302]])
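
Because `randn` draws new random numbers on every run, the values shown above will differ each time the notebook is executed. If reproducible numbers are wanted, one option is a seeded generator. This is a sketch only: the seed value 0 and the names `rng`, `matrix_seeded`, and `matrix_again` are arbitrary choices made for illustration, and this seed will not reproduce the specific values printed above.

```python
import numpy as np

# A generator with a fixed seed produces the same draws on every run.
# The seed 0 is arbitrary and used only for illustration.
rng = np.random.default_rng(0)
matrix_seeded = rng.standard_normal((6, 2))

# A second generator with the same seed reproduces the matrix exactly.
matrix_again = np.random.default_rng(0).standard_normal((6, 2))

print(matrix_seeded.shape)
print(np.array_equal(matrix_seeded, matrix_again))
```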

Create the row labels (rowLabel) and column labels (columnLabel):

In [10]:
rowLabel = hier_index
columnLabel = ['Column-0','Column-1']

In [11]:
dataFrame = pd.DataFrame(Matrix, index=rowLabel, columns=columnLabel)

In [13]:
dataFrame

|    |   | Column-0 | Column-1 |
|----|---|----------|----------|
| G1 | 1 | 1.196502 | 0.463571 |
| G1 | 2 | 0.504341 | 0.711706 |
| G1 | 3 | 0.891110 | 0.305147 |
| G2 | 1 | -0.692575 | -0.522815 |
| G2 | 2 | -0.803927 | 1.327018 |
| G2 | 3 | 0.320702 | 0.193603 |


To select every row under G1, pass the outer-level label to loc:

In [14]:
dataFrame.loc['G1']

|   | Column-0 | Column-1 |
|---|----------|----------|
| 1 | 1.196502 | 0.463571 |
| 2 | 0.504341 | 0.711706 |
| 3 | 0.891110 | 0.305147 |


To select only the first row of G1 (inner label 1), chain a second loc call:

In [19]:
dataFrame.loc['G1'].loc[1]

Column-0    1.196502
Column-1    0.463571
Name: 1, dtype: float64
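
The chained lookup above can also be written as one `.loc` call with a tuple key that names both levels. The sketch below is a minimal illustration; it uses a stand-in frame called `demo` filled with the values 0.0 to 11.0 (an assumption made so the results are predictable) rather than the random `dataFrame` from the text.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]
)
# Illustrative values 0.0..11.0 instead of random numbers.
demo = pd.DataFrame(
    np.arange(12, dtype=float).reshape(6, 2),
    index=index,
    columns=['Column-0', 'Column-1'],
)

chained = demo.loc['G1'].loc[1]    # two lookups, one per index level
direct = demo.loc[('G1', 1), :]    # one lookup: tuple key for the row, all columns

print(chained.tolist())
print(direct.tolist())
```

Both forms return the same row values; the tuple form does the work in a single indexing step.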

The two index levels of our MultiIndex DataFrame do not have names yet, so nothing appears above the first two columns. Display the DataFrame again:

In [20]:
dataFrame

|    |   | Column-0 | Column-1 |
|----|---|----------|----------|
| G1 | 1 | 1.196502 | 0.463571 |
| G1 | 2 | 0.504341 | 0.711706 |
| G1 | 3 | 0.891110 | 0.305147 |
| G2 | 1 | -0.692575 | -0.522815 |
| G2 | 2 | -0.803927 | 1.327018 |
| G2 | 3 | 0.320702 | 0.193603 |


We can also confirm that the index levels have no names by inspecting index.names:

In [22]:
dataFrame.index.names

FrozenList([None, None])

Let's give names to the two index levels:

In [23]:
dataFrame.index.names = ['Groups', 'Id']

In [24]:
dataFrame

| Groups | Id | Column-0 | Column-1 |
|--------|----|----------|----------|
| G1 | 1 | 1.196502 | 0.463571 |
| G1 | 2 | 0.504341 | 0.711706 |
| G1 | 3 | 0.891110 | 0.305147 |
| G2 | 1 | -0.692575 | -0.522815 |
| G2 | 2 | -0.803927 | 1.327018 |
| G2 | 3 | 0.320702 | 0.193603 |
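
The level names can also be supplied when the index is first built, which avoids the separate assignment step. This is a minimal sketch under that assumption; `tuples`, `named_index`, and `renamed_index` are illustrative names, not part of the walkthrough above.

```python
import pandas as pd

tuples = [('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]

# Pass names= at construction time instead of assigning index.names later.
named_index = pd.MultiIndex.from_tuples(tuples, names=['Groups', 'Id'])
print(list(named_index.names))

# set_names returns a renamed copy of an existing, unnamed index.
unnamed_index = pd.MultiIndex.from_tuples(tuples)
renamed_index = unnamed_index.set_names(['Groups', 'Id'])
print(list(renamed_index.names))
```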


Now suppose we want the single value at group G2, Id 2, Column-1, which is 1.327018:

In [28]:
dataFrame.loc['G2'].loc[2].loc['Column-1']

1.3270178998219955
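
The three chained lookups above can be collapsed into one `.loc` call that takes a row tuple and a column label. The sketch below assumes a stand-in frame `demo` with the predictable values 0.0 to 11.0 (not the random data from the text). It also shows assignment through the same single call, which is generally the safer form for writing, since a chained lookup may operate on an intermediate copy.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Id']
)
# Illustrative values 0.0..11.0 instead of random numbers.
demo = pd.DataFrame(
    np.arange(12, dtype=float).reshape(6, 2),
    index=index,
    columns=['Column-0', 'Column-1'],
)

# Read: row key ('G2', 2) and column 'Column-1' in one step.
chained_value = demo.loc['G2'].loc[2].loc['Column-1']
direct_value = demo.loc[('G2', 2), 'Column-1']
print(chained_value, direct_value)

# Write: the same single-call form can be used for assignment.
demo.loc[('G2', 2), 'Column-1'] = 0.0
updated_value = demo.loc[('G2', 2), 'Column-1']
print(updated_value)
```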

# Cross Section

Let's display our DataFrame again.

In [29]:
dataFrame

| Groups | Id | Column-0 | Column-1 |
|--------|----|----------|----------|
| G1 | 1 | 1.196502 | 0.463571 |
| G1 | 2 | 0.504341 | 0.711706 |
| G1 | 3 | 0.891110 | 0.305147 |
| G2 | 1 | -0.692575 | -0.522815 |
| G2 | 2 | -0.803927 | 1.327018 |
| G2 | 3 | 0.320702 | 0.193603 |


Suppose we want every row in G1. The xs (cross-section) method takes the label directly:

In [33]:
dataFrame.xs('G1')  # pass the label itself; recent pandas versions reject a list key here

| Id | Column-0 | Column-1 |
|----|----------|----------|
| 1 | 1.196502 | 0.463571 |
| 2 | 0.504341 | 0.711706 |
| 3 | 0.891110 | 0.305147 |


Now suppose we want the rows with Id = 1 from both groups. With loc this is awkward because Id is the inner level, but xs can select on any level by name:

In [36]:
dataFrame.xs(1,level='Id')

| Groups | Column-0 | Column-1 |
|--------|----------|----------|
| G1 | 1.196502 | 0.463571 |
| G2 | -0.692575 | -0.522815 |
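
Note that the result above no longer carries the Id level, because `xs` drops the level it selects on by default. The sketch below shows two ways to keep both levels; it assumes a stand-in frame `demo` with the values 0.0 to 11.0 in place of the random data, and the names `dropped`, `kept`, and `sliced` are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Id']
)
# Illustrative values 0.0..11.0 instead of random numbers.
demo = pd.DataFrame(
    np.arange(12, dtype=float).reshape(6, 2),
    index=index,
    columns=['Column-0', 'Column-1'],
)

# Default behaviour: the selected level (Id) is removed from the result.
dropped = demo.xs(1, level='Id')

# drop_level=False keeps the full (Groups, Id) index on the result.
kept = demo.xs(1, level='Id', drop_level=False)

# The same rows via .loc: IndexSlice[:, 1] means "any group, Id equal to 1".
sliced = demo.loc[pd.IndexSlice[:, 1], :]

print(dropped)
print(kept)
print(sliced)
```

The `IndexSlice` form relies on the index being sorted, which holds for the index built here.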
