In [1]:
import pandas as pd
import numpy as np
import re

# Chapter 8

## 8.1 Hierarchical Indexing

Hierarchical indexing lets you have multiple index levels on an axis. For example, create a Series with a list of lists as the index:

In [2]:
data = pd.Series(np.random.randn(9),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])

In [3]:
data

a  1    1.185779
   2    1.436521
   3   -0.706687
b  1    1.041711
   3   -1.454098
c  1    0.683225
   2   -1.016601
d  2   -0.207848
   3    2.225874
dtype: float64

In [4]:
data.index

MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )
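
A MultiIndex keeps the unique labels of each level separately from the per-row combinations. As a minimal sketch (the Series `s` and its values are made up for illustration and are not the random data above), the level structure can be inspected like this:

```python
import pandas as pd

# Hypothetical example data with a two-level index
s = pd.Series([10, 20, 30, 40],
              index=[['a', 'a', 'b', 'b'], [1, 2, 1, 3]])

idx = s.index
n_levels = idx.nlevels                          # number of index levels
outer_labels = list(idx.levels[0])              # unique labels of the outer level
inner_labels = list(idx.levels[1])              # unique labels of the inner level
outer_per_row = list(idx.get_level_values(0))   # outer label repeated for each row
```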

In [5]:
# Partial indexing is possible with a hierarchically indexed object, which makes it easy to select subsets of the data:
data['b']

1    1.041711
3   -1.454098
dtype: float64
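
To contrast a partial key with a full key, here is a small sketch on made-up data (the Series `s` is hypothetical). A partial key returns a Series indexed by the remaining level, while a full tuple key returns a single value:

```python
import pandas as pd

# Hypothetical example data with a two-level index
s = pd.Series([10, 20, 30, 40],
              index=[['a', 'a', 'b', 'b'], [1, 2, 1, 3]])

inner = s.loc['b']         # partial key: Series indexed by the inner level
value = s.loc[('b', 3)]    # full key: a single scalar value
```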

In [6]:
data['b':'c']

b  1    1.041711
   3   -1.454098
c  1    0.683225
   2   -1.016601
dtype: float64

In [7]:
data.loc[['b', 'd']]

b  1    1.041711
   3   -1.454098
d  2   -0.207848
   3    2.225874
dtype: float64

In [8]:
# Selection is also possible from an "inner" level:
data.loc[:, 2]

a    1.436521
c   -1.016601
d   -0.207848
dtype: float64
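
The same inner-level selection can also be written with the `xs` (cross-section) method. A minimal sketch on hypothetical data (the Series `s` is not the data above):

```python
import pandas as pd

# Hypothetical example data with a two-level index
s = pd.Series([10, 20, 30, 40],
              index=[['a', 'a', 'b', 'b'], [1, 2, 1, 3]])

by_loc = s.loc[:, 1]        # every outer label, inner label 1
by_xs = s.xs(1, level=1)    # the same selection as a cross-section on level 1
```

Both forms drop the selected level and leave a Series indexed by the outer labels.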

Hierarchical indexing plays an important role in reshaping data and in group-based operations such as forming a pivot table. For example, you can rearrange the data into a DataFrame using its _unstack_ method:


In [9]:
data.unstack()

          1         2         3
a  1.185779  1.436521 -0.706687
b  1.041711       NaN -1.454098
c  0.683225 -1.016601       NaN
d       NaN -0.207848  2.225874


In [10]:
# The inverse operation of unstack is stack:
data.unstack().stack()

a  1    1.185779
   2    1.436521
   3   -0.706687
b  1    1.041711
   3   -1.454098
c  1    0.683225
   2   -1.016601
d  2   -0.207848
   3    2.225874
dtype: float64
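
The round trip can be checked on a small deterministic example. This is a sketch under assumptions: the Series `s` is made up, and the explicit `dropna()` is added because whether `stack` itself discards the missing combinations depends on the pandas version.

```python
import pandas as pd

# Hypothetical data: ('a', 3) and ('b', 2) are deliberately absent
s = pd.Series([1.0, 2.0, 3.0, 4.0],
              index=[['a', 'a', 'b', 'b'], [1, 2, 1, 3]])

wide = s.unstack()                 # inner level becomes the columns; gaps become NaN
n_missing = int(wide.isna().sum().sum())

# Stack back to a Series; drop NaN explicitly so the result does not
# depend on the stack implementation of the installed pandas version
restored = wide.stack().dropna().sort_index()
```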

**With a DataFrame, either axis can have a hierarchical index:**

In [11]:
frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                    index=[['a', 'a', 'b', 'b'],[1, 2, 1, 2]],
                    columns=[['Ohio', 'Ohio', 'Colorado'],
                            ['Green', 'Red', 'Green']])

In [12]:
frame

     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11


In [13]:
frame.index.names = ['key1', 'key2']

In [14]:
frame.columns.names = ['state', 'color']

In [15]:
frame

state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11


In [16]:
# With partial column indexing, you can similarly select groups of columns:
frame['Ohio']

color      Green  Red
key1 key2
a    1         0    1
     2         3    4
b    1         6    7
     2         9   10
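
Beyond selecting by the outer column level, a full tuple key picks out a single column, and `xs` with `axis=1` selects by an inner column level. The sketch below rebuilds the same frame under the name `df` (a name chosen here for illustration) so that it runs on its own:

```python
import numpy as np
import pandas as pd

# Same structure as the frame above, rebuilt so the example is self-contained
df = pd.DataFrame(np.arange(12).reshape((4, 3)),
                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                  columns=[['Ohio', 'Ohio', 'Colorado'],
                           ['Green', 'Red', 'Green']])
df.index.names = ['key1', 'key2']
df.columns.names = ['state', 'color']

ohio_red = df[('Ohio', 'Red')]                   # full tuple key: one column as a Series
green = df.xs('Green', axis=1, level='color')    # every column whose color is Green
```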


A MultiIndex can be created by itself and then reused; the columns in the preceding DataFrame, with level names, could be created like this:

In [17]:
pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'],
                          ['Green', 'Red', 'Green']],
                         names=['state', 'color'])

MultiIndex([(    'Ohio', 'Green'),
            (    'Ohio',   'Red'),
            ('Colorado', 'Green')],
           names=['state', 'color'])
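
As a sketch of the reuse mentioned above, the same MultiIndex object can be passed as `columns` to more than one DataFrame. The names `cols`, `df1`, and `df2` and the data values are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Build the column index once
cols = pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'],
                                  ['Green', 'Red', 'Green']],
                                 names=['state', 'color'])

# Reuse it for two frames with different (made-up) data
df1 = pd.DataFrame(np.arange(6).reshape((2, 3)), columns=cols)
df2 = pd.DataFrame(np.zeros((4, 3)), columns=cols)
```

Both frames then carry the same level names and support the same partial column selection, for example `df1['Ohio']`.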

### Reordering and Sorting Levels