# Chapter 8. Data Wrangling: Join, Combine, and Reshape
<a id='index'></a>
## Table of Contents
- [8.1 Hierarchical Indexing](#81)
    - [8.1.1 Reordering and Sorting Levels](#811)

## 8.1 Hierarchical Indexing
<a id='81'></a>

In [3]:
import numpy as np
import pandas as pd

In [5]:
# Series with a two-level MultiIndex as its index
data = pd.Series(np.random.randn(9),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])
data

a  1    1.009228
   2    0.979057
   3   -0.489714
b  1    1.412141
   3    0.935255
c  1    0.071491
   2    0.264734
d  2   -2.254515
   3    0.883833
dtype: float64

In [6]:
# What you’re seeing is a prettified view of a Series with a MultiIndex as its index. The
# “gaps” in the index display mean “use the label directly above”:
data.index

MultiIndex(levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 2, 0, 1, 1, 2]])
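The pieces of a MultiIndex can also be inspected directly rather than read off the repr, whose exact form depends on the pandas version. The following is a minimal sketch, assuming a Series with the same index layout as `data` but with arbitrary placeholder values (`np.arange(9.0)` is used only so the example is reproducible):

```python
import numpy as np
import pandas as pd

# Same index layout as the example above; the values are placeholders.
data = pd.Series(
    np.arange(9.0),
    index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 3, 1, 2, 2, 3]],
)

# Unique labels stored for each level of the index.
outer_labels = list(data.index.levels[0])
inner_labels = list(data.index.levels[1])

# The full outer label for every row, without the display "gaps".
outer_per_row = list(data.index.get_level_values(0))

print(data.index.nlevels)
print(outer_labels, inner_labels)
print(outer_per_row)
```

`get_level_values` returns one label per row, which makes the "use the label directly above" rule of the pretty-printed view explicit.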

In [7]:
data['b']

1    1.412141
3    0.935255
dtype: float64

In [8]:
data['b':'c']

b  1    1.412141
   3    0.935255
c  1    0.071491
   2    0.264734
dtype: float64

In [17]:
data.loc[['b', 'd']]

b  1    1.412141
   3    0.935255
d  2   -2.254515
   3    0.883833
dtype: float64

In [15]:
# Selection is even possible from an “inner” level:
data.loc[:, 2]

a    0.979057
c    0.264734
d   -2.254515
dtype: float64
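Inner-level selection can also be written with the `xs` (cross-section) method, which names the level explicitly. This is a sketch under the assumption that the Series has the same index layout as `data`; the values here are placeholders chosen for reproducibility:

```python
import numpy as np
import pandas as pd

# Same index layout as the example above; the values are placeholders.
data = pd.Series(
    np.arange(9.0),
    index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 3, 1, 2, 2, 3]],
)

# Select every row whose inner-level label is 2, in two equivalent ways.
by_loc = data.loc[:, 2]
by_xs = data.xs(2, level=1)

print(by_xs)
```

Both forms drop the selected level from the result, leaving a Series indexed by the outer labels only.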

In [19]:
# You can rearrange the data into a DataFrame using its unstack method:
data.unstack()

          1         2         3
a  1.009228  0.979057 -0.489714
b  1.412141       NaN  0.935255
c  0.071491  0.264734       NaN
d       NaN -2.254515  0.883833


In [20]:
# The inverse operation of unstack is stack:
data.unstack().stack()

a  1    1.009228
   2    0.979057
   3   -0.489714
b  1    1.412141
   3    0.935255
c  1    0.071491
   2    0.264734
d  2   -2.254515
   3    0.883833
dtype: float64
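The round trip can be checked programmatically. The sketch below assumes the same index layout as `data` with placeholder values. It calls `dropna()` explicitly after `stack()`, because whether `stack` itself discards the missing combinations appears to depend on the pandas version:

```python
import numpy as np
import pandas as pd

# Same index layout as the example above; the values are placeholders.
data = pd.Series(
    np.arange(9.0),
    index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 3, 1, 2, 2, 3]],
)

# Wide form: one row per outer label, one column per inner label.
# Combinations absent from the original index become NaN.
wide = data.unstack()
n_missing = int(wide.isna().sum().sum())

# Back to long form; drop the NaN entries so only original rows remain.
roundtrip = wide.stack().dropna()

print(wide.shape, n_missing, len(roundtrip))
```

The three missing cells correspond to the pairs `('b', 2)`, `('c', 3)`, and `('d', 1)`, which never appear in the original index.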

In [21]:
# With a DataFrame, either axis can have a hierarchical index
frame = pd.DataFrame(np.arange(12).reshape((4, 3)), 
                     index=[['a','a','b','b'],
                            ['1','2','1','2']], 
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame

     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11


In [22]:
# The hierarchical levels can have names (as strings or any Python objects). 
# If so, these will show up in the console output:
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

frame

state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11


In [24]:
# With partial column indexing you can similarly select groups of columns:
frame['Ohio']

color      Green  Red
key1 key2
a    1         0    1
     2         3    4
b    1         6    7
     2         9   10
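Partial indexing extends in two directions: a full tuple key selects a single column, and `loc` with an outer row label selects a group of rows. A brief sketch, rebuilding the same `frame` as above so the block runs on its own:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the example above.
frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], ['1', '2', '1', '2']],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# Outer column label only: a DataFrame with the remaining 'color' level.
ohio = frame['Ohio']

# Full (state, color) tuple: a single column, returned as a Series.
ohio_red = frame[('Ohio', 'Red')]

# Outer row label only: the rows under key1 == 'a'.
rows_a = frame.loc['a']

print(ohio_red)
print(rows_a)
```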


A MultiIndex can be created by itself and then reused; the columns in the preceding DataFrame with level names could be created like this:
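The notebook does not show the code for this step. A minimal sketch, assuming `pd.MultiIndex.from_arrays` is the intended constructor and reusing the labels and level names from the frame above:

```python
import numpy as np
import pandas as pd

# Build the column index on its own, with level names attached.
columns = pd.MultiIndex.from_arrays(
    [['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
    names=['state', 'color'],
)

# Reuse it when constructing a DataFrame (default integer row index here).
frame = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=columns)

print(columns)
print(frame['Ohio'])
```

Because the index object is independent of any one DataFrame, the same `columns` value can be passed to several frames that share a column layout.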

<hr>

### 8.1.1 Reordering and Sorting Levels
<a id='811'></a>

<hr>

[Back to top](#index)