## Hierarchical Indexing
Enables you to have multiple index levels on an axis  
Provides a way to work with higher dimensional data in a lower dimensional form


In [2]:
import pandas as pd 
import numpy as np 
data = pd.Series(np.random.randn(9), index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'], [1,2,3,1,3,1,2,2,3]])
data
data.index

MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )
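The same kind of index can also be built explicitly. A minimal sketch, assuming made-up values and hypothetical level names (`letter`, `number`) that are not part of the notes above:

```python
import pandas as pd

# Same labels as the example above; the values are made up for illustration
outer = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']
inner = [1, 2, 3, 1, 3, 1, 2, 2, 3]

# 'letter' and 'number' are hypothetical level names chosen for this sketch
index = pd.MultiIndex.from_arrays([outer, inner], names=['letter', 'number'])
demo = pd.Series(range(9), index=index)

n_levels = demo.index.nlevels                 # how many index levels the axis has
outer_labels = list(demo.index.levels[0])     # unique labels of the outer level
inner_labels = list(demo.index.levels[1])     # unique labels of the inner level
```

Each level stores its unique labels once, which is why the two levels can be inspected separately.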

Partial indexing is possible: give only the outer-level label(s) to concisely select subsets of the data


In [6]:
data['b']
data['b':'c']
data.loc[['b','d']]

#selection from 'inner' level
data.loc[:, 2]

a    0.766484
c   -0.794031
d    0.703591
dtype: float64
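Because the data above is random, the selections are easier to check with fixed values. A minimal sketch, assuming a Series named `demo` (hypothetical) holding 0 to 8 on the same index:

```python
import pandas as pd

index = [['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
         [1, 2, 3, 1, 3, 1, 2, 2, 3]]
demo = pd.Series(range(9), index=index)

# Outer-level selection: the result is indexed by the remaining inner level
outer_b = demo['b']

# Inner-level selection: every outer label that has an inner label of 2
inner_2 = demo.loc[:, 2]

# xs() expresses the same inner-level selection by naming the level position
inner_2_xs = demo.xs(2, level=1)
```

`xs` can be easier to read than `loc[:, 2]` once an index has more than two levels.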

Hierarchical indexing supports reshaping data and group-based operations such as forming a pivot table.  
For example, rearrange the data into a DF using its `unstack()` method.  
The inverse operation is `stack()`.

In [7]:
data.unstack()
# data.unstack().stack()

          1         2         3
a  1.414676  0.766484  0.798812
b  0.325180       NaN -0.254663
c  1.939074 -0.794031       NaN
d       NaN  0.703591  1.489465
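The round trip between the two shapes can be sketched with fixed values. This assumes a hypothetical Series `demo`; the `dropna()` call is there because, as far as I know, newer pandas versions keep the missing combinations when stacking while older ones drop them:

```python
import pandas as pd

index = [['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
         [1, 2, 3, 1, 3, 1, 2, 2, 3]]
demo = pd.Series(range(9), index=index, dtype=float)

# Inner level becomes the columns; (b, 2), (c, 3) and (d, 1) do not exist, so they are NaN
wide = demo.unstack()

# Columns go back into an inner index level; dropna() removes any NaN placeholders
restored = wide.stack().dropna()
```

After the round trip, `restored` holds the same label/value pairs as `demo`.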


In [11]:
# With a DF either axis can have hierarchical indexes

frame = pd.DataFrame(np.arange(12).reshape((4, 3)), index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]], columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])
frame
# can give names to the hierarchical levels
frame.index.names=['key1', 'key2']
frame.columns.names=['state', 'colour']
frame

state      Ohio     Colorado
colour    Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11
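Partial indexing works on hierarchical columns as well. A minimal sketch that rebuilds the frame above under the hypothetical name `demo`:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)
demo.index.names = ['key1', 'key2']
demo.columns.names = ['state', 'colour']

# Outer column label only: a DF whose columns are the remaining 'colour' level
ohio = demo['Ohio']

# Full keys on both axes: a single cell
cell = demo.loc[('a', 2), ('Ohio', 'Red')]
```

Tuples are used for the full keys so that pandas reads each one as a single label spanning both levels.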


## Reordering and Sorting Levels
Rearrange order of the levels on an axis  
Sort data by the values in one specific level  

`swaplevel('key1', 'key2')` returns new object with the levels interchanged  
`sort_index(level=1)` sorts data using only values in a single level  

Often use `sort_index` after `swaplevel` so that the result is lexicographically sorted by the new outer level

In [17]:
frame.swaplevel('key1', 'key2')
frame.sort_index(level=1)

state      Ohio     Colorado
colour    Green Red    Green
key1 key2
a    1        0   1        2
b    1        6   7        8
a    2        3   4        5
b    2        9  10       11
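The two calls are often chained. A minimal sketch, assuming the same frame rebuilt under the hypothetical name `demo`:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)
demo.index.names = ['key1', 'key2']
demo.columns.names = ['state', 'colour']

# Swap the row levels, then sort on the new outer level (key2);
# sort_index also sorts the remaining level by default
swapped = demo.swaplevel('key1', 'key2').sort_index(level=0)

row_order = list(swapped.index)
```

Both methods return new objects, so `demo` keeps its original level order.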


## Summary Stats by Level
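Older pandas versions gave many descriptive and summary stats for a DF a `level` option; it was deprecated in pandas 1.3 and removed in 2.0, so aggregate with `groupby(level=...)` instead



In [20]:
# sum over the rows that share a 'key2' label
frame.groupby(level='key2').sum()
# sum over the columns that share a 'colour' label (transpose, group, transpose back)
frame.T.groupby(level='colour').sum().T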

colour     Green  Red
key1 key2
a    1         2    1
     2         8    4
b    1        14    7
     2        20   10


## Indexing with a DF's columns

Use this when you want one or more columns of a DF to serve as the row index,  
or when you want to move the row index into the DF's columns.  

`set_index` creates a new DF using one or more of its columns as the index.  
Pass `drop=False` to keep those columns in the DF; by default they are removed from the columns and moved to the index.  

`reset_index` moves the hierarchical index levels back into columns.

In [23]:
frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1), 'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'], 'd': [0, 1, 2, 0, 1, 2, 3]})
frame2 = frame.set_index(['c','d'])
frame2

       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
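The cell above only shows the default `set_index` call. A minimal sketch of `drop=False` and `reset_index`, assuming the same data under the hypothetical name `demo`:

```python
import pandas as pd

demo = pd.DataFrame({
    'a': range(7),
    'b': range(7, 0, -1),
    'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
    'd': [0, 1, 2, 0, 1, 2, 3],
})

# Default: 'c' and 'd' leave the columns and become the row index
indexed = demo.set_index(['c', 'd'])

# drop=False: 'c' and 'd' become the row index and also stay as columns
kept = demo.set_index(['c', 'd'], drop=False)

# reset_index: the index levels move back into columns, placed first
flat = indexed.reset_index()
```

Apart from the column order, `flat` holds the same data as `demo`.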


# 8.2 Combining and Merging Datasets

By default `merge` does an 'inner' join: the keys in the result are the intersection of the keys found in both DFs.  
The other options are 'left', 'right' and 'outer'.  
An outer join takes the union of the keys, combining the effect of applying both left and right joins:
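A minimal sketch of the four join types, assuming two small made-up frames (`left`, `right`) that share a hypothetical `key` column:

```python
import pandas as pd

# Made-up frames for illustration; 'b' and 'c' are the only keys present in both
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'rval': [4, 5, 6]})

inner = pd.merge(left, right, on='key')                     # default how='inner': intersection of keys
outer = pd.merge(left, right, on='key', how='outer')        # union of keys, NaN where one side is missing
left_join = pd.merge(left, right, on='key', how='left')     # every key from left
right_join = pd.merge(left, right, on='key', how='right')   # every key from right
```

In the outer result, `a` has no `rval` and `d` has no `lval`, so those two cells are NaN.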
