# Hierarchical (Multi-) Indexing
* Contact: Lachlan Deer, [econgit] @ldeer, [github/twitter] @lachlandeer

In earlier notebooks we imported our labor market data so that it had a multi-index. In this notebook, we explore multi-indexing in more detail.


In [None]:
import pandas as pd

## Multiply Indexed Data

We focus on multiply indexed `DataFrames` and ignore pandas `Series`, because `DataFrames` are what we will come across most often. Most of our discussion carries over to `Series`.

Let's import our labour market data in the 'simplest way':

In [None]:
data = pd.read_csv('out_data/state_labour_statistics.csv')
data.head()

In [None]:
data.index

Again note that pandas has created an index for us, which is simply a row identifier. We argued earlier that a better way of indexing might be state-year-month. 

To make the shift to our preferred index, we need to replace the default index, using the `set_index` method.

In [None]:
data.set_index(['state', 'year', 'period'], inplace=True)
data.head()

If we want to go back to the original index, we can use `reset_index`. Passing `drop=False` asks that the levels of the multi-index we set are returned to the data as columns rather than discarded.

In [None]:
data.reset_index(drop=False, inplace=True)

In [None]:
data.head()
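The round trip between `set_index` and `reset_index` can be sketched on a small, made-up table. The `toy` DataFrame and its values are hypothetical stand-ins for the labour market data, and the sketch uses the non-`inplace` form, which returns a new DataFrame instead of modifying the original.

```python
import pandas as pd

# Toy stand-in for the labour market data (values are made up for illustration)
toy = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "year": [2010, 2011, 2010, 2011],
    "period": ["M01", "M01", "M01", "M01"],
    "unemployment_rate": [10.0, 9.5, 8.0, 7.5],
})

# Move three columns into a hierarchical row index (returns a new DataFrame)
indexed = toy.set_index(["state", "year", "period"])

# Turn the index levels back into ordinary columns
restored = indexed.reset_index()

print(indexed.index.names)
print(restored.columns.tolist())
```

Because neither call uses `inplace=True`, `toy` itself is left unchanged, which makes it easier to re-run cells in any order.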

But for now, let's stick with our multi-indexed data and see how to use it:

In [None]:
data.set_index(['state', 'year', 'period'], inplace=True)

## Selecting Data with a Multi-Index

One advantage of a multi-index is that we can subset data quite simply:

In [None]:
data.loc['Alabama']

In [None]:
data.loc['Alabama', 2010]

In [None]:
data.loc['Alabama', 2010, 'M10']

In [None]:
# we hope this works, but slicing on an inner level may fail if the index is not sorted...
data.loc['Alabama', :, 'M10']

In [None]:
# we need to sort the index first
data = data.sort_index()
data.head()

In [None]:
data.loc['Alabama', 2010:2016, 'M10':'M12']
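A more explicit way to write the same kind of selection is `pd.IndexSlice`, which spells out the row selection and the column selection separately. The sketch below uses a made-up `toy` DataFrame (hypothetical values) with the same three index levels, and assumes the index has already been sorted.

```python
import pandas as pd

# Toy multi-indexed data with the same level names as the labour market data
idx = pd.MultiIndex.from_product(
    [["Alabama", "Alaska"], [2010, 2011, 2012], ["M10", "M11", "M12"]],
    names=["state", "year", "period"],
)
toy = pd.DataFrame({"unemployment_rate": range(len(idx))}, index=idx).sort_index()

ix = pd.IndexSlice

# Rows: Alabama, years 2010 to 2011, periods M10 to M11 (label slices include
# both endpoints). Columns: everything.
subset = toy.loc[ix["Alabama", 2010:2011, "M10":"M11"], :]
print(subset)
```

Writing the trailing `, :` makes it unambiguous that the tuple refers to levels of the row index rather than to rows and columns.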

### Challenge

1. Extract all the data for the Carolinas (Help: you need to do a partial string match on the index `data.index.get_level_values(XX).str.contains(YY)` )
2. Extract all the data for the Carolinas in 2007
3. Extract all the data for the Carolinas between 2007-2010
4. Extract all the data in the summer months for the Carolinas between 2007 and 2010

#### Partial Solution:

In [None]:
data.loc[data.index.get_level_values(0).str.contains("Carolina"), 
             2007:2010, 'M06':'M09']
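The partial string match relies on building a boolean mask from one level of the index. As a minimal sketch on made-up data (the `toy` DataFrame and its `value` column are hypothetical), masks built from different levels can also be combined with `&`:

```python
import pandas as pd

# Toy data: two states that match "Carolina" and one that does not
idx = pd.MultiIndex.from_product(
    [["North Carolina", "South Carolina", "Texas"], [2007, 2008]],
    names=["state", "year"],
)
toy = pd.DataFrame({"value": range(len(idx))}, index=idx).sort_index()

# One True/False entry per row, computed from a single index level
is_carolina = toy.index.get_level_values("state").str.contains("Carolina")
in_2007 = toy.index.get_level_values("year") == 2007

carolinas = toy[is_carolina]
carolinas_2007 = toy[is_carolina & in_2007]
print(carolinas_2007)
```

Levels can be referred to by name, as here, or by position, as in `get_level_values(0)`.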

## Index (Un-)Stacking

One potentially cool use for multi-indexing is using the indexes across *two* dimensions. This is *unstacking*, and often allows for simple ways to view patterns in the data.

For example, we could move the year level of the index across to the column axis, so that we can easily see patterns in labor force statistics in a given month, but over various years:

In [None]:
data.unstack(level=1)

In its current form this is a bit ugly because there is so much data. But we can use `loc` together with a column selection to view just one of our series:

In [None]:
data.unstack(level=1).loc['California']['unemployment_rate']

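The opposite of unstacking is *stacking*, and it puts our data back into a long, multi-indexed format like the one we began with. Note that `stack` adds the stacked level as the innermost index level, so the order of the levels can differ from the original: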

In [None]:
data.unstack(level=1).stack(level=1)
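To see what happens to the index levels during an `unstack`/`stack` round trip, here is a small sketch on made-up data (the `toy` DataFrame and its values are hypothetical). It uses `reorder_levels` and `sort_index` to recover the original level order.

```python
import pandas as pd

# Toy data for a single state, two years and two months
idx = pd.MultiIndex.from_product(
    [["California"], [2010, 2011], ["M01", "M02"]],
    names=["state", "year", "period"],
)
toy = pd.DataFrame({"unemployment_rate": [12.0, 12.1, 11.5, 11.4]}, index=idx)

# Years move from the row index to a second level of the columns
wide = toy.unstack(level="year")

# Years move back to the rows, but as the innermost index level
long = wide.stack(level="year")
print(long.index.names)

# Put the levels back in their original order and sort the rows
restored = long.reorder_levels(["state", "year", "period"]).sort_index()
print(restored)
```

Referring to levels by name rather than by position keeps the code readable once the level order has changed.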

In this way, the `unstack`-`stack` functionality is a useful way to view data and potentially see some patterns, even if the index has many levels.

### Challenge
* Use the unstack method to view data about the employment rate among the labor force over years and months in California