## MultiIndexes

MultiIndexes extend pandas indexing by letting you designate multiple columns as indexes. For example, you may have data for each month of the year, across multiple years. In this case you might want to use month as the index, but you would not want pandas to treat January 2019 as the same as January 2020. You would want indexes for both month and year.
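As a minimal sketch of the month-and-year idea, the snippet below builds a small DataFrame and indexes it by both columns. The DataFrame name `monthly`, the column name `value`, and the numbers are made up for illustration only.

```python
import pandas as pd

# Hypothetical monthly values for two years (illustrative numbers only)
monthly = pd.DataFrame({
    'year': [2019, 2019, 2020, 2020],
    'month': ['January', 'February', 'January', 'February'],
    'value': [10, 12, 11, 15],
})

# Use both year and month as the row index, giving a two-level MultiIndex
monthly = monthly.set_index(['year', 'month'])

# January 2019 and January 2020 are now addressed separately
jan_2019 = monthly.loc[(2019, 'January'), 'value']
jan_2020 = monthly.loc[(2020, 'January'), 'value']
print(jan_2019, jan_2020)
```

Because each row is identified by a (year, month) pair, the two Januaries never collide.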

MultiIndexes can be applied both to rows (for which we've already learned about single-level indexing) and to columns.
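The rest of this section focuses on rows, so here is a brief, hypothetical sketch of what a MultiIndex on columns can look like. The task and session labels and the scores are invented for illustration.

```python
import pandas as pd

# Hypothetical scores for two tasks, each measured in two sessions
cols = pd.MultiIndex.from_product(
    [['task_a', 'task_b'], ['sess_1', 'sess_2']],
    names=['task', 'session'],
)
scores = pd.DataFrame(
    [[0.9, 0.8, 0.7, 0.6],
     [0.5, 0.4, 0.3, 0.2]],
    columns=cols,
)

# Selecting a label from the outer level returns the columns nested under it
task_a = scores['task_a']
print(task_a.columns.tolist())
```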

Imagine we collected reaction time (RT) data from an individual human participant in two different testing sessions. Each session involved 10 experimental trials. Between the first and the second session, the person played cognitive training games and we want to know if their RTs decreased due to the training. So we can load in the two data files (one from each session):

In [18]:
sess_1 = pd.read_csv('session_1.csv', index_col='trial')
sess_2 = pd.read_csv('session_2.csv', index_col='trial')

Now we view the data from each session:

In [19]:
sess_1

trial,rt
0,0.988
1,0.753
2,0.949
3,0.824
4,0.262
5,0.803
6,0.376
7,0.496
8,0.235
9,0.336


In [20]:
sess_2

trial,rt
0,0.718
1,0.851
2,0.747
3,0.52
4,0.991
5,0.004
6,0.547
7,0.883
8,0.841
9,0.195


You can see that because of the `index_col='trial'` argument to `pd.read_csv()`, trial number is used as the index for each DataFrame.
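To see what `index_col` changes, the sketch below reads the same CSV text with and without it. The in-memory string stands in for a file like `session_1.csv`; its exact layout is an assumption, and only the first three trials are included.

```python
import io
import pandas as pd

# Stand-in for a file such as session_1.csv (first three trials only);
# the real file's exact layout is an assumption here
csv_text = "trial,rt\n0,0.988\n1,0.753\n2,0.949\n"

# Without index_col, 'trial' is an ordinary column and pandas adds its own index
default_index = pd.read_csv(io.StringIO(csv_text))

# With index_col='trial', the trial numbers become the row index
trial_index = pd.read_csv(io.StringIO(csv_text), index_col='trial')

print(default_index.columns.tolist())
print(trial_index.index.name, trial_index.columns.tolist())
```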

Now we can concatenate the data. One way to do this is to append the rows of `sess_2` to the bottom of `sess_1`, using the `axis=0` argument to specify that concatenation is by rows:

In [21]:
sess_12 = pd.concat([sess_1, sess_2], axis=0)
sess_12

trial,rt
0,0.988
1,0.753
2,0.949
3,0.824
4,0.262
5,0.803
6,0.376
7,0.496
8,0.235
9,0.336


The problem with the result above is that we don't know which session each data point came from. We know session names from the names of the files, but that information doesn't get used in making the DataFrame. We can deal with this by manually specifying the session names, and using them as row indexes. Critically, we will use MultiIndexing so that `trial` is retained as an index. In other words, there are two indexes.

In [22]:
sess_12 = pd.concat([sess_1, sess_2], keys=['sess_1', 'sess_2'], axis=0)
sess_12

,trial,rt
sess_1,0,0.988
sess_1,1,0.753
sess_1,2,0.949
sess_1,3,0.824
sess_1,4,0.262
sess_1,5,0.803
sess_1,6,0.376
sess_1,7,0.496
sess_1,8,0.235
sess_1,9,0.336
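The difference between the two concatenations can be checked directly on the index. The sketch below rebuilds the first three trials of each session by hand (values copied from the tables above) so that it runs without the CSV files. The `names=` argument, which labels the new outer level `session`, is an optional extra not used in the cell above, and `plain` and `keyed` are illustrative variable names.

```python
import pandas as pd

# First three trials of each session, copied from the tables above
trials = pd.Index([0, 1, 2], name='trial')
sess_1 = pd.DataFrame({'rt': [0.988, 0.753, 0.949]}, index=trials)
sess_2 = pd.DataFrame({'rt': [0.718, 0.851, 0.747]}, index=trials)

# Plain concatenation: trial labels repeat, so rows are ambiguous
plain = pd.concat([sess_1, sess_2], axis=0)

# Keyed concatenation: each row is identified by a (session, trial) pair
keyed = pd.concat([sess_1, sess_2], keys=['sess_1', 'sess_2'],
                  names=['session', 'trial'], axis=0)

print(plain.index.is_unique)
print(keyed.index.is_unique)
print(list(keyed.index.names))
```

With `keys`, the index has two levels and every row label is unique, which is what lets us tell the sessions apart.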


We can then use the `.loc[]` property to select all trials from one session or the other:

In [23]:
sess_12.loc['sess_1']

trial,rt
0,0.988
1,0.753
2,0.949
3,0.824
4,0.262
5,0.803
6,0.376
7,0.496
8,0.235
9,0.336
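Selection can also go deeper than the outer level. As a sketch (again using only the first three trials of each session, rebuilt by hand so the snippet is self-contained), a `(session, trial)` tuple passed to `.loc[]` picks out one row, and `.xs()` takes a cross-section on the inner `trial` level:

```python
import pandas as pd

# First three trials of each session, copied from the tables above
trials = pd.Index([0, 1, 2], name='trial')
sess_1 = pd.DataFrame({'rt': [0.988, 0.753, 0.949]}, index=trials)
sess_2 = pd.DataFrame({'rt': [0.718, 0.851, 0.747]}, index=trials)
sess_12 = pd.concat([sess_1, sess_2], keys=['sess_1', 'sess_2'], axis=0)

# A (session, trial) tuple selects a single row; adding 'rt' gives one value
rt_s2_t1 = sess_12.loc[('sess_2', 1), 'rt']

# .xs() selects on an inner level: trial 1 from every session
trial_1 = sess_12.xs(1, level='trial')

print(rt_s2_t1)
print(trial_1)
```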
