# MultiIndex / advanced indexing

A `MultiIndex` can be thought of as an array of tuples, one tuple per label. The same tuple can appear more than once.

A `MultiIndex` can be created from:

1. A list of arrays: `from_arrays`.
2. A list of tuples: `from_tuples`.
3. A crossed set of iterables: `from_product`.
4. A DataFrame: `from_frame`.

Thinking in terms of tuples lets you apply the usual indexing rules. Note that `from_product` expects a sequence of iterables, one per level.
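
As a minimal sketch of the four constructors listed above, the following builds the same index four ways and checks that the results match. The labels and the level names `outer` and `inner` are illustrative assumptions, not part of the example used in the rest of this section.

```python
import pandas as pd

# Illustrative labels: one array per level, aligned position by position.
arrays = [['A', 'A', 'B', 'B'], ['x', 'y', 'x', 'y']]
names = ['outer', 'inner']  # hypothetical level names

# 1. One array per level.
from_arrays = pd.MultiIndex.from_arrays(arrays, names=names)

# 2. One tuple per label.
from_tuples = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=names)

# 3. Cartesian product of one iterable per level.
from_product = pd.MultiIndex.from_product([['A', 'B'], ['x', 'y']], names=names)

# 4. One DataFrame column per level; column names become level names.
frame = pd.DataFrame({'outer': arrays[0], 'inner': arrays[1]})
from_frame = pd.MultiIndex.from_frame(frame)

# All four hold the same sequence of tuples.
same = (
    from_arrays.equals(from_tuples)
    and from_arrays.equals(from_product)
    and from_arrays.equals(from_frame)
)
print(same)
print(list(from_product))
```

`from_product` is only a shortcut when every combination of labels is wanted. For an irregular index, such as the one used below, `from_arrays` or `from_tuples` is the natural choice.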

In [1]:
import pandas as pd
import numpy as np

In [2]:
arrays = [['A', 'A', 'B', 'B'], ['a', 'a', 'a', 'b']]
tuples = list(zip(*arrays))
columns = pd.MultiIndex.from_product([['X', 'Y'], ['x', 'y']])

df = pd.DataFrame(np.random.rand(4, 4), 
    index=pd.MultiIndex.from_tuples(tuples),
    columns=columns)

df

Unnamed: 0_level_0,Unnamed: 1_level_0,X,X,Y,Y
Unnamed: 0_level_1,Unnamed: 1_level_1,x,y,x,y
A,a,0.442376,0.570448,0.334799,0.999831
A,a,0.554026,0.712807,0.369856,0.320812
B,a,0.381864,0.609943,0.537498,0.470492
B,b,0.46367,0.727599,0.948059,0.410741


In [3]:
df.loc[('A', 'a')]

Unnamed: 0_level_0,Unnamed: 1_level_0,X,X,Y,Y
Unnamed: 0_level_1,Unnamed: 1_level_1,x,y,x,y
A,a,0.442376,0.570448,0.334799,0.999831
A,a,0.554026,0.712807,0.369856,0.320812
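
The selection above returns two rows because the tuple `('A', 'a')` appears twice in the index. As a small sketch, using a hypothetical frame `demo` with deterministic values instead of random ones, the duplicates can be inspected directly:

```python
import numpy as np
import pandas as pd

# Same row labels as above; `demo` and its columns 'p', 'q' are illustrative.
index = pd.MultiIndex.from_tuples([('A', 'a'), ('A', 'a'), ('B', 'a'), ('B', 'b')])
demo = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=['p', 'q'])

# The index is not unique: the second ('A', 'a') repeats the first.
is_unique = index.is_unique
dup_mask = index.duplicated()

# Selecting a duplicated tuple returns every matching row.
rows = demo.loc[('A', 'a')]

print(is_unique)
print(dup_mask)
print(rows)
```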


## Selecting the inner levels

There are several ways to select on the inner levels. The easiest is probably to use the usual slicing notation together with `.loc(axis=...)`, which removes any ambiguity about the axis the slicers apply to.

In [4]:
df.loc(axis=1)[:, 'y']

Unnamed: 0_level_0,Unnamed: 1_level_0,X,Y
Unnamed: 0_level_1,Unnamed: 1_level_1,y,y
A,a,0.570448,0.999831
A,a,0.712807,0.320812
B,a,0.609943,0.470492
B,b,0.727599,0.410741


In [5]:
df.loc(axis=0)[:, 'a']

Unnamed: 0_level_0,Unnamed: 1_level_0,X,X,Y,Y
Unnamed: 0_level_1,Unnamed: 1_level_1,x,y,x,y
A,a,0.442376,0.570448,0.334799,0.999831
A,a,0.554026,0.712807,0.369856,0.320812
B,a,0.381864,0.609943,0.537498,0.470492


Another possibility is `df.xs(key, axis, level)`. By default, it drops the level used for the selection; pass `drop_level=False` to keep it.

In [6]:
df.xs(key='y', axis=1, level=1)

Unnamed: 0,Unnamed: 1,X,Y
A,a,0.570448,0.999831
A,a,0.712807,0.320812
B,a,0.609943,0.470492
B,b,0.727599,0.410741


In [7]:
df.xs(key='x', axis=1, level=1, drop_level=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,X,Y
Unnamed: 0_level_1,Unnamed: 1_level_1,x,x
A,a,0.442376,0.334799
A,a,0.554026,0.369856
B,a,0.381864,0.537498
B,b,0.46367,0.948059


In [8]:
df.xs(key='a', axis=0, level=1, drop_level=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,X,X,Y,Y
Unnamed: 0_level_1,Unnamed: 1_level_1,x,y,x,y
A,a,0.442376,0.570448,0.334799,0.999831
A,a,0.554026,0.712807,0.369856,0.320812
B,a,0.381864,0.609943,0.537498,0.470492
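
To make the effect of `drop_level` explicit, the sketch below compares the column labels returned by the two variants. The frame `demo` is a hypothetical stand-in with deterministic values and the same labels as `df`.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same row and column labels as `df`.
index = pd.MultiIndex.from_tuples([('A', 'a'), ('A', 'a'), ('B', 'a'), ('B', 'b')])
columns = pd.MultiIndex.from_product([['X', 'Y'], ['x', 'y']])
demo = pd.DataFrame(np.arange(16).reshape(4, 4), index=index, columns=columns)

# Default: the selected level disappears, leaving a flat column index.
dropped = demo.xs(key='y', axis=1, level=1)

# drop_level=False: the selected level is kept, so the columns stay a MultiIndex.
kept = demo.xs(key='y', axis=1, level=1, drop_level=False)

print(list(dropped.columns))
print(list(kept.columns))
```

Both results hold the same values; only the column labels differ.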


The same selection can also be written with an explicit tuple of slicers, where `slice(None)` selects everything on a level.

In [9]:
df.loc[(slice(None), 'a'), :]

Unnamed: 0_level_0,Unnamed: 1_level_0,X,X,Y,Y
Unnamed: 0_level_1,Unnamed: 1_level_1,x,y,x,y
A,a,0.442376,0.570448,0.334799,0.999831
A,a,0.554026,0.712807,0.369856,0.320812
B,a,0.381864,0.609943,0.537498,0.470492
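
The approaches shown in this section are interchangeable for this kind of selection. The sketch below checks this on a hypothetical frame `demo` with deterministic values, and also uses `pd.IndexSlice`, which is not used elsewhere in this section but lets the slicers be written with the usual `:` syntax.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same row and column labels as `df`.
index = pd.MultiIndex.from_tuples([('A', 'a'), ('A', 'a'), ('B', 'a'), ('B', 'b')])
columns = pd.MultiIndex.from_product([['X', 'Y'], ['x', 'y']])
demo = pd.DataFrame(np.arange(16).reshape(4, 4), index=index, columns=columns)

idx = pd.IndexSlice

# Four ways to keep the rows whose inner level equals 'a'.
by_axis = demo.loc(axis=0)[:, 'a']
by_slice = demo.loc[(slice(None), 'a'), :]
by_index_slice = demo.loc[idx[:, 'a'], :]
by_xs = demo.xs(key='a', axis=0, level=1, drop_level=False)

results = [by_axis, by_slice, by_index_slice, by_xs]
print(all(list(r.index) == list(by_axis.index) for r in results))

# IndexSlice works the same way on the columns.
inner_y = demo.loc[:, idx[:, 'y']]
print(list(inner_y.columns))
```

Slicing a `MultiIndex` in this way generally expects the index to be sorted, which is the case for the labels used here.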


## Take methods

`take` retrieves elements along an axis *positionally*, ignoring the labels. It accepts a list or an array of integer positions.

In [12]:
df.take([0, 3])

Unnamed: 0_level_0,Unnamed: 1_level_0,X,X,Y,Y
Unnamed: 0_level_1,Unnamed: 1_level_1,x,y,x,y
A,a,0.442376,0.570448,0.334799,0.999831
B,b,0.46367,0.727599,0.948059,0.410741


In [13]:
df.take([0, 3], axis=1)

Unnamed: 0_level_0,Unnamed: 1_level_0,X,Y
Unnamed: 0_level_1,Unnamed: 1_level_1,x,y
A,a,0.442376,0.999831
A,a,0.554026,0.320812
B,a,0.381864,0.470492
B,b,0.46367,0.410741
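
Because `take` is purely positional, it gives the same result as `.iloc` with a list of positions. The sketch below illustrates this on a hypothetical frame `demo` with deterministic values; negative positions count from the end.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same row and column labels as `df`.
index = pd.MultiIndex.from_tuples([('A', 'a'), ('A', 'a'), ('B', 'a'), ('B', 'b')])
columns = pd.MultiIndex.from_product([['X', 'Y'], ['x', 'y']])
demo = pd.DataFrame(np.arange(16).reshape(4, 4), index=index, columns=columns)

# First and last rows, by position.
rows = demo.take([0, 3])
rows_iloc = demo.iloc[[0, 3]]

# First and last columns, by position.
cols = demo.take([0, 3], axis=1)
cols_iloc = demo.iloc[:, [0, 3]]

# A negative position counts from the end: -1 is the last row.
last = demo.take([-1])

print(rows.equals(rows_iloc), cols.equals(cols_iloc))
print(list(last.index))
```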
