# Advanced indexing

In [8]:
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
try:
    import seaborn
except ImportError:
    pass

pd.options.display.max_rows = 10

This dataset is borrowed from the [PyCon tutorial of Brandon Rhodes](https://github.com/brandon-rhodes/pycon-pandas-tutorial/) (so all credit to him!). You can download the data here: [`titles.csv`](https://drive.google.com/file/d/0B3G70MlBnCgKa0U4WFdWdGdVOFU/view?usp=sharing) and [`cast.csv`](https://drive.google.com/file/d/0B3G70MlBnCgKRzRmTWdQTUdjNnM/view?usp=sharing). Put both files in the `data/` folder next to this notebook, which is where the code below reads them from.

In [2]:
cast = pd.read_csv('data/cast.csv')
cast.head()

,title,year,name,type,character,n
0,Suuri illusioni,1985,Homo $,actor,Guests,22.0
1,Gangsta Rap: The Glockumentary,2007,Too $hort,actor,Himself,
2,Menace II Society,1993,Too $hort,actor,Lew-Loc,27.0
3,Porndogs: The Adventures of Sadie,2009,Too $hort,actor,Bosco,3.0
4,Stop Pepper Palmer,2014,Too $hort,actor,Himself,


In [3]:
titles = pd.read_csv('data/titles.csv')
titles.head()

,title,year
0,The Rising Son,1990
1,Ashes of Kukulcan,2016
2,The Thousand Plane Raid,1969
3,Crucea de piatra,1993
4,The 86,2015


## Setting columns as the index

Why is it useful to have an index?

- It gives meaningful labels to your data, so it is easier to remember which data are where
- It unlocks some powerful methods, e.g. with a DatetimeIndex for time series
- It makes selecting data easier and faster

It is this last one we are going to explore here!
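
As a brief aside on the second point in the list above, here is a minimal sketch of what a DatetimeIndex makes possible. The series `ts` is made up for illustration and is not part of the movie dataset.

```python
import pandas as pd

# Hypothetical daily series (made-up values): 90 days starting 1 January 2015
idx = pd.date_range("2015-01-01", periods=90, freq="D")
ts = pd.Series(range(90), index=idx)

# Partial string indexing: one label selects every day in February
feb = ts.loc["2015-02"]
print(len(feb))

# Resampling to monthly sums is another method a DatetimeIndex enables
monthly = ts.resample("MS").sum()
print(monthly)
```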

Setting the `title` column as the index:

In [4]:
c = cast.set_index('title')

In [5]:
c.head()

title,year,name,type,character,n
Suuri illusioni,1985,Homo $,actor,Guests,22.0
Gangsta Rap: The Glockumentary,2007,Too $hort,actor,Himself,
Menace II Society,1993,Too $hort,actor,Lew-Loc,27.0
Porndogs: The Adventures of Sadie,2009,Too $hort,actor,Bosco,3.0
Stop Pepper Palmer,2014,Too $hort,actor,Himself,


Instead of doing:

In [13]:
%%time
cast[cast['title'] == 'Macbeth']

CPU times: user 476 ms, sys: 16 ms, total: 492 ms
Wall time: 495 ms


,title,year,name,type,character,n
11638,Macbeth,2015,Darren Adamson,actor,Soldier,
20153,Macbeth,1916,Spottiswoode Aitken,actor,Duncan,4
23106,Macbeth,1948,Robert Alan,actor,Third Murderer,
24080,Macbeth,2016,John Albasiny,actor,Doctor,
34024,Macbeth,1948,William Alland,actor,Second Murderer,18
...,...,...,...,...,...,...
3288130,Macbeth,1998,Jessica Werbin,actress,Lady Macduff,
3298214,Macbeth,2014,Finty Williams,actress,Lady Macduff,
3301599,Macbeth,2006,Jamie-Lee Wilson,actress,Female Constable,39
3302941,Macbeth,1998,Dawn Winarski,actress,Lady Macbeth,2


we can now do:

In [14]:
%%time
c.loc['Macbeth']

CPU times: user 188 ms, sys: 4 ms, total: 192 ms
Wall time: 195 ms


title,year,name,type,character,n
Macbeth,2015,Darren Adamson,actor,Soldier,
Macbeth,1916,Spottiswoode Aitken,actor,Duncan,4
Macbeth,1948,Robert Alan,actor,Third Murderer,
Macbeth,2016,John Albasiny,actor,Doctor,
Macbeth,1948,William Alland,actor,Second Murderer,18
...,...,...,...,...,...
Macbeth,1998,Jessica Werbin,actress,Lady Macduff,
Macbeth,2014,Finty Williams,actress,Lady Macduff,
Macbeth,2006,Jamie-Lee Wilson,actress,Female Constable,39
Macbeth,1998,Dawn Winarski,actress,Lady Macbeth,2
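
To see the equivalence of the two selections without downloading the data, here is a small sketch on a toy table. The DataFrame `toy` and its rows are invented for illustration; only `set_index`, `reset_index` and `.loc` are the same calls used on `cast` above.

```python
import pandas as pd

# Toy stand-in for the cast table; these rows are made up for illustration
toy = pd.DataFrame({
    "title": ["Hamlet", "Macbeth", "Hamlet", "Othello"],
    "year": [2000, 2015, 1996, 1995],
    "name": ["actor A", "actor B", "actor C", "actor D"],
})

# Selection with a boolean mask on the column
by_mask = toy[toy["title"] == "Hamlet"]

# Selection by label after moving the column into the index.
# set_index returns a new DataFrame; toy itself is left unchanged.
toy_indexed = toy.set_index("title")
by_label = toy_indexed.loc[["Hamlet"]]  # a list of labels always gives a DataFrame

print(by_mask)
print(by_label)

# reset_index moves the index labels back into an ordinary column
print(by_label.reset_index())
```

Both selections return the same two rows; the only difference is whether `title` is a column or the index.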


You can also set multiple columns as the index, which gives a **multi-index** (also called a **hierarchical index**):

In [16]:
c = cast.set_index(['title', 'year'])

In [17]:
c.head()

title,year,name,type,character,n
Suuri illusioni,1985,Homo $,actor,Guests,22.0
Gangsta Rap: The Glockumentary,2007,Too $hort,actor,Himself,
Menace II Society,1993,Too $hort,actor,Lew-Loc,27.0
Porndogs: The Adventures of Sadie,2009,Too $hort,actor,Bosco,3.0
Stop Pepper Palmer,2014,Too $hort,actor,Himself,
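
The following sketch shows two ways of selecting from a multi-index on a toy table: by the first level only, and by a tuple covering both levels. The DataFrame `toy` and its rows are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the cast table; these rows are made up for illustration
toy = pd.DataFrame({
    "title": ["Hamlet", "Macbeth", "Hamlet", "Hamlet", "Othello"],
    "year": [2000, 2015, 1996, 2000, 1995],
    "name": ["actor A", "actor B", "actor C", "actor D", "actor E"],
})
toy_mi = toy.set_index(["title", "year"])

# The index now has two named levels
print(list(toy_mi.index.names))

# A single label selects on the first level; the result is indexed by year only
all_hamlets = toy_mi.loc["Hamlet"]
print(all_hamlets)

# A tuple selects on both levels at once.
# pandas may emit a PerformanceWarning here because the index is not sorted.
hamlet_2000 = toy_mi.loc[("Hamlet", 2000), :]
print(hamlet_2000)
```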


With a multi-index, a tuple of labels selects on both levels at once:

In [31]:
%%time
c.loc[('Hamlet', 2000),:]

CPU times: user 40 ms, sys: 12 ms, total: 52 ms
Wall time: 50.5 ms




title,year,name,type,character,n
Hamlet,2000,Casey Affleck,actor,Fortinbras,15
Hamlet,2000,Paul Bartel,actor,Osric,14
Hamlet,2000,Paul Ferriter,actor,Special Guest Appearance,23
Hamlet,2000,Larry Fessenden,actor,Kissing Man,24
Hamlet,2000,Karl Geary,actor,Horatio,8
...,...,...,...,...,...
Hamlet,2000,Anne (II) Nixon,actress,Special Guest Appearance,34
Hamlet,2000,India Reed Kotis,actress,Special Guest Appearance,29
Hamlet,2000,Kelly Sebastian,actress,Secretary,39
Hamlet,2000,Julia Stiles,actress,Ophelia,7


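Selection gets faster still once the index is sorted, because pandas can then search the ordered labels instead of scanning the whole index:

In [36]: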
c2 = c.sort_index()

In [58]:
%%time
c2.loc[('Hamlet', 2000),:]

CPU times: user 8 ms, sys: 0 ns, total: 8 ms
Wall time: 7.3 ms


title,year,name,type,character,n
Hamlet,2000,Casey Affleck,actor,Fortinbras,15
Hamlet,2000,Paul Bartel,actor,Osric,14
Hamlet,2000,Paul Ferriter,actor,Special Guest Appearance,23
Hamlet,2000,Larry Fessenden,actor,Kissing Man,24
Hamlet,2000,Karl Geary,actor,Horatio,8
...,...,...,...,...,...
Hamlet,2000,Anne (II) Nixon,actress,Special Guest Appearance,34
Hamlet,2000,India Reed Kotis,actress,Special Guest Appearance,29
Hamlet,2000,Kelly Sebastian,actress,Secretary,39
Hamlet,2000,Julia Stiles,actress,Ophelia,7
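
Whether an index is already sorted can be checked before and after `sort_index()`. The sketch below does this on a toy table whose rows are invented for illustration; it only checks the ordering and that both lookups return the same rows, and does not try to reproduce the timings above.

```python
import pandas as pd

# Toy stand-in for the cast table; these rows are made up for illustration
toy = pd.DataFrame({
    "title": ["Macbeth", "Hamlet", "Othello", "Hamlet", "Hamlet"],
    "year": [2015, 2000, 1995, 1996, 2000],
    "name": ["actor A", "actor B", "actor C", "actor D", "actor E"],
})

unsorted_mi = toy.set_index(["title", "year"])
sorted_mi = unsorted_mi.sort_index()

# is_monotonic_increasing tells whether the index labels are in sorted order
print(unsorted_mi.index.is_monotonic_increasing)
print(sorted_mi.index.is_monotonic_increasing)

# Both lookups return the same rows; only the way pandas finds them differs
rows_unsorted = unsorted_mi.loc[("Hamlet", 2000), :]
rows_sorted = sorted_mi.loc[("Hamlet", 2000), :]
print(rows_sorted)
```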
