In [2]:
import sys
print(sys.version)
import numpy as np
print(np.__version__)
import pandas as pd
print(pd.__version__)

3.6.1 |Anaconda 4.4.0 (x86_64)| (default, May 11 2017, 13:04:09) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
1.12.1
0.20.1


# Bracket Indexing in Pandas

Let's talk about the tools we have for pulling rows, columns, or individual values out of Pandas objects.  This is more complicated than indexing in NumPy.  A lot of the complication comes from supporting both integer positions and Index labels.  We'll try to teach you some best practices to avoid errors.
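
To see where the ambiguity bites, here's a small made-up Series whose Index labels are themselves integers. In that case an integer in square brackets is read as a *label*, not a position. The sketch also uses the `.loc` and `.iloc` accessors, which always mean labels and positions respectively.

```python
import pandas as pd

# Hypothetical data: the Index labels are integers, in a scrambled order
s = pd.Series(['a', 'b', 'c'], index=[2, 0, 1])

# With an integer Index, an integer in [] is treated as a label
by_brackets = s[1]       # the row labelled 1 -> 'c'

# .loc always means labels, .iloc always means positions
by_label = s.loc[1]      # label 1 -> 'c'
by_position = s.iloc[1]  # second row -> 'b'

print(by_brackets, by_label, by_position)
```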

## Indexing with a Series

If you have a Series, indexing works a lot like NumPy, except that you can put either an integer position or an Index label in the square brackets.  We've already seen some examples of this.

Here's some synthetic data to work with.

In [4]:
treats = pd.Series( [x*2 for x in range(5)],
    index = ['mouse_{}'.format(x) for x in range(5)])
treats

mouse_0    0
mouse_1    2
mouse_2    4
mouse_3    6
mouse_4    8
dtype: int64

In [5]:
treats[1]

2

In [13]:
treats['mouse_1']

2

You can also put several labels or positions in a list.

In [15]:
treats[['mouse_1','mouse_4']]

mouse_1    2
mouse_4    8
dtype: int64
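
Here's a small sketch of both forms. A list of labels returns the rows in the order you list them. For positions, it's safer to spell out the intent with `.iloc`, because recent Pandas releases have deprecated the fallback where an integer in `[]` on a Series with non-integer labels is treated as a position. The `treats` data is rebuilt so the snippet runs on its own.

```python
import pandas as pd

treats = pd.Series([x * 2 for x in range(5)],
                   index=['mouse_{}'.format(x) for x in range(5)])

# A list of labels: rows come back in the order listed
by_labels = treats[['mouse_4', 'mouse_1']]

# The positional equivalent, made explicit with .iloc
by_positions = treats.iloc[[4, 1]]

print(by_labels)
```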

You can use a boolean Series, much like NumPy.

In [16]:
treats > 3

mouse_0    False
mouse_1    False
mouse_2     True
mouse_3     True
mouse_4     True
dtype: bool

In [17]:
treats[treats > 3]

mouse_2    4
mouse_3    6
mouse_4    8
dtype: int64
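
Conditions can be combined with `&` (and), `|` (or) and `~` (not). Python's own `and` and `or` don't work elementwise on a Series; they raise a `ValueError` about an ambiguous truth value. Each comparison also needs its own parentheses, because `&` binds more tightly than `>`. Here's a minimal sketch, with `treats` rebuilt so it runs standalone:

```python
import pandas as pd

treats = pd.Series([x * 2 for x in range(5)],
                   index=['mouse_{}'.format(x) for x in range(5)])

# Keep values strictly between 3 and 8; note the parentheses around each comparison
middle = treats[(treats > 3) & (treats < 8)]

print(middle)
```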

You can also use a slice, with either integer positions or labels.

In [9]:
treats[1:3]

mouse_1    2
mouse_2    4
dtype: int64

In [12]:
treats['mouse_1':'mouse_3']

mouse_1    2
mouse_2    4
mouse_3    6
dtype: int64

Notice that these two don't behave in exactly the same way.  When slicing with labels, Pandas breaks with the standard Python convention (unfortunately) and *includes* the end point, so `'mouse_1':'mouse_3'` returns three rows while `1:3` returns only two.  Pay attention to this, because it's easy to grab an extra row by mistake.
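
The sketch below checks the difference directly. If you want half-open behavior with labels, one option is to convert the labels to positions with `Index.get_loc` and then slice by position. That's just one approach, not a Pandas convention.

```python
import pandas as pd

treats = pd.Series([x * 2 for x in range(5)],
                   index=['mouse_{}'.format(x) for x in range(5)])

by_position = treats[1:3]               # end position 3 is excluded
by_label = treats['mouse_1':'mouse_3']  # end label 'mouse_3' is included

# One way to get a half-open range from labels: convert them to positions first
start = treats.index.get_loc('mouse_1')  # 1
stop = treats.index.get_loc('mouse_3')   # 3
half_open = treats.iloc[start:stop]

print(len(by_position), len(by_label), len(half_open))
```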

## Simple Indexing for a DataFrame

Now let's try indexing with a DataFrame.  Here's some more synthetic data.

In [3]:
np.random.seed(200)
Mice = pd.DataFrame( np.random.geometric(.2, size = (5,5)) , 
             columns = ['test_{}'.format(x) for x in range(5)],
             index = ['mouse_{}'.format(x) for x in range(5)])
Mice

         test_0  test_1  test_2  test_3  test_4
mouse_0      14       2       5       3       7
mouse_1       1       2      11       3      18
mouse_2      10      20      12       2       9
mouse_3       1       7       2       1      13
mouse_4       8       4      10       4       3


With a simple index (a single argument in square brackets after a DataFrame), Pandas selects *columns*.  We've already seen this.

In [20]:
Mice['test_1']

mouse_0     2
mouse_1     2
mouse_2    20
mouse_3     7
mouse_4     4
Name: test_1, dtype: int64

Remember, if your column names are valid Python identifiers (no spaces or special characters) and don't clash with an existing DataFrame attribute or method, there's a nicer way to write this:

In [23]:
Mice.test_1

mouse_0     2
mouse_1     2
mouse_2    20
mouse_3     7
mouse_4     4
Name: test_1, dtype: int64
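
The attribute shortcut has limits. A column name containing a space can't follow a dot at all. A name that matches an existing DataFrame method, such as `count`, gives you the method instead of the column. Here's a sketch with a hypothetical two-column DataFrame that shows both cases; square brackets work either way.

```python
import pandas as pd

# Hypothetical frame: one column name clashes with a DataFrame method,
# the other contains a space
df = pd.DataFrame({'count': [1, 2, 3], 'test score': [90, 85, 77]})

# Attribute access finds the DataFrame.count *method*, not the column
print(callable(df.count))

# Square brackets always look the column up by name
counts = df['count']
scores = df['test score']   # df.test score would be a SyntaxError
```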

You can pull out multiple columns with a list.

In [26]:
Mice[['test_1','test_4']]

         test_1  test_4
mouse_0       2       7
mouse_1       2      18
mouse_2      20       9
mouse_3       7      13
mouse_4       4       3
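
The number of brackets matters. A single label gives back a Series, while a list gives back a DataFrame, even when the list holds only one label. The sketch rebuilds `Mice` from the values shown above so that it doesn't depend on the random seed.

```python
import pandas as pd

# The Mice values shown above, typed in by hand so the example is reproducible
Mice = pd.DataFrame(
    [[14, 2, 5, 3, 7],
     [1, 2, 11, 3, 18],
     [10, 20, 12, 2, 9],
     [1, 7, 2, 1, 13],
     [8, 4, 10, 4, 3]],
    columns=['test_{}'.format(x) for x in range(5)],
    index=['mouse_{}'.format(x) for x in range(5)])

one_col_series = Mice['test_1']    # a single label -> Series
one_col_frame = Mice[['test_1']]   # a list with one label -> DataFrame

print(type(one_col_series).__name__, one_col_series.shape)
print(type(one_col_frame).__name__, one_col_frame.shape)
```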


There are a few cases in which indexing works differently.  First, if you pass in a boolean array, it will be used to filter *rows*, not columns.

In [30]:
Mice.test_1 > 3

mouse_0    False
mouse_1    False
mouse_2     True
mouse_3     True
mouse_4     True
Name: test_1, dtype: bool

In [32]:
Mice[Mice.test_1 > 3]

         test_0  test_1  test_2  test_3  test_4
mouse_2      10      20      12       2       9
mouse_3       1       7       2       1      13
mouse_4       8       4      10       4       3


This is a bit unexpected, but filtering rows is such a common operation that the developers of Pandas wanted it to be possible in a few keystrokes.
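
The mask doesn't have to come from a single column, and it doesn't have to be a Series. Any boolean sequence with one entry per row works. The sketch below uses a row total and a plain NumPy array. As before, `Mice` is typed in by hand from the values shown above.

```python
import numpy as np
import pandas as pd

# The Mice values shown above, typed in by hand so the example is reproducible
Mice = pd.DataFrame(
    [[14, 2, 5, 3, 7],
     [1, 2, 11, 3, 18],
     [10, 20, 12, 2, 9],
     [1, 7, 2, 1, 13],
     [8, 4, 10, 4, 3]],
    columns=['test_{}'.format(x) for x in range(5)],
    index=['mouse_{}'.format(x) for x in range(5)])

# A mask computed across columns: keep mice whose values sum to more than 30
high_total = Mice[Mice.sum(axis=1) > 30]

# A plain NumPy boolean array also works, as long as it has one entry per row
mask = np.array([False, False, True, True, True])
last_three = Mice[mask]

print(high_total)
```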

This is rarer, but you can also index with a boolean DataFrame.  This is mainly useful when you want to set values throughout the DataFrame based on a condition.

For example, suppose we need to censor any value above 15 by setting it to 15 (perhaps for privacy reasons).

In [40]:
Mice > 15

         test_0  test_1  test_2  test_3  test_4
mouse_0   False   False   False   False   False
mouse_1   False   False   False   False    True
mouse_2   False    True   False   False   False
mouse_3   False   False   False   False   False
mouse_4   False   False   False   False   False


In [41]:
Mice[Mice > 15] = 15
Mice

         test_0  test_1  test_2  test_3  test_4
mouse_0      14       2       5       3       7
mouse_1       1       2      11       3      15
mouse_2      10      15      12       2       9
mouse_3       1       7       2       1      13
mouse_4       8       4      10       4       3
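
Assigning through a boolean DataFrame modifies `Mice` in place. If you'd rather keep the original, `DataFrame.clip` and `DataFrame.mask` return a new, censored copy. The sketch below compares all three approaches on the pre-censoring values. It compares the results with `.values.tolist()` so that any dtype differences between them don't matter.

```python
import pandas as pd

# The original (uncensored) Mice values, typed in by hand from the first table
Mice = pd.DataFrame(
    [[14, 2, 5, 3, 7],
     [1, 2, 11, 3, 18],
     [10, 20, 12, 2, 9],
     [1, 7, 2, 1, 13],
     [8, 4, 10, 4, 3]],
    columns=['test_{}'.format(x) for x in range(5)],
    index=['mouse_{}'.format(x) for x in range(5)])

# Option 1: boolean-DataFrame assignment, applied to a copy so Mice is untouched
censored = Mice.copy()
censored[censored > 15] = 15

# Option 2: clip caps every value at `upper` and returns a new DataFrame
clipped = Mice.clip(upper=15)

# Option 3: mask replaces values where the condition is True with `other`
masked = Mice.mask(Mice > 15, 15)

print(clipped)
```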


Another special case is that slices are matched against *rows*, whether they use integer positions or labels.

In [34]:
Mice[0:2]

         test_0  test_1  test_2  test_3  test_4
mouse_0      14       2       5       3       7
mouse_1       1       2      11       3      18


In [36]:
Mice['mouse_0':'mouse_2']

         test_0  test_1  test_2  test_3  test_4
mouse_0      14       2       5       3       7
mouse_1       1       2      11       3      18
mouse_2      10      20      12       2       9


This is a pretty rare operation, but it's good to know that the exception exists.
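
To pin down the row-slicing exception, here's a sketch that checks the shapes both kinds of slice produce. It also shows the explicit `.iloc` (positions) and `.loc` (labels) spellings, which make the row intent visible and follow the same end-point rules. `Mice` is typed in by hand from the values shown above.

```python
import pandas as pd

# The Mice values shown above, typed in by hand so the example is reproducible
Mice = pd.DataFrame(
    [[14, 2, 5, 3, 7],
     [1, 2, 11, 3, 18],
     [10, 20, 12, 2, 9],
     [1, 7, 2, 1, 13],
     [8, 4, 10, 4, 3]],
    columns=['test_{}'.format(x) for x in range(5)],
    index=['mouse_{}'.format(x) for x in range(5)])

# Both kinds of slice in [] select rows, not columns
rows_pos = Mice[0:2]                  # positions: end excluded -> 2 rows
rows_lab = Mice['mouse_0':'mouse_2']  # labels: end included -> 3 rows

# The explicit spellings follow the same end-point rules
rows_pos_explicit = Mice.iloc[0:2]
rows_lab_explicit = Mice.loc['mouse_0':'mouse_2']

print(rows_pos.shape, rows_lab.shape)
```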