# Agenda: Mask Index

- Comparisons
- Broadcasts and comparisons
- Using that to filter our series with a "boolean index" or a "mask index"
- Complex comparisons with "and" and "or"

In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [7]:
s = Series([10,20,30,40,50,60,70], index=list('abcdefg'))

In [9]:
s.loc['d']

40

In [10]:
s.iloc[4]

50

In [11]:
#inside of the [], I can put a list of locations that I want to retrieve

s.loc[['a','d']]

a    10
d    40
dtype: int64

In [12]:
s.iloc[[2,5]]

c    30
f    60
dtype: int64

In [13]:
#there is another way to retrieve values: we can pass a list of boolean values

s.loc[[True,False,False,True,True,False,True]]

a    10
d    40
e    50
g    70
dtype: int64

# Boolean/mask index

The idea here is: 
- Pass, inside of [], a list of booleans
- Wherever there is a True value, we get the value from the original series
- Wherever there is a False value, the original value is ignored
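
To make the rule concrete, here is a minimal sketch that does the same selection by hand with plain Python and then compares it with `.loc`. The names `mask`, `kept`, `by_hand` and `via_loc` are only illustrative, and the hand-written loop is a way to picture the behavior, not how pandas implements it.

```python
from pandas import Series

s = Series([10, 20, 30, 40, 50, 60, 70], index=list('abcdefg'))
mask = [True, False, False, True, True, False, True]

# Pair each (label, value) with its boolean and keep only the pairs marked True
kept = {label: value for (label, value), keep in zip(s.items(), mask) if keep}
by_hand = Series(kept)

# The mask index does the same selection in one step
via_loc = s.loc[mask]

print(by_hand)
print(via_loc)
```

Note that the list of booleans has one entry per element of `s`; the sketch assumes the two always have the same length.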

In [14]:
s > 30 #This is a comparison operation, broadcast across all values of s

a    False
b    False
c    False
d     True
e     True
f     True
g     True
dtype: bool

In [16]:
# I can take this boolean and use it as a mask index with .loc

s.loc[s>30] #only one pair of [] here, because s>30 already gives us a series

#say this as: show the values of s where s>30

d    40
e    50
f    60
g    70
dtype: int64

# How to read a mask index expression

- First, look at the stuff inside of the []. What expression is there, and what does it return?
- Next, think of it as an existing boolean series
- Then apply that boolean series to the series on the outside
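
The reading steps above can be sketched by splitting the one-liner into two statements. The variable names `mask` and `result` are just for illustration.

```python
from pandas import Series

s = Series([10, 20, 30, 40, 50, 60, 70], index=list('abcdefg'))

# Step 1: evaluate the expression inside the [] on its own
mask = s > 30            # a boolean series with the same index as s

# Step 2: apply that boolean series to the series on the outside
result = s.loc[mask]

# The two-step version selects the same rows as the one-liner
same = result.equals(s.loc[s > 30])
print(mask)
print(result)
print(same)
```

Naming the mask like this can also help when an expression gets long enough that it is hard to read in one line.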

In [18]:
#Let's find all of the values that are greater than the mean

s>s.mean()

a    False
b    False
c    False
d    False
e     True
f     True
g     True
dtype: bool

In [19]:
#this gives us all of the values where the value is greater than the mean of the series

s.loc[s>s.mean()]

e    50
f    60
g    70
dtype: int64

In [21]:
#this does not work with 1s and 0s standing in for booleans: .loc treats integers as index labels

s.loc[[1,0,1,0,1,1,0]]

KeyError: "None of [Index([1, 0, 1, 0, 1, 1, 0], dtype='int64')] are in the [index]"
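
If the flags arrive as 1s and 0s, one possible workaround is to convert them to real booleans before using them as a mask. This is a small sketch; `flags`, `mask` and `selected` are hypothetical names, and other conversions would work just as well.

```python
from pandas import Series

s = Series([10, 20, 30, 40, 50, 60, 70], index=list('abcdefg'))
flags = [1, 0, 1, 0, 1, 1, 0]

# .loc would look these integers up as labels, so turn them into booleans first
mask = [bool(flag) for flag in flags]

selected = s.loc[mask]
print(selected)
```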

# Exercise: Finding extreme temperatures

1. Create two series, 'highs' and 'lows', which contain the forecast high and low temps for the coming 10 days
2. Find all high temps that are greater than the mean
3. Are any high temps greater than the mean + 1 standard deviation?
4. Calculate the difference between the high and low temperature for each of these days.
5. Show all days on which the difference was greater than the median difference.

In [48]:
h_temp = Series([73,66,65,72,66,68,69,76,73,75],
                 index='Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu'.split())
l_temp = Series([54,58,57,57,56,55,55,54,54,52], 
                index='Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu'.split())

In [49]:
h_temp.loc[h_temp>h_temp.mean()]

Tue    73
Fri    72
Tue    76
Wed    73
Thu    75
dtype: int64

In [50]:
h_temp.loc[h_temp>(h_temp.mean()+h_temp.std())]

Tue    76
Thu    75
dtype: int64

In [51]:
diff = h_temp - l_temp
diff

Tue    19
Wed     8
Thu     8
Fri    15
Sat    10
Sun    13
Mon    14
Tue    22
Wed    19
Thu    23
dtype: int64

In [52]:
diff.loc[diff>diff.median()]

Tue    19
Fri    15
Tue    22
Wed    19
Thu    23
dtype: int64

In [53]:
#we can create a boolean series with one series
# and we then apply it on another series

#for example: Show me values of "highs" where "lows" is less than the mean

h_temp.loc[l_temp < l_temp.mean()]

Tue    73
Sun    68
Mon    69
Tue    76
Wed    73
Thu    75
dtype: int64

# Mask index across series

You can create a mask index with one series and then apply it to another series. Wherever we got a True value back from the comparison, the value in the applied series will make it through.

In [55]:
m = Series([True,False,True,False,True,True,False,True,False, True])

m.index = h_temp.index

In [56]:
h_temp.loc[m]

Tue    73
Thu    65
Sat    66
Sun    68
Tue    76
Thu    75
dtype: int64

In [57]:
m.index

Index(['Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu'], dtype='object')
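
The reason for setting `m.index` before using `m` as a mask is that a boolean series is matched to the target series by index label, not by position. The sketch below uses a small made-up series with unique labels to show this; it reflects the behavior of the pandas versions I'm aware of, and `mask` and `picked` are illustrative names.

```python
from pandas import Series

s = Series([10, 20, 30], index=list('abc'))

# Same labels as s, but listed in a different order
mask = Series([True, False, False], index=list('cba'))

# .loc lines the mask up with s by label, so the True belongs to 'c',
# not to whichever element happens to come first in s
picked = s.loc[mask]
print(picked)
```

Had the mask been applied by position, the first element of `s` would have come back instead.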

# How can we combine conditionals with "and" and "or"?

In [58]:
s

a    10
b    20
c    30
d    40
e    50
f    60
g    70
dtype: int64

In [60]:
# I want all elements of s that are odd

s.loc[s%2] #s%2 doesn't return booleans! It returns a series of integers, which .loc treats as index labels

KeyError: "None of [Index([0, 0, 0, 0, 0, 0, 0], dtype='int64')] are in the [index]"

In [63]:
# we need to get back a boolean series!
# we get that inside of the [], and then apply it to s with .loc

s = Series([10,15,20,25,30], index=list('abcde'))

s.loc[s%2==1]

b    15
d    25
dtype: int64

In [64]:
#How can I find all of those values that are even *and* greater than the mean?

s.loc[s%2==0] and s.loc[s>s.mean()]

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

# What's going on?

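In Python, when we use 'if', 'if' looks to its right and checks whether the expression is True or False. The 'and' and 'or' operators do the same thing with their operands, which is why the line above triggered this error.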

If it's neither True nor False, 'if' asks the object what it's closer to. In other words, it tries to coerce the value into a boolean. In such cases, everything in Python is considered True except for:

- None
- 0
- False
- Anything empty -- '',(),[],{}

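A Series refuses to give a single boolean value, because it's ambiguous whether one True element or all True elements should count. (NumPy arrays with more than one element behave the same way.)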
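
The coercion rules can be checked directly with `bool()`. This is a small sketch: the sample values are chosen only for illustration, and `refused` is a hypothetical flag used to record whether the Series raised.

```python
from pandas import Series

# Empty or zero-like values coerce to False; the rest of these coerce to True
for value in [None, 0, False, '', (), [], {}, 1, 'a', [0]]:
    print(repr(value), bool(value))

s = Series([10, 15, 20, 25, 30], index=list('abcde'))

# Asking a Series for a single truth value raises ValueError
try:
    bool(s)
    refused = False
except ValueError as err:
    refused = True
    print(err)
```

Note that `[0]` counts as True: the list is not empty, even though the value inside it is 0.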
