# Intro to Pandas
by Ryan Orsinger

## Module 1: Intro to pandas series

### Pandas Series Part 2

- Using comparison operators to produce Boolean series
- Using Boolean indexing to filter data
- Using OR and AND operations to make compound filters
- Assigning subsets to their own variable
- Operating on subsets in place using `.loc`

In [1]:
import pandas as pd

In [2]:
# Base Python uses numeric indexes to return a single element
ser = range(-2, 3)
ser[0]

-2

In [3]:
# Let's convert the range to a pandas series so we can explore pandas indexing
ser = pd.Series(ser)
ser

0   -2
1   -1
2    0
3    1
4    2
dtype: int64

In [4]:
# The mask below is True for the first element and False for the rest
# Filtering with a collection of Booleans, one per element, is called "Boolean masking"
first = [True, False, False, False, False]
ser[first]

0   -2
dtype: int64

In [5]:
# If all Boolean values are True, we return the original series
all_true = [True, True, True, True, True]
ser[all_true]

0   -2
1   -1
2    0
3    1
4    2
dtype: int64

In [6]:
all_false = [False, False, False, False, False]
ser[all_false]

Series([], dtype: int64)

In [7]:
# The Boolean mask is filtering our results here
first_and_third = [True, False, True, False, False]
ser[first_and_third]

0   -2
2    0
dtype: int64

In [8]:
# Boolean masking leaves the original series intact
ser

0   -2
1   -1
2    0
3    1
4    2
dtype: int64

In [9]:
# Comparison operators are applied element by element and return a Boolean series
ser == 1

0    False
1    False
2    False
3     True
4    False
dtype: bool

In [10]:
mask = ser == 1
ser[mask]

3    1
dtype: int64

In [11]:
# We can place the Boolean series inside the square brackets directly
ser[ser == 1]

3    1
dtype: int64
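
Equality is only one option. The other comparison operators produce Boolean series in the same way. A minimal sketch, rebuilding `ser` so the block runs on its own (the names `at_least_zero` and `not_zero` are illustrative and not part of the lesson):

```python
import pandas as pd

ser = pd.Series(range(-2, 3))

# Each comparison is applied element by element and returns a Boolean series
at_least_zero = ser >= 0
not_zero = ser != 0

# Either Boolean series can then be used as a mask
print(ser[at_least_zero].tolist())  # [0, 1, 2]
print(ser[not_zero].tolist())       # [-2, -1, 1, 2]
```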

In [12]:
# A Boolean series can be stored in a variable and reused as a mask
is_negative = ser < 0
is_negative

0     True
1     True
2    False
3    False
4    False
dtype: bool

In [13]:
# True values in the Boolean series keep the corresponding elements; False values drop them
ser[is_negative]

0   -2
1   -1
dtype: int64

In [14]:
# A filtered subset assigned to its own variable is a copy of the data
negatives = ser[is_negative]
negatives

0   -2
1   -1
dtype: int64

In [15]:
# Assigning the result of a Boolean mask to a new variable leaves the original series intact
ser

0   -2
1   -1
2    0
3    1
4    2
dtype: int64
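
One way to see that a filtered subset is independent of the original is to change a value in the subset and then inspect both. This is a small sketch under the assumption that Boolean indexing returns a new object, which is the behavior in the pandas versions I am aware of; the replacement value `-100` is arbitrary:

```python
import pandas as pd

ser = pd.Series(range(-2, 3))
negatives = ser[ser < 0]

# Change the first value in the subset only
negatives.iloc[0] = -100

print(negatives.tolist())  # [-100, -1]
print(ser.tolist())        # [-2, -1, 0, 1, 2]
```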

In [16]:
# Combining the modulo operator with a comparison operator also returns a Boolean series
is_odd = ser % 2 == 1
ser[is_odd]

1   -1
3    1
dtype: int64
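
It may be surprising that `-1` passes the `% 2 == 1` test. In Python, and in pandas integer series, the result of `%` takes the sign of the divisor, so the remainder of a negative odd number divided by 2 is `1` rather than `-1`. A quick sketch to check the intermediate values:

```python
import pandas as pd

ser = pd.Series(range(-2, 3))

# Remainders are never negative when the divisor is positive
remainders = ser % 2
print(remainders.tolist())  # [0, 1, 0, 1, 0]

is_odd = remainders == 1
print(ser[is_odd].tolist())  # [-1, 1]
```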

In [17]:
# Let's work with some new data
numbers = pd.Series(range(1, 13))
numbers

0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
dtype: int64

In [18]:
# We can use & and | operators on our Boolean series to produce more complex behavior
# Parentheses are helpful for order of operations
numbers[(numbers == 2) | (numbers == 5)]

1    2
4    5
dtype: int64
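
The parentheses matter because `|` and `&` bind more tightly than `==` in Python. The sketch below shows what happens without them; the flag `raised` is a hypothetical helper used only to record the outcome:

```python
import pandas as pd

numbers = pd.Series(range(1, 13))

# Without parentheses, Python reads this as numbers == (2 | numbers) == 5,
# a chained comparison that asks a whole series for a single True/False answer
raised = False
try:
    numbers[numbers == 2 | numbers == 5]
except ValueError:
    raised = True
    print("ValueError raised without parentheses")

# With parentheses, each comparison is evaluated before the OR
print(numbers[(numbers == 2) | (numbers == 5)].tolist())  # [2, 5]
```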

In [19]:
# If no element satisfies both conditions, we get an empty series
numbers[(numbers == 2) & (numbers == 5)]

Series([], dtype: int64)

In [20]:
# To avoid parentheses, we can assign each Boolean series to its own variable
# The & operator is the AND operator for Boolean series
is_even = numbers % 2 == 0
is_divisible_by_3 = numbers % 3 == 0
is_divisible_by_3_and_2 = is_even & is_divisible_by_3
numbers[is_divisible_by_3_and_2]

5      6
11    12
dtype: int64

In [21]:
# If each Boolean series/expression is not assigned to its own variable, we need parentheses for order of operations
# Notice how with an AND operator, both Booleans must be true
numbers[(numbers % 2 == 0) & (numbers % 5 == 0)]

9    10
dtype: int64

In [22]:
# We use the | operator for OR operations
# This returns the numbers that are even, evenly divisible by 5, or both
numbers[(numbers % 2 == 0) | (numbers % 5 == 0)]

1      2
3      4
4      5
5      6
7      8
9     10
11    12
dtype: int64

In [23]:
# Boolean masking is very powerful, but what about modifying values in place on a series?
# The .loc indexer uses the same Boolean series syntax
is_even = numbers % 2 == 0

# For simplicity, let's set every even number to 200
numbers.loc[is_even] = 200

In [24]:
numbers

0       1
1     200
2       3
3     200
4       5
5     200
6       7
7     200
8       9
9     200
10     11
11    200
dtype: int64

In [25]:
# What if the new value should depend on the current value?
# We rebuild the series; the is_even mask from above still lines up with its index
numbers = pd.Series(range(1, 13))

# Shorthand syntax would be numbers.loc[is_even] *= 2
numbers.loc[is_even] = numbers.loc[is_even] * 2
numbers

0      1
1      4
2      3
3      8
4      5
5     12
6      7
7     16
8      9
9     20
10    11
11    24
dtype: int64
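
The shorthand mentioned in the comment above can be checked against the longhand form. A minimal sketch, working on copies so the starting series stays untouched (the names `longhand` and `shorthand` are illustrative):

```python
import pandas as pd

numbers = pd.Series(range(1, 13))
is_even = numbers % 2 == 0

# Longhand: read the masked values, double them, write them back
longhand = numbers.copy()
longhand.loc[is_even] = longhand.loc[is_even] * 2

# Shorthand: the augmented assignment does the same read and write
shorthand = numbers.copy()
shorthand.loc[is_even] *= 2

print(longhand.equals(shorthand))  # True
print(shorthand.tolist())  # [1, 4, 3, 8, 5, 12, 7, 16, 9, 20, 11, 24]
```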

## Further Reading
- https://pandas.pydata.org/docs/user_guide/indexing.html
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html

## Exercise check-in
- Make a series of the following values `[-4, -3, -2, -1, 0, 1, 2, 3, 4]` and store it in a variable named `ser`
- Write the code to filter `ser` down to only the number 2
- Write the code to filter `ser` down to the numbers 2 or 4
- Create a variable named `is_positive` that holds a Boolean series indicating whether each corresponding `ser` value is positive. Then create a new variable named `positives` and use the mask to store only the positive numbers.
- Follow the same steps to create a Boolean series named `is_even`, then assign a new variable named `evens` that holds only the even numbers
- Use what you have learned to produce a new Boolean series named `is_even_and_positive`. Use it to create a new variable named `even_positives` that holds only the numbers that are both even and positive.
- Follow the pattern above to produce a new Boolean series named `is_even_or_positive` using `|` for the OR operation. Use it to create a new variable named `even_or_positive` that holds only the numbers that are even or positive.
- Use `.loc` to set all numbers that are both even and positive to zero.
- Use `.loc` and the reassignment syntax to multiply every negative number by 20, in place.

In [26]:
# Make a series out of [-4, -3, -2, -1, 0, 1, 2, 3, 4]


0   -4
1   -3
2   -2
3   -1
4    0
5    1
6    2
7    3
8    4
dtype: int64

In [27]:
# Write the code to filter the series down to only the number 2


6    2
dtype: int64

In [28]:
# Write the code to filter the series down to the numbers 2 or 4


6    2
8    4
dtype: int64

In [30]:
# Create a variable named `is_positive` that holds a Boolean series indicating whether each corresponding `ser` value is positive. Then create a new variable named `positives` and use the mask to store only the positive numbers.


5    1
6    2
7    3
8    4
dtype: int64

In [31]:
# Follow the same steps to create a Boolean series named `is_even`, then assign a new variable named `evens` that holds only the even numbers


0   -4
2   -2
4    0
6    2
8    4
dtype: int64

In [32]:
# Use what you have learned to produce a new Boolean series named `is_even_and_positive`. Use it to create a new variable named `even_positives` that holds only the numbers that are both even and positive.
is_even_and_positive = is_even & is_positive
even_positives = ser[is_even_and_positive]
even_positives

6    2
8    4
dtype: int64

In [33]:
# Follow the pattern above to produce a new Boolean series named `is_even_or_positive` using | for the OR operation. Use it to create a new variable named `even_or_positive` that holds only the numbers that are even or positive.


0   -4
2   -2
4    0
5    1
6    2
7    3
8    4
dtype: int64

In [35]:
# Use `.loc` to set all numbers that are both even and positive to zero.


0   -4
1   -3
2   -2
3   -1
4    0
5    1
6    0
7    3
8    0
dtype: int64

In [36]:
# Use `.loc` and the reassignment syntax to multiply every negative number by 20, in place.

0   -80
1   -60
2   -40
3   -20
4     0
5     1
6     0
7     3
8     0
dtype: int64