# 9. Boolean Indexing on DataFrames with Multiple Conditions

### Objectives

+ Create complex filters with the and (**`&`**), or (**`|`**), and not (**`~`**) logical operators
+ Wrap each filter in parentheses when more than one occurs on the same line
+ For readability, assign each filter to its own variable
+ Use the **`isin`** method to test for multiple equalities in the same column

## Multiple condition expression
So far, our boolean selections have involved a single condition. You can have as many conditions as you would like. To do so, you will need to combine your boolean expressions using the three logical operators and, or, and not.

## Use `&`, `|`, `~`
Although Python has the keywords `and`, `or`, and `not`, they do not work element by element on pandas Series and DataFrames.

You must use the following operators:

* **`&`** for and
* **`|`** for or
* **`~`** for not
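
To see why the keywords fail, here is a minimal sketch on a small made-up DataFrame (the `toy` data is hypothetical and only stands in for the bikes dataset). The keyword `and` asks each Series for a single True/False value, which pandas refuses to provide, while `&` compares the two Series element by element.

```python
import pandas as pd

# Hypothetical toy data standing in for the bikes dataset
toy = pd.DataFrame({'tripduration': [500, 1500, 2500],
                    'events': ['cloudy', 'cloudy', 'rain']})

long_ride = toy['tripduration'] > 1000
cloudy = toy['events'] == 'cloudy'

# The keyword `and` needs one True/False value per Series.
# pandas will not guess one, so it raises a ValueError.
try:
    long_ride and cloudy
    keyword_error = None
except ValueError as err:
    keyword_error = err

# The `&` operator works element by element and returns a boolean Series
both = long_ride & cloudy
print(both.tolist())
```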

## Our first multiple condition expression
Let's find all the rides longer than 1,000 seconds when it was cloudy. We assign each condition to separate variables and then combine them with the and operator.

In [1]:
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv')

In [2]:
filt1 = bikes['tripduration'] > 1000
filt2 = bikes['events'] == 'cloudy'
filt = filt1 & filt2

bikes[filt].head(3)

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
8,21028,Subscriber,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wood St & Division St,41.90332,-87.67273,15.0,71.1,8.0,0.0,-9999.0,cloudy
18,40924,Subscriber,Male,2013-07-09 13:12:00,2013-07-09 14:42:00,5396,Canal St & Jackson Blvd,41.878114,-87.639971,35.0,Millennium Park,41.881032,-87.624084,35.0,79.0,10.0,13.8,0.0,cloudy
80,90932,Subscriber,Female,2013-07-22 07:59:00,2013-07-22 08:19:00,1224,Lincoln Ave & Armitage Ave,41.918273,-87.638116,19.0,Dearborn St & Adams St,41.879356,-87.629791,19.0,73.4,10.0,0.0,-9999.0,cloudy
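
The mechanics of the combined filter can be traced on a few rows. The sketch below uses a hypothetical four-row `toy` DataFrame rather than the real bikes data, and shows that each filter is a boolean Series, that `&` keeps only the positions where both are True, and that summing a filter counts the matching rows.

```python
import pandas as pd

# Hypothetical toy data standing in for the bikes dataset
toy = pd.DataFrame({'tripduration': [1300, 600, 5396, 1224],
                    'events': ['cloudy', 'cloudy', 'rain', 'cloudy']})

filt1 = toy['tripduration'] > 1000   # True, False, True, True
filt2 = toy['events'] == 'cloudy'    # True, True, False, True
filt = filt1 & filt2                 # True only where both are True

# True counts as 1, so the sum is the number of matching rows
n_matches = filt.sum()
selected = toy[filt]
print(n_matches)
print(selected)
```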


## Multiple conditions in one line
It is possible to combine the entire expression into a single line. Some pandas users like doing this and others hate it. Either way, it is a good idea to know how to do it, as you will definitely encounter it.

## Use parentheses to separate conditions
You must encapsulate each condition in a set of parentheses in order to make this work.

Each condition will be separated like this:

```

(bikes['tripduration'] > 1000) & (bikes['events'] == 'cloudy')

```
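
The parentheses are needed because of operator precedence: `&` binds more tightly than comparison operators such as `>` and `==`. The sketch below, on a hypothetical `toy` DataFrame, shows the unparenthesized version failing and the parenthesized version working. The exact exception type may vary across pandas versions, so the example catches a generic `Exception`.

```python
import pandas as pd

# Hypothetical toy data standing in for the bikes dataset
toy = pd.DataFrame({'tripduration': [500, 1500, 2500],
                    'events': ['cloudy', 'cloudy', 'rain']})

# Without parentheses, Python evaluates `1000 & toy['events']` first,
# because `&` has higher precedence than `>` and `==`
try:
    toy['tripduration'] > 1000 & toy['events'] == 'cloudy'
    failed = False
except Exception:
    failed = True

# With parentheses, each comparison is evaluated before `&`
filt = (toy['tripduration'] > 1000) & (toy['events'] == 'cloudy')
print(failed)
print(filt.tolist())
```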

## Same results
We can then place this expression directly inside the indexing operator to get the same results:

In [3]:
bikes[(bikes['tripduration'] > 1000) & (bikes['events'] == 'cloudy')].head(3)

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
8,21028,Subscriber,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wood St & Division St,41.90332,-87.67273,15.0,71.1,8.0,0.0,-9999.0,cloudy
18,40924,Subscriber,Male,2013-07-09 13:12:00,2013-07-09 14:42:00,5396,Canal St & Jackson Blvd,41.878114,-87.639971,35.0,Millennium Park,41.881032,-87.624084,35.0,79.0,10.0,13.8,0.0,cloudy
80,90932,Subscriber,Female,2013-07-22 07:59:00,2013-07-22 08:19:00,1224,Lincoln Ave & Armitage Ave,41.918273,-87.638116,19.0,Dearborn St & Adams St,41.879356,-87.629791,19.0,73.4,10.0,0.0,-9999.0,cloudy


## Again, I prefer assigning each condition to its own variable

## Using an `or` condition
Let's find all the rides that were done by females **or** had trip durations longer than 1,000 seconds.

For the or condition, we use the pipe character, **`|`**.

In [4]:
filt1 = bikes['tripduration'] > 1000
filt2 = bikes['gender'] == 'Female'
filt = filt1 | filt2

bikes[filt].head(3)

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
8,21028,Subscriber,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wood St & Division St,41.90332,-87.67273,15.0,71.1,8.0,0.0,-9999.0,cloudy
9,23558,Subscriber,Female,2013-07-04 15:00:00,2013-07-04 15:16:00,922,Lakeview Ave & Fullerton Pkwy,41.925858,-87.638973,19.0,Racine Ave & Congress Pkwy,41.87464,-87.65703,19.0,81.0,10.0,12.7,-9999.0,mostlycloudy


## Reversing a condition with the not operator
The tilde character, **`~`**, represents the not operator and reverses a condition.  For instance, if we wanted all the rides with trip duration less than or equal to 1000, we could do it like this:

In [5]:
filt = bikes['tripduration'] > 1000
bikes[~filt].head(3)

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
1,7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
3,12907,Subscriber,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0,mostlycloudy


Of course, reversing single conditions is pretty pointless as we can simply use the less than or equal to operator instead like this:

In [6]:
filt = bikes['tripduration'] <= 1000
bikes[filt].head(3)

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
1,7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
3,12907,Subscriber,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0,mostlycloudy
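
One caveat, shown here as a small sketch on a hypothetical Series rather than the bikes data: the two approaches agree only when the column has no missing values. Any comparison with `NaN` is False, so `~(s > 1000)` marks a missing value as True while `s <= 1000` marks it as False.

```python
import numpy as np
import pandas as pd

# Hypothetical durations with one missing value
durations = pd.Series([993, 1300, np.nan])

not_long = ~(durations > 1000)   # NaN > 1000 is False, so ~ makes it True
short = durations <= 1000        # NaN <= 1000 is False

print(not_long.tolist())
print(short.tolist())
```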


### Reverse a more complex condition
Typically, we will save the not operator for reversing more complex conditions. Let's reverse the condition for selecting rides by females or those with duration over 1,000 seconds. Logically, this should return only male riders with duration 1,000 or less.

In [7]:
filt1 = bikes['tripduration'] > 1000
filt2 = bikes['gender'] == 'Female'
filt = filt1 | filt2
bikes[~filt].head(3)

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
1,7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
3,12907,Subscriber,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0,mostlycloudy
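
Reversing an or condition is the same as reversing each piece and combining them with and (De Morgan's law). A minimal sketch on a hypothetical `toy` DataFrame checks that both forms select the same rows:

```python
import pandas as pd

# Hypothetical toy data standing in for the bikes dataset
toy = pd.DataFrame({'gender': ['Male', 'Female', 'Male', 'Female'],
                    'tripduration': [993, 922, 1300, 5396]})

filt1 = toy['tripduration'] > 1000
filt2 = toy['gender'] == 'Female'

reversed_or = ~(filt1 | filt2)    # not (long ride or female)
both_reversed = ~filt1 & ~filt2   # (not long ride) and (not female)

# The two boolean Series are identical
same = reversed_or.equals(both_reversed)
print(same)
print(toy[reversed_or])
```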


## Even more complex conditions
It is possible to build extremely complex conditions to select rows of your DataFrame that meet a very specific condition. For instance, we can select male riders with trip duration between 1,000 and 2,000 seconds along with female riders with trip duration between 5,000 and 10,000 seconds.

With multiple conditions, it's probably best to break out the logic into multiple steps:

In [8]:
filt1 = (bikes['gender'] == 'Male') & (bikes['tripduration'] >= 1000) & (bikes['tripduration'] <= 2000)
filt2 = (bikes['gender'] == 'Female') & (bikes['tripduration'] >= 5000) & (bikes['tripduration'] <= 10000)
filt = filt1 | filt2

In [9]:
bikes[filt].head(10)

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
8,21028,Subscriber,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wood St & Division St,41.90332,-87.67273,15.0,71.1,8.0,0.0,-9999.0,cloudy
10,24383,Subscriber,Male,2013-07-04 17:17:00,2013-07-04 17:42:00,1523,Morgan St & 18th St,41.858086,-87.651073,15.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,79.0,10.0,9.2,-9999.0,mostlycloudy
11,24673,Subscriber,Male,2013-07-04 18:13:00,2013-07-04 18:42:00,1697,Ashland Ave & Armitage Ave,41.917859,-87.668919,15.0,Lincoln Ave & Armitage Ave,41.918273,-87.638116,19.0,79.0,10.0,10.4,-9999.0,mostlycloudy
13,30404,Subscriber,Male,2013-07-06 09:43:00,2013-07-06 10:06:00,1365,May St & Randolph St,41.88397,-87.655688,15.0,Millennium Park,41.881032,-87.624084,35.0,78.1,10.0,5.8,-9999.0,partlycloudy
26,51130,Subscriber,Male,2013-07-12 01:07:00,2013-07-12 01:24:00,1043,State St & Harrison St,41.873958,-87.627739,19.0,Racine Ave & 18th St,41.858181,-87.656487,15.0,64.9,10.0,0.0,-9999.0,clear
34,54257,Subscriber,Male,2013-07-12 18:13:00,2013-07-12 18:40:00,1616,Clinton St & Madison St,41.881582,-87.641277,23.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,78.1,10.0,10.4,-9999.0,partlycloudy
40,61401,Subscriber,Female,2013-07-14 14:08:00,2013-07-14 15:53:00,6274,Wabash Ave & Roosevelt Rd,41.867173,-87.625955,19.0,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,87.1,10.0,8.1,-9999.0,partlycloudy
41,64257,Subscriber,Male,2013-07-15 06:26:00,2013-07-15 06:44:00,1125,Racine Ave & Fullerton Ave,41.925563,-87.658404,19.0,State St & Kinzie St,41.88918,-87.6277,15.0,73.9,10.0,0.0,-9999.0,partlycloudy
47,67013,Subscriber,Male,2013-07-15 19:10:00,2013-07-15 19:34:00,1463,Lake Shore Dr & Ohio St,41.89257,-87.614492,19.0,Lake Shore Dr & Ohio St,41.89257,-87.614492,19.0,80.1,10.0,6.9,-9999.0,mostlycloudy
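
As an alternative that is not part of the original walkthrough, the pandas Series method `between` can replace each pair of `>=` and `<=` comparisons, since it includes both endpoints by default. The sketch below uses a hypothetical `toy` DataFrame and gives each piece of the logic its own name:

```python
import pandas as pd

# Hypothetical toy data standing in for the bikes dataset
toy = pd.DataFrame({'gender': ['Male', 'Male', 'Female', 'Female'],
                    'tripduration': [1040, 2500, 6274, 1224]})

is_male = toy['gender'] == 'Male'
is_female = toy['gender'] == 'Female'

# between includes both endpoints by default
mid_ride = toy['tripduration'].between(1000, 2000)
long_ride = toy['tripduration'].between(5000, 10000)

filt = (is_male & mid_ride) | (is_female & long_ride)
print(toy[filt])
```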


## Lots of equality conditions in a single column - use `isin`
Occasionally, we will want to test equality in a single column with multiple values. This is most common in string columns. For instance, let’s say we wanted to find all the rides where the events were either rain, snow, tstorms or sleet.

One way to do this would be with four equality conditions combined with the or operator.

In [10]:
filt = ((bikes['events'] == 'rain') | 
        (bikes['events'] == 'snow') | 
        (bikes['events'] == 'tstorms') | 
        (bikes['events'] == 'sleet'))

bikes[filt].head(3)

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
45,66336,Subscriber,Male,2013-07-15 16:43:00,2013-07-15 16:55:00,727,Greenwood Ave & 47th St,41.809835,-87.599383,15.0,State St & Harrison St,41.873958,-87.627739,19.0,82.9,10.0,5.8,0.0,rain
78,89180,Subscriber,Male,2013-07-21 16:35:00,2013-07-21 17:06:00,1809,Michigan Ave & Pearson St,41.89766,-87.62351,23.0,Millennium Park,41.881032,-87.624084,35.0,82.4,10.0,11.5,0.0,tstorms
79,89228,Subscriber,Male,2013-07-21 16:47:00,2013-07-21 17:03:00,999,Carpenter St & Huron St,41.894556,-87.653449,19.0,Carpenter St & Huron St,41.894556,-87.653449,19.0,82.4,10.0,11.5,0.0,tstorms


Instead, use the **`isin`** method and pass it a list of all the acceptable values:

In [11]:
filt = bikes['events'].isin(['rain', 'snow', 'tstorms', 'sleet'])
bikes[filt].head(3)

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
45,66336,Subscriber,Male,2013-07-15 16:43:00,2013-07-15 16:55:00,727,Greenwood Ave & 47th St,41.809835,-87.599383,15.0,State St & Harrison St,41.873958,-87.627739,19.0,82.9,10.0,5.8,0.0,rain
78,89180,Subscriber,Male,2013-07-21 16:35:00,2013-07-21 17:06:00,1809,Michigan Ave & Pearson St,41.89766,-87.62351,23.0,Millennium Park,41.881032,-87.624084,35.0,82.4,10.0,11.5,0.0,tstorms
79,89228,Subscriber,Male,2013-07-21 16:47:00,2013-07-21 17:03:00,999,Carpenter St & Huron St,41.894556,-87.653449,19.0,Carpenter St & Huron St,41.894556,-87.653449,19.0,82.4,10.0,11.5,0.0,tstorms
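
The following sketch, on a hypothetical `toy` DataFrame, checks that `isin` produces the same boolean Series as the chained or conditions. It also shows that the result can be reversed with `~` to select rows whose value is not in the list.

```python
import pandas as pd

# Hypothetical toy data standing in for the bikes dataset
toy = pd.DataFrame({'events': ['rain', 'cloudy', 'tstorms', 'clear', 'snow']})
wet = ['rain', 'snow', 'tstorms', 'sleet']

with_isin = toy['events'].isin(wet)
with_ors = ((toy['events'] == 'rain') |
            (toy['events'] == 'snow') |
            (toy['events'] == 'tstorms') |
            (toy['events'] == 'sleet'))

# Both approaches build the same filter
same = with_isin.equals(with_ors)

# Reverse the filter to keep rows whose event is NOT in the list
dry = toy[~with_isin]
print(same)
print(dry)
```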


## Combining `isin` with other filters
You can use the boolean Series returned by the `isin` method in the same way as any other filter and combine it with the logical operators. For instance, if we wanted to find all the rides that had one of those same events and a duration greater than 2,000 seconds, we would do the following:

In [12]:
filt1 = bikes['events'].isin(['rain', 'snow', 'tstorms', 'sleet'])
filt2 = bikes['tripduration'] > 2000
filt = filt1 & filt2

bikes[filt].head()

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
2344,1266453,Subscriber,Female,2014-03-19 07:23:00,2014-03-19 08:00:00,2181,Seeley Ave & Roscoe St,41.943403,-87.679618,11.0,Franklin St & Lake St,41.885837,-87.6355,23.0,43.0,3.0,6.9,0.07,rain
7697,3557596,Subscriber,Male,2014-09-12 14:20:00,2014-09-12 14:57:00,2213,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,California Ave & Division St,41.903029,-87.697474,15.0,52.0,2.0,12.7,0.0,rain
8357,3801419,Subscriber,Male,2014-09-30 08:21:00,2014-09-30 08:58:00,2246,Damen Ave & Melrose Ave,41.9406,-87.6785,11.0,Wood St & Taylor St,41.869154,-87.671045,15.0,46.9,3.0,11.5,0.0,rain
8506,3846762,Subscriber,Male,2014-10-04 12:33:00,2014-10-04 14:06:00,5568,Halsted St & Diversey Pkwy,41.933341,-87.648747,15.0,Halsted St & Wrightwood Ave,41.929143,-87.649077,15.0,42.1,8.0,17.3,0.02,rain
11267,4822906,Subscriber,Male,2015-04-10 17:25:00,2015-04-10 18:00:00,2074,Stetson Ave & South Water St,41.886835,-87.62232,19.0,Lake Shore Dr & Wellington Ave,41.936669,-87.636794,15.0,46.9,10.0,17.3,0.0,rain


# Exercises

### Problem 1
<span  style="color:green; font-size:16px">Select all movies from the 1970s that had IMDB scores greater than 8</span>

In [None]:
# your code here

### Problem 2
<span  style="color:green; font-size:16px">Select movies that were rated either R, PG-13, or PG.</span>

In [None]:
# your code here

### Problem 3
<span  style="color:green; font-size:16px">Select movies that are either rated PG-13 or were made after 2010.</span>

In [None]:
# your code here

### Problem 4
<span  style="color:green; font-size:16px">Find all the movies that have at least one of the three actors with more than 10,000 Facebook likes.</span>

In [None]:
# your code here

### Problem 5
<span  style="color:green; font-size:16px">Reverse the condition from problem 4. Use one line of code. In words, what have you selected?</span>

In [None]:
# your code here