# 10. More Boolean Indexing

### Objectives

+ Boolean Selection with the brackets on a Series
+ Using the `between` method instead of an `and` condition
+ Simultaneously select rows with boolean selection and columns with a list of names with `.loc`
+ Select rows with missing values with the `isna` method

## Boolean Selection on a Series
All the examples so far have used the bikes DataFrame. Boolean selection on a Series works almost identically. Since a Series has only one dimension of data, the queries you ask are usually simpler.

First, let's select a single column of data, such as temperature, as a Series.

In [None]:
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])

In [None]:
temp = bikes['temperature']
temp.head()
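
Single brackets with one column name return a Series, while a list of names inside the brackets returns a DataFrame. The sketch below uses a tiny invented DataFrame, `df`, as a stand-in for `bikes`; its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the bikes data; values are invented
df = pd.DataFrame({'temperature': [72.0, 91.0, -2.0],
                   'tripduration': [500, 1200, 300]})

temp_series = df['temperature']    # one column name -> Series
temp_frame = df[['temperature']]   # list of names -> DataFrame

print(type(temp_series).__name__)  # Series
print(type(temp_frame).__name__)   # DataFrame
```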

Let's select temperatures greater than 90.

In [None]:
filt = temp > 90
temp[filt].head()

Select temperatures less than 0 or greater than 95.

In [None]:
filt1 = temp < 0
filt2 = temp > 95
filt = filt1 | filt2

temp[filt].head()
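
The two conditions can also be combined inline without naming `filt1` and `filt2`. Each comparison then needs its own parentheses, because Python's `|` operator binds more tightly than `<` and `>`. A minimal sketch on an invented Series:

```python
import pandas as pd

# Hypothetical temperature readings (invented values)
temp = pd.Series([-3.0, 40.0, 97.0, 88.0, 101.0])

# Inline equivalent of filt1 | filt2; the parentheses are required
filt = (temp < 0) | (temp > 95)
print(temp[filt])

# Without parentheses, `temp < 0 | temp > 95` is parsed as the chained
# comparison `temp < (0 | temp) > 95`, which fails with an error
```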

## Re-read data with `starttime` in the index
The default index is not very helpful. Let's re-read the data with **`starttime`** in the index. While this column may not be unique, it provides useful information for the index.

In [None]:
bikes = pd.read_csv('../data/bikes.csv', 
                    parse_dates=['starttime', 'stoptime'], 
                    index_col='starttime')
bikes.head()
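
Because `starttime` appears in both `parse_dates` and `index_col`, the index should come back as a `DatetimeIndex`. The sketch below checks this on a few invented CSV rows read from memory with `io.StringIO` (the text in `csv_text` is hypothetical), and compares the result with reading first and calling `set_index` afterwards.

```python
import io
import pandas as pd

# Hypothetical CSV text mimicking a few columns of bikes.csv (values invented)
csv_text = """starttime,stoptime,temperature
2014-01-05 08:00:00,2014-01-05 08:20:00,-4.0
2014-07-20 14:10:00,2014-07-20 14:45:00,93.0
"""

df = pd.read_csv(io.StringIO(csv_text),
                 parse_dates=['starttime', 'stoptime'],
                 index_col='starttime')

# Same data, but the index is set after reading
df2 = pd.read_csv(io.StringIO(csv_text),
                  parse_dates=['starttime', 'stoptime']).set_index('starttime')

print(type(df.index).__name__)   # DatetimeIndex
print(df.columns.tolist())       # ['stoptime', 'temperature']
print(df.equals(df2))            # True
```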

In [None]:
temp2 = bikes['temperature']
temp2.head()

Let's select temperatures greater than 90. We expect these readings to come from summer months, and they do.

In [None]:
filt = temp2 > 90
temp2[filt].head()
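
With timestamps in the index, the months of the selected readings can be checked directly instead of by eye. A sketch on a small invented Series; the variable `hot` is a hypothetical name:

```python
import pandas as pd

# Hypothetical temperatures labeled by start time (values invented)
idx = pd.to_datetime(['2014-01-05 08:00', '2014-07-20 14:10',
                      '2014-08-02 16:30', '2014-12-28 09:15'])
temp2 = pd.Series([-4.0, 93.0, 95.5, 10.0], index=idx, name='temperature')

hot = temp2[temp2 > 90]
# The DatetimeIndex exposes the month of each selected reading
print(hot.index.month.tolist())   # [7, 8]
```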

Select temperatures less than 0 or greater than 95. We expect the readings below 0 to come from winter months, and they do.

In [None]:
filt1 = temp2 < 0
filt2 = temp2 > 95
filt = filt1 | filt2

temp2[filt].head()

## The `between` method
The `between` method returns a boolean Series by testing whether each value lies between two given values. For instance, if we want to select the temperatures between 50 and 60 degrees (inclusive), we do the following:

In [None]:
filt = temp2.between(50, 60)
filt.head()

In [None]:
temp2[filt].head()
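
By default `between` includes both endpoints, so it matches an `and` condition built from `>=` and `<=`. The sketch below checks this on invented values. The `inclusive` parameter controls the endpoints; its accepted values have changed between pandas versions, so check the documentation for the version you use.

```python
import pandas as pd

temp2 = pd.Series([45.0, 50.0, 55.5, 60.0, 61.0])  # invented values

filt_between = temp2.between(50, 60)
# The same filter written as an "and" condition; both endpoints included
filt_and = (temp2 >= 50) & (temp2 <= 60)

print(filt_between.equals(filt_and))  # True
print(temp2[filt_between].tolist())   # [50.0, 55.5, 60.0]
```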

# Simultaneous boolean selection of rows and column labels with `.loc`
The **`.loc`** indexer was covered thoroughly in an earlier notebook; here we use it to select rows and columns simultaneously. Earlier, it was stated that **`.loc`** makes selections only by label. This isn't strictly true, as it can also make selections with a boolean Series.

Remember that **`.loc`** takes a row selection and a column selection separated by a comma. Since the row selection comes first, you can pass it the same boolean Series that you would pass to the brackets alone and get the same result.

Let's run some of the older examples of boolean selection with **`.loc`**.

In [None]:
filt = bikes['tripduration'] > 1000
bikes.loc[filt].head()

In [None]:
filt = bikes['events'].isin(['rain', 'snow', 'tstorms', 'sleet'])
bikes.loc[filt].head()
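
A quick way to confirm that the brackets and **`.loc`** agree for a boolean Series is to compare the two results with `equals`. A sketch on an invented stand-in for a few bikes rows:

```python
import pandas as pd

# Invented stand-in for a few bikes rows
df = pd.DataFrame({'events': ['clear', 'rain', 'snow', 'cloudy'],
                   'tripduration': [400, 1500, 900, 2000]})

filt = df['tripduration'] > 1000
# A boolean Series selects the same rows through the brackets or .loc
print(df[filt].equals(df.loc[filt]))  # True

filt = df['events'].isin(['rain', 'snow', 'tstorms', 'sleet'])
print(df[filt].equals(df.loc[filt]))  # True
```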

## Separate row and column selection with a comma for `.loc`
The great benefit of **`.loc`** is that it allows us to simultaneously do boolean selection along the rows and make column selections by label.

Let's select only the rows where the event is rain or snow, and only the `events` and `tripduration` columns.

In [None]:
filt = bikes['events'].isin(['rain', 'snow'])
cols = ['events', 'tripduration']
bikes.loc[filt, cols].head()
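
The single **`.loc`** call produces the same values as chaining two bracket selections, but does it in one indexing step; chained brackets are best avoided when assigning values. Passing a single label instead of a list returns a Series. A sketch on invented data:

```python
import pandas as pd

# Invented stand-in for a few bikes rows
df = pd.DataFrame({'events': ['clear', 'rain', 'snow', 'cloudy'],
                   'tripduration': [400, 1500, 900, 2000],
                   'gender': ['Male', 'Female', 'Male', 'Female']})

filt = df['events'].isin(['rain', 'snow'])
cols = ['events', 'tripduration']

one_step = df.loc[filt, cols]      # rows and columns in a single call
two_steps = df[filt][cols]         # chained brackets give the same values

print(one_step.equals(two_steps))  # True
print(one_step.shape)              # (2, 2)

# A single column label (not a list) returns a Series instead
print(type(df.loc[filt, 'tripduration']).__name__)  # Series
```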

## Column to Column Comparisons
So far, we have created conditionals by comparing each of our column values to a single scalar value. It is possible to do element-by-element comparisons by comparing two columns to one another.

For instance, to test whether there was more capacity at the start of the ride than at the end, we would do the following:

In [None]:
filt = bikes['dpcapacity_start'] > bikes['dpcapacity_end']

Let's use this filter with **`.loc`** to return all the rows where the start capacity is greater than the end.

In [None]:
cols = ['dpcapacity_start', 'dpcapacity_end']
bikes.loc[filt, cols].head()
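
Column-to-column comparisons are made row by row, and a comparison involving a missing value evaluates to False. Rows with a missing capacity are therefore silently excluded from this filter. A sketch on invented capacities:

```python
import numpy as np
import pandas as pd

# Invented capacities; one start value is missing
df = pd.DataFrame({'dpcapacity_start': [15.0, 23.0, np.nan, 11.0],
                   'dpcapacity_end':   [19.0, 15.0, 15.0, 11.0]})

filt = df['dpcapacity_start'] > df['dpcapacity_end']
# Row 2 compares NaN with 15.0, which is False; row 3 is equal, not greater
print(filt.tolist())   # [False, True, False, False]
```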

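### Boolean selection with `.iloc` and a boolean Series does not work
The pandas developers chose not to let **`.iloc`** accept a boolean Series: passing one raises an error, as the cell below shows. It does accept a plain boolean array without index labels, such as `filt.to_numpy()`.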

In [None]:
bikes.iloc[filt]
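
The sketch below reproduces the error on an invented DataFrame and then strips the index from the filter with `to_numpy()`, leaving a plain boolean array that **`.iloc`** accepts. Depending on the index type, pandas raises either a `ValueError` or a `NotImplementedError` here, so both are caught.

```python
import pandas as pd

df = pd.DataFrame({'tripduration': [400, 1500, 900]},
                  index=['a', 'b', 'c'])   # invented data and labels
filt = df['tripduration'] > 1000

try:
    df.iloc[filt]
    raised = False
except (ValueError, NotImplementedError) as err:
    raised = True
    print(err)

# A plain boolean array has no index labels, so .iloc accepts it
by_position = df.iloc[filt.to_numpy()]
print(by_position.equals(df.loc[filt]))   # True
```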

# Finding Missing Values with `isna`
The **`isna`** method called from either a DataFrame or a Series returns True for every value that is missing and False for any other value. 

Let's see this in action by calling **`isna`** on the start capacity column.

In [None]:
bikes['dpcapacity_start'].isna().head()
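
Since True counts as 1 and False as 0 when summed, chaining `sum` onto `isna` counts the missing values. A sketch on an invented Series:

```python
import numpy as np
import pandas as pd

cap = pd.Series([15.0, np.nan, 23.0, np.nan],
                name='dpcapacity_start')  # invented values

missing = cap.isna()
print(missing.tolist())   # [False, True, False, True]
# Summing the booleans counts the missing values
print(missing.sum())      # 2
```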

### Filtering for missing values

We can now use this boolean Series to select all the rows where the start capacity column is missing. Verify that the `dpcapacity_start` column shows `NaN` in every row returned.

In [None]:
filt = bikes['dpcapacity_start'].isna()
bikes[filt]
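
The opposite selection, rows where the value is present, can be made by inverting the filter with `~` or by calling `notna`. Both give the same rows, as this sketch on invented data checks:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'dpcapacity_start': [15.0, np.nan, 23.0],
                   'tripduration': [400, 1500, 900]})   # invented values

filt = df['dpcapacity_start'].isna()
missing_rows = df[filt]
present_rows = df[~filt]   # ~ flips True and False

# notna is the direct opposite of isna
print(present_rows.equals(df[df['dpcapacity_start'].notna()]))  # True
print(len(missing_rows), len(present_rows))  # 1 2
```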

## `isnull` is an alias for `isna`
There is an identical method named **`isnull`** that you will see in other tutorials. It is an **alias** of **`isna`**, meaning it does exactly the same thing under a different name. Either one is fine to use, but I prefer **`isna`** because of the similarity of **na** to **NaN**, the representation of missing values.
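
A one-line check that the two names agree, on invented values. Note that `isna` also treats `None` as missing, not only `NaN`.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])          # invented values
print(s.isna().equals(s.isnull()))         # True

# None in an object column is also reported as missing
print(pd.Series(['a', None]).isna().tolist())  # [False, True]
```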

# Exercises

### Problem 1
<span  style="color:green; font-size:16px">Select the wind speed column as a Series and assign it to a variable. Are there any negative wind speeds?</span>

### Problem 2
<span  style="color:green; font-size:16px">Select all wind speeds between 12 and 16.</span>

### Problem 3
<span  style="color:green; font-size:16px">Select the events and gender columns for all trip durations longer than 1,000 seconds.</span>

### Problem 4
<span  style="color:green; font-size:16px">Read in the movie dataset with the title as the index. We will use this DataFrame for the rest of the problems. Select all the movies such that the Facebook likes for actor 2 are greater than those for actor 1.</span>

### Problem 5
<span  style="color:green; font-size:16px">Select the year, content rating, and IMDB score columns for movies from the year 2016 with IMDB score less than 4.</span>

### Problem 6
<span  style="color:green; font-size:16px">Select all the movies that are missing values for content rating.</span>

### Problem 7
<span  style="color:green; font-size:16px">Select all the movies that are missing both the gross and budget. Return just those columns to verify that those values are indeed missing.</span>

### Problem 8
<span  style="color:green; font-size:16px">Write a function `find_missing` that has three parameters, `df`, `col1` and `col2`, where `df` is a DataFrame and `col1` and `col2` are column names. This function should return all the rows of the DataFrame where both `col1` and `col2` are missing, and only those two columns. Answer problem 7 with this function.</span>