# Boolean Selection: Single Conditions

**Boolean Selection**, also referred to as **boolean indexing**, is the process of selecting subsets of rows from DataFrames (or Series) based on the actual **values** and NOT by labels or integer locations. All of the previous subset selections were done using either labels or integer locations, and had nothing to do with the actual values.

### Examples of boolean selection

Let's see some examples of actual questions (in plain English) that boolean selection can help us answer from the bikes dataset. The term **query** is used to refer to these sorts of questions.

* Find all rides by males
* Find all rides with a duration longer than 2 hours
* Find all rides that took place between March and June of 2015
* Find all rides with a duration longer than 2 hours by females with temperature higher than 90 degrees

### All queries have a logical condition

Each of the above queries has a strict logical condition that must be checked one row at a time.

### Keep or discard an entire row of data

If you were to manually answer the above queries, you would need to scan each row and determine whether the row, as a whole, meets the condition. If so, then it is kept in the result, otherwise it is discarded.

### Each row will have a True or False value associated with it

When you perform boolean selection, each row of the DataFrame (or value of a Series) has a `True` or `False` value associated with it corresponding to the outcome of the logical condition.
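
To make the idea concrete, here is a minimal sketch on a tiny DataFrame whose name (`rides`) and values are hypothetical, not taken from the bikes data. It checks the condition "duration longer than 2 hours" (7,200 seconds) one row at a time, then shows that the vectorized pandas comparison used later in this chapter produces the same one-boolean-per-row result.

```python
import pandas as pd

# Hypothetical stand-in data, not the real bikes dataset
rides = pd.DataFrame({
    'gender': ['Male', 'Female', 'Male'],
    'tripduration': [993, 8100, 1040],
})

# Check the condition one row at a time: one True/False per row
row_by_row = [duration > 7200 for duration in rides['tripduration'].tolist()]

# The vectorized comparison yields the same booleans as a Series
vectorized = rides['tripduration'] > 7200

print(row_by_row)           # [False, True, False]
print(vectorized.tolist())  # [False, True, False]

# Rows paired with True are kept whole; rows paired with False are discarded
print(rides[vectorized])
```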

### Begin with a small DataFrame

Let's perform our first boolean selection on our sample DataFrame. Let's read it in now.

In [1]:
import pandas as pd
df = pd.read_csv('../data/sample_data.csv', index_col=0)
df

name,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5
Cornelia,TX,red,Beans,69,150,2.2


## Manually filtering the data

Let's find all the people who are younger than 30 years of age. We will do this manually by inspecting the data.

### Create a list of booleans

By inspecting the data, we see that `Niko`, `Aaron`, and `Penelope` are all under 30 years of age. To signify which people are under 30, we create a list of 7 boolean values corresponding to the 7 rows in the DataFrame. The values in the list that correspond with the positions of `Niko`, `Aaron`, and `Penelope` are `True`. All other values are `False`. `Niko`, `Aaron`, and `Penelope` are the 2nd, 3rd, and 4th rows, so these are the locations in the list that are `True`.

In [2]:
filt = [False, True, True, True, False, False, False]
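
The list must contain exactly one boolean per row. A short sketch on a hypothetical three-row slice of the sample data shows that a correctly sized list keeps the `True` rows, while a list of the wrong length is rejected with a `ValueError` rather than silently padded or truncated.

```python
import pandas as pd

# Hypothetical three-row slice of the sample data
people = pd.DataFrame({'age': [30, 2, 12]}, index=['Jane', 'Niko', 'Aaron'])

# One boolean per row: keeps Niko and Aaron
print(people[[False, True, True]])

# Two booleans for three rows: pandas refuses the selection
try:
    people[[False, True]]
except ValueError as err:
    print('ValueError:', err)
```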

### Variable name `filt`

The variable name `filt` will be used throughout the book to refer to the sequence of booleans. You are free to use any variable name you like for the sequence of booleans, but being consistent makes your code easier to understand. I chose `filt` because it is short for the word 'filter'. Boolean selection filters the data for a particular condition, which is why this variable name makes sense to me.

### Place this list in just the brackets

The above list has `True` in the 2nd, 3rd, and 4th positions. These will be the rows that are kept in the resulting boolean selection. Place the list inside *just the brackets* to complete the selection.

In [3]:
df[filt]

name,state,color,food,age,height,score
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3


### Wait a second… Isn’t `[ ]` just for column selection?

The primary purpose of *just the brackets* for a DataFrame is to select one or more columns by using either a string or a list of strings. All of a sudden, this example shows entire rows being selected with boolean values. This is what makes pandas, unfortunately, a confusing library to learn and use.

## Operator overloading

*Just the brackets* is **overloaded**. Depending on the inputs, pandas will do something completely different. Here are the rules for the different objects passed to *just the brackets*.

* **string** — return a column as a Series
* **list of strings** — return all those column names as a DataFrame
* **sequence of booleans** — select all rows where `True`

In summary, *just the brackets* primarily selects columns, but, if you pass it a sequence of booleans, it will select all rows that are `True`.
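
The three rules can be checked side by side. This sketch copies three rows of the sample DataFrame into a new variable, `sample`, so it can run on its own; the type of object passed to *just the brackets* decides what comes back.

```python
import pandas as pd

# Three rows copied from the sample data for a self-contained example
sample = pd.DataFrame(
    {'state': ['NY', 'TX', 'FL'], 'age': [30, 2, 12]},
    index=['Jane', 'Niko', 'Aaron'],
)

col = sample['age']                 # string: one column as a Series
cols = sample[['state', 'age']]     # list of strings: columns as a DataFrame
rows = sample[[True, False, True]]  # sequence of booleans: rows where True

print(type(col).__name__)   # Series
print(type(cols).__name__)  # DataFrame
print(list(rows.index))     # ['Jane', 'Aaron']
```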

## Practical boolean selection

In practice, we almost never create boolean lists manually as we did above. Instead, we use the actual data to create a boolean Series.

### Creating boolean Series from column data

By far the most common way to create a boolean Series is from the values of one particular column. We test a condition using one of the six comparison operators:

* `<`
* `<=`
* `>`
* `>=`
* `==`
* `!=`
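
As a quick reference, here is each of the six operators applied to a small Series of hypothetical trip durations. Every comparison returns a boolean Series with one value per element.

```python
import pandas as pd

# Hypothetical trip durations in seconds
durations = pd.Series([623, 1000, 1040])

print((durations < 1000).tolist())   # [True, False, False]
print((durations <= 1000).tolist())  # [True, True, False]
print((durations > 1000).tolist())   # [False, False, True]
print((durations >= 1000).tolist())  # [False, True, True]
print((durations == 1000).tolist())  # [False, True, False]
print((durations != 1000).tolist())  # [True, False, True]
```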

Let's begin completing practical boolean selection examples by reading in the bikes dataset.

In [4]:
bikes = pd.read_csv('../data/bikes.csv')
bikes.head(3)

,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
0,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,11.0,Michigan Ave & Oak St,15.0,73.9,12.7,mostlycloudy
1,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,31.0,Wells St & Walton St,19.0,69.1,6.9,partlycloudy
2,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,15.0,Dearborn St & Monroe St,23.0,73.0,16.1,mostlycloudy


### Create a boolean Series

Let's create a boolean Series by determining which rows have a trip duration greater than 1,000 seconds. To make the comparison, we select the `tripduration` column as a Series and compare it against the integer 1,000.

In [None]:
# General pattern: select one column as a Series, then compare it to a value
# filt = df['column_name'] > value

In [5]:
filt = bikes['tripduration'] > 1000
filt.head(3)

0    False
1    False
2     True
Name: tripduration, dtype: bool

When we write `bikes['tripduration'] > 1000`, pandas compares each value in the `tripduration` column against 1,000. It returns a new Series, the same length as `tripduration`, containing the boolean outcome of each comparison. Let's verify that the `filt` Series is the same length as the DataFrame.

In [6]:
len(filt)

50089

In [7]:
len(bikes)

50089
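
Besides matching the length, the comparison also keeps the original index (and the column name), which is how pandas lines each boolean up with its row. A minimal sketch on a hypothetical excerpt with a non-default index:

```python
import pandas as pd

# Hypothetical excerpt of the tripduration column with a non-default index
trip = pd.Series([993, 623, 1040], index=[10, 20, 30], name='tripduration')

trip_filt = trip > 1000

print(len(trip_filt) == len(trip))         # True
print(trip_filt.index.equals(trip.index))  # True
print(trip_filt.dtype)                     # bool
print(trip_filt.name)                      # tripduration
```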

### Manually verify correctness

Look at the `tripduration` values in the `bikes.head(3)` output above to verify the result manually. Only the third ride, which lasted 1,040 seconds, is longer than 1,000 seconds, so it is the only one marked `True`. The first two rides lasted less than 1,000 seconds, so they are marked `False`.

### Complete the boolean selection

We can now place the `filt` boolean Series into *just the brackets* to filter the entire DataFrame. This returns all the rows in the bikes dataset that have a trip duration greater than 1,000 seconds. Manually verify that every `tripduration` value in the output is greater than 1,000.

In [8]:
bikes[filt].head(3)

,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
2,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,15.0,Dearborn St & Monroe St,23.0,73.0,16.1,mostlycloudy
8,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,31.0,Wood St & Division St,15.0,71.1,0.0,cloudy
10,Male,2013-07-04 17:17:00,2013-07-04 17:42:00,1523,Morgan St & 18th St,15.0,Damen Ave & Pierce Ave,19.0,79.0,9.2,mostlycloudy


### How many rows have a trip duration greater than 1000?

To answer this question, let's assign the result of the boolean selection to a variable, and then compare the number of rows between it and the original DataFrame.

In [9]:
bikes_duration_1000 = bikes[filt]

Let's find the number of rows in each DataFrame.

In [10]:
len(bikes)

50089

In [11]:
len(bikes_duration_1000)

10178

Dividing the two lengths shows that about 20% of the rides are longer than 1,000 seconds.

In [12]:
len(bikes_duration_1000) / len(bikes)

0.20319830701351593
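
Because pandas treats `True` as 1 and `False` as 0 in arithmetic, the boolean Series alone can answer both questions without building the filtered DataFrame: `sum()` counts the matching rows and `mean()` gives their fraction. Applied to the bikes data, `filt.sum()` should return the same 10178 found above. The sketch below uses hypothetical durations so it runs on its own.

```python
import pandas as pd

trip = pd.Series([993, 623, 1040, 1300, 1523])  # hypothetical durations
long_filt = trip > 1000

# True counts as 1 and False as 0, so sum() counts matching rows
print(long_filt.sum())   # 3
# mean() gives the fraction of rows that match
print(long_filt.mean())  # 0.6

# Same answers as the len()-based approach
print(len(trip[long_filt]), len(trip[long_filt]) / len(trip))  # 3 0.6
```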

## Boolean selection in one line

Often, you will see boolean selection completed in a single line of code instead of the two lines we used above. The expression for the filter is placed directly within *just the brackets*. Although this method will save a line of code, I recommend assigning the filter as a separate variable to help with readability.

In [13]:
bikes[bikes['tripduration'] > 1000].head(3)

,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
2,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,15.0,Dearborn St & Monroe St,23.0,73.0,16.1,mostlycloudy
8,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,31.0,Wood St & Division St,15.0,71.1,0.0,cloudy
10,Male,2013-07-04 17:17:00,2013-07-04 17:42:00,1523,Morgan St & 18th St,15.0,Damen Ave & Pierce Ave,19.0,79.0,9.2,mostlycloudy
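
The one-line and two-line styles differ only in readability. This sketch, on a small hypothetical DataFrame, confirms that both produce identical results.

```python
import pandas as pd

trips = pd.DataFrame({'tripduration': [993, 623, 1040, 1300]})  # hypothetical

# Two-line style: name the filter first
filt_long = trips['tripduration'] > 1000
two_lines = trips[filt_long]

# One-line style: build the filter inside the brackets
one_line = trips[trips['tripduration'] > 1000]

print(two_lines.equals(one_line))  # True
```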


## Single condition expression

Our first example tested a single condition: whether the trip duration was greater than 1,000 seconds. Let's test a different single condition and find all the rides that left from the station 'State St & Van Buren St'. We use the `==` operator to test for equality, assign the result to `filt`, and pass `filt` to *just the brackets* to complete the selection.

In [14]:
filt = bikes['from_station_name'] == 'State St & Van Buren St'
bikes[filt].head(3)

,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
7,Male,2013-07-03 09:07:00,2013-07-03 09:16:00,505,State St & Van Buren St,27.0,Franklin St & Jackson Blvd,27.0,64.0,5.8,cloudy
20,Female,2013-07-09 17:39:00,2013-07-09 17:55:00,943,State St & Van Buren St,27.0,State St & 16th St,15.0,82.9,9.2,mostlycloudy
55,Male,2013-07-16 15:31:00,2013-07-16 15:37:00,363,State St & Van Buren St,27.0,Daley Center Plaza,47.0,91.0,8.1,mostlycloudy
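
Note that `==` on strings is an exact, case-sensitive match. The station-name variants below are hypothetical (the bikes output above does not show any), but they illustrate why a near-match is not selected and how normalizing with the `.str` accessor's `strip()` and `lower()` methods can catch it.

```python
import pandas as pd

stations = pd.Series([
    'State St & Van Buren St',
    'state st & van buren st',   # hypothetical: different capitalization
    'State St & Van Buren St ',  # hypothetical: trailing space
])

# Exact comparison: only the first value matches
print((stations == 'State St & Van Buren St').tolist())  # [True, False, False]

# Normalize whitespace and case before comparing
normalized = stations.str.strip().str.lower()
print((normalized == 'state st & van buren st').tolist())  # [True, True, True]
```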


## Summary of single condition boolean selection

Boolean selection refers to the act of filtering data based on the values, and not on the labels or integer location. There are two main steps to do boolean selection:

1. Create a boolean Series - commonly done by comparing one column of data to another value
2. Place the boolean Series inside *just the brackets* to filter the data
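
The two steps can be wrapped in a small helper. This is only a sketch: `select_rows` and `OPS` are hypothetical names, not part of pandas, and the standard library `operator` module is used to map each comparison symbol to a function.

```python
import operator

import pandas as pd

# Map comparison symbols to functions from the standard library
OPS = {
    '<': operator.lt, '<=': operator.le,
    '>': operator.gt, '>=': operator.ge,
    '==': operator.eq, '!=': operator.ne,
}


def select_rows(data, column, op, value):
    """Hypothetical helper: filter rows of data on one comparison."""
    mask = OPS[op](data[column], value)  # step 1: boolean Series
    return data[mask]                    # step 2: filter with just the brackets


trips = pd.DataFrame({'tripduration': [993, 623, 1040],
                      'temperature': [73.9, 69.1, 73.0]})  # hypothetical
print(select_rows(trips, 'tripduration', '>', 1000))
```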

## Exercises

Continue to use the bikes dataset for Exercises 1 through 6.

### Exercise 1

<span  style="color:green; font-size:16px">Find all the rides with temperature below 0.</span>

In [15]:
bikes.head()

,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
0,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,11.0,Michigan Ave & Oak St,15.0,73.9,12.7,mostlycloudy
1,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,31.0,Wells St & Walton St,19.0,69.1,6.9,partlycloudy
2,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,15.0,Dearborn St & Monroe St,23.0,73.0,16.1,mostlycloudy
3,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,19.0,Clark St & Randolph St,31.0,72.0,16.1,mostlycloudy
4,Male,2013-07-01 11:16:00,2013-07-01 11:18:00,130,Damen Ave & Pierce Ave,19.0,Damen Ave & Pierce Ave,19.0,73.0,17.3,partlycloudy


In [16]:
filt_temp = bikes['temperature'] < 0

bikes[filt_temp]

,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
1871,Male,2013-12-12 05:13:00,2013-12-12 05:27:00,878,California Ave & North Ave,11.0,Carpenter St & Huron St,19.0,-2.0,6.9,mostlycloudy
2049,Female,2014-01-23 06:15:00,2014-01-23 06:29:00,828,Stave St & Armitage Ave,11.0,Ashland Ave & Division St,19.0,-2.0,16.1,partlycloudy
2054,Male,2014-01-23 21:15:00,2014-01-23 21:21:00,351,LaSalle St & Illinois St,31.0,McClurg Ct & Illinois St,31.0,-0.9,12.7,clear
2062,Male,2014-01-27 17:07:00,2014-01-27 17:11:00,242,Clark St & Randolph St,31.0,Clinton St & Washington Blvd,31.0,-5.1,15.0,partlycloudy
2063,Male,2014-01-27 17:19:00,2014-01-27 17:24:00,304,Damen Ave & Augusta Blvd,15.0,Damen Ave & Pierce Ave,19.0,-5.1,15.0,partlycloudy
2064,Male,2014-01-28 08:03:00,2014-01-28 08:12:00,546,Clinton St & Lake St,19.0,Franklin St & Jackson Blvd,31.0,-8.0,12.7,clear
2117,Male,2014-02-12 01:21:00,2014-02-12 01:40:00,1149,Dayton St & North Ave,19.0,May St & Fulton St,15.0,-0.9,5.8,clear
2221,Male,2014-03-03 05:58:00,2014-03-03 06:04:00,332,Peoria St & Jackson Blvd,19.0,Franklin St & Jackson Blvd,31.0,-0.9,11.5,partlycloudy
10245,Male,2015-01-05 08:33:00,2015-01-05 08:58:00,1554,Wabash Ave & Roosevelt Rd,23.0,Hermitage Ave & Polk St,15.0,-2.0,11.5,partlycloudy
10246,Male,2015-01-05 08:45:00,2015-01-05 08:51:00,381,Dearborn Pkwy & Delaware Pl,19.0,State St & Kinzie St,23.0,-2.0,11.5,partlycloudy


### Exercise 2

<span  style="color:green; font-size:16px">Find all the rides with wind speed greater than 30.</span>

In [17]:
filt_wind = bikes['wind_speed'] > 30
bikes[filt_wind].head()

,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
2164,Male,2014-02-20 19:06:00,2014-02-20 19:07:00,66,Pine Grove Ave & Waveland Ave,23.0,Pine Grove Ave & Waveland Ave,23.0,46.9,31.1,mostlycloudy
2165,Male,2014-02-20 20:47:00,2014-02-20 21:14:00,1605,Millennium Park,35.0,Clark St & Wrightwood Ave,15.0,39.0,35.7,cloudy
2479,Male,2014-04-01 08:28:00,2014-04-01 08:29:00,82,Desplaines St & Kinzie St,19.0,Desplaines St & Kinzie St,19.0,33.1,31.1,mostlycloudy
2680,Female,2014-04-12 15:46:00,2014-04-12 15:54:00,487,Western Ave & Division St,15.0,Damen Ave & Chicago Ave,15.0,79.0,31.1,mostlycloudy
4265,Female,2014-06-15 13:17:00,2014-06-15 13:29:00,713,Sheffield Ave & Wellington Ave,23.0,Lake Shore Dr & Belmont Ave,19.0,82.0,31.1,mostlycloudy


### Exercise 3

<span  style="color:green; font-size:16px">Find all the rides that began from station 'Millennium Park'.</span>

In [20]:
filt_start = bikes['from_station_name'] == 'Millennium Park'
bikes[filt_start].head()

,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
32,Male,2013-07-12 15:10:00,2013-07-12 15:24:00,888,Millennium Park,35.0,McClurg Ct & Illinois St,23.0,82.0,10.4,partlycloudy
42,Male,2013-07-15 08:17:00,2013-07-15 08:25:00,466,Millennium Park,35.0,Clinton St & Madison St,23.0,82.0,0.0,partlycloudy
66,Male,2013-07-17 20:56:00,2013-07-17 21:14:00,1073,Millennium Park,35.0,Morgan St & 18th St,15.0,86.0,9.2,partlycloudy
258,Female,2013-08-17 22:10:00,2013-08-17 22:53:00,2566,Millennium Park,35.0,Theater on the Lake,15.0,69.1,5.8,clear
356,Male,2013-08-27 18:30:00,2013-08-27 18:36:00,373,Millennium Park,35.0,Clark St & Randolph St,31.0,90.0,13.8,partlycloudy


### Exercise 4

<span  style="color:green; font-size:16px">Find all the rides with wind speed less than 0. How is this possible?</span>

In [21]:
filt2 = bikes['wind_speed'] < 0
bikes[filt2]

,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
22990,Male,2016-03-19 10:08:00,2016-03-19 10:20:00,702,Wood St & Division St,15.0,Campbell Ave & Fullerton Ave,15.0,42.8,-9999.0,mostlycloudy
27168,Female,2016-06-30 11:47:00,2016-06-30 11:51:00,240,Kimball Ave & Belmont Ave,23.0,Avers Ave & Belmont Ave,19.0,-9999.0,-9999.0,unknown
28368,Female,2016-07-21 21:02:29,2016-07-21 21:20:28,1079,Sedgwick St & North Ave,19.0,Ashland Ave & Division St,19.0,73.0,-9999.0,tstorms
29308,Male,2016-08-07 09:16:42,2016-08-07 09:22:59,377,Broadway & Cornelia Ave,15.0,Southport Ave & Roscoe St,19.0,77.0,-9999.0,mostlycloudy
29309,Male,2016-08-07 09:29:44,2016-08-07 09:41:16,693,Mies van der Rohe Way & Chicago Ave,15.0,Lake Shore Dr & North Blvd,39.0,77.0,-9999.0,mostlycloudy
29310,Male,2016-08-07 09:36:20,2016-08-07 09:48:21,722,Clark St & Lincoln Ave,23.0,Dearborn St & Erie St,23.0,77.0,-9999.0,mostlycloudy
38549,Male,2017-05-09 16:05:14,2017-05-09 16:26:27,1273,Wabash Ave & 16th St,31.0,State St & Pearson St,27.0,60.1,-9999.0,cloudy
38550,Male,2017-05-09 16:12:43,2017-05-09 16:19:47,424,Michigan Ave & Pearson St,23.0,McClurg Ct & Illinois St,31.0,60.1,-9999.0,cloudy
38551,Male,2017-05-09 16:46:51,2017-05-09 17:00:18,807,Michigan Ave & Lake St,31.0,Canal St & Adams St,47.0,60.1,-9999.0,cloudy
38552,Male,2017-05-09 16:57:29,2017-05-09 17:07:22,593,Michigan Ave & Lake St,31.0,Canal St & Adams St,47.0,60.1,-9999.0,cloudy
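
Every wind speed in this result is exactly -9999.0, which is not a physically possible wind speed. The value is most likely a placeholder marking a missing measurement; note that one of these rows also has a temperature of -9999.0 and an `events` value of `unknown`.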

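
If -9999 is indeed a placeholder for missing data (an assumption; the dataset's documentation is not shown here), a single `!=` condition removes those rows before any analysis of wind speed. A minimal sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical excerpt: -9999.0 assumed to mark a missing wind reading
weather = pd.DataFrame({'wind_speed': [12.7, -9999.0, 6.9, -9999.0]})

# Keep only rows whose wind speed is not the placeholder value
filt_valid = weather['wind_speed'] != -9999
valid = weather[filt_valid]

print(len(valid))                 # 2
print(valid['wind_speed'].min())  # 6.9
```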
### Exercise 5

<span  style="color:green; font-size:16px">Find all the rides where the starting number of bikes at the station (start_capacity) was more than 50.</span>

In [22]:
filt_cap = bikes['start_capacity'] > 50
bikes[filt_cap]

,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
36122,Male,2017-02-17 17:00:36,2017-02-17 17:23:27,1371,Field Museum,51.0,Lake Shore Dr & North Blvd,39.0,63.0,10.4,partlycloudy
37617,Male,2017-04-14 18:44:47,2017-04-14 19:00:53,966,Field Museum,51.0,Burnham Harbor,23.0,63.0,4.6,cloudy
37920,Male,2017-04-22 12:28:51,2017-04-22 12:44:14,923,Field Museum,51.0,Indiana Ave & Roosevelt Rd,39.0,55.9,12.7,mostlycloudy
37940,Female,2017-04-22 17:12:55,2017-04-22 17:23:44,649,Field Museum,51.0,Wabash Ave & Roosevelt Rd,23.0,57.9,12.7,partlycloudy
39102,Male,2017-05-21 20:40:10,2017-05-21 20:51:29,679,Field Museum,51.0,Buckingham Fountain,27.0,51.1,12.7,cloudy
39446,Female,2017-05-29 14:39:41,2017-05-29 14:54:22,881,Field Museum,51.0,Fort Dearborn Dr & 31st St,15.0,75.9,24.2,partlycloudy
39709,Male,2017-06-02 16:58:55,2017-06-02 17:17:23,1108,Field Museum,51.0,Blue Island Ave & 18th St,15.0,84.9,8.1,partlycloudy
41494,Female,2017-07-01 18:47:42,2017-07-01 19:04:26,1004,Shedd Aquarium,55.0,Federal St & Polk St,19.0,78.1,15.0,partlycloudy
41664,Female,2017-07-05 09:19:40,2017-07-05 09:28:38,538,Shedd Aquarium,55.0,Federal St & Polk St,19.0,80.1,5.8,mostlycloudy
41971,Male,2017-07-10 08:22:52,2017-07-10 08:35:00,728,Field Museum,55.0,Franklin St & Jackson Blvd,39.0,68.0,0.0,partlycloudy


### Exercise 6

<span style="color:green; font-size:16px">Did any rides happen in temperature over 100 degrees?</span>

In [23]:
filt_hot = bikes['temperature'] > 100
bikes[filt_hot]

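,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events

The result contains only the column headers and no rows, so no rides took place in temperatures above 100 degrees.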

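
To answer a yes/no question like this in code rather than by reading the output, the `DataFrame.empty` attribute or the `Series.any()` method can be used. A sketch on hypothetical temperatures:

```python
import pandas as pd

temps = pd.DataFrame({'temperature': [73.9, 69.1, -2.0]})  # hypothetical

hot = temps[temps['temperature'] > 100]

print(hot.empty)        # True: no rows survived the filter
print(len(hot))         # 0
print(list(hot.columns))  # ['temperature']: columns are still present

# Or ask the boolean Series directly whether any value is True
print((temps['temperature'] > 100).any())  # False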

#### Read in new data

Read in the movie dataset by executing the cell below and use it for the following exercises.

In [24]:
import pandas as pd
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head(3)

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,...,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,...,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8


### Exercise 7

<span  style="color:green; font-size:16px">Select all movies that have 'Tom Hanks' as `actor1`. How many such movies are there?</span>

In [25]:
filt_tom = movie['actor1'] == 'Tom Hanks'
len(movie[filt_tom])

24

### Exercise 8

<span  style="color:green; font-size:16px">Select movies with an IMDB score greater than 9.</span>

In [26]:
filt_score = movie['imdb_score'] > 9
movie[filt_score]

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
The Shawshank Redemption,1994.0,Color,R,142.0,Frank Darabont,0.0,Morgan Freeman,11000.0,Jeffrey DeMunn,745.0,...,461.0,28341469.0,Crime|Drama,199.0,1689764,escape from prison|first person narration|pris...,English,USA,25000000.0,9.3
Towering Inferno,,Color,,65.0,John Blanchard,0.0,Martin Short,770.0,Andrea Martin,179.0,...,176.0,,Comedy,,10,,English,Canada,,9.5
Dekalog,,Color,TV-MA,55.0,,,Krystyna Janda,20.0,Olaf Lubaszenko,3.0,...,2.0,447093.0,Drama,53.0,12590,meaning of life|moral challenge|morality|searc...,Polish,Poland,,9.1
The Godfather,1972.0,Color,R,175.0,Francis Ford Coppola,0.0,Al Pacino,14000.0,Marlon Brando,10000.0,...,3000.0,134821952.0,Crime|Drama,208.0,1155770,crime family|mafia|organized crime|patriarch|r...,English,USA,6000000.0,9.2
Kickboxer: Vengeance,2016.0,,,90.0,John Stockwell,134.0,Matthew Ziff,260000.0,T.J. Storm,454.0,...,354.0,,Action,2.0,246,,,USA,17000000.0,9.1

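
Missing values deserve a note here. In pandas, a comparison involving `NaN` evaluates to `False`, except `!=`, which evaluates to `True`. So a movie with a missing `imdb_score` would be excluded by both `> 9` and `<= 9`. A sketch on a small hypothetical Series:

```python
import numpy as np
import pandas as pd

scores = pd.Series([9.3, np.nan, 8.5], index=['A', 'B', 'C'])  # hypothetical

# Comparisons with NaN evaluate to False...
print((scores > 9).tolist())     # [True, False, False]
print((scores <= 9).tolist())    # [False, False, True]
# ...except !=, which evaluates to True
print((scores != 9.3).tolist())  # [False, True, True]
```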

### Exercise 9

<span  style="color:green; font-size:16px">Write a function that accepts a single parameter to find the number of movies for a given content rating. Use the function to find the number of movies for ratings 'R', 'PG-13', and 'PG'.</span>

In [34]:
def num_movies(rating):
    filt = movie['content_rating'] == rating
    count = len(movie[filt])
    return f'There are {count} movies rated {rating}.'

In [35]:
num_movies('R')

'There are 2067 movies rated R.'

In [36]:
num_movies('PG-13')

'There are 1411 movies rated PG-13.'

In [37]:
num_movies('PG')

'There are 686 movies rated PG.'
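
When the count is needed for every rating at once, `Series.value_counts()` computes them in one call; applied to `movie['content_rating']`, it should agree with `num_movies` for 'R', 'PG-13', and 'PG', since both count exact matches and ignore missing ratings. The sketch below uses a hypothetical stand-in Series and a hypothetical helper, `count_rating`, so it does not overwrite `num_movies`.

```python
import pandas as pd

# Hypothetical stand-in for movie['content_rating']
ratings = pd.Series(['R', 'PG-13', 'R', 'PG', None, 'R'])


def count_rating(values, rating):
    """Hypothetical helper: count exact matches with a boolean filter."""
    return int((values == rating).sum())


counts = ratings.value_counts()  # missing values are excluded by default

print(count_rating(ratings, 'R'))     # 3
print(counts['R'])                    # 3
print(counts['PG-13'], counts['PG'])  # 1 1
```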