# Filtering with the `query` Method

The previous chapters on boolean selection showed how to filter DataFrames and Series based on their values. We created conditions, usually with the comparison operators, that produced boolean Series, and then passed those Series to the *just the brackets* or `loc` indexers to filter the data.

In this chapter we cover the `query` method, another way to select the rows of a DataFrame based on their values. Because the condition is written as a string, `query` is often easier to write and read than boolean selection, though it offers less functionality for filtering the data. Still, it is a good method to know for making your subset selections more readable.

## The `query` method

The `query` method allows you to filter the data by writing the condition as a string. For instance, you would use the string `'tripduration > 1000'` to select all rows of the `bikes` dataset that have a `tripduration` greater than 1,000. Let's read in the bikes dataset and run this command now.

```python
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.query('tripduration > 1000').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Male | 2013-06-30 14:43:00 | 2013-06-30 15:01:00 | 1040 | Sheffield Ave & Kingsbury St | 15.0 | Dearborn St & Monroe St | 23.0 | 73.0 | 16.1 | mostlycloudy |
| 8 | Male | 2013-07-03 15:21:00 | 2013-07-03 15:42:00 | 1300 | Clinton St & Washington Blvd | 31.0 | Wood St & Division St | 15.0 | 71.1 | 0.0 | cloudy |
| 10 | Male | 2013-07-04 17:17:00 | 2013-07-04 17:42:00 | 1523 | Morgan St & 18th St | 15.0 | Damen Ave & Pierce Ave | 19.0 | 79.0 | 9.2 | mostlycloudy |


### Less syntax and more readable

The `query` method generally uses less syntax than boolean selection and is usually more readable. Reproducing the last result with boolean selection would look like this:

```python
bikes[bikes['tripduration'] > 1000]
```

Writing `bikes` twice in a row like this is a bit clumsy. The `query` method has its own rules for what counts as a valid condition within the string you pass it, and the rest of this chapter covers the most common ones. This expression syntax is understood by `query` and by the closely related `DataFrame.eval` method and `pd.eval` function, but not by the rest of pandas.
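To make the comparison concrete, here is a minimal sketch on a small made-up DataFrame (`rides` is a hypothetical stand-in for `bikes`). It checks that `query` and boolean selection return the same rows, and uses `DataFrame.eval`, which evaluates the same kind of expression string but returns the boolean Series itself instead of the filtered rows.

```python
import pandas as pd

# Small, made-up stand-in for the bikes data (values are hypothetical)
rides = pd.DataFrame({
    'tripduration': [993, 623, 1040, 1300, 450],
    'temperature': [73.9, 69.1, 73.0, 71.1, 88.0],
})

# query: the condition is written as a string
via_query = rides.query('tripduration > 1000')

# Boolean selection: build the boolean Series explicitly, then index with it
via_brackets = rides[rides['tripduration'] > 1000]

# eval understands the same expression syntax and returns the boolean Series
mask = rides.eval('tripduration > 1000')
via_eval = rides[mask]

print(via_query.index.tolist())
```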

## Use strings `and`, `or`, `not`

Unlike boolean selection, `query` lets you use the words `and`, `or`, and `not` in place of the operators `&`, `|`, and `~`, which further aids readability. Let's select all rides with a `tripduration` greater than 1,000 and a `temperature` greater than 85.

```python
bikes.query('tripduration > 1000 and temperature > 85').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | Female | 2013-07-14 14:08:00 | 2013-07-14 15:53:00 | 6274 | Wabash Ave & Roosevelt Rd | 19.0 | Lake Shore Dr & Monroe St | 11.0 | 87.1 | 8.1 | partlycloudy |
| 53 | Male | 2013-07-16 13:04:00 | 2013-07-16 13:28:00 | 1435 | Canal St & Jackson Blvd | 35.0 | Canal St & Jackson Blvd | 35.0 | 90.0 | 8.1 | mostlycloudy |
| 60 | Male | 2013-07-17 10:23:00 | 2013-07-17 10:40:00 | 1024 | Clinton St & Washington Blvd | 31.0 | Larrabee St & Menomonee St | 15.0 | 88.0 | 5.8 | partlycloudy |
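The keywords map directly onto the operators used in boolean selection. The minimal sketch below, on a small made-up DataFrame standing in for `bikes`, checks each pair. `query` also accepts `&`, `|`, and `~`; the words are simply easier to read.

```python
import pandas as pd

# Hypothetical rides, chosen so each condition keeps a different subset
rides = pd.DataFrame({
    'tripduration': [6274, 1435, 623, 2500, 300],
    'temperature': [87.1, 90.0, 69.1, 60.0, 95.0],
})

# English keywords inside query ...
kw_and = rides.query('tripduration > 1000 and temperature > 85')
kw_or = rides.query('tripduration > 2000 or temperature > 90')
kw_not = rides.query('not (temperature > 85)')

# ... select the same rows as the symbolic operators in boolean selection
sym_and = rides[(rides['tripduration'] > 1000) & (rides['temperature'] > 85)]
sym_or = rides[(rides['tripduration'] > 2000) | (rides['temperature'] > 90)]
sym_not = rides[~(rides['temperature'] > 85)]
```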


## Chained comparisons

Let's say we want to find all rides where the temperature was between 50 and 60 degrees. You can do this with `query` by using the `and` operator.

```python
bikes.query('temperature >= 50 and temperature <= 60').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 590 | Female | 2013-09-13 07:55:00 | 2013-09-13 08:01:00 | 319 | Greenview Ave & Fullerton Ave | 15.0 | Sheffield Ave & Fullerton Ave | 15.0 | 54.0 | 15.0 | partlycloudy |
| 591 | Male | 2013-09-13 08:04:00 | 2013-09-13 08:16:00 | 738 | Lincoln Ave & Armitage Ave | 19.0 | Larrabee St & Kingsbury St | 27.0 | 57.9 | 13.8 | partlycloudy |
| 592 | Female | 2013-09-13 08:04:00 | 2013-09-13 08:14:00 | 599 | Orleans St & Elm St | 15.0 | Sheffield Ave & Kingsbury St | 15.0 | 57.9 | 13.8 | partlycloudy |


While this syntax is valid, there is a more compact way. A **chained comparison** places the column name between two comparison operators, which makes the string shorter and easier to read. The following reads as "50 is less than or equal to the temperature, and the temperature is less than or equal to 60", which is equivalent to our previous selection.

```python
bikes.query('50 <= temperature <= 60').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 590 | Female | 2013-09-13 07:55:00 | 2013-09-13 08:01:00 | 319 | Greenview Ave & Fullerton Ave | 15.0 | Sheffield Ave & Fullerton Ave | 15.0 | 54.0 | 15.0 | partlycloudy |
| 591 | Male | 2013-09-13 08:04:00 | 2013-09-13 08:16:00 | 738 | Lincoln Ave & Armitage Ave | 19.0 | Larrabee St & Kingsbury St | 27.0 | 57.9 | 13.8 | partlycloudy |
| 592 | Female | 2013-09-13 08:04:00 | 2013-09-13 08:14:00 | 599 | Orleans St & Elm St | 15.0 | Sheffield Ave & Kingsbury St | 15.0 | 57.9 | 13.8 | partlycloudy |
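A quick check on a handful of made-up temperatures, including both boundary values, confirms that the chained form, the `and` form, and the `Series.between` method (inclusive on both ends by default) select the same rows.

```python
import pandas as pd

# Hypothetical temperatures, including both boundary values 50 and 60
temps = pd.DataFrame({'temperature': [45.0, 50.0, 54.0, 60.0, 61.2]})

chained = temps.query('50 <= temperature <= 60')
with_and = temps.query('temperature >= 50 and temperature <= 60')
# Series.between includes both endpoints by default
with_between = temps[temps['temperature'].between(50, 60)]

print(chained['temperature'].tolist())
```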


## Reference strings with quotes

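If you would like to reference a literal string within `query`, you need to surround it with quotes; otherwise pandas treats it as a column name. Since the whole condition is already wrapped in single quotes, use double quotes for the string inside it (or the other way around). Let's select all rides by female riders with a trip duration greater than 2,000.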

```python
bikes.query('gender == "Female" and tripduration > 2000').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | Female | 2013-07-14 14:08:00 | 2013-07-14 15:53:00 | 6274 | Wabash Ave & Roosevelt Rd | 19.0 | Lake Shore Dr & Monroe St | 11.0 | 87.1 | 8.1 | partlycloudy |
| 77 | Female | 2013-07-21 11:35:00 | 2013-07-21 13:54:00 | 8299 | State St & 19th St | 15.0 | Sheffield Ave & Kingsbury St | 15.0 | 82.9 | 5.8 | mostlycloudy |
| 173 | Female | 2013-08-08 08:49:00 | 2013-08-08 09:31:00 | 2502 | Sheffield Ave & Addison St | 27.0 | Dearborn St & Adams St | 19.0 | 71.1 | 10.4 | mostlycloudy |


### Forgetting quotes

If you do not put quotes around a literal string, pandas assumes it is a column name. The following raises an error because pandas looks for a column named `Female`, which doesn't exist.

```python
bikes.query('gender == Female and tripduration > 2000')
```

```
UndefinedVariableError: name 'Female' is not defined
```
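Both quote styles work inside the string as long as they differ from the quotes around the whole condition. The minimal sketch below, on a made-up DataFrame, also shows that the error for an unquoted value can be caught as a `NameError`, since pandas' `UndefinedVariableError` is a subclass of it.

```python
import pandas as pd

# Hypothetical riders
riders = pd.DataFrame({
    'gender': ['Male', 'Female', 'Female', 'Male'],
    'tripduration': [1040, 6274, 830, 2502],
})

# Either inner quote style works, as long as it differs from the outer quotes
double_inner = riders.query('gender == "Female" and tripduration > 2000')
single_inner = riders.query("gender == 'Female' and tripduration > 2000")

# Without quotes, Female is looked up as a column name and the lookup fails
try:
    riders.query('gender == Female')
    failed = False
except NameError as err:
    failed = True
    print(type(err).__name__, err)
```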

## Column to column comparisons

You can also compare the values in one column with the values in another column, row by row. Here, we filter for all the rides where the starting station's capacity is greater than the ending station's capacity.

```python
bikes.query('start_capacity > end_capacity').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Male | 2013-06-28 22:53:00 | 2013-06-28 23:03:00 | 623 | Clinton St & Washington Blvd | 31.0 | Wells St & Walton St | 19.0 | 69.1 | 6.9 | partlycloudy |
| 6 | Male | 2013-07-02 17:47:00 | 2013-07-02 17:56:00 | 565 | Clark St & Randolph St | 31.0 | Ravenswood Ave & Irving Park Rd | 19.0 | 66.0 | 15.0 | cloudy |
| 8 | Male | 2013-07-03 15:21:00 | 2013-07-03 15:42:00 | 1300 | Clinton St & Washington Blvd | 31.0 | Wood St & Division St | 15.0 | 71.1 | 0.0 | cloudy |
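Each row's `start_capacity` is compared only with the `end_capacity` in that same row, exactly as when comparing two Series in boolean selection. A minimal sketch with made-up capacities:

```python
import pandas as pd

# Hypothetical station capacities for four rides
stations = pd.DataFrame({
    'start_capacity': [11.0, 31.0, 15.0, 15.0],
    'end_capacity': [15.0, 19.0, 23.0, 15.0],
})

# Row-by-row comparison of the two columns
via_query = stations.query('start_capacity > end_capacity')
via_brackets = stations[stations['start_capacity'] > stations['end_capacity']]
```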


## Use 'in' for multiple equalities

You can check whether each value in a column is equal to any one of several values by using the keyword `in` within your query. Write the values to check as a Python list inside the query string. The following tests whether the weather event was snow or rain.

```python
bikes.query('events in ["snow", "rain"]').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 45 | Male | 2013-07-15 16:43:00 | 2013-07-15 16:55:00 | 727 | Greenwood Ave & 47th St | 15.0 | State St & Harrison St | 19.0 | 82.9 | 5.8 | rain |
| 112 | Male | 2013-07-26 19:10:00 | 2013-07-26 19:33:00 | 1395 | Larrabee St & Kingsbury St | 27.0 | Damen Ave & Pierce Ave | 19.0 | 66.9 | 12.7 | rain |
| 124 | Male | 2013-07-30 18:53:00 | 2013-07-30 19:00:00 | 442 | Canal St & Jackson Blvd | 35.0 | Racine Ave & Congress Pkwy | 19.0 | 69.1 | 3.5 | rain |


Several other syntaxes produce the same result. I prefer the one above because it most closely resembles the `isin` method used in boolean selection.

* `bikes.query('["snow", "rain"] in events')`
* `bikes.query('["snow", "rain"] == events')`
* `bikes.query('events == ["snow", "rain"]')`
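The sketch below, using a small made-up `events` column in place of the bikes data, checks that these forms and the main one above all agree with `isin`.

```python
import pandas as pd

# Hypothetical weather events
rides = pd.DataFrame({'events': ['cloudy', 'rain', 'snow', 'clear', 'rain']})

forms = [
    'events in ["snow", "rain"]',
    '["snow", "rain"] in events',
    '["snow", "rain"] == events',
    'events == ["snow", "rain"]',
]
expected = rides[rides['events'].isin(['snow', 'rain'])]

# Every form should keep exactly the rows that isin keeps
results = {form: rides.query(form) for form in forms}
for form, result in results.items():
    print(form, '->', result.index.tolist())
```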

### Use 'not in' to invert the condition

You can invert the result of an 'in' clause by placing the word 'not' before it. Here, we find all the rides that did not have the weather events cloudy, partly cloudy or mostly cloudy.

```python
bikes.query('events not in ["cloudy", "partlycloudy", "mostlycloudy"]').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | Female | 2013-07-11 08:17:00 | 2013-07-11 08:31:00 | 830 | Wabash Ave & Roosevelt Rd | 19.0 | Daley Center Plaza | 47.0 | 73.9 | 8.1 | clear |
| 26 | Male | 2013-07-12 01:07:00 | 2013-07-12 01:24:00 | 1043 | State St & Harrison St | 19.0 | Racine Ave & 18th St | 15.0 | 64.9 | 0.0 | clear |
| 33 | Male | 2013-07-12 17:22:00 | 2013-07-12 17:34:00 | 730 | Clark St & Congress Pkwy | 27.0 | Racine Ave & Congress Pkwy | 19.0 | 79.0 | 10.4 | clear |
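In boolean selection, the equivalent of `not in` is `isin` negated with `~`. A minimal sketch with made-up events:

```python
import pandas as pd

# Hypothetical weather events
rides = pd.DataFrame({'events': ['cloudy', 'clear', 'partlycloudy', 'rain', 'mostlycloudy']})

via_query = rides.query('events not in ["cloudy", "partlycloudy", "mostlycloudy"]')
# The boolean-selection equivalent negates isin with ~
via_brackets = rides[~rides['events'].isin(['cloudy', 'partlycloudy', 'mostlycloudy'])]
```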


## Arithmetic operations within `query`

You can write arithmetic operations within `query` just as you would outside of it. For instance, to find all the rides where the start station's capacity exceeds the end station's capacity by 20 or more, we do the following.

```python
bikes.query('start_capacity - end_capacity >= 20').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 54 | Male | 2013-07-16 15:13:00 | 2013-07-16 15:18:00 | 347 | Daley Center Plaza | 47.0 | State St & Van Buren St | 27.0 | 91.0 | 8.1 | mostlycloudy |
| 66 | Male | 2013-07-17 20:56:00 | 2013-07-17 21:14:00 | 1073 | Millennium Park | 35.0 | Morgan St & 18th St | 15.0 | 86.0 | 9.2 | partlycloudy |
| 116 | Male | 2013-07-27 09:54:00 | 2013-07-27 09:56:00 | 121 | Daley Center Plaza | 47.0 | LaSalle St & Washington St | 15.0 | 60.8 | 11.5 | cloudy |
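Arithmetic can appear on either side of the comparison, so the same condition can be rearranged freely. A minimal sketch with made-up capacities; the first two rows differ by exactly 20, so `>=` keeps them.

```python
import pandas as pd

# Hypothetical start and end station capacities
stations = pd.DataFrame({
    'start_capacity': [47.0, 35.0, 31.0, 15.0],
    'end_capacity': [27.0, 15.0, 19.0, 15.0],
})

# Arithmetic on the left side, on the right side, and in boolean selection
diff_form = stations.query('start_capacity - end_capacity >= 20')
shifted_form = stations.query('start_capacity >= end_capacity + 20')
via_brackets = stations[stations['start_capacity'] - stations['end_capacity'] >= 20]
```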


### Filtering for right triangles

Let's read in the triangles dataset which contains the lengths of each side of a triangle as the columns `a`, `b`, and `c`.

```python
triangles = pd.read_csv('../data/triangles.csv')
triangles.head()
```

|  | a | b | c |
|---|---|---|---|
| 0 | 2 | 3 | 4 |
| 1 | 3 | 2 | 4 |
| 2 | 3 | 4 | 5 |
| 3 | 3 | 5 | 6 |
| 4 | 3 | 6 | 7 |


We can use the `query` method to find all the right triangles, those that satisfy the Pythagorean Theorem. We write the condition using the arithmetic and comparison operators.

```python
triangles.query('a ** 2 + b ** 2 == c ** 2').head()
```

|  | a | b | c |
|---|---|---|---|
| 2 | 3 | 4 | 5 |
| 5 | 4 | 3 | 5 |
| 14 | 5 | 12 | 13 |
| 21 | 6 | 8 | 10 |
| 33 | 7 | 24 | 25 |


The syntax is quite a bit nicer than the boolean selection alternative.

```python
# Boolean selection version of the same Pythagorean filter
filt = triangles['a'] ** 2 + triangles['b'] ** 2 == triangles['c'] ** 2
triangles[filt].head()
```
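Both versions select the same rows. A minimal check on a few rows copied from the outputs above (the first five rows plus the 5-12-13 and 6-8-10 triangles):

```python
import pandas as pd

# Rows taken from the triangles output above
triangles = pd.DataFrame({
    'a': [2, 3, 3, 3, 3, 5, 6],
    'b': [3, 2, 4, 5, 6, 12, 8],
    'c': [4, 4, 5, 6, 7, 13, 10],
})

via_query = triangles.query('a ** 2 + b ** 2 == c ** 2')
filt = triangles['a'] ** 2 + triangles['b'] ** 2 == triangles['c'] ** 2
via_brackets = triangles[filt]
```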

## Reference variable names with the `@` symbol

By default, every name within the query string refers to a column. You can, however, reference a Python variable by preceding its name with the `@` symbol. Let's assign 5,000 to the variable `min_length` and reference it in a query to find all the rides with a trip duration greater than it.

```python
min_length = 5000
bikes.query('tripduration > @min_length').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | Male | 2013-07-09 13:12:00 | 2013-07-09 14:42:00 | 5396 | Canal St & Jackson Blvd | 35.0 | Millennium Park | 35.0 | 79.0 | 13.8 | cloudy |
| 40 | Female | 2013-07-14 14:08:00 | 2013-07-14 15:53:00 | 6274 | Wabash Ave & Roosevelt Rd | 19.0 | Lake Shore Dr & Monroe St | 11.0 | 87.1 | 8.1 | partlycloudy |
| 77 | Female | 2013-07-21 11:35:00 | 2013-07-21 13:54:00 | 8299 | State St & 19th St | 15.0 | Sheffield Ave & Kingsbury St | 15.0 | 82.9 | 5.8 | mostlycloudy |


Any variable works the same way. Here, we find the rides with a temperature below `max_temp`.

```python
max_temp = 70
bikes.query('temperature < @max_temp').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Male | 2013-06-28 22:53:00 | 2013-06-28 23:03:00 | 623 | Clinton St & Washington Blvd | 31.0 | Wells St & Walton St | 19.0 | 69.1 | 6.9 | partlycloudy |
| 6 | Male | 2013-07-02 17:47:00 | 2013-07-02 17:56:00 | 565 | Clark St & Randolph St | 31.0 | Ravenswood Ave & Irving Park Rd | 19.0 | 66.0 | 15.0 | cloudy |
| 7 | Male | 2013-07-03 09:07:00 | 2013-07-03 09:16:00 | 505 | State St & Van Buren St | 27.0 | Franklin St & Jackson Blvd | 27.0 | 64.0 | 5.8 | cloudy |
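`@` looks names up in the scope where `query` is called, so it also picks up a function's parameters and local variables. A minimal sketch; `longer_than` is a hypothetical helper written for illustration, not part of pandas.

```python
import pandas as pd

# Hypothetical trip durations
rides = pd.DataFrame({'tripduration': [993, 5396, 6274, 623, 8299]})

def longer_than(df, min_length):
    # @min_length resolves to this function's parameter
    return df.query('tripduration > @min_length')

result = longer_than(rides, 5000)
print(result.index.tolist())
```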


## Using the index with `query`

You can even use the word `index` to make comparisons against the index as if it were a normal column. In the bikes DataFrame, the index is simply the integers beginning at 0. Here, we select the rides with an index value greater than 4,000 where `events` was 'cloudy'.

```python
bikes.query('index > 4000 and events == "cloudy"').head(3)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4007 | Male | 2014-06-07 14:07:00 | 2014-06-07 14:31:00 | 1434 | Lake Shore Dr & North Blvd | 15.0 | Halsted St & Roscoe St | 15.0 | 82.0 | 13.8 | cloudy |
| 4008 | Male | 2014-06-07 14:58:00 | 2014-06-07 15:19:00 | 1258 | Theater on the Lake | 15.0 | Sheridan Rd & Buena Ave | 15.0 | 82.0 | 13.8 | cloudy |
| 4009 | Male | 2014-06-07 15:23:00 | 2014-06-07 15:28:00 | 297 | Sheffield Ave & Addison St | 27.0 | Pine Grove Ave & Waveland Ave | 23.0 | 80.1 | 13.8 | cloudy |
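With the default integer index, `index` in a query behaves like the `.index` attribute in boolean selection. A minimal sketch with made-up events:

```python
import pandas as pd

# Hypothetical events with the default RangeIndex 0..4
rides = pd.DataFrame({'events': ['cloudy', 'rain', 'cloudy', 'clear', 'cloudy']})

via_query = rides.query('index > 1 and events == "cloudy"')
# Boolean selection equivalent using the .index attribute
via_brackets = rides[(rides['events'] == 'cloudy') & (rides.index > 1)]
```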


### Referencing named index

If your DataFrame has an index that is named, which happens when a column is set as the index, then you can use that name within `query` just as if it were a regular column name. Here, we create a new DataFrame that has the `from_station_name` as the index.

```python
bikes_idx = bikes.set_index('from_station_name')
bikes_idx.head(3)
```

| from_station_name | gender | starttime | stoptime | tripduration | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|
| Lake Shore Dr & Monroe St | Male | 2013-06-28 19:01:00 | 2013-06-28 19:17:00 | 993 | 11.0 | Michigan Ave & Oak St | 15.0 | 73.9 | 12.7 | mostlycloudy |
| Clinton St & Washington Blvd | Male | 2013-06-28 22:53:00 | 2013-06-28 23:03:00 | 623 | 31.0 | Wells St & Walton St | 19.0 | 69.1 | 6.9 | partlycloudy |
| Sheffield Ave & Kingsbury St | Male | 2013-06-30 14:43:00 | 2013-06-30 15:01:00 | 1040 | 15.0 | Dearborn St & Monroe St | 23.0 | 73.0 | 16.1 | mostlycloudy |


Notice the name 'from_station_name' directly above the index. This is the name of the index, and it is what you reference when using `query`. Let's select all rides that started at the 'Theater on the Lake' station.

```python
bikes_idx.query('from_station_name == "Theater on the Lake"').head(3)
```

| from_station_name | gender | starttime | stoptime | tripduration | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|
| Theater on the Lake | Male | 2013-08-23 17:57:00 | 2013-08-23 18:16:00 | 1166 | 15.0 | Lincoln Ave & Roscoe St | 19.0 | 79.0 | 9.2 | partlycloudy |
| Theater on the Lake | Female | 2013-08-24 15:31:00 | 2013-08-24 15:59:00 | 1661 | 15.0 | Fairbanks Ct & Grand Ave | 15.0 | 84.9 | 6.9 | partlycloudy |
| Theater on the Lake | Male | 2013-09-07 14:28:00 | 2013-09-07 14:37:00 | 540 | 15.0 | Sheffield Ave & Fullerton Ave | 15.0 | 88.0 | 10.4 | mostlycloudy |
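After `set_index`, the old column name becomes the index name, and `query` accepts it just like a column. A minimal sketch on a made-up three-row frame, compared with boolean selection against the index values:

```python
import pandas as pd

# Hypothetical rides; from_station_name becomes the (named) index
rides = pd.DataFrame({
    'from_station_name': ['Theater on the Lake', 'Millennium Park', 'Theater on the Lake'],
    'tripduration': [1166, 347, 540],
})
rides_idx = rides.set_index('from_station_name')

# The index name is used exactly like a column name
via_query = rides_idx.query('from_station_name == "Theater on the Lake"')
# Boolean selection against the index values
via_brackets = rides_idx[rides_idx.index == 'Theater on the Lake']
```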


## Use backticks to reference column names with spaces

pandas allows DataFrames to have column names with spaces in them. In order to use a column name containing spaces within `query`, you'll need to surround it with backticks. If you don't use the backticks you'll get an error. Let's read in the San Francisco employee compensation dataset which contains multiple column names that have spaces.

```python
sf_emp = pd.read_csv('../data/sf_employee_compensation.csv')
sf_emp.head(3)
```

|  | year | organization group | job | salaries | overtime | other salaries | retirement | health and dental | other benefits |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013 | Public Protection | Personnel Technician | 71414.01 | 0.0 | 0.0 | 14038.58 | 12918.24 | 5872.04 |
| 1 | 2013 | General Administration & Finance | Planner 2 | 67941.06 | 0.0 | 0.0 | 13030.23 | 10047.52 | 5608.37 |
| 2 | 2013 | Public Protection | Firefighter | 116956.72 | 59975.43 | 19037.3 | 24796.44 | 15788.97 | 3222.2 |


Let's find all the employees that are in the organization group of 'Public Protection'.

```python
sf_emp.query('`organization group` == "Public Protection"').head(2)
```

|  | year | organization group | job | salaries | overtime | other salaries | retirement | health and dental | other benefits |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013 | Public Protection | Personnel Technician | 71414.01 | 0.0 | 0.0 | 14038.58 | 12918.24 | 5872.04 |
| 2 | 2013 | Public Protection | Firefighter | 116956.72 | 59975.43 | 19037.3 | 24796.44 | 15788.97 | 3222.2 |
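Backtick-quoted names mix freely with ordinary operators and other backtick-quoted columns. A minimal sketch on a made-up slice of the employee data; backtick quoting needs a reasonably recent pandas (it was added in version 0.25).

```python
import pandas as pd

# Hypothetical slice of the employee data; both column names contain spaces
emp = pd.DataFrame({
    'organization group': ['Public Protection', 'General Administration & Finance', 'Public Protection'],
    'other salaries': [0.0, 0.0, 19037.3],
})

# Backticks wrap each column name that contains a space
via_query = emp.query('`organization group` == "Public Protection" and `other salaries` > 0')
via_brackets = emp[(emp['organization group'] == 'Public Protection') & (emp['other salaries'] > 0)]
```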


### Selecting columns with `query`

Unfortunately, the `query` method cannot select a subset of the columns while filtering the data, so you have to do ordinary column selection after calling it. Here, we use *just the brackets* to select three columns after finding all the rides where the weather was snow or rain.

```python
cols = ['starttime', 'temperature', 'events']
bikes.query('events in ["snow", "rain"]')[cols].head()
```

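The same approach works with the employee data. Here, we select three columns for the employees in the 'Public Protection' organization group.

```python
cols = ['job', 'salaries', 'retirement']
sf_emp.query('`organization group` == "Public Protection"')[cols].head(2)
```

|  | job | salaries | retirement |
|---|---|---|---|
| 0 | Personnel Technician | 71414.01 | 14038.58 |
| 2 | Firefighter | 116956.72 | 24796.44 |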
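If you want to filter rows and pick columns in a single step, `loc` with a boolean Series does both at once; `query` followed by column selection gives the same result. A minimal sketch on a few rows copied from earlier bikes outputs:

```python
import pandas as pd

# A few rows copied from earlier outputs (subset of columns)
rides = pd.DataFrame({
    'starttime': pd.to_datetime(['2013-07-15 16:43', '2013-07-16 13:04', '2013-07-26 19:10']),
    'temperature': [82.9, 90.0, 66.9],
    'events': ['rain', 'mostlycloudy', 'rain'],
    'tripduration': [727, 1435, 1395],
})
cols = ['starttime', 'temperature', 'events']

# query filters the rows; the columns are selected afterwards
via_query = rides.query('events in ["snow", "rain"]')[cols]
# loc selects rows and columns together
via_loc = rides.loc[rides['events'].isin(['snow', 'rain']), cols]
```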


## Summary

The `query` method provides an alternative to boolean selection to filter the data based on the values. Here are the rules for the string you provide.

* The expression in the string must evaluate as True or False for every row
* Column names may be accessed directly with their name
* Often you will use one of the comparison operators to create a condition
* Use chained comparison operators to shorten syntax
* Use `and`, `or`, and `not` to create more complex conditions
* To use a literal string, surround it with quotes
* Use `in` to test multiple equalities, providing the test values in a list; use `not in` to invert the test
* All arithmetic operators work just as they do outside of the string
* Use the `@` character to reference a variable name
* Reference the index with the word `index` or with the index's name
* Use backticks to reference a column name with spaces in it

## Exercises

Use the bikes dataset for the first few exercises.

### Exercise 1

<span style="color:green; font-size:16px">Use the `query` method to select trip durations between 5,000 and 10,000.</span>

```python
bikes.head(1)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | 2013-06-28 19:01:00 | 2013-06-28 19:17:00 | 993 | Lake Shore Dr & Monroe St | 11.0 | Michigan Ave & Oak St | 15.0 | 73.9 | 12.7 | mostlycloudy |


```python
bikes.query('5000 < tripduration < 10000').head()
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | Male | 2013-07-09 13:12:00 | 2013-07-09 14:42:00 | 5396 | Canal St & Jackson Blvd | 35.0 | Millennium Park | 35.0 | 79.0 | 13.8 | cloudy |
| 40 | Female | 2013-07-14 14:08:00 | 2013-07-14 15:53:00 | 6274 | Wabash Ave & Roosevelt Rd | 19.0 | Lake Shore Dr & Monroe St | 11.0 | 87.1 | 8.1 | partlycloudy |
| 77 | Female | 2013-07-21 11:35:00 | 2013-07-21 13:54:00 | 8299 | State St & 19th St | 15.0 | Sheffield Ave & Kingsbury St | 15.0 | 82.9 | 5.8 | mostlycloudy |
| 335 | Male | 2013-08-25 17:20:00 | 2013-08-25 19:26:00 | 7533 | McClurg Ct & Illinois St | 23.0 | Lake Shore Dr & Monroe St | 11.0 | 87.1 | 12.7 | clear |
| 1954 | Female | 2013-12-28 11:37:00 | 2013-12-28 13:34:00 | 7050 | LaSalle St & Washington St | 15.0 | Theater on the Lake | 15.0 | 44.1 | 12.7 | clear |


### Exercise 2

<span  style="color:green; font-size:16px">Use the `query` method to select trip durations between 5,000 and 10,000 when the weather was snow or rain. Retrieve the same data with boolean selection.</span>

```python
bikes.query('5000 < tripduration < 10000 and events in ["snow", "rain"]').head(2)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8506 | Male | 2014-10-04 12:33:00 | 2014-10-04 14:06:00 | 5568 | Halsted St & Diversey Pkwy | 15.0 | Halsted St & Wrightwood Ave | 15.0 | 42.1 | 17.3 | rain |
| 13355 | Male | 2015-06-15 11:41:00 | 2015-06-15 13:43:00 | 7295 | Racine Ave & Belmont Ave | 15.0 | Racine Ave & Belmont Ave | 15.0 | 75.9 | 4.6 | rain |


```python
# Strict inequalities, matching the query version above
filt = ((bikes['tripduration'] > 5000)
        & (bikes['tripduration'] < 10000)
        & bikes['events'].isin(['snow', 'rain']))
bikes[filt].head(2)
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8506 | Male | 2014-10-04 12:33:00 | 2014-10-04 14:06:00 | 5568 | Halsted St & Diversey Pkwy | 15.0 | Halsted St & Wrightwood Ave | 15.0 | 42.1 | 17.3 | rain |
| 13355 | Male | 2015-06-15 11:41:00 | 2015-06-15 13:43:00 | 7295 | Racine Ave & Belmont Ave | 15.0 | Racine Ave & Belmont Ave | 15.0 | 75.9 | 4.6 | rain |


### Exercise 3

<span style="color:green; font-size:16px">Use the `query` method to select trip durations between 5,000 and 10,000 when it was snow or rain. Create a list outside of the `query` method to hold the weather and reference that variable with `@` within `query`.</span>

```python
bikes.query('5000 < tripduration < 10000 and events in ["snow", "rain"]').head()
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8506 | Male | 2014-10-04 12:33:00 | 2014-10-04 14:06:00 | 5568 | Halsted St & Diversey Pkwy | 15.0 | Halsted St & Wrightwood Ave | 15.0 | 42.1 | 17.3 | rain |
| 13355 | Male | 2015-06-15 11:41:00 | 2015-06-15 13:43:00 | 7295 | Racine Ave & Belmont Ave | 15.0 | Racine Ave & Belmont Ave | 15.0 | 75.9 | 4.6 | rain |
| 22155 | Male | 2016-02-09 10:09:00 | 2016-02-09 12:28:00 | 8309 | Wabash Ave & Roosevelt Rd | 23.0 | Museum Campus | 35.0 | 16.0 | 16.1 | snow |


```python
weather = ['snow', 'rain']
bikes.query('5000 < tripduration < 10000 and events in @weather').head()
```

|  | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8506 | Male | 2014-10-04 12:33:00 | 2014-10-04 14:06:00 | 5568 | Halsted St & Diversey Pkwy | 15.0 | Halsted St & Wrightwood Ave | 15.0 | 42.1 | 17.3 | rain |
| 13355 | Male | 2015-06-15 11:41:00 | 2015-06-15 13:43:00 | 7295 | Racine Ave & Belmont Ave | 15.0 | Racine Ave & Belmont Ave | 15.0 | 75.9 | 4.6 | rain |
| 22155 | Male | 2016-02-09 10:09:00 | 2016-02-09 12:28:00 | 8309 | Wabash Ave & Roosevelt Rd | 23.0 | Museum Campus | 35.0 | 16.0 | 16.1 | snow |


Read in the movie dataset by executing the cell below and use it for the following exercises.

```python
import pandas as pd
pd.set_option('display.max_columns', 50)
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head(3)
```

| title | year | color | content_rating | duration | director_name | director_fb | actor1 | actor1_fb | actor2 | actor2_fb | actor3 | actor3_fb | gross | genres | num_reviews | num_voted_users | plot_keywords | language | country | budget | imdb_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Avatar | 2009.0 | Color | PG-13 | 178.0 | James Cameron | 0.0 | CCH Pounder | 1000.0 | Joel David Moore | 936.0 | Wes Studi | 855.0 | 760505847.0 | Action\|Adventure\|Fantasy\|Sci-Fi | 723.0 | 886204 | avatar\|future\|marine\|native\|paraplegic | English | USA | 237000000.0 | 7.9 |
| Pirates of the Caribbean: At World's End | 2007.0 | Color | PG-13 | 169.0 | Gore Verbinski | 563.0 | Johnny Depp | 40000.0 | Orlando Bloom | 5000.0 | Jack Davenport | 1000.0 | 309404152.0 | Action\|Adventure\|Fantasy | 302.0 | 471220 | goddess\|marriage ceremony\|marriage proposal\|pi... | English | USA | 300000000.0 | 7.1 |
| Spectre | 2015.0 | Color | PG-13 | 148.0 | Sam Mendes | 0.0 | Christoph Waltz | 11000.0 | Rory Kinnear | 393.0 | Stephanie Sigman | 161.0 | 200074175.0 | Action\|Adventure\|Thriller | 602.0 | 275868 | bomb\|espionage\|sequel\|spy\|terrorist | English | UK | 245000000.0 | 6.8 |


### Exercise 4

<span style="color:green; font-size:16px">Use the `query` method to find all movies where the total number of Facebook likes for all three actors is greater than 50,000.</span>

```python
movie.query('actor1_fb + actor2_fb + actor3_fb > 50000').head()
```

| title | year | color | content_rating | duration | director_name | director_fb | actor1 | actor1_fb | actor2 | actor2_fb | actor3 | actor3_fb | gross | genres | num_reviews | num_voted_users | plot_keywords | language | country | budget | imdb_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| The Dark Knight Rises | 2012.0 | Color | PG-13 | 164.0 | Christopher Nolan | 22000.0 | Tom Hardy | 27000.0 | Christian Bale | 23000.0 | Joseph Gordon-Levitt | 23000.0 | 448130642.0 | Action\|Thriller | 813.0 | 1144337 | deception\|imprisonment\|lawlessness\|police offi... | English | USA | 250000000.0 | 8.5 |
| Avengers: Age of Ultron | 2015.0 | Color | PG-13 | 141.0 | Joss Whedon | 0.0 | Chris Hemsworth | 26000.0 | Robert Downey Jr. | 21000.0 | Scarlett Johansson | 19000.0 | 458991599.0 | Action\|Adventure\|Sci-Fi | 635.0 | 462669 | artificial intelligence\|based on comic book\|ca... | English | USA | 250000000.0 | 7.5 |
| The Avengers | 2012.0 | Color | PG-13 | 173.0 | Joss Whedon | 0.0 | Chris Hemsworth | 26000.0 | Robert Downey Jr. | 21000.0 | Scarlett Johansson | 19000.0 | 623279547.0 | Action\|Adventure\|Sci-Fi | 703.0 | 995415 | alien invasion\|assassin\|battle\|iron man\|soldier | English | USA | 220000000.0 | 8.1 |
| Pirates of the Caribbean: On Stranger Tides | 2011.0 | Color | PG-13 | 136.0 | Rob Marshall | 252.0 | Johnny Depp | 40000.0 | Sam Claflin | 11000.0 | Stephen Graham | 1000.0 | 241063875.0 | Action\|Adventure\|Fantasy | 448.0 | 370704 | blackbeard\|captain\|pirate\|revenge\|soldier | English | USA | 250000000.0 | 6.7 |
| Captain America: Civil War | 2016.0 | Color | PG-13 | 147.0 | Anthony Russo | 94.0 | Robert Downey Jr. | 21000.0 | Scarlett Johansson | 19000.0 | Chris Evans | 11000.0 | 407197282.0 | Action\|Adventure\|Sci-Fi | 516.0 | 272670 | based on comic book\|knife\|marvel cinematic uni... | English | USA | 250000000.0 | 8.2 |


### Exercise 5

<span style="color:green; font-size:16px">Select all the movies where the number of users who voted is less than 10 times the number of reviews.</span>

```python
movie.query('num_voted_users < num_reviews * 10')
```

| title | year | color | content_rating | duration | director_name | director_fb | actor1 | actor1_fb | actor2 | actor2_fb | actor3 | actor3_fb | gross | genres | num_reviews | num_voted_users | plot_keywords | language | country | budget | imdb_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Indignation | 2016.0 | Color | R | 110.0 | James Schamus | 44.0 | Logan Lerman | 8000.0 | Sarah Gadon | 691.0 | Tracy Letts | 85.0 | 560512.0 | Drama | 63.0 | 550 | based on novel | Hebrew | USA |  | 7.8 |
| Pete's Dragon | 2016.0 | Color | PG | 102.0 | David Lowery | 38.0 | Bryce Dallas Howard | 3000.0 | Oona Laurence | 424.0 | Isiah Whitlock Jr. | 190.0 |  | Adventure\|Family\|Fantasy | 78.0 | 408 |  | English | USA | 65000000.0 | 7.3 |
| Kicks | 2016.0 | Color | R | 80.0 | Justin Tipping | 2.0 | Tina Gilton | 861.0 | Natalie Stephany Aguilar | 163.0 | Justin Hall | 102.0 |  | Adventure | 6.0 | 59 |  | English | USA |  | 7.8 |
| Two Lovers and a Bear | 2016.0 | Color |  | 96.0 | Kim Nguyen | 16.0 | Gordon Pinsent | 149.0 | Justin Edward Seale | 147.0 | John Ralston | 63.0 |  | Drama\|Romance | 6.0 | 33 |  | English | Canada | 8700000.0 | 7.2 |
| Antibirth | 2016.0 | Color |  | 94.0 | Danny Perez | 0.0 | Natasha Lyonne | 1000.0 | Emmanuel Kabongo | 677.0 | Mark Webber | 442.0 |  | Horror | 10.0 | 63 |  | English | USA | 3500000.0 | 6.3 |
| Mi America | 2015.0 | Color | R | 125.0 | Robert Fontaine | 7.0 | Michael Derek | 128.0 | Arturo Castro | 22.0 | Brad Lee Wind | 17.0 | 3330.0 | Crime\|Drama | 4.0 | 22 |  | English | USA | 2100000.0 | 7.2 |
| Sharkskin | 2015.0 | Color |  | 100.0 | Dan Perri | 0.0 | Travis Myers | 749.0 | David Proval | 354.0 | Carmen Argenziano | 338.0 |  | Comedy\|Drama\|Mystery\|Romance\|Thriller | 1.0 | 6 |  | English | USA | 2100000.0 | 6.7 |
| The Ghastly Love of Johnny X | 2012.0 | Black and White | Not Rated | 106.0 | Paul Bunnell | 5.0 | Kate Maberly | 416.0 | Kevin McCarthy | 403.0 | Paul Williams | 356.0 | 2436.0 | Comedy\|Fantasy\|Musical\|Sci-Fi | 94.0 | 344 | 1950s\|independent film\|outlaw\|trial | English | USA | 2000000.0 | 5.7 |
| Wind Walkers | 2015.0 | Color | R | 93.0 | Russell Friedenberg | 9.0 | Rudy Youngblood | 708.0 | Glen Powell | 571.0 | Kiowa Gordon | 485.0 |  | Action\|Horror\|Thriller | 27.0 | 133 | after dark horrorfest | English | USA | 2000000.0 | 3.6 |
| Down and Out with the Dolls | 2001.0 | Color | R | 88.0 | Kurt Voss | 0.0 | Lemmy | 268.0 | Coyote Shivers | 44.0 | Zoë Poledouris | 3.0 | 58936.0 | Comedy\|Music | 12.0 | 91 | female bassist\|female drummer\|female guitarist... | English | USA | 1200000.0 | 6.1 |


### Exercise 6

<span  style="color:green; font-size:16px">Select all the movies made in the 1990s that were rated R with an IMDB score greater than 8.</span>

```python
movie.query('1990 <= year <= 1999 and content_rating == "R" and imdb_score > 8')
```
