# 05. Filtering the Data in Pandas
1. Using **boolean expressions** in `loc[]` and `iloc[]`.
2. Using **`df.query()`**
3. Using **`df.filter()`**
- `df.query()` searches the DataFrame for rows whose values satisfy an expression. `df.filter()` narrows the result by row or column labels; it does not inspect the data values.

![Filter](data/filter-air.webp)

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv('./data/Company.csv')
df

Unnamed: 0,EmployeeID,birthdate_key,age,city_name,department,job_title,gender
0,1318,1/3/1954,61,Vancouver,Executive,CEO,M
1,1319,1/3/1957,58,Vancouver,Executive,VP Stores,F
2,1320,1/2/1955,60,Vancouver,Executive,Legal Counsel,F
3,1321,1/2/1959,56,Vancouver,Executive,VP Human Resources,M
4,1322,1/9/1958,57,Vancouver,Executive,VP Finance,M
...,...,...,...,...,...,...,...
6279,8036,8/9/1992,23,New Westminister,Customer Service,Cashier,F
6280,8181,9/26/1993,22,Prince George,Customer Service,Cashier,M
6281,8223,2/11/1994,21,Trail,Customer Service,Cashier,M
6282,8226,2/16/1994,21,Victoria,Customer Service,Cashier,F


### 1. Using boolean expressions in `loc[]` and `iloc[]`
- SQL equivalent: `SELECT * FROM dataframe WHERE age > 55`

In [2]:
df.loc[(df.age > 55)]

Unnamed: 0,EmployeeID,birthdate_key,age,city_name,department,job_title,gender
0,1318,1/3/1954,61,Vancouver,Executive,CEO,M
1,1319,1/3/1957,58,Vancouver,Executive,VP Stores,F
2,1320,1/2/1955,60,Vancouver,Executive,Legal Counsel,F
3,1321,1/2/1959,56,Vancouver,Executive,VP Human Resources,M
4,1322,1/9/1958,57,Vancouver,Executive,VP Finance,M
...,...,...,...,...,...,...,...
6258,4434,12/15/1946,69,Vancouver,Meats,Meat Cutter,M
6259,4442,12/22/1946,69,Vancouver,Meats,Meat Cutter,M
6260,4445,12/26/1946,69,New Westminster,Produce,Produce Clerk,M
6261,4448,12/27/1946,69,Kamloops,Bakery,Baker,M


In [3]:
df.iloc[(df.age > 55).values]

Unnamed: 0,EmployeeID,birthdate_key,age,city_name,department,job_title,gender
0,1318,1/3/1954,61,Vancouver,Executive,CEO,M
1,1319,1/3/1957,58,Vancouver,Executive,VP Stores,F
2,1320,1/2/1955,60,Vancouver,Executive,Legal Counsel,F
3,1321,1/2/1959,56,Vancouver,Executive,VP Human Resources,M
4,1322,1/9/1958,57,Vancouver,Executive,VP Finance,M
...,...,...,...,...,...,...,...
6258,4434,12/15/1946,69,Vancouver,Meats,Meat Cutter,M
6259,4442,12/22/1946,69,Vancouver,Meats,Meat Cutter,M
6260,4445,12/26/1946,69,New Westminster,Produce,Produce Clerk,M
6261,4448,12/27/1946,69,Kamloops,Bakery,Baker,M
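The two cells above return the same rows. A minimal sketch of why `iloc` needs `.values` and how to combine several conditions follows. The `employees` frame is a hypothetical miniature stand-in for `Company.csv` with invented values, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for Company.csv (values invented for illustration)
employees = pd.DataFrame({
    'age': [61, 58, 23, 22, 69],
    'city_name': ['Vancouver', 'Vancouver', 'Trail', 'Victoria', 'Kamloops'],
    'gender': ['M', 'F', 'M', 'F', 'M'],
})

# A comparison on a column yields a boolean Series aligned on the index
mask = employees['age'] > 55

# loc is label-based and accepts the boolean Series directly
by_loc = employees.loc[mask]

# iloc is position-based, so it is given a plain NumPy boolean array instead
by_iloc = employees.iloc[mask.to_numpy()]

# Combine conditions with & (and), | (or), ~ (not);
# each condition needs its own parentheses
older_in_vancouver = employees.loc[
    (employees['age'] > 55) & (employees['city_name'] == 'Vancouver')
]
```

`mask.to_numpy()` plays the same role here as `.values` in the cell above.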


### 2. Using `df.query()`

In [4]:
df['age'].mean()

45.78341820496499

In [5]:
df.query('(age >= age.mean())')

Unnamed: 0,EmployeeID,birthdate_key,age,city_name,department,job_title,gender
0,1318,1/3/1954,61,Vancouver,Executive,CEO,M
1,1319,1/3/1957,58,Vancouver,Executive,VP Stores,F
2,1320,1/2/1955,60,Vancouver,Executive,Legal Counsel,F
3,1321,1/2/1959,56,Vancouver,Executive,VP Human Resources,M
4,1322,1/9/1958,57,Vancouver,Executive,VP Finance,M
...,...,...,...,...,...,...,...
6260,4445,12/26/1946,69,New Westminster,Produce,Produce Clerk,M
6261,4448,12/27/1946,69,Kamloops,Bakery,Baker,M
6262,4450,12/28/1946,69,New Westminster,Produce,Produce Clerk,M
6263,4711,5/15/1967,48,Vancouver,Processed Foods,Shelf Stocker,F
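`query()` can also reference local Python variables with the `@` prefix and join conditions with `and` / `or`. The sketch below assumes a small hypothetical `employees` frame (invented values) and a hypothetical `min_age` threshold, and compares the result with the equivalent `loc[]` expression.

```python
import pandas as pd

# Hypothetical stand-in for Company.csv (values invented for illustration)
employees = pd.DataFrame({
    'age': [61, 58, 23, 22, 69],
    'city_name': ['Vancouver', 'Vancouver', 'Trail', 'Victoria', 'Kamloops'],
})

min_age = 55  # local variable, referenced inside the query string with @

# Column names are used bare; string literals need their own quotes
result = employees.query('age > @min_age and city_name == "Vancouver"')

# The same filter written as a boolean expression in loc[]
equivalent = employees.loc[
    (employees['age'] > min_age) & (employees['city_name'] == 'Vancouver')
]
```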


### Another example with `loc[]`

In [6]:
people = {
    'firstName': ['John', 'Johnny', 'Jack'],
    'lastName': ['Doe', 'Doe', 'Danniels'],
    'email': ['john.doe@gmail.com', 'johnny.doe@gmail.com', 'jack.denniels@gmail.com'],
    'phone': ['9956789877', '9876567898', '7898765456']
}
df = pd.DataFrame(people)
df

Unnamed: 0,firstName,lastName,email,phone
0,John,Doe,john.doe@gmail.com,9956789877
1,Johnny,Doe,johnny.doe@gmail.com,9876567898
2,Jack,Danniels,jack.denniels@gmail.com,7898765456


In [9]:
filter_exp = (df['firstName'] == 'John') | (df['firstName'] == 'Jack')
df.loc[filter_exp, 'email'] # drop the 'email' argument to get every column instead of one

0         john.doe@gmail.com
2    jack.denniels@gmail.com
Name: email, dtype: object

In [10]:
type(df.loc[filter_exp, 'email'])

pandas.core.series.Series

In [11]:
df.loc[filter_exp]

Unnamed: 0,firstName,lastName,email,phone
0,John,Doe,john.doe@gmail.com,9956789877
2,Jack,Danniels,jack.denniels@gmail.com,7898765456


In [12]:
type(df.loc[filter_exp])

pandas.core.frame.DataFrame
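When the same column is compared against several values, `Series.isin()` is a shorter alternative to chaining `==` with `|`. The sketch below rebuilds part of the `people` data under the hypothetical name `people_df`. It also shows that a single column label returns a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

# Subset of the people data from the cell above, under a hypothetical name
people_df = pd.DataFrame({
    'firstName': ['John', 'Johnny', 'Jack'],
    'email': ['john.doe@gmail.com', 'johnny.doe@gmail.com',
              'jack.denniels@gmail.com'],
})

# True where firstName is any of the listed values
mask = people_df['firstName'].isin(['John', 'Jack'])

emails = people_df.loc[mask, 'email']          # single label -> Series
emails_frame = people_df.loc[mask, ['email']]  # list of labels -> DataFrame
```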

### 3. Using `df.filter()`

In [13]:
df3 = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
                  index=['mouse', 'rabbit'],
                  columns=['one', 'two', 'three'])
df3

Unnamed: 0,one,two,three
mouse,1,2,3
rabbit,4,5,6


In [14]:
# Select columns by name
df3.filter(items=['one', 'three'])

Unnamed: 0,one,three
mouse,1,3
rabbit,4,6


In [15]:
# Select columns by regular expression
df3.filter(regex='e$', axis=1)

Unnamed: 0,one,three
mouse,1,3
rabbit,4,6


In [16]:
# Select rows whose index label contains 'bbi'
df3.filter(like='bbi', axis=0)

Unnamed: 0,one,two,three
rabbit,4,5,6
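Note that `df.filter()` matches on labels only, never on the values stored in the cells. A minimal sketch of the contrast, using a hypothetical `animals` frame with the same shape as `df3`:

```python
import pandas as pd

# Hypothetical frame mirroring df3 from the cells above
animals = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                       index=['mouse', 'rabbit'],
                       columns=['one', 'two', 'three'])

# Label-based: keep rows whose index label contains 'bbi'
by_label = animals.filter(like='bbi', axis=0)

# Value-based: keep rows where column 'one' is greater than 1
by_value = animals.loc[animals['one'] > 1]

# Label-based on columns: keep columns whose name starts with 't'
t_columns = animals.filter(regex='^t', axis=1)
```

Here both row selections happen to return the `rabbit` row, but for different reasons: the first looks at the row label, the second at the data.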
