- Title: Filter pandas DataFrames in Python
- Slug: filter-pandas-dataframe-python
- Date: 2019-12-12 21:00:07
- Category: Programming
- Tags: programming, Python, pandas, DataFrame, filter, query
- Author: Ben Du

```python
import pandas as pd
import numpy as np
```

## Summary

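1. There are multiple ways to filter the rows of a pandas DataFrame,
    including `DataFrame.query`, boolean indexing and `DataFrame.where`.
    Generally speaking, `DataFrame.query` is the preferred method.

2. `DataFrame.where` returns a DataFrame of the same shape as the original DataFrame.
    Values in rows that do not satisfy the condition are replaced with `NaN`
    (unless a replacement value is passed via the `other` parameter).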
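A minimal sketch comparing the approaches on the same toy DataFrame used throughout this post (rebuilt here so the snippet runs on its own): `query` and boolean indexing return only the matching rows with their original dtypes, while `where` keeps the full shape and needs `dropna` to discard the non-matching rows.

```python
import pandas as pd

# Same toy data as in this post.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]})
is_even = df.x % 2 == 0

by_query = df.query("x % 2 == 0")   # rows 1 and 3, integer columns
by_mask = df[is_even]               # same rows as query
masked = df.where(is_even)          # same shape as df, NaN where is_even is False
by_where = masked.dropna()          # rows 1 and 3, but upcast to float64

print(by_query.equals(by_mask))  # True
print(masked.shape)              # (5, 2)
print(by_where.index.tolist())   # [1, 3]
```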


```python
df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [5, 4, 3, 2, 1]
})
df
```

|   | x | y |
|---|---|---|
| 0 | 1 | 5 |
| 1 | 2 | 4 |
| 2 | 3 | 3 |
| 3 | 4 | 2 |
| 4 | 5 | 1 |


## `DataFrame.query`

```python
df.query('x % 2 == 0')
```

|   | x | y |
|---|---|---|
| 1 | 2 | 4 |
| 3 | 4 | 2 |


```python
df.query('x in [3, 5, 7]')
```

|   | x | y |
|---|---|---|
| 2 | 3 | 3 |
| 4 | 5 | 1 |
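Assuming the same toy `df`, the membership test can also be negated inside the query string with `not in`. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]})

# Keep the rows whose x is NOT one of the listed values.
excluded = df.query("x not in [3, 5, 7]")
print(excluded.x.tolist())  # [1, 2, 4]
```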


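```python
x = [3, 5, 7]
# Inside the query string, `x` is the column and `@x` is the local variable.
df.query('x in @x')
```

|   | x | y |
|---|---|---|
| 2 | 3 | 3 |
| 4 | 5 | 1 |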
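The `@` prefix works for scalars as well as lists. A small sketch, where `threshold` is a hypothetical local variable introduced only for illustration, combining two conditions with `and`:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]})

# "@threshold" is looked up in the calling scope rather than among the columns.
threshold = 2
result = df.query("x > @threshold and y > @threshold")
print(result.index.tolist())  # [2]
```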


```python
df.query('x == y')
```

|   | x | y |
|---|---|---|
| 2 | 3 | 3 |


## Use Boolean Indexing

```python
df[df.x % 2 == 0]
```

|   | x | y |
|---|---|---|
| 1 | 2 | 4 |
| 3 | 4 | 2 |
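Boolean indexing can express the same filters as the `query` examples above, using `Series.isin` for membership tests and `&`, `|`, `~` to combine conditions. A sketch on the same toy data:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]})

# Membership test: Series.isin plays the role of "x in [3, 5, 7]" in query.
in_list = df[df.x.isin([3, 5, 7])]

# Compound conditions use & (and), | (or), ~ (not). Each comparison needs
# parentheses because & and | bind more tightly than ==, >, etc.
both = df[(df.x > 2) & (df.y > 2)]
either = df[(df.x == 1) | (df.y == 1)]
not_in_list = df[~df.x.isin([3, 5, 7])]

print(in_list.index.tolist())      # [2, 4]
print(both.index.tolist())         # [2]
print(either.index.tolist())       # [0, 4]
print(not_in_list.index.tolist())  # [0, 1, 3]
```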


## `DataFrame.where`

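The method `DataFrame.where` accepts either a boolean condition or a callable. A callable is called with the DataFrame itself (not with individual rows) and should return a boolean Series, DataFrame or array. In the lambda below, `r.x` is therefore the whole column `x`, and the resulting boolean Series is applied row-wise across all columns. Values that fail the condition become `NaN`, which is why the integer columns are upcast to `float64` in the output.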
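To see what the callable actually receives, the sketch below records the type of its argument. The helper `even_x` and the list `seen` are hypothetical names used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]})

seen = []  # hypothetical: records what the callable is called with


def even_x(frame):
    seen.append(type(frame).__name__)
    # frame.x is the whole column, so this returns a boolean Series indexed by row.
    return frame.x % 2 == 0


masked = df.where(even_x)
print(seen)  # ['DataFrame']
```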

```python
df.where(lambda r: r.x % 2 == 0)
```

|   | x | y |
|---|---|---|
| 0 | NaN | NaN |
| 1 | 2.0 | 4.0 |
| 2 | NaN | NaN |
| 3 | 4.0 | 2.0 |
| 4 | NaN | NaN |


```python
df.where(lambda r: r.x % 2 == 0).dropna()
```

|   | x | y |
|---|---|---|
| 1 | 2.0 | 4.0 |
| 3 | 4.0 | 2.0 |
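One caveat of the `where(...).dropna()` pattern: `dropna` removes every row that contains a missing value, including rows that already had one before filtering. A sketch with a hypothetical frame `df2` that has a pre-existing `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 matches the condition but already has a missing y.
df2 = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5.0, np.nan, 3.0, 2.0]})
cond = df2.x % 2 == 0  # True for rows 1 and 3

via_where = df2.where(cond).dropna()  # row 1 is lost because its y was NaN
via_mask = df2[cond]                  # keeps both matching rows

print(via_where.index.tolist())  # [3]
print(via_mask.index.tolist())   # [1, 3]
```

Boolean indexing and `query` avoid this problem because they never introduce `NaN` themselves.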


```python
df.x.where(lambda x: x % 2 == 0)
```

```
0    NaN
1    2.0
2    NaN
3    4.0
4    NaN
Name: x, dtype: float64
```

```python
df.x.where(lambda x: x % 2 == 0).dropna()
```

```
1    2.0
3    4.0
Name: x, dtype: float64
```
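If the `NaN` values (and the resulting upcast to `float64`) are unwanted, `Series.where` also accepts an `other` argument holding the replacement value; plain boolean indexing on the Series is another option. A sketch, assuming the same toy data:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]})

# Replace non-matching values with 0 instead of NaN. No NaN is introduced,
# so the column keeps an integer dtype (e.g. int64).
zeroed = df.x.where(df.x % 2 == 0, other=0)
print(zeroed.tolist())  # [0, 2, 0, 4, 0]

# Boolean indexing on the Series drops the non-matching values directly.
evens = df.x[df.x % 2 == 0]
print(evens.tolist())   # [2, 4]
```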

```python
[m for m in dir(df) if 'where' in m]
```

```
['_where', 'where']
```