- Title: Filter pandas DataFrames in Python
- Slug: filter-pandas-dataframe-python
- Date: 2019-12-12 21:00:07
- Category: Programming
- Tags: programming, Python, pandas, DataFrame, filter, query
- Author: Ben Du

In [ ]:
import pandas as pd
import numpy as np

## Summary

1. There are multiple ways to filter rows of a pandas DataFrame.
    Generally speaking, `DataFrame.query` is the preferred one.

2. `DataFrame.where` returns a DataFrame of the same shape as the original DataFrame.
    Rows that do not satisfy the condition are filled with `NaN`.


In [ ]:
df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [5, 4, 3, 2, 1]
})
df

## `DataFrame.query`

In [ ]:
df.query('x % 2 == 0')

In [ ]:
df.query('x in [3, 5, 7]')

In [ ]:
x = [3, 5, 7]
df.query('x in @x')

In [ ]:
df.query('x == y')
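
The queries above each test a single condition. As a minimal sketch (the sample data is repeated so the cell runs on its own, and `threshold` is a hypothetical variable name), conditions can be combined inside the expression string with `and` / `or`, and the result matches the corresponding boolean mask:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [5, 4, 3, 2, 1]
})

# Combine conditions with `and` / `or` inside the expression string.
by_query = df.query('x > 1 and y > 1')

# The equivalent boolean mask needs `&` and explicit parentheses.
by_mask = df[(df.x > 1) & (df.y > 1)]

# `@name` refers to a local Python variable (`threshold` is a hypothetical name).
threshold = 3
above = df.query('x > @threshold')
```

Both `by_query` and `by_mask` keep the rows where `x` is 2, 3, or 4.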

## Use Boolean Indexing

In [ ]:
df[df.x % 2 == 0]
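
The same indexing style extends to several conditions. The following is a sketch rather than an exhaustive list: it assumes the sample DataFrame from earlier (repeated here so the cell is self-contained) and uses `|`, `~`, `Series.isin`, and `DataFrame.loc` to build and apply masks.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [5, 4, 3, 2, 1]
})

# Element-wise OR; each condition needs its own parentheses.
even_or_small = df[(df.x % 2 == 0) | (df.y < 2)]

# Membership test, the counterpart of `x in [3, 5, 7]` in `query`.
in_list = df[df.x.isin([3, 5, 7])]

# `~` negates a boolean mask.
not_in_list = df[~df.x.isin([3, 5, 7])]

# `.loc` filters rows and selects columns in one step.
y_of_even_x = df.loc[df.x % 2 == 0, 'y']
```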

## `DataFrame.where`

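The method `DataFrame.where` accepts a boolean condition or a callable. A callable receives the whole DataFrame (or the whole Series, for `Series.where`) as its argument, not a single row, and must return a boolean mask.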

In [ ]:
df.where(lambda r: r.x % 2 == 0)

In [ ]:
df.where(lambda r: r.x % 2 == 0).dropna()

In [ ]:
df.x.where(lambda x: x % 2 == 0)

In [ ]:
df.x.where(lambda x: x % 2 == 0).dropna()
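
One side effect of the `where(...).dropna()` pattern is worth checking: because the masked values become `NaN`, integer columns are upcast to floats. The sketch below, which rebuilds the sample DataFrame so it runs on its own, shows this and also uses the `other` parameter of `where` to supply a replacement value instead of `NaN`.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [5, 4, 3, 2, 1]
})

# Same shape as `df`; rows failing the condition become NaN.
masked = df.where(lambda d: d.x % 2 == 0)

# Dropping the NaN rows leaves float columns, so cast back if integers are needed.
filtered = masked.dropna().astype('int64')

# `other` replaces the masked values with a constant instead of NaN.
zero_filled = df.where(lambda d: d.x % 2 == 0, other=0)
```

If only the matching rows are needed, `df.query` or a boolean mask avoids the round trip through floats.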

In [ ]:
[m for m in dir(df) if 'where' in m]
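
A name search for `where` does not surface the closely related method `mask`, which is the inverse: it replaces the values where the condition is true. A minimal sketch on the same sample data (rebuilt here so the cell is self-contained):

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [5, 4, 3, 2, 1]
})

# `mask` hides the values where the condition holds ...
odd_via_mask = df.x.mask(lambda s: s % 2 == 0)

# ... which matches `where` with the negated condition.
odd_via_where = df.x.where(lambda s: s % 2 != 0)

odd_values = odd_via_mask.dropna()
```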