## Filter Rows or Columns of a DataFrame

### pandas.Series.isin: Filter Rows Only If Column Contains Values From Another List

When working with a pandas DataFrame, if you want to keep only the rows whose value in a column appears in a given list, a concise way is to use `isin`. It returns a boolean Series that you can use as a row mask.

In the example below, the row with index `2` is filtered out because its value in column `a`, `3`, is not in the list.

In [1]:
import pandas as pd 

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df

   a  b
0  1  4
1  2  5
2  3  6


In [27]:
l = [1, 2, 6, 7]
df.a.isin(l)

0     True
1     True
2    False
Name: a, dtype: bool


In [28]:
df = df[df.a.isin(l)]
df

   a  b
0  1  4
1  2  5
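
The same mask can also be inverted to keep the rows whose values are not in the list. The snippet below is a minimal sketch of that idea. The names `allowed`, `mask`, `kept`, and `dropped` are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
allowed = [1, 2, 6, 7]  # illustrative name; same values as the list `l` above

# Boolean mask: True where column `a` holds a value from `allowed`
mask = df["a"].isin(allowed)

kept = df[mask]      # rows whose `a` value is in the list
dropped = df[~mask]  # `~` inverts the mask: rows whose `a` value is not in the list

print(kept)
print(dropped)
```

Storing the mask in a variable avoids computing `isin` twice when both halves of the split are needed.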



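### df.query: Query Columns Using Boolean Expression

Filtering the rows of a pandas DataFrame on several column conditions with bracket indexing can get lengthy, because each condition needs its own parentheses and a repeated `df.` prefix.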

In [14]:
import pandas as pd

df = pd.DataFrame(
    {"fruit": ["apple", "orange", "grape", "grape"], "price": [4, 5, 6, 7]}
)


In [17]:
print(df[(df.price > 4) & (df.fruit == "grape")])


   fruit  price
2  grape      6
3  grape      7


To shorten the filtering statement, use `df.query` instead. It takes the condition as a string in which column names can be written directly.

In [16]:
df.query("price > 4 & fruit == 'grape'")

   fruit  price
2  grape      6
3  grape      7
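
When the threshold or category lives in a Python variable, `df.query` can reference it with an `@` prefix instead of hard-coding it in the string. The following is a small sketch of that usage. The variable names `min_price` and `wanted` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"fruit": ["apple", "orange", "grape", "grape"], "price": [4, 5, 6, 7]}
)

# Hypothetical filter parameters held in ordinary Python variables
min_price = 4
wanted = "grape"

# `@name` refers to a variable in the calling scope; bare names refer to columns
result = df.query("price > @min_price and fruit == @wanted")
print(result)
```

This keeps the query string unchanged when the parameters change, and avoids building the expression with string formatting.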


### Filter a pandas DataFrame by Value Counts

To filter a pandas DataFrame based on how often each category occurs, you might try combining `groupby` with `count`.

In [19]:
import pandas as pd

df = pd.DataFrame({"type": ["A", "A", "O", "B", "O", "A"], "value": [5, 3, 2, 1, 4, 2]})
df

  type  value
0    A      5
1    A      3
2    O      2
3    B      1
4    O      4
5    A      2



In [20]:
df.groupby("type")["type"].count()

type
A    3
B    1
O    2
Name: type, dtype: int64

However, the Series returned by `count` has one entry per category and is indexed by `type`, so it matches neither the length nor the index of the original DataFrame. Using it as a filter raises an error.

In [25]:
df.loc[df.groupby("type")["type"].count() > 1]

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Instead of `count`, use `transform("size")`. It broadcasts each group's size back to every row of that group, returning a Series with the same length and index as the original DataFrame.

In [23]:
df.groupby("type")["type"].transform("size")

0    3
1    3
2    2
3    1
4    2
5    3
Name: type, dtype: int64

Because this Series aligns with the DataFrame row by row, you can now filter without an error. Only the single row of type `B` is dropped.

In [27]:
df.loc[df.groupby("type")["type"].transform("size") > 1]

  type  value
0    A      5
1    A      3
2    O      2
4    O      4
5    A      2
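
As an alternative sketch, which is not the approach shown above, the per-row counts can also be built by mapping each row's category onto the result of `value_counts`. The names `counts`, `per_row`, and `filtered` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"type": ["A", "A", "O", "B", "O", "A"], "value": [5, 3, 2, 1, 4, 2]})

# One entry per category, indexed by the category label
counts = df["type"].value_counts()

# Look up each row's category in `counts`; the result shares df's index
per_row = df["type"].map(counts)

# Keep only rows whose category occurs more than once
filtered = df[per_row > 1]
print(filtered)
```

Both approaches produce a mask aligned with the original rows. The `map` version may be convenient when the counts are already computed for another purpose.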

