# Pandas DataFrame filter() – Usage & Examples

The pandas filter() function subsets a DataFrame's rows or columns by their labels. The returned DataFrame contains only the rows or columns whose labels match what you pass to the function; the data values themselves are never inspected. It doesn't update the existing DataFrame; it always returns a new one.
This article explains how to select multiple rows and columns from a pandas DataFrame with filter(), including how to match labels using regex (regular expressions). DataFrame.loc[] is another way to select a group of rows and columns by label.
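
As a minimal sketch of the two points above: filter() leaves the original DataFrame untouched, and loc[] can select the same columns by label. The two-row frame and the variable names here are illustrative only and are not the dataset used in the rest of this article.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the article's dataset)
df = pd.DataFrame({"Courses": ["Spark", "Java"], "Fee": [22000, 24000]})

# filter() returns a new DataFrame; df keeps all of its columns
only_fee = df.filter(items=["Fee"])

# loc[] selects the same column by label
only_fee_loc = df.loc[:, ["Fee"]]

print(list(df.columns))               # ['Courses', 'Fee']
print(only_fee.equals(only_fee_loc))  # True
```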

Reference
* https://sparkbyexamples.com/pandas/pandas-dataframe-filter/
* https://pandas.pydata.org/pandas-docs/version/0.19/generated/pandas.DataFrame.filter.html
* https://www.geeksforgeeks.org/ways-to-filter-pandas-dataframe-by-column-values/

In [None]:
# Import pandas
import pandas as pd

## Syntax of DataFrame.filter()

```
DataFrame.filter(items=None, like=None, regex=None, axis=None)
```
* items – List of axis labels to keep. Labels are matched exactly.
* like – String. Keeps the labels that contain this substring.
* regex – Regular expression. Keeps the labels in which the pattern is found.
* axis – {0 or ‘index’, 1 or ‘columns’, None}, default None. When not specified, a DataFrame is filtered on its columns.
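
The following is a small sketch of two behaviors of this signature: leaving axis unset filters a DataFrame on its column labels, and items, like, and regex are mutually exclusive, so passing more than one of them raises a TypeError. The two-row frame is illustrative only.

```python
import pandas as pd

# Small illustrative frame (hypothetical data)
df = pd.DataFrame({"Courses": ["Spark", "Java"], "Fee": [22000, 24000]})

# With axis omitted, a DataFrame is filtered on its column labels
default_axis = df.filter(like="Fee")
explicit_axis = df.filter(like="Fee", axis="columns")
print(default_axis.equals(explicit_axis))  # True

# Passing more than one of items/like/regex raises TypeError
try:
    df.filter(items=["Fee"], like="Cour")
    raised = False
except TypeError as err:
    raised = True
    print(f"TypeError: {err}")
```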

# Dataframe from a dictionary

In [None]:
technologies = {
    'Courses': ["Spark", "PySpark", "Spark", "Java", "PySpark", "PHP"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 27000],
    'Duration': ['30days', '50days', '30days', '60days', '35days', '30days']
}
technologies_df = pd.DataFrame(technologies)
technologies_df


|   | Courses | Fee | Duration |
|---|---------|-----|----------|
| 0 | Spark | 22000 | 30days |
| 1 | PySpark | 25000 | 50days |
| 2 | Spark | 23000 | 30days |
| 3 | Java | 24000 | 60days |
| 4 | PySpark | 26000 | 35days |
| 5 | PHP | 27000 | 30days |


# Filter Columns by Labels
By default, pandas.DataFrame.filter() selects columns by the labels you specify using the items, like, or regex parameter. You can also explicitly pass axis=1 to select columns.

Note that the items param matches exact labels, the like param matches a substring of the label, and the regex param matches a regular expression against the label. Each is covered in its own section below.

## Filter columns using specific column names (items)

The items param matches column labels exactly.

In [None]:
# Filter columns by exact label
df1 = technologies_df.filter(items=['Courses', 'Fee'])
df1


|   | Courses | Fee |
|---|---------|-----|
| 0 | Spark | 22000 |
| 1 | PySpark | 25000 |
| 2 | Spark | 23000 |
| 3 | Java | 24000 |
| 4 | PySpark | 26000 |
| 5 | PHP | 27000 |
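
One consequence of exact matching with items is worth a short sketch: labels in the list that do not exist in the DataFrame are skipped rather than raising an error, whereas plain bracket selection with the same list raises a KeyError. The column name "Discount" below is hypothetical and is deliberately absent from the data.

```python
import pandas as pd

technologies_df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Java", "PySpark", "PHP"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 27000],
    "Duration": ["30days", "50days", "30days", "60days", "35days", "30days"],
})

# "Discount" is not a column; filter() skips it instead of raising
subset = technologies_df.filter(items=["Courses", "Discount"])
print(list(subset.columns))  # ['Courses']

# Bracket selection with the same list raises KeyError
try:
    technologies_df[["Courses", "Discount"]]
    bracket_failed = False
except KeyError:
    bracket_failed = True
print(bracket_failed)  # True
```

This makes filter() convenient when the column list comes from elsewhere and may not fully match, but it also means a misspelled label goes unnoticed.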


## Filter columns using substring (like param)
Use the like param to keep the columns whose label contains a given substring.

In [None]:
# Filter Columns using like "ration"
df2_1 = technologies_df.filter(like='ration', axis=1)
df2_1

# Output "Duration" column

|   | Duration |
|---|----------|
| 0 | 30days |
| 1 | 50days |
| 2 | 30days |
| 3 | 60days |
| 4 | 35days |
| 5 | 30days |


In [None]:
# Filter Columns using like "Cour"
df2_2 = technologies_df.filter(like='Cour', axis=1)
df2_2

# Output "Courses" column

|   | Courses |
|---|---------|
| 0 | Spark |
| 1 | PySpark |
| 2 | Spark |
| 3 | Java |
| 4 | PySpark |
| 5 | PHP |
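
Two further properties of like can be sketched briefly: the substring match is case-sensitive, and one substring can match several columns. The variable names below are illustrative.

```python
import pandas as pd

technologies_df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Java", "PySpark", "PHP"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 27000],
    "Duration": ["30days", "50days", "30days", "60days", "35days", "30days"],
})

# Lowercase "cour" does not match "Courses": all rows remain, no columns
lower = technologies_df.filter(like="cour", axis=1)
print(lower.shape)  # (6, 0)

# "e" appears in both "Courses" and "Fee", so both columns are kept
with_e = technologies_df.filter(like="e", axis=1)
print(list(with_e.columns))  # ['Courses', 'Fee']
```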


## Filter columns with regular expressions
To filter columns with regular expressions, use the regex param. The first example below keeps the column whose name ends with the character e; the second keeps the column whose name ends with s.

In [None]:
# Filter columns whose name ends with "e"
df3_1 = technologies_df.filter(regex='e$', axis=1)
df3_1

|   | Fee |
|---|-----|
| 0 | 22000 |
| 1 | 25000 |
| 2 | 23000 |
| 3 | 24000 |
| 4 | 26000 |
| 5 | 27000 |


In [None]:
# Filter columns whose name ends with "s"
df3_2 = technologies_df.filter(regex='s$', axis=1)
df3_2

|   | Courses |
|---|---------|
| 0 | Spark |
| 1 | PySpark |
| 2 | Spark |
| 3 | Java |
| 4 | PySpark |
| 5 | PHP |
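
The pattern is searched for anywhere in the label, so it behaves like Python's re.search rather than a full match. The sketch below contrasts an unanchored pattern with anchored ones, and uses the standard inline flag (?i) for case-insensitive matching. The patterns are illustrative choices.

```python
import pandas as pd

technologies_df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Java", "PySpark", "PHP"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 27000],
    "Duration": ["30days", "50days", "30days", "60days", "35days", "30days"],
})

# Unanchored: any label containing "e"
contains_e = technologies_df.filter(regex="e", axis=1)
print(list(contains_e.columns))      # ['Courses', 'Fee']

# Anchored alternatives: starts with "C" or ends with "n"
starts_or_ends = technologies_df.filter(regex="^C|n$", axis=1)
print(list(starts_or_ends.columns))  # ['Courses', 'Duration']

# Inline flag (?i) makes the match case-insensitive
ignore_case = technologies_df.filter(regex="(?i)^fee$", axis=1)
print(list(ignore_case.columns))     # ['Fee']
```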


# Filter Rows by Index

## Filter Rows by Index using "items"
Use axis=0 on the filter() function to filter rows by index label. The example below keeps the rows with index 3 and 5.

In [None]:
# Filters rows by index 3 and 5.
# axis – {0 or ‘index’}
df4_1 = technologies_df.filter(items=[3, 5], axis=0)
df4_1

|   | Courses | Fee | Duration |
|---|---------|-----|----------|
| 3 | Java | 24000 | 60days |
| 5 | PHP | 27000 | 30days |


In [None]:
# Filters rows by index 0, 1 and 3
# axis – {0 or ‘index’}
df4_2 = technologies_df.filter(items=[0, 1, 3], axis=0)
df4_2

|   | Courses | Fee | Duration |
|---|---------|-----|----------|
| 0 | Spark | 22000 | 30days |
| 1 | PySpark | 25000 | 50days |
| 3 | Java | 24000 | 60days |
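
As a brief sketch of row filtering with items: the axis can also be given by name as axis="index", and, as with columns, index labels that do not exist are skipped. The label 10 below is deliberately out of range for this six-row DataFrame.

```python
import pandas as pd

technologies_df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Java", "PySpark", "PHP"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 27000],
    "Duration": ["30days", "50days", "30days", "60days", "35days", "30days"],
})

# axis="index" is the same as axis=0; index label 10 does not exist and is skipped
rows = technologies_df.filter(items=[1, 4, 10], axis="index")
print(list(rows.index))         # [1, 4]
print(rows["Courses"].tolist()) # ['PySpark', 'PySpark']
```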


## Filter Rows by Index using "like" param

Use the like param to filter rows whose index label contains a substring. For our example, this doesn't make much sense as we have a numeric index. However, filter() converts each label to a string before matching, so like='4' matches the row with index 4, and the example below demonstrates the usage of the like param.

In [None]:
# Filter row using like
df5_1 = technologies_df.filter(like='4', axis=0)
df5_1


|   | Courses | Fee | Duration |
|---|---------|-----|----------|
| 4 | PySpark | 26000 | 35days |
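
Row filtering with like is more natural on a string index. The sketch below assigns hypothetical string labels (they are not part of the original dataset) and contrasts like, which matches a substring anywhere in the label, with an anchored regex.

```python
import pandas as pd

technologies_df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Java", "PySpark", "PHP"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 27000],
    "Duration": ["30days", "50days", "30days", "60days", "35days", "30days"],
})

# Hypothetical string labels, used only for this illustration
labeled_df = technologies_df.copy()
labeled_df.index = ["spark_1", "pyspark_1", "spark_2", "java_1", "pyspark_2", "php_1"]

# like matches a substring anywhere, so the pyspark_* rows are included
substring_rows = labeled_df.filter(like="spark", axis=0)
print(list(substring_rows.index))  # ['spark_1', 'pyspark_1', 'spark_2', 'pyspark_2']

# An anchored regex keeps only the labels that start with "spark"
prefix_rows = labeled_df.filter(regex="^spark", axis=0)
print(list(prefix_rows.index))     # ['spark_1', 'spark_2']
```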
