# Pandas DataFrame filter() – by a Partial String


Reference
* https://towardsdatascience.com/8-ways-to-filter-a-pandas-dataframe-by-a-partial-string-or-pattern-49f43279c50f

In [1]:
# Import pandas
import pandas as pd

## Filter a Pandas DataFrame by a Partial String or Pattern in 8 Ways

# Create a DataFrame from a dictionary

In [2]:
technologies = {
    'Courses': ["Spark", "PySpark", "Spark", "Java", "PySpark", "PHP"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 27000],
    'Duration': ['30days', '50days', '30days', '60days', '35days', '30days'],
}
technologies_df = pd.DataFrame(technologies)
technologies_df


|   | Courses | Fee | Duration |
|---|---|---|---|
| 0 | Spark | 22000 | 30days |
| 1 | PySpark | 25000 | 50days |
| 2 | Spark | 23000 | 30days |
| 3 | Java | 24000 | 60days |
| 4 | PySpark | 26000 | 35days |
| 5 | PHP | 27000 | 30days |


# Filter rows where a partial string is present
Here, we want to check whether a substring is present in a column.

For example, in the dataset used by the referenced article, the ‘listed_in’ column contains the genres that each movie or show belongs to, separated by commas. I want to filter and return only the titles that have a ‘horror’ element, because Halloween is upon us.

We will use the string method `Series.str.contains('pattern', case=False, na=False)`, where ‘pattern’ is the substring to search for (interpreted as a regular expression by default) and `case=False` makes the match case-insensitive. `na=False` makes any NaN values in the column evaluate to False (treated as not containing the pattern) instead of NaN. A mask that still contains NaN is not purely boolean, and pandas refuses to use it for boolean indexing.



```
mask = data['listed_in'].str.contains('horror', case=False, na=False)
```
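A minimal sketch of the three parameters discussed above, using a small hypothetical Series of genre strings (not the article's data) in place of `data['listed_in']`. It contrasts the default case-sensitive match with `case=False`, shows `na=False` turning a missing value into False, and, because the pattern is treated as a regular expression by default, uses the alternation `horror|drama`.

```python
import numpy as np
import pandas as pd

# Hypothetical genre strings, one of them missing (NaN).
genres = pd.Series(["Horror Movies, Thrillers", "Dramas", np.nan, "Comedies, horror"])

# case=True is the default: the capitalised "Horror" does not match 'horror'.
case_sensitive = genres.str.contains("horror", na=False)

# case=False matches both spellings; na=False turns the missing value into False.
mask = genres.str.contains("horror", case=False, na=False)

# The pattern is a regular expression by default, so alternation works too.
horror_or_drama = genres.str.contains("horror|drama", case=False, na=False)

# A purely boolean mask can be used directly for boolean indexing.
print(genres[mask].tolist())
```

If the search string contains regex metacharacters such as `+` or `.` and a literal match is wanted, passing `regex=False` to `str.contains` is the usual choice.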



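We then apply the mask to the DataFrame to keep only the matching rows. Below, the same steps are applied to the ‘Courses’ column of technologies_df, so that both Spark and PySpark courses are selected.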

In [4]:
# Boolean mask: True where 'Courses' contains 'spark' (case-insensitive)
df1 = technologies_df['Courses'].str.contains('spark', case=False, na=False)
df1
 


0     True
1     True
2     True
3    False
4     True
5    False
Name: Courses, dtype: bool

In [8]:
technologies_df[df1].head()

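|   | Courses | Fee | Duration |
|---|---|---|---|
| 0 | Spark | 22000 | 30days |
| 1 | PySpark | 25000 | 50days |
| 2 | Spark | 23000 | 30days |
| 4 | PySpark | 26000 | 35days |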
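For a self-contained check of the cells above, the sketch below rebuilds technologies_df, applies the mask with plain boolean indexing and with `.loc`, and draws three random matching rows with `sample` as a quick preview of a filtered result. The names `spark_courses`, `preview` and `strict_mask` are illustrative, and `random_state=0` is an arbitrary choice for reproducibility. It also shows that dropping `case=False` makes the lowercase pattern `'spark'` match nothing in this data, since every course name is capitalised.

```python
import pandas as pd

# Rebuild the example DataFrame from the notebook.
technologies_df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Java", "PySpark", "PHP"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 27000],
    "Duration": ["30days", "50days", "30days", "60days", "35days", "30days"],
})

# Case-insensitive mask: 'spark' matches both "Spark" and "PySpark".
mask = technologies_df["Courses"].str.contains("spark", case=False, na=False)

# Boolean indexing keeps only the rows where the mask is True.
spark_courses = technologies_df[mask]

# .loc accepts the same boolean mask and selects the same rows.
same_rows = technologies_df.loc[mask]

# Preview three random matching rows; random_state makes the draw repeatable.
preview = spark_courses.sample(n=3, random_state=0)

# Without case=False, the lowercase pattern finds no match in this data.
strict_mask = technologies_df["Courses"].str.contains("spark", na=False)

print(spark_courses)
print(strict_mask.any())
```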