# Handy Python Pandas for Data Filtering

__Data Cleaning & Data Preparation Series: <code>df.query(), df.loc[row label, column label], df.iloc[row position, column position], df.filter()</code>__

![Handy%20Python%20Pandas%20for%20Filtering%20Data.png](attachment:Handy%20Python%20Pandas%20for%20Filtering%20Data.png)

### Table of Contents
1. Introduction
2. Filtering Data with Boolean Indexing
3. Filtering Data with Query Method
4. Filtering Data with loc and iloc Methods
5. Filtering Data with filter() Method

### 1. Introduction

Data filtering is one of the most important steps in data analysis. It is the process of selecting a subset of a larger dataset based on certain conditions. With the rise of big data and machine learning, data filtering has become a routine task for data analysts and data scientists, and one of the most popular Python libraries for it is Pandas.

Pandas is a powerful data analysis library for Python. It offers a wide range of data manipulation and analysis tools, including many built-in ways to filter data based on different criteria. In this article, we will walk through four of them: Boolean indexing, the `query()` method, the `loc` and `iloc` indexers, and the `filter()` method.

### 2. Filtering Data with Boolean Indexing

One of the easiest ways to filter data in Pandas is __Boolean indexing__, which selects rows based on a condition. Applying a condition to a column, such as `df['score'] >= 90`, produces a Boolean Series (often called a mask) of __True and False values__, one for each row of the original DataFrame. Passing that mask inside square brackets, `df[mask]`, returns only the rows where the mask is `True`.

To illustrate this technique, let's create a sample dataset using Pandas:



```python
# Importing the Pandas Library and Creating a DataFrame
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'age': [25, 30, 35, 40, 45],
        'gender': ['F', 'M', 'M', 'M', 'F'],
        'score': [80, 90, 85, 95, 90]}

df = pd.DataFrame(data)
```

This will create a simple dataset with four columns: name, age, gender, and score. We can use Boolean indexing to filter this dataset based on a condition. For example, let's filter the dataset to include only the rows where the score is greater than or equal to 90:

```python
filtered_df = df[df['score'] >= 90]
print(filtered_df)
```

```
    name  age gender  score
1    Bob   30      M     90
3  David   40      M     95
4  Emily   45      F     90
```


As you can see, only the rows where the score is greater than or equal to 90 are included in the filtered dataset.
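The mask can also be inspected on its own and combined with other conditions. The sketch below reuses the sample data above, prints the mask produced by the comparison, and combines conditions with `&` (and), `|` (or), and `~` (not), as well as with `Series.isin()`. Each condition needs its own parentheses because these operators bind more tightly than comparisons. The variable names `mask`, `high_scoring_women`, `selected`, and `below_90` are illustrative only.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'age': [25, 30, 35, 40, 45],
        'gender': ['F', 'M', 'M', 'M', 'F'],
        'score': [80, 90, 85, 95, 90]}
df = pd.DataFrame(data)

# The comparison on its own produces a Boolean Series (the mask), one value per row
mask = df['score'] >= 90
print(mask.tolist())  # [False, True, False, True, True]

# Combine conditions with & (and); each condition is wrapped in parentheses
high_scoring_women = df[(df['score'] >= 90) & (df['gender'] == 'F')]
print(high_scoring_women['name'].tolist())  # ['Emily']

# isin() tests whether each value is in a list
selected = df[df['name'].isin(['Alice', 'David'])]
print(selected['name'].tolist())  # ['Alice', 'David']

# ~ inverts a mask: everyone who scored below 90
below_90 = df[~mask]
print(below_90['name'].tolist())  # ['Alice', 'Charlie']
```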

### 3. Filtering Data with Query Method

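Another way to filter data in Pandas is the `query()` method. It takes the filter condition as a string, written in a syntax that resembles a SQL `WHERE` clause. It returns the same rows as the equivalent Boolean indexing expression, but compound conditions are often shorter and easier to read, because column names can be used directly without repeating `df[...]`.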

To illustrate this technique, let's recreate the same sample dataset so the example runs on its own:

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'age': [25, 30, 35, 40, 45],
        'gender': ['F', 'M', 'M', 'M', 'F'],
        'score': [80, 90, 85, 95, 90]}

df = pd.DataFrame(data)
```

We can now use `query()` to apply the same condition as before, keeping only the rows where the score is greater than or equal to 90:

```python
filtered_df = df.query('score >= 90')
print(filtered_df)
```

```
    name  age gender  score
1    Bob   30      M     90
3  David   40      M     95
4  Emily   45      F     90
```
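The advantage of `query()` shows up with compound conditions. The minimal sketch below, on the same sample data, writes a two-part condition with `and`, checks that it returns the same rows as the Boolean indexing version, uses the `@` prefix to refer to a local Python variable, and uses `in` for membership tests. The name `min_score` and the other variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
                   'age': [25, 30, 35, 40, 45],
                   'gender': ['F', 'M', 'M', 'M', 'F'],
                   'score': [80, 90, 85, 95, 90]})

# Compound condition: query() accepts 'and' / 'or' / 'not' as well as & / | / ~
men_over_30 = df.query("gender == 'M' and age > 30")

# The same filter written with Boolean indexing
men_over_30_bool = df[(df['gender'] == 'M') & (df['age'] > 30)]
print(men_over_30.equals(men_over_30_bool))  # True

# Local Python variables are referenced with the @ prefix
min_score = 88
high = df.query('score >= @min_score')
print(high['name'].tolist())  # ['Bob', 'David', 'Emily']

# 'in' tests membership, similar to isin()
picked = df.query("name in ['Alice', 'Emily']")
print(picked['name'].tolist())  # ['Alice', 'Emily']
```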


### 4. Filtering Data with loc and iloc Methods

#### 4.1 Using loc Method

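The `loc` indexer selects rows and columns by their __labels__. It is used with square brackets, `df.loc[row_labels, column_labels]`, where each part can be a single label, a list of labels, a label slice, or a Boolean mask; the column part is optional.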

__Selecting a single row using loc method__

```python
print(df.loc[2])
```

```
name      Charlie
age            35
gender          M
score          85
Name: 2, dtype: object
```


__Selecting multiple rows and columns using loc method__

```python
print(df.loc[[1, 3], ['name', 'score']])
```

```
    name  score
1    Bob     90
3  David     95
```
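Because `loc` also accepts a Boolean mask as its row selector, it can filter rows and pick columns in a single step. The sketch below uses the same sample data. Note that, unlike ordinary Python slicing, a label slice in `loc` includes its end label. The last lines show, as an aside, that the same selector can assign values to the filtered rows (the value 100 is arbitrary).

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
                   'age': [25, 30, 35, 40, 45],
                   'gender': ['F', 'M', 'M', 'M', 'F'],
                   'score': [80, 90, 85, 95, 90]})

# A Boolean mask selects the rows; the list after the comma selects the columns
top = df.loc[df['score'] >= 90, ['name', 'score']]
print(top)
#     name  score
# 1    Bob     90
# 3  David     95
# 4  Emily     90

# Label slices with loc include BOTH endpoints: labels 1, 2 and 3
print(df.loc[1:3, 'name'].tolist())  # ['Bob', 'Charlie', 'David']

# loc can also assign to the filtered rows in place
df.loc[df['age'] > 40, 'score'] = 100
print(df.loc[4, 'score'])  # 100
```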


#### 4.2 Using iloc Method

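The `iloc` indexer selects rows and columns by their __integer position__ (starting at 0), regardless of the index labels. It is used as `df.iloc[row_positions, column_positions]`. Because this DataFrame has the default index 0 to 4, each row's label equals its position, so the examples below return the same rows as the `loc` examples above.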

__Selecting a single row using iloc method__

```python
print(df.iloc[2])
```

```
name      Charlie
age            35
gender          M
score          85
Name: 2, dtype: object
```


__Selecting multiple rows and columns using iloc method__

```python
print(df.iloc[[1, 3], [0, 3]])
```

```
    name  score
1    Bob     90
3  David     95
```
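With the default index, `loc` and `iloc` are easy to confuse. A minimal sketch with a hypothetical DataFrame indexed by name (an assumption made only for this illustration) shows where they differ: label versus position, an inclusive versus exclusive slice end, and the fact that `iloc` expects a plain Boolean array rather than a Boolean Series.

```python
import pandas as pd

# Hypothetical frame whose index labels are names rather than 0..n-1
df = pd.DataFrame({'age': [25, 30, 35, 40, 45],
                   'score': [80, 90, 85, 95, 90]},
                  index=['Alice', 'Bob', 'Charlie', 'David', 'Emily'])

# loc uses labels; iloc uses positions
print(df.loc['Charlie', 'score'])  # 85
print(df.iloc[2, 1])               # 85 (third row, second column)

# Slicing: loc includes the end label, iloc excludes the end position
print(df.loc['Bob':'David'].index.tolist())  # ['Bob', 'Charlie', 'David']
print(df.iloc[1:3].index.tolist())           # ['Bob', 'Charlie']

# iloc rejects a Boolean Series, so convert the mask to a NumPy array first
mask = (df['score'] >= 90).to_numpy()
print(df.iloc[mask].index.tolist())  # ['Bob', 'David', 'Emily']
```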


### 5. Filtering Data with filter() Method
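The __filter()__ method selects rows or columns by their __labels__ (column names or index labels), not by the values they contain. This makes it particularly useful for wide datasets, where you may want to keep only the columns whose names appear in a list, contain a substring, or match a pattern. Because the `regex` parameter accepts any regular expression, it can also be used to exclude labels, for example with a negative lookahead.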
The syntax for using the __filter()__ method in Pandas is as follows:

__DataFrame.filter(items=None, like=None, regex=None, axis=None)__

The __filter()__ method takes the following arguments. `items`, `like`, and `regex` are mutually exclusive, so pass only one of them:
<ul>
    <li><b>items:</b> A list of labels to keep. Labels that do not exist on the axis are ignored.</li>
    <li><b>like:</b> A substring; keeps the labels that contain it (case-sensitive).</li>
    <li><b>regex:</b> A regular expression; keeps the labels for which <code>re.search</code> finds a match anywhere in the label.</li>
    <li><b>axis:</b> The axis to filter. The default, <code>axis=None</code>, filters the columns of a DataFrame; pass <code>axis=0</code> (or <code>'index'</code>) to filter row labels instead.</li>
  </ul>
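The `axis` parameter is what lets `filter()` work on rows. In this sketch, the fruit-labelled DataFrame and its `qty` and `price` columns are made up for illustration. `axis=0` applies `like` to the row labels, and the last call shows that combining `items` with `like` is rejected with a `TypeError`.

```python
import pandas as pd

# Hypothetical DataFrame with string row labels, made up for illustration
df = pd.DataFrame({'qty': [3, 5, 2, 7], 'price': [1.0, 1.2, 0.8, 0.5]},
                  index=['apple_red', 'apple_green', 'pear', 'banana'])

# axis=0 applies the label filter to the row index instead of the columns
apples = df.filter(like='apple', axis=0)
print(apples.index.tolist())  # ['apple_red', 'apple_green']

# With the default axis, the same kind of call looks at column labels
print(df.filter(like='qty').columns.tolist())  # ['qty']

# items, like and regex are mutually exclusive; combining them raises TypeError
try:
    df.filter(items=['qty'], like='price')
except TypeError as err:
    print('TypeError:', err)
```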

#### Example 1: Filtering by column names using the items parameter
Suppose you have a DataFrame with several columns, and you want to select only a subset of columns based on their names. You can do this using the items parameter of the filter() method, like this:


```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Use the filter() method to select columns 'A' and 'B'
df_filtered = df.filter(items=['A', 'B'])
print(df_filtered)
```

```
   A  B
0  1  4
1  2  5
2  3  6
```
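One practical difference between `filter(items=...)` and plain column selection with square brackets is how missing labels are handled. In this small sketch, `'Z'` is a hypothetical column name that does not exist in the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# 'Z' does not exist, so filter(items=...) simply leaves it out
kept = df.filter(items=['A', 'Z'])
print(kept.columns.tolist())  # ['A']

# Selecting the same list with square brackets raises a KeyError instead
try:
    df[['A', 'Z']]
except KeyError:
    print("KeyError: 'Z' is not a column")
```

This makes `filter(items=...)` convenient when a list of wanted columns may not all be present, at the cost of silently hiding typos in column names.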


#### Example 2: Filtering by column names using the like parameter
Suppose you have a DataFrame with several columns, and you want to select only a subset of columns that contain a certain substring. You can do this using the like parameter of the __filter()__ method, like this:

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'apple': [1, 2, 3], 'banana': [4, 5, 6], 'orange': [7, 8, 9]})

# Use the filter() method to select columns containing the substring 'an'
df_filtered = df.filter(like='an')
print(df_filtered)
```

```
   banana  orange
0       4       7
1       5       8
2       6       9
```


#### Example 3: Filtering by column names using the regex parameter
Suppose you have a DataFrame with several columns, and you want to select only a subset of columns that match a certain regular expression. You can do this using the regex parameter of the __filter()__ method, like this:

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A_1': [1, 2, 3], 'A_2': [4, 5, 6], 'B_1': [7, 8, 9]})

# Use the filter() method to select columns matching the regular expression 'A.*'
df_filtered = df.filter(regex='A.*')
print(df_filtered)
```

```
   A_1  A_2
0    1    4
1    2    5
2    3    6
```
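Since `regex` is applied with `re.search`, the pattern `'A.*'` matches an `A` anywhere in the label, not only at the start. The sketch below adds a hypothetical `BA_1` column to make this visible, then anchors the pattern with `^` and uses a negative lookahead to keep only the columns that do not start with `A`.

```python
import pandas as pd

# 'BA_1' is a hypothetical extra column added for illustration
df = pd.DataFrame({'A_1': [1, 2, 3], 'A_2': [4, 5, 6],
                   'B_1': [7, 8, 9], 'BA_1': [0, 0, 0]})

# re.search finds 'A' anywhere in the label, so 'BA_1' is kept too
contains_a = df.filter(regex='A.*')
print(contains_a.columns.tolist())  # ['A_1', 'A_2', 'BA_1']

# ^ anchors the pattern to the start of the label
starts_with_a = df.filter(regex='^A')
print(starts_with_a.columns.tolist())  # ['A_1', 'A_2']

# A negative lookahead keeps every column that does NOT start with 'A'
not_a = df.filter(regex='^(?!A)')
print(not_a.columns.tolist())  # ['B_1', 'BA_1']
```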


### Conclusion

Data filtering helps us extract only the relevant data from a larger dataset based on certain conditions, and Pandas provides several ways to do it. In this article, we used Boolean indexing and the `query()` method to filter rows by their values, `loc` and `iloc` to select rows and columns by label and by position, and `filter()` to select rows or columns by their names. Each method has its own strengths and limitations, so the right choice depends on whether you are filtering on values, labels, or positions.

These are just a few ways to filter data in Pandas. There are many more advanced techniques you can use, depending on your specific data analysis needs. I hope you found this article informative and useful. 

##### Many thanks for reading my post!🙏.
##### If you found this content helpful😊, please LIKE 👍, SHARE, and FOLLOW to stay updated on our future posts.