# Filtering & Conditions

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Sample dataset
df = pd.read_csv("5.1_data.csv")
print(df.head())
print()
print(df.tail())

        name  gender  age  salary     city date of joining
0       Amit    Male   28   40769  Kolkata      30-10-2021
1       Riya  Female   41   99735     Pune      09-02-2018
2       John    Male   36   96101     Pune      02-06-2019
3       Neha  Female   32   42433  Kolkata      04-03-2020
4  Siddharth    Male   29   45311     Pune      15-10-2022

      name  gender  age  salary     city date of joining
15    Emma  Female   29   58942    Delhi      13-10-2020
16    Liam    Male   45   97001  Kolkata      16-09-2018
17  Olivia  Female   24   58431   Mumbai      06-04-2022
18     Raj    Male   43   42747  Chennai      14-05-2022
19  Simran  Female   42   98319    Delhi      20-01-2022


## Boolean Indexing (Filtering)

Boolean indexing is the core filtering tool in pandas. You pass a boolean condition (a mask) inside the square brackets `[]`, and only the rows where the condition evaluates to `True` are kept.

Filtering with boolean masks is vectorized and much faster
than looping through rows.
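
To make the idea of a mask concrete, here is a minimal sketch. The `sample` frame is a hand-built stand-in using a few rows from the dataset above (an assumption, since the CSV itself is not reproduced here), and the loop is included only for comparison.

```python
import pandas as pd

# Hand-built stand-in for a few rows of 5.1_data.csv (assumption).
sample = pd.DataFrame({
    "name": ["Amit", "Riya", "John", "Neha"],
    "age": [28, 41, 36, 32],
    "city": ["Kolkata", "Pune", "Pune", "Kolkata"],
})

# The comparison runs on the whole column at once and yields a boolean Series.
mask = sample["age"] > 30
print(mask.tolist())  # [False, True, True, True]

# Passing the mask to [] keeps only the rows marked True.
filtered = sample[mask]

# Equivalent row-by-row loop, shown only for comparison.
kept = [row["name"] for _, row in sample.iterrows() if row["age"] > 30]
print(filtered["name"].tolist() == kept)  # True
```

Both approaches select the same rows; the mask version avoids a Python-level loop over rows.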


### Single Condition
Filter rows where `age` is greater than 40:

In [3]:
df[df['age'] > 40]

Unnamed: 0,name,gender,age,salary,city,date of joining
1,Riya,Female,41,99735,Pune,09-02-2018
5,Zoe,Female,42,77819,Bangalore,28-11-2023
7,Anjali,Female,47,57568,Kolkata,25-10-2020
9,Priya,Female,44,59769,Delhi,03-09-2022
12,Mike,Male,45,67480,Bangalore,09-02-2023
13,Sara,Female,42,81434,Kolkata,24-09-2018
16,Liam,Male,45,97001,Kolkata,16-09-2018
18,Raj,Male,43,42747,Chennai,14-05-2022
19,Simran,Female,42,98319,Delhi,20-01-2022


### Multiple Conditions
Combine conditions with the element-wise operators, and wrap each condition in parentheses:
- `&` for AND
- `|` for OR
- `~` for NOT

In [4]:
df[(df['city'] == 'Delhi') & (df['salary'] > 50000)]

Unnamed: 0,name,gender,age,salary,city,date of joining
9,Priya,Female,44,59769,Delhi,03-09-2022
15,Emma,Female,29,58942,Delhi,13-10-2020
19,Simran,Female,42,98319,Delhi,20-01-2022
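
The AND example above extends naturally to OR and NOT. The sketch below uses a hand-built `sample` frame (an assumption; it copies a few rows from the dataset) and also shows why Python's `and` cannot replace `&` here.

```python
import pandas as pd

# Hand-built stand-in for a few rows of 5.1_data.csv (assumption).
sample = pd.DataFrame({
    "name": ["Priya", "Emma", "Olivia", "Raj"],
    "age": [44, 29, 24, 43],
    "salary": [59769, 58942, 58431, 42747],
    "city": ["Delhi", "Delhi", "Mumbai", "Chennai"],
})

# OR: rows in Delhi, or with age below 25.
either = sample[(sample["city"] == "Delhi") | (sample["age"] < 25)]
print(either["name"].tolist())  # ['Priya', 'Emma', 'Olivia']

# NOT: rows outside Delhi.
not_delhi = sample[~(sample["city"] == "Delhi")]
print(not_delhi["name"].tolist())  # ['Olivia', 'Raj']

# Python's `and` asks for a single truth value of a whole Series,
# which pandas refuses to provide, so it raises ValueError.
and_failed = False
try:
    sample[(sample["city"] == "Delhi") and (sample["salary"] > 50000)]
except ValueError:
    and_failed = True
print(and_failed)  # True
```

The parentheses matter because `&` and `|` bind more tightly than comparisons such as `==` and `>`; leaving them out makes Python group the expression differently and the filter fails.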


### `.isin()` Method
Useful for matching a column against a list of allowed values, such as categories.

In [5]:
df[df['city'].isin(['Delhi', 'Mumbai'])]

Unnamed: 0,name,gender,age,salary,city,date of joining
9,Priya,Female,44,59769,Delhi,03-09-2022
15,Emma,Female,29,58942,Delhi,13-10-2020
17,Olivia,Female,24,58431,Mumbai,06-04-2022
19,Simran,Female,42,98319,Delhi,20-01-2022
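
As a rough illustration, `.isin()` behaves like a chain of `==` comparisons joined with `|`, and it can be negated with `~`. The `sample` frame below is a hand-built subset of the dataset (an assumption).

```python
import pandas as pd

# Hand-built stand-in for a few rows of 5.1_data.csv (assumption).
sample = pd.DataFrame({
    "name": ["Priya", "Olivia", "Raj", "Riya"],
    "city": ["Delhi", "Mumbai", "Chennai", "Pune"],
})

cities = ["Delhi", "Mumbai"]

# Rows whose city appears in the list.
in_cities = sample[sample["city"].isin(cities)]

# The same filter written as explicit comparisons.
same = sample[(sample["city"] == "Delhi") | (sample["city"] == "Mumbai")]
print(in_cities.equals(same))  # True

# Negate the mask to keep every other city.
others = sample[~sample["city"].isin(cities)]
print(others["name"].tolist())  # ['Raj', 'Riya']
```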


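Avoid chained indexing such as `df[df['salary'] > 50000]['city']`. Reading values this way works, but assigning through it can trigger `SettingWithCopyWarning` and may leave the original DataFrame unchanged. Use a single `.loc` call instead: `df.loc[df['salary'] > 50000, 'city']`.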
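
A minimal sketch of the `.loc` alternative follows. The `sample` frame and the `band` column are illustrative assumptions, not part of the original dataset.

```python
import pandas as pd

# Hand-built stand-in for a few rows of 5.1_data.csv (assumption).
sample = pd.DataFrame({
    "name": ["Amit", "Riya", "John", "Neha"],
    "salary": [40769, 99735, 96101, 42433],
    "city": ["Kolkata", "Pune", "Pune", "Kolkata"],
})

# Reading: select rows and a column in one .loc call.
high_paid_cities = sample.loc[sample["salary"] > 50000, "city"]
print(high_paid_cities.tolist())  # ['Pune', 'Pune']

# Writing: assign through .loc on the original frame so the
# change is applied to `sample` itself, not to a temporary copy.
sample["band"] = "standard"  # hypothetical column for illustration
sample.loc[sample["salary"] > 50000, "band"] = "high"
print(sample["band"].tolist())  # ['standard', 'high', 'high', 'standard']
```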

## Querying with .query()
The `.query()` method filters DataFrame rows using a string expression. It is often more readable and more concise than boolean indexing.

### SQL-like filtering

In [6]:
df.query("age > 25 and city == 'Delhi'")

Unnamed: 0,name,gender,age,salary,city,date of joining
9,Priya,Female,44,59769,Delhi,03-09-2022
15,Emma,Female,29,58942,Delhi,13-10-2020
19,Simran,Female,42,98319,Delhi,20-01-2022
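
For comparison, the sketch below runs the same filter with `.query()` and with a boolean mask and checks that the results agree. The `sample` frame is a hand-built subset of the dataset (an assumption).

```python
import pandas as pd

# Hand-built stand-in for a few rows of 5.1_data.csv (assumption).
sample = pd.DataFrame({
    "name": ["Priya", "Emma", "Olivia", "Simran"],
    "age": [44, 29, 24, 42],
    "city": ["Delhi", "Delhi", "Mumbai", "Delhi"],
})

# String-expression form.
by_query = sample.query("age > 25 and city == 'Delhi'")

# Equivalent boolean-mask form.
by_mask = sample[(sample["age"] > 25) & (sample["city"] == "Delhi")]

print(by_query["name"].tolist())  # ['Priya', 'Emma', 'Simran']
print(by_query.equals(by_mask))   # True
```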


### Dynamic column names

In [7]:
col = "age" 
df.query(f"{col} > 42")

Unnamed: 0,name,gender,age,salary,city,date of joining
7,Anjali,Female,47,57568,Kolkata,25-10-2020
9,Priya,Female,44,59769,Delhi,03-09-2022
12,Mike,Male,45,67480,Bangalore,09-02-2023
16,Liam,Male,45,97001,Kolkata,16-09-2018
18,Raj,Male,43,42747,Chennai,14-05-2022


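### Column names become variables
Inside the expression, column names are referenced directly as variables, without writing `df[...]`: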

In [8]:
df.query("age > 32 and city == 'Delhi'")

Unnamed: 0,name,gender,age,salary,city,date of joining
9,Priya,Female,44,59769,Delhi,03-09-2022
19,Simran,Female,42,98319,Delhi,20-01-2022


### String values must be in quotes
Use single or double quotes around strings in the expression:

In [9]:
df.query("name == 'Sara'")
# Equivalent: swap the quote styles so the inner and outer quotes differ
df.query('name == "Sara"')

Unnamed: 0,name,gender,age,salary,city,date of joining
13,Sara,Female,42,81434,Kolkata,24-09-2018


### Use backticks for column names with spaces or special characters
Wrap a column name in backticks ( ` ) when it contains spaces or other characters that are not valid in a Python identifier:

In [10]:
df.query("`date of joining` == '20-01-2022'")

Unnamed: 0,name,gender,age,salary,city,date of joining
19,Simran,Female,42,98319,Delhi,20-01-2022
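
The equality test above compares text. For range filters on dates, one possible approach is to parse the column first, since `DD-MM-YYYY` strings do not sort chronologically. This is a sketch under assumptions: the `joined` column, the `cutoff` variable, and the hand-built `sample` frame are illustrative names that do not appear in the original notebook.

```python
import pandas as pd

# Hand-built stand-in for a few rows of 5.1_data.csv (assumption).
sample = pd.DataFrame({
    "name": ["Amit", "Riya", "John"],
    "date of joining": ["30-10-2021", "09-02-2018", "02-06-2019"],
})

# Backticks let .query() reference the column name that contains spaces.
exact = sample.query("`date of joining` == '09-02-2018'")
print(exact["name"].tolist())  # ['Riya']

# Parse the day-month-year strings into real timestamps
# (`joined` is a hypothetical helper column).
sample["joined"] = pd.to_datetime(sample["date of joining"], format="%d-%m-%Y")

# Range filter on the parsed column, passing the cutoff in with @.
cutoff = pd.Timestamp("2019-01-01")
recent = sample.query("joined >= @cutoff")
print(recent["name"].tolist())  # ['Amit', 'John']
```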


### Use `@` to reference Python variables
Prefix a Python variable with `@` to use its value inside `.query()`:

In [11]:
max_age = 28
df.query("age < @max_age")

Unnamed: 0,name,gender,age,salary,city,date of joining
14,David,Male,25,65658,Hyderabad,24-09-2020
17,Olivia,Female,24,58431,Mumbai,06-04-2022
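
The `@` prefix also works with lists, which pairs well with the `in` operator inside the expression. A minimal sketch, assuming a hand-built `sample` frame and illustrative variable names `cities` and `min_salary`:

```python
import pandas as pd

# Hand-built stand-in for a few rows of 5.1_data.csv (assumption).
sample = pd.DataFrame({
    "name": ["Priya", "Emma", "Olivia", "Raj"],
    "salary": [59769, 58942, 58431, 42747],
    "city": ["Delhi", "Delhi", "Mumbai", "Chennai"],
})

# Ordinary Python variables defined outside the query string.
cities = ["Delhi", "Mumbai"]
min_salary = 58500

# @name substitutes the variable's value into the expression.
picked = sample.query("(city in @cities) and (salary > @min_salary)")
print(picked["name"].tolist())  # ['Priya', 'Emma']
```

Passing values through `@` keeps the query string fixed while the thresholds change, which avoids rebuilding the string by hand.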


### Logical operators
Inside `.query()` you can write:
- `and`, `or`, `not` in place of `&`, `|`, `~` (the symbolic forms also work)
- `==`, `!=`, `<`, `>`, `<=`, `>=` for comparisons

In [12]:
df.query("age > 35 and city == 'Delhi'") 

Unnamed: 0,name,gender,age,salary,city,date of joining
9,Priya,Female,44,59769,Delhi,03-09-2022
19,Simran,Female,42,98319,Delhi,20-01-2022


### Chained comparisons

In [13]:
df.query("28 < age <= 35")

Unnamed: 0,name,gender,age,salary,city,date of joining
3,Neha,Female,32,42433,Kolkata,04-03-2020
4,Siddharth,Male,29,45311,Pune,15-10-2022
10,Rahul,Male,32,68693,Bangalore,09-03-2020
11,Sonia,Female,32,46396,Hyderabad,13-03-2019
15,Emma,Female,29,58942,Delhi,13-10-2020


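### Case-sensitive
String comparisons are case-sensitive, so the literal must match the stored value exactly:

In [14]:
df.query("city == 'delhi'")  # no rows match: the stored value is 'Delhi'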

Unnamed: 0,name,gender,age,salary,city,date of joining
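
One way to match regardless of case is to normalize the column before comparing. This is a sketch using the `.str.lower()` accessor on a hand-built `sample` frame (an assumption); the `target` variable is illustrative.

```python
import pandas as pd

# Hand-built stand-in for a few rows of 5.1_data.csv (assumption).
sample = pd.DataFrame({
    "name": ["Priya", "Emma", "Olivia"],
    "city": ["Delhi", "Delhi", "Mumbai"],
})

# Exact comparison: lowercase 'delhi' matches nothing.
exact = sample[sample["city"] == "delhi"]
print(len(exact))  # 0

# Lowercase the column values first, then compare.
target = "delhi"
insensitive = sample[sample["city"].str.lower() == target]
print(insensitive["name"].tolist())  # ['Priya', 'Emma']
```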


**Note**: `.query()` returns a new DataFrame (a copy), not a view, so modifying the result does not change the original `df`.
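
The following sketch illustrates the point on a hand-built `sample` frame (an assumption). The explicit `.copy()` is an addition here: it makes the intent clear and keeps older pandas versions from emitting `SettingWithCopyWarning` on the assignment.

```python
import pandas as pd

# Hand-built stand-in for a few rows of 5.1_data.csv (assumption).
sample = pd.DataFrame({
    "name": ["Priya", "Emma", "Olivia"],
    "salary": [59769, 58942, 58431],
    "city": ["Delhi", "Delhi", "Mumbai"],
})

# Filter, then take an explicit copy before modifying the result.
delhi = sample.query("city == 'Delhi'").copy()
delhi["salary"] = 0

# The filtered result changed; the original frame did not.
print(delhi["salary"].tolist())   # [0, 0]
print(sample["salary"].tolist())  # [59769, 58942, 58431]
```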

## Summary

- Filtering uses boolean masks.
- In boolean indexing, use `&`, `|`, `~` instead of `and`/`or`/`not`.
- Wrap each condition in parentheses when combining conditions.
- Vectorized filtering is efficient.
- Use `.query()` when a string expression reads more clearly.