## Filtering

Both `df.loc` and `df.query` can filter the rows of a pandas DataFrame, and each has its own strengths. The comparison below should help you decide which one suits a given task:




### **- df.loc[]**

- **Usage**: `df.loc[]` is used for label-based indexing and is often employed for filtering using boolean masks.
- **Syntax**:

  ```python
  df.loc[condition]
  ```

- **Advantages**:
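  - **Flexibility**: Can handle complex boolean expressions and multiple conditions using the element-wise operators `&` (and), `|` (or), and `~` (not). Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as `>` and `==`.
  - **Readability**: Clear and explicit syntax for filtering, since every condition is ordinary Python code.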
  - **Support for Indexing**: Allows you to filter and select specific rows and columns simultaneously.
- **Disadvantages**:
  - **Complex Expressions**: For very complex filtering conditions, it may become verbose and less readable.

**Example**:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Filter with multiple conditions using df.loc
filtered_df = df.loc[(df['Age'] > 25) & (df['Name'] != 'Charlie')]
print(filtered_df)
```
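One detail of `df.loc` masks is worth illustrating: the parentheses around each condition are required, not cosmetic. The sketch below is a minimal illustration using a hypothetical `people` frame (the names `in_range`, `out_of_range`, and `precedence_error` are not from the original notes). It shows the correct form, the `~` negation, and what happens when the parentheses are left out.

```python
import pandas as pd

# Hypothetical sample data, mirroring the example above.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Correct: each comparison is wrapped in parentheses before combining with &.
in_range = people.loc[(people['Age'] > 25) & (people['Age'] < 35)]

# ~ negates a mask element-wise: every row NOT in the range above.
out_of_range = people.loc[~((people['Age'] > 25) & (people['Age'] < 35))]

# Without parentheses, & binds tighter than > and <, so Python reads this as
# people['Age'] > (25 & people['Age']) < 35. That chained comparison needs the
# truth value of a whole Series, which pandas refuses to guess.
try:
    people.loc[people['Age'] > 25 & people['Age'] < 35]
    precedence_error = None
except ValueError as exc:
    precedence_error = exc

print(in_range)
print(out_of_range)
print(type(precedence_error).__name__)
```

The same precedence rule does not apply inside `df.query` strings, where `and`, `or`, and `not` can be written without extra parentheses.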


### **- df.query()**

- **Usage**: `df.query()` allows you to filter rows using a query string expression.
- **Syntax**:

  ```python
  df.query('condition')
  ```

- **Advantages**:

  - **Readability**: Query strings can be more readable and concise, especially for complex conditions.
  - **Python Expressions**: Allows using Python-like expressions directly in the query string.
  - **Convenience**: Can be easier to use when dealing with multiple conditions in a readable format.

- **Disadvantages**:
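  - **Performance**: The query string has to be parsed and evaluated, so on small DataFrames it is usually somewhat slower than boolean indexing. On very large DataFrames the difference can shrink or reverse, so measure before relying on either.
  - **Variable Scope**: Bare names in the query string refer to columns (or the index). To use a Python variable from the surrounding code, prefix it with `@`, as in `df.query('Age > @min_age')`.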

**Example**:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Filter using df.query
filtered_df = df.query('Age > 25 and Name != "Charlie"')
print(filtered_df)
```
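The variable-scope point above can be sketched as follows. This is a minimal example, assuming a hypothetical helper `filter_people` and hypothetical parameter names `min_age` and `excluded`. Inside the query string, bare names are columns and `@`-prefixed names are Python variables.

```python
import pandas as pd


def filter_people(frame, min_age, excluded):
    """Return rows older than min_age whose Name is not in excluded.

    Hypothetical helper for illustration: Age and Name are columns,
    while @min_age and @excluded are local Python variables.
    """
    return frame.query('Age > @min_age and Name not in @excluded')


people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

filtered = filter_people(people, min_age=25, excluded=['Charlie'])
print(filtered)
```

Passing the thresholds as variables keeps the query string constant, which avoids building it with string formatting.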


### **- Summary**

- **Use `df.loc`** if you need:

  - Explicit control over indexing and selection.
  - Complex boolean expressions with multiple conditions.
  - To select both rows and columns.

- **Use `df.query`** if you prefer:
  - A more concise and readable syntax for filtering.
  - Python-like expressions for complex conditions.
  - Simpler and cleaner code when filtering based on conditions.

Ultimately, the choice depends on your specific use case and personal preference. Both methods are powerful and can be used effectively for filtering data in a Pandas DataFrame.


In [50]:
import pandas as pd

In [51]:
df = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=['cobra', 'viper', 'sidewinder'],
    columns=['max_speed', 'shield']
)

df

Unnamed: 0,max_speed,shield
cobra,1,2
viper,4,5
sidewinder,7,8


#### Single expression


In [52]:
df.query('shield > 4')
# Equivalent boolean-mask form:
df.loc[df['shield'] > 4]

Unnamed: 0,max_speed,shield
viper,4,5
sidewinder,7,8


#### Multiple expressions


In [53]:
df.query('shield > 4 and shield < 7')
# Equivalent, using a chained comparison:
df.query('4 < shield < 7')
# Equivalent boolean-mask form:
df.loc[(df['shield'] > 4) & (df['shield'] < 7)]

Unnamed: 0,max_speed,shield
viper,4,5


#### Specific columns


In [54]:
df.query('4 < shield < 7').loc[:, ['shield']]
# Equivalent: df.loc takes the row mask and the column list in one call
df.loc[(df['shield'] > 4) & (df['shield'] < 7), ['shield']]

Unnamed: 0,shield
viper,5
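To confirm that the two forms really return the same thing, the results can be compared with `DataFrame.equals`. The sketch below rebuilds the small frame from above under the hypothetical name `ships`; `via_query`, `via_loc`, and `same` are illustrative names as well.

```python
import pandas as pd

# Same data as the frame defined earlier, under a hypothetical name.
ships = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=['cobra', 'viper', 'sidewinder'],
    columns=['max_speed', 'shield']
)

# Filter rows with query, then select the column with .loc.
via_query = ships.query('4 < shield < 7').loc[:, ['shield']]

# Filter rows and select the column in a single .loc call.
via_loc = ships.loc[(ships['shield'] > 4) & (ships['shield'] < 7), ['shield']]

# equals() compares values, index, columns, and dtypes.
same = via_query.equals(via_loc)
print(same)
```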


In [55]:
df = pd.read_csv('coffee.csv')

df

Unnamed: 0,day,coffee_type,units_sold
0,Monday,Espresso,25
1,Monday,Latte,15
2,Tuesday,Espresso,30
3,Tuesday,Latte,20
4,Wednesday,Espresso,35
5,Wednesday,Latte,25
6,Thursday,Espresso,40
7,Thursday,Latte,30
8,Friday,Espresso,45
9,Friday,Latte,35


#### Filter rows where a string column contains a pattern (regex is supported)


In [56]:
df.query("day.str.contains(r'mon|nes', case=False)")
# Equivalent boolean-mask form:
df.loc[df['day'].str.contains(r'mon|nes', case=False)]

Unnamed: 0,day,coffee_type,units_sold
0,Monday,Espresso,25
1,Monday,Latte,15
4,Wednesday,Espresso,35
5,Wednesday,Latte,25
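One caveat with `str.contains`: if the column has missing values, the resulting mask may contain missing entries too (the exact behaviour depends on the pandas version and the column dtype), and such a mask cannot be used for filtering. Passing `na=False` treats missing values as non-matches. The sketch below uses a hypothetical `sales` frame with one missing day; it is an illustration, not the contents of `coffee.csv`.

```python
import pandas as pd

# Hypothetical data with one missing value in the 'day' column.
sales = pd.DataFrame({
    'day': ['Monday', None, 'Wednesday', 'Friday'],
    'units_sold': [25, 15, 35, 45],
})

# na=False makes missing days count as "no match", so the mask is
# purely boolean and safe to pass to .loc.
mask = sales['day'].str.contains(r'mon|nes', case=False, na=False)
matches = sales.loc[mask]
print(matches)
```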


#### Other membership and prefix checks (`isin`, `str.startswith`)


In [57]:
df.loc[
    (df['day'].isin(['Monday', 'Saturday']))
    |  # Bitwise OR
    (df['day'].str.startswith('Sun'))
]

Unnamed: 0,day,coffee_type,units_sold
0,Monday,Espresso,25
1,Monday,Latte,15
10,Saturday,Espresso,45
11,Saturday,Latte,35
12,Sunday,Espresso,45
13,Sunday,Latte,35
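The same filter can also be written as a query string, where `in` plays the role of `isin`. The sketch below uses a small hypothetical `sales` frame rather than `coffee.csv`. Passing `engine='python'` is an assumption made here for safety: string-method calls inside `query` have not worked with the default engine in every pandas version.

```python
import pandas as pd

# Hypothetical sample data (not the contents of coffee.csv).
sales = pd.DataFrame({
    'day': ['Monday', 'Tuesday', 'Saturday', 'Sunday'],
    'coffee_type': ['Espresso', 'Latte', 'Espresso', 'Latte'],
    'units_sold': [25, 20, 45, 35],
})

# Boolean-mask form, as in the cell above.
via_loc = sales.loc[
    (sales['day'].isin(['Monday', 'Saturday']))
    | (sales['day'].str.startswith('Sun'))
]

# Query-string form: `in` replaces isin, `or` replaces |.
# engine='python' is an assumption to keep the .str call portable.
via_query = sales.query(
    "day in ['Monday', 'Saturday'] or day.str.startswith('Sun')",
    engine='python',
)

print(via_query.equals(via_loc))
```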
