# **Data Selection & Indexing**

In [1]:
import pandas as pd

## 6. **Selection with Callable Functions**

### ✅ What is it?

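Pandas lets you **pass a function as the selector** to `.loc[]`, `.iloc[]`, and plain `[]` indexing. The function is called with the DataFrame or Series being indexed and must return a valid indexer for that accessor, such as a boolean mask or a list of labels or positions. This enables **dynamic and reusable filtering**, especially when the selection logic has to be computed or composed at runtime.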



### 🔹 Why Use Callable Functions?

* Useful for **modular and reusable code**
* Makes filtering **more flexible**
* Helpful in **pipeline workflows**
* Can use **functions with external variables** for dynamic selection
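
As a rough sketch of the pipeline benefit: inside a method chain the intermediate DataFrame has no variable name, so a callable is a convenient way to refer to it. The `people` frame and the `AgeNextYear` column are made up for illustration.

```python
import pandas as pd

# Hypothetical data, shaped like the examples later in this section.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
})

# The second lambda receives the frame returned by .assign(), so it can
# filter on a column that does not exist on `people` itself.
result = (
    people
    .assign(AgeNextYear=lambda d: d['Age'] + 1)
    .loc[lambda d: d['AgeNextYear'] > 30]
)
print(result)
```

Without the callable, the chain would have to be broken into two statements so the intermediate frame could be named.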



### ✅ Supported With

* `df.loc[function]`
* `df.iloc[function]`
* `df.loc[lambda df: condition]` (most common)
* Also works with Series (like `s.loc[lambda s: s > 5]`)
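
The forms listed above could look roughly like this side by side. The names `s` and `frame` are placeholders chosen for this sketch, not objects used elsewhere in the section.

```python
import pandas as pd

s = pd.Series([3, 6, 9], index=['a', 'b', 'c'])
frame = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# Series.loc with a callable that returns a boolean mask
big = s.loc[lambda ser: ser > 5]

# DataFrame.iloc with a callable that returns integer positions
first_and_third = frame.iloc[lambda d: [0, 2]]

# Plain [] indexing also accepts a callable
large_x = frame[lambda d: d['x'] >= 2]

print(big, first_and_third, large_x, sep='\n\n')
```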


## 🔸 1. Using `lambda` with `.loc[]`

In [5]:

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'London', 'Paris', 'Berlin']
})

df

|   | Name    | Age | City   |
| - | ------- | --- | ------ |
| 0 | Alice   | 25  | NY     |
| 1 | Bob     | 30  | London |
| 2 | Charlie | 35  | Paris  |
| 3 | David   | 40  | Berlin |


In [8]:
df.loc[lambda df: df['Age'] > 30]

|   | Name    | Age | City   |
| - | ------- | --- | ------ |
| 2 | Charlie | 35  | Paris  |
| 3 | David   | 40  | Berlin |


## 🔸 2. Using External Variables in Lambda

In [10]:
min_age = 30
df.loc[lambda df: df['Age'] > min_age]

|   | Name    | Age | City   |
| - | ------- | --- | ------ |
| 2 | Charlie | 35  | Paris  |
| 3 | David   | 40  | Berlin |
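
A lambda that reads `min_age` looks the name up when it is called, not when it is defined. That is harmless here because `.loc[]` calls it immediately, but it can surprise you if the selector is stored and the variable changes later. One possible way to pin the value is a small factory function; `older_than` below is a hypothetical helper, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'London', 'Paris', 'Berlin']
})

def older_than(min_age):
    """Return a selector for rows whose Age exceeds min_age (hypothetical helper)."""
    def selector(frame):
        # min_age is captured when older_than() is called, so later
        # reassignment of an outer variable cannot change this selector.
        return frame['Age'] > min_age
    return selector

over_30 = df.loc[older_than(30)]
over_25 = df.loc[older_than(25)]
print(over_30, over_25, sep='\n\n')
```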


### ✅ Real-World Use Case

Filter rows based on a threshold supplied by the user:

```python
# input() returns a string, so convert it to an int once, before filtering
threshold = int(input("Enter minimum age: "))
df.loc[lambda df: df['Age'] > threshold]
```
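
User-supplied text may not be a valid number. A minimal sketch of one way to guard against that, using a hypothetical `parse_min_age` helper and fixed strings in place of `input()` so the example runs unattended:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'London', 'Paris', 'Berlin']
})

def parse_min_age(raw, default=0):
    """Convert raw text to an int; fall back to `default` if it is not a number."""
    try:
        return int(raw.strip())
    except ValueError:
        return default

# Stand-ins for what a user might type
valid = df.loc[lambda d: d['Age'] > parse_min_age(" 30 ")]
invalid = df.loc[lambda d: d['Age'] > parse_min_age("thirty")]
print(valid, invalid, sep='\n\n')
```

Falling back to `0` keeps every row when the input is unusable; raising an error instead may be the better choice in a real application.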



## 🔸 3. Selecting Specific Columns Using a Function

You can also use functions to dynamically choose columns.

In [11]:
df.loc[:, lambda df: df.columns.str.startswith('C')]

|   | City   |
| - | ------ |
| 0 | NY     |
| 1 | London |
| 2 | Paris  |
| 3 | Berlin |


> Selects all columns that **start with 'C'** (like `'City'`)
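
The same idea can be wrapped in a named, parameterised selector. The sketch below assumes a made-up `scores` frame and a hypothetical `columns_ending_with` helper.

```python
import pandas as pd

# Hypothetical data with a shared column suffix
scores = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'MathScore': [90, 75],
    'ArtScore': [82, 88],
})

def columns_ending_with(suffix):
    """Return a column selector matching names that end with `suffix`."""
    return lambda frame: frame.columns.str.endswith(suffix)

# The callable goes in the column slot of .loc; ':' keeps every row
score_cols = scores.loc[:, columns_ending_with('Score')]
print(score_cols)
```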

## 🔸 4. Selecting with `.iloc[]` and Callable

In [13]:
df.iloc[lambda df: [0, 2]]

|   | Name    | Age | City  |
| - | ------- | --- | ----- |
| 0 | Alice   | 25  | NY    |
| 2 | Charlie | 35  | Paris |


> Selects the rows at **positions** 0 and 2

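The callable can also return a boolean mask. The mask below is computed from the index **labels**, which coincide with row positions only because `df` has the default `0..n-1` index: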

In [16]:
df.iloc[lambda x: x.index % 2 == 0] # Select even-indexed rows

|   | Name    | Age | City  |
| - | ------- | --- | ----- |
| 0 | Alice   | 25  | NY    |
| 2 | Charlie | 35  | Paris |
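
When the index is not the default range, a mask built from `index` and a mask built from positions give different rows. A small sketch of the difference, using a made-up `relabelled` frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1
relabelled = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'David']},
    index=[10, 11, 13, 16],
)

# Mask from labels: keeps rows whose label is even (10 and 16)
by_label = relabelled.iloc[lambda d: d.index % 2 == 0]

# Mask from positions: keeps rows at positions 0 and 2, whatever the labels
by_position = relabelled.iloc[lambda d: np.arange(len(d)) % 2 == 0]

print(by_label, by_position, sep='\n\n')
```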


## 🔸 5. Chaining Callable Functions

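You can chain `loc` and `iloc` calls with callables for clean pipelines. Each callable receives the result of the previous step, so no intermediate variables are needed.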

In [17]:
df.loc[lambda d: d['Age'] > 25].loc[lambda d: d['City'] != 'Paris']

|   | Name  | Age | City   |
| - | ----- | --- | ------ |
| 1 | Bob   | 30  | London |
| 3 | David | 40  | Berlin |
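
A chain can also mix `.loc[]` and `.iloc[]`. In this sketch the `.iloc[]` callable uses `len(d)` to pick the first and last rows of whatever the earlier steps produced; the variable name `youngest_and_oldest` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'London', 'Paris', 'Berlin']
})

youngest_and_oldest = (
    df
    .loc[lambda d: d['City'] != 'Paris']   # label-based boolean filter
    .sort_values('Age')                    # order the remaining rows
    .iloc[lambda d: [0, len(d) - 1]]       # first and last position of the result
)
print(youngest_and_oldest)
```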


## 🔸 6. Use in Function Composition

You can define **custom filters**:

In [20]:
def young_age(df):
    return df['Age'] < 30

df.loc[young_age]

|   | Name  | Age | City |
| - | ----- | --- | ---- |
| 0 | Alice | 25  | NY   |


Or define reusable `lambda` filters:

In [21]:
filter_func = lambda df: (df['Age'] > 30) & (df['City'] == 'Paris')
df.loc[filter_func]

|   | Name    | Age | City  |
| - | ------- | --- | ----- |
| 2 | Charlie | 35  | Paris |
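
Small filters like these can be combined into larger ones. The sketch below is one possible approach; `all_of` and `negate` are hypothetical helpers written for this example, not pandas functions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'London', 'Paris', 'Berlin']
})

def is_over_30(frame):
    return frame['Age'] > 30

def in_paris(frame):
    return frame['City'] == 'Paris'

def all_of(*predicates):
    """Combine row predicates with logical AND."""
    def combined(frame):
        mask = pd.Series(True, index=frame.index)
        for predicate in predicates:
            mask &= predicate(frame)
        return mask
    return combined

def negate(predicate):
    """Invert a row predicate."""
    return lambda frame: ~predicate(frame)

over_30_in_paris = df.loc[all_of(is_over_30, in_paris)]
over_30_elsewhere = df.loc[all_of(is_over_30, negate(in_paris))]
print(over_30_in_paris, over_30_elsewhere, sep='\n\n')
```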


## 🔸 7. Real-World Use Cases

| Scenario                                        | Code                                                   |
| ----------------------------------------------- | ------------------------------------------------------ |
| ✅ Dynamic filtering on user input               | `df.loc[lambda d: d['Age'] > user_input]`              |
| ✅ Select all string columns                     | `df.loc[:, lambda d: d.dtypes == object]`              |
| ✅ Get columns ending in “Score”                 | `df.loc[:, lambda d: d.columns.str.endswith("Score")]` |
| ✅ Only even-indexed rows                        | `df.iloc[lambda d: d.index % 2 == 0]`                  |
| ✅ Filter rows with logic in a reusable function | `df.loc[my_filter_function]`                           |


## ✅ Summary Table

| Selector Type                  | Example                                                 | Use Case                |
| ------------------------------ | ------------------------------------------------------- | ----------------------- |
| `.loc[lambda df: cond]`        | `df.loc[lambda df: df['Age'] > 25]`                     | Dynamic row filter      |
| `.loc[:, lambda df: col_cond]` | `df.loc[:, lambda df: df.columns.str.contains('Name')]` | Column selection        |
| `.iloc[lambda df: pos_list]`   | `df.iloc[lambda df: [0, 2]]`                            | Select by row positions |
| `.loc[custom_func]`            | `df.loc[my_custom_func]`                                | Reuse filter logic      |

## ✅ Best Practices

* Use `lambda df:` when defining inline logic
* Use full function definitions for reusability across datasets
* Always **return a valid indexer** from the callable: a boolean mask or labels for `.loc[]`, a boolean mask or integer positions for `.iloc[]`
* Combine with `.pipe()` for method chaining
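
As an illustration of the last point, a filter that needs extra arguments can be written as an ordinary function and passed to `.pipe()`, then followed by a callable selector. `keep_older_than` is a hypothetical step name used only in this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['NY', 'London', 'Paris', 'Berlin']
})

def keep_older_than(frame, min_age):
    """Pipeline step: keep rows whose Age exceeds min_age."""
    return frame.loc[lambda d: d['Age'] > min_age]

result = (
    df
    .pipe(keep_older_than, min_age=30)                 # step with a parameter
    .loc[:, lambda d: d.columns.str.startswith('C')]   # then pick columns
)
print(result)
```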

