# **Data Transformation**

## **8. Conditional transformations / new columns**

In [1]:
import numpy as np 
import pandas as pd 

## ✅ 1. What It Does and When to Use It

**Conditional transformations** let you:

* Add **new columns** to a DataFrame
* Modify existing ones based on **if-else logic**
* Apply **business rules**, thresholds, or derived logic to your data

🕒 **When to use:**

* You need derived features (for modeling)
* You want to apply logic (e.g., pass/fail, low/medium/high) based on column values
* You need to categorize or bin values
* You need transformations that depend on multiple conditions


## 🧠 2. Syntax and Core Parameters

### ➤ Common Methods:

| Method                                 | Description                                  |
| -------------------------------------- | -------------------------------------------- |
| `np.where()`                           | Vectorized if-else                           |
| `df.apply()` + lambda                  | Row/column-wise transformations              |
| `df.loc[condition, col] = value`       | Direct condition-based assignment            |
| `df['new'] = df['col'].map(mapping)`   | Map each value through a dict or function    |
| `pd.cut()` / `pd.qcut()`               | Binning numeric values                       |

### ➤ Basic Syntax

```python
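# Two-way choice: 'Pass' where the condition holds, 'Fail' elsewhere
df['new_col'] = np.where(df['score'] >= 50, 'Pass', 'Fail')
# Element-wise function applied to a single column
df['grade'] = df['marks'].apply(lambda x: 'A' if x >= 90 else 'B')
# Assign only to matching rows; if 'adult' is a new column, other rows stay NaN
df.loc[df['age'] > 18, 'adult'] = True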
```


## 🧪 3. Different Methods and Techniques

---

### A. ✅ Using `np.where()` – Vectorized If-Else

```python
import numpy as np
df['result'] = np.where(df['marks'] >= 35, 'Pass', 'Fail')
```
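
One detail worth checking with `np.where()` is how missing values behave: a comparison against `NaN` evaluates to `False`, so missing marks silently land in the "else" branch. The sketch below illustrates this on a small made-up frame (the `marks_df` name, its values, and the `'Missing'` label are assumptions for illustration) and shows one possible way to label missing values separately by nesting a second `np.where()`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last mark is missing
marks_df = pd.DataFrame({'marks': [72, 35, 20, np.nan]})

# NaN >= 35 is False, so the missing mark is labelled 'Fail'
marks_df['result'] = np.where(marks_df['marks'] >= 35, 'Pass', 'Fail')

# Nested np.where: check for missing values first, then apply the threshold
marks_df['result_checked'] = np.where(
    marks_df['marks'].isna(),
    'Missing',
    np.where(marks_df['marks'] >= 35, 'Pass', 'Fail'),
)

print(marks_df)
```

Whether a missing mark should count as a fail or get its own label depends on the business rule, so treat the `'Missing'` branch as one option rather than a default.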

---

### B. ✅ Using `apply()` with `lambda`

```python
df['grade'] = df['marks'].apply(lambda x: 'A' if x > 80 else 'B')
```

You can also apply a function to each row by passing `axis=1`:

```python
# custom_logic receives one row (a Series) and returns a single value
df['new_col'] = df.apply(custom_logic, axis=1)
```
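
To make the row-wise form concrete, here is a minimal runnable sketch. The `people` frame, the eligibility rule, and the body of `custom_logic` are all hypothetical; the point is only that the function receives one row at a time and can read several columns from it.

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({'age': [25, 17, 70], 'score': [85, 40, 30]})

def custom_logic(row):
    """Return a label that depends on more than one column of the row."""
    if row['age'] >= 18 and row['score'] >= 50:
        return 'Eligible'
    return 'Not eligible'

# axis=1 passes each row to custom_logic as a Series
people['eligibility'] = people.apply(custom_logic, axis=1)

print(people)
```

Inside the function, plain `and` / `or` are fine because `row['age']` is a single value here, not a whole column.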

---

### C. ✅ Using `loc[]` for Multi-Condition Assignment

```python
# If 'category' is a new column, rows matching neither condition (age < 30) stay NaN
df.loc[df['age'] > 60, 'category'] = 'Senior'
df.loc[(df['age'] <= 60) & (df['age'] >= 30), 'category'] = 'Adult'
```
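
When there are several branches plus a fallback, `np.select()` is an alternative to repeated `loc` assignments. It is a different technique from the one above, shown here as a sketch: conditions are checked in order, the first match wins, and `default` fills rows that match nothing. The `ages` frame and the `'Other'` label are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
ages = pd.DataFrame({'age': [12, 30, 45, 60, 61]})

# Conditions are evaluated in order; the first True one decides the label
conditions = [
    ages['age'] > 60,
    ages['age'] >= 30,
]
choices = ['Senior', 'Adult']

# Rows matching no condition get the default instead of NaN
ages['category'] = np.select(conditions, choices, default='Other')

print(ages)
```

Because order matters, the second condition does not need an explicit upper bound: anyone over 60 has already been labelled by the first one.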

---

### D. ✅ Mapping Values with `map()`

```python
df['gender_code'] = df['gender'].map({'Male': 1, 'Female': 0})
```
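
Values that are absent from the dictionary come back as `NaN`, which also turns the result into a float column. The sketch below shows that behaviour and one possible way to handle it; the `people` frame and the `-1` sentinel are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data: 'Unknown' has no entry in the mapping
people = pd.DataFrame({'gender': ['Male', 'Female', 'Unknown']})

codes = people['gender'].map({'Male': 1, 'Female': 0})
# 'Unknown' is not a key, so codes holds 1.0, 0.0, NaN (float dtype)

# Replace unmapped values with a sentinel and restore an integer dtype
people['gender_code'] = codes.fillna(-1).astype(int)

print(people)
```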

---

### E. ✅ Categorizing with `cut()` or `qcut()`

```python
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 60, 100], labels=['Child', 'Adult', 'Senior'])
```
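
Two behaviours are easy to miss here. `pd.cut()` uses fixed bin edges that are right-inclusive by default, so an age of exactly 18 falls in the first bin. `pd.qcut()` instead chooses the edges from the data so that each bin holds roughly the same number of rows. The following sketch uses made-up `ages` and `scores` series and hypothetical quartile labels to show both.

```python
import pandas as pd

# Hypothetical values placed on and around the bin edges
ages = pd.Series([5, 18, 19, 60, 61, 100])

# Fixed edges; intervals are (0, 18], (18, 60], (60, 100]
age_group = pd.cut(ages, bins=[0, 18, 60, 100],
                   labels=['Child', 'Adult', 'Senior'])

# Hypothetical scores; qcut derives edges from the quantiles of the data
scores = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])
quartile = pd.qcut(scores, q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

print(age_group.tolist())
print(quartile.tolist())
```

If the lower boundaries should be inclusive instead, `pd.cut()` accepts `right=False`.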


## ⚠️ 4. Common Pitfalls and Best Practices

| Pitfall                                                | Explanation                                                           |
| ------------------------------------------------------ | --------------------------------------------------------------------- |
| Using `apply()` when `np.where()` is enough            | `apply()` calls a Python function per element, so it is slower        |
| Missing parentheses in conditions                      | `&` binds more tightly than comparisons, causing errors or unexpected results |
| Not chaining multiple conditions with `&` correctly    | Wrap each condition in `()` before combining them                     |
| Assigning to a filtered subset without `.copy()`       | May cause `SettingWithCopyWarning`                                    |
| Using `map()` on values not in the mapping             | Unmapped values become `NaN`                                          |

### ✅ Best Practices:

* Prefer vectorized `np.where()` or `loc` assignment over `apply()` for performance
* Use `apply()` only for complex logic needing row-level access
* Always wrap conditions in parentheses: `(df['x'] > 0) & (df['y'] < 10)`
* Call `.copy()` on a filtered subset before assigning to it, to avoid `SettingWithCopyWarning`
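
The last two practices can be sketched together. The `df_all` frame, its `x` / `y` columns, and the `flag` column are hypothetical; the example only illustrates building a parenthesized mask and taking an explicit copy before adding a column to the filtered result.

```python
import pandas as pd

# Hypothetical data for illustration
df_all = pd.DataFrame({'x': [1, -2, 3], 'y': [5, 5, 20]})

# Each comparison is wrapped in parentheses before combining with &
mask = (df_all['x'] > 0) & (df_all['y'] < 10)

# .copy() makes the subset an independent DataFrame,
# so adding a column to it does not touch df_all
subset = df_all[mask].copy()
subset['flag'] = True

print(subset)
```

After this runs, `df_all` is unchanged and only `subset` carries the new column.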


## 📊 5. Examples on Real/Pseudo Data

In [2]:
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'score': [85, 40, 65, 30],
    'age': [25, 17, 34, 70]
})

df

|   | name    | score | age |
| - | ------- | ----- | --- |
| 0 | Alice   | 85    | 25  |
| 1 | Bob     | 40    | 17  |
| 2 | Charlie | 65    | 34  |
| 3 | David   | 30    | 70  |


### 🔸 A. Add Pass/Fail column

In [3]:
df['status'] = np.where(df['score'] >= 50, 'Pass', 'Fail')
df

|   | name    | score | age | status |
| - | ------- | ----- | --- | ------ |
| 0 | Alice   | 85    | 25  | Pass   |
| 1 | Bob     | 40    | 17  | Fail   |
| 2 | Charlie | 65    | 34  | Pass   |
| 3 | David   | 30    | 70  | Fail   |


### 🔸 B. Add Grade Based on Score

In [4]:
def grade(s):
    if s >= 80:
        return 'A'
    elif s >= 60:
        return 'B'
    elif s >= 40:
        return 'C'
    else:
        return 'D'

df['grade'] = df['score'].apply(grade)

df

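|   | name    | score | age | status | grade |
| - | ------- | ----- | --- | ------ | ----- |
| 0 | Alice   | 85    | 25  | Pass   | A     |
| 1 | Bob     | 40    | 17  | Fail   | C     |
| 2 | Charlie | 65    | 34  | Pass   | B     |
| 3 | David   | 30    | 70  | Fail   | D     |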
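
The same grade boundaries can also be expressed without a Python-level function. As a sketch of a vectorized alternative (not the approach used in the cell above), `pd.cut()` with `right=False` makes each bin include its lower edge, which matches the `>=` comparisons in `grade()`. The standalone `scores` series below repeats the example values.

```python
import numpy as np
import pandas as pd

# Same scores as in the example DataFrame
scores = pd.Series([85, 40, 65, 30])

# right=False gives intervals [-inf, 40), [40, 60), [60, 80), [80, inf)
grades = pd.cut(
    scores,
    bins=[-np.inf, 40, 60, 80, np.inf],
    labels=['D', 'C', 'B', 'A'],
    right=False,
)

print(grades.tolist())
```

The result is a categorical column rather than plain strings, which also preserves the D < C < B < A ordering.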


### 🔸 C. Assign Age Category Using `cut()`

In [7]:
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 60, 100], labels=['Child', 'Adult', 'Senior'])

df

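|   | name    | score | age | status | grade | age_group |
| - | ------- | ----- | --- | ------ | ----- | --------- |
| 0 | Alice   | 85    | 25  | Pass   | A     | Adult     |
| 1 | Bob     | 40    | 17  | Fail   | C     | Child     |
| 2 | Charlie | 65    | 34  | Pass   | B     | Adult     |
| 3 | David   | 30    | 70  | Fail   | D     | Senior    |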


### 🔸 D. Add Senior Flag Using `loc`

In [8]:
df['is_senior'] = False

df

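|   | name    | score | age | status | grade | age_group | is_senior |
| - | ------- | ----- | --- | ------ | ----- | --------- | --------- |
| 0 | Alice   | 85    | 25  | Pass   | A     | Adult     | False     |
| 1 | Bob     | 40    | 17  | Fail   | C     | Child     | False     |
| 2 | Charlie | 65    | 34  | Pass   | B     | Adult     | False     |
| 3 | David   | 30    | 70  | Fail   | D     | Senior    | False     |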


In [9]:
df.loc[df['age'] > 60, 'is_senior'] = True

df

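|   | name    | score | age | status | grade | age_group | is_senior |
| - | ------- | ----- | --- | ------ | ----- | --------- | --------- |
| 0 | Alice   | 85    | 25  | Pass   | A     | Adult     | False     |
| 1 | Bob     | 40    | 17  | Fail   | C     | Child     | False     |
| 2 | Charlie | 65    | 34  | Pass   | B     | Adult     | False     |
| 3 | David   | 30    | 70  | Fail   | D     | Senior    | True      |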
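
For a plain True/False flag, the two-step pattern above can be collapsed: the comparison itself already produces a boolean column. A minimal sketch on a standalone copy of the example data (the `people` name is an assumption):

```python
import pandas as pd

# Same names and ages as in the example DataFrame
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 17, 34, 70],
})

# The comparison returns a boolean Series that can be stored directly
people['is_senior'] = people['age'] > 60

print(people)
```

The `loc` form remains the better fit when only some rows of an existing column should change.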


## 🌍 6. Real World Use Cases

| Domain             | Use Case                                             |
| ------------------ | ---------------------------------------------------- |
| 🎓 Education       | Assign grades or pass/fail based on marks            |
| 🏥 Healthcare      | Flag patients as high-risk based on age and symptoms |
| 🛍️ Retail         | Categorize customers as low/medium/high spenders     |
| 🏦 Finance         | Flag suspicious transactions based on thresholds     |
| 📈 Marketing       | Segment users based on activity or demographics      |
| 🛫 Travel          | Classify travelers as adult, child, or senior        |
| 👩‍💼 HR Analytics | Flag employees eligible for retirement               |


## ✅ Summary

| Method             | Use For                        |
| ------------------ | ------------------------------ |
| `np.where()`       | Simple if-else                 |
| `apply()`          | Complex row-wise logic         |
| `loc[]`            | Multi-condition assignments    |
| `map()`            | Dictionary-based value mapping |
| `cut()` / `qcut()` | Bin continuous variables       |

