# Topic 04 - Problem 10: Conditional Aggregation Within Groups

---

## 1. About the Problem

In real-world datasets, I often need to calculate **conditional statistics inside groups**, not just simple aggregates.

Examples:
- Average salary of **high earners vs low earners** per department
- Sales above a threshold per category
- Performance metrics based on conditions
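
The "sales above a threshold per category" case from the list above can be sketched with `Series.where`. It keeps the values that pass the condition and turns the rest into `NaN`, which `sum`, `count` and `mean` then skip. This is a minimal sketch: the `sales` DataFrame, its `category`/`amount` columns and the threshold of 100 are hypothetical values invented for illustration.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration
sales = pd.DataFrame({
    "category": ["Books", "Books", "Toys", "Toys", "Toys"],
    "amount": [80, 150, 200, 50, 120],
})

THRESHOLD = 100  # assumed cut-off, not taken from any real dataset

# Keep amounts above the threshold; everything else becomes NaN
above = sales["amount"].where(sales["amount"] > THRESHOLD)

# Group the masked values by category; NaN entries are ignored by sum/count/mean
summary = above.groupby(sales["category"]).agg(["sum", "count", "mean"])
print(summary)
```

Masking instead of filtering keeps every category in the result. A category with no sale above the threshold still appears, with a `sum` of 0.0, a `count` of 0 and a `mean` of `NaN`.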

In this problem, I will compute the **average salary of employees earning more than the department’s average salary**.

This type of logic is frequently used in:
- Advanced EDA
- Feature engineering
- Business intelligence

---


## 2. Solution Code

```python
import pandas as pd

# Sample dataset
data = {
    "employee": ["A", "B", "G", "C", "D", "E", "F"],
    "department": ["IT", "IT", "IT", "HR", "HR", "Finance", "Finance"],
    "salary": [60000, 65000, 70000, 48000, 52000, 78000, 75000],
}

df = pd.DataFrame(data)

# Department-wise average salary, broadcast back to every row
df["dept_avg"] = df.groupby("department")["salary"].transform("mean")

# Boolean flag: does this employee earn more than their department's average?
df["above_dept_avg"] = df["salary"] > df["dept_avg"]

# Conditional aggregation: average salary of the above-average employees,
# one value per department (.mean(), not .transform(), so we get one row per group)
result = df[df["above_dept_avg"]].groupby("department")["salary"].mean()

print("Average salary of employees earning above department average:")
print(result)
```

```
Average salary of employees earning above department average:
department
Finance    78000.0
HR         52000.0
IT         70000.0
Name: salary, dtype: float64
```
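
The same boolean flag can split each department into both groups at once, which covers the "high earners vs low earners" example from the introduction. A minimal sketch, reusing the sample data above: group by department *and* the flag, then `unstack` the flag into columns. The column labels `at_or_below_avg` and `above_avg` are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "employee": ["A", "B", "G", "C", "D", "E", "F"],
    "department": ["IT", "IT", "IT", "HR", "HR", "Finance", "Finance"],
    "salary": [60000, 65000, 70000, 48000, 52000, 78000, 75000],
})

# Flag employees earning strictly above their own department's average
dept_avg = df.groupby("department")["salary"].transform("mean")
df["above_dept_avg"] = df["salary"] > dept_avg

# Average salary per (department, flag) pair, with the flag pivoted into columns
split = (
    df.groupby(["department", "above_dept_avg"])["salary"]
      .mean()
      .unstack("above_dept_avg")
      .rename(columns={False: "at_or_below_avg", True: "above_avg"})
)
print(split)
```

Because the comparison is strict (`>`), employee B (65000, exactly the IT average) lands in the `at_or_below_avg` column.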


---

## 3. Explanation (What is happening)

- **transform("mean")**  
  → Computes each department's average and broadcasts it back to every row, so the result stays aligned with the original DataFrame's index

- **salary > dept_avg**  
  → Creates a conditional boolean feature

- **df[df["above_dept_avg"]]**  
  → Filters only employees earning above department average

- **groupby + mean**  
  → Aggregates conditionally filtered data
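
The difference between `transform("mean")` and a plain `.mean()` is the key to the first step. A small sketch on the same sample salaries compares the two shapes. Only the row-aligned version can be compared element-wise with `df["salary"]`. Comparing `df["salary"]` against the per-department Series directly would raise a `ValueError`, because pandas only compares identically-labeled Series.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "IT", "HR", "HR", "Finance", "Finance"],
    "salary": [60000, 65000, 70000, 48000, 52000, 78000, 75000],
})

grouped = df.groupby("department")["salary"]

# mean(): one value per department, indexed by department name
per_group = grouped.mean()

# transform("mean"): one value per original row, aligned to df's index
per_row = grouped.transform("mean")

print(per_group.shape)  # (3,)
print(per_row.shape)    # (7,)

# Element-wise comparison works because per_row shares df's index
print((df["salary"] > per_row).tolist())
```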

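This is **not possible** with a single built-in groupby aggregation such as `.mean()`: the filter depends on a group-level statistic (the department average) that has to exist before the filtering happens, so the logic must be built in steps.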
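
If the steps are pushed into a custom per-group function, a single `groupby(...).apply(...)` call can produce the same per-department result. This is a sketch of that alternative. The helper name `mean_above_group_mean` is hypothetical and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "employee": ["A", "B", "G", "C", "D", "E", "F"],
    "department": ["IT", "IT", "IT", "HR", "HR", "Finance", "Finance"],
    "salary": [60000, 65000, 70000, 48000, 52000, 78000, 75000],
})

def mean_above_group_mean(s: pd.Series) -> float:
    """Hypothetical helper: mean of the values in s that exceed s's own mean."""
    return s[s > s.mean()].mean()

# apply() calls the helper once per department and returns one scalar per group
result = df.groupby("department")["salary"].apply(mean_above_group_mean)
print(result)
```

This version calls a Python function once per group, so on data with many groups it is usually slower than the vectorized `transform` + filter approach. A group whose values are all equal has no member above its mean, and that group gets `NaN`.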

---

## 4. Summary / Takeaways

By solving this problem, I learned:

1. How to perform conditional aggregation
2. Why multi-step grouping is sometimes necessary
3. How boolean features help in ML preprocessing
4. How business logic translates into pandas code


---

