# Topic 04 - Problem 04: Group-wise Filtering Using Aggregated Conditions

---

## 1. About the Problem

In real-world datasets, I often don’t want to keep **all groups** after grouping the data.  
Instead, I may want to **keep or drop whole groups based on a condition**, such as:

- Departments with average salary above a threshold
- Categories with enough data points
- Groups that meet business or ML criteria
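The "enough data points" condition above follows the same `groupby().filter()` pattern used later in this note. The sketch below is a minimal illustration on this problem's toy salary data. The cutoff `MIN_ROWS = 3` is an arbitrary, assumed value and not part of the original problem.

```python
import pandas as pd

# Same toy data as the main solution of this problem
df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
})

# Assumed illustrative threshold: keep departments that have at least 3 rows
MIN_ROWS = 3

# len(g) is the number of rows in each department's sub-DataFrame
large_groups = df.groupby("department").filter(lambda g: len(g) >= MIN_ROWS)

print(large_groups)
```

Only IT has three rows here, so only its rows (original index 0, 1 and 6) remain.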

In this problem, I will keep only the departments whose **average salary is strictly above 65,000** and drop all the others.

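This is important because filtering at the group level helps with:
- Removing groups that carry weak or noisy signals, such as categories with too few rows
- Keeping only the groups that show meaningful patterns
- Improving the quality of the data a model is trained on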

---


## 2. Solution Code

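```python
import pandas as pd

# Sample employee data: three departments with different salary levels
data = {
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
}

df = pd.DataFrame(data)

# Keep only departments whose mean salary is strictly above 65,000;
# every row of a qualifying department is returned unchanged
filtered_data = df.groupby("department").filter(lambda x: x["salary"].mean() > 65000)

print(filtered_data)
```

Output:

```text
  department  salary  experience
4    Finance   78000           8
5    Finance   75000           7
```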
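A different technique, not the one used in the solution, gives the same result: build a row-level boolean mask with `groupby(...)["salary"].transform("mean")`, which broadcasts each department's average back onto every one of its rows. This is a sketch on the same toy data that checks both approaches select identical rows.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
})

# transform("mean") returns a Series aligned with df's index,
# where each row holds the mean salary of its own department
dept_mean = df.groupby("department")["salary"].transform("mean")

# Row-level mask: True for every row of a department above the threshold
mask = dept_mean > 65000
filtered_via_mask = df[mask]

# The groupby().filter() version from the solution, for comparison
filtered_via_filter = df.groupby("department").filter(lambda x: x["salary"].mean() > 65000)

print(filtered_via_mask.equals(filtered_via_filter))
```

Because `transform("mean")` uses a built-in aggregation instead of calling a Python function once per group, the mask version is typically faster when there are many groups. `filter` tends to read more directly when the condition is complex.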


---

## 3. Explanation (What is happening)

- **groupby("department")**  
  → Splits data by department

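- **filter(lambda x: …)**  
  → Calls the function once per group (`x` is that group's sub-DataFrame) and keeps only the groups for which it returns `True`

- **x["salary"].mean() > 65000**  
  → Returns `True` only for departments whose mean salary is strictly greater than 65,000: Finance (76,500) passes, while IT (exactly 65,000) and HR (50,000) do not

- For the groups that pass, the original rows are returned unchanged, including their original index labels (4 and 5 here)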
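The solution passes an inline lambda; the sketch below swaps it for a named predicate (`has_high_avg_salary` is a hypothetical name chosen for illustration) and also shows the `dropna` parameter of `DataFrameGroupBy.filter`. With the default `dropna=True`, rows of failing groups are removed. With `dropna=False`, every row stays in its original position and rows of failing groups are filled with NaN instead.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
})

def has_high_avg_salary(group: pd.DataFrame) -> bool:
    # `group` holds all rows of one department; the predicate must
    # return a single boolean that decides the fate of the whole group
    return bool(group["salary"].mean() > 65000)

# Default behaviour (dropna=True): failing groups are dropped entirely
kept = df.groupby("department").filter(has_high_avg_salary)

# dropna=False: result has the same 7 rows as df; failing groups become NaN
masked = df.groupby("department").filter(has_high_avg_salary, dropna=False)

print(kept)
print(masked)
```

Because NaN cannot be stored in an integer column, `salary` and `experience` come back as floats in `masked`.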

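Group filtering is different from aggregation: aggregation collapses each group into a single summary row, whereas `filter` keeps the original rows and **removes entire groups at once**, never individual rows.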
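A short side-by-side on the same data makes the contrast concrete: aggregation yields one value per department, while group filtering yields the original rows of the departments that pass.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
})

# Aggregation: one row per department (sorted by key), holding the mean salary
avg_salary = df.groupby("department")["salary"].mean()
print(avg_salary)

# Group filtering: original rows and columns, only for departments that pass
high_paid = df.groupby("department").filter(lambda x: x["salary"].mean() > 65000)
print(high_paid)
```

The aggregated Series also shows why IT is excluded: its mean is exactly 65,000, which fails the strict `>` comparison.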

---

## 4. Summary / Takeaways

By solving this problem, I learned:

1. How to filter entire groups using `groupby().filter()`
2. The difference between **row filtering** and **group filtering**
3. How group-level logic improves dataset quality
4. Why this technique is useful before feature engineering or modeling
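To make takeaway 2 concrete, the sketch below applies the same 65,000 threshold once as a row-level condition and once as a group-level condition.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
})

# Row filtering: each row is judged on its own salary
row_filtered = df[df["salary"] > 65000]

# Group filtering: each row is judged on its department's average salary
group_filtered = df.groupby("department").filter(lambda x: x["salary"].mean() > 65000)

print(row_filtered)
print(group_filtered)
```

Row 6 (IT, 70,000) survives the row filter because that single salary exceeds 65,000, but the group filter drops it because the IT average is exactly 65,000.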

This approach is commonly used in:
- Business analytics
- ML data preprocessing
- Removing weak categories
- Dataset optimization
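In preprocessing, several group-level criteria are often combined. The helper below is hypothetical (`filter_groups` and its parameter names are made up for illustration); it is a sketch of how a mean threshold and a minimum group size could be applied in a single `filter` call.

```python
import pandas as pd

def filter_groups(frame: pd.DataFrame, by: str, col: str,
                  min_mean: float, min_size: int) -> pd.DataFrame:
    """Hypothetical helper: keep groups of `by` that have at least
    `min_size` rows AND a mean of `col` strictly above `min_mean`."""
    return frame.groupby(by).filter(
        lambda g: len(g) >= min_size and g[col].mean() > min_mean
    )

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
})

# Mean salary > 50,000 and at least 2 rows: IT and Finance pass, HR does not
kept = filter_groups(df, by="department", col="salary", min_mean=50000, min_size=2)
print(kept)
```

Tightening the criteria (for example `min_mean=65000, min_size=3`) can remove every group, in which case `filter` returns an empty DataFrame with the original columns, so downstream code should be ready for that case.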

This problem combines **pandas knowledge with analytical decision-making** (deciding which groups are worth keeping), both of which matter for Data Science and ML roles.

Next, I will move toward **group-wise transformations and normalization**.
